From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: Olivier MATZ, "dev@dpdk.org"
Date: Fri, 16 May 2014 17:01:13 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772580EFA6ED5@IRSMSX105.ger.corp.intel.com>
In-Reply-To: <53760079.7090106@6wind.com>
Subject: Re: [dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support
Hi Olivier,

> Yes, recalculating the pseudo-header checksum without the ip_len
> is a slow down. This slow down should however be compared to the
> operation in progress. When you do TSO, you are generally transmitting
> a large TCP packet (several KB), and the cost of the TCP stack is
> probably much higher than fixing the checksum.

You can't always predict the context in which the PMD TX routine will be
called. Consider the scenario: one core does IO over several ports, while a
few other cores do upper-layer processing of the packets. In that case,
pseudo-header checksum (re)calculation inside the PMD TX function will slow
down not only that particular packet flow, but RX/TX over all ports managed
by the given core.

That's why I think that

  rte_eth_dev_prep_tx(portid, mbuf[], num)
  rte_eth_dev_tx(portid, mbuf[], num)

might have an advantage over

  rte_eth_dev_tx(portid, mbuf[], num)  /* the tx does the cksum job */

It gives us the freedom to choose: run prep_tx() either in the same
execution context as the actual tx(), or in a different one. Though yes, it
comes at a price: an extra function call, with all the corresponding
drawbacks.

Anyway, right now we could probably argue for a while about how a generic
TX HW offload API should look. So, from your options list:

> 1/ Update patch to calculate the pseudo header without the ip_len when
>    doing TSO. In this case the API is mapped on ixgbe behavior,
>    probably providing the best performance today. If another PMD comes
>    in the future, this API may change to something more generic.
> 2/ Try to define a generic API today, accepting that the first driver
>    that supports TSO is a bit slower, but reducing the risks of changing
>    the API for TSO in the future.
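As a purely illustrative sketch of the prep/tx split above: the struct
layout, function names and the "prepared" flag here are stand-ins I made up
for the example, not real DPDK definitions. The point is only the pipeline
shape: a worker core can run the prep stage, keeping the IO core's TX loop
lean.

```c
#include <stdint.h>

/* Hypothetical, minimal mbuf stand-in for the sketch. */
struct mbuf {
    uint16_t psd_csum;  /* pseudo-header checksum seed */
    int      prepared;  /* set once TX prep work is done */
};

/* Stage 1: per-packet fix-ups (e.g. the pseudo-header checksum) that a
 * worker core can run, away from the IO core's fast path. */
static uint16_t
prep_tx(uint8_t port, struct mbuf *pkts[], uint16_t n)
{
    (void)port;
    for (uint16_t i = 0; i < n; i++) {
        pkts[i]->psd_csum = 0;   /* placeholder for the real csum */
        pkts[i]->prepared = 1;
    }
    return n;
}

/* Stage 2: the fast TX path, which assumes packets are already prepared. */
static uint16_t
do_tx(uint8_t port, struct mbuf *pkts[], uint16_t n)
{
    (void)port;
    uint16_t sent = 0;
    for (uint16_t i = 0; i < n; i++)
        if (pkts[i]->prepared)
            sent++;   /* a real PMD would post a TX descriptor here */
    return sent;
}
```

The design choice being debated is exactly whether this two-stage shape is
worth the extra call, versus folding stage 1 into the TX routine itself.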
If #1 means moving the pseudo-header checksum calculation out of the PMD
code, then my vote would be for it.

Konstantin

-----Original Message-----
From: Olivier MATZ [mailto:olivier.matz@6wind.com]
Sent: Friday, May 16, 2014 1:12 PM
To: Ananyev, Konstantin; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support

Hi Konstantin,

On 05/15/2014 06:30 PM, Ananyev, Konstantin wrote:
> With the current DPDK implementation the upper code would still be
> different for TCP checksum (without segmentation) and TCP segmentation:
> different flags in mbuf; with TSO you need to set up the l4_len and mss
> fields inside the mbuf, with just checksum you don't.

You are right on this point.

> Plus, as I said, it is a bit confusing to calculate the PSD csum once in
> the stack and then re-calculate it in the PMD. Again - unnecessary
> slowdown.
> So why not just have get_ipv4_psd_sum() and get_ipv4_psd_tso_sum() inside
> testpmd/csumonly.c and call them accordingly?

Yes, recalculating the pseudo-header checksum without the ip_len is a slow
down. This slow down should however be compared to the operation in
progress. When you do TSO, you are generally transmitting a large TCP
packet (several KB), and the cost of the TCP stack is probably much higher
than fixing the checksum.

But that's not the main argument: my idea was to choose the proper API that
will reduce the slow down for most cases. Let's take the case of a future
vnic pmd driver supporting an emulation of TSO. In this case, the
calculation of the pseudo header is also an unnecessary slowdown. Also,
some other hardware I've seen doesn't need a different pseudo-header
checksum when doing TSO.

Last argument: Linux works the same way as what I've implemented. See the
Linux ixgbe driver [1] at line 6461: there is a call to csum_tcpudp_magic()
which reprocesses the checksum without the ip_len.

On the other hand, it's true that today ixgbe is the only hardware
supporting TSO in DPDK.
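To make the two variants being discussed concrete, here is a hedged sketch
of an IPv4 pseudo-header checksum seed; `ipv4_psd_sum` and its signature
are illustrative inventions, not the real testpmd/csumonly.c helpers, and
addresses are taken in host byte order for readability. The only difference
between the plain-offload and TSO variants is whether the L4 length enters
the sum (for TSO the NIC rewrites ip_len per segment, so the length is
left out).

```c
#include <stdint.h>

/* Fold a 32-bit partial sum into 16 bits (ones'-complement addition). */
static uint16_t
fold_sum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * IPv4 pseudo-header checksum seed, as written into the TCP checksum
 * field before handing the packet to hardware. include_len selects the
 * plain-checksum-offload variant (L4 length included) versus the TSO
 * variant (length omitted).
 */
static uint16_t
ipv4_psd_sum(uint32_t src, uint32_t dst, uint8_t proto,
             uint16_t l4_len, int include_len)
{
    uint32_t sum;

    sum  = (src >> 16) + (src & 0xffff);   /* source address */
    sum += (dst >> 16) + (dst & 0xffff);   /* destination address */
    sum += proto;                          /* L4 protocol number */
    if (include_len)
        sum += l4_len;                     /* TCP segment length */
    return fold_sum(sum);
}
```

Note the two seeds differ by exactly the length term, which is why one
variant can be derived from the other - the crux of the "recalculate in the
PMD or not" debate.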
The pragmatic approach could be to choose the API that gives the best
performance with what we have (the ixgbe driver). I'm OK with this approach
if we accept to reconsider the API (and maybe modify it) when another PMD
supporting TSO is implemented.

> About potential future problems with NICs that implement TX
> checksum/segmentation offloads in a different manner - yeah, that's
> true...
> I think at the final point all that logic could be hidden inside some
> function at the rte_ethdev level, something like:
> rte_eth_dev_prep_tx(portid, mbuf[], num).

I don't see the real difference between:

  rte_eth_dev_prep_tx(portid, mbuf[], num)
  rte_eth_dev_tx(portid, mbuf[], num)

and:

  rte_eth_dev_tx(portid, mbuf[], num)  /* the tx does the cksum job */

And the second is faster because there is only one pointer dereference.

> So, based on the mbuf TX offload flags and the device type, it would do
> the necessary modifications inside the packet.
> But that's a future discussion, I suppose.

To me, it's not an option for the network stack to fill the mbuf
differently depending on the device type. For instance, when doing ethernet
bonding or bridging, the stack may not know which physical device will be
used at the end. So the API to enable TSO on a packet has to be the same
whatever the device.

If fixing the checksum in the PMD is an unnecessary slowdown, forcing the
network stack to check what has to be filled in the mbuf depending on the
device type also has a cost.

> For now, I still think we need to keep pseudo-header checksum
> calculations out of the PMD code.

To me, there are 2 options:

1/ Update the patch to calculate the pseudo header without the ip_len when
   doing TSO. In this case the API is mapped on ixgbe behavior, probably
   providing the best performance today. If another PMD comes in the
   future, this API may change to something more generic.
2/ Try to define a generic API today, accepting that the first driver that
   supports TSO is a bit slower, but reducing the risks of changing the API
   for TSO in the future.

I'm fine with both options.

Regards,
Olivier

[1] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?v=3.14#L6434