From: "Ananyev, Konstantin"
To: Olivier MATZ, "Liu, Jijiang"
Date: Mon, 19 Jan 2015 13:04:41 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com>
In-Reply-To: <54B94A18.5030700@6wind.com>
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Friday, January 16, 2015 5:28 PM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
>
> Hi Konstantin, Hi Jijiang,
>
> On 01/15/2015 02:31 PM, Ananyev, Konstantin wrote:
> > To be honest, there are so many mails around that subject, so I am already lost :)
> > Olivier, as I understand you are not happy with the test-pmd commands Frank is proposing.
> > Both syntax and semantics.
> > Is that correct?
> > If so, could you suggest something from your side?
> > That would allow to configure test-pmd to behave in any of 4 possible ways we discussed previously:
> > http://dpdk.org/ml/archives/dev/2014-December/009213.html
>
> I first wanted to send a mail to describe the current problem with
> testpmd command line and the 2 solutions (Jijiang's and mine).
> But, first, I think we need to fully clarify the checksum offload
> API through examples as it will help to implement testpmd and do
> the documentation. They are based on Jijiang's previous mail [1].
>
> I will submit a patchset fixing the problems described below in
> the coming days. If we agree on it, I'll submit another one for testpmd.
>
> Let's use the following packet for all the examples below:
>   out_eth / out_ip / out_udp / vxlan / in_eth / in_ip / in_tcp
>
>
> The following cases are supposed to work on niantic and fortville:
>
> case 1) calculate checksum of out_ip (was case A in [1])
>
>   mb->l2_len = len(out_eth)
>   mb->l3_len = len(out_ip)
>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
>   set out_ip checksum to 0 in the packet
>
>   supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM
>
> case 2) calculate checksum of out_ip and out_udp
>
>   mb->l2_len = len(out_eth)
>   mb->l3_len = len(out_ip)
>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM
>   set out_ip checksum to 0 in the packet
>   set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>
>   supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
>   DEV_TX_OFFLOAD_UDP_CKSUM
>
> *Problem 1*: The comment above PKT_TX_IPV4 says "Packet is IPv4
> without requiring IP checksum offload" [2], and the help of L4
> checksum and TSO says that it is required to set the PKT_TX_IPV4
> flag [3]. This is not coherent.

So what exactly is the problem? That the comments in rte_mbuf.h are not coherent?

>
> We are back on the debate about the meaning of PKT_TX_IPV4 vs
> PKT_TX_IP_CKSUM from [4]. This incoherency in comments are introduced
> by patch [5]. The question is "when an application should set
> this flag? for any IP packet that does not require IP checksum?".

Yes: when it is an IPv4 packet and the application requires TX offload
for the L4 checksum or TSO, but doesn't want HW offload for the IPv4
checksum calculation.

> This would break many applications.

Which ones? As far as I know, nothing is broken so far.

> I think a good definition would
> be:
>
>   Packet is IPv4.
>   This flag must be set when using any offload
>   feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
>   is an IPv4 packet.
>
> That's why I added PKT_TX_IPV4 in the examples.

I suppose we have discussed it several times: both ways are possible.
From the PMD perspective, treating PKT_TX_IPV4 and PKT_TX_IP_CKSUM
as mutually exclusive seems a bit more plausible.
From the upper layer's perspective, my understanding is that it doesn't really matter.
I thought we had reached an agreement about it in 1.8, no?

>
> case 3) calculate checksum of in_ip
>
>   mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
>   set in_ip checksum to 0 in the packet
>
>   This is similar to case 1), but l2_len is different.
>
>   supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM
>
>   Note that it can only work if outer L4 checksum is 0.
>
> case 4) calculate checksum of in_ip and in_tcp (was case B.2 in [1])
>
>   mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
>   set in_ip checksum to 0 in the packet
>   set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
>
>   This is similar to case 2), but l2_len is different.
>
>   supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
>   DEV_TX_OFFLOAD_TCP_CKSUM
>
>   Note that it can only work if outer L4 checksum is 0.
>
> case 5) segment inner TCP
>
>   mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->l4_len = len(in_tcp)
>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
>     PKT_TX_TCP_SEG;
>   set in_ip checksum to 0 in the packet
>   set in_tcp checksum to pseudo header without including the IP
>     payload length using rte_ipv4_phdr_cksum()
>
>   supported on hardware advertising DEV_TX_OFFLOAD_TCP_TSO.
>
>   Note that it can only work if outer L4 checksum is 0.
>
> Problem 1 is also visible here.
>
>
> The following cases are supposed to *work on fortville*:
>
> case 6) calculate checksum of out_ip, in_ip, in_tcp (was case C in [1])
>
>   mb->outer_l2_len = len(out_eth)
>   mb->outer_l3_len = len(out_ip)
>   mb->l2_len = len(out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \
>     PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>   set out_ip checksum to 0 in the packet
>   set in_ip checksum to 0 in the packet
>   set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
>
>   supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM,
>   DEV_TX_OFFLOAD_UDP_CKSUM and DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM
>
> *Problem 2*: it is not written in the API comments that out_ip
> checksum field should be set to 0 by the application. They should
> be enhanced.

Ok.

>
> *Problem 3*: without using the word "fortville", it is difficult
> to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
> once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
> tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
> flag. In linux, the driver doesn't care about the tunnel type,
> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].

It might be obvious that it is a tunnel packet when a PKT_TX_OUTER_* flag is set,
but it is not obvious what type of tunnelling it is.
FVL HW supports TX offloads for different types of tunnelling and requires
SW to provide information about the tunnelling type.
From the i40e datasheet:

  L4TUNT  L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
  00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
  01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
  10b - GRE tunneling header

As we do plan to support tunnelling types other than UDP, I suppose we'll need
to keep the PKT_TX_UDP_TUNNEL_PKT flag.
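A minimal sketch of the point above, assuming a hypothetical per-packet tunnel-type hint supplied by the application: it maps tunnel types onto the 2-bit L4TUNT values quoted from the i40e datasheet. The enum, macro and function names are invented for illustration (they are not DPDK or i40e driver identifiers), and where the value is placed inside the TX context descriptor is deliberately not shown.

```c
#include <stdint.h>

/* Hypothetical tunnel classification an application might attach to a
 * packet. Illustrative names only; these are not DPDK identifiers. */
enum tx_tunnel_type {
	TX_TUNNEL_NONE,
	TX_TUNNEL_VXLAN,
	TX_TUNNEL_GENEVE,
	TX_TUNNEL_GRE,
};

/* 2-bit L4TUNT encodings as quoted from the i40e datasheet:
 *   00b - no UDP / GRE tunneling
 *   01b - UDP tunneling header (VXLAN, Geneve, ...)
 *   10b - GRE tunneling header */
#define L4TUNT_NONE 0x0u
#define L4TUNT_UDP  0x1u
#define L4TUNT_GRE  0x2u

/* Return the L4TUNT value for a given tunnel type. */
uint32_t l4tunt_from_tunnel(enum tx_tunnel_type t)
{
	switch (t) {
	case TX_TUNNEL_VXLAN:
	case TX_TUNNEL_GENEVE:
		/* any UDP-based encapsulation shares one encoding */
		return L4TUNT_UDP;
	case TX_TUNNEL_GRE:
		return L4TUNT_GRE;
	case TX_TUNNEL_NONE:
	default:
		/* must also be used when there is no outer IP header */
		return L4TUNT_NONE;
	}
}
```

Since VXLAN and Geneve map to the same value while GRE does not, knowing only that outer headers are present would not be enough to pick the encoding.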
>
> *Problem 4*: features flags are missing here. A flag
> DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM should be added. This is already
> addressed by one patch from Jijiang [7]

Ok, yes, I think Frank already submitted a patch for that.

>
>
> The cases should work in some *future drivers*:
>
> case 7) calculate checksum of out_ip, out_udp, in_ip and in_tcp
>
>   mb->outer_l2_len = len(out_eth)
>   mb->outer_l3_len = len(out_ip)
>   mb->l2_len = len(out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \
>     PKT_TX_OUTER_UDP_CKSUM | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>   set out_ip checksum to 0 in the packet
>   set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>   set in_ip checksum to 0 in the packet
>   set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
>
> We need to add the flag PKT_TX_OUTER_UDP_CKSUM.

We can, though right now we don't have any HW that is able to do that.
Why do we need to do it now?

>
> case 8) TSO on inner header + out_ip checksum
>
> This is not supported yet, but latest patch from Jijiang [8]
> implements this feature.
>
>   mb->outer_l2_len = len(out_eth)
>   mb->outer_l3_len = len(out_ip)
>   mb->l2_len = len(out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->l4_len = len(in_tcp)
>   mb->ol_flags |= PKT_TX_OUTER_IP_CKSUM | PKT_TX_OUTER_IPV4 | \
>     PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | \
>     PKT_TX_TCP_SEG;
>   set out_ip checksum to 0 in the packet
>   set in_ip checksum to 0 in the packet
>   set in_tcp checksum to pseudo header without including the IP
>     payload length using rte_ipv4_phdr_cksum()
>
>   supported on hardware advertising DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM
>   and DEV_TX_OFFLOAD_TCP_TSO.
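Most of the cases above seed an L4 checksum field with the IPv4 pseudo-header checksum via rte_ipv4_phdr_cksum(), and the TSO cases (5 and 8) leave the payload length out of it. The sketch below illustrates that computation under stated assumptions: `phdr_cksum_sketch()` and `fold_cksum()` are hypothetical stand-ins, not the signature or implementation of rte_ipv4_phdr_cksum(); addresses are passed in host byte order; and the function returns the non-inverted 16-bit one's-complement sum, which would then be written into the L4 header in network byte order.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
uint16_t fold_cksum(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: non-inverted sum of the IPv4 pseudo header
 * (src addr, dst addr, zero byte + protocol, L4 length).
 * When tso is non-zero, the L4 length is left out of the sum,
 * as the TSO cases above require. */
uint16_t phdr_cksum_sketch(uint32_t src, uint32_t dst, uint8_t proto,
			   uint16_t l4_len, int tso)
{
	uint32_t sum = 0;

	sum += src >> 16;        /* source address, high 16 bits */
	sum += src & 0xFFFF;     /* source address, low 16 bits */
	sum += dst >> 16;        /* destination address, high 16 bits */
	sum += dst & 0xFFFF;     /* destination address, low 16 bits */
	sum += proto;            /* zero byte followed by protocol number */
	if (!tso)
		sum += l4_len;   /* L4 header + payload length */
	return fold_cksum(sum);
}
```

For example, with source 192.168.0.1 (0xC0A80001), destination 192.168.0.199 (0xC0A800C7), protocol 6 (TCP) and a 20-byte L4 length, the sketch returns 0x8233; with `tso` set it returns 0x821F.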
>
>
> I think the following cases should be *forbidden by the API*:
>
> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1])
>
>   mb->outer_l2_len = len(out_eth)
>   mb->outer_l3_len = len(out_ip)
>   mb->l2_len = len(out_udp + vxlan + in_eth)
>   mb->l3_len = len(in_ip)
>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
>     PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>   set in_ip checksum to 0 in the packet
>   set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
>
> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
> supported, but there is no reason to support it as there is
> already one way to do the same.
>
> I think the driver should not even look at mb->outer_l2_len
> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.

Why should it be forbidden? I admit it might be a bit slower than case 4),
but I think it is an absolutely legal way to set up HW offloads for inner L3/L4.
As I said, we need PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
PKT_TX_*_TUNNEL_PKT should indicate whether it is a tunnel packet or not,
while the PKT_TX_OUTER_* flags indicate whether outer checksum offload is
required or not.

>
> case 10) calculate a checksum using only outer_lX fields
>
> The outer_lX fields or PKT_TX_OUTER_* flags can only be used
> if an inner checksum is enabled. So it's not possible to do
> the following:
>
>   mb->outer_l2_len = len(out_eth)
>   mb->outer_l3_len = len(out_ip)
>   mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM
>   set out_ip checksum to 0 in the packet

Ok, I think no one plans to use it anyway.
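To compare the two length conventions used in this thread (all outer headers folded into l2_len in cases 3 to 5, versus separate outer_l2_len/outer_l3_len in cases 6 to 9), here is a small sketch for the example packet. It assumes untagged Ethernet (14 bytes), IPv4 and TCP without options (20 bytes each), and 8-byte UDP and VXLAN headers. The struct is a hypothetical subset of the mbuf TX length fields, not the real struct rte_mbuf layout.

```c
#include <stdint.h>

/* Hypothetical subset of the TX length fields discussed in this thread;
 * this is NOT the real struct rte_mbuf layout. */
struct tx_lens {
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
	uint16_t l4_len; /* only needed for TSO (cases 5 and 8) */
};

/* Assumed header sizes for out_eth / out_ip / out_udp / vxlan /
 * in_eth / in_ip / in_tcp: no VLAN tags, no IP or TCP options. */
enum {
	HDR_ETH = 14,
	HDR_IPV4 = 20,
	HDR_UDP = 8,
	HDR_VXLAN = 8,
	HDR_TCP = 20,
};

/* Cases 3 to 5: every outer header plus in_eth is folded into l2_len,
 * and the outer_l*_len fields are left at zero. */
struct tx_lens lens_inner_only(void)
{
	struct tx_lens l = {0};

	l.l2_len = HDR_ETH + HDR_IPV4 + HDR_UDP + HDR_VXLAN + HDR_ETH;
	l.l3_len = HDR_IPV4;
	l.l4_len = HDR_TCP;
	return l;
}

/* Cases 6 to 9: out_eth and out_ip get their own fields, so l2_len
 * only covers out_udp + vxlan + in_eth. */
struct tx_lens lens_outer_aware(void)
{
	struct tx_lens l = {0};

	l.outer_l2_len = HDR_ETH;
	l.outer_l3_len = HDR_IPV4;
	l.l2_len = HDR_UDP + HDR_VXLAN + HDR_ETH;
	l.l3_len = HDR_IPV4;
	l.l4_len = HDR_TCP;
	return l;
}

/* Byte offset of the inner IP header implied by either convention. */
unsigned int inner_l3_offset(struct tx_lens l)
{
	return (unsigned int)l.outer_l2_len + l.outer_l3_len + l.l2_len;
}
```

Both conventions place in_ip at byte offset 64; the only difference is whether the outer header lengths are reported to the driver separately.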
Konstantin

>
> Regards,
> Olivier
>
>
>
> [1] http://dpdk.org/ml/archives/dev/2014-December/009213.html
> [2] http://dpdk.org/browse/dpdk/tree/lib/librte_mbuf/rte_mbuf.h?id=v1.8.0#n147
> [3] http://dpdk.org/browse/dpdk/tree/lib/librte_mbuf/rte_mbuf.h?id=v1.8.0#n108
> [4] http://dpdk.org/ml/archives/dev/2014-December/009352.html
> [5] http://dpdk.org/browse/dpdk/commit/lib/librte_mbuf/rte_mbuf.h?id=711ba9e23e681b97d547219de8af199ea03a33b3
> [6] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40evf/i40e_txrx.c?v=3.17#L1223
> [7] http://dpdk.org/dev/patchwork/patch/1907/
> [8] http://dpdk.org/dev/patchwork/patch/2329/