From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id EF3585A15 for ; Tue, 20 Jan 2015 02:12:32 +0100 (CET) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP; 19 Jan 2015 17:08:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.09,430,1418112000"; d="scan'208";a="653455100" Received: from irsmsx154.ger.corp.intel.com ([163.33.192.96]) by fmsmga001.fm.intel.com with ESMTP; 19 Jan 2015 17:12:30 -0800 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.81]) by IRSMSX154.ger.corp.intel.com ([169.254.12.235]) with mapi id 14.03.0195.001; Tue, 20 Jan 2015 01:12:29 +0000 From: "Ananyev, Konstantin" To: Olivier MATZ , "Liu, Jijiang" Thread-Topic: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine Thread-Index: AQHQFSvOf5ZHcwOIQ0S6LPFf4KIhoZyLUqAAgADVrYCAJ+m7AIAAhGKAgAAcY4CAAAaCgIABXQMAgAAfBJCAAZMJAIAEQKUAgACGkwCAAQFCAIAAcvAAgAEemACAAkDH8IAB1gMAgARiT1CAACVrgIAAHEZA Date: Tue, 20 Jan 2015 01:12:29 +0000 Message-ID: <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com> References: <1418173403-30202-1-git-send-email-jijiang.liu@intel.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7699@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D337B@irsmsx105.ger.corp.intel.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA789E@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D34AE@irsmsx105.ger.corp.intel.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7CC5@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D3897@irsmsx105.ger.corp.intel.com> <54AFB13E.2080200@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA85A1@SHSMSX101.ccr.corp.intel.com> <54B3B35A.5030803@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA8E36@SHSMSX101.ccr.corp.intel.com> <54B4EB92.40209@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DB0789@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D4FCF@irsmsx105.ger.corp.intel.com> <54B94A18.5030700@6wind.com> <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com> <54BD16F1.6050409@6wind.com> In-Reply-To: <54BD16F1.6050409@6wind.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jan 2015 01:12:34 -0000 > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Monday, January 19, 2015 2:39 PM > To: Ananyev, Konstantin; Liu, Jijiang > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and cs= um forwarding engine >=20 > Hi Konstantin, >=20 > On 01/19/2015 02:04 PM, Ananyev, Konstantin wrote: > >> case 2) calculate checksum of out_ip and out_udp > >> > >> mb->l2_len =3D len(out_eth) > >> mb->l3_len =3D len(out_ip) > >> mb->ol_flags |=3D PKT_TX_IPV4 | PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM > >> set out_ip checksum to 0 in the packet > >> set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum() > >> > >> supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and > >> DEV_TX_OFFLOAD_UDP_CKSUM > >> > >> *Problem 1*: The comment above PKT_TX_IPV4 says "Packet is IPv4 > >> without requiring IP checksum offload" [2], and the help of L4 > >> checksum and TSO says that it is required to set the PKT_TX_IPV4 > >> flag [3]. This is not coherent. > > > > So what is the problem? > > Comments in rte_mbuf.h are not coherent? >=20 > No there're not coherent Ok, if the problem is just comments - let's fix it. >=20 > >> We are back on the debate about the meaning of PKT_TX_IPV4 vs > >> PKT_TX_IP_CSUM from [4]. This incoherency in comments are introduce= d > >> by patch [5]. The question is "when an application should set > >> this flag? for any IP packet that does not require IP checksum?". > > > > Yes, if it is an IPv4 packet and application required TX offload for L4= checksum or TSO, > > but doesn't want HW offload ofr IPV4 checksum calculation. > > > >> This would break many applications. > > > > Which ones? > > As I know, so far nothing is broken. >=20 > The problem today is that it's not obvious for a developper to > know when an application should set the PKT_TX_IPV4 flag. From the > comments, we could think that an application has to set it for any > transmitted IP packet, even for packets that do not require tx > offload. Asking to do this in the API would break many applications. >=20 > The comment should at least say that this flag is *only* required > when asking for L4 checksum. As TSO implies IP checksum, it means the > PKT_TX_IPV4 should not be set, but PKT_TX_IP_CSUM instead. Ok, so the problem is in comments again? If so, sure let's update them to make things clear. >=20 > >> I think a good definition would > >> be: > >> > >> Packet is IPv4. This flag must be set when using any offload > >> feature (TSO, L3 or L4 checksum) to tell the NIC that the packet > >> is an IPv4 packet. > >> > >> That's why I added PKT_TX_IPV4 in the examples. > > > > I suppose we discussed it several times: both ways are possible. > > From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM > > As mutually exclusive seems a bit more plausible. > > From the upper layer - my understanding, that it is doesn't really matt= er. > > I thought we had an agreement about it in 1.8, no? >=20 > Indeed, this was already discussed, but there was a lot of pressure > for 1.8.0 to push something, even not perfect. The fog around comments > shows that the API was not very clearly defined for 1.8.0. If you read > the comments of the API, it is impossible to understand when the > PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say > more: the only place where the comments bring a valuable information > (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and > PKT_TX_IP_CSUM are not exclusive... >=20 > So I will fix that in my coming patch series. Just for information, > I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not > exclusive flag would not require any change anywhere in the PMDs (even > in i40e). Right now - no. Though as I said from PMD perspective having them exclusive is a bit prefer= able. Again, I don't see any big difference from upper layer code. > On the contrary, making them exclusive would require to > change the ixgbe TSO code because we check PKT_TX_IPV4. Hmm, so you are saying there is a bug somewhere in ixbe_rxtx.c? What particular place you are talking about? >=20 > >> *Problem 3*: without using the word "fortville", it is difficult > >> to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed, > >> once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a > >> tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT > >> flag. In linux, the driver doesn't care about the tunnel type, > >> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6]= . > > > > It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is s= et, > > but it is not obvious what type of tunnelling it would be. > > FVL HW supports HW TX offloads for different type of tunnelling and > > requires that SW provide information about tunnelling type. > > From i40e datasheet: > > L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indicatio= n: > > 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals = to zero) > > 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve). > > 10b - GRE tunneling header > > As we do plan to support other than UDP tunnelling types, I suppose we'= ll need to keep > > PKT_TX_UDP_TUNNEL_PKT flag. >=20 > As I've said: in linux, the driver doesn't care about the tunnel type, > it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations. Ok, and why it should be our problem? We have a lot of things done in a different manner then linux/freebsd kerne= l drivers, Why now it became a problem? > However I suppose that linux driver is able to process the hw outer > checksum even for other encapsulations (gre, ipip).=20 No idea, never tried. >And, does it mean > that ipip tunnels are not supported by i40e? I can't believe it. IPIP tunnels are supported. >>From the FVL spec, for IPIP L4TUNT and L4TUN should be set to 0. You can read all that yourself: http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-1= 0-40-controller-datasheet.html Sections 8.4.2.2.1 and 8.4.4 No need to do a reverse engineering from linux i40e source code.=20 > If it's the case... how an application on top of DPDK can know which > tunnel types are supported by the underlying port? > From what I've read, what the datasheet does not explain is: > "what is done differently for this packet between setting the register > to GRE (10b) or UDP (01b)?" That's a good question.=20 I don't know why HW requires that information, just follow the spec here. >=20 > >> case 7) calculate checksum of out_ip, out_udp, in_ip and in_tcp > >> > >> mb->outer_l2_len =3D len(out_eth) > >> mb->outer_l3_len =3D len(out_ip) > >> mb->l2_len =3D len(out_udp + vxlan + in_eth) > >> mb->l3_len =3D len(in_ip) > >> mb->ol_flags |=3D PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \ > >> PKT_TX_OUTER_UDP_CKSUM | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM; > >> set out_ip checksum to 0 in the packet > >> set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum() > >> set in_ip checksum to 0 in the packet > >> set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum() > >> > >> We need to add the flag PKT_TX_OUTER_UDP_CKSUM. > > > > We can, though right now we don't have a HW that is able to do that. > > Why need to do it now? >=20 > No, I agree we should not add it now. I just want to be sure we have > a consensus that it will work like this the day we'll have such drivers. Ok. >=20 > >> I think the following cases should be *forbidden by the API*: > >> > >> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1]) > >> > >> mb->outer_l2_len =3D len(out_eth) > >> mb->outer_l3_len =3D len(out_ip) > >> mb->l2_len =3D len(out_udp + vxlan + in_eth) > >> mb->l3_len =3D len(out_ip) > >> mb->ol_flags |=3D PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \ > >> PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM; > >> set out_ip checksum to 0 in the packet > >> set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum() > >> > >> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be > >> supported, but there is no reason to support it as there is > >> already one way to do the same. > >> > >> I think the driver should not even look at mb->outer_l2_len > >> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set. > > > > Why it should be forbidden? > > I admit it might be a bit slower than case 4), > > but I think absolutely legal way to setup HW offloads for inner L3/L4. > > As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose > > PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or no= t. > > PKT_TX_OUTER_* flags indicate does outer cksum offload is required or n= ot. >=20 > I don't understand. The result in terms of hardware is exactly the > same than case 4). Why should we have 2 different ways for doing the > same thing? If HW supports that capability, why should we forbid it?=20 Let user to choose himself what way to use. FVL spec lists it as a valid approach. =20 As one of possible use-cases: HW VLAN tags insertion for both inner and ou= ter packets. FVL can do that, though as I know our PMD doesn't implement it yet. For that, we'll need to specify at least: outer_l2_len, outer_l3_len, l2_len. While PKT_TX_OUTER_* might stay cleared. > This is really confusing for an API. Moreover, you said > it: it is slower that case 4). I don't know would be slower then 4) or not for sure. That's my guess, based on the fact that for 9) we need to fill 2 descriptor= s, while fro 4) - only 1. Though I didn't measure the difference. That's actually one more reason why to allow and support it - so people can make sure that on FVL both ways work as expected and measure = the difference. Konstantin >=20 > It also seems easier to understand from an API point of view: the > PMD uses mb->outer_lX_len if and only if a PKT_TX_OUTER_* flag > is present. >=20 > >> case 10) calculate a checksum using only outer_lX fields > >> > >> The outer_lX fields or PKT_TX_OUTER_* flags can only be used > >> if a inner checksum is enabled. So it's not possible to do > >> the following: > >> > >> mb->outer_l2_len =3D len(out_eth) > >> mb->outer_l3_len =3D len(out_ip) > >> mb->ol_flags |=3D PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CSUM > >> set out_ip checksum to 0 in the packet > > > > Ok, I think no one plans to use it anyway. > > > > Konstantin >=20 > Thanks Konstantin for taking the time to reply and progress on this. >=20 > Regards, > Olivier