From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: Olivier MATZ, "Liu, Jijiang"
Date: Tue, 20 Jan 2015 17:23:48 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213DE5FB@irsmsx105.ger.corp.intel.com>
References: <1418173403-30202-1-git-send-email-jijiang.liu@intel.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA789E@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D34AE@irsmsx105.ger.corp.intel.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7CC5@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D3897@irsmsx105.ger.corp.intel.com>
 <54AFB13E.2080200@6wind.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA85A1@SHSMSX101.ccr.corp.intel.com>
 <54B3B35A.5030803@6wind.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA8E36@SHSMSX101.ccr.corp.intel.com>
 <54B4EB92.40209@6wind.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DB0789@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D4FCF@irsmsx105.ger.corp.intel.com>
 <54B94A18.5030700@6wind.com>
 <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com>
 <54BD16F1.6050409@6wind.com>
 <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com>
 <54BE4C70.7050406@6wind.com>
In-Reply-To: <54BE4C70.7050406@6wind.com>
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
List-Id: patches and discussions about DPDK

Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, January 20, 2015 12:39 PM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
>
> Hi,
>
> On 01/20/2015 02:12 AM, Ananyev, Konstantin wrote:
> >>>> I think a good definition would
> >>>> be:
> >>>>
> >>>>   Packet is IPv4. This flag must be set when using any offload
> >>>>   feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
> >>>>   is an IPv4 packet.
> >>>>
> >>>> That's why I added PKT_TX_IPV4 in the examples.
> >>>
> >>> I suppose we discussed it several times: both ways are possible.
> >>> From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM
> >>> as mutually exclusive seems a bit more plausible.
> >>> From the upper layer - my understanding is that it doesn't really matter.
> >>> I thought we had an agreement about it in 1.8, no?
> >>
> >> Indeed, this was already discussed, but there was a lot of pressure
> >> for 1.8.0 to push something, even not perfect.
> >> The fog around comments shows that the API was not very clearly
> >> defined for 1.8.0. If you read the comments of the API, it is
> >> impossible to understand when the PKT_TX_IPV4 or PKT_TX_IP_CSUM
> >> flags must be set. I would even say more: the only place where the
> >> comments bring valuable information (L4 checksum and TSO)
> >> describes the case where PKT_TX_IPV4 and PKT_TX_IP_CSUM are not
> >> exclusive...
> >>
> >> So I will fix that in my coming patch series. Just for information,
> >> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not
> >> exclusive flags would not require any change anywhere in the PMDs (even
> >> in i40e).
> >
> > Right now - no.
> > Though as I said from PMD perspective having them exclusive is a bit preferable.
> > Again, I don't see any big difference from upper layer code.
>
> Sure, it does not make a big difference in terms of code. But
> in terms of API, the naming of the flag is coherent to what it is
> used for. And it's easier to find a simple definition, like:
>
>   * Packet is IPv4. This flag must be set when using any offload feature
>   * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
>   * packet.

Ok, and what's wrong with:
"Packet is IPv4. This flag must be set when using any offload feature (TSO, L3 or L4 checksum)
to tell the NIC that the packet is an IPv4 packet and no HW offload for IPv4 header checksum calculation is required"
?

>
> >> On the contrary, making them exclusive would require to
> >> change the ixgbe TSO code because we check.
> >
> > Hmm, so you are saying there is a bug somewhere in ixgbe_rxtx.c?
> > What particular place you are talking about?
>
> Sorry, I spoke too fast. In TSO code, we check PKT_TX_IP_CKSUM (and not
> PKT_TX_IPV4 as I thought), so it would work for both methods without
> patching the code.
>
> In this case, it means that both approaches would not require to
> modify the code.

Ok.
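For readers following the thread, the two competing readings of PKT_TX_IPV4 discussed above can be sketched as two validity checks. This is only an illustration: the `EX_` flag bits and both function names are hypothetical and are not the real rte_mbuf.h definitions, and neither check is claimed to exist in any PMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; NOT the real rte_mbuf.h values. */
#define EX_TX_IP_CSUM (1u << 0) /* stands in for PKT_TX_IP_CSUM */
#define EX_TX_IPV4    (1u << 1) /* stands in for PKT_TX_IPV4 */

/*
 * Non-exclusive reading: the IPv4 flag marks every offloaded IPv4 packet,
 * so requesting the IP checksum offload without it is inconsistent.
 */
bool ex_valid_nonexclusive(uint32_t ol_flags)
{
    if (ol_flags & EX_TX_IP_CSUM)
        return (ol_flags & EX_TX_IPV4) != 0;
    return true;
}

/*
 * Exclusive reading: the IPv4 flag means "IPv4, no HW IP checksum needed",
 * so setting it together with the IP checksum flag is inconsistent.
 */
bool ex_valid_exclusive(uint32_t ol_flags)
{
    const uint32_t both = EX_TX_IP_CSUM | EX_TX_IPV4;

    return (ol_flags & both) != both;
}
```

Under these assumptions, the flag combination `EX_TX_IPV4 | EX_TX_IP_CSUM` is accepted by the first check and rejected by the second, which is the whole of the API disagreement; as both authors note, driver code can handle either reading.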
>
> >>>> *Problem 3*: without using the word "fortville", it is difficult
> >>>> to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
> >>>> once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
> >>>> tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
> >>>> flag. In linux, the driver doesn't care about the tunnel type,
> >>>> it always sets I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
> >>>
> >>> It might be obvious that it is a tunnel packet when PKT_TX_OUTER_* is set,
> >>> but it is not obvious what type of tunnelling it would be.
> >>> FVL HW supports HW TX offloads for different types of tunnelling and
> >>> requires that SW provide information about tunnelling type.
> >>> From i40e datasheet:
> >>> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
> >>> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
> >>> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
> >>> 10b - GRE tunneling header
> >>> As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep
> >>> PKT_TX_UDP_TUNNEL_PKT flag.
> >>
> >> As I've said: in linux, the driver doesn't care about the tunnel type,
> >> it always sets I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.
> >
> > Ok, and why it should be our problem?
> > We have a lot of things done in a different manner than linux/freebsd kernel drivers,
> > Why now it became a problem?
>
> If linux doesn't need an equivalent flag for doing the same thing,
> it probably means we don't need it either.

Probably yes .... Or probably not.
Why do we need to guess what was the intention of guys who wrote that part of linux driver?
BTW, the macro for GRE is here:
find lib/librte_pmd_i40e/i40e -type f | xargs grep TUN | grep TXD
lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_UDP_TUNNELING (0x1ULL << I40E_TXD_CTX_QW0_NATT_SHIFT)
lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_GRE_TUNNELING (0x2ULL << I40E_TXD_CTX_QW0_NATT_SHIFT)
Though it is not used (yet?) for some reason.

>
> In a performance-oriented software like dpdk, having a flag that we
> don't know what the hardware does with, that is not needed in other
> drivers of the same hardware, that makes the API harder to understand
> could be a problem.

Here is a HW spec, that says what values have to be setup for L4TUNT.
Yes, I am not sure why they need to distinguish between VXLAN/GRE tunnelling.
Though, I suppose that wouldn't eliminate the requirement.
By the same token, there is no good explanation why FVL HW needs to know that it is IPv4 or IPv6 packet,
in the case when only L4 checksum offload is required (IIPT field).
Niantic, as I remember, is able to work ok without that requirement.
Though, we still have to set it up.

> Another argument: if we can remove this flag, it would make the
> testpmd commands rework proposed by Jijiang much more easy to
> understand: only a new "csum parse-tunnel on|off" would be required,
> and it can be explained in a few words.

Well, from my point - testpmd commands that Jijiang proposed are perfectly clear and understandable.
Another thing, as I remember, our primary concern should be public API, not testpmd.

>
> I'll try to do some tests on a fortville NIC if I can find one. I'm
> curious to see if we can transmit any encapsulation packet (ip in ip,
> ip in gre, eth in gre, eth in vxlan, or even a proprietary tunnel).

Ok cool.

>
> We should avoid the need to specify the tunnel type in the OUTER
> checksum API if we can, else it would limit us to specific
> supported protocols.

From the FVL spec it is required by HW, it is not what we are introducing on our own.
Spec states explicitly that L4TUNT (L4 tunneling type) has to be setup for tunnelling packets.
Again from the spec, there are 3 different values it can take.
If you have an idea how to pass that information to PMD without using flags, sure we can consider it.

>
> >>>> I think the following cases should be *forbidden by the API*:
> >>>>
> >>>> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1])
> >>>>
> >>>>   mb->outer_l2_len = len(out_eth)
> >>>>   mb->outer_l3_len = len(out_ip)
> >>>>   mb->l2_len = len(out_udp + vxlan + in_eth)
> >>>>   mb->l3_len = len(out_ip)
> >>>>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
> >>>>     PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
> >>>>   set out_ip checksum to 0 in the packet
> >>>>   set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
> >>>>
> >>>> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
> >>>> supported, but there is no reason to support it as there is
> >>>> already one way to do the same.
> >>>>
> >>>> I think the driver should not even look at mb->outer_l2_len
> >>>> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
> >>>
> >>> Why it should be forbidden?
> >>> I admit it might be a bit slower than case 4),
> >>> but I think it is an absolutely legal way to setup HW offloads for inner L3/L4.
> >>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
> >>> PKT_TX_*_TUNNEL_PKT should be an indication whether it is a tunnel packet or not.
> >>> PKT_TX_OUTER_* flags indicate whether outer cksum offload is required or not.
> >>
> >> I don't understand. The result in terms of hardware is exactly the
> >> same as case 4). Why should we have 2 different ways for doing the
> >> same thing?
> >
> > If HW supports that capability, why should we forbid it?
> > Let the user choose himself what way to use.
> > FVL spec lists it as a valid approach.
>
> It is not a hardware feature.

It is.

> Case 4) and case 9) would fill the hardware registers exactly the same.

No, they wouldn't.
Please read corresponding section of FVL spec and i40e_rxtx.c
For case 4) we only need to setup TDD (TX data descriptor) with the following values:
IIPT, IPLEN, L4T, L4LEN
For case 9) we need to setup both TDD and TCD (TX context descriptor) with the following values:
TDD: IIPT, IPLEN, L4T, L4LEN
TCD: EIPT, EIPLEN, L4TUNT, L4TUNLEN

> To me, it's just an API question.

No, it is not.
I still don't understand why you are so eager to 'forbid' it.
Yes we support it for FVL, but no one forces you to use it.

>
> > As one of possible use-cases: HW VLAN tags insertion for both inner and outer packets.
> > FVL can do that, though as far as I know our PMD doesn't implement it yet.
> > For that, we'll need to specify at least:
> > outer_l2_len, outer_l3_len, l2_len.
> > While PKT_TX_OUTER_* might stay cleared.
>
> If a VLAN tag has to be inserted in outer header, a new flag
> PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
> would still be correct:
>
>   The driver should look at mb->outer_lX_len only if a
>   PKT_TX_OUTER_* flag is present.
>

Introducing PKT_TX_OUTER_INSERT_VLAN is ok.
Though I still think we'll need TX_*_TUNNEL flags and no need to 'forbid' case 9).
BTW, as I can see linux i40e driver for tunnelling packets uses case 9), not case 4), right?
Konstantin

> >> This is really confusing for an API. Moreover, you said
> >> it: it is slower than case 4).
> >
> > I don't know for sure whether it would be slower than 4) or not.
> > That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while for 4) - only 1.
> > Though I didn't measure the difference.
> > That's actually one more reason why to allow and support it -
> > so people can make sure that on FVL both ways work as expected and measure the difference.
> >
> > Konstantin
>
>
> Regards,
> Olivier
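
The descriptor difference between case 4) and case 9) described above can be sketched as a small planning function. This is a simplified model, not the i40e_rxtx.c code: the `SK_` flag bits, the field bitmasks, `struct sk_tx_plan` and `sk_plan_tx()` are all hypothetical names invented for illustration. It also assumes that case 4) carries the same offload flags as case 9) minus the tunnel flag, which this mail does not spell out. Only the field names (IIPT, IPLEN, L4T, L4LEN, EIPT, EIPLEN, L4TUNT, L4TUNLEN) and the L4TUNT encodings come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; NOT the real rte_mbuf.h values. */
#define SK_TX_IP_CSUM        (1u << 0)
#define SK_TX_UDP_CKSUM      (1u << 1)
#define SK_TX_IPV4           (1u << 2)
#define SK_TX_UDP_TUNNEL_PKT (1u << 3)

/* Descriptor fields named in the mail, one bit per field. */
enum sk_tdd_field { SK_IIPT = 1 << 0, SK_IPLEN = 1 << 1, SK_L4T = 1 << 2, SK_L4LEN = 1 << 3 };
enum sk_tcd_field { SK_EIPT = 1 << 0, SK_EIPLEN = 1 << 1, SK_L4TUNT = 1 << 2, SK_L4TUNLEN = 1 << 3 };

/* L4TUNT encodings as quoted from the datasheet excerpt. */
enum sk_l4tunt { SK_L4TUNT_NONE = 0x0, SK_L4TUNT_UDP = 0x1, SK_L4TUNT_GRE = 0x2 };

struct sk_tx_plan {
    unsigned tdd_fields;   /* fields to program in the TX data descriptor */
    unsigned tcd_fields;   /* fields to program in the TX context descriptor */
    enum sk_l4tunt l4tunt; /* tunnelling type written to L4TUNT */
    unsigned ndesc;        /* number of descriptors consumed */
};

/* Decide which descriptor fields a packet with these flags would need. */
struct sk_tx_plan sk_plan_tx(uint32_t ol_flags)
{
    struct sk_tx_plan p = { 0, 0, SK_L4TUNT_NONE, 1 };

    /* Inner L3 information goes into the data descriptor. */
    if (ol_flags & (SK_TX_IP_CSUM | SK_TX_IPV4))
        p.tdd_fields |= SK_IIPT | SK_IPLEN;
    /* Inner L4 checksum request also goes into the data descriptor. */
    if (ol_flags & SK_TX_UDP_CKSUM)
        p.tdd_fields |= SK_L4T | SK_L4LEN;

    /* Assumption: the tunnel flag alone triggers a context descriptor. */
    if (ol_flags & SK_TX_UDP_TUNNEL_PKT) {
        p.tcd_fields = SK_EIPT | SK_EIPLEN | SK_L4TUNT | SK_L4TUNLEN;
        p.l4tunt = SK_L4TUNT_UDP;
        p.ndesc = 2;
    }

    /* A second descriptor is only ever used for context fields. */
    assert(p.ndesc == 1 || p.tcd_fields != 0);
    return p;
}
```

Under these assumptions, `sk_plan_tx(SK_TX_IP_CSUM | SK_TX_UDP_CKSUM)` yields one descriptor with only TDD fields, while adding `SK_TX_UDP_TUNNEL_PKT` yields the same TDD fields plus a context descriptor. That is the "2 descriptors versus 1" difference that Konstantin guesses may cost some performance.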