From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 0E08568CD for ; Wed, 3 Dec 2014 12:11:37 +0100 (CET) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga101.jf.intel.com with ESMTP; 03 Dec 2014 03:11:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.07,507,1413270000"; d="scan'208";a="647485596" Received: from irsmsx154.ger.corp.intel.com ([163.33.192.96]) by orsmga002.jf.intel.com with ESMTP; 03 Dec 2014 03:11:34 -0800 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.144]) by IRSMSX154.ger.corp.intel.com ([169.254.12.18]) with mapi id 14.03.0195.001; Wed, 3 Dec 2014 11:11:34 +0000 From: "Ananyev, Konstantin" To: Olivier MATZ , didier.pallard , "Liu, Jijiang" , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Thread-Index: AQHQDj/6HpZk5vbjdEW8W8awA8SRIpx8Z2aggAEqZYCAABTUQA== Date: Wed, 3 Dec 2014 11:11:33 +0000 Message-ID: <2601191342CEEE43887BDE71AB977258213BC3D9@IRSMSX105.ger.corp.intel.com> References: <1417503172-18642-1-git-send-email-jijiang.liu@intel.com> <1417503172-18642-4-git-send-email-jijiang.liu@intel.com> <547DD269.2080500@6wind.com> <2601191342CEEE43887BDE71AB977258213BC075@IRSMSX105.ger.corp.intel.com> <547ED071.3080101@6wind.com> In-Reply-To: <547ED071.3080101@6wind.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Dec 2014 11:11:39 -0000 Hi Oliver, > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Wednesday, December 03, 2014 8:57 AM > To: Ananyev, Konstantin; didier.pallard; Liu, Jijiang; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and = the inner_l3_len fields >=20 > Hi Didier, Konstantin, Jijiang, >=20 > On 12/02/2014 04:36 PM, Ananyev, Konstantin wrote: > > Hi Didier > > > >> -----Original Message----- > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard > >> Sent: Tuesday, December 02, 2014 2:53 PM > >> To: Liu, Jijiang; dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len a= nd the inner_l3_len fields > >> > >> Hello, > >> > >> On 12/02/2014 07:52 AM, Jijiang Liu wrote: > >>> Replace the inner_l2_len and the inner_l3_len field with the outer_l2= _len and outer_l3_len field, and rework csum forward > engine > >> and i40e PMD due to these changes. > >>> > >>> Signed-off-by: Jijiang Liu > >> [...] > >>> --- a/lib/librte_mbuf/rte_mbuf.h > >>> +++ b/lib/librte_mbuf/rte_mbuf.h > >>> @@ -276,8 +276,8 @@ struct rte_mbuf { > >>> uint64_t tso_segsz:16; /**< TCP TSO segment size */ > >>> > >>> /* fields for TX offloading of tunnels */ > >>> - uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */ > >>> - uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */ > >>> + uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */ > >>> + uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */ > >>> > >>> /* uint64_t unused:8; */ > >>> }; > >> > >> Sorry for entering lately this discussion, but i'm not convinced by th= e > >> choice of outer_lx_len rather than inner_lx_len for new fields. > >> I agree with Olivier that new flags should only be related to the use = of > >> new fields, to maintain coherency with oldest implementations. > >> But from a stack point of view, i think it is better to have lx_len > >> fields that target the outer layers, and to name new fields inner_lx_l= en. > >> > >> Let's discuss the two possibilities. > >> > >> 1) outer_lx_len fields are introduced. > >> In this case, the stack should have knowledge that it is processing > >> tunneled packets to use outer_lx_len rather than lx_len, > >> or stack should always use outer_lx_len packet and move those fields t= o > >> lx_len packets if no tunneling occurs... > >> I think it will induce extra processing that does not seem to be reall= y > >> needed. > >> > >> 2) inner_lx_len fields are introduced. > >> In this case, the stack first uses lx_len fields. When packet should b= e > >> tunneled, lx_len fields are moved to inner_lx_len fields. > >> Then the stack can process the outer layer and still use the lx_len fi= elds. > > > > Not sure, that I understood why 2) is better than 1). > > Let say, you have a 'normal' (non-tunnelling) packet: ether/IP/TCP > > In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup= ol_flags as usual. > > Then later, you decided to 'tunnel' that packet. > > So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and= TX_TUNNEL_* bits in ol_flags and probably update l2_len. > > l3_len/l4_len and ol_flags bits set for them remain intact. > > That's with 1) > > > > With 2) - you'll have to move l3_len/l4_len to inner_lx_len. > > And I suppose ol_flags values too: > > ol_flags &=3D ~PKT_ IP_CKSUM; > > ol_flgas |=3D PKT_INNER_IP_CKSUM > > ? > > And same for L4_CKSUM flags too? >=20 > After reading Didier's mail, I start to be convinced that keeping inner > may be better than outer. From a network stack architecture point of > view, 2) looks better: >=20 > - the functions in the network stack that write the Ether/IP header > always deal with l2_len/l3_len, whatever it's a tunnel or not. >=20 > - the function that adds the tunnel header moves the l2_len/l3_len and > the flags to inner_l2_len/inner_l3_len and inner_flags. Hmm, still don't understand you here. Why all that you mentioned above suddenly become 'better'? Here is you original suggestion about introducing 'outer_lx_len': http://dpdk.org/ml/archives/dev/2014-November/008268.html As you pointed, it is a clean and straightforward way to extend DPDK HW TX = offload API with tunnelling support. And we agreed with you here. >>From other side, 2) approach looks like a mess: If packet is going to be tunnelled, the upper layer has to: 1. move lx_len values to inner_lx_len. 2. move PKT_TX_*_CKSUM bit to PKT_TX_INNER_*_CKSUM bits in ol_flags.=20 Plus in the mbuf we'll either have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)= _CKSUM flags (otherwise we'll have a weird situation when PKT_TX_IP_CKSUM we'll be for o= uter IP header, but PKT_TX_TCP_CKSUM will be for inner). So, from DPDK perspective, 2) looks like a waste of bits in ol_flags and un= necessary complication. =20 At least to me. You are talking about 'the network stack', but as I know at dpdk.org we don= 't have any open sourced L3/L4 stack supported.=20 >>From other side - in DPDK we just adding fields into mbuf for TX tunnelling= . So if any of the existing commercial stacks already support tunnelling - th= ey should have their own fields for that in their own packet metadata struc= ture. =20 Which means - they somehow have to copy information from their packet struc= ture into mbuf anyway.=20 If they don't support tunnelling yet and plan to use mbuf directly (without= copying info into their own packet metadata structure), I suppose they can adopt the DPDK approach. So, from my point - let's implement it in a way that makes most sense from = DPDK perspective: 1). Konstantin >=20 > Althought it was my first idea, now I cannot find a better argument in > favor of outer_lX_len. The initial argument was that the correspondance > between a flag and a lX_len should always remain the same, but it is > still possible with Didier's approach: > - PKT_IP_CKSUM uses l2_len and l3_len > - PKT_INNER_CKSUM uses inner_l2_len and inner_l3_len >=20 > Jijiang, I'm sorry to change my mind about this. If you want (and if > Konstantin is also ok with that), I can try to rebase your patches to > match this. Or do you prefer to do it by yourself? >=20 > Regards, > Olivier >=20