DPDK patches and discussions
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>,
	didier.pallard <didier.pallard@6wind.com>,
	"Liu, Jijiang" <jijiang.liu@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
Date: Wed, 3 Dec 2014 11:11:33 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213BC3D9@IRSMSX105.ger.corp.intel.com>
In-Reply-To: <547ED071.3080101@6wind.com>

Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, December 03, 2014 8:57 AM
> To: Ananyev, Konstantin; didier.pallard; Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> 
> Hi Didier, Konstantin, Jijiang,
> 
> On 12/02/2014 04:36 PM, Ananyev, Konstantin wrote:
> > Hi Didier
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of didier.pallard
> >> Sent: Tuesday, December 02, 2014 2:53 PM
> >> To: Liu, Jijiang; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
> >>
> >> Hello,
> >>
> >> On 12/02/2014 07:52 AM, Jijiang Liu wrote:
> >>> Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field, and rework csum forward engine
> >>> and i40e PMD due to these changes.
> >>>
> >>> Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> >> [...]
> >>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>> @@ -276,8 +276,8 @@ struct rte_mbuf {
> >>>    			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >>>
> >>>    			/* fields for TX offloading of tunnels */
> >>> -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> >>> -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> >>> +			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> >>> +			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> >>>
> >>>    			/* uint64_t unused:8; */
> >>>    		};
> >>
> >> Sorry for entering this discussion late, but I'm not convinced by the
> >> choice of outer_lx_len rather than inner_lx_len for the new fields.
> >> I agree with Olivier that the new flags should only be related to the use of
> >> the new fields, to stay consistent with older implementations.
> >> But from a stack point of view, I think it is better to have the lx_len
> >> fields target the outer layers, and to name the new fields inner_lx_len.
> >>
> >> Let's discuss the two possibilities.
> >>
> >> 1) outer_lx_len fields are introduced.
> >> In this case, the stack has to know that it is processing
> >> tunneled packets in order to use outer_lx_len rather than lx_len,
> >> or the stack should always use the outer_lx_len fields and move them to
> >> the lx_len fields if no tunneling occurs...
> >> I think this induces extra processing that does not seem to be really
> >> needed.
> >>
> >> 2) inner_lx_len fields are introduced.
> >> In this case, the stack first uses lx_len fields. When packet should be
> >> tunneled, lx_len fields are moved to inner_lx_len fields.
> >> Then the stack can process the outer layer and still use the lx_len fields.
> >
> > I'm not sure I understand why 2) is better than 1).
> > Let's say you have a 'normal' (non-tunnelling) packet: ether/IP/TCP.
> > In that case you still use the mbuf's l2_len/l3_len/l4_len fields and set up ol_flags as usual.
> > Then later, you decide to 'tunnel' that packet.
> > So you just fill the mbuf's outer_l2_len/outer_l3_len, set the TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
> > l3_len/l4_len and the ol_flags bits set for them remain intact.
> > That's with 1).
> >
> > With 2), you'll have to move l3_len/l4_len to inner_lx_len.
> > And I suppose the ol_flags values too:
> > ol_flags &= ~PKT_IP_CKSUM;
> > ol_flags |= PKT_INNER_IP_CKSUM;
> > ?
> > And the same for the L4_CKSUM flags too?
> 
> After reading Didier's mail, I start to be convinced that keeping inner
> may be better than outer. From a network stack architecture point of
> view, 2) looks better:
> 
> - the functions in the network stack that write the Ether/IP header
>    always deal with l2_len/l3_len, whether it's a tunnel or not.
> 
> - the function that adds the tunnel header moves the l2_len/l3_len and
>    the flags to inner_l2_len/inner_l3_len and inner_flags.

Hmm, I still don't understand you here.
Why does everything you mentioned above suddenly become 'better'?
Here is your original suggestion about introducing 'outer_lx_len':
http://dpdk.org/ml/archives/dev/2014-November/008268.html
As you pointed out, it is a clean and straightforward way to extend the DPDK HW TX offload API with tunnelling support.
And we agreed with you there.

On the other hand, approach 2) looks like a mess:
If a packet is going to be tunnelled, the upper layer has to:
1. move the lx_len values to inner_lx_len.
2. move the PKT_TX_*_CKSUM bits to PKT_TX_INNER_*_CKSUM bits in ol_flags.
Plus, in the mbuf we'll either have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)_CKSUM flags,
or we'll end up in a weird situation where PKT_TX_IP_CKSUM is for the outer IP header, but
PKT_TX_TCP_CKSUM is for the inner one.
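
To make the comparison concrete, here is a rough sketch in C of what the code that adds the tunnel header would do under each approach. Everything in it is an assumption for illustration: struct pkt_meta is a simplified stand-in for the TX offload part of struct rte_mbuf, the X_* flag names and bit values are made up and are not the real ol_flags definitions, and the l2_len adjustment under 1) only reflects the "probably update l2_len" remark from the earlier mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only (not the rte_mbuf.h values). */
#define X_TX_IP_CKSUM        (1ULL << 0)
#define X_TX_TCP_CKSUM       (1ULL << 1)
#define X_TX_OUTER_IP_CKSUM  (1ULL << 2)  /* extra flag needed by approach 1) */
#define X_TX_INNER_IP_CKSUM  (1ULL << 3)  /* extra flag needed by approach 2) */
#define X_TX_INNER_TCP_CKSUM (1ULL << 4)  /* extra flag needed by approach 2) */

/* Simplified stand-in for the TX offload metadata of an mbuf. */
struct pkt_meta {
    uint64_t ol_flags;
    uint16_t l2_len, l3_len, l4_len;
    uint16_t outer_l2_len, outer_l3_len;  /* used by approach 1) only */
    uint16_t inner_l2_len, inner_l3_len;  /* used by approach 2) only */
};

/*
 * Approach 1): l3_len/l4_len and their flags stay intact.
 * Only the outer lengths and one outer flag are added; l2_len grows by
 * the tunnel header length (an assumption about how l2_len is updated).
 */
void tunnel_with_outer_fields(struct pkt_meta *m, uint16_t out_l2,
                              uint16_t out_l3, uint16_t tun_hdr_len)
{
    assert(m != NULL);
    m->outer_l2_len = out_l2;
    m->outer_l3_len = out_l3;
    m->l2_len = (uint16_t)(m->l2_len + tun_hdr_len);
    m->ol_flags |= X_TX_OUTER_IP_CKSUM;
}

/*
 * Approach 2): the existing lengths move to inner_lx_len, lx_len is
 * reused for the outer headers, and every checksum flag already set has
 * to be translated to its INNER counterpart. Requesting the outer IP
 * checksum afterwards is left to whoever writes the outer header.
 */
void tunnel_with_inner_fields(struct pkt_meta *m, uint16_t out_l2,
                              uint16_t out_l3)
{
    assert(m != NULL);
    m->inner_l2_len = m->l2_len;
    m->inner_l3_len = m->l3_len;
    m->l2_len = out_l2;
    m->l3_len = out_l3;

    if (m->ol_flags & X_TX_IP_CKSUM) {
        m->ol_flags &= ~X_TX_IP_CKSUM;
        m->ol_flags |= X_TX_INNER_IP_CKSUM;
    }
    if (m->ol_flags & X_TX_TCP_CKSUM) {
        m->ol_flags &= ~X_TX_TCP_CKSUM;
        m->ol_flags |= X_TX_INNER_TCP_CKSUM;
    }
}
```

In this sketch 1) costs one new flag and touches nothing that was set for the original packet, while 2) needs an INNER flag per checksum type and a translation step for each of them.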

So, from the DPDK perspective, 2) looks like a waste of bits in ol_flags and an unnecessary complication.
At least to me.
You are talking about 'the network stack', but as far as I know, at dpdk.org we don't have any open-source L3/L4 stack supported.
On the other hand, in DPDK we are just adding fields to the mbuf for TX tunnelling.
So if any of the existing commercial stacks already support tunnelling, they should have their own fields for that in their own packet metadata structure.
Which means they have to copy information from their packet structure into the mbuf anyway.
If they don't support tunnelling yet and plan to use the mbuf directly (without copying info into their own packet metadata structure),
I suppose they can adopt the DPDK approach.
So, from my point of view, let's implement it in the way that makes most sense from the DPDK perspective: 1).
Konstantin

> 
> Although it was my first idea, now I cannot find a better argument in
> favor of outer_lX_len. The initial argument was that the correspondence
> between a flag and a lX_len should always remain the same, but it is
> still possible with Didier's approach:
>    - PKT_IP_CKSUM uses l2_len and l3_len
>    - PKT_INNER_CKSUM uses inner_l2_len and inner_l3_len
> 
> Jijiang, I'm sorry to change my mind about this. If you want (and if
> Konstantin is also ok with that), I can try to rebase your patches to
> match this. Or do you prefer to do it by yourself?
> 
> Regards,
> Olivier
> 


Thread overview: 9+ messages
2014-12-02  6:52 [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 2/3] mbuf:add three TX ol_flags and repalce PKT_TX_VXLAN_CKSUM Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Jijiang Liu
2014-12-02 14:53   ` didier.pallard
2014-12-02 15:36     ` Ananyev, Konstantin
2014-12-03  8:57       ` Olivier MATZ
2014-12-03 11:11         ` Ananyev, Konstantin [this message]
2014-12-03 11:27           ` Olivier MATZ
