DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	 "didier.pallard" <didier.pallard@6wind.com>,
	"Liu, Jijiang" <jijiang.liu@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields
Date: Wed, 03 Dec 2014 12:27:49 +0100
Message-ID: <547EF3B5.9070307@6wind.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258213BC3D9@IRSMSX105.ger.corp.intel.com>

Hi Konstantin,

On 12/03/2014 12:11 PM, Ananyev, Konstantin wrote:
>>>> Let's discuss the two possibilities.
>>>> 1) outer_lx_len fields are introduced.
>>>> In this case, the stack should have knowledge that it is processing
>>>> tunneled packets to use outer_lx_len rather than lx_len,
>>>> or stack should always use outer_lx_len packet and move those fields to
>>>> lx_len packets if no tunneling occurs...
>>>> I think it will induce extra processing that does not seem to be really
>>>> needed.
>>>> 2) inner_lx_len fields are introduced.
>>>> In this case, the stack first uses lx_len fields. When packet should be
>>>> tunneled, lx_len fields are moved to inner_lx_len fields.
>>>> Then the stack can process the outer layer and still use the lx_len fields.
>>> Not sure, that I understood why 2) is better than 1).
>>> Let say,  you have a 'normal' (non-tunnelling) packet: ether/IP/TCP
>>> In that case you still use mbuf's l2_len/l3_len/l4_len fields and setup ol_flags as usual.
>>> Then later, you decided to 'tunnel' that packet.
>>> So you just fill mbuf's outer_l2_len/outer_l3_len, setup TX_OUTER_* and TX_TUNNEL_* bits in ol_flags and probably update l2_len.
>>> l3_len/l4_len and ol_flags bits set for them remain intact.
>>> That's with 1)
>>> With 2) - you'll have to move l3_len/l4_len to inner_lx_len.
>>> And I suppose ol_flags values too:
>>> ol_flags &= ~PKT_ IP_CKSUM;
>>> ol_flgas  |=  PKT_INNER_IP_CKSUM
>>> ?
>>> And same for L4_CKSUM flags too?
>> After reading Didier's mail, I start to be convinced that keeping inner
>> may be better than outer. From a network stack architecture point of
>> view, 2) looks better:
>> - the functions in the network stack that write the Ether/IP header
>>     always deal with l2_len/l3_len, whatever it's a tunnel or not.
>> - the function that adds the tunnel header moves the l2_len/l3_len and
>>     the flags to inner_l2_len/inner_l3_len and inner_flags.
> Hmm, still don't understand you here.
> Why all that you mentioned above suddenly become 'better'?
> Here is you original suggestion about introducing 'outer_lx_len':
> http://dpdk.org/ml/archives/dev/2014-November/008268.html
> As you pointed, it is a clean and straightforward way to extend DPDK HW TX offload API with tunnelling support.
> And we agreed with you here.
>  From other side, 2) approach looks like a mess:
> If packet is going to be tunnelled, the upper layer has to:
> 1.  move lx_len values to inner_lx_len.
> 2. move PKT_TX_*_CKSUM bit to PKT_TX_INNER_*_CKSUM bits in ol_flags.
> Plus in the mbuf we'll either have to introduce PKT_TX_INNER_(TCP|UDP|SCTP)_CKSUM flags
> (otherwise we'll have a weird situation when PKT_TX_IP_CKSUM we'll be for outer IP header, but
> PKT_TX_TCP_CKSUM will be for inner).
> So, from DPDK perspective, 2) looks like a waste of bits in ol_flags and unnecessary complication.
> At least to me.
> You are talking about 'the network stack', but as I know at dpdk.org we don't have any open sourced L3/L4 stack supported.
>  From other side - in DPDK we just adding fields into mbuf for TX tunnelling.
> So if any of the existing commercial stacks already support tunnelling - they should have their own fields for that in their own packet metadata structure.
> Which means - they somehow have to copy information from their packet structure into mbuf anyway.
> If they don't support tunnelling yet and plan to use mbuf directly (without copying info into their own packet metadata structure),
> I suppose they can adopt  the DPDK approach.
> So, from my point - let's implement it in a way that makes most sense from DPDK perspective: 1).

OK. your argumentation makes sense even though I think DPDK aims to be
a SDK for building network applications or stacks and should ease the
work done in application. But this is something we can handle in the
stack if the API is properly defined in DPDK.

Anyway, we need to find a way to conclude on this :)
And it does not prevent us to think again about it for 2.0, if a
dev_prep_tx() function is introduced.

I still have few comments on the v5 patch from Jijiang but I think we
can converge soon.


      reply	other threads:[~2014-12-03 11:27 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-02  6:52 [dpdk-dev] [PATCH v4 0/3] i40e VXLAN TX checksum rework Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 1/3] mbuf:redefine three TX ol_flags Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 2/3] mbuf:add three TX ol_flags and repalce PKT_TX_VXLAN_CKSUM Jijiang Liu
2014-12-02  6:52 ` [dpdk-dev] [PATCH v4 3/3] mbuf:replace the inner_l2_len and the inner_l3_len fields Jijiang Liu
2014-12-02 14:53   ` didier.pallard
2014-12-02 15:36     ` Ananyev, Konstantin
2014-12-03  8:57       ` Olivier MATZ
2014-12-03 11:11         ` Ananyev, Konstantin
2014-12-03 11:27           ` Olivier MATZ [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=547EF3B5.9070307@6wind.com \
    --to=olivier.matz@6wind.com \
    --cc=dev@dpdk.org \
    --cc=didier.pallard@6wind.com \
    --cc=jijiang.liu@intel.com \
    --cc=konstantin.ananyev@intel.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git