DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Bruce Richardson" <bruce.richardson@intel.com>
Cc: <dev@dpdk.org>, "Dengdui Huang" <huangdengdui@huawei.com>,
	"Vladimir Medvedkin" <medvedkinv@gmail.com>, <techboard@dpdk.org>,
	"Patrick Robb" <probb@iol.unh.edu>,
	"fengchengwen" <fengchengwen@huawei.com>,
	<stephen@networkplumber.org>, <jasvinder.singh@intel.com>,
	<thomas@monjalon.net>, <aman.deep.singh@intel.com>,
	<lihuisong@huawei.com>, <liuyonglong@huawei.com>,
	"Dean Marx" <dmarx@iol.unh.edu>
Subject: RE: [RFC PATCH] doc: clarify VLAN and QinQ stripping behaviour
Date: Wed, 30 Jul 2025 22:10:42 +0200	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9FDD6@smartserver.smartshare.dk> (raw)
In-Reply-To: <aIpWP6FCcMXktUvq@bricha3-mobl1.ger.corp.intel.com>

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, 30 July 2025 19.29
> 
> On Mon, Jul 28, 2025 at 07:48:53PM +0200, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Monday, 28 July 2025 17.11
> > >
> > > On Mon, Jul 28, 2025 at 04:51:30PM +0200, Morten Brørup wrote:
> > > > > From: Dean Marx [mailto:dmarx@iol.unh.edu]
> > > > > Sent: Friday, 18 July 2025 15.18
> > > > >
> > > > > On Fri, Jul 18, 2025 at 4:23 AM Bruce Richardson
> > > > > <bruce.richardson@intel.com> wrote:
> > > > > >
> > > > > > On Thu, Jul 17, 2025 at 05:03:13PM -0400, Dean Marx wrote:
> > > > > > > I've created a v1 of a QinQ test suite around the set of
> test cases
> > > > > > > discussed earlier (which is not set in stone, and I expect
> it to
> > > > > > > change significantly across many future versions.) The
> PASS/FAIL
> > > > > > > values can be mostly disregarded in the context of this
> conversation,
> > > > > > > but I've added logging to explain which packets are sent,
> and what
> > > > > > > happened upon reception, which I hope will be more
> informative. After
> > > > > > > running on mlx5/i40e drivers, I got the following results:
> > > > > > >
> > > > > > > test_vlan_strip: QinQ strip OFF and VLAN strip ON
> > > > > > > test_qinq_strip: QinQ strip ON and VLAN strip ON
> > > > > > >
> > > > > > > i40e:
> > > > > > >     test_qinq_strip (sent packet: Single VLAN): FAIL
> > > > > > >       reason: VLAN tags found in packet when should have
> been
> > > > > > > stripped: Ether / Dot1Q / 802.1q (0x1c) vlan 1280 / LLC /
> Raw /
> > > > > > > Padding
> > > > > > >     test_qinq_strip (sent packet: Stacked VLAN): FAIL
> > > > > > >       reason: Expected one VLAN tag but found 2: Ether /
> Dot1Q / Dot1Q
> > > > > > > / 802.1q (0x1c) vlan 1280 / LLC / Raw / Padding
> > > > > > >     test_qinq_strip (sent packet: Single S-VLAN): FAIL
> > > > > > >       reason: VLAN tags found in packet when should have
> been
> > > > > > > stripped: Ether / Dot1Q / 802.1q (??) vlan ?? / LLC / Raw /
> Padding
> > > > > > >     test_qinq_strip (sent packet: QinQ): FAIL
> > > > > > >       reason: VLAN tags found in packet when should have
> been
> > > > > > > stripped: Ether / Dot1Q / Dot1AD / 802.1q (0x1c) vlan 1280 /
> LLC / Raw
> > > > > > > / Padding
> > > > > > >     test_vlan_strip (sent packet: Single VLAN): PASS
> > > > > > >       reason: VLAN tag stripped from packet
> > > > > > >     test_vlan_strip (sent packet: Stacked VLAN): PASS
> > > > > > >       reason: Received packet had outer VLAN stripped, with
> inner VLAN
> > > > > intact
> > > > > > >     test_vlan_strip (sent packet: Single S-VLAN): PASS
> > > > > > >       reason: S-VLAN tag stripped from packet
> > > > > > >     test_vlan_strip (sent packet: QinQ): FAIL
> > > > > > >       reason: Neither tag stripped
> > > > > > >
> > > > > >
> > > > > > Can you confirm exactly what is being sent in each case for
> the
> > > ethertype
> > > > > > of the VLAN tag? When you say single and stacked VLANs, that
> is VLANs
> > > with
> > > > > > 0x8100 type, correct? Is single S-VLAN a tag with ethertype
> 0x88a8, and
> > > > > > QinQ packet a packet with one 0x88a8 and one 0x8100? No other
> type
> > > options,
> > > > > > e.g. 0x9100 were checked, right?
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > That's correct, single VLAN is one 0x8100 tag, stacked is two,
> single
> > > > > S-VLAN is one 0x88a8, and QinQ is 0x88a8 and 0x8100. No other
> types
> > > > > were tested in the stripping case
> > > >
> > > >
> > > > Bruce,
> > > >
> > > > It seems the drivers have the ability to set the EtherType of the
> Outer (and
> > > sometimes Inner) tag:
> > > >
> https://elixir.bootlin.com/dpdk/v25.07/source/lib/ethdev/rte_ethdev.h#L3
> 752
> > > >
> > >
> https://elixir.bootlin.com/dpdk/v25.07/source/drivers/net/intel/e1000/ig
> b_ethd
> > > ev.c#L2739
> > > >
> > >
> https://elixir.bootlin.com/dpdk/v25.07/source/drivers/net/intel/i40e/i40
> e_ethd
> > > ev.c#L4038
> > > >
> > > > -Morten
> > > >
> > > Thanks. Question is, does this help us to clarify the behaviour for
> these
> > > tags? Based on the fact that the comment for the tag_type says that
> the
> > > outer is the same as single for vlan types, then should the
> behaviour be:
> > >
> > > * VLAN strip - strip at most one tag as defined by the
> "outer/single" VLAN
> > >   type.
> > > * QinQ strip:
> > >   - if outer/single VLAN tag matches the outer tag type, strip it
> > >   - if outer tag has been stripped, and inner tag matches the tag
> type,
> > >     strip that also.
> > >   - if a single VLAN tag is present, it gets stripped only if it's
> tag type
> > >     matches outer type - it is left alone if it matches the inner
> type
> > >   - if two VLAN tags are present, and the inner tag matches, it is
> not
> > >     stripped if the outer tag does not match/has not been stripped.
> > >
> > > Also, should we specify for DPDK what the default tags should be for
> the
> > > two cases. It seems for the Intel NICs that I tried, that both inner
> and
> > > outer tags always start with 0x8100. That's probably a good default
> for
> > > VLAN strip, but for QinQ strip, we probably want hardware to default
> to
> > > 0x88a8 and 0x8100. Alternatively, we could/should mandate that
> drivers
> > > explicitly set the required tags before starting the port.
> > >
> > > /Bruce
> >
> > My train of thought is:
> >
> > 1. When not configured for QinQ, maximum one tag is considered. I
> think we all agree on this.
> > VLAN is EtherType 0x8100, and (if present) is associated with the
> first (outermost) tag in the packet, and kept in mbuf->vlan_tci,
> regardless if any more (inner) tags are present or not.
> > This defines placement of the VLAN tag ID in the mbuf and the default
> EtherType used in the packet headers when parsing/stripping/inserting
> the tag.
> > It also defines the purpose of VLAN capability/present/strip/insert
> flags in the ethdev and mbuf offload flags.
> > Feature creep: The EtherType of this (outermost) tag might be
> configurable, if the hardware and driver supports it. I would consider
> such a feature very exotic.
> 
> And just to be explicit, "associated with the first tag" means that we
> never have a situation with VLAN strip or QinQ strip setting where we
> would
> strip a non-first tag, but leave the first tag in place. Tags are always
> stripped from the outer tag inwards, and tag stripping stops as soon as
> the
> first non-strippable tag is encountered.

Yes.
Also, the EtherType (of this first tag) must be 0x8100 (or optionally a configurable value) to be considered a match.

<thought experiment>
We could expand it to allow stripping an inner VLAN tag (EtherType 0x8100) if an outer QinQ tag (EtherType 0x88a8) was parsed.
It might even be possible for the HW to support that on RX.
But it wouldn't be possible to support on TX, as it would require the HW to detect if a QinQ tag is present in the packet data being transmitted, and then insert the (inner) VLAN tag after that outer QinQ tag during transmit.
So, without feature symmetry between RX and TX, I don't think it is relevant.
</thought experiment>

> 
> >
> > 2. For more tags, start by strictly conforming to the IEEE 802.3
> standard.
> > QinQ is EtherType 0x88a8 and is associated with the outer tag (S-TAG),
> mbuf->vlan_tci_outer, regardless if an inner tag (C-TAG) is present or
> not.
> > VLAN is EtherType 0x8100 and is associated with the inner tag (C-TAG)
> or simple tag (VLAN tag), mbuf->vlan_tci, regardless if an outer tag is
> present or not. (Note: When disregarding the outer tag here, this is
> exactly the same behavior as for VLAN in bullet 1 above.)
> > This defines placements of the tag IDs in the mbuf and the default
> EtherTypes used in the packet headers when parsing/stripping/inserting
> tags.
> > It also defines the purpose of VLAN and QINQ
> capability/present/strip/insert flags in the ethdev and mbuf offload
> flags.
> >
> 
> So based on what you say above, having a VLAN strip setting separate
> from
> the QinQ strip enabled setting doesn't make any sense, right? So we
> should
> decide and document that when QinQ strip is enabled, VLAN strip must be:
> 
> a) always on - which makes sense if single tagged 0x8100 packets are
> still
>                stripped as you describe above
> b) always off - which makes sense in the case below where singly tagged
>                 packets are treated as outer tags
> c) ignored - which is a bit of a cop-out, but avoids any issues with
> backward
>              compatibility for apps.

No. I'm saying something else:
With QinQ stripping/insertion enabled, the VLAN stripping/insertion flag controls what to do with the inner VLAN tag (if present):
Enabled: If inner VLAN tag is present (and outer QinQ tag was stripped), strip it.
Disabled: If inner VLAN tag is present (and outer QinQ tag was stripped), don't strip it.

Think of a conceptual pipeline:
First stage on RX (if QinQ stripping is enabled), the outermost QinQ tag is stripped, if present. After this, the packet has no outer tag anymore; the outermost tag has been removed, so what was the inner tag before has now become the outermost tag.
Second stage on RX (if VLAN stripping is enabled), the outermost VLAN tag is stripped, if present. After this, the packet has no inner tag anymore.

When QinQ stripping is disabled, or HW doesn't support it, there is only the "Second stage".
This is what I was trying to refer to in my bullet 2 "Note: When disregarding the outer tag here, this is exactly the same behavior as for VLAN in bullet 1 above."
In other words: The VLAN stripping engine only considers the outermost tag (possibly after QinQ stripping has made the innermost tag become the outermost).

And there is symmetry for TX, except the two conceptual pipeline stages are swapped.

> 
> > 3. Then add support for VLAN Stacking, i.e. using EtherType 0x8100 for
> the outer tag.
> > This mode of operation is only relevant when configured for QinQ. It
> does not support "super hybrid links" (mixing QinQ tagged packets with
> simple VLAN tagged packets), because a packet with one VLAN tag is
> considered an S-tagged packet with no C-TAG, and such a packet would
> normally be considered a simple VLAN tagged packet.
> > Now, QinQ means EtherType 0x8100. It requires that the hardware and
> driver supports changing the EtherType of the outer tag.
> > QinQ is EtherType 0x8100 and is associated with the outer tag (S-TAG),
> mbuf->vlan_tci_outer, regardless if an inner tag (C-TAG) is present or
> not.
> > VLAN is EtherType 0x8100 and is associated with the inner tag (C-TAG),
> mbuf->vlan_tci, and can only exist when an outer tag is present.
> > Note: rte_net_get_ptype() cannot support this mode of operation
> without adding a parameter containing information about the EtherType of
> the outer tag!
> >
> > This describes the behavior I have now come to expect, after getting
> wiser through this discussion.
> > Drivers might do something else, and applications might have adopted
> what drivers actually do.
> >
> 
> One followup question/thought then: the
> rte_eth_dev_set_vlan_ether_type()
> function only applies for QinQ traffic, since it specifies inner/outer?

No, it supports all tag types through the enum rte_vlan_type [1] passed to rte_eth_dev_set_vlan_ether_type(): QinQ Outer, QinQ Inner, Single VLAN.

But I think the documentation of the enum rte_vlan_type needs to be corrected too...

I agree with its description saying: "Note that single VLAN is treated the same as inner VLAN."
But the "Single VLAN" comment is at the wrong enum value.
I think it should be:

/**
 * VLAN types to indicate if it is for single VLAN, inner VLAN or outer VLAN.
 * Note that single VLAN is treated the same as inner VLAN.
 */
enum rte_vlan_type {
	RTE_ETH_VLAN_TYPE_UNKNOWN = 0,
-	RTE_ETH_VLAN_TYPE_INNER, /**< Inner VLAN. */
-	RTE_ETH_VLAN_TYPE_OUTER, /**< Single VLAN, or outer VLAN. */
+	RTE_ETH_VLAN_TYPE_INNER, /**< Inner VLAN, or single VLAN. */
+	RTE_ETH_VLAN_TYPE_OUTER, /**< Outer VLAN. */
	RTE_ETH_VLAN_TYPE_MAX,
};

[1]: https://elixir.bootlin.com/dpdk/v25.07/source/lib/ethdev/rte_ethdev.h#L444

> 
> /Bruce

  reply	other threads:[~2025-07-30 20:10 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-14 13:30 Bruce Richardson
2025-07-14 15:06 ` Stephen Hemminger
2025-07-14 15:11   ` Bruce Richardson
2025-07-14 16:33 ` Morten Brørup
2025-07-14 16:49   ` Bruce Richardson
2025-07-15 15:04     ` Morten Brørup
2025-07-15 16:20       ` Bruce Richardson
2025-07-14 20:09 ` Dean Marx
2025-07-14 21:41   ` Patrick Robb
2025-07-15  7:47   ` Bruce Richardson
2025-07-15 21:15     ` Patrick Robb
2025-07-16 10:11   ` Bruce Richardson
2025-07-16 19:24     ` Dean Marx
2025-07-16 19:46       ` Bruce Richardson
2025-07-17 17:45         ` Morten Brørup
2025-07-17 21:03           ` Dean Marx
2025-07-18  8:22             ` Bruce Richardson
2025-07-18 13:18               ` Dean Marx
2025-07-28 14:51                 ` Morten Brørup
2025-07-28 15:11                   ` Bruce Richardson
2025-07-28 17:48                     ` Morten Brørup
2025-07-30 17:28                       ` Bruce Richardson
2025-07-30 20:10                         ` Morten Brørup [this message]
2025-07-31  8:08                           ` Bruce Richardson
2025-08-01 10:20                           ` Bruce Richardson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35E9FDD6@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=aman.deep.singh@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=dmarx@iol.unh.edu \
    --cc=fengchengwen@huawei.com \
    --cc=huangdengdui@huawei.com \
    --cc=jasvinder.singh@intel.com \
    --cc=lihuisong@huawei.com \
    --cc=liuyonglong@huawei.com \
    --cc=medvedkinv@gmail.com \
    --cc=probb@iol.unh.edu \
    --cc=stephen@networkplumber.org \
    --cc=techboard@dpdk.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).