From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Ferruh Yigit" <ferruh.yigit@intel.com>, <dev@dpdk.org>
Cc: "Thomas Monjalon" <thomas@monjalon.net>,
"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
<matan@nvidia.com>, "Qi Zhang" <qi.z.zhang@intel.com>,
"Ajit Khaparde" <ajit.khaparde@broadcom.com>,
"Stephen Hemminger" <stephen@networkplumber.org>,
"Ray Kinsella" <mdr@ashroe.eu>,
"Bruce Richardson" <bruce.richardson@intel.com>,
"Damjan Marion (damarion)" <damarion@cisco.com>,
"Roy Fan Zhang" <roy.fan.zhang@intel.com>,
"Min Hu (Connor)" <humin29@huawei.com>,
"Konstantin Ananyev" <konstantin.ananyev@intel.com>,
"Stokes, Ian" <ian.stokes@intel.com>,
"David Marchand" <david.marchand@redhat.com>
Subject: RE: MTU and frame size filtering inaccuracy
Date: Wed, 2 Mar 2022 09:53:42 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86F14@smartserver.smartshare.dk>
In-Reply-To: <e2554b78-cdda-aa33-ac6d-59a543a10640@intel.com>
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Tuesday, 1 March 2022 18.50
>
> Hi all,
>
> There is a problem in MTU setting in DPDK.
Yes, and the root cause is the unclear definition of what "MTU" means in DPDK! This is causing the confusion about L3 packet size, L2 raw packet size, and L2 encapsulated packet size.
Traditional Ethernet links are expected to provide a 1500-byte L3 MTU. This means that an untagged packet can be 1518 bytes (incl. the 14-byte Ethernet header and the 4-byte Ethernet CRC), a VLAN tagged packet can be 1522 bytes, a QinQ tagged packet can be 1526 bytes, and MPLS tagged packets can be other sizes, depending on the number of MPLS labels.
Optimally, the NIC hardware would understand these additional headers and determine if the packet is oversized or not, e.g. on a hybrid link (i.e. mixed untagged and VLAN tagged traffic), it should consider a 1522 byte packet oversize if untagged, but correctly sized if VLAN tagged. However, the NIC hardware doesn't do this.
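To make the arithmetic concrete, here is a minimal sketch in plain C of what such a tag-aware check would compute. It is not DPDK API; the constant and function names are made up for illustration, and MPLS labels are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_HDR_LEN  14u /* destination MAC + source MAC + EtherType */
#define ETH_CRC_LEN   4u /* Ethernet CRC */
#define VLAN_TAG_LEN  4u /* one VLAN tag (single tagged), two for QinQ */

/* Largest frame on the wire for a given L3 MTU and number of VLAN tags. */
static uint32_t max_frame_len(uint32_t l3_mtu, unsigned int vlan_tags)
{
    return ETH_HDR_LEN + vlan_tags * VLAN_TAG_LEN + l3_mtu + ETH_CRC_LEN;
}

/* Tag-aware oversize check: the limit follows the tags actually present. */
static bool frame_is_oversized(uint32_t frame_len, unsigned int vlan_tags,
                               uint32_t l3_mtu)
{
    return frame_len > max_frame_len(l3_mtu, vlan_tags);
}
```

With an L3 MTU of 1500, max_frame_len() gives 1518, 1522 and 1526 for zero, one and two tags, and a 1522-byte frame is oversized only when it is untagged.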
The above only describes the problem of converting between the L3 and L2 packet size - i.e. the logical packet sizes. There is also a physical limitation:
The NIC hardware might support a certain maximum raw L2 packet size, such as 1522 bytes or 2048 bytes. In this case, you don't want to allow larger packets, regardless of the number of VLAN tags or MPLS labels preceding the actual packet. Otherwise, you even risk allocating mbufs that are too small.
In summary, I think the whole MTU handling API is utterly defective.
Optimally, the API should distinguish between the maximum encapsulated L2 packet size (i.e. not counting the bytes used for VLAN tags and similar) and the maximum raw L2 packet size (i.e. also counting the bytes used for VLAN tags and similar).
When this was discussed on the DPDK mailing list a couple of years ago [1], there was no support for improving this situation, and the decision was to blindly adopt Linux's way of handling it: consider the MTU as if packets are untagged, and allow 4 more bytes for single VLAN tagged packets. I don't recall exactly how QinQ tagged packets are supposed to be treated with regard to the MTU, and I also don't know where any of this is documented.
[1] http://inbox.dpdk.org/dev/MN2PR18MB2432526A39C6ECEB2CEB8865AFE00@MN2PR18MB2432.namprd18.prod.outlook.com/
>
> In 'rte_eth_dev_configure()' and 'rte_eth_dev_set_mtu()', MTU is
> converted to frame size.
>
> Since the L2 protocol header size changes based on what the HW supports,
> the L2 overhead information is obtained from the PMD, but this still
> doesn't solve the issue.
>
> The PMD reports the max overhead based on what it supports, but there is
> no way to know what the received packets will contain. Example:
>
> i40e has 26 bytes of overhead: HDR_LEN + CRC_LEN + VLAN_LEN * 2
> When the MTU is set to 1500, the configured frame size becomes 1526.
> When a packet is received with no VLAN tag and a 1504-byte payload,
> the packet frame size is 1522 bytes and it is accepted.
> So although the MTU is set to 1500 bytes, a packet with 1504 bytes is accepted.
>
> There is an inaccuracy in frame size filtering up to 8 bytes.
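The numbers in the i40e example can be reproduced with a small sketch. It is plain C with hypothetical names, and the overhead is taken from the description quoted above (header + CRC + two VLAN tags) rather than from the driver source.

```c
#include <stdint.h>

#define ETH_HDR_LEN  14u
#define ETH_CRC_LEN   4u
#define VLAN_TAG_LEN  4u

/* Worst-case L2 overhead as described for i40e: 14 + 4 + 2 * 4 = 26 bytes. */
#define PMD_MAX_OVERHEAD (ETH_HDR_LEN + ETH_CRC_LEN + 2u * VLAN_TAG_LEN)

/* Frame size limit that results from converting the MTU to a frame size. */
static uint32_t configured_frame_limit(uint32_t mtu)
{
    return mtu + PMD_MAX_OVERHEAD;
}

/* Largest L3 payload that still passes the frame size filter, given the
 * number of VLAN tags the received packet actually carries. */
static uint32_t max_accepted_payload(uint32_t mtu, unsigned int vlan_tags)
{
    return configured_frame_limit(mtu) - ETH_HDR_LEN - ETH_CRC_LEN
           - vlan_tags * VLAN_TAG_LEN;
}
```

For an MTU of 1500 the limit is 1526, so the filter lets through payloads of up to 1508 bytes untagged, 1504 bytes single tagged and 1500 bytes QinQ tagged. Only the double tagged case matches the configured MTU, which is where the inaccuracy of up to 8 bytes comes from.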
>
>
> Damjan reported the same, and he has a good point on what the application
> needs (I hope it is OK to quote from his email):
>
> 1) information about the biggest L2 frame the interface can receive and
> send (1518, 1522, 2000 or jumbo)
Yes, I think the API should report the "maximum raw L2 packet size" (i.e. also counting the bytes used for any preceding tags, regardless of whether they are stripped or not).
> 2) ability to ask hardware to help him with filtering oversized frames
>
>
> We need to fix (2), I am not quite sure how, any comment is welcome.
This would require NIC hardware support and optionally the addition of a NIC configuration flag to control whether it should count the bytes used by any preceding VLAN tags and/or MPLS labels when evaluating the packet size.
The short term solution is a workaround in the application: Configure the NICs with an oversize MTU (e.g. +8 bytes to support QinQ packets) and check the packets for oversize in the application. Unfortunately, this also means that the NIC hardware counters are no longer correct, and the reported counters must be adjusted for the number of oversize packets detected by the application.
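A rough sketch of such an application-side check follows. It is plain C on a raw packet buffer, not DPDK API. The names are hypothetical, the packet length is assumed to exclude the CRC, only 0x8100 and 0x88A8 are treated as VLAN TPIDs, and MPLS labels are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14u
#define VLAN_TAG_LEN    4u
#define ETHERTYPE_VLAN 0x8100u /* 802.1Q tag */
#define ETHERTYPE_QINQ 0x88A8u /* 802.1ad outer tag */

/* Count the VLAN tags (at most two) that follow the MAC addresses. */
static unsigned int count_vlan_tags(const uint8_t *pkt, size_t len)
{
    unsigned int tags = 0;
    size_t off = 12; /* offset of the first EtherType/TPID field */

    while (tags < 2 && off + 2 <= len) {
        uint16_t type = (uint16_t)((pkt[off] << 8) | pkt[off + 1]);

        if (type != ETHERTYPE_VLAN && type != ETHERTYPE_QINQ)
            break;
        tags++;
        off += VLAN_TAG_LEN;
    }
    return tags;
}

/* Hypothetical software counter, used to correct the NIC statistics. */
struct sw_rx_stats {
    uint64_t oversize;
};

/* Returns true if the packet exceeds the L3 MTU once its own tags are
 * accounted for, and counts it. 'len' excludes the CRC (assumption). */
static bool sw_oversize_filter(const uint8_t *pkt, size_t len,
                               uint32_t l3_mtu, struct sw_rx_stats *stats)
{
    size_t limit = ETH_HDR_LEN
                 + (size_t)count_vlan_tags(pkt, len) * VLAN_TAG_LEN
                 + l3_mtu;

    if (len <= limit)
        return false;
    stats->oversize++;
    return true;
}
```

The application would drop the packets for which sw_oversize_filter() returns true, and add stats->oversize to the oversize count it reports, since the NIC accepted those packets and counted them as good.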
>
>
> --
> Thanks,
> ferruh