Date: Wed, 2 Mar 2022 08:21:59 -0800
From: Stephen Hemminger
To: Morten Brørup
Cc: "Ferruh Yigit", "Thomas Monjalon", "Andrew Rybchenko", "Qi Zhang", "Ajit Khaparde", "Ray Kinsella", "Bruce Richardson", "Damjan Marion (damarion)", "Roy Fan Zhang", "Min Hu (Connor)", "Konstantin Ananyev", "Stokes, Ian", "David Marchand"
Subject: Re: MTU and frame size filtering inaccuracy
Message-ID: <20220302082159.06d3d872@hermes.local>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86F14@smartserver.smartshare.dk>
References: <98CBD80474FA8B44BF855DF32C47DC35D86F14@smartserver.smartshare.dk>

On Wed, 2 Mar 2022 09:53:42 +0100
Morten Brørup wrote:

> > From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> > Sent: Tuesday, 1 March 2022 18.50
> >
> > Hi all,
> >
> > There is a problem in MTU setting in DPDK.
>
> Yes, and the root cause is the unclear definition of what "MTU" means in DPDK! This is causing the confusion about L3 packet size, L2 raw packet size, and L2 encapsulated packet size.
>
> Traditional Ethernet links are expected to provide a 1500 byte L3 MTU.
> This means that an untagged packet can be 1518 byte (incl. 14 byte Ethernet header and 4 byte Ethernet CRC), a VLAN tagged packet can be 1522 byte, a QinQ tagged packet can be 1526 byte, and MPLS tagged packets can be other sizes, depending on the number of MPLS labels.
>
> Optimally, the NIC hardware would understand these additional headers and determine if the packet is oversized or not, e.g. on a hybrid link (i.e. mixed untagged and VLAN tagged traffic), it should consider a 1522 byte packet oversize if untagged, but correctly sized if VLAN tagged. However, the NIC hardware doesn't do this.
>
> The above only describes the problem of converting between the L3 and L2 packet size - i.e. the logical packet sizes. There is also a physical limitation:
>
> The NIC hardware might support a certain maximum raw L2 packet size, such as 1522 byte or 2048 byte. In this case, you don't want to allow larger packets regardless of the number of VLAN tags or MPLS labels preceding the actual packet. You could even risk allocating too small MBUFs.
>
> In summary, I think the whole MTU handling API is utterly defective.
>
> Optimally, the API should discriminate between maximum encapsulated L2 packet size (i.e. not counting the bytes used for VLAN tags and similar) and maximum raw L2 packet size (i.e. also counting bytes used for VLAN tags and similar).
>
> When this was discussed on the DPDK mailing list a couple of years ago [1], there was no support for improving on this situation, and the decision was to blindly adopt Linux' way of handling it: Consider the MTU as if packets are untagged, and allow 4 more byte for single VLAN tagged packets. I don't recall exactly how QinQ tagged packets are supposed to be considered regarding the MTU, and I also don't know where any of this is documented.
>
> [1] http://inbox.dpdk.org/dev/MN2PR18MB2432526A39C6ECEB2CEB8865AFE00@MN2PR18MB2432.namprd18.prod.outlook.com/
>
> >
> > In 'rte_eth_dev_configure()' and 'rte_eth_dev_set_mtu()', MTU is
> > converted to frame size.
> >
> > Since L2 protocol header size changes based on what HW supports,
> > L2 overhead information get from PMD, but this still doesn't solve
> > the issue.
> >
> > PMD reports max overhead based on what it supports, but there is
> > no way to know what will received packets have. Sample:
> >
> > i40e has 26 bytes overhead: HRD_LEN + CRC_LEN + VLAN_LEN *2
> > when MTU set to 1500, configured frame size become 1526
> > When a packet received with no VLAN tag and 1504 bytes payload,
> > packet frame size is 1522 bytes and it is accepted.
> > So although MTU is set 1500 bytes, packet with 1504 bytes is accepted.
> >
> > There is an inaccuracy in frame size filtering up to 8 bytes.
> >
> >
> > Damjan reported the same, and he has good point on the application
> > need (I hope it is OK to quote from his email):
> >
> > 1) information about the biggest l2 frame interface it can receive and
> > send (1518,1522, 2000 or jumbo)
>
> Yes, I think the API should report the "maximum raw L2 packet size" (i.e. also counting the bytes used for any preceding tags, regardless if they are stripped or not).
>
> > 2) ability to ask hardware to help him with filtering oversized frames
> >
> >
> > We need to fix (2), I am not quite sure how, any comment is welcome.
>
> This would require NIC hardware support and optionally the addition of a NIC configuration flag to control whether it should count the bytes used by any preceding VLAN tags and/or MPLS labels when evaluating the packet size or not.
>
> The short term solution is a workaround in the application: Configure the NICs with an oversize MTU (e.g. +8 byte to support QinQ packets) and check the packets for oversize in the application.
> Unfortunately, this also means that the NIC hardware counters are no longer correct, and the reported counters must be adjusted for the number of oversize packets detected by the application.

MTU is often a confusing term. Ideally there would be a Max Receive Unit and
a Max Transmit Unit.

I can tell you what the Linux (and BSD) kernels do. On transmit, the MTU is
used as a filter to size packets before they are passed to the device driver.
It is also used to tell TSO what size units to use.

But on receive, in the kernel any size packet is allowed! The MTU is used to
program the hardware receive buffers. Many devices round MTU + VLAN up to
whatever hardware increment they can handle. Some devices only handle powers
of 2, which is why E1000 allows 2K packets to come in when there is a
1500 byte MTU.

The other source of confusion is around MTU, VLANs, and encaps. DPDK should
be doing what other OSes and most network vendors do.

The convention is that the outer VLAN tag is not part of the MTU, but any
other tags and encaps subtract from the usable MTU. I.e., with MTU = 1500 and
QinQ the usable MTU is 1500 - 4 = 1496.

For receive and MTU, DPDK should allow any size coming in that HW can receive.

Postel's Law - Be conservative in what you do, be liberal in what you accept
from others.
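
To make the arithmetic in this thread concrete, here is a minimal sketch in plain C. It is not DPDK code and uses no DPDK API: every macro and function name below is hypothetical and exists only for illustration. It assumes a length-only hardware filter that compares the frame length against MTU plus the driver's reported overhead (26 bytes in the i40e example), and a software check that follows the convention described above, where the outer VLAN tag is free and each further tag subtracts 4 bytes from the usable MTU. Combining the two in one application is an assumption, not something the thread prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard Ethernet sizes in bytes. */
#define ETH_HDR_LEN  14u
#define ETH_CRC_LEN   4u
#define VLAN_TAG_LEN  4u

/* Overhead quoted for i40e in the thread: header + CRC + two VLAN tags. */
#define I40E_OVERHEAD (ETH_HDR_LEN + ETH_CRC_LEN + 2u * VLAN_TAG_LEN)

/* Frame-size limit derived from an MTU and a driver overhead (hypothetical helper). */
static inline uint32_t max_frame_size(uint32_t mtu, uint32_t overhead)
{
    return mtu + overhead;
}

/* Frame length, including CRC, of an L3 payload behind n_tags VLAN tags. */
static inline uint32_t frame_len(uint32_t l3_len, uint32_t n_tags)
{
    return ETH_HDR_LEN + n_tags * VLAN_TAG_LEN + l3_len + ETH_CRC_LEN;
}

/* Length-only filter: it compares sizes and ignores how many tags are present. */
static inline bool hw_accepts(uint32_t l3_len, uint32_t n_tags,
                              uint32_t mtu, uint32_t overhead)
{
    return frame_len(l3_len, n_tags) <= max_frame_size(mtu, overhead);
}

/* Usable L3 MTU: the outer tag is free, each additional tag costs 4 bytes. */
static inline uint32_t usable_mtu(uint32_t mtu, uint32_t n_tags)
{
    uint32_t inner_tags = n_tags > 1u ? n_tags - 1u : 0u;

    assert(mtu >= inner_tags * VLAN_TAG_LEN);
    return mtu - inner_tags * VLAN_TAG_LEN;
}

/* Software oversize check an application could apply after reception. */
static inline bool sw_oversize(uint32_t l3_len, uint32_t n_tags, uint32_t mtu)
{
    return l3_len > usable_mtu(mtu, n_tags);
}
```

With an MTU of 1500 and the 26 byte overhead, hw_accepts(1504, 0, 1500, I40E_OVERHEAD) is true because the 1522 byte frame fits within the 1526 byte limit, while sw_oversize(1504, 0, 1500) flags the same packet. This is the gap of up to 8 bytes reported at the start of the thread.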