From: "Ananyev, Konstantin"
To: Olivier Matz, Andrew Rybchenko
CC: "Yigit, Ferruh", Shahaf Shuler, "dev@dpdk.org", Thomas Monjalon,
 "Richardson, Bruce", Matan Azrad, Jerin Jacob Kollanukkaran
Subject: Re: [dpdk-dev] questions about new offload ethdev api
Date: Fri, 27 Dec 2019 14:23:18 +0000
References: <20180123135308.tr7nmuqsdeogm7bl@glumotte.dev.6wind.com>
 <65f5f247-15e7-ac0a-183e-8a66193f426f@intel.com>
 <20191227135428.GP22738@platinum>
In-Reply-To: <20191227135428.GP22738@platinum>

> -----Original Message-----
> From: Olivier Matz
> Sent: Friday, December 27, 2019 1:54 PM
> To: Andrew Rybchenko
> Cc: Yigit, Ferruh; Shahaf Shuler; dev@dpdk.org; Ananyev, Konstantin;
>     Thomas Monjalon; Richardson, Bruce; Matan Azrad; Jerin Jacob Kollanukkaran
> Subject: Re: [dpdk-dev] questions about new offload ethdev api
>
> Hi,
>
> Few comments below.
>
> On Mon, Dec 16, 2019 at 11:39:05AM +0300, Andrew Rybchenko wrote:
> > On 12/10/19 9:07 PM, Ferruh Yigit wrote:
> > > On 1/23/2018 2:34 PM, Shahaf Shuler wrote:
> > >> Tuesday, January 23, 2018 3:53 PM, Olivier Matz:
> > >
> > > <...>
> > >
> > >>> 2/ meaning of rxmode.jumbo_frame, rxmode.enable_scatter,
> > >>> rxmode.max_rx_pkt_len
> > >>>
> > >>> While it's not related to the new API, it is probably a good opportunity
> > >>> to clarify the meaning of these flags. I'm not able to find good
> > >>> documentation about them.
> > >>>
> > >>> Here is my understanding, the configuration only depends on:
> > >>> - the maximum rx frame length
> > >>> - the amount of data available in a mbuf (minus headroom)
> > >>>
> > >>> Flags to set in rxmode (example):
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |               |mbuf_data_len=1K|mbuf_data_len=2K|mbuf_data_len=16K|
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |max_rx_len=1500|enable_scatter  |                |                 |
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |max_rx_len=9000|enable_scatter, |enable_scatter, |jumbo_frame      |
> > >>> |               |jumbo_frame     |jumbo_frame     |                 |
> > >>> +---------------+----------------+----------------+-----------------+
>
> Due to successive quotes, the table was not readable in my mail client,
> here it is again (narrower):
>
> +------------+---------------+---------------+---------------+
> |            |mbuf_data_len= |mbuf_data_len= |mbuf_data_len= |
> |            |1K             |2K             |16K            |
> +------------+---------------+---------------+---------------+
> |max_rx_len= |enable_scatter |               |               |
> |1500        |               |               |               |
> +------------+---------------+---------------+---------------+
> |max_rx_len= |enable_scatter,|enable_scatter,|jumbo_frame    |
> |9000        |jumbo_frame    |jumbo_frame    |               |
> +------------+---------------+---------------+---------------+
>
> > >>> If this table is correct, the jumbo_frame flag would be equivalent to
> > >>> checking whether max_rx_pkt_len is above a threshold.
> > >>>
> > >>> And enable_scatter could be deduced from the mbuf size of the given rxq
> > >>> (which is a bit harder but maybe doable).
> > >>
> > >> I'm glad you raised this subject. We had a lot of discussion on it
> > >> internally in Mellanox.
> > >>
> > >> I fully agree. All an application needs is to specify the maximum
> > >> packet size it wants to receive.
> > >>
> > >> I think the lack of documentation is also causing PMDs to use those
> > >> flags wrongly. For example, some PMDs set the jumbo_frame flag
> > >> internally without it being set by the application.
> > >>
> > >> I would like to add one more item: MTU.
> > >> What is the relation (if any) between setting the MTU and max_rx_len?
> > >> I know MTU stands for Maximum Transmission Unit, however at least in
> > >> Linux it is the same for the send and the receive side.
> > >
> > > (Resurrecting the thread after two years, I will reply again with the
> > > latest understanding.)
> > >
> > > Thanks Olivier for the above summary and table; unfortunately usage is
> > > still not consistent between PMDs. According to my understanding:
> > >
> > > 'max_rx_pkt_len' is a user configuration value to limit the size of the
> > > packet that is shared with the host, but it doesn't limit the size of
> > > the packet that the NIC receives.
>
> When you say the size of the packet shared with the host, do you mean for
> instance that the NIC will receive a 1500B packet and will only write
> 128 bytes of data in the mbuf?
>
> If yes, this was not my understanding. I suppose it could be used for
> monitoring. What should be the value for rx offload infos like checksum
> or packet type if the packet (or the header) is truncated?
>
> > Also the comment in lib/librte_ethdev/rte_ethdev.h says that the
> > rxmode field is used if (and I think only if) JUMBO_FRAME is
> > enabled. So, if the user wants to set it at device configure stage,
> > the device *must* support the JUMBO_FRAME offload, which means that
> > the driver code handles rxmode.max_rx_pkt_len and either accepts it
> > and configures HW appropriately or returns an error if the specified
> > value is wrong. Basically it is written in the jumbo frame feature
> > definition in features.rst. The user has max_rx_pktlen in dev_info
> > to find out the maximum supported value for max_rx_pkt_len.
> >
> > > Like if the mbuf size of the mempool used by a queue is 1024 bytes, we
> > > don't want packets bigger than the buffer size, but if the NIC supports
> > > it, it is possible to receive a 6000-byte packet and split the data
> > > into multiple buffers, and we can use multi-segment packets to
> > > represent it.
> > > So what we need is the NIC's ability to limit the size of data shared
> > > with the host, and scattered Rx support (device + driver).
> >
> > It requires the RX_SCATTER offload to be enabled, and it must be
> > controlled by the user only (not the PMD) since it basically
> > means the application is ready to handle multi-segment
> > packets (has code which takes a look at the number of
> > segments and next pointers etc.). Moreover, the application
> > may disable the MULTI_SEG Tx offload (and drivers may ignore the
> > number of segments and next pointer as well).
>
> Agree, I think it is important that the application can control the
> enabling of rx scatter, either by a flag, or simply by passing
> max_rx_len <= mbuf_data_len.
>
> > > But MTU limits the size of the packet that the NIC receives.
> >
> > Personally I've never treated it this way. For me the only
> > difference between max_rx_pkt_len and MTU is:
> >  - max_rx_pkt_len is the entire packet with all L2 headers and
> >    even FCS (basically everything which could be provided
> >    to the application in the mbuf)
> >  - MTU does not cover L2 (and VLANs; I'm not sure about MPLS)
> >
> > > Assuming the above are correct :),
> > >
> > > Using the mbuf data size as 'max_rx_pkt_len' without asking the user is
> > > an option, but perhaps the user has a different reason to limit the
> > > packet size, so I think it is better to keep it as a separate config
> > > option.
> > >
> > > I think the PMD itself enabling the "jumbo frame" offload is not too
> > > bad, and acceptable, since providing a large MTU already implies it.
> >
> > Yes
>
> +1
>
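Coming back to the "ready to handle multi-segment packets" point above, here
is a minimal sketch of what that implies for an Rx loop once the SCATTER
offload is enabled. process_bytes() is a hypothetical stand-in for real
application logic; the mbuf fields and rte_* calls are the standard ones:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical consumer of packet data, stands in for application logic. */
static void
process_bytes(const char *data, uint16_t len)
{
	(void)data;
	(void)len;
}

static void
handle_rx(uint16_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
	uint16_t i;

	for (i = 0; i < nb; i++) {
		struct rte_mbuf *m = pkts[i];
		struct rte_mbuf *seg;

		/* With Rx scatter, pkt_len (total) and data_len (first
		 * segment only) may differ, and nb_segs may be > 1,
		 * so walk every segment instead of assuming one. */
		for (seg = m; seg != NULL; seg = seg->next)
			process_bytes(rte_pktmbuf_mtod(seg, const char *),
				      seg->data_len);

		rte_pktmbuf_free(m); /* frees the whole segment chain */
	}
}

An application that has not enabled SCATTER can keep assuming a single
segment per packet, which is exactly why the driver should not turn it on
behind the application's back.
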
> > > But I'm not sure about the PMD enabling scattered Rx; the application
> > > may want to force receiving single-segment mbufs, and for that case the
> > > PMD enabling this config on its own looks like a problem.
> >
> > Yes
>
> +1
>
> > > But the user really needs this when a packet doesn't fit into the mbuf,
> > > so providing an MTU larger than 'max_rx_pkt_len' _may_ imply enabling
> > > scattered Rx; I assume this is the logic in some PMDs, which looks
> > > acceptable.
> >
> > I don't think so. IMO auto-enabling Rx scatter from the PMD is a
> > breakage of the contract between application and driver.
> > As stated above, the application may simply not be ready to
> > handle multi-segment packets correctly.
> >
> > I think that providing an MTU larger than 'max_rx_pkt_len' is simply a
> > change of max_rx_pkt_len = (MTU plus space for L2+).
>
> As VLAN(s) are not taken into account in the MTU, it means that if the MTU
> is 1500, the max Ethernet len is 1500 + 14 (eth hdr) + 4 (vlan) + 4
> (2nd vlan / qinq) + 4 (crc) = 1526.
>
> Shouldn't we only use L2 lengths instead of MTU? I don't know what
> is usually expected by different hardware (mtu or l2 len).
>
> > > And the PMD behavior should be as follows for the mentioned configs:
> > >
> > > 1) Don't change the user-provided 'max_rx_pkt_len' value.
> >
> > I have no strong opinion. However, it is important to clarify
> > which understanding of max_rx_pkt_len vs MTU is the right one.
> >
> > > 2) If jumbo frame is not enabled, don't limit the size of packets to
> > > the host (I think this is based on the assumption that the mbuf size
> > > will always be > 1514).
> >
> > I think that JUMBO_FRAME is not relevant here. It is just a
> > promise to take a look at max_rx_pkt_len at configure or
> > start stage.
> >
> > > 3) When the user requests to set the MTU bigger than ETH_MAX, the PMD
> > > enables jumbo frame support (if it is not enabled by the user already
> > > and is supported by HW). If HW doesn't support it, of course it should
> > > fail.
> >
> > I'm not sure which ETH_MAX is mentioned above.
> > #define ETH_MAX_MTU     0xFFFFU /* 65535, same as IP_MAX_MTU */
> > or do you mean
> > #define ETH_FRAME_LEN   1514    /* Max. octets in frame sans FCS */
> > or even
> > #define ETH_DATA_LEN    1500    /* Max. octets in payload */
> >
> > We should be careful when we talk about Ethernet lengths and
> > MTU.
> >
> > > 4) When the user requests to set an MTU bigger than 'max_rx_pkt_len':
> >
> > I think the second parameter to consider here is not
> > max_rx_pkt_len, but the amount of space for data in a single
> > mbuf (for all Rx queues).
> >
> > > 4a) if "scattered Rx" is enabled, configure the MTU and limit the
> > > packet size to the host to 'max_rx_pkt_len';
> >
> > Yes and no. IMO configure the MTU and bump max_rx_pkt_len.
> >
> > > 4b) if "scattered Rx" is not enabled but HW supports it, enable
> > > "scattered Rx" in the PMD, configure the MTU and limit the packet size
> > > to the host to 'max_rx_pkt_len';
> >
> > No, I think it is wrong to enable Rx scatter from the PMD.
> >
> > > 4c) if "scattered Rx" is not enabled and not supported by HW, fail the
> > > MTU set;
> >
> > Yes, regardless of support in HW.
> >
> > > 4d) if HW doesn't support limiting the packet size to the host, but the
> > > requested MTU is bigger than 'max_rx_pkt_len', it should fail.
> >
> > I would rephrase it as the impossibility of disabling Rx scatter.
> > If so, it must be the driver's responsibility to drop scattered
> > packets if the Rx scatter offload is not enabled.
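As an aside, the MTU-to-frame-length conversion debated above can be written
down explicitly. A small sketch, using local defines for the usual Ethernet
overheads; which of these headers a given NIC actually counts in its limit
is exactly the open question, so this is one possible convention only:

#include <stdint.h>

#define ETHER_HDR_LEN 14 /* dst MAC + src MAC + ethertype */
#define VLAN_TAG_LEN   4 /* one 802.1Q tag */
#define ETHER_CRC_LEN  4 /* FCS */

/* Derive the maximum L2 frame length from an MTU under this convention. */
static inline uint32_t
mtu_to_max_frame_len(uint32_t mtu, uint32_t nb_vlan_tags, int include_crc)
{
	return mtu + ETHER_HDR_LEN + nb_vlan_tags * VLAN_TAG_LEN +
	       (include_crc ? ETHER_CRC_LEN : 0);
}

/* mtu_to_max_frame_len(1500, 0, 1) == 1518
 * mtu_to_max_frame_len(1500, 1, 1) == 1522
 * mtu_to_max_frame_len(1500, 2, 1) == 1526 (QinQ) */
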
> > > Btw, I am aware that some PMDs have a larger MTU by default and can't
> > > limit the packet size to the host to the 'max_rx_pkt_len' value. I
> > > don't know what to do in that case: fail in configure? Or at least make
> > > sure the configured mempool's mbuf size is big enough?
> >
> > See above.
> >
> > Thanks for the reminder about the topic.
>
> I have the impression that what we want can be done with these 3
> values:
>
> - max_rx_pkt_size: maximum size of a received packet, larger ones are dropped
> - max_rx_data_size: maximum size of data copied into a mbuf chain, larger
>   packets are truncated

Do we really need 'max_rx_data_size'? Can't it just always be equal to
max_rx_pkt_size (as we have it right now)?

> - max_rx_seg_size: maximum size written in a segment (this can be retrieved
>   from pool private info = rte_pktmbuf_data_room_size() - RTE_PKTMBUF_HEADROOM)
>
> I think the first 2 values should be L2 lengths, including CRC if
> CRC-receive is enabled.
>
> if max_rx_data_size <= max_rx_seg_size, scatter is disabled
> if max_rx_data_size < max_rx_pkt_size, packets can be truncated
>
> In case a PMD is not able to limit a packet size, it can be advertised
> by a capability, and it would be up to the application to do it in sw.
>
> Olivier
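
To make the three proposed values concrete, a rough sketch of how an
application could derive them from an existing mempool and decide whether
scatter or truncation come into play. struct rx_limits and decide_rx_mode()
are hypothetical, only rte_pktmbuf_data_room_size() and RTE_PKTMBUF_HEADROOM
are existing API:

#include <rte_mbuf.h>
#include <rte_mempool.h>

struct rx_limits {
	uint32_t max_rx_pkt_size;  /* larger packets are dropped         */
	uint32_t max_rx_data_size; /* larger packets are truncated       */
	uint32_t max_rx_seg_size;  /* room for data in one mbuf segment  */
	int      need_scatter;
	int      may_truncate;
};

static void
decide_rx_mode(struct rte_mempool *pool, uint32_t max_rx_pkt_size,
	       uint32_t max_rx_data_size, struct rx_limits *lim)
{
	lim->max_rx_pkt_size = max_rx_pkt_size;
	lim->max_rx_data_size = max_rx_data_size;

	/* "this can be retrieved from pool private info" */
	lim->max_rx_seg_size = rte_pktmbuf_data_room_size(pool) -
			       RTE_PKTMBUF_HEADROOM;

	/* if max_rx_data_size <= max_rx_seg_size, scatter is disabled */
	lim->need_scatter = lim->max_rx_data_size > lim->max_rx_seg_size;

	/* if max_rx_data_size < max_rx_pkt_size, packets can be truncated */
	lim->may_truncate = lim->max_rx_data_size < lim->max_rx_pkt_size;
}

With max_rx_data_size pinned to max_rx_pkt_size, as suggested above,
may_truncate is always false and the model collapses back to today's
behaviour.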