From: "Ananyev, Konstantin"
To: Olivier Matz, Morten Brørup
CC: Slava Ovsiienko, NBU-Contact-Thomas Monjalon, dev@dpdk.org,
 techboard@dpdk.org, Ajit Khaparde, Andrew Rybchenko, "Yigit, Ferruh",
 david.marchand@redhat.com, "Richardson, Bruce", jerinj@marvell.com,
 honnappa.nagarahalli@arm.com, maxime.coquelin@redhat.com,
 stephen@networkplumber.org, hemant.agrawal@nxp.com, Matan Azrad,
 Shahaf Shuler
Subject: Re: [dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter
 first half
Date: Thu, 5 Nov 2020 00:25:33 +0000
In-Reply-To: <20201104150053.GI1898@platinum>

> Hi,
>
> On Tue, Nov 03, 2020 at 04:03:46PM +0100, Morten Brørup wrote:
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > > Sent: Tuesday, November 3, 2020 3:03 PM
> > >
> > > Hi, Morten
> > >
> > > > From: Morten Brørup
> > > > Sent: Tuesday, November 3, 2020 14:10
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Monday, November 2, 2020 4:58 PM
> > > > >
> > > > > +Cc techboard
> > > > >
> > > > > We need benchmark numbers in order to take a decision.
> > > > > Please all, prepare some arguments and numbers so we can discuss
> > > > > the mbuf layout in the next techboard meeting.
>
> I did some quick tests, and it appears to me that just moving the pool
> pointer to the first cache line does not have a significant impact.

Hmm, as I remember, Thomas mentioned about a 5%+ improvement with that
change, though I suppose a lot depends on the actual test case.
It would be good to know when it helps and when it doesn't.

> However, I agree with Morten that there is some room for optimization
> around m->pool: I did a hack in the ixgbe driver to assume there is
> only one mbuf pool. This simplifies the freeing of mbufs in Tx a lot,
> because we don't have to group them into bulks that share the same
> pool (see ixgbe_tx_free_bufs()). The impact of this hack is quite
> good: ~+5% on a real-life forwarding use case.

I think we already have such an optimization ability within DPDK:

#define DEV_TX_OFFLOAD_MBUF_FAST_FREE 0x00010000
/**< Device supports optimization for fast release of mbufs.
 *   When set, the application must guarantee that per-queue all mbufs
 *   come from the same mempool and have refcnt = 1.
 */

Seems over-optimistic to me, but many PMDs do support it.
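Just to make the difference concrete, below is a minimal sketch of the
two Tx-completion paths. The helper names are mine and purely
illustrative - this is not the actual ixgbe code.

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Fast-free path: with DEV_TX_OFFLOAD_MBUF_FAST_FREE the application
 * guarantees that all mbufs on this queue come from one mempool and
 * have refcnt == 1, so the whole batch can be returned with a single
 * bulk put, reading m->pool only once. */
static void
tx_free_bufs_fast(struct rte_mbuf **txep, uint16_t n)
{
	rte_mempool_put_bulk(txep[0]->pool, (void **)txep, n);
}

/* Generic path for comparison: every mbuf has to be checked
 * individually (refcnt, indirect/external attachment) and freed to
 * its own pool, touching m->pool in the second cache line for each
 * packet. */
static void
tx_free_bufs_generic(struct rte_mbuf **txep, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(txep[i]);

		if (m != NULL)
			rte_mempool_put(m->pool, m);
	}
}

The fast path reads m->pool once per batch instead of once per mbuf,
which is where most of the gain comes from.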

> It may also be possible to store the pool in the sw ring to avoid a
> later access to m->pool. Having a pool index as suggested by Morten
> would also help to reduce the room used in the sw ring in this case.
> But this is a bit off-topic :)
>
> > > > I propose that the techboard considers this from two angles:
> > > >
> > > > 1. Long term goals and their relative priority. I.e. what can be
> > > > achieved with wide-ranging modifications, requiring yet another
> > > > ABI break and due notices.
> > > >
> > > > 2. Short term goals, i.e. what can be achieved for this release.
> > > >
> > > > My suggestions follow...
> > > >
> > > > 1. Regarding long term goals:
> > > >
> > > > I have argued that simple forwarding of non-segmented packets
> > > > using only the first mbuf cache line can be achieved by making
> > > > three modifications:
> > > >
> > > > a) Move m->tx_offload to the first cache line.
> > >
> > > Not all PMDs use this field on Tx. HW might support the checksum
> > > offloads directly, not requiring these fields at all.
>
> To me, a driver should use m->tx_offload, because the application
> specifies the offset where the checksum has to be done, in case the
> hw is not able to recognize the protocol.
>
> > > > b) Use an 8 bit pktmbuf mempool index in the first cache line,
> > > > instead of the 64 bit m->pool pointer in the second cache line.
> > >
> > > 256 mempools look enough to me. Regarding the indirect access to
> > > the pool (via some table) - it might introduce some performance
> > > impact.
> >
> > It might, but I hope that it is negligible, so the benefits
> > outweigh the disadvantages.
> >
> > It would have to be measured, though.
> >
> > And m->pool is only used for free()'ing (and detach()'ing) mbufs.
> >
> > > For example, the mlx5 PMD strongly relies on the pool field for
> > > allocating mbufs in the Rx datapath. We're going to update it
> > > (o-o, we found a point to optimize), but for now it does.
> >
> > Without looking at the source code, I don't think the PMD is using
> > m->pool in the RX datapath; I think it is using a pool dedicated to
> > a receive queue, used for RX descriptors in the PMD (i.e.
> > driver->queue->pool).
> >
> > > > c) Do not access m->next when we know that it is NULL.
> > > > We can use m->nb_segs == 1 or some other invariant as the gate.
> > > > It can be implemented by adding an m->next accessor function:
> > > >
> > > > struct rte_mbuf * rte_mbuf_next(struct rte_mbuf * m)
> > > > {
> > > >     return m->nb_segs == 1 ? NULL : m->next;
> > > > }
> > >
> > > Sorry, not sure about this. IIRC, nb_segs is valid in the first
> > > segment/mbuf only. If we have 4 segments in the pkt, we see
> > > nb_segs=4 in the first one and nb_segs=1 in the others. The next
> > > field is NULL in the last mbuf only. Am I wrong or missing
> > > something?
> >
> > You are correct.
> >
> > This would have to be updated too. Either by increasing m->nb_segs
> > in the following segments, or by splitting the relevant functions
> > into functions for working on first segments (incl. non-segmented
> > packets) and functions for working on the following segments of
> > segmented packets.
>
> Instead of maintaining a valid nb_segs, a HAS_NEXT flag would be
> easier to implement. However, it means that an accessor needs to be
> used instead of any direct m->next access.
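If I understand the idea correctly, such an accessor could look like
the sketch below. Note that RTE_MBUF_F_HAS_NEXT is hypothetical - no
such flag exists today, and a free ol_flags bit would have to be
reserved for it.

#include <rte_mbuf.h>

/* Hypothetical flag, set on any segment that has a successor.
 * The bit chosen here is illustrative only. */
#define RTE_MBUF_F_HAS_NEXT (1ULL << 40)

/* Gate the m->next access on a first-cache-line field (ol_flags),
 * so that single-segment packets never touch the second cache line. */
static inline struct rte_mbuf *
mbuf_next(const struct rte_mbuf *m)
{
	if ((m->ol_flags & RTE_MBUF_F_HAS_NEXT) == 0)
		return NULL;
	return m->next;
}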

> > > > Regarding the priority of this goal, I guess that simple
> > > > forwarding of non-segmented packets is probably the path taken
> > > > by the majority of packets handled by DPDK.
> > > >
> > > > An alternative goal could be:
> > > > Do not touch the second cache line during RX.
> > > > A comment in the mbuf structure says so, but it is not true
> > > > anymore.
> > > >
> > > > (I guess that regression testing didn't catch this because the
> > > > tests perform TX immediately after RX, so the cache miss just
> > > > moves from the TX to the RX part of the test application.)
> > > >
> > > > 2. Regarding short term goals:
> > > >
> > > > The current DPDK source code looks to me like m->next is the
> > > > most frequently accessed field in the second cache line, so it
> > > > makes sense moving this to the first cache line, rather than
> > > > m->pool. Benchmarking may help here.
> > >
> > > Moreover, for segmented packets the packet size is supposed to be
> > > large, which imposes a relatively low packet rate, so the benefit
> > > of moving next to the 1st cache line might be negligible
> > > altogether. Just compare 148 Mpps of 64B pkts with 4 Mpps of
> > > 3000B pkts over a 100Gbps link. We are currently benchmarking and
> > > have not yet found a difference. The benefit can't be expressed
> > > as an Mpps delta; we should measure CPU clocks, but the Rx queue
> > > is almost always empty - we have empty loops. So, if there is a
> > > boost, it is extremely hard to catch.
> >
> > Very good point regarding the value of such an optimization, Slava!
> >
> > And when free()'ing packets, both m->next and m->pool are touched.
> >
> > So perhaps the free()/detach() functions in the mbuf library can be
> > modified to handle first segments (and non-segmented packets) and
> > following segments differently, so accessing m->next can be avoided
> > for non-segmented packets. Then m->pool should be moved to the
> > first cache line.
>
> I also think that moving m->pool without doing something else about
> m->next is probably useless. And it's too late for 20.11 to do
> additional changes, so I suggest postponing the field move to 21.11,
> once we have a clearer view of possible optimizations.
>
> Olivier