From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>, Morten Brørup
 <mb@smartsharesystems.com>
CC: Slava Ovsiienko <viacheslavo@nvidia.com>, NBU-Contact-Thomas Monjalon
 <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>, "techboard@dpdk.org"
 <techboard@dpdk.org>, Ajit Khaparde <ajit.khaparde@broadcom.com>, "Andrew
 Rybchenko" <andrew.rybchenko@oktetlabs.ru>, "Yigit, Ferruh"
 <ferruh.yigit@intel.com>, "david.marchand@redhat.com"
 <david.marchand@redhat.com>, "Richardson, Bruce"
 <bruce.richardson@intel.com>, "jerinj@marvell.com" <jerinj@marvell.com>,
 "honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
 "maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>, Matan Azrad
 <matan@nvidia.com>, Shahaf Shuler <shahafs@nvidia.com>
Thread-Topic: [dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter
 first half
Date: Thu, 5 Nov 2020 00:25:33 +0000
Message-ID: <BYAPR11MB3301E6C03E11C4508064483C9AEE0@BYAPR11MB3301.namprd11.prod.outlook.com>
References: <20201029092751.3837177-1-thomas@monjalon.net>
 <3086227.yllCKDRCEA@thomas>
 <98CBD80474FA8B44BF855DF32C47DC35C613CD@smartserver.smartshare.dk>
 <13044489.RHGIMAnax8@thomas>
 <98CBD80474FA8B44BF855DF32C47DC35C613DB@smartserver.smartshare.dk>
 <MWHPR12MB150109404B0B68CD9D70B4A6DF110@MWHPR12MB1501.namprd12.prod.outlook.com>
 <98CBD80474FA8B44BF855DF32C47DC35C613DF@smartserver.smartshare.dk>
 <20201104150053.GI1898@platinum>
In-Reply-To: <20201104150053.GI1898@platinum>
Accept-Language: en-GB, en-US
Content-Language: en-US
Subject: Re: [dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter
 first half



>
> Hi,
>
> On Tue, Nov 03, 2020 at 04:03:46PM +0100, Morten Brørup wrote:
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > > Sent: Tuesday, November 3, 2020 3:03 PM
> > >
> > > Hi, Morten
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: Tuesday, November 3, 2020 14:10
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Monday, November 2, 2020 4:58 PM
> > > > >
> > > > > +Cc techboard
> > > > >
> > > > > We need benchmark numbers in order to take a decision.
> > > > > Please all, prepare some arguments and numbers so we can discuss
> > > > > the mbuf layout in the next techboard meeting.
>
> I did some quick tests, and it appears to me that just moving the pool
> pointer to the first cache line does not have a significant impact.

Hmm, as I remember, Thomas mentioned about a 5%+ improvement
with that change. Though I suppose a lot depends on the actual test case.
Would be good to know when it does help and when it doesn't.

>
> However, I agree with Morten that there is some room for optimization
> around m->pool: I did a hack in the ixgbe driver to assume there is only
> one mbuf pool. This greatly simplifies the freeing of mbufs in Tx, because
> we don't have to group them into bulks that share the same pool (see
> ixgbe_tx_free_bufs()). The impact of this hack is quite good: +~5% on a
> real-life forwarding use case.

I think we already have such an optimization capability within DPDK:
#define DEV_TX_OFFLOAD_MBUF_FAST_FREE   0x00010000
/**< Device supports optimization for fast release of mbufs.
 *   When set application must guarantee that per-queue all mbufs comes from
 *   the same mempool and has refcnt = 1.
 */

Seems over-optimistic to me, but many PMDs do support it.
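
For context, here is a minimal sketch (not from the original mail) of how an
application could request this offload at configuration time, assuming
per-queue single-mempool usage and refcnt == 1; the function name and
port/queue counts are illustrative:

#include <rte_ethdev.h>

static int
setup_port_with_fast_free(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_conf conf = { 0 };
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	/* Only request the offload if the PMD advertises support for it. */
	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
		conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;

	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}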

>
> It may be possible to store the pool in the sw ring to avoid a later
> access to m->pool. Having a pool index as suggested by Morten would also
> help reduce the room used in the sw ring in this case. But this is a bit
> off-topic :)
>
>
>
> > > > I propose that the techboard consider this from two angles:
> > > >
> > > > 1. Long term goals and their relative priority. I.e. what can be
> > > > achieved with wide-ranging modifications, requiring yet another ABI
> > > > break and due notices.
> > > >
> > > > 2. Short term goals, i.e. what can be achieved for this release.
> > > >
> > > >
> > > > My suggestions follow...
> > > >
> > > > 1. Regarding long term goals:
> > > >
> > > > I have argued that simple forwarding of non-segmented packets using
> > > > only the first mbuf cache line can be achieved by making three
> > > > modifications:
> > > >
> > > > a) Move m->tx_offload to the first cache line.
> > > Not all PMDs use this field on Tx. HW might support the checksum
> > > offloads directly, not requiring these fields at all.
>
> To me, a driver should use m->tx_offload, because the application
> specifies the offset where the checksum has to be done, in case the hw
> is not able to recognize the protocol.
>
> > > > b) Use an 8 bit pktmbuf mempool index in the first cache line,
> > > >    instead of the 64 bit m->pool pointer in the second cache line.
> > > 256 mempools look like enough to me. Regarding the indirect access to
> > > the pool (via some table) - it might introduce some performance impact.
> >
> > It might, but I hope that it is negligible, so the benefits outweigh the
> > disadvantages.
> >
> > It would have to be measured, though.
> >
> > And m->pool is only used for free()'ing (and detach()'ing) mbufs.
> >
> > > For example,
> > > the mlx5 PMD strongly relies on the pool field for allocating mbufs in
> > > the Rx datapath.
> > > We're going to update (o-o, we found a point to optimize), but for now
> > > it does.
> >
> > Without looking at the source code, I don't think the PMD is using
> > m->pool in the RX datapath; I think it is using a pool dedicated to a
> > receive queue, used for RX descriptors in the PMD (i.e.
> > driver->queue->pool).
> >
> > >
> > > > c) Do not access m->next when we know that it is NULL.
> > > >    We can use m->nb_segs == 1 or some other invariant as the gate.
> > > >    It can be implemented by adding an m->next accessor function:
> > > >    struct rte_mbuf * rte_mbuf_next(struct rte_mbuf * m)
> > > >    {
> > > >        return m->nb_segs == 1 ? NULL : m->next;
> > > >    }
> > >
> > > Sorry, not sure about this. IIRC, nb_segs is valid in the first
> > > segment/mbuf only.
> > > If we have 4 segments in the pkt we see nb_seg=4 in the first one,
> > > and nb_seg=1 in the others. The next field is NULL in the last mbuf
> > > only. Am I wrong and missing something?
> >
> > You are correct.
> >
> > This would have to be updated too. Either by increasing m->nb_seg in
> > the following segments, or by splitting up relevant functions into
> > functions for working on first segments (incl. non-segmented packets),
> > and functions for working on following segments of segmented packets.
>
> Instead of maintaining a valid nb_segs, a HAS_NEXT flag would be easier
> to implement. However, it means that an accessor needs to be used instead
> of any m->next access.
>
> > > > Regarding the priority of this goal, I guess that simple forwarding
> > > > of non-segmented packets is probably the path taken by the majority
> > > > of packets handled by DPDK.
> > > >
> > > > An alternative goal could be:
> > > > Do not touch the second cache line during RX.
> > > > A comment in the mbuf structure says so, but it is not true anymore.
> > > >
> > > > (I guess that regression testing didn't catch this because the tests
> > > > perform TX immediately after RX, so the cache miss just moves from
> > > > the TX to the RX part of the test application.)
> > > >
> > > >
> > > > 2. Regarding short term goals:
> > > >
> > > > The current DPDK source code looks to me like m->next is the most
> > > > frequently accessed field in the second cache line, so it makes
> > > > sense moving this to the first cache line, rather than m->pool.
> > > > Benchmarking may help here.
> > >
> > > Moreover, for segmented packets the packet size is supposed to be
> > > large, which implies a relatively low packet rate, so the optimization
> > > of moving next to the 1st cache line might well be negligible. Just
> > > compare 148Mpps of 64B pkts and 4Mpps of 3000B pkts over a 100Gbps
> > > link. We are currently benchmarking and have not yet succeeded in
> > > finding a difference. The benefit can't be expressed in mpps delta; we
> > > should measure CPU clocks, but the Rx queue is almost always empty -
> > > we have empty loops. So, if we have a boost - it is extremely hard to
> > > catch.
> >
> > Very good point regarding the value of such an optimization, Slava!
> >
> > And when free()'ing packets, both m->next and m->pool are touched.
> >
> > So perhaps the free()/detach() functions in the mbuf library can be
> > modified to handle first segments (and non-segmented packets) and
> > following segments differently, so accessing m->next can be avoided for
> > non-segmented packets. Then m->pool should be moved to the first cache
> > line.
> >
>
> I also think that moving m->pool without doing something else about
> m->next is probably useless. And it's too late for 20.11 to do
> additional changes, so I suggest postponing the field move to 21.11,
> once we have a clearer view of possible optimizations.
>
> Olivier
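
For reference, a rough sketch of the "8 bit pool index" idea discussed above:
a small global table maps an index kept in the mbuf's first cache line back
to the mempool pointer, which is the indirect access Slava is concerned
about. The table and helper names are hypothetical; nothing like this exists
in DPDK today.

#include <stdint.h>
#include <rte_mempool.h>

#define MBUF_POOL_TBL_SIZE 256	/* an 8-bit index allows at most 256 pools */

/* Hypothetical global table, filled in when each pktmbuf pool is created. */
static struct rte_mempool *mbuf_pool_tbl[MBUF_POOL_TBL_SIZE];

/* Resolve a pool index (stored in the first cache line) back to the
 * mempool, e.g. when freeing or detaching an mbuf. */
static inline struct rte_mempool *
mbuf_pool_from_index(uint8_t pool_idx)
{
	return mbuf_pool_tbl[pool_idx];
}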