From: "Ananyev, Konstantin"
To: "Dumitrescu, Cristian", "Singh, Jasvinder", "dev@dpdk.org"
Date: Sat, 3 Apr 2021 23:53:58 +0000
References: <20210316170723.22036-1-konstantin.ananyev@intel.com>
 <20210318115613.5503-1-konstantin.ananyev@intel.com>
 <20210318115613.5503-2-konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 2/2] qos: rearrange enqueue procedure
List-Id: DPDK patches and discussions

Hi guys,

> > > In many usage scenarios input mbufs for rte_sched_port_enqueue() are not
> > > yet in the CPU cache(s). That causes quite significant stalls due to memory
> > > latency. Current implementation tries to mitigate it using SW pipeline and SW
> > > prefetch techniques, but stalls are still present.
> > > Rework rte_sched_port_enqueue() to do actual fetch of all mbufs metadata
> > > as a first stage of that function.
> > > That helps to minimise load stalls at further stages of enqueue() and
> > > improves overall enqueue performance.
> > > With examples/qos_sched I observed:
> > > on ICX box: up to 30% cycles reduction
> > > on CSX AND BDX: 20-15% cycles reduction
> > > I also run tests with mbufs already in the cache (one core doing RX, QOS and
> > > TX).
> > > With such scenario, on all mentioned above IA boxes no performance drop
> > > was observed.
> > >
> > > Signed-off-by: Konstantin Ananyev
> > > ---
> > > v2: fix clang and checkpatch complains
> > > ---
> > >  lib/librte_sched/rte_sched.c | 219 +++++------------------------------
> > >  1 file changed, 31 insertions(+), 188 deletions(-)
> > >
> > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > > index 7c5688068..41ef147e0 100644
> > > --- a/lib/librte_sched/rte_sched.c
> > > +++ b/lib/librte_sched/rte_sched.c
> > > @@ -1861,24 +1861,23 @@ debug_check_queue_slab(struct rte_sched_subport *subport, uint32_t bmp_pos,
> > >  #endif /* RTE_SCHED_DEBUG */
> > >
> > >  static inline struct rte_sched_subport *
> > > -rte_sched_port_subport(struct rte_sched_port *port,
> > > -	struct rte_mbuf *pkt)
> > > +sched_port_subport(const struct rte_sched_port *port, struct rte_mbuf_sched sch)
> > >  {
> > > -	uint32_t queue_id = rte_mbuf_sched_queue_get(pkt);
> > > +	uint32_t queue_id = sch.queue_id;
> > >  	uint32_t subport_id = queue_id >> (port->n_pipes_per_subport_log2 + 4);
> > >
> > >  	return port->subports[subport_id];
> > >  }
> > >
> > >  static inline uint32_t
> > > -rte_sched_port_enqueue_qptrs_prefetch0(struct rte_sched_subport *subport,
> > > -	struct rte_mbuf *pkt, uint32_t subport_qmask)
> > > +sched_port_enqueue_qptrs_prefetch0(const struct rte_sched_subport *subport,
> > > +	struct rte_mbuf_sched sch, uint32_t subport_qmask)
> > >  {
> > >  	struct rte_sched_queue *q;
> > >  #ifdef RTE_SCHED_COLLECT_STATS
> > >  	struct rte_sched_queue_extra *qe;
> > >  #endif
> > > -	uint32_t qindex = rte_mbuf_sched_queue_get(pkt);
> > > +	uint32_t qindex = sch.queue_id;
> > >  	uint32_t subport_queue_id = subport_qmask & qindex;
> > >
> > >  	q = subport->queue + subport_queue_id;
> > > @@ -1971,197 +1970,41 @@ int
> > >  rte_sched_port_enqueue(struct rte_sched_port *port, struct rte_mbuf **pkts,
> > >  	uint32_t n_pkts)
> > >  {
> > > -	struct rte_mbuf *pkt00, *pkt01, *pkt10, *pkt11, *pkt20, *pkt21,
> > > -		*pkt30, *pkt31, *pkt_last;
> > > -	struct rte_mbuf **q00_base, **q01_base, **q10_base, **q11_base,
> > > -		**q20_base, **q21_base, **q30_base, **q31_base, **q_last_base;
> > > -	struct rte_sched_subport *subport00, *subport01, *subport10, *subport11,
> > > -		*subport20, *subport21, *subport30, *subport31, *subport_last;
> > > -	uint32_t q00, q01, q10, q11, q20, q21, q30, q31, q_last;
> > > -	uint32_t r00, r01, r10, r11, r20, r21, r30, r31, r_last;
> > > -	uint32_t subport_qmask;
> > >  	uint32_t result, i;
> > > +	struct rte_mbuf_sched sch[n_pkts];
> > > +	struct rte_sched_subport *subports[n_pkts];
> > > +	struct rte_mbuf **q_base[n_pkts];
> > > +	uint32_t q[n_pkts];
> > > +
> > > +	const uint32_t subport_qmask =
> > > +		(1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
> > >
> > >  	result = 0;
> > > -	subport_qmask = (1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
> > >
> > > -	/*
> > > -	 * Less then 6 input packets available, which is not enough to
> > > -	 * feed the pipeline
> > > -	 */
> > > -	if (unlikely(n_pkts < 6)) {
> > > -		struct rte_sched_subport *subports[5];
> > > -		struct rte_mbuf **q_base[5];
> > > -		uint32_t q[5];
> > > -
> > > -		/* Prefetch the mbuf structure of each packet */
> > > -		for (i = 0; i < n_pkts; i++)
> > > -			rte_prefetch0(pkts[i]);
> > > -
> > > -		/* Prefetch the subport structure for each packet */
> > > -		for (i = 0; i < n_pkts; i++)
> > > -			subports[i] = rte_sched_port_subport(port, pkts[i]);
> > > -
> > > -		/* Prefetch the queue structure for each queue */
> > > -		for (i = 0; i < n_pkts; i++)
> > > -			q[i] = rte_sched_port_enqueue_qptrs_prefetch0(subports[i],
> > > -					pkts[i], subport_qmask);
> > > -
> > > -		/* Prefetch the write pointer location of each queue */
> > > -		for (i = 0; i < n_pkts; i++) {
> > > -			q_base[i] = rte_sched_subport_pipe_qbase(subports[i], q[i]);
> > > -			rte_sched_port_enqueue_qwa_prefetch0(port, subports[i],
> > > -				q[i], q_base[i]);
> > > -		}
> > > +	/* Prefetch the mbuf structure of each packet */
> > > +	for (i = 0; i < n_pkts; i++)
> > > +		sch[i] = pkts[i]->hash.sched;
> > >
> >
> > Hi Konstantin, thanks for the patch. In the above case, all packets are touched
> > straight away without any prefetch. If we consider an input burst size of 64 pkts,
> > it means 512 bytes of packet addresses (8 cache-lines), which is likely to be
> > available in cache. For larger burst sizes, e.g. 128 or 256, there might be
> > instances when some addresses are not available in the cache, which may stall
> > the core. How about adding an explicit prefetch before starting to iterate
> > through the packets, if that helps?

I don't think we need any prefetch here.
pkts[] is a sequential array, so the HW prefetcher should be able to do a good job here.
Again, in the majority of use-cases the pkts[] contents will already be present in the cache.
There is a valid concern here, though: n_pkts can be big, and in that case we probably
don't want to store too much on the stack or read too much from pkts[].
It is better to work in fixed-size chunks (64 or so).
I can prepare v2 with these changes, if there is still interest in this patch.

> Exactly. Konstantin, you might not be a fan of prefetches, but the current enqueue
> implementation (as well as the dequeue) uses a prefetch state machine.
> Please keep the prefetch state machine in the scalar code.

It is not about our own preferences.
From my measurements the new version is faster, and it is definitely simpler.

> Even if the examples/qos_sched might not show an advantage,
> this is just a sample app and there are some more relevant use-cases as well.

Well, I hope that examples/qos_sched reflects at least some real-world use-cases
for the QoS library. Otherwise, why do we have it inside the DPDK codebase?
About 'more relevant use-cases': if you know of any, can you try them with the patch?
I would really appreciate that.
In fact, this is a request not only to Cristian but to all other interested parties:
if your app uses librte_sched, please try this patch and provide feedback.
If any test flags a regression, I am absolutely OK with dropping the patch.

Konstantin
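
For illustration only, the fixed-chunk idea from this reply (load the scheduler
metadata of every packet first, at most 64 packets at a time, then run the later
stages on the local copy) could look roughly like the sketch below. It is not
the patch and does not use the real DPDK types: struct pkt, struct pkt_sched,
enqueue_chunk(), enqueue_burst() and CHUNK_SIZE are hypothetical stand-ins, and
the second stage only counts packets per queue instead of doing a real enqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mbuf and its scheduler metadata. */
struct pkt_sched {
	uint32_t queue_id;
};

struct pkt {
	struct pkt_sched sched;
};

/* Assumed chunk size ("64 or so"); bounds the on-stack array. */
#define CHUNK_SIZE 64

/*
 * Process one chunk of at most CHUNK_SIZE packets.
 * Stage 1 performs the actual loads of the per-packet metadata into a
 * small on-stack array; stage 2 works only on that local copy.
 */
static uint32_t
enqueue_chunk(struct pkt **pkts, uint32_t n, uint32_t qmask,
	uint32_t *queue_hits)
{
	struct pkt_sched sch[CHUNK_SIZE];
	uint32_t i;

	assert(n <= CHUNK_SIZE);

	/* Stage 1: plain loads, no software prefetch. */
	for (i = 0; i < n; i++)
		sch[i] = pkts[i]->sched;

	/* Stage 2: placeholder for the later enqueue stages. */
	for (i = 0; i < n; i++)
		queue_hits[sch[i].queue_id & qmask]++;

	return n;
}

/* Split an arbitrarily large burst into fixed-size chunks. */
static uint32_t
enqueue_burst(struct pkt **pkts, uint32_t n_pkts, uint32_t qmask,
	uint32_t *queue_hits)
{
	uint32_t done = 0;

	while (done < n_pkts) {
		uint32_t n = n_pkts - done;

		if (n > CHUNK_SIZE)
			n = CHUNK_SIZE;
		done += enqueue_chunk(pkts + done, n, qmask, queue_hits);
	}

	return done;
}
```

With this shape the stack usage no longer depends on n_pkts, and a burst of 128
or 256 packets is simply handled as two or four passes over the same two stages.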