From: "Singh, Jasvinder"
To: "Ananyev, Konstantin", "dev@dpdk.org"
CC: "Dumitrescu, Cristian"
Date: Fri, 2 Apr 2021 14:14:34 +0000
References: <20210316170723.22036-1-konstantin.ananyev@intel.com> <20210318115613.5503-1-konstantin.ananyev@intel.com> <20210318115613.5503-2-konstantin.ananyev@intel.com>
In-Reply-To: <20210318115613.5503-2-konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 2/2] qos: rearrange enqueue procedure

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, March 18, 2021 11:56 AM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian; Singh, Jasvinder; Ananyev, Konstantin
> Subject: [PATCH v2 2/2] qos: rearrange enqueue procedure
>
> In many usage scenarios input mbufs for rte_sched_port_enqueue() are not
> yet in the CPU cache(s). That causes quite significant stalls due to memory
> latency. The current implementation tries to mitigate it using SW pipeline
> and SW prefetch techniques, but stalls are still present.
> Rework rte_sched_port_enqueue() to do the actual fetch of all mbufs' metadata
> as the first stage of that function.
> That helps to minimise load stalls at further stages of enqueue() and
> improves overall enqueue performance.
> With examples/qos_sched I observed:
> on ICX box: up to 30% cycles reduction
> on CSX and BDX: 15-20% cycles reduction
> I also ran tests with mbufs already in the cache (one core doing RX, QoS and
> TX).
> With such a scenario, no performance drop was observed on any of the IA
> boxes mentioned above.
>
> Signed-off-by: Konstantin Ananyev
> ---
> v2: fix clang and checkpatch complaints
> ---
>  lib/librte_sched/rte_sched.c | 219 +++++------------------------------
>  1 file changed, 31 insertions(+), 188 deletions(-)
>
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 7c5688068..41ef147e0 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -1861,24 +1861,23 @@ debug_check_queue_slab(struct rte_sched_subport *subport, uint32_t bmp_pos,
>  #endif /* RTE_SCHED_DEBUG */
>
>  static inline struct rte_sched_subport *
> -rte_sched_port_subport(struct rte_sched_port *port,
> -        struct rte_mbuf *pkt)
> +sched_port_subport(const struct rte_sched_port *port, struct rte_mbuf_sched sch)
>  {
> -    uint32_t queue_id = rte_mbuf_sched_queue_get(pkt);
> +    uint32_t queue_id = sch.queue_id;
>      uint32_t subport_id = queue_id >> (port->n_pipes_per_subport_log2 + 4);
>
>      return port->subports[subport_id];
>  }
>
>  static inline uint32_t
> -rte_sched_port_enqueue_qptrs_prefetch0(struct rte_sched_subport *subport,
> -        struct rte_mbuf *pkt, uint32_t subport_qmask)
> +sched_port_enqueue_qptrs_prefetch0(const struct rte_sched_subport *subport,
> +        struct rte_mbuf_sched sch, uint32_t subport_qmask)
>  {
>      struct rte_sched_queue *q;
>  #ifdef RTE_SCHED_COLLECT_STATS
>      struct rte_sched_queue_extra *qe;
>  #endif
> -    uint32_t qindex = rte_mbuf_sched_queue_get(pkt);
> +    uint32_t qindex = sch.queue_id;
>      uint32_t subport_queue_id = subport_qmask & qindex;
>
>      q = subport->queue + subport_queue_id;
> @@ -1971,197 +1970,41 @@ int
>  rte_sched_port_enqueue(struct rte_sched_port *port, struct rte_mbuf **pkts,
>                 uint32_t n_pkts)
>  {
> -    struct rte_mbuf *pkt00, *pkt01, *pkt10, *pkt11, *pkt20, *pkt21,
> -        *pkt30, *pkt31, *pkt_last;
> -    struct rte_mbuf **q00_base, **q01_base, **q10_base, **q11_base,
> -        **q20_base, **q21_base, **q30_base, **q31_base, **q_last_base;
> -    struct rte_sched_subport *subport00, *subport01, *subport10, *subport11,
> -        *subport20, *subport21, *subport30, *subport31, *subport_last;
> -    uint32_t q00, q01, q10, q11, q20, q21, q30, q31, q_last;
> -    uint32_t r00, r01, r10, r11, r20, r21, r30, r31, r_last;
> -    uint32_t subport_qmask;
>      uint32_t result, i;
> +    struct rte_mbuf_sched sch[n_pkts];
> +    struct rte_sched_subport *subports[n_pkts];
> +    struct rte_mbuf **q_base[n_pkts];
> +    uint32_t q[n_pkts];
> +
> +    const uint32_t subport_qmask =
> +        (1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
>
>      result = 0;
> -    subport_qmask = (1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
>
> -    /*
> -     * Less then 6 input packets available, which is not enough to
> -     * feed the pipeline
> -     */
> -    if (unlikely(n_pkts < 6)) {
> -        struct rte_sched_subport *subports[5];
> -        struct rte_mbuf **q_base[5];
> -        uint32_t q[5];
> -
> -        /* Prefetch the mbuf structure of each packet */
> -        for (i = 0; i < n_pkts; i++)
> -            rte_prefetch0(pkts[i]);
> -
> -        /* Prefetch the subport structure for each packet */
> -        for (i = 0; i < n_pkts; i++)
> -            subports[i] = rte_sched_port_subport(port, pkts[i]);
> -
> -        /* Prefetch the queue structure for each queue */
> -        for (i = 0; i < n_pkts; i++)
> -            q[i] = rte_sched_port_enqueue_qptrs_prefetch0(subports[i],
> -                    pkts[i], subport_qmask);
> -
> -        /* Prefetch the write pointer location of each queue */
> -        for (i = 0; i < n_pkts; i++) {
> -            q_base[i] = rte_sched_subport_pipe_qbase(subports[i], q[i]);
> -            rte_sched_port_enqueue_qwa_prefetch0(port, subports[i],
> -                q[i], q_base[i]);
> -        }
> +    /* Prefetch the mbuf structure of each packet */
> +    for (i = 0; i < n_pkts; i++)
> +        sch[i] = pkts[i]->hash.sched;
>

Hi Konstantin, thanks for the patch.

In the above case, all packets are touched straight away without any
prefetch. If we consider an input burst size of 64 packets, that means
512 bytes of packet addresses (8 cache lines), which are likely to be
available in the cache. For larger bursts, e.g. 128 or 256 packets, there
might be instances when some addresses are not in the cache, which may
stall the core. How about adding an explicit prefetch before starting to
iterate through the packets, if that helps?
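For illustration only, here is a standalone sketch of what such an explicit prefetch might look like in front of the metadata-gather loop. It is not code from the patch or from DPDK: struct mbuf, struct mbuf_sched, gather_sched_meta(), CACHE_LINE_SIZE and PREFETCH_AHEAD are hypothetical stand-ins, a 64-byte cache line is assumed, and the "+ 4" shift is assumed to come from 16 queues per pipe. It uses the GCC/Clang __builtin_prefetch(); inside librte_sched, rte_prefetch0() would be the natural equivalent. Stage 0 prefetches the cache lines of the pkts[] pointer array itself (the concern for large bursts), stage 2 prefetches mbuf headers a fixed distance ahead while copying the sched metadata, and stage 3 repeats the patch's shift/mask split of queue_id.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct rte_mbuf_sched and struct rte_mbuf.
 * Only the fields used below are modelled; real DPDK layouts differ. */
struct mbuf_sched {
    uint32_t queue_id;
    uint8_t traffic_class;
    uint8_t color;
    uint16_t reserved;
};

struct mbuf {
    uint8_t other_fields[64];   /* placeholder for the rest of the header */
    struct mbuf_sched sched;    /* plays the role of pkts[i]->hash.sched */
};

#define CACHE_LINE_SIZE 64      /* assumed cache-line size in bytes */
#define QUEUES_PER_PIPE_LOG2 4  /* assumed 16 queues per pipe: the "+ 4" */
#define PREFETCH_AHEAD 8        /* look-ahead distance; a guess, needs tuning */

/* Hypothetical helper: gather per-packet sched metadata with explicit
 * prefetches, then split queue_id into subport id and subport-local queue. */
static void
gather_sched_meta(struct mbuf *const *pkts, uint32_t n_pkts,
                  uint32_t n_pipes_per_subport_log2,
                  struct mbuf_sched *sch, uint32_t *subport_id,
                  uint32_t *subport_qidx)
{
    const uint32_t shift = n_pipes_per_subport_log2 + QUEUES_PER_PIPE_LOG2;
    const uint32_t ptrs_per_line = CACHE_LINE_SIZE / sizeof(pkts[0]);
    uint32_t subport_qmask;
    uint32_t i;

    assert(shift < 32);         /* keep the mask shift well defined */
    subport_qmask = (1u << shift) - 1;

    if (n_pkts == 0)
        return;

    /* Stage 0: prefetch each cache line of the pkts[] pointer array,
     * plus the last element in case the array is not line-aligned. */
    for (i = 0; i < n_pkts; i += ptrs_per_line)
        __builtin_prefetch(&pkts[i], 0, 3);
    __builtin_prefetch(&pkts[n_pkts - 1], 0, 3);

    /* Stage 1: prefetch the headers of the first few packets. */
    for (i = 0; i < n_pkts && i < PREFETCH_AHEAD; i++)
        __builtin_prefetch(&pkts[i]->sched, 0, 3);

    /* Stage 2: copy sched metadata, prefetching PREFETCH_AHEAD packets ahead. */
    for (i = 0; i < n_pkts; i++) {
        if (i + PREFETCH_AHEAD < n_pkts)
            __builtin_prefetch(&pkts[i + PREFETCH_AHEAD]->sched, 0, 3);
        sch[i] = pkts[i]->sched;
    }

    /* Stage 3: same shift/mask split as sched_port_subport() and
     * sched_port_enqueue_qptrs_prefetch0() in the patch. */
    for (i = 0; i < n_pkts; i++) {
        subport_id[i] = sch[i].queue_id >> shift;
        subport_qidx[i] = sch[i].queue_id & subport_qmask;
    }
}
```

Whether the extra stage 0 prefetch pays off for 64-packet bursts, where the pointer array is likely already cached, would need to be measured with examples/qos_sched; PREFETCH_AHEAD is a tuning knob rather than a known-good value.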