From: "Dumitrescu, Cristian"
To: "Singh, Jasvinder", "Ananyev, Konstantin", "dev@dpdk.org"
Date: Fri, 2 Apr 2021 21:12:43 +0000
References: <20210316170723.22036-1-konstantin.ananyev@intel.com> <20210318115613.5503-1-konstantin.ananyev@intel.com> <20210318115613.5503-2-konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 2/2] qos: rearrange enqueue procedure

> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Friday, April 2, 2021 3:15 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Cc: Dumitrescu, Cristian
> Subject: RE: [PATCH v2 2/2] qos: rearrange enqueue procedure
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, March 18, 2021 11:56 AM
> > To: dev@dpdk.org
> > Cc: Dumitrescu, Cristian; Singh, Jasvinder; Ananyev, Konstantin
> > Subject: [PATCH v2 2/2] qos: rearrange enqueue procedure
> >
> > In many usage scenarios, input mbufs for rte_sched_port_enqueue() are not
> > yet in the CPU cache(s). That causes quite significant stalls due to memory
> > latency. The current implementation tries to mitigate this using SW
> > pipeline and SW prefetch techniques, but stalls are still present.
> > Rework rte_sched_port_enqueue() to fetch the metadata of all mbufs
> > as the first stage of the function.
> > That helps to minimise load stalls at further stages of enqueue() and
> > improves overall enqueue performance.
> > With examples/qos_sched I observed:
> > on ICX box: up to 30% cycles reduction
> > on CSX and BDX: 15-20% cycles reduction
> > I also ran tests with mbufs already in the cache (one core doing RX, QOS
> > and TX).
> > In that scenario, no performance drop was observed on any of the IA boxes
> > mentioned above.
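The "fetch all metadata first" stage described in the commit message can be sketched in plain C. This is a simplified illustration under stated assumptions, not the patch's code: `struct pkt` and `struct sched_meta` are hypothetical stand-ins for `struct rte_mbuf` and `struct rte_mbuf_sched`, and `enqueue_two_stage()` and `MAX_BURST` are invented names. Only the subport/queue decoding visible in the diff is modelled (`queue_id >> (n_pipes_per_subport_log2 + 4)` plus the matching `subport_qmask`, where the `+ 4` reflects 16 queues per pipe). Stage 1 touches each packet once and copies its metadata into a local array; stage 2 works only on that array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct rte_mbuf_sched and struct rte_mbuf:
 * only the field this sketch needs. These are NOT the real DPDK layouts. */
struct sched_meta {
	uint32_t queue_id;
};

struct pkt {
	uint8_t other_fields[120]; /* filler standing in for the rest of the mbuf */
	struct sched_meta sched;
};

#define MAX_BURST 64 /* hypothetical burst bound; the patch uses VLAs instead */

/* Two-stage processing: gather all metadata first, then decode it.
 * Writes the subport id and per-subport queue index of each packet and
 * returns how many packets were handled. */
uint32_t enqueue_two_stage(struct pkt *const *pkts, uint32_t n_pkts,
			   uint32_t n_pipes_per_subport_log2,
			   uint32_t *subport_ids, uint32_t *qindexes)
{
	struct sched_meta sch[MAX_BURST];
	const uint32_t shift = n_pipes_per_subport_log2 + 4; /* 16 queues per pipe */
	uint32_t qmask, i;

	assert(shift < 32); /* keep the shifts below well defined */
	qmask = (1u << shift) - 1;

	if (n_pkts > MAX_BURST)
		n_pkts = MAX_BURST; /* the sketch handles one bounded burst only */

	/* Stage 1: touch each packet exactly once and copy its metadata.
	 * The loads are independent, so an out-of-order CPU can overlap
	 * their cache misses. */
	for (i = 0; i < n_pkts; i++)
		sch[i] = pkts[i]->sched;

	/* Stage 2: read only the local copies (mirrors sched_port_subport()
	 * and the subport_qmask logic in the diff). */
	for (i = 0; i < n_pkts; i++) {
		subport_ids[i] = sch[i].queue_id >> shift;
		qindexes[i] = sch[i].queue_id & qmask;
	}
	return n_pkts;
}
```

For example, with `n_pipes_per_subport_log2 = 2` the shift is 6 and the mask is 63, so a packet with `queue_id` 130 decodes to subport 2, queue 2. The patch sizes its local arrays with `n_pkts` (variable-length arrays); the sketch uses a fixed bound instead.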
> >
> > Signed-off-by: Konstantin Ananyev
> > ---
> > v2: fix clang and checkpatch complaints
> > ---
> >  lib/librte_sched/rte_sched.c | 219 +++++------------------------------
> >  1 file changed, 31 insertions(+), 188 deletions(-)
> >
> > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > index 7c5688068..41ef147e0 100644
> > --- a/lib/librte_sched/rte_sched.c
> > +++ b/lib/librte_sched/rte_sched.c
> > @@ -1861,24 +1861,23 @@ debug_check_queue_slab(struct rte_sched_subport *subport, uint32_t bmp_pos,
> >  #endif /* RTE_SCHED_DEBUG */
> >
> >  static inline struct rte_sched_subport *
> > -rte_sched_port_subport(struct rte_sched_port *port,
> > -	struct rte_mbuf *pkt)
> > +sched_port_subport(const struct rte_sched_port *port, struct rte_mbuf_sched sch)
> >  {
> > -	uint32_t queue_id = rte_mbuf_sched_queue_get(pkt);
> > +	uint32_t queue_id = sch.queue_id;
> >  	uint32_t subport_id = queue_id >> (port->n_pipes_per_subport_log2 + 4);
> >
> >  	return port->subports[subport_id];
> >  }
> >
> >  static inline uint32_t
> > -rte_sched_port_enqueue_qptrs_prefetch0(struct rte_sched_subport *subport,
> > -	struct rte_mbuf *pkt, uint32_t subport_qmask)
> > +sched_port_enqueue_qptrs_prefetch0(const struct rte_sched_subport *subport,
> > +	struct rte_mbuf_sched sch, uint32_t subport_qmask)
> >  {
> >  	struct rte_sched_queue *q;
> >  #ifdef RTE_SCHED_COLLECT_STATS
> >  	struct rte_sched_queue_extra *qe;
> >  #endif
> > -	uint32_t qindex = rte_mbuf_sched_queue_get(pkt);
> > +	uint32_t qindex = sch.queue_id;
> >  	uint32_t subport_queue_id = subport_qmask & qindex;
> >
> >  	q = subport->queue + subport_queue_id;
> > @@ -1971,197 +1970,41 @@ int
> >  rte_sched_port_enqueue(struct rte_sched_port *port, struct rte_mbuf **pkts,
> >  		       uint32_t n_pkts)
> >  {
> > -	struct rte_mbuf *pkt00, *pkt01, *pkt10, *pkt11, *pkt20, *pkt21,
> > -		*pkt30, *pkt31, *pkt_last;
> > -	struct rte_mbuf **q00_base, **q01_base, **q10_base, **q11_base,
> > -		**q20_base, **q21_base, **q30_base, **q31_base, **q_last_base;
> > -	struct rte_sched_subport *subport00, *subport01, *subport10, *subport11,
> > -		*subport20, *subport21, *subport30, *subport31, *subport_last;
> > -	uint32_t q00, q01, q10, q11, q20, q21, q30, q31, q_last;
> > -	uint32_t r00, r01, r10, r11, r20, r21, r30, r31, r_last;
> > -	uint32_t subport_qmask;
> >  	uint32_t result, i;
> > +	struct rte_mbuf_sched sch[n_pkts];
> > +	struct rte_sched_subport *subports[n_pkts];
> > +	struct rte_mbuf **q_base[n_pkts];
> > +	uint32_t q[n_pkts];
> > +
> > +	const uint32_t subport_qmask =
> > +		(1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
> >
> >  	result = 0;
> > -	subport_qmask = (1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
> >
> > -	/*
> > -	 * Less then 6 input packets available, which is not enough to
> > -	 * feed the pipeline
> > -	 */
> > -	if (unlikely(n_pkts < 6)) {
> > -		struct rte_sched_subport *subports[5];
> > -		struct rte_mbuf **q_base[5];
> > -		uint32_t q[5];
> > -
> > -		/* Prefetch the mbuf structure of each packet */
> > -		for (i = 0; i < n_pkts; i++)
> > -			rte_prefetch0(pkts[i]);
> > -
> > -		/* Prefetch the subport structure for each packet */
> > -		for (i = 0; i < n_pkts; i++)
> > -			subports[i] = rte_sched_port_subport(port, pkts[i]);
> > -
> > -		/* Prefetch the queue structure for each queue */
> > -		for (i = 0; i < n_pkts; i++)
> > -			q[i] = rte_sched_port_enqueue_qptrs_prefetch0(subports[i],
> > -				pkts[i], subport_qmask);
> > -
> > -		/* Prefetch the write pointer location of each queue */
> > -		for (i = 0; i < n_pkts; i++) {
> > -			q_base[i] = rte_sched_subport_pipe_qbase(subports[i], q[i]);
> > -			rte_sched_port_enqueue_qwa_prefetch0(port, subports[i],
> > -				q[i], q_base[i]);
> > -		}
> > +	/* Prefetch the mbuf structure of each packet */
> > +	for (i = 0; i < n_pkts; i++)
> > +		sch[i] = pkts[i]->hash.sched;
> >
>
> Hi Konstantin, thanks for the patch.
> In the above case, all packets are touched straight away without any
> prefetch. If we consider an input burst size of 64 pkts, that means 512
> bytes of packet addresses (8 cache lines), which are likely to be
> available in the cache. For larger bursts, e.g. 128 or 256, there might be
> instances when some addresses are not in the cache, which may stall the
> core. How about adding an explicit prefetch before starting to iterate
> through the packets, if that helps?

Exactly. Konstantin, you might not be a fan of prefetches, but the current
enqueue implementation (as well as the dequeue) uses a prefetch state
machine. Please keep the prefetch state machine in the scalar code. Even if
examples/qos_sched might not show an advantage, it is just a sample app, and
there are more relevant use cases as well.

Thanks,
Cristian