From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id D9F5AA0521; Thu, 23 Jul 2020 18:53:26 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B10221BFBA; Thu, 23 Jul 2020 18:53:25 +0200 (CEST) Received: from EUR02-HE1-obe.outbound.protection.outlook.com (mail-eopbgr10085.outbound.protection.outlook.com [40.107.1.85]) by dpdk.org (Postfix) with ESMTP id CF2EB1BF94 for ; Thu, 23 Jul 2020 18:53:24 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OR8ZXfsfU116ekY0tiDrtIrFr/KTzVnZrdNh/SQLQOxZUriL0g0mrf91eydaYxNJJ1XRBsImepMkRNCrvpdBMO9DgYW7vwqgP1bugq3chqlGWWhy+DXozV5tScIff9kSe4+TUNYIoPbFamB9zKAYWczmP51CBOikh0tTy4WCts6AzCpylsBBJamCptVJcY98ikqR17ZxWw47vwtyCcN5lo1CvhBiCKZ2j/P1W1sBM45HvkpmhJs3QO2OHpzkXO9NdpAkjhXWCpij30cZWuS5z3iTWIb1pI/xNT1u8KRHrC6a6A9io9OcKLWunbdoggBChLvVrdQgqJrFGvX6McefIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=uRqp7O/dX6BkqdxGKQakcIvs1DfFTMW2WpnQVq/kJdg=; b=BRYfP5n/Ae0H1tYryMnuIjb802CTwwh1r/k4OIc/n11Gi1CiNxuiL3h/05pDBORWBnYkf6OVh65oqwIg3rH0DCDLalzfBMsPJfyjkUSY1WBnlg/MW48pu7suWlFn4OY1R5ktfg3LZuDxKxDZeW/S8X5rV4zMihbyoWNSNySWk6EL8oPaed59dl7y1nWUbRit/Ke1sKEy18HeRk7YH20kKpQHXHvbsOz98amHzyXRKo9cftx6FTlvATpfI8I5+k/ctZYjFQk5GeG7QTYbFjmxGJLj1bUezXFLEZrxFtkXBktX+/a0fRrTGFs9OGV2+e8COwI5DMg1N5Z3yacBqBWjJQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=mellanox.com; dmarc=pass action=none header.from=mellanox.com; dkim=pass header.d=mellanox.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=uRqp7O/dX6BkqdxGKQakcIvs1DfFTMW2WpnQVq/kJdg=; b=TuBjRNT2jRtiSWIEFA7XPrPqDY+RmLgBip7bzIo51Wce98hCeMzy9dwcCg2FuLzPwtj9uWkRtmDQ7HkRxHdJMxuDhimp0T+JpxdkUTsTFY5msQ3hBE2aNQfzfr2NSPdC4bev7LrEMZg6RFR2gD0quPW7nL+RALNqc5LyxF5MlVc= Received: from AM0PR05MB4561.eurprd05.prod.outlook.com (2603:10a6:208:ad::20) by AM4PR0501MB2308.eurprd05.prod.outlook.com (2603:10a6:200:46::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3195.18; Thu, 23 Jul 2020 16:53:22 +0000 Received: from AM0PR05MB4561.eurprd05.prod.outlook.com ([fe80::456:89d0:7f5c:c6c3]) by AM0PR05MB4561.eurprd05.prod.outlook.com ([fe80::456:89d0:7f5c:c6c3%4]) with mapi id 15.20.3216.021; Thu, 23 Jul 2020 16:53:22 +0000 From: Alexander Kozyrev To: Phil Yang , Honnappa Nagarahalli , Matan Azrad , Shahaf Shuler , Slava Ovsiienko CC: "drc@linux.vnet.ibm.com" , nd , "dev@dpdk.org" , nd , nd Thread-Topic: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt Thread-Index: AQHWWMITvRJ6cPOqEEiXXa7L02rE6qkRJjhAgAOA/wCAABdygIAAsZng Date: Thu, 23 Jul 2020 16:53:22 +0000 Message-ID: References: <20200410164127.54229-7-gavin.hu@arm.com> <1592900807-13289-1-git-send-email-phil.yang@arm.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=mellanox.com; x-originating-ip: [2607:fea8:e380:d8e0:74b7:98e0:58ad:1fb6] x-ms-publictraffictype: Email x-ms-office365-filtering-ht: Tenant x-ms-office365-filtering-correlation-id: 08533efe-3f5f-4962-880d-08d82f28e8b1 x-ms-traffictypediagnostic: AM4PR0501MB2308: x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: Y6s7RLCuGZg1yCLugxXHVUE76Z23sXWy8LHeaZp+qj7Kq9aKfbn/ucfRe4iQqU2oj0I2ejq5XIYluSGbX4K5/nVl1Syfdo1NdGgH/HNJcMm735ZH5Jka9jXwLpkqWdBDfgnRspMjx/q2bH6VS1X8QdnmMr5hHIom8/gKTusDl+YBYawYbs/32LQBO77qXJRbCFcgf9KgGA8FRrdiRohp/ww+keYwN4Em4a0klRTntla3u/CjrwyNg1GDwqzuFbimW9MI5tVFrqNx/j+WrMNinUF9Nd2hMDImSJmy+30DH0a/jNdmhXhXeuKNlQSQOM5TcPznIAiVgjxwQqSA+xfsDQ== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:AM0PR05MB4561.eurprd05.prod.outlook.com; PTR:; CAT:NONE; SFTY:; SFS:(4636009)(396003)(39860400002)(136003)(346002)(366004)(376002)(55016002)(316002)(478600001)(9686003)(52536014)(5660300002)(6636002)(33656002)(7696005)(2906002)(110136005)(54906003)(66446008)(8936002)(64756008)(66476007)(66946007)(66556008)(76116006)(71200400001)(83380400001)(53546011)(6506007)(186003)(4326008)(86362001); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata: 2oRIwhDLdrp8mm2w8kfyTnL1NYhYPjU8qkVEfNthT1kQTJMs8XbWyCagCNckowa7B7gPeSnAe1zeN9cXjPe2Or4lfB6TVP4ckHpDKWXYO4iuvhYyaGm/G/9aUvetu8Q2gXzdLIGbAbUTT7Ar/pwSO2wspcn6w+IjRZnDlZ8DkpDqxQSvzm9CWwAs0x3aYV24r1XEVv6TKjLH45L+yr3qYWryYtn/dMNmMuCIh5/c5rOIlH+UDjUgXS22b6d3vEMbbiZ+NYVDd6CBayvdFu0h/DoXhSEIoQME/6Y+5newSBnrEqW8zs38KmBiHZVfz+eI7nOke43aQSVxUfGq/JnoSPSthrWPIPmKLz+9Qq2NGeTe+5iOveXZY6XNfIGlQT6XLNhibKWoPpmRO+doyg1KVZ3SefDaz9kzGJr6/IMGO0bLtoh8z+m4COzwG8AIM7kEMPZkBD2QFmRXFnouamUCN3ERQqz+UlMEMVFDTXyxtGch2pgR/d7cDpM2pHCL/YauBkl5qn/3mhJN1AC8Ybb4Bwxa1gzSLnRDYG916X+LsUklUyHio757FaWSdwHMfvLk Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: AM0PR05MB4561.eurprd05.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: 08533efe-3f5f-4962-880d-08d82f28e8b1 X-MS-Exchange-CrossTenant-originalarrivaltime: 23 Jul 2020 16:53:22.1021 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: 5xIuRVSxcKOD8jD6ooWVI/9ACgHAedj7+mdDzAZztfGM928yIsZP+v/L1/gRDRyJRTboINbEBvPURfsXgpA54g== X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR0501MB2308 Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > > > > > > > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for > > > > multi-packet RQ buffer refcnt > > > > > > > > Hi, > > > > > > > > We are also doing C11 atomics converting for other components. > > > > Your insight would be much appreciated. > > > > > > > > Thanks, > > > > Phil Yang > > > > > > > > > -----Original Message----- > > > > > From: dev On Behalf Of Phil Yang > > > > > Sent: Tuesday, June 23, 2020 4:27 PM > > > > > To: dev@dpdk.org > > > > > Cc: matan@mellanox.com; shahafs@mellanox.com; > > > > > viacheslavo@mellanox.com; Honnappa Nagarahalli > > > > > ; drc@linux.vnet.ibm.com; nd > > > > > > > > > > Subject: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for > > > > > multi-packet RQ buffer refcnt > > > > > > > > > > Use c11 atomics with explicit ordering instead of the rte_atomic > > > > > ops which enforce unnecessary barriers on aarch64. > > > > > > > > > > Signed-off-by: Phil Yang > > > > > --- > <...> > > > > > > > > > > drivers/net/mlx5/mlx5_rxq.c | 2 +- > > > > > drivers/net/mlx5/mlx5_rxtx.c > > > > > | 16 +++++++++------- drivers/net/mlx5/mlx5_rxtx.h | 2 +- > > > > > 3 files changed, 11 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/drivers/net/mlx5/mlx5_rxq.c > > > > > b/drivers/net/mlx5/mlx5_rxq.c index dda0073..7f487f1 100644 > > > > > --- a/drivers/net/mlx5/mlx5_rxq.c > > > > > +++ b/drivers/net/mlx5/mlx5_rxq.c > > > > > @@ -1545,7 +1545,7 @@ mlx5_mprq_buf_init(struct rte_mempool > > *mp, > > > > > void *opaque_arg, > > > > > > > > > > memset(_m, 0, sizeof(*buf)); > > > > > buf->mp =3D mp; > > > > > -rte_atomic16_set(&buf->refcnt, 1); > > > > > +__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED); > > > > > for (j =3D 0; j !=3D strd_n; ++j) { shinfo =3D &buf->shinfos[j]= ; > > > > > shinfo->free_cb =3D mlx5_mprq_buf_free_cb; diff --git > > > > > a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c > > > > > index > > > > > e4106bf..f0eda88 100644 > > > > > --- a/drivers/net/mlx5/mlx5_rxtx.c > > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.c > > > > > @@ -1595,10 +1595,11 @@ mlx5_mprq_buf_free_cb(void *addr > > > > __rte_unused, > > > > > void *opaque) { > > > > > struct mlx5_mprq_buf *buf =3D opaque; > > > > > > > > > > -if (rte_atomic16_read(&buf->refcnt) =3D=3D 1) { > > > > > +if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) =3D=3D 1) { > > > > > rte_mempool_put(buf->mp, buf); > > > > > -} else if (rte_atomic16_add_return(&buf->refcnt, -1) =3D=3D 0) { > > > > > -rte_atomic16_set(&buf->refcnt, 1); > > > > > +} else if (unlikely(__atomic_sub_fetch(&buf->refcnt, 1, > > > > > + __ATOMIC_RELAXED) =3D=3D 0)) { > > > > > +__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED); > > > > > rte_mempool_put(buf->mp, buf); > > > > > } > > > > > } > > > > > @@ -1678,7 +1679,8 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, > struct > > > > > rte_mbuf **pkts, uint16_t pkts_n) > > > > > > > > > > if (consumed_strd =3D=3D strd_n) { > > > > > /* Replace WQE only if the buffer is still in use. */ -if > > > > > (rte_atomic16_read(&buf->refcnt) > 1) { > > > > > +if (__atomic_load_n(&buf->refcnt, > > > > > + __ATOMIC_RELAXED) > 1) { > > > > > mprq_buf_replace(rxq, rq_ci & wq_mask, > > > > strd_n); > > > > > /* Release the old buffer. */ > > > > > mlx5_mprq_buf_free(buf); > > > > > @@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, > struct > > > > > rte_mbuf **pkts, uint16_t pkts_n) void *buf_addr; > > > > > > > > > > /* Increment the refcnt of the whole chunk. */ > > > > > -rte_atomic16_add_return(&buf->refcnt, 1); > > rte_atomic16_add_return includes a full barrier along with atomic > operation. > > But is full barrier required here? For ex: > > __atomic_add_fetch(&buf->refcnt, 1, > > __ATOMIC_RELAXED) will offer atomicity, but no barrier. Would that be > > enough? > > > > > > > -MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf- > > > > > >refcnt) <=3D > > > > > - strd_n + 1); > > > > > +__atomic_add_fetch(&buf->refcnt, 1, > > > > > __ATOMIC_ACQUIRE); >=20 > The atomic load in MLX5_ASSERT() accesses the same memory space as the > previous __atomic_add_fetch() does. > They will access this memory space in the program order when we enabled > MLX5_PMD_DEBUG. So the ACQUIRE barrier in __atomic_add_fetch() > becomes unnecessary. >=20 > By changing it to RELAXED ordering, this patch got 7.6% performance > improvement on N1 (making it generate A72 alike instructions). >=20 > Could you please also try it on your testbed, Alex? Situation got better with this modification, here are the results: - no patch: 3.0 Mpps CPU cycles/packet=3D51.52 - original patch: 2.1 Mpps CPU cycles/packet=3D71.05 - modified patch: 2.9 Mpps CPU cycles/packet=3D52.79 Also, I found that the degradation is there only in case I enable bursts st= ats. Could you please turn on the following config options and see if you can re= produce this as well? CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=3Dy CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=3Dy > > > > Can you replace just the above line with the following lines and test i= t? > > > > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED); > > __atomic_thread_fence(__ATOMIC_ACQ_REL); > > > > This should make the generated code same as before this patch. Let me > > know if you would prefer us to re-spin the patch instead (for testing). > > > > > > > +MLX5_ASSERT(__atomic_load_n(&buf->refcnt, > > > > > + __ATOMIC_RELAXED) <=3D strd_n + 1); > > > > > buf_addr =3D RTE_PTR_SUB(addr, > > > > > RTE_PKTMBUF_HEADROOM); > > > > > /* > > > > > * MLX5 device doesn't use iova but it is necessary in a > > > > diff > > > > > --git a/drivers/net/mlx5/mlx5_rxtx.h > > > > > b/drivers/net/mlx5/mlx5_rxtx.h index 26621ff..0fc15f3 100644 > > > > > --- a/drivers/net/mlx5/mlx5_rxtx.h > > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.h > > > > > @@ -78,7 +78,7 @@ struct rxq_zip { > > > > > /* Multi-Packet RQ buffer header. */ struct mlx5_mprq_buf { > > > > > struct rte_mempool *mp; -rte_atomic16_t refcnt; /* Atomically > > > > > accessed refcnt. */ > > > > > +uint16_t refcnt; /* Atomically accessed refcnt. */ > > > > > uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first > > > > packet. > > > > > */ > > > > > struct rte_mbuf_ext_shared_info shinfos[]; > > > > > /* > > > > > -- > > > > > 2.7.4 > >