From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id D9F5AA0521;
	Thu, 23 Jul 2020 18:53:26 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id B10221BFBA;
	Thu, 23 Jul 2020 18:53:25 +0200 (CEST)
Received: from EUR02-HE1-obe.outbound.protection.outlook.com
 (mail-eopbgr10085.outbound.protection.outlook.com [40.107.1.85])
 by dpdk.org (Postfix) with ESMTP id CF2EB1BF94
 for <dev@dpdk.org>; Thu, 23 Jul 2020 18:53:24 +0200 (CEST)
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=OR8ZXfsfU116ekY0tiDrtIrFr/KTzVnZrdNh/SQLQOxZUriL0g0mrf91eydaYxNJJ1XRBsImepMkRNCrvpdBMO9DgYW7vwqgP1bugq3chqlGWWhy+DXozV5tScIff9kSe4+TUNYIoPbFamB9zKAYWczmP51CBOikh0tTy4WCts6AzCpylsBBJamCptVJcY98ikqR17ZxWw47vwtyCcN5lo1CvhBiCKZ2j/P1W1sBM45HvkpmhJs3QO2OHpzkXO9NdpAkjhXWCpij30cZWuS5z3iTWIb1pI/xNT1u8KRHrC6a6A9io9OcKLWunbdoggBChLvVrdQgqJrFGvX6McefIg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=uRqp7O/dX6BkqdxGKQakcIvs1DfFTMW2WpnQVq/kJdg=;
 b=BRYfP5n/Ae0H1tYryMnuIjb802CTwwh1r/k4OIc/n11Gi1CiNxuiL3h/05pDBORWBnYkf6OVh65oqwIg3rH0DCDLalzfBMsPJfyjkUSY1WBnlg/MW48pu7suWlFn4OY1R5ktfg3LZuDxKxDZeW/S8X5rV4zMihbyoWNSNySWk6EL8oPaed59dl7y1nWUbRit/Ke1sKEy18HeRk7YH20kKpQHXHvbsOz98amHzyXRKo9cftx6FTlvATpfI8I5+k/ctZYjFQk5GeG7QTYbFjmxGJLj1bUezXFLEZrxFtkXBktX+/a0fRrTGFs9OGV2+e8COwI5DMg1N5Z3yacBqBWjJQ==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=mellanox.com; dmarc=pass action=none header.from=mellanox.com;
 dkim=pass header.d=mellanox.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=uRqp7O/dX6BkqdxGKQakcIvs1DfFTMW2WpnQVq/kJdg=;
 b=TuBjRNT2jRtiSWIEFA7XPrPqDY+RmLgBip7bzIo51Wce98hCeMzy9dwcCg2FuLzPwtj9uWkRtmDQ7HkRxHdJMxuDhimp0T+JpxdkUTsTFY5msQ3hBE2aNQfzfr2NSPdC4bev7LrEMZg6RFR2gD0quPW7nL+RALNqc5LyxF5MlVc=
Received: from AM0PR05MB4561.eurprd05.prod.outlook.com (2603:10a6:208:ad::20)
 by AM4PR0501MB2308.eurprd05.prod.outlook.com (2603:10a6:200:46::10)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3195.18; Thu, 23 Jul
 2020 16:53:22 +0000
Received: from AM0PR05MB4561.eurprd05.prod.outlook.com
 ([fe80::456:89d0:7f5c:c6c3]) by AM0PR05MB4561.eurprd05.prod.outlook.com
 ([fe80::456:89d0:7f5c:c6c3%4]) with mapi id 15.20.3216.021; Thu, 23 Jul 2020
 16:53:22 +0000
From: Alexander Kozyrev <akozyrev@mellanox.com>
To: Phil Yang <Phil.Yang@arm.com>, Honnappa Nagarahalli
 <Honnappa.Nagarahalli@arm.com>, Matan Azrad <matan@mellanox.com>, Shahaf
 Shuler <shahafs@mellanox.com>, Slava Ovsiienko <viacheslavo@mellanox.com>
CC: "drc@linux.vnet.ibm.com" <drc@linux.vnet.ibm.com>, nd <nd@arm.com>,
 "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, nd <nd@arm.com>
Thread-Topic: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for
 multi-packet	RQ buffer refcnt
Thread-Index: AQHWWMITvRJ6cPOqEEiXXa7L02rE6qkRJjhAgAOA/wCAABdygIAAsZng
Date: Thu, 23 Jul 2020 16:53:22 +0000
Message-ID: <AM0PR05MB4561C2A4F17F0C0CAF008CBFA2760@AM0PR05MB4561.eurprd05.prod.outlook.com>
References: <20200410164127.54229-7-gavin.hu@arm.com>
 <1592900807-13289-1-git-send-email-phil.yang@arm.com>
 <VE1PR08MB4640FDCC1C68241E77DC4CD7E9600@VE1PR08MB4640.eurprd08.prod.outlook.com>
 <AM0PR05MB4561A88F6F1F6B15FB754DCBA27B0@AM0PR05MB4561.eurprd05.prod.outlook.com>
 <DB6PR0802MB2216C67B62463133DBB2A49F98760@DB6PR0802MB2216.eurprd08.prod.outlook.com>
 <VE1PR08MB4640CE2CDEA76862C2AD520AE9760@VE1PR08MB4640.eurprd08.prod.outlook.com>
In-Reply-To: <VE1PR08MB4640CE2CDEA76862C2AD520AE9760@VE1PR08MB4640.eurprd08.prod.outlook.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: arm.com; dkim=none (message not signed)
 header.d=none;arm.com; dmarc=none action=none header.from=mellanox.com;
x-originating-ip: [2607:fea8:e380:d8e0:74b7:98e0:58ad:1fb6]
x-ms-publictraffictype: Email
x-ms-office365-filtering-ht: Tenant
x-ms-office365-filtering-correlation-id: 08533efe-3f5f-4962-880d-08d82f28e8b1
x-ms-traffictypediagnostic: AM4PR0501MB2308:
x-ms-exchange-transport-forked: True
x-microsoft-antispam-prvs: <AM4PR0501MB2308EB012CAEF48BD61355F3A2760@AM4PR0501MB2308.eurprd05.prod.outlook.com>
x-ms-oob-tlc-oobclassifiers: OLM:10000;
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info: Y6s7RLCuGZg1yCLugxXHVUE76Z23sXWy8LHeaZp+qj7Kq9aKfbn/ucfRe4iQqU2oj0I2ejq5XIYluSGbX4K5/nVl1Syfdo1NdGgH/HNJcMm735ZH5Jka9jXwLpkqWdBDfgnRspMjx/q2bH6VS1X8QdnmMr5hHIom8/gKTusDl+YBYawYbs/32LQBO77qXJRbCFcgf9KgGA8FRrdiRohp/ww+keYwN4Em4a0klRTntla3u/CjrwyNg1GDwqzuFbimW9MI5tVFrqNx/j+WrMNinUF9Nd2hMDImSJmy+30DH0a/jNdmhXhXeuKNlQSQOM5TcPznIAiVgjxwQqSA+xfsDQ==
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:AM0PR05MB4561.eurprd05.prod.outlook.com; PTR:; CAT:NONE;
 SFTY:;
 SFS:(4636009)(396003)(39860400002)(136003)(346002)(366004)(376002)(55016002)(316002)(478600001)(9686003)(52536014)(5660300002)(6636002)(33656002)(7696005)(2906002)(110136005)(54906003)(66446008)(8936002)(64756008)(66476007)(66946007)(66556008)(76116006)(71200400001)(83380400001)(53546011)(6506007)(186003)(4326008)(86362001);
 DIR:OUT; SFP:1101; 
x-ms-exchange-antispam-messagedata: 2oRIwhDLdrp8mm2w8kfyTnL1NYhYPjU8qkVEfNthT1kQTJMs8XbWyCagCNckowa7B7gPeSnAe1zeN9cXjPe2Or4lfB6TVP4ckHpDKWXYO4iuvhYyaGm/G/9aUvetu8Q2gXzdLIGbAbUTT7Ar/pwSO2wspcn6w+IjRZnDlZ8DkpDqxQSvzm9CWwAs0x3aYV24r1XEVv6TKjLH45L+yr3qYWryYtn/dMNmMuCIh5/c5rOIlH+UDjUgXS22b6d3vEMbbiZ+NYVDd6CBayvdFu0h/DoXhSEIoQME/6Y+5newSBnrEqW8zs38KmBiHZVfz+eI7nOke43aQSVxUfGq/JnoSPSthrWPIPmKLz+9Qq2NGeTe+5iOveXZY6XNfIGlQT6XLNhibKWoPpmRO+doyg1KVZ3SefDaz9kzGJr6/IMGO0bLtoh8z+m4COzwG8AIM7kEMPZkBD2QFmRXFnouamUCN3ERQqz+UlMEMVFDTXyxtGch2pgR/d7cDpM2pHCL/YauBkl5qn/3mhJN1AC8Ybb4Bwxa1gzSLnRDYG916X+LsUklUyHio757FaWSdwHMfvLk
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: AM0PR05MB4561.eurprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 08533efe-3f5f-4962-880d-08d82f28e8b1
X-MS-Exchange-CrossTenant-originalarrivaltime: 23 Jul 2020 16:53:22.1021 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 5xIuRVSxcKOD8jD6ooWVI/9ACgHAedj7+mdDzAZztfGM928yIsZP+v/L1/gRDRyJRTboINbEBvPURfsXgpA54g==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR0501MB2308
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering
	for	multi-packet	RQ buffer refcnt
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

> <snip>
> >
> > > > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for
> > > > multi-packet RQ buffer refcnt
> > > >
> > > > Hi,
> > > >
> > > > We are also doing C11 atomics converting for other components.
> > > > Your insight would be much appreciated.
> > > >
> > > > Thanks,
> > > > Phil Yang
> > > >
> > > > > -----Original Message-----
> > > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Phil Yang
> > > > > Sent: Tuesday, June 23, 2020 4:27 PM
> > > > > To: dev@dpdk.org
> > > > > Cc: matan@mellanox.com; shahafs@mellanox.com;
> > > > > viacheslavo@mellanox.com; Honnappa Nagarahalli
> > > > > <Honnappa.Nagarahalli@arm.com>; drc@linux.vnet.ibm.com; nd
> > > > > <nd@arm.com>
> > > > > Subject: [dpdk-dev] [PATCH v3] net/mlx5: relaxed ordering for
> > > > > multi-packet RQ buffer refcnt
> > > > >
> > > > > Use c11 atomics with explicit ordering instead of the rte_atomic
> > > > > ops which enforce unnecessary barriers on aarch64.
> > > > >
> > > > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > > > ---
> <...>
> > > > >
> > > > >  drivers/net/mlx5/mlx5_rxq.c  |  2 +-
> > > > > drivers/net/mlx5/mlx5_rxtx.c
> > > > > | 16 +++++++++------- drivers/net/mlx5/mlx5_rxtx.h |  2 +-
> > > > >  3 files changed, 11 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/mlx5/mlx5_rxq.c
> > > > > b/drivers/net/mlx5/mlx5_rxq.c index dda0073..7f487f1 100644
> > > > > --- a/drivers/net/mlx5/mlx5_rxq.c
> > > > > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > > > > @@ -1545,7 +1545,7 @@ mlx5_mprq_buf_init(struct rte_mempool
> > *mp,
> > > > > void *opaque_arg,
> > > > >
> > > > >  memset(_m, 0, sizeof(*buf));
> > > > >  buf->mp =3D mp;
> > > > > -rte_atomic16_set(&buf->refcnt, 1);
> > > > > +__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > > > >  for (j =3D 0; j !=3D strd_n; ++j) {  shinfo =3D &buf->shinfos[j]=
;
> > > > > shinfo->free_cb =3D mlx5_mprq_buf_free_cb; diff --git
> > > > > a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> > > > > index
> > > > > e4106bf..f0eda88 100644
> > > > > --- a/drivers/net/mlx5/mlx5_rxtx.c
> > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.c
> > > > > @@ -1595,10 +1595,11 @@ mlx5_mprq_buf_free_cb(void *addr
> > > > __rte_unused,
> > > > > void *opaque)  {
> > > > >  struct mlx5_mprq_buf *buf =3D opaque;
> > > > >
> > > > > -if (rte_atomic16_read(&buf->refcnt) =3D=3D 1) {
> > > > > +if (__atomic_load_n(&buf->refcnt, __ATOMIC_RELAXED) =3D=3D 1) {
> > > > >  rte_mempool_put(buf->mp, buf);
> > > > > -} else if (rte_atomic16_add_return(&buf->refcnt, -1) =3D=3D 0) {
> > > > > -rte_atomic16_set(&buf->refcnt, 1);
> > > > > +} else if (unlikely(__atomic_sub_fetch(&buf->refcnt, 1,
> > > > > +       __ATOMIC_RELAXED) =3D=3D 0)) {
> > > > > +__atomic_store_n(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > > > >  rte_mempool_put(buf->mp, buf);
> > > > >  }
> > > > >  }
> > > > > @@ -1678,7 +1679,8 @@ mlx5_rx_burst_mprq(void *dpdk_rxq,
> struct
> > > > > rte_mbuf **pkts, uint16_t pkts_n)
> > > > >
> > > > >  if (consumed_strd =3D=3D strd_n) {
> > > > >  /* Replace WQE only if the buffer is still in use. */ -if
> > > > > (rte_atomic16_read(&buf->refcnt) > 1) {
> > > > > +if (__atomic_load_n(&buf->refcnt,
> > > > > +    __ATOMIC_RELAXED) > 1) {
> > > > >  mprq_buf_replace(rxq, rq_ci & wq_mask,
> > > > strd_n);
> > > > >  /* Release the old buffer. */
> > > > >  mlx5_mprq_buf_free(buf);
> > > > > @@ -1790,9 +1792,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq,
> struct
> > > > > rte_mbuf **pkts, uint16_t pkts_n)  void *buf_addr;
> > > > >
> > > > >  /* Increment the refcnt of the whole chunk. */
> > > > > -rte_atomic16_add_return(&buf->refcnt, 1);
> > rte_atomic16_add_return includes a full barrier along with atomic
> operation.
> > But is full barrier required here? For ex:
> > __atomic_add_fetch(&buf->refcnt, 1,
> > __ATOMIC_RELAXED) will offer atomicity, but no barrier. Would that be
> > enough?
> >
> > > > > -MLX5_ASSERT((uint16_t)rte_atomic16_read(&buf-
> > > > > >refcnt) <=3D
> > > > > -    strd_n + 1);
> > > > > +__atomic_add_fetch(&buf->refcnt, 1,
> > > > > __ATOMIC_ACQUIRE);
>=20
> The atomic load in MLX5_ASSERT() accesses the same memory space as the
> previous __atomic_add_fetch() does.
> They will access this memory space in the program order when we enabled
> MLX5_PMD_DEBUG. So the ACQUIRE barrier in __atomic_add_fetch()
> becomes unnecessary.
>=20
> By changing it to RELAXED ordering, this patch got 7.6% performance
> improvement on N1 (making it generate A72 alike instructions).
>=20
> Could you please also try it on your testbed, Alex?

Situation got better with this modification, here are the results:
 - no patch:             3.0 Mpps CPU cycles/packet=3D51.52
 - original patch:    2.1 Mpps CPU cycles/packet=3D71.05
 - modified patch: 2.9 Mpps CPU cycles/packet=3D52.79
Also, I found that the degradation is there only in case I enable bursts st=
ats.
Could you please turn on the following config options and see if you can re=
produce this as well?
CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=3Dy
CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=3Dy

> >
> > Can you replace just the above line with the following lines and test i=
t?
> >
> > __atomic_add_fetch(&buf->refcnt, 1, __ATOMIC_RELAXED);
> > __atomic_thread_fence(__ATOMIC_ACQ_REL);
> >
> > This should make the generated code same as before this patch. Let me
> > know if you would prefer us to re-spin the patch instead (for testing).
> >
> > > > > +MLX5_ASSERT(__atomic_load_n(&buf->refcnt,
> > > > > +    __ATOMIC_RELAXED) <=3D strd_n + 1);
> > > > >  buf_addr =3D RTE_PTR_SUB(addr,
> > > > > RTE_PKTMBUF_HEADROOM);
> > > > >  /*
> > > > >   * MLX5 device doesn't use iova but it is necessary in a
> > > > diff
> > > > > --git a/drivers/net/mlx5/mlx5_rxtx.h
> > > > > b/drivers/net/mlx5/mlx5_rxtx.h index 26621ff..0fc15f3 100644
> > > > > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > > > > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > > > > @@ -78,7 +78,7 @@ struct rxq_zip {
> > > > >  /* Multi-Packet RQ buffer header. */  struct mlx5_mprq_buf {
> > > > > struct rte_mempool *mp; -rte_atomic16_t refcnt; /* Atomically
> > > > > accessed refcnt. */
> > > > > +uint16_t refcnt; /* Atomically accessed refcnt. */
> > > > >  uint8_t pad[RTE_PKTMBUF_HEADROOM]; /* Headroom for the first
> > > > packet.
> > > > > */
> > > > >  struct rte_mbuf_ext_shared_info shinfos[];
> > > > >  /*
> > > > > --
> > > > > 2.7.4
> >