From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <stable-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id F001CA04B3
	for <public@inbox.dpdk.org>; Mon, 23 Dec 2019 10:35:51 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id A788A397D;
	Mon, 23 Dec 2019 10:35:51 +0100 (CET)
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id 48242F72;
 Mon, 23 Dec 2019 10:35:46 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga005.fm.intel.com ([10.253.24.32])
 by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 23 Dec 2019 01:35:45 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.69,347,1571727600"; d="scan'208";a="417206322"
Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206])
 by fmsmga005.fm.intel.com with ESMTP; 23 Dec 2019 01:35:45 -0800
Received: from FMSMSX110.amr.corp.intel.com (10.18.116.10) by
 FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS)
 id 14.3.439.0; Mon, 23 Dec 2019 01:35:45 -0800
Received: from shsmsx107.ccr.corp.intel.com (10.239.4.96) by
 fmsmsx110.amr.corp.intel.com (10.18.116.10) with Microsoft SMTP Server (TLS)
 id 14.3.439.0; Mon, 23 Dec 2019 01:35:45 -0800
Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.19]) by
 SHSMSX107.ccr.corp.intel.com ([169.254.9.164]) with mapi id 14.03.0439.000;
 Mon, 23 Dec 2019 17:35:43 +0800
From: "Li, Xiaoyun" <xiaoyun.li@intel.com>
To: Gavin Hu <Gavin.Hu@arm.com>, "Wu, Jingjing" <jingjing.wu@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "Maslekar, Omkar"
 <omkar.maslekar@intel.com>, "stable@dpdk.org" <stable@dpdk.org>, nd
 <nd@arm.com>, "jerinj@marvell.com" <jerinj@marvell.com>, Honnappa Nagarahalli
 <Honnappa.Nagarahalli@arm.com>, "Richardson, Bruce"
 <bruce.richardson@intel.com>, nd <nd@arm.com>
Thread-Topic: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue
Thread-Index: AQHVs7Tq7x2YYQcLw0mJCFIUj/Hkaqe8D3eAgAtTD0D//4iOAIAAlKPQ
Date: Mon, 23 Dec 2019 09:35:42 +0000
Message-ID: <B9E724F4CB7543449049E7AE7669D82F0B35BAB9@SHSMSX101.ccr.corp.intel.com>
References: <20191204151916.12607-1-xiaoyun.li@intel.com>
 <20191216015854.28725-1-xiaoyun.li@intel.com>
 <VI1PR08MB5376ECC87D2E0907DCFBE47F8F510@VI1PR08MB5376.eurprd08.prod.outlook.com>
 <B9E724F4CB7543449049E7AE7669D82F0B35B9BC@SHSMSX101.ccr.corp.intel.com>
 <VI1PR08MB537694B24564C8E262BCC1328F2E0@VI1PR08MB5376.eurprd08.prod.outlook.com>
In-Reply-To: <VI1PR08MB537694B24564C8E262BCC1328F2E0@VI1PR08MB5376.eurprd08.prod.outlook.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v2] raw/ntb: fix write memory
	barrier issue
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
Errors-To: stable-bounces@dpdk.org
Sender: "stable" <stable-bounces@dpdk.org>

Hi Gavin,

Still, stability and correctness are much more important than performance.
As I said, enabling WC brings a performance benefit of more than 20X. Compared
to that benefit, the difference between rte_wmb and rte_io_wmb is not that
important.
And in my tests, performance with rte_wmb is not that bad, especially with
large packets, which are the normal use case.

BTW, I've searched the Linux kernel sources and don't see any NTB device on
an arm platform, so I don't think you need to worry about the performance hit
on arm.

Best Regards
Xiaoyun Li

> -----Original Message-----
> From: Gavin Hu [mailto:Gavin.Hu@arm.com]
> Sent: Monday, December 23, 2019 16:38
> To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>;
> stable@dpdk.org; nd <nd@arm.com>; jerinj@marvell.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue
>
> Hi Xiaoyun,
>
> > -----Original Message-----
> > From: Li, Xiaoyun <xiaoyun.li@intel.com>
> > Sent: Monday, December 23, 2019 3:52 PM
> > To: Gavin Hu <Gavin.Hu@arm.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>;
> > stable@dpdk.org; nd <nd@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier
> > issue
> >
> > Hi
> > I reconsidered and retested about this issue.
> > I still need to use rte_wmb instead of using rte_io_wmb.
> >
> > Because to achieve high performance, ntb needs to turn on WC(write
> > combining) feature. The perf difference with and without WC enabled is
> > more than 20X.
> > And when WC enabled, rte_io_wmb cannot make sure the instructions are
> > in order only rte_wmb can make sure that.
> >
> > And in my retest, when sending 64 bytes packets, using rte_io_wmb will
> > cause out-of-order issue and cause memory corruption on rx side.
> > And using rte_wmb is fine.
> That's true. Even though x86 is known to be strongly ordered, the region is
> declared as 'write combine', so the interconnect or PCI RC may reorder the
> write-combining ('write coalescing') accesses, which is what caused this
> problem.
> IMO, the rte_io_*mb barriers on x86 should be promoted to something stronger
> when WC is involved (but that would sap performance for non-WC memories?).
> https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/arch/x86/rte_atomic.h#L78
>
> Using rte_wmb will also hurt performance on aarch64: PCI device memory
> accesses to a single device are strongly ordered there, so the strongest
> barrier, rte_wmb, is not necessary.
> > So I can only use v1 patch and suspend v2 patch in patchwork.
> >
> > Best Regards
> > Xiaoyun Li
> >
> > > -----Original Message-----
> > > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com]
> > > Sent: Monday, December 16, 2019 18:50
> > > To: Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>
> > > Cc: dev@dpdk.org; Maslekar, Omkar <omkar.maslekar@intel.com>;
> > > stable@dpdk.org; nd <nd@arm.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier
> > issue
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev <dev-bounces@dpdk.org> On Behalf Of Xiaoyun Li
> > > > Sent: Monday, December 16, 2019 9:59 AM
> > > > To: jingjing.wu@intel.com
> > > > Cc: dev@dpdk.org; omkar.maslekar@intel.com; Xiaoyun Li
> > > > <xiaoyun.li@intel.com>; stable@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier
> > > > issue
> > > >
> > > > All buffers and ring info should be written before the tail register
> > > > update.
> > > > This patch relocates the write memory barrier before updating tail
> > > > register to avoid potential issues.
> > > >
> > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
> > > > ---
> > > > v2:
> > > >  * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough.
> > > > ---
> > > >  drivers/raw/ntb/ntb.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index
> > > > ad7f6abfd..c7de86f36 100644
> > > > --- a/drivers/raw/ntb/ntb.c
> > > > +++ b/drivers/raw/ntb/ntb.c
> > > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev,
> > > >  			   sizeof(struct ntb_used) * nb1);
> > > >  		rte_memcpy(txq->tx_used_ring, tx_used + nb1,
> > > >  			   sizeof(struct ntb_used) * nb2);
> > > > +		rte_io_wmb();
> > > As both txq->tx_used_ring and *txq->used_cnt physically reside on the
> > > PCI device side, rte_io_wmb is correct to ensure the ordering.
> > >
> > > >  		*txq->used_cnt = txq->last_used;
> > > > -		rte_wmb();
> > > >
> > > >  		/* update queue stats */
> > > >  		hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
> > > > @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
> > > >  			   sizeof(struct ntb_desc) * nb1);
> > > >  		rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
> > > >  			   sizeof(struct ntb_desc) * nb2);
> > > > +		rte_io_wmb();
> > > >  		*rxq->avail_cnt = rxq->last_avail;
> > > > -		rte_wmb();
> > > >
> > > >  		/* update queue stats */
> > > >  		off = NTB_XSTATS_NUM * ((size_t)context + 1);
> > > > --
> > > > 2.17.1
> > >
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>