From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <huawei.xie@intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 798645954
 for <dev@dpdk.org>; Wed, 20 Jan 2016 04:39:23 +0100 (CET)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga101.jf.intel.com with ESMTP; 19 Jan 2016 19:39:22 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,319,1449561600"; d="scan'208";a="636588728"
Received: from fmsmsx103.amr.corp.intel.com ([10.18.124.201])
 by FMSMGA003.fm.intel.com with ESMTP; 19 Jan 2016 19:39:21 -0800
Received: from fmsmsx158.amr.corp.intel.com (10.18.116.75) by
 FMSMSX103.amr.corp.intel.com (10.18.124.201) with Microsoft SMTP Server (TLS)
 id 14.3.248.2; Tue, 19 Jan 2016 19:39:21 -0800
Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by
 fmsmsx158.amr.corp.intel.com (10.18.116.75) with Microsoft SMTP Server (TLS)
 id 14.3.248.2; Tue, 19 Jan 2016 19:39:21 -0800
Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.215]) by
 shsmsx102.ccr.corp.intel.com ([169.254.2.172]) with mapi id 14.03.0248.002;
 Wed, 20 Jan 2016 11:39:19 +0800
From: "Xie, Huawei" <huawei.xie@intel.com>
To: "Polehn, Mike A" <mike.a.polehn@intel.com>, "Tan, Jianfeng"
 <jianfeng.tan@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the
 virtio	ring
Thread-Index: AQHRUzQkieIVNSepOEKwhW8jySn4vw==
Date: Wed, 20 Jan 2016 03:39:18 +0000
Message-ID: <C37D651A908B024F974696C65296B57B4C5A4FF7@SHSMSX101.ccr.corp.intel.com>
References: <1451918787-85887-1-git-send-email-huawei.xie@intel.com>
 <569E6372.5030200@intel.com>
 <C37D651A908B024F974696C65296B57B4C5A475A@SHSMSX101.ccr.corp.intel.com>
 <745DB4B8861F8E4B9849C970520ABBF1498488E5@ORSMSX102.amr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "ann.zhuangyanying@huawei.com" <ann.zhuangyanying@huawei.com>
Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the
	virtio	ring
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Jan 2016 03:39:24 -0000

On 1/20/2016 2:33 AM, Polehn, Mike A wrote:=0A=
> SMP operations can be very expensive, sometimes costing 100s to 1000s =
of clock cycles depending on the circumstances of the synchronization. =
How you arrange the SMP operations within the tasks at hand across the =
SMP cores is what determines top performance. Using traditional =
general-purpose SMP methods will result in traditional general-purpose =
performance. Migrating from expert techniques (understood by a much =
smaller group of expert programmers focused on performance) to general =
libraries (understood by most general-purpose programmers) will greatly =
reduce the value of DPDK, since the end result will be lower performance =
and/or less predictable operation where rate performance, =
predictability, and low latency are the primary goals.=0A=
>=0A=
> The best method to date for having multiple outputs to a single port =
is to use a multi-producer, single-consumer DPDK queue: the queue =
performs the SMP operation so that multiple sources feed a single =
non-SMP task, which outputs to the port (that is why the ports are not =
SMP protected). Also, when considerable contention from multiple =
sources occurs often (data feeding at the same time), having a DPDK =
queue with input and output variables in separate cache lines can give =
a notable throughput improvement.=0A=
>=0A=
> Mike =0A=
=0A=
Mike:=0A=
Thanks for the detailed explanation. Do you have any comments on this patch?=0A=
=0A=
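To make sure I read your second point correctly, here is a minimal sketch
of a multi-producer, single-consumer ring with the producer and consumer
indexes in separate cache lines. It is not the rte_ring code: struct
mpsc_ring, RING_SIZE, the assumed 64-byte cache line, and the function
names are all hypothetical, and it uses the default (sequentially
consistent) C11 atomics for clarity rather than tuned memory ordering.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE  256u               /* hypothetical; power of two */
#define RING_MASK  (RING_SIZE - 1u)
#define CACHE_LINE 64                 /* assumed cache line size */

struct mpsc_ring {
    /* Producer-side indexes share one cache line. */
    alignas(CACHE_LINE) atomic_uint prod_head; /* next slot to reserve */
    atomic_uint prod_tail;                     /* slots published */
    /* Consumer-side index sits on its own cache line. */
    alignas(CACHE_LINE) atomic_uint cons_tail; /* slots consumed */
    alignas(CACHE_LINE) void *slots[RING_SIZE];
};

static void mpsc_ring_init(struct mpsc_ring *r)
{
    atomic_init(&r->prod_head, 0u);
    atomic_init(&r->prod_tail, 0u);
    atomic_init(&r->cons_tail, 0u);
}

/* May be called from several threads at once. */
static bool mpsc_enqueue(struct mpsc_ring *r, void *obj)
{
    unsigned head = atomic_load(&r->prod_head);

    for (;;) {
        unsigned tail = atomic_load(&r->cons_tail);

        if (RING_SIZE + tail - head == 0)
            return false;             /* ring is full */
        /* Reserve slot `head`; on failure `head` is reloaded. */
        if (atomic_compare_exchange_weak(&r->prod_head, &head, head + 1))
            break;
    }
    r->slots[head & RING_MASK] = obj;
    /* Publish in reservation order: wait for earlier producers. */
    while (atomic_load(&r->prod_tail) != head)
        ;
    atomic_store(&r->prod_tail, head + 1);
    return true;
}

/* Must be called from one thread only (the single consumer). */
static bool mpsc_dequeue(struct mpsc_ring *r, void **obj)
{
    unsigned tail = atomic_load(&r->cons_tail);

    if (tail == atomic_load(&r->prod_tail))
        return false;                 /* ring is empty */
    *obj = r->slots[tail & RING_MASK];
    atomic_store(&r->cons_tail, tail + 1);
    return true;
}
```

Several threads may call mpsc_enqueue() on the same ring, while only one
thread calls mpsc_dequeue() and does the actual output to the port.
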
>=0A=
> -----Original Message-----=0A=
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xie, Huawei=0A=
> Sent: Tuesday, January 19, 2016 8:44 AM=0A=
> To: Tan, Jianfeng; dev@dpdk.org=0A=
> Cc: ann.zhuangyanying@huawei.com=0A=
> Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the vir=
tio ring=0A=
>=0A=
> On 1/20/2016 12:25 AM, Tan, Jianfeng wrote:=0A=
>> Hi Huawei,=0A=
>>=0A=
>> On 1/4/2016 10:46 PM, Huawei Xie wrote:=0A=
>>> This patch removes the internal lockless enqueue implementation.=0A=
>>> DPDK doesn't support concurrently receiving/transmitting packets =0A=
>>> from/to the same queue. Vhost PMD wraps a vhost device as a normal =0A=
>>> DPDK port. DPDK applications normally have their own lock =0A=
>>> implementation when enqueuing packets to the same queue of a port.=0A=
>>>=0A=
>>> The atomic cmpset is a costly operation. This patch should help =0A=
>>> performance a bit.=0A=
>>>=0A=
>>> Signed-off-by: Huawei Xie <huawei.xie@intel.com>=0A=
>>> ---=0A=
>>>   lib/librte_vhost/vhost_rxtx.c | 86=0A=
>>> +++++++++++++------------------------------=0A=
>>>   1 file changed, 25 insertions(+), 61 deletions(-)=0A=
>>>=0A=
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c =0A=
>>> b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..26a1b9c 100644=0A=
>>> --- a/lib/librte_vhost/vhost_rxtx.c=0A=
>>> +++ b/lib/librte_vhost/vhost_rxtx.c=0A=
>> I think vhost example will not work well with this patch when=0A=
>> vm2vm=3Dsoftware.=0A=
>>=0A=
>> Test case:=0A=
>> Two virtio ports handled by two pmd threads. Thread 0 polls pkts from=0A=
>> physical NIC and sends to virtio0, while thread 1 receives pkts from=0A=
>> virtio1 and routes them to virtio0.=0A=
> The vhost port will be wrapped as a port by the vhost PMD. A DPDK APP=0A=
> treats all physical and virtual ports equally, as ports. When two DPDK=0A=
> threads try to enqueue to the same port, the APP needs to handle the=0A=
> contention. None of the physical PMDs supports concurrent=0A=
> enqueuing/dequeuing. The vhost PMD should expose the same behavior; we=0A=
> should expose differences between PMDs only when absolutely necessary.=0A=
>=0A=
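To illustrate the point above about the APP handling contention: a
minimal sketch, assuming a per-queue spinlock owned by the application.
struct txq, TXQ_INITIALIZER, txq_burst_unsafe() and txq_burst_locked()
are hypothetical names; txq_burst_unsafe() only stands in for whatever
non-thread-safe TX burst call the application really uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a port TX queue that, like a typical PMD
 * queue, is not safe for concurrent enqueue. */
struct txq {
    atomic_flag lock;       /* spinlock owned by the application */
    uint32_t    enqueued;   /* plain counter, protected only by lock */
};

#define TXQ_INITIALIZER { ATOMIC_FLAG_INIT, 0 }

/* Hypothetical non-thread-safe burst enqueue: it just counts packets. */
static uint16_t txq_burst_unsafe(struct txq *q, void **pkts, uint16_t n)
{
    (void)pkts;
    q->enqueued += n;
    return n;
}

/* Threads sharing a queue serialize here, outside the driver. */
static uint16_t txq_burst_locked(struct txq *q, void **pkts, uint16_t n)
{
    uint16_t sent;

    while (atomic_flag_test_and_set_explicit(&q->lock,
                                             memory_order_acquire))
        ;                   /* spin until the lock is free */
    sent = txq_burst_unsafe(q, pkts, n);
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
    return sent;
}
```

A thread that is the only writer of a queue can keep calling the
unlocked path and pays nothing for the lock.
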
>>> -=0A=
>>>           *(volatile uint16_t *)&vq->used->idx +=3D entry_success;=0A=
>> Another unrelated question: have we ever tried moving this assignment=0A=
>> out of the loop to save cost, since it is a point of data contention?=0A=
> This operation itself is not that costly, but it has a side effect on=0A=
> cache transfer.=0A=
> It is outside the loop for the non-mergeable case. For the mergeable=0A=
> case, it is inside the loop.=0A=
> There are pros and cons to doing this in a burst versus in smaller=0A=
> steps. I prefer to move it outside the loop. Let us address this=0A=
> later.=0A=
>=0A=
>> Thanks,=0A=
>> Jianfeng=0A=
>>=0A=
>>=0A=
>=0A=
=0A=