From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Liu, Yong"
To: "Gavin Hu (Arm Technology China)", "maxime.coquelin@redhat.com", "Bie, Tiwei", "Wang, Zhihong"
CC: "dev@dpdk.org", nd
Thread-Topic: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring
Date: Tue, 24 Sep 2019 03:30:36 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E633AA324@SHSMSX103.ccr.corp.intel.com>
References: <20190905161421.55981-2-yong.liu@intel.com> <20190919163643.24130-1-yong.liu@intel.com>
	<20190919163643.24130-4-yong.liu@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring
List-Id: DPDK patches and discussions

Thanks, Gavin. My comments are inline.

> -----Original Message-----
> From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com]
> Sent: Monday, September 23, 2019 7:09 PM
> To: Liu, Yong; maxime.coquelin@redhat.com; Bie, Tiwei; Wang, Zhihong
> Cc: dev@dpdk.org; nd
> Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function
> for packed ring
>
> Hi Marvin,
>
> Is it possible to vectorize the processing?
> Other comments inline:
> /Gavin

Gavin,
According to our experiments, vectorizing only some parts of the
enqueue/dequeue functions does not improve performance. Functions like
vhost_iova_to_vva and virtio_enqueue_offload cannot easily be vectorized
because they are full of conditional branches.
Thanks,
Marvin

> > -----Original Message-----
> > From: dev On Behalf Of Marvin Liu
> > Sent: Friday, September 20, 2019 12:37 AM
> > To: maxime.coquelin@redhat.com; tiwei.bie@intel.com;
> > zhihong.wang@intel.com
> > Cc: dev@dpdk.org; Marvin Liu
> > Subject: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function
> > for packed ring
> >
> > Burst enqueue function will first check whether descriptors are cache
> > aligned. It will also check prerequisites in the beginning. Burst
> > enqueue function does not support chained mbufs; the single packet
> > enqueue function will handle them.
> >
> > Signed-off-by: Marvin Liu
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 5074226f0..67889c80a 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,9 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#define PACKED_DESCS_BURST (RTE_CACHE_LINE_SIZE / \
> > +			sizeof(struct vring_packed_desc))
> > +
> >  #ifdef SUPPORT_GCC_UNROLL_PRAGMA
> >  #define PRAGMA_PARAM "GCC unroll 4"
> >  #endif
> > @@ -57,6 +60,8 @@
> >  #define UNROLL_PRAGMA(param) do {} while(0);
> >  #endif
> >
> > +#define PACKED_BURST_MASK (PACKED_DESCS_BURST - 1)
> > +
> >  /**
> >   * Structure contains buffer address, length and descriptor index
> >   * from vring to do scatter RX.
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 2b5c47145..c664b27c5 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -895,6 +895,84 @@ virtio_dev_rx_split(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> >  	return pkt_idx;
> >  }
> >
> > +static __rte_unused __rte_always_inline int
> I remember "__rte_always_inline" should start on its own separate line,
> otherwise you will get a style issue.
> /Gavin
> > +virtio_dev_rx_burst_packed(struct virtio_net *dev,
> > +	struct vhost_virtqueue *vq,
> > +	struct rte_mbuf **pkts)
> > +{
> > +	bool wrap_counter = vq->avail_wrap_counter;
> > +	struct vring_packed_desc *descs = vq->desc_packed;
> > +	uint16_t avail_idx = vq->last_avail_idx;
> > +
> > +	uint64_t desc_addrs[PACKED_DESCS_BURST];
> > +	struct virtio_net_hdr_mrg_rxbuf *hdrs[PACKED_DESCS_BURST];
> > +	uint32_t buf_offset = dev->vhost_hlen;
> > +	uint64_t lens[PACKED_DESCS_BURST];
> > +
> > +	uint16_t i;
> > +
> > +	if (unlikely(avail_idx & PACKED_BURST_MASK))
> > +		return -1;
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +		if (unlikely(pkts[i]->next != NULL))
> > +			return -1;
> > +		if (unlikely(!desc_is_avail(&descs[avail_idx + i],
> > +					    wrap_counter)))
> > +			return -1;
> > +	}
> > +
> > +	rte_smp_rmb();
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++)
> > +		lens[i] = descs[avail_idx + i].len;
> Looks like the code is a strong candidate for vectorization.
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +		if (unlikely(pkts[i]->pkt_len > (lens[i] - buf_offset)))
> > +			return -1;
> > +	}
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++)
> > +		desc_addrs[i] = vhost_iova_to_vva(dev, vq,
> > +				descs[avail_idx + i].addr,
> > +				&lens[i],
> > +				VHOST_ACCESS_RW);
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +		if (unlikely(lens[i] != descs[avail_idx + i].len))
> > +			return -1;
> > +	}
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +		rte_prefetch0((void *)(uintptr_t)desc_addrs[i]);
> > +		hdrs[i] = (struct virtio_net_hdr_mrg_rxbuf *)desc_addrs[i];
> > +		lens[i] = pkts[i]->pkt_len + dev->vhost_hlen;
> > +	}
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++)
> > +		virtio_enqueue_offload(pkts[i], &hdrs[i]->hdr);
> > +
> A store barrier is missing here; last_avail_idx may be observed before the
> enqueue above completes on weak memory order architectures.
> For x86, a compiler barrier is also required.

Thanks a lot for pointing this out. I guess you mean that a barrier needs
to be added between the memcpy and the enqueue.
last_avail_idx is just a local variable; no barrier is needed to protect it.

> > +	vq->last_avail_idx += PACKED_DESCS_BURST;
> > +	if (vq->last_avail_idx >= vq->size) {
> > +		vq->last_avail_idx -= vq->size;
> > +		vq->avail_wrap_counter ^= 1;
> > +	}
> > +
> > +	UNROLL_PRAGMA(PRAGMA_PARAM)
> > +	for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +		rte_memcpy((void *)(uintptr_t)(desc_addrs[i] + buf_offset),
> > +			   rte_pktmbuf_mtod_offset(pkts[i], void *, 0),
> > +			   pkts[i]->pkt_len);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static __rte_unused int16_t
> >  virtio_dev_rx_single_packed(struct virtio_net *dev,
> >  	struct vhost_virtqueue *vq,
> >  	struct rte_mbuf *pkt)
> > --
> > 2.17.1