From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 887B1A0613 for ; Wed, 25 Sep 2019 11:36:25 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 626CC288C; Wed, 25 Sep 2019 11:36:25 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 2C85923D for ; Wed, 25 Sep 2019 11:36:24 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 25 Sep 2019 02:36:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,547,1559545200"; d="scan'208";a="179764480" Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202]) by orsmga007.jf.intel.com with ESMTP; 25 Sep 2019 02:36:22 -0700 Received: from shsmsx107.ccr.corp.intel.com (10.239.4.96) by fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS) id 14.3.439.0; Wed, 25 Sep 2019 02:36:21 -0700 Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.140]) by SHSMSX107.ccr.corp.intel.com ([169.254.9.89]) with mapi id 14.03.0439.000; Wed, 25 Sep 2019 17:36:20 +0800 From: "Liu, Yong" To: "Gavin Hu (Arm Technology China)" CC: "dev@dpdk.org" , nd , nd , "maxime.coquelin@redhat.com" , "Bie, Tiwei" , "Wang, Zhihong" Thread-Topic: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring Thread-Index: AQHVcf9tU8514n/ZukGYby7mJZvv0Kc6KDHwgAF1FICAAIdFwA== Date: Wed, 25 Sep 2019 09:36:19 +0000 Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E633B0F58@SHSMSX103.ccr.corp.intel.com> References: <20190905161421.55981-2-yong.liu@intel.com> <20190919163643.24130-1-yong.liu@intel.com> <20190919163643.24130-4-yong.liu@intel.com> <86228AFD5BCD8E4EBFD2B90117B5E81E633AA324@SHSMSX103.ccr.corp.intel.com> In-Reply-To: Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMDM5MGRjNWItNDk1NS00NjdiLThmYzItNzNkNTgwNmRjMTM3IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiMUxUSUd1YWhEWGZQb3Q3Vm80OWtOWnVQbnk2OU9VUCt6RzNwWlwvaEhhXC96OU1ZXC83b0ExN3RvQkRRQUlNMVRXMSJ9 x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.2.0.6 dlp-reaction: no-action x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > -----Original Message----- > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > Sent: Wednesday, September 25, 2019 5:29 PM > To: Liu, Yong ; maxime.coquelin@redhat.com; Bie, Tiwe= i > ; Wang, Zhihong > Cc: dev@dpdk.org; nd ; nd > Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue functio= n > for packed ring >=20 > Hi Marvin, >=20 > > -----Original Message----- > > From: Liu, Yong > > Sent: Tuesday, September 24, 2019 11:31 AM > > To: Gavin Hu (Arm Technology China) ; > > maxime.coquelin@redhat.com; Bie, Tiwei ; Wang, > > Zhihong > > Cc: dev@dpdk.org; nd > > Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue > function > > for packed ring > > > > Thanks, Gavin. My comments are inline. > > > > > -----Original Message----- > > > From: Gavin Hu (Arm Technology China) [mailto:Gavin.Hu@arm.com] > > > Sent: Monday, September 23, 2019 7:09 PM > > > To: Liu, Yong ; maxime.coquelin@redhat.com; Bie, > > Tiwei > > > ; Wang, Zhihong > > > Cc: dev@dpdk.org; nd > > > Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue > > function > > > for packed ring > > > > > > Hi Marvin, > > > > > > Is it possible to vectorize the processing? > > > Other comments inline: > > > /Gavin > > > > Gavin, > > According to our experiment, only vectorize some parts in [ed]nqueue > > function can't benefit performance. > > Functions like vhost_iova_to_vva and virtio_enqueue_offload can't be > > easily vectorized as they are full of judgment conditions. > > > > Thanks, > > Marvin > > > > > > -----Original Message----- > > > > From: dev On Behalf Of Marvin Liu > > > > Sent: Friday, September 20, 2019 12:37 AM > > > > To: maxime.coquelin@redhat.com; tiwei.bie@intel.com; > > > > zhihong.wang@intel.com > > > > Cc: dev@dpdk.org; Marvin Liu > > > > Subject: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue > function > > > for > > > > packed ring > > > > > > > > Burst enqueue function will first check whether descriptors are cac= he > > > > aligned. It will also check prerequisites in the beginning. Burst > > > > enqueue function not support chained mbufs, single packet enqueue > > > > function will handle it. > > > > > > > > Signed-off-by: Marvin Liu > > > > > > > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > > > > index 5074226f0..67889c80a 100644 > > > > --- a/lib/librte_vhost/vhost.h > > > > +++ b/lib/librte_vhost/vhost.h > > > > @@ -39,6 +39,9 @@ > > > > > > > > #define VHOST_LOG_CACHE_NR 32 > > > > > > > > +#define PACKED_DESCS_BURST (RTE_CACHE_LINE_SIZE / \ > > > > + sizeof(struct vring_packed_desc)) > > > > + > > > > #ifdef SUPPORT_GCC_UNROLL_PRAGMA > > > > #define PRAGMA_PARAM "GCC unroll 4" > > > > #endif > > > > @@ -57,6 +60,8 @@ > > > > #define UNROLL_PRAGMA(param) do {} while(0); > > > > #endif > > > > > > > > +#define PACKED_BURST_MASK (PACKED_DESCS_BURST - 1) > > > > + > > > > /** > > > > * Structure contains buffer address, length and descriptor index > > > > * from vring to do scatter RX. > > > > diff --git a/lib/librte_vhost/virtio_net.c > > > b/lib/librte_vhost/virtio_net.c > > > > index 2b5c47145..c664b27c5 100644 > > > > --- a/lib/librte_vhost/virtio_net.c > > > > +++ b/lib/librte_vhost/virtio_net.c > > > > @@ -895,6 +895,84 @@ virtio_dev_rx_split(struct virtio_net *dev, > > struct > > > > vhost_virtqueue *vq, > > > > return pkt_idx; > > > > } > > > > > > > > +static __rte_unused __rte_always_inline int > > > I remember "__rte_always_inline" should start at the first and separa= te > > > line, otherwise you will get a style issue. > > > /Gavin > > > > +virtio_dev_rx_burst_packed(struct virtio_net *dev, struct > > > vhost_virtqueue > > > > *vq, > > > > + struct rte_mbuf **pkts) > > > > +{ > > > > + bool wrap_counter =3D vq->avail_wrap_counter; > > > > + struct vring_packed_desc *descs =3D vq->desc_packed; > > > > + uint16_t avail_idx =3D vq->last_avail_idx; > > > > + > > > > + uint64_t desc_addrs[PACKED_DESCS_BURST]; > > > > + struct virtio_net_hdr_mrg_rxbuf *hdrs[PACKED_DESCS_BURST]; > > > > + uint32_t buf_offset =3D dev->vhost_hlen; > > > > + uint64_t lens[PACKED_DESCS_BURST]; > > > > + > > > > + uint16_t i; > > > > + > > > > + if (unlikely(avail_idx & PACKED_BURST_MASK)) > > > > + return -1; > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) { > > > > + if (unlikely(pkts[i]->next !=3D NULL)) > > > > + return -1; > > > > + if (unlikely(!desc_is_avail(&descs[avail_idx + i], > > > > + wrap_counter))) > > > > + return -1; > > > > + } > > > > + > > > > + rte_smp_rmb(); > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) > > > > + lens[i] =3D descs[avail_idx + i].len; > > > Looks like the code is a strong candidate for vectorization. > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) { > > > > + if (unlikely(pkts[i]->pkt_len > (lens[i] - buf_offset))) > > > > + return -1; > > > > + } > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) > > > > + desc_addrs[i] =3D vhost_iova_to_vva(dev, vq, > > > > + descs[avail_idx + i].addr, > > > > + &lens[i], > > > > + VHOST_ACCESS_RW); > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) { > > > > + if (unlikely(lens[i] !=3D descs[avail_idx + i].len)) > > > > + return -1; > > > > + } > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) { > > > > + rte_prefetch0((void *)(uintptr_t)desc_addrs[i]); > > > > + hdrs[i] =3D (struct virtio_net_hdr_mrg_rxbuf > *)desc_addrs[i]; > > > > + lens[i] =3D pkts[i]->pkt_len + dev->vhost_hlen; > > > > + } > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) > > > > + virtio_enqueue_offload(pkts[i], &hdrs[i]->hdr); > > > > + > > > A store barrier here is missing, last_avail_idx may be observed befor= e > the > > > above enqueue completion on weak memory order architectures. > > > For x86, a compiler barrier is also required. > > > > > Thanks a lot for point out. I guess your mention is that need to add > barrier > > between memcpy and enqueue. > > last_avail_idx is just local variable, no barrier is need to protect it= . >=20 > Sorry I was wrong, yes, last_avail_idx is a local variable(or we may call > it meta data). > Copying the headers and payload does not need to be ordered, we just need > to ensure all these happen before updating the idx, which is the single > synchronization point. > In one word, no barriers are required here. > /Gavin > > NP:) Nothing changed here in V3 patch. Thanks for kindly reviewing.=20 > > > > + vq->last_avail_idx +=3D PACKED_DESCS_BURST; > > > > + if (vq->last_avail_idx >=3D vq->size) { > > > > + vq->last_avail_idx -=3D vq->size; > > > > + vq->avail_wrap_counter ^=3D 1; > > > > + } > > > > + > > > > + UNROLL_PRAGMA(PRAGMA_PARAM) > > > > + for (i =3D 0; i < PACKED_DESCS_BURST; i++) { > > > > + rte_memcpy((void *)(uintptr_t)(desc_addrs[i] + > buf_offset), > > > > + rte_pktmbuf_mtod_offset(pkts[i], void *, 0), > > > > + pkts[i]->pkt_len); > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > + > > > > static __rte_unused int16_t > > > > virtio_dev_rx_single_packed(struct virtio_net *dev, struct > > > vhost_virtqueue > > > > *vq, > > > > struct rte_mbuf *pkt) > > > > -- > > > > 2.17.1