From: "Liu, Yong"
To: Maxime Coquelin, "Ye, Xiaolong", "Wang, Zhihong"
CC: "dev@dpdk.org", "Van Haaren, Harry"
Date: Fri, 24 Apr 2020 13:47:27 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E635432F0@SHSMSX103.ccr.corp.intel.com>
References: <20200313174230.74661-1-yong.liu@intel.com> <20200424092445.44693-1-yong.liu@intel.com> <20200424092445.44693-8-yong.liu@intel.com> <94281b4c-2b05-4cca-7df8-93cbdf6a4f74@redhat.com> <86228AFD5BCD8E4EBFD2B90117B5E81E63543297@SHSMSX103.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Maxime Coquelin
> Sent: Friday, April 24, 2020 9:36 PM
> To: Liu, Yong; Ye, Xiaolong; Wang, Zhihong
> Cc: dev@dpdk.org; Van Haaren, Harry
> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
>
> On 4/24/20 3:33 PM, Liu, Yong wrote:
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin
> >> Sent: Friday, April 24, 2020 8:30 PM
> >> To: Liu, Yong; Ye, Xiaolong; Wang, Zhihong
> >> Cc: dev@dpdk.org; Van Haaren, Harry
> >> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
> >>
> >> On 4/24/20 11:24 AM, Marvin Liu wrote:
> >>> Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
> >>
> >> s/alike/like/ ?
> >>
> >>> single Tx functions. Batch function is further optimized by AVX512
> >>> instructions.
> >>>
> >>> Signed-off-by: Marvin Liu
> >>>
> >>> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> >>> index 5c112cac7..b7d52d497 100644
> >>> --- a/drivers/net/virtio/virtio_ethdev.h
> >>> +++ b/drivers/net/virtio/virtio_ethdev.h
> >>> @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> >>>  uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> >>>  		uint16_t nb_pkts);
> >>>
> >>> +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>> +		uint16_t nb_pkts);
> >>> +
> >>>  int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
> >>>
> >>>  void virtio_interrupt_handler(void *param);
> >>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> >>> index cf18fe564..f82fe8d64 100644
> >>> --- a/drivers/net/virtio/virtio_rxtx.c
> >>> +++ b/drivers/net/virtio/virtio_rxtx.c
> >>> @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue __rte_unused,
> >>>  {
> >>>  	return 0;
> >>>  }
> >>> +
> >>> +__rte_weak uint16_t
> >>> +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
> >>> +			    struct rte_mbuf **tx_pkts __rte_unused,
> >>> +			    uint16_t nb_pkts __rte_unused)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>> diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >>> index 8a7b459eb..c023ace4e 100644
> >>> --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >>> +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >>> @@ -23,6 +23,24 @@
> >>>  #define PACKED_FLAGS_MASK ((0ULL | VRING_PACKED_DESC_F_AVAIL_USED) << \
> >>>  	FLAGS_BITS_OFFSET)
> >>>
> >>> +/* reference count offset in mbuf rearm data */
> >>> +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> >>> +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> >>> +/* segment number offset in mbuf rearm data */
> >>> +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> >>> +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> >>> +
> >>> +/* default rearm data */
> >>> +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
> >>> +	1ULL << REFCNT_BITS_OFFSET)
> >>> +
> >>> +/* id bits offset in packed ring desc higher 64bits */
> >>> +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
> >>> +	offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
> >>> +
> >>> +/* net hdr short size mask */
> >>> +#define NET_HDR_MASK 0x3F
> >>> +
> >>>  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> >>>  	sizeof(struct vring_packed_desc))
> >>>  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> >>> @@ -47,6 +65,48 @@
> >>>  	for (iter = val; iter < num; iter++)
> >>>  #endif
> >>>
> >>> +static inline void
> >>> +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
> >>> +{
> >>> +	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
> >>> +	struct vq_desc_extra *dxp;
> >>> +	uint16_t used_idx, id, curr_id, free_cnt = 0;
> >>> +	uint16_t size = vq->vq_nentries;
> >>> +	struct rte_mbuf *mbufs[size];
> >>> +	uint16_t nb_mbuf = 0, i;
> >>> +
> >>> +	used_idx = vq->vq_used_cons_idx;
> >>> +
> >>> +	if (!desc_is_used(&desc[used_idx], vq))
> >>> +		return;
> >>> +
> >>> +	id = desc[used_idx].id;
> >>> +
> >>> +	do {
> >>> +		curr_id = used_idx;
> >>> +		dxp = &vq->vq_descx[used_idx];
> >>> +		used_idx += dxp->ndescs;
> >>> +		free_cnt += dxp->ndescs;
> >>> +
> >>> +		if (dxp->cookie != NULL) {
> >>> +			mbufs[nb_mbuf] = dxp->cookie;
> >>> +			dxp->cookie = NULL;
> >>> +			nb_mbuf++;
> >>> +		}
> >>> +
> >>> +		if (used_idx >= size) {
> >>> +			used_idx -= size;
> >>> +			vq->vq_packed.used_wrap_counter ^= 1;
> >>> +		}
> >>> +	} while (curr_id != id);
> >>> +
> >>> +	for (i = 0; i < nb_mbuf; i++)
> >>> +		rte_pktmbuf_free(mbufs[i]);
> >>> +
> >>> +	vq->vq_used_cons_idx = used_idx;
> >>> +	vq->vq_free_cnt += free_cnt;
> >>> +}
> >>> +
> >>
> >>
> >> I think you can re-use the inlined non-vectorized cleanup function here.
> >> Or use your implementation in non-vectorized path.
> >> BTW, do you know we have to pass the num argument in non-vectorized
> >> case? I'm not sure I remember.
> >>
> >
> > Maxime,
> > This is a simple version of the xmit cleanup function. It is based on the
> > concept that the backend will update the used id in bursts, which also
> > matches the frontend's requirement.
>
> And what if the backend doesn't follow that concept?
> Is it just slower or broken?

It is just slower. More packets may be dropped because there is no free room
in the ring. I will replace the vectorized version with the non-vectorized
one, as it has shown good numbers.

>
> > I just found the original version works better in the loopback case. Will
> > adapt it in the next version.
> >
> > Thanks,
> > Marvin
> >
> >> Maxime
> >
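
For readers following the thread, the assumption discussed above (the backend reports used ids in bursts, so a single used id releases every descriptor up to it) can be sketched outside the driver. This is a minimal, self-contained model and not the DPDK code: `struct slot`, `struct ring`, `RING_SIZE` and `ring_cleanup()` are hypothetical stand-ins for `struct vq_desc_extra`, `struct virtqueue`, `vq_nentries` and `virtio_xmit_cleanup_packed_vec()`. It also assumes in-order completion, i.e. that the reported id is the index of a chain head that has not been reclaimed yet; if that does not hold, the loop would not terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size, for illustration only */

/* Hypothetical stand-in for struct vq_desc_extra (one entry per chain head). */
struct slot {
	void *cookie;    /* would be the mbuf pointer in the real driver */
	uint16_t ndescs; /* descriptors consumed by this packet */
};

/* Hypothetical stand-in for the fields of struct virtqueue used here. */
struct ring {
	struct slot descx[RING_SIZE];
	uint16_t used_cons_idx;
	uint16_t free_cnt;
	bool used_wrap_counter;
};

/*
 * Reclaim every slot from used_cons_idx up to and including last_id,
 * where last_id models the id read from the used descriptor.
 * Returns the number of cookies released.
 */
static uint16_t
ring_cleanup(struct ring *r, uint16_t last_id)
{
	uint16_t used_idx = r->used_cons_idx;
	uint16_t curr_id, freed = 0, released = 0;

	assert(last_id < RING_SIZE);

	do {
		struct slot *s = &r->descx[used_idx];

		curr_id = used_idx;
		assert(s->ndescs > 0); /* a zero-length chain would never advance */
		used_idx += s->ndescs;
		freed += s->ndescs;

		if (s->cookie != NULL) {
			s->cookie = NULL; /* the driver would free the mbuf here */
			released++;
		}

		/* Crossing the end of the ring flips the used wrap counter. */
		if (used_idx >= RING_SIZE) {
			used_idx -= RING_SIZE;
			r->used_wrap_counter = !r->used_wrap_counter;
		}
	} while (curr_id != last_id);

	r->used_cons_idx = used_idx;
	r->free_cnt += freed;
	return released;
}
```

With single-descriptor packets, a consumer index of 6 and a reported id of 1, the walk visits slots 6, 7, 0 and 1, wraps once, and leaves the consumer index at 2. A backend that reports used ids one at a time would still be handled correctly by this loop, just with one slot reclaimed per call, which matches the "just slower" answer given above.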