From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 13D6AA00C2;
	Fri, 24 Apr 2020 15:33:59 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id E064D1D168;
	Fri, 24 Apr 2020 15:33:58 +0200 (CEST)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
	by dpdk.org (Postfix) with ESMTP id EC2D91C2F9
	for ; Fri, 24 Apr 2020 15:33:56 +0200 (CEST)
IronPort-SDR: v1SSxqwPx+FIDrJfBtnxaPgVzzgqdCd1T4bckrsTe/Qxd1txA+BBc4lCc9CaVrLllc6E2ug0Ev
 wO/XPVQR8lMg==
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
	by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	24 Apr 2020 06:33:56 -0700
IronPort-SDR: dgNojNeK2nxYjJ023QjfiKN02qA90JGu7a3vpRBzjGMBhHOF2P9CDKw8uDbTdwJhA2CRSk81Ct
 LiOHaRkXr5XQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.73,311,1583222400"; d="scan'208";a="248065532"
Received: from fmsmsx106.amr.corp.intel.com ([10.18.124.204])
	by fmsmga008.fm.intel.com with ESMTP; 24 Apr 2020 06:33:54 -0700
Received: from fmsmsx121.amr.corp.intel.com (10.18.125.36)
	by FMSMSX106.amr.corp.intel.com (10.18.124.204)
	with Microsoft SMTP Server (TLS) id 14.3.439.0;
	Fri, 24 Apr 2020 06:33:48 -0700
Received: from shsmsx108.ccr.corp.intel.com (10.239.4.97)
	by fmsmsx121.amr.corp.intel.com (10.18.125.36)
	with Microsoft SMTP Server (TLS) id 14.3.439.0;
	Fri, 24 Apr 2020 06:33:48 -0700
Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.146])
	by SHSMSX108.ccr.corp.intel.com ([169.254.8.7])
	with mapi id 14.03.0439.000; Fri, 24 Apr 2020 21:33:45 +0800
From: "Liu, Yong"
To: Maxime Coquelin, "Ye, Xiaolong", "Wang, Zhihong"
CC: "dev@dpdk.org", "Van Haaren, Harry"
Thread-Topic: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
Thread-Index: AQHWGdql0lmObK2O1kCKL6rRtYeq5qiHrlGAgACTa/A=
Date: Fri, 24 Apr 2020 13:33:45 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E63543297@SHSMSX103.ccr.corp.intel.com>
References: <20200313174230.74661-1-yong.liu@intel.com>
 <20200424092445.44693-1-yong.liu@intel.com>
 <20200424092445.44693-8-yong.liu@intel.com>
 <94281b4c-2b05-4cca-7df8-93cbdf6a4f74@redhat.com>
In-Reply-To: <94281b4c-2b05-4cca-7df8-93cbdf6a4f74@redhat.com>
Accept-Language: zh-CN, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
dlp-product: dlpe-windows
dlp-version: 11.2.0.6
dlp-reaction: no-action
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v9 7/9] net/virtio: add vectorized packed
 ring Tx path
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

> -----Original Message-----
> From: Maxime Coquelin
> Sent: Friday, April 24, 2020 8:30 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong
> Cc: dev@dpdk.org; Van Haaren, Harry
> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
>
>
>
> On 4/24/20 11:24 AM, Marvin Liu wrote:
> > Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
>
> s/alike/like/ ?
>
> > single Tx functions. Batch function is further optimized by AVX512
> > instructions.
> >
> > Signed-off-by: Marvin Liu
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> > index 5c112cac7..b7d52d497 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue,
> struct rte_mbuf **rx_pkts,
> >  uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf
> **rx_pkts,
> >  		uint16_t nb_pkts);
> >
> > +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf
> **tx_pkts,
> > +		uint16_t nb_pkts);
> > +
> >  int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
> >
> >  void virtio_interrupt_handler(void *param);
> > diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> > index cf18fe564..f82fe8d64 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue
> __rte_unused,
> >  {
> >  	return 0;
> >  }
> > +
> > +__rte_weak uint16_t
> > +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
> > +			    struct rte_mbuf **tx_pkts __rte_unused,
> > +			    uint16_t nb_pkts __rte_unused)
> > +{
> > +	return 0;
> > +}
> > diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > index 8a7b459eb..c023ace4e 100644
> > --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > @@ -23,6 +23,24 @@
> >  #define PACKED_FLAGS_MASK ((0ULL |
> VRING_PACKED_DESC_F_AVAIL_USED) << \
> >  	FLAGS_BITS_OFFSET)
> >
> > +/* reference count offset in mbuf rearm data */
> > +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> > +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +/* segment number offset in mbuf rearm data */
> > +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> > +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +
> > +/* default rearm data */
> > +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
> > +	1ULL << REFCNT_BITS_OFFSET)
> > +
> > +/* id bits offset in packed ring desc higher 64bits */
> > +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
> > +	offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
> > +
> > +/* net hdr short size mask */
> > +#define NET_HDR_MASK 0x3F
> > +
> >  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> >  	sizeof(struct vring_packed_desc))
> >  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> > @@ -47,6 +65,48 @@
> >  	for (iter = val; iter < num; iter++)
> >  #endif
> >
> > +static inline void
> > +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
> > +{
> > +	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
> > +	struct vq_desc_extra *dxp;
> > +	uint16_t used_idx, id, curr_id, free_cnt = 0;
> > +	uint16_t size = vq->vq_nentries;
> > +	struct rte_mbuf *mbufs[size];
> > +	uint16_t nb_mbuf = 0, i;
> > +
> > +	used_idx = vq->vq_used_cons_idx;
> > +
> > +	if (!desc_is_used(&desc[used_idx], vq))
> > +		return;
> > +
> > +	id = desc[used_idx].id;
> > +
> > +	do {
> > +		curr_id = used_idx;
> > +		dxp = &vq->vq_descx[used_idx];
> > +		used_idx += dxp->ndescs;
> > +		free_cnt += dxp->ndescs;
> > +
> > +		if (dxp->cookie != NULL) {
> > +			mbufs[nb_mbuf] = dxp->cookie;
> > +			dxp->cookie = NULL;
> > +			nb_mbuf++;
> > +		}
> > +
> > +		if (used_idx >= size) {
> > +			used_idx -= size;
> > +			vq->vq_packed.used_wrap_counter ^= 1;
> > +		}
> > +	} while (curr_id != id);
> > +
> > +	for (i = 0; i < nb_mbuf; i++)
> > +		rte_pktmbuf_free(mbufs[i]);
> > +
> > +	vq->vq_used_cons_idx = used_idx;
> > +	vq->vq_free_cnt += free_cnt;
> > +}
> > +
>
>
> I think you can re-use the inlined non-vectorized cleanup function here.
> Or use your implementation in non-vectorized path.
> BTW, do you know we have to pass the num argument in non-vectorized
> case? I'm not sure to remember.
>

Maxime,
This is a simple version of the xmit cleanup function. It is based on the
concept that the backend will update the used id in bursts, which also
matches the frontend's requirement.
I just found that the original version works better in the loopback case.
Will adapt it in the next version.

Thanks,
Marvin

> Maxime
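
To illustrate the burst-update idea described in the reply, here is a minimal, self-contained C sketch; it is not the driver code. It assumes in-order completion, where a buffer id equals its ring slot index, so one used descriptor carrying the id of the last buffer in a burst lets the frontend walk and free the whole burst. All `sketch_*` names, `RING_SIZE`, and the `used` field are hypothetical stand-ins for `struct virtqueue`, `struct vq_desc_extra`, and `desc_is_used()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size for this sketch */

/* Hypothetical, simplified stand-in for a packed ring descriptor. */
struct sketch_desc {
	uint16_t id; /* buffer id written back by the backend */
	int used;    /* stand-in for the AVAIL/USED flag check */
};

/* Hypothetical stand-in for the per-slot bookkeeping. */
struct sketch_extra {
	uint16_t ndescs; /* descriptors consumed by this packet */
	void *cookie;    /* stand-in for the mbuf pointer */
};

struct sketch_vq {
	struct sketch_desc desc[RING_SIZE];
	struct sketch_extra descx[RING_SIZE];
	uint16_t used_cons_idx;
	uint16_t free_cnt;
	uint8_t used_wrap_counter;
};

/* Walk from the consumer index up to the slot named by the written-back
 * id, reclaiming every descriptor on the way. Returns the number of
 * cookies released. */
static uint16_t
sketch_cleanup(struct sketch_vq *vq)
{
	uint16_t used_idx = vq->used_cons_idx;
	uint16_t id, curr_id, free_cnt = 0, released = 0;

	if (!vq->desc[used_idx].used)
		return 0;

	/* Assumption: the id is the slot index of the last buffer in the burst. */
	id = vq->desc[used_idx].id;

	do {
		struct sketch_extra *dxp = &vq->descx[used_idx];

		assert(dxp->ndescs > 0); /* otherwise the walk cannot advance */
		curr_id = used_idx;
		used_idx += dxp->ndescs;
		free_cnt += dxp->ndescs;

		if (dxp->cookie != NULL) {
			dxp->cookie = NULL; /* the driver would free the mbuf here */
			released++;
		}

		if (used_idx >= RING_SIZE) {
			used_idx -= RING_SIZE;
			vq->used_wrap_counter ^= 1;
		}
	} while (curr_id != id);

	vq->used_cons_idx = used_idx;
	vq->free_cnt += free_cnt;
	return released;
}

/* Builds a ring with three one-descriptor packets in slots 6, 7 and 0,
 * marks the burst as used with a single write-back at slot 6, and
 * returns how many cookies one cleanup call releases. */
static uint16_t
sketch_demo(void)
{
	static int pkt[3];
	struct sketch_vq vq = { .used_cons_idx = 6, .used_wrap_counter = 1 };
	const uint16_t slots[3] = { 6, 7, 0 };

	for (int i = 0; i < 3; i++) {
		vq.descx[slots[i]].ndescs = 1;
		vq.descx[slots[i]].cookie = &pkt[i];
	}
	vq.desc[6].used = 1;
	vq.desc[6].id = 0; /* id of the last buffer in the burst */

	return sketch_cleanup(&vq);
}
```

In this sketch, a single write-back with id 0 releases slots 6, 7 and 0 in one call, crossing the ring boundary and flipping the wrap counter once. If the backend does not complete buffers in order, the id no longer names the end of a contiguous run, and this walk would not apply.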