DPDK patches and discussions
* [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD
@ 2019-02-19 10:59 Tiwei Bie
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 1/5] net/virtio: fix Tx desc cleanup for packed ring Tiwei Bie
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Tiwei Bie @ 2019-02-19 10:59 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev

Below is a quick (unofficial) performance test (macfwd loop, 64B)
for the packed ring optimizations in this series on an Intel(R)
Xeon(R) Gold 6140 CPU @ 2.30GHz platform:

w/o this series:
packed ring normal/in-order:  ~10.4 Mpps

w/ this series:
packed ring normal:           ~10.9 Mpps
packed ring in-order:         ~11.3 Mpps

In the test, we need to make sure that the vhost side is fast enough,
so 4 forwarding cores are used on the vhost side and 1 forwarding core
on the virtio side.

vhost side:

./x86_64-native-linuxapp-gcc/app/testpmd \
        -l 13,14,15,16,17 \
        --socket-mem 1024,0 \
        --file-prefix=vhost \
        --vdev=net_vhost0,iface=/tmp/vhost0,queues=4 \
        -- \
        --forward-mode=mac \
        -i \
        --rxq=4 \
        --txq=4 \
        --nb-cores 4

virtio side:

./x86_64-native-linuxapp-gcc/app/testpmd \
        -l 8,9,10,11,12 \
        --socket-mem 1024,0 \
        --single-file-segments \
        --file-prefix=virtio-user \
        --vdev=virtio_user0,path=/tmp/vhost0,queues=4,in_order=1,packed_vq=1 \
        -- \
        --forward-mode=mac \
        -i \
        --rxq=4 \
        --txq=4 \
        --nb-cores 1


Tiwei Bie (5):
  net/virtio: fix Tx desc cleanup for packed ring
  net/virtio: fix in-order Tx path for split ring
  net/virtio: fix in-order Tx path for packed ring
  net/virtio: introduce a helper for clearing net header
  net/virtio: optimize xmit enqueue for packed ring

 drivers/net/virtio/virtio_ethdev.c |   4 +-
 drivers/net/virtio/virtio_rxtx.c   | 203 ++++++++++++++++++++---------
 2 files changed, 146 insertions(+), 61 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 1/5] net/virtio: fix Tx desc cleanup for packed ring
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
@ 2019-02-19 10:59 ` Tiwei Bie
  2019-02-21 11:05   ` Maxime Coquelin
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 2/5] net/virtio: fix in-order Tx path for split ring Tiwei Bie
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tiwei Bie @ 2019-02-19 10:59 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: stable

We should try to clean up at least the 'need' number of descs, i.e.
the amount to clean must not be capped by the burst size (nb_pkts).

Fixes: 892dc798fa9c ("net/virtio: implement Tx path for packed queues")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 4c701c514..b07ceac6d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -1943,7 +1943,6 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* Positive value indicates it need free vring descriptors */
 		if (unlikely(need > 0)) {
-			need = RTE_MIN(need, (int)nb_pkts);
 			virtio_xmit_cleanup_packed(vq, need);
 			need = slots - vq->vq_free_cnt;
 			if (unlikely(need > 0)) {
-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 2/5] net/virtio: fix in-order Tx path for split ring
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 1/5] net/virtio: fix Tx desc cleanup for packed ring Tiwei Bie
@ 2019-02-19 10:59 ` Tiwei Bie
  2019-02-21 11:08   ` Maxime Coquelin
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 3/5] net/virtio: fix in-order Tx path for packed ring Tiwei Bie
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tiwei Bie @ 2019-02-19 10:59 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: stable

When IN_ORDER feature is negotiated, device may just write out a
single used ring entry for a batch of buffers:

"""
Some devices always use descriptors in the same order in which they
have been made available. These devices can offer the VIRTIO_F_IN_ORDER
feature. If negotiated, this knowledge allows devices to notify the
use of a batch of buffers to the driver by only writing out a single
used ring entry with the id corresponding to the head entry of the
descriptor chain describing the last buffer in the batch.

The device then skips forward in the ring according to the size of
the batch. Accordingly, it increments the used idx by the size of
the batch.

The driver needs to look up the used id and calculate the batch size
to be able to advance to where the next used ring entry will be written
by the device.
"""

Currently, the in-order Tx path in the split ring can't handle this.
With this patch, the driver will allocate desc_extra[] entries based
on the index in the avail/used ring instead of the index in the
descriptor table. The driver can then simply rely on the used->idx
written by the device to reclaim the descriptors and Tx buffers.
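
For illustration, a rough sketch of how the cleanup amount can then be
derived on the caller side (an assumed caller shape, not part of the
diff below):

    /* Rough sketch, assuming the usual split ring layout: with
     * IN_ORDER the used index delta alone tells how many Tx buffers
     * have completed, no matter how many used ring entries the
     * device actually wrote. */
    uint16_t nb_used = (uint16_t)(vq->vq_ring.used->idx -
                                  vq->vq_used_cons_idx);

    virtio_xmit_cleanup_inorder(vq, nb_used);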

Fixes: e5f456a98d3c ("net/virtio: support in-order Rx and Tx")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index b07ceac6d..407f58bce 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -279,7 +279,7 @@ virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
 static void
 virtio_xmit_cleanup_inorder(struct virtqueue *vq, uint16_t num)
 {
-	uint16_t i, used_idx, desc_idx = 0, last_idx;
+	uint16_t i, idx = vq->vq_used_cons_idx;
 	int16_t free_cnt = 0;
 	struct vq_desc_extra *dxp = NULL;
 
@@ -287,27 +287,16 @@ virtio_xmit_cleanup_inorder(struct virtqueue *vq, uint16_t num)
 		return;
 
 	for (i = 0; i < num; i++) {
-		struct vring_used_elem *uep;
-
-		used_idx = vq->vq_used_cons_idx & (vq->vq_nentries - 1);
-		uep = &vq->vq_ring.used->ring[used_idx];
-		desc_idx = (uint16_t)uep->id;
-
-		dxp = &vq->vq_descx[desc_idx];
-		vq->vq_used_cons_idx++;
-
+		dxp = &vq->vq_descx[idx++ & (vq->vq_nentries - 1)];
+		free_cnt += dxp->ndescs;
 		if (dxp->cookie != NULL) {
 			rte_pktmbuf_free(dxp->cookie);
 			dxp->cookie = NULL;
 		}
 	}
 
-	last_idx = desc_idx + dxp->ndescs - 1;
-	free_cnt = last_idx - vq->vq_desc_tail_idx;
-	if (free_cnt <= 0)
-		free_cnt += vq->vq_nentries;
-
-	vq_ring_free_inorder(vq, last_idx, free_cnt);
+	vq->vq_free_cnt += free_cnt;
+	vq->vq_used_cons_idx = idx;
 }
 
 static inline int
@@ -556,7 +545,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 
 	while (i < num) {
 		idx = idx & (vq->vq_nentries - 1);
-		dxp = &vq->vq_descx[idx];
+		dxp = &vq->vq_descx[vq->vq_avail_idx & (vq->vq_nentries - 1)];
 		dxp->cookie = (void *)cookies[i];
 		dxp->ndescs = 1;
 
@@ -708,7 +697,10 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 
 	head_idx = vq->vq_desc_head_idx;
 	idx = head_idx;
-	dxp = &vq->vq_descx[idx];
+	if (in_order)
+		dxp = &vq->vq_descx[vq->vq_avail_idx & (vq->vq_nentries - 1)];
+	else
+		dxp = &vq->vq_descx[idx];
 	dxp->cookie = (void *)cookie;
 	dxp->ndescs = needed;
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 3/5] net/virtio: fix in-order Tx path for packed ring
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 1/5] net/virtio: fix Tx desc cleanup for packed ring Tiwei Bie
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 2/5] net/virtio: fix in-order Tx path for split ring Tiwei Bie
@ 2019-02-19 10:59 ` Tiwei Bie
  2019-02-21 11:16   ` Maxime Coquelin
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 4/5] net/virtio: introduce a helper for clearing net header Tiwei Bie
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tiwei Bie @ 2019-02-19 10:59 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: stable

When IN_ORDER feature is negotiated, device may just write out a
single used descriptor for a batch of buffers:

"""
Some devices always use descriptors in the same order in which they
have been made available. These devices can offer the VIRTIO_F_IN_ORDER
feature. If negotiated, this knowledge allows devices to notify the
use of a batch of buffers to the driver by only writing out a single
used descriptor with the Buffer ID corresponding to the last descriptor
in the batch.

The device then skips forward in the ring according to the size of the
batch. The driver needs to look up the used Buffer ID and calculate the
batch size to be able to advance to where the next used descriptor will
be written by the device.
"""

Currently, the Tx path of the packed ring can't handle this. With
this patch, when IN_ORDER is negotiated, the driver will manage the
IDs linearly, look up the used Buffer ID and advance to where the
next used descriptor will be written by the device.
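
For illustration, the core of that lookup can be sketched as below
(condensed from the diff that follows; the full version also frees the
mbufs and bounds the work by 'num'):

    /* Condensed sketch: IDs are assigned linearly, so when the device
     * marks one descriptor used with Buffer ID 'id', walk desc_extra[]
     * from the consumed index, summing ndescs, until that ID is hit. */
    id = desc[used_idx].id;
    do {
            curr_id = used_idx;
            dxp = &vq->vq_descx[used_idx];
            used_idx += dxp->ndescs;
            free_cnt += dxp->ndescs;
            if (used_idx >= vq->vq_nentries) {
                    used_idx -= vq->vq_nentries;
                    vq->used_wrap_counter ^= 1;
            }
    } while (curr_id != id);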

Fixes: 892dc798fa9c ("net/virtio: implement Tx path for packed queues")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c |  4 +-
 drivers/net/virtio/virtio_rxtx.c   | 69 ++++++++++++++++++++++++------
 2 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 846983a01..78ba7bd29 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1453,7 +1453,8 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev)
 
 	if (vtpci_packed_queue(hw)) {
 		PMD_INIT_LOG(INFO,
-			"virtio: using packed ring standard Tx path on port %u",
+			"virtio: using packed ring %s Tx path on port %u",
+			hw->use_inorder_tx ? "inorder" : "standard",
 			eth_dev->data->port_id);
 		eth_dev->tx_pkt_burst = virtio_xmit_pkts_packed;
 	} else {
@@ -2069,7 +2070,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 	if (vtpci_packed_queue(hw)) {
 		hw->use_simple_rx = 0;
 		hw->use_inorder_rx = 0;
-		hw->use_inorder_tx = 0;
 	}
 
 #if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 407f58bce..c888aa9ff 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -224,9 +224,40 @@ virtqueue_dequeue_rx_inorder(struct virtqueue *vq,
 #define DEFAULT_TX_FREE_THRESH 32
 #endif
 
-/* Cleanup from completed transmits. */
 static void
-virtio_xmit_cleanup_packed(struct virtqueue *vq, int num)
+virtio_xmit_cleanup_inorder_packed(struct virtqueue *vq, int num)
+{
+	uint16_t used_idx, id, curr_id, free_cnt = 0;
+	uint16_t size = vq->vq_nentries;
+	struct vring_packed_desc *desc = vq->ring_packed.desc_packed;
+	struct vq_desc_extra *dxp;
+
+	used_idx = vq->vq_used_cons_idx;
+	while (num > 0 && desc_is_used(&desc[used_idx], vq)) {
+		virtio_rmb(vq->hw->weak_barriers);
+		id = desc[used_idx].id;
+		do {
+			curr_id = used_idx;
+			dxp = &vq->vq_descx[used_idx];
+			used_idx += dxp->ndescs;
+			free_cnt += dxp->ndescs;
+			num -= dxp->ndescs;
+			if (used_idx >= size) {
+				used_idx -= size;
+				vq->used_wrap_counter ^= 1;
+			}
+			if (dxp->cookie != NULL) {
+				rte_pktmbuf_free(dxp->cookie);
+				dxp->cookie = NULL;
+			}
+		} while (curr_id != id);
+	}
+	vq->vq_used_cons_idx = used_idx;
+	vq->vq_free_cnt += free_cnt;
+}
+
+static void
+virtio_xmit_cleanup_normal_packed(struct virtqueue *vq, int num)
 {
 	uint16_t used_idx, id;
 	uint16_t size = vq->vq_nentries;
@@ -252,6 +283,16 @@ virtio_xmit_cleanup_packed(struct virtqueue *vq, int num)
 	}
 }
 
+/* Cleanup from completed transmits. */
+static inline void
+virtio_xmit_cleanup_packed(struct virtqueue *vq, int num, int in_order)
+{
+	if (in_order)
+		virtio_xmit_cleanup_inorder_packed(vq, num);
+	else
+		virtio_xmit_cleanup_normal_packed(vq, num);
+}
+
 static void
 virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
 {
@@ -582,7 +623,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 
 static inline void
 virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
-			      uint16_t needed, int can_push)
+			      uint16_t needed, int can_push, int in_order)
 {
 	struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
 	struct vq_desc_extra *dxp;
@@ -593,7 +634,7 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	struct virtio_net_hdr *hdr;
 	uint16_t prev;
 
-	id = vq->vq_desc_head_idx;
+	id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx;
 
 	dxp = &vq->vq_descx[id];
 	dxp->ndescs = needed;
@@ -670,13 +711,14 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	start_dp[prev].id = id;
 
 	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed);
-
-	vq->vq_desc_head_idx = dxp->next;
-	if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
-		vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END;
-
 	vq->vq_avail_idx = idx;
 
+	if (!in_order) {
+		vq->vq_desc_head_idx = dxp->next;
+		if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
+			vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END;
+	}
+
 	virtio_wmb(vq->hw->weak_barriers);
 	head_dp->flags = head_flags;
 }
@@ -1889,6 +1931,7 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct virtio_hw *hw = vq->hw;
 	uint16_t hdr_size = hw->vtnet_hdr_size;
 	uint16_t nb_tx = 0;
+	bool in_order = hw->use_inorder_tx;
 	int error;
 
 	if (unlikely(hw->started == 0 && tx_pkts != hw->inject_pkts))
@@ -1900,7 +1943,8 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
 	PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
 
 	if (nb_pkts > vq->vq_free_cnt)
-		virtio_xmit_cleanup_packed(vq, nb_pkts - vq->vq_free_cnt);
+		virtio_xmit_cleanup_packed(vq, nb_pkts - vq->vq_free_cnt,
+					   in_order);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		struct rte_mbuf *txm = tx_pkts[nb_tx];
@@ -1935,7 +1979,7 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* Positive value indicates it need free vring descriptors */
 		if (unlikely(need > 0)) {
-			virtio_xmit_cleanup_packed(vq, need);
+			virtio_xmit_cleanup_packed(vq, need, in_order);
 			need = slots - vq->vq_free_cnt;
 			if (unlikely(need > 0)) {
 				PMD_TX_LOG(ERR,
@@ -1945,7 +1989,8 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
 		}
 
 		/* Enqueue Packet buffers */
-		virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push);
+		virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push,
+					      in_order);
 
 		virtio_update_packet_stats(&txvq->stats, txm);
 	}
-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 4/5] net/virtio: introduce a helper for clearing net header
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
                   ` (2 preceding siblings ...)
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 3/5] net/virtio: fix in-order Tx path for packed ring Tiwei Bie
@ 2019-02-19 10:59 ` Tiwei Bie
  2019-02-21 11:18   ` Maxime Coquelin
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring Tiwei Bie
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Tiwei Bie @ 2019-02-19 10:59 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev

This patch introduces a helper for clearing the virtio net header
to avoid code duplication. A macro is used as it shows slightly
better performance.
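
For reference, a functionally equivalent static inline helper would
look like the sketch below (a hypothetical alternative, not what this
patch adds); the macro form was kept as it benchmarked slightly faster:

    /* Hypothetical inline-function variant of the helper added below;
     * the patch uses a macro instead because it measured slightly
     * faster. */
    static inline void
    virtio_clear_net_hdr(struct virtio_net_hdr *hdr)
    {
            ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
            ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
            ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
            ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
            ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
            ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
    }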

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 46 +++++++++++++-------------------
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c888aa9ff..60fa3aa50 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -519,6 +519,15 @@ virtio_tso_fix_cksum(struct rte_mbuf *m)
 		(var) = (val);			\
 } while (0)
 
+#define virtqueue_clear_net_hdr(_hdr) do {		\
+	ASSIGN_UNLESS_EQUAL((_hdr)->csum_start, 0);	\
+	ASSIGN_UNLESS_EQUAL((_hdr)->csum_offset, 0);	\
+	ASSIGN_UNLESS_EQUAL((_hdr)->flags, 0);		\
+	ASSIGN_UNLESS_EQUAL((_hdr)->gso_type, 0);	\
+	ASSIGN_UNLESS_EQUAL((_hdr)->gso_size, 0);	\
+	ASSIGN_UNLESS_EQUAL((_hdr)->hdr_len, 0);	\
+} while (0)
+
 static inline void
 virtqueue_xmit_offload(struct virtio_net_hdr *hdr,
 			struct rte_mbuf *cookie,
@@ -594,18 +603,11 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 			rte_pktmbuf_prepend(cookies[i], head_size);
 		cookies[i]->pkt_len -= head_size;
 
-		/* if offload disabled, it is not zeroed below, do it now */
-		if (!vq->hw->has_tx_offload) {
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
-
-		virtqueue_xmit_offload(hdr, cookies[i],
-				vq->hw->has_tx_offload);
+		/* if offload disabled, hdr is not zeroed yet, do it now */
+		if (!vq->hw->has_tx_offload)
+			virtqueue_clear_net_hdr(hdr);
+		else
+			virtqueue_xmit_offload(hdr, cookies[i], true);
 
 		start_dp[idx].addr  = VIRTIO_MBUF_DATA_DMA_ADDR(cookies[i], vq);
 		start_dp[idx].len   = cookies[i]->data_len;
@@ -659,14 +661,8 @@ virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		cookie->pkt_len -= head_size;
 
 		/* if offload disabled, it is not zeroed below, do it now */
-		if (!vq->hw->has_tx_offload) {
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
+		if (!vq->hw->has_tx_offload)
+			virtqueue_clear_net_hdr(hdr);
 	} else {
 		/* setup first tx ring slot to point to header
 		 * stored in reserved region.
@@ -758,14 +754,8 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		cookie->pkt_len -= head_size;
 
 		/* if offload disabled, it is not zeroed below, do it now */
-		if (!vq->hw->has_tx_offload) {
-			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
-			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
-		}
+		if (!vq->hw->has_tx_offload)
+			virtqueue_clear_net_hdr(hdr);
 	} else if (use_indirect) {
 		/* setup tx ring slot to point to indirect
 		 * descriptor list stored in reserved region.
-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
                   ` (3 preceding siblings ...)
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 4/5] net/virtio: introduce a helper for clearing net header Tiwei Bie
@ 2019-02-19 10:59 ` Tiwei Bie
  2019-02-21 11:22   ` Maxime Coquelin
  2019-02-19 13:40 ` [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Jason Wang
  2019-02-21 17:45 ` Maxime Coquelin
  6 siblings, 1 reply; 16+ messages in thread
From: Tiwei Bie @ 2019-02-19 10:59 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev

This patch introduces an optimized enqueue function for the packed
ring, for the case where the virtio net header can be prepended to
an unchained mbuf.
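
For context, the fast path applies roughly when the mbuf is a single
direct segment with enough headroom for the net header. A simplified
sketch of that kind of guard (illustrative only; the actual check in
virtio_xmit_pkts_packed() is stricter, also covering feature bits,
refcount and alignment):

    /* Simplified, illustrative guard for the fast path; not the exact
     * driver condition. */
    int can_push = txm->nb_segs == 1 &&
                   rte_pktmbuf_headroom(txm) >= vq->hw->vtnet_hdr_size;

    if (can_push)
            virtqueue_enqueue_xmit_packed_fast(txvq, txm, in_order);
    else
            virtqueue_enqueue_xmit_packed(txvq, txm, slots, 0, in_order);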

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 63 +++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 60fa3aa50..771d3c3f6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -623,6 +623,62 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
 	vq->vq_desc_head_idx = idx & (vq->vq_nentries - 1);
 }
 
+static inline void
+virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
+				   struct rte_mbuf *cookie,
+				   int in_order)
+{
+	struct virtqueue *vq = txvq->vq;
+	struct vring_packed_desc *dp;
+	struct vq_desc_extra *dxp;
+	uint16_t idx, id, flags;
+	uint16_t head_size = vq->hw->vtnet_hdr_size;
+	struct virtio_net_hdr *hdr;
+
+	id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx;
+	idx = vq->vq_avail_idx;
+	dp = &vq->ring_packed.desc_packed[idx];
+
+	dxp = &vq->vq_descx[id];
+	dxp->ndescs = 1;
+	dxp->cookie = cookie;
+
+	flags = vq->avail_used_flags;
+
+	/* prepend cannot fail, checked by caller */
+	hdr = (struct virtio_net_hdr *)
+		rte_pktmbuf_prepend(cookie, head_size);
+	cookie->pkt_len -= head_size;
+
+	/* if offload disabled, hdr is not zeroed yet, do it now */
+	if (!vq->hw->has_tx_offload)
+		virtqueue_clear_net_hdr(hdr);
+	else
+		virtqueue_xmit_offload(hdr, cookie, true);
+
+	dp->addr = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
+	dp->len  = cookie->data_len;
+	dp->id   = id;
+
+	if (++vq->vq_avail_idx >= vq->vq_nentries) {
+		vq->vq_avail_idx -= vq->vq_nentries;
+		vq->avail_wrap_counter ^= 1;
+		vq->avail_used_flags ^=
+			VRING_DESC_F_AVAIL(1) | VRING_DESC_F_USED(1);
+	}
+
+	vq->vq_free_cnt--;
+
+	if (!in_order) {
+		vq->vq_desc_head_idx = dxp->next;
+		if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
+			vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END;
+	}
+
+	virtio_wmb(vq->hw->weak_barriers);
+	dp->flags = flags;
+}
+
 static inline void
 virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 			      uint16_t needed, int can_push, int in_order)
@@ -1979,8 +2035,11 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
 		}
 
 		/* Enqueue Packet buffers */
-		virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push,
-					      in_order);
+		if (can_push)
+			virtqueue_enqueue_xmit_packed_fast(txvq, txm, in_order);
+		else
+			virtqueue_enqueue_xmit_packed(txvq, txm, slots, 0,
+						      in_order);
 
 		virtio_update_packet_stats(&txvq->stats, txm);
 	}
-- 
2.17.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
                   ` (4 preceding siblings ...)
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring Tiwei Bie
@ 2019-02-19 13:40 ` Jason Wang
  2019-02-20  2:23   ` Tiwei Bie
  2019-02-21 17:45 ` Maxime Coquelin
  6 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2019-02-19 13:40 UTC (permalink / raw)
  To: Tiwei Bie, maxime.coquelin, zhihong.wang, dev


On 2019/2/19 6:59 PM, Tiwei Bie wrote:
> Below is a quick (unofficial) performance test (macfwd loop, 64B)
> for the packed ring optimizations in this series on an Intel(R)
> Xeon(R) Gold 6140 CPU @ 2.30GHz platform:
>
> w/o this series:
> packed ring normal/in-order:  ~10.4 Mpps
>
> w/ this series:
> packed ring normal:           ~10.9 Mpps
> packed ring in-order:         ~11.3 Mpps


Since your series contains optimizations for split ring as well, I
wonder whether you have numbers for it too.

Thanks


>
> In the test, we need to make sure that the vhost side is fast enough,
> so 4 forwarding cores are used on the vhost side and 1 forwarding core
> on the virtio side.
>
> vhost side:
>
> ./x86_64-native-linuxapp-gcc/app/testpmd \
>          -l 13,14,15,16,17 \
>          --socket-mem 1024,0 \
>          --file-prefix=vhost \
>          --vdev=net_vhost0,iface=/tmp/vhost0,queues=4 \
>          -- \
>          --forward-mode=mac \
>          -i \
>          --rxq=4 \
>          --txq=4 \
>          --nb-cores 4
>
> virtio side:
>
> ./x86_64-native-linuxapp-gcc/app/testpmd \
>          -l 8,9,10,11,12 \
>          --socket-mem 1024,0 \
>          --single-file-segments \
>          --file-prefix=virtio-user \
>          --vdev=virtio_user0,path=/tmp/vhost0,queues=4,in_order=1,packed_vq=1 \
>          -- \
>          --forward-mode=mac \
>          -i \
>          --rxq=4 \
>          --txq=4 \
>          --nb-cores 1
>
>
> Tiwei Bie (5):
>    net/virtio: fix Tx desc cleanup for packed ring
>    net/virtio: fix in-order Tx path for split ring
>    net/virtio: fix in-order Tx path for packed ring
>    net/virtio: introduce a helper for clearing net header
>    net/virtio: optimize xmit enqueue for packed ring
>
>   drivers/net/virtio/virtio_ethdev.c |   4 +-
>   drivers/net/virtio/virtio_rxtx.c   | 203 ++++++++++++++++++++---------
>   2 files changed, 146 insertions(+), 61 deletions(-)
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD
  2019-02-19 13:40 ` [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Jason Wang
@ 2019-02-20  2:23   ` Tiwei Bie
  0 siblings, 0 replies; 16+ messages in thread
From: Tiwei Bie @ 2019-02-20  2:23 UTC (permalink / raw)
  To: Jason Wang; +Cc: maxime.coquelin, zhihong.wang, dev

On Tue, Feb 19, 2019 at 09:40:05PM +0800, Jason Wang wrote:
> On 2019/2/19 6:59 PM, Tiwei Bie wrote:
> > Below is a quick (unofficial) performance test (macfwd loop, 64B)
> > for the packed ring optimizations in this series on an Intel(R)
> > Xeon(R) Gold 6140 CPU @ 2.30GHz platform:
> > 
> > w/o this series:
> > packed ring normal/in-order:  ~10.4 Mpps
> > 
> > w/ this series:
> > packed ring normal:           ~10.9 Mpps
> > packed ring in-order:         ~11.3 Mpps
> 
> 
> Since your series contains optimizations for split ring as well, I
> wonder whether you have numbers for it too.

The PPS of split ring in-order (with or without this series) shown
by testpmd isn't stable in my test above, so I didn't manage to get
any numbers.

> 
> Thanks
> 
> 
> > 
> > In the test, we need to make sure that the vhost side is fast enough,
> > so 4 forwarding cores are used on the vhost side and 1 forwarding core
> > on the virtio side.
> > 
> > vhost side:
> > 
> > ./x86_64-native-linuxapp-gcc/app/testpmd \
> >          -l 13,14,15,16,17 \
> >          --socket-mem 1024,0 \
> >          --file-prefix=vhost \
> >          --vdev=net_vhost0,iface=/tmp/vhost0,queues=4 \
> >          -- \
> >          --forward-mode=mac \
> >          -i \
> >          --rxq=4 \
> >          --txq=4 \
> >          --nb-cores 4
> > 
> > virtio side:
> > 
> > ./x86_64-native-linuxapp-gcc/app/testpmd \
> >          -l 8,9,10,11,12 \
> >          --socket-mem 1024,0 \
> >          --single-file-segments \
> >          --file-prefix=virtio-user \
> >          --vdev=virtio_user0,path=/tmp/vhost0,queues=4,in_order=1,packed_vq=1 \
> >          -- \
> >          --forward-mode=mac \
> >          -i \
> >          --rxq=4 \
> >          --txq=4 \
> >          --nb-cores 1
> > 
> > 
> > Tiwei Bie (5):
> >    net/virtio: fix Tx desc cleanup for packed ring
> >    net/virtio: fix in-order Tx path for split ring
> >    net/virtio: fix in-order Tx path for packed ring
> >    net/virtio: introduce a helper for clearing net header
> >    net/virtio: optimize xmit enqueue for packed ring
> > 
> >   drivers/net/virtio/virtio_ethdev.c |   4 +-
> >   drivers/net/virtio/virtio_rxtx.c   | 203 ++++++++++++++++++++---------
> >   2 files changed, 146 insertions(+), 61 deletions(-)
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 1/5] net/virtio: fix Tx desc cleanup for packed ring
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 1/5] net/virtio: fix Tx desc cleanup for packed ring Tiwei Bie
@ 2019-02-21 11:05   ` Maxime Coquelin
  0 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 11:05 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: stable



On 2/19/19 11:59 AM, Tiwei Bie wrote:
> We should try to clean up at least the 'need' number of descs, i.e.
> the amount to clean must not be capped by the burst size (nb_pkts).
> 
> Fixes: 892dc798fa9c ("net/virtio: implement Tx path for packed queues")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_rxtx.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 4c701c514..b07ceac6d 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -1943,7 +1943,6 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
>   
>   		/* Positive value indicates it need free vring descriptors */
>   		if (unlikely(need > 0)) {
> -			need = RTE_MIN(need, (int)nb_pkts);
>   			virtio_xmit_cleanup_packed(vq, need);
>   			need = slots - vq->vq_free_cnt;
>   			if (unlikely(need > 0)) {
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] net/virtio: fix in-order Tx path for split ring
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 2/5] net/virtio: fix in-order Tx path for split ring Tiwei Bie
@ 2019-02-21 11:08   ` Maxime Coquelin
  0 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 11:08 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: stable



On 2/19/19 11:59 AM, Tiwei Bie wrote:
> When IN_ORDER feature is negotiated, device may just write out a
> single used ring entry for a batch of buffers:
> 
> """
> Some devices always use descriptors in the same order in which they
> have been made available. These devices can offer the VIRTIO_F_IN_ORDER
> feature. If negotiated, this knowledge allows devices to notify the
> use of a batch of buffers to the driver by only writing out a single
> used ring entry with the id corresponding to the head entry of the
> descriptor chain describing the last buffer in the batch.
> 
> The device then skips forward in the ring according to the size of
> the batch. Accordingly, it increments the used idx by the size of
> the batch.
> 
> The driver needs to look up the used id and calculate the batch size
> to be able to advance to where the next used ring entry will be written
> by the device.
> """
> 
> Currently, the in-order Tx path in the split ring can't handle this.
> With this patch, the driver will allocate desc_extra[] entries based
> on the index in the avail/used ring instead of the index in the
> descriptor table. The driver can then simply rely on the used->idx
> written by the device to reclaim the descriptors and Tx buffers.
> 
> Fixes: e5f456a98d3c ("net/virtio: support in-order Rx and Tx")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_rxtx.c | 28 ++++++++++------------------
>   1 file changed, 10 insertions(+), 18 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 3/5] net/virtio: fix in-order Tx path for packed ring
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 3/5] net/virtio: fix in-order Tx path for packed ring Tiwei Bie
@ 2019-02-21 11:16   ` Maxime Coquelin
  0 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 11:16 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: stable



On 2/19/19 11:59 AM, Tiwei Bie wrote:
> When IN_ORDER feature is negotiated, device may just write out a
> single used descriptor for a batch of buffers:
> 
> """
> Some devices always use descriptors in the same order in which they
> have been made available. These devices can offer the VIRTIO_F_IN_ORDER
> feature. If negotiated, this knowledge allows devices to notify the
> use of a batch of buffers to the driver by only writing out a single
> used descriptor with the Buffer ID corresponding to the last descriptor
> in the batch.
> 
> The device then skips forward in the ring according to the size of the
> batch. The driver needs to look up the used Buffer ID and calculate the
> batch size to be able to advance to where the next used descriptor will
> be written by the device.
> """
> 
> Currently, the Tx path of the packed ring can't handle this. With
> this patch, when IN_ORDER is negotiated, the driver will manage the
> IDs linearly, look up the used Buffer ID and advance to where the
> next used descriptor will be written by the device.
> 
> Fixes: 892dc798fa9c ("net/virtio: implement Tx path for packed queues")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_ethdev.c |  4 +-
>   drivers/net/virtio/virtio_rxtx.c   | 69 ++++++++++++++++++++++++------
>   2 files changed, 59 insertions(+), 14 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 4/5] net/virtio: introduce a helper for clearing net header
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 4/5] net/virtio: introduce a helper for clearing net header Tiwei Bie
@ 2019-02-21 11:18   ` Maxime Coquelin
  0 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 11:18 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev



On 2/19/19 11:59 AM, Tiwei Bie wrote:
> This patch introduces a helper for clearing the virtio net header
> to avoid code duplication. A macro is used as it shows slightly
> better performance.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_rxtx.c | 46 +++++++++++++-------------------
>   1 file changed, 18 insertions(+), 28 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring
  2019-02-19 10:59 ` [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring Tiwei Bie
@ 2019-02-21 11:22   ` Maxime Coquelin
  2019-02-21 12:25     ` Tiwei Bie
  0 siblings, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 11:22 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev



On 2/19/19 11:59 AM, Tiwei Bie wrote:
> This patch introduces an optimized enqueue function for the packed
> ring, for the case where the virtio net header can be prepended to
> an unchained mbuf.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_rxtx.c | 63 +++++++++++++++++++++++++++++++-
>   1 file changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 60fa3aa50..771d3c3f6 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -623,6 +623,62 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
>   	vq->vq_desc_head_idx = idx & (vq->vq_nentries - 1);
>   }
>   
> +static inline void
> +virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
> +				   struct rte_mbuf *cookie,
> +				   int in_order)
> +{
> +	struct virtqueue *vq = txvq->vq;
> +	struct vring_packed_desc *dp;
> +	struct vq_desc_extra *dxp;
> +	uint16_t idx, id, flags;
> +	uint16_t head_size = vq->hw->vtnet_hdr_size;
> +	struct virtio_net_hdr *hdr;
> +
> +	id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx;
> +	idx = vq->vq_avail_idx;
> +	dp = &vq->ring_packed.desc_packed[idx];
> +
> +	dxp = &vq->vq_descx[id];
> +	dxp->ndescs = 1;
> +	dxp->cookie = cookie;
> +
> +	flags = vq->avail_used_flags;
> +
> +	/* prepend cannot fail, checked by caller */
> +	hdr = (struct virtio_net_hdr *)
> +		rte_pktmbuf_prepend(cookie, head_size);
> +	cookie->pkt_len -= head_size;
> +
> +	/* if offload disabled, hdr is not zeroed yet, do it now */
> +	if (!vq->hw->has_tx_offload)
> +		virtqueue_clear_net_hdr(hdr);
> +	else
> +		virtqueue_xmit_offload(hdr, cookie, true);
> +
> +	dp->addr = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
> +	dp->len  = cookie->data_len;
> +	dp->id   = id;
> +
> +	if (++vq->vq_avail_idx >= vq->vq_nentries) {
> +		vq->vq_avail_idx -= vq->vq_nentries;
> +		vq->avail_wrap_counter ^= 1;
> +		vq->avail_used_flags ^=
> +			VRING_DESC_F_AVAIL(1) | VRING_DESC_F_USED(1);
> +	}
> +
> +	vq->vq_free_cnt--;
> +
> +	if (!in_order) {
> +		vq->vq_desc_head_idx = dxp->next;
> +		if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
> +			vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END;
> +	}
> +
> +	virtio_wmb(vq->hw->weak_barriers);
> +	dp->flags = flags;
> +}
> +
>   static inline void
>   virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>   			      uint16_t needed, int can_push, int in_order)
> @@ -1979,8 +2035,11 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
>   		}
>   
>   		/* Enqueue Packet buffers */
> -		virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push,
> -					      in_order);
> +		if (can_push)
> +			virtqueue_enqueue_xmit_packed_fast(txvq, txm, in_order);
> +		else
> +			virtqueue_enqueue_xmit_packed(txvq, txm, slots, 0,
> +						      in_order);
>   
>   		virtio_update_packet_stats(&txvq->stats, txm);
>   	}
> 

I like this patch, but shouldn't virtqueue_enqueue_xmit_packed() be
simplified to get rid of "can_push" now that this case has a
dedicated function?

Regards,
Maxime

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring
  2019-02-21 11:22   ` Maxime Coquelin
@ 2019-02-21 12:25     ` Tiwei Bie
  2019-02-21 12:31       ` Maxime Coquelin
  0 siblings, 1 reply; 16+ messages in thread
From: Tiwei Bie @ 2019-02-21 12:25 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: zhihong.wang, dev

On Thu, Feb 21, 2019 at 12:22:29PM +0100, Maxime Coquelin wrote:
> On 2/19/19 11:59 AM, Tiwei Bie wrote:
> > This patch introduces an optimized enqueue function for the packed
> > ring, for the case where the virtio net header can be prepended to
> > an unchained mbuf.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/net/virtio/virtio_rxtx.c | 63 +++++++++++++++++++++++++++++++-
> >   1 file changed, 61 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> > index 60fa3aa50..771d3c3f6 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -623,6 +623,62 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
> >   	vq->vq_desc_head_idx = idx & (vq->vq_nentries - 1);
> >   }
> > +static inline void
> > +virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
> > +				   struct rte_mbuf *cookie,
> > +				   int in_order)
> > +{
> > +	struct virtqueue *vq = txvq->vq;
> > +	struct vring_packed_desc *dp;
> > +	struct vq_desc_extra *dxp;
> > +	uint16_t idx, id, flags;
> > +	uint16_t head_size = vq->hw->vtnet_hdr_size;
> > +	struct virtio_net_hdr *hdr;
> > +
> > +	id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx;
> > +	idx = vq->vq_avail_idx;
> > +	dp = &vq->ring_packed.desc_packed[idx];
> > +
> > +	dxp = &vq->vq_descx[id];
> > +	dxp->ndescs = 1;
> > +	dxp->cookie = cookie;
> > +
> > +	flags = vq->avail_used_flags;
> > +
> > +	/* prepend cannot fail, checked by caller */
> > +	hdr = (struct virtio_net_hdr *)
> > +		rte_pktmbuf_prepend(cookie, head_size);
> > +	cookie->pkt_len -= head_size;
> > +
> > +	/* if offload disabled, hdr is not zeroed yet, do it now */
> > +	if (!vq->hw->has_tx_offload)
> > +		virtqueue_clear_net_hdr(hdr);
> > +	else
> > +		virtqueue_xmit_offload(hdr, cookie, true);
> > +
> > +	dp->addr = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
> > +	dp->len  = cookie->data_len;
> > +	dp->id   = id;
> > +
> > +	if (++vq->vq_avail_idx >= vq->vq_nentries) {
> > +		vq->vq_avail_idx -= vq->vq_nentries;
> > +		vq->avail_wrap_counter ^= 1;
> > +		vq->avail_used_flags ^=
> > +			VRING_DESC_F_AVAIL(1) | VRING_DESC_F_USED(1);
> > +	}
> > +
> > +	vq->vq_free_cnt--;
> > +
> > +	if (!in_order) {
> > +		vq->vq_desc_head_idx = dxp->next;
> > +		if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
> > +			vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END;
> > +	}
> > +
> > +	virtio_wmb(vq->hw->weak_barriers);
> > +	dp->flags = flags;
> > +}
> > +
> >   static inline void
> >   virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
> >   			      uint16_t needed, int can_push, int in_order)
> > @@ -1979,8 +2035,11 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
> >   		}
> >   		/* Enqueue Packet buffers */
> > -		virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push,
> > -					      in_order);
> > +		if (can_push)
> > +			virtqueue_enqueue_xmit_packed_fast(txvq, txm, in_order);
> > +		else
> > +			virtqueue_enqueue_xmit_packed(txvq, txm, slots, 0,
> > +						      in_order);
> >   		virtio_update_packet_stats(&txvq->stats, txm);
> >   	}
> > 
> 
> I like this patch, but shouldn't virtqueue_enqueue_xmit_packed() be
> simplified to get rid of "can_push" now that this case has a
> dedicated function?

Yeah, I had the same thought. But on second thought, I think we may
also want to push the net hdr into the mbuf even if its nb_segs
isn't 1 in the future, so I left it untouched.

Thanks,
Tiwei

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 5/5] net/virtio: optimize xmit enqueue for packed ring
  2019-02-21 12:25     ` Tiwei Bie
@ 2019-02-21 12:31       ` Maxime Coquelin
  0 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 12:31 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, dev



On 2/21/19 1:25 PM, Tiwei Bie wrote:
> On Thu, Feb 21, 2019 at 12:22:29PM +0100, Maxime Coquelin wrote:
>> On 2/19/19 11:59 AM, Tiwei Bie wrote:
>>> This patch introduces an optimized enqueue function for the packed
>>> ring, for the case where the virtio net header can be prepended to
>>> an unchained mbuf.
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/net/virtio/virtio_rxtx.c | 63 +++++++++++++++++++++++++++++++-
>>>    1 file changed, 61 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>>> index 60fa3aa50..771d3c3f6 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -623,6 +623,62 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx *txvq,
>>>    	vq->vq_desc_head_idx = idx & (vq->vq_nentries - 1);
>>>    }
>>> +static inline void
>>> +virtqueue_enqueue_xmit_packed_fast(struct virtnet_tx *txvq,
>>> +				   struct rte_mbuf *cookie,
>>> +				   int in_order)
>>> +{
>>> +	struct virtqueue *vq = txvq->vq;
>>> +	struct vring_packed_desc *dp;
>>> +	struct vq_desc_extra *dxp;
>>> +	uint16_t idx, id, flags;
>>> +	uint16_t head_size = vq->hw->vtnet_hdr_size;
>>> +	struct virtio_net_hdr *hdr;
>>> +
>>> +	id = in_order ? vq->vq_avail_idx : vq->vq_desc_head_idx;
>>> +	idx = vq->vq_avail_idx;
>>> +	dp = &vq->ring_packed.desc_packed[idx];
>>> +
>>> +	dxp = &vq->vq_descx[id];
>>> +	dxp->ndescs = 1;
>>> +	dxp->cookie = cookie;
>>> +
>>> +	flags = vq->avail_used_flags;
>>> +
>>> +	/* prepend cannot fail, checked by caller */
>>> +	hdr = (struct virtio_net_hdr *)
>>> +		rte_pktmbuf_prepend(cookie, head_size);
>>> +	cookie->pkt_len -= head_size;
>>> +
>>> +	/* if offload disabled, hdr is not zeroed yet, do it now */
>>> +	if (!vq->hw->has_tx_offload)
>>> +		virtqueue_clear_net_hdr(hdr);
>>> +	else
>>> +		virtqueue_xmit_offload(hdr, cookie, true);
>>> +
>>> +	dp->addr = VIRTIO_MBUF_DATA_DMA_ADDR(cookie, vq);
>>> +	dp->len  = cookie->data_len;
>>> +	dp->id   = id;
>>> +
>>> +	if (++vq->vq_avail_idx >= vq->vq_nentries) {
>>> +		vq->vq_avail_idx -= vq->vq_nentries;
>>> +		vq->avail_wrap_counter ^= 1;
>>> +		vq->avail_used_flags ^=
>>> +			VRING_DESC_F_AVAIL(1) | VRING_DESC_F_USED(1);
>>> +	}
>>> +
>>> +	vq->vq_free_cnt--;
>>> +
>>> +	if (!in_order) {
>>> +		vq->vq_desc_head_idx = dxp->next;
>>> +		if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
>>> +			vq->vq_desc_tail_idx = VQ_RING_DESC_CHAIN_END;
>>> +	}
>>> +
>>> +	virtio_wmb(vq->hw->weak_barriers);
>>> +	dp->flags = flags;
>>> +}
>>> +
>>>    static inline void
>>>    virtqueue_enqueue_xmit_packed(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>>>    			      uint16_t needed, int can_push, int in_order)
>>> @@ -1979,8 +2035,11 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
>>>    		}
>>>    		/* Enqueue Packet buffers */
>>> -		virtqueue_enqueue_xmit_packed(txvq, txm, slots, can_push,
>>> -					      in_order);
>>> +		if (can_push)
>>> +			virtqueue_enqueue_xmit_packed_fast(txvq, txm, in_order);
>>> +		else
>>> +			virtqueue_enqueue_xmit_packed(txvq, txm, slots, 0,
>>> +						      in_order);
>>>    		virtio_update_packet_stats(&txvq->stats, txm);
>>>    	}
>>>
>>
>> I like this patch, but shouldn't virtqueue_enqueue_xmit_packed() be
>> simplified to get rid of "can_push" now that this case has a
>> dedicated function?
> 
> Yeah, I had the same thought. But on second thought, I think we may
> also want to push the net hdr into the mbuf even if its nb_segs
> isn't 1 in the future, so I left it untouched.

OK, that makes sense. And I think making the change won't affect the
generated code much anyway, as the "can_push"-related code will be
stripped out at compile time.
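
For instance (a toy illustration only, not driver code):

    /* With a constant argument to a static inline function, the
     * compiler folds the test and drops the dead branch, so passing a
     * literal 0 for can_push costs nothing at run time. */
    static inline int pick(int can_push)
    {
            if (can_push)
                    return 1;       /* dead code for pick(0) */
            return 0;
    }
    /* pick(0) compiles down to a plain "return 0" after constant
     * propagation. */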

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

> Thanks,
> Tiwei
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD
  2019-02-19 10:59 [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Tiwei Bie
                   ` (5 preceding siblings ...)
  2019-02-19 13:40 ` [dpdk-dev] [PATCH 0/5] Fixes and enhancements for Tx path in Virtio PMD Jason Wang
@ 2019-02-21 17:45 ` Maxime Coquelin
  6 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2019-02-21 17:45 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev



On 2/19/19 11:59 AM, Tiwei Bie wrote:
> Below is a quick (unofficial) performance test (macfwd loop, 64B)
> for the packed ring optimizations in this series on an Intel(R)
> Xeon(R) Gold 6140 CPU @ 2.30GHz platform:
> 
> w/o this series:
> packed ring normal/in-order:  ~10.4 Mpps
> 
> w/ this series:
> packed ring normal:           ~10.9 Mpps
> packed ring in-order:         ~11.3 Mpps
> 
> In the test, we need to make sure that the vhost side is fast enough,
> so 4 forwarding cores are used on the vhost side and 1 forwarding core
> on the virtio side.
> 
> vhost side:
> 
> ./x86_64-native-linuxapp-gcc/app/testpmd \
>          -l 13,14,15,16,17 \
>          --socket-mem 1024,0 \
>          --file-prefix=vhost \
>          --vdev=net_vhost0,iface=/tmp/vhost0,queues=4 \
>          -- \
>          --forward-mode=mac \
>          -i \
>          --rxq=4 \
>          --txq=4 \
>          --nb-cores 4
> 
> virtio side:
> 
> ./x86_64-native-linuxapp-gcc/app/testpmd \
>          -l 8,9,10,11,12 \
>          --socket-mem 1024,0 \
>          --single-file-segments \
>          --file-prefix=virtio-user \
>          --vdev=virtio_user0,path=/tmp/vhost0,queues=4,in_order=1,packed_vq=1 \
>          -- \
>          --forward-mode=mac \
>          -i \
>          --rxq=4 \
>          --txq=4 \
>          --nb-cores 1
> 
> 
> Tiwei Bie (5):
>    net/virtio: fix Tx desc cleanup for packed ring
>    net/virtio: fix in-order Tx path for split ring
>    net/virtio: fix in-order Tx path for packed ring
>    net/virtio: introduce a helper for clearing net header
>    net/virtio: optimize xmit enqueue for packed ring
> 
>   drivers/net/virtio/virtio_ethdev.c |   4 +-
>   drivers/net/virtio/virtio_rxtx.c   | 203 ++++++++++++++++++++---------
>   2 files changed, 146 insertions(+), 61 deletions(-)
> 

Applied to dpdk-next-virtio/master.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 16+ messages in thread
