* [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
@ 2015-10-19 5:16 Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit Stephen Hemminger
` (6 more replies)
0 siblings, 7 replies; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 5:16 UTC (permalink / raw)
To: huawei.xie, changchun.ouyang; +Cc: dev
This is a tested version of the virtio Tx performance improvements
that I posted earlier on the list, and described at the DPDK Userspace
meeting in Dublin. Together they give a 25% performance improvement for
both the small-packet and the large multi-segment packet case when testing
from a DPDK guest application to a Linux KVM host.
Stephen Hemminger (5):
virtio: clean up space checks on xmit
virtio: don't use unlikely for normal tx stuff
virtio: use indirect ring elements
virtio: use any layout on transmit
virtio: optimize transmit enqueue
drivers/net/virtio/virtio_ethdev.c | 38 +++++++---
drivers/net/virtio/virtio_ethdev.h | 4 +-
drivers/net/virtio/virtio_rxtx.c | 150 ++++++++++++++++++++-----------------
drivers/net/virtio/virtqueue.h | 19 +++++
4 files changed, 130 insertions(+), 81 deletions(-)
--
2.1.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
@ 2015-10-19 5:16 ` Stephen Hemminger
2015-10-19 8:02 ` Xie, Huawei
2015-10-19 5:16 ` [dpdk-dev] [PATCH 2/5] virtio: don't use unlikely for normal tx stuff Stephen Hemminger
` (5 subsequent siblings)
6 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 5:16 UTC (permalink / raw)
To: huawei.xie, changchun.ouyang; +Cc: dev
The space check for the transmit ring only needs a single conditional,
i.e. we only need to recheck for space if there was no space in the first
check. This helps performance and simplifies the loop.
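In rough outline (a simplified sketch, not the literal diff below; the
cleanup budget passed to virtio_xmit_cleanup is elided), the loop becomes:

	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
		/* descriptors still missing for this packet (header + segments) */
		int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;

		if (unlikely(need > 0)) {
			/* reclaim completed descriptors, then recheck exactly once */
			virtio_xmit_cleanup(txvq, /* ... */);
			need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
			if (unlikely(need > 0))
				break;	/* still no room: stop transmitting */
		}

		/* VLAN insertion and enqueue then run unconditionally */
	}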
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/virtio/virtio_rxtx.c | 66 ++++++++++++++++------------------------
1 file changed, 27 insertions(+), 39 deletions(-)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..5b50ed0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -745,7 +745,6 @@ uint16_t
virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
struct virtqueue *txvq = tx_queue;
- struct rte_mbuf *txm;
uint16_t nb_used, nb_tx;
int error;
@@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
virtio_xmit_cleanup(txvq, nb_used);
- nb_tx = 0;
+ for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+ struct rte_mbuf *txm = tx_pkts[nb_tx];
+ int need = txm->nb_segs - txvq->vq_free_cnt + 1;
- while (nb_tx < nb_pkts) {
- /* Need one more descriptor for virtio header. */
- int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
-
- /*Positive value indicates it need free vring descriptors */
+ /* Positive value indicates it need free vring descriptors */
if (unlikely(need > 0)) {
nb_used = VIRTQUEUE_NUSED(txvq);
virtio_rmb();
need = RTE_MIN(need, (int)nb_used);
virtio_xmit_cleanup(txvq, need);
- need = (int)tx_pkts[nb_tx]->nb_segs -
- txvq->vq_free_cnt + 1;
- }
-
- /*
- * Zero or negative value indicates it has enough free
- * descriptors to use for transmitting.
- */
- if (likely(need <= 0)) {
- txm = tx_pkts[nb_tx];
-
- /* Do VLAN tag insertion */
- if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
- error = rte_vlan_insert(&txm);
- if (unlikely(error)) {
- rte_pktmbuf_free(txm);
- ++nb_tx;
- continue;
- }
+ need = txm->nb_segs - txvq->vq_free_cnt + 1;
+ if (unlikely(need > 0)) {
+ PMD_TX_LOG(ERR,
+ "No free tx descriptors to transmit");
+ break;
}
+ }
- /* Enqueue Packet buffers */
- error = virtqueue_enqueue_xmit(txvq, txm);
+ /* Do VLAN tag insertion */
+ if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
+ error = rte_vlan_insert(&txm);
if (unlikely(error)) {
- if (error == ENOSPC)
- PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
- else if (error == EMSGSIZE)
- PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
- else
- PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
- break;
+ rte_pktmbuf_free(txm);
+ continue;
}
- nb_tx++;
- txvq->bytes += txm->pkt_len;
- } else {
- PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
+ }
+
+ /* Enqueue Packet buffers */
+ error = virtqueue_enqueue_xmit(txvq, txm);
+ if (unlikely(error)) {
+ if (error == ENOSPC)
+ PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
+ else if (error == EMSGSIZE)
+ PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
+ else
+ PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
break;
}
+ txvq->bytes += txm->pkt_len;
}
txvq->packets += nb_tx;
--
2.1.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH 2/5] virtio: don't use unlikely for normal tx stuff
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit Stephen Hemminger
@ 2015-10-19 5:16 ` Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements Stephen Hemminger
` (4 subsequent siblings)
6 siblings, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 5:16 UTC (permalink / raw)
To: huawei.xie, changchun.ouyang; +Cc: dev
Don't use unlikely() for VLAN tag insertion or for the ring getting full.
GCC does not optimize code on unlikely paths, and since both of these can
happen during normal operation, marking them unlikely hurts performance.
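For reference, likely()/unlikely() boil down to GCC's __builtin_expect()
(a minimal sketch, not the exact rte_branch_prediction.h contents):

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

	/* GCC lays out a branch tagged unlikely() off the straight-line
	 * path, so tagging something that fires on ordinary traffic (VLAN
	 * insertion, the ring filling up) costs cycles in the common case. */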
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/virtio/virtio_rxtx.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5b50ed0..dbe6665 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -763,7 +763,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
int need = txm->nb_segs - txvq->vq_free_cnt + 1;
/* Positive value indicates it need free vring descriptors */
- if (unlikely(need > 0)) {
+ if (need > 0) {
nb_used = VIRTQUEUE_NUSED(txvq);
virtio_rmb();
need = RTE_MIN(need, (int)nb_used);
@@ -778,7 +778,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
/* Do VLAN tag insertion */
- if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
+ if (txm->ol_flags & PKT_TX_VLAN_PKT) {
error = rte_vlan_insert(&txm);
if (unlikely(error)) {
rte_pktmbuf_free(txm);
@@ -798,10 +798,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
break;
}
txvq->bytes += txm->pkt_len;
+ ++txvq->packets;
}
- txvq->packets += nb_tx;
-
if (likely(nb_tx)) {
vq_update_avail_idx(txvq);
--
2.1.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 2/5] virtio: don't use unlikely for normal tx stuff Stephen Hemminger
@ 2015-10-19 5:16 ` Stephen Hemminger
2015-10-19 13:19 ` Xie, Huawei
2015-10-30 18:01 ` Thomas Monjalon
2015-10-19 5:16 ` [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit Stephen Hemminger
` (3 subsequent siblings)
6 siblings, 2 replies; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 5:16 UTC (permalink / raw)
To: huawei.xie, changchun.ouyang; +Cc: dev
The virtio ring in QEMU/KVM is usually limited to 256 entries, and the
normal way the virtio driver was queuing mbufs required nsegs + 1 ring
elements per packet. By using the indirect ring element feature when it is
available, each packet takes only one ring slot, even for multi-segment
packets.
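For context, this is roughly the per-slot layout the patch sets up (a
sketch with simplified types; the real definitions are in the virtio
headers and in virtqueue.h below):

	struct vring_desc {		/* standard virtio descriptor */
		uint64_t addr;		/* guest-physical buffer address */
		uint32_t len;
		uint16_t flags;		/* NEXT / WRITE / INDIRECT */
		uint16_t next;
	};

	/* One ring descriptor flagged VRING_DESC_F_INDIRECT points at the
	 * per-slot table, so the whole packet costs a single ring slot:
	 *
	 *   ring desc:      addr  = &txr[idx].tx_indir[0]
	 *                   len   = (nb_segs + 1) * sizeof(struct vring_desc)
	 *                   flags = VRING_DESC_F_INDIRECT
	 *   tx_indir[0]:    the virtio_net_hdr for this packet
	 *   tx_indir[1..n]: one entry per mbuf segment, chained with F_NEXT
	 */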
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/virtio/virtio_ethdev.c | 38 +++++++++++++++++------
drivers/net/virtio/virtio_ethdev.h | 3 +-
drivers/net/virtio/virtio_rxtx.c | 62 +++++++++++++++++++++++++++-----------
drivers/net/virtio/virtqueue.h | 19 ++++++++++++
4 files changed, 94 insertions(+), 28 deletions(-)
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..cfce4f0 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -357,27 +357,45 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mem = 0;
if (queue_type == VTNET_TQ) {
+ const struct rte_memzone *hdr_mz;
+ struct virtio_tx_region *txr;
+ int i;
+
/*
* For each xmit packet, allocate a virtio_net_hdr
+ * and indirect ring elements
*/
snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
dev->data->port_id, queue_idx);
- vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
- vq_size * hw->vtnet_hdr_size,
+ hdr_mz = rte_memzone_reserve_aligned(vq_name,
+ vq_size * sizeof(*txr),
socket_id, 0, RTE_CACHE_LINE_SIZE);
- if (vq->virtio_net_hdr_mz == NULL) {
+ if (hdr_mz == NULL) {
if (rte_errno == EEXIST)
- vq->virtio_net_hdr_mz =
- rte_memzone_lookup(vq_name);
- if (vq->virtio_net_hdr_mz == NULL) {
+ hdr_mz = rte_memzone_lookup(vq_name);
+ if (hdr_mz == NULL) {
rte_free(vq);
return -ENOMEM;
}
}
- vq->virtio_net_hdr_mem =
- vq->virtio_net_hdr_mz->phys_addr;
- memset(vq->virtio_net_hdr_mz->addr, 0,
- vq_size * hw->vtnet_hdr_size);
+ vq->virtio_net_hdr_mz = hdr_mz;
+ vq->virtio_net_hdr_mem = hdr_mz->phys_addr;
+
+ txr = hdr_mz->addr;
+ memset(txr, 0, vq_size * sizeof(*txr));
+ for (i = 0; i < vq_size; i++) {
+ struct vring_desc *start_dp = txr[i].tx_indir;
+
+ vring_desc_init(start_dp, VIRTIO_MAX_INDIRECT);
+
+ /* first indirect descriptor is always the tx header */
+ start_dp->addr = vq->virtio_net_hdr_mem
+ + i * sizeof(*txr)
+ + offsetof(struct virtio_tx_region, tx_hdr);
+
+ start_dp->len = vq->hw->vtnet_hdr_size;
+ start_dp->flags = VRING_DESC_F_NEXT;
+ }
} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..07a9265 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -64,7 +64,8 @@
1u << VIRTIO_NET_F_CTRL_VQ | \
1u << VIRTIO_NET_F_CTRL_RX | \
1u << VIRTIO_NET_F_CTRL_VLAN | \
- 1u << VIRTIO_NET_F_MRG_RXBUF)
+ 1u << VIRTIO_NET_F_MRG_RXBUF | \
+ 1u << VIRTIO_RING_F_INDIRECT_DESC)
/*
* CQ function prototype
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index dbe6665..f68ab8f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -199,14 +199,15 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
}
static int
-virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
+virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
+ int use_indirect)
{
struct vq_desc_extra *dxp;
struct vring_desc *start_dp;
uint16_t seg_num = cookie->nb_segs;
- uint16_t needed = 1 + seg_num;
+ uint16_t needed = use_indirect ? 1 : 1 + seg_num;
uint16_t head_idx, idx;
- uint16_t head_size = txvq->hw->vtnet_hdr_size;
+ unsigned long offs;
if (unlikely(txvq->vq_free_cnt == 0))
return -ENOSPC;
@@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
dxp = &txvq->vq_descx[idx];
dxp->cookie = (void *)cookie;
dxp->ndescs = needed;
-
start_dp = txvq->vq_ring.desc;
- start_dp[idx].addr =
- txvq->virtio_net_hdr_mem + idx * head_size;
- start_dp[idx].len = (uint32_t)head_size;
- start_dp[idx].flags = VRING_DESC_F_NEXT;
+
+ if (use_indirect) {
+ struct virtio_tx_region *txr
+ = txvq->virtio_net_hdr_mz->addr;
+
+ offs = idx * sizeof(struct virtio_tx_region)
+ + offsetof(struct virtio_tx_region, tx_indir);
+
+ start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
+ start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc);
+ start_dp[idx].flags = VRING_DESC_F_INDIRECT;
+
+ start_dp = txr[idx].tx_indir;
+ idx = 0;
+ } else {
+ offs = idx * sizeof(struct virtio_tx_region)
+ + offsetof(struct virtio_tx_region, tx_hdr);
+
+ start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
+ start_dp[idx].len = txvq->hw->vtnet_hdr_size;
+ start_dp[idx].flags = VRING_DESC_F_NEXT;
+ }
for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
idx = start_dp[idx].next;
@@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
}
start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
- idx = start_dp[idx].next;
+
+ if (use_indirect)
+ idx = txvq->vq_ring.desc[head_idx].next;
+ else
+ idx = start_dp[idx].next;
+
txvq->vq_desc_head_idx = idx;
if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
txvq->vq_desc_tail_idx = idx;
@@ -261,7 +284,7 @@ static void
virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
{
struct rte_mbuf *m;
- int i, nbufs, error, size = vq->vq_nentries;
+ int nbufs, error, size = vq->vq_nentries;
struct vring *vr = &vq->vq_ring;
uint8_t *ring_mem = vq->vq_ring_virt_mem;
@@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
vq->vq_free_cnt = vq->vq_nentries;
memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
- /* Chain all the descriptors in the ring with an END */
- for (i = 0; i < size - 1; i++)
- vr->desc[i].next = (uint16_t)(i + 1);
- vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
+ vring_desc_init(vr->desc, size);
/*
* Disable device(host) interrupting guest
@@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
struct rte_mbuf *txm = tx_pkts[nb_tx];
- int need = txm->nb_segs - txvq->vq_free_cnt + 1;
+ int use_indirect, slots, need;
+
+ use_indirect = vtpci_with_feature(txvq->hw,
+ VIRTIO_RING_F_INDIRECT_DESC)
+ && (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);
+
+ /* How many ring entries are needed to this Tx? */
+ slots = use_indirect ? 1 : 1 + txm->nb_segs;
+ need = slots - txvq->vq_free_cnt;
/* Positive value indicates it need free vring descriptors */
if (need > 0) {
@@ -769,7 +797,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
need = RTE_MIN(need, (int)nb_used);
virtio_xmit_cleanup(txvq, need);
- need = txm->nb_segs - txvq->vq_free_cnt + 1;
+ need = slots - txvq->vq_free_cnt;
if (unlikely(need > 0)) {
PMD_TX_LOG(ERR,
"No free tx descriptors to transmit");
@@ -787,7 +815,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
/* Enqueue Packet buffers */
- error = virtqueue_enqueue_xmit(txvq, txm);
+ error = virtqueue_enqueue_xmit(txvq, txm, use_indirect);
if (unlikely(error)) {
if (error == ENOSPC)
PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..fe3fa66 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -237,6 +237,25 @@ struct virtio_net_hdr_mrg_rxbuf {
uint16_t num_buffers; /**< Number of merged rx buffers */
};
+/* Region reserved to allow for transmit header and indirect ring */
+#define VIRTIO_MAX_TX_INDIRECT 8
+struct virtio_tx_region {
+ struct virtio_net_hdr_mrg_rxbuf tx_hdr;
+ struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
+ __attribute__((__aligned__(16)));
+};
+
+/* Chain all the descriptors in the ring with an END */
+static inline void
+vring_desc_init(struct vring_desc *dp, uint16_t n)
+{
+ uint16_t i;
+
+ for (i = 0; i < n - 1; i++)
+ dp[i].next = (uint16_t)(i + 1);
+ dp[i].next = VQ_RING_DESC_CHAIN_END;
+}
+
/**
* Tell the backend not to interrupt us.
*/
--
2.1.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
` (2 preceding siblings ...)
2015-10-19 5:16 ` [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements Stephen Hemminger
@ 2015-10-19 5:16 ` Stephen Hemminger
2015-10-19 16:28 ` Xie, Huawei
2015-10-19 5:16 ` [dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue Stephen Hemminger
` (2 subsequent siblings)
6 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 5:16 UTC (permalink / raw)
To: huawei.xie, changchun.ouyang; +Cc: dev
Virtio supports a feature that allows the sender to prepend the transmit
header to the packet data. It requires that the mbuf be writeable, correctly
aligned, and that the feature has been negotiated. When all of these hold,
it is the optimal way to transmit a single-segment packet.
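Condensed from the hunk below, the eligibility test is roughly:

	/* all of these must hold before the header can be prepended in place */
	can_push = vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
		   rte_mbuf_refcnt_read(txm) == 1 &&	     /* mbuf writeable */
		   txm->nb_segs == 1 &&			     /* single segment */
		   rte_pktmbuf_headroom(txm) >= hdr_size &&  /* room for header */
		   rte_is_aligned(rte_pktmbuf_mtod(txm, char *),
				  __alignof__(struct virtio_net_hdr_mrg_rxbuf));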
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/virtio/virtio_ethdev.h | 3 +-
drivers/net/virtio/virtio_rxtx.c | 66 +++++++++++++++++++++++---------------
2 files changed, 42 insertions(+), 27 deletions(-)
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 07a9265..f260fbb 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -65,7 +65,8 @@
1u << VIRTIO_NET_F_CTRL_RX | \
1u << VIRTIO_NET_F_CTRL_VLAN | \
1u << VIRTIO_NET_F_MRG_RXBUF | \
- 1u << VIRTIO_RING_F_INDIRECT_DESC)
+ 1u << VIRTIO_RING_F_INDIRECT_DESC| \
+ 1u << VIRTIO_F_ANY_LAYOUT)
/*
* CQ function prototype
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index f68ab8f..dbedcc3 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -200,13 +200,13 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
static int
virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
- int use_indirect)
+ uint16_t needed, int use_indirect, int can_push)
{
struct vq_desc_extra *dxp;
struct vring_desc *start_dp;
uint16_t seg_num = cookie->nb_segs;
- uint16_t needed = use_indirect ? 1 : 1 + seg_num;
uint16_t head_idx, idx;
+ uint16_t head_size = txvq->hw->vtnet_hdr_size;
unsigned long offs;
if (unlikely(txvq->vq_free_cnt == 0))
@@ -223,7 +223,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
dxp->ndescs = needed;
start_dp = txvq->vq_ring.desc;
- if (use_indirect) {
+ if (can_push) {
+ /* put on zero'd transmit header (no offloads) */
+ void *hdr = rte_pktmbuf_prepend(cookie, head_size);
+
+ memset(hdr, 0, head_size);
+ } else if (use_indirect) {
struct virtio_tx_region *txr
= txvq->virtio_net_hdr_mz->addr;
@@ -235,7 +240,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
start_dp[idx].flags = VRING_DESC_F_INDIRECT;
start_dp = txr[idx].tx_indir;
- idx = 0;
+ idx = 1;
} else {
offs = idx * sizeof(struct virtio_tx_region)
+ offsetof(struct virtio_tx_region, tx_hdr);
@@ -243,22 +248,19 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
start_dp[idx].len = txvq->hw->vtnet_hdr_size;
start_dp[idx].flags = VRING_DESC_F_NEXT;
+ idx = start_dp[idx].next;
}
- for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
- idx = start_dp[idx].next;
+ while (cookie != NULL) {
start_dp[idx].addr = RTE_MBUF_DATA_DMA_ADDR(cookie);
start_dp[idx].len = cookie->data_len;
- start_dp[idx].flags = VRING_DESC_F_NEXT;
+ start_dp[idx].flags = cookie->next ? VRING_DESC_F_NEXT : 0;
cookie = cookie->next;
+ idx = start_dp[idx].next;
}
- start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
-
if (use_indirect)
idx = txvq->vq_ring.desc[head_idx].next;
- else
- idx = start_dp[idx].next;
txvq->vq_desc_head_idx = idx;
if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
@@ -761,10 +763,13 @@ virtio_recv_mergeable_pkts(void *rx_queue,
return nb_rx;
}
+
uint16_t
virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
struct virtqueue *txvq = tx_queue;
+ struct virtio_hw *hw = txvq->hw;
+ uint16_t hdr_size = hw->vtnet_hdr_size;
uint16_t nb_used, nb_tx;
int error;
@@ -780,14 +785,31 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
struct rte_mbuf *txm = tx_pkts[nb_tx];
- int use_indirect, slots, need;
+ int can_push = 0, use_indirect = 0, slots, need;
+
+ /* Do VLAN tag insertion */
+ if (txm->ol_flags & PKT_TX_VLAN_PKT) {
+ error = rte_vlan_insert(&txm);
+ if (unlikely(error)) {
+ rte_pktmbuf_free(txm);
+ continue;
+ }
+ }
- use_indirect = vtpci_with_feature(txvq->hw,
- VIRTIO_RING_F_INDIRECT_DESC)
- && (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);
+ /* optimize ring usage */
+ if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
+ rte_mbuf_refcnt_read(txm) == 1 &&
+ txm->nb_segs == 1 &&
+ rte_pktmbuf_headroom(txm) >= hdr_size &&
+ rte_is_aligned(rte_pktmbuf_mtod(txm, char *),
+ __alignof__(struct virtio_net_hdr_mrg_rxbuf)))
+ can_push = 1;
+ else if (vtpci_with_feature(hw, VIRTIO_RING_F_INDIRECT_DESC) &&
+ txm->nb_segs < VIRTIO_MAX_TX_INDIRECT)
+ use_indirect = 1;
/* How many ring entries are needed to this Tx? */
- slots = use_indirect ? 1 : 1 + txm->nb_segs;
+ slots = use_indirect ? 1 : !can_push + txm->nb_segs;
need = slots - txvq->vq_free_cnt;
/* Positive value indicates it need free vring descriptors */
@@ -805,17 +827,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
}
- /* Do VLAN tag insertion */
- if (txm->ol_flags & PKT_TX_VLAN_PKT) {
- error = rte_vlan_insert(&txm);
- if (unlikely(error)) {
- rte_pktmbuf_free(txm);
- continue;
- }
- }
-
/* Enqueue Packet buffers */
- error = virtqueue_enqueue_xmit(txvq, txm, use_indirect);
+ error = virtqueue_enqueue_xmit(txvq, txm, slots,
+ use_indirect, can_push);
if (unlikely(error)) {
if (error == ENOSPC)
PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
--
2.1.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
` (3 preceding siblings ...)
2015-10-19 5:16 ` [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit Stephen Hemminger
@ 2015-10-19 5:16 ` Stephen Hemminger
2015-10-20 1:48 ` Xie, Huawei
2015-10-21 13:18 ` [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Thomas Monjalon
2015-10-26 14:05 ` Xie, Huawei
6 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 5:16 UTC (permalink / raw)
To: huawei.xie, changchun.ouyang; +Cc: dev
All the error checks in virtqueue_enqueue_xmit are already done
by the caller. Therefore they can be removed to improve performance.
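For reference, the caller already establishes the following before calling
in (a sketch assembled from the earlier patches in this series):

	slots = use_indirect ? 1 : !can_push + txm->nb_segs;
	need  = slots - txvq->vq_free_cnt;
	if (need > 0) {
		/* cleanup, recheck once, and break out if still short */
	}

	/* => on entry vq_free_cnt >= slots, so the ENOSPC/EMSGSIZE paths in
	 *    virtqueue_enqueue_xmit() are unreachable and it can return void. */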
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/virtio/virtio_rxtx.c | 23 ++---------------------
1 file changed, 2 insertions(+), 21 deletions(-)
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index dbedcc3..8fa0dd7 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -198,7 +198,7 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
return 0;
}
-static int
+static inline void
virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
uint16_t needed, int use_indirect, int can_push)
{
@@ -209,14 +209,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
uint16_t head_size = txvq->hw->vtnet_hdr_size;
unsigned long offs;
- if (unlikely(txvq->vq_free_cnt == 0))
- return -ENOSPC;
- if (unlikely(txvq->vq_free_cnt < needed))
- return -EMSGSIZE;
head_idx = txvq->vq_desc_head_idx;
- if (unlikely(head_idx >= txvq->vq_nentries))
- return -EFAULT;
-
idx = head_idx;
dxp = &txvq->vq_descx[idx];
dxp->cookie = (void *)cookie;
@@ -267,8 +260,6 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
txvq->vq_desc_tail_idx = idx;
txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed);
vq_update_avail_ring(txvq, head_idx);
-
- return 0;
}
static inline struct rte_mbuf *
@@ -828,17 +819,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
/* Enqueue Packet buffers */
- error = virtqueue_enqueue_xmit(txvq, txm, slots,
- use_indirect, can_push);
- if (unlikely(error)) {
- if (error == ENOSPC)
- PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
- else if (error == EMSGSIZE)
- PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
- else
- PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
- break;
- }
+ virtqueue_enqueue_xmit(txvq, txm, slots, use_indirect, can_push);
txvq->bytes += txm->pkt_len;
++txvq->packets;
}
--
2.1.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit
2015-10-19 5:16 ` [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit Stephen Hemminger
@ 2015-10-19 8:02 ` Xie, Huawei
2015-10-19 15:48 ` Stephen Hemminger
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2015-10-19 8:02 UTC (permalink / raw)
To: Stephen Hemminger, changchun.ouyang; +Cc: dev
On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> The space check for transmit ring only needs a single conditional.
> I.e only need to recheck for space if there was no space in first check.
I see you reorganized the code. It is clearer and more readable with your
change, but no check is removed.
We could remove the check after virtio_xmit_cleanup. In the current
implementation, we assume that virtio_xmit_cleanup and vq_ring_free_chain
always succeed.
> This can help performance and simplifies loop.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> drivers/net/virtio/virtio_rxtx.c | 66 ++++++++++++++++------------------------
> 1 file changed, 27 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index c5b53bb..5b50ed0 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -745,7 +745,6 @@ uint16_t
> virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> {
> struct virtqueue *txvq = tx_queue;
> - struct rte_mbuf *txm;
> uint16_t nb_used, nb_tx;
> int error;
>
> @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
> virtio_xmit_cleanup(txvq, nb_used);
>
> - nb_tx = 0;
> + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> + struct rte_mbuf *txm = tx_pkts[nb_tx];
> + int need = txm->nb_segs - txvq->vq_free_cnt + 1;
>
> - while (nb_tx < nb_pkts) {
> - /* Need one more descriptor for virtio header. */
> - int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
> -
> - /*Positive value indicates it need free vring descriptors */
> + /* Positive value indicates it need free vring descriptors */
Since you are fixing the comment anyway, and if it is correct, could you do
s/need/needs/ as well? :)
> if (unlikely(need > 0)) {
> nb_used = VIRTQUEUE_NUSED(txvq);
> virtio_rmb();
> need = RTE_MIN(need, (int)nb_used);
>
> virtio_xmit_cleanup(txvq, need);
> - need = (int)tx_pkts[nb_tx]->nb_segs -
> - txvq->vq_free_cnt + 1;
> - }
> -
> - /*
> - * Zero or negative value indicates it has enough free
> - * descriptors to use for transmitting.
> - */
> - if (likely(need <= 0)) {
> - txm = tx_pkts[nb_tx];
> -
> - /* Do VLAN tag insertion */
> - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> - error = rte_vlan_insert(&txm);
> - if (unlikely(error)) {
> - rte_pktmbuf_free(txm);
> - ++nb_tx;
> - continue;
> - }
need is still rechecked here. The recheck could be removed if
virtio_xmit_cleanup ensured that need is <= 0 after the call.
> + need = txm->nb_segs - txvq->vq_free_cnt + 1;
> + if (unlikely(need > 0)) {
> + PMD_TX_LOG(ERR,
> + "No free tx descriptors to transmit");
> + break;
> }
> + }
>
> - /* Enqueue Packet buffers */
> - error = virtqueue_enqueue_xmit(txvq, txm);
> + /* Do VLAN tag insertion */
> + if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> + error = rte_vlan_insert(&txm);
> if (unlikely(error)) {
> - if (error == ENOSPC)
> - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
> - else if (error == EMSGSIZE)
> - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
> - else
> - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
> - break;
> + rte_pktmbuf_free(txm);
> + continue;
> }
> - nb_tx++;
> - txvq->bytes += txm->pkt_len;
> - } else {
> - PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
> + }
> +
> + /* Enqueue Packet buffers */
> + error = virtqueue_enqueue_xmit(txvq, txm);
> + if (unlikely(error)) {
> + if (error == ENOSPC)
> + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
> + else if (error == EMSGSIZE)
> + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
> + else
> + PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
> break;
> }
> + txvq->bytes += txm->pkt_len;
> }
>
> txvq->packets += nb_tx;
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
2015-10-19 5:16 ` [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements Stephen Hemminger
@ 2015-10-19 13:19 ` Xie, Huawei
2015-10-19 15:47 ` Stephen Hemminger
2015-10-30 18:01 ` Thomas Monjalon
1 sibling, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2015-10-19 13:19 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> The virtio ring in QEMU/KVM is usually limited to 256 entries
> and the normal way that virtio driver was queuing mbufs required
> nsegs + 1 ring elements. By using the indirect ring element feature
> if available, each packet will take only one ring slot even for
> multi-segment packets.
[...]
> static int
> -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
> + int use_indirect)
> {
> struct vq_desc_extra *dxp;
> struct vring_desc *start_dp;
> uint16_t seg_num = cookie->nb_segs;
> - uint16_t needed = 1 + seg_num;
> + uint16_t needed = use_indirect ? 1 : 1 + seg_num;
> uint16_t head_idx, idx;
> - uint16_t head_size = txvq->hw->vtnet_hdr_size;
> + unsigned long offs;
>
> if (unlikely(txvq->vq_free_cnt == 0))
> return -ENOSPC;
> @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> dxp = &txvq->vq_descx[idx];
> dxp->cookie = (void *)cookie;
> dxp->ndescs = needed;
> -
> start_dp = txvq->vq_ring.desc;
> - start_dp[idx].addr =
> - txvq->virtio_net_hdr_mem + idx * head_size;
> - start_dp[idx].len = (uint32_t)head_size;
> - start_dp[idx].flags = VRING_DESC_F_NEXT;
> +
> + if (use_indirect) {
> + struct virtio_tx_region *txr
> + = txvq->virtio_net_hdr_mz->addr;
> +
> + offs = idx * sizeof(struct virtio_tx_region)
> + + offsetof(struct virtio_tx_region, tx_indir);
> +
> + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
> + start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc);
a. In indirect mode, as we always use one descriptor, could we allocate
one fixed descriptor, as I did in the RX/TX ring layout optimization? :)
b. If not (a), we could cache the descriptor and avoid the update unless the
fields differ. In the current implementation of the free descriptor list, we
could make the same tx descriptor always be used for the same ring slot. I am
about to submit a patch for the normal rx path.
c. Could we initialize the length of all tx descriptors to
VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is the maximum length OK
here? Does the spec require that the length field reflect the length of the
descriptors actually used, given that we already have the next field to
indicate the last descriptor?
> + start_dp[idx].flags = VRING_DESC_F_INDIRECT;
> +
> + start_dp = txr[idx].tx_indir;
> + idx = 0;
> + } else {
> + offs = idx * sizeof(struct virtio_tx_region)
> + + offsetof(struct virtio_tx_region, tx_hdr);
> +
> + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
> + start_dp[idx].len = txvq->hw->vtnet_hdr_size;
> + start_dp[idx].flags = VRING_DESC_F_NEXT;
> + }
>
> for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
> idx = start_dp[idx].next;
This looks weird to me. Why skip the first user-provided descriptor?
idx = 0
idx = start_dp[idx].next
start_dp[idx].addr = ...
> @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> }
>
> start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
> - idx = start_dp[idx].next;
> +
> + if (use_indirect)
> + idx = txvq->vq_ring.desc[head_idx].next;
> + else
> + idx = start_dp[idx].next;
> +
> txvq->vq_desc_head_idx = idx;
> if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
> txvq->vq_desc_tail_idx = idx;
> @@ -261,7 +284,7 @@ static void
> virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
> {
> struct rte_mbuf *m;
> - int i, nbufs, error, size = vq->vq_nentries;
> + int nbufs, error, size = vq->vq_nentries;
> struct vring *vr = &vq->vq_ring;
> uint8_t *ring_mem = vq->vq_ring_virt_mem;
>
> @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
> vq->vq_free_cnt = vq->vq_nentries;
> memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
>
> - /* Chain all the descriptors in the ring with an END */
> - for (i = 0; i < size - 1; i++)
> - vr->desc[i].next = (uint16_t)(i + 1);
> - vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
> + vring_desc_init(vr->desc, size);
>
> /*
> * Disable device(host) interrupting guest
> @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>
> for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> struct rte_mbuf *txm = tx_pkts[nb_tx];
> - int need = txm->nb_segs - txvq->vq_free_cnt + 1;
> + int use_indirect, slots, need;
> +
> + use_indirect = vtpci_with_feature(txvq->hw,
> + VIRTIO_RING_F_INDIRECT_DESC)
> + && (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);
> +
> + /* How many ring entries are needed to this Tx? */
"how many descs" is more accurate , at least to me, ring entries/slots
means entries/slots in avail ring.
If it is OK, s/slots/descs/ as well.
> + slots = use_indirect ? 1 : 1 + txm->nb_segs;
> + need = slots - txvq->vq_free_cnt;
>
> /* Positive value indicates it need free vring descriptors */
> if (need > 0) {
> @@ -769,7 +797,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> need = RTE_MIN(need, (int)nb_used);
>
> virtio_xmit_cleanup(txvq, need);
> - need = txm->nb_segs - txvq->vq_free_cnt + 1;
> + need = slots - txvq->vq_free_cnt;
> if (unlikely(need > 0)) {
> PMD_TX_LOG(ERR,
> "No free tx descriptors to transmit");
> @@ -787,7 +815,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> }
>
> /* Enqueue Packet buffers */
> - error = virtqueue_enqueue_xmit(txvq, txm);
> + error = virtqueue_enqueue_xmit(txvq, txm, use_indirect);
> if (unlikely(error)) {
> if (error == ENOSPC)
> PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> index 7789411..fe3fa66 100644
> --- a/drivers/net/virtio/virtqueue.h
> +++ b/drivers/net/virtio/virtqueue.h
> @@ -237,6 +237,25 @@ struct virtio_net_hdr_mrg_rxbuf {
> uint16_t num_buffers; /**< Number of merged rx buffers */
> };
>
> +/* Region reserved to allow for transmit header and indirect ring */
> +#define VIRTIO_MAX_TX_INDIRECT 8
> +struct virtio_tx_region {
> + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
Any reason to use merge-able header here?
> + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
> + __attribute__((__aligned__(16)));
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
[...]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
2015-10-19 13:19 ` Xie, Huawei
@ 2015-10-19 15:47 ` Stephen Hemminger
2015-10-19 16:18 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 15:47 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Mon, 19 Oct 2015 13:19:50 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> > static int
> > -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> > +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
> > + int use_indirect)
> > {
> > struct vq_desc_extra *dxp;
> > struct vring_desc *start_dp;
> > uint16_t seg_num = cookie->nb_segs;
> > - uint16_t needed = 1 + seg_num;
> > + uint16_t needed = use_indirect ? 1 : 1 + seg_num;
> > uint16_t head_idx, idx;
> > - uint16_t head_size = txvq->hw->vtnet_hdr_size;
> > + unsigned long offs;
> >
> > if (unlikely(txvq->vq_free_cnt == 0))
> > return -ENOSPC;
> > @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> > dxp = &txvq->vq_descx[idx];
> > dxp->cookie = (void *)cookie;
> > dxp->ndescs = needed;
> > -
> > start_dp = txvq->vq_ring.desc;
> > - start_dp[idx].addr =
> > - txvq->virtio_net_hdr_mem + idx * head_size;
> > - start_dp[idx].len = (uint32_t)head_size;
> > - start_dp[idx].flags = VRING_DESC_F_NEXT;
> > +
> > + if (use_indirect) {
> > + struct virtio_tx_region *txr
> > + = txvq->virtio_net_hdr_mz->addr;
> > +
> > + offs = idx * sizeof(struct virtio_tx_region)
> > + + offsetof(struct virtio_tx_region, tx_indir);
> > +
> > + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
> > + start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc);
> a. In indirect mode, as we always use one descriptor, could we allocate
> one fixed descriptor as what i did in RX/TX ring layout optimization? :).
The code cannot assume all packets will be in indirect mode. If
any_layout is in use, then some packets will use that. Also, if you give it a
packet where nb_segs is very large, it falls back to the old mode.
Also, some hosts (like vhost) don't support indirect.
> b. If not a, we could cache the descriptor, avoid update unless the
> fields are different. In current implementation of free desc list, we
> could make them always use the same tx desc for the same ring slot. I am
> to submit a patch for normal rx path.
See above
> c. Could we initialize the length of all tx descriptors to be
> VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is maximum length ok
> here? Does the spec require that the length field reflects the length of
> real used descs, as we already have the next field to indicate the last
> descriptor.
The largest possible VIRTIO_MAX_INDIRECT is very large (4K).
>
> > + start_dp[idx].flags = VRING_DESC_F_INDIRECT;
> > +
> > + start_dp = txr[idx].tx_indir;
> > + idx = 0;
> > + } else {
> > + offs = idx * sizeof(struct virtio_tx_region)
> > + + offsetof(struct virtio_tx_region, tx_hdr);
> > +
> > + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
> > + start_dp[idx].len = txvq->hw->vtnet_hdr_size;
> > + start_dp[idx].flags = VRING_DESC_F_NEXT;
> > + }
> >
> > for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
> > idx = start_dp[idx].next;
> This looks weird to me. Why skip the first user provided descriptor?
> idx = 0
> idx = start_dp[idx].next
> start_dp[idx].addr = ...
The first descriptor (0) is initialized once to point to the static
all-zeros tx header. Then the code skips to the second entry to initialize
the first data block.
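Sketching just the index walk from the hunk above (entry 0's .next is 1,
set by vring_desc_init at queue setup):

	start_dp = txr[idx].tx_indir;	/* switch to the indirect table   */
	idx = 0;			/* entry 0 = pre-built tx header  */
	idx = start_dp[idx].next;	/* -> entry 1: first data segment */
	start_dp[idx].addr = RTE_MBUF_DATA_DMA_ADDR(cookie);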
>
> > @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
> > }
> >
> > start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
> > - idx = start_dp[idx].next;
> > +
> > + if (use_indirect)
> > + idx = txvq->vq_ring.desc[head_idx].next;
> > + else
> > + idx = start_dp[idx].next;
> > +
> > txvq->vq_desc_head_idx = idx;
> > if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
> > txvq->vq_desc_tail_idx = idx;
> > @@ -261,7 +284,7 @@ static void
> > virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
> > {
> > struct rte_mbuf *m;
> > - int i, nbufs, error, size = vq->vq_nentries;
> > + int nbufs, error, size = vq->vq_nentries;
> > struct vring *vr = &vq->vq_ring;
> > uint8_t *ring_mem = vq->vq_ring_virt_mem;
> >
> > @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
> > vq->vq_free_cnt = vq->vq_nentries;
> > memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
> >
> > - /* Chain all the descriptors in the ring with an END */
> > - for (i = 0; i < size - 1; i++)
> > - vr->desc[i].next = (uint16_t)(i + 1);
> > - vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
> > + vring_desc_init(vr->desc, size);
> >
> > /*
> > * Disable device(host) interrupting guest
> > @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> >
> > for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> > struct rte_mbuf *txm = tx_pkts[nb_tx];
> > - int need = txm->nb_segs - txvq->vq_free_cnt + 1;
> > + int use_indirect, slots, need;
> > +
> > + use_indirect = vtpci_with_feature(txvq->hw,
> > + VIRTIO_RING_F_INDIRECT_DESC)
> > + && (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);
> > +
> > + /* How many ring entries are needed to this Tx? */
> "how many descs" is more accurate , at least to me, ring entries/slots
> means entries/slots in avail ring.
> If it is OK, s/slots/descs/ as well.
The virtio spec doesn't use the word "descriptors"; that is more of an
Intel driver terminology.
>
> > + slots = use_indirect ? 1 : 1 + txm->nb_segs;
> > + need = slots - txvq->vq_free_cnt;
> >
> > /* Positive value indicates it need free vring descriptors */
> > if (need > 0) {
> > @@ -769,7 +797,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > need = RTE_MIN(need, (int)nb_used);
> >
> > virtio_xmit_cleanup(txvq, need);
> > - need = txm->nb_segs - txvq->vq_free_cnt + 1;
> > + need = slots - txvq->vq_free_cnt;
> > if (unlikely(need > 0)) {
> > PMD_TX_LOG(ERR,
> > "No free tx descriptors to transmit");
> > @@ -787,7 +815,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > }
> >
> > /* Enqueue Packet buffers */
> > - error = virtqueue_enqueue_xmit(txvq, txm);
> > + error = virtqueue_enqueue_xmit(txvq, txm, use_indirect);
> > if (unlikely(error)) {
> > if (error == ENOSPC)
> > PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
> > diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
> > index 7789411..fe3fa66 100644
> > --- a/drivers/net/virtio/virtqueue.h
> > +++ b/drivers/net/virtio/virtqueue.h
> > @@ -237,6 +237,25 @@ struct virtio_net_hdr_mrg_rxbuf {
> > uint16_t num_buffers; /**< Number of merged rx buffers */
> > };
> >
> > +/* Region reserved to allow for transmit header and indirect ring */
> > +#define VIRTIO_MAX_TX_INDIRECT 8
> > +struct virtio_tx_region {
> > + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
> Any reason to use merge-able header here?
> > + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
> > + __attribute__((__aligned__(16)));
> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
> [...]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit
2015-10-19 8:02 ` Xie, Huawei
@ 2015-10-19 15:48 ` Stephen Hemminger
2015-10-19 16:27 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 15:48 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev, changchun.ouyang
On Mon, 19 Oct 2015 08:02:07 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> > The space check for transmit ring only needs a single conditional.
> > I.e only need to recheck for space if there was no space in first check.
> I see you reorganized the code. It is more clear and readable with your
> change, but no check is removed.
> We could remove the check after virtio_xmit_cleanup. In current
> implementation, we assume the virtio_xmit_cleanup and vq_ring_free_chain
> always succeed.
The only case that matters is if virtio_xmit_cleanup can not
free up any slots.
> > This can help performance and simplifies loop.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > drivers/net/virtio/virtio_rxtx.c | 66 ++++++++++++++++------------------------
> > 1 file changed, 27 insertions(+), 39 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> > index c5b53bb..5b50ed0 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -745,7 +745,6 @@ uint16_t
> > virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > {
> > struct virtqueue *txvq = tx_queue;
> > - struct rte_mbuf *txm;
> > uint16_t nb_used, nb_tx;
> > int error;
> >
> > @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
> > virtio_xmit_cleanup(txvq, nb_used);
> >
> > - nb_tx = 0;
> > + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> > + struct rte_mbuf *txm = tx_pkts[nb_tx];
> > + int need = txm->nb_segs - txvq->vq_free_cnt + 1;
> >
> > - while (nb_tx < nb_pkts) {
> > - /* Need one more descriptor for virtio header. */
> > - int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
> > -
> > - /*Positive value indicates it need free vring descriptors */
> > + /* Positive value indicates it need free vring descriptors */
> As you fix the comment, if it is correct, could you do s/need/needs/ as
> well, :)?
> > if (unlikely(need > 0)) {
> > nb_used = VIRTQUEUE_NUSED(txvq);
> > virtio_rmb();
> > need = RTE_MIN(need, (int)nb_used);
> >
> > virtio_xmit_cleanup(txvq, need);
> > - need = (int)tx_pkts[nb_tx]->nb_segs -
> > - txvq->vq_free_cnt + 1;
> > - }
> > -
> > - /*
> > - * Zero or negative value indicates it has enough free
> > - * descriptors to use for transmitting.
> > - */
> > - if (likely(need <= 0)) {
> > - txm = tx_pkts[nb_tx];
> > -
> > - /* Do VLAN tag insertion */
> > - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> > - error = rte_vlan_insert(&txm);
> > - if (unlikely(error)) {
> > - rte_pktmbuf_free(txm);
> > - ++nb_tx;
> > - continue;
> > - }
> Still need is rechecked here. It could be removed if virtio_xmit_cleanup
> ensures need would be <= 0 after the call.
> > + need = txm->nb_segs - txvq->vq_free_cnt + 1;
> > + if (unlikely(need > 0)) {
> > + PMD_TX_LOG(ERR,
> > + "No free tx descriptors to transmit");
> > + break;
> > }
> > + }
> >
> > - /* Enqueue Packet buffers */
> > - error = virtqueue_enqueue_xmit(txvq, txm);
> > + /* Do VLAN tag insertion */
> > + if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
> > + error = rte_vlan_insert(&txm);
> > if (unlikely(error)) {
> > - if (error == ENOSPC)
> > - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
> > - else if (error == EMSGSIZE)
> > - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
> > - else
> > - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
> > - break;
> > + rte_pktmbuf_free(txm);
> > + continue;
> > }
> > - nb_tx++;
> > - txvq->bytes += txm->pkt_len;
> > - } else {
> > - PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
> > + }
> > +
> > + /* Enqueue Packet buffers */
> > + error = virtqueue_enqueue_xmit(txvq, txm);
> > + if (unlikely(error)) {
> > + if (error == ENOSPC)
> > + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
> > + else if (error == EMSGSIZE)
> > + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
> > + else
> > + PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
> > break;
> > }
> > + txvq->bytes += txm->pkt_len;
> > }
> >
> > txvq->packets += nb_tx;
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
2015-10-19 15:47 ` Stephen Hemminger
@ 2015-10-19 16:18 ` Xie, Huawei
0 siblings, 0 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-19 16:18 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/19/2015 11:47 PM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 13:19:50 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>>> static int
>>> -virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>> +virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie,
>>> + int use_indirect)
>>> {
>>> struct vq_desc_extra *dxp;
>>> struct vring_desc *start_dp;
>>> uint16_t seg_num = cookie->nb_segs;
>>> - uint16_t needed = 1 + seg_num;
>>> + uint16_t needed = use_indirect ? 1 : 1 + seg_num;
>>> uint16_t head_idx, idx;
>>> - uint16_t head_size = txvq->hw->vtnet_hdr_size;
>>> + unsigned long offs;
>>>
>>> if (unlikely(txvq->vq_free_cnt == 0))
>>> return -ENOSPC;
>>> @@ -220,12 +221,29 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>> dxp = &txvq->vq_descx[idx];
>>> dxp->cookie = (void *)cookie;
>>> dxp->ndescs = needed;
>>> -
>>> start_dp = txvq->vq_ring.desc;
>>> - start_dp[idx].addr =
>>> - txvq->virtio_net_hdr_mem + idx * head_size;
>>> - start_dp[idx].len = (uint32_t)head_size;
>>> - start_dp[idx].flags = VRING_DESC_F_NEXT;
>>> +
>>> + if (use_indirect) {
>>> + struct virtio_tx_region *txr
>>> + = txvq->virtio_net_hdr_mz->addr;
>>> +
>>> + offs = idx * sizeof(struct virtio_tx_region)
>>> + + offsetof(struct virtio_tx_region, tx_indir);
>>> +
>>> + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
>>> + start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc);
>> a. In indirect mode, as we always use one descriptor, could we allocate
>> one fixed descriptor as what i did in RX/TX ring layout optimization? :).
> The code can not assume all packets will be in indirect mode. If using
> any_layout, then some packets will use that. Also if you give a packet
> where nb_segs is very large, then it falls back to old mode.
> Also some hosts (like vhost) don't support indirect.
Agree.
With always one descriptor, it is OK.
For packets with more than VIRTIO_MAX_INDIRECT segs, we fall back to the
old mode.
>
>> b. If not a, we could cache the descriptor, avoid update unless the
>> fields are different. In current implementation of free desc list, we
>> could make them always use the same tx desc for the same ring slot. I am
>> to submit a patch for normal rx path.
> See above
I don't mean using fixed descriptors. Even in the general implementation,
the descriptor allocated for a given ring entry would normally be the same.
So we cache the descriptor id (in the case where only one descriptor is used)
last allocated for each avail ring entry, compare, and if it is the same, skip
the store.
This introduces some overhead, but considering that vhost has to fetch this
cache line from virtio's processing core, it might be worth it.
MST has posted a patch for a virtio test case.
Maybe it makes more sense for RX, as currently we always use one descriptor
there.
If you mean the "only one descriptor" point again, skip this comment.
>
>> c. Could we initialize the length of all tx descriptors to be
>> VIRTIO_MAX_INDIRECT * sizeof(struct vring_desc)? Is maximum length ok
>> here? Does the spec require that the length field reflects the length of
>> real used descs, as we already have the next field to indicate the last
>> descriptor.
> The largest VIRTIO_MAX_INDIRECT possible is very large 4K
instead of
start_dp[idx].len = (seg_num + 1) * sizeof(struct vring_desc)
Is it ok to use
start_dp[idx].len = 4096 * sizeof(struct vring_desc)?
>
>
>>> + start_dp[idx].flags = VRING_DESC_F_INDIRECT;
>>> +
>>> + start_dp = txr[idx].tx_indir;
>>> + idx = 0;
>>> + } else {
>>> + offs = idx * sizeof(struct virtio_tx_region)
>>> + + offsetof(struct virtio_tx_region, tx_hdr);
>>> +
>>> + start_dp[idx].addr = txvq->virtio_net_hdr_mem + offs;
>>> + start_dp[idx].len = txvq->hw->vtnet_hdr_size;
>>> + start_dp[idx].flags = VRING_DESC_F_NEXT;
>>> + }
>>>
>>> for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
>>> idx = start_dp[idx].next;
>> This looks weird to me. Why skip the first user provided descriptor?
>> idx = 0
>> idx = start_dp[idx].next
>> start_dp[idx].addr = ...
> The first descriptor (0) is initialized once to point to the static
> all zeros tx header. Then code skips to second entry to initailize the
> first data block.
Ah, I forgot about the header.
>>> @@ -236,7 +254,12 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
>>> }
>>>
>>> start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
>>> - idx = start_dp[idx].next;
>>> +
>>> + if (use_indirect)
>>> + idx = txvq->vq_ring.desc[head_idx].next;
>>> + else
>>> + idx = start_dp[idx].next;
>>> +
>>> txvq->vq_desc_head_idx = idx;
>>> if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
>>> txvq->vq_desc_tail_idx = idx;
>>> @@ -261,7 +284,7 @@ static void
>>> virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
>>> {
>>> struct rte_mbuf *m;
>>> - int i, nbufs, error, size = vq->vq_nentries;
>>> + int nbufs, error, size = vq->vq_nentries;
>>> struct vring *vr = &vq->vq_ring;
>>> uint8_t *ring_mem = vq->vq_ring_virt_mem;
>>>
>>> @@ -279,10 +302,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
>>> vq->vq_free_cnt = vq->vq_nentries;
>>> memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
>>>
>>> - /* Chain all the descriptors in the ring with an END */
>>> - for (i = 0; i < size - 1; i++)
>>> - vr->desc[i].next = (uint16_t)(i + 1);
>>> - vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
>>> + vring_desc_init(vr->desc, size);
>>>
>>> /*
>>> * Disable device(host) interrupting guest
>>> @@ -760,7 +780,15 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>>
>>> for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>>> struct rte_mbuf *txm = tx_pkts[nb_tx];
>>> - int need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>> + int use_indirect, slots, need;
>>> +
>>> + use_indirect = vtpci_with_feature(txvq->hw,
>>> + VIRTIO_RING_F_INDIRECT_DESC)
>>> + && (txm->nb_segs < VIRTIO_MAX_TX_INDIRECT);
>>> +
>>> + /* How many ring entries are needed to this Tx? */
>> "how many descs" is more accurate , at least to me, ring entries/slots
>> means entries/slots in avail ring.
>> If it is OK, s/slots/descs/ as well.
> The virtio spec doesn't use the words descriptors. that is more an Intel
> driver terminolgy.
>
>>> + slots = use_indirect ? 1 : 1 + txm->nb_segs;
>>> + need = slots - txvq->vq_free_cnt;
>>>
>>> /* Positive value indicates it need free vring descriptors */
>>> if (need > 0) {
>>> @@ -769,7 +797,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>> need = RTE_MIN(need, (int)nb_used);
>>>
>>> virtio_xmit_cleanup(txvq, need);
>>> - need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>> + need = slots - txvq->vq_free_cnt;
>>> if (unlikely(need > 0)) {
>>> PMD_TX_LOG(ERR,
>>> "No free tx descriptors to transmit");
>>> @@ -787,7 +815,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>> }
>>>
>>> /* Enqueue Packet buffers */
>>> - error = virtqueue_enqueue_xmit(txvq, txm);
>>> + error = virtqueue_enqueue_xmit(txvq, txm, use_indirect);
>>> if (unlikely(error)) {
>>> if (error == ENOSPC)
>>> PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
>>> diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
>>> index 7789411..fe3fa66 100644
>>> --- a/drivers/net/virtio/virtqueue.h
>>> +++ b/drivers/net/virtio/virtqueue.h
>>> @@ -237,6 +237,25 @@ struct virtio_net_hdr_mrg_rxbuf {
>>> uint16_t num_buffers; /**< Number of merged rx buffers */
>>> };
>>>
>>> +/* Region reserved to allow for transmit header and indirect ring */
>>> +#define VIRTIO_MAX_TX_INDIRECT 8
>>> +struct virtio_tx_region {
>>> + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
>> Any reason to use merge-able header here?
>>> + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
>>> + __attribute__((__aligned__(16)));
>> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
>> [...]
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit
2015-10-19 15:48 ` Stephen Hemminger
@ 2015-10-19 16:27 ` Xie, Huawei
0 siblings, 0 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-19 16:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/19/2015 11:48 PM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 08:02:07 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>> The space check for transmit ring only needs a single conditional.
>>> I.e only need to recheck for space if there was no space in first check.
>> I see you reorganized the code. It is more clear and readable with your
>> change, but no check is removed.
>> We could remove the check after virtio_xmit_cleanup. In current
>> implementation, we assume the virtio_xmit_cleanup and vq_ring_free_chain
>> always succeed.
> The only case that matters is if virtio_xmit_cleanup can not
> free up any slots.
Is there a case in the current implementation where virtio_xmit_cleanup
could fail to free up any descriptors, i.e., dxp->ndescs is zero?
vq->free_cnt += dxp->ndesc
>
>>> This can help performance and simplifies loop.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> ---
>>> drivers/net/virtio/virtio_rxtx.c | 66 ++++++++++++++++------------------------
>>> 1 file changed, 27 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>>> index c5b53bb..5b50ed0 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -745,7 +745,6 @@ uint16_t
>>> virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>> {
>>> struct virtqueue *txvq = tx_queue;
>>> - struct rte_mbuf *txm;
>>> uint16_t nb_used, nb_tx;
>>> int error;
>>>
>>> @@ -759,57 +758,46 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>> if (likely(nb_used > txvq->vq_nentries - txvq->vq_free_thresh))
>>> virtio_xmit_cleanup(txvq, nb_used);
>>>
>>> - nb_tx = 0;
>>> + for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>>> + struct rte_mbuf *txm = tx_pkts[nb_tx];
>>> + int need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>>
>>> - while (nb_tx < nb_pkts) {
>>> - /* Need one more descriptor for virtio header. */
>>> - int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
>>> -
>>> - /*Positive value indicates it need free vring descriptors */
>>> + /* Positive value indicates it need free vring descriptors */
>> As you are fixing the comment, if that is correct, could you do
>> s/need/needs/ as well? :)
>>> if (unlikely(need > 0)) {
>>> nb_used = VIRTQUEUE_NUSED(txvq);
>>> virtio_rmb();
>>> need = RTE_MIN(need, (int)nb_used);
>>>
>>> virtio_xmit_cleanup(txvq, need);
>>> - need = (int)tx_pkts[nb_tx]->nb_segs -
>>> - txvq->vq_free_cnt + 1;
>>> - }
>>> -
>>> - /*
>>> - * Zero or negative value indicates it has enough free
>>> - * descriptors to use for transmitting.
>>> - */
>>> - if (likely(need <= 0)) {
>>> - txm = tx_pkts[nb_tx];
>>> -
>>> - /* Do VLAN tag insertion */
>>> - if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
>>> - error = rte_vlan_insert(&txm);
>>> - if (unlikely(error)) {
>>> - rte_pktmbuf_free(txm);
>>> - ++nb_tx;
>>> - continue;
>>> - }
>> Still, need is rechecked here. It could be removed if virtio_xmit_cleanup
>> ensures need would be <= 0 after the call.
>>> + need = txm->nb_segs - txvq->vq_free_cnt + 1;
>>> + if (unlikely(need > 0)) {
>>> + PMD_TX_LOG(ERR,
>>> + "No free tx descriptors to transmit");
>>> + break;
>>> }
>>> + }
>>>
>>> - /* Enqueue Packet buffers */
>>> - error = virtqueue_enqueue_xmit(txvq, txm);
>>> + /* Do VLAN tag insertion */
>>> + if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
>>> + error = rte_vlan_insert(&txm);
>>> if (unlikely(error)) {
>>> - if (error == ENOSPC)
>>> - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
>>> - else if (error == EMSGSIZE)
>>> - PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
>>> - else
>>> - PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
>>> - break;
>>> + rte_pktmbuf_free(txm);
>>> + continue;
>>> }
>>> - nb_tx++;
>>> - txvq->bytes += txm->pkt_len;
>>> - } else {
>>> - PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
>>> + }
>>> +
>>> + /* Enqueue Packet buffers */
>>> + error = virtqueue_enqueue_xmit(txvq, txm);
>>> + if (unlikely(error)) {
>>> + if (error == ENOSPC)
>>> + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
>>> + else if (error == EMSGSIZE)
>>> + PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
>>> + else
>>> + PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
>>> break;
>>> }
>>> + txvq->bytes += txm->pkt_len;
>>> }
>>>
>>> txvq->packets += nb_tx;
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
2015-10-19 5:16 ` [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit Stephen Hemminger
@ 2015-10-19 16:28 ` Xie, Huawei
2015-10-19 16:43 ` Stephen Hemminger
2015-10-19 17:19 ` Stephen Hemminger
0 siblings, 2 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-19 16:28 UTC (permalink / raw)
To: Stephen Hemminger, changchun.ouyang; +Cc: dev
On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> Virtio supports a feature that allows the sender to prepend the transmit
> header to the data. It requires that the mbuf be writeable and correctly
> aligned, and that the feature has been negotiated. If all this works out,
> then it is the optimum way to transmit a single-segment packet.
"When using legacy interfaces, transitional drivers which have not
negotiated VIRTIO_F_ANY_LAYOUT
MUST use a single descriptor for the struct virtio_net_hdr on both
transmit and receive, with the
network data in the following descriptors."
I think we shouldn't assume that the virtio header uses a separate
descriptor. It could share a descriptor with the data. Virtio RX (and DPDK
vhost) was actually implemented like this before; i.e., I thought this
behavior was inherent rather than a feature.
Is the current RX implementation wrong?
[...]
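For readers of this sub-thread, a sketch of the any-layout transmit fast path
described in the commit message; head_size and the exact set of checks are
illustrative assumptions rather than the literal patch:

/* If the any-layout feature was negotiated and the single-segment mbuf is
 * exclusively owned with enough headroom, the virtio header can be written
 * into the mbuf itself so header and data share one descriptor. */
static inline int
tx_can_push_hdr(struct virtqueue *txvq, struct rte_mbuf *txm, size_t head_size)
{
	return vtpci_with_feature(txvq->hw, VIRTIO_F_ANY_LAYOUT) &&
	       rte_mbuf_refcnt_read(txm) == 1 &&
	       txm->nb_segs == 1 &&
	       rte_pktmbuf_headroom(txm) >= head_size;
}

In the transmit path the header would then be prepended with
rte_pktmbuf_prepend() and zeroed, and the mbuf enqueued as a single descriptor
covering header plus payload.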
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
2015-10-19 16:28 ` Xie, Huawei
@ 2015-10-19 16:43 ` Stephen Hemminger
2015-10-19 16:56 ` Xie, Huawei
2015-10-19 17:19 ` Stephen Hemminger
1 sibling, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 16:43 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev, changchun.ouyang
On Mon, 19 Oct 2015 16:28:30 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> > Virtio supports a feature that allows the sender to prepend the transmit
> > header to the data. It requires that the mbuf be writeable and correctly
> > aligned, and that the feature has been negotiated. If all this works out,
> > then it is the optimum way to transmit a single-segment packet.
> "When using legacy interfaces, transitional drivers which have not
> negotiated VIRTIO_F_ANY_LAYOUT
> MUST use a single descriptor for the struct virtio_net_hdr on both
> transmit and receive, with the
> network data in the following descriptors."
The code checks for the any-layout feature; what is the problem?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
2015-10-19 16:43 ` Stephen Hemminger
@ 2015-10-19 16:56 ` Xie, Huawei
2015-10-26 23:47 ` Stephen Hemminger
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2015-10-19 16:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/20/2015 12:43 AM, Stephen Hemminger wrote:
> On Mon, 19 Oct 2015 16:28:30 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>> Virtio supports a feature that allows the sender to prepend the transmit
>>> header to the data. It requires that the mbuf be writeable and correctly
>>> aligned, and that the feature has been negotiated. If all this works out,
>>> then it is the optimum way to transmit a single-segment packet.
>> "When using legacy interfaces, transitional drivers which have not
>> negotiated VIRTIO_F_ANY_LAYOUT
>> MUST use a single descriptor for the struct virtio_net_hdr on both
>> transmit and receive, with the
>> network data in the following descriptors."
> The code checks for the any-layout feature; what is the problem?
My reply was removed. I said that virtio RX is already implemented using this
feature by default, without negotiation (at the time of implementation there
was no idea of this feature); is the RX implementation wrong?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
2015-10-19 16:28 ` Xie, Huawei
2015-10-19 16:43 ` Stephen Hemminger
@ 2015-10-19 17:19 ` Stephen Hemminger
1 sibling, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-19 17:19 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev, changchun.ouyang
On Mon, 19 Oct 2015 16:28:30 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> "When using legacy interfaces, transitional drivers which have not
> negotiated VIRTIO_F_ANY_LAYOUT
> MUST use a single descriptor for the struct virtio_net_hdr on both
> transmit and receive, with the
> network data in the following descriptors."
>
> I think we shouldn't assume that the virtio header uses a separate
> descriptor. It could share a descriptor with the data. Virtio RX (and DPDK
> vhost) was actually implemented like this before; i.e., I thought this
> behavior was inherent rather than a feature.
> Is the current RX implementation wrong?
I believe the current RX is OK; the any-layout feature refers more to what is
handed to the host on transmit. Rusty said something like
"any sane implementation would work with a contiguous buffer",
but the standard couldn't assume sanity!
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue
2015-10-19 5:16 ` [dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue Stephen Hemminger
@ 2015-10-20 1:48 ` Xie, Huawei
0 siblings, 0 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-20 1:48 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> All the error checks in virtqueue_enqueue_xmit are already done
> by the caller. Therefore they can be removed to improve performance.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Exactly the same thought as I commented on the previous patch. We should
remove unnecessary checks when the caller ensures they will pass.
We need to check whether we still need to recheck "need" after
virtio_xmit_cleanup for the previous indirect patch.
Acked-by: Huawei Xie <huawei.xie@intel.com>
> ---
> drivers/net/virtio/virtio_rxtx.c | 23 ++---------------------
> 1 file changed, 2 insertions(+), 21 deletions(-)
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
` (4 preceding siblings ...)
2015-10-19 5:16 ` [dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue Stephen Hemminger
@ 2015-10-21 13:18 ` Thomas Monjalon
2015-10-22 10:38 ` Xie, Huawei
2015-10-26 14:05 ` Xie, Huawei
6 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2015-10-21 13:18 UTC (permalink / raw)
To: huawei.xie; +Cc: dev
2015-10-18 22:16, Stephen Hemminger:
> This is a tested version of the virtio Tx performance improvements
> that I posted earlier on the list, and described at the DPDK Userspace
> meeting in Dublin. Together they get a 25% performance improvement for
> both small packet and large multi-segment packet case when testing
> from DPDK guest application to Linux KVM host.
>
> Stephen Hemminger (5):
> virtio: clean up space checks on xmit
> virtio: don't use unlikely for normal tx stuff
> virtio: use indirect ring elements
> virtio: use any layout on transmit
> virtio: optimize transmit enqueue
Huawei, do you ack this series?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-21 13:18 ` [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Thomas Monjalon
@ 2015-10-22 10:38 ` Xie, Huawei
2015-10-22 12:13 ` Xie, Huawei
2015-10-22 16:04 ` Stephen Hemminger
0 siblings, 2 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-22 10:38 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev
On 10/21/2015 9:20 PM, Thomas Monjalon wrote:
> 2015-10-18 22:16, Stephen Hemminger:
>> This is a tested version of the virtio Tx performance improvements
>> that I posted earlier on the list, and described at the DPDK Userspace
>> meeting in Dublin. Together they get a 25% performance improvement for
>> both small packet and large multi-segment packet case when testing
>> from DPDK guest application to Linux KVM host.
>>
>> Stephen Hemminger (5):
>> virtio: clean up space checks on xmit
>> virtio: don't use unlikely for normal tx stuff
>> virtio: use indirect ring elements
>> virtio: use any layout on transmit
>> virtio: optimize transmit enqueue
> Huawei, do you ack this series?
>
Okay with this patchset, with two remaining questions:
+/* Region reserved to allow for transmit header and indirect ring */
+#define VIRTIO_MAX_TX_INDIRECT 8
+struct virtio_tx_region {
+ struct virtio_net_hdr_mrg_rxbuf tx_hdr;
Why use merge-able rx header here in the tx region?
> + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
> + __attribute__((__aligned__(16)));
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
[...]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-22 10:38 ` Xie, Huawei
@ 2015-10-22 12:13 ` Xie, Huawei
2015-10-22 16:04 ` Stephen Hemminger
1 sibling, 0 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-22 12:13 UTC (permalink / raw)
To: dev, Stephen Hemminger
On 10/22/2015 6:39 PM, Xie, Huawei wrote:
> On 10/21/2015 9:20 PM, Thomas Monjalon wrote:
>> 2015-10-18 22:16, Stephen Hemminger:
>>> This is a tested version of the virtio Tx performance improvements
>>> that I posted earlier on the list, and described at the DPDK Userspace
>>> meeting in Dublin. Together they get a 25% performance improvement for
>>> both small packet and large multi-segment packet case when testing
>>> from DPDK guest application to Linux KVM host.
>>>
>>> Stephen Hemminger (5):
>>> virtio: clean up space checks on xmit
>>> virtio: don't use unlikely for normal tx stuff
>>> virtio: use indirect ring elements
>>> virtio: use any layout on transmit
>>> virtio: optimize transmit enqueue
>> Huawei, do you ack this series?
>>
> Okay with this patchset, with two remaining questions:
Forgot to cc Stephen.
>
> +/* Region reserved to allow for transmit header and indirect ring */
> +#define VIRTIO_MAX_TX_INDIRECT 8
> +struct virtio_tx_region {
> + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
>
> Why use merge-able rx header here in the tx region?
>
>> + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
>> + __attribute__((__aligned__(16)));
> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
> [...]
>
>
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-22 10:38 ` Xie, Huawei
2015-10-22 12:13 ` Xie, Huawei
@ 2015-10-22 16:04 ` Stephen Hemminger
2015-10-23 9:00 ` Xie, Huawei
1 sibling, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-22 16:04 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Thu, 22 Oct 2015 10:38:33 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> On 10/21/2015 9:20 PM, Thomas Monjalon wrote:
> > 2015-10-18 22:16, Stephen Hemminger:
> >> This is a tested version of the virtio Tx performance improvements
> >> that I posted earlier on the list, and described at the DPDK Userspace
> >> meeting in Dublin. Together they get a 25% performance improvement for
> >> both small packet and large multi-segment packet case when testing
> >> from DPDK guest application to Linux KVM host.
> >>
> >> Stephen Hemminger (5):
> >> virtio: clean up space checks on xmit
> >> virtio: don't use unlikely for normal tx stuff
> >> virtio: use indirect ring elements
> >> virtio: use any layout on transmit
> >> virtio: optimize transmit enqueue
> > Huawei, do you ack this series?
> >
> Okay with this patchset, with two remaining questions:
>
> +/* Region reserved to allow for transmit header and indirect ring */
> +#define VIRTIO_MAX_TX_INDIRECT 8
> +struct virtio_tx_region {
> + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
>
> Why use merge-able rx header here in the tx region?
If mergeable rx is negotiated then the header must be used for
both Tx and Rx. I chose to allocate the largest possible header
needed, rather than having to deal with a variable-size data structure.
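For context, the two header layouts at issue look roughly like this, following
the legacy virtio-net spec (shown here for illustration, not copied from the
patch):

#include <stdint.h>

struct virtio_net_hdr {			/* 10 bytes */
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {	/* 12 bytes */
	struct virtio_net_hdr hdr;
	uint16_t num_buffers;		/* only meaningful on receive */
};

Reserving the merge-able variant therefore always leaves room for whichever
header the negotiated features require.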
>
> > + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
> > + __attribute__((__aligned__(16)));
>
> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
That is true in the kernel, but there is no __aligned macro in DPDK code.
It could be changed to __rte_aligned_16?
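A macro along those lines might look like this (hypothetical; no such macro
existed in DPDK at the time of this thread):

#define __rte_aligned_16 __attribute__((__aligned__(16)))

struct virtio_tx_region {
	struct virtio_net_hdr_mrg_rxbuf tx_hdr;
	struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT] __rte_aligned_16;
};

This keeps the struct declaration readable while expanding to exactly the
attribute used in the patch.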
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-22 16:04 ` Stephen Hemminger
@ 2015-10-23 9:00 ` Xie, Huawei
2015-10-26 23:52 ` Stephen Hemminger
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2015-10-23 9:00 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/23/2015 12:05 AM, Stephen Hemminger wrote:
> On Thu, 22 Oct 2015 10:38:33 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>> On 10/21/2015 9:20 PM, Thomas Monjalon wrote:
>>> 2015-10-18 22:16, Stephen Hemminger:
>>>> This is a tested version of the virtio Tx performance improvements
>>>> that I posted earlier on the list, and described at the DPDK Userspace
>>>> meeting in Dublin. Together they get a 25% performance improvement for
>>>> both small packet and large multi-segment packet case when testing
>>>> from DPDK guest application to Linux KVM host.
>>>>
>>>> Stephen Hemminger (5):
>>>> virtio: clean up space checks on xmit
>>>> virtio: don't use unlikely for normal tx stuff
>>>> virtio: use indirect ring elements
>>>> virtio: use any layout on transmit
>>>> virtio: optimize transmit enqueue
>>> Huawei, do you ack this series?
>>>
>> Okay with this patchset, with two remaining questions:
>>
>> +/* Region reserved to allow for transmit header and indirect ring */
>> +#define VIRTIO_MAX_TX_INDIRECT 8
>> +struct virtio_tx_region {
>> + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
>>
>> Why use merge-able rx header here in the tx region?
> If mergeable rx is negotiated then the header must be used for
> both Tx and Rx. I chose to allocate the largest possible header
> needed, rather than having to deal with variable size data structure.
Our original code also uses the merge-able header for the TX descriptor if
this is negotiated.
I checked the virtio spec; everything about the merge-able header concerns
receive buffers, which is expected. That is why it feels odd to me here.
Maybe not a big deal?
>>> + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
>>> + __attribute__((__aligned__(16)));
>> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
> That is true in the kernel, but there is no __aligned macro in DPDK code.
OK, then we can ignore this warning, as __rte_cache_aligned also uses
this style.
> It could be changed to __rte_aligned_16?
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
` (5 preceding siblings ...)
2015-10-21 13:18 ` [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Thomas Monjalon
@ 2015-10-26 14:05 ` Xie, Huawei
2016-01-05 8:10 ` Xie, Huawei
6 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2015-10-26 14:05 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> This is a tested version of the virtio Tx performance improvements
> that I posted earlier on the list, and described at the DPDK Userspace
> meeting in Dublin. Together they get a 25% performance improvement for
> both small packet and large multi-segment packet case when testing
> from DPDK guest application to Linux KVM host.
>
> Stephen Hemminger (5):
> virtio: clean up space checks on xmit
> virtio: don't use unlikely for normal tx stuff
> virtio: use indirect ring elements
> virtio: use any layout on transmit
> virtio: optimize transmit enqueue
There is one open question about why the merge-able header is used in the tx
path. Since the old implementation also uses the merge-able header in the tx
path if this feature is negotiated, I choose to ack the patch and address
this later if not now.
Acked-by: Huawei Xie <huawei.xie@intel.com>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit
2015-10-19 16:56 ` Xie, Huawei
@ 2015-10-26 23:47 ` Stephen Hemminger
0 siblings, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-26 23:47 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Mon, 19 Oct 2015 16:56:02 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> On 10/20/2015 12:43 AM, Stephen Hemminger wrote:
> > On Mon, 19 Oct 2015 16:28:30 +0000
> > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> >
> >> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> >>> Virtio supports a feature that allows the sender to prepend the transmit
> >>> header to the data. It requires that the mbuf be writeable and correctly
> >>> aligned, and that the feature has been negotiated. If all this works out,
> >>> then it is the optimum way to transmit a single-segment packet.
> >> "When using legacy interfaces, transitional drivers which have not
> >> negotiated VIRTIO_F_ANY_LAYOUT
> >> MUST use a single descriptor for the struct virtio_net_hdr on both
> >> transmit and receive, with the
> >> network data in the following descriptors."
> > The code checks for the any-layout feature; what is the problem?
> My reply was removed. I said that virtio RX is already implemented using this
> feature by default, without negotiation (at the time of implementation there
> was no idea of this feature); is the RX implementation wrong?
>
No, the receiver is fine; it is okay to handle it coming in as long
as it doesn't assume that is the only possible layout.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-23 9:00 ` Xie, Huawei
@ 2015-10-26 23:52 ` Stephen Hemminger
2015-10-27 1:56 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-26 23:52 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Fri, 23 Oct 2015 09:00:38 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> >> Why use merge-able rx header here in the tx region?
> > If mergeable rx is negotiated then the header must be used for
> > both Tx and Rx. I chose to allocate the largest possible header
> > needed, rather than having to deal with variable size data structure.
> Our original code is also using merge-able header for TX descriptor if
> this negotiated.
> I checked the virtio spec, all of the merge-able header is about
> receiving buffers, which is expected. That is why i feel weird here.
> Maybe not a big deal?
Since num_buffers is only in the merge-able header, the negotiation is implied
to be symmetric.
Reading the 0.95 spec,
Under "Packet Transmission":
3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
the num_buffers field is set to zero.
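A minimal sketch of that transmit-side rule, assuming the
virtio_net_hdr_mrg_rxbuf layout from the driver header:

#include <string.h>

/* On transmit the larger header is still written, but zeroing the whole
 * structure leaves num_buffers at 0, as the quoted rule requires. */
static inline void
tx_hdr_reset(struct virtio_net_hdr_mrg_rxbuf *hdr)
{
	memset(hdr, 0, sizeof(*hdr));
}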
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-26 23:52 ` Stephen Hemminger
@ 2015-10-27 1:56 ` Xie, Huawei
2015-10-27 2:23 ` Stephen Hemminger
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2015-10-27 1:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/27/2015 7:52 AM, Stephen Hemminger wrote:
> On Fri, 23 Oct 2015 09:00:38 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>>>> Why use merge-able rx header here in the tx region?
>>> If mergeable rx is negotiated then the header must be used for
>>> both Tx and Rx. I chose to allocate the largest possible header
>>> needed, rather than having to deal with variable size data structure.
>> Our original code is also using merge-able header for TX descriptor if
>> this negotiated.
>> I checked the virtio spec, all of the merge-able header is about
>> receiving buffers, which is expected. That is why i feel weird here.
>> Maybe not a big deal?
> Since num_buffers is only in merge-able header, the negotiation is implied
> to be symmetric.
>
Can we come to the conclusion that in the tx case we use the merge-able header
even though num_buffers is not used at all?
> Reading 0.95 spec
>
> Under "Packet Transmission"
> 3. If the driver negotatied the VIRTIO_NET_F_MGR_RXBUF feature
> the num_buffers field is set to zero.
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-27 1:56 ` Xie, Huawei
@ 2015-10-27 2:23 ` Stephen Hemminger
2015-10-27 2:38 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2015-10-27 2:23 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
You need to use the extended mergeable rx buffer format.
It is a virtio spec requirement; look at the Linux virtio network driver, or
ask the Linux virtio maintainers if you need more clarification.
On Tue, Oct 27, 2015 at 10:56 AM, Xie, Huawei <huawei.xie@intel.com> wrote:
> On 10/27/2015 7:52 AM, Stephen Hemminger wrote:
> > On Fri, 23 Oct 2015 09:00:38 +0000
> > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> >
> >>>> Why use merge-able rx header here in the tx region?
> >>> If mergeable rx is negotiated then the header must be used for
> >>> both Tx and Rx. I chose to allocate the largest possible header
> >>> needed, rather than having to deal with variable size data structure.
> >> Our original code is also using merge-able header for TX descriptor if
> >> this negotiated.
> >> I checked the virtio spec, all of the merge-able header is about
> >> receiving buffers, which is expected. That is why i feel weird here.
> >> Maybe not a big deal?
> > Since num_buffers is only in merge-able header, the negotiation is
> implied
> > to be symmetric.
> >
> Can we come to the conclusion that in tx case, we use merge-able header
> though number_buffers is not used at all?
> > Reading 0.95 spec
> >
> > Under "Packet Transmission"
> > 3. If the driver negotatied the VIRTIO_NET_F_MGR_RXBUF feature
> > the num_buffers field is set to zero.
> >
> >
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-27 2:23 ` Stephen Hemminger
@ 2015-10-27 2:38 ` Xie, Huawei
0 siblings, 0 replies; 35+ messages in thread
From: Xie, Huawei @ 2015-10-27 2:38 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 10/27/2015 10:23 AM, Stephen Hemminger wrote:
> You need to use the extended mergeable rx buffer format.
> It is a virtio spec requirement, look at Linux virtio network driver
> or ask the virtio maintainers for Linux
> if you need more clarification.
Yes, it is a spec requirement as far as we know, though num_buffers might
not be used.
So far it is not a big deal for us. Let us get clarification from mst later.
>
> On Tue, Oct 27, 2015 at 10:56 AM, Xie, Huawei <huawei.xie@intel.com> wrote:
>
> On 10/27/2015 7:52 AM, Stephen Hemminger wrote:
> > On Fri, 23 Oct 2015 09:00:38 +0000
> > "Xie, Huawei" <huawei.xie@intel.com> wrote:
> >
> >>>> Why use merge-able rx header here in the tx region?
> >>> If mergeable rx is negotiated then the header must be used for
> >>> both Tx and Rx. I chose to allocate the largest possible header
> >>> needed, rather than having to deal with variable size data structure.
> >> Our original code is also using merge-able header for TX descriptor if
> >> this negotiated.
> >> I checked the virtio spec, all of the merge-able header is about
> >> receiving buffers, which is expected. That is why i feel weird here.
> >> Maybe not a big deal?
> > Since num_buffers is only in merge-able header, the negotiation is implied
> > to be symmetric.
> >
> Can we come to the conclusion that in tx case, we use merge-able header
> though number_buffers is not used at all?
> > Reading 0.95 spec
> >
> > Under "Packet Transmission"
> > 3. If the driver negotatied the VIRTIO_NET_F_MGR_RXBUF feature
> > the num_buffers field is set to zero.
> >
> >
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements
2015-10-19 5:16 ` [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements Stephen Hemminger
2015-10-19 13:19 ` Xie, Huawei
@ 2015-10-30 18:01 ` Thomas Monjalon
1 sibling, 0 replies; 35+ messages in thread
From: Thomas Monjalon @ 2015-10-30 18:01 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Sorry, there is a gcc error.
2015-10-18 22:16, Stephen Hemminger:
> + for (i = 0; i < vq_size; i++) {
virtio_ethdev.c:386:17: error: comparison between signed and unsigned integer expressions
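One possible way to address it, sketched here with the variable names taken
from the quoted line (the final fix may well differ):

	unsigned int i;	/* was a signed int, hence the mixed-sign comparison */

	for (i = 0; i < vq_size; i++) {
		/* per-descriptor initialization as before */
	}

An unsigned loop index matches the (unsigned) queue size, so the comparison
no longer mixes signedness and the -Werror build passes this point.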
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2015-10-26 14:05 ` Xie, Huawei
@ 2016-01-05 8:10 ` Xie, Huawei
2016-01-06 12:03 ` Thomas Monjalon
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2016-01-05 8:10 UTC (permalink / raw)
To: Stephen Hemminger, Thomas Monjalon; +Cc: dev
On 10/26/2015 10:06 PM, Xie, Huawei wrote:
> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>> This is a tested version of the virtio Tx performance improvements
>> that I posted earlier on the list, and described at the DPDK Userspace
>> meeting in Dublin. Together they get a 25% performance improvement for
>> both small packet and large multi-segment packet case when testing
>> from DPDK guest application to Linux KVM host.
>>
>> Stephen Hemminger (5):
>> virtio: clean up space checks on xmit
>> virtio: don't use unlikely for normal tx stuff
>> virtio: use indirect ring elements
>> virtio: use any layout on transmit
>> virtio: optimize transmit enqueue
> There is one open why merge-able header is used in tx path. Since old
> implementation is also using the merge-able header in tx path if this
> feature is negotiated, i choose to ack the patch and address this later
> if not now.
>
> Acked-by: Huawei Xie <huawei.xie@intel.com>
Thomas:
This patch series isn't in patchwork. Does Stephen need to send a new one?
>
>
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2016-01-05 8:10 ` Xie, Huawei
@ 2016-01-06 12:03 ` Thomas Monjalon
2016-01-14 13:49 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2016-01-06 12:03 UTC (permalink / raw)
To: Xie, Huawei, Stephen Hemminger; +Cc: dev
2016-01-05 08:10, Xie, Huawei:
> On 10/26/2015 10:06 PM, Xie, Huawei wrote:
> > On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> >> This is a tested version of the virtio Tx performance improvements
> >> that I posted earlier on the list, and described at the DPDK Userspace
> >> meeting in Dublin. Together they get a 25% performance improvement for
> >> both small packet and large multi-segment packet case when testing
> >> from DPDK guest application to Linux KVM host.
> >>
> >> Stephen Hemminger (5):
> >> virtio: clean up space checks on xmit
> >> virtio: don't use unlikely for normal tx stuff
> >> virtio: use indirect ring elements
> >> virtio: use any layout on transmit
> >> virtio: optimize transmit enqueue
> > There is one open why merge-able header is used in tx path. Since old
> > implementation is also using the merge-able header in tx path if this
> > feature is negotiated, i choose to ack the patch and address this later
> > if not now.
> >
> > Acked-by: Huawei Xie <huawei.xie@intel.com>
>
> Thomas:
> This patch isn't in the patchwork. Does Stephen need to send a new one?
Yes please, I cannot find them in patchwork.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2016-01-06 12:03 ` Thomas Monjalon
@ 2016-01-14 13:49 ` Xie, Huawei
2016-03-04 6:18 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2016-01-14 13:49 UTC (permalink / raw)
To: Thomas Monjalon, Stephen Hemminger; +Cc: dev
On 1/6/2016 8:04 PM, Thomas Monjalon wrote:
> 2016-01-05 08:10, Xie, Huawei:
>> On 10/26/2015 10:06 PM, Xie, Huawei wrote:
>>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>>> This is a tested version of the virtio Tx performance improvements
>>>> that I posted earlier on the list, and described at the DPDK Userspace
>>>> meeting in Dublin. Together they get a 25% performance improvement for
>>>> both small packet and large multi-segment packet case when testing
>>>> from DPDK guest application to Linux KVM host.
>>>>
>>>> Stephen Hemminger (5):
>>>> virtio: clean up space checks on xmit
>>>> virtio: don't use unlikely for normal tx stuff
>>>> virtio: use indirect ring elements
>>>> virtio: use any layout on transmit
>>>> virtio: optimize transmit enqueue
>>> There is one open why merge-able header is used in tx path. Since old
>>> implementation is also using the merge-able header in tx path if this
>>> feature is negotiated, i choose to ack the patch and address this later
>>> if not now.
>>>
>>> Acked-by: Huawei Xie <huawei.xie@intel.com>
>> Thomas:
>> This patch isn't in the patchwork. Does Stephen need to send a new one?
> Yes please, I cannot find them in patchwork.
ping
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2016-01-14 13:49 ` Xie, Huawei
@ 2016-03-04 6:18 ` Xie, Huawei
2016-03-04 18:17 ` Stephen Hemminger
0 siblings, 1 reply; 35+ messages in thread
From: Xie, Huawei @ 2016-03-04 6:18 UTC (permalink / raw)
To: Thomas Monjalon, Stephen Hemminger; +Cc: dev
On 1/14/2016 9:49 PM, Xie, Huawei wrote:
> On 1/6/2016 8:04 PM, Thomas Monjalon wrote:
>> 2016-01-05 08:10, Xie, Huawei:
>>> On 10/26/2015 10:06 PM, Xie, Huawei wrote:
>>>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>>>> This is a tested version of the virtio Tx performance improvements
>>>>> that I posted earlier on the list, and described at the DPDK Userspace
>>>>> meeting in Dublin. Together they get a 25% performance improvement for
>>>>> both small packet and large multi-segment packet case when testing
>>>>> from DPDK guest application to Linux KVM host.
>>>>>
>>>>> Stephen Hemminger (5):
>>>>> virtio: clean up space checks on xmit
>>>>> virtio: don't use unlikely for normal tx stuff
>>>>> virtio: use indirect ring elements
>>>>> virtio: use any layout on transmit
>>>>> virtio: optimize transmit enqueue
>>>> There is one open why merge-able header is used in tx path. Since old
>>>> implementation is also using the merge-able header in tx path if this
>>>> feature is negotiated, i choose to ack the patch and address this later
>>>> if not now.
>>>>
>>>> Acked-by: Huawei Xie <huawei.xie@intel.com>
>>> Thomas:
>>> This patch isn't in the patchwork. Does Stephen need to send a new one?
>> Yes please, I cannot find them in patchwork.
> ping
Ping.
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2016-03-04 6:18 ` Xie, Huawei
@ 2016-03-04 18:17 ` Stephen Hemminger
2016-03-08 1:38 ` Xie, Huawei
0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2016-03-04 18:17 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Fri, 4 Mar 2016 06:18:17 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> On 1/14/2016 9:49 PM, Xie, Huawei wrote:
> > On 1/6/2016 8:04 PM, Thomas Monjalon wrote:
> >> 2016-01-05 08:10, Xie, Huawei:
> >>> On 10/26/2015 10:06 PM, Xie, Huawei wrote:
> >>>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
> >>>>> This is a tested version of the virtio Tx performance improvements
> >>>>> that I posted earlier on the list, and described at the DPDK Userspace
> >>>>> meeting in Dublin. Together they get a 25% performance improvement for
> >>>>> both small packet and large multi-segment packet case when testing
> >>>>> from DPDK guest application to Linux KVM host.
> >>>>>
> >>>>> Stephen Hemminger (5):
> >>>>> virtio: clean up space checks on xmit
> >>>>> virtio: don't use unlikely for normal tx stuff
> >>>>> virtio: use indirect ring elements
> >>>>> virtio: use any layout on transmit
> >>>>> virtio: optimize transmit enqueue
> >>>> There is one open why merge-able header is used in tx path. Since old
> >>>> implementation is also using the merge-able header in tx path if this
> >>>> feature is negotiated, i choose to ack the patch and address this later
> >>>> if not now.
> >>>>
> >>>> Acked-by: Huawei Xie <huawei.xie@intel.com>
> >>> Thomas:
Resending them now. I don't understand the issue with the merge-able header.
Virtio negotiation is symmetric: if the receiver is using the merge-able
header, then the transmitter needs to send it also.
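A sketch of that symmetry; hw and vtpci_with_feature come from the driver,
while head_size is an assumed name used only for illustration:

	size_t head_size = vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ?
			   sizeof(struct virtio_net_hdr_mrg_rxbuf) :
			   sizeof(struct virtio_net_hdr);

	/* The same negotiated bit selects the header size for both receive
	 * and transmit, which is why the tx region reserves the merge-able
	 * variant even though num_buffers stays zero on transmit. */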
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
2016-03-04 18:17 ` Stephen Hemminger
@ 2016-03-08 1:38 ` Xie, Huawei
0 siblings, 0 replies; 35+ messages in thread
From: Xie, Huawei @ 2016-03-08 1:38 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 3/5/2016 2:17 AM, Stephen Hemminger wrote:
> Resending them now. I don't understand the issue with the merge-able header.
> Virtio negotiation is symmetric: if the receiver is using the merge-able
> header, then the transmitter needs to send it also.
Yes, both receiver and transmitter use the same header format, though
merge-able is actually only meaningful in the RX path.
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2016-03-08 1:39 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-19 5:16 [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 1/5] virtio: clean up space checks on xmit Stephen Hemminger
2015-10-19 8:02 ` Xie, Huawei
2015-10-19 15:48 ` Stephen Hemminger
2015-10-19 16:27 ` Xie, Huawei
2015-10-19 5:16 ` [dpdk-dev] [PATCH 2/5] virtio: don't use unlikely for normal tx stuff Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 3/5] virtio: use indirect ring elements Stephen Hemminger
2015-10-19 13:19 ` Xie, Huawei
2015-10-19 15:47 ` Stephen Hemminger
2015-10-19 16:18 ` Xie, Huawei
2015-10-30 18:01 ` Thomas Monjalon
2015-10-19 5:16 ` [dpdk-dev] [PATCH 4/5] virtio: use any layout on transmit Stephen Hemminger
2015-10-19 16:28 ` Xie, Huawei
2015-10-19 16:43 ` Stephen Hemminger
2015-10-19 16:56 ` Xie, Huawei
2015-10-26 23:47 ` Stephen Hemminger
2015-10-19 17:19 ` Stephen Hemminger
2015-10-19 5:16 ` [dpdk-dev] [PATCH 5/5] virtio: optimize transmit enqueue Stephen Hemminger
2015-10-20 1:48 ` Xie, Huawei
2015-10-21 13:18 ` [dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements Thomas Monjalon
2015-10-22 10:38 ` Xie, Huawei
2015-10-22 12:13 ` Xie, Huawei
2015-10-22 16:04 ` Stephen Hemminger
2015-10-23 9:00 ` Xie, Huawei
2015-10-26 23:52 ` Stephen Hemminger
2015-10-27 1:56 ` Xie, Huawei
2015-10-27 2:23 ` Stephen Hemminger
2015-10-27 2:38 ` Xie, Huawei
2015-10-26 14:05 ` Xie, Huawei
2016-01-05 8:10 ` Xie, Huawei
2016-01-06 12:03 ` Thomas Monjalon
2016-01-14 13:49 ` Xie, Huawei
2016-03-04 6:18 ` Xie, Huawei
2016-03-04 18:17 ` Stephen Hemminger
2016-03-08 1:38 ` Xie, Huawei