* [dpdk-dev] [PATCH 1/9] mbuf: make segment prefree function public
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 2/9] mbuf: make raw free " Olivier Matz
` (11 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Document the function and make it public, since it is used in several
places in the drivers. The old one is marked as deprecated.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
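For readers unfamiliar with the pattern, the driver hunks below all follow the same shape: call the prefree function on each completed Tx mbuf, and return to the mempool only those segments whose last reference was just dropped. The following is a minimal standalone sketch of that shape, not DPDK code. The toy_* types and functions are hypothetical stand-ins for struct rte_mbuf, rte_pktmbuf_prefree_seg() and rte_mempool_put_bulk(), and the detach of indirect mbufs is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH 64

/* Hypothetical stand-ins for struct rte_mempool and struct rte_mbuf;
 * only the fields needed for the illustration are modelled. */
struct toy_pool { int nb_returned; };
struct toy_mbuf { int refcnt; struct toy_pool *pool; };

/* Models the prefree step: drop one reference and hand the segment
 * back to the caller only if that was the last reference. */
static struct toy_mbuf *toy_prefree_seg(struct toy_mbuf *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt == 0)
		return m;
	return NULL;
}

/* Models a bulk put: the pool just counts what it gets back. */
static void toy_put_bulk(struct toy_pool *p, struct toy_mbuf **objs, int n)
{
	(void)objs;
	p->nb_returned += n;
}

/* Tx cleanup in the style of the drivers: batch segments that belong
 * to the same pool, flush the batch when the pool changes or the
 * batch is full. Returns the number of segments given back. */
static int toy_tx_free(struct toy_mbuf **txep, int num)
{
	struct toy_mbuf *batch[TOY_BATCH];
	int nb_free = 0, total = 0, i;

	for (i = 0; i < num; i++) {
		struct toy_mbuf *m = toy_prefree_seg(txep[i]);

		txep[i] = NULL;
		if (m == NULL)
			continue; /* still referenced elsewhere */
		if (nb_free > 0 &&
		    (m->pool != batch[0]->pool || nb_free == TOY_BATCH)) {
			toy_put_bulk(batch[0]->pool, batch, nb_free);
			total += nb_free;
			nb_free = 0;
		}
		batch[nb_free++] = m;
	}
	if (nb_free > 0) {
		toy_put_bulk(batch[0]->pool, batch, nb_free);
		total += nb_free;
	}
	return total;
}
```

A segment that still has another holder (refcnt greater than 1 on entry) is skipped by toy_tx_free() and stays alive, which is the case the NULL return value exists for.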
drivers/net/enic/enic_rxtx.c | 2 +-
drivers/net/fm10k/fm10k_rxtx.c | 6 +++---
drivers/net/fm10k/fm10k_rxtx_vec.c | 6 +++---
drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +++---
drivers/net/ixgbe/ixgbe_rxtx.c | 2 +-
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +++---
drivers/net/virtio/virtio_rxtx_simple.h | 6 +++---
lib/librte_mbuf/rte_mbuf.h | 30 +++++++++++++++++++++++++++---
8 files changed, 44 insertions(+), 20 deletions(-)
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 343dabc..1ee5cbb 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -473,7 +473,7 @@ static inline void enic_free_wq_bufs(struct vnic_wq *wq, u16 completed_index)
pool = ((struct rte_mbuf *)buf->mb)->pool;
for (i = 0; i < nb_to_free; i++) {
buf = &wq->bufs[tail_idx];
- m = __rte_pktmbuf_prefree_seg((struct rte_mbuf *)(buf->mb));
+ m = rte_pktmbuf_prefree_seg((struct rte_mbuf *)(buf->mb));
buf->mb = NULL;
if (unlikely(m == NULL)) {
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 144e5e6..c9bb04a 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -434,12 +434,12 @@ static inline void tx_free_bulk_mbuf(struct rte_mbuf **txep, int num)
if (unlikely(num == 0))
return;
- m = __rte_pktmbuf_prefree_seg(txep[0]);
+ m = rte_pktmbuf_prefree_seg(txep[0]);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < num; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -455,7 +455,7 @@ static inline void tx_free_bulk_mbuf(struct rte_mbuf **txep, int num)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < num; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (m != NULL)
rte_mempool_put(m->pool, m);
txep[i] = NULL;
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 27f3e43..825e3c1 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -754,12 +754,12 @@ fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
* next_dd - (rs_thresh-1)
*/
txep = &txq->sw_ring[txq->next_dd - (n - 1)];
- m = __rte_pktmbuf_prefree_seg(txep[0]);
+ m = rte_pktmbuf_prefree_seg(txep[0]);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -774,7 +774,7 @@ fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 3745558..76031fe 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -123,12 +123,12 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
* tx_next_dd - (tx_rs_thresh-1)
*/
txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
- m = __rte_pktmbuf_prefree_seg(txep[0].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool)) {
free[nb_free++] = m;
@@ -144,7 +144,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9502432..b056107 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -142,7 +142,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
/* free buffers one at a time */
- m = __rte_pktmbuf_prefree_seg(txep->mbuf);
+ m = rte_pktmbuf_prefree_seg(txep->mbuf);
txep->mbuf = NULL;
if (unlikely(m == NULL))
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
index a3473b9..a83afe5 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
@@ -123,12 +123,12 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
* tx_next_dd - (tx_rs_thresh-1)
*/
txep = &txq->sw_ring_v[txq->tx_next_dd - (n - 1)];
- m = __rte_pktmbuf_prefree_seg(txep[0].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -143,7 +143,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/drivers/net/virtio/virtio_rxtx_simple.h b/drivers/net/virtio/virtio_rxtx_simple.h
index b08f859..f531c54 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.h
+++ b/drivers/net/virtio/virtio_rxtx_simple.h
@@ -98,13 +98,13 @@ virtio_xmit_cleanup(struct virtqueue *vq)
desc_idx = (uint16_t)(vq->vq_used_cons_idx &
((vq->vq_nentries >> 1) - 1));
m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
- m = __rte_pktmbuf_prefree_seg(m);
+ m = rte_pktmbuf_prefree_seg(m);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
- m = __rte_pktmbuf_prefree_seg(m);
+ m = rte_pktmbuf_prefree_seg(m);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -123,7 +123,7 @@ virtio_xmit_cleanup(struct virtqueue *vq)
} else {
for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
- m = __rte_pktmbuf_prefree_seg(m);
+ m = rte_pktmbuf_prefree_seg(m);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ce57d47..b61c430 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1213,8 +1213,23 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
__rte_mbuf_raw_free(md);
}
-static inline struct rte_mbuf* __attribute__((always_inline))
-__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
+/**
+ * Decrease reference counter and unlink a mbuf segment
+ *
+ * This function does the same as a free, except that it does not
+ * return the segment to its pool.
+ * It decreases the reference counter, and if it reaches 0, it is
+ * detached from its parent for an indirect mbuf.
+ *
+ * @param m
+ * The mbuf to be unlinked
+ * @return
+ * - (m) if it is the last reference. It can be recycled or freed.
+ * - (NULL) if the mbuf still has remaining references on it.
+ */
+__attribute__((always_inline))
+static inline struct rte_mbuf *
+rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
__rte_mbuf_sanity_check(m, 0);
@@ -1227,6 +1242,14 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
return NULL;
}
+/* deprecated, replaced by rte_pktmbuf_prefree_seg() */
+__rte_deprecated
+static inline struct rte_mbuf *
+__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
+{
+ return rte_pktmbuf_prefree_seg(m);
+}
+
/**
* Free a segment of a packet mbuf into its original mempool.
*
@@ -1239,7 +1262,8 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
static inline void __attribute__((always_inline))
rte_pktmbuf_free_seg(struct rte_mbuf *m)
{
- if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
+ m = rte_pktmbuf_prefree_seg(m);
+ if (likely(m != NULL)) {
m->next = NULL;
__rte_mbuf_raw_free(m);
}
--
2.8.1
* [dpdk-dev] [PATCH 2/9] mbuf: make raw free function public
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 1/9] mbuf: make segment prefree function public Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 3/9] mbuf: set mbuf fields while in pool Olivier Matz
` (10 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/mlx5/mlx5_rxtx.c | 6 +++---
drivers/net/mpipe/mpipe_tilegx.c | 2 +-
lib/librte_mbuf/rte_mbuf.h | 22 ++++++++++++++++------
4 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index b5e6db6..5dd44d7 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -680,7 +680,7 @@ static void ena_rx_queue_release_bufs(struct ena_ring *ring)
ring->rx_buffer_info[ring->next_to_clean & ring_mask];
if (m)
- __rte_mbuf_raw_free(m);
+ rte_mbuf_raw_free(m);
ring->next_to_clean++;
}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 88b0354..41a5bb2 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1399,7 +1399,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(pkt != (*rxq->elts)[idx]);
rep = NEXT(pkt);
rte_mbuf_refcnt_set(pkt, 0);
- __rte_mbuf_raw_free(pkt);
+ rte_mbuf_raw_free(pkt);
pkt = rep;
}
break;
@@ -1410,13 +1410,13 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
&rss_hash_res);
if (!len) {
rte_mbuf_refcnt_set(rep, 0);
- __rte_mbuf_raw_free(rep);
+ rte_mbuf_raw_free(rep);
break;
}
if (unlikely(len == -1)) {
/* RX error, packet is likely too large. */
rte_mbuf_refcnt_set(rep, 0);
- __rte_mbuf_raw_free(rep);
+ rte_mbuf_raw_free(rep);
++rxq->stats.idropped;
goto skip;
}
diff --git a/drivers/net/mpipe/mpipe_tilegx.c b/drivers/net/mpipe/mpipe_tilegx.c
index 60d5f81..536b8ea 100644
--- a/drivers/net/mpipe/mpipe_tilegx.c
+++ b/drivers/net/mpipe/mpipe_tilegx.c
@@ -558,7 +558,7 @@ mpipe_recv_flush_stack(struct mpipe_dev_priv *priv)
mbuf->data_len = 0;
mbuf->pkt_len = 0;
- __rte_mbuf_raw_free(mbuf);
+ rte_mbuf_raw_free(mbuf);
}
}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index b61c430..575dc9d 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -790,20 +790,30 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
}
/**
- * @internal Put mbuf back into its original mempool.
- * The use of that function is reserved for RTE internal needs.
- * Please use rte_pktmbuf_free().
+ * Put mbuf back into its original mempool.
+ *
+ * The caller must ensure that the mbuf is direct and that the
+ * reference counter is 0.
*
* @param m
* The mbuf to be freed.
*/
static inline void __attribute__((always_inline))
-__rte_mbuf_raw_free(struct rte_mbuf *m)
+rte_mbuf_raw_free(struct rte_mbuf *m)
{
+ RTE_ASSERT(RTE_MBUF_DIRECT(m));
RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
rte_mempool_put(m->pool, m);
}
+/* compat with older versions */
+__rte_deprecated
+static inline void __attribute__((always_inline))
+__rte_mbuf_raw_free(struct rte_mbuf *m)
+{
+ rte_mbuf_raw_free(m);
+}
+
/* Operations on ctrl mbuf */
/**
@@ -1210,7 +1220,7 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
m->ol_flags = 0;
if (rte_mbuf_refcnt_update(md, -1) == 0)
- __rte_mbuf_raw_free(md);
+ rte_mbuf_raw_free(md);
}
/**
@@ -1265,7 +1275,7 @@ rte_pktmbuf_free_seg(struct rte_mbuf *m)
m = rte_pktmbuf_prefree_seg(m);
if (likely(m != NULL)) {
m->next = NULL;
- __rte_mbuf_raw_free(m);
+ rte_mbuf_raw_free(m);
}
}
--
2.8.1
* [dpdk-dev] [PATCH 3/9] mbuf: set mbuf fields while in pool
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 1/9] mbuf: make segment prefree function public Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 2/9] mbuf: make raw free " Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-31 11:21 ` Bruce Richardson
2017-03-08 9:41 ` [dpdk-dev] [PATCH 4/9] drivers/net: don't touch mbuf next or nb segs on Rx Olivier Matz
` (9 subsequent siblings)
12 siblings, 1 reply; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to spare drivers from setting m->next to NULL in the early Rx path:
  this field is in the second 64B of the mbuf, so accessing it could
  trigger a cache miss
- to rationalize the behavior of raw_alloc/raw_free: each is now the
  mirror of the other, and neither function changes refcnt.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
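To make the new invariant concrete, here is a small standalone sketch of the state an mbuf is expected to have while it sits in the pool (refcnt=1, next=NULL, nb_segs=1), and of where that state is restored. This is an illustration under assumptions, not the code of this patch: the toy_* names are hypothetical, the pool holds a single object, and indirect mbufs are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced mbuf: only the three fields the commit
 * message talks about. */
struct toy_mbuf {
	struct toy_mbuf *next;
	unsigned short refcnt;
	unsigned char nb_segs;
};

/* Hypothetical one-slot pool, enough to show the round trip. */
struct toy_pool { struct toy_mbuf *slot; };

/* The state expected for any mbuf stored in the pool. */
static int toy_in_pool_state(const struct toy_mbuf *m)
{
	return m->refcnt == 1 && m->next == NULL && m->nb_segs == 1;
}

/* Raw alloc: only checks the invariant, writes nothing. */
static struct toy_mbuf *toy_raw_alloc(struct toy_pool *p)
{
	struct toy_mbuf *m = p->slot;

	if (m == NULL)
		return NULL;
	p->slot = NULL;
	assert(toy_in_pool_state(m));
	return m;
}

/* Raw free: the mirror of raw alloc, also writes nothing. */
static void toy_raw_free(struct toy_pool *p, struct toy_mbuf *m)
{
	assert(toy_in_pool_state(m));
	p->slot = m;
}

/* Prefree: the one place where the in-pool state is restored,
 * once the last reference is dropped. */
static struct toy_mbuf *toy_prefree_seg(struct toy_mbuf *m)
{
	if (--m->refcnt != 0)
		return NULL;
	m->next = NULL;
	m->nb_segs = 1;
	m->refcnt = 1;
	return m;
}
```

In this sketch the head of a chain of direct segments goes through toy_prefree_seg() with next still set and nb_segs greater than 1, and comes out in the in-pool state, so the Rx side does not have to rewrite those fields after allocation.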
drivers/net/mlx5/mlx5_rxtx.c | 5 ++---
drivers/net/mpipe/mpipe_tilegx.c | 1 +
lib/librte_mbuf/rte_mbuf.c | 2 ++
lib/librte_mbuf/rte_mbuf.h | 42 +++++++++++++++++++++++++++++-----------
4 files changed, 36 insertions(+), 14 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 41a5bb2..fc59544 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1398,7 +1398,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
while (pkt != seg) {
assert(pkt != (*rxq->elts)[idx]);
rep = NEXT(pkt);
- rte_mbuf_refcnt_set(pkt, 0);
+ NEXT(pkt) = NULL;
+ NB_SEGS(pkt) = 1;
rte_mbuf_raw_free(pkt);
pkt = rep;
}
@@ -1409,13 +1410,11 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
len = mlx5_rx_poll_len(rxq, cqe, cqe_cnt,
&rss_hash_res);
if (!len) {
- rte_mbuf_refcnt_set(rep, 0);
rte_mbuf_raw_free(rep);
break;
}
if (unlikely(len == -1)) {
/* RX error, packet is likely too large. */
- rte_mbuf_refcnt_set(rep, 0);
rte_mbuf_raw_free(rep);
++rxq->stats.idropped;
goto skip;
diff --git a/drivers/net/mpipe/mpipe_tilegx.c b/drivers/net/mpipe/mpipe_tilegx.c
index 536b8ea..0135e2f 100644
--- a/drivers/net/mpipe/mpipe_tilegx.c
+++ b/drivers/net/mpipe/mpipe_tilegx.c
@@ -557,6 +557,7 @@ mpipe_recv_flush_stack(struct mpipe_dev_priv *priv)
mbuf->packet_type = 0;
mbuf->data_len = 0;
mbuf->pkt_len = 0;
+ mbuf->next = NULL;
rte_mbuf_raw_free(mbuf);
}
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 72ad91e..0acc810 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -145,6 +145,8 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->pool = mp;
m->nb_segs = 1;
m->port = 0xff;
+ rte_mbuf_refcnt_set(m, 1);
+ m->next = NULL;
}
/* helper to create a mbuf pool */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 575dc9d..b4fe786 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -768,6 +768,11 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
* initializing all the required fields. See rte_pktmbuf_reset().
* For standard needs, prefer rte_pktmbuf_alloc().
*
+ * The caller can expect that the following fields of the mbuf structure
+ * are initialized: buf_addr, buf_physaddr, buf_len, refcnt=1, nb_segs=1,
+ * next=NULL, pool, priv_size. The other fields must be initialized
+ * by the caller.
+ *
* @param mp
* The mempool from which mbuf is allocated.
* @return
@@ -782,8 +787,9 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
if (rte_mempool_get(mp, &mb) < 0)
return NULL;
m = (struct rte_mbuf *)mb;
- RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
- rte_mbuf_refcnt_set(m, 1);
+ RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+ RTE_ASSERT(m->next == NULL);
+ RTE_ASSERT(m->nb_segs == 1);
__rte_mbuf_sanity_check(m, 0);
return m;
@@ -792,8 +798,13 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
/**
* Put mbuf back into its original mempool.
*
- * The caller must ensure that the mbuf is direct and that the
- * reference counter is 0.
+ * The caller must ensure that the mbuf is direct and properly
+ * reinitialized (refcnt=1, next=NULL, nb_segs=1), as done by
+ * rte_pktmbuf_prefree_seg().
+ *
+ * This function should be used with care, when optimization is
+ * required. For standard needs, prefer rte_pktmbuf_free() or
+ * rte_pktmbuf_free_seg().
*
* @param m
* The mbuf to be freed.
@@ -802,13 +813,16 @@ static inline void __attribute__((always_inline))
rte_mbuf_raw_free(struct rte_mbuf *m)
{
RTE_ASSERT(RTE_MBUF_DIRECT(m));
- RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
+ RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+ RTE_ASSERT(m->next == NULL);
+ RTE_ASSERT(m->nb_segs == 1);
+ __rte_mbuf_sanity_check(m, 0);
rte_mempool_put(m->pool, m);
}
/* compat with older versions */
__rte_deprecated
-static inline void __attribute__((always_inline))
+static inline void
__rte_mbuf_raw_free(struct rte_mbuf *m)
{
rte_mbuf_raw_free(m);
@@ -1219,8 +1233,12 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
m->data_len = 0;
m->ol_flags = 0;
- if (rte_mbuf_refcnt_update(md, -1) == 0)
+ if (rte_mbuf_refcnt_update(md, -1) == 0) {
+ md->next = NULL;
+ md->nb_segs = 1;
+ rte_mbuf_refcnt_set(md, 1);
rte_mbuf_raw_free(md);
+ }
}
/**
@@ -1244,9 +1262,13 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
__rte_mbuf_sanity_check(m, 0);
if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
- /* if this is an indirect mbuf, it is detached. */
if (RTE_MBUF_INDIRECT(m))
rte_pktmbuf_detach(m);
+
+ m->next = NULL;
+ m->nb_segs = 1;
+ rte_mbuf_refcnt_set(m, 1);
+
return m;
}
return NULL;
@@ -1273,10 +1295,8 @@ static inline void __attribute__((always_inline))
rte_pktmbuf_free_seg(struct rte_mbuf *m)
{
m = rte_pktmbuf_prefree_seg(m);
- if (likely(m != NULL)) {
- m->next = NULL;
+ if (likely(m != NULL))
rte_mbuf_raw_free(m);
- }
}
/**
--
2.8.1
* Re: [dpdk-dev] [PATCH 3/9] mbuf: set mbuf fields while in pool
2017-03-08 9:41 ` [dpdk-dev] [PATCH 3/9] mbuf: set mbuf fields while in pool Olivier Matz
@ 2017-03-31 11:21 ` Bruce Richardson
2017-03-31 11:51 ` Ananyev, Konstantin
0 siblings, 1 reply; 155+ messages in thread
From: Bruce Richardson @ 2017-03-31 11:21 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, konstantin.ananyev, mb, andrey.chilikin, jblunck,
nelio.laranjeiro, arybchenko
On Wed, Mar 08, 2017 at 10:41:55AM +0100, Olivier Matz wrote:
> Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
> to NULL when the mbuf is stored inside the mempool (unused).
> This is done in rte_pktmbuf_prefree_seg(), before freeing or
> recycling a mbuf.
>
> Before this patch, the value of m->refcnt was expected to be 0
> while in pool.
>
> The objectives are:
>
> - to avoid drivers to set m->next to NULL in the early Rx path, since
> this field is in the second 64B of the mbuf and its access could
> trigger a cache miss
>
> - rationalize the behavior of raw_alloc/raw_free: one is now the
> symmetric of the other, and refcnt is never changed in these functions.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
> drivers/net/mlx5/mlx5_rxtx.c | 5 ++---
> drivers/net/mpipe/mpipe_tilegx.c | 1 +
> lib/librte_mbuf/rte_mbuf.c | 2 ++
> lib/librte_mbuf/rte_mbuf.h | 42 +++++++++++++++++++++++++++++-----------
> 4 files changed, 36 insertions(+), 14 deletions(-)
>
<snip>
> /**
> @@ -1244,9 +1262,13 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> __rte_mbuf_sanity_check(m, 0);
>
> if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
> - /* if this is an indirect mbuf, it is detached. */
> if (RTE_MBUF_INDIRECT(m))
> rte_pktmbuf_detach(m);
> +
> + m->next = NULL;
> + m->nb_segs = 1;
> + rte_mbuf_refcnt_set(m, 1);
> +
> return m;
> }
> return NULL;
Do we need to make this change to prefree_seg? If we update the detach
function to set the next pointer to NULL on detaching a segment, and if we
change the "free" function which frees a whole chain of mbufs, we should
be covered, should we not? If we are freeing a standalone segment, that
segment should already have its nb_segs and next pointers correct.
/Bruce
* Re: [dpdk-dev] [PATCH 3/9] mbuf: set mbuf fields while in pool
2017-03-31 11:21 ` Bruce Richardson
@ 2017-03-31 11:51 ` Ananyev, Konstantin
0 siblings, 0 replies; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-31 11:51 UTC (permalink / raw)
To: Richardson, Bruce, Olivier Matz
Cc: dev, mb, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, March 31, 2017 12:22 PM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> Subject: Re: [PATCH 3/9] mbuf: set mbuf fields while in pool
>
> On Wed, Mar 08, 2017 at 10:41:55AM +0100, Olivier Matz wrote:
> > Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
> > to NULL when the mbuf is stored inside the mempool (unused).
> > This is done in rte_pktmbuf_prefree_seg(), before freeing or
> > recycling a mbuf.
> >
> > Before this patch, the value of m->refcnt was expected to be 0
> > while in pool.
> >
> > The objectives are:
> >
> > - to avoid drivers to set m->next to NULL in the early Rx path, since
> > this field is in the second 64B of the mbuf and its access could
> > trigger a cache miss
> >
> > - rationalize the behavior of raw_alloc/raw_free: one is now the
> > symmetric of the other, and refcnt is never changed in these functions.
> >
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> > drivers/net/mlx5/mlx5_rxtx.c | 5 ++---
> > drivers/net/mpipe/mpipe_tilegx.c | 1 +
> > lib/librte_mbuf/rte_mbuf.c | 2 ++
> > lib/librte_mbuf/rte_mbuf.h | 42 +++++++++++++++++++++++++++++-----------
> > 4 files changed, 36 insertions(+), 14 deletions(-)
> >
> <snip>
> > /**
> > @@ -1244,9 +1262,13 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > __rte_mbuf_sanity_check(m, 0);
> >
> > if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
> > - /* if this is an indirect mbuf, it is detached. */
> > if (RTE_MBUF_INDIRECT(m))
> > rte_pktmbuf_detach(m);
> > +
> > + m->next = NULL;
> > + m->nb_segs = 1;
> > + rte_mbuf_refcnt_set(m, 1);
> > +
> > return m;
> > }
> > return NULL;
>
> Do we need to make this change to prefree_seg? If we update the detach
> function to set the next pointer to NULL on detaching a segment, and if we
> change the "free" function which frees a whole chain of mbufs, we should
> be covered, should we not? If we are freeing a standalone segment, that
> segment should already have its nb_segs and next pointers correct.
detach() is invoked only for indirect mbufs, and we can have a chain
of direct mbufs too.
As for free(): most PMDs do not go through it. They call either
rte_pktmbuf_free_seg() directly, or rte_pktmbuf_prefree_seg() followed
by rte_mempool_put_bulk().
Konstantin
* [dpdk-dev] [PATCH 4/9] drivers/net: don't touch mbuf next or nb segs on Rx
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (2 preceding siblings ...)
2017-03-08 9:41 ` [dpdk-dev] [PATCH 3/9] mbuf: set mbuf fields while in pool Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 5/9] mbuf: make rearm data address naturally aligned Olivier Matz
` (8 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Now that m->next and m->nb_segs are expected to be set (to NULL and 1
respectively) after a mempool_get(), the Rx functions of drivers no
longer need to write them.
Only some drivers are patched; this is not exhaustive, but it shows
the change that can be applied to the other drivers.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/i40e/i40e_rxtx_vec_sse.c | 6 ------
drivers/net/ixgbe/ixgbe_rxtx.c | 8 --------
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 6 ------
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 6 ------
drivers/net/null/rte_eth_null.c | 2 --
drivers/net/virtio/virtio_rxtx.c | 4 ----
6 files changed, 32 deletions(-)
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index b95cc8e..2f861fd 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -424,12 +424,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
/* store the resulting 32-bit value */
*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
split_packet += RTE_I40E_DESCS_PER_LOOP;
-
- /* zero-out next pointers */
- rx_pkts[pos]->next = NULL;
- rx_pkts[pos + 1]->next = NULL;
- rx_pkts[pos + 2]->next = NULL;
- rx_pkts[pos + 3]->next = NULL;
}
/* C.3 calc available number of desc */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b056107..813c494 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1556,8 +1556,6 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
/* populate the static rte mbuf fields */
mb = rxep[i].mbuf;
if (reset_mbuf) {
- mb->next = NULL;
- mb->nb_segs = 1;
mb->port = rxq->port_id;
}
@@ -2165,12 +2163,6 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
goto next_desc;
}
- /*
- * This is the last buffer of the received packet - return
- * the current cluster to the user.
- */
- rxm->next = NULL;
-
/* Initialize the first mbuf of the returned packet */
ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq, staterr);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index e2715cb..2c04161 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -330,12 +330,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
*(int *)split_packet = ~stat & IXGBE_VPMD_DESC_EOP_MASK;
split_packet += RTE_IXGBE_DESCS_PER_LOOP;
-
- /* zero-out next pointers */
- rx_pkts[pos]->next = NULL;
- rx_pkts[pos + 1]->next = NULL;
- rx_pkts[pos + 2]->next = NULL;
- rx_pkts[pos + 3]->next = NULL;
}
rte_prefetch_non_temporal(rxdp + RTE_IXGBE_DESCS_PER_LOOP);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index abbf284..65c5da3 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -425,12 +425,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
/* store the resulting 32-bit value */
*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
split_packet += RTE_IXGBE_DESCS_PER_LOOP;
-
- /* zero-out next pointers */
- rx_pkts[pos]->next = NULL;
- rx_pkts[pos + 1]->next = NULL;
- rx_pkts[pos + 2]->next = NULL;
- rx_pkts[pos + 3]->next = NULL;
}
/* C.3 calc available number of desc */
diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 57203e2..7e14da0 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -112,8 +112,6 @@ eth_null_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
break;
bufs[i]->data_len = (uint16_t)packet_size;
bufs[i]->pkt_len = packet_size;
- bufs[i]->nb_segs = 1;
- bufs[i]->next = NULL;
bufs[i]->port = h->internals->port_id;
}
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index cab6e8f..b3e6d80 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -772,8 +772,6 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->ol_flags = 0;
rxm->vlan_tci = 0;
- rxm->nb_segs = 1;
- rxm->next = NULL;
rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
rxm->data_len = (uint16_t)(len[i] - hdr_size);
@@ -900,7 +898,6 @@ virtio_recv_mergeable_pkts(void *rx_queue,
rxm->data_off = RTE_PKTMBUF_HEADROOM;
rxm->nb_segs = seg_num;
- rxm->next = NULL;
rxm->ol_flags = 0;
rxm->vlan_tci = 0;
rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
@@ -945,7 +942,6 @@ virtio_recv_mergeable_pkts(void *rx_queue,
rxm = rcv_pkts[extra_idx];
rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
- rxm->next = NULL;
rxm->pkt_len = (uint32_t)(len[extra_idx]);
rxm->data_len = (uint16_t)(len[extra_idx]);
--
2.8.1
* [dpdk-dev] [PATCH 5/9] mbuf: make rearm data address naturally aligned
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (3 preceding siblings ...)
2017-03-08 9:41 ` [dpdk-dev] [PATCH 4/9] drivers/net: don't touch mbuf next or nb segs on Rx Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 6/9] mbuf: use 2 bytes for port and nb segments Olivier Matz
` (7 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, Jerin Jacob
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To avoid multiple stores on the fast path, Ethernet drivers
aggregate the writes to data_off, refcnt, nb_segs and port
into a single uint64_t value and write it in one shot
through a uint64_t * at the &mbuf->rearm_data address.
Some non-IA platforms incur a store overhead if the store
address is not naturally aligned. This patch fixes the
performance issue on those targets.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
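The alignment point can be illustrated with a standalone sketch. The two structures below are hypothetical layouts chosen for the illustration, not the actual struct rte_mbuf before or after this patch: in the first, a 2-byte field sits ahead of the rearm area, so an 8-byte store starting there is not naturally aligned and spills into ol_flags; in the second, the rearm area starts on an 8-byte boundary and the 8-byte store stays inside it. The toy_* names and the pad field are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout with the rearm area at offset 18. */
struct toy_mbuf_old {
	uint64_t buf_addr;
	uint64_t buf_physaddr;
	uint16_t buf_len;
	uint16_t data_off;	/* rearm area starts here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint64_t ol_flags;
};

/* Hypothetical layout with the rearm area at offset 16. */
struct toy_mbuf {
	uint64_t buf_addr;
	uint64_t buf_physaddr;
	uint16_t data_off;	/* rearm area starts here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint16_t pad;		/* assumption: filler up to 8 bytes */
	uint64_t ol_flags;
	uint16_t buf_len;
};

#define TOY_REARM_OFF offsetof(struct toy_mbuf, data_off)

/* Build the 8-byte template once, at queue setup time. */
static uint64_t toy_make_initializer(uint16_t data_off, uint8_t port)
{
	struct toy_mbuf tmpl;
	uint64_t init;

	memset(&tmpl, 0, sizeof(tmpl));
	tmpl.data_off = data_off;
	tmpl.refcnt = 1;
	tmpl.nb_segs = 1;
	tmpl.port = port;
	memcpy(&init, (const char *)&tmpl + TOY_REARM_OFF, sizeof(init));
	return init;
}

/* Rearm one mbuf with a single 8-byte copy at an aligned offset. */
static void toy_rearm(struct toy_mbuf *m, uint64_t init)
{
	assert(TOY_REARM_OFF % sizeof(uint64_t) == 0);
	memcpy((char *)m + TOY_REARM_OFF, &init, sizeof(init));
}
```

The sketch uses memcpy() rather than a cast to uint64_t * to stay within strict aliasing rules; compilers typically turn such a fixed-size copy into a single store when the target allows it.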
drivers/net/fm10k/fm10k_rxtx_vec.c | 3 ---
drivers/net/i40e/i40e_rxtx_vec_sse.c | 5 +----
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 3 ---
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 3 ---
lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 3 +--
lib/librte_mbuf/rte_mbuf.h | 6 +++---
6 files changed, 5 insertions(+), 18 deletions(-)
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 825e3c1..61a65e9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -324,9 +324,6 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
/* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
p0 = (uintptr_t)&mb0->rearm_data;
*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index 2f861fd..e17235a 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -87,11 +87,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
mb0 = rxep[0].mbuf;
mb1 = rxep[1].mbuf;
- /* Flush mbuf with pkt template.
+ /* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
p0 = (uintptr_t)&mb0->rearm_data;
*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 2c04161..bc8924f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -85,9 +85,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
/*
* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
vst1_u8((uint8_t *)&mb0->rearm_data, p);
paddr = mb0->buf_physaddr + RTE_PKTMBUF_HEADROOM;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index 65c5da3..62afe31 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -90,9 +90,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
/*
* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
p0 = (uintptr_t)&mb0->rearm_data;
*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 09713b0..f24f79f 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -116,11 +116,10 @@ struct rte_kni_fifo {
struct rte_kni_mbuf {
void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
uint64_t buf_physaddr;
- char pad0[2];
uint16_t data_off; /**< Start address of data in segment buffer. */
char pad1[2];
uint8_t nb_segs; /**< Number of segments. */
- char pad4[1];
+ char pad4[3];
uint64_t ol_flags; /**< Offload features. */
char pad2[4];
uint32_t pkt_len; /**< Total pkt len: sum of all segment data_len. */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index b4fe786..4dc9a20 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -393,10 +393,8 @@ struct rte_mbuf {
void *buf_addr; /**< Virtual address of segment buffer. */
phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
- uint16_t buf_len; /**< Length of segment buffer. */
-
/* next 6 bytes are initialised on RX descriptor rearm */
- MARKER8 rearm_data;
+ MARKER64 rearm_data;
uint16_t data_off;
/**
@@ -414,6 +412,7 @@ struct rte_mbuf {
};
uint8_t nb_segs; /**< Number of segments. */
uint8_t port; /**< Input port. */
+ uint16_t pad; /**< 2B pad for naturally aligned ol_flags */
uint64_t ol_flags; /**< Offload features. */
@@ -474,6 +473,7 @@ struct rte_mbuf {
/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
uint16_t vlan_tci_outer;
+ uint16_t buf_len; /**< Length of segment buffer. */
/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_min_aligned;
--
2.8.1
^ permalink raw reply [flat|nested] 155+ messages in thread
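A minimal sketch of the layout change in patch 5/9, assuming a 64-bit target. The two structs below are hypothetical, simplified stand-ins for struct rte_mbuf (the names old_layout, new_layout and rearm_is_naturally_aligned are made up for illustration); they only model the first fields to show why moving buf_len out of the way puts the rearm area on an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the layout before the patch: buf_len sits
 * between the addresses and the rearm area, so the area starts at
 * offset 18, which is not a multiple of 8. */
struct old_layout {
	uint64_t buf_addr;     /* stands in for void *buf_addr */
	uint64_t buf_physaddr;
	uint16_t buf_len;
	uint16_t data_off;     /* rearm area starts here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint64_t ol_flags;
};

/* Hypothetical model of the layout after the patch: buf_len is moved
 * away and a 2-byte pad keeps ol_flags naturally aligned. */
struct new_layout {
	uint64_t buf_addr;
	uint64_t buf_physaddr;
	uint16_t data_off;     /* rearm area starts here, offset 16 */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint16_t pad;
	uint64_t ol_flags;
};

/* Compile-time check in the spirit of the patch. */
static_assert(offsetof(struct new_layout, data_off) % sizeof(uint64_t) == 0,
	      "rearm area must be naturally aligned for a 64-bit store");

/* Returns 1 when a 64-bit store at this offset is naturally aligned. */
static int rearm_is_naturally_aligned(size_t offset)
{
	return offset % sizeof(uint64_t) == 0;
}
```

Calling rearm_is_naturally_aligned() with offsetof(struct old_layout, data_off) reports a misaligned store address, while the same call on new_layout reports an aligned one.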
* [dpdk-dev] [PATCH 6/9] mbuf: use 2 bytes for port and nb segments
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (4 preceding siblings ...)
2017-03-08 9:41 ` [dpdk-dev] [PATCH 5/9] mbuf: make rearm data address naturally aligned Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-08 9:41 ` [dpdk-dev] [PATCH 7/9] mbuf: move sequence number in second cache line Olivier Matz
` (6 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Change the size of m->port and m->nb_segs to 16 bits. It is now possible
to reference a port identifier larger than 255 and to have an mbuf chain
longer than 255 segments.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
app/test-pmd/csumonly.c | 4 ++--
.../linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++--
lib/librte_mbuf/rte_mbuf.h | 12 +++++++-----
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 88cc842..5eaff9b 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -583,7 +583,7 @@ pkt_copy_split(const struct rte_mbuf *pkt)
rc = mbuf_copy_split(pkt, md, seglen, nb_seg);
if (rc < 0)
RTE_LOG(ERR, USER1,
- "mbuf_copy_split for %p(len=%u, nb_seg=%hhu) "
+ "mbuf_copy_split for %p(len=%u, nb_seg=%u) "
"into %u segments failed with error code: %d\n",
pkt, pkt->pkt_len, pkt->nb_segs, nb_seg, rc);
@@ -801,7 +801,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
char buf[256];
printf("-----------------\n");
- printf("port=%u, mbuf=%p, pkt_len=%u, nb_segs=%hhu:\n",
+ printf("port=%u, mbuf=%p, pkt_len=%u, nb_segs=%u:\n",
fs->rx_port, m, m->pkt_len, m->nb_segs);
/* dump rx parsed packet info */
rte_get_rx_ol_flag_list(rx_ol_flags, buf, sizeof(buf));
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index f24f79f..2ac879f 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -118,8 +118,8 @@ struct rte_kni_mbuf {
uint64_t buf_physaddr;
uint16_t data_off; /**< Start address of data in segment buffer. */
char pad1[2];
- uint8_t nb_segs; /**< Number of segments. */
- char pad4[3];
+ uint16_t nb_segs; /**< Number of segments. */
+ char pad4[2];
uint64_t ol_flags; /**< Offload features. */
char pad2[4];
uint32_t pkt_len; /**< Total pkt len: sum of all segment data_len. */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 4dc9a20..45cd6b9 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -393,12 +393,13 @@ struct rte_mbuf {
void *buf_addr; /**< Virtual address of segment buffer. */
phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
- /* next 6 bytes are initialised on RX descriptor rearm */
+ /* next 8 bytes are initialised on RX descriptor rearm */
MARKER64 rearm_data;
uint16_t data_off;
/**
- * 16-bit Reference counter.
+ * Reference counter. Its size should at least equal to the size
+ * of port field (16 bits), to support zero-copy broadcast.
* It should only be accessed using the following functions:
* rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
* rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
@@ -410,9 +411,10 @@ struct rte_mbuf {
rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
uint16_t refcnt; /**< Non-atomically accessed refcnt */
};
- uint8_t nb_segs; /**< Number of segments. */
- uint8_t port; /**< Input port. */
- uint16_t pad; /**< 2B pad for naturally aligned ol_flags */
+ uint16_t nb_segs; /**< Number of segments. */
+
+ /** Input port (16 bits to support more than 256 virtual ports). */
+ uint16_t port;
uint64_t ol_flags; /**< Offload features. */
--
2.8.1
^ permalink raw reply [flat|nested] 155+ messages in thread
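After patch 6/9 the rearm area is exactly four 16-bit fields, so the whole area can be prepared once as an 8-byte template and copied in one store. The sketch below illustrates that idea under stated assumptions: struct rearm_fields, make_initializer() and apply_initializer() are hypothetical names, and the refcnt/nb_segs defaults of 1 are chosen for illustration rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the 8-byte rearm area after the patch. */
struct rearm_fields {
	uint16_t data_off;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;     /* 16 bits: port ids above 255 now fit */
};

static_assert(sizeof(struct rearm_fields) == sizeof(uint64_t),
	      "rearm area must be exactly 8 bytes");

/* Build the 8-byte template once, outside the fast path. */
static uint64_t make_initializer(uint16_t data_off, uint16_t port)
{
	struct rearm_fields f = {
		.data_off = data_off,
		.refcnt = 1,   /* assumed default for a fresh mbuf */
		.nb_segs = 1,  /* assumed default: single segment */
		.port = port,
	};
	uint64_t init;

	memcpy(&init, &f, sizeof(init));
	return init;
}

/* Fast path: one 8-byte copy sets all four fields at once. */
static struct rearm_fields apply_initializer(uint64_t init)
{
	struct rearm_fields f;

	memcpy(&f, &init, sizeof(f));
	return f;
}
```

With the old uint8_t field, a port id such as 300 would have been truncated; with the 16-bit field it survives the round trip through the template.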
* [dpdk-dev] [PATCH 7/9] mbuf: move sequence number in second cache line
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (5 preceding siblings ...)
2017-03-08 9:41 ` [dpdk-dev] [PATCH 6/9] mbuf: use 2 bytes for port and nb segments Olivier Matz
@ 2017-03-08 9:41 ` Olivier Matz
2017-03-08 9:42 ` [dpdk-dev] [PATCH 8/9] mbuf: add a timestamp field Olivier Matz
` (5 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:41 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Move this field to the second cache line, since no driver uses it
in the Rx path. The freed space will be used by a timestamp in the next
commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
lib/librte_mbuf/rte_mbuf.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 45cd6b9..c75a62a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -470,8 +470,6 @@ struct rte_mbuf {
uint32_t usr; /**< User defined tags. See rte_distributor_process() */
} hash; /**< hash information */
- uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
-
/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
uint16_t vlan_tci_outer;
@@ -516,6 +514,10 @@ struct rte_mbuf {
/** Timesync flags for use with IEEE1588. */
uint16_t timesync;
+
+ /** Sequence number. See also rte_reorder_insert(). */
+ uint32_t seqn;
+
} __rte_cache_aligned;
/**
--
2.8.1
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH 8/9] mbuf: add a timestamp field
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (6 preceding siblings ...)
2017-03-08 9:41 ` [dpdk-dev] [PATCH 7/9] mbuf: move sequence number in second cache line Olivier Matz
@ 2017-03-08 9:42 ` Olivier Matz
2017-04-04 10:29 ` [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
` (2 more replies)
2017-03-08 9:42 ` [dpdk-dev] [PATCH 9/9] mbuf: reorder VLAN tci and buffer len fields Olivier Matz
` (4 subsequent siblings)
12 siblings, 3 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:42 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
The field itself is not fully described yet, but this commit reserves
the room in the mbuf.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
lib/librte_mbuf/rte_mbuf.c | 2 ++
lib/librte_mbuf/rte_mbuf.h | 12 ++++++++++++
2 files changed, 14 insertions(+)
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 0acc810..f679bce 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -322,6 +322,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
case PKT_RX_LRO: return "PKT_RX_LRO";
+ case PKT_RX_TIMESTAMP: return "PKT_RX_TIMESTAMP";
default: return NULL;
}
}
@@ -356,6 +357,7 @@ rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
{ PKT_RX_IEEE1588_TMST, PKT_RX_IEEE1588_TMST, NULL },
{ PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ_STRIPPED, NULL },
{ PKT_RX_LRO, PKT_RX_LRO, NULL },
+ { PKT_RX_TIMESTAMP, PKT_RX_TIMESTAMP, NULL },
};
const char *name;
unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c75a62a..fd97bd3 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -177,6 +177,11 @@ extern "C" {
*/
#define PKT_RX_LRO (1ULL << 16)
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP (1ULL << 17)
+
/* add new RX flags here */
/* add new TX flags here */
@@ -474,6 +479,12 @@ struct rte_mbuf {
uint16_t vlan_tci_outer;
uint16_t buf_len; /**< Length of segment buffer. */
+
+ /** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+ * are not normalized but are always the same for a given port.
+ */
+ uint64_t timestamp;
+
/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_min_aligned;
@@ -1201,6 +1212,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
mi->nb_segs = 1;
mi->ol_flags = m->ol_flags | IND_ATTACHED_MBUF;
mi->packet_type = m->packet_type;
+ mi->timestamp = m->timestamp;
__rte_mbuf_sanity_check(mi, 1);
__rte_mbuf_sanity_check(m, 0);
--
2.8.1
^ permalink raw reply [flat|nested] 155+ messages in thread
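Patch 8/9 states that the timestamp field is only valid when PKT_RX_TIMESTAMP is set in ol_flags. A minimal sketch of how a consumer might honour that rule follows; struct pkt_meta and pkt_timestamp_or() are hypothetical stand-ins (not part of the mbuf API), and only the flag value (1ULL << 17) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as defined by the patch. */
#define PKT_RX_TIMESTAMP (1ULL << 17)

/* Hypothetical, reduced stand-in for struct rte_mbuf. */
struct pkt_meta {
	uint64_t ol_flags;
	uint64_t timestamp;
};

/*
 * Return the timestamp when the flag marks it as valid, otherwise
 * return the caller-supplied fallback. The unit and time reference
 * are port specific, so values from different ports are not
 * comparable without extra knowledge.
 */
static uint64_t pkt_timestamp_or(const struct pkt_meta *m, uint64_t fallback)
{
	assert(m != NULL);
	if ((m->ol_flags & PKT_RX_TIMESTAMP) == 0)
		return fallback;
	return m->timestamp;
}
```

The fallback parameter is a design choice for this sketch only; a real caller might instead skip timestamp processing entirely when the flag is clear.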
* [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX
2017-03-08 9:42 ` [dpdk-dev] [PATCH 8/9] mbuf: add a timestamp field Olivier Matz
@ 2017-04-04 10:29 ` Konstantin Ananyev
2017-04-07 15:13 ` Ferruh Yigit
2017-04-04 10:29 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
2017-04-04 10:29 ` [dpdk-dev] [PATCH " Konstantin Ananyev
2 siblings, 1 reply; 155+ messages in thread
From: Konstantin Ananyev @ 2017-04-04 10:29 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
Pretty much the same as one from Bruce:
http://dpdk.org/ml/archives/dev/2017-April/062936.html
but now for ixgbe.
This series is based on Olivier's mbuf rework patchset and makes some
improvements to the ixgbe driver that take the rework into account.
It also removes a build-time option that seems unnecessary.
Depends on: http://dpdk.org/ml/archives/dev/2017-March/059693.html
Konstantin Ananyev (2):
net/ixgbe: eliminate mbuf write on rearm
net/ixgbe: remove option to disable offload flags
config/common_base | 1 -
doc/guides/nics/ixgbe.rst | 18 ----------
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 7 ----
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 11 ------
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 58 +++++++++++++------------------
5 files changed, 24 insertions(+), 71 deletions(-)
--
2.5.0
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX
2017-04-04 10:29 ` [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
@ 2017-04-07 15:13 ` Ferruh Yigit
2017-04-07 15:44 ` Ferruh Yigit
0 siblings, 1 reply; 155+ messages in thread
From: Ferruh Yigit @ 2017-04-07 15:13 UTC (permalink / raw)
To: Konstantin Ananyev, dev
On 4/4/2017 11:29 AM, Konstantin Ananyev wrote:
> Pretty much the same as one from Bruce:
> http://dpdk.org/ml/archives/dev/2017-April/062936.html
> but now for ixgbe.
> Based on Olivier's mbuf rework patchset, and makes some
> improvement to the ixgbe driver taking account of the rework.
> It also removes a build-time option that seems unnecessary.
>
> Depends on: http://dpdk.org/ml/archives/dev/2017-March/059693.html
>
> Konstantin Ananyev (2):
> net/ixgbe: eliminate mbuf write on rearm
> net/ixgbe: remove option to disable offload flags
>
> config/common_base | 1 -
> doc/guides/nics/ixgbe.rst | 18 ----------
> drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 7 ----
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 11 ------
> drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 58 +++++++++++++------------------
> 5 files changed, 24 insertions(+), 71 deletions(-)
Series applied to dpdk-next-net/master, thanks.
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX
2017-04-07 15:13 ` Ferruh Yigit
@ 2017-04-07 15:44 ` Ferruh Yigit
2017-04-09 22:56 ` Ananyev, Konstantin
0 siblings, 1 reply; 155+ messages in thread
From: Ferruh Yigit @ 2017-04-07 15:44 UTC (permalink / raw)
To: Konstantin Ananyev; +Cc: DPDK
On 4/7/2017 4:13 PM, Ferruh Yigit wrote:
> On 4/4/2017 11:29 AM, Konstantin Ananyev wrote:
>> Pretty much the same as one from Bruce:
>> http://dpdk.org/ml/archives/dev/2017-April/062936.html
>> but now for ixgbe.
>> Based on Olivier's mbuf rework patchset, and makes some
>> improvement to the ixgbe driver taking account of the rework.
>> It also removes a build-time option that seems unnecessary.
>>
>> Depends on: http://dpdk.org/ml/archives/dev/2017-March/059693.html
>>
>> Konstantin Ananyev (2):
>> net/ixgbe: eliminate mbuf write on rearm
>> net/ixgbe: remove option to disable offload flags
>>
>> config/common_base | 1 -
>> doc/guides/nics/ixgbe.rst | 18 ----------
>> drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 7 ----
>> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 11 ------
>> drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 58 +++++++++++++------------------
>> 5 files changed, 24 insertions(+), 71 deletions(-)
>
> Series applied to dpdk-next-net/master, thanks.
Hi Konstantin,
I spoke a little too early: I am getting the following build error [1]
with the "default" machine type. Patches dropped from the tree for now.
[1]
...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:217:11: error: implicit
declaration of function '_mm_blend_epi16' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 8), 0x10);
^
...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:217:9: error: assigning to
'__m128i' (vector of 2 'long long' values) from incompatible type 'int'
rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 8), 0x10);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:218:9: error: assigning to
'__m128i' (vector of 2 'long long' values) from incompatible type 'int'
rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 6), 0x10);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:219:9: error: assigning to
'__m128i' (vector of 2 'long long' values) from incompatible type 'int'
rearm2 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 4), 0x10);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:220:9: error: assigning to
'__m128i' (vector of 2 'long long' values) from incompatible type 'int'
rearm3 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 2), 0x10);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX
2017-04-07 15:44 ` Ferruh Yigit
@ 2017-04-09 22:56 ` Ananyev, Konstantin
0 siblings, 0 replies; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-04-09 22:56 UTC (permalink / raw)
To: Yigit, Ferruh; +Cc: DPDK
Hi Ferruh,
>
> Hi Konstantin,
>
> I spoke a little too early: I am getting the following build error [1]
> with the "default" machine type. Patches dropped from the tree for now.
My bad, forgot to check with 'default'
Thanks for flagging that, will update and resend v2.
Konstantin
>
> [1]
> ...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:217:11: error: implicit
> declaration of function '_mm_blend_epi16' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
> rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 8), 0x10);
> ^
> ...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:217:9: error: assigning to
> '__m128i' (vector of 2 'long long' values) from incompatible type 'int'
> rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 8), 0x10);
> ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:218:9: error: assigning to
> '__m128i' (vector of 2 'long long' values) from incompatible type 'int'
> rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 6), 0x10);
> ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:219:9: error: assigning to
> '__m128i' (vector of 2 'long long' values) from incompatible type 'int'
> rearm2 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 4), 0x10);
> ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ...drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:220:9: error: assigning to
> '__m128i' (vector of 2 'long long' values) from incompatible type 'int'
> rearm3 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 2), 0x10);
> ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
^ permalink raw reply [flat|nested] 155+ messages in thread
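_mm_blend_epi16 is an SSE4.1 intrinsic, so the error quoted above is what one would expect from a build that only enables SSE2. The sketch below shows one way to guard it, assuming the compiler-provided __SSE4_1__ and __SSE2__ macros rather than the DPDK RTE_MACHINE_CPUFLAG_* macros; struct rearm16 and merge_pkt0() are hypothetical names, and only packet 0 is handled because the byte-shift intrinsics need compile-time constants.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1: declares _mm_blend_epi16 */
#elif defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 only: no blend available */
#endif

/* Hypothetical 16-byte image: rearm data followed by ol_flags. */
struct rearm16 {
	uint64_t rearm;
	uint64_t ol_flags;
};

static_assert(sizeof(struct rearm16) == 16, "image must be 16 bytes");

/* Merge the init template with the flags of packet 0, which sit in
 * the lowest 16 bits of packed_flags (4 x 16-bit flag sets). */
static struct rearm16 merge_pkt0(uint64_t init, uint64_t packed_flags)
{
	struct rearm16 out;
#if defined(__SSE4_1__) || defined(__SSE2__)
	__m128i mbuf_init = _mm_set_epi64x(0, (long long)init);
	__m128i flags = _mm_set_epi64x(0, (long long)packed_flags);
	__m128i v;
#if defined(__SSE4_1__)
	/* Move lane 0 to lane 4, then take only lane 4 from it. */
	v = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
#else
	/* SSE2 fallback: move lane 0 to the top lane, shift it down
	 * to lane 4, and OR it into the template. */
	v = _mm_slli_si128(flags, 14);
	v = _mm_or_si128(mbuf_init, _mm_srli_epi64(v, 48));
#endif
	_mm_storeu_si128((__m128i *)&out, v);
#else
	/* Scalar fallback for targets without SSE2. */
	out.rearm = init;
	out.ol_flags = packed_flags & 0xffff;
#endif
	return out;
}
```

All three paths are meant to produce the same 16-byte image, so the choice between them is purely a matter of which instructions the build target allows.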
* [dpdk-dev] [PATCH 1/2] net/ixgbe: eliminate mbuf write on rearm
2017-03-08 9:42 ` [dpdk-dev] [PATCH 8/9] mbuf: add a timestamp field Olivier Matz
2017-04-04 10:29 ` [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
@ 2017-04-04 10:29 ` Konstantin Ananyev
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
` (2 more replies)
2017-04-04 10:29 ` [dpdk-dev] [PATCH " Konstantin Ananyev
2 siblings, 3 replies; 155+ messages in thread
From: Konstantin Ananyev @ 2017-04-04 10:29 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
With the mbuf rework, we now have 8 contiguous bytes to be rearmed in the
mbuf just before the 8 bytes of ol_flags. If we skip the rearm write
inside the descriptor ring replenishment function and defer it until the
packet is received, we can do a single 16B write inside the RX function
to set the rearm data and the flags together.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 57 ++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index 62afe31..49536c1 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -82,19 +82,23 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
/* Initialize the mbufs in vector, process 2 mbufs in one loop */
for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
__m128i vaddr0, vaddr1;
- uintptr_t p0, p1;
mb0 = rxep[0].mbuf;
mb1 = rxep[1].mbuf;
- /*
- * Flush mbuf with pkt template.
- * Data to be rearmed is 6 bytes long.
- */
- p0 = (uintptr_t)&mb0->rearm_data;
- *(uint64_t *)p0 = rxq->mbuf_initializer;
- p1 = (uintptr_t)&mb1->rearm_data;
- *(uint64_t *)p1 = rxq->mbuf_initializer;
+#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
+ {
+ uintptr_t p0, p1;
+ /*
+ * Flush mbuf with pkt template.
+ * Data to be rearmed is 6 bytes long.
+ */
+ p0 = (uintptr_t)&mb0->rearm_data;
+ *(uint64_t *)p0 = rxq->mbuf_initializer;
+ p1 = (uintptr_t)&mb1->rearm_data;
+ *(uint64_t *)p1 = rxq->mbuf_initializer;
+ }
+#endif
/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
@@ -139,14 +143,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
static inline void
-desc_to_olflags_v(__m128i descs[4], uint8_t vlan_flags,
+desc_to_olflags_v(__m128i descs[4], __m128i mbuf_init, uint8_t vlan_flags,
struct rte_mbuf **rx_pkts)
{
__m128i ptype0, ptype1, vtag0, vtag1, csum;
- union {
- uint16_t e[4];
- uint64_t dword;
- } vol;
+ __m128i rearm0, rearm1, rearm2, rearm3;
/* mask everything except rss type */
const __m128i rsstype_msk = _mm_set_epi16(
@@ -225,12 +226,25 @@ desc_to_olflags_v(__m128i descs[4], uint8_t vlan_flags,
vtag1 = _mm_or_si128(vtag0, vtag1);
vtag1 = _mm_or_si128(ptype0, vtag1);
- vol.dword = _mm_cvtsi128_si64(vtag1);
- rx_pkts[0]->ol_flags = vol.e[0];
- rx_pkts[1]->ol_flags = vol.e[1];
- rx_pkts[2]->ol_flags = vol.e[2];
- rx_pkts[3]->ol_flags = vol.e[3];
+ /*
+ * At this point, we have the 4 sets of flags in the low 64-bits
+ * of vtag1 (4x16).
+ * We want to extract these, and merge them with the mbuf init data
+ * so we can do a single 16-byte write to the mbuf to set the flags
+ * and all the other initialization fields. Extracting the
+ * appropriate flags means that we have to do a shift and blend for
+ * each mbuf before we do the write.
+ */
+ rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 8), 0x10);
+ rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 6), 0x10);
+ rearm2 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 4), 0x10);
+ rearm3 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 2), 0x10);
+
+ _mm_store_si128((__m128i *)&rx_pkts[0]->rearm_data, rearm0);
+ _mm_store_si128((__m128i *)&rx_pkts[1]->rearm_data, rearm1);
+ _mm_store_si128((__m128i *)&rx_pkts[2]->rearm_data, rearm2);
+ _mm_store_si128((__m128i *)&rx_pkts[3]->rearm_data, rearm3);
}
#else
#define desc_to_olflags_v(desc, vlan_flags, rx_pkts) do { \
@@ -265,6 +279,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
0, 0 /* ignore pkt_type field */
);
__m128i dd_check, eop_check;
+ __m128i mbuf_init;
uint8_t vlan_flags;
/* nb_pkts shall be less equal than RTE_IXGBE_MAX_RX_BURST */
@@ -310,6 +325,8 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
0xFF, 0xFF
);
+ mbuf_init = _mm_set_epi64x(0, rxq->mbuf_initializer);
+
/* Cache is empty -> need to scan the buffer rings, but first move
* the next 'n' mbufs into the cache
*/
@@ -382,7 +399,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);
/* set ol_flags with vlan packet type */
- desc_to_olflags_v(descs, vlan_flags, &rx_pkts[pos]);
+ desc_to_olflags_v(descs, mbuf_init, vlan_flags, &rx_pkts[pos]);
/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
pkt_mb4 = _mm_add_epi16(pkt_mb4, crc_adjust);
--
2.5.0
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 0/2] reduce writes to mbuf in ixgbe vRX
2017-04-04 10:29 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
@ 2017-04-10 15:59 ` Konstantin Ananyev
2017-04-10 16:17 ` Ferruh Yigit
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 2/2] net/ixgbe: remove option to disable offload flags Konstantin Ananyev
2 siblings, 1 reply; 155+ messages in thread
From: Konstantin Ananyev @ 2017-04-10 15:59 UTC (permalink / raw)
To: dev; +Cc: jerin.jacob, jianbo.liu, Konstantin Ananyev
This series is based on Olivier's mbuf rework patchset and makes some
improvements to the ixgbe driver that take the rework into account.
It also removes a build-time option that seems unnecessary.
changes in v2:
Fix build error for "default" cpu.
Fix build error when CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=n
Konstantin Ananyev (2):
net/ixgbe: eliminate mbuf write on rearm
net/ixgbe: remove option to disable offload flags
config/common_base | 1 -
doc/guides/nics/ixgbe.rst | 18 --------
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 7 ---
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 11 -----
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 73 +++++++++++++++++--------------
5 files changed, 39 insertions(+), 71 deletions(-)
--
2.5.0
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/2] reduce writes to mbuf in ixgbe vRX
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
@ 2017-04-10 16:17 ` Ferruh Yigit
0 siblings, 0 replies; 155+ messages in thread
From: Ferruh Yigit @ 2017-04-10 16:17 UTC (permalink / raw)
To: Konstantin Ananyev, dev; +Cc: jerin.jacob, jianbo.liu
On 4/10/2017 4:59 PM, Konstantin Ananyev wrote:
> Based on Olivier's mbuf rework patchset, and makes some
> improvement to the ixgbe driver taking account of the rework.
> It also removes a build-time option that seems unnecessary.
>
> changes in v2:
> Fix build error for "default" cpu.
> Fix build error when CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=n
>
> Konstantin Ananyev (2):
> net/ixgbe: eliminate mbuf write on rearm
> net/ixgbe: remove option to disable offload flags
Series applied to dpdk-next-net/master, thanks.
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 1/2] net/ixgbe: eliminate mbuf write on rearm
2017-04-04 10:29 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
@ 2017-04-10 15:59 ` Konstantin Ananyev
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 2/2] net/ixgbe: remove option to disable offload flags Konstantin Ananyev
2 siblings, 0 replies; 155+ messages in thread
From: Konstantin Ananyev @ 2017-04-10 15:59 UTC (permalink / raw)
To: dev; +Cc: jerin.jacob, jianbo.liu, Konstantin Ananyev
With the mbuf rework, we now have 8 contiguous bytes to be rearmed in the
mbuf just before the 8 bytes of ol_flags. If we skip the rearm write
inside the descriptor ring replenishment function and defer it until the
packet is received, we can do a single 16B write inside the RX function
to set the rearm data and the flags together.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 72 ++++++++++++++++++++++++----------
1 file changed, 52 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index 62afe31..8e19b9d 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -82,19 +82,23 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
/* Initialize the mbufs in vector, process 2 mbufs in one loop */
for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
__m128i vaddr0, vaddr1;
- uintptr_t p0, p1;
mb0 = rxep[0].mbuf;
mb1 = rxep[1].mbuf;
- /*
- * Flush mbuf with pkt template.
- * Data to be rearmed is 6 bytes long.
- */
- p0 = (uintptr_t)&mb0->rearm_data;
- *(uint64_t *)p0 = rxq->mbuf_initializer;
- p1 = (uintptr_t)&mb1->rearm_data;
- *(uint64_t *)p1 = rxq->mbuf_initializer;
+#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
+ {
+ uintptr_t p0, p1;
+ /*
+ * Flush mbuf with pkt template.
+ * Data to be rearmed is 6 bytes long.
+ */
+ p0 = (uintptr_t)&mb0->rearm_data;
+ *(uint64_t *)p0 = rxq->mbuf_initializer;
+ p1 = (uintptr_t)&mb1->rearm_data;
+ *(uint64_t *)p1 = rxq->mbuf_initializer;
+ }
+#endif
/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
@@ -139,14 +143,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
static inline void
-desc_to_olflags_v(__m128i descs[4], uint8_t vlan_flags,
+desc_to_olflags_v(__m128i descs[4], __m128i mbuf_init, uint8_t vlan_flags,
struct rte_mbuf **rx_pkts)
{
__m128i ptype0, ptype1, vtag0, vtag1, csum;
- union {
- uint16_t e[4];
- uint64_t dword;
- } vol;
+ __m128i rearm0, rearm1, rearm2, rearm3;
/* mask everything except rss type */
const __m128i rsstype_msk = _mm_set_epi16(
@@ -225,12 +226,40 @@ desc_to_olflags_v(__m128i descs[4], uint8_t vlan_flags,
vtag1 = _mm_or_si128(vtag0, vtag1);
vtag1 = _mm_or_si128(ptype0, vtag1);
- vol.dword = _mm_cvtsi128_si64(vtag1);
- rx_pkts[0]->ol_flags = vol.e[0];
- rx_pkts[1]->ol_flags = vol.e[1];
- rx_pkts[2]->ol_flags = vol.e[2];
- rx_pkts[3]->ol_flags = vol.e[3];
+ /*
+ * At this point, we have the 4 sets of flags in the low 64-bits
+ * of vtag1 (4x16).
+ * We want to extract these, and merge them with the mbuf init data
+ * so we can do a single 16-byte write to the mbuf to set the flags
+ * and all the other initialization fields. Extracting the
+ * appropriate flags means that we have to do a shift and blend for
+ * each mbuf before we do the write.
+ */
+#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
+
+ rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 8), 0x10);
+ rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 6), 0x10);
+ rearm2 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 4), 0x10);
+ rearm3 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(vtag1, 2), 0x10);
+
+#else
+ rearm0 = _mm_slli_si128(vtag1, 14);
+ rearm1 = _mm_slli_si128(vtag1, 12);
+ rearm2 = _mm_slli_si128(vtag1, 10);
+ rearm3 = _mm_slli_si128(vtag1, 8);
+
+ rearm0 = _mm_or_si128(mbuf_init, _mm_srli_epi64(rearm0, 48));
+ rearm1 = _mm_or_si128(mbuf_init, _mm_srli_epi64(rearm1, 48));
+ rearm2 = _mm_or_si128(mbuf_init, _mm_srli_epi64(rearm2, 48));
+ rearm3 = _mm_or_si128(mbuf_init, _mm_srli_epi64(rearm3, 48));
+
+#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
+
+ _mm_store_si128((__m128i *)&rx_pkts[0]->rearm_data, rearm0);
+ _mm_store_si128((__m128i *)&rx_pkts[1]->rearm_data, rearm1);
+ _mm_store_si128((__m128i *)&rx_pkts[2]->rearm_data, rearm2);
+ _mm_store_si128((__m128i *)&rx_pkts[3]->rearm_data, rearm3);
}
#else
#define desc_to_olflags_v(desc, vlan_flags, rx_pkts) do { \
@@ -265,6 +294,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
0, 0 /* ignore pkt_type field */
);
__m128i dd_check, eop_check;
+ __m128i mbuf_init;
uint8_t vlan_flags;
/* nb_pkts shall be less equal than RTE_IXGBE_MAX_RX_BURST */
@@ -310,6 +340,8 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
0xFF, 0xFF
);
+ mbuf_init = _mm_set_epi64x(0, rxq->mbuf_initializer);
+
/* Cache is empty -> need to scan the buffer rings, but first move
* the next 'n' mbufs into the cache
*/
@@ -382,7 +414,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);
/* set ol_flags with vlan packet type */
- desc_to_olflags_v(descs, vlan_flags, &rx_pkts[pos]);
+ desc_to_olflags_v(descs, mbuf_init, vlan_flags, &rx_pkts[pos]);
/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
pkt_mb4 = _mm_add_epi16(pkt_mb4, crc_adjust);
--
2.5.0
* [dpdk-dev] [PATCH v2 2/2] net/ixgbe: remove option to disable offload flags
2017-04-04 10:29 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
2017-04-10 15:59 ` [dpdk-dev] [PATCH v2 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
@ 2017-04-10 15:59 ` Konstantin Ananyev
2 siblings, 0 replies; 155+ messages in thread
From: Konstantin Ananyev @ 2017-04-10 15:59 UTC (permalink / raw)
To: dev; +Cc: jerin.jacob, jianbo.liu, Konstantin Ananyev
Having packets received without any offload flags given in the mbuf is not
very useful, and performance tests with testpmd indicate that turning off
the flags gives little benefit with the current code. This makes the
build-time option pointless, so we can remove it.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
config/common_base | 1 -
doc/guides/nics/ixgbe.rst | 18 ------------------
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 7 -------
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 11 -----------
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 27 ---------------------------
5 files changed, 64 deletions(-)
diff --git a/config/common_base b/config/common_base
index fc75c63..8d9560f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -173,7 +173,6 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
CONFIG_RTE_IXGBE_INC_VECTOR=y
-CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
#
# Compile burst-oriented I40E PMD driver
diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 1a4aa48..130765b 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -95,9 +95,6 @@ Other features are supported using optional MACRO configuration. They include:
* HW extend dual VLAN
-* Enabled by RX_OLFLAGS (RTE_IXGBE_RX_OLFLAGS_ENABLE=y)
-
-
To guarantee the constraint, configuration flags in dev_conf.rxmode will be checked:
* hw_vlan_strip
@@ -156,21 +153,6 @@ The declarations for the API functions are in the header ``rte_pmd_ixgbe.h``.
Sample Application Notes
------------------------
-testpmd
-~~~~~~~
-
-By default, using CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y:
-
-.. code-block:: console
-
- ./x86_64-native-linuxapp-gcc/app/testpmd -l 8-9 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01
-
-When CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=n, better performance can be achieved:
-
-.. code-block:: console
-
- ./x86_64-native-linuxapp-gcc/app/testpmd -l 8-9 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 --disable-hw-vlan
-
l3fwd
~~~~~
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
index a83afe5..1c34bb5 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
@@ -310,13 +310,6 @@ ixgbe_rx_vec_dev_conf_condition_check_default(struct rte_eth_dev *dev)
struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
-#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
- /* whithout rx ol_flags, no VP flag report */
- if (rxmode->hw_vlan_strip != 0 ||
- rxmode->hw_vlan_extend != 0)
- return -1;
-#endif
-
/* no fdir support */
if (fconf->mode != RTE_FDIR_MODE_NONE)
return -1;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index bc8924f..517b8c7 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -111,14 +111,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
}
-/* Handling the offload flags (olflags) field takes computation
- * time when receiving packets. Therefore we provide a flag to disable
- * the processing of the olflags field when they are not needed. This
- * gives improved performance, at the cost of losing the offload info
- * in the received packet
- */
-#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
-
#define VTAG_SHIFT (3)
static inline void
@@ -167,9 +159,6 @@ desc_to_olflags_v(uint8x16x2_t sterr_tmp1, uint8x16x2_t sterr_tmp2,
rx_pkts[2]->ol_flags = vol.e[2];
rx_pkts[3]->ol_flags = vol.e[3];
}
-#else
-#define desc_to_olflags_v(sterr_tmp1, sterr_tmp2, staterr, rx_pkts)
-#endif
/*
* vPMD raw receive routine, only accept(nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index 8e19b9d..e091e7d 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -86,20 +86,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
mb0 = rxep[0].mbuf;
mb1 = rxep[1].mbuf;
-#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
- {
- uintptr_t p0, p1;
- /*
- * Flush mbuf with pkt template.
- * Data to be rearmed is 6 bytes long.
- */
- p0 = (uintptr_t)&mb0->rearm_data;
- *(uint64_t *)p0 = rxq->mbuf_initializer;
- p1 = (uintptr_t)&mb1->rearm_data;
- *(uint64_t *)p1 = rxq->mbuf_initializer;
- }
-#endif
-
/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
@@ -134,14 +120,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
}
-/* Handling the offload flags (olflags) field takes computation
- * time when receiving packets. Therefore we provide a flag to disable
- * the processing of the olflags field when they are not needed. This
- * gives improved performance, at the cost of losing the offload info
- * in the received packet
- */
-#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
-
static inline void
desc_to_olflags_v(__m128i descs[4], __m128i mbuf_init, uint8_t vlan_flags,
struct rte_mbuf **rx_pkts)
@@ -261,11 +239,6 @@ desc_to_olflags_v(__m128i descs[4], __m128i mbuf_init, uint8_t vlan_flags,
_mm_store_si128((__m128i *)&rx_pkts[2]->rearm_data, rearm2);
_mm_store_si128((__m128i *)&rx_pkts[3]->rearm_data, rearm3);
}
-#else
-#define desc_to_olflags_v(desc, vlan_flags, rx_pkts) do { \
- RTE_SET_USED(vlan_flags); \
- } while (0)
-#endif
/*
* vPMD raw receive routine, only accept(nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP)
--
2.5.0
* [dpdk-dev] [PATCH 2/2] net/ixgbe: remove option to disable offload flags
2017-03-08 9:42 ` [dpdk-dev] [PATCH 8/9] mbuf: add a timestamp field Olivier Matz
2017-04-04 10:29 ` [dpdk-dev] [PATCH 0/2] reduce writes to mbuf in ixgbe vRX Konstantin Ananyev
2017-04-04 10:29 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: eliminate mbuf write on rearm Konstantin Ananyev
@ 2017-04-04 10:29 ` Konstantin Ananyev
2 siblings, 0 replies; 155+ messages in thread
From: Konstantin Ananyev @ 2017-04-04 10:29 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
Having packets received without any offload flags given in the mbuf is not
very useful, and performance tests with testpmd indicate that turning off
the flags gives little benefit with the current code. This makes the
build-time option pointless, so we can remove it.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
config/common_base | 1 -
doc/guides/nics/ixgbe.rst | 18 ------------------
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 7 -------
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 11 -----------
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 27 ---------------------------
5 files changed, 64 deletions(-)
diff --git a/config/common_base b/config/common_base
index fc75c63..8d9560f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -173,7 +173,6 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
CONFIG_RTE_IXGBE_INC_VECTOR=y
-CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
#
# Compile burst-oriented I40E PMD driver
diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 1a4aa48..130765b 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -95,9 +95,6 @@ Other features are supported using optional MACRO configuration. They include:
* HW extend dual VLAN
-* Enabled by RX_OLFLAGS (RTE_IXGBE_RX_OLFLAGS_ENABLE=y)
-
-
To guarantee the constraint, configuration flags in dev_conf.rxmode will be checked:
* hw_vlan_strip
@@ -156,21 +153,6 @@ The declarations for the API functions are in the header ``rte_pmd_ixgbe.h``.
Sample Application Notes
------------------------
-testpmd
-~~~~~~~
-
-By default, using CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y:
-
-.. code-block:: console
-
- ./x86_64-native-linuxapp-gcc/app/testpmd -l 8-9 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01
-
-When CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=n, better performance can be achieved:
-
-.. code-block:: console
-
- ./x86_64-native-linuxapp-gcc/app/testpmd -l 8-9 -n 4 -- -i --burst=32 --rxfreet=32 --mbcache=250 --txpt=32 --rxht=8 --rxwt=0 --txfreet=32 --txrst=32 --txqflags=0xf01 --disable-hw-vlan
-
l3fwd
~~~~~
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
index a83afe5..1c34bb5 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
@@ -310,13 +310,6 @@ ixgbe_rx_vec_dev_conf_condition_check_default(struct rte_eth_dev *dev)
struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
-#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
- /* whithout rx ol_flags, no VP flag report */
- if (rxmode->hw_vlan_strip != 0 ||
- rxmode->hw_vlan_extend != 0)
- return -1;
-#endif
-
/* no fdir support */
if (fconf->mode != RTE_FDIR_MODE_NONE)
return -1;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index bc8924f..517b8c7 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -111,14 +111,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
}
-/* Handling the offload flags (olflags) field takes computation
- * time when receiving packets. Therefore we provide a flag to disable
- * the processing of the olflags field when they are not needed. This
- * gives improved performance, at the cost of losing the offload info
- * in the received packet
- */
-#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
-
#define VTAG_SHIFT (3)
static inline void
@@ -167,9 +159,6 @@ desc_to_olflags_v(uint8x16x2_t sterr_tmp1, uint8x16x2_t sterr_tmp2,
rx_pkts[2]->ol_flags = vol.e[2];
rx_pkts[3]->ol_flags = vol.e[3];
}
-#else
-#define desc_to_olflags_v(sterr_tmp1, sterr_tmp2, staterr, rx_pkts)
-#endif
/*
* vPMD raw receive routine, only accept(nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP)
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index 49536c1..28c0ca6 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -86,20 +86,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
mb0 = rxep[0].mbuf;
mb1 = rxep[1].mbuf;
-#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
- {
- uintptr_t p0, p1;
- /*
- * Flush mbuf with pkt template.
- * Data to be rearmed is 6 bytes long.
- */
- p0 = (uintptr_t)&mb0->rearm_data;
- *(uint64_t *)p0 = rxq->mbuf_initializer;
- p1 = (uintptr_t)&mb1->rearm_data;
- *(uint64_t *)p1 = rxq->mbuf_initializer;
- }
-#endif
-
/* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
@@ -134,14 +120,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
IXGBE_PCI_REG_WRITE(rxq->rdt_reg_addr, rx_id);
}
-/* Handling the offload flags (olflags) field takes computation
- * time when receiving packets. Therefore we provide a flag to disable
- * the processing of the olflags field when they are not needed. This
- * gives improved performance, at the cost of losing the offload info
- * in the received packet
- */
-#ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
-
static inline void
desc_to_olflags_v(__m128i descs[4], __m128i mbuf_init, uint8_t vlan_flags,
struct rte_mbuf **rx_pkts)
@@ -246,11 +224,6 @@ desc_to_olflags_v(__m128i descs[4], __m128i mbuf_init, uint8_t vlan_flags,
_mm_store_si128((__m128i *)&rx_pkts[2]->rearm_data, rearm2);
_mm_store_si128((__m128i *)&rx_pkts[3]->rearm_data, rearm3);
}
-#else
-#define desc_to_olflags_v(desc, vlan_flags, rx_pkts) do { \
- RTE_SET_USED(vlan_flags); \
- } while (0)
-#endif
/*
* vPMD raw receive routine, only accept(nb_pkts >= RTE_IXGBE_DESCS_PER_LOOP)
--
2.5.0
* [dpdk-dev] [PATCH 9/9] mbuf: reorder VLAN tci and buffer len fields
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (7 preceding siblings ...)
2017-03-08 9:42 ` [dpdk-dev] [PATCH 8/9] mbuf: add a timestamp field Olivier Matz
@ 2017-03-08 9:42 ` Olivier Matz
2017-03-29 15:56 ` [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization Olivier Matz
` (3 subsequent siblings)
12 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-08 9:42 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Move the vlan_tci field next to vlan_tci_outer, and buf_len next to
data_len, for consistency. This opens the door to getting or setting both
VLAN TCIs at the same time.
Suggested-by: Andrey Chilikin <andrey.chilikin@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
lib/librte_mbuf/rte_mbuf.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index fd97bd3..ada98d5 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -449,8 +449,7 @@ struct rte_mbuf {
uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
uint16_t data_len; /**< Amount of data in segment buffer. */
- /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
- uint16_t vlan_tci;
+ uint16_t buf_len; /**< Size of segment buffer. */
union {
uint32_t rss; /**< RSS hash result if RSS enabled */
@@ -475,11 +474,11 @@ struct rte_mbuf {
uint32_t usr; /**< User defined tags. See rte_distributor_process() */
} hash; /**< hash information */
+ /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+ uint16_t vlan_tci;
/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
uint16_t vlan_tci_outer;
- uint16_t buf_len; /**< Length of segment buffer. */
-
/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
* are not normalized but are always the same for a given port.
*/
--
2.8.1
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (8 preceding siblings ...)
2017-03-08 9:42 ` [dpdk-dev] [PATCH 9/9] mbuf: reorder VLAN tci and buffer len fields Olivier Matz
@ 2017-03-29 15:56 ` Olivier Matz
2017-03-29 16:03 ` Morten Brørup
` (2 more replies)
2017-03-30 14:54 ` Andrew Rybchenko
` (2 subsequent siblings)
12 siblings, 3 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-29 15:56 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
Hi,
Does anyone have any other comment on this series?
Can it be applied?
Thanks,
Olivier
On Wed, 8 Mar 2017 10:41:52 +0100, Olivier Matz <olivier.matz@6wind.com> wrote:
> Based on discussions done in [1] and in this thread, this patchset reorganizes
> the mbuf.
>
> The main changes are:
> - reorder structure to increase vector performance on some non-ia
> platforms.
> - add a 64bits timestamp field in the 1st cache line. This timestamp
> is not normalized, i.e. no unit or time reference is enforced. A
> library may be added to do this job in the future.
> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> in the pool, avoiding the need of setting m->next (located in the
> 2nd cache line) in the Rx path for mono-segment packets.
> - change port and nb_segs to 16 bits
> - move seqn in the 2nd cache line
>
> Things discussed but not done in the patchset:
> - move refcnt and nb_segs to the 2nd cache line: many drivers set
> them in the Rx path, so it could introduce a performance regression, or
> it would require changing all the drivers, which is not an easy task.
> - remove the m->port field: too much impact on many examples and libraries,
> and some people highlighted they are using it.
> - moving m->next in the 1st cache line: there is not enough room, and having
> it set to NULL for unused mbuf should remove the need for it.
> - merge seqn and timestamp together in a union: we could imagine use cases
> where both are activated. There is no flag indicating the presence of seqn,
> so it looks preferable to keep them separated for now.
>
> I made some basic performance tests (ixgbe) and see no regression.
> Other tests from NIC vendors are welcome.
>
> Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> idea of what could be done.
>
> [1] http://dpdk.org/ml/archives/dev/2016-October/049338.html
>
> rfc->v1:
> - fix reset of mbuf fields in case of indirect mbuf in rte_pktmbuf_prefree_seg()
> - do not enforce a unit or time reference for m->timestamp
> - reorganize fields to make vlan and outer vlan consecutive
> - enhance documentation of m->refcnt and m->port to explain why they are 16bits
>
> Jerin Jacob (1):
> mbuf: make rearm data address naturally aligned
>
> Olivier Matz (8):
> mbuf: make segment prefree function public
> mbuf: make raw free function public
> mbuf: set mbuf fields while in pool
> drivers/net: don't touch mbuf next or nb segs on Rx
> mbuf: use 2 bytes for port and nb segments
> mbuf: move sequence number in second cache line
> mbuf: add a timestamp field
> mbuf: reorder VLAN tci and buffer len fields
>
> app/test-pmd/csumonly.c | 4 +-
> drivers/net/ena/ena_ethdev.c | 2 +-
> drivers/net/enic/enic_rxtx.c | 2 +-
> drivers/net/fm10k/fm10k_rxtx.c | 6 +-
> drivers/net/fm10k/fm10k_rxtx_vec.c | 9 +-
> drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +-
> drivers/net/i40e/i40e_rxtx_vec_sse.c | 11 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 10 +-
> drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +-
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 9 --
> drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 9 --
> drivers/net/mlx5/mlx5_rxtx.c | 11 +-
> drivers/net/mpipe/mpipe_tilegx.c | 3 +-
> drivers/net/null/rte_eth_null.c | 2 -
> drivers/net/virtio/virtio_rxtx.c | 4 -
> drivers/net/virtio/virtio_rxtx_simple.h | 6 +-
> .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 +-
> lib/librte_mbuf/rte_mbuf.c | 4 +
> lib/librte_mbuf/rte_mbuf.h | 123 ++++++++++++++++-----
> 19 files changed, 130 insertions(+), 102 deletions(-)
>
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-29 15:56 ` [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization Olivier Matz
@ 2017-03-29 16:03 ` Morten Brørup
2017-03-29 20:09 ` Bruce Richardson
2017-03-31 11:18 ` Nélio Laranjeiro
2 siblings, 0 replies; 155+ messages in thread
From: Morten Brørup @ 2017-03-29 16:03 UTC (permalink / raw)
To: Olivier Matz, dev
Cc: bruce.richardson, konstantin.ananyev, andrey.chilikin, jblunck,
nelio.laranjeiro, arybchenko
> Does anyone have any other comment on this series?
Great work!
> Can it be applied?
Yes.
Med venlig hilsen / kind regards
- Morten Brørup
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-29 15:56 ` [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization Olivier Matz
2017-03-29 16:03 ` Morten Brørup
@ 2017-03-29 20:09 ` Bruce Richardson
2017-03-30 9:31 ` Bruce Richardson
2017-03-31 11:18 ` Nélio Laranjeiro
2 siblings, 1 reply; 155+ messages in thread
From: Bruce Richardson @ 2017-03-29 20:09 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, konstantin.ananyev, mb, andrey.chilikin, jblunck,
nelio.laranjeiro, arybchenko
On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> Hi,
>
> Does anyone have any other comment on this series?
> Can it be applied?
>
>
> Thanks,
> Olivier
>
I assume all driver maintainers have done performance analysis to check
for regressions. Perhaps they can confirm this is the case.
/Bruce
>
>
> On Wed, 8 Mar 2017 10:41:52 +0100, Olivier Matz <olivier.matz@6wind.com> wrote:
> > Based on discussions done in [1] and in this thread, this patchset reorganizes
> > the mbuf.
> >
> > The main changes are:
> > - reorder structure to increase vector performance on some non-ia
> > platforms.
> > - add a 64bits timestamp field in the 1st cache line. This timestamp
> > is not normalized, i.e. no unit or time reference is enforced. A
> > library may be added to do this job in the future.
> > - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> > in the pool, avoiding the need of setting m->next (located in the
> > 2nd cache line) in the Rx path for mono-segment packets.
> > - change port and nb_segs to 16 bits
> > - move seqn in the 2nd cache line
> >
> > Things discussed but not done in the patchset:
> > - move refcnt and nb_segs to the 2nd cache line: many drivers set
> > them in the Rx path, so it could introduce a performance regression, or
> > it would require changing all the drivers, which is not an easy task.
> > - remove the m->port field: too much impact on many examples and libraries,
> > and some people highlighted they are using it.
> > - moving m->next in the 1st cache line: there is not enough room, and having
> > it set to NULL for unused mbuf should remove the need for it.
> > - merge seqn and timestamp together in a union: we could imagine use cases
> > where both are activated. There is no flag indicating the presence of seqn,
> > so it looks preferable to keep them separated for now.
> >
> > I made some basic performance tests (ixgbe) and see no regression.
> > Other tests from NIC vendors are welcome.
> >
> > Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> > by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> > idea of what could be done.
> >
> > [1] http://dpdk.org/ml/archives/dev/2016-October/049338.html
> >
> > rfc->v1:
> > - fix reset of mbuf fields in case of indirect mbuf in rte_pktmbuf_prefree_seg()
> > - do not enforce a unit or time reference for m->timestamp
> > - reorganize fields to make vlan and outer vlan consecutive
> > - enhance documentation of m->refcnt and m->port to explain why they are 16bits
> >
> > Jerin Jacob (1):
> > mbuf: make rearm data address naturally aligned
> >
> > Olivier Matz (8):
> > mbuf: make segment prefree function public
> > mbuf: make raw free function public
> > mbuf: set mbuf fields while in pool
> > drivers/net: don't touch mbuf next or nb segs on Rx
> > mbuf: use 2 bytes for port and nb segments
> > mbuf: move sequence number in second cache line
> > mbuf: add a timestamp field
> > mbuf: reorder VLAN tci and buffer len fields
> >
> > app/test-pmd/csumonly.c | 4 +-
> > drivers/net/ena/ena_ethdev.c | 2 +-
> > drivers/net/enic/enic_rxtx.c | 2 +-
> > drivers/net/fm10k/fm10k_rxtx.c | 6 +-
> > drivers/net/fm10k/fm10k_rxtx_vec.c | 9 +-
> > drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +-
> > drivers/net/i40e/i40e_rxtx_vec_sse.c | 11 +-
> > drivers/net/ixgbe/ixgbe_rxtx.c | 10 +-
> > drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +-
> > drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 9 --
> > drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 9 --
> > drivers/net/mlx5/mlx5_rxtx.c | 11 +-
> > drivers/net/mpipe/mpipe_tilegx.c | 3 +-
> > drivers/net/null/rte_eth_null.c | 2 -
> > drivers/net/virtio/virtio_rxtx.c | 4 -
> > drivers/net/virtio/virtio_rxtx_simple.h | 6 +-
> > .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 +-
> > lib/librte_mbuf/rte_mbuf.c | 4 +
> > lib/librte_mbuf/rte_mbuf.h | 123 ++++++++++++++++-----
> > 19 files changed, 130 insertions(+), 102 deletions(-)
> >
>
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-29 20:09 ` Bruce Richardson
@ 2017-03-30 9:31 ` Bruce Richardson
2017-03-30 12:02 ` Olivier Matz
0 siblings, 1 reply; 155+ messages in thread
From: Bruce Richardson @ 2017-03-30 9:31 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, konstantin.ananyev, mb, andrey.chilikin, jblunck,
nelio.laranjeiro, arybchenko
On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > Hi,
> >
> > Does anyone have any other comment on this series?
> > Can it be applied?
> >
> >
> > Thanks,
> > Olivier
> >
>
> I assume all driver maintainers have done performance analysis to check
> for regressions. Perhaps they can confirm this is the case.
>
> /Bruce
> >
In the absence of anyone else reporting performance numbers with this
patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e
driver). With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
fairly noticeable performance drop. I still need to dig in more, e.g. do
an RFC2544 zero-loss test, and also bisect the patchset to see what
parts may be causing the problem.
Has anyone else tried any other drivers or systems to see what the perf
impact of this set may be?
/Bruce
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 9:31 ` Bruce Richardson
@ 2017-03-30 12:02 ` Olivier Matz
2017-03-30 12:23 ` Bruce Richardson
0 siblings, 1 reply; 155+ messages in thread
From: Olivier Matz @ 2017-03-30 12:02 UTC (permalink / raw)
To: Bruce Richardson
Cc: dev, konstantin.ananyev, mb, andrey.chilikin, jblunck,
nelio.laranjeiro, arybchenko
On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > Hi,
> > >
> > > Does anyone have any other comment on this series?
> > > Can it be applied?
> > >
> > >
> > > Thanks,
> > > Olivier
> > >
> >
> > I assume all driver maintainers have done performance analysis to check
> > for regressions. Perhaps they can confirm this is the case.
> >
> > /Bruce
> > >
> In the absence of anyone else reporting performance numbers with this
> patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e
> driver). With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> fairly noticeable performance drop. I still need to dig in more, e.g. do
> an RFC2544 zero-loss test, and also bisect the patchset to see what
> parts may be causing the problem.
>
> Has anyone else tried any other drivers or systems to see what the perf
> impact of this set may be?
I did, of course. I didn't see any noticeable performance drop on
ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
current version.
Olivier
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 12:02 ` Olivier Matz
@ 2017-03-30 12:23 ` Bruce Richardson
2017-03-30 16:45 ` Ananyev, Konstantin
0 siblings, 1 reply; 155+ messages in thread
From: Bruce Richardson @ 2017-03-30 12:23 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, konstantin.ananyev, mb, andrey.chilikin, jblunck,
nelio.laranjeiro, arybchenko
On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > Hi,
> > > >
> > > > Does anyone have any other comment on this series?
> > > > Can it be applied?
> > > >
> > > >
> > > > Thanks,
> > > > Olivier
> > > >
> > >
> > > I assume all driver maintainers have done performance analysis to check
> > > for regressions. Perhaps they can confirm this is the case.
> > >
> > > /Bruce
> > > >
> > In the absence of anyone else reporting performance numbers with this
> > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e
> > driver). With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > fairly noticeable performance drop. I still need to dig in more, e.g. do
> > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > parts may be causing the problem.
> >
> > Has anyone else tried any other drivers or systems to see what the perf
> > impact of this set may be?
>
> I did, of course. I didn't see any noticeable performance drop on
> ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> current version.
>
I had no doubt you did some perf testing! :-)
Perhaps the regression I see is limited to the i40e driver. I've confirmed I
still see it with that driver in zero-loss tests, so the next step is to try
and localise which change in the patchset is causing it.
Ideally, though, I think we should see acks or other comments from
driver maintainers at least confirming that they have tested. You cannot
be held responsible for testing every DPDK driver before you submit work
like this.
Regards,
/Bruce
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 12:23 ` Bruce Richardson
@ 2017-03-30 16:45 ` Ananyev, Konstantin
2017-03-30 16:47 ` Ananyev, Konstantin
0 siblings, 1 reply; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-30 16:45 UTC (permalink / raw)
To: Richardson, Bruce, Olivier Matz
Cc: dev, mb, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Thursday, March 30, 2017 1:23 PM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
>
> On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > Hi,
> > > > >
> > > > > Does anyone have any other comment on this series?
> > > > > Can it be applied?
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Olivier
> > > > >
> > > >
> > > > I assume all driver maintainers have done performance analysis to check
> > > > for regressions. Perhaps they can confirm this is the case.
> > > >
> > > > /Bruce
> > > > >
> > > In the absence of anyone else reporting performance numbers with this
> > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e
> > > driver). With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > fairly noticeable performance drop. I still need to dig in more, e.g. do
> > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > parts may be causing the problem.
> > >
> > > Has anyone else tried any other drivers or systems to see what the perf
> > > impact of this set may be?
> >
> > I did, of course. I didn't see any noticeable performance drop on
> > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > current version.
> >
> I had no doubt you did some perf testing! :-)
>
> Perhaps the regression I see is limited to i40e driver. I've confirmed I
> still see it with that driver in zero-loss tests, so next step is to try
> and localise what change in the patchset is causing it.
>
> Ideally, though, I think we should see acks or other comments from
> driver maintainers at least confirming that they have tested. You cannot
> be held responsible for testing every DPDK driver before you submit work
> like this.
>
Unfortunately I also see a regression.
Did a quick flood test on a 2.8 GHz IVB with 4x10Gb.
Observed a drop even with default testpmd RXD/TXD numbers (128/512):
from 50.8 Mpps down to 47.8 Mpps.
From what I am seeing, the particular patch that is causing it is:
[dpdk-dev,3/9] mbuf: set mbuf fields while in pool
cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
cmdline:
./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w 0b:00.1 -w 0e:00.1 -- -i
Konstantin
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 16:45 ` Ananyev, Konstantin
@ 2017-03-30 16:47 ` Ananyev, Konstantin
2017-03-30 18:06 ` Ananyev, Konstantin
2017-03-31 1:00 ` Ananyev, Konstantin
0 siblings, 2 replies; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-30 16:47 UTC (permalink / raw)
To: Ananyev, Konstantin, Richardson, Bruce, Olivier Matz
Cc: dev, mb, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, March 30, 2017 5:45 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
>
>
>
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Thursday, March 30, 2017 1:23 PM
> > To: Olivier Matz <olivier.matz@6wind.com>
> > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> >
> > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Does anyone have any other comment on this series?
> > > > > > Can it be applied?
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Olivier
> > > > > >
> > > > >
> > > > > I assume all driver maintainers have done performance analysis to check
> > > > > for regressions. Perhaps they can confirm this is the case.
> > > > >
> > > > > /Bruce
> > > > > >
> > > > In the absence, of anyone else reporting performance numbers with this
> > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > parts may be causing the problem.
> > > >
> > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > impact of this set may be?
> > >
> > > I did, of course. I didn't see any noticeable performance drop on
> > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > current version.
> > >
> > I had no doubt you did some perf testing! :-)
> >
> > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > still see it with that driver in zero-loss tests, so next step is to try
> > and localise what change in the patchset is causing it.
> >
> > Ideally, though, I think we should see acks or other comments from
> > driver maintainers at least confirming that they have tested. You cannot
> > be held responsible for testing every DPDK driver before you submit work
> > like this.
> >
>
> Unfortunately I also see a regression.
> Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
Sorry, forgot to mention - it is on ixgbe.
So it doesn't look i40e-specific.
> Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> from 50.8 Mpps down to 47.8 Mpps.
> From what I am seeing the particular patch that causing it:
> [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
>
> cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> cmdline:
> ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> 0b:00.1 -w 0e:00.1 -- -i
>
> Konstantin
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 16:47 ` Ananyev, Konstantin
@ 2017-03-30 18:06 ` Ananyev, Konstantin
2017-03-31 8:41 ` Olivier Matz
2017-03-31 1:00 ` Ananyev, Konstantin
1 sibling, 1 reply; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-30 18:06 UTC (permalink / raw)
To: Richardson, Bruce, Olivier Matz
Cc: dev, mb, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, March 30, 2017 5:48 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> Subject: RE: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
>
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > Sent: Thursday, March 30, 2017 5:45 PM
> > To: Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz <olivier.matz@6wind.com>
> > Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> > nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> >
> >
> >
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Thursday, March 30, 2017 1:23 PM
> > > To: Olivier Matz <olivier.matz@6wind.com>
> > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > >
> > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Does anyone have any other comment on this series?
> > > > > > > Can it be applied?
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Olivier
> > > > > > >
> > > > > >
> > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > >
> > > > > > /Bruce
> > > > > > >
> > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > parts may be causing the problem.
> > > > >
> > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > impact of this set may be?
> > > >
> > > > I did, of course. I didn't see any noticeable performance drop on
> > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > current version.
> > > >
> > > I had no doubt you did some perf testing! :-)
> > >
> > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > still see it with that driver in zero-loss tests, so next step is to try
> > > and localise what change in the patchset is causing it.
> > >
> > > Ideally, though, I think we should see acks or other comments from
> > > driver maintainers at least confirming that they have tested. You cannot
> > > be held responsible for testing every DPDK driver before you submit work
> > > like this.
> > >
> >
> > Unfortunately I also see a regression.
> > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
>
> Sorry, forgot to mention - it is on ixgbe.
> So it doesn't look like i40e specific.
>
> > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > from 50.8 Mpps down to 47.8 Mpps.
> > From what I am seeing the particular patch that causing it:
> > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> >
> > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > cmdline:
> > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > 0b:00.1 -w 0e:00.1 -- -i
> >
Actually one more question regarding:
[dpdk-dev,9/9] mbuf: reorder VLAN tci and buffer len fields
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index fd97bd3..ada98d5 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -449,8 +449,7 @@ struct rte_mbuf {
uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
uint16_t data_len; /**< Amount of data in segment buffer. */
- /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
- uint16_t vlan_tci;
+ uint16_t buf_len; /**< Size of segment buffer. */
union {
uint32_t rss; /**< RSS hash result if RSS enabled */
@@ -475,11 +474,11 @@ struct rte_mbuf {
uint32_t usr; /**< User defined tags. See rte_distributor_process() */
} hash; /**< hash information */
+ /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+ uint16_t vlan_tci;
/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
uint16_t vlan_tci_outer;
- uint16_t buf_len; /**< Length of segment buffer. */
-
/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
* are not normalized but are always the same for a given port.
*/
How are the ixgbe and i40e SSE versions supposed to work correctly after that change?
As I remember, both of them set vlan_tci as part of a 16B shuffle operation.
Something like this:
pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
...
_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
pkt_mb4);
But now vlan_tci is swapped with buf_len.
Which means two things to me:
vlan_tci is now more than 16B away from rx_descriptor_fields1 and can't be updated in one go anymore,
and instead of vlan_tci we are updating buf_len.
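To make the two points above concrete, here is a small stand-alone sketch. It does not use the real struct rte_mbuf: old_layout and new_layout are hypothetical cut-down structs that only model the fields starting at rx_descriptor_fields1, toy_hash is an assumed 8-byte stand-in for the hash union, and having packet_type as the first field is taken from memory of the header rather than from the diff above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte stand-in for the hash union. */
struct toy_hash {
	uint32_t lo;
	uint32_t hi;
};

/* Assumed field order before patch 9/9, starting at rx_descriptor_fields1. */
struct old_layout {
	uint32_t packet_type;
	uint32_t pkt_len;
	uint16_t data_len;
	uint16_t vlan_tci;
	struct toy_hash hash;
	uint16_t vlan_tci_outer;
	uint16_t buf_len;
};

/* Same fields after patch 9/9: vlan_tci and buf_len swapped. */
struct new_layout {
	uint32_t packet_type;
	uint32_t pkt_len;
	uint16_t data_len;
	uint16_t buf_len;
	struct toy_hash hash;
	uint16_t vlan_tci;
	uint16_t vlan_tci_outer;
};

/* Width of the single unaligned SSE store used by the vector RX path. */
#define RX_STORE_BYTES 16

/* Non-zero if a field at [off, off + size) is fully written by that store. */
static int covered_by_rx_store(size_t off, size_t size)
{
	return off + size <= RX_STORE_BYTES;
}
```

With these toy layouts, vlan_tci sits at offset 10 in the old order and at offset 20 in the new one, while buf_len takes over offset 10, which is the slot the shuffle mask fills with the VLAN tag.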
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 18:06 ` Ananyev, Konstantin
@ 2017-03-31 8:41 ` Olivier Matz
2017-03-31 9:58 ` Ananyev, Konstantin
0 siblings, 1 reply; 155+ messages in thread
From: Olivier Matz @ 2017-03-31 8:41 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Richardson, Bruce, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
On Thu, 30 Mar 2017 18:06:35 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, March 30, 2017 5:48 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz
> > <olivier.matz@6wind.com>
> > Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> > nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > Subject: RE: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> >
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > > Sent: Thursday, March 30, 2017 5:45 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz <olivier.matz@6wind.com>
> > > Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> > > nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richardson, Bruce
> > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > >
> > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > Can it be applied?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Olivier
> > > > > > > >
> > > > > > >
> > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > >
> > > > > > > /Bruce
> > > > > > > >
> > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > parts may be causing the problem.
> > > > > >
> > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > impact of this set may be?
> > > > >
> > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > current version.
> > > > >
> > > > I had no doubt you did some perf testing! :-)
> > > >
> > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > and localise what change in the patchset is causing it.
> > > >
> > > > Ideally, though, I think we should see acks or other comments from
> > > > driver maintainers at least confirming that they have tested. You cannot
> > > > be held responsible for testing every DPDK driver before you submit work
> > > > like this.
> > > >
> > >
> > > Unfortunately I also see a regression.
> > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> >
> > Sorry, forgot to mention - it is on ixgbe.
> > So it doesn't look like i40e specific.
> >
> > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > from 50.8 Mpps down to 47.8 Mpps.
> > > From what I am seeing the particular patch that causing it:
> > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > >
> > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > cmdline:
> > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > > 0b:00.1 -w 0e:00.1 -- -i
> > >
>
> Actually one more question regarding:
> [dpdk-dev,9/9] mbuf: reorder VLAN tci and buffer len fields
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index fd97bd3..ada98d5 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -449,8 +449,7 @@ struct rte_mbuf {
>
> uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
> uint16_t data_len; /**< Amount of data in segment buffer. */
> - /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
> - uint16_t vlan_tci;
> + uint16_t buf_len; /**< Size of segment buffer. */
>
> union {
> uint32_t rss; /**< RSS hash result if RSS enabled */
> @@ -475,11 +474,11 @@ struct rte_mbuf {
> uint32_t usr; /**< User defined tags. See rte_distributor_process() */
> } hash; /**< hash information */
>
> + /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
> + uint16_t vlan_tci;
> /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
> uint16_t vlan_tci_outer;
>
> - uint16_t buf_len; /**< Length of segment buffer. */
> -
> /** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
> * are not normalized but are always the same for a given port.
> */
>
> How ixgbe and i40e SSE version supposed to work correctly after that change?
> As I remember both of them sets vlan_tci as part of 16B shuffle operation.
> Something like that:
> pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
> ...
> mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
> pkt_mb4);
>
> But now vlan_tci is swapped with buf_len.
> Which means 2 things to me:
> It is more than 16B away from rx_descriptor_fields1 and can't be updated in one go anymore,
> and instead of vlan_tci we are updating buf_len.
Sorry, I missed it. But this shows something problematic: changing the
order of fields in a structure breaks code without notification. I think
that drivers expecting a field at a specific position should have some
BUG_ON() to check that the condition is still valid. We can't expect anyone
to know all the constraints of all vector PMDs in DPDK.
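A minimal sketch of what such a check could look like, using plain C11 _Static_assert on a hypothetical cut-down struct (toy_mbuf is not the real struct rte_mbuf, and ASSERT_FIELD_OFFSET is a name made up for this example). In a real driver the equivalent would presumably be built on RTE_BUILD_BUG_ON() and the actual mbuf fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_mbuf. */
struct toy_mbuf {
	uint64_t other;        /* placeholder for the preceding fields */
	uint32_t packet_type;  /* first field written by the vector RX store */
	uint32_t pkt_len;
	uint16_t data_len;
	uint16_t vlan_tci;
	uint32_t rss;
};

/* Start of the region the vector RX path writes in one go. */
#define RX_FIELDS_START offsetof(struct toy_mbuf, packet_type)

/* Fail the build if a field is not where the shuffle mask expects it. */
#define ASSERT_FIELD_OFFSET(field, rel) \
	_Static_assert(offsetof(struct toy_mbuf, field) == RX_FIELDS_START + (rel), \
		       #field " moved: the vector RX shuffle mask must be updated")

ASSERT_FIELD_OFFSET(pkt_len, 4);
ASSERT_FIELD_OFFSET(data_len, 8);
ASSERT_FIELD_OFFSET(vlan_tci, 10);
ASSERT_FIELD_OFFSET(rss, 12);

/* Runtime view of the same constraint, handy for logging or tests. */
static size_t vlan_tci_rel_offset(void)
{
	return offsetof(struct toy_mbuf, vlan_tci) - RX_FIELDS_START;
}
```

The point of doing it at compile time is that a reordering patch then fails to build in every PMD that depends on the old layout, instead of silently writing the wrong field.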
The original idea of this patch was to group vlan_tci and vlan_tci_outer,
which looked to be a good idea at first glance. If it requires changing
all the vector code, let's drop it.
Just for the exercise, let's imagine we need that patch. What would be
the procedure to have it integrated? How can we detect there is an issue?
Who would be in charge of modifying all the vector code in PMDs?
Regards,
Olivier
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 8:41 ` Olivier Matz
@ 2017-03-31 9:58 ` Ananyev, Konstantin
0 siblings, 0 replies; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-31 9:58 UTC (permalink / raw)
To: Olivier Matz
Cc: Richardson, Bruce, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
Hi Olivier,
>
> On Thu, 30 Mar 2017 18:06:35 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, March 30, 2017 5:48 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz
> > > <olivier.matz@6wind.com>
> > > Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> > > nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > Subject: RE: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > > > Sent: Thursday, March 30, 2017 5:45 PM
> > > > To: Richardson, Bruce <bruce.richardson@intel.com>; Olivier Matz <olivier.matz@6wind.com>
> > > > Cc: dev@dpdk.org; mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> > > > nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richardson, Bruce
> > > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > > >
> > > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > > Can it be applied?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Olivier
> > > > > > > > >
> > > > > > > >
> > > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > > >
> > > > > > > > /Bruce
> > > > > > > > >
> > > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > > parts may be causing the problem.
> > > > > > >
> > > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > > impact of this set may be?
> > > > > >
> > > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > > current version.
> > > > > >
> > > > > I had no doubt you did some perf testing! :-)
> > > > >
> > > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > > and localise what change in the patchset is causing it.
> > > > >
> > > > > Ideally, though, I think we should see acks or other comments from
> > > > > driver maintainers at least confirming that they have tested. You cannot
> > > > > be held responsible for testing every DPDK driver before you submit work
> > > > > like this.
> > > > >
> > > >
> > > > Unfortunately I also see a regression.
> > > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> > >
> > > Sorry, forgot to mention - it is on ixgbe.
> > > So it doesn't look like i40e specific.
> > >
> > > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > > from 50.8 Mpps down to 47.8 Mpps.
> > > > From what I am seeing the particular patch that causing it:
> > > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > > >
> > > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > > cmdline:
> > > > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > > > > 0b:00.1 -w 0e:00.1 -- -i
> > > >
> >
> > Actually one more question regarding:
> > [dpdk-dev,9/9] mbuf: reorder VLAN tci and buffer len fields
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index fd97bd3..ada98d5 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -449,8 +449,7 @@ struct rte_mbuf {
> >
> > uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
> > uint16_t data_len; /**< Amount of data in segment buffer. */
> > - /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
> > - uint16_t vlan_tci;
> > + uint16_t buf_len; /**< Size of segment buffer. */
> >
> > union {
> > uint32_t rss; /**< RSS hash result if RSS enabled */
> > @@ -475,11 +474,11 @@ struct rte_mbuf {
> > uint32_t usr; /**< User defined tags. See rte_distributor_process() */
> > } hash; /**< hash information */
> >
> > + /** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
> > + uint16_t vlan_tci;
> > /** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
> > uint16_t vlan_tci_outer;
> >
> > - uint16_t buf_len; /**< Length of segment buffer. */
> > -
> > /** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
> > * are not normalized but are always the same for a given port.
> > */
> >
> > How ixgbe and i40e SSE version supposed to work correctly after that change?
> > As I remember both of them sets vlan_tci as part of 16B shuffle operation.
> > Something like that:
> > pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
> > ...
> > mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
> > pkt_mb4);
> >
> > But now vlan_tci is swapped with buf_len.
> > Which means 2 things to me:
> > It is more than 16B away from rx_descriptor_fields1 and can't be updated in one go anymore,
> > and instead of vlan_tci we are updating buf_len.
>
>
> Sorry, I missed it. But this shows something problematic: changing the
> order of fields in a structure breaks code without notification. I think
> that drivers expecting a field at a specific position should have some
> BUG_ON() to check that the condition is still valid. We can't expect anyone
> to know all the constraints of all vectors PMDs in DPDK.
>
> The original idea of this patch was to group vlan_tci and vlan_outer_tci,
> which looked to be a good idea at first glance. If it requires to change
> all vector code, let's drop it.
>
> Just for the exercice, let's imagine we need that patch. What would be
> the procedure to have it integrated? How can we detect there is an issue?
> Who would be in charge of modifying all the vector code in PMDs?
>
Indeed, right now there is no way to know what a PMD's requirements on the mbuf layout are.
Adding BUG_ON() to each particular RX/TX implementation that has such constraints seems
like a very good idea to me.
Apart from that, I don't know off-hand how we can make restructuring the mbuf less painful.
Konstantin
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-30 16:47 ` Ananyev, Konstantin
2017-03-30 18:06 ` Ananyev, Konstantin
@ 2017-03-31 1:00 ` Ananyev, Konstantin
2017-03-31 7:21 ` Morten Brørup
2017-03-31 8:26 ` Olivier Matz
1 sibling, 2 replies; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-31 1:00 UTC (permalink / raw)
To: Richardson, Bruce, Olivier Matz
Cc: dev, mb, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
> >
> >
> >
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Thursday, March 30, 2017 1:23 PM
> > > To: Olivier Matz <olivier.matz@6wind.com>
> > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > >
> > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Does anyone have any other comment on this series?
> > > > > > > Can it be applied?
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Olivier
> > > > > > >
> > > > > >
> > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > >
> > > > > > /Bruce
> > > > > > >
> > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > parts may be causing the problem.
> > > > >
> > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > impact of this set may be?
> > > >
> > > > I did, of course. I didn't see any noticeable performance drop on
> > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > current version.
> > > >
> > > I had no doubt you did some perf testing! :-)
> > >
> > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > still see it with that driver in zero-loss tests, so next step is to try
> > > and localise what change in the patchset is causing it.
> > >
> > > Ideally, though, I think we should see acks or other comments from
> > > driver maintainers at least confirming that they have tested. You cannot
> > > be held responsible for testing every DPDK driver before you submit work
> > > like this.
> > >
> >
> > Unfortunately I also see a regression.
> > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
>
> Sorry, forgot to mention - it is on ixgbe.
> So it doesn't look like i40e specific.
>
> > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > from 50.8 Mpps down to 47.8 Mpps.
> > From what I am seeing the particular patch that causing it:
> > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> >
> > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > cmdline:
> > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > 0b:00.1 -w 0e:00.1 -- -i
> >
After applying the patch below, I got nearly the original numbers (though not quite) on my box.
dpdk.org mainline: 50.8 Mpps
with Olivier's patch: 47.8 Mpps
with the patch below: 50.4 Mpps
What I tried to do in it is avoid unnecessary updates of the mbuf inside rte_pktmbuf_prefree_seg().
For one segment per packet it seems to help.
So far, though, I haven't tried it on i40e and haven't done any testing for the multi-seg scenario.
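The idea can be sketched in isolation as follows. This is only an illustration under assumptions: toy_seg is a hypothetical cut-down segment rather than struct rte_mbuf, C11 atomics stand in for the rte_atomic16 calls, and the detach of indirect mbufs is left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down segment: not the real struct rte_mbuf. */
struct toy_seg {
	struct toy_seg *next;  /* next segment in the chain, or NULL */
	atomic_ushort refcnt;  /* reference counter */
	uint16_t nb_segs;      /* number of segments in the chain */
};

/* Write the chain fields only when they are actually dirty. */
static void toy_reset_chain(struct toy_seg *m)
{
	if (m->next != NULL) {
		m->next = NULL;
		m->nb_segs = 1;
	}
}

/*
 * Return m if the caller may put it back in the pool, NULL if another
 * reference still exists.
 */
static struct toy_seg *toy_prefree_seg(struct toy_seg *m)
{
	if (atomic_load(&m->refcnt) == 1) {
		/* Sole owner: refcnt already holds its in-pool value of 1. */
		toy_reset_chain(m);
		return m;
	}
	if (atomic_fetch_sub(&m->refcnt, 1) == 1) {
		/* We dropped the last reference: restore the in-pool state. */
		toy_reset_chain(m);
		atomic_store(&m->refcnt, 1);
		return m;
	}
	return NULL;
}
```

In the common single-segment, single-owner case this touches the segment with loads only, which is presumably where the recovered throughput comes from.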
Konstantin
$ cat patch.mod4
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d7af852..558233f 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1283,12 +1283,28 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
__rte_mbuf_sanity_check(m, 0);
- if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
+ if (likely(rte_mbuf_refcnt_read(m) == 1)) {
+
+ if (m->next != NULL) {
+ m->next = NULL;
+ m->nb_segs = 1;
+ }
+
+ if (RTE_MBUF_INDIRECT(m))
+ rte_pktmbuf_detach(m);
+
+ return m;
+
+ } else if (rte_atomic16_add_return(&m->refcnt_atomic, -1) == 0) {
+
if (RTE_MBUF_INDIRECT(m))
rte_pktmbuf_detach(m);
- m->next = NULL;
- m->nb_segs = 1;
+ if (m->next != NULL) {
+ m->next = NULL;
+ m->nb_segs = 1;
+ }
+
rte_mbuf_refcnt_set(m, 1);
return m;
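The reordering in the patch above can be sketched outside DPDK as follows. This is a minimal, self-contained C11 sketch, not the DPDK implementation: `struct toy_mbuf`, `toy_reset_chain()` and `toy_prefree_seg()` are hypothetical names, C11 atomics stand in for the `rte_atomic16` helpers, and the indirect-mbuf detach step is left out. It assumes, as patch 3/9 intends, that a buffer sitting in the pool keeps refcnt == 1, next == NULL and nb_segs == 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct rte_mbuf (hypothetical, only the fields used here). */
struct toy_mbuf {
	_Atomic uint16_t refcnt;
	uint16_t nb_segs;
	struct toy_mbuf *next;
};

/* Restore the chain fields to their in-pool values, writing only if needed. */
static void toy_reset_chain(struct toy_mbuf *m)
{
	if (m->next != NULL) {
		m->next = NULL;
		m->nb_segs = 1;
	}
}

/*
 * Return m if the caller may put it back in the pool, or NULL if another
 * reference is still alive.
 */
static struct toy_mbuf *toy_prefree_seg(struct toy_mbuf *m)
{
	assert(m != NULL);

	if (atomic_load(&m->refcnt) == 1) {
		/* Sole owner: refcnt already holds its in-pool value, so there
		 * is no atomic read-modify-write and no store to refcnt. */
		toy_reset_chain(m);
		return m;
	}
	if (atomic_fetch_sub(&m->refcnt, 1) == 1) {
		/* Shared buffer, and this call dropped the last reference. */
		toy_reset_chain(m);
		atomic_store(&m->refcnt, 1);
		return m;
	}
	return NULL;
}
```

In this sketch, a single-segment buffer with one owner goes through the first branch without any store at all, which is presumably where the saving for the one-segment-per-packet case comes from.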
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 1:00 ` Ananyev, Konstantin
@ 2017-03-31 7:21 ` Morten Brørup
2017-03-31 8:26 ` Olivier Matz
1 sibling, 0 replies; 155+ messages in thread
From: Morten Brørup @ 2017-03-31 7:21 UTC (permalink / raw)
To: Ananyev, Konstantin, Richardson, Bruce, Olivier Matz
Cc: dev, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Friday, March 31, 2017 3:01 AM
>
> After applying the patch below got nearly original numbers (though not
> quite) on my box.
> dpdk.org mainline: 50.8
> with Olivier patch: 47.8
> with patch below: 50.4
> What I tried to do in it - avoid unnecessary updates of mbuf inside
> rte_pktmbuf_prefree_seg().
> For one segment per packet it seems to help.
> Though so far I didn't try it on i40e and didn't do any testing for
> multi-seg scenario.
> Konstantin
>
> $ cat patch.mod4
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index d7af852..558233f 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1283,12 +1283,28 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m) {
> __rte_mbuf_sanity_check(m, 0);
>
> - if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
> + if (likely(rte_mbuf_refcnt_read(m) == 1)) {
> +
> + if (m->next != NULL) {
> + m->next = NULL;
> + m->nb_segs = 1;
> + }
> +
> + if (RTE_MBUF_INDIRECT(m))
> + rte_pktmbuf_detach(m);
> +
> + return m;
> +
> + } else if (rte_atomic16_add_return(&m->refcnt_atomic, -1) == 0)
> + {
> +
> if (RTE_MBUF_INDIRECT(m))
> rte_pktmbuf_detach(m);
>
> - m->next = NULL;
> - m->nb_segs = 1;
> + if (m->next != NULL) {
> + m->next = NULL;
> + m->nb_segs = 1;
> + }
> +
> rte_mbuf_refcnt_set(m, 1);
>
> return m;
Maybe the access to the second cache line (for single-segment packets) can be avoided altogether in rte_pktmbuf_prefree_seg() by adding a multi-segment indication flag to the first cache line, and using this flag instead of the test for m->next != NULL.
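One way to picture this suggestion is the sketch below. It is only an illustration under stated assumptions: `struct toy_mbuf`, `TOY_F_MULTI_SEG`, `toy_link()` and `toy_reset_chain()` are hypothetical names, not DPDK API, and the 64-byte layout is a toy one chosen so that `next` falls in the second cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag meaning "this segment has a successor"; not a DPDK flag. */
#define TOY_F_MULTI_SEG (UINT64_C(1) << 0)

/* Toy layout: the first 64 bytes hold the hot fields, next lives after them. */
struct toy_mbuf {
	uint64_t flags;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint8_t pad[52];
	struct toy_mbuf *next; /* offset 64: second cache line */
};

_Static_assert(offsetof(struct toy_mbuf, next) == 64,
	       "next must start the second 64-byte line");

/* Link seg to its successor and record that fact in the first cache line. */
static void toy_link(struct toy_mbuf *seg, struct toy_mbuf *succ)
{
	assert(seg != NULL && succ != NULL);
	seg->next = succ;
	seg->nb_segs = (uint16_t)(seg->nb_segs + succ->nb_segs);
	seg->flags |= TOY_F_MULTI_SEG;
}

/* Reset the chain fields; single-segment buffers never touch m->next. */
static void toy_reset_chain(struct toy_mbuf *m)
{
	if (m->flags & TOY_F_MULTI_SEG) {
		m->next = NULL;
		m->nb_segs = 1;
		m->flags &= ~TOY_F_MULTI_SEG;
	}
}
```

The sketch only works if every place that sets `next` also sets the flag, which is the cost of such a scheme: the invariant "flag set if and only if next != NULL" has to be maintained by all code that chains segments.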
Med venlig hilsen / kind regards
- Morten Brørup
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 1:00 ` Ananyev, Konstantin
2017-03-31 7:21 ` Morten Brørup
@ 2017-03-31 8:26 ` Olivier Matz
2017-03-31 8:41 ` Bruce Richardson
1 sibling, 1 reply; 155+ messages in thread
From: Olivier Matz @ 2017-03-31 8:26 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Richardson, Bruce, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
Hi,
On Fri, 31 Mar 2017 01:00:49 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richardson, Bruce
> > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > >
> > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > Can it be applied?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Olivier
> > > > > > > >
> > > > > > >
> > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > >
> > > > > > > /Bruce
> > > > > > > >
> > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > parts may be causing the problem.
> > > > > >
> > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > impact of this set may be?
> > > > >
> > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > current version.
> > > > >
> > > > I had no doubt you did some perf testing! :-)
> > > >
> > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > and localise what change in the patchset is causing it.
> > > >
> > > > Ideally, though, I think we should see acks or other comments from
> > > > driver maintainers at least confirming that they have tested. You cannot
> > > > be held responsible for testing every DPDK driver before you submit work
> > > > like this.
> > > >
> > >
> > > Unfortunately I also see a regression.
> > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> >
> > Sorry, forgot to mention - it is on ixgbe.
> > So it doesn't look like i40e specific.
> >
> > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > from 50.8 Mpps down to 47.8 Mpps.
> > > From what I am seeing the particular patch that causing it:
> > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > >
> > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > cmdline:
> > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > > 0b:00.1 -w 0e:00.1 -- -i
> > >
>
> After applying the patch below got nearly original numbers (though not quite) on my box.
> dpdk.org mainline: 50.8
> with Olivier patch: 47.8
> with patch below: 50.4
> What I tried to do in it - avoid unnecessary updates of mbuf inside rte_pktmbuf_prefree_seg().
> For one segment per packet it seems to help.
> Though so far I didn't try it on i40e and didn't do any testing for multi-seg scenario.
> Konstantin
I replayed my tests, and I can also see a performance loss with 1c/1t
(ixgbe), though not of the same magnitude. Here is what I have in Mpps:
  1c/1t   1c/2t
  53.3    58.7    current
  52.1    58.8    original patchset
  53.3    58.8    removed patches 3 and 9
  53.1    58.7    with Konstantin's patch
So we have 2 options here:
1/ integrate Konstantin's patch in the patchset (thank you, by the way)
2/ remove patch 3, and keep it for later until we have something that
   really has no impact
I'd prefer 1/, knowing that the difference is really small in terms
of cycles per packet.
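To put a rough number on "cycles per packet", the helper below converts a packet rate into a per-packet cycle budget. It is a back-of-the-envelope sketch: `cycles_per_packet()` and `print_budget()` are hypothetical helpers, and it uses Konstantin's quoted figures (50.8, 47.8 and 50.4 Mpps on a 2.8 GHz IVB) because that is the only setup in the thread with a stated clock. It further assumes that a single core forwards all the traffic and runs at exactly the nominal frequency, neither of which is confirmed in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Cycles one core can spend per packet at a given rate (rate in Mpps). */
static double cycles_per_packet(double core_hz, double mpps)
{
	assert(core_hz > 0.0 && mpps > 0.0);
	return core_hz / (mpps * 1e6);
}

/* Print the per-packet budget for one labelled measurement. */
static void print_budget(const char *label, double core_hz, double mpps)
{
	printf("%-22s %5.1f Mpps  %6.2f cycles/packet\n",
	       label, mpps, cycles_per_packet(core_hz, mpps));
}
```

Under those assumptions, 50.8, 47.8 and 50.4 Mpps correspond to roughly 55.1, 58.6 and 55.6 cycles per packet, so the regression would be about 3.5 cycles per packet and the gap remaining with the fix about 0.4 cycles.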
Regards,
Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 8:26 ` Olivier Matz
@ 2017-03-31 8:41 ` Bruce Richardson
2017-03-31 8:59 ` Olivier Matz
0 siblings, 1 reply; 155+ messages in thread
From: Bruce Richardson @ 2017-03-31 8:41 UTC (permalink / raw)
To: Olivier Matz
Cc: Ananyev, Konstantin, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> Hi,
>
> On Fri, 31 Mar 2017 01:00:49 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richardson, Bruce
> > > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > > >
> > > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > > Can it be applied?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Olivier
> > > > > > > > >
> > > > > > > >
> > > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > > >
> > > > > > > > /Bruce
> > > > > > > > >
> > > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > > parts may be causing the problem.
> > > > > > >
> > > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > > impact of this set may be?
> > > > > >
> > > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > > current version.
> > > > > >
> > > > > I had no doubt you did some perf testing! :-)
> > > > >
> > > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > > and localise what change in the patchset is causing it.
> > > > >
> > > > > Ideally, though, I think we should see acks or other comments from
> > > > > driver maintainers at least confirming that they have tested. You cannot
> > > > > be held responsible for testing every DPDK driver before you submit work
> > > > > like this.
> > > > >
> > > >
> > > > Unfortunately I also see a regression.
> > > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> > >
> > > Sorry, forgot to mention - it is on ixgbe.
> > > So it doesn't look like i40e specific.
> > >
> > > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > > from 50.8 Mpps down to 47.8 Mpps.
> > > > From what I am seeing the particular patch that causing it:
> > > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > > >
> > > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > > cmdline:
> > > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > > > 0b:00.1 -w 0e:00.1 -- -i
> > > >
> >
> > After applying the patch below got nearly original numbers (though not quite) on my box.
> > dpdk.org mainline: 50.8
> > with Olivier patch: 47.8
> > with patch below: 50.4
> > What I tried to do in it - avoid unnecessary updates of mbuf inside rte_pktmbuf_prefree_seg().
> > For one segment per packet it seems to help.
> > Though so far I didn't try it on i40e and didn't do any testing for multi-seg scenario.
> > Konstantin
>
> I replayed my tests, and I can also see a performance loss with 1c/1t
> (ixgbe), not in the same magnitude however. Here is what I have in MPPS:
>
> 1c/1t 1c/2t
> 53.3 58.7 current
> 52.1 58.8 original patchset
> 53.3 58.8 removed patches 3 and 9
> 53.1 58.7 with konstantin's patch
>
> So we have 2 options here:
>
> 1/ integrate Konstantin's patch in the patchset (thank you, by the way)
> 2/ remove patch 3, and keep it for later until we have something that
> really no impact
>
> I'd prefer 1/, knowing that the difference is really small in terms
> of cycles per packet.
>
>
1 is certainly the more attractive option. However, I think we can
afford to spend a little more time looking at this before we decide.
I'll try and check out the perf numbers I get with i40e with
Konstantin's patch today. We also need to double check the other
possible issues he reported in his other emails. While I don't want this
patchset held up for a long time, I think an extra 24/48 hours is
probably needed on it.
/Bruce
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 8:41 ` Bruce Richardson
@ 2017-03-31 8:59 ` Olivier Matz
2017-03-31 9:18 ` Ananyev, Konstantin
2017-03-31 9:23 ` Bruce Richardson
0 siblings, 2 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-31 8:59 UTC (permalink / raw)
To: Bruce Richardson
Cc: Ananyev, Konstantin, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
On Fri, 31 Mar 2017 09:41:39 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> > Hi,
> >
> > On Fri, 31 Mar 2017 01:00:49 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richardson, Bruce
> > > > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > > > >
> > > > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > > > Can it be applied?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Olivier
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > > > >
> > > > > > > > > /Bruce
> > > > > > > > > >
> > > > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > > > parts may be causing the problem.
> > > > > > > >
> > > > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > > > impact of this set may be?
> > > > > > >
> > > > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > > > current version.
> > > > > > >
> > > > > > I had no doubt you did some perf testing! :-)
> > > > > >
> > > > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > > > and localise what change in the patchset is causing it.
> > > > > >
> > > > > > Ideally, though, I think we should see acks or other comments from
> > > > > > driver maintainers at least confirming that they have tested. You cannot
> > > > > > be held responsible for testing every DPDK driver before you submit work
> > > > > > like this.
> > > > > >
> > > > >
> > > > > Unfortunately I also see a regression.
> > > > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> > > >
> > > > Sorry, forgot to mention - it is on ixgbe.
> > > > So it doesn't look like i40e specific.
> > > >
> > > > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > > > from 50.8 Mpps down to 47.8 Mpps.
> > > > > From what I am seeing the particular patch that causing it:
> > > > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > > > >
> > > > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > > > cmdline:
> > > > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > > > > 0b:00.1 -w 0e:00.1 -- -i
> > > > >
> > >
> > > After applying the patch below got nearly original numbers (though not quite) on my box.
> > > dpdk.org mainline: 50.8
> > > with Olivier patch: 47.8
> > > with patch below: 50.4
> > > What I tried to do in it - avoid unnecessary updates of mbuf inside rte_pktmbuf_prefree_seg().
> > > For one segment per packet it seems to help.
> > > Though so far I didn't try it on i40e and didn't do any testing for multi-seg scenario.
> > > Konstantin
> >
> > I replayed my tests, and I can also see a performance loss with 1c/1t
> > (ixgbe), not in the same magnitude however. Here is what I have in MPPS:
> >
> > 1c/1t 1c/2t
> > 53.3 58.7 current
> > 52.1 58.8 original patchset
> > 53.3 58.8 removed patches 3 and 9
> > 53.1 58.7 with konstantin's patch
> >
> > So we have 2 options here:
> >
> > 1/ integrate Konstantin's patch in the patchset (thank you, by the way)
> > 2/ remove patch 3, and keep it for later until we have something that
> > really no impact
> >
> > I'd prefer 1/, knowing that the difference is really small in terms
> > of cycles per packet.
> >
> >
> 1 is certainly the more attractive option. However, I think we can
> afford to spend a little more time looking at this before we decide.
> I'll try and check out the perf numbers I get with i40e with
> Konstantin's patch today. We also need to double check the other
> possible issues he reported in his other emails. While I don't want this
> patchset held up for a long time, I think an extra 24/48 hours is
> probably needed on it.
>
Yes, now that we have the "test momentum", let's try not to lose it ;)
I'm guilty of having missed the performance loss, but honestly,
I'm a bit sad that nobody tried this patchset before (it
has been available for more than 2 months), knowing this is probably one of
the most critical parts of DPDK. I think we need to do better next
time.
Anyway, thank you for your tests and feedback now.
Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 8:59 ` Olivier Matz
@ 2017-03-31 9:18 ` Ananyev, Konstantin
2017-03-31 9:36 ` Olivier Matz
2017-04-03 16:15 ` Thomas Monjalon
2017-03-31 9:23 ` Bruce Richardson
1 sibling, 2 replies; 155+ messages in thread
From: Ananyev, Konstantin @ 2017-03-31 9:18 UTC (permalink / raw)
To: Olivier Matz, Richardson, Bruce
Cc: dev, mb, Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko
Hi guys,
>
> On Fri, 31 Mar 2017 09:41:39 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> > > Hi,
> > >
> > > On Fri, 31 Mar 2017 01:00:49 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richardson, Bruce
> > > > > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > > > > >
> > > > > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > > > > Can it be applied?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Olivier
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > > > > >
> > > > > > > > > > /Bruce
> > > > > > > > > > >
> > > > > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > > > > parts may be causing the problem.
> > > > > > > > >
> > > > > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > > > > impact of this set may be?
> > > > > > > >
> > > > > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > > > > current version.
> > > > > > > >
> > > > > > > I had no doubt you did some perf testing! :-)
> > > > > > >
> > > > > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > > > > and localise what change in the patchset is causing it.
> > > > > > >
> > > > > > > Ideally, though, I think we should see acks or other comments from
> > > > > > > driver maintainers at least confirming that they have tested. You cannot
> > > > > > > be held responsible for testing every DPDK driver before you submit work
> > > > > > > like this.
> > > > > > >
> > > > > >
> > > > > > Unfortunately I also see a regression.
> > > > > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> > > > >
> > > > > Sorry, forgot to mention - it is on ixgbe.
> > > > > So it doesn't look like i40e specific.
> > > > >
> > > > > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > > > > from 50.8 Mpps down to 47.8 Mpps.
> > > > > > From what I am seeing the particular patch that causing it:
> > > > > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > > > > >
> > > > > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > > > > cmdline:
> > > > > > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w 0b:00.1 -w 0e:00.1 -- -i
> > > > > >
> > > >
> > > > After applying the patch below got nearly original numbers (though not quite) on my box.
> > > > dpdk.org mainline: 50.8
> > > > with Olivier patch: 47.8
> > > > with patch below: 50.4
> > > > What I tried to do in it - avoid unnecessary updates of mbuf inside rte_pktmbuf_prefree_seg().
> > > > For one segment per packet it seems to help.
> > > > Though so far I didn't try it on i40e and didn't do any testing for multi-seg scenario.
> > > > Konstantin
> > >
> > > I replayed my tests, and I can also see a performance loss with 1c/1t
> > > (ixgbe), not in the same magnitude however. Here is what I have in MPPS:
> > >
> > > 1c/1t 1c/2t
> > > 53.3 58.7 current
> > > 52.1 58.8 original patchset
> > > 53.3 58.8 removed patches 3 and 9
> > > 53.1 58.7 with konstantin's patch
> > >
> > > So we have 2 options here:
> > >
> > > 1/ integrate Konstantin's patch in the patchset (thank you, by the way)
> > > 2/ remove patch 3, and keep it for later until we have something that
> > > really no impact
> > >
> > > I'd prefer 1/, knowing that the difference is really small in terms
> > > of cycles per packet.
> > >
> > >
> > 1 is certainly the more attractive option. However, I think we can
> > afford to spend a little more time looking at this before we decide.
> > I'll try and check out the perf numbers I get with i40e with
> > Konstantin's patch today. We also need to double check the other
> > possible issues he reported in his other emails. While I don't want this
> > patchset held up for a long time, I think an extra 24/48 hours is
> > probably needed on it.
> >
>
> Yes, now that we have the "test momentum", try not to loose it ;)
>
> I'm guilty to have missed the performance loss, but honnestly,
> I'm a bit sad that nobody tried to this patchset before (it
> is available for more than 2 months), knowing this is probably one of
> the most critical part of dpdk. I think we need to be better next
> time.
>
> Anyway, thank you for your test and feedback now.
I am also leaning towards option 1, but agree that some extra testing
needs to be done before making the final decision.
BTW, patch #9 needs to be removed anyway, even if we go for option 1.
Konstantin
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 9:18 ` Ananyev, Konstantin
@ 2017-03-31 9:36 ` Olivier Matz
2017-04-03 16:15 ` Thomas Monjalon
1 sibling, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-03-31 9:36 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Richardson, Bruce, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
On Fri, 31 Mar 2017 09:18:22 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> BTW, path #9 need to be removed anyway, even if will go for path #1.
Yes
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 9:18 ` Ananyev, Konstantin
2017-03-31 9:36 ` Olivier Matz
@ 2017-04-03 16:15 ` Thomas Monjalon
2017-04-04 7:58 ` Olivier MATZ
1 sibling, 1 reply; 155+ messages in thread
From: Thomas Monjalon @ 2017-04-03 16:15 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, Ananyev, Konstantin, Richardson, Bruce, mb, Chilikin,
Andrey, jblunck, nelio.laranjeiro, arybchenko
2017-03-31 09:18, Ananyev, Konstantin:
> > On Fri, 31 Mar 2017 09:41:39 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> > > > I replayed my tests, and I can also see a performance loss with 1c/1t
> > > > (ixgbe), not in the same magnitude however. Here is what I have in MPPS:
> > > >
> > > > 1c/1t 1c/2t
> > > > 53.3 58.7 current
> > > > 52.1 58.8 original patchset
> > > > 53.3 58.8 removed patches 3 and 9
> > > > 53.1 58.7 with konstantin's patch
> > > >
> > > > So we have 2 options here:
> > > >
> > > > 1/ integrate Konstantin's patch in the patchset (thank you, by the way)
> > > > 2/ remove patch 3, and keep it for later until we have something that
> > > > really no impact
> > > >
> > > > I'd prefer 1/, knowing that the difference is really small in terms
> > > > of cycles per packet.
> > > >
> > > >
> > > 1 is certainly the more attractive option. However, I think we can
> > > afford to spend a little more time looking at this before we decide.
> > > I'll try and check out the perf numbers I get with i40e with
> > > Konstantin's patch today. We also need to double check the other
> > > possible issues he reported in his other emails. While I don't want this
> > > patchset held up for a long time, I think an extra 24/48 hours is
> > > probably needed on it.
> > >
> >
> > Yes, now that we have the "test momentum", try not to loose it ;)
> >
> > I'm guilty to have missed the performance loss, but honnestly,
> > I'm a bit sad that nobody tried to this patchset before (it
> > is available for more than 2 months), knowing this is probably one of
> > the most critical part of dpdk. I think we need to be better next
> > time.
> >
> > Anyway, thank you for your test and feedback now.
>
> I am also leaning towards option 1, but agree that some extra testing first
> need to be done before making the final decision.
> BTW, path #9 need to be removed anyway, even if will go for path #1.
> Konstantin
Please, can we have a conclusion now?
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-04-03 16:15 ` Thomas Monjalon
@ 2017-04-04 7:58 ` Olivier MATZ
2017-04-04 8:53 ` Bruce Richardson
0 siblings, 1 reply; 155+ messages in thread
From: Olivier MATZ @ 2017-04-04 7:58 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ananyev, Konstantin, Richardson, Bruce, mb, Chilikin,
Andrey, jblunck, nelio.laranjeiro, arybchenko
On Mon, 03 Apr 2017 18:15:25 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 2017-03-31 09:18, Ananyev, Konstantin:
> > > On Fri, 31 Mar 2017 09:41:39 +0100, Bruce Richardson
> > > <bruce.richardson@intel.com> wrote:
> > > > On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> > > > > I replayed my tests, and I can also see a performance loss
> > > > > with 1c/1t (ixgbe), not in the same magnitude however. Here
> > > > > is what I have in MPPS:
> > > > >
> > > > > 1c/1t 1c/2t
> > > > > 53.3 58.7 current
> > > > > 52.1 58.8 original patchset
> > > > > 53.3 58.8 removed patches 3 and 9
> > > > > 53.1 58.7 with konstantin's patch
> > > > >
> > > > > So we have 2 options here:
> > > > >
> > > > > > 1/ integrate Konstantin's patch in the patchset (thank you,
> > > > > > by the way)
> > > > > > 2/ remove patch 3, and keep it for later until we have
> > > > > > something that really no impact
> > > > >
> > > > > I'd prefer 1/, knowing that the difference is really small in
> > > > > terms of cycles per packet.
> > > > >
> > > > >
> > > > 1 is certainly the more attractive option. However, I think we
> > > > can afford to spend a little more time looking at this before
> > > > we decide. I'll try and check out the perf numbers I get with
> > > > i40e with Konstantin's patch today. We also need to double
> > > > check the other possible issues he reported in his other
> > > > emails. While I don't want this patchset held up for a long
> > > > time, I think an extra 24/48 hours is probably needed on it.
> > > >
> > >
> > > Yes, now that we have the "test momentum", try not to loose it ;)
> > >
> > > I'm guilty to have missed the performance loss, but honnestly,
> > > I'm a bit sad that nobody tried to this patchset before (it
> > > is available for more than 2 months), knowing this is probably
> > > one of the most critical part of dpdk. I think we need to be
> > > better next time.
> > >
> > > Anyway, thank you for your test and feedback now.
> >
> > I am also leaning towards option 1, but agree that some extra
> > testing first need to be done before making the final decision.
> > BTW, path #9 need to be removed anyway, even if will go for path #1.
> > Konstantin
>
> Please, can we have a conclusion now?
I think we should go with option 1; I can resubmit an updated
patch today.
This rework is needed at least for the metrics libraries.
To summarize the perf data we have:
- There is a small impact on Intel NICs (-0.4 Mpps on ixgbe in iofwd
  mode according to Konstantin's test, which is less than 1%). I guess
  it can be optimized.
- On mlx5, there is a gain (+0.8 Mpps).
- On sfc, there is also a gain.
Any comments?
Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-04-04 7:58 ` Olivier MATZ
@ 2017-04-04 8:53 ` Bruce Richardson
0 siblings, 0 replies; 155+ messages in thread
From: Bruce Richardson @ 2017-04-04 8:53 UTC (permalink / raw)
To: Olivier MATZ
Cc: Thomas Monjalon, dev, Ananyev, Konstantin, mb, Chilikin, Andrey,
jblunck, nelio.laranjeiro, arybchenko
On Tue, Apr 04, 2017 at 09:58:49AM +0200, Olivier MATZ wrote:
> On Mon, 03 Apr 2017 18:15:25 +0200
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
> > 2017-03-31 09:18, Ananyev, Konstantin:
> > > > On Fri, 31 Mar 2017 09:41:39 +0100, Bruce Richardson
> > > > <bruce.richardson@intel.com> wrote:
> > > > > On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> > > > > > I replayed my tests, and I can also see a performance loss
> > > > > > with 1c/1t (ixgbe), not in the same magnitude however. Here
> > > > > > is what I have in MPPS:
> > > > > >
> > > > > > 1c/1t 1c/2t
> > > > > > 53.3 58.7 current
> > > > > > 52.1 58.8 original patchset
> > > > > > 53.3 58.8 removed patches 3 and 9
> > > > > > 53.1 58.7 with konstantin's patch
> > > > > >
> > > > > > So we have 2 options here:
> > > > > >
> > > > > > 1/ integrate Konstantin's patch in the patchset (thank you,
> > > > > > by the way) 2/ remove patch 3, and keep it for later until we
> > > > > > have something that really no impact
> > > > > >
> > > > > > I'd prefer 1/, knowing that the difference is really small in
> > > > > > terms of cycles per packet.
> > > > > >
> > > > > >
> > > > > 1 is certainly the more attractive option. However, I think we
> > > > > can afford to spend a little more time looking at this before
> > > > > we decide. I'll try and check out the perf numbers I get with
> > > > > i40e with Konstantin's patch today. We also need to double
> > > > > check the other possible issues he reported in his other
> > > > > emails. While I don't want this patchset held up for a long
> > > > > time, I think an extra 24/48 hours is probably needed on it.
> > > > >
> > > >
> > > > Yes, now that we have the "test momentum", try not to loose it ;)
> > > >
> > > > I'm guilty to have missed the performance loss, but honnestly,
> > > > I'm a bit sad that nobody tried to this patchset before (it
> > > > is available for more than 2 months), knowing this is probably
> > > > one of the most critical part of dpdk. I think we need to be
> > > > better next time.
> > > >
> > > > Anyway, thank you for your test and feedback now.
> > >
> > > I am also leaning towards option 1, but agree that some extra
> > > testing first need to be done before making the final decision.
> > > BTW, path #9 need to be removed anyway, even if will go for path #1.
> > > Konstantin
> >
> > Please, can we have a conclusion now?
>
> I think we sholuld go with proposition 1, I can resubmit an updated
> patch today.
>
> This rework is needed at least for metrics libraries.
>
> To summarize the perf data we have:
> - There is a small impact on Intel NICs (-0.4MPPS on ixgbe in iofwd
> mode according to Konstantin's test, which is less than 1%). I guess
> it can be optimized.
> - On mlx5, there is a gain (+0.8MPPS).
> - On sfc, there is also a gain.
>
> Any comment?
>
> Olivier
Hi,
As you have probably seen from the patches I sent yesterday, there are
optimizations we can make to our i40e (and ixgbe) drivers on top of this
patchset which should compensate for any performance loss due to the
mbuf rework. Therefore, we are ok to have this merged, so long as our
PMD enhancements based on this set can also be merged (they are not
large, so I assume this should not be controversial). The i40e patches
are on the list; an equivalent set for ixgbe should be submitted by
Konstantin shortly.
Regards,
/Bruce
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-31 8:59 ` Olivier Matz
2017-03-31 9:18 ` Ananyev, Konstantin
@ 2017-03-31 9:23 ` Bruce Richardson
1 sibling, 0 replies; 155+ messages in thread
From: Bruce Richardson @ 2017-03-31 9:23 UTC (permalink / raw)
To: Olivier Matz
Cc: Ananyev, Konstantin, dev, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko
On Fri, Mar 31, 2017 at 10:59:25AM +0200, Olivier Matz wrote:
> On Fri, 31 Mar 2017 09:41:39 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > On Fri, Mar 31, 2017 at 10:26:10AM +0200, Olivier Matz wrote:
> > > Hi,
> > >
> > > On Fri, 31 Mar 2017 01:00:49 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richardson, Bruce
> > > > > > > Sent: Thursday, March 30, 2017 1:23 PM
> > > > > > > To: Olivier Matz <olivier.matz@6wind.com>
> > > > > > > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; mb@smartsharesystems.com; Chilikin, Andrey
> > > > > > > <andrey.chilikin@intel.com>; jblunck@infradead.org; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com
> > > > > > > Subject: Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
> > > > > > >
> > > > > > > On Thu, Mar 30, 2017 at 02:02:36PM +0200, Olivier Matz wrote:
> > > > > > > > On Thu, 30 Mar 2017 10:31:08 +0100, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > > > > > > > On Wed, Mar 29, 2017 at 09:09:23PM +0100, Bruce Richardson wrote:
> > > > > > > > > > On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Does anyone have any other comment on this series?
> > > > > > > > > > > Can it be applied?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Olivier
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I assume all driver maintainers have done performance analysis to check
> > > > > > > > > > for regressions. Perhaps they can confirm this is the case.
> > > > > > > > > >
> > > > > > > > > > /Bruce
> > > > > > > > > > >
> > > > > > > > > In the absence, of anyone else reporting performance numbers with this
> > > > > > > > > patchset, I ran a single-thread testpmd test using 2 x 40G ports (i40e)
> > > > > > > > > driver. With RX & TX descriptor ring sizes of 512 or above, I'm seeing a
> > > > > > > > > fairly noticable performance drop. I still need to dig in more, e.g. do
> > > > > > > > > an RFC2544 zero-loss test, and also bisect the patchset to see what
> > > > > > > > > parts may be causing the problem.
> > > > > > > > >
> > > > > > > > > Has anyone else tried any other drivers or systems to see what the perf
> > > > > > > > > impact of this set may be?
> > > > > > > >
> > > > > > > > I did, of course. I didn't see any noticeable performance drop on
> > > > > > > > ixgbe (4 NICs, one port per NIC, 1 core). I can replay the test with
> > > > > > > > current version.
> > > > > > > >
> > > > > > > I had no doubt you did some perf testing! :-)
> > > > > > >
> > > > > > > Perhaps the regression I see is limited to i40e driver. I've confirmed I
> > > > > > > still see it with that driver in zero-loss tests, so next step is to try
> > > > > > > and localise what change in the patchset is causing it.
> > > > > > >
> > > > > > > Ideally, though, I think we should see acks or other comments from
> > > > > > > driver maintainers at least confirming that they have tested. You cannot
> > > > > > > be held responsible for testing every DPDK driver before you submit work
> > > > > > > like this.
> > > > > > >
> > > > > >
> > > > > > Unfortunately I also see a regression.
> > > > > > Did a quick flood test on 2.8 GHZ IVB with 4x10Gb.
> > > > >
> > > > > Sorry, forgot to mention - it is on ixgbe.
> > > > > So it doesn't look like i40e specific.
> > > > >
> > > > > > Observed a drop even with default testpmd RXD/TXD numbers (128/512):
> > > > > > from 50.8 Mpps down to 47.8 Mpps.
> > > > > > From what I am seeing the particular patch that causing it:
> > > > > > [dpdk-dev,3/9] mbuf: set mbuf fields while in pool
> > > > > >
> > > > > > cc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > > > > > cmdline:
> > > > > > ./dpdk.org-1705-mbuf1/x86_64-native-linuxapp-gcc/app/testpmd --lcores='7,8' -n 4 --socket-mem='1024,0' -w 04:00.1 -w 07:00.1 -w
> > > > > > 0b:00.1 -w 0e:00.1 -- -i
> > > > > >
> > > >
> > > > After applying the patch below got nearly original numbers (though not quite) on my box.
> > > > dpdk.org mainline: 50.8
> > > > with Olivier patch: 47.8
> > > > with patch below: 50.4
> > > > What I tried to do in it - avoid unnecessary updates of mbuf inside rte_pktmbuf_prefree_seg().
> > > > For one segment per packet it seems to help.
> > > > Though so far I didn't try it on i40e and didn't do any testing for multi-seg scenario.
> > > > Konstantin
> > >
> > > I replayed my tests, and I can also see a performance loss with 1c/1t
> > > (ixgbe), not in the same magnitude however. Here is what I have in MPPS:
> > >
> > > 1c/1t 1c/2t
> > > 53.3 58.7 current
> > > 52.1 58.8 original patchset
> > > 53.3 58.8 removed patches 3 and 9
> > > 53.1 58.7 with konstantin's patch
> > >
> > > So we have 2 options here:
> > >
> > > 1/ integrate Konstantin's patch in the patchset (thank you, by the way)
> > > 2/ remove patch 3, and keep it for later until we have something that
> > > really no impact
> > >
> > > I'd prefer 1/, knowing that the difference is really small in terms
> > > of cycles per packet.
> > >
> > >
> > 1 is certainly the more attractive option. However, I think we can
> > afford to spend a little more time looking at this before we decide.
> > I'll try and check out the perf numbers I get with i40e with
> > Konstantin's patch today. We also need to double check the other
> > possible issues he reported in his other emails. While I don't want this
> > patchset held up for a long time, I think an extra 24/48 hours is
> > probably needed on it.
> >
>
> Yes, now that we have the "test momentum", try not to loose it ;)
>
> I'm guilty to have missed the performance loss, but honnestly,
> I'm a bit sad that nobody tried to this patchset before (it
> is available for more than 2 months), knowing this is probably one of
> the most critical part of dpdk. I think we need to be better next
> time.
No disagreement here.
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-29 15:56 ` [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization Olivier Matz
2017-03-29 16:03 ` Morten Brørup
2017-03-29 20:09 ` Bruce Richardson
@ 2017-03-31 11:18 ` Nélio Laranjeiro
2 siblings, 0 replies; 155+ messages in thread
From: Nélio Laranjeiro @ 2017-03-31 11:18 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, arybchenko
On Wed, Mar 29, 2017 at 05:56:29PM +0200, Olivier Matz wrote:
> Hi,
>
> Does anyone have any other comment on this series?
> Can it be applied?
>
>
> Thanks,
> Olivier
>
>
>
> On Wed, 8 Mar 2017 10:41:52 +0100, Olivier Matz <olivier.matz@6wind.com> wrote:
> > Based on discussions done in [1] and in this thread, this patchset reorganizes
> > the mbuf.
> >
> > The main changes are:
> > - reorder structure to increase vector performance on some non-ia
> > platforms.
> > - add a 64bits timestamp field in the 1st cache line. This timestamp
> > is not normalized, i.e. no unit or time reference is enforced. A
> > library may be added to do this job in the future.
> > - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> > in the pool, avoiding the need of setting m->next (located in the
> > 2nd cache line) in the Rx path for mono-segment packets.
> > - change port and nb_segs to 16 bits
> > - move seqn in the 2nd cache line
> >
> > Things discussed but not done in the patchset:
> > - move refcnt and nb_segs to the 2nd cache line: many drivers sets
> > them in the Rx path, so it could introduce a performance regression, or
> > it would require to change all the drivers, which is not an easy task.
> > - remove the m->port field: too much impact on many examples and libraries,
> > and some people highlighted they are using it.
> > - moving m->next in the 1st cache line: there is not enough room, and having
> > it set to NULL for unused mbuf should remove the need for it.
> > - merge seqn and timestamp together in a union: we could imagine use cases
> > were both are activated. There is no flag indicating the presence of seqn,
> > so it looks preferable to keep them separated for now.
> >
> > I made some basic performance tests (ixgbe) and see no regression.
> > Other tests from NIC vendors are welcome.
> >
> > Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> > by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> > idea of what could be done.
> >
> > [1] http://dpdk.org/ml/archives/dev/2016-October/049338.html
> >
> > rfc->v1:
> > - fix reset of mbuf fields in case of indirect mbuf in rte_pktmbuf_prefree_seg()
> > - do not enforce a unit or time reference for m->timestamp
> > - reorganize fields to make vlan and outer vlan consecutive
> > - enhance documentation of m->refcnt and m->port to explain why they are 16bits
> >
> > Jerin Jacob (1):
> > mbuf: make rearm data address naturally aligned
> >
> > Olivier Matz (8):
> > mbuf: make segment prefree function public
> > mbuf: make raw free function public
> > mbuf: set mbuf fields while in pool
> > drivers/net: don't touch mbuf next or nb segs on Rx
> > mbuf: use 2 bytes for port and nb segments
> > mbuf: move sequence number in second cache line
> > mbuf: add a timestamp field
> > mbuf: reorder VLAN tci and buffer len fields
> >
> > app/test-pmd/csumonly.c | 4 +-
> > drivers/net/ena/ena_ethdev.c | 2 +-
> > drivers/net/enic/enic_rxtx.c | 2 +-
> > drivers/net/fm10k/fm10k_rxtx.c | 6 +-
> > drivers/net/fm10k/fm10k_rxtx_vec.c | 9 +-
> > drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +-
> > drivers/net/i40e/i40e_rxtx_vec_sse.c | 11 +-
> > drivers/net/ixgbe/ixgbe_rxtx.c | 10 +-
> > drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +-
> > drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 9 --
> > drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 9 --
> > drivers/net/mlx5/mlx5_rxtx.c | 11 +-
> > drivers/net/mpipe/mpipe_tilegx.c | 3 +-
> > drivers/net/null/rte_eth_null.c | 2 -
> > drivers/net/virtio/virtio_rxtx.c | 4 -
> > drivers/net/virtio/virtio_rxtx_simple.h | 6 +-
> > .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 +-
> > lib/librte_mbuf/rte_mbuf.c | 4 +
> > lib/librte_mbuf/rte_mbuf.h | 123 ++++++++++++++++-----
> > 19 files changed, 130 insertions(+), 102 deletions(-)
> >
Tested-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
with a two-port mlx5 ConnectX-4 and single-thread IO forwarding.
Olivier's patches: performance increases by 0.4 Mpps.
Olivier's + Konstantin's patches: performance increases by 0.8 Mpps.
Regards,
--
Nélio Laranjeiro
6WIND
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (9 preceding siblings ...)
2017-03-29 15:56 ` [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization Olivier Matz
@ 2017-03-30 14:54 ` Andrew Rybchenko
2017-03-30 15:12 ` Jerin Jacob
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
12 siblings, 0 replies; 155+ messages in thread
From: Andrew Rybchenko @ 2017-03-30 14:54 UTC (permalink / raw)
To: Olivier Matz, dev
Cc: bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
On 03/08/2017 12:41 PM, Olivier Matz wrote:
> Based on discussions done in [1] and in this thread, this patchset reorganizes
> the mbuf.
>
> The main changes are:
> - reorder structure to increase vector performance on some non-ia
> platforms.
> - add a 64bits timestamp field in the 1st cache line. This timestamp
> is not normalized, i.e. no unit or time reference is enforced. A
> library may be added to do this job in the future.
> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> in the pool, avoiding the need of setting m->next (located in the
> 2nd cache line) in the Rx path for mono-segment packets.
> - change port and nb_segs to 16 bits
> - move seqn in the 2nd cache line
>
> Things discussed but not done in the patchset:
> - move refcnt and nb_segs to the 2nd cache line: many drivers sets
> them in the Rx path, so it could introduce a performance regression, or
> it would require to change all the drivers, which is not an easy task.
> - remove the m->port field: too much impact on many examples and libraries,
> and some people highlighted they are using it.
> - moving m->next in the 1st cache line: there is not enough room, and having
> it set to NULL for unused mbuf should remove the need for it.
> - merge seqn and timestamp together in a union: we could imagine use cases
> were both are activated. There is no flag indicating the presence of seqn,
> so it looks preferable to keep them separated for now.
>
> I made some basic performance tests (ixgbe) and see no regression.
> Other tests from NIC vendors are welcome.
>
> Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> idea of what could be done.
>
> [1] http://dpdk.org/ml/archives/dev/2016-October/049338.html
>
> rfc->v1:
> - fix reset of mbuf fields in case of indirect mbuf in rte_pktmbuf_prefree_seg()
> - do not enforce a unit or time reference for m->timestamp
> - reorganize fields to make vlan and outer vlan consecutive
> - enhance documentation of m->refcnt and m->port to explain why they are 16bits
>
> Jerin Jacob (1):
> mbuf: make rearm data address naturally aligned
>
> Olivier Matz (8):
> mbuf: make segment prefree function public
> mbuf: make raw free function public
> mbuf: set mbuf fields while in pool
> drivers/net: don't touch mbuf next or nb segs on Rx
> mbuf: use 2 bytes for port and nb segments
> mbuf: move sequence number in second cache line
> mbuf: add a timestamp field
> mbuf: reorder VLAN tci and buffer len fields
>
> app/test-pmd/csumonly.c | 4 +-
> drivers/net/ena/ena_ethdev.c | 2 +-
> drivers/net/enic/enic_rxtx.c | 2 +-
> drivers/net/fm10k/fm10k_rxtx.c | 6 +-
> drivers/net/fm10k/fm10k_rxtx_vec.c | 9 +-
> drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +-
> drivers/net/i40e/i40e_rxtx_vec_sse.c | 11 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 10 +-
> drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +-
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 9 --
> drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 9 --
> drivers/net/mlx5/mlx5_rxtx.c | 11 +-
> drivers/net/mpipe/mpipe_tilegx.c | 3 +-
> drivers/net/null/rte_eth_null.c | 2 -
> drivers/net/virtio/virtio_rxtx.c | 4 -
> drivers/net/virtio/virtio_rxtx_simple.h | 6 +-
> .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 +-
> lib/librte_mbuf/rte_mbuf.c | 4 +
> lib/librte_mbuf/rte_mbuf.h | 123 ++++++++++++++++-----
> 19 files changed, 130 insertions(+), 102 deletions(-)
>
I see better performance with the patch series applied and the next=NULL
assignments removed from net/sfc (I am waiting for the series to be
applied before submitting the corresponding patches). So, for the series:
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH 0/9] mbuf: structure reorganization
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (10 preceding siblings ...)
2017-03-30 14:54 ` Andrew Rybchenko
@ 2017-03-30 15:12 ` Jerin Jacob
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
12 siblings, 0 replies; 155+ messages in thread
From: Jerin Jacob @ 2017-03-30 15:12 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, bruce.richardson, konstantin.ananyev, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko
On Wed, Mar 08, 2017 at 10:41:52AM +0100, Olivier Matz wrote:
> Based on discussions done in [1] and in this thread, this patchset reorganizes
> the mbuf.
>
> The main changes are:
> - reorder structure to increase vector performance on some non-ia
> platforms.
> - add a 64bits timestamp field in the 1st cache line. This timestamp
> is not normalized, i.e. no unit or time reference is enforced. A
> library may be added to do this job in the future.
> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> in the pool, avoiding the need of setting m->next (located in the
> 2nd cache line) in the Rx path for mono-segment packets.
> - change port and nb_segs to 16 bits
> - move seqn in the 2nd cache line
>
> Things discussed but not done in the patchset:
> - move refcnt and nb_segs to the 2nd cache line: many drivers sets
> them in the Rx path, so it could introduce a performance regression, or
> it would require to change all the drivers, which is not an easy task.
> - remove the m->port field: too much impact on many examples and libraries,
> and some people highlighted they are using it.
> - moving m->next in the 1st cache line: there is not enough room, and having
> it set to NULL for unused mbuf should remove the need for it.
> - merge seqn and timestamp together in a union: we could imagine use cases
> were both are activated. There is no flag indicating the presence of seqn,
> so it looks preferable to keep them separated for now.
>
> I made some basic performance tests (ixgbe) and see no regression.
> Other tests from NIC vendors are welcome.
>
> Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> idea of what could be done.
>
> [1] http://dpdk.org/ml/archives/dev/2016-October/049338.html
>
> rfc->v1:
> - fix reset of mbuf fields in case of indirect mbuf in rte_pktmbuf_prefree_seg()
> - do not enforce a unit or time reference for m->timestamp
> - reorganize fields to make vlan and outer vlan consecutive
> - enhance documentation of m->refcnt and m->port to explain why they are 16bits
>
> Jerin Jacob (1):
> mbuf: make rearm data address naturally aligned
>
> Olivier Matz (8):
> mbuf: make segment prefree function public
> mbuf: make raw free function public
> mbuf: set mbuf fields while in pool
> drivers/net: don't touch mbuf next or nb segs on Rx
> mbuf: use 2 bytes for port and nb segments
> mbuf: move sequence number in second cache line
> mbuf: add a timestamp field
> mbuf: reorder VLAN tci and buffer len fields
>
> app/test-pmd/csumonly.c | 4 +-
> drivers/net/ena/ena_ethdev.c | 2 +-
> drivers/net/enic/enic_rxtx.c | 2 +-
> drivers/net/fm10k/fm10k_rxtx.c | 6 +-
> drivers/net/fm10k/fm10k_rxtx_vec.c | 9 +-
> drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +-
> drivers/net/i40e/i40e_rxtx_vec_sse.c | 11 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 10 +-
> drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +-
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 9 --
> drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 9 --
> drivers/net/mlx5/mlx5_rxtx.c | 11 +-
> drivers/net/mpipe/mpipe_tilegx.c | 3 +-
> drivers/net/null/rte_eth_null.c | 2 -
> drivers/net/virtio/virtio_rxtx.c | 4 -
> drivers/net/virtio/virtio_rxtx_simple.h | 6 +-
> .../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 +-
> lib/librte_mbuf/rte_mbuf.c | 4 +
> lib/librte_mbuf/rte_mbuf.h | 123 ++++++++++++++++-----
> 19 files changed, 130 insertions(+), 102 deletions(-)
No performance regression with this series on the arm64 + thunderx PMD combo.
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>
> --
> 2.8.1
>
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-03-08 9:41 ` [dpdk-dev] [PATCH 0/9] " Olivier Matz
` (11 preceding siblings ...)
2017-03-30 15:12 ` Jerin Jacob
@ 2017-04-04 16:27 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 1/8] mbuf: make segment prefree function public Olivier Matz
` (8 more replies)
12 siblings, 9 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:27 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Based on the discussions in [1] and in this thread, this patchset
reorganizes the mbuf.

The main changes are:
- reorder the structure to increase vector performance on some non-IA
  platforms.
- add a 64-bit timestamp field in the 1st cache line. This timestamp
  is not normalized, i.e. no unit or time reference is enforced. A
  library may be added to do this job in the future.
- m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
  in the pool, avoiding the need to set m->next (located in the
  2nd cache line) in the Rx path for mono-segment packets.
- change port and nb_segs to 16 bits
- move seqn to the 2nd cache line

Things discussed but not done in the patchset:
- move refcnt and nb_segs to the 2nd cache line: many drivers set
  them in the Rx path, so it could introduce a performance regression, or
  it would require changing all the drivers, which is not an easy task.
- remove the m->port field: too much impact on many examples and libraries,
  and some people highlighted that they are using it.
- move m->next to the 1st cache line: there is not enough room, and having
  it set to NULL for unused mbufs should remove the need for it.
- merge seqn and timestamp together in a union: we could imagine use cases
  where both are activated. There is no flag indicating the presence of seqn,
  so it looks preferable to keep them separate for now.

Once this patchset is pushed, the Rx path of drivers could be optimized a bit
by removing writes to m->next, m->nb_segs and m->refcnt. Patch 4/8 gives an
idea of what could be done.

[1] http://dpdk.org/ml/archives/dev/2016-October/049338.html
v1->v2:
- remove reordering of vlan fields as it breaks pmd vector code
- optimize rte_pktmbuf_prefree_seg()
rfc->v1:
- fix reset of mbuf fields in case of indirect mbuf in rte_pktmbuf_prefree_seg()
- do not enforce a unit or time reference for m->timestamp
- reorganize fields to make vlan and outer vlan consecutive
- enhance documentation of m->refcnt and m->port to explain why they are 16bits
Jerin Jacob (1):
mbuf: make rearm data address naturally aligned
Olivier Matz (7):
mbuf: make segment prefree function public
mbuf: make raw free function public
mbuf: set mbuf fields while in pool
drivers/net: don't touch mbuf next or nb segs on Rx
mbuf: use 2 bytes for port and nb segments
mbuf: move sequence number in second cache line
mbuf: add a timestamp field
app/test-pmd/csumonly.c | 4 +-
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/enic/enic_rxtx.c | 2 +-
drivers/net/fm10k/fm10k_rxtx.c | 6 +-
drivers/net/fm10k/fm10k_rxtx_vec.c | 9 +-
drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +-
drivers/net/i40e/i40e_rxtx_vec_sse.c | 11 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 10 +-
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +-
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 9 --
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 9 --
drivers/net/mlx5/mlx5_rxtx.c | 11 +-
drivers/net/null/rte_eth_null.c | 2 -
drivers/net/virtio/virtio_rxtx.c | 4 -
drivers/net/virtio/virtio_rxtx_simple.h | 6 +-
.../linuxapp/eal/include/exec-env/rte_kni_common.h | 5 +-
lib/librte_mbuf/rte_mbuf.c | 4 +
lib/librte_mbuf/rte_mbuf.h | 138 +++++++++++++++++----
18 files changed, 144 insertions(+), 100 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 1/8] mbuf: make segment prefree function public
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 2/8] mbuf: make raw free " Olivier Matz
` (7 subsequent siblings)
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Document the function and make it public, since it is used at several
places in the drivers. The old one is marked as deprecated.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/enic/enic_rxtx.c | 2 +-
drivers/net/fm10k/fm10k_rxtx.c | 6 +++---
drivers/net/fm10k/fm10k_rxtx_vec.c | 6 +++---
drivers/net/i40e/i40e_rxtx_vec_common.h | 6 +++---
drivers/net/ixgbe/ixgbe_rxtx.c | 2 +-
drivers/net/ixgbe/ixgbe_rxtx_vec_common.h | 6 +++---
drivers/net/virtio/virtio_rxtx_simple.h | 6 +++---
lib/librte_mbuf/rte_mbuf.h | 30 +++++++++++++++++++++++++++---
8 files changed, 44 insertions(+), 20 deletions(-)
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 343dabc64..1ee5cbb50 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -473,7 +473,7 @@ static inline void enic_free_wq_bufs(struct vnic_wq *wq, u16 completed_index)
pool = ((struct rte_mbuf *)buf->mb)->pool;
for (i = 0; i < nb_to_free; i++) {
buf = &wq->bufs[tail_idx];
- m = __rte_pktmbuf_prefree_seg((struct rte_mbuf *)(buf->mb));
+ m = rte_pktmbuf_prefree_seg((struct rte_mbuf *)(buf->mb));
buf->mb = NULL;
if (unlikely(m == NULL)) {
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 144e5e6b1..c9bb04a0e 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -434,12 +434,12 @@ static inline void tx_free_bulk_mbuf(struct rte_mbuf **txep, int num)
if (unlikely(num == 0))
return;
- m = __rte_pktmbuf_prefree_seg(txep[0]);
+ m = rte_pktmbuf_prefree_seg(txep[0]);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < num; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -455,7 +455,7 @@ static inline void tx_free_bulk_mbuf(struct rte_mbuf **txep, int num)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < num; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (m != NULL)
rte_mempool_put(m->pool, m);
txep[i] = NULL;
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 27f3e43ff..825e3c125 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -754,12 +754,12 @@ fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
* next_dd - (rs_thresh-1)
*/
txep = &txq->sw_ring[txq->next_dd - (n - 1)];
- m = __rte_pktmbuf_prefree_seg(txep[0]);
+ m = rte_pktmbuf_prefree_seg(txep[0]);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -774,7 +774,7 @@ fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i]);
+ m = rte_pktmbuf_prefree_seg(txep[i]);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 374555896..76031fe2c 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -123,12 +123,12 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
* tx_next_dd - (tx_rs_thresh-1)
*/
txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
- m = __rte_pktmbuf_prefree_seg(txep[0].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool)) {
free[nb_free++] = m;
@@ -144,7 +144,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 52e5c9737..879e215f6 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -142,7 +142,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
/* free buffers one at a time */
- m = __rte_pktmbuf_prefree_seg(txep->mbuf);
+ m = rte_pktmbuf_prefree_seg(txep->mbuf);
txep->mbuf = NULL;
if (unlikely(m == NULL))
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
index a3473b985..a83afe520 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
@@ -123,12 +123,12 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
* tx_next_dd - (tx_rs_thresh-1)
*/
txep = &txq->sw_ring_v[txq->tx_next_dd - (n - 1)];
- m = __rte_pktmbuf_prefree_seg(txep[0].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -143,7 +143,7 @@ ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
} else {
for (i = 1; i < n; i++) {
- m = __rte_pktmbuf_prefree_seg(txep[i].mbuf);
+ m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/drivers/net/virtio/virtio_rxtx_simple.h b/drivers/net/virtio/virtio_rxtx_simple.h
index b08f85948..f531c5428 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.h
+++ b/drivers/net/virtio/virtio_rxtx_simple.h
@@ -98,13 +98,13 @@ virtio_xmit_cleanup(struct virtqueue *vq)
desc_idx = (uint16_t)(vq->vq_used_cons_idx &
((vq->vq_nentries >> 1) - 1));
m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
- m = __rte_pktmbuf_prefree_seg(m);
+ m = rte_pktmbuf_prefree_seg(m);
if (likely(m != NULL)) {
free[0] = m;
nb_free = 1;
for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
- m = __rte_pktmbuf_prefree_seg(m);
+ m = rte_pktmbuf_prefree_seg(m);
if (likely(m != NULL)) {
if (likely(m->pool == free[0]->pool))
free[nb_free++] = m;
@@ -123,7 +123,7 @@ virtio_xmit_cleanup(struct virtqueue *vq)
} else {
for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
- m = __rte_pktmbuf_prefree_seg(m);
+ m = rte_pktmbuf_prefree_seg(m);
if (m != NULL)
rte_mempool_put(m->pool, m);
}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index fd5d32a2c..e15378567 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1220,8 +1220,23 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
__rte_mbuf_raw_free(md);
}
-static inline struct rte_mbuf* __attribute__((always_inline))
-__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
+/**
+ * Decrease reference counter and unlink a mbuf segment
+ *
+ * This function does the same than a free, except that it does not
+ * return the segment to its pool.
+ * It decreases the reference counter, and if it reaches 0, it is
+ * detached from its parent for an indirect mbuf.
+ *
+ * @param m
+ * The mbuf to be unlinked
+ * @return
+ * - (m) if it is the last reference. It can be recycled or freed.
+ * - (NULL) if the mbuf still has remaining references on it.
+ */
+__attribute__((always_inline))
+static inline struct rte_mbuf *
+rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
__rte_mbuf_sanity_check(m, 0);
@@ -1234,6 +1249,14 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
return NULL;
}
+/* deprecated, replaced by rte_pktmbuf_prefree_seg() */
+__rte_deprecated
+static inline struct rte_mbuf *
+__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
+{
+ return rte_pktmbuf_prefree_seg(m);
+}
+
/**
* Free a segment of a packet mbuf into its original mempool.
*
@@ -1246,7 +1269,8 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
static inline void __attribute__((always_inline))
rte_pktmbuf_free_seg(struct rte_mbuf *m)
{
- if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
+ m = rte_pktmbuf_prefree_seg(m);
+ if (likely(m != NULL)) {
m->next = NULL;
__rte_mbuf_raw_free(m);
}
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
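The driver hunks in the patch above share one Tx cleanup pattern: call the prefree function on each segment, skip segments that are still referenced, and hand the rest back to their pool in bulk. A minimal sketch of that pattern follows. It uses toy stand-ins (toy_pool, toy_mbuf, toy_prefree_seg, toy_pool_put_bulk, toy_tx_free_bufs and TOY_MAX_BURST are hypothetical names for illustration, not DPDK API), and it flushes the batch whenever the pool changes, which is one possible policy rather than what any particular driver does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the DPDK types. */
struct toy_pool {
	unsigned int nb_returned; /* objects handed back so far */
};

struct toy_mbuf {
	struct toy_pool *pool;
	unsigned int refcnt;
};

/* Drop one reference; return m only if that was the last one. */
static struct toy_mbuf *
toy_prefree_seg(struct toy_mbuf *m)
{
	if (--m->refcnt == 0)
		return m;
	return NULL;
}

/* Pretend to return n objects to pool p (only counts them). */
static void
toy_pool_put_bulk(struct toy_pool *p, struct toy_mbuf **objs, unsigned int n)
{
	(void)objs;
	p->nb_returned += n;
}

#define TOY_MAX_BURST 64

/* Release n Tx mbufs (n <= TOY_MAX_BURST), batching them per pool. */
static void
toy_tx_free_bufs(struct toy_mbuf **txep, unsigned int n)
{
	struct toy_mbuf *batch[TOY_MAX_BURST];
	unsigned int nb_free = 0;
	unsigned int i;

	assert(n <= TOY_MAX_BURST);

	for (i = 0; i < n; i++) {
		struct toy_mbuf *m = toy_prefree_seg(txep[i]);

		txep[i] = NULL;
		if (m == NULL)
			continue; /* still referenced elsewhere */
		if (nb_free != 0 && m->pool != batch[0]->pool) {
			/* pool changed: flush the current batch first */
			toy_pool_put_bulk(batch[0]->pool, batch, nb_free);
			nb_free = 0;
		}
		batch[nb_free++] = m;
	}
	if (nb_free != 0)
		toy_pool_put_bulk(batch[0]->pool, batch, nb_free);
}
```

Because the prefree step only reports whether the caller holds the last reference, the decision of how to return the buffer (one at a time or in bulk) stays with the driver.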
* [dpdk-dev] [PATCH v2 2/8] mbuf: make raw free function public
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 1/8] mbuf: make segment prefree function public Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 3/8] mbuf: set mbuf fields while in pool Olivier Matz
` (6 subsequent siblings)
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Rename __rte_mbuf_raw_free() as rte_mbuf_raw_free() and make
it public. The old function is kept for compat but is marked as
deprecated.
The next commit changes the behavior of rte_mbuf_raw_free() to
make it more consistent with rte_mbuf_raw_alloc().
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/mlx5/mlx5_rxtx.c | 6 +++---
lib/librte_mbuf/rte_mbuf.h | 22 ++++++++++++++++------
3 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index b5e6db624..5dd44d778 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -680,7 +680,7 @@ static void ena_rx_queue_release_bufs(struct ena_ring *ring)
ring->rx_buffer_info[ring->next_to_clean & ring_mask];
if (m)
- __rte_mbuf_raw_free(m);
+ rte_mbuf_raw_free(m);
ring->next_to_clean++;
}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index b8d2bf628..0cbf98ffd 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1475,7 +1475,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(pkt != (*rxq->elts)[idx]);
rep = NEXT(pkt);
rte_mbuf_refcnt_set(pkt, 0);
- __rte_mbuf_raw_free(pkt);
+ rte_mbuf_raw_free(pkt);
pkt = rep;
}
break;
@@ -1486,13 +1486,13 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
&rss_hash_res);
if (!len) {
rte_mbuf_refcnt_set(rep, 0);
- __rte_mbuf_raw_free(rep);
+ rte_mbuf_raw_free(rep);
break;
}
if (unlikely(len == -1)) {
/* RX error, packet is likely too large. */
rte_mbuf_refcnt_set(rep, 0);
- __rte_mbuf_raw_free(rep);
+ rte_mbuf_raw_free(rep);
++rxq->stats.idropped;
goto skip;
}
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index e15378567..2dc4d8b98 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -797,20 +797,30 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
}
/**
- * @internal Put mbuf back into its original mempool.
- * The use of that function is reserved for RTE internal needs.
- * Please use rte_pktmbuf_free().
+ * Put mbuf back into its original mempool.
+ *
+ * The caller must ensure that the mbuf is direct and that the
+ * reference counter is 0.
*
* @param m
* The mbuf to be freed.
*/
static inline void __attribute__((always_inline))
-__rte_mbuf_raw_free(struct rte_mbuf *m)
+rte_mbuf_raw_free(struct rte_mbuf *m)
{
+ RTE_ASSERT(RTE_MBUF_DIRECT(m));
RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
rte_mempool_put(m->pool, m);
}
+/* compat with older versions */
+__rte_deprecated
+static inline void __attribute__((always_inline))
+__rte_mbuf_raw_free(struct rte_mbuf *m)
+{
+ rte_mbuf_raw_free(m);
+}
+
/* Operations on ctrl mbuf */
/**
@@ -1217,7 +1227,7 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
m->ol_flags = 0;
if (rte_mbuf_refcnt_update(md, -1) == 0)
- __rte_mbuf_raw_free(md);
+ rte_mbuf_raw_free(md);
}
/**
@@ -1272,7 +1282,7 @@ rte_pktmbuf_free_seg(struct rte_mbuf *m)
m = rte_pktmbuf_prefree_seg(m);
if (likely(m != NULL)) {
m->next = NULL;
- __rte_mbuf_raw_free(m);
+ rte_mbuf_raw_free(m);
}
}
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 3/8] mbuf: set mbuf fields while in pool
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 1/8] mbuf: make segment prefree function public Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 2/8] mbuf: make raw free " Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 4/8] drivers/net: don't touch mbuf next or nb segs on Rx Olivier Matz
` (5 subsequent siblings)
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
to NULL when the mbuf is stored inside the mempool (unused).
This is done in rte_pktmbuf_prefree_seg(), before freeing or
recycling a mbuf.
Before this patch, the value of m->refcnt was expected to be 0
while in pool.
The objectives are:
- to spare drivers from setting m->next to NULL in the early Rx path,
since this field is in the second 64B of the mbuf and accessing it
could trigger a cache miss
- to rationalize the behavior of raw_alloc/raw_free: each is now the
mirror of the other, and refcnt is never changed in these functions.
To optimize the freeing of segments, we try to update m->refcnt,
m->next, and m->nb_segs only when required (idea from
Konstantin Ananyev <konstantin.ananyev@intel.com>).
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/mlx5/mlx5_rxtx.c | 5 ++--
lib/librte_mbuf/rte_mbuf.c | 2 ++
lib/librte_mbuf/rte_mbuf.h | 60 +++++++++++++++++++++++++++++++++++---------
3 files changed, 52 insertions(+), 15 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 0cbf98ffd..e048b8d0e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1474,7 +1474,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
while (pkt != seg) {
assert(pkt != (*rxq->elts)[idx]);
rep = NEXT(pkt);
- rte_mbuf_refcnt_set(pkt, 0);
+ NEXT(pkt) = NULL;
+ NB_SEGS(pkt) = 1;
rte_mbuf_raw_free(pkt);
pkt = rep;
}
@@ -1485,13 +1486,11 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
len = mlx5_rx_poll_len(rxq, cqe, cqe_cnt,
&rss_hash_res);
if (!len) {
- rte_mbuf_refcnt_set(rep, 0);
rte_mbuf_raw_free(rep);
break;
}
if (unlikely(len == -1)) {
/* RX error, packet is likely too large. */
- rte_mbuf_refcnt_set(rep, 0);
rte_mbuf_raw_free(rep);
++rxq->stats.idropped;
goto skip;
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 3fb2700ba..207bf3dd3 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -146,6 +146,8 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->pool = mp;
m->nb_segs = 1;
m->port = 0xff;
+ rte_mbuf_refcnt_set(m, 1);
+ m->next = NULL;
}
/* helper to create a mbuf pool */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 2dc4d8b98..1efebec7c 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -775,6 +775,11 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
* initializing all the required fields. See rte_pktmbuf_reset().
* For standard needs, prefer rte_pktmbuf_alloc().
*
+ * The caller can expect that the following fields of the mbuf structure
+ * are initialized: buf_addr, buf_physaddr, buf_len, refcnt=1, nb_segs=1,
+ * next=NULL, pool, priv_size. The other fields must be initialized
+ * by the caller.
+ *
* @param mp
* The mempool from which mbuf is allocated.
* @return
@@ -789,8 +794,9 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
if (rte_mempool_get(mp, &mb) < 0)
return NULL;
m = (struct rte_mbuf *)mb;
- RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
- rte_mbuf_refcnt_set(m, 1);
+ RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+ RTE_ASSERT(m->next == NULL);
+ RTE_ASSERT(m->nb_segs == 1);
__rte_mbuf_sanity_check(m, 0);
return m;
@@ -799,8 +805,13 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
/**
* Put mbuf back into its original mempool.
*
- * The caller must ensure that the mbuf is direct and that the
- * reference counter is 0.
+ * The caller must ensure that the mbuf is direct and properly
+ * reinitialized (refcnt=1, next=NULL, nb_segs=1), as done by
+ * rte_pktmbuf_prefree_seg().
+ *
+ * This function should be used with care, when optimization is
+ * required. For standard needs, prefer rte_pktmbuf_free() or
+ * rte_pktmbuf_free_seg().
*
* @param m
* The mbuf to be freed.
@@ -809,13 +820,16 @@ static inline void __attribute__((always_inline))
rte_mbuf_raw_free(struct rte_mbuf *m)
{
RTE_ASSERT(RTE_MBUF_DIRECT(m));
- RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
+ RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+ RTE_ASSERT(m->next == NULL);
+ RTE_ASSERT(m->nb_segs == 1);
+ __rte_mbuf_sanity_check(m, 0);
rte_mempool_put(m->pool, m);
}
/* compat with older versions */
__rte_deprecated
-static inline void __attribute__((always_inline))
+static inline void
__rte_mbuf_raw_free(struct rte_mbuf *m)
{
rte_mbuf_raw_free(m);
@@ -1226,8 +1240,12 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
m->data_len = 0;
m->ol_flags = 0;
- if (rte_mbuf_refcnt_update(md, -1) == 0)
+ if (rte_mbuf_refcnt_update(md, -1) == 0) {
+ md->next = NULL;
+ md->nb_segs = 1;
+ rte_mbuf_refcnt_set(md, 1);
rte_mbuf_raw_free(md);
+ }
}
/**
@@ -1250,10 +1268,30 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
__rte_mbuf_sanity_check(m, 0);
- if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
- /* if this is an indirect mbuf, it is detached. */
+ if (likely(rte_mbuf_refcnt_read(m) == 1)) {
+
if (RTE_MBUF_INDIRECT(m))
rte_pktmbuf_detach(m);
+
+ if (m->next != NULL) {
+ m->next = NULL;
+ m->nb_segs = 1;
+ }
+
+ return m;
+
+ } else if (rte_atomic16_add_return(&m->refcnt_atomic, -1) == 0) {
+
+
+ if (RTE_MBUF_INDIRECT(m))
+ rte_pktmbuf_detach(m);
+
+ if (m->next != NULL) {
+ m->next = NULL;
+ m->nb_segs = 1;
+ }
+ rte_mbuf_refcnt_set(m, 1);
+
return m;
}
return NULL;
@@ -1280,10 +1318,8 @@ static inline void __attribute__((always_inline))
rte_pktmbuf_free_seg(struct rte_mbuf *m)
{
m = rte_pktmbuf_prefree_seg(m);
- if (likely(m != NULL)) {
- m->next = NULL;
+ if (likely(m != NULL))
rte_mbuf_raw_free(m);
- }
}
/**
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
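The invariant introduced by patch 3/8 above (an mbuf sitting in the pool has refcnt=1, next=NULL, nb_segs=1) can be sketched with a toy model. The names toy_mbuf, toy_prefree_seg, toy_raw_free and toy_free_seg are hypothetical; the sketch is single-threaded, uses a plain counter instead of an atomic, and leaves out indirect-mbuf detaching, so it only illustrates the control flow, not the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the fields discussed; NOT the real struct rte_mbuf. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint16_t refcnt;
	uint16_t nb_segs;
};

/*
 * Drop one reference. Return m, already restored to its "in pool"
 * state (refcnt=1, next=NULL, nb_segs=1), if it was the last
 * reference; return NULL otherwise.
 */
static struct toy_mbuf *
toy_prefree_seg(struct toy_mbuf *m)
{
	if (m->refcnt == 1) {
		/* sole owner: refcnt already holds the in-pool value,
		 * so nothing is written to it */
	} else if (--m->refcnt == 0) {
		/* In the real code this decrement is atomic, and the
		 * branch is taken when another thread dropped its
		 * reference between the read and the decrement. In
		 * this single-threaded toy it is never reached. */
		m->refcnt = 1;
	} else {
		return NULL; /* other references remain */
	}

	if (m->next != NULL) {
		/* write next/nb_segs only when they are stale */
		m->next = NULL;
		m->nb_segs = 1;
	}
	return m;
}

/* Mirror of the raw free checks: the mbuf must already be reset. */
static void
toy_raw_free(struct toy_mbuf *m, unsigned int *pool_count)
{
	assert(m->refcnt == 1);
	assert(m->next == NULL);
	assert(m->nb_segs == 1);
	(*pool_count)++;
}

/* Free one segment: prefree, then return it to the toy pool. */
static void
toy_free_seg(struct toy_mbuf *m, unsigned int *pool_count)
{
	m = toy_prefree_seg(m);
	if (m != NULL)
		toy_raw_free(m, pool_count);
}
```

The common case (refcnt already 1, next already NULL) performs no store at all, which is the point of updating the fields only when required.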
* [dpdk-dev] [PATCH v2 4/8] drivers/net: don't touch mbuf next or nb segs on Rx
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
` (2 preceding siblings ...)
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 3/8] mbuf: set mbuf fields while in pool Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 5/8] mbuf: make rearm data address naturally aligned Olivier Matz
` (4 subsequent siblings)
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Now that m->next and m->nb_segs are expected to be set (to NULL and 1
respectively) after a mempool_get(), drivers no longer need to write
them in their Rx functions.
Only some drivers are patched here; the patch is not exhaustive, but it
shows how the same can be done in other drivers.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/i40e/i40e_rxtx_vec_sse.c | 6 ------
drivers/net/ixgbe/ixgbe_rxtx.c | 8 --------
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 6 ------
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 6 ------
drivers/net/null/rte_eth_null.c | 2 --
drivers/net/virtio/virtio_rxtx.c | 4 ----
6 files changed, 32 deletions(-)
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index b95cc8e19..2f861fde8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -424,12 +424,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
/* store the resulting 32-bit value */
*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
split_packet += RTE_I40E_DESCS_PER_LOOP;
-
- /* zero-out next pointers */
- rx_pkts[pos]->next = NULL;
- rx_pkts[pos + 1]->next = NULL;
- rx_pkts[pos + 2]->next = NULL;
- rx_pkts[pos + 3]->next = NULL;
}
/* C.3 calc available number of desc */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 879e215f6..5023617a2 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1556,8 +1556,6 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
/* populate the static rte mbuf fields */
mb = rxep[i].mbuf;
if (reset_mbuf) {
- mb->next = NULL;
- mb->nb_segs = 1;
mb->port = rxq->port_id;
}
@@ -2165,12 +2163,6 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
goto next_desc;
}
- /*
- * This is the last buffer of the received packet - return
- * the current cluster to the user.
- */
- rxm->next = NULL;
-
/* Initialize the first mbuf of the returned packet */
ixgbe_fill_cluster_head_buf(first_seg, &rxd, rxq, staterr);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index e2715cb96..2c0416179 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -330,12 +330,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
*(int *)split_packet = ~stat & IXGBE_VPMD_DESC_EOP_MASK;
split_packet += RTE_IXGBE_DESCS_PER_LOOP;
-
- /* zero-out next pointers */
- rx_pkts[pos]->next = NULL;
- rx_pkts[pos + 1]->next = NULL;
- rx_pkts[pos + 2]->next = NULL;
- rx_pkts[pos + 3]->next = NULL;
}
rte_prefetch_non_temporal(rxdp + RTE_IXGBE_DESCS_PER_LOOP);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index abbf2841f..65c5da3c7 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -425,12 +425,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
/* store the resulting 32-bit value */
*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
split_packet += RTE_IXGBE_DESCS_PER_LOOP;
-
- /* zero-out next pointers */
- rx_pkts[pos]->next = NULL;
- rx_pkts[pos + 1]->next = NULL;
- rx_pkts[pos + 2]->next = NULL;
- rx_pkts[pos + 3]->next = NULL;
}
/* C.3 calc available number of desc */
diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 57203e2ed..7e14da0e0 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -112,8 +112,6 @@ eth_null_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
break;
bufs[i]->data_len = (uint16_t)packet_size;
bufs[i]->pkt_len = packet_size;
- bufs[i]->nb_segs = 1;
- bufs[i]->next = NULL;
bufs[i]->port = h->internals->port_id;
}
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index cab6e8fc0..b3e6d8027 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -772,8 +772,6 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->ol_flags = 0;
rxm->vlan_tci = 0;
- rxm->nb_segs = 1;
- rxm->next = NULL;
rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
rxm->data_len = (uint16_t)(len[i] - hdr_size);
@@ -900,7 +898,6 @@ virtio_recv_mergeable_pkts(void *rx_queue,
rxm->data_off = RTE_PKTMBUF_HEADROOM;
rxm->nb_segs = seg_num;
- rxm->next = NULL;
rxm->ol_flags = 0;
rxm->vlan_tci = 0;
rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
@@ -945,7 +942,6 @@ virtio_recv_mergeable_pkts(void *rx_queue,
rxm = rcv_pkts[extra_idx];
rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
- rxm->next = NULL;
rxm->pkt_len = (uint32_t)(len[extra_idx]);
rxm->data_len = (uint16_t)(len[extra_idx]);
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 5/8] mbuf: make rearm data address naturally aligned
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
` (3 preceding siblings ...)
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 4/8] drivers/net: don't touch mbuf next or nb segs on Rx Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments Olivier Matz
` (3 subsequent siblings)
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To avoid multiple stores on the fast path, Ethernet drivers
aggregate the writes to data_off, refcnt, nb_segs and port
into a single uint64_t value and write it in one shot
through a uint64_t* at the &mbuf->rearm_data address.
Some non-IA platforms incur store overhead if the store
address is not naturally aligned. This patch fixes the
performance issue on those targets.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/fm10k/fm10k_rxtx_vec.c | 3 ---
drivers/net/i40e/i40e_rxtx_vec_sse.c | 5 +----
drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 3 ---
drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 3 ---
lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 3 +--
lib/librte_mbuf/rte_mbuf.h | 6 +++---
6 files changed, 5 insertions(+), 18 deletions(-)
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 825e3c125..61a65e9bf 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -324,9 +324,6 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
/* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
p0 = (uintptr_t)&mb0->rearm_data;
*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index 2f861fde8..e17235abf 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -87,11 +87,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
mb0 = rxep[0].mbuf;
mb1 = rxep[1].mbuf;
- /* Flush mbuf with pkt template.
+ /* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
p0 = (uintptr_t)&mb0->rearm_data;
*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 2c0416179..bc8924fbb 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -85,9 +85,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
/*
* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
vst1_u8((uint8_t *)&mb0->rearm_data, p);
paddr = mb0->buf_physaddr + RTE_PKTMBUF_HEADROOM;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
index 65c5da3c7..62afe3100 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
@@ -90,9 +90,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
/*
* Flush mbuf with pkt template.
* Data to be rearmed is 6 bytes long.
- * Though, RX will overwrite ol_flags that are coming next
- * anyway. So overwrite whole 8 bytes with one load:
- * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
*/
p0 = (uintptr_t)&mb0->rearm_data;
*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 09713b0c2..f24f79fa2 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -116,11 +116,10 @@ struct rte_kni_fifo {
struct rte_kni_mbuf {
void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
uint64_t buf_physaddr;
- char pad0[2];
uint16_t data_off; /**< Start address of data in segment buffer. */
char pad1[2];
uint8_t nb_segs; /**< Number of segments. */
- char pad4[1];
+ char pad4[3];
uint64_t ol_flags; /**< Offload features. */
char pad2[4];
uint32_t pkt_len; /**< Total pkt len: sum of all segment data_len. */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 1efebec7c..4ef27f92a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -400,10 +400,8 @@ struct rte_mbuf {
void *buf_addr; /**< Virtual address of segment buffer. */
phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
- uint16_t buf_len; /**< Length of segment buffer. */
-
/* next 6 bytes are initialised on RX descriptor rearm */
- MARKER8 rearm_data;
+ MARKER64 rearm_data;
uint16_t data_off;
/**
@@ -421,6 +419,7 @@ struct rte_mbuf {
};
uint8_t nb_segs; /**< Number of segments. */
uint8_t port; /**< Input port. */
+ uint16_t pad; /**< 2B pad for naturally aligned ol_flags */
uint64_t ol_flags; /**< Offload features. */
@@ -481,6 +480,7 @@ struct rte_mbuf {
/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
uint16_t vlan_tci_outer;
+ uint16_t buf_len; /**< Length of segment buffer. */
/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_min_aligned;
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
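The layout change in patch 5/8 above can be checked with a small sketch. The two structs below are toy layouts that follow the field order before and after the patch (they are not the real struct rte_mbuf, and buf_addr is modeled as a uint64_t so the offsets do not depend on pointer size). All toy_* and TOY_* names are hypothetical. The 8-byte write is done with memcpy to stay within C aliasing rules; compilers typically lower it to a single store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy layout before the patch: buf_len pushes the rearm data to
 * offset 18, which is not a multiple of 8. */
struct toy_mbuf_old {
	uint64_t buf_addr;
	uint64_t buf_physaddr;
	uint16_t buf_len;
	uint16_t data_off;      /* rearm data started here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint64_t ol_flags;
};

/* Toy layout after the patch: buf_len moved away, 2B pad added. */
struct toy_mbuf_new {
	uint64_t buf_addr;
	uint64_t buf_physaddr;
	uint16_t data_off;      /* rearm data starts here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint16_t pad;
	uint64_t ol_flags;
};

#define TOY_REARM_OFF_OLD offsetof(struct toy_mbuf_old, data_off)
#define TOY_REARM_OFF_NEW offsetof(struct toy_mbuf_new, data_off)

/* Sanity check mirroring the point of the patch. */
static void
toy_check_layout(void)
{
	assert(TOY_REARM_OFF_NEW % sizeof(uint64_t) == 0);
}

/* Build the 8-byte template once, e.g. at queue setup time. */
static uint64_t
toy_make_initializer(uint16_t data_off, uint8_t port)
{
	struct toy_mbuf_new tmpl;
	uint64_t init;

	memset(&tmpl, 0, sizeof(tmpl));
	tmpl.data_off = data_off;
	tmpl.refcnt = 1;
	tmpl.nb_segs = 1;
	tmpl.port = port;
	memcpy(&init, (const char *)&tmpl + TOY_REARM_OFF_NEW, sizeof(init));
	return init;
}

/* Rearm one mbuf: a single 8-byte write covers all four fields. */
static void
toy_rearm(struct toy_mbuf_new *m, uint64_t init)
{
	memcpy((char *)m + TOY_REARM_OFF_NEW, &init, sizeof(init));
}
```

In the old layout the same 8-byte write would start at offset 18 and spill into the first two bytes of ol_flags, which is what the comments removed by the patch described.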
* [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
` (4 preceding siblings ...)
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 5/8] mbuf: make rearm data address naturally aligned Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-06 5:45 ` Yuanhan Liu
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 7/8] mbuf: move sequence number in second cache line Olivier Matz
` (2 subsequent siblings)
8 siblings, 1 reply; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Change the size of m->port and m->nb_segs to 16 bits. It is now possible
to reference a port identifier larger than 255 and to have an mbuf chain
longer than 255 segments.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
app/test-pmd/csumonly.c | 4 ++--
.../linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++--
lib/librte_mbuf/rte_mbuf.h | 12 +++++++-----
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 88cc84205..5eaff9b2f 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -583,7 +583,7 @@ pkt_copy_split(const struct rte_mbuf *pkt)
rc = mbuf_copy_split(pkt, md, seglen, nb_seg);
if (rc < 0)
RTE_LOG(ERR, USER1,
- "mbuf_copy_split for %p(len=%u, nb_seg=%hhu) "
+ "mbuf_copy_split for %p(len=%u, nb_seg=%u) "
"into %u segments failed with error code: %d\n",
pkt, pkt->pkt_len, pkt->nb_segs, nb_seg, rc);
@@ -801,7 +801,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
char buf[256];
printf("-----------------\n");
- printf("port=%u, mbuf=%p, pkt_len=%u, nb_segs=%hhu:\n",
+ printf("port=%u, mbuf=%p, pkt_len=%u, nb_segs=%u:\n",
fs->rx_port, m, m->pkt_len, m->nb_segs);
/* dump rx parsed packet info */
rte_get_rx_ol_flag_list(rx_ol_flags, buf, sizeof(buf));
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index f24f79fa2..2ac879fdd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -118,8 +118,8 @@ struct rte_kni_mbuf {
uint64_t buf_physaddr;
uint16_t data_off; /**< Start address of data in segment buffer. */
char pad1[2];
- uint8_t nb_segs; /**< Number of segments. */
- char pad4[3];
+ uint16_t nb_segs; /**< Number of segments. */
+ char pad4[2];
uint64_t ol_flags; /**< Offload features. */
char pad2[4];
uint32_t pkt_len; /**< Total pkt len: sum of all segment data_len. */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 4ef27f92a..323a1ac16 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -400,12 +400,13 @@ struct rte_mbuf {
void *buf_addr; /**< Virtual address of segment buffer. */
phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
- /* next 6 bytes are initialised on RX descriptor rearm */
+ /* next 8 bytes are initialised on RX descriptor rearm */
MARKER64 rearm_data;
uint16_t data_off;
/**
- * 16-bit Reference counter.
+ * Reference counter. Its size should at least equal to the size
+ * of port field (16 bits), to support zero-copy broadcast.
* It should only be accessed using the following functions:
* rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
* rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
@@ -417,9 +418,10 @@ struct rte_mbuf {
rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
uint16_t refcnt; /**< Non-atomically accessed refcnt */
};
- uint8_t nb_segs; /**< Number of segments. */
- uint8_t port; /**< Input port. */
- uint16_t pad; /**< 2B pad for naturally aligned ol_flags */
+ uint16_t nb_segs; /**< Number of segments. */
+
+ /** Input port (16 bits to support more than 256 virtual ports). */
+ uint16_t port;
uint64_t ol_flags; /**< Offload features. */
--
2.11.0
^ permalink raw reply [flat|nested] 155+ messages in thread
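The format-string hunks in csumonly.c in patch 6/8 above follow from the wider field: with %hhu, printf converts the argument to unsigned char before printing, so a 16-bit segment count above 255 would be displayed truncated. A small sketch of the difference, using hypothetical helper names (toy_format_nb_segs, toy_truncate_to_u8) that do not exist in DPDK:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-bit segment count with %u, as the patched code does. */
static int
toy_format_nb_segs(char *buf, size_t len, uint16_t nb_segs)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "nb_segs=%u", (unsigned int)nb_segs);
}

/* What an 8-bit field (or a %hhu conversion) would keep of the value. */
static uint8_t
toy_truncate_to_u8(uint16_t v)
{
	return (uint8_t)v;
}
```

For example, a chain of 300 segments formats as "nb_segs=300" with %u, while the 8-bit view of the same value is 44 (300 modulo 256).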
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments Olivier Matz
@ 2017-04-06 5:45 ` Yuanhan Liu
2017-04-18 13:03 ` Olivier MATZ
0 siblings, 1 reply; 155+ messages in thread
From: Yuanhan Liu @ 2017-04-06 5:45 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Hi Olivier,
On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> Change the size of m->port and m->nb_segs to 16 bits.
But all the ethdev APIs are still using 8 bits. 16 bits won't really
take effect without updating those APIs. Any plans?
--yliu
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-04-06 5:45 ` Yuanhan Liu
@ 2017-04-18 13:03 ` Olivier MATZ
2017-07-04 7:54 ` Wang, Zhihong
0 siblings, 1 reply; 155+ messages in thread
From: Olivier MATZ @ 2017-04-18 13:03 UTC (permalink / raw)
To: Yuanhan Liu
Cc: dev, konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Hi Yuanhan,
On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> Hi Olivier,
>
> On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> > Change the size of m->port and m->nb_segs to 16 bits.
>
> But all the ethdev APIs are still using 8 bits. 16 bits won't really
> take effect without updating those APIs. Any plans?
>
> --yliu
Yes, there is some work in ethdev, drivers and in example apps to
make the change effective. I think we could define a specific type for
a port number, maybe rte_eth_port_num_t. Using this type could be a
first step (for 17.08) before switching to 16 bits (17.11?).
I'll do the change and send an RFC.
Regards,
Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-04-18 13:03 ` Olivier MATZ
@ 2017-07-04 7:54 ` Wang, Zhihong
2017-07-10 8:00 ` Olivier Matz
0 siblings, 1 reply; 155+ messages in thread
From: Wang, Zhihong @ 2017-07-04 7:54 UTC (permalink / raw)
To: Olivier MATZ, Yuanhan Liu
Cc: dev, Ananyev, Konstantin, Richardson, Bruce, mb, Chilikin,
Andrey, jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Tuesday, April 18, 2017 9:03 PM
> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>;
> mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>;
> jblunck@infradead.org; nelio.laranjeiro@6wind.com;
> arybchenko@solarflare.com; thomas.monjalon@6wind.com;
> jerin.jacob@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb
> segments
>
> Hi Yuanhan,
>
> On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> > Hi Olivier,
> >
> > On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> > > Change the size of m->port and m->nb_segs to 16 bits.
> >
> > But all the ethdev APIs are still using 8 bits. 16 bits won't really
> > take effect without updating those APIs. Any plans?
> >
> > --yliu
>
> Yes, there is some work in ethdev, drivers and in example apps to
> make the change effective. I think we could define a specific type for
> a port number, maybe rte_eth_port_num_t. Using this type could be a
> first step (for 17.08) before switching to 16 bits (17.11?).
>
> I'll do the change and send a rfc.
Ping ;) Is this still in your plan?
Thanks
Zhihong
>
> Regards,
> Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-04 7:54 ` Wang, Zhihong
@ 2017-07-10 8:00 ` Olivier Matz
2017-07-10 8:15 ` Morten Brørup
0 siblings, 1 reply; 155+ messages in thread
From: Olivier Matz @ 2017-07-10 8:00 UTC (permalink / raw)
To: Wang, Zhihong
Cc: Yuanhan Liu, dev, Ananyev, Konstantin, Richardson, Bruce, mb,
Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko,
thomas.monjalon, jerin.jacob
Hi,
On Tue, 4 Jul 2017 07:54:23 +0000, "Wang, Zhihong" <zhihong.wang@intel.com> wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > Sent: Tuesday, April 18, 2017 9:03 PM
> > To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > Richardson, Bruce <bruce.richardson@intel.com>;
> > mb@smartsharesystems.com; Chilikin, Andrey <andrey.chilikin@intel.com>;
> > jblunck@infradead.org; nelio.laranjeiro@6wind.com;
> > arybchenko@solarflare.com; thomas.monjalon@6wind.com;
> > jerin.jacob@caviumnetworks.com
> > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb
> > segments
> >
> > Hi Yuanhan,
> >
> > On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
> > <yuanhan.liu@linux.intel.com> wrote:
> > > Hi Olivier,
> > >
> > > On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> > > > Change the size of m->port and m->nb_segs to 16 bits.
> > >
> > > But all the ethdev APIs are still using 8 bits. 16 bits won't really
> > > take effect without updating those APIs. Any plans?
> > >
> > > --yliu
> >
> > Yes, there is some work in ethdev, drivers and in example apps to
> > make the change effective. I think we could define a specific type for
> > a port number, maybe rte_eth_port_num_t. Using this type could be a
> > first step (for 17.08) before switching to 16 bits (17.11?).
> >
> > I'll do the change and send a rfc.
>
> Ping ;) Is this still in your plan?
>
Sorry, I don't think I will have time to work on this issue in the
coming weeks. If you plan to do it, I will be happy to help with reviews
and comments.
As I said in a previous message, I think a good first step would be
to introduce a typedef for the port number: rte_eth_port_num_t.
It can still be uint8_t for now, and can be switched to 16 bits in
one step when everyone uses this new type.
Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-10 8:00 ` Olivier Matz
@ 2017-07-10 8:15 ` Morten Brørup
2017-07-11 13:25 ` Wiles, Keith
2017-07-11 13:34 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments Wiles, Keith
0 siblings, 2 replies; 155+ messages in thread
From: Morten Brørup @ 2017-07-10 8:15 UTC (permalink / raw)
To: Olivier Matz, Wang, Zhihong
Cc: Yuanhan Liu, dev, Ananyev, Konstantin, Richardson, Bruce,
Chilikin, Andrey, jblunck, nelio.laranjeiro, arybchenko,
thomas.monjalon, jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Monday, July 10, 2017 10:00 AM
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> nb segments
>
> Hi,
>
> On Tue, 4 Jul 2017 07:54:23 +0000, "Wang, Zhihong"
> <zhihong.wang@intel.com> wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > > Sent: Tuesday, April 18, 2017 9:03 PM
> > > To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > >
> > > Hi Yuanhan,
> > >
> > > On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
> > > <yuanhan.liu@linux.intel.com> wrote:
> > > > Hi Olivier,
> > > >
> > > > On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> > > > > Change the size of m->port and m->nb_segs to 16 bits.
> > > >
> > > > But all the ethdev APIs are still using 8 bits. 16 bits won't
> > > > really take effect without updating those APIs. Any plans?
> > > >
> > > > --yliu
> > >
> > > Yes, there is some work in ethdev, drivers and in example apps to
> > > make the change effective. I think we could define a specific type
> > > for a port number, maybe rte_eth_port_num_t. Using this type could
> > > be a first step (for 17.08) before switching to 16 bits (17.11?).
> > >
> > > I'll do the change and send a rfc.
> >
> > Ping ;) Is this still in your plan?
> >
>
> Sorry, I don't think I will have time to work on this issue in the
> coming weeks. If you plan to do it, I will be happy to help with
> reviews and comments.
>
> As I said in a previous message, I think a good first step would be to
> introduce a typedef for the port number: rte_eth_port_num_t.
> It can still be uint8_t for now, and can be switched to 16 bits in one
> step when everyone uses this new type.
>
> Olivier
I think that DPDK follows the Linux tradition of exposing the variable types, as opposed to hiding them behind typedefs. This has the unfortunate consequence that when a variable type changes, it has to be changed everywhere.
Introducing a rte_eth_port_num_t will require changing the same files at the same locations everywhere, so not even as a temporary solution will it be beneficial.
Med venlig hilsen / kind regards
- Morten Brørup
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-10 8:15 ` Morten Brørup
@ 2017-07-11 13:25 ` Wiles, Keith
2017-07-11 13:30 ` Morten Brørup
2017-07-11 13:34 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments Wiles, Keith
1 sibling, 1 reply; 155+ messages in thread
From: Wiles, Keith @ 2017-07-11 13:25 UTC (permalink / raw)
To: Morten Brørup
Cc: Olivier Matz, Wang, Zhihong, Yuanhan Liu, DPDK, Ananyev,
Konstantin, Richardson, Bruce, Chilikin, Andrey, Jan Blunck,
nelio.laranjeiro, arybchenko, thomas.monjalon, jerin.jacob
On Jul 10, 2017, at 3:15 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Monday, July 10, 2017 10:00 AM
> > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> > nb segments
> >
> > Hi,
> >
> > On Tue, 4 Jul 2017 07:54:23 +0000, "Wang, Zhihong"
> > <zhihong.wang@intel.com> wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > > > Sent: Tuesday, April 18, 2017 9:03 PM
> > > > To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > > >
> > > > Hi Yuanhan,
> > > >
> > > > On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
> > > > <yuanhan.liu@linux.intel.com> wrote:
> > > > > Hi Olivier,
> > > > >
> > > > > On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> > > > > > Change the size of m->port and m->nb_segs to 16 bits.
> > > > >
> > > > > But all the ethdev APIs are still using 8 bits. 16 bits won't
> > > > > really take effect without updating those APIs. Any plans?
> > > > >
> > > > > --yliu
> > > >
> > > > Yes, there is some work in ethdev, drivers and in example apps to
> > > > make the change effective. I think we could define a specific type
> > > > for a port number, maybe rte_eth_port_num_t. Using this type could
> > > > be a first step (for 17.08) before switching to 16 bits (17.11?).
> > > >
> > > > I'll do the change and send a rfc.
> > >
> > > Ping ;) Is this still in your plan?
> > >
> >
> > Sorry, I don't think I will have time to work on this issue in the
> > coming weeks. If you plan to do it, I will be happy to help with
> > reviews and comments.
> >
> > As I said in a previous message, I think a good first step would be to
> > introduce a typedef for the port number: rte_eth_port_num_t.
> > It can still be uint8_t for now, and can be switched to 16 bits in one
> > step when everyone uses this new type.
> >
> > Olivier
>
> I think that DPDK follows the Linux tradition of exposing the variable types, as opposed to hiding them behind typedefs. This has the unfortunate consequence that when a variable type changes, it has to be changed everywhere.
>
> Introducing a rte_eth_port_num_t will require changing the same files at the same locations everywhere, so not even as a temporary solution will it be beneficial.
I would like to see a much shorter typedef name here, since we use it everywhere:
rte_port_id_t
port_id_t
port_num_t
portid_t
I do not see why it needs to be rte_eth or even rte_. If we do not put eth in the name, then it could be used in crypto or someplace else.
> Med venlig hilsen / kind regards
> - Morten Brørup
Regards,
Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-11 13:25 ` Wiles, Keith
@ 2017-07-11 13:30 ` Morten Brørup
2017-07-11 15:05 ` Thomas Monjalon
0 siblings, 1 reply; 155+ messages in thread
From: Morten Brørup @ 2017-07-11 13:30 UTC (permalink / raw)
To: Wiles, Keith
Cc: Olivier Matz, Wang, Zhihong, Yuanhan Liu, DPDK, Ananyev,
Konstantin, Richardson, Bruce, Chilikin, Andrey, Jan Blunck,
nelio.laranjeiro, arybchenko, thomas.monjalon, jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> Sent: Tuesday, July 11, 2017 3:26 PM
> To: Morten Brørup
> Cc: Olivier Matz; Wang, Zhihong; Yuanhan Liu; DPDK; Ananyev,
> Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan Blunck;
> nelio.laranjeiro@6wind.com; arybchenko@solarflare.com;
> thomas.monjalon@6wind.com; jerin.jacob@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> nb segments
>
>
> On Jul 10, 2017, at 3:15 AM, Morten Brørup
> <mb@smartsharesystems.com> wrote:
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Monday, July 10, 2017 10:00 AM
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> nb segments
>
> Hi,
>
> On Tue, 4 Jul 2017 07:54:23 +0000, "Wang, Zhihong"
> <zhihong.wang@intel.com> wrote:
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Tuesday, April 18, 2017 9:03 PM
> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>
> Hi Yuanhan,
>
> On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
> <yuanhan.liu@linux.intel.com> wrote:
> Hi Olivier,
>
> On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> Change the size of m->port and m->nb_segs to 16 bits.
>
> But all the ethdev APIs are still using 8 bits. 16 bits won't really
> take effect without updating those APIs. Any plans?
>
> --yliu
>
> Yes, there is some work in ethdev, drivers and in example apps to make
> the change effective. I think we could define a specific type for a
> port number, maybe rte_eth_port_num_t. Using this type could be a first
> step (for 17.08) before switching to 16 bits (17.11?).
>
> I'll do the change and send a rfc.
>
> Ping ;) Is this still in your plan?
>
>
> Sorry, I don't think I will have time to work on this issue in the
> coming weeks. If you plan to do it, I will be happy to help with
> reviews and comments.
>
> As I said in a previous message, I think a good first step would be to
> introduce a typedef for the port number: rte_eth_port_num_t.
> It can still be uint8_t for now, and can be switched to 16 bits in one
> step when everyone uses this new type.
>
> Olivier
>
> I think that DPDK follows the Linux tradition of exposing the variable
> types, as opposed to hiding them behind typedefs. This has the
> unfortunate consequence that when a variable type changes, it has to be
> changed everywhere.
>
> Introducing a rte_eth_port_num_t will require changing the same files
> at the same locations everywhere, so not even as a temporary solution
> will it be beneficial.
>
> I would like to see a much smaller typedef name here, we use it
> everywhere.
> rte_port_id_t
> port_id_t
> port_num_t
> portid_t
>
> I do not see why it needs to be rte_eth or even rte_, if we do not put
> eth in the name then is could be used in crypto or someplace else.
>
>
>
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
> Regards,
> Keith
What I was trying to communicate with my long argument about type definitions was: When the size changes from 8 bits to 16 bits, the type needs to change from uint8_t to uint16_t everywhere too, including in the ethdev APIs.
Don't start breaking coding conventions here by hiding the type of this variable.
Med venlig hilsen / kind regards
- Morten Brørup
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-11 13:30 ` Morten Brørup
@ 2017-07-11 15:05 ` Thomas Monjalon
2017-07-11 15:23 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments Morten Brørup
0 siblings, 1 reply; 155+ messages in thread
From: Thomas Monjalon @ 2017-07-11 15:05 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Wiles, Keith, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, nelio.laranjeiro, arybchenko, jerin.jacob
11/07/2017 15:30, Morten Brørup:
> Morten Brørup wrote:
> > Olivier Matz wrote:
> > > As I said in a previous message, I think a good first step would be to
> > > introduce a typedef for the port number: rte_eth_port_num_t.
> > > It can still be uint8_t for now, and can be switched to 16 bits in one
> > > step when everyone uses this new type.
> >
> > I think that DPDK follows the Linux tradition of exposing the variable
> > types, as opposed to hiding them behind typedefs. This has the
> > unfortunate consequence that when a variable type changes, it has to be
> > changed everywhere.
> >
> > Introducing a rte_eth_port_num_t will require changing the same files
> > at the same locations everywhere, so not even as a temporary solution
> > will it be beneficial.
[...]
> What I was trying to communicate with my long argument about type definitions was: When the type changed from 8 bit to 16 bit, the type needs to change from uint8_t to uint16_t everywhere too, including in the ethdev APIs.
>
> Don't start breaking coding conventions here by hiding the type of this variable.
So, Morten, you are against the typedef, right?
Because we need to change it everywhere anyway, right?
Note: I have no strong opinion.
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-11 15:05 ` Thomas Monjalon
@ 2017-07-11 15:23 ` Morten Brørup
2017-07-11 16:48 ` Wiles, Keith
0 siblings, 1 reply; 155+ messages in thread
From: Morten Brørup @ 2017-07-11 15:23 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Wiles, Keith, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, nelio.laranjeiro, arybchenko, jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, July 11, 2017 5:06 PM
> To: Morten Brørup
> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong; Yuanhan
> Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
> Blunck; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com;
> jerin.jacob@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> nbsegments
>
> 11/07/2017 15:30, Morten Brørup:
> > Morten Brørup wrote:
> > > Olivier Matz wrote:
> > > > As I said in a previous message, I think a good first step would
> > > > be to introduce a typedef for the port number:
> rte_eth_port_num_t.
> > > > It can still be uint8_t for now, and can be switched to 16 bits
> in
> > > > one step when everyone uses this new type.
> > >
> > > I think that DPDK follows the Linux tradition of exposing the
> > > variable types, as opposed to hiding them behind typedefs. This has
> > > the unfortunate consequence that when a variable type changes, it
> > > has to be changed everywhere.
> > >
> > > Introducing a rte_eth_port_num_t will require changing the same
> > > files at the same locations everywhere, so not even as a temporary
> > > solution will it be beneficial.
> [...]
> > What I was trying to communicate with my long argument about type
> definitions was: When the type changed from 8 bit to 16 bit, the type
> needs to change from uint8_t to uint16_t everywhere too, including in
> the ethdev APIs.
> >
> > Don't start breaking coding conventions here by hiding the type of
> this variable.
>
> So, Morten, you are against the typedef, right?
> Because we need to change it everywhere anyway, right?
>
> Note: I have no strong opinion.
I'm against the typedef because it would break convention, and I'm a strong proponent of conventions. In other projects, I'm all for typedefs, virtual classes, inheritance etc., but DPDK follows the Linux convention of not hiding simple types.
We need to change it from uint8_t everywhere, regardless of what we change it to. (But if we need to change it again sometime in the future, then a typedef will save us next time.)
However, if we change the convention and start hiding simple types, they still need the rte_ prefix regardless of whether they are popular or obscure types. Even struct rte_mbuf has the rte_ prefix, and I consider that a very popular type. If so, rte_port_t would be a good name for this type.
My preference: Follow convention and change it to uint16_t everywhere.
Med venlig hilsen / kind regards
- Morten Brørup
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-11 15:23 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments Morten Brørup
@ 2017-07-11 16:48 ` Wiles, Keith
2017-07-12 7:25 ` Morten Brørup
0 siblings, 1 reply; 155+ messages in thread
From: Wiles, Keith @ 2017-07-11 16:48 UTC (permalink / raw)
To: Morten Brørup
Cc: Thomas Monjalon, DPDK, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, Nélio Laranjeiro, arybchenko, jerin.jacob
> On Jul 11, 2017, at 10:23 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
>> Sent: Tuesday, July 11, 2017 5:06 PM
>> To: Morten Brørup
>> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong; Yuanhan
>> Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
>> Blunck; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com;
>> jerin.jacob@caviumnetworks.com
>> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
>> nbsegments
>>
>> 11/07/2017 15:30, Morten Brørup:
>>> Morten Brørup wrote:
>>>> Olivier Matz wrote:
>>>>> As I said in a previous message, I think a good first step would
>>>>> be to introduce a typedef for the port number:
>> rte_eth_port_num_t.
>>>>> It can still be uint8_t for now, and can be switched to 16 bits
>> in
>>>>> one step when everyone uses this new type.
>>>>
>>>> I think that DPDK follows the Linux tradition of exposing the
>>>> variable types, as opposed to hiding them behind typedefs. This has
>>>> the unfortunate consequence that when a variable type changes, it
>>>> has to be changed everywhere.
>>>>
>>>> Introducing a rte_eth_port_num_t will require changing the same
>>>> files at the same locations everywhere, so not even as a temporary
>>>> solution will it be beneficial.
>> [...]
>>> What I was trying to communicate with my long argument about type
>> definitions was: When the type changed from 8 bit to 16 bit, the type
>> needs to change from uint8_t to uint16_t everywhere too, including in
>> the ethdev APIs.
>>>
>>> Don't start breaking coding conventions here by hiding the type of
>> this variable.
>>
>> So, Morten, you are against the typedef, right?
>> Because we need to change it everywhere anyway, right?
>>
>> Note: I have no strong opinion.
>
> I'm against the typedef because it would break convention, and I'm a strong proponent of conventions. In other projects, I'm all for typedefs, virtual classes, inheritance etc., but DPDK follows the Linux convention of not hiding simple types.
>
> We need to change it from uint8_t everywhere, regardless what we change it to. (But if we need to change it again sometime in the future, then a typedef will save us next time.)
If the number of ports goes beyond 64K then I will be the first one (if still alive) to eat this email. :-) The only reason to have more than 2 bytes would be to encode something into the port id value, which I could see, but there is a very slim chance of that IMHO.
>
> However, if we change the convention and start hiding simple types, they still need the rte_ prefix regardless if they are popular or obscure types. Even struct rte_mbuf has the rte_ prefix, and I consider that a very popular type. If so, rte_port_t would be a good name for this type.
>
> My preference: Follow convention and change it to uint16_t everywhere.
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
As we must change the uint8_t to uint16_t, then I would like it to be more descriptive via a typedef. I really do not see us needing to change it again in the near future. The only reason to make it a typedef is to be able to identify from just the prototype of the function that it takes a port ID value, which I am in favor of doing here for that reason.
As for Olivier’s statement about the typedef name, I do not see the need for ‘_eth_’ to be part of the typedef, as it conveys no extra information in the name. Everything port related in DPDK is an Ethernet type device or port. If we do add something like Fibre Channel, rte_pid_t could reasonably cover that too, but a name containing ‘_eth_’ could not.
I would like to see names that are just short enough to convey the information and not be redundant. IMHO rte_pid_t is fine, but if we use something similar in length to uint8_t (7) or uint16_t (8) characters then we would not have to reformat the line more than needed. Using rte_pid_t (pid == port_id) we only extend the length by one (or two) characters and most likely the added byte(s) will not cause more format problems in the code.
Regards,
Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-11 16:48 ` Wiles, Keith
@ 2017-07-12 7:25 ` Morten Brørup
2017-07-12 9:02 ` Yang, Zhiyong
2017-07-12 15:34 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments Wiles, Keith
0 siblings, 2 replies; 155+ messages in thread
From: Morten Brørup @ 2017-07-12 7:25 UTC (permalink / raw)
To: Wiles, Keith
Cc: Thomas Monjalon, DPDK, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, Nélio Laranjeiro, arybchenko, jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> Sent: Tuesday, July 11, 2017 6:48 PM
> To: Morten Brørup
> Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan Liu;
> Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan Blunck;
> Nélio Laranjeiro; arybchenko@solarflare.com;
> jerin.jacob@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> nbsegments
>
>
> > On Jul 11, 2017, at 10:23 AM, Morten Brørup
> <mb@smartsharesystems.com> wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> >> Sent: Tuesday, July 11, 2017 5:06 PM
> >> To: Morten Brørup
> >> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong; Yuanhan
> >> Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
> >> Blunck; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com;
> >> jerin.jacob@caviumnetworks.com
> >> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> and
> >> nbsegments
> >>
> >> 11/07/2017 15:30, Morten Brørup:
> >>> Morten Brørup wrote:
> >>>> Olivier Matz wrote:
> >>>>> As I said in a previous message, I think a good first step would
> >>>>> be to introduce a typedef for the port number:
> >> rte_eth_port_num_t.
> >>>>> It can still be uint8_t for now, and can be switched to 16 bits
> >> in
> >>>>> one step when everyone uses this new type.
> >>>>
> >>>> I think that DPDK follows the Linux tradition of exposing the
> >>>> variable types, as opposed to hiding them behind typedefs. This
> has
> >>>> the unfortunate consequence that when a variable type changes, it
> >>>> has to be changed everywhere.
> >>>>
> >>>> Introducing a rte_eth_port_num_t will require changing the same
> >>>> files at the same locations everywhere, so not even as a temporary
> >>>> solution will it be beneficial.
> >> [...]
> >>> What I was trying to communicate with my long argument about type
> >> definitions was: When the type changed from 8 bit to 16 bit, the
> type
> >> needs to change from uint8_t to uint16_t everywhere too, including
> in
> >> the ethdev APIs.
> >>>
> >>> Don't start breaking coding conventions here by hiding the type of
> >> this variable.
> >>
> >> So, Morten, you are against the typedef, right?
> >> Because we need to change it everywhere anyway, right?
> >>
> >> Note: I have no strong opinion.
> >
> > I'm against the typedef because it would break convention, and I'm a
> strong proponent of conventions. In other projects, I'm all for
> typedefs, virtual classes, inheritance etc., but DPDK follows the Linux
> convention of not hiding simple types.
> >
> > We need to change it from uint8_t everywhere, regardless what we
> > change it to. (But if we need to change it again sometime in the
> > future, then a typedef will save us next time.)
>
> If the number of ports go beyond 64K then I will be the first one (if
> still alive) to eat this email. :-) The only reason to have more then 2
> bytes would be to encode something into the port id value, which I
> could see, but a very slim chance IMHO.
>
> >
> > However, if we change the convention and start hiding simple types,
> they still need the rte_ prefix regardless if they are popular or
> obscure types. Even struct rte_mbuf has the rte_ prefix, and I consider
> that a very popular type. If so, rte_port_t would be a good name for
> this type.
> >
> > My preference: Follow convention and change it to uint16_t
> everywhere.
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >
>
> As we must change the uint8_t to uint16_t, then I would like it to be
> more descriptive via a typedef. I really do not see us needing to
> change it again in the near future. The only reason to make it a
> typedef is to be able to identify from just the prototype of the
> function that it takes a port ID value, which I am in favor of doing
> here for that reason.
That is not a very good reason: When used as a function parameter, the type is only shown in the function declaration, whereas the variable name is shown every time it is used inside the function. So remember to always use meaningful variable names, such as "port" (like in the mbuf structure) or "port_id" (used in other places).
>
> As for Olivier’s statement about the typedef name I do not see the need
> for ‘_eth_' to be part of the typedef as it conveys no extra
> information in the name. Everything port related in DPDK is a ethernet
> type device or port. If we do add something like fiber channel maybe
> rte_pid_t is reason to that too, but if it contains ‘_eth_’ it would
> not.
>
> I would like to see names that are just short enough to convey the
> information and not be redundant. IMHO rte_pid_t is fine, but if we use
> some something similar to the length of uint8_t (7) or uint16_t (8)
> characters then we would not have to also reformat the line more then
> needed. Using rte_pid_t (pid == port_id) we only extend the length by
> one (or two) characters and most likely the added byte(s) will not
> cause more format problems in the code.
I still don't support typedefs for scalar types. I consider it against the coding style, although after reviewing the official DPDK Coding Style documentation (http://dpdk.org/doc/guides/contributing/coding_style.html), I can see that it is not explicitly stated. Please also note that section 1.5.7 of the DPDK Coding Style documentation says that the _t postfix should be avoided. Anyway, if we end up with a typedef, please don't use something resembling pid_t known from POSIX (https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html).
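A small sketch of the pid_t concern, assuming a POSIX system where <sys/types.h> defines pid_t. The typedef rte_pid_t is the name suggested earlier in this thread and is not an existing DPDK type; demo_use_port and demo_mixup are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>	/* POSIX: defines pid_t, the process id type */

/* Hypothetical port id typedef, using the name suggested in this thread. */
typedef uint16_t rte_pid_t;

static_assert(sizeof(rte_pid_t) == 2, "16-bit port id in this sketch");

/* Hypothetical function taking a port id. */
static rte_pid_t demo_use_port(rte_pid_t port_id)
{
	return port_id;
}

/* Passing a process id where a port id is expected is accepted by the
 * compiler by default: both are plain integer types, so the value is
 * converted implicitly and reduced modulo 65536. */
static rte_pid_t demo_mixup(pid_t process_id)
{
	return demo_use_port(process_id);
}
```

The two names differ only by a prefix, and a scalar typedef gives no type checking between them, so the similarity can mislead a reader without the compiler ever complaining.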
>
> Regards,
> Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-12 7:25 ` Morten Brørup
@ 2017-07-12 9:02 ` Yang, Zhiyong
2017-07-12 9:50 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port andnbsegments Morten Brørup
2017-07-12 15:34 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments Wiles, Keith
1 sibling, 1 reply; 155+ messages in thread
From: Yang, Zhiyong @ 2017-07-12 9:02 UTC (permalink / raw)
To: Morten Brørup, Wiles, Keith
Cc: Thomas Monjalon, DPDK, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, Nélio Laranjeiro, arybchenko, jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Morten Brørup
> Sent: Wednesday, July 12, 2017 3:25 PM
> To: Wiles, Keith <keith.wiles@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; DPDK <dev@dpdk.org>; Olivier
> Matz <olivier.matz@6wind.com>; Wang, Zhihong <zhihong.wang@intel.com>;
> Yuanhan Liu <yuanhan.liu@linux.intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Chilikin, Andrey <andrey.chilikin@intel.com>;
> Jan Blunck <jblunck@infradead.org>; Nélio Laranjeiro
> <nelio.laranjeiro@6wind.com>; arybchenko@solarflare.com;
> jerin.jacob@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> nbsegments
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> > Sent: Tuesday, July 11, 2017 6:48 PM
> > To: Morten Brørup
> > Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan Liu;
> > Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan Blunck;
> > Nélio Laranjeiro; arybchenko@solarflare.com;
> > jerin.jacob@caviumnetworks.com
> > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> > nbsegments
> >
> >
> > > On Jul 11, 2017, at 10:23 AM, Morten Brørup
> > <mb@smartsharesystems.com> wrote:
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> > >> Monjalon
> > >> Sent: Tuesday, July 11, 2017 5:06 PM
> > >> To: Morten Brørup
> > >> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong;
> > >> Yuanhan Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin,
> > >> Andrey; Jan Blunck; nelio.laranjeiro@6wind.com;
> > >> arybchenko@solarflare.com; jerin.jacob@caviumnetworks.com
> > >> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> > and
> > >> nbsegments
> > >>
> > >> 11/07/2017 15:30, Morten Brørup:
> > >>> Morten Brørup wrote:
> > >>>> Olivier Matz wrote:
> > >>>>> As I said in a previous message, I think a good first step would
> > >>>>> be to introduce a typedef for the port number:
> > >> rte_eth_port_num_t.
> > >>>>> It can still be uint8_t for now, and can be switched to 16 bits
> > >> in
> > >>>>> one step when everyone uses this new type.
> > >>>>
> > >>>> I think that DPDK follows the Linux tradition of exposing the
> > >>>> variable types, as opposed to hiding them behind typedefs. This
> > has
> > >>>> the unfortunate consequence that when a variable type changes, it
> > >>>> has to be changed everywhere.
> > >>>>
> > >>>> Introducing a rte_eth_port_num_t will require changing the same
> > >>>> files at the same locations everywhere, so not even as a
> > >>>> temporary solution will it be beneficial.
> > >> [...]
> > >>> What I was trying to communicate with my long argument about type
> > >> definitions was: When the type changed from 8 bit to 16 bit, the
> > type
> > >> needs to change from uint8_t to uint16_t everywhere too, including
> > in
> > >> the ethdev APIs.
> > >>>
> > >>> Don't start breaking coding conventions here by hiding the type of
> > >> this variable.
> > >>
> > >> So, Morten, you are against the typedef, right?
> > >> Because we need to change it everywhere anyway, right?
> > >>
> > >> Note: I have no strong opinion.
> > >
> > > I'm against the typedef because it would break convention, and I'm a
> > strong proponent of conventions. In other projects, I'm all for
> > typedefs, virtual classes, inheritance etc., but DPDK follows the
> > Linux convention of not hiding simple types.
> > >
> > > We need to change it from uint8_t everywhere, regardless what we
> > > change it to. (But if we need to change it again sometime in the
> > > future, then a typedef will save us next time.)
> >
> > If the number of ports go beyond 64K then I will be the first one (if
> > still alive) to eat this email. :-) The only reason to have more then
> > 2 bytes would be to encode something into the port id value, which I
> > could see, but a very slim chance IMHO.
> >
> > >
> > > However, if we change the convention and start hiding simple types,
> > they still need the rte_ prefix regardless if they are popular or
> > obscure types. Even struct rte_mbuf has the rte_ prefix, and I
> > consider that a very popular type. If so, rte_port_t would be a good
> > name for this type.
> > >
> > > My preference: Follow convention and change it to uint16_t
> > everywhere.
> > >
> > > Med venlig hilsen / kind regards
> > > - Morten Brørup
> > >
> >
> > As we must change the uint8_t to uint16_t, then I would like it to be
> > more descriptive via a typedef. I really do not see us needing to
> > change it again in the near future. The only reason to make it a
> > typedef is to be able to identify from just the prototype of the
> > function that it takes a port ID value, which I am in favor of doing
> > here for that reason.
>
> That is not a very good reason: When used as a function parameter, the type is
> only shown in the function declaration, whereas the variable name is shown
> every time it is used inside the function. So remember to always use meaningful
> variable names, such as "port" (like in the mbuf structure) or "port_id" (used in
> other places).
>
> >
> > As for Olivier’s statement about the typedef name I do not see the
> > need for ‘_eth_' to be part of the typedef as it conveys no extra
> > information in the name. Everything port related in DPDK is a ethernet
> > type device or port. If we do add something like fiber channel maybe
> > rte_pid_t is reason to that too, but if it contains ‘_eth_’ it would
> > not.
> >
> > I would like to see names that are just short enough to convey the
> > information and not be redundant. IMHO rte_pid_t is fine, but if we
> > use some something similar to the length of uint8_t (7) or uint16_t
> > (8) characters then we would not have to also reformat the line more
> > then needed. Using rte_pid_t (pid == port_id) we only extend the
> > length by one (or two) characters and most likely the added byte(s)
> > will not cause more format problems in the code.
>
> I still don't support typedefs for scalar types. I consider it against the coding
> style, although after reviewing the official DPDK Coding Style documentation
> (http://dpdk.org/doc/guides/contributing/coding_style.html), I can see that it is
> not explicitly stated. Please also note that section 1.5.7 of the DPDK Coding
> Style documentation says that the _t postfix should be avoided. Anyway, if we
> end up with a typedef, please don't use something resembling pid_t known from
> POSIX (https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html).
>
How about rte_dev_id_t?
Thanks
Zhiyong
>
> >
> > Regards,
> > Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-12 9:02 ` Yang, Zhiyong
@ 2017-07-12 9:50 ` Morten Brørup
2017-07-12 15:35 ` Stephen Hemminger
0 siblings, 1 reply; 155+ messages in thread
From: Morten Brørup @ 2017-07-12 9:50 UTC (permalink / raw)
To: Yang, Zhiyong, Wiles, Keith
Cc: Thomas Monjalon, DPDK, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, Nélio Laranjeiro, arybchenko, jerin.jacob
> -----Original Message-----
> From: Yang, Zhiyong [mailto:zhiyong.yang@intel.com]
> Sent: Wednesday, July 12, 2017 11:02 AM
> To: Morten Brørup; Wiles, Keith
> Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan Liu;
> Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan Blunck;
> Nélio Laranjeiro; arybchenko@solarflare.com;
> jerin.jacob@caviumnetworks.com
> Subject: RE: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> and nbsegments
>
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Morten Brørup
> > Sent: Wednesday, July 12, 2017 3:25 PM
> > To: Wiles, Keith <keith.wiles@intel.com>
> > Cc: Thomas Monjalon <thomas@monjalon.net>; DPDK <dev@dpdk.org>;
> > Olivier Matz <olivier.matz@6wind.com>; Wang, Zhihong
> > <zhihong.wang@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>;
> > Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; Chilikin, Andrey
> > <andrey.chilikin@intel.com>; Jan Blunck <jblunck@infradead.org>;
> Nélio
> > Laranjeiro <nelio.laranjeiro@6wind.com>; arybchenko@solarflare.com;
> > jerin.jacob@caviumnetworks.com
> > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> > nbsegments
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> > > Sent: Tuesday, July 11, 2017 6:48 PM
> > > To: Morten Brørup
> > > Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan
> Liu;
> > > Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
> > > Blunck; Nélio Laranjeiro; arybchenko@solarflare.com;
> > > jerin.jacob@caviumnetworks.com
> > > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> > > and nbsegments
> > >
> > >
> > > > On Jul 11, 2017, at 10:23 AM, Morten Brørup
> > > <mb@smartsharesystems.com> wrote:
> > > >
> > > >> -----Original Message-----
> > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> > > >> Monjalon
> > > >> Sent: Tuesday, July 11, 2017 5:06 PM
> > > >> To: Morten Brørup
> > > >> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong;
> > > >> Yuanhan Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin,
> > > >> Andrey; Jan Blunck; nelio.laranjeiro@6wind.com;
> > > >> arybchenko@solarflare.com; jerin.jacob@caviumnetworks.com
> > > >> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for
> port
> > > and
> > > >> nbsegments
> > > >>
> > > >> 11/07/2017 15:30, Morten Brørup:
> > > >>> Morten Brørup wrote:
> > > >>>> Olivier Matz wrote:
> > > >>>>> As I said in a previous message, I think a good first step
> > > >>>>> would be to introduce a typedef for the port number:
> > > >> rte_eth_port_num_t.
> > > >>>>> It can still be uint8_t for now, and can be switched to 16
> > > >>>>> bits
> > > >> in
> > > >>>>> one step when everyone uses this new type.
> > > >>>>
> > > >>>> I think that DPDK follows the Linux tradition of exposing the
> > > >>>> variable types, as opposed to hiding them behind typedefs.
> This
> > > has
> > > >>>> the unfortunate consequence that when a variable type changes,
> > > >>>> it has to be changed everywhere.
> > > >>>>
> > > >>>> Introducing a rte_eth_port_num_t will require changing the
> same
> > > >>>> files at the same locations everywhere, so not even as a
> > > >>>> temporary solution will it be beneficial.
> > > >> [...]
> > > >>> What I was trying to communicate with my long argument about
> > > >>> type
> > > >> definitions was: When the type changed from 8 bit to 16 bit, the
> > > type
> > > >> needs to change from uint8_t to uint16_t everywhere too,
> > > >> including
> > > in
> > > >> the ethdev APIs.
> > > >>>
> > > >>> Don't start breaking coding conventions here by hiding the type
> > > >>> of
> > > >> this variable.
> > > >>
> > > >> So, Morten, you are against the typedef, right?
> > > >> Because we need to change it everywhere anyway, right?
> > > >>
> > > >> Note: I have no strong opinion.
> > > >
> > > > I'm against the typedef because it would break convention, and
> I'm
> > > > a
> > > strong proponent of conventions. In other projects, I'm all for
> > > typedefs, virtual classes, inheritance etc., but DPDK follows the
> > > Linux convention of not hiding simple types.
> > > >
> > > > We need to change it from uint8_t everywhere, regardless what we
> > > > change it to. (But if we need to change it again sometime in the
> > > > future, then a typedef will save us next time.)
> > >
> > > If the number of ports go beyond 64K then I will be the first one
> > > (if still alive) to eat this email. :-) The only reason to have
> more
> > > then
> > > 2 bytes would be to encode something into the port id value, which
> I
> > > could see, but a very slim chance IMHO.
> > >
> > > >
> > > > However, if we change the convention and start hiding simple
> > > > types,
> > > they still need the rte_ prefix regardless if they are popular or
> > > obscure types. Even struct rte_mbuf has the rte_ prefix, and I
> > > consider that a very popular type. If so, rte_port_t would be a
> good
> > > name for this type.
> > > >
> > > > My preference: Follow convention and change it to uint16_t
> > > everywhere.
> > > >
> > > > Med venlig hilsen / kind regards
> > > > - Morten Brørup
> > > >
> > >
> > > As we must change the uint8_t to uint16_t, then I would like it to
> > > be more descriptive via a typedef. I really do not see us needing
> to
> > > change it again in the near future. The only reason to make it a
> > > typedef is to be able to identify from just the prototype of the
> > > function that it takes a port ID value, which I am in favor of
> doing
> > > here for that reason.
> >
> > That is not a very good reason: When used as a function parameter,
> the
> > type is only shown in the function declaration, whereas the variable
> > name is shown every time it is used inside the function. So remember
> > to always use meaningful variable names, such as "port" (like in the
> > mbuf structure) or "port_id" (used in other places).
> >
> > >
> > > As for Olivier’s statement about the typedef name I do not see the
> > > need for ‘_eth_' to be part of the typedef as it conveys no extra
> > > information in the name. Everything port related in DPDK is a
> > > ethernet type device or port. If we do add something like fiber
> > > channel maybe rte_pid_t is reason to that too, but if it contains
> > > ‘_eth_’ it would not.
> > >
> > > I would like to see names that are just short enough to convey the
> > > information and not be redundant. IMHO rte_pid_t is fine, but if we
> > > use some something similar to the length of uint8_t (7) or uint16_t
> > > (8) characters then we would not have to also reformat the line
> more
> > > then needed. Using rte_pid_t (pid == port_id) we only extend the
> > > length by one (or two) characters and most likely the added byte(s)
> > > will not cause more format problems in the code.
> >
> > I still don't support typedefs for scalar types. I consider it
> against
> > the coding style, although after reviewing the official DPDK Coding
> > Style documentation
> > (http://dpdk.org/doc/guides/contributing/coding_style.html), I can
> see
> > that it is not explicitly stated. Please also note that section 1.5.7
> > of the DPDK Coding Style documentation says that the _t postfix
> should
> > be avoided. Anyway, if we end up with a typedef, please don't use
> > something resembling pid_t known from POSIX
> > (https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html).
> >
>
> How about rte_dev_id_t?
>
> Thanks
> Zhiyong
>
> >
> > >
> > > Regards,
> > > Keith
If the DPDK Coding Style is based on the Linux coding style, we should avoid typedefs in general, not just for structures. Please read Linus Torvalds' opinions on the subject: http://yarchive.net/comp/linux/typedefs.html
Perhaps the DPDK Coding Style should be updated to clarify this. (Or, if we decide otherwise, to explicitly mention this deviation from the Linux coding style.)
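To illustrate the convention argued for here, a minimal sketch in C. The helper example_port_is_valid() and the limit EXAMPLE_MAX_PORTS are hypothetical names made up for this example; neither is a DPDK API. The scalar type stays visible in the prototype, and the parameter name carries the meaning:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit, for illustration only; not a DPDK constant. */
#define EXAMPLE_MAX_PORTS 1024u

/*
 * No typedef: the width of the port number (uint16_t) is visible in the
 * prototype, and the name "port_id" says what the value means wherever
 * it is used inside the function.
 */
static bool example_port_is_valid(uint16_t port_id)
{
	return port_id < EXAMPLE_MAX_PORTS;
}
```

The cost of this style is the one already mentioned in the thread: if the width changes again, every such prototype has to be touched.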
Med venlig hilsen / kind regards
- Morten Brørup
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-12 9:50 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments Morten Brørup
@ 2017-07-12 15:35 ` Stephen Hemminger
2017-07-12 15:57 ` Morten Brørup
0 siblings, 1 reply; 155+ messages in thread
From: Stephen Hemminger @ 2017-07-12 15:35 UTC (permalink / raw)
To: Morten Brørup
Cc: Yang, Zhiyong, Wiles, Keith, Thomas Monjalon, DPDK, Olivier Matz,
Wang, Zhihong, Yuanhan Liu, Ananyev, Konstantin, Richardson,
Bruce, Chilikin, Andrey, Jan Blunck, Nélio Laranjeiro,
arybchenko, jerin.jacob
On Wed, 12 Jul 2017 11:50:38 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> > -----Original Message-----
> > From: Yang, Zhiyong [mailto:zhiyong.yang@intel.com]
> > Sent: Wednesday, July 12, 2017 11:02 AM
> > To: Morten Brørup; Wiles, Keith
> > Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan Liu;
> > Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan Blunck;
> > Nélio Laranjeiro; arybchenko@solarflare.com;
> > jerin.jacob@caviumnetworks.com
> > Subject: RE: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> > and nbsegments
> >
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Morten Brørup
> > > Sent: Wednesday, July 12, 2017 3:25 PM
> > > To: Wiles, Keith <keith.wiles@intel.com>
> > > Cc: Thomas Monjalon <thomas@monjalon.net>; DPDK <dev@dpdk.org>;
> > > Olivier Matz <olivier.matz@6wind.com>; Wang, Zhihong
> > > <zhihong.wang@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>;
> > > Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce
> > > <bruce.richardson@intel.com>; Chilikin, Andrey
> > > <andrey.chilikin@intel.com>; Jan Blunck <jblunck@infradead.org>;
> > Nélio
> > > Laranjeiro <nelio.laranjeiro@6wind.com>; arybchenko@solarflare.com;
> > > jerin.jacob@caviumnetworks.com
> > > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> > > nbsegments
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> > > > Sent: Tuesday, July 11, 2017 6:48 PM
> > > > To: Morten Brørup
> > > > Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan
> > Liu;
> > > > Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
> > > > Blunck; Nélio Laranjeiro; arybchenko@solarflare.com;
> > > > jerin.jacob@caviumnetworks.com
> > > > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> > > > and nbsegments
> > > >
> > > >
> > > > > On Jul 11, 2017, at 10:23 AM, Morten Brørup
> > > > <mb@smartsharesystems.com> wrote:
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> > > > >> Monjalon
> > > > >> Sent: Tuesday, July 11, 2017 5:06 PM
> > > > >> To: Morten Brørup
> > > > >> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong;
> > > > >> Yuanhan Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin,
> > > > >> Andrey; Jan Blunck; nelio.laranjeiro@6wind.com;
> > > > >> arybchenko@solarflare.com; jerin.jacob@caviumnetworks.com
> > > > >> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for
> > port
> > > > and
> > > > >> nbsegments
> > > > >>
> > > > >> 11/07/2017 15:30, Morten Brørup:
> > > > >>> Morten Brørup wrote:
> > > > >>>> Olivier Matz wrote:
> > > > >>>>> As I said in a previous message, I think a good first step
> > > > >>>>> would be to introduce a typedef for the port number:
> > > > >> rte_eth_port_num_t.
> > > > >>>>> It can still be uint8_t for now, and can be switched to 16
> > > > >>>>> bits
> > > > >> in
> > > > >>>>> one step when everyone uses this new type.
> > > > >>>>
> > > > >>>> I think that DPDK follows the Linux tradition of exposing the
> > > > >>>> variable types, as opposed to hiding them behind typedefs.
> > This
> > > > has
> > > > >>>> the unfortunate consequence that when a variable type changes,
> > > > >>>> it has to be changed everywhere.
> > > > >>>>
> > > > >>>> Introducing a rte_eth_port_num_t will require changing the
> > same
> > > > >>>> files at the same locations everywhere, so not even as a
> > > > >>>> temporary solution will it be beneficial.
> > > > >> [...]
> > > > >>> What I was trying to communicate with my long argument about
> > > > >>> type
> > > > >> definitions was: When the type changed from 8 bit to 16 bit, the
> > > > type
> > > > >> needs to change from uint8_t to uint16_t everywhere too,
> > > > >> including
> > > > in
> > > > >> the ethdev APIs.
> > > > >>>
> > > > >>> Don't start breaking coding conventions here by hiding the type
> > > > >>> of
> > > > >> this variable.
> > > > >>
> > > > >> So, Morten, you are against the typedef, right?
> > > > >> Because we need to change it everywhere anyway, right?
> > > > >>
> > > > >> Note: I have no strong opinion.
> > > > >
> > > > > I'm against the typedef because it would break convention, and
> > I'm
> > > > > a
> > > > strong proponent of conventions. In other projects, I'm all for
> > > > typedefs, virtual classes, inheritance etc., but DPDK follows the
> > > > Linux convention of not hiding simple types.
> > > > >
> > > > > We need to change it from uint8_t everywhere, regardless what we
> > > > > change it to. (But if we need to change it again sometime in the
> > > > > future, then a typedef will save us next time.)
> > > >
> > > > If the number of ports go beyond 64K then I will be the first one
> > > > (if still alive) to eat this email. :-) The only reason to have
> > more
> > > > then
> > > > 2 bytes would be to encode something into the port id value, which
> > I
> > > > could see, but a very slim chance IMHO.
> > > >
> > > > >
> > > > > However, if we change the convention and start hiding simple
> > > > > types,
> > > > they still need the rte_ prefix regardless if they are popular or
> > > > obscure types. Even struct rte_mbuf has the rte_ prefix, and I
> > > > consider that a very popular type. If so, rte_port_t would be a
> > good
> > > > name for this type.
> > > > >
> > > > > My preference: Follow convention and change it to uint16_t
> > > > everywhere.
> > > > >
> > > > > Med venlig hilsen / kind regards
> > > > > - Morten Brørup
> > > > >
> > > >
> > > > As we must change the uint8_t to uint16_t, then I would like it to
> > > > be more descriptive via a typedef. I really do not see us needing
> > to
> > > > change it again in the near future. The only reason to make it a
> > > > typedef is to be able to identify from just the prototype of the
> > > > function that it takes a port ID value, which I am in favor of
> > doing
> > > > here for that reason.
> > >
> > > That is not a very good reason: When used as a function parameter,
> > the
> > > type is only shown in the function declaration, whereas the variable
> > > name is shown every time it is used inside the function. So remember
> > > to always use meaningful variable names, such as "port" (like in the
> > > mbuf structure) or "port_id" (used in other places).
> > >
> > > >
> > > > As for Olivier’s statement about the typedef name I do not see the
> > > > need for ‘_eth_' to be part of the typedef as it conveys no extra
> > > > information in the name. Everything port related in DPDK is a
> > > > ethernet type device or port. If we do add something like fiber
> > > > channel maybe rte_pid_t is reason to that too, but if it contains
> > > > ‘_eth_’ it would not.
> > > >
> > > > I would like to see names that are just short enough to convey the
> > > > information and not be redundant. IMHO rte_pid_t is fine, but if we
> > > > use some something similar to the length of uint8_t (7) or uint16_t
> > > > (8) characters then we would not have to also reformat the line
> > more
> > > > then needed. Using rte_pid_t (pid == port_id) we only extend the
> > > > length by one (or two) characters and most likely the added byte(s)
> > > > will not cause more format problems in the code.
> > >
> > > I still don't support typedefs for scalar types. I consider it
> > against
> > > the coding style, although after reviewing the official DPDK Coding
> > > Style documentation
> > > (http://dpdk.org/doc/guides/contributing/coding_style.html), I can
> > see
> > > that it is not explicitly stated. Please also note that section 1.5.7
> > > of the DPDK Coding Style documentation says that the _t postfix
> > should
> > > be avoided. Anyway, if we end up with a typedef, please don't use
> > > something resembling pid_t known from POSIX
> > > (https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html).
> > >
> >
> > How about rte_dev_id_t?
> >
> > Thanks
> > Zhiyong
> >
> > >
> > > >
> > > > Regards,
> > > > Keith
>
> If the DPDK Coding Style is based on Linux Coding Style, we should avoid typedefs in general, not just for structures. Please read Linus Torvalds' opinions about it: http://yarchive.net/comp/linux/typedefs.html
>
> Perhaps the DPDK Coding Style should be updated to clarify this. (Or if we decide otherwise, to explicitly mention this deviation from the Linux coding style.)
It is logical to use typedefs for this kind of scalar type that may need to change.
Names matter; please avoid pid (easily confused with the POSIX pid_t) and dev (easily confused with a device id).
I would prefer: rte_portid_t and rte_queueid_t
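A minimal sketch of what that could look like in C. The two type names are the ones proposed above; the 16-bit width is an assumption taken from the direction of this patch series, and the example_* helpers are hypothetical, not DPDK functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed definitions: only the names come from the proposal; the
 * underlying uint16_t is an assumption for this sketch.
 */
typedef uint16_t rte_portid_t;
typedef uint16_t rte_queueid_t;

/*
 * Largest representable port id. The cast follows the typedef, so this
 * keeps working unchanged if the underlying type is widened later.
 */
static uint64_t example_portid_max(void)
{
	return (rte_portid_t)-1;
}

/* The prototype alone shows which argument is a port and which a queue. */
static int example_same_queue(rte_portid_t port_a, rte_queueid_t queue_a,
			      rte_portid_t port_b, rte_queueid_t queue_b)
{
	return port_a == port_b && queue_a == queue_b;
}
```

With this layout, a later change of width would touch the two typedef lines rather than every prototype, which is the property being argued for.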
Longer term, rte_eth_devices[] probably needs to go: change the port id into something
more like ifindex, and use a sparse data structure to allow a very large number of devices
and non-contiguous lookup. Think of a VPN server where each VPN connection looks
like a DPDK device.
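As a rough sketch of the sparse lookup idea, assuming a small open-addressing hash table keyed by an ifindex-like 32-bit id. Every name below (example_dev_table and friends) is hypothetical, the table size is arbitrary, and removal is left out to keep the sketch short; this is not how DPDK stores devices:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_SLOT_BITS 6u
#define EXAMPLE_SLOTS (1u << EXAMPLE_SLOT_BITS)	/* 64 slots, power of two */
#define EXAMPLE_EMPTY UINT32_MAX		/* reserved id: slot is free */

struct example_dev_table {
	uint32_t ids[EXAMPLE_SLOTS];	/* ifindex-like ids, may be sparse */
	void *devs[EXAMPLE_SLOTS];	/* opaque per-device pointers */
};

static void example_table_init(struct example_dev_table *t)
{
	for (uint32_t i = 0; i < EXAMPLE_SLOTS; i++) {
		t->ids[i] = EXAMPLE_EMPTY;
		t->devs[i] = NULL;
	}
}

/* Multiplicative hash; the top bits select the starting slot. */
static uint32_t example_slot(uint32_t id)
{
	return (uint32_t)(id * 2654435761u) >> (32u - EXAMPLE_SLOT_BITS);
}

/* Returns 0 on success, -1 if the id is reserved or the table is full. */
static int example_table_add(struct example_dev_table *t, uint32_t id,
			     void *dev)
{
	if (id == EXAMPLE_EMPTY)
		return -1;
	uint32_t start = example_slot(id);
	for (uint32_t n = 0; n < EXAMPLE_SLOTS; n++) {
		uint32_t i = (start + n) & (EXAMPLE_SLOTS - 1u);
		if (t->ids[i] == EXAMPLE_EMPTY || t->ids[i] == id) {
			t->ids[i] = id;
			t->devs[i] = dev;
			return 0;
		}
	}
	return -1;
}

/* Linear probing: stop at the id, or at the first free slot. */
static void *example_table_find(const struct example_dev_table *t,
				uint32_t id)
{
	if (id == EXAMPLE_EMPTY)
		return NULL;
	uint32_t start = example_slot(id);
	for (uint32_t n = 0; n < EXAMPLE_SLOTS; n++) {
		uint32_t i = (start + n) & (EXAMPLE_SLOTS - 1u);
		if (t->ids[i] == id)
			return t->devs[i];
		if (t->ids[i] == EXAMPLE_EMPTY)
			return NULL;
	}
	return NULL;
}
```

Ids do not have to be contiguous or small here, so a lookup by id no longer implies an array indexed by port number.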
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nbsegments
2017-07-12 15:35 ` Stephen Hemminger
@ 2017-07-12 15:57 ` Morten Brørup
2017-07-12 16:23 ` Thomas Monjalon
0 siblings, 1 reply; 155+ messages in thread
From: Morten Brørup @ 2017-07-12 15:57 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Yang, Zhiyong, Wiles, Keith, Thomas Monjalon, DPDK, Olivier Matz,
Wang, Zhihong, Yuanhan Liu, Ananyev, Konstantin, Richardson,
Bruce, Chilikin, Andrey, Jan Blunck, Nélio Laranjeiro,
arybchenko, jerin.jacob
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, July 12, 2017 5:36 PM
> To: Morten Brørup
> Cc: Yang, Zhiyong; Wiles, Keith; Thomas Monjalon; DPDK; Olivier Matz;
> Wang, Zhihong; Yuanhan Liu; Ananyev, Konstantin; Richardson, Bruce;
> Chilikin, Andrey; Jan Blunck; Nélio Laranjeiro;
> arybchenko@solarflare.com; jerin.jacob@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> and nbsegments
>
> On Wed, 12 Jul 2017 11:50:38 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > > -----Original Message-----
> > > From: Yang, Zhiyong [mailto:zhiyong.yang@intel.com]
> > > Sent: Wednesday, July 12, 2017 11:02 AM
> > > To: Morten Brørup; Wiles, Keith
> > > Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan
> Liu;
> > > Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
> > > Blunck; Nélio Laranjeiro; arybchenko@solarflare.com;
> > > jerin.jacob@caviumnetworks.com
> > > Subject: RE: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> > > and nbsegments
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Morten
> Brørup
> > > > Sent: Wednesday, July 12, 2017 3:25 PM
> > > > To: Wiles, Keith <keith.wiles@intel.com>
> > > > Cc: Thomas Monjalon <thomas@monjalon.net>; DPDK <dev@dpdk.org>;
> > > > Olivier Matz <olivier.matz@6wind.com>; Wang, Zhihong
> > > > <zhihong.wang@intel.com>; Yuanhan Liu
> > > > <yuanhan.liu@linux.intel.com>; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; Chilikin, Andrey
> > > > <andrey.chilikin@intel.com>; Jan Blunck <jblunck@infradead.org>;
> > > Nélio
> > > > Laranjeiro <nelio.laranjeiro@6wind.com>;
> > > > arybchenko@solarflare.com; jerin.jacob@caviumnetworks.com
> > > > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
> > > > and nbsegments
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles,
> > > > > Keith
> > > > > Sent: Tuesday, July 11, 2017 6:48 PM
> > > > > To: Morten Brørup
> > > > > Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan
> > > Liu;
> > > > > Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
> > > > > Blunck; Nélio Laranjeiro; arybchenko@solarflare.com;
> > > > > jerin.jacob@caviumnetworks.com
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for
> > > > > port and nbsegments
> > > > >
> > > > >
> > > > > > On Jul 11, 2017, at 10:23 AM, Morten Brørup
> > > > > <mb@smartsharesystems.com> wrote:
> > > > > >
> > > > > >> -----Original Message-----
> > > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> > > > > >> Monjalon
> > > > > >> Sent: Tuesday, July 11, 2017 5:06 PM
> > > > > >> To: Morten Brørup
> > > > > >> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong;
> > > > > >> Yuanhan Liu; Ananyev, Konstantin; Richardson, Bruce;
> > > > > >> Chilikin, Andrey; Jan Blunck; nelio.laranjeiro@6wind.com;
> > > > > >> arybchenko@solarflare.com; jerin.jacob@caviumnetworks.com
> > > > > >> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for
> > > port
> > > > > and
> > > > > >> nbsegments
> > > > > >>
> > > > > >> 11/07/2017 15:30, Morten Brørup:
> > > > > >>> Morten Brørup wrote:
> > > > > >>>> Olivier Matz wrote:
> > > > > >>>>> As I said in a previous message, I think a good first
> step
> > > > > >>>>> would be to introduce a typedef for the port number:
> > > > > >> rte_eth_port_num_t.
> > > > > >>>>> It can still be uint8_t for now, and can be switched to
> 16
> > > > > >>>>> bits
> > > > > >> in
> > > > > >>>>> one step when everyone uses this new type.
> > > > > >>>>
> > > > > >>>> I think that DPDK follows the Linux tradition of exposing
> > > > > >>>> the variable types, as opposed to hiding them behind
> typedefs.
> > > This
> > > > > has
> > > > > >>>> the unfortunate consequence that when a variable type
> > > > > >>>> changes, it has to be changed everywhere.
> > > > > >>>>
> > > > > >>>> Introducing a rte_eth_port_num_t will require changing the
> > > same
> > > > > >>>> files at the same locations everywhere, so not even as a
> > > > > >>>> temporary solution will it be beneficial.
> > > > > >> [...]
> > > > > >>> What I was trying to communicate with my long argument
> about
> > > > > >>> type
> > > > > >> definitions was: When the type changed from 8 bit to 16 bit,
> > > > > >> the
> > > > > type
> > > > > >> needs to change from uint8_t to uint16_t everywhere too,
> > > > > >> including
> > > > > in
> > > > > >> the ethdev APIs.
> > > > > >>>
> > > > > >>> Don't start breaking coding conventions here by hiding the
> > > > > >>> type of
> > > > > >> this variable.
> > > > > >>
> > > > > >> So, Morten, you are against the typedef, right?
> > > > > >> Because we need to change it everywhere anyway, right?
> > > > > >>
> > > > > >> Note: I have no strong opinion.
> > > > > >
> > > > > > I'm against the typedef because it would break convention,
> and
> > > I'm
> > > > > > a
> > > > > strong proponent of conventions. In other projects, I'm all for
> > > > > typedefs, virtual classes, inheritance etc., but DPDK follows
> > > > > the Linux convention of not hiding simple types.
> > > > > >
> > > > > > We need to change it from uint8_t everywhere, regardless what
> > > > > > we change it to. (But if we need to change it again sometime
> > > > > > in the future, then a typedef will save us next time.)
> > > > >
> > > > > If the number of ports go beyond 64K then I will be the first
> > > > > one (if still alive) to eat this email. :-) The only reason to
> > > > > have
> > > more
> > > > > then
> > > > > 2 bytes would be to encode something into the port id value,
> > > > > which
> > > I
> > > > > could see, but a very slim chance IMHO.
> > > > >
> > > > > >
> > > > > > However, if we change the convention and start hiding simple
> > > > > > types,
> > > > > they still need the rte_ prefix regardless if they are popular
> > > > > or obscure types. Even struct rte_mbuf has the rte_ prefix, and
> > > > > I consider that a very popular type. If so, rte_port_t would be
> > > > > a
> > > good
> > > > > name for this type.
> > > > > >
> > > > > > My preference: Follow convention and change it to uint16_t
> > > > > everywhere.
> > > > > >
> > > > > > Med venlig hilsen / kind regards
> > > > > > - Morten Brørup
> > > > > >
> > > > >
> > > > > As we must change the uint8_t to uint16_t, then I would like it
> > > > > to be more descriptive via a typedef. I really do not see us
> > > > > needing
> > > to
> > > > > change it again in the near future. The only reason to make it
> a
> > > > > typedef is to be able to identify from just the prototype of
> the
> > > > > function that it takes a port ID value, which I am in favor of
> > > doing
> > > > > here for that reason.
> > > >
> > > > That is not a very good reason: When used as a function
> > > > parameter, the type is only shown in the function declaration,
> > > > whereas the variable name is shown every time it is used inside
> > > > the function. So remember to always use meaningful variable
> > > > names, such as "port" (like in the mbuf structure) or "port_id"
> > > > (used in other places).
> > > >
> > > > >
> > > > > As for Olivier’s statement about the typedef name I do not see
> > > > > the need for ‘_eth_' to be part of the typedef as it conveys no
> > > > > extra information in the name. Everything port related in DPDK
> > > > > is a ethernet type device or port. If we do add something like
> > > > > fiber channel maybe rte_pid_t is reason to that too, but if it
> > > > > contains ‘_eth_’ it would not.
> > > > >
> > > > > I would like to see names that are just short enough to convey
> > > > > the information and not be redundant. IMHO rte_pid_t is fine,
> > > > > but if we use some something similar to the length of uint8_t
> > > > > (7) or uint16_t (8) characters then we would not have to also
> > > > > reformat the line more then needed. Using rte_pid_t (pid ==
> > > > > port_id) we only extend the length by one (or two) characters
> > > > > and most likely the added byte(s) will not cause more format
> > > > > problems in the code.
> > > >
> > > > I still don't support typedefs for scalar types. I consider it
> > > > against the coding style, although after reviewing the official
> > > > DPDK Coding Style documentation
> > > > (http://dpdk.org/doc/guides/contributing/coding_style.html), I
> > > > can see that it is not explicitly stated. Please also note that
> > > > section 1.5.7 of the DPDK Coding Style documentation says that
> > > > the _t postfix should be avoided. Anyway, if we end up with a
> > > > typedef, please don't use something resembling pid_t known from
> > > > POSIX
> > > > (https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html).
> > > >
> > >
> > > How about rte_dev_id_t?
> > >
> > > Thanks
> > > Zhiyong
> > >
> > > >
> > > > >
> > > > > Regards,
> > > > > Keith
> >
> > If the DPDK Coding Style is based on Linux Coding Style, we should
> > avoid typedefs in general, not just for structures. Please read Linus
> > Torvalds' opinions about it:
> > http://yarchive.net/comp/linux/typedefs.html
> >
> > Perhaps the DPDK Coding Style should be updated to clarify this. (Or
> > if we decide otherwise, to explicitly mention this deviation from the
> > Linux coding style.)
>
> It is logical to use typedef's for this kind of scalar type that may
> need to change.
> Names matter, please avoid pid (confuse with posix) and dev (confuse
> with device id).
> I would prefer: rte_portid_t and rte_queueid_t
>
> Longer term, probably rte_eth_devices[] needs to go. Change port id
> into something more like ifindex. And use sparse data structure to
> allow very large number of devices and non-contiguous lookup. Think of
> a VPN server where each VPN connection looks like a DPDK device.
We are using a non-contiguous ifindex in our firmware, for virtual
interfaces as you mention, so I get your point here! But until DPDK gets
there, I suppose the DPDK port id is considered more or less contiguous.

You clearly have a longer track record working with Linus than me, so if
you interpret the coding style like that, I will not object anymore - as
my objection was based on coding style. And will someone please update
the DPDK Coding Style document accordingly...

rte_portid_t is fine with me, but why not just rte_port_t?

PS: uint16_t is a standard C type, not a Linux specific type.
Med venlig hilsen / kind regards
- Morten Brørup
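
The sparse, non-contiguous lookup suggested in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (example_dev, example_dev_table, example_dev_add, example_dev_find) is hypothetical and none of it is DPDK API; it uses a small fixed-size open-addressing table keyed by an ifindex-like id, with 0 reserved to mean "empty slot".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; must be a power of two for the mask below. */
#define EXAMPLE_TABLE_SIZE 64u

struct example_dev {
	uint32_t ifindex;	/* non-contiguous id; 0 marks an empty slot */
	void *priv;		/* per-device data */
};

struct example_dev_table {
	struct example_dev slots[EXAMPLE_TABLE_SIZE];
};

/* Multiplicative hash reduced to a slot index. */
static uint32_t
example_hash(uint32_t ifindex)
{
	return (uint32_t)(ifindex * 2654435761u) & (EXAMPLE_TABLE_SIZE - 1);
}

/* Insert or update using linear probing.
 * Returns 0 on success, -1 if the id is 0 or the table is full. */
static int
example_dev_add(struct example_dev_table *t, uint32_t ifindex, void *priv)
{
	uint32_t i, pos;

	assert(t != NULL);
	if (ifindex == 0)
		return -1;
	pos = example_hash(ifindex);
	for (i = 0; i < EXAMPLE_TABLE_SIZE; i++) {
		struct example_dev *d =
			&t->slots[(pos + i) & (EXAMPLE_TABLE_SIZE - 1)];

		if (d->ifindex == 0 || d->ifindex == ifindex) {
			d->ifindex = ifindex;
			d->priv = priv;
			return 0;
		}
	}
	return -1;
}

/* Look up a device by id; returns NULL if it is not present. */
static struct example_dev *
example_dev_find(struct example_dev_table *t, uint32_t ifindex)
{
	uint32_t i, pos;

	assert(t != NULL);
	if (ifindex == 0)
		return NULL;
	pos = example_hash(ifindex);
	for (i = 0; i < EXAMPLE_TABLE_SIZE; i++) {
		struct example_dev *d =
			&t->slots[(pos + i) & (EXAMPLE_TABLE_SIZE - 1)];

		if (d->ifindex == ifindex)
			return d;
		if (d->ifindex == 0)
			return NULL;	/* probe chain ends at an empty slot */
	}
	return NULL;
}
```

With a zero-initialized table, ids such as 7 and 100000 can both be added and found without reserving an array entry for every value in between, which is the property a contiguous rte_eth_devices[]-style array does not have. Deletion is deliberately left out of this sketch.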
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-12 15:57 ` Morten Brørup
@ 2017-07-12 16:23 ` Thomas Monjalon
2017-07-12 18:20 ` Wiles, Keith
2017-07-21 15:03 ` Bruce Richardson
0 siblings, 2 replies; 155+ messages in thread
From: Thomas Monjalon @ 2017-07-12 16:23 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger, Wiles, Keith, Olivier Matz
Cc: Yang, Zhiyong, dev, Wang, Zhihong, Yuanhan Liu, Ananyev,
Konstantin, Richardson, Bruce, Chilikin, Andrey, Jan Blunck,
Nélio Laranjeiro, arybchenko, jerin.jacob
12/07/2017 17:57, Morten Brørup:
> From: Stephen Hemminger
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> > > From: Yang, Zhiyong [mailto:zhiyong.yang@intel.com]
> > > > From: Morten Brørup
> > > > > From: Wiles, Keith
> > > > > > > On Jul 11, 2017, at 10:23 AM, Morten Brørup wrote:
> > > > > > > From: Thomas Monjalon
> > > > > > >> 11/07/2017 15:30, Morten Brørup:
> > > > > > >>> Morten Brørup wrote:
> > > > > > >>>> Olivier Matz wrote:
> > > > > > >>>>> As I said in a previous message, I think a good first
> > > > > > >>>>> step would be to introduce a typedef for the port
> > > > > > >>>>> number: rte_eth_port_num_t.
> > > > > > >>>>> It can still be uint8_t for now, and can be switched
> > > > > > >>>>> to 16 bits in one step when everyone uses this new type.
> > > > > > >>>>
> > > > > > >>>> I think that DPDK follows the Linux tradition of exposing
> > > > > > >>>> the variable types, as opposed to hiding them behind
> > > > > > >>>> typedefs. This has the unfortunate consequence that
> > > > > > >>>> when a variable type changes, it has to be changed everywhere.
> > > > > > >>>>
> > > > > > >>>> Introducing a rte_eth_port_num_t will require changing the
> > > > > > >>>> same files at the same locations everywhere, so not even as a
> > > > > > >>>> temporary solution will it be beneficial.
> > > > > > >> [...]
> > > > > > >>> What I was trying to communicate with my long argument
> > > > > > >>> about type definitions was:
> > > > > > >>> When the type changed from 8 bit to 16 bit, the type
> > > > > > >>> needs to change from uint8_t to uint16_t everywhere too,
> > > > > > >>> including in the ethdev APIs.
> > > > > > >>>
> > > > > > >>> Don't start breaking coding conventions here by hiding the
> > > > > > >>> type of this variable.
> > > > > > >>
> > > > > > >> So, Morten, you are against the typedef, right?
> > > > > > >> Because we need to change it everywhere anyway, right?
> > > > > > >>
> > > > > > >> Note: I have no strong opinion.
> > > > > > >
> > > > > > > I'm against the typedef because it would break convention,
> > > > > > > and I'm a strong proponent of conventions.
> > > > > > > In other projects, I'm all for typedefs, virtual classes,
> > > > > > > inheritance etc., but DPDK follows the Linux convention
> > > > > > > of not hiding simple types.
> > > > > > >
> > > > > > > We need to change it from uint8_t everywhere, regardless what
> > > > > > > we change it to. (But if we need to change it again sometime
> > > > > > > in the future, then a typedef will save us next time.)
> > > > > >
> > > > > > If the number of ports go beyond 64K then I will be the first
> > > > > > one (if still alive) to eat this email. :-) The only reason to
> > > > > > have more then 2 bytes would be to encode something into the
> > > > > > port id value, which I could see, but a very slim chance IMHO.
> > > > > >
> > > > > > > My preference: Follow convention and change it to uint16_t
> > > > > > > everywhere.
> > > > > >
> > > > > > As we must change the uint8_t to uint16_t, then I would like it
> > > > > > to be more descriptive via a typedef. I really do not see us
> > > > > > needing to change it again in the near future.
> > > > > > The only reason to make it a typedef is to be able to identify
> > > > > > from just the prototype of the function that it takes a port
> > > > > > ID value, which I am in favor of doing here for that reason.
> > > > >
> > > > > That is not a very good reason: When used as a function
> > > > > parameter, the type is only shown in the function declaration,
> > > > > whereas the variable name is shown every time it is used inside
> > > > > the function.
> > > > > So remember to always use meaningful variable names, such as
> > > > > "port" (like in the mbuf structure) or "port_id" (used in other
> > > > > places).
> > > > >
> > > > > I still don't support typedefs for scalar types. I consider it
> > > > > against the coding style, although after reviewing the official
> > > > > DPDK Coding Style documentation
> > > > > (http://dpdk.org/doc/guides/contributing/coding_style.html),
> > > > > I can see that it is not explicitly stated. Please also note
> > > > > that section 1.5.7 of the DPDK Coding Style documentation says
> > > > > that the _t postfix should be avoided. Anyway, if we end up
> > > > > with a typedef, please don't use something resembling pid_t
> > > > > known from POSIX
> > > > > (https://www.gnu.org/software/libc/manual/html_node/Process-
> > > > > Identification.html).
> > > >
> > > > How about rte_dev_id_t?
> > >
> > > If the DPDK Coding Style is based on Linux Coding Style, we should
> > > avoid typedefs in general, not just for structures. Please read Linus
> > > Torvalds' opinions about it:
> > > http://yarchive.net/comp/linux/typedefs.html
> > >
> > > Perhaps the DPDK Coding Style should be updated to clarify this. (Or
> > > if we decide otherwise, to explicitly mention this deviation from the
> > > Linux coding style.)
> >
> > It is logical to use typedef's for this kind of scalar type that may
> > need to change.
> > Names matter, please avoid pid (confuse with posix) and dev (confuse
> > with device id).
> > I would prefer: rte_portid_t and rte_queueid_t
> >
> > Longer term, probably rte_eth_devices[] needs to go. Change port id
> > into something more like ifindex. And use sparse data structure to
> > allow very large number of devices and non-contiguous lookup. Think of
> > a VPN server where each VPN connection looks like a DPDK device.
>
> We are using a non-contiguous ifindex in our firmware, for virtual
> interfaces as you mention, so I get your point here!
> But until DPDK gets there, I suppose the DPDK port id is considered
> more or less contiguous.
>
> You clearly have a longer track record working with Linus than me,
> so if you interpret the coding style like that, I will not object
> anymore - as my objection was based on coding style. And will someone
> please update the DPDK Coding Style document accordingly...
>
> rte_portid_t is fine with me, but why not just rte_port_t?
One problem with opaque typedefs is that we don't know how to print them
unless we have a PRIx macro.
So I suggest either keeping uint16_t (my preference)
or adding a printf format macro.
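
To make the printing concern concrete, here is a minimal sketch of what "a typedef plus a printf format macro" could look like, modelled on the PRIu16 family from <inttypes.h>. The names example_portid_t, EXAMPLE_PRIportid and example_format_port are hypothetical and are not part of DPDK; the point is only that the conversion specifier has to travel with the typedef.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical port id type and its matching printf conversion macro.
 * If the underlying type ever changes, both lines change together. */
typedef uint16_t example_portid_t;
#define EXAMPLE_PRIportid PRIu16

/* Format "port <id>" into buf; returns the snprintf() result, i.e. the
 * number of characters that the full string needs (excluding the NUL). */
static int
example_format_port(char *buf, size_t len, example_portid_t port_id)
{
	assert(buf != NULL);
	return snprintf(buf, len, "port %" EXAMPLE_PRIportid, port_id);
}
```

Callers that hard-code "%u" or "%hhu" would keep compiling if the typedef changed width, which is the risk described above; using plain uint16_t with PRIu16 (or "%u" after integer promotion) avoids needing a project-specific macro at all.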
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-12 16:23 ` Thomas Monjalon
@ 2017-07-12 18:20 ` Wiles, Keith
2017-07-21 15:03 ` Bruce Richardson
1 sibling, 0 replies; 155+ messages in thread
From: Wiles, Keith @ 2017-07-12 18:20 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Morten Brørup, Stephen Hemminger, Olivier Matz, Yang,
Zhiyong, DPDK, Wang, Zhihong, Yuanhan Liu, Ananyev, Konstantin,
Richardson, Bruce, Chilikin, Andrey, Jan Blunck,
Nélio Laranjeiro, Andrew Rybchenko, jerin.jacob
> On Jul 12, 2017, at 11:23 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 12/07/2017 17:57, Morten Brørup:
>> From: Stephen Hemminger
>>> Morten Brørup <mb@smartsharesystems.com> wrote:
>>>> From: Yang, Zhiyong [mailto:zhiyong.yang@intel.com]
>>>>> From: Morten Brørup
>>>>>> From: Wiles, Keith
>>>>>>>> On Jul 11, 2017, at 10:23 AM, Morten Brørup wrote:
>>>>>>>> From: Thomas Monjalon
>>>>>>>>> 11/07/2017 15:30, Morten Brørup:
>>>>>>>>>> Morten Brørup wrote:
>>>>>>>>>>> Olivier Matz wrote:
>>>>>>>>>>>> As I said in a previous message, I think a good first
>>>>>>>>>>>> step would be to introduce a typedef for the port
>>>>>>>>>>>> number: rte_eth_port_num_t.
>>>>>>>>>>>> It can still be uint8_t for now, and can be switched
>>>>>>>>>>>> to 16 bits in one step when everyone uses this new type.
>>>>>>>>>>>
>>>>>>>>>>> I think that DPDK follows the Linux tradition of exposing
>>>>>>>>>>> the variable types, as opposed to hiding them behind
>>>>>>>>>>> typedefs. This has the unfortunate consequence that
>>>>>>>>>>> when a variable type changes, it has to be changed everywhere.
>>>>>>>>>>>
>>>>>>>>>>> Introducing a rte_eth_port_num_t will require changing the
>>>>>>>>>>> same files at the same locations everywhere, so not even as a
>>>>>>>>>>> temporary solution will it be beneficial.
>>>>>>>>> [...]
>>>>>>>>>> What I was trying to communicate with my long argument
>>>>>>>>>> about type definitions was:
>>>>>>>>>> When the type changed from 8 bit to 16 bit, the type
>>>>>>>>>> needs to change from uint8_t to uint16_t everywhere too,
>>>>>>>>>> including in the ethdev APIs.
>>>>>>>>>>
>>>>>>>>>> Don't start breaking coding conventions here by hiding the
>>>>>>>>>> type of this variable.
>>>>>>>>>
>>>>>>>>> So, Morten, you are against the typedef, right?
>>>>>>>>> Because we need to change it everywhere anyway, right?
>>>>>>>>>
>>>>>>>>> Note: I have no strong opinion.
>>>>>>>>
>>>>>>>> I'm against the typedef because it would break convention,
>>>>>>>> and I'm a strong proponent of conventions.
>>>>>>>> In other projects, I'm all for typedefs, virtual classes,
>>>>>>>> inheritance etc., but DPDK follows the Linux convention
>>>>>>>> of not hiding simple types.
>>>>>>>>
>>>>>>>> We need to change it from uint8_t everywhere, regardless what
>>>>>>>> we change it to. (But if we need to change it again sometime
>>>>>>>> in the future, then a typedef will save us next time.)
>>>>>>>
>>>>>>> If the number of ports go beyond 64K then I will be the first
>>>>>>> one (if still alive) to eat this email. :-) The only reason to
>>>>>>> have more then 2 bytes would be to encode something into the
>>>>>>> port id value, which I could see, but a very slim chance IMHO.
>>>>>>>
>>>>>>>> My preference: Follow convention and change it to uint16_t
>>>>>>>> everywhere.
>>>>>>>
>>>>>>> As we must change the uint8_t to uint16_t, then I would like it
>>>>>>> to be more descriptive via a typedef. I really do not see us
>>>>>>> needing to change it again in the near future.
>>>>>>> The only reason to make it a typedef is to be able to identify
>>>>>>> from just the prototype of the function that it takes a port
>>>>>>> ID value, which I am in favor of doing here for that reason.
>>>>>>
>>>>>> That is not a very good reason: When used as a function
>>>>>> parameter, the type is only shown in the function declaration,
>>>>>> whereas the variable name is shown every time it is used inside
>>>>>> the function.
>>>>>> So remember to always use meaningful variable names, such as
>>>>>> "port" (like in the mbuf structure) or "port_id" (used in other
>>>>>> places).
>>>>>>
>>>>>> I still don't support typedefs for scalar types. I consider it
>>>>>> against the coding style, although after reviewing the official
>>>>>> DPDK Coding Style documentation
>>>>>> (http://dpdk.org/doc/guides/contributing/coding_style.html),
>>>>>> I can see that it is not explicitly stated. Please also note
>>>>>> that section 1.5.7 of the DPDK Coding Style documentation says
>>>>>> that the _t postfix should be avoided. Anyway, if we end up
>>>>>> with a typedef, please don't use something resembling pid_t
>>>>>> known from POSIX
>>>>>> (https://www.gnu.org/software/libc/manual/html_node/Process-
>>>>>> Identification.html).
>>>>>
>>>>> How about rte_dev_id_t?
>>>>
>>>> If the DPDK Coding Style is based on Linux Coding Style, we should
>>>> avoid typedefs in general, not just for structures. Please read Linus
>>>> Torvalds' opinions about it:
>>>> http://yarchive.net/comp/linux/typedefs.html
>>>>
>>>> Perhaps the DPDK Coding Style should be updated to clarify this. (Or
>>>> if we decide otherwise, to explicitly mention this deviation from the
>>>> Linux coding style.)
>>>
>>> It is logical to use typedef's for this kind of scalar type that may
>>> need to change.
>>> Names matter, please avoid pid (confuse with posix) and dev (confuse
>>> with device id).
>>> I would prefer: rte_portid_t and rte_queueid_t
>>>
>>> Longer term, probably rte_eth_devices[] needs to go. Change port id
>>> into something more like ifindex. And use sparse data structure to
>>> allow very large number of devices and non-contiguous lookup. Think of
>>> a VPN server where each VPN connection looks like a DPDK device.
>>
>> We are using a non-contiguous ifindex in our firmware, for virtual
>> interfaces as you mention, so I get your point here!
>> But until DPDK gets there, I suppose the DPDK port id is considered
>> more or less contiguous.
>>
>> You clearly have a longer track record working with Linus than me,
>> so if you interpret the coding style like that, I will not object
>> anymore - as my objection was based on coding style. And will someone
>> please update the DPDK Coding Style document accordingly...
>>
>> rte_portid_t is fine with me, but why not just rte_port_t?
>
> One problem with opaque typedef is that we don't know how to print them,
> except if we have a PRIx macro.
>
> So I suggest to keep with uint16_t (my preference),
> or to add a printf format macro.
As in my previous email, I think we have settled on uint16_t for the port
and not a new typedef, unless someone can give a compelling reason to use
a new typedef.
Regards,
Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-12 16:23 ` Thomas Monjalon
2017-07-12 18:20 ` Wiles, Keith
@ 2017-07-21 15:03 ` Bruce Richardson
1 sibling, 0 replies; 155+ messages in thread
From: Bruce Richardson @ 2017-07-21 15:03 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Morten Brørup, Stephen Hemminger, Wiles, Keith,
Olivier Matz, Yang, Zhiyong, dev, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Chilikin, Andrey, Jan Blunck,
Nélio Laranjeiro, arybchenko, jerin.jacob
On Wed, Jul 12, 2017 at 06:23:38PM +0200, Thomas Monjalon wrote:
> 12/07/2017 17:57, Morten Brørup:
> > From: Stephen Hemminger
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > > > From: Yang, Zhiyong [mailto:zhiyong.yang@intel.com]
> > > > > From: Morten Brørup
> > > > > > From: Wiles, Keith
> > > > > > > > On Jul 11, 2017, at 10:23 AM, Morten Brørup wrote:
> > > > > > > > From: Thomas Monjalon
> > > > > > > >> 11/07/2017 15:30, Morten Brørup:
> > > > > > > >>> Morten Brørup wrote:
> > > > > > > >>>> Olivier Matz wrote:
> > > > > > > >>>>> As I said in a previous message, I think a good first
> > > > > > > >>>>> step would be to introduce a typedef for the port
> > > > > > > >>>>> number: rte_eth_port_num_t.
> > > > > > > >>>>> It can still be uint8_t for now, and can be switched
> > > > > > > >>>>> to 16 bits in one step when everyone uses this new type.
> > > > > > > >>>>
> > > > > > > >>>> I think that DPDK follows the Linux tradition of exposing
> > > > > > > >>>> the variable types, as opposed to hiding them behind
> > > > > > > >>>> typedefs. This has the unfortunate consequence that
> > > > > > > >>>> when a variable type changes, it has to be changed everywhere.
> > > > > > > >>>>
> > > > > > > >>>> Introducing a rte_eth_port_num_t will require changing the
> > > > > > > >>>> same files at the same locations everywhere, so not even as a
> > > > > > > >>>> temporary solution will it be beneficial.
> > > > > > > >> [...]
> > > > > > > >>> What I was trying to communicate with my long argument
> > > > > > > >>> about type definitions was:
> > > > > > > >>> When the type changed from 8 bit to 16 bit, the type
> > > > > > > >>> needs to change from uint8_t to uint16_t everywhere too,
> > > > > > > >>> including in the ethdev APIs.
> > > > > > > >>>
> > > > > > > >>> Don't start breaking coding conventions here by hiding the
> > > > > > > >>> type of this variable.
> > > > > > > >>
> > > > > > > >> So, Morten, you are against the typedef, right?
> > > > > > > >> Because we need to change it everywhere anyway, right?
> > > > > > > >>
> > > > > > > >> Note: I have no strong opinion.
> > > > > > > >
> > > > > > > > I'm against the typedef because it would break convention,
> > > > > > > > and I'm a strong proponent of conventions.
> > > > > > > > In other projects, I'm all for typedefs, virtual classes,
> > > > > > > > inheritance etc., but DPDK follows the Linux convention
> > > > > > > > of not hiding simple types.
> > > > > > > >
> > > > > > > > We need to change it from uint8_t everywhere, regardless what
> > > > > > > > we change it to. (But if we need to change it again sometime
> > > > > > > > in the future, then a typedef will save us next time.)
> > > > > > >
> > > > > > > If the number of ports go beyond 64K then I will be the first
> > > > > > > one (if still alive) to eat this email. :-) The only reason to
> > > > > > > have more then 2 bytes would be to encode something into the
> > > > > > > port id value, which I could see, but a very slim chance IMHO.
> > > > > > >
> > > > > > > > My preference: Follow convention and change it to uint16_t
> > > > > > > > everywhere.
> > > > > > >
> > > > > > > As we must change the uint8_t to uint16_t, then I would like it
> > > > > > > to be more descriptive via a typedef. I really do not see us
> > > > > > > needing to change it again in the near future.
> > > > > > > The only reason to make it a typedef is to be able to identify
> > > > > > > from just the prototype of the function that it takes a port
> > > > > > > ID value, which I am in favor of doing here for that reason.
> > > > > >
> > > > > > That is not a very good reason: When used as a function
> > > > > > parameter, the type is only shown in the function declaration,
> > > > > > whereas the variable name is shown every time it is used inside
> > > > > > the function.
> > > > > > So remember to always use meaningful variable names, such as
> > > > > > "port" (like in the mbuf structure) or "port_id" (used in other
> > > > > > places).
> > > > > >
> > > > > > I still don't support typedefs for scalar types. I consider it
> > > > > > against the coding style, although after reviewing the official
> > > > > > DPDK Coding Style documentation
> > > > > > (http://dpdk.org/doc/guides/contributing/coding_style.html),
> > > > > > I can see that it is not explicitly stated. Please also note
> > > > > > that section 1.5.7 of the DPDK Coding Style documentation says
> > > > > > that the _t postfix should be avoided. Anyway, if we end up
> > > > > > with a typedef, please don't use something resembling pid_t
> > > > > > known from POSIX
> > > > > > (https://www.gnu.org/software/libc/manual/html_node/Process-
> > > > > > Identification.html).
> > > > >
> > > > > How about rte_dev_id_t?
> > > >
> > > > If the DPDK Coding Style is based on Linux Coding Style, we should
> > > > avoid typedefs in general, not just for structures. Please read Linus
> > > > Torvalds' opinions about it:
> > > > http://yarchive.net/comp/linux/typedefs.html
> > > >
> > > > Perhaps the DPDK Coding Style should be updated to clarify this. (Or
> > > > if we decide otherwise, to explicitly mention this deviation from the
> > > > Linux coding style.)
> > >
> > > It is logical to use typedef's for this kind of scalar type that may
> > > need to change.
> > > Names matter, please avoid pid (confuse with posix) and dev (confuse
> > > with device id).
> > > I would prefer: rte_portid_t and rte_queueid_t
> > >
> > > Longer term, probably rte_eth_devices[] needs to go. Change port id
> > > into something more like ifindex. And use sparse data structure to
> > > allow very large number of devices and non-contiguous lookup. Think of
> > > a VPN server where each VPN connection looks like a DPDK device.
> >
> > We are using a non-contiguous ifindex in our firmware, for virtual
> > interfaces as you mention, so I get your point here!
> > But until DPDK gets there, I suppose the DPDK port id is considered
> > more or less contiguous.
> >
> > You clearly have a longer track record working with Linus than me,
> > so if you interpret the coding style like that, I will not object
> > anymore - as my objection was based on coding style. And will someone
> > please update the DPDK Coding Style document accordingly...
> >
> > rte_portid_t is fine with me, but why not just rte_port_t?
>
> One problem with opaque typedef is that we don't know how to print them,
> except if we have a PRIx macro.
>
> So I suggest to keep with uint16_t (my preference),
> or to add a printf format macro.
+1 for using basic types rather than typedefs.
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-12 7:25 ` Morten Brørup
2017-07-12 9:02 ` Yang, Zhiyong
@ 2017-07-12 15:34 ` Wiles, Keith
1 sibling, 0 replies; 155+ messages in thread
From: Wiles, Keith @ 2017-07-12 15:34 UTC (permalink / raw)
To: Morten Brørup
Cc: Thomas Monjalon, DPDK, Olivier Matz, Wang, Zhihong, Yuanhan Liu,
Ananyev, Konstantin, Richardson, Bruce, Chilikin, Andrey,
Jan Blunck, Nélio Laranjeiro, arybchenko, jerin.jacob
> On Jul 12, 2017, at 2:25 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
>> Sent: Tuesday, July 11, 2017 6:48 PM
>> To: Morten Brørup
>> Cc: Thomas Monjalon; DPDK; Olivier Matz; Wang, Zhihong; Yuanhan Liu;
>> Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan Blunck;
>> Nélio Laranjeiro; arybchenko@solarflare.com;
>> jerin.jacob@caviumnetworks.com
>> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
>> nbsegments
>>
>>
>>> On Jul 11, 2017, at 10:23 AM, Morten Brørup
>> <mb@smartsharesystems.com> wrote:
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
>>>> Sent: Tuesday, July 11, 2017 5:06 PM
>>>> To: Morten Brørup
>>>> Cc: dev@dpdk.org; Wiles, Keith; Olivier Matz; Wang, Zhihong; Yuanhan
>>>> Liu; Ananyev, Konstantin; Richardson, Bruce; Chilikin, Andrey; Jan
>>>> Blunck; nelio.laranjeiro@6wind.com; arybchenko@solarflare.com;
>>>> jerin.jacob@caviumnetworks.com
>>>> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port
>> and
>>>> nbsegments
>>>>
>>>> 11/07/2017 15:30, Morten Brørup:
>>>>> Morten Brørup wrote:
>>>>>> Olivier Matz wrote:
>>>>>>> As I said in a previous message, I think a good first step would
>>>>>>> be to introduce a typedef for the port number:
>>>> rte_eth_port_num_t.
>>>>>>> It can still be uint8_t for now, and can be switched to 16 bits
>>>> in
>>>>>>> one step when everyone uses this new type.
>>>>>>
>>>>>> I think that DPDK follows the Linux tradition of exposing the
>>>>>> variable types, as opposed to hiding them behind typedefs. This
>> has
>>>>>> the unfortunate consequence that when a variable type changes, it
>>>>>> has to be changed everywhere.
>>>>>>
>>>>>> Introducing a rte_eth_port_num_t will require changing the same
>>>>>> files at the same locations everywhere, so not even as a temporary
>>>>>> solution will it be beneficial.
>>>> [...]
>>>>> What I was trying to communicate with my long argument about type
>>>> definitions was: When the type changed from 8 bit to 16 bit, the
>> type
>>>> needs to change from uint8_t to uint16_t everywhere too, including
>> in
>>>> the ethdev APIs.
>>>>>
>>>>> Don't start breaking coding conventions here by hiding the type of
>>>> this variable.
>>>>
>>>> So, Morten, you are against the typedef, right?
>>>> Because we need to change it everywhere anyway, right?
>>>>
>>>> Note: I have no strong opinion.
>>>
>>> I'm against the typedef because it would break convention, and I'm a
>> strong proponent of conventions. In other projects, I'm all for
>> typedefs, virtual classes, inheritance etc., but DPDK follows the Linux
>> convention of not hiding simple types.
>>>
>>> We need to change it from uint8_t everywhere, regardless what we
>>> change it to. (But if we need to change it again sometime in the
>>> future, then a typedef will save us next time.)
>>
>> If the number of ports go beyond 64K then I will be the first one (if
>> still alive) to eat this email. :-) The only reason to have more then 2
>> bytes would be to encode something into the port id value, which I
>> could see, but a very slim chance IMHO.
>>
>>>
>>> However, if we change the convention and start hiding simple types,
>> they still need the rte_ prefix regardless if they are popular or
>> obscure types. Even struct rte_mbuf has the rte_ prefix, and I consider
>> that a very popular type. If so, rte_port_t would be a good name for
>> this type.
>>>
>>> My preference: Follow convention and change it to uint16_t
>> everywhere.
>>>
>>> Med venlig hilsen / kind regards
>>> - Morten Brørup
>>>
>>
>> As we must change the uint8_t to uint16_t, then I would like it to be
>> more descriptive via a typedef. I really do not see us needing to
>> change it again in the near future. The only reason to make it a
>> typedef is to be able to identify from just the prototype of the
>> function that it takes a port ID value, which I am in favor of doing
>> here for that reason.
>
> That is not a very good reason: When used as a function parameter, the type is only shown in the function declaration, whereas the variable name is shown every time it is used inside the function. So remember to always use meaningful variable names, such as "port" (like in the mbuf structure) or "port_id" (used in other places).
I said in the prototype, not the docs, and the prototype does pop up in
some editors and so forth. Using the correct variable names is a given
and not the subject of this discussion. Even Torvalds states that
typedefs are reasonable in some cases; I am not sure this one falls into
that slot. What I took from his email is that typedefs for structures
were his main concern, and we are not talking about a structure here.
>
>>
>> As for Olivier’s statement about the typedef name I do not see the need
>> for ‘_eth_' to be part of the typedef as it conveys no extra
>> information in the name. Everything port related in DPDK is a ethernet
>> type device or port. If we do add something like fiber channel maybe
>> rte_pid_t is reason to that too, but if it contains ‘_eth_’ it would
>> not.
>>
>> I would like to see names that are just short enough to convey the
>> information and not be redundant. IMHO rte_pid_t is fine, but if we use
>> some something similar to the length of uint8_t (7) or uint16_t (8)
>> characters then we would not have to also reformat the line more then
>> needed. Using rte_pid_t (pid == port_id) we only extend the length by
>> one (or two) characters and most likely the added byte(s) will not
>> cause more format problems in the code.
>
> I still don't support typedefs for scalar types. I consider it against the coding style, although after reviewing the official DPDK Coding Style documentation (http://dpdk.org/doc/guides/contributing/coding_style.html), I can see that it is not explicitly stated. Please also note that section 1.5.7 of the DPDK Coding Style documentation says that the _t postfix should be avoided. Anyway, if we end up with a typedef, please don't use something resembling pid_t known from POSIX (https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html).
Even the Linux uint16_t is a typedef, so even Linux thought using a
typedef here was reasonable, as it changes from arch to arch. The real
question here is whether the port id falls into this category. Most
likely not in this case, unless we think the variable will change in the
future or we decide to encode more information into the variable than
just a simple port id 0-N.

If we cannot determine that it falls into this category, then we use a
scalar typedef like uint16_t, which seems to be the norm here. My point
about typedefs was mainly about the name: if we have a typedef, it should
not be 40 characters long or carry redundant information.

So it appears the port ID value is not going to change any time soon and
we do not need to make it opaque or encode information into the variable,
which to me means we need to use the standard typedef from Linux
‘uint16_t’.

Is that agreeable with everyone or do you have a good reason to make it a
typedef?
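
As a small illustration of the point that uint16_t is itself a typedef (from the standard <stdint.h> header), the sketch below shows that a further alias on top of it is the same type to the compiler and adds no checking, only a name. example_portid_t and the helper functions are hypothetical and not DPDK definitions; the code assumes a C11 compiler for static_assert and _Generic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical alias; not a DPDK type. */
typedef uint16_t example_portid_t;

/* uint16_t covers port ids 0..65535, the "64K" limit mentioned above. */
static_assert(UINT16_MAX == 65535, "uint16_t holds 0..65535");

/* Evaluates to 1 when the expression has exactly the type uint16_t. */
#define EXAMPLE_IS_UINT16(x) _Generic((x), uint16_t: 1, default: 0)

/* Declared with the alias, yet any uint16_t value is accepted as-is. */
static uint16_t
example_port_to_u16(example_portid_t port_id)
{
	return port_id;
}

/* Returns 1 because the alias and uint16_t are one and the same type. */
static int
example_alias_is_same_type(void)
{
	example_portid_t port_id = 0;

	return EXAMPLE_IS_UINT16(port_id);
}
```

So a scalar typedef here would document intent in a prototype but would not stop a queue id or any other 16-bit value from being passed where a port id is expected.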
>
>
>>
>> Regards,
>> Keith
Regards,
Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-10 8:15 ` Morten Brørup
2017-07-11 13:25 ` Wiles, Keith
@ 2017-07-11 13:34 ` Wiles, Keith
2017-07-11 13:46 ` Olivier MATZ
1 sibling, 1 reply; 155+ messages in thread
From: Wiles, Keith @ 2017-07-11 13:34 UTC (permalink / raw)
To: Morten Brørup
Cc: Olivier Matz, Wang, Zhihong, Yuanhan Liu, DPDK, Ananyev,
Konstantin, Richardson, Bruce, Chilikin, Andrey, Jan Blunck,
Nélio Laranjeiro, Andrew Rybchenko, thomas.monjalon,
jerin.jacob
Resend because of format problems sorry.
> On Jul 10, 2017, at 3:15 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
>> Sent: Monday, July 10, 2017 10:00 AM
>> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
>> nb segments
>>
>> Hi,
>>
>> On Tue, 4 Jul 2017 07:54:23 +0000, "Wang, Zhihong"
>> <zhihong.wang@intel.com> wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>> Sent: Tuesday, April 18, 2017 9:03 PM
>>>> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>>
>>>> Hi Yuanhan,
>>>>
>>>> On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
>>>> <yuanhan.liu@linux.intel.com> wrote:
>>>>> Hi Olivier,
>>>>>
>>>>> On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
>>>>>> Change the size of m->port and m->nb_segs to 16 bits.
>>>>>
>>>>> But all the ethdev APIs are still using 8 bits. 16 bits won't
>>>>> really take effect without updating those APIs. Any plans?
>>>>>
>>>>> --yliu
>>>>
>>>> Yes, there is some work in ethdev, drivers and in example apps to
>>>> make the change effective. I think we could define a specific type
>>>> for a port number, maybe rte_eth_port_num_t. Using this type could
>>>> be a first step (for 17.08) before switching to 16 bits (17.11?).
>>>>
>>>> I'll do the change and send a rfc.
>>>
>>> Ping ;) Is this still in your plan?
>>>
>>
>> Sorry, I don't think I will have time to work on this issue in the
>> coming weeks. If you plan to do it, I will be happy to help with
>> reviews and comments.
>>
>> As I said in a previous message, I think a good first step would be to
>> introduce a typedef for the port number: rte_eth_port_num_t.
>> It can still be uint8_t for now, and can be switched to 16 bits in one
>> step when everyone uses this new type.
>>
>> Olivier
>
> I think that DPDK follows the Linux tradition of exposing the variable types, as opposed to hiding them behind typedefs. This has the unfortunate consequence that when a variable type changes, it has to be changed everywhere.
>
> Introducing a rte_eth_port_num_t will require changing the same files at the same locations everywhere, so not even as a temporary solution will it be beneficial.
I would like to see a much smaller typedef name here, we use it everywhere.
rte_port_id_t
port_id_t
port_num_t
portid_t
pid_t
rte_pid_t
I do not see why it needs to be rte_eth or even rte_; if we do not put eth in the name then it could be used in crypto or someplace else.
>
>
> Med venlig hilsen / kind regards
> - Morten Brørup
Regards,
Keith
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments
2017-07-11 13:34 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments Wiles, Keith
@ 2017-07-11 13:46 ` Olivier MATZ
0 siblings, 0 replies; 155+ messages in thread
From: Olivier MATZ @ 2017-07-11 13:46 UTC (permalink / raw)
To: Wiles, Keith
Cc: Morten Brørup, Wang, Zhihong, Yuanhan Liu, DPDK, Ananyev,
Konstantin, Richardson, Bruce, Chilikin, Andrey, Jan Blunck,
Nélio Laranjeiro, Andrew Rybchenko, thomas.monjalon,
jerin.jacob
On Tue, 11 Jul 2017 13:34:47 +0000, "Wiles, Keith" <keith.wiles@intel.com> wrote:
> Resend because of format problems sorry.
>
> > On Jul 10, 2017, at 3:15 AM, Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> >> Sent: Monday, July 10, 2017 10:00 AM
> >> Subject: Re: [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and
> >> nb segments
> >>
> >> Hi,
> >>
> >> On Tue, 4 Jul 2017 07:54:23 +0000, "Wang, Zhihong"
> >> <zhihong.wang@intel.com> wrote:
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >>>> Sent: Tuesday, April 18, 2017 9:03 PM
> >>>> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>>
> >>>> Hi Yuanhan,
> >>>>
> >>>> On Thu, 6 Apr 2017 13:45:23 +0800, Yuanhan Liu
> >>>> <yuanhan.liu@linux.intel.com> wrote:
> >>>>> Hi Olivier,
> >>>>>
> >>>>> On Tue, Apr 04, 2017 at 06:28:05PM +0200, Olivier Matz wrote:
> >>>>>> Change the size of m->port and m->nb_segs to 16 bits.
> >>>>>
> >>>>> But all the ethdev APIs are still using 8 bits. 16 bits won't
> >>>>> really take effect without updating those APIs. Any plans?
> >>>>>
> >>>>> --yliu
> >>>>
> >>>> Yes, there is some work in ethdev, drivers and in example apps to
> >>>> make the change effective. I think we could define a specific type
> >>>> for a port number, maybe rte_eth_port_num_t. Using this type could
> >>>> be a first step (for 17.08) before switching to 16 bits (17.11?).
> >>>>
> >>>> I'll do the change and send a rfc.
> >>>
> >>> Ping ;) Is this still in your plan?
> >>>
> >>
> >> Sorry, I don't think I will have time to work on this issue in the
> >> coming weeks. If you plan to do it, I will be happy to help with
> >> reviews and comments.
> >>
> >> As I said in a previous message, I think a good first step would be to
> >> introduce a typedef for the port number: rte_eth_port_num_t.
> >> It can still be uint8_t for now, and can be switched to 16 bits in one
> >> step when everyone uses this new type.
> >>
> >> Olivier
> >
> > I think that DPDK follows the Linux tradition of exposing the variable types, as opposed to hiding them behind typedefs. This has the unfortunate consequence that when a variable type changes, it has to be changed everywhere.
> >
> > Introducing a rte_eth_port_num_t will require changing the same files at the same locations everywhere, so not even as a temporary solution will it be beneficial.
>
> I would like to see a much smaller typedef name here, we use it everywhere.
> rte_port_id_t
> port_id_t
> port_num_t
> portid_t
> pid_t
> rte_pid_t
>
> I do not see why it needs to be rte_eth or even rte_; if we do not put eth in the name then it could be used in crypto or someplace else.
rte_ is required because we want to avoid namespace collision.
For instance, portid_t is too generic, and we would take the risk
that it is also defined for something else in another .h file.
Knowing it is an Ethernet port identifier, I also think eth_ is more
consistent regarding what we already have in rte_ethdev.h.
About num vs id, I have no strong opinion.
*pid_t looks really unclear to me :)
Olivier
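
To make the typedef idea concrete, here is a minimal sketch, assuming the
rte_eth_port_num_t name proposed earlier in the thread (it is not an existing
DPDK type). PORT_NUM_MAX and port_num_is_representable() are hypothetical
helpers invented for the illustration. The point is only that the width is
stated in one place, so moving from 8 to 16 bits would be a one-line change:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type name from the thread; uint8_t for now. */
typedef uint8_t rte_eth_port_num_t; /* would become uint16_t in one step */

/* Hypothetical helper: largest value the port type can hold,
 * derived from the type itself instead of being hard-coded. */
#define PORT_NUM_MAX ((rte_eth_port_num_t)~(rte_eth_port_num_t)0)

/* The mbuf port field is planned to be 16 bits, so the type must fit. */
static_assert(sizeof(rte_eth_port_num_t) <= sizeof(uint16_t),
	      "port type must fit in a 16-bit mbuf field");

/* Return 1 if idx can be stored in the port type, 0 otherwise. */
static inline int
port_num_is_representable(unsigned int idx)
{
	return idx <= PORT_NUM_MAX;
}
```

With the typedef set to uint8_t, index 255 is representable and 256 is not;
after switching the typedef to uint16_t, the same helper would accept up to
65535 without any other change.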
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 7/8] mbuf: move sequence number in second cache line
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
` (5 preceding siblings ...)
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 6/8] mbuf: use 2 bytes for port and nb segments Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 8/8] mbuf: add a timestamp field Olivier Matz
2017-04-05 9:37 ` [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization Thomas Monjalon
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
Move this field to the second cache line, since no driver uses it
in the Rx path. The freed space will be used by a timestamp in the next
commit.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
lib/librte_mbuf/rte_mbuf.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 323a1ac16..349f0512e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -477,8 +477,6 @@ struct rte_mbuf {
uint32_t usr; /**< User defined tags. See rte_distributor_process() */
} hash; /**< hash information */
- uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
-
/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
uint16_t vlan_tci_outer;
@@ -523,6 +521,10 @@ struct rte_mbuf {
/** Timesync flags for use with IEEE1588. */
uint16_t timesync;
+
+ /** Sequence number. See also rte_reorder_insert(). */
+ uint32_t seqn;
+
} __rte_cache_aligned;
/**
--
2.11.0
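
As an illustration of what "second cache line" means for a field, here is a
minimal sketch that checks field placement with offsetof(). It assumes
64-byte cache lines. struct toy_mbuf, CACHE_LINE_SIZE and FIELD_CACHE_LINE
are hypothetical stand-ins written for this example; the layout is not the
real struct rte_mbuf:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Toy stand-in for the mbuf; the field list is illustrative only. */
struct toy_mbuf {
	/* first cache line: fields touched in the Rx fast path */
	void *buf_addr;
	uint16_t data_off;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;
	uint64_t ol_flags;
	uint64_t timestamp;

	/* second cache line: slow path or Tx only */
	_Alignas(CACHE_LINE_SIZE) struct toy_mbuf *next;
	uint32_t seqn;
};

/* Index of the cache line that holds the first byte of a field. */
#define FIELD_CACHE_LINE(type, field) \
	(offsetof(type, field) / CACHE_LINE_SIZE)

/* Compile-time checks: a layout change that breaks them fails the build. */
static_assert(FIELD_CACHE_LINE(struct toy_mbuf, timestamp) == 0,
	      "timestamp is expected in the first cache line");
static_assert(FIELD_CACHE_LINE(struct toy_mbuf, seqn) == 1,
	      "seqn is expected in the second cache line");
```

Checks of this kind make the intent of a reordering explicit, which can help
when vector drivers depend on exact field offsets.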
^ permalink raw reply [flat|nested] 155+ messages in thread
* [dpdk-dev] [PATCH v2 8/8] mbuf: add a timestamp field
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
` (6 preceding siblings ...)
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 7/8] mbuf: move sequence number in second cache line Olivier Matz
@ 2017-04-04 16:28 ` Olivier Matz
2017-04-05 9:37 ` [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization Thomas Monjalon
8 siblings, 0 replies; 155+ messages in thread
From: Olivier Matz @ 2017-04-04 16:28 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, thomas.monjalon,
jerin.jacob
The field itself is not fully described yet, but this commit reserves
the room in the mbuf.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
lib/librte_mbuf/rte_mbuf.c | 2 ++
lib/librte_mbuf/rte_mbuf.h | 12 ++++++++++++
2 files changed, 14 insertions(+)
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 207bf3dd3..0e3e36a58 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -323,6 +323,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
case PKT_RX_QINQ_STRIPPED: return "PKT_RX_QINQ_STRIPPED";
case PKT_RX_LRO: return "PKT_RX_LRO";
+ case PKT_RX_TIMESTAMP: return "PKT_RX_TIMESTAMP";
default: return NULL;
}
}
@@ -357,6 +358,7 @@ rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
{ PKT_RX_IEEE1588_TMST, PKT_RX_IEEE1588_TMST, NULL },
{ PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ_STRIPPED, NULL },
{ PKT_RX_LRO, PKT_RX_LRO, NULL },
+ { PKT_RX_TIMESTAMP, PKT_RX_TIMESTAMP, NULL },
};
const char *name;
unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 349f0512e..9dd8e807e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -184,6 +184,11 @@ extern "C" {
*/
#define PKT_RX_LRO (1ULL << 16)
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP (1ULL << 17)
+
/* add new RX flags here */
/* add new TX flags here */
@@ -481,6 +486,12 @@ struct rte_mbuf {
uint16_t vlan_tci_outer;
uint16_t buf_len; /**< Length of segment buffer. */
+
+ /** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+ * are not normalized but are always the same for a given port.
+ */
+ uint64_t timestamp;
+
/* second cache line - fields only used in slow path or on TX */
MARKER cacheline1 __rte_cache_min_aligned;
@@ -1208,6 +1219,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
mi->nb_segs = 1;
mi->ol_flags = m->ol_flags | IND_ATTACHED_MBUF;
mi->packet_type = m->packet_type;
+ mi->timestamp = m->timestamp;
__rte_mbuf_sanity_check(mi, 1);
__rte_mbuf_sanity_check(m, 0);
--
2.11.0
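
A minimal sketch of how an application might consume the new field, assuming
it only trusts the timestamp when PKT_RX_TIMESTAMP is set in ol_flags. The
flag value is the one defined in the patch; struct toy_mbuf and
toy_mbuf_get_timestamp() are hypothetical and reduced to the two fields that
matter here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_RX_TIMESTAMP (1ULL << 17) /* value from the patch */

/* Toy stand-in for the mbuf: only the fields used below. */
struct toy_mbuf {
	uint64_t ol_flags;
	uint64_t timestamp;
};

/*
 * Hypothetical accessor: store the timestamp in *ts and return 1 when the
 * flag says it is valid; return 0 and leave *ts untouched otherwise.
 * The value is not normalized: unit and time reference are per port.
 */
static inline int
toy_mbuf_get_timestamp(const struct toy_mbuf *m, uint64_t *ts)
{
	assert(m != NULL && ts != NULL);

	if ((m->ol_flags & PKT_RX_TIMESTAMP) == 0)
		return 0;
	*ts = m->timestamp;
	return 1;
}
```

Because the unit and reference are not normalized, values obtained this way
should only be compared between packets received on the same port.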
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-04 16:27 ` [dpdk-dev] [PATCH v2 0/8] " Olivier Matz
` (7 preceding siblings ...)
2017-04-04 16:28 ` [dpdk-dev] [PATCH v2 8/8] mbuf: add a timestamp field Olivier Matz
@ 2017-04-05 9:37 ` Thomas Monjalon
2017-04-05 9:46 ` Olivier MATZ
` (3 more replies)
8 siblings, 4 replies; 155+ messages in thread
From: Thomas Monjalon @ 2017-04-05 9:37 UTC (permalink / raw)
To: Olivier Matz
Cc: dev, konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, jerin.jacob
2017-04-04 18:27, Olivier Matz:
> Based on discussions done in [1] and in this thread, this patchset reorganizes
> the mbuf.
>
> The main changes are:
> - reorder structure to increase vector performance on some non-ia
> platforms.
> - add a 64bits timestamp field in the 1st cache line. This timestamp
> is not normalized, i.e. no unit or time reference is enforced. A
> library may be added to do this job in the future.
> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> in the pool, avoiding the need of setting m->next (located in the
> 2nd cache line) in the Rx path for mono-segment packets.
> - change port and nb_segs to 16 bits
> - move seqn in the 2nd cache line
Applied, thanks for the long work
We need to add a patch to bump ABIVER and document the changes.
> Things discussed but not done in the patchset:
> - move refcnt and nb_segs to the 2nd cache line: many drivers sets
> them in the Rx path, so it could introduce a performance regression, or
> it would require to change all the drivers, which is not an easy task.
If it is worth moving these fields to the 2nd cache line,
can we plan to rework the drivers so that they do not set them in Rx?
> Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> idea of what could be done.
Yes drivers patches are welcome :)
Please target RC2 for these changes.
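
The driver-side optimization discussed above can be sketched as follows,
under the assumption that the free path restores m->next, m->nb_segs and
m->refcnt before a segment returns to the pool. struct toy_mbuf,
toy_mbuf_reset_on_free() and toy_rx_fill() are hypothetical names for this
illustration and do not correspond to the DPDK API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the mbuf: only the fields used below. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;
	uint16_t data_len;
	uint32_t pkt_len;
};

/* Pool side: restore the invariants when a segment is freed. */
static inline void
toy_mbuf_reset_on_free(struct toy_mbuf *m)
{
	m->next = NULL;
	m->nb_segs = 1;
	m->refcnt = 1;
}

/*
 * Rx side, mono-segment packet: only per-packet fields are written.
 * next, nb_segs and refcnt are relied upon, not rewritten, so the Rx
 * path does not need to write the cache line that holds m->next.
 */
static inline void
toy_rx_fill(struct toy_mbuf *m, uint16_t port, uint16_t len)
{
	assert(m->next == NULL && m->nb_segs == 1 && m->refcnt == 1);

	m->port = port;
	m->data_len = len;
	m->pkt_len = len;
}
```

The assertion stands in for a debug-only sanity check; a driver that chains
segments would still have to set next and nb_segs itself on that path.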
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-05 9:37 ` [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization Thomas Monjalon
@ 2017-04-05 9:46 ` Olivier MATZ
2017-04-05 9:48 ` Richardson, Bruce
` (2 subsequent siblings)
3 siblings, 0 replies; 155+ messages in thread
From: Olivier MATZ @ 2017-04-05 9:46 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, jerin.jacob
On Wed, 05 Apr 2017 11:37:39 +0200, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 2017-04-04 18:27, Olivier Matz:
> > Based on discussions done in [1] and in this thread, this patchset reorganizes
> > the mbuf.
> >
> > The main changes are:
> > - reorder structure to increase vector performance on some non-ia
> > platforms.
> > - add a 64bits timestamp field in the 1st cache line. This timestamp
> > is not normalized, i.e. no unit or time reference is enforced. A
> > library may be added to do this job in the future.
> > - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> > in the pool, avoiding the need of setting m->next (located in the
> > 2nd cache line) in the Rx path for mono-segment packets.
> > - change port and nb_segs to 16 bits
> > - move seqn in the 2nd cache line
>
> Applied, thanks for the long work
>
> We need to add a patch to bump ABIVER and document the changes.
Thanks Thomas. I'm on it, I will send it ASAP.
> > Things discussed but not done in the patchset:
> > - move refcnt and nb_segs to the 2nd cache line: many drivers sets
> > them in the Rx path, so it could introduce a performance regression, or
> > it would require to change all the drivers, which is not an easy task.
>
> If it is worth to move these fields in 2nd cache line,
> can we plan to rework drivers for not setting them in Rx?
I think it's worth doing the driver modification, it may gain some
cycles. Once it's done, it becomes easy to see the impact of moving the
fields... except if it breaks a vector code ;)
I think this move should only occur if we need more room in the first
cache line.
>
> > Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> > by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> > idea of what could be done.
>
> Yes drivers patches are welcome :)
> Please target RC2 for these changes.
>
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-05 9:37 ` [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization Thomas Monjalon
2017-04-05 9:46 ` Olivier MATZ
@ 2017-04-05 9:48 ` Richardson, Bruce
2017-04-05 12:06 ` Ferruh Yigit
2017-04-14 13:10 ` Ferruh Yigit
3 siblings, 0 replies; 155+ messages in thread
From: Richardson, Bruce @ 2017-04-05 9:48 UTC (permalink / raw)
To: Thomas Monjalon, Olivier Matz
Cc: dev, Ananyev, Konstantin, mb, Chilikin, Andrey, jblunck,
nelio.laranjeiro, arybchenko, jerin.jacob
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, April 5, 2017 10:38 AM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; mb@smartsharesystems.com;
> Chilikin, Andrey <andrey.chilikin@intel.com>; jblunck@infradead.org;
> nelio.laranjeiro@6wind.com; arybchenko@solarflare.com;
> jerin.jacob@caviumnetworks.com
> Subject: Re: [PATCH v2 0/8] mbuf: structure reorganization
>
> 2017-04-04 18:27, Olivier Matz:
> > Based on discussions done in [1] and in this thread, this patchset
> > reorganizes the mbuf.
> >
> > The main changes are:
> > - reorder structure to increase vector performance on some non-ia
> > platforms.
> > - add a 64bits timestamp field in the 1st cache line. This timestamp
> > is not normalized, i.e. no unit or time reference is enforced. A
> > library may be added to do this job in the future.
> > - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> > in the pool, avoiding the need of setting m->next (located in the
> > 2nd cache line) in the Rx path for mono-segment packets.
> > - change port and nb_segs to 16 bits
> > - move seqn in the 2nd cache line
>
> Applied, thanks for the long work
>
+1
> We need to add a patch to bump ABIVER and document the changes.
>
>
> > Things discussed but not done in the patchset:
> > - move refcnt and nb_segs to the 2nd cache line: many drivers sets
> > them in the Rx path, so it could introduce a performance regression,
> or
> > it would require to change all the drivers, which is not an easy task.
>
> If it is worth to move these fields in 2nd cache line, can we plan to
> rework drivers for not setting them in Rx?
Any drivers that are already setting these fields directly may get a perf bump by not doing so. However, I'm not sure there is a compelling need to move them down just yet. Let's try and avoid breaking the mbuf again for a few releases.
>
> > Once this patchset is pushed, the Rx path of drivers could be
> > optimized a bit, by removing writes to m->next, m->nb_segs and
> > m->refcnt. The patch 4/8 gives an idea of what could be done.
>
> Yes drivers patches are welcome :)
> Please target RC2 for these changes.
We indeed plan to do so!
/Bruce
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-05 9:37 ` [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization Thomas Monjalon
2017-04-05 9:46 ` Olivier MATZ
2017-04-05 9:48 ` Richardson, Bruce
@ 2017-04-05 12:06 ` Ferruh Yigit
2017-04-14 13:10 ` Ferruh Yigit
3 siblings, 0 replies; 155+ messages in thread
From: Ferruh Yigit @ 2017-04-05 12:06 UTC (permalink / raw)
To: nelio.laranjeiro, Adrien Mazarguil
Cc: Thomas Monjalon, Olivier Matz, dev, konstantin.ananyev,
bruce.richardson, mb, andrey.chilikin, jblunck, arybchenko,
jerin.jacob
On 4/5/2017 10:37 AM, Thomas Monjalon wrote:
> 2017-04-04 18:27, Olivier Matz:
>> Based on discussions done in [1] and in this thread, this patchset reorganizes
>> the mbuf.
>>
>> The main changes are:
>> - reorder structure to increase vector performance on some non-ia
>> platforms.
>> - add a 64bits timestamp field in the 1st cache line. This timestamp
>> is not normalized, i.e. no unit or time reference is enforced. A
>> library may be added to do this job in the future.
>> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
>> in the pool, avoiding the need of setting m->next (located in the
>> 2nd cache line) in the Rx path for mono-segment packets.
>> - change port and nb_segs to 16 bits
>> - move seqn in the 2nd cache line
>
> Applied, thanks for the long work
Hi Nelio, Adrien,
After this patch, mlx5 with debug enabled gives the following build error
[1] with gcc. I am not really sure about the reason for the error; can you
please check?
[1]
.../drivers/net/mlx5/mlx5_rxtx.c: In function ‘mlx5_rx_burst’:
.../drivers/net/mlx5/mlx5_rxtx.c:2082:17: error: ‘len’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
DATA_LEN(seg) = len;
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-05 9:37 ` [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization Thomas Monjalon
` (2 preceding siblings ...)
2017-04-05 12:06 ` Ferruh Yigit
@ 2017-04-14 13:10 ` Ferruh Yigit
2017-04-18 13:04 ` Olivier MATZ
3 siblings, 1 reply; 155+ messages in thread
From: Ferruh Yigit @ 2017-04-14 13:10 UTC (permalink / raw)
To: Thomas Monjalon, Olivier Matz
Cc: dev, konstantin.ananyev, bruce.richardson, mb, andrey.chilikin,
jblunck, nelio.laranjeiro, arybchenko, jerin.jacob
On 4/5/2017 10:37 AM, Thomas Monjalon wrote:
> 2017-04-04 18:27, Olivier Matz:
>> Based on discussions done in [1] and in this thread, this patchset reorganizes
>> the mbuf.
>>
>> The main changes are:
>> - reorder structure to increase vector performance on some non-ia
>> platforms.
>> - add a 64bits timestamp field in the 1st cache line. This timestamp
>> is not normalized, i.e. no unit or time reference is enforced. A
>> library may be added to do this job in the future.
>> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
>> in the pool, avoiding the need of setting m->next (located in the
>> 2nd cache line) in the Rx path for mono-segment packets.
>> - change port and nb_segs to 16 bits
>> - move seqn in the 2nd cache line
>
> Applied, thanks for the long work
>
<...>
>> Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
>> by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
>> idea of what could be done.
Hi Olivier,
Some driver patches have already been received for this update, but not all yet.
Can you please describe what changes are required in PMDs after this
patch? And what will be the effect of doing the changes or not?
Later we can circulate this information through the PMD maintainers to
be sure the proper updates are done.
Thanks,
ferruh
>
> Yes drivers patches are welcome :)
> Please target RC2 for these changes.
>
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-14 13:10 ` Ferruh Yigit
@ 2017-04-18 13:04 ` Olivier MATZ
2017-04-19 9:39 ` Thomas Monjalon
0 siblings, 1 reply; 155+ messages in thread
From: Olivier MATZ @ 2017-04-18 13:04 UTC (permalink / raw)
To: Ferruh Yigit
Cc: Thomas Monjalon, dev, konstantin.ananyev, bruce.richardson, mb,
andrey.chilikin, jblunck, nelio.laranjeiro, arybchenko,
jerin.jacob
Hi Ferruh,
On Fri, 14 Apr 2017 14:10:33 +0100, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 4/5/2017 10:37 AM, Thomas Monjalon wrote:
> > 2017-04-04 18:27, Olivier Matz:
> >> Based on discussions done in [1] and in this thread, this patchset reorganizes
> >> the mbuf.
> >>
> >> The main changes are:
> >> - reorder structure to increase vector performance on some non-ia
> >> platforms.
> >> - add a 64bits timestamp field in the 1st cache line. This timestamp
> >> is not normalized, i.e. no unit or time reference is enforced. A
> >> library may be added to do this job in the future.
> >> - m->next, m->nb_segs, and m->refcnt are always initialized for mbufs
> >> in the pool, avoiding the need of setting m->next (located in the
> >> 2nd cache line) in the Rx path for mono-segment packets.
> >> - change port and nb_segs to 16 bits
> >> - move seqn in the 2nd cache line
> >
> > Applied, thanks for the long work
> >
>
> <...>
>
> >> Once this patchset is pushed, the Rx path of drivers could be optimized a bit,
> >> by removing writes to m->next, m->nb_segs and m->refcnt. The patch 4/8 gives an
> >> idea of what could be done.
>
> Hi Olivier,
>
> Some driver patches already received for this update, but not all yet.
>
> Can you please describe what changes are required in PMDs after this
> patch? And what will be effect of doing changes or not?
Yes, I will do it.
> Later we can circulate this information through the PMD maintainers to
> be sure proper updates done.
That would be good.
Do you know what will be the procedure to inform the PMD maintainers?
Is there a specific mailing list?
Thanks,
Olivier
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-18 13:04 ` Olivier MATZ
@ 2017-04-19 9:39 ` Thomas Monjalon
2017-04-19 12:28 ` Olivier MATZ
0 siblings, 1 reply; 155+ messages in thread
From: Thomas Monjalon @ 2017-04-19 9:39 UTC (permalink / raw)
To: dev
18/04/2017 15:04, Olivier MATZ:
> On Fri, 14 Apr 2017 14:10:33 +0100, Ferruh Yigit <ferruh.yigit@intel.com>
wrote:
> > > 2017-04-04 18:27, Olivier Matz:
> > >> Once this patchset is pushed, the Rx path of drivers could be optimized
> > >> a bit, by removing writes to m->next, m->nb_segs and m->refcnt. The
> > >> patch 4/8 gives an idea of what could be done.
> >
> > Hi Olivier,
> >
> > Some driver patches already received for this update, but not all yet.
> >
> > Can you please describe what changes are required in PMDs after this
> > patch? And what will be effect of doing changes or not?
>
> Yes, I will do it.
>
> > Later we can circulate this information through the PMD maintainers to
> > be sure proper updates done.
>
> That would be good.
>
> Do you know what will be the procedure to inform the PMD maintainers?
> Is there a specific mailing list?
We should explain the required changes on dev@dpdk.org as it can be
interesting for a lot of people (not only current maintainers).
Then we just have to make sure that the PMDs are updated accordingly
in a good timeframe (1 or 2 releases).
If we feel someone missed an important message, we can ping him directly,
without dev@dpdk.org cc'ed, to make sure it pops up in his inbox.
The other communication channel to ping people is IRC freenode #dpdk.
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-19 9:39 ` Thomas Monjalon
@ 2017-04-19 12:28 ` Olivier MATZ
2017-04-19 12:56 ` Thomas Monjalon
0 siblings, 1 reply; 155+ messages in thread
From: Olivier MATZ @ 2017-04-19 12:28 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev
On Wed, 19 Apr 2017 11:39:01 +0200, Thomas Monjalon <thomas@monjalon.net> wrote:
> 18/04/2017 15:04, Olivier MATZ:
> > On Fri, 14 Apr 2017 14:10:33 +0100, Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
> > > > 2017-04-04 18:27, Olivier Matz:
> > > >> Once this patchset is pushed, the Rx path of drivers could be optimized
> > > >> a bit, by removing writes to m->next, m->nb_segs and m->refcnt. The
> > > >> patch 4/8 gives an idea of what could be done.
> > >
> > > Hi Olivier,
> > >
> > > Some driver patches already received for this update, but not all yet.
> > >
> > > Can you please describe what changes are required in PMDs after this
> > > patch? And what will be effect of doing changes or not?
> >
> > Yes, I will do it.
> >
> > > Later we can circulate this information through the PMD maintainers to
> > > be sure proper updates done.
> >
> > That would be good.
> >
> > Do you know what will be the procedure to inform the PMD maintainers?
> > Is there a specific mailing list?
>
> We should explain the required changes on dev@dpdk.org as it can be
> interesting for a lot of people (not only current maintainers).
I agree here.
> Then we just have to make sure that the PMDs are updated accordingly
> in a good timeframe (1 or 2 releases).
> If we feel someone miss an important message, we can ping him directly,
> without dev@dpdk.org cc'ed to make sure it pops up in his inbox.
> The other communication channel to ping people is IRC freenode #dpdk.
Who is the "we"? In that particular case, is it my job?
Shouldn't we notify the PMD maintainers more precisely?
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-19 12:28 ` Olivier MATZ
@ 2017-04-19 12:56 ` Thomas Monjalon
2017-04-19 13:03 ` Ferruh Yigit
0 siblings, 1 reply; 155+ messages in thread
From: Thomas Monjalon @ 2017-04-19 12:56 UTC (permalink / raw)
To: Olivier MATZ; +Cc: dev, ferruh.yigit
19/04/2017 14:28, Olivier MATZ:
> On Wed, 19 Apr 2017 11:39:01 +0200, Thomas Monjalon <thomas@monjalon.net>
wrote:
> > 18/04/2017 15:04, Olivier MATZ:
> > > On Fri, 14 Apr 2017 14:10:33 +0100, Ferruh Yigit
> > > <ferruh.yigit@intel.com>
> >
> > wrote:
> > > > > 2017-04-04 18:27, Olivier Matz:
> > > > >> Once this patchset is pushed, the Rx path of drivers could be
> > > > >> optimized
> > > > >> a bit, by removing writes to m->next, m->nb_segs and m->refcnt. The
> > > > >> patch 4/8 gives an idea of what could be done.
> > > >
> > > > Hi Olivier,
> > > >
> > > > Some driver patches already received for this update, but not all yet.
> > > >
> > > > Can you please describe what changes are required in PMDs after this
> > > > patch? And what will be effect of doing changes or not?
> > >
> > > Yes, I will do it.
> > >
> > > > Later we can circulate this information through the PMD maintainers to
> > > > be sure proper updates done.
> > >
> > > That would be good.
> > >
> > > Do you know what will be the procedure to inform the PMD maintainers?
> > > Is there a specific mailing list?
> >
> > We should explain the required changes on dev@dpdk.org as it can be
> > interesting for a lot of people (not only current maintainers).
>
> I agree here.
>
> > Then we just have to make sure that the PMDs are updated accordingly
> > in a good timeframe (1 or 2 releases).
> > If we feel someone miss an important message, we can ping him directly,
> > without dev@dpdk.org cc'ed to make sure it pops up in his inbox.
> > The other communication channel to ping people is IRC freenode #dpdk.
>
> Who is the "we"? In that particular case, is it my job?
> Shouldn't we notify the PMD maintainers more precisely?
We as a community :)
I think Ferruh will lead the follow-up of this rework,
as next-net maintainer.
^ permalink raw reply [flat|nested] 155+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-19 12:56 ` Thomas Monjalon
@ 2017-04-19 13:03 ` Ferruh Yigit
2017-04-19 13:12 ` Thomas Monjalon
0 siblings, 1 reply; 155+ messages in thread
From: Ferruh Yigit @ 2017-04-19 13:03 UTC (permalink / raw)
To: Thomas Monjalon, Olivier MATZ; +Cc: dev
On 4/19/2017 1:56 PM, Thomas Monjalon wrote:
> 19/04/2017 14:28, Olivier MATZ:
>> On Wed, 19 Apr 2017 11:39:01 +0200, Thomas Monjalon <thomas@monjalon.net>
> wrote:
>>> 18/04/2017 15:04, Olivier MATZ:
>>>> On Fri, 14 Apr 2017 14:10:33 +0100, Ferruh Yigit
>>>> <ferruh.yigit@intel.com>
>>>
>>> wrote:
>>>>>> 2017-04-04 18:27, Olivier Matz:
>>>>>>> Once this patchset is pushed, the Rx path of drivers could be
>>>>>>> optimized
>>>>>>> a bit, by removing writes to m->next, m->nb_segs and m->refcnt. The
>>>>>>> patch 4/8 gives an idea of what could be done.
>>>>>
>>>>> Hi Olivier,
>>>>>
>>>>> Some driver patches already received for this update, but not all yet.
>>>>>
>>>>> Can you please describe what changes are required in PMDs after this
>>>>> patch? And what will be effect of doing changes or not?
>>>>
>>>> Yes, I will do it.
>>>>
>>>>> Later we can circulate this information through the PMD maintainers to
>>>>> be sure proper updates done.
>>>>
>>>> That would be good.
>>>>
>>>> Do you know what will be the procedure to inform the PMD maintainers?
>>>> Is there a specific mailing list?
>>>
>>> We should explain the required changes on dev@dpdk.org as it can be
>>> interesting for a lot of people (not only current maintainers).
>>
>> I agree here.
>>
>>> Then we just have to make sure that the PMDs are updated accordingly
>>> in a good timeframe (1 or 2 releases).
>>> If we feel someone misses an important message, we can ping him directly,
>>> without dev@dpdk.org cc'ed to make sure it pops up in his inbox.
>>> The other communication channel to ping people is IRC freenode #dpdk.
>>
>> Who is the "we"? In that particular case, is it my job?
>> Shouldn't we notify the PMD maintainers more precisely?
>
> We as a community :)
> I think Ferruh will lead the follow-up of this rework,
> as next-net maintainer.
I can track the net PMDs.
Let's start on the dev mailing list and make sure it is clear what a PMD
maintainer should do. We can wait one release for the updates, and later I
can ping the missing ones individually. What do you think?
* Re: [dpdk-dev] [PATCH v2 0/8] mbuf: structure reorganization
2017-04-19 13:03 ` Ferruh Yigit
@ 2017-04-19 13:12 ` Thomas Monjalon
0 siblings, 0 replies; 155+ messages in thread
From: Thomas Monjalon @ 2017-04-19 13:12 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: Olivier MATZ, dev
19/04/2017 15:03, Ferruh Yigit:
> On 4/19/2017 1:56 PM, Thomas Monjalon wrote:
> > 19/04/2017 14:28, Olivier MATZ:
> >> On Wed, 19 Apr 2017 11:39:01 +0200, Thomas Monjalon <thomas@monjalon.net> wrote:
> >>> 18/04/2017 15:04, Olivier MATZ:
> >>>> On Fri, 14 Apr 2017 14:10:33 +0100, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >>>>>> 2017-04-04 18:27, Olivier Matz:
> >>>>>>> Once this patchset is pushed, the Rx path of drivers could be
> >>>>>>> optimized
> >>>>>>> a bit, by removing writes to m->next, m->nb_segs and m->refcnt. The
> >>>>>>> patch 4/8 gives an idea of what could be done.
> >>>>>
> >>>>> Hi Olivier,
> >>>>>
> >>>>> Some driver patches already received for this update, but not all yet.
> >>>>>
> >>>>> Can you please describe what changes are required in PMDs after this
> >>>>> patch? And what will be effect of doing changes or not?
> >>>>
> >>>> Yes, I will do it.
> >>>>
> >>>>> Later we can circulate this information through the PMD maintainers to
> >>>>> be sure proper updates done.
> >>>>
> >>>> That would be good.
> >>>>
> >>>> Do you know what will be the procedure to inform the PMD maintainers?
> >>>> Is there a specific mailing list?
> >>>
> >>> We should explain the required changes on dev@dpdk.org as it can be
> >>> interesting for a lot of people (not only current maintainers).
> >>
> >> I agree here.
> >>
> >>> Then we just have to make sure that the PMDs are updated accordingly
> >>> in a good timeframe (1 or 2 releases).
> > >>> If we feel someone misses an important message, we can ping him directly,
> >>> without dev@dpdk.org cc'ed to make sure it pops up in his inbox.
> >>> The other communication channel to ping people is IRC freenode #dpdk.
> >>
> >> Who is the "we"? In that particular case, is it my job?
> >> Shouldn't we notify the PMD maintainers more precisely?
> >
> > We as a community :)
> > I think Ferruh will lead the follow-up of this rework,
> > as next-net maintainer.
>
> I can track the net PMDs.
>
> Let's start on the dev mailing list and make sure it is clear what a PMD
> maintainer should do. We can wait one release for the updates, and later I
> can ping the missing ones individually. What do you think?
Sounds like a plan :)
Thank you Ferruh and Olivier