DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops
@ 2016-03-28 20:51 Robert Sanford
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer Robert Sanford
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Robert Sanford @ 2016-03-28 20:51 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

This patch series does the following:

* enhances port ring writer test, to send two large, but not full
  bursts; exposes ring writer buffer overflow
* fixes ring writer buffer overflow
* fixes full burst checks in ethdev, ring, and sched f_tx_bulk ops
* fixes ethdev writer, to send bursts no larger than specified max

--------

Robert Sanford (4):
  app/test: enhance test_port_ring_writer
  port: fix ring writer buffer overflow
  port: fix full burst checks in f_tx_bulk ops
  port: fix ethdev writer burst too big

 app/test/test_table_ports.c       |   27 +++++++++++++++++++++++++--
 lib/librte_port/rte_port_ethdev.c |   35 ++++++++++++++---------------------
 lib/librte_port/rte_port_ring.c   |   20 ++++++--------------
 lib/librte_port/rte_port_sched.c  |    7 ++-----
 4 files changed, 47 insertions(+), 42 deletions(-)


* [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer
  2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
@ 2016-03-28 20:51 ` Robert Sanford
  2016-04-01 19:42   ` Sanford, Robert
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 2/4] port: fix ring writer buffer overflow Robert Sanford
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Robert Sanford @ 2016-03-28 20:51 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

Add code to send two 60-packet bursts to a ring port_out.
This tests a ring writer buffer overflow problem and fix
(in patch 2/4).

Signed-off-by: Robert Sanford <rsanford@akamai.com>
---
 app/test/test_table_ports.c |   27 +++++++++++++++++++++++++--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/app/test/test_table_ports.c b/app/test/test_table_ports.c
index 2532367..0c0ec0a 100644
--- a/app/test/test_table_ports.c
+++ b/app/test/test_table_ports.c
@@ -149,8 +149,8 @@ test_port_ring_writer(void)
 
 	/* -- Traffic TX -- */
 	int expected_pkts, received_pkts;
-	struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
-	struct rte_mbuf *res_mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *res_mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
 
 	port_ring_writer_params.ring = RING_TX;
 	port_ring_writer_params.tx_burst_sz = RTE_PORT_IN_BURST_SIZE_MAX;
@@ -216,5 +216,28 @@ test_port_ring_writer(void)
 	for (i = 0; i < RTE_PORT_IN_BURST_SIZE_MAX; i++)
 		rte_pktmbuf_free(res_mbuf[i]);
 
+	/* TX Bulk - send two 60-packet bursts */
+	uint64_t pkt_mask = 0xfffffffffffffff0ULL;
+
+	for (i = 0; i < 4; i++)
+		mbuf[i] = NULL;
+	for (i = 4; i < 64; i++)
+		mbuf[i] = rte_pktmbuf_alloc(pool);
+	rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
+	for (i = 4; i < 64; i++)
+		mbuf[i] = rte_pktmbuf_alloc(pool);
+	rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
+	rte_port_ring_writer_ops.f_flush(port);
+
+	expected_pkts = 2 * 60;
+	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
+		(void **)res_mbuf, 2 * RTE_PORT_IN_BURST_SIZE_MAX);
+
+	if (received_pkts != expected_pkts)
+		return -10;
+
+	for (i = 0; i < received_pkts; i++)
+		rte_pktmbuf_free(res_mbuf[i]);
+
 	return 0;
 }
-- 
1.7.1


* [dpdk-dev] [PATCH 2/4] port: fix ring writer buffer overflow
  2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer Robert Sanford
@ 2016-03-28 20:51 ` Robert Sanford
  2016-03-31 11:21   ` Dumitrescu, Cristian
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops Robert Sanford
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Robert Sanford @ 2016-03-28 20:51 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

Ring writer tx_bulk functions may write past the end of tx_buf[].
Solution is to double the size of tx_buf[].

Signed-off-by: Robert Sanford <rsanford@akamai.com>
---
 lib/librte_port/rte_port_ring.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index b847fea..765ecc5 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -179,7 +179,7 @@ rte_port_ring_reader_stats_read(void *port,
 struct rte_port_ring_writer {
 	struct rte_port_out_stats stats;
 
-	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
 	struct rte_ring *ring;
 	uint32_t tx_burst_sz;
 	uint32_t tx_buf_count;
@@ -447,7 +447,7 @@ rte_port_ring_writer_stats_read(void *port,
 struct rte_port_ring_writer_nodrop {
 	struct rte_port_out_stats stats;
 
-	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
 	struct rte_ring *ring;
 	uint32_t tx_burst_sz;
 	uint32_t tx_buf_count;
-- 
1.7.1


* [dpdk-dev] [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops
  2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer Robert Sanford
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 2/4] port: fix ring writer buffer overflow Robert Sanford
@ 2016-03-28 20:51 ` Robert Sanford
  2016-03-31 15:41   ` Dumitrescu, Cristian
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Robert Sanford
  2016-03-30 11:00 ` [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Thomas Monjalon
  4 siblings, 1 reply; 14+ messages in thread
From: Robert Sanford @ 2016-03-28 20:51 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

For several f_tx_bulk functions in rte_port_{ethdev,ring,sched}.c,
it appears that the intent of the bsz_mask logic is to test whether
pkts_mask contains a full burst (i.e., the <tx_burst_sz> least
significant bits are set).

There are two problems with the bsz_mask code: 1) It truncates
by using the wrong size for local variable uint32_t bsz_mask, and
2) We may pass oversized bursts to the underlying ethdev/ring/sched,
e.g., tx_burst_sz=16, bsz_mask=0x8000, and pkts_mask=0x1ffff
(17 packets), results in expr==0, and we send a burst larger than
desired (and non-power-of-2) to the underlying tx burst interface.

We propose to effectively set bsz_mask = (1 << tx_burst_sz) - 1
(while avoiding truncation for tx_burst_sz=64), to cache the mask
value of a full burst, and then do a simple compare with pkts_mask
in each f_tx_bulk.
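
The two checks can be compared side by side. This is only a sketch with standalone helpers; `old_expr` and `full_burst_mask` are illustrative names, not librte_port functions, and the old check is shown with a 64-bit `bsz_mask` so that the truncation problem is left out.

```c
#include <stdint.h>

/* Old check as it appears in the removed lines of the diff,
 * with bsz_mask kept at its full 64-bit width. */
static inline uint64_t
old_expr(uint64_t pkts_mask, uint64_t bsz_mask)
{
	return (pkts_mask & (pkts_mask + 1)) |
		((pkts_mask & bsz_mask) ^ bsz_mask);
}

/* Proposed mask: the tx_burst_sz least significant bits set.
 * Assumes 1 <= tx_burst_sz <= 64; a shift by 64 would be undefined. */
static inline uint64_t
full_burst_mask(uint32_t tx_burst_sz)
{
	return UINT64_MAX >> (64 - tx_burst_sz);
}
```

With tx_burst_sz=16, `old_expr(0x1ffff, 0x8000)` evaluates to 0, so the 17-packet burst takes the fast path, whereas `0x1ffff == full_burst_mask(16)` is false because the new mask is 0xffff.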

Signed-off-by: Robert Sanford <rsanford@akamai.com>
---
 lib/librte_port/rte_port_ethdev.c |   15 ++++-----------
 lib/librte_port/rte_port_ring.c   |   16 ++++------------
 lib/librte_port/rte_port_sched.c  |    7 ++-----
 3 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c
index 1c34602..3fb4947 100644
--- a/lib/librte_port/rte_port_ethdev.c
+++ b/lib/librte_port/rte_port_ethdev.c
@@ -188,7 +188,7 @@ rte_port_ethdev_writer_create(void *params, int socket_id)
 	port->queue_id = conf->queue_id;
 	port->tx_burst_sz = conf->tx_burst_sz;
 	port->tx_buf_count = 0;
-	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
+	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
 
 	return port;
 }
@@ -229,12 +229,9 @@ rte_port_ethdev_writer_tx_bulk(void *port,
 {
 	struct rte_port_ethdev_writer *p =
 		(struct rte_port_ethdev_writer *) port;
-	uint32_t bsz_mask = p->bsz_mask;
 	uint32_t tx_buf_count = p->tx_buf_count;
-	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
-			((pkts_mask & bsz_mask) ^ bsz_mask);
 
-	if (expr == 0) {
+	if (pkts_mask == p->bsz_mask) {
 		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
 		uint32_t n_pkts_ok;
 
@@ -369,7 +366,7 @@ rte_port_ethdev_writer_nodrop_create(void *params, int socket_id)
 	port->queue_id = conf->queue_id;
 	port->tx_burst_sz = conf->tx_burst_sz;
 	port->tx_buf_count = 0;
-	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
+	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
 
 	/*
 	 * When n_retries is 0 it means that we should wait for every packet to
@@ -435,13 +432,9 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void *port,
 {
 	struct rte_port_ethdev_writer_nodrop *p =
 		(struct rte_port_ethdev_writer_nodrop *) port;
-
-	uint32_t bsz_mask = p->bsz_mask;
 	uint32_t tx_buf_count = p->tx_buf_count;
-	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
-			((pkts_mask & bsz_mask) ^ bsz_mask);
 
-	if (expr == 0) {
+	if (pkts_mask == p->bsz_mask) {
 		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
 		uint32_t n_pkts_ok;
 
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 765ecc5..b36e4ce 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -217,7 +217,7 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	port->ring = conf->ring;
 	port->tx_burst_sz = conf->tx_burst_sz;
 	port->tx_buf_count = 0;
-	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
+	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
 	port->is_multi = is_multi;
 
 	return port;
@@ -299,13 +299,9 @@ rte_port_ring_writer_tx_bulk_internal(void *port,
 {
 	struct rte_port_ring_writer *p =
 		(struct rte_port_ring_writer *) port;
-
-	uint32_t bsz_mask = p->bsz_mask;
 	uint32_t tx_buf_count = p->tx_buf_count;
-	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
-			((pkts_mask & bsz_mask) ^ bsz_mask);
 
-	if (expr == 0) {
+	if (pkts_mask == p->bsz_mask) {
 		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
 		uint32_t n_pkts_ok;
 
@@ -486,7 +482,7 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	port->ring = conf->ring;
 	port->tx_burst_sz = conf->tx_burst_sz;
 	port->tx_buf_count = 0;
-	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
+	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
 	port->is_multi = is_multi;
 
 	/*
@@ -613,13 +609,9 @@ rte_port_ring_writer_nodrop_tx_bulk_internal(void *port,
 {
 	struct rte_port_ring_writer_nodrop *p =
 		(struct rte_port_ring_writer_nodrop *) port;
-
-	uint32_t bsz_mask = p->bsz_mask;
 	uint32_t tx_buf_count = p->tx_buf_count;
-	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
-			((pkts_mask & bsz_mask) ^ bsz_mask);
 
-	if (expr == 0) {
+	if (pkts_mask == p->bsz_mask) {
 		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
 		uint32_t n_pkts_ok;
 
diff --git a/lib/librte_port/rte_port_sched.c b/lib/librte_port/rte_port_sched.c
index c5ff8ab..5b6afc4 100644
--- a/lib/librte_port/rte_port_sched.c
+++ b/lib/librte_port/rte_port_sched.c
@@ -185,7 +185,7 @@ rte_port_sched_writer_create(void *params, int socket_id)
 	port->sched = conf->sched;
 	port->tx_burst_sz = conf->tx_burst_sz;
 	port->tx_buf_count = 0;
-	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
+	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
 
 	return port;
 }
@@ -214,12 +214,9 @@ rte_port_sched_writer_tx_bulk(void *port,
 		uint64_t pkts_mask)
 {
 	struct rte_port_sched_writer *p = (struct rte_port_sched_writer *) port;
-	uint32_t bsz_mask = p->bsz_mask;
 	uint32_t tx_buf_count = p->tx_buf_count;
-	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
-			((pkts_mask & bsz_mask) ^ bsz_mask);
 
-	if (expr == 0) {
+	if (pkts_mask == p->bsz_mask) {
 		__rte_unused uint32_t nb_tx;
 		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
 
-- 
1.7.1


* [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
                   ` (2 preceding siblings ...)
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops Robert Sanford
@ 2016-03-28 20:51 ` Robert Sanford
  2016-03-31 13:22   ` Dumitrescu, Cristian
  2016-03-30 11:00 ` [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Thomas Monjalon
  4 siblings, 1 reply; 14+ messages in thread
From: Robert Sanford @ 2016-03-28 20:51 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
send bursts larger than tx_burst_sz to the underlying ethdev.
Some PMDs (e.g., ixgbe) may truncate this request to their maximum
burst size, resulting in unnecessary enqueuing failures or ethdev
writer retries.

We propose to fix this by moving the tx buffer flushing logic from
*after* the loop that puts all packets into the tx buffer, to *inside*
the loop, testing for a full burst when adding each packet.

Signed-off-by: Robert Sanford <rsanford@akamai.com>
---
 lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c
index 3fb4947..1283338 100644
--- a/lib/librte_port/rte_port_ethdev.c
+++ b/lib/librte_port/rte_port_ethdev.c
@@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void *port,
 struct rte_port_ethdev_writer {
 	struct rte_port_out_stats stats;
 
-	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
 	uint32_t tx_burst_sz;
 	uint16_t tx_buf_count;
 	uint64_t bsz_mask;
@@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
 			p->tx_buf[tx_buf_count++] = pkt;
 			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
 			pkts_mask &= ~pkt_mask;
-		}
 
-		p->tx_buf_count = tx_buf_count;
-		if (tx_buf_count >= p->tx_burst_sz)
-			send_burst(p);
+			p->tx_buf_count = tx_buf_count;
+			if (tx_buf_count >= p->tx_burst_sz)
+				send_burst(p);
+		}
 	}
 
 	return 0;
@@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void *port,
 struct rte_port_ethdev_writer_nodrop {
 	struct rte_port_out_stats stats;
 
-	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
 	uint32_t tx_burst_sz;
 	uint16_t tx_buf_count;
 	uint64_t bsz_mask;
@@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void *port,
 			p->tx_buf[tx_buf_count++] = pkt;
 			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
 			pkts_mask &= ~pkt_mask;
-		}
 
-		p->tx_buf_count = tx_buf_count;
-		if (tx_buf_count >= p->tx_burst_sz)
-			send_burst_nodrop(p);
+			p->tx_buf_count = tx_buf_count;
+			if (tx_buf_count >= p->tx_burst_sz)
+				send_burst_nodrop(p);
+		}
 	}
 
 	return 0;
-- 
1.7.1


* Re: [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops
  2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
                   ` (3 preceding siblings ...)
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Robert Sanford
@ 2016-03-30 11:00 ` Thomas Monjalon
  2016-03-30 11:58   ` Dumitrescu, Cristian
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Monjalon @ 2016-03-30 11:00 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev, Robert Sanford

2016-03-28 16:51, Robert Sanford:
> This patch series does the following:
> 
> * enhances port ring writer test, to send two large, but not full
>   bursts; exposes ring writer buffer overflow
> * fixes ring writer buffer overflow
> * fixes full burst checks in ethdev, ring, and sched f_tx_bulk ops
> * fixes ethdev writer, to send bursts no larger than specified max

Cristian, a fast review would be helpful to integrate these fixes
in 16.04.


* Re: [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops
  2016-03-30 11:00 ` [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Thomas Monjalon
@ 2016-03-30 11:58   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 14+ messages in thread
From: Dumitrescu, Cristian @ 2016-03-30 11:58 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Robert Sanford



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, March 30, 2016 12:00 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; Robert Sanford <rsanford2@gmail.com>
> Subject: Re: [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops
> 
> 2016-03-28 16:51, Robert Sanford:
> > This patch series does the following:
> >
> > * enhances port ring writer test, to send two large, but not full
> >   bursts; exposes ring writer buffer overflow
> > * fixes ring writer buffer overflow
> > * fixes full burst checks in ethdev, ring, and sched f_tx_bulk ops
> > * fixes ethdev writer, to send bursts no larger than specified max
> 
> Cristian, a fast review would be helpful to integrate these fixes
> in 16.04.

Please note Robert's patches are not trivial and the potential impact is quite high. We are working on reviewing them and will keep you updated on our progress.

These potential changes come very late in the 16.04 cycle. Given their nature, once they get merged, one RC cycle is required by the validation team to fully test them, so I want to make sure I flag this to you as early as possible.


* Re: [dpdk-dev] [PATCH 2/4] port: fix ring writer buffer overflow
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 2/4] port: fix ring writer buffer overflow Robert Sanford
@ 2016-03-31 11:21   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 14+ messages in thread
From: Dumitrescu, Cristian @ 2016-03-31 11:21 UTC (permalink / raw)
  To: Robert Sanford, dev



> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: [PATCH 2/4] port: fix ring writer buffer overflow
> 
> Ring writer tx_bulk functions may write past the end of tx_buf[].
> Solution is to double the size of tx_buf[].
> 
> Signed-off-by: Robert Sanford <rsanford@akamai.com>
> ---
>  lib/librte_port/rte_port_ring.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
> index b847fea..765ecc5 100644
> --- a/lib/librte_port/rte_port_ring.c
> +++ b/lib/librte_port/rte_port_ring.c
> @@ -179,7 +179,7 @@ rte_port_ring_reader_stats_read(void *port,
>  struct rte_port_ring_writer {
>  	struct rte_port_out_stats stats;
> 
> -	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>  	struct rte_ring *ring;
>  	uint32_t tx_burst_sz;
>  	uint32_t tx_buf_count;
> @@ -447,7 +447,7 @@ rte_port_ring_writer_stats_read(void *port,
>  struct rte_port_ring_writer_nodrop {
>  	struct rte_port_out_stats stats;
> 
> -	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>  	struct rte_ring *ring;
>  	uint32_t tx_burst_sz;
>  	uint32_t tx_buf_count;
> --
> 1.7.1

Hi Robert,

How is the buffer overflow taking place?

After looking long and hard, I spotted that buffer overflow can potentially take place when the following conditions are met:
1. The input packet burst does not meet the conditions of (a) being contiguous (first n bits set in pkts_mask, all the other bits cleared) and (b) containing a full burst, i.e. at least tx_burst_sz packets (n >= tx_burst_sz). This is the slow(er) code path taken when local variable expr != 0.
2. There are some packets already in the buffer.
3. The number of packets in the incoming burst (i.e. popcount(pkts_mask)) plus the number of packets already in the buffer exceeds the buffer size (RTE_PORT_IN_BURST_SIZE_MAX, i.e. 64).
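
The three conditions above reduce to a simple count. The sketch below is illustrative only: `BURST_SIZE_MAX` stands in for RTE_PORT_IN_BURST_SIZE_MAX (64), the helper names are made up, and it assumes the slow path stores every packet of the burst before it checks whether to flush.

```c
#include <stdint.h>

#define BURST_SIZE_MAX 64 /* stands in for RTE_PORT_IN_BURST_SIZE_MAX */

/* Number of tx_buf[] slots the slow path needs before it checks whether
 * to flush: packets already buffered plus packets in the new burst. */
static inline uint32_t
slots_needed(uint32_t tx_buf_count, uint64_t pkts_mask)
{
	return tx_buf_count + (uint32_t)__builtin_popcountll(pkts_mask);
}

/* Returns 1 when the slow path would write past a buffer of
 * BURST_SIZE_MAX entries, 0 otherwise. */
static inline int
would_overflow(uint32_t tx_buf_count, uint64_t pkts_mask)
{
	return slots_needed(tx_buf_count, pkts_mask) > BURST_SIZE_MAX;
}
```

For the test in patch 1/4 (tx_burst_sz of 64, pkts_mask of 0xfffffffffffffff0), the first burst leaves 60 packets buffered without a flush, and the second burst then needs 120 slots.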

Is this the buffer overflow scenario that you detected?

Thanks,
Cristian


* Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Robert Sanford
@ 2016-03-31 13:22   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 14+ messages in thread
From: Dumitrescu, Cristian @ 2016-03-31 13:22 UTC (permalink / raw)
  To: Robert Sanford, dev; +Cc: Liang, Cunming



> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
> 
> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
> send bursts larger than tx_burst_sz to the underlying ethdev.
> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
> burst size, resulting in unnecessary enqueuing failures or ethdev
> writer retries.

Sending bursts larger than tx_burst_sz is actually intentional. The assumption is that NIC performance benefits from larger burst size. So the tx_burst_sz is used as a minimal burst size requirement, not as a maximal or fixed burst size requirement.

I agree with you that a while ago the vector version of the IXGBE driver used to work the way you describe, but I don't think this is the case anymore. As an example, if the TX burst size is set to 32 and 48 packets are transmitted, then the PMD will TX all 48 packets (internally it can work in batches of 4, 8, 32, etc.; it should not matter) rather than TXing just 32 packets out of 48 and the user having to either discard or retry with the remaining 16 packets. I am CC-ing Steve Liang to confirm this.

Is there any PMD that people can name that currently behaves the opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and discard the other 16?

> 
> We propose to fix this by moving the tx buffer flushing logic from
> *after* the loop that puts all packets into the tx buffer, to *inside*
> the loop, testing for a full burst when adding each packet.
> 

The issue I have with this approach is the introduction of a branch that has to be tested for each iteration of the loop rather than once for the entire loop.

The code branch where you add this is actually the slow(er) code path (where local variable expr != 0), which is used for non-contiguous bursts or bursts smaller than tx_burst_sz. Is there a particular reason you are interested in enabling this strategy (of using tx_burst_sz as a fixed burst size requirement) only on this code path? The reason I am asking is that the other fast(er) code path (where expr == 0) also uses tx_burst_sz as a minimal requirement and therefore it can send burst sizes bigger than tx_burst_sz.


> Signed-off-by: Robert Sanford <rsanford@akamai.com>
> ---
>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_port/rte_port_ethdev.c
> b/lib/librte_port/rte_port_ethdev.c
> index 3fb4947..1283338 100644
> --- a/lib/librte_port/rte_port_ethdev.c
> +++ b/lib/librte_port/rte_port_ethdev.c
> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
> *port,
>  struct rte_port_ethdev_writer {
>  	struct rte_port_out_stats stats;
> 
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>  			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
> 
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst(p);
> +		}
>  	}

One observation here: if we enable this proposal (which I have an issue with due to executing the branch per loop iteration rather than once per entire loop), it also eliminates the buffer overflow issue flagged by you in the other email :), so no need to e.g. double the size of the port internal buffer (tx_buf).
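
To make the bound concrete, here is a small simulation of the flush-inside-the-loop variant. It is a sketch only: the function name is hypothetical, and it assumes that send_burst() always leaves the buffer empty and that tx_buf_count is below tx_burst_sz on entry.

```c
#include <stdint.h>

/* Walk pkts_mask one packet at a time and flush as soon as the buffer
 * reaches tx_burst_sz. Returns the largest tx_buf_count observed;
 * *n_flushes counts the simulated send_burst() calls. */
static inline uint32_t
peak_count_flush_inside(uint32_t tx_buf_count, uint32_t tx_burst_sz,
	uint64_t pkts_mask, uint32_t *n_flushes)
{
	uint32_t peak = tx_buf_count;

	*n_flushes = 0;
	while (pkts_mask != 0) {
		pkts_mask &= pkts_mask - 1;	/* consume the lowest set bit */
		tx_buf_count++;
		if (tx_buf_count > peak)
			peak = tx_buf_count;
		if (tx_buf_count >= tx_burst_sz) {
			(*n_flushes)++;		/* send_burst() would run here */
			tx_buf_count = 0;
		}
	}
	return peak;
}
```

Under those assumptions the peak never exceeds tx_burst_sz: starting with 60 buffered packets and adding a 60-packet burst at tx_burst_sz of 64 gives a peak of 64 and one flush, with 56 packets left in the buffer.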

> 
>  	return 0;
> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
> *port,
>  struct rte_port_ethdev_writer_nodrop {
>  	struct rte_port_out_stats stats;
> 
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
> *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>  			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
> 
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst_nodrop(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst_nodrop(p);
> +		}
>  	}
> 
>  	return 0;
> --
> 1.7.1


* Re: [dpdk-dev] [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops Robert Sanford
@ 2016-03-31 15:41   ` Dumitrescu, Cristian
  2016-04-01 19:31     ` Sanford, Robert
  0 siblings, 1 reply; 14+ messages in thread
From: Dumitrescu, Cristian @ 2016-03-31 15:41 UTC (permalink / raw)
  To: Robert Sanford, dev
  Cc: Venkatesan, Venky, Wiles, Keith, Liang, Cunming, Zhang, Helin,
	Richardson, Bruce, Ananyev, Konstantin, Olivier MATZ,
	adrien.mazarguil, Stephen Hemminger,
	Jerin Jacob (Cavium) (jerin.jacob@caviumnetworks.com)



> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops
> 
> For several f_tx_bulk functions in rte_port_{ethdev,ring,sched}.c,
> it appears that the intent of the bsz_mask logic is to test whether
> pkts_mask contains a full burst (i.e., the <tx_burst_sz> least
> significant bits are set).
> 
> There are two problems with the bsz_mask code: 1) It truncates
> by using the wrong size for local variable uint32_t bsz_mask, and

This is indeed a bug: although port->bsz_mask is always defined as uint64_t, there are several places where we cache it in a local variable which is defined as 32-bit by mistake: uint32_t bsz_mask = p->bsz_mask. Thanks, Robert!
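
The effect of the narrow local can be reproduced in isolation. The helpers below are illustrative names only; `old_bsz_mask` copies the formula from the removed line of the diff and assumes 1 <= tx_burst_sz <= 64.

```c
#include <stdint.h>

/* Old mask formula from the diff: only bit (tx_burst_sz - 1) is set. */
static inline uint64_t
old_bsz_mask(uint32_t tx_burst_sz)
{
	return 1LLU << (tx_burst_sz - 1);
}

/* Mimics the old local "uint32_t bsz_mask = p->bsz_mask;":
 * the upper 32 bits of the port mask are dropped. */
static inline uint32_t
cached_mask_32(uint64_t port_bsz_mask)
{
	return (uint32_t)port_bsz_mask;
}
```

For any tx_burst_sz above 32 the cached value is 0, so the term `(pkts_mask & bsz_mask) ^ bsz_mask` is always 0 and expr only tests that pkts_mask is contiguous.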

> 2) We may pass oversized bursts to the underlying ethdev/ring/sched,
> e.g., tx_burst_sz=16, bsz_mask=0x8000, and pkts_mask=0x1ffff
> (17 packets), results in expr==0, and we send a burst larger than
> desired (and non-power-of-2) to the underlying tx burst interface.
> 

As stated in another related email, this is done by design, with the key assumption being that larger TX burst sizes will always be beneficial. So tx_burst_sz is, by design, a requirement for the *minimal* value of the TX burst size rather than the *exact* value for the burst size.
As an example, when the TX burst size of 32 is set, then larger burst sizes of 33, 34, ..., 40, 41, ..., 48, ..., 64 are welcomed and sent out as a single burst rather than breaking it into multiple fixed size 32-packet bursts.
For PMDs, burst size (smaller than 64) is typically much lower than the TX ring size (typical value for IXGBE: 512). Same for rte_ring.

So what we are debating here is which of the following two approaches is better:
Approach 1: Consider tx_burst_sz as the minimal burst size, welcome larger bursts and send them as a single burst (i.e. do not break them into fixed tx_burst_sz bursts). This is the existing approach used consistently everywhere in librte_port.
Approach 2: Consider tx_burst_sz as an exact burst size requirement; any larger incoming burst is broken into fixed size bursts of exactly tx_burst_sz packets before being sent. This is the approach suggested by Robert.
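
The difference between the two approaches can be stated as a count of TX calls. This is a simplified model rather than the librte_port code: the function names are hypothetical, tx_burst_sz is assumed to be nonzero, and retries and drops are ignored.

```c
#include <stdint.h>

/* Approach 1: tx_burst_sz is a minimum. Once at least tx_burst_sz packets
 * are available they all go out in one TX call; otherwise nothing is sent. */
static inline uint32_t
tx_calls_min_burst(uint32_t n_pkts, uint32_t tx_burst_sz)
{
	return n_pkts >= tx_burst_sz ? 1 : 0;
}

/* Approach 2: tx_burst_sz is exact. Only whole bursts are sent; the
 * remainder stays buffered and is reported through *left. */
static inline uint32_t
tx_calls_exact_burst(uint32_t n_pkts, uint32_t tx_burst_sz, uint32_t *left)
{
	*left = n_pkts % tx_burst_sz;
	return n_pkts / tx_burst_sz;
}
```

For 48 packets and tx_burst_sz of 32, Approach 1 makes one call carrying 48 packets, while Approach 2 makes one call carrying 32 and keeps 16 buffered for later.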

I think we should go for the approach that gives the best performance. Personally, I think Approach 1 (existing) is doing this, but I would like to get more fact-based opinions from the people on this mail list (CC-ing a few key folks), especially PMD and ring maintainers. What is your experience, guys?

> We propose to effectively set bsz_mask = (1 << tx_burst_sz) - 1
> (while avoiding truncation for tx_burst_sz=64), to cache the mask
> value of a full burst, and then do a simple compare with pkts_mask
> in each f_tx_bulk.
> 
> Signed-off-by: Robert Sanford <rsanford@akamai.com>
> ---
>  lib/librte_port/rte_port_ethdev.c |   15 ++++-----------
>  lib/librte_port/rte_port_ring.c   |   16 ++++------------
>  lib/librte_port/rte_port_sched.c  |    7 ++-----
>  3 files changed, 10 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/librte_port/rte_port_ethdev.c
> b/lib/librte_port/rte_port_ethdev.c
> index 1c34602..3fb4947 100644
> --- a/lib/librte_port/rte_port_ethdev.c
> +++ b/lib/librte_port/rte_port_ethdev.c
> @@ -188,7 +188,7 @@ rte_port_ethdev_writer_create(void *params, int
> socket_id)
>  	port->queue_id = conf->queue_id;
>  	port->tx_burst_sz = conf->tx_burst_sz;
>  	port->tx_buf_count = 0;
> -	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
> +	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);

Another way to write this is: port->bsz_mask = RTE_LEN2MASK(conf->tx_burst_sz, uint64_t);
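
For readers without the DPDK headers at hand, the equivalence can be checked with a local stand-in. `LEN2MASK` below is an assumption modeled on how RTE_LEN2MASK from rte_common.h is understood to behave, not the DPDK definition itself, and `shift_mask` is an illustrative name for the expression in the patch.

```c
#include <limits.h>
#include <stdint.h>

/* Local stand-in modeled on RTE_LEN2MASK(ln, tp): a value of type tp
 * with the ln least significant bits set. Valid for 1 <= ln <= 64. */
#define LEN2MASK(ln, tp) \
	((tp)((uint64_t)-1 >> (sizeof(uint64_t) * CHAR_BIT - (ln))))

/* The expression used in the patch, for comparison. */
static inline uint64_t
shift_mask(uint32_t tx_burst_sz)
{
	return UINT64_MAX >> (64 - tx_burst_sz);
}
```

Both forms shift an all-ones 64-bit value right by 64 minus the length, so neither is defined for a length of 0.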

> 
>  	return port;
>  }
> @@ -229,12 +229,9 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>  {
>  	struct rte_port_ethdev_writer *p =
>  		(struct rte_port_ethdev_writer *) port;
> -	uint32_t bsz_mask = p->bsz_mask;
>  	uint32_t tx_buf_count = p->tx_buf_count;
> -	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
> -			((pkts_mask & bsz_mask) ^ bsz_mask);
> 
> -	if (expr == 0) {
> +	if (pkts_mask == p->bsz_mask) {
>  		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
>  		uint32_t n_pkts_ok;
> 
> @@ -369,7 +366,7 @@ rte_port_ethdev_writer_nodrop_create(void
> *params, int socket_id)
>  	port->queue_id = conf->queue_id;
>  	port->tx_burst_sz = conf->tx_burst_sz;
>  	port->tx_buf_count = 0;
> -	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
> +	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
> 
>  	/*
>  	 * When n_retries is 0 it means that we should wait for every packet
> to
> @@ -435,13 +432,9 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
> *port,
>  {
>  	struct rte_port_ethdev_writer_nodrop *p =
>  		(struct rte_port_ethdev_writer_nodrop *) port;
> -
> -	uint32_t bsz_mask = p->bsz_mask;
>  	uint32_t tx_buf_count = p->tx_buf_count;
> -	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
> -			((pkts_mask & bsz_mask) ^ bsz_mask);
> 
> -	if (expr == 0) {
> +	if (pkts_mask == p->bsz_mask) {
>  		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
>  		uint32_t n_pkts_ok;
> 
> diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
> index 765ecc5..b36e4ce 100644
> --- a/lib/librte_port/rte_port_ring.c
> +++ b/lib/librte_port/rte_port_ring.c
> @@ -217,7 +217,7 @@ rte_port_ring_writer_create_internal(void *params,
> int socket_id,
>  	port->ring = conf->ring;
>  	port->tx_burst_sz = conf->tx_burst_sz;
>  	port->tx_buf_count = 0;
> -	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
> +	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
>  	port->is_multi = is_multi;
> 
>  	return port;
> @@ -299,13 +299,9 @@ rte_port_ring_writer_tx_bulk_internal(void *port,
>  {
>  	struct rte_port_ring_writer *p =
>  		(struct rte_port_ring_writer *) port;
> -
> -	uint32_t bsz_mask = p->bsz_mask;
>  	uint32_t tx_buf_count = p->tx_buf_count;
> -	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
> -			((pkts_mask & bsz_mask) ^ bsz_mask);
> 
> -	if (expr == 0) {
> +	if (pkts_mask == p->bsz_mask) {
>  		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
>  		uint32_t n_pkts_ok;
> 
> @@ -486,7 +482,7 @@ rte_port_ring_writer_nodrop_create_internal(void
> *params, int socket_id,
>  	port->ring = conf->ring;
>  	port->tx_burst_sz = conf->tx_burst_sz;
>  	port->tx_buf_count = 0;
> -	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
> +	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
>  	port->is_multi = is_multi;
> 
>  	/*
> @@ -613,13 +609,9 @@ rte_port_ring_writer_nodrop_tx_bulk_internal(void
> *port,
>  {
>  	struct rte_port_ring_writer_nodrop *p =
>  		(struct rte_port_ring_writer_nodrop *) port;
> -
> -	uint32_t bsz_mask = p->bsz_mask;
>  	uint32_t tx_buf_count = p->tx_buf_count;
> -	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
> -			((pkts_mask & bsz_mask) ^ bsz_mask);
> 
> -	if (expr == 0) {
> +	if (pkts_mask == p->bsz_mask) {
>  		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
>  		uint32_t n_pkts_ok;
> 
> diff --git a/lib/librte_port/rte_port_sched.c
> b/lib/librte_port/rte_port_sched.c
> index c5ff8ab..5b6afc4 100644
> --- a/lib/librte_port/rte_port_sched.c
> +++ b/lib/librte_port/rte_port_sched.c
> @@ -185,7 +185,7 @@ rte_port_sched_writer_create(void *params, int
> socket_id)
>  	port->sched = conf->sched;
>  	port->tx_burst_sz = conf->tx_burst_sz;
>  	port->tx_buf_count = 0;
> -	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
> +	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
> 
>  	return port;
>  }
> @@ -214,12 +214,9 @@ rte_port_sched_writer_tx_bulk(void *port,
>  		uint64_t pkts_mask)
>  {
>  	struct rte_port_sched_writer *p = (struct rte_port_sched_writer *)
> port;
> -	uint32_t bsz_mask = p->bsz_mask;
>  	uint32_t tx_buf_count = p->tx_buf_count;
> -	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
> -			((pkts_mask & bsz_mask) ^ bsz_mask);
> 
> -	if (expr == 0) {
> +	if (pkts_mask == p->bsz_mask) {
>  		__rte_unused uint32_t nb_tx;
>  		uint64_t n_pkts = __builtin_popcountll(pkts_mask);
> 
> --
> 1.7.1


* Re: [dpdk-dev] [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops
  2016-03-31 15:41   ` Dumitrescu, Cristian
@ 2016-04-01 19:31     ` Sanford, Robert
  0 siblings, 0 replies; 14+ messages in thread
From: Sanford, Robert @ 2016-04-01 19:31 UTC (permalink / raw)
  To: Dumitrescu, Cristian, dev
  Cc: Venkatesan, Venky, Wiles, Keith, Liang, Cunming, Zhang, Helin,
	Richardson, Bruce, Ananyev, Konstantin, Olivier MATZ,
	adrien.mazarguil, Stephen Hemminger,
	Jerin Jacob (Cavium) (jerin.jacob@caviumnetworks.com)

Hi Cristian,

In hindsight, I was overly aggressive in proposing the same change
(approach #2, as you call it below) for rte_port_ring and rte_port_sched.
Changing local variable bsz_mask to uint64_t should be sufficient.

Please see additional comments inline below.



On 3/31/16 11:41 AM, "Dumitrescu, Cristian"
<cristian.dumitrescu@intel.com> wrote:
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2@gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Subject: [PATCH 3/4] port: fix full burst checks in f_tx_bulk ops
>> 
>> For several f_tx_bulk functions in rte_port_{ethdev,ring,sched}.c,
>> it appears that the intent of the bsz_mask logic is to test whether
>> pkts_mask contains a full burst (i.e., the <tx_burst_sz> least
>> significant bits are set).
>> 
>> There are two problems with the bsz_mask code: 1) It truncates
>> by using the wrong size for local variable uint32_t bsz_mask, and
>
>This is indeed a bug: although port->bsz_mask is always defined as
>uint64_t, there are several places where we cache it to a local variable
>which is defined as 32-bit by mistake: uint32_t bsz_mask = p->bsz_mask.
>Thanks, Robert!
>
>> 2) We may pass oversized bursts to the underlying ethdev/ring/sched,
>> e.g., tx_burst_sz=16, bsz_mask=0x8000, and pkts_mask=0x1ffff
>> (17 packets), results in expr==0, and we send a burst larger than
>> desired (and non-power-of-2) to the underlying tx burst interface.
>> 
>
>As stated in another related email, this is done by design, with the key
>assumption being that larger TX burst sizes will always be beneficial. So
>tx_burst_sz is, by design, a requirement for the *minimal* value of the
>TX burst size rather than the *exact* value for the burst size.
>As an example, when the TX burst size of 32 is set, then larger burst
>sizes of 33, 34, ..., 40, 41, ..., 48, ..., 64 are welcomed and sent out
>as a single burst rather than breaking it into multiple fixed size
>32-packet bursts.
>For PMDs, burst size (smaller than 64) is typically much lower than the
>TX ring size (typical value for IXGBE: 512). Same for rte_ring.
>
>So what we are debating here is which of the following two approaches is
>better:
>Approach 1: Consider tx_burst_sz as the minimal burst size, welcome
>larger bursts and send them as a single burst (i.e. do not break them
>into fixed tx_burst_sz bursts). This is the existing approach used
>consistently everywhere in librte_port.
>Approach 2: Consider tx_burst_sz as an exact burst size requirement, any
>larger incoming burst is broken into fixed size bursts of exactly
>tx_burst_sz packets before send. This is the approach suggested by Robert.
>
>I think we should go for the approach that gives the best performance.
>Personally, I think Approach 1 (existing) is doing this, but I would like
>to get more fact-based opinions from the people on this mail list (CC-ing
>a few key folks), especially PMD and ring maintainers. What is your
>experience, guys?


I only advocate approach #2 (for rte_port_ethdev) if we can't be certain
of the semantics of the underlying rte_eth_tx_burst(), e.g., if we attempt
to send 33 packets and it never enqueues more than 32.

Off topic, general points about PMD APIs:
* From the viewpoint of an API user, having rte_eth_{rx,tx}_burst()
unexpectedly truncate a burst request is an unwelcome surprise, and should
at least be well-documented.
* We previously encountered this problem on the RX-side of IXGBE: We asked
rte_eth_rx_burst() for 64 packets, but it never returned more than 32,
even when there were hundreds "done". This was a little unexpected.

* From a quick look at the latest code, it appears that
ixgbe_xmit_pkts_vec() and ixgbe_recv_pkts_vec() still do this. Yes, these
are the vector versions, but do we need to study the drivers of every
device that we might use, when deciding things such as burst sizes? :)
* One simple enhancement idea: ethdev info-get API could convey related
info, e.g., whether rx/tx APIs truncate bursts, if so, how big, etc.
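
To illustrate the caller-side cost of a burst API that caps each request,
here is a minimal standalone sketch. It does not call DPDK:
fake_tx_burst(), FAKE_PMD_MAX_BURST, and calls_needed() are hypothetical
names invented for this example, and the cap of 32 is only the figure
mentioned above, not a statement about any particular driver.

```c
#include <stdint.h>

#define FAKE_PMD_MAX_BURST 32 /* hypothetical per-call cap, for illustration */

/* Hypothetical stand-in for a tx burst call that silently caps a request. */
static uint16_t fake_tx_burst(uint16_t n_pkts)
{
	return n_pkts < FAKE_PMD_MAX_BURST ? n_pkts : FAKE_PMD_MAX_BURST;
}

/*
 * Count how many calls a sender needs to hand over n_pkts packets when
 * each call may accept fewer packets than requested.
 */
static uint32_t calls_needed(uint16_t n_pkts)
{
	uint16_t sent = 0;
	uint32_t n_calls = 0;

	while (sent < n_pkts) {
		uint16_t ok = fake_tx_burst((uint16_t)(n_pkts - sent));

		n_calls++;
		if (ok == 0)
			break; /* nothing accepted: stop instead of spinning */
		sent = (uint16_t)(sent + ok);
	}
	return n_calls;
}
```

With this model, a 33-packet request needs two calls, the second one
carrying a single packet. A caller that assumes one call is enough would
have to drop or re-buffer the remainder.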


>
>> We propose to effectively set bsz_mask = (1 << tx_burst_sz) - 1
>> (while avoiding truncation for tx_burst_sz=64), to cache the mask
>> value of a full burst, and then do a simple compare with pkts_mask
>> in each f_tx_bulk.
>> 
>> Signed-off-by: Robert Sanford <rsanford@akamai.com>
>> ---
>>  lib/librte_port/rte_port_ethdev.c |   15 ++++-----------
>>  lib/librte_port/rte_port_ring.c   |   16 ++++------------
>>  lib/librte_port/rte_port_sched.c  |    7 ++-----
>>  3 files changed, 10 insertions(+), 28 deletions(-)
>> 
>> diff --git a/lib/librte_port/rte_port_ethdev.c
>> b/lib/librte_port/rte_port_ethdev.c
>> index 1c34602..3fb4947 100644
>> --- a/lib/librte_port/rte_port_ethdev.c
>> +++ b/lib/librte_port/rte_port_ethdev.c
>> @@ -188,7 +188,7 @@ rte_port_ethdev_writer_create(void *params, int
>> socket_id)
>>  	port->queue_id = conf->queue_id;
>>  	port->tx_burst_sz = conf->tx_burst_sz;
>>  	port->tx_buf_count = 0;
>> -	port->bsz_mask = 1LLU << (conf->tx_burst_sz - 1);
>> +	port->bsz_mask = UINT64_MAX >> (64 - conf->tx_burst_sz);
>
>Another way to write this is: port->bsz_mask =
>RTE_LEN2MASK(conf->tx_burst_sz, uint64_t);

True, didn't know about that macro.
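
For readers following the thread, the two checks can be compared with a
minimal standalone sketch. It is not code from the patch: the helper names
old_bsz_mask(), new_bsz_mask(), old_expr(), and new_is_full_burst() are
made up for this example, and only the expressions inside them are taken
from the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Old cached value: only bit (tx_burst_sz - 1) is set. */
static uint64_t old_bsz_mask(uint32_t tx_burst_sz)
{
	assert(tx_burst_sz >= 1 && tx_burst_sz <= 64);
	return 1ULL << (tx_burst_sz - 1);
}

/* Proposed cached value: the tx_burst_sz least significant bits are set. */
static uint64_t new_bsz_mask(uint32_t tx_burst_sz)
{
	assert(tx_burst_sz >= 1 && tx_burst_sz <= 64);
	return UINT64_MAX >> (64 - tx_burst_sz);
}

/*
 * Old fast-path test. The result is zero when pkts_mask is contiguous
 * starting at bit 0 and also covers bit (tx_burst_sz - 1), so it accepts
 * bursts of tx_burst_sz packets or more.
 */
static uint64_t old_expr(uint64_t pkts_mask, uint64_t bsz_mask)
{
	return (pkts_mask & (pkts_mask + 1)) |
		((pkts_mask & bsz_mask) ^ bsz_mask);
}

/* Proposed fast-path test: exactly tx_burst_sz packets, no more. */
static int new_is_full_burst(uint64_t pkts_mask, uint64_t bsz_mask)
{
	return pkts_mask == bsz_mask;
}
```

With tx_burst_sz=16, old_expr(0x1ffff, 0x8000) is 0, so the 17-packet
burst takes the fast path, while the proposed compare accepts only 0xffff.
The truncation problem is separate: for tx_burst_sz=40, storing
old_bsz_mask(40) in a uint32_t yields 0, and old_expr() then returns 0 for
any contiguous mask, even a single packet.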

--
Thanks,
Robert


* Re: [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer Robert Sanford
@ 2016-04-01 19:42   ` Sanford, Robert
  2016-04-06 16:46     ` Dumitrescu, Cristian
  0 siblings, 1 reply; 14+ messages in thread
From: Sanford, Robert @ 2016-04-01 19:42 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

We don't need to change this line, because we never access more than
RTE_PORT_IN_BURST_SIZE_MAX (64) elements in this array:

-	struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];


--
Robert

>Add code to send two 60-packet bursts to a ring port_out.
>This tests a ring writer buffer overflow problem and fix
>(in patch 2/4).
>
>Signed-off-by: Robert Sanford <rsanford@akamai.com>
>---
> app/test/test_table_ports.c |   27 +++++++++++++++++++++++++--
> 1 files changed, 25 insertions(+), 2 deletions(-)
>
>diff --git a/app/test/test_table_ports.c b/app/test/test_table_ports.c
>index 2532367..0c0ec0a 100644
>--- a/app/test/test_table_ports.c
>+++ b/app/test/test_table_ports.c
>@@ -149,8 +149,8 @@ test_port_ring_writer(void)
> 
> 	/* -- Traffic TX -- */
> 	int expected_pkts, received_pkts;
>-	struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
>-	struct rte_mbuf *res_mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
>+	struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>+	struct rte_mbuf *res_mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> 
> 	port_ring_writer_params.ring = RING_TX;
> 	port_ring_writer_params.tx_burst_sz = RTE_PORT_IN_BURST_SIZE_MAX;
>@@ -216,5 +216,28 @@ test_port_ring_writer(void)
> 	for (i = 0; i < RTE_PORT_IN_BURST_SIZE_MAX; i++)
> 		rte_pktmbuf_free(res_mbuf[i]);
> 
>+	/* TX Bulk - send two 60-packet bursts */
>+	uint64_t pkt_mask = 0xfffffffffffffff0ULL;
>+
>+	for (i = 0; i < 4; i++)
>+		mbuf[i] = NULL;
>+	for (i = 4; i < 64; i++)
>+		mbuf[i] = rte_pktmbuf_alloc(pool);
>+	rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
>+	for (i = 4; i < 64; i++)
>+		mbuf[i] = rte_pktmbuf_alloc(pool);
>+	rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
>+	rte_port_ring_writer_ops.f_flush(port);
>+
>+	expected_pkts = 2 * 60;
>+	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
>+		(void **)res_mbuf, 2 * RTE_PORT_IN_BURST_SIZE_MAX);
>+
>+	if (received_pkts != expected_pkts)
>+		return -10;
>+
>+	for (i = 0; i < received_pkts; i++)
>+		rte_pktmbuf_free(res_mbuf[i]);
>+
> 	return 0;
> }
>-- 
>1.7.1


* Re: [dpdk-dev] [PATCH 1/4] app/test: enhance test_port_ring_writer
  2016-04-01 19:42   ` Sanford, Robert
@ 2016-04-06 16:46     ` Dumitrescu, Cristian
  0 siblings, 0 replies; 14+ messages in thread
From: Dumitrescu, Cristian @ 2016-04-06 16:46 UTC (permalink / raw)
  To: Sanford, Robert, dev

Hi Robert,

Sorry for the delay; I am traveling this week. I will reply as soon as I find a slot to focus on this, hopefully in the next couple of days. Thanks for your patience.

Regards,
Cristian

> -----Original Message-----
> From: Sanford, Robert [mailto:rsanford@akamai.com]
> Sent: Friday, April 1, 2016 12:43 PM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] app/test: enhance
> test_port_ring_writer
> 
> We don't need to change this line, because we never access more than
> RTE_PORT_IN_BURST_SIZE_MAX (64) elements in this array:
> 
> -	struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> 
> 
> --
> Robert
> 
> >Add code to send two 60-packet bursts to a ring port_out.
> >This tests a ring writer buffer overflow problem and fix
> >(in patch 2/4).
> >
> >Signed-off-by: Robert Sanford <rsanford@akamai.com>
> >---
> > app/test/test_table_ports.c |   27 +++++++++++++++++++++++++--
> > 1 files changed, 25 insertions(+), 2 deletions(-)
> >
> >diff --git a/app/test/test_table_ports.c b/app/test/test_table_ports.c
> >index 2532367..0c0ec0a 100644
> >--- a/app/test/test_table_ports.c
> >+++ b/app/test/test_table_ports.c
> >@@ -149,8 +149,8 @@ test_port_ring_writer(void)
> >
> > 	/* -- Traffic TX -- */
> > 	int expected_pkts, received_pkts;
> >-	struct rte_mbuf *mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
> >-	struct rte_mbuf *res_mbuf[RTE_PORT_IN_BURST_SIZE_MAX];
> >+	struct rte_mbuf *mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> >+	struct rte_mbuf *res_mbuf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> >
> > 	port_ring_writer_params.ring = RING_TX;
> > 	port_ring_writer_params.tx_burst_sz =
> RTE_PORT_IN_BURST_SIZE_MAX;
> >@@ -216,5 +216,28 @@ test_port_ring_writer(void)
> > 	for (i = 0; i < RTE_PORT_IN_BURST_SIZE_MAX; i++)
> > 		rte_pktmbuf_free(res_mbuf[i]);
> >
> >+	/* TX Bulk - send two 60-packet bursts */
> >+	uint64_t pkt_mask = 0xfffffffffffffff0ULL;
> >+
> >+	for (i = 0; i < 4; i++)
> >+		mbuf[i] = NULL;
> >+	for (i = 4; i < 64; i++)
> >+		mbuf[i] = rte_pktmbuf_alloc(pool);
> >+	rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
> >+	for (i = 4; i < 64; i++)
> >+		mbuf[i] = rte_pktmbuf_alloc(pool);
> >+	rte_port_ring_writer_ops.f_tx_bulk(port, mbuf, pkt_mask);
> >+	rte_port_ring_writer_ops.f_flush(port);
> >+
> >+	expected_pkts = 2 * 60;
> >+	received_pkts =
> rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
> >+		(void **)res_mbuf, 2 * RTE_PORT_IN_BURST_SIZE_MAX);
> >+
> >+	if (received_pkts != expected_pkts)
> >+		return -10;
> >+
> >+	for (i = 0; i < received_pkts; i++)
> >+		rte_pktmbuf_free(res_mbuf[i]);
> >+
> > 	return 0;
> > }
> >--
> >1.7.1
> 
> 


* Re: [dpdk-dev] [PATCH 2/4] port: fix ring writer buffer overflow
@ 2016-04-01 14:58 Sanford, Robert
  0 siblings, 0 replies; 14+ messages in thread
From: Sanford, Robert @ 2016-04-01 14:58 UTC (permalink / raw)
  To: Dumitrescu, Cristian, dev



>
>
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2@gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Subject: [PATCH 2/4] port: fix ring writer buffer overflow
>> 
>> Ring writer tx_bulk functions may write past the end of tx_buf[].
>> Solution is to double the size of tx_buf[].
>> 
>> Signed-off-by: Robert Sanford <rsanford@akamai.com>
>> ---
>>  lib/librte_port/rte_port_ring.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/lib/librte_port/rte_port_ring.c
>>b/lib/librte_port/rte_port_ring.c
>> index b847fea..765ecc5 100644
>> --- a/lib/librte_port/rte_port_ring.c
>> +++ b/lib/librte_port/rte_port_ring.c
>> @@ -179,7 +179,7 @@ rte_port_ring_reader_stats_read(void *port,
>>  struct rte_port_ring_writer {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>>  	struct rte_ring *ring;
>>  	uint32_t tx_burst_sz;
>>  	uint32_t tx_buf_count;
>> @@ -447,7 +447,7 @@ rte_port_ring_writer_stats_read(void *port,
>>  struct rte_port_ring_writer_nodrop {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>>  	struct rte_ring *ring;
>>  	uint32_t tx_burst_sz;
>>  	uint32_t tx_buf_count;
>> --
>> 1.7.1
>
>Hi Robert,
>
>How is the buffer overflow taking place?
>
>After looking long and hard, I spotted that buffer overflow can
>potentially take place when the following conditions are met:
>1. The input packet burst does not meet the conditions of (a) being
>contiguous (first n bits set in pkts_mask, all the other bits cleared)
>and (b) containing a full burst, i.e. at least tx_burst_sz packets (n >=
>tx_burst_sz). This is the slow(er) code path taken when local variable
>expr != 0.
>2. There are some packets already in the buffer.
>3. The number of packets in the incoming burst (i.e. popcount(pkts_mask))
>plus the number of packets already in the buffer exceeds the buffer size
>(RTE_PORT_IN_BURST_SIZE_MAX, i.e. 64).
>
>Is this the buffer overflow scenario that you detected?
>
>Thanks,
>Cristian
>

Hi Cristian,

Thanks for looking at the patches.
Yes, the buffer overflow occurs in the scenario you described. The
additional testing steps in patch 1/4 expose the overflow. The first time
that I run the table_autotest, it fails. The second time, the process
crashes.
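
The arithmetic behind that scenario can be sketched as a small standalone
model. This is an illustration only, not the librte_port code: it assumes
the slow path appends every packet selected by pkts_mask to tx_buf[] and
flushes only afterwards, once tx_burst_sz packets are buffered.
BURST_SIZE_MAX stands in for RTE_PORT_IN_BURST_SIZE_MAX, and both function
names are hypothetical.

```c
#include <stdint.h>

#define BURST_SIZE_MAX 64 /* stands in for RTE_PORT_IN_BURST_SIZE_MAX */

/*
 * Model one slow-path call: append one buffer entry per set bit in
 * pkts_mask, then flush if at least tx_burst_sz packets are buffered.
 * Returns the highest tx_buf[] index written, or -1 if none was written.
 */
static int slow_path_peak_index(uint32_t *tx_buf_count, uint32_t tx_burst_sz,
	uint64_t pkts_mask)
{
	uint32_t count = *tx_buf_count;
	int peak = -1;

	while (pkts_mask != 0) {
		pkts_mask &= pkts_mask - 1; /* consume the lowest set bit */
		peak = (int)count;          /* index this packet is stored at */
		count++;
	}
	if (count >= tx_burst_sz)
		count = 0; /* buffer handed to the ring */
	*tx_buf_count = count;
	return peak;
}

/* Send the same mask twice and report the highest index ever written. */
static int two_burst_peak_index(uint32_t tx_burst_sz, uint64_t pkts_mask)
{
	uint32_t tx_buf_count = 0;
	int first = slow_path_peak_index(&tx_buf_count, tx_burst_sz, pkts_mask);
	int second = slow_path_peak_index(&tx_buf_count, tx_burst_sz, pkts_mask);

	return first > second ? first : second;
}
```

For the test in patch 1/4 (tx_burst_sz=64, pkts_mask=0xfffffffffffffff0),
the first call leaves 60 packets buffered and the second writes up to index
119, which is past a 64-entry array but inside one of 2 * 64 entries.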

--
Robert

