DPDK patches and discussions
* Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
@ 2016-04-01 16:22 Sanford, Robert
  2016-04-11 19:21 ` Dumitrescu, Cristian
  0 siblings, 1 reply; 6+ messages in thread
From: Sanford, Robert @ 2016-04-01 16:22 UTC (permalink / raw)
  To: Dumitrescu, Cristian, dev; +Cc: Liang, Cunming

Hi Cristian,

Please see my comments inline.

>
>
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2@gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>> 
>> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
>> send bursts larger than tx_burst_sz to the underlying ethdev.
>> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
>> burst size, resulting in unnecessary enqueuing failures or ethdev
>> writer retries.
>
>Sending bursts larger than tx_burst_sz is actually intentional. The
>assumption is that NIC performance benefits from larger burst size. So
>the tx_burst_sz is used as a minimal burst size requirement, not as a
>maximal or fixed burst size requirement.
>
>I agree with you that a while ago the vector version of IXGBE driver used
>to work the way you describe it, but I don't think this is the case
>anymore. As an example, if TX burst size is set to 32 and 48 packets are
>transmitted, then the PMD will TX all the 48 packets (internally it can
>work in batches of 4, 8, 32, etc, should not matter) rather than TXing
>just 32 packets out of 48 and user having to either discard or retry with
>the remaining 16 packets. I am CC-ing Steve Liang for confirming this.
>
>Is there any PMD that people can name that currently behaves the
>opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and
>discard the other 16?
>
>> 

Yes, I believe that IXGBE *still* truncates. What am I missing? :) My
interpretation of the latest vector TX burst function is that it truncates
bursts longer than txq->tx_rs_thresh. Here are relevant code snippets that
show it lowering the number of packets (nb_pkts) to enqueue (apologies in
advance for the email client mangling the indentation):

---

#define IXGBE_DEFAULT_TX_RSBIT_THRESH 32

static void
ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
*dev_info)
{
  ...
  dev_info->default_txconf = (struct rte_eth_txconf) {
    ...
    .tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
    ...
  };
  ...
}


uint16_t
ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts)
{
  ...
  /* cross rx_thresh boundary is not allowed */
  nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);

  if (txq->nb_tx_free < txq->tx_free_thresh)
    ixgbe_tx_free_bufs(txq);
    

  nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
  if (unlikely(nb_pkts == 0))
    return 0;
  ...
  return nb_pkts;
}

---
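To make the effect of those two RTE_MIN() calls concrete, here is a minimal standalone model. It is not driver code: the function name model_xmit_accepted() and its parameters are made up for illustration, and it reproduces only the clamping arithmetic quoted above, ignoring the buffer-freeing step.

```c
#include <stdint.h>

/* Hypothetical model of the clamping in the snippet above: the number of
 * packets accepted is limited first by tx_rs_thresh, then by the number
 * of free descriptors. Not actual ixgbe code. */
static uint16_t
model_xmit_accepted(uint16_t nb_pkts, uint16_t tx_rs_thresh,
	uint16_t nb_tx_free)
{
	if (nb_pkts > tx_rs_thresh)
		nb_pkts = tx_rs_thresh;
	if (nb_pkts > nb_tx_free)
		nb_pkts = nb_tx_free;
	return nb_pkts;
}
```

Under this model, with tx_rs_thresh = 32 and plenty of free descriptors, a 48-packet request is accepted only up to 32 packets, and the caller is left holding the remaining 16.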



>> We propose to fix this by moving the tx buffer flushing logic from
>> *after* the loop that puts all packets into the tx buffer, to *inside*
>> the loop, testing for a full burst when adding each packet.
>> 
>
>The issue I have with this approach is the introduction of a branch that
>has to be tested for each iteration of the loop rather than once for the
>entire loop.
>
>The code branch where you add this is actually the slow(er) code path
>(where local variable expr != 0), which is used for non-contiguous
>bursts or bursts smaller than tx_burst_sz. Is there a particular reason
>you are interested in enabling this strategy (of using tx_burst_sz as a
>fixed burst size requirement) only on this code path? The reason I am
>asking is that the other fast(er) code path (where expr == 0) also uses
>tx_burst_sz as a minimal requirement and therefore it can send burst
>sizes bigger than tx_burst_sz.

The reason we limit the burst size only in the "else" path is that we also
proposed to limit the ethdev tx burst in the "if (expr==0)" path, in patch
3/4.
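For readers without the source at hand, the fast/slow path selection being discussed can be sketched as below. This is a standalone approximation of the check in rte_port_ethdev.c, so treat the exact expression and the helper name model_expr() as assumptions. The idea is that expr is zero only when pkts_mask is a contiguous run of low bits that reaches at least bit tx_burst_sz - 1.

```c
#include <stdint.h>

/* Assumed form of the check: returns 0 when pkts_mask is contiguous from
 * bit 0 and contains at least tx_burst_sz packets.
 * Precondition: 1 <= tx_burst_sz <= 64. */
static uint64_t
model_expr(uint64_t pkts_mask, uint32_t tx_burst_sz)
{
	/* Single bit marking the last packet of a minimal full burst. */
	uint64_t bsz_mask = 1ULL << (tx_burst_sz - 1);

	/* First term: non-zero if there is a hole in the mask.
	 * Second term: non-zero if bit (tx_burst_sz - 1) is not set. */
	return (pkts_mask & (pkts_mask + 1)) |
		((pkts_mask & bsz_mask) ^ bsz_mask);
}
```

Note that a contiguous 48-packet mask with tx_burst_sz = 32 also yields zero, which is why the fast path can hand more than tx_burst_sz packets to the ethdev.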

>
>
>> Signed-off-by: Robert Sanford <rsanford@akamai.com>
>> ---
>>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>>  1 files changed, 10 insertions(+), 10 deletions(-)
>> 
>> diff --git a/lib/librte_port/rte_port_ethdev.c
>> b/lib/librte_port/rte_port_ethdev.c
>> index 3fb4947..1283338 100644
>> --- a/lib/librte_port/rte_port_ethdev.c
>> +++ b/lib/librte_port/rte_port_ethdev.c
>> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>>  			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>> 
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst(p);
>> +		}
>>  	}
>
>One observation here: if we enable this proposal (which I have an issue
>with due to executing the branch per loop iteration rather than once
>per entire loop), it also eliminates the buffer overflow issue flagged by
>you in the other email :), so no need to e.g. double the size of the port
>internal buffer (tx_buf).
>
>> 


Not exactly correct: We suggested doubling tx_buf[] for *ring* writers.
Here (the hunks above) we suggest the opposite: *reduce* the size of the
*ethdev* tx_buf[], because we never expect to buffer more than a full
burst.

You are correct about the additional branch per loop iteration. On the
other hand, the proposed change is simpler than something like this:
compute how many more packets we need to complete a full burst, copy them
to tx_buf[], send_burst(), and then copy the rest to tx_buf[]. Either way
is acceptable to me.
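The alternative described in the previous paragraph (top up the buffer to a full burst, send it, then buffer the rest) could look roughly like the sketch below. Everything here is a hypothetical stand-in: struct model_writer replaces the port writer, plain pointers replace rte_mbuf, packets are assumed to arrive as a contiguous array rather than a pkts_mask, and model_send_burst() assumes the device accepts the whole burst.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_BURST_MAX 64

/* Hypothetical stand-in for the port writer; pointers replace rte_mbuf. */
struct model_writer {
	void *tx_buf[MODEL_BURST_MAX];
	uint32_t tx_burst_sz;    /* 1 <= tx_burst_sz <= MODEL_BURST_MAX */
	uint32_t tx_buf_count;
	uint32_t bursts_sent;    /* bookkeeping for the example only */
	uint32_t max_burst_seen; /* largest burst handed to the device */
};

/* Stand-in for send_burst(): records the burst and empties the buffer. */
static void
model_send_burst(struct model_writer *w)
{
	if (w->tx_buf_count > w->max_burst_seen)
		w->max_burst_seen = w->tx_buf_count;
	w->bursts_sent++;
	w->tx_buf_count = 0;
}

/* Buffer n packets, flushing each time exactly tx_burst_sz are queued. */
static void
model_tx_chunked(struct model_writer *w, void **pkts, uint32_t n)
{
	while (n > 0) {
		uint32_t room = w->tx_burst_sz - w->tx_buf_count;
		uint32_t chunk = (n < room) ? n : room;

		memcpy(&w->tx_buf[w->tx_buf_count], pkts,
			chunk * sizeof(pkts[0]));
		w->tx_buf_count += chunk;
		pkts += chunk;
		n -= chunk;

		if (w->tx_buf_count == w->tx_burst_sz)
			model_send_burst(w);
	}
}
```

In this form the full-burst test runs once per chunk instead of once per packet, at the cost of the extra copy bookkeeping, and the buffer never holds more than tx_burst_sz entries.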


>>  	return 0;
>> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer_nodrop {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
>> *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>>  			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>> 
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst_nodrop(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst_nodrop(p);
>> +		}
>>  	}
>> 
>>  	return 0;
>> --
>> 1.7.1
>

--
Regards,
Robert


* Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-04-01 16:22 [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Sanford, Robert
@ 2016-04-11 19:21 ` Dumitrescu, Cristian
  2016-04-11 20:36   ` Sanford, Robert
  0 siblings, 1 reply; 6+ messages in thread
From: Dumitrescu, Cristian @ 2016-04-11 19:21 UTC (permalink / raw)
  To: Sanford, Robert, dev; +Cc: Liang, Cunming, Venkatesan, Venky, Richardson, Bruce

Hi Robert,

I am doing a quick summary below on the changes proposed by these patches:

1. [PRIORITY 1] Bug fixing:
a) Fix buffer overflow issue in rte_port_ring.c (writer, writer_nodrop): double the tx_buf buffer size (applicable for current code approach)
b) Fix issue with handling burst sizes bigger than 32: replace all declarations of local variable bsz_size from uint32_t to uint64_t

2. [PRIORITY 2] Treat burst size as a fixed/exact value for the TX burst (Approach 2) instead of minimal value (current code, Approach 1) for ethdev ports. Rationale is that some PMDs (like vector IXGBE) _might_ drop the excess packets in the burst.
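Item 1b can be illustrated with a few lines of standalone C. The helper names are hypothetical, and the assumption is that the per-port burst-size mask is a single bit at position tx_burst_sz - 1 held in a 64-bit field. Copying such a value into a 32-bit local silently discards that bit whenever tx_burst_sz is above 32.

```c
#include <stdint.h>

/* Assumed definition of the burst-size mask.
 * Precondition: 1 <= tx_burst_sz <= 64. */
static uint64_t
model_bsz_mask(uint32_t tx_burst_sz)
{
	return 1ULL << (tx_burst_sz - 1);
}

/* What a uint32_t local would hold after narrowing the 64-bit mask. */
static uint32_t
model_bsz_mask_narrowed(uint32_t tx_burst_sz)
{
	return (uint32_t)model_bsz_mask(tx_burst_sz);
}
```

For tx_burst_sz = 64 the narrowed value is 0, so a test against the mask can no longer tell whether a full burst is present.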

Additional input:
1. Bruce and I looked at the code together, and it looks like vector IXGBE is not doing this (anymore). Internally it handles packets in batches of 32 (as your code snippets suggest), but there is no drop of excess packets taking place.

2. Venky also suggested keeping a larger burst as a single burst (Approach 1) rather than breaking the larger burst into fixed/constant size bursts while buffering the excess packets until a complete burst is reached in the future.

Given this input and also the timing of the release, we think the best option is:
- urgently send a quick patch to handle the bug fixes now
- keep the existing code (burst size used as minimal burst size requirement, not constant) as is, at least for now, and if you feel it is not the best choice, we can continue to debate it for the 16.07 release.
What do you think?

Jasvinder just sent the bug fixing patches, hopefully they will make it into the 16.04 release:
http://www.dpdk.org/ml/archives/dev/2016-April/037392.html
http://www.dpdk.org/ml/archives/dev/2016-April/037393.html

Many thanks for your work on this, Robert!

Regards,
Cristian




* Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-04-11 19:21 ` Dumitrescu, Cristian
@ 2016-04-11 20:36   ` Sanford, Robert
  2016-04-12 15:40     ` Dumitrescu, Cristian
  0 siblings, 1 reply; 6+ messages in thread
From: Sanford, Robert @ 2016-04-11 20:36 UTC (permalink / raw)
  To: Dumitrescu, Cristian, dev
  Cc: Liang, Cunming, Venkatesan, Venky, Richardson, Bruce

Hi Cristian,

Yes, I mostly agree with your suggestions:
1. We should fix the two obvious bugs (1a and 1b) right away. Jasvinder's
patches look fine.
2. We should take no immediate action on the issue I raised about PMDs
(vector IXGBE) not enqueuing more than 32 packets. We can discuss and
debate; no patch for 16.04, perhaps something in 16.07.


Let's start the discussion now, regarding vector IXGBE. You state
"Internally it handles packets in batches [of] 32 (as your code snippets
suggest), but there is no drop of excess packets taking place." I guess it
depends on your definition of "drop". If I pass 33 packets to
ixgbe_xmit_pkts_vec(), it will enqueue 32 packets, and return a value of
32. Can we agree on that?
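From the caller's side, a return value of 32 for a 33-packet request means the tail has to be resubmitted (or dropped). A minimal sketch of the resubmit case follows. The callback type model_tx_fn, the example callback model_tx_max32() and the helper model_tx_all() are all hypothetical names for illustration, and the only property assumed of the transmit call is the one discussed here: it returns how many packets, counted from the front of the array, it accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transmit callback: returns how many of the n packets,
 * counted from the front of pkts, were accepted. */
typedef uint16_t (*model_tx_fn)(void **pkts, uint16_t n);

/* Example callback that never accepts more than 32 packets per call. */
static uint16_t
model_tx_max32(void **pkts, uint16_t n)
{
	(void)pkts;
	return (n > 32) ? 32 : n;
}

/* Resubmit the unaccepted tail until everything is sent or max_calls
 * is exhausted. Returns the total number of packets accepted. */
static uint16_t
model_tx_all(model_tx_fn tx, void **pkts, uint16_t n, unsigned max_calls,
	unsigned *calls_used)
{
	uint16_t sent = 0;
	unsigned calls = 0;

	while (sent < n && calls < max_calls) {
		sent = (uint16_t)(sent + tx(pkts + sent, (uint16_t)(n - sent)));
		calls++;
	}
	if (calls_used != NULL)
		*calls_used = calls;
	return sent;
}
```

With the example callback, 33 packets need two calls: 32 are accepted by the first and the last one by the second.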

--
Regards,
Robert




* Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-04-11 20:36   ` Sanford, Robert
@ 2016-04-12 15:40     ` Dumitrescu, Cristian
  0 siblings, 0 replies; 6+ messages in thread
From: Dumitrescu, Cristian @ 2016-04-12 15:40 UTC (permalink / raw)
  To: Sanford, Robert, dev; +Cc: Liang, Cunming, Venkatesan, Venky, Richardson, Bruce



> -----Original Message-----
> From: Sanford, Robert [mailto:rsanford@akamai.com]
> Sent: Monday, April 11, 2016 9:37 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; dev@dpdk.org
> Cc: Liang, Cunming <cunming.liang@intel.com>; Venkatesan, Venky
> <venky.venkatesan@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
> 
> Hi Cristian,
> 
> Yes, I mostly agree with your suggestions:
> 1. We should fix the two obvious bugs (1a and 1b) right away. Jasvinder's
> patches look fine.
> 2. We should take no immediate action on the issue I raised about PMDs
> (vector IXGBE) not enqueuing more than 32 packets. We can discuss and
> debate; no patch for 16.04, perhaps something in 16.07.
> 
> 
> Let's start the discussion now, regarding vector IXGBE. You state
> "Internally it handles packets in batches [of] 32 (as your code snippets
> suggest), but there is no drop of excess packets taking place." I guess it
> depends on your definition of "drop". If I pass 33 packets to
> ixgbe_xmit_pkts_vec(), it will enqueue 32 packets, and return a value of
> 32. Can we agree on that?
> 

Yes, Steve Liang and I looked at the latest IXGBE vector code, and it looks like you are right. The number of packets that get accepted is the minimum of the number of packets provided by the user (33 in our case) and two limits, txq->tx_rs_thresh and txq->nb_tx_free. tx_rs_thresh defaults to 32, which is the value that yields the best performance, hence only 32 packets get accepted.

It also looks virtually impossible to change this behaviour of the IXGBE vector driver. As an example, let's say 33 packets are presented by the user: IXGBE picks the first 32 and tries to send them, but only 17 make it, so the other 15 have to be returned to the user; then the 33rd packet is picked, and this packet makes it. Since the return value is a count (not a mask), how do you tell the user that packets 0 .. 16 and 32 made it, while packets 17 .. 31 in the middle of the burst did not?

So the only real place to improve this is the port_ethdev_writer. I wonder whether there is a nice way to combine the existing behavior (burst size treated as a minimal requirement) with your proposal (burst size treated as a constant requirement) under the same code, and then pick between the two behaviors based on an input parameter provided when the port is created?
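One possible shape for such a parameter is sketched below, purely as a starting point for discussion. None of these names exist in librte_port: enum model_burst_mode, struct model_writer_params and model_next_burst() are hypothetical, and the sketch covers only the decision of how many buffered packets to hand to the PMD in one call.

```c
#include <stdint.h>

/* Hypothetical burst policy selected at port creation; not a DPDK API. */
enum model_burst_mode {
	MODEL_BURST_MIN,   /* tx_burst_sz is a minimum: flush once reached */
	MODEL_BURST_FIXED, /* tx_burst_sz is exact: never send more per call */
};

/* Hypothetical creation-time parameters for an ethdev writer port. */
struct model_writer_params {
	uint32_t tx_burst_sz;
	enum model_burst_mode mode;
};

/* Size of the next burst to hand to the PMD given `buffered` packets,
 * or 0 if the writer should keep buffering. */
static uint32_t
model_next_burst(const struct model_writer_params *p, uint32_t buffered)
{
	if (buffered < p->tx_burst_sz)
		return 0;
	return (p->mode == MODEL_BURST_FIXED) ? p->tx_burst_sz : buffered;
}
```

In this sketch the mode is consulted once per flush decision rather than once per packet, so the existing minimal-burst behavior would not pay for a per-iteration branch.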



* Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-03-28 20:51 ` [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Robert Sanford
@ 2016-03-31 13:22   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 6+ messages in thread
From: Dumitrescu, Cristian @ 2016-03-31 13:22 UTC (permalink / raw)
  To: Robert Sanford, dev; +Cc: Liang, Cunming



> -----Original Message-----
> From: Robert Sanford [mailto:rsanford2@gmail.com]
> Sent: Monday, March 28, 2016 9:52 PM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
> 
> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
> send bursts larger than tx_burst_sz to the underlying ethdev.
> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
> burst size, resulting in unnecessary enqueuing failures or ethdev
> writer retries.

Sending bursts larger than tx_burst_sz is actually intentional. The assumption is that NIC performance benefits from larger burst size. So the tx_burst_sz is used as a minimal burst size requirement, not as a maximal or fixed burst size requirement.

I agree with you that a while ago the vector version of the IXGBE driver used to work the way you describe it, but I don't think this is the case anymore. As an example, if TX burst size is set to 32 and 48 packets are transmitted, then the PMD will TX all the 48 packets (internally it can work in batches of 4, 8, 32, etc, should not matter) rather than TXing just 32 packets out of 48 and the user having to either discard or retry with the remaining 16 packets. I am CC-ing Steve Liang for confirming this.

Is there any PMD that people can name that currently behaves the opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and discard the other 16?

> 
> We propose to fix this by moving the tx buffer flushing logic from
> *after* the loop that puts all packets into the tx buffer, to *inside*
> the loop, testing for a full burst when adding each packet.
> 

The issue I have with this approach is the introduction of a branch that has to be tested for each iteration of the loop rather than once for the entire loop.

The code branch where you add this is actually the slow(er) code path (where local variable expr != 0), which is used for non-contiguous bursts or bursts smaller than tx_burst_sz. Is there a particular reason you are interested in enabling this strategy (of using tx_burst_sz as a fixed burst size requirement) only on this code path? The reason I am asking is that the other fast(er) code path (where expr == 0) also uses tx_burst_sz as a minimal requirement and therefore it can send burst sizes bigger than tx_burst_sz.


> Signed-off-by: Robert Sanford <rsanford@akamai.com>
> ---
>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>  1 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_port/rte_port_ethdev.c
> b/lib/librte_port/rte_port_ethdev.c
> index 3fb4947..1283338 100644
> --- a/lib/librte_port/rte_port_ethdev.c
> +++ b/lib/librte_port/rte_port_ethdev.c
> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
> *port,
>  struct rte_port_ethdev_writer {
>  	struct rte_port_out_stats stats;
> 
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>  			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
> 
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst(p);
> +		}
>  	}

One observation here: if we enable this proposal (which I have an issue with due to executing the branch per loop iteration rather than once per entire loop), it also eliminates the buffer overflow issue flagged by you in the other email :), so no need to e.g. double the size of the port internal buffer (tx_buf).

> 
>  	return 0;
> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
> *port,
>  struct rte_port_ethdev_writer_nodrop {
>  	struct rte_port_out_stats stats;
> 
> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>  	uint32_t tx_burst_sz;
>  	uint16_t tx_buf_count;
>  	uint64_t bsz_mask;
> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
> *port,
>  			p->tx_buf[tx_buf_count++] = pkt;
>  			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>  			pkts_mask &= ~pkt_mask;
> -		}
> 
> -		p->tx_buf_count = tx_buf_count;
> -		if (tx_buf_count >= p->tx_burst_sz)
> -			send_burst_nodrop(p);
> +			p->tx_buf_count = tx_buf_count;
> +			if (tx_buf_count >= p->tx_burst_sz)
> +				send_burst_nodrop(p);
> +		}
>  	}
> 
>  	return 0;
> --
> 1.7.1


* [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
  2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
@ 2016-03-28 20:51 ` Robert Sanford
  2016-03-31 13:22   ` Dumitrescu, Cristian
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Sanford @ 2016-03-28 20:51 UTC (permalink / raw)
  To: dev, cristian.dumitrescu

For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
send bursts larger than tx_burst_sz to the underlying ethdev.
Some PMDs (e.g., ixgbe) may truncate this request to their maximum
burst size, resulting in unnecessary enqueuing failures or ethdev
writer retries.

We propose to fix this by moving the tx buffer flushing logic from
*after* the loop that puts all packets into the tx buffer, to *inside*
the loop, testing for a full burst when adding each packet.

Signed-off-by: Robert Sanford <rsanford@akamai.com>
---
 lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_port/rte_port_ethdev.c b/lib/librte_port/rte_port_ethdev.c
index 3fb4947..1283338 100644
--- a/lib/librte_port/rte_port_ethdev.c
+++ b/lib/librte_port/rte_port_ethdev.c
@@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void *port,
 struct rte_port_ethdev_writer {
 	struct rte_port_out_stats stats;
 
-	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
 	uint32_t tx_burst_sz;
 	uint16_t tx_buf_count;
 	uint64_t bsz_mask;
@@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
 			p->tx_buf[tx_buf_count++] = pkt;
 			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
 			pkts_mask &= ~pkt_mask;
-		}
 
-		p->tx_buf_count = tx_buf_count;
-		if (tx_buf_count >= p->tx_burst_sz)
-			send_burst(p);
+			p->tx_buf_count = tx_buf_count;
+			if (tx_buf_count >= p->tx_burst_sz)
+				send_burst(p);
+		}
 	}
 
 	return 0;
@@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void *port,
 struct rte_port_ethdev_writer_nodrop {
 	struct rte_port_out_stats stats;
 
-	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
+	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
 	uint32_t tx_burst_sz;
 	uint16_t tx_buf_count;
 	uint64_t bsz_mask;
@@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void *port,
 			p->tx_buf[tx_buf_count++] = pkt;
 			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
 			pkts_mask &= ~pkt_mask;
-		}
 
-		p->tx_buf_count = tx_buf_count;
-		if (tx_buf_count >= p->tx_burst_sz)
-			send_burst_nodrop(p);
+			p->tx_buf_count = tx_buf_count;
+			if (tx_buf_count >= p->tx_burst_sz)
+				send_burst_nodrop(p);
+		}
 	}
 
 	return 0;
-- 
1.7.1
