DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Sanford, Robert" <rsanford@akamai.com>
To: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "Liang, Cunming" <cunming.liang@intel.com>
Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
Date: Fri, 1 Apr 2016 16:22:17 +0000	[thread overview]
Message-ID: <D3240573.118A9%rsanford@akamai.com> (raw)

Hi Cristian,

Please see my comments inline.

>
>
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2@gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>> 
>> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
>> send bursts larger than tx_burst_sz to the underlying ethdev.
>> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
>> burst size, resulting in unnecessary enqueuing failures or ethdev
>> writer retries.
>
>Sending bursts larger than tx_burst_sz is actually intentional. The
>assumption is that NIC performance benefits from larger burst size. So
>the tx_burst_sz is used as a minimal burst size requirement, not as a
>maximal or fixed burst size requirement.
>
>I agree with you that a while ago the vector version of IXGBE driver used
>to work the way you describe it, but I don't think this is the case
>anymore. As an example, if TX burst size is set to 32 and 48 packets are
>transmitted, than the PMD will TX all the 48 packets (internally it can
>work in batches of 4, 8, 32, etc, should not matter) rather than TXing
>just 32 packets out of 48 and user having to either discard or retry with
>the remaining 16 packets. I am CC-ing Steve Liang for confirming this.
>
>Is there any PMD that people can name that currently behaves the
>opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and
>discard the other 16?
>
>> 

Yes, I believe that IXGBE *still* truncates. What am I missing? :) My
interpretation of the latest vector TX burst function is that it truncates
bursts longer than txq->tx_rs_thresh. Here are relevant code snippets that
show it lowering the number of packets (nb_pkts) to enqueue (apologies in
advance for the email client mangling the indentation):

---

#define IXGBE_DEFAULT_TX_RSBIT_THRESH 32

static void
ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info
*dev_info)
{
  ...
  dev_info->default_txconf = (struct rte_eth_txconf) {
    ...
    .tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
    ...
  };
  ...
}


uint16_t
ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts)
{
  ...
  /* cross rx_thresh boundary is not allowed */
  nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);

  if (txq->nb_tx_free < txq->tx_free_thresh)
    ixgbe_tx_free_bufs(txq);
    

  nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
  if (unlikely(nb_pkts == 0))
    return 0;
  ...
  return nb_pkts;
}

---



>> We propose to fix this by moving the tx buffer flushing logic from
>> *after* the loop that puts all packets into the tx buffer, to *inside*
>> the loop, testing for a full burst when adding each packet.
>> 
>
>The issue I have with this approach is the introduction of a branch that
>has to be tested for each iteration of the loop rather than once for the
>entire loop.
>
>The code branch where you add this is actually the slow(er) code path
>(where local variable expr != 0), which is used for non-contiguous or
>bursts smaller than tx_burst_sz. Is there a particular reason you are
>only interested of enabling this strategy (of using tx_burst_sz as a
>fixed burst size requirement) only on this code path? The reason I am
>asking is the other fast(er) code path (where expr == 0) also uses
>tx_burst_sz as a minimal requirement and therefore it can send burst
>sizes bigger than tx_burst_sz.

The reason we limit the burst size only in the "else" path is that we also
proposed to limit the ethdev tx burst in the "if (expr==0)" path, in patch
3/4.

>
>
>> Signed-off-by: Robert Sanford <rsanford@akamai.com>
>> ---
>>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>>  1 files changed, 10 insertions(+), 10 deletions(-)
>> 
>> diff --git a/lib/librte_port/rte_port_ethdev.c
>> b/lib/librte_port/rte_port_ethdev.c
>> index 3fb4947..1283338 100644
>> --- a/lib/librte_port/rte_port_ethdev.c
>> +++ b/lib/librte_port/rte_port_ethdev.c
>> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>> 
>> 	RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>> 
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst(p);
>> +		}
>>  	}
>
>One observation here: if we enable this proposal (which I have an issue
>with due to the executing the branch per loop iteration rather than once
>per entire loop), it also eliminates the buffer overflow issue flagged by
>you in the other email :), so no need to e.g. doble the size of the port
>internal buffer (tx_buf).
>
>> 


Not exactly correct: We suggested doubling tx_buf[] for *ring* writers.
Here (the hunks above) we suggest the opposite: *reduce* the size of the
*ethdev* tx_buf[], because we never expect to buffer more than a full
burst.

You are correct about the additional branch per loop iteration. On the
other hand, the proposed change is simpler than something like this:
compute how many more packets we need to complete a full burst, copy them
to tx_buf[], send_burst(), and then copy the rest to tx_buf[]. Either way
is acceptable to me.


>>  	return 0;
>> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer_nodrop {
>>  	struct rte_port_out_stats stats;
>> 
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
>> *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>> 
>> 	RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>> 
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst_nodrop(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst_nodrop(p);
>> +		}
>>  	}
>> 
>>  	return 0;
>> --
>> 1.7.1
>

--
Regards,
Robert

             reply	other threads:[~2016-04-01 16:22 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-01 16:22 Sanford, Robert [this message]
2016-04-11 19:21 ` Dumitrescu, Cristian
2016-04-11 20:36   ` Sanford, Robert
2016-04-12 15:40     ` Dumitrescu, Cristian
  -- strict thread matches above, loose matches on Subject: below --
2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
2016-03-28 20:51 ` [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Robert Sanford
2016-03-31 13:22   ` Dumitrescu, Cristian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D3240573.118A9%rsanford@akamai.com \
    --to=rsanford@akamai.com \
    --cc=cristian.dumitrescu@intel.com \
    --cc=cunming.liang@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).