From: "Sanford, Robert" <rsanford@akamai.com>
To: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "Liang, Cunming" <cunming.liang@intel.com>
Subject: Re: [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big
Date: Fri, 1 Apr 2016 16:22:17 +0000
Message-ID: <D3240573.118A9%rsanford@akamai.com>
Hi Cristian,
Please see my comments inline.
>
>
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2@gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>>
>> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
>> send bursts larger than tx_burst_sz to the underlying ethdev.
>> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
>> burst size, resulting in unnecessary enqueuing failures or ethdev
>> writer retries.
>
>Sending bursts larger than tx_burst_sz is actually intentional. The
>assumption is that NIC performance benefits from larger burst size. So
>the tx_burst_sz is used as a minimal burst size requirement, not as a
>maximal or fixed burst size requirement.
>
>I agree with you that a while ago the vector version of the IXGBE driver
>used to work the way you describe, but I don't think this is the case
>anymore. As an example, if the TX burst size is set to 32 and 48 packets
>are transmitted, then the PMD will TX all 48 packets (internally it can
>work in batches of 4, 8, 32, etc.; it should not matter) rather than TXing
>just 32 packets out of 48 and the user having to either discard or retry
>with the remaining 16 packets. I am CC-ing Steve Liang to confirm this.
>
>Is there any PMD that people can name that currently behaves the
>opposite, i.e. given a burst of 48 pkts for TX, accept 32 pkts and
>discard the other 16?
>
>>
Yes, I believe that IXGBE *still* truncates. What am I missing? :) My
interpretation of the latest vector TX burst function is that it truncates
bursts longer than txq->tx_rs_thresh. Here are the relevant code snippets
that show it lowering the number of packets (nb_pkts) to enqueue:
---
#define IXGBE_DEFAULT_TX_RSBIT_THRESH 32

static void
ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
{
	...
	dev_info->default_txconf = (struct rte_eth_txconf) {
		...
		.tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
		...
	};
	...
}

uint16_t
ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
		uint16_t nb_pkts)
{
	...
	/* cross rx_thresh boundary is not allowed */
	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);

	if (txq->nb_tx_free < txq->tx_free_thresh)
		ixgbe_tx_free_bufs(txq);

	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
	if (unlikely(nb_pkts == 0))
		return 0;
	...
	return nb_pkts;
}
---
>> We propose to fix this by moving the tx buffer flushing logic from
>> *after* the loop that puts all packets into the tx buffer, to *inside*
>> the loop, testing for a full burst when adding each packet.
>>
>
>The issue I have with this approach is the introduction of a branch that
>has to be tested for each iteration of the loop rather than once for the
>entire loop.
>
>The code branch where you add this is actually the slow(er) code path
>(where local variable expr != 0), which is used for non-contiguous or
>bursts smaller than tx_burst_sz. Is there a particular reason you are
>only interested of enabling this strategy (of using tx_burst_sz as a
>fixed burst size requirement) only on this code path? The reason I am
>asking is the other fast(er) code path (where expr == 0) also uses
>tx_burst_sz as a minimal requirement and therefore it can send burst
>sizes bigger than tx_burst_sz.
The reason we limit the burst size only in the "else" path is that we also
proposed to limit the ethdev tx burst in the "if (expr==0)" path, in patch
3/4.
>
>
>> Signed-off-by: Robert Sanford <rsanford@akamai.com>
>> ---
>> lib/librte_port/rte_port_ethdev.c | 20 ++++++++++----------
>> 1 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/lib/librte_port/rte_port_ethdev.c
>> b/lib/librte_port/rte_port_ethdev.c
>> index 3fb4947..1283338 100644
>> --- a/lib/librte_port/rte_port_ethdev.c
>> +++ b/lib/librte_port/rte_port_ethdev.c
>> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
>> *port,
>> struct rte_port_ethdev_writer {
>> struct rte_port_out_stats stats;
>>
>> - struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>> uint32_t tx_burst_sz;
>> uint16_t tx_buf_count;
>> uint64_t bsz_mask;
>> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>> p->tx_buf[tx_buf_count++] = pkt;
>>
>> RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>> pkts_mask &= ~pkt_mask;
>> - }
>>
>> - p->tx_buf_count = tx_buf_count;
>> - if (tx_buf_count >= p->tx_burst_sz)
>> - send_burst(p);
>> + p->tx_buf_count = tx_buf_count;
>> + if (tx_buf_count >= p->tx_burst_sz)
>> + send_burst(p);
>> + }
>> }
>
>One observation here: if we enable this proposal (which I have an issue
>with due to executing the branch per loop iteration rather than once
>per entire loop), it also eliminates the buffer overflow issue flagged by
>you in the other email :), so no need to e.g. double the size of the port
>internal buffer (tx_buf).
>
>>
Not exactly correct: We suggested doubling tx_buf[] for *ring* writers.
Here (the hunks above) we suggest the opposite: *reduce* the size of the
*ethdev* tx_buf[], because we never expect to buffer more than a full
burst.
You are correct about the additional branch per loop iteration. On the
other hand, the proposed change is simpler than something like this:
compute how many more packets we need to complete a full burst, copy them
to tx_buf[], send_burst(), and then copy the rest to tx_buf[]. Either way
is acceptable to me.
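For reference, a rough standalone sketch of that alternative (mock buffer,
counters, and flush function standing in for the actual rte_port structures
and send_burst()): top off the buffer to a full burst, flush it, then
buffer whatever remains.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BURST_SZ 32 /* stand-in for tx_burst_sz */

struct writer {
	int      tx_buf[BURST_SZ]; /* stand-in for struct rte_mbuf *tx_buf[] */
	uint16_t tx_buf_count;
	int      bursts_sent;
};

/* Stand-in for send_burst(): flush the buffered packets. */
static void
flush(struct writer *w)
{
	w->bursts_sent++;
	w->tx_buf_count = 0;
}

/* Alternative to the per-packet branch: copy just enough packets to
 * complete a full burst, flush, then buffer the leftovers. */
static void
tx_bulk_alt(struct writer *w, const int *pkts, uint16_t n)
{
	uint16_t fill = (uint16_t)(BURST_SZ - w->tx_buf_count);

	if (n >= fill) {
		/* Complete and send the current burst. */
		memcpy(&w->tx_buf[w->tx_buf_count], pkts, fill * sizeof(*pkts));
		w->tx_buf_count = BURST_SZ;
		flush(w);
		pkts += fill;
		n = (uint16_t)(n - fill);

		/* Send any further full bursts directly. */
		while (n >= BURST_SZ) {
			memcpy(w->tx_buf, pkts, BURST_SZ * sizeof(*pkts));
			w->tx_buf_count = BURST_SZ;
			flush(w);
			pkts += BURST_SZ;
			n = (uint16_t)(n - BURST_SZ);
		}
	}
	/* Buffer the remainder; never more than one burst is held. */
	memcpy(&w->tx_buf[w->tx_buf_count], pkts, n * sizeof(*pkts));
	w->tx_buf_count = (uint16_t)(w->tx_buf_count + n);
}
```

With an empty buffer and 48 packets in, this sends one 32-packet burst and
leaves 16 packets buffered, without testing a flush condition on every
packet. It trades the per-iteration branch for a couple of memcpy()s,
which is why either version seems acceptable.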
>> return 0;
>> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
>> *port,
>> struct rte_port_ethdev_writer_nodrop {
>> struct rte_port_out_stats stats;
>>
>> - struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> + struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>> uint32_t tx_burst_sz;
>> uint16_t tx_buf_count;
>> uint64_t bsz_mask;
>> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
>> *port,
>> p->tx_buf[tx_buf_count++] = pkt;
>>
>> RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>> pkts_mask &= ~pkt_mask;
>> - }
>>
>> - p->tx_buf_count = tx_buf_count;
>> - if (tx_buf_count >= p->tx_burst_sz)
>> - send_burst_nodrop(p);
>> + p->tx_buf_count = tx_buf_count;
>> + if (tx_buf_count >= p->tx_burst_sz)
>> + send_burst_nodrop(p);
>> + }
>> }
>>
>> return 0;
>> --
>> 1.7.1
>
--
Regards,
Robert
Thread overview: 6+ messages
2016-04-01 16:22 Sanford, Robert [this message]
2016-04-11 19:21 ` Dumitrescu, Cristian
2016-04-11 20:36 ` Sanford, Robert
2016-04-12 15:40 ` Dumitrescu, Cristian
-- strict thread matches above, loose matches on Subject: below --
2016-03-28 20:51 [dpdk-dev] [PATCH 0/4] port: fix and test bugs in tx_bulk ops Robert Sanford
2016-03-28 20:51 ` [dpdk-dev] [PATCH 4/4] port: fix ethdev writer burst too big Robert Sanford
2016-03-31 13:22 ` Dumitrescu, Cristian