DPDK patches and discussions
From: Sagi Grimberg <sagi@grimberg.me>
To: Yongseok Koh <yskoh@mellanox.com>
Cc: adrien.mazarguil@6wind.com, nelio.laranjeiro@6wind.com, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: poll completion queue once per a call
Date: Thu, 27 Jul 2017 14:12:41 +0300	[thread overview]
Message-ID: <cfdb1356-9fc6-f4ec-34c6-3ec1ff571207@grimberg.me> (raw)
In-Reply-To: <20170725074356.GA4034@minint-98vp2qg>


>> Yes I realize that, but can't the device still complete in a burst (of
>> unsuppressed completions)? I mean it's not guaranteed that for every
>> txq_complete a signaled completion is pending, right? What happens if
>> the device has inconsistent completion pacing? Can't the SW grow a
>> batch of completions if txq_complete processes a single completion
>> unconditionally?
> Speculation. First of all, the device doesn't delay completion notifications
> for no reason. An ASIC is not SW running on top of an OS.

I'm sorry but this statement is not correct. It might be correct in a
lab environment, but in practice, there are lots of things that can
affect the device timing.

> If a completion comes up late,
> this means the device really can't keep up with the rate of posting
> descriptors. If so, tx_burst() should generate back-pressure by returning a
> partial Tx, then the app can decide between dropping and retrying. Retrying on
> Tx means back-pressuring the Rx side if the app is forwarding packets.

Not arguing with that; I was simply suggesting that better heuristics
could be applied than "process one completion unconditionally".
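For illustration, the kind of heuristic I have in mind is a small per-call
budget instead of a single unconditional poll. A toy sketch, not the actual
mlx5 code; the structure, names, and budget value are all hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define CQ_SIZE 64
#define CQ_BUDGET 8            /* hypothetical per-call budget */
#define CQE_OWNER_MASK 0x1

/* Toy CQ model: entries with the owner bit set are valid (owned by SW). */
struct toy_cq {
	uint8_t owner[CQ_SIZE];
	uint16_t ci;           /* consumer index */
};

/* Drain up to CQ_BUDGET valid CQEs per call, instead of exactly one. */
static int
cq_poll_budget(struct toy_cq *cq)
{
	int n = 0;

	while (n < CQ_BUDGET &&
	       (cq->owner[cq->ci % CQ_SIZE] & CQE_OWNER_MASK)) {
		cq->owner[cq->ci % CQ_SIZE] = 0; /* consume the CQE */
		cq->ci++;
		n++;
	}
	return n; /* completions drained this call */
}
```

With 12 pending completions, two calls drain 8 and then 4, so the CQ cannot
grow unboundedly while per-call latency stays capped.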

> A more serious problem I expected was the case where the THRESH is smaller
> than the burst size. In that case, txq->elts[] will be short of slots all the
> time. But fortunately, in the MLX PMD, we request at most one completion per
> burst, not one for every THRESH packets.
> 
> If there's some SW jitter on Tx processing, the Tx CQ can grow for sure. The
> question to myself was "when does it shrink?". It shrinks when the Tx burst is
> light (burst size smaller than THRESH) because mlx5_tx_complete() is called
> every time tx_burst() is called. What if it keeps growing? Then drops are
> necessary and natural, like I mentioned above.
> 
> It doesn't make sense for SW to absorb every possible SW jitter; the cost is
> high. It is usually handled by increasing the queue depth. Keeping a steady
> state is more important.

Again, I agree jitter is bad, but with proper heuristics in place mlx5
can still keep jitter low _and_ consume completions faster than
consecutive tx_burst invocations.

> Rather, this patch helps reduce jitter. When I run a profiler, the most
> cycle-consuming part of Tx is still freeing buffers. If we allowed looping on
> valid-CQE checks, many buffers could be freed in a single call of
> mlx5_tx_complete() at some point, which would cause a long delay and
> aggravate jitter.

I didn't dispute that this patch addresses an issue, but mlx5 is
a driver designed to serve applications that can behave differently
from your test case.
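To make that trade-off concrete: the long-delay concern above could be bounded
even with a CQE loop, by capping how many buffers a single call frees. A toy
sketch under stated assumptions; the names, fields, and cap are all made up,
not the mlx5 data structures:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define ELTS_N 64
#define FREE_BATCH_MAX 16      /* hypothetical cap on buffers freed per call */

struct toy_txq {
	void *elts[ELTS_N];    /* stand-ins for mbuf pointers */
	uint16_t elts_tail;    /* next buffer to free */
	uint16_t elts_comp;    /* completed but not yet freed */
};

/* Free completed Tx buffers, but no more than FREE_BATCH_MAX at once,
 * so one call never monopolizes the CPU with a long burst of frees. */
static int
txq_free_bufs(struct toy_txq *txq)
{
	int n = 0;

	while (txq->elts_comp > 0 && n < FREE_BATCH_MAX) {
		txq->elts[txq->elts_tail % ELTS_N] = NULL; /* "free" it */
		txq->elts_tail++;
		txq->elts_comp--;
		n++;
	}
	return n;
}
```

A backlog of 40 completed buffers is then drained over three calls (16, 16, 8)
instead of one long one, which is the jitter-bounding behavior being argued for.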

> Of course. I appreciate your time for the review. And keep in mind that
> nothing is impossible in an open source community. I always like to discuss
> ideas with anyone. But I was just asking for more details about your
> suggestion if you wanted me to implement it, rather than a one-sentence
> question :-)

Good to know.

>>> Does "budget" mean the
>>> threshold? If so, calculation of stats for adaptive threshold can impact single
>>> core performance. With multiple cores, adjusting threshold doesn't affect much.
>>
>> If you look at the mlx5e driver in the kernel, it maintains online stats on
>> its RX and TX queues. It maintains these stats mostly for adaptive
>> interrupt moderation control (but not only).
>>
>> I was suggesting maintaining per TX queue stats on average completions
>> consumed for each TX burst call, and adjust the stopping condition
>> according to a calculated stat.
> In the case of interrupt mitigation, it could be beneficial because interrupt
> handling is too costly. But the beauty of DPDK is polling, isn't it?

If you read my comment again, I didn't suggest applying stats to
interrupt moderation; I gave it as one example use-case. I was
suggesting maintaining online stats to adjust a threshold for
how many completions to process in a tx_burst call (instead of
processing one unconditionally).
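Such a stats-driven threshold could look something like a fixed-point EWMA of
completions consumed per burst. A hedged sketch, not the mlx5e implementation;
the struct, names, and the 1/8 weight are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define EWMA_SHIFT 3   /* weight 1/8, hypothetical */

/* Hypothetical per-queue stat: avg_comp holds the running average of
 * completions consumed per tx_burst call, in fixed point (<< EWMA_SHIFT). */
struct txq_stats {
	uint32_t avg_comp;
};

/* avg += (sample - avg) / 8, in fixed point. */
static void
txq_stats_update(struct txq_stats *s, uint32_t completed)
{
	int32_t delta = (int32_t)(completed << EWMA_SHIFT) -
			(int32_t)s->avg_comp;

	s->avg_comp = (uint32_t)((int32_t)s->avg_comp +
				 delta / (1 << EWMA_SHIFT));
}

/* Derive a per-call completion budget from the average; never below 1,
 * so at least one completion is still processed per call. */
static uint32_t
txq_comp_budget(const struct txq_stats *s)
{
	uint32_t b = s->avg_comp >> EWMA_SHIFT;

	return b > 0 ? b : 1;
}
```

The update is a couple of integer ops per burst, so the single-core cost
concern raised above would mostly come down to where the stat is stored, not
the arithmetic.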

> And please remember to ack at the end of this discussion if you are okay with
> it, so that this patch can get merged. One data point: single-core forwarding
> performance of the vectorized PMD improves by more than 6% with this patch.
> 6% is never small.

Yeah, I don't mind merging it, given that I don't have time to come
up with anything better (or worse :))

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

Thread overview: 8+ messages
2017-07-20 15:48 Yongseok Koh
2017-07-20 16:34 ` Sagi Grimberg
2017-07-21 15:10   ` Yongseok Koh
2017-07-23  9:49     ` Sagi Grimberg
2017-07-25  7:43       ` Yongseok Koh
2017-07-27 11:12         ` Sagi Grimberg [this message]
2017-07-28  0:26           ` Yongseok Koh
2017-07-31 16:12 ` Ferruh Yigit
