From: Sagi Grimberg <sagi@grimberg.me>
To: Yongseok Koh <yskoh@mellanox.com>
Cc: adrien.mazarguil@6wind.com, nelio.laranjeiro@6wind.com, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: poll completion queue once per a call
Date: Thu, 27 Jul 2017 14:12:41 +0300 [thread overview]
Message-ID: <cfdb1356-9fc6-f4ec-34c6-3ec1ff571207@grimberg.me> (raw)
In-Reply-To: <20170725074356.GA4034@minint-98vp2qg>
>> Yes I realize that, but can't the device still complete in a burst (of
>> unsuppressed completions)? I mean it's not guaranteed that for every
>> txq_complete a signaled completion is pending right? What happens if
>> the device has inconsistent completion pacing? Can't the sw grow a
>> batch of completions if txq_complete will process a single completion
>> unconditionally?
> Speculation. First of all, the device doesn't delay completion notifications for no
> reason. An ASIC is not SW running on top of an OS.
I'm sorry but this statement is not correct. It might be correct in a
lab environment, but in practice, there are lots of things that can
affect the device timing.
> If a completion comes up late,
> this means device really can't keep up the rate of posting descriptors. If so,
> tx_burst() should generate back-pressure by returning a partial Tx, then the app can
> make a decision between drop and retry. Retry on Tx means back-pressuring the Rx side
> if the app is forwarding packets.
Not arguing with that; I was simply suggesting that better heuristics
could be applied than "process one completion unconditionally".
> More serious problem I expected was a case that the THRESH is smaller than
> burst size. In that case, txq->elts[] will be short of slots all the time. But
> fortunately, in the MLX PMD, we request one completion per burst at most, not
> every THRESH of packets.
>
> If there's some SW jitter on Tx processing, the Tx CQ can grow for sure.
> Question to myself was "when does it shrink?". It shrinks when Tx burst is light
> (burst size is smaller than THRESH) because mlx5_tx_complete() is always called
> every time tx_burst() is called. What if it keeps growing? Then, drop is
> necessary and natural like I mentioned above.
>
> It doesn't make sense for SW to absorb every possible SW jitter; the cost is high.
> It is usually done by increasing queue depth. Keeping steady state is more
> important.
Again, I agree jitter is bad, but with proper heuristics in place mlx5
can still keep jitter low _and_ consume completions faster than
consecutive tx_burst invocations.
> Rather, this patch is helpful for reducing jitters. When I run a profiler, the
> most cycle-consuming part on Tx is still freeing buffers. If we allowed a loop
> checking for valid CQEs, many buffers could be freed in a single call of
> mlx5_tx_complete() at some moment, which would cause a long delay. This would
> aggravate jitter.
I didn't dispute that this patch addresses an issue, but mlx5 is
a driver designed to serve applications that can behave differently
from your test case.
> Of course. I appreciate your time for the review. And keep in mind that nothing
> is impossible in an open source community. I always like to discuss ideas
> with anyone. But I was just asking to hear more details about your suggestion if
> you wanted me to implement it, rather than being given a one-sentence question :-)
Good to know.
>>> Does "budget" mean the
>>> threshold? If so, calculation of stats for adaptive threshold can impact single
>>> core performance. With multiple cores, adjusting threshold doesn't affect much.
>>
>> If you look at mlx5e driver in the kernel, it maintains online stats on
>> its RX and TX queues. It maintains these stats mostly for adaptive
>> interrupt moderation control (but not only).
>>
>> I was suggesting maintaining per TX queue stats on average completions
>> consumed for each TX burst call, and adjust the stopping condition
>> according to a calculated stat.
> In the case of interrupt mitigation, it could be beneficial because interrupt
> handling is too costly. But the beauty of DPDK is polling, isn't it?
If you read again my comment, I didn't suggest to apply stats for
interrupt moderation, I just gave an example of a use-case. I was
suggesting to maintain the online stats for adjusting a threshold of
>> how many completions to process in a tx burst call (instead of
processing one unconditionally).
> And please remember to ack at the end of this discussion if you are okay so that
> this patch can get merged. One data point: single core performance (fwd) of
> vectorized PMD gets improved by more than 6% with this patch. 6% is never small.
Yea, I don't mind merging it in given that I don't have time to come
up with anything better (or worse :))
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Thread overview: 8+ messages
2017-07-20 15:48 Yongseok Koh
2017-07-20 16:34 ` Sagi Grimberg
2017-07-21 15:10 ` Yongseok Koh
2017-07-23 9:49 ` Sagi Grimberg
2017-07-25 7:43 ` Yongseok Koh
2017-07-27 11:12 ` Sagi Grimberg [this message]
2017-07-28 0:26 ` Yongseok Koh
2017-07-31 16:12 ` Ferruh Yigit