From: "Venkatesan, Venky" <venky.venkatesan@intel.com>
To: dev@dpdk.org
Subject: Re: [dpdk-dev] overcommitting CPUs
Date: Wed, 27 Aug 2014 07:54:21 -0700
Message-ID: <53FDF11D.3040504@intel.com>
In-Reply-To: <CAKfHP0Wpf1scek3yJywmHVDxGOBY6KBDYAZNkcZM0_zqUvt0sw@mail.gmail.com>
DPDK currently isn't exactly poll mode: it has an API that receives and
transmits packets. How you enter that API could be interrupt driven or
polled; we've left that up to the application to decide, rather than force
an interrupt/NAPI type architecture. I do agree with Alex that
implementing an interrupt/load driven entry point as an option will make
it usable more widely. There are multiple challenges here, to mention a
few: managing the latency of an interrupt driven scheme in a user-space
context, and very high jitter rates.
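To make the "same API, application-chosen entry" point concrete, here is a
minimal sketch in plain C that compiles without DPDK. All names are
hypothetical: fake_rxq and fake_rx_burst() stand in for an RX queue and a
burst receive call such as rte_eth_rx_burst(), and wait_until_pending()
stands in for blocking on an interrupt (for example on a file descriptor).
It is an illustration of the idea, not an existing DPDK interface.

```c
#include <stddef.h>

/* Hypothetical stand-in for an RX queue: just a count of pending packets. */
typedef struct {
    unsigned pending;   /* packets waiting in the (simulated) RX ring */
} fake_rxq;

/* Hypothetical stand-in for a burst receive call: drains up to `max`. */
static unsigned fake_rx_burst(fake_rxq *q, unsigned max)
{
    unsigned n = q->pending < max ? q->pending : max;
    q->pending -= n;
    return n;
}

typedef enum { ENTRY_POLL, ENTRY_EVENT } entry_mode;

/* Hypothetical "wait for an RX event" hook; returns nonzero on an event.
 * A real one might block on an eventfd or UIO descriptor. */
typedef int (*wait_fn)(fake_rxq *q);

/* Trivial simulated hook: "an event fired" iff packets are pending. */
static int wait_until_pending(fake_rxq *q)
{
    return q->pending > 0;
}

/* One iteration of the application's receive loop. The receive call is
 * identical in both modes; only how the loop enters it differs.
 * Returns the number of packets received. */
static unsigned service_once(fake_rxq *q, entry_mode mode, wait_fn wait,
                             unsigned burst)
{
    if (mode == ENTRY_EVENT && wait != NULL && !wait(q))
        return 0;   /* no event: don't touch the ring */
    return fake_rx_burst(q, burst);
}
```

In poll mode the caller simply spins on service_once(); in event mode the
wait hook is where the thread would give up the CPU.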
That said, overcommitment of CPUs can be achieved in other ways as well.
You could allocate and enforce CPU sharing via cgroups, and allocate x%
of a core to the DPDK pthread. It does introduce some indeterminism in
when the DPDK pthread gets scheduled back in (depending on how many
other threads are running on that core), but it is another option.
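A rough sketch of the arithmetic behind "x% of a core", assuming the
cgroup v1 cpu controller with CFS bandwidth control: a group may run for
cpu.cfs_quota_us out of every cpu.cfs_period_us, so x% of one core is a
quota of period * x / 100 (for example 25000 us of a 100000 us period for
25%). The helper names and the group path in the comment are hypothetical,
and the cgroup mount point varies by system; the DPDK pthread would also
need its thread ID written to the group's tasks file.

```c
#include <stdio.h>

/* Quota (microseconds per period) giving a group `percent` percent of one
 * CPU. Values above 100 mean more than one core. Returns -1 on bad input. */
static long cfs_quota_for_percent(long period_us, unsigned percent)
{
    if (period_us <= 0 || percent == 0)
        return -1;
    return period_us * (long)percent / 100;
}

/* Write a number to a cgroup control file, e.g. (hypothetical group name)
 * "/sys/fs/cgroup/cpu/dpdk/cpu.cfs_quota_us". Returns 0 on success. */
static int write_cgroup_value(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```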
Regards,
-Venky
On 8/27/2014 1:40 AM, Alex Markuze wrote:
> IMHO adding an "Interrupt Mode" to DPDK is important, as it can open
> DPDK to a larger audience. I can easily imagine someone looking for a
> user-space networking solution (and deciding against verbs/RDMA for
> the obvious reasons) who doesn't need deterministic latency.
>
> A few thoughts:
>
> Deterministic Latency: It's a fiction in the sense that this is
> something you will only see in a small, controlled environment, as
> network latencies in data centres (DC) are dominated by switch queuing
> (one good reference is http://fastpass.mit.edu, which Vincent shared a
> few days back).
>
> Virtual environments: In virtual environments this is especially
> interesting, as the NIC driver (hypervisor) works in IRQ mode; unless
> the interrupts are pinned to different CPUs, they will have a
> disruptive effect on the VM's performance. Moving to interrupt
> mode in paravirtualised environments makes sense, as in any
> environment that is not carefully crafted you should not expect any
> deterministic guarantees and would opt for a simpler programming model
> such as interrupt mode.
>
> NAPI: With 10G NICs, most CPUs poll faster than the NIC message rate,
> resulting in a 1:1 ratio of napi_poll callbacks to IRQs; this is true
> even with small packets. In some cases where the CPU is working slower,
> for example when intel_iommu=on,strict is set, you can actually see
> a performance inversion where the "slower" CPU reaches higher B/W,
> because the slowdown makes NAPI work, with the kernel effectively
> moving to polling mode.
>
> I think that a smarter DPDK-NAPI is important, but it is a next step
> IFF the interrupt mode is adopted.
>
> On Wed, Aug 27, 2014 at 8:48 AM, Patel, Rashmin N
> <rashmin.n.patel@intel.com> wrote:
>> You're right; I've run into the same difficulty with determinism in other hypervisors' soft-switch solutions as well. I think it's worth thinking about.
>>
>> Thanks,
>> Rashmin
>>
>> On Aug 26, 2014 9:15 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>> The way to handle switching in and out of poll mode is to use IRQ
>> coalescing parameters. You want to hold off the IRQ until there are a
>> couple of packets or a short delay. Deciding when to go out of poll
>> mode is the harder part.
>>
>>
>> On Tue, Aug 26, 2014 at 9:59 AM, Zhou, Danny <danny.zhou@intel.com> wrote:
>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
>>>> Sent: Wednesday, August 27, 2014 12:39 AM
>>>> To: Michael Marchetti
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] overcommitting CPUs
>>>>
>>>> On Tue, 26 Aug 2014 16:27:14 +0000
>>>> "Michael Marchetti" <mmarchetti@sandvine.com> wrote:
>>>>
>>>>> Hi, has there been any consideration to introduce a non-spinning
>>>>> network driver (interrupt based), for the purpose of overcommitting
>>>>> CPUs in a virtualized environment? This would obviously have reduced
>>>>> high-end performance but would allow for increased guest
>>>>> density (sharing of physical CPUs) on a host.
>>>>> I am interested in adding support for this kind of operation, is there
>>>>> any interest in the community?
>>>>> Thanks,
>>>>>
>>>>> Mike.
>>>> Better to implement a NAPI like algorithm that adapts from poll to
>>>> interrupt.
>>>
>>> Agreed, but DPDK is currently purely poll-mode based, so unlike NAPI's
>>> simple algorithm, the new heuristic algorithm should not switch from
>>> poll mode to interrupt mode as soon as a recent poll returns no
>>> packets. Otherwise, mode switches will be too frequent, which
>>> seriously hurts DPDK performance.
>>>
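A minimal sketch of the hysteresis Danny describes, in the spirit of the
NAPI-like adaptation Stephen suggests: stay in poll mode until a
configurable number of consecutive polls come back empty, then arm the RX
interrupt and sleep; go back to polling when the interrupt fires.
rx_mode_ctl, rx_mode_update() and rx_mode_wake() are hypothetical names and
idle_threshold is an assumed tuning knob; a real driver would also pair this
with IRQ coalescing so that each wakeup carries a few packets rather than one.

```c
#include <stdbool.h>

/* Hypothetical state for an adaptive poll/interrupt receive loop. */
typedef struct {
    unsigned idle_polls;      /* consecutive polls that returned nothing */
    unsigned idle_threshold;  /* empty polls tolerated before sleeping */
    bool interrupt_mode;      /* true: arm RX interrupt and block */
} rx_mode_ctl;

/* Update state after one poll that returned `nb_rx` packets. Any traffic
 * resets the idle count; only `idle_threshold` consecutive empty polls
 * switch to interrupt mode. Returns true if the caller should now arm the
 * RX interrupt and block. */
static bool rx_mode_update(rx_mode_ctl *c, unsigned nb_rx)
{
    if (nb_rx > 0) {
        c->idle_polls = 0;
        c->interrupt_mode = false;
        return false;
    }
    if (c->idle_polls < c->idle_threshold)
        c->idle_polls++;
    if (c->idle_polls >= c->idle_threshold)
        c->interrupt_mode = true;
    return c->interrupt_mode;
}

/* Called when an RX interrupt wakes the thread: resume polling. */
static void rx_mode_wake(rx_mode_ctl *c)
{
    c->idle_polls = 0;
    c->interrupt_mode = false;
}
```

Counting empty polls instead of switching on the first one trades a little
idle CPU time for fewer mode transitions, which is the cost Danny is
concerned about.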
Thread overview: 10+ messages
2014-08-26 16:27 Michael Marchetti
2014-08-26 16:38 ` Stephen Hemminger
2014-08-26 16:59 ` Zhou, Danny
2014-08-27 4:14 ` Stephen Hemminger
2014-08-27 5:48 ` Patel, Rashmin N
2014-08-27 8:40 ` Alex Markuze
2014-08-27 14:54 ` Venkatesan, Venky [this message]
2014-08-28 4:03 ` Liang, Cunming
2014-08-27 16:06 ` Zhou, Danny
2014-08-26 16:42 ` Zhou, Danny