* [dpdk-dev] overcommitting CPUs
@ 2014-08-26 16:27 Michael Marchetti
2014-08-26 16:38 ` Stephen Hemminger
2014-08-26 16:42 ` Zhou, Danny
0 siblings, 2 replies; 10+ messages in thread
From: Michael Marchetti @ 2014-08-26 16:27 UTC (permalink / raw)
To: dev
Hi, has there been any consideration of introducing a non-spinning (interrupt-based) network driver, for the purpose of overcommitting CPUs in a virtualized environment? This would obviously reduce high-end performance but would allow for increased guest density (sharing of physical CPUs) on a host.
I am interested in adding support for this kind of operation. Is there any interest in the community?
Thanks,
Mike.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-26 16:27 [dpdk-dev] overcommitting CPUs Michael Marchetti
@ 2014-08-26 16:38 ` Stephen Hemminger
2014-08-26 16:59 ` Zhou, Danny
2014-08-26 16:42 ` Zhou, Danny
1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2014-08-26 16:38 UTC (permalink / raw)
To: Michael Marchetti; +Cc: dev
On Tue, 26 Aug 2014 16:27:14 +0000
"Michael Marchetti" <mmarchetti@sandvine.com> wrote:
> Hi, has there been any consideration to introduce a non-spinning network driver (interrupt based), for the purpose of overcommitting CPUs in a virtualized environment? This would obviously have reduced high-end performance but would allow for increased guest density (sharing of physical CPUs) on a host.
>
> I am interested in adding support for this kind of operation, is there any interest in the community?
>
> Thanks,
>
> Mike.
Better to implement a NAPI-like algorithm that adapts from poll to interrupt.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-26 16:27 [dpdk-dev] overcommitting CPUs Michael Marchetti
2014-08-26 16:38 ` Stephen Hemminger
@ 2014-08-26 16:42 ` Zhou, Danny
1 sibling, 0 replies; 10+ messages in thread
From: Zhou, Danny @ 2014-08-26 16:42 UTC (permalink / raw)
To: Michael Marchetti, dev
I have a prototype that works on Niantic: it enables the NIC rx interrupt and switches between interrupt and polling mode according to the real traffic load on the rx queue. It was designed for DPDK power management, and can apply to CPU resource sharing as well. It only works in a non-virtualized environment at the moment. The prototype also optimizes the DPDK mechanism for notifying user space of interrupts in order to minimize latency. Basically, it looks like a user space NAPI.
The downside of this solution is that packet latency increases: it becomes a combination of interrupt latency, CPU wakeup latency from C3/C6 to C0, cache warmup latency, and OS scheduling latency. It can also potentially drop packets under burst traffic on >40G NICs. In other words, the latency is non-deterministic, which makes it unsuitable for packet-latency-sensitive scenarios.
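To put the burst-drop concern in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap per frame, and a hypothetical 512-descriptor rx ring. Neither the ring size nor the function names come from the prototype.

```python
# Rough estimate of how long an rx ring can absorb line-rate traffic while
# the receiving thread is asleep. Names and the ring size are illustrative
# assumptions, not values taken from the prototype or from DPDK.

WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Maximum frame rate on an Ethernet link for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def ring_fill_time_us(link_bps: float, ring_size: int = 512,
                      frame_bytes: int = 64) -> float:
    """Microseconds until an empty rx ring of ring_size descriptors is full."""
    return ring_size / packets_per_second(link_bps, frame_bytes) * 1e6


if __name__ == "__main__":
    for gbps in (10, 40):
        print(f"{gbps}G: {ring_fill_time_us(gbps * 1e9):.1f} us")
```

Under these assumptions the ring fills in roughly 34 us at 10 Gbit/s and in under 9 us at 40 Gbit/s, so the combined wakeup latency listed above would have to stay below that to avoid drops during a line-rate burst.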
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-26 16:38 ` Stephen Hemminger
@ 2014-08-26 16:59 ` Zhou, Danny
2014-08-27 4:14 ` Stephen Hemminger
0 siblings, 1 reply; 10+ messages in thread
From: Zhou, Danny @ 2014-08-26 16:59 UTC (permalink / raw)
To: Stephen Hemminger, Michael Marchetti; +Cc: dev
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, August 27, 2014 12:39 AM
> To: Michael Marchetti
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] overcommitting CPUs
>
> On Tue, 26 Aug 2014 16:27:14 +0000
> "Michael Marchetti" <mmarchetti@sandvine.com> wrote:
>
> > Hi, has there been any consideration to introduce a non-spinning network driver (interrupt based), for the purpose of overcommitting
> CPUs in a virtualized environment? This would obviously have reduced high-end performance but would allow for increased guest
> density (sharing of physical CPUs) on a host.
> >
> > I am interested in adding support for this kind of operation, is there any interest in the community?
> >
> > Thanks,
> >
> > Mike.
>
> Better to implement a NAPI like algorithm that adapts from poll to interrupt.
Agreed, but DPDK is currently purely poll-mode based, so unlike NAPI's simple algorithm, the new heuristic algorithm should not switch from poll mode to interrupt mode immediately once a poll returns no packets. Otherwise, mode switching will be too frequent, which has a serious negative performance impact on DPDK.
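A minimal Python sketch of that kind of hysteresis, assuming a plain counter of consecutive empty polls. The ModeSwitcher class and the idle_threshold value are hypothetical and are not taken from the prototype or from any DPDK API.

```python
from dataclasses import dataclass

POLL = "poll"
INTERRUPT = "interrupt"


@dataclass
class ModeSwitcher:
    """Leave poll mode only after many consecutive empty polls."""
    idle_threshold: int = 300  # hypothetical value; needs per-workload tuning
    mode: str = POLL
    empty_polls: int = 0

    def on_poll(self, packets_received: int) -> str:
        """Record the result of one rx poll and return the mode to use next."""
        if packets_received > 0:
            # Any traffic resets the idle counter, so brief gaps never
            # trigger a switch.
            self.empty_polls = 0
            self.mode = POLL
        else:
            self.empty_polls += 1
            if self.empty_polls >= self.idle_threshold:
                self.mode = INTERRUPT
        return self.mode

    def on_interrupt(self) -> str:
        """An rx interrupt woke the thread: go back to polling."""
        self.empty_polls = 0
        self.mode = POLL
        return self.mode
```

With idle_threshold=3, two empty polls followed by a non-empty one leave the thread in poll mode. Only three empty polls in a row arm the interrupt.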
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-26 16:59 ` Zhou, Danny
@ 2014-08-27 4:14 ` Stephen Hemminger
2014-08-27 5:48 ` Patel, Rashmin N
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2014-08-27 4:14 UTC (permalink / raw)
To: Zhou, Danny; +Cc: dev
The way to handle switching in and out of poll mode is to use IRQ coalescing parameters.
You want to hold off the IRQ until there are a couple of packets or a short delay has passed.
Going out of poll mode is harder to determine.
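The hold-off rule described here can be sketched as a small Python model: raise the interrupt once either a frame count or a delay limit is reached. The Coalescer class and its max_frames and max_delay_us parameters are illustrative only (loosely analogous to the rx-frames and rx-usecs coalescing knobs many NIC drivers expose); real hardware implements this in the NIC, not in software.

```python
from typing import Optional


class Coalescer:
    """Toy model of rx interrupt coalescing (frame count or timeout)."""

    def __init__(self, max_frames: int = 4, max_delay_us: int = 50):
        self.max_frames = max_frames      # hypothetical threshold
        self.max_delay_us = max_delay_us  # hypothetical hold-off time
        self.pending = 0
        self.first_arrival_us: Optional[int] = None

    def on_packet(self, now_us: int) -> bool:
        """Account for one received packet; return True if the IRQ fires."""
        if self.pending == 0:
            self.first_arrival_us = now_us  # start the hold-off timer
        self.pending += 1
        return self._maybe_fire(now_us)

    def on_timer(self, now_us: int) -> bool:
        """Periodic check; return True if the hold-off delay has expired."""
        return self._maybe_fire(now_us)

    def _maybe_fire(self, now_us: int) -> bool:
        if self.pending == 0:
            return False
        waited = now_us - self.first_arrival_us
        if self.pending >= self.max_frames or waited >= self.max_delay_us:
            self.pending = 0
            self.first_arrival_us = None
            return True
        return False
```

A single packet therefore waits at most max_delay_us before the interrupt is raised, while a burst raises it as soon as max_frames packets are pending.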
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-27 4:14 ` Stephen Hemminger
@ 2014-08-27 5:48 ` Patel, Rashmin N
2014-08-27 8:40 ` Alex Markuze
0 siblings, 1 reply; 10+ messages in thread
From: Patel, Rashmin N @ 2014-08-27 5:48 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
You're right, and I've felt the same difficulty around determinism with other hypervisors' soft switch solutions as well. I think it's worth thinking about.
Thanks,
Rashmin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-27 5:48 ` Patel, Rashmin N
@ 2014-08-27 8:40 ` Alex Markuze
2014-08-27 14:54 ` Venkatesan, Venky
2014-08-27 16:06 ` Zhou, Danny
0 siblings, 2 replies; 10+ messages in thread
From: Alex Markuze @ 2014-08-27 8:40 UTC (permalink / raw)
To: Patel, Rashmin N; +Cc: dev
IMHO adding an "Interrupt Mode" to DPDK is important, as it can open
DPDK to a larger audience. I can easily imagine someone looking for a
user space networking solution (and deciding against verbs/RDMA for
the obvious reasons) who does not need deterministic latency.
A few thoughts:
Deterministic latency: It's a fiction, in the sense that it is something
you will only see in a small controlled environment, as network
latencies in data centres (DCs) are dominated by switch queuing
(one good reference is http://fastpass.mit.edu, which Vincent shared a
few days back).
Virtual environments: This is especially interesting in virtual
environments, as the NIC driver (in the hypervisor) works in IRQ mode,
which, unless the interrupts are pinned to different CPUs than the VM,
will have a disruptive effect on the VM's performance. Moving to
interrupt mode in paravirtualised environments makes sense, as in any
environment that is not carefully crafted you should not expect any
deterministic guarantees and would opt for a simpler programming model,
like interrupt mode.
NAPI: With 10G NICs, most CPUs' poll rate is faster than the NIC message
rate, resulting in a 1:1 ratio of napi_poll callbacks to IRQs; this is
true even with small packets. In some cases where the CPU is working
slower, for example when intel_iommu=on,strict is set, you can actually
see a performance inversion where the "slower" CPU can reach higher B/W,
because the slowdown makes NAPI work, with the kernel effectively
moving to polling mode.
I think that a smarter DPDK-NAPI is important, but it is a next step
IFF the interrupt mode is adopted.
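The NAPI observation above can be illustrated with a toy Python model. It is a sketch under strong assumptions: packets arrive at a fixed interval, each costs a fixed service time, an optional fixed cost is paid per IRQ, and the poll loop re-enables the IRQ as soon as the queue is empty. It ignores the NAPI budget and hardware coalescing, and the function and parameter names are made up for illustration.

```python
def simulate_napi(num_packets: int, arrival_ns: int, service_ns: int,
                  irq_overhead_ns: int = 0) -> int:
    """Return how many IRQs are taken to receive num_packets packets."""
    irqs = 0
    cpu_free_at = 0  # time at which the poll loop has drained the queue
    for i in range(num_packets):
        arrival = i * arrival_ns
        if arrival >= cpu_free_at:
            # Queue already drained and IRQ re-enabled: this packet
            # costs a fresh interrupt before it is processed.
            irqs += 1
            cpu_free_at = arrival + irq_overhead_ns + service_ns
        else:
            # Poll loop still busy: the packet is picked up by polling.
            cpu_free_at += service_ns
    return irqs


if __name__ == "__main__":
    # Fast CPU: every packet takes its own IRQ (the 1:1 ratio).
    print(simulate_napi(1000, arrival_ns=100, service_ns=50))
    # Slower per-packet processing: the loop never goes idle.
    print(simulate_napi(1000, arrival_ns=100, service_ns=150))
```

In this model a CPU that drains each packet before the next one arrives pays one IRQ per packet, while a slower one stays in the poll loop and takes far fewer IRQs, which is the inversion described above. The second case is overloaded in the model, so it only shows the IRQ count, not sustainable throughput.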
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-27 8:40 ` Alex Markuze
@ 2014-08-27 14:54 ` Venkatesan, Venky
2014-08-28 4:03 ` Liang, Cunming
2014-08-27 16:06 ` Zhou, Danny
1 sibling, 1 reply; 10+ messages in thread
From: Venkatesan, Venky @ 2014-08-27 14:54 UTC (permalink / raw)
To: dev
DPDK currently isn't exactly poll mode - it has an API that receives and
transmits packets. How you enter that API could be interrupt or polled -
we've left that up to the application to decide, rather than force an
interrupt/NAPI type architecture. I do agree with Alex in that
implementing an interrupt/load driven entry point as an option will make
it usable more widely. There are multiple challenges here: managing the
latency of an interrupt driven scheme in a user-space context and very
high jitter rates, to mention a few.
That said, overcommitment of CPUs can be achieved in other ways as well.
You could allocate and enforce CPU sharing via cgroups, and allocate x%
of a core to the DPDK pthread. It does introduce a degree of
indeterminism as to when the DPDK pthread gets scheduled back in
(depending on how many other threads are running on that core). But it
is another option ...
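As a rough illustration of the cgroups option, the Python sketch below turns "x% of a core" into values for the CFS bandwidth files cpu.cfs_period_us and cpu.cfs_quota_us of the cgroup v1 cpu controller. The 100 ms period and the helper names are assumptions for the example, and nothing here writes to a real cgroup.

```python
DEFAULT_PERIOD_US = 100_000  # assumed enforcement period (100 ms)


def cfs_quota(percent_of_core: float,
              period_us: int = DEFAULT_PERIOD_US) -> tuple:
    """Return (cpu.cfs_quota_us, cpu.cfs_period_us) for a share of one core."""
    if not 0 < percent_of_core <= 100:
        raise ValueError("percent_of_core must be in (0, 100]")
    quota_us = round(period_us * percent_of_core / 100)
    return quota_us, period_us


def worst_case_gap_us(percent_of_core: float,
                      period_us: int = DEFAULT_PERIOD_US) -> int:
    """Throttled time per period if a spinning thread burns its quota at once."""
    quota_us, period_us = cfs_quota(percent_of_core, period_us)
    return period_us - quota_us


if __name__ == "__main__":
    quota, period = cfs_quota(30)
    print("cpu.cfs_quota_us =", quota, "cpu.cfs_period_us =", period)
    print("possible throttle gap (us):", worst_case_gap_us(30))
```

Because a busy-polling thread always uses its full quota, a 30% share with these assumed settings can leave it off the CPU for about 70 ms in every 100 ms period. A shorter period would shrink that gap at the cost of more frequent throttling, which is one concrete form of the indeterminism mentioned above.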
Regards,
-Venky
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-27 8:40 ` Alex Markuze
2014-08-27 14:54 ` Venkatesan, Venky
@ 2014-08-27 16:06 ` Zhou, Danny
1 sibling, 0 replies; 10+ messages in thread
From: Zhou, Danny @ 2014-08-27 16:06 UTC (permalink / raw)
To: Alex Markuze, Patel, Rashmin N; +Cc: dev
When I have time, maybe next month, I will submit my initial RFC patch to enable interrupt and polling mode switching for 10G NICs, with a sample. Anybody in the community is welcome to optimize it further.
I think it will be very hard to have a generic DPDK-NAPI API that fits all use cases; at the least, the thresholds in the algorithm need to be adjusted on a case-by-case basis.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] overcommitting CPUs
2014-08-27 14:54 ` Venkatesan, Venky
@ 2014-08-28 4:03 ` Liang, Cunming
0 siblings, 0 replies; 10+ messages in thread
From: Liang, Cunming @ 2014-08-28 4:03 UTC (permalink / raw)
To: Alex Markuze, Venkatesan, Venky, Zhou, Danny, Patel, Rashmin N; +Cc: dev
PMD combines 'PM', a thread model, and 'D', a user space driver.
DPDK provides optimized RX and TX in the driver on the fast path.
DPDK provides a single-thread core affinity model to demonstrate the best I/O with minimal noise penalty.
They are not tightly coupled, as Venky said.
In some cases, you may pick up only the RX/TX and give up the thread model DPDK provides.
Just take care to handle the penalty that may exist in the specific thread model.
For DPDK, we are thinking about this and have started to deal with the negative factors.
From another perspective, the more cycles we gain on the 'D' side, the more we can spend on the 'PM' side to cancel the penalty out.
Maybe a sample using RX/TX without dead polling is a good start.
But we cannot expect better user space wakeup latency so far.
Regards,
Liang Cunming
^ permalink raw reply [flat|nested] 10+ messages in thread