From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <venky.venkatesan@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id 01C9D682E
 for <dev@dpdk.org>; Wed, 27 Aug 2014 16:50:18 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga103.fm.intel.com with ESMTP; 27 Aug 2014 07:46:21 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.97,862,1389772800"; d="scan'208";a="377670582"
Received: from jfick-mobl.amr.corp.intel.com (HELO [10.254.12.54])
 ([10.254.12.54])
 by FMSMGA003.fm.intel.com with ESMTP; 27 Aug 2014 07:50:09 -0700
Message-ID: <53FDF11D.3040504@intel.com>
Date: Wed, 27 Aug 2014 07:54:21 -0700
From: "Venkatesan, Venky" <venky.venkatesan@intel.com>
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: dev@dpdk.org
References: <A7E5BBF1ACE50B43AF1D6E89B8280FE4C5290BB5@wtl-exchp-1.sandvine.com>
 <20140826093837.4e3d1d4b@urahara>
 <DFDF335405C17848924A094BC35766CF0A90F714@SHSMSX104.ccr.corp.intel.com>
 <CAOaVG178x4DA32XsO8GZCyo5qLKydT0UBo1NO-i-q5roD4VbNw@mail.gmail.com>
 <C68F1134885B32458704E1E4DA3E34F341A2CFFF@FMSMSX105.amr.corp.intel.com>
 <CAKfHP0Wpf1scek3yJywmHVDxGOBY6KBDYAZNkcZM0_zqUvt0sw@mail.gmail.com>
In-Reply-To: <CAKfHP0Wpf1scek3yJywmHVDxGOBY6KBDYAZNkcZM0_zqUvt0sw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] overcommitting CPUs
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Aug 2014 14:50:19 -0000

DPDK currently isn't exactly poll mode - it has an API that receives and
transmits packets. How you enter that API could be interrupt driven or
polled - we've left that up to the application to decide, rather than
force an interrupt/NAPI type architecture. I do agree with Alex that
implementing an interrupt/load driven entry point as an option will make
DPDK more widely usable. There are multiple challenges here - managing
the latency of an interrupt driven scheme in a user-space context and
the very high jitter that comes with it, to mention a few.
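
As a rough illustration of what a load driven entry point might look
like (a sketch only, not DPDK code: EntryPolicy, idle_threshold and the
mode names are made up for this example), the application could keep
polling while traffic arrives and only arm the interrupt and block after
several consecutive empty receive calls:

```python
from dataclasses import dataclass

POLL = "poll"    # keep calling the receive API in a loop
SLEEP = "sleep"  # arm the RX interrupt and block until it fires


@dataclass
class EntryPolicy:
    """Hypothetical policy deciding how the application enters the RX API."""
    idle_threshold: int = 3  # assumed tuning knob: empty polls before sleeping
    empty_polls: int = 0
    mode: str = POLL

    def on_poll(self, packets_received: int) -> str:
        """Record the result of one receive call and return the next mode."""
        if packets_received > 0:
            # Traffic is flowing: reset the idle counter and stay polled.
            self.empty_polls = 0
            self.mode = POLL
        else:
            self.empty_polls += 1
            # Only give up the core after a run of empty polls, so a single
            # idle iteration does not trigger a mode switch.
            if self.empty_polls >= self.idle_threshold:
                self.mode = SLEEP
        return self.mode

    def on_interrupt(self) -> str:
        """An RX interrupt woke the thread: go back to polling."""
        self.empty_polls = 0
        self.mode = POLL
        return self.mode


def run(policy: EntryPolicy, burst_sizes):
    """Feed a sequence of per-poll packet counts through the policy."""
    return [policy.on_poll(n) for n in burst_sizes]
```

The threshold trades wasted poll cycles against the wake-up latency and
jitter mentioned above.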

That said, overcommitment of CPUs can be achieved in other ways as well. 
You could allocate and enforce CPU sharing via cgroups, and allocate x% 
of a core to the DPDK pthread. It does introduce a degree of
nondeterminism in when the DPDK pthread gets scheduled back in (depending
on how many other threads are running on that core). But it is another
option ...
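
To make the cgroups option a bit more concrete, here is a small sketch
of the arithmetic, assuming the Linux CFS bandwidth controller
(cpu.cfs_quota_us / cpu.cfs_period_us in cgroup v1, cpu.max in cgroup
v2) and its default 100 ms period. The helper names are hypothetical,
and the throttle gap is a simplification that ignores other runnable
threads on the core:

```python
DEFAULT_PERIOD_US = 100_000  # default CFS bandwidth period (100 ms)


def cfs_quota_us(percent_of_core: float,
                 period_us: int = DEFAULT_PERIOD_US) -> int:
    """Runtime budget per period for a group limited to x% of one core."""
    if not 0 < percent_of_core <= 100:
        raise ValueError("percent_of_core must be in (0, 100]")
    return int(period_us * percent_of_core / 100)


def cpu_max_line(percent_of_core: float,
                 period_us: int = DEFAULT_PERIOD_US) -> str:
    """Value in the 'QUOTA PERIOD' form that cgroup v2's cpu.max expects."""
    return f"{cfs_quota_us(percent_of_core, period_us)} {period_us}"


def max_throttle_gap_us(percent_of_core: float,
                        period_us: int = DEFAULT_PERIOD_US) -> int:
    """Simplified worst case: the thread burns its whole quota at the start
    of a period and stays throttled until the next period begins."""
    return period_us - cfs_quota_us(percent_of_core, period_us)


if __name__ == "__main__":
    # Example: give the DPDK pthread's group 25% of a core.
    print(cpu_max_line(25))
    print(max_throttle_gap_us(25))
```

A shorter period shrinks the gap during which a busy-polling thread is
off the core, at the cost of more frequent throttling.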

Regards,
-Venky

On 8/27/2014 1:40 AM, Alex Markuze wrote:
> IMHO adding "Interrupt Mode" to DPDK is important as this can open
> DPDK to a larger audience. I can easily imagine someone looking for
> a user space networking solution (and deciding against verbs/RDMA
> for the obvious reasons) who does not need deterministic
> latency.
>
> A few thoughts:
>
> Deterministic Latency: It's a fiction in the sense that this is
> something you will be able to see only in a small controlled
> environment, as network latencies in Data Centres (DC) are dominated by
> switch queuing (one good reference is http://fastpass.mit.edu that
> Vincent shared a few days back).
>
> Virtual environments: In virtual environments this is especially
> interesting as the NIC driver (hypervisor) is working in IRQ mode,
> which, unless the interrupts are pinned to different CPUs than the VM,
> will have a disruptive effect on the VM's performance. Moving to
> interrupt mode in paravirtualised environments makes sense, as in any
> environment that is not carefully crafted you should not expect any
> deterministic guarantees and would opt for a simpler programming model
> - like interrupt mode.
>
> NAPI: With 10G NICs, most CPUs' poll rate is faster than the NIC
> message rate, resulting in a 1:1 napi_poll callback to IRQ ratio; this
> is true even with small packets. In some cases where the CPU is working
> slower - for example when intel_iommu=on,strict is set - you can
> actually see a performance inversion where the "slower" CPU can reach
> higher B/W because the slowdown makes NAPI work, with the kernel
> effectively moving to polling mode.
>
> I think that a smarter DPDK-NAPI is important, but it is a next step
> IFF the interrupt mode is adopted.
>
> On Wed, Aug 27, 2014 at 8:48 AM, Patel, Rashmin N
> <rashmin.n.patel@intel.com> wrote:
>> You're right and I've felt the same harder part of determinism with other hypervisors' soft switch solutions as well. I think it's worth thinking about.
>>
>> Thanks,
>> Rashmin
>>
>> On Aug 26, 2014 9:15 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>> The way to handle switching in and out of poll mode is to use IRQ
>> coalescing parameters.
>> You want to hold off the IRQ until there are a couple of packets or a
>> short delay. Going out of poll mode is harder to determine.
>>
>>
>> On Tue, Aug 26, 2014 at 9:59 AM, Zhou, Danny <danny.zhou@intel.com> wrote:
>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
>>>> Sent: Wednesday, August 27, 2014 12:39 AM
>>>> To: Michael Marchetti
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] overcommitting CPUs
>>>>
>>>> On Tue, 26 Aug 2014 16:27:14 +0000
>>>> "Michael  Marchetti" <mmarchetti@sandvine.com> wrote:
>>>>
>>>>> Hi, has there been any consideration to introduce a non-spinning
>>>>> network driver (interrupt based), for the purpose of overcommitting
>>>>> CPUs in a virtualized environment?  This would obviously have reduced
>>>>> high-end performance but would allow for increased guest
>>>>> density (sharing of physical CPUs) on a host.
>>>>> I am interested in adding support for this kind of operation, is there
>>>>> any interest in the community?
>>>>> Thanks,
>>>>>
>>>>> Mike.
>>>> Better to implement a NAPI like algorithm that adapts from poll to
>>>> interrupt.
>>>
>>> Agreed, but DPDK is currently pure poll-mode based, so unlike NAPI's
>>> simple algorithm, the new heuristic algorithm should not switch from
>>> poll-mode to interrupt-mode immediately once a poll returns no
>>> packets. Otherwise, mode switching will be too frequent, which has a
>>> serious negative performance impact on DPDK.
>>>