DPDK patches and discussions
* [dpdk-dev] KNI and memzones
@ 2014-09-23  9:27 Marc Sune
  2014-09-23 12:39 ` Jay Rolette
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Sune @ 2014-09-23  9:27 UTC (permalink / raw)
  To: <dev@dpdk.org>; +Cc: dev-team

Hi all,

So we are having some problems with KNI. In short, we have a DPDK
application that creates and destroys KNI interfaces during its
lifecycle and connects them to Docker containers. Interfaces may
eventually even be given the same name (see below).

We were wondering why, even when calling rte_kni_release(), the
hugepage memory was rapidly being exhausted, and we also realised that
even after destruction you cannot reuse the same name for an interface.

After close inspection of the rte_kni lib we think the core issue is
mostly a design issue: rte_kni_alloc() ends up calling
kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
that reservation cannot be undone by rte_kni_release() (by design of
memzones). The exhaustion is rapid due to the number of FIFOs created
per interface (6).

If this is right, we would propose, and try to provide, a patch as
follows:

* Create a new rte_kni_init(unsigned int max_knis);

This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
and Response) * max_knis by calling kni_memzone_reserve(), and store
them in a kni_fifo_pool. It should only be called once by DPDK
applications, at bootstrapping time.

* rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
slot meaning a set of 6 FIFOs).
* rte_kni_release() would return the slot to the pool.

This should solve both issues. We would base the patch on 1.7.2.
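
To make the idea a bit more concrete, below is a rough, untested sketch of
what we have in mind; all names (kni_fifo_slot, KNI_FIFOS_PER_SLOT, the
FIFO size) are tentative and only for illustration, not actual DPDK code:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <rte_memzone.h>

#define KNI_FIFOS_PER_SLOT 6            /* TX, RX, ALLOC, FREE, Req, Resp */
#define KNI_FIFO_ZONE_SIZE (1024 * sizeof(void *))  /* placeholder size */

struct kni_fifo_slot {
        const struct rte_memzone *mz[KNI_FIFOS_PER_SLOT];
        uint8_t in_use;
};

static struct kni_fifo_slot *kni_slots;
static unsigned int kni_max_slots;

/* Called once at application bootstrap: reserve every FIFO memzone up front. */
int rte_kni_init(unsigned int max_knis)
{
        unsigned int i, j;
        char name[RTE_MEMZONE_NAMESIZE];

        kni_slots = calloc(max_knis, sizeof(*kni_slots));
        if (kni_slots == NULL)
                return -1;
        kni_max_slots = max_knis;

        for (i = 0; i < max_knis; i++) {
                for (j = 0; j < KNI_FIFOS_PER_SLOT; j++) {
                        snprintf(name, sizeof(name), "kni_fifo_%u_%u", i, j);
                        kni_slots[i].mz[j] = rte_memzone_reserve(name,
                                        KNI_FIFO_ZONE_SIZE, SOCKET_ID_ANY, 0);
                        if (kni_slots[i].mz[j] == NULL)
                                return -1;
                }
        }
        return 0;
}

/* rte_kni_alloc() would then take the first slot with in_use == 0, and
 * rte_kni_release() would simply clear the flag, so hugepage usage stays
 * flat after init and interface names can be reused. */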

Thoughts?
marc

P.S. Lately someone involved with DPDK said KNI would be deprecated in
future DPDK releases; I hadn't read or heard this before. Is this true?
What would be the natural replacement then?


* Re: [dpdk-dev] KNI and memzones
  2014-09-23  9:27 [dpdk-dev] KNI and memzones Marc Sune
@ 2014-09-23 12:39 ` Jay Rolette
  2014-09-23 16:38   ` Zhou, Danny
  0 siblings, 1 reply; 7+ messages in thread
From: Jay Rolette @ 2014-09-23 12:39 UTC (permalink / raw)
  To: Marc Sune; +Cc: <dev@dpdk.org>, dev-team

> P.S. Lately someone involved with DPDK said KNI would be deprecated in
> future DPDK releases; I hadn't read or heard this before. Is this true?
> What would be the natural replacement then?

KNI is a non-trivial part of the product I'm in the process of building.
I'd appreciate someone "in the know" addressing this one please. Are there
specific roadmap plans relative to KNI that we need to be aware of?

Regards,
Jay

On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune <marc.sune@bisdn.de> wrote:

> Hi all,
>
> So we are having some problems with KNI. In short, we have a DPDK
> application that creates and destroys KNI interfaces during its
> lifecycle and connects them to Docker containers. Interfaces may
> eventually even be given the same name (see below).
>
> We were wondering why, even when calling rte_kni_release(), the
> hugepage memory was rapidly being exhausted, and we also realised that
> even after destruction you cannot reuse the same name for an interface.
>
> After close inspection of the rte_kni lib we think the core issue is
> mostly a design issue: rte_kni_alloc() ends up calling
> kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
> that reservation cannot be undone by rte_kni_release() (by design of
> memzones). The exhaustion is rapid due to the number of FIFOs created
> per interface (6).
>
> If this is right, we would propose, and try to provide, a patch as
> follows:
>
> * Create a new rte_kni_init(unsigned int max_knis);
>
> This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
> and Response) * max_knis by calling kni_memzone_reserve(), and store
> them in a kni_fifo_pool. It should only be called once by DPDK
> applications, at bootstrapping time.
>
> * rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
> slot meaning a set of 6 FIFOs).
> * rte_kni_release() would return the slot to the pool.
>
> This should solve both issues. We would base the patch on 1.7.2.
>
> Thoughts?
> marc
>
> P.S. Lately someone involved with DPDK said KNI would be deprecated in
> future DPDK releases; I hadn't read or heard this before. Is this true?
> What would be the natural replacement then?
>


* Re: [dpdk-dev] KNI and memzones
  2014-09-23 12:39 ` Jay Rolette
@ 2014-09-23 16:38   ` Zhou, Danny
  2014-09-23 18:53     ` Jay Rolette
  2014-09-23 20:50     ` Marc Sune
  0 siblings, 2 replies; 7+ messages in thread
From: Zhou, Danny @ 2014-09-23 16:38 UTC (permalink / raw)
  To: Jay Rolette, Marc Sune; +Cc: <dev@dpdk.org>, dev-team


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette
> Sent: Tuesday, September 23, 2014 8:39 PM
> To: Marc Sune
> Cc: <dev@dpdk.org>; dev-team@bisdn.de
> Subject: Re: [dpdk-dev] KNI and memzones
> 
> > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I hadn't read or heard this before. Is this true?
> > What would be the natural replacement then?
> 
> KNI is a non-trivial part of the product I'm in the process of building.
> I'd appreciate someone "in the know" addressing this one please. Are there
> specific roadmap plans relative to KNI that we need to be aware of?
> 

KNI, including multi-threaded KNI, has several limitations:
1) Flow classification and packet distribution are both done in software, specifically in the KNI user-space library, at the cost of CPU cycles.
2) Low performance: skb creation/free and packet copies between skb and mbuf hurt performance significantly.
3) Dedicated cores in user space and kernel space are responsible for rx/tx of packets between the DPDK app and the KNI device, which seems to me to waste too many core resources.
4) GPL license jail, as KNI sits in the kernel.

We actually have a bifurcated driver prototype that meets both the high-performance and the upstreamable requirement, and which we treat as an alternative solution to KNI. The idea is to
leverage the NIC's flow director capability to bifurcate data plane packets to DPDK, while control plane packets, or whatever packets need to go through the kernel's TCP/IP stack, remain
processed in the kernel (NIC driver + stack). Basically, the kernel NIC driver and DPDK co-exist to drive the same NIC device, but manipulate different rx/tx queue pairs. There is still a
tough consistent-NIC-control issue which needs to be resolved and upstreamed to the kernel, whose details I do not want to expose at the moment.
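
To illustrate the split (purely a conceptual sketch, not code from the prototype; the ethtool rule and queue number are made up): flow director steers the data-plane flows to a queue that only DPDK polls, while everything else stays on the queues serviced by the kernel driver.

/* Conceptual sketch of the bifurcated model, not the prototype:
 *   - the kernel driver keeps the default rx queue (control plane, ARP, ...);
 *   - a flow-director rule steers data-plane traffic to a dedicated queue,
 *     e.g. "ethtool -N eth2 flow-type udp4 dst-port 4789 action 3";
 *   - DPDK polls only that queue and never touches the others. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define DATA_PLANE_QUEUE 3      /* the queue the fdir rule points at (made up) */
#define BURST_SIZE 32

static void
poll_dpdk_owned_queue(uint8_t port_id)
{
        struct rte_mbuf *pkts[BURST_SIZE];
        uint16_t nb, i;

        for (;;) {
                nb = rte_eth_rx_burst(port_id, DATA_PLANE_QUEUE, pkts, BURST_SIZE);
                for (i = 0; i < nb; i++) {
                        /* fast-path processing would go here */
                        rte_pktmbuf_free(pkts[i]);
                }
        }
}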

IMHO, KNI should NOT be removed unless there is a really good user-space, open-source and socket-backward-compatible TCP/IP stack, which should not become true very soon.
The bifurcated driver approach could certainly replace KNI for some use cases where DPDK does not own the NIC control.

Do you mind sharing your KNI use case in more detail, to help determine whether the bifurcated driver could help?

> Regards,
> Jay
> 
> On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune <marc.sune@bisdn.de> wrote:
> 
> > Hi all,
> >
> > So we are having some problems with KNI. In short, we have a DPDK
> > application that creates and destroys KNI interfaces during its
> > lifecycle and connects them to Docker containers. Interfaces may
> > eventually even be given the same name (see below).
> >
> > We were wondering why, even when calling rte_kni_release(), the
> > hugepage memory was rapidly being exhausted, and we also realised that
> > even after destruction you cannot reuse the same name for an interface.
> >
> > After close inspection of the rte_kni lib we think the core issue is
> > mostly a design issue: rte_kni_alloc() ends up calling
> > kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
> > that reservation cannot be undone by rte_kni_release() (by design of
> > memzones). The exhaustion is rapid due to the number of FIFOs created
> > per interface (6).
> >
> > If this is right, we would propose, and try to provide, a patch as
> > follows:
> >
> > * Create a new rte_kni_init(unsigned int max_knis);
> >
> > This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
> > and Response) * max_knis by calling kni_memzone_reserve(), and store
> > them in a kni_fifo_pool. It should only be called once by DPDK
> > applications, at bootstrapping time.
> >
> > * rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
> > slot meaning a set of 6 FIFOs).
> > * rte_kni_release() would return the slot to the pool.
> >
> > This should solve both issues. We would base the patch on 1.7.2.
> >
> > Thoughts?
> > marc
> >
> > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I hadn't read or heard this before. Is this true?
> > What would be the natural replacement then?
> >


* Re: [dpdk-dev] KNI and memzones
  2014-09-23 16:38   ` Zhou, Danny
@ 2014-09-23 18:53     ` Jay Rolette
  2014-09-23 19:12       ` Zhou, Danny
  2014-09-23 20:50     ` Marc Sune
  1 sibling, 1 reply; 7+ messages in thread
From: Jay Rolette @ 2014-09-23 18:53 UTC (permalink / raw)
  To: Zhou, Danny; +Cc: <dev@dpdk.org>, dev-team

I can't discuss product details openly yet, but I'm happy to have a
detailed discussion under NDA with Intel. In fact, we had an early NDA
discussion with Intel about it a few months ago.

That said, the use case isn't tied so closely to my product that I can't
describe it in general terms...

Imagine a box that installs in your network as a transparent
bump-in-the-wire. Traffic comes in port 1 and is processed by our
DPDK-based engine, then the packets are forwarded out port 2, where they
head to their original destination. From a network topology point of view,
the box is mostly invisible.

Same process applies for traffic going the other way (RX on port 2,
special-sauce processing in DPDK app, TX on port 1).

If you are familiar with network security products, this is very much how
IPS devices work.

Where KNI comes into play is for several user-space apps that need to use
the normal network stack (sockets) to communicate over the _same_ ports
used on the main data path. We use KNI to create a virtual port with an IP
address overlaid on the "invisible" data path ports.
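
For anyone not familiar with the library, here is a simplified sketch of that
overlay (not our actual code; it assumes the pktmbuf pool and the physical
port are already set up):

#include <stdio.h>
#include <string.h>
#include <rte_kni.h>

/* Simplified sketch, not production code: create a vEth device ("kni0",
 * "kni1", ...) overlaid on physical port 'port_id'. A datapath loop then
 * shuttles mbufs between the port and the KNI FIFOs. */
static struct rte_kni *
create_overlay_port(uint8_t port_id, struct rte_mempool *pktmbuf_pool)
{
        struct rte_kni_conf conf;

        memset(&conf, 0, sizeof(conf));
        snprintf(conf.name, sizeof(conf.name), "kni%u", (unsigned)port_id);
        conf.group_id = port_id;
        conf.mbuf_size = 2048;
        /* optionally set rte_kni_ops callbacks for MTU / link up-down requests */

        return rte_kni_alloc(pktmbuf_pool, &conf, NULL);
}

/* In the datapath loop: packets destined for the local stack are handed to
 * the kernel with rte_kni_tx_burst(); packets the user-space apps send
 * through the vEth come back via rte_kni_rx_burst(); rte_kni_handle_request()
 * services the kernel's MTU and up/down requests. */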

This isn't just for control traffic. It's obviously not line-rate
processing, but we need to get all the bandwidth we can out of it.

Let me know if that makes sense or if I need to clarify some things. If
you'd rather continue this as an NDA discussion, just shoot me an email
directly.

Regards,
Jay



On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny <danny.zhou@intel.com> wrote:

>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette
> > Sent: Tuesday, September 23, 2014 8:39 PM
> > To: Marc Sune
> > Cc: <dev@dpdk.org>; dev-team@bisdn.de
> > Subject: Re: [dpdk-dev] KNI and memzones
> >
> > > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > > future DPDK releases; I hadn't read or heard this before. Is this true?
> > > What would be the natural replacement then?
> >
> > KNI is a non-trivial part of the product I'm in the process of building.
> > I'd appreciate someone "in the know" addressing this one please. Are there
> > specific roadmap plans relative to KNI that we need to be aware of?
> >
>
> KNI, including multi-threaded KNI, has several limitations:
> 1) Flow classification and packet distribution are both done in software,
> specifically in the KNI user-space library, at the cost of CPU cycles.
> 2) Low performance: skb creation/free and packet copies between skb and mbuf
> hurt performance significantly.
> 3) Dedicated cores in user space and kernel space are responsible for rx/tx
> of packets between the DPDK app and the KNI device, which seems to me to
> waste too many core resources.
> 4) GPL license jail, as KNI sits in the kernel.
>
> We actually have a bifurcated driver prototype that meets both the
> high-performance and the upstreamable requirement, and which we treat as an
> alternative solution to KNI. The idea is to leverage the NIC's flow director
> capability to bifurcate data plane packets to DPDK, while control plane
> packets, or whatever packets need to go through the kernel's TCP/IP stack,
> remain processed in the kernel (NIC driver + stack). Basically, the kernel
> NIC driver and DPDK co-exist to drive the same NIC device, but manipulate
> different rx/tx queue pairs. There is still a tough consistent-NIC-control
> issue which needs to be resolved and upstreamed to the kernel, whose details
> I do not want to expose at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user-space,
> open-source and socket-backward-compatible TCP/IP stack, which should not
> become true very soon.
> The bifurcated driver approach could certainly replace KNI for some use
> cases where DPDK does not own the NIC control.
>
> Do you mind sharing your KNI use case in more detail, to help determine
> whether the bifurcated driver could help?
>
> > Regards,
> > Jay
> >
> > On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune <marc.sune@bisdn.de> wrote:
> >
> > > Hi all,
> > >
> > > So we are having some problems with KNI. In short, we have a DPDK
> > > application that creates and destroys KNI interfaces during its
> > > lifecycle and connects them to Docker containers. Interfaces may
> > > eventually even be given the same name (see below).
> > >
> > > We were wondering why, even when calling rte_kni_release(), the
> > > hugepage memory was rapidly being exhausted, and we also realised that
> > > even after destruction you cannot reuse the same name for an interface.
> > >
> > > After close inspection of the rte_kni lib we think the core issue is
> > > mostly a design issue: rte_kni_alloc() ends up calling
> > > kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
> > > that reservation cannot be undone by rte_kni_release() (by design of
> > > memzones). The exhaustion is rapid due to the number of FIFOs created
> > > per interface (6).
> > >
> > > If this is right, we would propose, and try to provide, a patch as
> > > follows:
> > >
> > > * Create a new rte_kni_init(unsigned int max_knis);
> > >
> > > This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
> > > and Response) * max_knis by calling kni_memzone_reserve(), and store
> > > them in a kni_fifo_pool. It should only be called once by DPDK
> > > applications, at bootstrapping time.
> > >
> > > * rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
> > > slot meaning a set of 6 FIFOs).
> > > * rte_kni_release() would return the slot to the pool.
> > >
> > > This should solve both issues. We would base the patch on 1.7.2.
> > >
> > > Thoughts?
> > > marc
> > >
> > > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > > future DPDK releases; I hadn't read or heard this before. Is this true?
> > > What would be the natural replacement then?
> > >
>


* Re: [dpdk-dev] KNI and memzones
  2014-09-23 18:53     ` Jay Rolette
@ 2014-09-23 19:12       ` Zhou, Danny
  2014-09-23 19:24         ` Jay Rolette
  0 siblings, 1 reply; 7+ messages in thread
From: Zhou, Danny @ 2014-09-23 19:12 UTC (permalink / raw)
  To: Jay Rolette; +Cc: <dev@dpdk.org>, dev-team

It looks like a typical network middle-box usage with IDS/IPS/DPI sorts of functionality. Good-enough performance, rather than line-rate performance, should be OK for this case, and multi-threaded KNI (multiple software rx/tx queues are established between DPDK and a single vEth netdev, with multiple kernel threads affinitized to several lcores) should fit, with linear performance scaling if you can allocate enough lcores to achieve satisfactory throughput for relatively big packets.

Since NIC control is still in DPDK's PMD for this case, the bifurcated driver does not fit, unless you only use DPDK to rx/tx packets in your box.
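
For reference, the knobs involved look roughly like this (a sketch only; core and interface numbers are invented): the rte_kni kernel module is loaded with kthread_mode=multiple, and each vEth's kernel thread is pinned through the KNI conf.

#include <stdio.h>
#include <string.h>
#include <rte_kni.h>

/* Sketch only: with the kernel module loaded as
 *   insmod rte_kni.ko kthread_mode=multiple
 * each vEth gets its own kernel rx thread, which can be pinned to an lcore. */
static struct rte_kni *
create_pinned_kni(unsigned int idx, unsigned int lcore,
                  struct rte_mempool *pktmbuf_pool)
{
        struct rte_kni_conf conf;

        memset(&conf, 0, sizeof(conf));
        snprintf(conf.name, sizeof(conf.name), "vEth%u", idx);
        conf.core_id = lcore;      /* lcore for this vEth's kernel thread */
        conf.force_bind = 1;       /* actually bind the kthread to that lcore */
        conf.mbuf_size = 2048;

        return rte_kni_alloc(pktmbuf_pool, &conf, NULL);
}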

From: Jay Rolette [mailto:rolette@infiniteio.com]
Sent: Wednesday, September 24, 2014 2:53 AM
To: Zhou, Danny
Cc: Marc Sune; <dev@dpdk.org>; dev-team@bisdn.de
Subject: Re: [dpdk-dev] KNI and memzones

I can't discuss product details openly yet, but I'm happy to have a detailed discussion under NDA with Intel. In fact, we had an early NDA discussion with Intel about it a few months ago.

That said, the use case isn't tied so closely to my product that I can't describe it in general terms...

Imagine a box that installs in your network as a transparent bump-in-the-wire. Traffic comes in port 1 and is processed by our DPDK-based engine, then the packets are forwarded out port 2, where they head to their original destination. From a network topology point of view, the box is mostly invisible.

Same process applies for traffic going the other way (RX on port 2, special-sauce processing in DPDK app, TX on port 1).

If you are familiar with network security products, this is very much how IPS devices work.

Where KNI comes into play is for several user-space apps that need to use the normal network stack (sockets) to communicate over the _same_ ports used on the main data path. We use KNI to create a virtual port with an IP address overlaid on the "invisible" data path ports.

This isn't just for control traffic. It's obviously not line-rate processing, but we need to get all the bandwidth we can out of it.

Let me know if that makes sense or if I need to clarify some things. If you'd rather continue this as an NDA discussion, just shoot me an email directly.

Regards,
Jay



On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny <danny.zhou@intel.com> wrote:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette
> Sent: Tuesday, September 23, 2014 8:39 PM
> To: Marc Sune
> Cc: <dev@dpdk.org>; dev-team@bisdn.de
> Subject: Re: [dpdk-dev] KNI and memzones
>
> > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I hadn't read or heard this before. Is this true?
> > What would be the natural replacement then?
>
> KNI is a non-trivial part of the product I'm in the process of building.
> I'd appreciate someone "in the know" addressing this one please. Are there
> specific roadmap plans relative to KNI that we need to be aware of?
>

KNI, including multi-threaded KNI, has several limitations:
1) Flow classification and packet distribution are both done in software, specifically in the KNI user-space library, at the cost of CPU cycles.
2) Low performance: skb creation/free and packet copies between skb and mbuf hurt performance significantly.
3) Dedicated cores in user space and kernel space are responsible for rx/tx of packets between the DPDK app and the KNI device, which seems to me to waste too many core resources.
4) GPL license jail, as KNI sits in the kernel.

We actually have a bifurcated driver prototype that meets both the high-performance and the upstreamable requirement, and which we treat as an alternative solution to KNI. The idea is to
leverage the NIC's flow director capability to bifurcate data plane packets to DPDK, while control plane packets, or whatever packets need to go through the kernel's TCP/IP stack, remain
processed in the kernel (NIC driver + stack). Basically, the kernel NIC driver and DPDK co-exist to drive the same NIC device, but manipulate different rx/tx queue pairs. There is still a
tough consistent-NIC-control issue which needs to be resolved and upstreamed to the kernel, whose details I do not want to expose at the moment.

IMHO, KNI should NOT be removed unless there is a really good user-space, open-source and socket-backward-compatible TCP/IP stack, which should not become true very soon.
The bifurcated driver approach could certainly replace KNI for some use cases where DPDK does not own the NIC control.

Do you mind sharing your KNI use case in more detail, to help determine whether the bifurcated driver could help?

> Regards,
> Jay
>
> On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune <marc.sune@bisdn.de> wrote:
>
> > Hi all,
> >
> > So we are having some problems with KNI. In short, we have a DPDK
> > application that creates and destroys KNI interfaces during its
> > lifecycle and connects them to Docker containers. Interfaces may
> > eventually even be given the same name (see below).
> >
> > We were wondering why, even when calling rte_kni_release(), the
> > hugepage memory was rapidly being exhausted, and we also realised that
> > even after destruction you cannot reuse the same name for an interface.
> >
> > After close inspection of the rte_kni lib we think the core issue is
> > mostly a design issue: rte_kni_alloc() ends up calling
> > kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
> > that reservation cannot be undone by rte_kni_release() (by design of
> > memzones). The exhaustion is rapid due to the number of FIFOs created
> > per interface (6).
> >
> > If this is right, we would propose, and try to provide, a patch as
> > follows:
> >
> > * Create a new rte_kni_init(unsigned int max_knis);
> >
> > This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
> > and Response) * max_knis by calling kni_memzone_reserve(), and store
> > them in a kni_fifo_pool. It should only be called once by DPDK
> > applications, at bootstrapping time.
> >
> > * rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
> > slot meaning a set of 6 FIFOs).
> > * rte_kni_release() would return the slot to the pool.
> >
> > This should solve both issues. We would base the patch on 1.7.2.
> >
> > Thoughts?
> > marc
> >
> > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I hadn't read or heard this before. Is this true?
> > What would be the natural replacement then?
> >



* Re: [dpdk-dev] KNI and memzones
  2014-09-23 19:12       ` Zhou, Danny
@ 2014-09-23 19:24         ` Jay Rolette
  0 siblings, 0 replies; 7+ messages in thread
From: Jay Rolette @ 2014-09-23 19:24 UTC (permalink / raw)
  To: Zhou, Danny; +Cc: <dev@dpdk.org>

Yep, good way to describe it. Not really related to network security
functions but very similar architecture.

On Tue, Sep 23, 2014 at 2:12 PM, Zhou, Danny <danny.zhou@intel.com> wrote:

> It looks like a typical network middle-box usage with IDS/IPS/DPI sorts
> of functionality. Good-enough performance, rather than line-rate
> performance, should be OK for this case, and multi-threaded KNI (multiple
> software rx/tx queues are established between DPDK and a single vEth
> netdev, with multiple kernel threads affinitized to several lcores) should
> fit, with linear performance scaling if you can allocate enough lcores to
> achieve satisfactory throughput for relatively big packets.
>
> Since NIC control is still in DPDK's PMD for this case, the bifurcated
> driver does not fit, unless you only use DPDK to rx/tx packets in your box.
>
>
>
> From: Jay Rolette [mailto:rolette@infiniteio.com]
> Sent: Wednesday, September 24, 2014 2:53 AM
> To: Zhou, Danny
> Cc: Marc Sune; <dev@dpdk.org>; dev-team@bisdn.de
> Subject: Re: [dpdk-dev] KNI and memzones
>
>
>
> I can't discuss product details openly yet, but I'm happy to have a
> detailed discussion under NDA with Intel. In fact, we had an early NDA
> discussion with Intel about it a few months ago.
>
>
>
> That said, the use case isn't tied so closely to my product that I can't
> describe it in general terms...
>
>
>
> Imagine a box that installs in your network as a transparent
> bump-in-the-wire. Traffic comes in port 1 and is processed by our
> DPDK-based engine, then the packets are forwarded out port 2, where they
> head to their original destination. From a network topology point of view,
> the box is mostly invisible.
>
>
>
> Same process applies for traffic going the other way (RX on port 2,
> special-sauce processing in DPDK app, TX on port 1).
>
>
>
> If you are familiar with network security products, this is very much how
> IPS devices work.
>
>
>
> Where KNI comes into play is for several user-space apps that need to use
> the normal network stack (sockets) to communicate over the _same_ ports
> used on the main data path. We use KNI to create a virtual port with an IP
> address overlaid on the "invisible" data path ports.
>
>
>
> This isn't just for control traffic. It's obviously not line-rate
> processing, but we need to get all the bandwidth we can out of it.
>
>
>
> Let me know if that makes sense or if I need to clarify some things. If
> you'd rather continue this as an NDA discussion, just shoot me an email
> directly.
>
>
>
> Regards,
>
> Jay
>
>
>
>
>
>
>
> On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny <danny.zhou@intel.com>
> wrote:
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette
> > Sent: Tuesday, September 23, 2014 8:39 PM
> > To: Marc Sune
> > Cc: <dev@dpdk.org>; dev-team@bisdn.de
> > Subject: Re: [dpdk-dev] KNI and memzones
> >
> > > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > > future DPDK releases; I hadn't read or heard this before. Is this true?
> > > What would be the natural replacement then?
> >
> > KNI is a non-trivial part of the product I'm in the process of building.
> > I'd appreciate someone "in the know" addressing this one please. Are there
> > specific roadmap plans relative to KNI that we need to be aware of?
> >
>
> KNI, including multi-threaded KNI, has several limitations:
> 1) Flow classification and packet distribution are both done in software,
> specifically in the KNI user-space library, at the cost of CPU cycles.
> 2) Low performance: skb creation/free and packet copies between skb and mbuf
> hurt performance significantly.
> 3) Dedicated cores in user space and kernel space are responsible for rx/tx
> of packets between the DPDK app and the KNI device, which seems to me to
> waste too many core resources.
> 4) GPL license jail, as KNI sits in the kernel.
>
> We actually have a bifurcated driver prototype that meets both the
> high-performance and the upstreamable requirement, and which we treat as an
> alternative solution to KNI. The idea is to leverage the NIC's flow director
> capability to bifurcate data plane packets to DPDK, while control plane
> packets, or whatever packets need to go through the kernel's TCP/IP stack,
> remain processed in the kernel (NIC driver + stack). Basically, the kernel
> NIC driver and DPDK co-exist to drive the same NIC device, but manipulate
> different rx/tx queue pairs. There is still a tough consistent-NIC-control
> issue which needs to be resolved and upstreamed to the kernel, whose details
> I do not want to expose at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user-space,
> open-source and socket-backward-compatible TCP/IP stack, which should not
> become true very soon.
> The bifurcated driver approach could certainly replace KNI for some use
> cases where DPDK does not own the NIC control.
>
> Do you mind sharing your KNI use case in more detail, to help determine
> whether the bifurcated driver could help?
>
>
> > Regards,
> > Jay
> >
> > On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune <marc.sune@bisdn.de> wrote:
> >
> > > Hi all,
> > >
> > > So we are having some problems with KNI. In short, we have a DPDK
> > > application that creates and destroys KNI interfaces during its
> > > lifecycle and connects them to Docker containers. Interfaces may
> > > eventually even be given the same name (see below).
> > >
> > > We were wondering why, even when calling rte_kni_release(), the
> > > hugepage memory was rapidly being exhausted, and we also realised that
> > > even after destruction you cannot reuse the same name for an interface.
> > >
> > > After close inspection of the rte_kni lib we think the core issue is
> > > mostly a design issue: rte_kni_alloc() ends up calling
> > > kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
> > > that reservation cannot be undone by rte_kni_release() (by design of
> > > memzones). The exhaustion is rapid due to the number of FIFOs created
> > > per interface (6).
> > >
> > > If this is right, we would propose, and try to provide, a patch as
> > > follows:
> > >
> > > * Create a new rte_kni_init(unsigned int max_knis);
> > >
> > > This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
> > > and Response) * max_knis by calling kni_memzone_reserve(), and store
> > > them in a kni_fifo_pool. It should only be called once by DPDK
> > > applications, at bootstrapping time.
> > >
> > > * rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
> > > slot meaning a set of 6 FIFOs).
> > > * rte_kni_release() would return the slot to the pool.
> > >
> > > This should solve both issues. We would base the patch on 1.7.2.
> > >
> > > Thoughts?
> > > marc
> > >
> > > P.S. Lately someone involved with DPDK said KNI would be deprecated in
> > > future DPDK releases; I hadn't read or heard this before. Is this true?
> > > What would be the natural replacement then?
> > >
>
>
>


* Re: [dpdk-dev] KNI and memzones
  2014-09-23 16:38   ` Zhou, Danny
  2014-09-23 18:53     ` Jay Rolette
@ 2014-09-23 20:50     ` Marc Sune
  1 sibling, 0 replies; 7+ messages in thread
From: Marc Sune @ 2014-09-23 20:50 UTC (permalink / raw)
  To: Zhou, Danny; +Cc: <dev@dpdk.org>, CERRATO IVANO

Danny,

On 23/09/14 18:38, Zhou, Danny wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette
>> Sent: Tuesday, September 23, 2014 8:39 PM
>> To: Marc Sune
>> Cc: <dev@dpdk.org>; dev-team@bisdn.de
>> Subject: Re: [dpdk-dev] KNI and memzones
>>
>> > P.S. Lately someone involved with DPDK said KNI would be deprecated in
>> > future DPDK releases; I hadn't read or heard this before. Is this true?
>> > What would be the natural replacement then?
>>
>> KNI is a non-trivial part of the product I'm in the process of building.
>> I'd appreciate someone "in the know" addressing this one please. Are there
>> specific roadmap plans relative to KNI that we need to be aware of?
>>
> KNI, including multi-threaded KNI, has several limitations:
> 1) Flow classification and packet distribution are both done in software, specifically in the KNI user-space library, at the cost of CPU cycles.
> 2) Low performance: skb creation/free and packet copies between skb and mbuf hurt performance significantly.
> 3) Dedicated cores in user space and kernel space are responsible for rx/tx of packets between the DPDK app and the KNI device, which seems to me to waste too many core resources.
> 4) GPL license jail, as KNI sits in the kernel.
>
> We actually have a bifurcated driver prototype that meets both the high-performance and the upstreamable requirement, and which we treat as an alternative solution to KNI. The idea is to
> leverage the NIC's flow director capability to bifurcate data plane packets to DPDK, while control plane packets, or whatever packets need to go through the kernel's TCP/IP stack, remain
> processed in the kernel (NIC driver + stack). Basically, the kernel NIC driver and DPDK co-exist to drive the same NIC device, but manipulate different rx/tx queue pairs. There is still a
> tough consistent-NIC-control issue which needs to be resolved and upstreamed to the kernel, whose details I do not want to expose at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user-space, open-source and socket-backward-compatible TCP/IP stack, which should not become true very soon.
> The bifurcated driver approach could certainly replace KNI for some use cases where DPDK does not own the NIC control.
>
> Do you mind sharing your KNI use case in more detail, to help determine whether the bifurcated driver could help?
I don't know if your question was (also) directed at me, but I will give
an explanation, as short as I can, to put the problem in context, since
the KNI issue is still open for us.

The use case is a set of experimental (still, though close to stable)
extensions over xDPd's [1] multi-platform OpenFlow switch, as well as an
orchestration framework prototype, developed by Politecnico di
Torino & BISDN to support the deployment of NF graphs in the framework of
the UNIFY FP7 [2] research project (of which, btw, Intel is a partner).
This prototype was publicly demoed at EWSDN'14. The switch extensions are
already public, but still in a development branch. We would like to
merge them mainstream, but we need to fix some issues first, the major
one being this KNI problem. The code for the orchestration will be public
soon too.

The idea is that the standard xdpd, on the gnu-linux-dpdk platform, is
enhanced by creating and destroying virtual ports that hide behind VNFs.
From the perspective of OF, these are just ports, so the OF controller
can distribute traffic across VNFs and other OF LSIs (~virtual switches)
via regular OF flowmods outputting there, and compose complex VNF
graphs. We have implemented 3 types of ports: a) NATIVE, meaning the
function is a DPDK primary-process function (this is a placeholder for
future work); b) SHMEM, meaning a VNF implemented as a secondary process
communicating via rte_ring buffers (implemented, but it still needs to be
profiled); and c) EXTERNAL ports, currently implemented using KNI interfaces.

Although KNI imposes performance penalties, it is still interesting for
legacy applications that can be reused without any change, using Docker
or other containers, as well as for low-performance functions. virtio for
VMs is also a next step.

The bifurcated driver approach is something that would fit quite
straightforwardly, since the HW hooks (used for ASICs and other HW
acceleration) invoked when installing flowmods could capture this and
configure the NICs to shortcut the SW OF processing and send packets
directly to the VNF ports, if the flow matches the flow director
restrictions. But I am not sure about the way back to the switch, that is,
from the kernel to the PHY or to other kernels, since there is no flow
director there. In any case, this is something we already had on the
mid-term roadmap for the normal OF switch, without the VNF port extensions.

I wouldn't want to go deeper into the specifics of the use case, because
the important topic here is actually the librte_kni implementation. So
please let me know if some of you would be interested in further
details, also @Jay, since the use case sounds pretty aligned.

The problem here is that during the lifetime of an LSI the orchestration
may deploy new VNFs at runtime, and since there is support for
multiple LSIs that can be created and destroyed, we quickly run into
memzone allocation problems with the KNI interfaces, since the memzones
are never released. This was demoed, and we had to increase the hugepage
memory quite a bit to delay the exhaustion, but it cannot be used in a
real deployment as of now.
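
For reference, the pattern that triggers it is roughly the following (a
minimal sketch, assuming an initialised EAL and an existing pktmbuf pool;
with librte_kni as of 1.7.x every iteration reserves a fresh set of FIFO
memzones that is never given back, and reusing an old interface name fails
for the same reason):

#include <stdio.h>
#include <string.h>
#include <rte_kni.h>

/* Minimal sketch of the leak, not the xdpd code: create/destroy cycles
 * eventually exhaust hugepage memory because rte_kni_release() cannot
 * return the memzones reserved by rte_kni_alloc(). */
static void
churn_knis(struct rte_mempool *pktmbuf_pool, unsigned int iterations)
{
        struct rte_kni_conf conf;
        struct rte_kni *kni;
        unsigned int i;

        for (i = 0; i < iterations; i++) {
                memset(&conf, 0, sizeof(conf));
                snprintf(conf.name, sizeof(conf.name), "vnf%u", i);
                conf.mbuf_size = 2048;

                kni = rte_kni_alloc(pktmbuf_pool, &conf, NULL);
                if (kni == NULL) {
                        printf("rte_kni_alloc() failed at iteration %u\n", i);
                        return;
                }
                rte_kni_release(kni);   /* the 6 FIFO memzones stay reserved */
        }
}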

What I would like to know is whether the proposed strategy of having a
pool, to reuse the memzones and solve the issue, would be *conceptually*
acceptable. I wouldn't want to spend time on a patch that would stay out
of mainstream and then have to maintain a fork of DPDK, because at least
in the short term we need to fix this issue with the KNI interfaces.

Let me know your thoughts, and whether I should proceed with developing
the patch.

best
marc


[1] http://www.xdpd.org/, https://github.com/bisdn/xdpd
[2] https://www.fp7-unify.eu/

>
>> Regards,
>> Jay
>>
>> On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune <marc.sune@bisdn.de> wrote:
>>
>>> Hi all,
>>>
>>> So we are having some problems with KNI. In short, we have a DPDK
>>> application that creates and destroys KNI interfaces during its
>>> lifecycle and connects them to Docker containers. Interfaces may
>>> eventually even be given the same name (see below).
>>>
>>> We were wondering why, even when calling rte_kni_release(), the
>>> hugepage memory was rapidly being exhausted, and we also realised that
>>> even after destruction you cannot reuse the same name for an interface.
>>>
>>> After close inspection of the rte_kni lib we think the core issue is
>>> mostly a design issue: rte_kni_alloc() ends up calling
>>> kni_memzone_reserve(), which in turn calls rte_memzone_reserve(), and
>>> that reservation cannot be undone by rte_kni_release() (by design of
>>> memzones). The exhaustion is rapid due to the number of FIFOs created
>>> per interface (6).
>>>
>>> If this is right, we would propose, and try to provide, a patch as
>>> follows:
>>>
>>> * Create a new rte_kni_init(unsigned int max_knis);
>>>
>>> This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
>>> and Response) * max_knis by calling kni_memzone_reserve(), and store
>>> them in a kni_fifo_pool. It should only be called once by DPDK
>>> applications, at bootstrapping time.
>>>
>>> * rte_kni_alloc() would just take one slot from the kni_fifo_pool (one
>>> slot meaning a set of 6 FIFOs).
>>> * rte_kni_release() would return the slot to the pool.
>>>
>>> This should solve both issues. We would base the patch on 1.7.2.
>>>
>>> Thoughts?
>>> marc
>>>
>>> P.S. Lately someone involved with DPDK said KNI would be deprecated in
>>> future DPDK releases; I hadn't read or heard this before. Is this true?
>>> What would be the natural replacement then?
>>>

