DPDK patches and discussions
 help / color / mirror / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download: 
* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  5:28  8%   ` Stephen Hemminger
@ 2015-06-17  8:23  9%     ` Marc Sune
  0 siblings, 0 replies; 200+ results
From: Marc Sune @ 2015-06-17  8:23 UTC (permalink / raw)
  To: dev



On 17/06/15 07:28, Stephen Hemminger wrote:
> On Tue, Jun 16, 2015 at 9:36 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
>
>> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
>>> There were some debates about software statistics disabling.
>>> Should they be always on or possibly disabled when compiled?
>>> We need to take a decision shortly and discuss (or agree) this proposal:
>>>        http://dpdk.org/ml/archives/dev/2015-June/019461.html
>> This goes against the idea I have seen before that we should be moving
>> toward
>> a distro-friendly approach where one copy of DPDK can be used by multiple
>> apps
>> without having to rebuild it. It seems like it is also a bit ABI hostile
>> according to the below goals / discussions.
>>
>> Jemalloc is also very high-performance code and still manages to allow
>> enabling and disabling statistics at runtime. Are we sure it's impossible
>> for
>> DPDK or just theorizing?
>>
>>> During the development of the release 2.0, there was an agreement to keep
>>> ABI compatibility or to bring new ABI while keeping old one during one
>> release.
>>> In case it's not possible to have this transition, the (exceptional)
>> break
>>> should be acknowledged by several developers.
>> Personally to me it seems more important to preserve the ABI on patch
>> releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?
>>
>>> During the current development cycle for the release 2.1, the ABI
>> question
>>> arises many times in different threads.
>> Most but not all of these examples point to a different issue which
>> sometimes
>> happens in libraries... often seen as "old-style" versus "new-style" C
>> library
>> interface. For example, in old-style like libpcap there are a lot of
>> structs,
>> both opaque and non-opaque, which the caller must allocate in order to run
>> libpcap.
>>
>> However new-style libraries such as libcurl usually just have init
>> functions
>> which initialize all the secret structs based on some defaults and some
>> user
>> parameters and hide the actual structs from the user. If you want to adjust
>> some you call an adjuster function that modifies the actual secret struct
>> contents, with some enum saying what field to adjust, and the new value you
>> want it to have.
>>
>> If you want to keep a stable ABI for a non-stable library like DPDK,
>> there's a
>> good chance you must begin hiding all these weird device specific structs
>> all
>> over the DPDK from the user needing to directly allocate and modify them.
>> Otherwise the ABI breaks everytime you have to add adjustments, extensions,
>> modifications to all these obscure special features.
>>
>> Matthew.
>>
> The DPDK makes extensive use of inline functions which prevents data hiding
> necessary for ABI stablility. This a fundamental tradeoff, and since the
> whole
> reason for DPDK is performance; the ABI is going to be a moving target.
>
> It would make more sense to provide a higher level API which was abstracted,
> slower, but stable for applications. But in doing so it would mean giving
> up things
> like inline lockless rings. Just don't go as far as the Open (not)
> dataplane API;
> which is just an excuse for closed source.
+1 to what Stephen says. I don't think though a higher level API is 
really worth.

I unfortunately could not participate in the ABI discussion (there are 
times in which you cannot be following each and every single thread in 
the mailing list, and then it got lost in my inbox, my bad).

To me, ABI compatibility is a must if we'd have long term support 
releases (1.8.0, 1.8.1...), which we don't (I proposed to have them a 
couple of times).

On the other side, if the argument for ABI compatibility is shared 
libraries and to make it easy to package for distros, I don't see 
anything better than using the MAJOR.MIN version to identify the ABI 
(1.8, 1.9). Having LTS support would also help distros be able to stay 
in an ABI compatible version of DPDK and get recent bug fixes, and to 
better identify ABi changes (MAJ.MIN).

Moreover, I doubt that many people use shared libraries in DPDK. DPDK is 
built for performance and has a lot of code inlining which makes anyway 
your binary to be recompiled, as Stephen says.

DPDK is still growing, and the strict ABI policy is forcing patches to 
be delayed, to add hacky work-arounds to not break the ABI policy ( Re: 
[dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf as an 
example)  that are announced that have to be removed afterwards. I see 
the entire ABI policy as a blocker and inducing to have botches all over 
the code, than anything else.

Marc

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH 3/3] acl: mark deprecated functions
  @ 2015-06-17  7:59  4%   ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-17  7:59 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger

On 06/15/2015 07:51 PM, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> To allow for compatiablity with later releases, any functions
> to be removed should be marked as deprecated for one release.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[...]
> diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
> index 3a93730..0c32df0 100644
> --- a/lib/librte_acl/rte_acl.h
> +++ b/lib/librte_acl/rte_acl.h
> @@ -456,7 +456,7 @@ enum {
>   int
>   rte_acl_ipv4vlan_add_rules(struct rte_acl_ctx *ctx,
>   	const struct rte_acl_ipv4vlan_rule *rules,
> -	uint32_t num);
> +	uint32_t num) __attribute__((deprecated));
>
>   /**
>    * Analyze set of ipv4vlan rules and build required internal
> @@ -478,7 +478,7 @@ rte_acl_ipv4vlan_add_rules(struct rte_acl_ctx *ctx,
>   int
>   rte_acl_ipv4vlan_build(struct rte_acl_ctx *ctx,
>   	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
> -	uint32_t num_categories);
> +	uint32_t num_categories) __attribute__((deprecated));
>
>
>   #ifdef __cplusplus
>

I've no objections to the patch as such, but I think the ABI policy is 
asking all deprecation notices to be added to the "Deprecation Notices" 
section in the policy document itself.

That said, the average developer is MUCH likelier to notice the compiler 
warning from these than a deprecation notice in some "who reads those" 
document :) Perhaps we could generate a list of functions marked for 
removal for the ABI policy document automatically based on the 
deprecated attributes.

	- Panu -

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  2015-06-17  0:39  0%         ` Stephen Hemminger
@ 2015-06-17  7:29  0%           ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-17  7:29 UTC (permalink / raw)
  To: Stephen Hemminger, Thomas Monjalon; +Cc: dev, Stephen Hemminger

On 06/17/2015 03:39 AM, Stephen Hemminger wrote:
> On Tue, 16 Jun 2015 23:37:32 +0000
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
>> 2015-06-16 16:05, Stephen Hemminger:
>>> On Tue, 16 Jun 2015 14:52:16 +0100
>>> Bruce Richardson <bruce.richardson@intel.com> wrote:
>>>
>>>> On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
>>>>> From: Stephen Hemminger <shemming@brocade.com>
>>>>>
>>>>> These were deprecated in 2.0 so remove them from 2.1
>>>>>
>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>> ---
>>>>>   drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
>>>>>   drivers/net/ring/rte_eth_ring_version.map |  4 +--
>>>>>   2 files changed, 1 insertion(+), 58 deletions(-)
>>>>>
>>>> [..snip..]
>>>>> diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
>>>>> index 8ad107d..5ee55d9 100644
>>>>> --- a/drivers/net/ring/rte_eth_ring_version.map
>>>>> +++ b/drivers/net/ring/rte_eth_ring_version.map
>>>>> @@ -1,9 +1,7 @@
>>>>> -DPDK_2.0 {
>>>>> +DPDK_2.1 {
>>>>>   	global:
>>>>>
>>>>>   	rte_eth_from_rings;
>>>>> -	rte_eth_ring_pair_attach;
>>>>> -	rte_eth_ring_pair_create;
>>>>>
>>>>>   	local: *;
>>>>>   };
>>>>
>>>> [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0
>>>> version listings in the .map file?
>>>
>>> Notice the version # changed as well, so linker will generate a new version.
>>> The function was marked deprecated in last version.
>>
>> What happens if you load the 2.1 lib with an app built for 2.0?
>> Shouldn't we keep the DPDK_2.0 block?
>
> What happens is that build process makes a new version of DPDK package
> with a new version number. This version can co-exist on same system with
> old library (depends on library packaging).
> Old library will have old functions, and old application will
> use old library. New applications will be have new so version and get the
> new library.
>
>    http://unix.stackexchange.com/questions/475/how-do-so-shared-object-numbers-work
>
> If we didn't do this, nothing could ever really be removed!

Yes, soname bump is required when symbols are removed.

However that doesn't change the version the remaining symbols were 
introduced, eg rte_eth_from_rings() in this case, so AIUI you should 
leave the DPDK_2.0 {} block version alone. If new symbols get added in 
2.1 then a new DPDK_2.1 block needs to be added for those.

	- Panu -

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: announce ABI changes planned for unified packet type
@ 2015-06-17  5:48 20% Helin Zhang
  0 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-17  5:48 UTC (permalink / raw)
  To: dev

The significant ABI changes are planned for unified packet type
which will be supported from release 2.2. Here announces that ABI
changes in detail.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..6bbb6ed 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* Significant ABI changes are planned for struct rte_mbuf, and several PKT_RX_ flags will be removed, to support unified packet type from release 2.2. The upcoming release 2.1 will not have those changes. There is no backward compatibility planned from release 2.2. All binaries will need to be rebuilt from release 2.2.
-- 
1.9.3

^ permalink raw reply	[relevance 20%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  4:36  9% ` Matthew Hall
@ 2015-06-17  5:28  8%   ` Stephen Hemminger
  2015-06-17  8:23  9%     ` Marc Sune
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-17  5:28 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev

On Tue, Jun 16, 2015 at 9:36 PM, Matthew Hall <mhall@mhcomputing.net> wrote:

> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> >       http://dpdk.org/ml/archives/dev/2015-June/019461.html
>
> This goes against the idea I have seen before that we should be moving
> toward
> a distro-friendly approach where one copy of DPDK can be used by multiple
> apps
> without having to rebuild it. It seems like it is also a bit ABI hostile
> according to the below goals / discussions.
>
> Jemalloc is also very high-performance code and still manages to allow
> enabling and disabling statistics at runtime. Are we sure it's impossible
> for
> DPDK or just theorizing?
>
> > During the development of the release 2.0, there was an agreement to keep
> > ABI compatibility or to bring new ABI while keeping old one during one
> release.
> > In case it's not possible to have this transition, the (exceptional)
> break
> > should be acknowledged by several developers.
>
> Personally to me it seems more important to preserve the ABI on patch
> releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?
>
> > During the current development cycle for the release 2.1, the ABI
> question
> > arises many times in different threads.
>
> Most but not all of these examples point to a different issue which
> sometimes
> happens in libraries... often seen as "old-style" versus "new-style" C
> library
> interface. For example, in old-style like libpcap there are a lot of
> structs,
> both opaque and non-opaque, which the caller must allocate in order to run
> libpcap.
>
> However new-style libraries such as libcurl usually just have init
> functions
> which initialize all the secret structs based on some defaults and some
> user
> parameters and hide the actual structs from the user. If you want to adjust
> some you call an adjuster function that modifies the actual secret struct
> contents, with some enum saying what field to adjust, and the new value you
> want it to have.
>
> If you want to keep a stable ABI for a non-stable library like DPDK,
> there's a
> good chance you must begin hiding all these weird device specific structs
> all
> over the DPDK from the user needing to directly allocate and modify them.
> Otherwise the ABI breaks everytime you have to add adjustments, extensions,
> modifications to all these obscure special features.
>
> Matthew.
>

The DPDK makes extensive use of inline functions which prevents data hiding
necessary for ABI stablility. This a fundamental tradeoff, and since the
whole
reason for DPDK is performance; the ABI is going to be a moving target.

It would make more sense to provide a higher level API which was abstracted,
slower, but stable for applications. But in doing so it would mean giving
up things
like inline lockless rings. Just don't go as far as the Open (not)
dataplane API;
which is just an excuse for closed source.

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
@ 2015-06-17  4:36  9% ` Matthew Hall
  2015-06-17  5:28  8%   ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Matthew Hall @ 2015-06-17  4:36 UTC (permalink / raw)
  To: dev

On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> There were some debates about software statistics disabling.
> Should they be always on or possibly disabled when compiled?
> We need to take a decision shortly and discuss (or agree) this proposal:
> 	http://dpdk.org/ml/archives/dev/2015-June/019461.html

This goes against the idea I have seen before that we should be moving toward 
a distro-friendly approach where one copy of DPDK can be used by multiple apps 
without having to rebuild it. It seems like it is also a bit ABI hostile 
according to the below goals / discussions.

Jemalloc is also very high-performance code and still manages to allow 
enabling and disabling statistics at runtime. Are we sure it's impossible for 
DPDK or just theorizing?

> During the development of the release 2.0, there was an agreement to keep
> ABI compatibility or to bring new ABI while keeping old one during one release.
> In case it's not possible to have this transition, the (exceptional) break
> should be acknowledged by several developers.

Personally to me it seems more important to preserve the ABI on patch 
releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?

> During the current development cycle for the release 2.1, the ABI question
> arises many times in different threads.

Most but not all of these examples point to a different issue which sometimes 
happens in libraries... often seen as "old-style" versus "new-style" C library 
interface. For example, in old-style like libpcap there are a lot of structs, 
both opaque and non-opaque, which the caller must allocate in order to run libpcap. 

However new-style libraries such as libcurl usually just have init functions 
which initialize all the secret structs based on some defaults and some user 
parameters and hide the actual structs from the user. If you want to adjust 
some you call an adjuster function that modifies the actual secret struct 
contents, with some enum saying what field to adjust, and the new value you 
want it to have.

If you want to keep a stable ABI for a non-stable library like DPDK, there's a 
good chance you must begin hiding all these weird device specific structs all 
over the DPDK from the user needing to directly allocate and modify them. 
Otherwise the ABI breaks everytime you have to add adjustments, extensions, 
modifications to all these obscure special features.

Matthew.

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] abi: announce abi changes plan for struct rte_eth_fdir_flow_ext
@ 2015-06-17  3:36 23% Jingjing Wu
  0 siblings, 0 replies; 200+ results
From: Jingjing Wu @ 2015-06-17  3:36 UTC (permalink / raw)
  To: dev

It announces the planned ABI change to support flow director filtering in VF on v2.2.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..b8d7cc8 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The ABI changes are planned for struct rte_eth_fdir_flow_ext in order to support flow director filtering in VF. The upcoming release 2.1 will not contain these ABI changes, but release 2.2 will, and no backwards compatibility is planed due to this change. Binaries using this library build prior to version 2.2 will require updating and recompilation.
-- 
1.9.3

^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
       [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
@ 2015-06-17  0:39  0%         ` Stephen Hemminger
  2015-06-17  7:29  0%           ` Panu Matilainen
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-17  0:39 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Stephen Hemminger

On Tue, 16 Jun 2015 23:37:32 +0000
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-06-16 16:05, Stephen Hemminger:
> > On Tue, 16 Jun 2015 14:52:16 +0100
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > 
> > > On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> > > > From: Stephen Hemminger <shemming@brocade.com>
> > > > 
> > > > These were deprecated in 2.0 so remove them from 2.1
> > > > 
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > ---
> > > >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> > > >  drivers/net/ring/rte_eth_ring_version.map |  4 +--
> > > >  2 files changed, 1 insertion(+), 58 deletions(-)
> > > >   
> > > [..snip..]
> > > > diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> > > > index 8ad107d..5ee55d9 100644
> > > > --- a/drivers/net/ring/rte_eth_ring_version.map
> > > > +++ b/drivers/net/ring/rte_eth_ring_version.map
> > > > @@ -1,9 +1,7 @@
> > > > -DPDK_2.0 {
> > > > +DPDK_2.1 {
> > > >  	global:
> > > >  
> > > >  	rte_eth_from_rings;
> > > > -	rte_eth_ring_pair_attach;
> > > > -	rte_eth_ring_pair_create;
> > > >  
> > > >  	local: *;
> > > >  };  
> > > 
> > > [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
> > > version listings in the .map file?
> > 
> > Notice the version # changed as well, so linker will generate a new version.
> > The function was marked deprecated in last version.
> 
> What happens if you load the 2.1 lib with an app built for 2.0?
> Shouldn't we keep the DPDK_2.0 block?

What happens is that build process makes a new version of DPDK package
with a new version number. This version can co-exist on same system with
old library (depends on library packaging).
Old library will have old functions, and old application will
use old library. New applications will be have new so version and get the
new library.

  http://unix.stackexchange.com/questions/475/how-do-so-shared-object-numbers-work

If we didn't do this, nothing could ever really be removed!

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  2015-06-16 23:05  0%     ` Stephen Hemminger
@ 2015-06-16 23:37  0%       ` Thomas Monjalon
       [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-16 23:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

2015-06-16 16:05, Stephen Hemminger:
> On Tue, 16 Jun 2015 14:52:16 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> > > From: Stephen Hemminger <shemming@brocade.com>
> > > 
> > > These were deprecated in 2.0 so remove them from 2.1
> > > 
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> > >  drivers/net/ring/rte_eth_ring_version.map |  4 +--
> > >  2 files changed, 1 insertion(+), 58 deletions(-)
> > >   
> > [..snip..]
> > > diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> > > index 8ad107d..5ee55d9 100644
> > > --- a/drivers/net/ring/rte_eth_ring_version.map
> > > +++ b/drivers/net/ring/rte_eth_ring_version.map
> > > @@ -1,9 +1,7 @@
> > > -DPDK_2.0 {
> > > +DPDK_2.1 {
> > >  	global:
> > >  
> > >  	rte_eth_from_rings;
> > > -	rte_eth_ring_pair_attach;
> > > -	rte_eth_ring_pair_create;
> > >  
> > >  	local: *;
> > >  };  
> > 
> > [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
> > version listings in the .map file?
> 
> Notice the version # changed as well, so linker will generate a new version.
> The function was marked deprecated in last version.

What happens if you load the 2.1 lib with an app built for 2.0?
Shouldn't we keep the DPDK_2.0 block?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
@ 2015-06-16 23:29  9% Thomas Monjalon
  2015-06-17  4:36  9% ` Matthew Hall
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-16 23:29 UTC (permalink / raw)
  To: announce

Hi all,

Sometimes there are some important discussions about architecture or design
which require opinions from several developers. Unfortunately, we cannot
read every threads. Maybe that using the announce mailing list will help
to bring more audience to these discussions.
Please note that
	- the announce@ ML is moderated to keep a low traffic,
	- every announce email is forwarded to dev@ ML.
In case you want to reply to this email, please use dev@dpdk.org address.

There were some debates about software statistics disabling.
Should they be always on or possibly disabled when compiled?
We need to take a decision shortly and discuss (or agree) this proposal:
	http://dpdk.org/ml/archives/dev/2015-June/019461.html

During the development of the release 2.0, there was an agreement to keep
ABI compatibility or to bring new ABI while keeping old one during one release.
In case it's not possible to have this transition, the (exceptional) break
should be acknowledged by several developers.
	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
There were some interesting discussions but not a lot of participants:
	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=8461

During the current development cycle for the release 2.1, the ABI question
arises many times in different threads.
To add the hash key size field, it is proposed to use a struct padding gap:
	http://dpdk.org/ml/archives/dev/2015-June/019386.html
To support the flow director for VF, there is no proposal yet:
	http://dpdk.org/ml/archives/dev/2015-June/019343.html
To add the speed capability, it is proposed to break ABI in the release 2.2:
	http://dpdk.org/ml/archives/dev/2015-June/019225.html
To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
	http://dpdk.org/ml/archives/dev/2015-June/019443.html
To add the interrupt mode, it is proposed to add a build-time option
CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking binary:
	http://dpdk.org/ml/archives/dev/2015-June/018947.html
To add the packet type, there is a proposal to add a build-time option
CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
	http://dpdk.org/ml/archives/dev/2015-June/019172.html
We must also better document how to remove a deprecated ABI:
	http://dpdk.org/ml/archives/dev/2015-June/019465.html
The ABI compatibility is a new constraint and we need to better understand
what it means and how to proceed. Even the macros are not yet well documented:
	http://dpdk.org/ml/archives/dev/2015-June/019357.html

Thanks for your attention and your participation in these important choices.

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  2015-06-16 13:52  3%   ` Bruce Richardson
@ 2015-06-16 23:05  0%     ` Stephen Hemminger
  2015-06-16 23:37  0%       ` Thomas Monjalon
       [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2015-06-16 23:05 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Stephen Hemminger

On Tue, 16 Jun 2015 14:52:16 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> > From: Stephen Hemminger <shemming@brocade.com>
> > 
> > These were deprecated in 2.0 so remove them from 2.1
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> >  drivers/net/ring/rte_eth_ring_version.map |  4 +--
> >  2 files changed, 1 insertion(+), 58 deletions(-)
> >   
> [..snip..]
> > diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> > index 8ad107d..5ee55d9 100644
> > --- a/drivers/net/ring/rte_eth_ring_version.map
> > +++ b/drivers/net/ring/rte_eth_ring_version.map
> > @@ -1,9 +1,7 @@
> > -DPDK_2.0 {
> > +DPDK_2.1 {
> >  	global:
> >  
> >  	rte_eth_from_rings;
> > -	rte_eth_ring_pair_attach;
> > -	rte_eth_ring_pair_create;
> >  
> >  	local: *;
> >  };  
> 
> [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
> version listings in the .map file?

Notice the version # changed as well, so linker will generate a new version.
The function was marked deprecated in last version.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  @ 2015-06-16 13:52  3%   ` Bruce Richardson
  2015-06-16 23:05  0%     ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-06-16 13:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
> 
> These were deprecated in 2.0 so remove them from 2.1
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
>  drivers/net/ring/rte_eth_ring_version.map |  4 +--
>  2 files changed, 1 insertion(+), 58 deletions(-)
> 
[..snip..]
> diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> index 8ad107d..5ee55d9 100644
> --- a/drivers/net/ring/rte_eth_ring_version.map
> +++ b/drivers/net/ring/rte_eth_ring_version.map
> @@ -1,9 +1,7 @@
> -DPDK_2.0 {
> +DPDK_2.1 {
>  	global:
>  
>  	rte_eth_from_rings;
> -	rte_eth_ring_pair_attach;
> -	rte_eth_ring_pair_create;
>  
>  	local: *;
>  };

[ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
version listings in the .map file?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4] doc: guidelines for library statistics
@ 2015-06-16 13:35  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-16 13:35 UTC (permalink / raw)
  To: dev

v4 changes
-more fixes for bullets

v3 changes
-fixed bullets for correct doc generation

v2 changes
-small text changes
-reordered sections to have guidelines at the top and motivation at the end
-broke lines at 80 characters

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+* For Application A, counters maintained by Library X are always relevant and
+  the application needs to use them to implement certain features, such as traffic
+  accounting, logging, application-level statistics, etc. In this case,
+  the application requires that collection of statistics counters for Library X is
+  always turned on.
+
+* For Application B, counters maintained by Library X are only useful during the
+  application debug stage and are not relevant once debug phase is over. In this
+  case, the application may decide to turn on the collection of Library X
+  statistics counters during the debug phase and at a later stage turn them off.
+
+* For Application C, counters maintained by Library X are not relevant at all.
+  It might be that the application maintains its own set of statistics counters
+  that monitor a different set of run-time events (e.g. number of connection
+  requests, number of active users, etc). It might also be that the application
+  uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+  statistics counters of Library Y, but not in those of Library X. In this case,
+  the application may decide to turn the collection of statistics counters off for
+  Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+* Number of libraries used by the current application that have statistics
+  counters collection turned on.
+
+* Number of statistics counters maintained by each library per object type
+  instance (e.g. per port, table, pipeline, thread, etc).
+
+* Number of instances created for each object type supported by each library.
+
+* Complexity of the statistics logic collection for each counter: when only
+  some occurrences of a specific event are valid, additional logic is typically
+  needed to decide whether the current occurrence of the event should be counted
+  or not. For example, in the event of packet reception, when only TCP packets
+  with destination port within a certain range should be recorded, conditional
+  branches are usually required. When processing a burst of packets that have been
+  validated for header integrity, counting the number of bits set in a bitmask
+  might be needed.
-- 
1.7.4.1

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v3] doc: guidelines for library statistics
@ 2015-06-16 13:15  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-16 13:15 UTC (permalink / raw)
  To: dev

v3 changes
-fixed bullets for correct doc generation

v2 changes
-small text changes
-reordered sections to have guidelines at the top and motivation at the end
-broke lines at 80 characters

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+ * For Application A, counters maintained by Library X are always relevant and
+the application needs to use them to implement certain features, such as traffic
+accounting, logging, application-level statistics, etc. In this case,
+the application requires that collection of statistics counters for Library X is
+always turned on.
+
+ * For Application B, counters maintained by Library X are only useful during the
+application debug stage and are not relevant once debug phase is over. In this
+case, the application may decide to turn on the collection of Library X
+statistics counters during the debug phase and at a later stage turn them off.
+
+ * For Application C, counters maintained by Library X are not relevant at all.
+It might be that the application maintains its own set of statistics counters
+that monitor a different set of run-time events (e.g. number of connection
+requests, number of active users, etc). It might also be that the application
+uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+statistics counters of Library Y, but not in those of Library X. In this case,
+the application may decide to turn the collection of statistics counters off for
+Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+ * Number of libraries used by the current application that have statistics
+counters collection turned on.
+
+ * Number of statistics counters maintained by each library per object type
+instance (e.g. per port, table, pipeline, thread, etc).
+
+ * Number of instances created for each object type supported by each library.
+
+ * Complexity of the statistics logic collection for each counter: when only
+some occurrences of a specific event are valid, additional logic is typically
+needed to decide whether the current occurrence of the event should be counted
+or not. For example, in the event of packet reception, when only TCP packets
+with destination port within a certain range should be recorded, conditional
+branches are usually required. When processing a burst of packets that have been
+validated for header integrity, counting the number of bits set in a bitmask
+might be needed.
-- 
1.7.4.1

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v2] doc: guidelines for library statistics
@ 2015-06-16 13:14  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-16 13:14 UTC (permalink / raw)
  To: dev

v3 changes
-fixed bullets for correct doc generation

v2 changes
-small text changes
-reordered sections to have guidelines at the top and motivation at the end
-broke lines at 80 characters

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+ * For Application A, counters maintained by Library X are always relevant and
+the application needs to use them to implement certain features, such as traffic
+accounting, logging, application-level statistics, etc. In this case,
+the application requires that collection of statistics counters for Library X is
+always turned on.
+
+ * For Application B, counters maintained by Library X are only useful during the
+application debug stage and are not relevant once debug phase is over. In this
+case, the application may decide to turn on the collection of Library X
+statistics counters during the debug phase and at a later stage turn them off.
+
+ * For Application C, counters maintained by Library X are not relevant at all.
+It might be that the application maintains its own set of statistics counters
+that monitor a different set of run-time events (e.g. number of connection
+requests, number of active users, etc). It might also be that the application
+uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+statistics counters of Library Y, but not in those of Library X. In this case,
+the application may decide to turn the collection of statistics counters off for
+Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+ * Number of libraries used by the current application that have statistics
+counters collection turned on.
+
+ * Number of statistics counters maintained by each library per object type
+instance (e.g. per port, table, pipeline, thread, etc).
+
+ * Number of instances created for each object type supported by each library.
+
+ * Complexity of the statistics logic collection for each counter: when only
+some occurrences of a specific event are valid, additional logic is typically
+needed to decide whether the current occurrence of the event should be counted
+or not. For example, in the event of packet reception, when only TCP packets
+with destination port within a certain range should be recorded, conditional
+branches are usually required. When processing a burst of packets that have been
+validated for header integrity, counting the number of bits set in a bitmask
+might be needed.
-- 
1.7.4.1

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues
  2015-06-16  1:38 23% [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues Ouyang Changchun
@ 2015-06-16 10:36  5% ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-06-16 10:36 UTC (permalink / raw)
  To: Ouyang Changchun; +Cc: dev

On Tue, Jun 16, 2015 at 09:38:43AM +0800, Ouyang Changchun wrote:
> It announces the planned ABI changes for vhost-user multiple queues feature on v2.2.
> 
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> ---
>  doc/guides/rel_notes/abi.rst | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
> index f00a6ee..dc1b0eb 100644
> --- a/doc/guides/rel_notes/abi.rst
> +++ b/doc/guides/rel_notes/abi.rst
> @@ -38,3 +38,4 @@ Examples of Deprecation Notices
>  
>  Deprecation Notices
>  -------------------
> +* The ABI changes are planned for struct virtio_net in order to support vhost-user multiple queues feature. The upcoming release 2.1 will not contain these ABI changes, but release 2.2 will, and no backwards compatibility is planed due to the vhost-user multiple queues feature enabling. Binaries using this library build prior to version 2.2 will require updating and recompilation.
> -- 
> 1.8.4.2
> 
> 

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type
    @ 2015-06-16  3:43  3% ` Jingjing Wu
  1 sibling, 0 replies; 200+ results
From: Jingjing Wu @ 2015-06-16  3:43 UTC (permalink / raw)
  To: dev

This patch set extends flow director to support L2_paylod type in i40e driver.
 
v2 change:
 - remove the flow director VF filtering from this patch to avoid breaking ABI.

Jingjing Wu (4):
  ethdev: add struct rte_eth_l2_flow to support l2_payload flow type
  i40e: extend flow diretcor to support l2_payload flow type
  testpmd: extend commands
  doc: extend commands in testpmd

 app/test-pmd/cmdline.c                      | 48 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  5 ++-
 drivers/net/i40e/i40e_fdir.c                | 24 +++++++++++++--
 lib/librte_ether/rte_eth_ctrl.h             |  8 +++++
 4 files changed, 78 insertions(+), 7 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues
@ 2015-06-16  1:38 23% Ouyang Changchun
  2015-06-16 10:36  5% ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ouyang Changchun @ 2015-06-16  1:38 UTC (permalink / raw)
  To: dev

It announces the planned ABI changes for vhost-user multiple queues feature on v2.2.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..dc1b0eb 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The ABI changes are planned for struct virtio_net in order to support vhost-user multiple queues feature. The upcoming release 2.1 will not contain these ABI changes, but release 2.2 will, and no backwards compatibility is planed due to the vhost-user multiple queues feature enabling. Binaries using this library build prior to version 2.2 will require updating and recompilation.
-- 
1.8.4.2

^ permalink raw reply	[relevance 23%]

* [dpdk-dev] [PATCH v2] doc: guidelines for library statistics
@ 2015-06-15 22:07  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-15 22:07 UTC (permalink / raw)
  To: dev


Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+* For Application A, counters maintained by Library X are always relevant and
+the application needs to use them to implement certain features, such as traffic
+accounting, logging, application-level statistics, etc. In this case,
+the application requires that collection of statistics counters for Library X is
+always turned on.
+
+* For Application B, counters maintained by Library X are only useful during the
+application debug stage and are not relevant once debug phase is over. In this
+case, the application may decide to turn on the collection of Library X
+statistics counters during the debug phase and at a later stage turn them off.
+
+* For Application C, counters maintained by Library X are not relevant at all.
+It might be that the application maintains its own set of statistics counters
+that monitor a different set of run-time events (e.g. number of connection
+requests, number of active users, etc). It might also be that the application
+uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+statistics counters of Library Y, but not in those of Library X. In this case,
+the application may decide to turn the collection of statistics counters off for
+Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+* Number of libraries used by the current application that have statistics
+counters collection turned on.
+
+* Number of statistics counters maintained by each library per object type
+instance (e.g. per port, table, pipeline, thread, etc).
+
+* Number of instances created for each object type supported by each library.
+
+* Complexity of the statistics logic collection for each counter: when only
+some occurrences of a specific event are valid, additional logic is typically
+needed to decide whether the current occurrence of the event should be counted
+or not. For example, in the event of packet reception, when only TCP packets
+with destination port within a certain range should be recorded, conditional
+branches are usually required. When processing a burst of packets that have been
+validated for header integrity, counting the number of bits set in a bitmask
+might be needed.
-- 
1.7.4.1

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics
  2015-06-11 12:05  0% ` Thomas Monjalon
@ 2015-06-15 21:46  4%   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-15 21:46 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, June 11, 2015 1:05 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics
> 
> Hi Cristian,
> 
> Thanks for trying to make a policy clearer.
> We need to make a decision in the coming week.
> Below are comments on the style and content.
> 
> 2015-06-08 15:50, Cristian Dumitrescu:
> >  doc/guides/guidelines/statistics.rst | 42
> ++++++++++++++++++++++++++++++++++++
> 
> Maybe we should have a more general file like design.rst.

I am not sure I correctly understood your suggestion. Do you want the section on statistics to be called design.rst? Do you want it to be part of doc/guides/guidelines or create a brand new document?
To me, the initial idea of doc/guidelines/statistics.rst makes more sense, but I am fine either way.

> In order to have a lot of readers of such guidelines, they must be concise.

I reordered the sections to provide the guidelines first and the motivation afterwards. I think it is good to keep the motivation together with the guidelines. Do you suggest to remove the motivation from this document?

> 
> Please wrap lines to be not too long and/or split lines after the end of a
> sentence.

Done in next version.

> 
> > +Library Statistics
> > +==================
> > +
> > +Description
> > +-----------
> > +
> > +This document describes the guidelines for DPDK library-level statistics
> counter support. This includes guidelines for turning library statistics on and
> off, requirements for preventing ABI changes when library statistics are
> turned on and off, etc.
> 
> Should we consider that driver stats and lib stats are different in DPDK? Why?

I did update the document to make it clear these guidelines are applicable for the case when the stats counters are maintained by the library itself.
I think the primary difference is whether the stats counters are implemented by the HW (e.g. NIC) or in SW by the CPU.
* In the first case, the CPU does not spend cycles maintaining the counters. System resources might be consumed by the NIC for maintaining the counters (e.g. memory bandwidth), so sometimes the NICs provide mechanisms to enable/disable stats, which is done as part of init code rather than buid time.
* In the second case, the CPU maintains the counters, which comes at the usual cost of cycles and system resources. These guidelines focus on this case.

> 
> > +Motivation to allow the application to turn library statistics on and off
> > +-------------------------------------------------------------------------
> > +
> > +It is highly recommended that each library provides statistics counters to
> allow the application to monitor the library-level run-time events. Typical
> counters are: number of packets received/dropped/transmitted, number of
> buffers allocated/freed, number of occurrences for specific events, etc.
> > +
> > +Since the resources consumed for library-level statistics counter collection
> have to be spent out of the application budget and the counters collected by
> some libraries might not be relevant for the current application, in order to
> avoid any unwanted waste of resources and/or performance for the
> application, the application is to decide at build time whether the collection
> of library-level statistics counters should be turned on or off for each library
> individually.
> 
> It would be good to have acknowledgements or other opinions on this.
> Some of them were expressed in other threads. Please comment here.
> 
> > +Library-level statistics counters can be relevant or not for specific
> applications:
> > +* For application A, counters maintained by library X are always relevant
> and the application needs to use them to implement certain features, as
> traffic accounting, logging, application-level statistics, etc. In this case, the
> application requires that collection of statistics counters for library X is always
> turned on;
> > +* For application B, counters maintained by library X are only useful during
> the application debug stage and not relevant once debug phase is over. In
> this case, the application may decide to turn on the collection of library X
> statistics counters during the debug phase and later on turn them off;
> 
> Users of binary package have not this choice.

There is a section in the document about allowing stats to be turned on/off without ABI impact.

> 
> > +* For application C, counters maintained by library X are not relevant at all.
> It might me that the application maintains its own set of statistics counters
> that monitor a different set of run-time events than library X (e.g. number of
> connection requests, number of active users, etc). It might also be that
> application uses multiple libraries (library X, library Y, etc) and it is interested
> in the statistics counters of library Y, but not in those of library X. In this case,
> the application may decide to turn the collection of statistics counters off for
> library X and on for library Y.
> > +
> > +The statistics collection consumes a certain amount of CPU resources
> (cycles, cache bandwidth, memory bandwidth, etc) that depends on:
> > +* Number of libraries used by the current application that have statistics
> counters collection turned on;
> > +* Number of statistics counters maintained by each library per object type
> instance (e.g. per port, table, pipeline, thread, etc);
> > +* Number of instances created for each object type supported by each
> library;
> > +* Complexity of the statistics logic collection for each counter: when only
> some occurrences of a specific event are valid, several conditional branches
> might involved in the decision of whether the current occurrence of the
> event should be counted or not (e.g. on the event of packet reception, only
> TCP packets with destination port within a certain range should be recorded),
> etc.
> > +
> > +Mechanism to allow the application to turn library statistics on and off
> > +------------------------------------------------------------------------
> > +
> > +Each library that provides statistics counters should provide a single build
> time flag that decides whether the statistics counter collection is enabled or
> not for this library. This flag should be exposed as a variable within the DPDK
> configuration file. When this flag is set, all the counters supported by current
> library are collected; when this flag is cleared, none of the counters
> supported by the current library are collected:
> > +
> > +	#DPDK file “./config/common_linuxapp”,
> “./config/common_bsdapp”, etc
> > +	CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS=y/n
> 
> Why not simply CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_STATS (without
> COLLECT)?
> 

The reason I propose the name STATS_COLLECT rather than just STATS is to give a strong hint that counters are always allocated and the stats API is always available, while just the collection/update of the counters is configurable. The simple STATS mention might drive some library developers to make the stats API compiled in and out, which is something we want to prevent in order to avoid ABI changes. I don't have a strong preference though, if you still think STATS is better than STATS_COLLECT, let me know.

> > +The default value for this DPDK configuration file variable (either “yes” or
> “no”) is left at the decision of each library.
> > +
> > +Prevention of ABI changes due to library statistics support
> > +-----------------------------------------------------------
> > +
> > +The layout of data structures and prototype of functions that are part of
> the library API should not be affected by whether the collection of statistics
> counters is turned on or off for the current library. In practical terms, this
> means that space is always allocated in the API data structures for statistics
> counters and the statistics related API functions are always built into the
> code, regardless of whether the statistics counter collection is turned on or
> off for the current library.
> > +
> > +When the collection of statistics counters for the current library is turned
> off, the counters retrieved through the statistics related API functions should
> have the default value of zero.


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info
  @ 2015-06-15 18:23  3%                     ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-15 18:23 UTC (permalink / raw)
  To: David Harton (dharton), Wang, Liang-min, dev



> -----Original Message-----
> From: David Harton (dharton) [mailto:dharton@cisco.com]
> Sent: Monday, June 15, 2015 5:05 PM
> To: Ananyev, Konstantin; Wang, Liang-min; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > Sent: Monday, June 15, 2015 9:46 AM
> > To: Wang, Liang-min; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access
> > device info
> >
> >
> >
> > > -----Original Message-----
> > > From: Wang, Liang-min
> > > Sent: Monday, June 15, 2015 2:26 PM
> > > To: Ananyev, Konstantin; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > access device info
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Friday, June 12, 2015 8:31 AM
> > > > To: Wang, Liang-min; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > > access device info
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Wang, Liang-min
> > > > > Sent: Thursday, June 11, 2015 10:51 PM
> > > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > > > access device info
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ananyev, Konstantin
> > > > > > Sent: Thursday, June 11, 2015 9:07 AM
> > > > > > To: Wang, Liang-min; dev@dpdk.org
> > > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > support access device info
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Wang, Liang-min
> > > > > > > Sent: Thursday, June 11, 2015 1:58 PM
> > > > > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > > support access device info
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ananyev, Konstantin
> > > > > > > > Sent: Thursday, June 11, 2015 8:26 AM
> > > > > > > > To: Wang, Liang-min; dev@dpdk.org
> > > > > > > > Cc: Wang, Liang-min
> > > > > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > > > support access device info
> > > > > > > >
> > > > > > > > Hi Larry,
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > > > > > > > Liang-Min Larry Wang
> > > > > > > > > Sent: Wednesday, June 10, 2015 4:10 PM
> > > > > > > > > To: dev@dpdk.org
> > > > > > > > > Cc: Wang, Liang-min
> > > > > > > > > Subject: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > > > > support access device info
> > > > > > > > >
> > > > > > > > > add new apis:
> > > > > > > > > - rte_eth_dev_default_mac_addr_set
> > > > > > > > > - rte_eth_dev_reg_leng
> > > > > > > > > - rte_eth_dev_reg_info
> > > > > > > > > - rte_eth_dev_eeprom_leng
> > > > > > > > > - rte_eth_dev_get_eeprom
> > > > > > > > > - rte_eth_dev_set_eeprom
> > > > > > > > > - rte_eth_dev_get_ringparam
> > > > > > > > > - rte_eth_dev_set_ringparam
> > > > > > > > >
> > > > > > > > > to enable reading device parameters (mac-addr, register,
> > > > > > > > > eeprom,
> > > > > > > > > ring) based upon ethtool alike data parameter specification.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Liang-Min Larry Wang
> > > > > > > > > <liang-min.wang@intel.com>
> > > > > > > > > ---
> > > > > > > > >  lib/librte_ether/Makefile              |   1 +
> > > > > > > > >  lib/librte_ether/rte_eth_dev_info.h    |  80
> > +++++++++++++++++
> > > > > > > > >  lib/librte_ether/rte_ethdev.c          | 159
> > > > > > > > +++++++++++++++++++++++++++++++++
> > > > > > > > >  lib/librte_ether/rte_ethdev.h          | 158
> > > > > > > > ++++++++++++++++++++++++++++++++
> > > > > > > > >  lib/librte_ether/rte_ether_version.map |   8 ++
> > > > > > > > >  5 files changed, 406 insertions(+)  create mode 100644
> > > > > > > > > lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > >
> > > > > > > > > diff --git a/lib/librte_ether/Makefile
> > > > > > > > > b/lib/librte_ether/Makefile index c0e5768..05209e9 100644
> > > > > > > > > --- a/lib/librte_ether/Makefile
> > > > > > > > > +++ b/lib/librte_ether/Makefile
> > > > > > > > > @@ -51,6 +51,7 @@ SRCS-y += rte_ethdev.c
> > > > > > > > > SYMLINK-y-include += rte_ether.h  SYMLINK-y-include +=
> > > > > > > > > rte_ethdev.h SYMLINK-y-include
> > > > > > > > > += rte_eth_ctrl.h
> > > > > > > > > +SYMLINK-y-include += rte_eth_dev_info.h
> > > > > > > > >
> > > > > > > > >  # this lib depends upon:
> > > > > > > > >  DEPDIRS-y += lib/librte_eal lib/librte_mempool
> > > > > > > > > lib/librte_ring lib/librte_mbuf diff --git
> > > > > > > > > a/lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > > b/lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000..002c4b5
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > > @@ -0,0 +1,80 @@
> > > > > > > > > +/*-
> > > > > > > > > + *   BSD LICENSE
> > > > > > > > > + *
> > > > > > > > > + *   Copyright(c) 2015 Intel Corporation. All rights
> > reserved.
> > > > > > > > > + *   All rights reserved.
> > > > > > > > > + *
> > > > > > > > > + *   Redistribution and use in source and binary forms,
> > with or
> > > > without
> > > > > > > > > + *   modification, are permitted provided that the
> > following
> > > > conditions
> > > > > > > > > + *   are met:
> > > > > > > > > + *
> > > > > > > > > + *     * Redistributions of source code must retain the
> > above
> > > > copyright
> > > > > > > > > + *       notice, this list of conditions and the following
> > disclaimer.
> > > > > > > > > + *     * Redistributions in binary form must reproduce the
> > above
> > > > > > copyright
> > > > > > > > > + *       notice, this list of conditions and the following
> > disclaimer in
> > > > > > > > > + *       the documentation and/or other materials provided
> > with the
> > > > > > > > > + *       distribution.
> > > > > > > > > + *     * Neither the name of Intel Corporation nor the
> > names of its
> > > > > > > > > + *       contributors may be used to endorse or promote
> > products
> > > > > > derived
> > > > > > > > > + *       from this software without specific prior written
> > permission.
> > > > > > > > > + *
> > > > > > > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> > > > > > > > CONTRIBUTORS
> > > > > > > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
> > > > INCLUDING,
> > > > > > BUT
> > > > > > > > NOT
> > > > > > > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
> > > > AND
> > > > > > > > FITNESS FOR
> > > > > > > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> > > > THE
> > > > > > > > COPYRIGHT
> > > > > > > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> > > > INDIRECT,
> > > > > > > > INCIDENTAL,
> > > > > > > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> > > > > > (INCLUDING,
> > > > > > > > BUT NOT
> > > > > > > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> > > > SERVICES;
> > > > > > > > LOSS OF USE,
> > > > > > > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
> > > > > > CAUSED
> > > > > > > > AND ON ANY
> > > > > > > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> > > > LIABILITY,
> > > > > > OR
> > > > > > > > TORT
> > > > > > > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
> > > > WAY
> > > > > > OUT
> > > > > > > > OF THE USE
> > > > > > > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
> > OF
> > > > SUCH
> > > > > > > > DAMAGE.
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +#ifndef _RTE_ETH_DEV_INFO_H_ #define _RTE_ETH_DEV_INFO_H_
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Placeholder for accessing device registers  */ struct
> > > > > > > > > +rte_dev_reg_info {
> > > > > > > > > +	void *buf; /**< Buffer for register */
> > > > > > > > > +	uint32_t offset; /**< Offset for 1st register to fetch
> > */
> > > > > > > > > +	uint32_t leng; /**< Number of registers to fetch */
> > > > > > > > > +	uint32_t version; /**< Device version */ };
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Placeholder for accessing device eeprom  */ struct
> > > > > > > > > +rte_dev_eeprom_info {
> > > > > > > > > +	void *buf; /**< Buffer for eeprom */
> > > > > > > > > +	uint32_t offset; /**< Offset for 1st eeprom location to
> > > > > > > > > +access
> > > > */
> > > > > > > > > +	uint32_t leng; /**< Length of eeprom region to access */
> > > > > > > > > +	uint32_t magic; /**< Device ID */ };
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Placeholder for accessing device ring parameters  */
> > > > > > > > > +struct rte_dev_ring_info {
> > > > > > > > > +	uint32_t rx_pending; /**< Number of outstanding Rx ring
> > */
> > > > > > > > > +	uint32_t tx_pending; /**< Number of outstanding Tx ring
> > */
> > > > > > > > > +	uint32_t rx_max_pending; /**< Maximum number of
> > > > outstanding
> > > > > > > > > +Rx
> > > > > > > > ring */
> > > > > > > > > +	uint32_t tx_max_pending; /**< Maximum number of
> > > > outstanding
> > > > > > > > > +Tx
> > > > > > > > ring
> > > > > > > > > +*/ };
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * A data structure captures information as defined in
> > > > > > > > > +struct ifla_vf_info
> > > > > > > > > + * for user-space api
> > > > > > > > > + */
> > > > > > > > > +struct rte_dev_vf_info {
> > > > > > > > > +	uint32_t vf;
> > > > > > > > > +	uint8_t mac[ETHER_ADDR_LEN];
> > > > > > > > > +	uint32_t vlan;
> > > > > > > > > +	uint32_t tx_rate;
> > > > > > > > > +	uint32_t spoofchk;
> > > > > > > > > +};
> > > > > > > >
> > > > > > > >
> > > > > > > > Wonder what that structure is for?
> > > > > > > > I can't see it used in any function below?
> > > > > > > >
> > > > > > >
> > > > > > > Good catch, this is designed for other ethtool ops that I did
> > > > > > > not include in
> > > > > > this release, I will remove this from next fix.
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +#endif /* _RTE_ETH_DEV_INFO_H_ */
> > > > > > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > > > > > b/lib/librte_ether/rte_ethdev.c index 5a94654..186e85c
> > > > > > > > > 100644
> > > > > > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > > > > > @@ -2751,6 +2751,32 @@ rte_eth_dev_mac_addr_remove(uint8_t
> > > > > > > > port_id,
> > > > > > > > > struct ether_addr *addr)  }
> > > > > > > > >
> > > > > > > > >  int
> > > > > > > > > +rte_eth_dev_default_mac_addr_set(uint8_t port_id, struct
> > > > > > > > > +ether_addr
> > > > > > > > > +*addr) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +	const int index = 0;
> > > > > > > > > +	const uint32_t pool = 0;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	dev = &rte_eth_devices[port_id];
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> > > > >mac_addr_remove, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mac_addr_add, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +
> > > > > > > > > +	/* Update NIC default MAC address*/
> > > > > > > > > +	(*dev->dev_ops->mac_addr_remove)(dev, index);
> > > > > > > > > +	(*dev->dev_ops->mac_addr_add)(dev, addr, index, pool);
> > > > > > > > > +
> > > > > > > > > +	/* Update default address in NIC data structure */
> > > > > > > > > +	ether_addr_copy(addr, &dev->data->mac_addrs[index]);
> > > > > > > > > +
> > > > > > > > > +	return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > >  rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
> > > > > > > > >  				uint16_t rx_mode, uint8_t on)  { @@
> > > > -3627,3
> > > > > > +3653,136 @@
> > > > > > > > > rte_eth_remove_tx_callback(uint8_t port_id,
> > > > > > > > uint16_t queue_id,
> > > > > > > > >  	/* Callback wasn't found. */
> > > > > > > > >  	return -EINVAL;
> > > > > > > > >  }
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_reg_leng(uint8_t port_id) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_length,
> > > > -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_reg_length)(dev);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_reg_info(uint8_t port_id, struct
> > > > > > > > > +rte_dev_reg_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg, -
> > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_reg)(dev, info); }
> > > > > > > >
> > > > > > > > Seems that *get_reg* stuff, will be really good addition for
> > > > > > > > DPDK debugging abilities.
> > > > > > > > Though, I'd suggest we change it a bit to make more generic
> > > > > > > > and
> > > > flexible:
> > > > > > > >
> > > > > > > > Introduce rte_eth_reg_read/rte_eth_reg_write(),
> > > > > > > > or probably even better rte_pcidev_reg_read
> > > > > > > > /rte_pcidev_reg_write at
> > > > > > EAL.
> > > > > > > > Something similar to what
> > > > > > > > port_pci_reg_read/port_pci_reg_write()
> > > > > > > > are doing now at testpmd.h.
> > > > > > > >
> > > > > > > > struct rte_pcidev_reg_info {
> > > > > > > >    const char *name;
> > > > > > > >    uint32_t endianes, bar, offset, size, count; };
> > > > > > > >
> > > > > > > > int rte_pcidev_reg_read(const struct rte_pci_device *, const
> > > > > > > > struct rte_pcidev_reg_info *, uint64_t *reg_val);
> > > > > > > >
> > > > > > > > Then:
> > > > > > > > int rte_eth_dev_get_reg_info(port_id, const struct
> > > > > > > > rte_pcidev_reg_info **info);
> > > > > > > >
> > > > > > > > So each device would store in info a pointer to an array of
> > > > > > > > it's register descriptions (finished by zero elem?).
> > > > > > > >
> > > > > > > > Then your ethtool (or any other upper layer) can do the
> > > > > > > > following to read all device regs:
> > > > > > > >
> > > > > > >
> > > > > > > The proposed reg info structure allows future improvement to
> > > > > > > support
> > > > > > individual register read/write.
> > > > > > > Also, because each NIC device has a very distinguish register
> > definition.
> > > > > > > So, the plan is to have more comprehensive interface to
> > > > > > > support query operation (for example, register name) before
> > > > > > > introduce individual/group
> > > > > > register access.
> > > > > > > Points taken, the support will be in future release.
> > > > > >
> > > > > > Sorry, didn't get you.
> > > > > > So you are ok to make these changes in next patch version?
> > > > > >
> > > > > I would like to get a consensus from dpdk community on how to
> > > > > provide
> > > > register information.
> > > >
> > > > Well, that' ok, but if it is just a trial patch that is not intended
> > > > to be applied, then you should mark it as RFC.
> > > >
> > > > > Currently, it's designed for debug dumping. The register
> > > > > information is very
> > > > hardware dependent.
> > > > > Need to consider current supported NIC device and future devices
> > > > > for
> > > > DPDK, so we won't make it a bulky interface.
> > > >
> > > > Ok, could you explain what exactly  concerns you in the approach
> > > > described above?
> > > > What part you feel is bulky?
> > > >
> > > > > > >
> > > > > > > > const struct rte_eth_dev_reg_info *reg_info; struct
> > > > > > > > rte_eth_dev_info dev_info;
> > > > > > > >
> > > > > > > > rte_eth_dev_info_get(pid, &dev_info);
> > > > > > > > rte_eth_dev_get_reg_info(port_id, &reg_info);
> > > > > > > >
> > > > > > > > for (i = 0; reg_info[i].name != NULL; i++) {
> > > > > > > >    ...
> > > > > > > >    rte_pcidev_read_reg(dev_info. pci_dev, reg_info[i], &v);
> > > > > > > >   ..
> > > > > > > > }
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_eeprom_leng(uint8_t port_id) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> > > > >get_eeprom_length, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_eeprom_length)(dev);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_get_eeprom(uint8_t port_id, struct
> > > > > > > > > +rte_dev_eeprom_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_eeprom, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_eeprom)(dev, info); }
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_set_eeprom(uint8_t port_id, struct
> > > > > > > > > +rte_dev_eeprom_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_eeprom, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->set_eeprom)(dev, info); }
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_get_ringparam(uint8_t port_id, struct
> > > > > > > > > +rte_dev_ring_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_ringparam, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_ringparam)(dev, info); }
> > > > > > > >
> > > > > > > > I think it will be a useful addition to the ethdev API  to
> > > > > > > > have an ability to retrieve current RX/TX queue parameters.
> > > > > > > > Though again, it need to be more generic, so it could be
> > > > > > > > useful for
> > > > > > > > non- ethtool upper layer too.
> > > > > > > > So I suggest to modify it a bit.
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > struct rte_eth_tx_queue_info {
> > > > > > > >     struct rte_eth_txconf txconf;
> > > > > > > >     uint32_t nb_tx_desc;
> > > > > > > >     uint32_t nb_max_tx_desc; /*max allowable TXDs for that
> > queue */
> > > > > > > >     uint32_t nb_tx_free;            /* number of free TXDs at
> > the moment
> > > > of
> > > > > > call.
> > > > > > > > */
> > > > > > > >     /* other tx queue data. */ };
> > > > > > > >
> > > > > > > > int rte_etdev_get_tx_queue_info(portid, queue_id, struct
> > > > > > > > rte_eth_tx_queue_info *qinfo)
> > > > > > > >
> > > > > > > > Then, your upper layer ethtool wrapper, can implement yours
> > > > > > > > ethtool_get_ringparam() by:
> > > > > > > >
> > > > > > > >  ...
> > > > > > > >  struct rte_eth_tx_queue_info qinfo;
> > > > > > > > rte_ethdev_get_tx_queue_info(port, 0, &qinfo);
> > > > > > > > ring_param->tx_pending = qinfo.nb_tx_desc -
> > > > > > > > rte_eth_rx_queue_count(port, 0);
> > > > > > > >
> > > > > > > > Or probably even:
> > > > > > > > ring_param->tx_pending = qinfo.nb_tx_desc -
> > > > > > > > qinfo.nb_tx_free;
> > > > > > > >
> > > > > > > > Same for RX.
> > > > > > > >
> > > > > > > For now, this descriptor ring information is used by the ethtool
> > op.
> > > > > > > To make this interface simple, i.e. caller doesn't need to
> > > > > > > access other
> > > > > > queue information.
> > > > > >
> > > > > > I just repeat what I said to you in off-line conversation:
> > > > > > ethdev API is not equal ethtool API.
> > > > > > It is ok to add  a new function/structure to ethdev if it really
> > > > > > needed, but we should do mechanical one to one copy.
> > > > > > It is much better to add  a function/structure that would be
> > > > > > more generic, and suit other users, not only ethtool.
> > > > > > There is no point to have dozen functions in rte_ethdev API
> > > > > > providing similar information.
> > > > > > BTW, I don't see how API I proposed is much more  complex, then
> > > > > > yours
> > > > one.
> > > > > The ring parameter is a run-time information which is different
> > > > > than data
> > > > structure described in this discussion.
> > > >
> > > > I don't see how they are different.
> > > > Looking at ixgbe_get_ringparam(), it returns:
> > > > rx_max_pending - that's a static IXGBE PMD value (max possible
> > > > number of RXDs per one queue).
> > > > rx_pending - number of RXD currently in use by the HW for queue 0
> > > > (that information can be changed at each call).
> > > >
> > > > With the approach I suggesting - you can get same information for
> > > > each RX queue by calling rte_ethdev_get_rx_queue_info() and
> > > > rte_eth_rx_queue_count().
> > > > Plus you are getting other RX queue data.
> > > >
> > > > Another thing - what is practical usage of the information you
> > > > retrieving now by get_ringparam()?
> > > > Let say it returned to you: rx_max_pending=4096; rx_pending=128; How
> > > > that information would help to understand what is going on with the
> > device?
> > > > Without knowing  value of nb_tx_desc for the queue, you can't say is
> > > > you queue full or not.
> > > > Again, it could be that all your traffic going through some other
> > > > queue (not 0).
> > > > So from my point rte_eth_dev_get_ringparam()  usage is very limited,
> > > > and doesn't provide enough information about current queue state.
> > > >
> > > > Same thing applies for TX.
> > > >
> > >
> > > After careful review the suggestion in this comment, and review the
> > existing dpdk source code.
> > > I came to realize that neither rte_ethdev_get_rx_queue_info,
> > > rte_ethdev_get_tx_queue_info, struct rte_eth_rx_queue_info and struct
> > > rte_eth_tx_queue_info are available in the existing dpdk source code. I
> > could not make a patch based upon a set of non- existent API and data
> > structure.
> >
> > Right now, in dpdk.org source code, struct  rte_eth_dev_ring_info, struct
> > rte_dev_eeprom_info and struct  rte_dev_reg_info don't exist also.
> > Same as  all these functions:
> >
> > rte_eth_dev_default_mac_addr_set
> > rte_eth_dev_reg_length
> > rte_eth_dev_reg_info
> > rte_eth_dev_eeprom_length
> > rte_eth_dev_get_eeprom
> > rte_eth_dev_set_eeprom
> > rte_eth_dev_get_ringparam
> >
> > All this is a new API that's you are trying to add.
> > But, by some reason you consider it is ok to add 'struct
> > rte_eth_dev_ring_info', but couldn't add  struct
> > 'rte_ethdev_get_tx_queue_info'
> > That just doesn't make any sense to me.
> > In fact, I think our conversation is going in cycles.
> > If you are not happy with the suggested approach, please provide some
> > meaningful reason why.
> > Konstantin
> 
> It seems the new API aims at providing users a mechanism to quickly and
> gracefully migrate from using ethtool/ioctl calls.

I am fine with that goal in general.
But it doesn't mean that all ethool API should be pushed into rte_ethdev layer.
That's why  a shim layer on top of rte_ethdev is created -
it's goal is to provide for the upper layer an ethool-like API and
hide actual implementation based on rte_ethdev API inside.

>  The provided get/set
> ring param info is very similar to that of ethtool and facilitates the
> ethtool needs.

If rte_ethtool shim layer has to provide get/set ring_param API as it is - that's ok with me.
Though I don't see any good reason why rte_ethdev layer API should restrict itself with exactly the same ethtool-like API.
As I said before - current practical usage of rte_eth_dev_get_ringparam() looks quite limited.
Probably it just me, but I don't see how user can conclude what is the device state,
using  information provided by rte_eth_dev_get_ringparam().

My concern is - if we'll introduce rte_eth_dev_get_ringparam() as it suggested by Larry now,
one of two things would happen:
- no one except ethtool shim layer would use it.
- people will complain that it doesn't provide desired information. 
So, in next release we'll have to either introduce a new function and support 2 functions doing similar things
(code duplication), or modify existing API (ABI breakage pain).

It would be much better to introduce a new rte_ethdev API here that would be generic enough
and fit common needs, not only one particular case (ethtool).
After all, I don't think the changes I suggesting  differ that much from current approach.
Konstantin

> While additional enhancements to the API to provide additional details
> such as those you have suggested are certainly possible, I believe Larry
> is stating those ideas are outside the scope he has intended with the
> API introduction and that they should be discussed further and delivered
> in a future patch.

> 
> Does that seem reasonable?
> 
> Thanks,
> Dave
> 
> >
> > >
> > > > > It's the desire of this patch to separate each data structure to
> > > > > avoid cross
> > > > dependency.
> > > >
> > > > That's too cryptic to me.
> > > > Could you explain what cross dependency you are talking about?
> > > >
> > > > Konstantin

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
@ 2015-06-15 15:01  3%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-15 15:01 UTC (permalink / raw)
  To: nhorman; +Cc: dev

2015-06-12 15:33, Helin Zhang:
> v3 changes:
> * Moved the newly added element right after 'uint16_t reta_size', where it
>   was a padding. So it will not break any ABI compatibility, and no need to
>   disable it by default.
[...]
> @@ -918,6 +918,7 @@ struct rte_eth_dev_info {
>  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
>  	uint16_t reta_size;
>  	/**< Device redirection table size, the total number of entries. */
> +	uint8_t hash_key_size; /**< Hash key size in bytes */
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>  	uint64_t flow_type_rss_offloads;
>  	struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */

Neil, what is your opinion?
Do you ack that this technique covers every ABI cases?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 4/4] ethdev: check support for rx_queue_count and descriptor_done fns
  @ 2015-06-15 10:14  4%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-15 10:14 UTC (permalink / raw)
  To: Roger B. Melton; +Cc: dev

On Fri, Jun 12, 2015 at 01:32:56PM -0400, Roger B. Melton wrote:
> Hi Bruce,  Comment in-line.  Regards, Roger
> 
> On 6/12/15 7:28 AM, Bruce Richardson wrote:
> >The functions rte_eth_rx_queue_count and rte_eth_descriptor_done are
> >supported by very few PMDs. Therefore, it is best to check for support
> >for the functions in the ethdev library, so as to avoid crashes
> >at run-time if the application goes to use those APIs. The performance
> >impact of this change should be very small as this is a predictable
> >branch in the function.
> >
> >Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >---
> >  lib/librte_ether/rte_ethdev.h | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> >diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> >index 827ca3e..9ad1b6a 100644
> >--- a/lib/librte_ether/rte_ethdev.h
> >+++ b/lib/librte_ether/rte_ethdev.h
> >@@ -2496,6 +2496,8 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
> >   *  The queue id on the specific port.
> >   * @return
> >   *  The number of used descriptors in the specific queue.
> >+ *  NOTE: if function is not supported by device this call
> >+ *        returns (uint32_t)-ENOTSUP
> >   */
> >  static inline uint32_t
> 
> Why not change the return type to int32_t?
> In this way, the caller isn't required to make the assumption that a large
> queue count indicates an error.  < 0 means error, other wise it's a valid
> queue count.
> 
> This approach would be consistent with other APIs.
> 

Yes, good point, I should see about that. One thing I'm unsure of, though, is
does this count as ABI breakage? I don't see how it should break any older
apps, since the return type is the same size, but I'm not sure as we are 
changing the return type of the function.

Neil, can you perhaps comment here? Is changing uint32_t to int32_t ok, from
an ABI point of view?

Regards,
/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow director in VFs
  2015-06-12 16:45  3%   ` Thomas Monjalon
@ 2015-06-15  7:14  3%     ` Wu, Jingjing
  0 siblings, 0 replies; 200+ results
From: Wu, Jingjing @ 2015-06-15  7:14 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, neil.horman



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Saturday, June 13, 2015 12:45 AM
> To: Wu, Jingjing
> Cc: dev@dpdk.org; neil.horman@tuxdriver.com
> Subject: Re: [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow
> director in VFs
> 
> 2015-05-11 11:46, Jingjing Wu:
> > This patch extends struct rte_eth_fdir_flow_ext to support flow
> > director in VFs.
> >
> > Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> 
> > --- a/lib/librte_ether/rte_eth_ctrl.h
> > +++ b/lib/librte_ether/rte_eth_ctrl.h
> > @@ -394,6 +394,8 @@ struct rte_eth_fdir_flow_ext {
> >  	uint16_t vlan_tci;
> >  	uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
> >  	/**< It is filled by the flexible payload to match. */
> > +	uint8_t is_vf;   /**< 1 for VF, 0 for port dev */
> > +	uint16_t dst_id; /**< VF ID, available when is_vf is 1*/
> >  };
> 
> Isn't it breaking the ABI?

Yes, it is breaking the ABI. Will consider how to avoid that.

Thanks
Jingjing

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow director in VFs
  @ 2015-06-12 16:45  3%   ` Thomas Monjalon
  2015-06-15  7:14  3%     ` Wu, Jingjing
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-12 16:45 UTC (permalink / raw)
  To: Jingjing Wu; +Cc: dev, neil.horman

2015-05-11 11:46, Jingjing Wu:
> This patch extends struct rte_eth_fdir_flow_ext to support flow
> director in VFs.
> 
> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>

> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -394,6 +394,8 @@ struct rte_eth_fdir_flow_ext {
>  	uint16_t vlan_tci;
>  	uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
>  	/**< It is filled by the flexible payload to match. */
> +	uint8_t is_vf;   /**< 1 for VF, 0 for port dev */
> +	uint16_t dst_id; /**< VF ID, available when is_vf is 1*/
>  };

Isn't it breaking the ABI?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/3] rte_ring: remove deprecated functions
  2015-06-12  5:46  3%   ` Panu Matilainen
@ 2015-06-12 14:00  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-12 14:00 UTC (permalink / raw)
  To: Panu Matilainen; +Cc: dev, Stephen Hemminger

On Fri, Jun 12, 2015 at 08:46:55AM +0300, Panu Matilainen wrote:
> On 06/12/2015 08:18 AM, Stephen Hemminger wrote:
> >From: Stephen Hemminger <shemming@brocade.com>
> >
> >These were deprecated in 2.0 so remove them from 2.1
> >
> >Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >---
> >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> >  drivers/net/ring/rte_eth_ring_version.map |  2 --
> >  2 files changed, 57 deletions(-)
> >
> [...]
> >diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> >index 8ad107d..0875e25 100644
> >--- a/drivers/net/ring/rte_eth_ring_version.map
> >+++ b/drivers/net/ring/rte_eth_ring_version.map
> >@@ -2,8 +2,6 @@ DPDK_2.0 {
> >  	global:
> >
> >  	rte_eth_from_rings;
> >-	rte_eth_ring_pair_attach;
> >-	rte_eth_ring_pair_create;
> >
> >  	local: *;
> >  };
> 
> Removing symbols is an ABI break so it additionally requires a soname bump
> for this library.
> 
> In addition, simply due to being the first library to do so, it'll also then
> break the combined shared library as is currently is. Mind you, this is not
> an objection at all, the need to change to a linker script approach has
> always been a matter of time.
> 
> 	- Panu -
> 
Also, patch title should be prefixed with "ring pmd" or "drivers/net/ring" rather
than rte_ring, since this is a patch for the ring pmd, not the rte_ring library.

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 0/4] ethdev: Add checks for function support in driver
@ 2015-06-12 11:28  3% Bruce Richardson
    0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-06-12 11:28 UTC (permalink / raw)
  To: dev

The functions to check the current occupancy of the RX descriptor ring, and
to specifically query if a particular descriptor is ready to be received, are
used for load monitoring on the data path, and so are inline functions inside
rte_ethdev.h. However, these functions are not implemented for the majority of
drivers, so their use can cause crashes in applications as there are no checks
for a function call with a NULL pointer.

This patchset attempts to fix this by putting in place the minimal check
for a NULL pointer needed before calling the function and thereby avoid any
crashes. The functions now also have a new possible return code of -ENOTSUP
but no return type changes, so ABI is preserved.

As part of the patchset, some additional cleanup is performed. The two functions
in question, as well as the rx and tx burst functions actually have two copies in
the ethdev library - one in the header and a debug version in the C file. This
patchset removes this duplication by merging the debug version into the header
file version. Build-time and runtime behaviour was preserved as part of this
merge.

NOTE: This patchset depends on Stephen Hemminger's patch to make
rte_eth_dev_is_valid_port() public: http://dpdk.org/dev/patchwork/patch/5373/ 

Bruce Richardson (4):
  ethdev: rename macros to have RTE_ETH prefix
  ethdev: move RTE_ETH_FPTR_OR_ERR macros to header
  ethdev: remove duplicated debug functions
  ethdev: check support for rx_queue_count and descriptor_done fns

 lib/librte_ether/rte_ethdev.c | 626 ++++++++++++++++++------------------------
 lib/librte_ether/rte_ethdev.h | 112 +++++---
 2 files changed, 345 insertions(+), 393 deletions(-)

-- 
2.4.2

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 0/6] query hash key size in byte
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (5 preceding siblings ...)
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 6/6] app/testpmd: show " Helin Zhang
@ 2015-06-12  9:31  0%     ` Ananyev, Konstantin
  6 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-12  9:31 UTC (permalink / raw)
  To: Zhang, Helin, dev



> -----Original Message-----
> From: Zhang, Helin
> Sent: Friday, June 12, 2015 8:34 AM
> To: dev@dpdk.org
> Cc: Cao, Min; Xu, Qian Q; Cao, Waterman; Ananyev, Konstantin; Zhang, Helin
> Subject: [PATCH v3 0/6] query hash key size in byte
> 
> As different hardware has different hash key sizes, querying it (in byte)
> per port was asked by users. Otherwise there is no convenient way to know
> the size of hash key which should be prepared.
> 
> v2 changes:
> * Disabled the code changes by default, to avoid breaking ABI compatibility.
> 
> v3 changes:
> * Moved the newly added element right after 'uint16_t reta_size', where it
>   was a padding. So it will not break any ABI compatibility, and no need to
>   disable the code changes by default.
> 
> Helin Zhang (6):
>   ethdev: add an field for querying hash key size
>   e1000: fill the hash key size
>   fm10k: fill the hash key size
>   i40e: fill the hash key size
>   ixgbe: fill the hash key size
>   app/testpmd: show the hash key size
> 
>  app/test-pmd/config.c             | 2 ++
>  drivers/net/e1000/igb_ethdev.c    | 3 +++
>  drivers/net/fm10k/fm10k_ethdev.c  | 1 +
>  drivers/net/i40e/i40e_ethdev.c    | 2 ++
>  drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
>  drivers/net/ixgbe/ixgbe_ethdev.c  | 3 +++
>  lib/librte_ether/rte_ethdev.h     | 1 +
>  7 files changed, 14 insertions(+)
> 

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] pmd: change initialization to indicate pci drivers
  2015-05-29 15:47  3% [dpdk-dev] [PATCH] pmd: change initialization to indicate pci drivers Stephen Hemminger
@ 2015-06-12  9:13  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-12  9:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2015-05-29 08:47, Stephen Hemminger:
> Upcoming drivers will need to be able to support other bus types.
> This is a transparent change to how struct eth_driver is initialized.
> It has not function or ABI layout impact, but makes adding a later
> bus type (Xen, Hyper-V, ...) much easier.
> 
> Signed-off-by: Stpehen Hemminger <stephen@networkplumber.org>

There is a (fixed) typo in this line. Please use --signoff.

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied, thanks

Next step would be to avoid this strange assumption:
struct eth_driver {
    struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  8:28  3%                     ` Zhang, Helin
  2015-06-12  9:00  3%                       ` Panu Matilainen
@ 2015-06-12  9:07  4%                       ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-12  9:07 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Fri, Jun 12, 2015 at 08:28:55AM +0000, Zhang, Helin wrote:
> 
> 
> > -----Original Message-----
> > From: Panu Matilainen [mailto:pmatilai@redhat.com]
> > Sent: Friday, June 12, 2015 4:15 PM
> > To: Zhang, Helin; Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim;
> > nhorman@tuxdriver.com
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> > rte_mbuf
> > 
> > On 06/12/2015 10:43 AM, Zhang, Helin wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> > >> Sent: Friday, June 12, 2015 3:24 PM
> > >> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
> > >> nhorman@tuxdriver.com
> > >> Cc: dev@dpdk.org
> > >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> > >> in rte_mbuf
> > >>
> > >> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> > >>> 2015-06-10 16:32, Olivier MATZ:
> > >>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > >>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> > >>>>>>> In order to unify the packet type, the field of 'packet_type' in
> > >>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> > >>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized
> > >>>>>>> to support this change for Vector PMD. As 'struct rte_kni_mbuf'
> > >>>>>>> for KNI should be right mapped to 'struct rte_mbuf', it should
> > >>>>>>> be modified accordingly. In addition, Vector PMD of ixgbe is
> > >>>>>>> disabled by default, as 'struct rte_mbuf' changed.
> > >>>>>>> To avoid breaking ABI compatibility, all the changes would be
> > >>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> > >>>>>>
> > >>>>>> What are the plans for this compile-time option in the future?
> > >>>>>>
> > >>>>>> I wonder what are the benefits of having this option in terms of
> > >>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
> > >>>>>> the packet-type feature is not present, and when it is enabled we
> > >>>>>> have the feature but it breaks the compatibility.
> > >>>>>>
> > >>>>>> In my opinion, the v5 is preferable: for this kind of features, I
> > >>>>>> don't see how the ABI can be preserved, and I think packet-type
> > >>>>>> won't be the only feature that will modify the mbuf structure. I
> > >>>>>> think the process described here should be applied:
> > >>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> > >>>>>>
> > >>>>>> (starting from "Some ABI changes may be too significant to
> > >>>>>> reasonably maintain multiple versions of").
> > >>>>>
> > >>>>> This is just like the change that Steve (Cunming) Liang submitted
> > >>>>> for Interrupt Mode. We have the same problem in both cases: we
> > >>>>> want to find a way to get the features included, but need to
> > >>>>> comply with our ABI policy. So, in both cases, the proposal is to
> > >>>>> add a config option to enable the change by default, so we
> > >>>>> maintain backward
> > >> compatibility.
> > >>>>> Users that want these changes, and are willing to accept the
> > >>>>> associated ABI change, have to specifically enable them.
> > >>>>>
> > >>>>> We can note in the Deprecation Notices in the Release Notes for
> > >>>>> 2.1 that these config options will be removed in 2.2. The features
> > >>>>> will then be enabled by default.
> > >>>>>
> > >>>>> This seems like a good compromise which allows us to get these
> > >>>>> changes into 2.1 but avoids breaking the ABI policy.
> > >>>>
> > >>>> Sorry for the late answer.
> > >>>>
> > >>>> After some thoughts on this topic, I understand that having a
> > >>>> compile-time option is perhaps a good compromise between keeping
> > >>>> compatibility and having new features earlier.
> > >>>>
> > >>>> I'm just afraid about having one #ifdef in the code for each new
> > >>>> feature that cannot keep the ABI compatibility.
> > >>>> What do you think about having one option -- let's call it
> > >>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that
> > >>>> would surround any new feature that breaks the ABI?
> > >>>>
> > >>>> This would have several advantages:
> > >>>> - only 2 cases (on or off), the combinatorial is smaller than
> > >>>>      having one option per feature
> > >>>> - all next features breaking the abi can be identified by a grep
> > >>>> - the code inside the #ifdef can be enabled in a simple operation
> > >>>>      by Thomas after each release.
> > >>>>
> > >>>> Thomas, any comment?
> > >>>
> > >>> As previously discussed (1to1) with Olivier, I think that's a good
> > >>> proposal to introduce changes breaking deeply the ABI.
> > >>>
> > >>> Let's sum up the current policy:
> > >>> 1/ For changes which have a limited impact on the ABI, the backward
> > >>> compatibility must be kept during 1 release including the notice in
> > >> doc/guides/rel_notes/abi.rst.
> > >>> 2/ For important changes like mbuf rework, there was an agreement on
> > >>> skipping the backward compatibility after having 3 acknowledgements
> > >>> and an
> > >> 1-release long notice.
> > >>> Then the ABI numbering must be incremented.
> > >>>
> > >>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the
> > >>> second
> > >> case.
> > >>> In order to be adopted, a patch for the file
> > >>> doc/guides/rel_notes/abi.rst must be submitted and strongly
> > acknowledged.
> > >>>
> > >>> The ABI numbering must be also clearly explained:
> > >>> 1/ Should we have different libraries version number depending of
> > >> CONFIG_RTE_NEXT_ABI?
> > >>> It seems straightforward to use "ifeq" when LIBABIVER in the
> > >>> Makefiles
> > >>
> > >> An incompatible ABI must be reflected by a soname change, otherwise
> > >> the whole library versioning is irrelevant.
> > >>
> > >>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
> > >> the .map files?
> > >>> Maybe we should remove these files and generate them with some
> > >> preprocessing.
> > >>>
> > >>> Neil, as the ABI policy author, what is your opinion?
> > >>
> > >> I'm not Neil but my 5c...
> > >>
> > >> Working around ABI compatibility policy via config options seems like
> > >> a slippery slope. Going forward this will likely mean there are
> > >> always two different ABIs for any given version, and the thought of
> > >> keeping track of it all in a truly compatible manner makes my head hurt.
> > >>
> > >> That said its easy to understand the desire to move faster than the
> > >> ABI policy allows. In a project where so many structs are in the open
> > >> it gets hard to do much anything at all without breaking the ABI.
> > >>
> > >> The issue could be mitigated somewhat by reserving some space at the
> > >> end of the structs eg when the ABI needs to be changed anyway, but it
> > >> has obvious downsides as well. The other options I see tend to
> > >> revolve around changing release policies one way or the other:
> > >> releasing ABI compatible micro versions between minor versions and
> > >> relaxing the ABI policy a bit, or just releasing new minor versions more often
> > than the current cycle.
> > >>
> > >> 	- Panu -
> > >
> > > Does it mean releasing R2.01 right now with announcement of all ABI
> > > changes, which based on R2.0 first, and then releasing R2.1 several weeks later
> > with all the code changes?
> > 
> > Something like that, but I'd think its too late for any big release model / policy
> > changes for this particular cycle.
> > 
> > I also do not want to undermine the ABI policy we just got in place, but since
> > people are actively looking for ways to work around it anyway its better to map
> > out all the possibilities. One of them is committing to longer term maintenance of
> > releases (via ABI compatible micro version updates), another one is shortening
> > the cycles. Both achieve roughly the same goals with differences in emphasis
> > perhaps, but more releases requires more resources on maintaining, testing etc
> > so...
> R2.01 could just have all the same of R2.0, with an additional ABI announcement.
> Then nothing needs to be tested.
> 
> - Helin
> 
Then it would be a paper exercise just to bypass an ABI policy, so NACK to that idea.

If (and it's a fairly big if) we do decide we need longer-term maintenance 
branches for maintaining ABI, then we need to do it properly.
This may including doing things liek back-porting relevant (maybe all) features from
later releases that don't break the ABI to the supported version. Bug fixes would
obviously have to be backported.

However, the overhead of this is obvious, since we would now have multiple development
lines to be maintained. 

Regards,
/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  8:28  3%                     ` Zhang, Helin
@ 2015-06-12  9:00  3%                       ` Panu Matilainen
  2015-06-12  9:07  4%                       ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-12  9:00 UTC (permalink / raw)
  To: Zhang, Helin, Thomas Monjalon, Olivier MATZ, O'Driscoll, Tim,
	nhorman
  Cc: dev

On 06/12/2015 11:28 AM, Zhang, Helin wrote:
>
>
>> -----Original Message-----
>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>> Sent: Friday, June 12, 2015 4:15 PM
>> To: Zhang, Helin; Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim;
>> nhorman@tuxdriver.com
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
>> rte_mbuf
>>
>> On 06/12/2015 10:43 AM, Zhang, Helin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>>>> Sent: Friday, June 12, 2015 3:24 PM
>>>> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
>>>> nhorman@tuxdriver.com
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
>>>> in rte_mbuf
>>>>
>>>> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
>>>>> 2015-06-10 16:32, Olivier MATZ:
>>>>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>>>>>>>> In order to unify the packet type, the field of 'packet_type' in
>>>>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>>>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized
>>>>>>>>> to support this change for Vector PMD. As 'struct rte_kni_mbuf'
>>>>>>>>> for KNI should be right mapped to 'struct rte_mbuf', it should
>>>>>>>>> be modified accordingly. In addition, Vector PMD of ixgbe is
>>>>>>>>> disabled by default, as 'struct rte_mbuf' changed.
>>>>>>>>> To avoid breaking ABI compatibility, all the changes would be
>>>>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>>>>>>>
>>>>>>>> What are the plans for this compile-time option in the future?
>>>>>>>>
>>>>>>>> I wonder what are the benefits of having this option in terms of
>>>>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
>>>>>>>> the packet-type feature is not present, and when it is enabled we
>>>>>>>> have the feature but it breaks the compatibility.
>>>>>>>>
>>>>>>>> In my opinion, the v5 is preferable: for this kind of features, I
>>>>>>>> don't see how the ABI can be preserved, and I think packet-type
>>>>>>>> won't be the only feature that will modify the mbuf structure. I
>>>>>>>> think the process described here should be applied:
>>>>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>>>>>>>
>>>>>>>> (starting from "Some ABI changes may be too significant to
>>>>>>>> reasonably maintain multiple versions of").
>>>>>>>
>>>>>>> This is just like the change that Steve (Cunming) Liang submitted
>>>>>>> for Interrupt Mode. We have the same problem in both cases: we
>>>>>>> want to find a way to get the features included, but need to
>>>>>>> comply with our ABI policy. So, in both cases, the proposal is to
>>>>>>> add a config option to enable the change by default, so we
>>>>>>> maintain backward
>>>> compatibility.
>>>>>>> Users that want these changes, and are willing to accept the
>>>>>>> associated ABI change, have to specifically enable them.
>>>>>>>
>>>>>>> We can note in the Deprecation Notices in the Release Notes for
>>>>>>> 2.1 that these config options will be removed in 2.2. The features
>>>>>>> will then be enabled by default.
>>>>>>>
>>>>>>> This seems like a good compromise which allows us to get these
>>>>>>> changes into 2.1 but avoids breaking the ABI policy.
>>>>>>
>>>>>> Sorry for the late answer.
>>>>>>
>>>>>> After some thoughts on this topic, I understand that having a
>>>>>> compile-time option is perhaps a good compromise between keeping
>>>>>> compatibility and having new features earlier.
>>>>>>
>>>>>> I'm just afraid about having one #ifdef in the code for each new
>>>>>> feature that cannot keep the ABI compatibility.
>>>>>> What do you think about having one option -- let's call it
>>>>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that
>>>>>> would surround any new feature that breaks the ABI?
>>>>>>
>>>>>> This would have several advantages:
>>>>>> - only 2 cases (on or off), the combinatorial is smaller than
>>>>>>       having one option per feature
>>>>>> - all next features breaking the abi can be identified by a grep
>>>>>> - the code inside the #ifdef can be enabled in a simple operation
>>>>>>       by Thomas after each release.
>>>>>>
>>>>>> Thomas, any comment?
>>>>>
>>>>> As previously discussed (1to1) with Olivier, I think that's a good
>>>>> proposal to introduce changes breaking deeply the ABI.
>>>>>
>>>>> Let's sum up the current policy:
>>>>> 1/ For changes which have a limited impact on the ABI, the backward
>>>>> compatibility must be kept during 1 release including the notice in
>>>> doc/guides/rel_notes/abi.rst.
>>>>> 2/ For important changes like mbuf rework, there was an agreement on
>>>>> skipping the backward compatibility after having 3 acknowledgements
>>>>> and an
>>>> 1-release long notice.
>>>>> Then the ABI numbering must be incremented.
>>>>>
>>>>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the
>>>>> second
>>>> case.
>>>>> In order to be adopted, a patch for the file
>>>>> doc/guides/rel_notes/abi.rst must be submitted and strongly
>> acknowledged.
>>>>>
>>>>> The ABI numbering must be also clearly explained:
>>>>> 1/ Should we have different libraries version number depending of
>>>> CONFIG_RTE_NEXT_ABI?
>>>>> It seems straightforward to use "ifeq" when LIBABIVER in the
>>>>> Makefiles
>>>>
>>>> An incompatible ABI must be reflected by a soname change, otherwise
>>>> the whole library versioning is irrelevant.
>>>>
>>>>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
>>>> the .map files?
>>>>> Maybe we should remove these files and generate them with some
>>>> preprocessing.
>>>>>
>>>>> Neil, as the ABI policy author, what is your opinion?
>>>>
>>>> I'm not Neil but my 5c...
>>>>
>>>> Working around ABI compatibility policy via config options seems like
>>>> a slippery slope. Going forward this will likely mean there are
>>>> always two different ABIs for any given version, and the thought of
>>>> keeping track of it all in a truly compatible manner makes my head hurt.
>>>>
>>>> That said its easy to understand the desire to move faster than the
>>>> ABI policy allows. In a project where so many structs are in the open
>>>> it gets hard to do much anything at all without breaking the ABI.
>>>>
>>>> The issue could be mitigated somewhat by reserving some space at the
>>>> end of the structs eg when the ABI needs to be changed anyway, but it
>>>> has obvious downsides as well. The other options I see tend to
>>>> revolve around changing release policies one way or the other:
>>>> releasing ABI compatible micro versions between minor versions and
>>>> relaxing the ABI policy a bit, or just releasing new minor versions more often
>> than the current cycle.
>>>>
>>>> 	- Panu -
>>>
>>> Does it mean releasing R2.01 right now with announcement of all ABI
>>> changes, which based on R2.0 first, and then releasing R2.1 several weeks later
>> with all the code changes?
>>
>> Something like that, but I'd think its too late for any big release model / policy
>> changes for this particular cycle.
>>
>> I also do not want to undermine the ABI policy we just got in place, but since
>> people are actively looking for ways to work around it anyway its better to map
>> out all the possibilities. One of them is committing to longer term maintenance of
>> releases (via ABI compatible micro version updates), another one is shortening
>> the cycles. Both achieve roughly the same goals with differences in emphasis
>> perhaps, but more releases requires more resources on maintaining, testing etc
>> so...
> R2.01 could just have all the same of R2.0, with an additional ABI announcement.
> Then nothing needs to be tested.

That's also entirely missing the point of having an ABI policy in the 
first place. Its purpose is not to force people to find loopholes in the 
policy but for the benefit of other developers building apps on top of DPDK.

	- Panu -

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  8:15  4%                   ` Panu Matilainen
@ 2015-06-12  8:28  3%                     ` Zhang, Helin
  2015-06-12  9:00  3%                       ` Panu Matilainen
  2015-06-12  9:07  4%                       ` Bruce Richardson
  0 siblings, 2 replies; 200+ results
From: Zhang, Helin @ 2015-06-12  8:28 UTC (permalink / raw)
  To: Panu Matilainen, Thomas Monjalon, Olivier MATZ, O'Driscoll,
	Tim, nhorman
  Cc: dev



> -----Original Message-----
> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> Sent: Friday, June 12, 2015 4:15 PM
> To: Zhang, Helin; Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim;
> nhorman@tuxdriver.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> On 06/12/2015 10:43 AM, Zhang, Helin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> >> Sent: Friday, June 12, 2015 3:24 PM
> >> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
> >> nhorman@tuxdriver.com
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> >> in rte_mbuf
> >>
> >> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> >>> 2015-06-10 16:32, Olivier MATZ:
> >>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>>>>>> In order to unify the packet type, the field of 'packet_type' in
> >>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized
> >>>>>>> to support this change for Vector PMD. As 'struct rte_kni_mbuf'
> >>>>>>> for KNI should be right mapped to 'struct rte_mbuf', it should
> >>>>>>> be modified accordingly. In addition, Vector PMD of ixgbe is
> >>>>>>> disabled by default, as 'struct rte_mbuf' changed.
> >>>>>>> To avoid breaking ABI compatibility, all the changes would be
> >>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>>>>>
> >>>>>> What are the plans for this compile-time option in the future?
> >>>>>>
> >>>>>> I wonder what are the benefits of having this option in terms of
> >>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
> >>>>>> the packet-type feature is not present, and when it is enabled we
> >>>>>> have the feature but it breaks the compatibility.
> >>>>>>
> >>>>>> In my opinion, the v5 is preferable: for this kind of features, I
> >>>>>> don't see how the ABI can be preserved, and I think packet-type
> >>>>>> won't be the only feature that will modify the mbuf structure. I
> >>>>>> think the process described here should be applied:
> >>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>>>>>
> >>>>>> (starting from "Some ABI changes may be too significant to
> >>>>>> reasonably maintain multiple versions of").
> >>>>>
> >>>>> This is just like the change that Steve (Cunming) Liang submitted
> >>>>> for Interrupt Mode. We have the same problem in both cases: we
> >>>>> want to find a way to get the features included, but need to
> >>>>> comply with our ABI policy. So, in both cases, the proposal is to
> >>>>> add a config option to enable the change by default, so we
> >>>>> maintain backward
> >> compatibility.
> >>>>> Users that want these changes, and are willing to accept the
> >>>>> associated ABI change, have to specifically enable them.
> >>>>>
> >>>>> We can note in the Deprecation Notices in the Release Notes for
> >>>>> 2.1 that these config options will be removed in 2.2. The features
> >>>>> will then be enabled by default.
> >>>>>
> >>>>> This seems like a good compromise which allows us to get these
> >>>>> changes into 2.1 but avoids breaking the ABI policy.
> >>>>
> >>>> Sorry for the late answer.
> >>>>
> >>>> After some thoughts on this topic, I understand that having a
> >>>> compile-time option is perhaps a good compromise between keeping
> >>>> compatibility and having new features earlier.
> >>>>
> >>>> I'm just afraid about having one #ifdef in the code for each new
> >>>> feature that cannot keep the ABI compatibility.
> >>>> What do you think about having one option -- let's call it
> >>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that
> >>>> would surround any new feature that breaks the ABI?
> >>>>
> >>>> This would have several advantages:
> >>>> - only 2 cases (on or off), the combinatorial is smaller than
> >>>>      having one option per feature
> >>>> - all next features breaking the abi can be identified by a grep
> >>>> - the code inside the #ifdef can be enabled in a simple operation
> >>>>      by Thomas after each release.
> >>>>
> >>>> Thomas, any comment?
> >>>
> >>> As previously discussed (1to1) with Olivier, I think that's a good
> >>> proposal to introduce changes breaking deeply the ABI.
> >>>
> >>> Let's sum up the current policy:
> >>> 1/ For changes which have a limited impact on the ABI, the backward
> >>> compatibility must be kept during 1 release including the notice in
> >> doc/guides/rel_notes/abi.rst.
> >>> 2/ For important changes like mbuf rework, there was an agreement on
> >>> skipping the backward compatibility after having 3 acknowledgements
> >>> and an
> >> 1-release long notice.
> >>> Then the ABI numbering must be incremented.
> >>>
> >>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the
> >>> second
> >> case.
> >>> In order to be adopted, a patch for the file
> >>> doc/guides/rel_notes/abi.rst must be submitted and strongly
> acknowledged.
> >>>
> >>> The ABI numbering must be also clearly explained:
> >>> 1/ Should we have different libraries version number depending of
> >> CONFIG_RTE_NEXT_ABI?
> >>> It seems straightforward to use "ifeq" when LIBABIVER in the
> >>> Makefiles
> >>
> >> An incompatible ABI must be reflected by a soname change, otherwise
> >> the whole library versioning is irrelevant.
> >>
> >>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
> >> the .map files?
> >>> Maybe we should remove these files and generate them with some
> >> preprocessing.
> >>>
> >>> Neil, as the ABI policy author, what is your opinion?
> >>
> >> I'm not Neil but my 5c...
> >>
> >> Working around ABI compatibility policy via config options seems like
> >> a slippery slope. Going forward this will likely mean there are
> >> always two different ABIs for any given version, and the thought of
> >> keeping track of it all in a truly compatible manner makes my head hurt.
> >>
> >> That said its easy to understand the desire to move faster than the
> >> ABI policy allows. In a project where so many structs are in the open
> >> it gets hard to do much anything at all without breaking the ABI.
> >>
> >> The issue could be mitigated somewhat by reserving some space at the
> >> end of the structs eg when the ABI needs to be changed anyway, but it
> >> has obvious downsides as well. The other options I see tend to
> >> revolve around changing release policies one way or the other:
> >> releasing ABI compatible micro versions between minor versions and
> >> relaxing the ABI policy a bit, or just releasing new minor versions more often
> than the current cycle.
> >>
> >> 	- Panu -
> >
> > Does it mean releasing R2.01 right now with announcement of all ABI
> > changes, which based on R2.0 first, and then releasing R2.1 several weeks later
> with all the code changes?
> 
> Something like that, but I'd think its too late for any big release model / policy
> changes for this particular cycle.
> 
> I also do not want to undermine the ABI policy we just got in place, but since
> people are actively looking for ways to work around it anyway its better to map
> out all the possibilities. One of them is committing to longer term maintenance of
> releases (via ABI compatible micro version updates), another one is shortening
> the cycles. Both achieve roughly the same goals with differences in emphasis
> perhaps, but more releases requires more resources on maintaining, testing etc
> so...
R2.01 could just have all the same of R2.0, with an additional ABI announcement.
Then nothing needs to be tested.

- Helin

> 
> 	- Panu -

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  7:43  3%                 ` Zhang, Helin
@ 2015-06-12  8:15  4%                   ` Panu Matilainen
  2015-06-12  8:28  3%                     ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-12  8:15 UTC (permalink / raw)
  To: Zhang, Helin, Thomas Monjalon, Olivier MATZ, O'Driscoll, Tim,
	nhorman
  Cc: dev

On 06/12/2015 10:43 AM, Zhang, Helin wrote:
>
>
>> -----Original Message-----
>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>> Sent: Friday, June 12, 2015 3:24 PM
>> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
>> nhorman@tuxdriver.com
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
>> rte_mbuf
>>
>> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
>>> 2015-06-10 16:32, Olivier MATZ:
>>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>>>>>> In order to unify the packet type, the field of 'packet_type' in
>>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
>>>>>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
>>>>>>> KNI should be right mapped to 'struct rte_mbuf', it should be
>>>>>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
>>>>>>> by default, as 'struct rte_mbuf' changed.
>>>>>>> To avoid breaking ABI compatibility, all the changes would be
>>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>>>>>
>>>>>> What are the plans for this compile-time option in the future?
>>>>>>
>>>>>> I wonder what are the benefits of having this option in terms of
>>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
>>>>>> the packet-type feature is not present, and when it is enabled we
>>>>>> have the feature but it breaks the compatibility.
>>>>>>
>>>>>> In my opinion, the v5 is preferable: for this kind of features, I
>>>>>> don't see how the ABI can be preserved, and I think packet-type
>>>>>> won't be the only feature that will modify the mbuf structure. I
>>>>>> think the process described here should be applied:
>>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>>>>>
>>>>>> (starting from "Some ABI changes may be too significant to
>>>>>> reasonably maintain multiple versions of").
>>>>>
>>>>> This is just like the change that Steve (Cunming) Liang submitted
>>>>> for Interrupt Mode. We have the same problem in both cases: we want
>>>>> to find a way to get the features included, but need to comply with
>>>>> our ABI policy. So, in both cases, the proposal is to add a config
>>>>> option to enable the change by default, so we maintain backward
>> compatibility.
>>>>> Users that want these changes, and are willing to accept the
>>>>> associated ABI change, have to specifically enable them.
>>>>>
>>>>> We can note in the Deprecation Notices in the Release Notes for 2.1
>>>>> that these config options will be removed in 2.2. The features will
>>>>> then be enabled by default.
>>>>>
>>>>> This seems like a good compromise which allows us to get these
>>>>> changes into 2.1 but avoids breaking the ABI policy.
>>>>
>>>> Sorry for the late answer.
>>>>
>>>> After some thoughts on this topic, I understand that having a
>>>> compile-time option is perhaps a good compromise between keeping
>>>> compatibility and having new features earlier.
>>>>
>>>> I'm just afraid about having one #ifdef in the code for each new
>>>> feature that cannot keep the ABI compatibility.
>>>> What do you think about having one option -- let's call it
>>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would
>>>> surround any new feature that breaks the ABI?
>>>>
>>>> This would have several advantages:
>>>> - only 2 cases (on or off), the combinatorial is smaller than
>>>>      having one option per feature
>>>> - all next features breaking the abi can be identified by a grep
>>>> - the code inside the #ifdef can be enabled in a simple operation
>>>>      by Thomas after each release.
>>>>
>>>> Thomas, any comment?
>>>
>>> As previously discussed (1to1) with Olivier, I think that's a good
>>> proposal to introduce changes breaking deeply the ABI.
>>>
>>> Let's sum up the current policy:
>>> 1/ For changes which have a limited impact on the ABI, the backward
>>> compatibility must be kept during 1 release including the notice in
>> doc/guides/rel_notes/abi.rst.
>>> 2/ For important changes like mbuf rework, there was an agreement on
>>> skipping the backward compatibility after having 3 acknowledgements and an
>> 1-release long notice.
>>> Then the ABI numbering must be incremented.
>>>
>>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second
>> case.
>>> In order to be adopted, a patch for the file
>>> doc/guides/rel_notes/abi.rst must be submitted and strongly acknowledged.
>>>
>>> The ABI numbering must be also clearly explained:
>>> 1/ Should we have different libraries version number depending of
>> CONFIG_RTE_NEXT_ABI?
>>> It seems straightforward to use "ifeq" when LIBABIVER in the Makefiles
>>
>> An incompatible ABI must be reflected by a soname change, otherwise the
>> whole library versioning is irrelevant.
>>
>>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
>> the .map files?
>>> Maybe we should remove these files and generate them with some
>> preprocessing.
>>>
>>> Neil, as the ABI policy author, what is your opinion?
>>
>> I'm not Neil but my 5c...
>>
>> Working around ABI compatibility policy via config options seems like a slippery
>> slope. Going forward this will likely mean there are always two different ABIs for
>> any given version, and the thought of keeping track of it all in a truly compatible
>> manner makes my head hurt.
>>
>> That said its easy to understand the desire to move faster than the ABI policy
>> allows. In a project where so many structs are in the open it gets hard to do much
>> anything at all without breaking the ABI.
>>
>> The issue could be mitigated somewhat by reserving some space at the end of
>> the structs eg when the ABI needs to be changed anyway, but it has obvious
>> downsides as well. The other options I see tend to revolve around changing
>> release policies one way or the other: releasing ABI compatible micro versions
>> between minor versions and relaxing the ABI policy a bit, or just releasing new
>> minor versions more often than the current cycle.
>>
>> 	- Panu -
>
> Does it mean releasing R2.01 right now with announcement of all ABI changes, which
> based on R2.0 first, and then releasing R2.1 several weeks later with all the code changes?

Something like that, but I'd think its too late for any big release 
model / policy changes for this particular cycle.

I also do not want to undermine the ABI policy we just got in place, but 
since people are actively looking for ways to work around it anyway its 
better to map out all the possibilities. One of them is committing to 
longer term maintenance of releases (via ABI compatible micro version 
updates), another one is shortening the cycles. Both achieve roughly the 
same goals with differences in emphasis perhaps, but more releases 
requires more resources on maintaining, testing etc so...

	- Panu -

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  7:24  5%               ` Panu Matilainen
@ 2015-06-12  7:43  3%                 ` Zhang, Helin
  2015-06-12  8:15  4%                   ` Panu Matilainen
  0 siblings, 1 reply; 200+ results
From: Zhang, Helin @ 2015-06-12  7:43 UTC (permalink / raw)
  To: Panu Matilainen, Thomas Monjalon, Olivier MATZ, O'Driscoll,
	Tim, nhorman
  Cc: dev



> -----Original Message-----
> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> Sent: Friday, June 12, 2015 3:24 PM
> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
> nhorman@tuxdriver.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> > 2015-06-10 16:32, Olivier MATZ:
> >> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>>>> In order to unify the packet type, the field of 'packet_type' in
> >>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> >>>>> KNI should be right mapped to 'struct rte_mbuf', it should be
> >>>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> >>>>> by default, as 'struct rte_mbuf' changed.
> >>>>> To avoid breaking ABI compatibility, all the changes would be
> >>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>>>
> >>>> What are the plans for this compile-time option in the future?
> >>>>
> >>>> I wonder what are the benefits of having this option in terms of
> >>>> ABI compatibility: when it is disabled, it is ABI-compatible but
> >>>> the packet-type feature is not present, and when it is enabled we
> >>>> have the feature but it breaks the compatibility.
> >>>>
> >>>> In my opinion, the v5 is preferable: for this kind of features, I
> >>>> don't see how the ABI can be preserved, and I think packet-type
> >>>> won't be the only feature that will modify the mbuf structure. I
> >>>> think the process described here should be applied:
> >>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>>>
> >>>> (starting from "Some ABI changes may be too significant to
> >>>> reasonably maintain multiple versions of").
> >>>
> >>> This is just like the change that Steve (Cunming) Liang submitted
> >>> for Interrupt Mode. We have the same problem in both cases: we want
> >>> to find a way to get the features included, but need to comply with
> >>> our ABI policy. So, in both cases, the proposal is to add a config
> >>> option to enable the change by default, so we maintain backward
> compatibility.
> >>> Users that want these changes, and are willing to accept the
> >>> associated ABI change, have to specifically enable them.
> >>>
> >>> We can note in the Deprecation Notices in the Release Notes for 2.1
> >>> that these config options will be removed in 2.2. The features will
> >>> then be enabled by default.
> >>>
> >>> This seems like a good compromise which allows us to get these
> >>> changes into 2.1 but avoids breaking the ABI policy.
> >>
> >> Sorry for the late answer.
> >>
> >> After some thoughts on this topic, I understand that having a
> >> compile-time option is perhaps a good compromise between keeping
> >> compatibility and having new features earlier.
> >>
> >> I'm just afraid about having one #ifdef in the code for each new
> >> feature that cannot keep the ABI compatibility.
> >> What do you think about having one option -- let's call it
> >> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would
> >> surround any new feature that breaks the ABI?
> >>
> >> This would have several advantages:
> >> - only 2 cases (on or off), the combinatorial is smaller than
> >>     having one option per feature
> >> - all next features breaking the abi can be identified by a grep
> >> - the code inside the #ifdef can be enabled in a simple operation
> >>     by Thomas after each release.
> >>
> >> Thomas, any comment?
> >
> > As previously discussed (1to1) with Olivier, I think that's a good
> > proposal to introduce changes breaking deeply the ABI.
> >
> > Let's sum up the current policy:
> > 1/ For changes which have a limited impact on the ABI, the backward
> > compatibility must be kept during 1 release including the notice in
> doc/guides/rel_notes/abi.rst.
> > 2/ For important changes like mbuf rework, there was an agreement on
> > skipping the backward compatibility after having 3 acknowledgements and an
> 1-release long notice.
> > Then the ABI numbering must be incremented.
> >
> > This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second
> case.
> > In order to be adopted, a patch for the file
> > doc/guides/rel_notes/abi.rst must be submitted and strongly acknowledged.
> >
> > The ABI numbering must be also clearly explained:
> > 1/ Should we have different libraries version number depending of
> CONFIG_RTE_NEXT_ABI?
> > It seems straightforward to use "ifeq" when LIBABIVER in the Makefiles
> 
> An incompatible ABI must be reflected by a soname change, otherwise the
> whole library versioning is irrelevant.
> 
> > 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
> the .map files?
> > Maybe we should remove these files and generate them with some
> preprocessing.
> >
> > Neil, as the ABI policy author, what is your opinion?
> 
> I'm not Neil but my 5c...
> 
> Working around ABI compatibility policy via config options seems like a slippery
> slope. Going forward this will likely mean there are always two different ABIs for
> any given version, and the thought of keeping track of it all in a truly compatible
> manner makes my head hurt.
> 
> That said its easy to understand the desire to move faster than the ABI policy
> allows. In a project where so many structs are in the open it gets hard to do much
> anything at all without breaking the ABI.
> 
> The issue could be mitigated somewhat by reserving some space at the end of
> the structs eg when the ABI needs to be changed anyway, but it has obvious
> downsides as well. The other options I see tend to revolve around changing
> release policies one way or the other: releasing ABI compatible micro versions
> between minor versions and relaxing the ABI policy a bit, or just releasing new
> minor versions more often than the current cycle.
> 
> 	- Panu -

Does it mean releasing R2.01 right now with announcement of all ABI changes, which
based on R2.0 first, and then releasing R2.1 several weeks later with all the code changes?

- Helin


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 3/6] fm10k: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 2/6] e1000: fill the " Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Helin Zhang
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c | 1 +
 1 file changed, 1 insertion(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As newly added element was put to a padding space, it will not break ABI
  compatibility, and no need to disable the code changes by default.

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 87852ed..bd626ce 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -767,6 +767,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		DEV_RX_OFFLOAD_UDP_CKSUM  |
 		DEV_RX_OFFLOAD_TCP_CKSUM;
 	dev_info->tx_offload_capa    = 0;
+	dev_info->hash_key_size = FM10K_RSSRK_SIZE * sizeof(uint32_t);
 	dev_info->reta_size = FM10K_MAX_RSS_INDICES;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 0/6] query hash key size in byte
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (5 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: show " Helin Zhang
@ 2015-06-12  7:33  4%   ` Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
                       ` (6 more replies)
  6 siblings, 7 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

As different hardware has different hash key sizes, querying it (in byte)
per port was asked by users. Otherwise there is no convenient way to know
the size of hash key which should be prepared.

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* Moved the newly added element right after 'uint16_t reta_size', where it
  was a padding. So it will not break any ABI compatibility, and no need to
  disable the code changes by default.

Helin Zhang (6):
  ethdev: add an field for querying hash key size
  e1000: fill the hash key size
  fm10k: fill the hash key size
  i40e: fill the hash key size
  ixgbe: fill the hash key size
  app/testpmd: show the hash key size

 app/test-pmd/config.c             | 2 ++
 drivers/net/e1000/igb_ethdev.c    | 3 +++
 drivers/net/fm10k/fm10k_ethdev.c  | 1 +
 drivers/net/i40e/i40e_ethdev.c    | 2 ++
 drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
 drivers/net/ixgbe/ixgbe_ethdev.c  | 3 +++
 lib/librte_ether/rte_ethdev.h     | 1 +
 7 files changed, 14 insertions(+)

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 6/6] app/testpmd: show the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (4 preceding siblings ...)
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Helin Zhang
@ 2015-06-12  7:34  4%     ` Helin Zhang
  2015-06-12  9:31  0%     ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Ananyev, Konstantin
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:34 UTC (permalink / raw)
  To: dev

As querying hash key size in byte was supported, it can be shown
in testpmd after getting the device information if not zero.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pmd/config.c | 2 ++
 1 file changed, 2 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As newly added element was put to a padding space, it will not break ABI
  compatibility, and no need to disable the code changes by default.

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f788ed5..800756f 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -361,6 +361,8 @@ port_infos_display(portid_t port_id)
 
 	memset(&dev_info, 0, sizeof(dev_info));
 	rte_eth_dev_info_get(port_id, &dev_info);
+	if (dev_info.hash_key_size > 0)
+		printf("Hash key size in bytes: %u\n", dev_info.hash_key_size);
 	if (dev_info.reta_size > 0)
 		printf("Redirection table size: %u\n", dev_info.reta_size);
 	if (!dev_info.flow_type_rss_offloads)
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 5/6] ixgbe: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (3 preceding siblings ...)
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Helin Zhang
@ 2015-06-12  7:34  4%     ` Helin Zhang
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 6/6] app/testpmd: show " Helin Zhang
  2015-06-12  9:31  0%     ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Ananyev, Konstantin
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:34 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As newly added element was put to a padding space, it will not break ABI
  compatibility, and no need to disable the code changes by default.

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..588ccc0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -116,6 +116,8 @@
 
 #define IXGBE_QUEUE_STAT_COUNTERS (sizeof(hw_stats->qprc) / sizeof(hw_stats->qprc[0]))
 
+#define IXGBE_HKEY_MAX_INDEX 10
+
 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
@@ -2052,6 +2054,7 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
 				ETH_TXQ_FLAGS_NOOFFLOADS,
 	};
+	dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
 }
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 4/6] i40e: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (2 preceding siblings ...)
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Helin Zhang
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 2 ++
 drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
 2 files changed, 4 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As newly added element was put to a padding space, it will not break ABI
  compatibility, and no need to disable the code changes by default.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..c699ddb 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1540,6 +1540,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_SCTP_CKSUM |
 		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_TCP_TSO;
+	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
 	dev_info->reta_size = pf->hash_lut_size;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 4f4404e..f70d94c 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1670,6 +1670,8 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 2/6] e1000: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Helin Zhang
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As newly added element was put to a padding space, it will not break ABI
  compatibility, and no need to disable the code changes by default.

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..7d388f3 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -68,6 +68,8 @@
 #define IGB_DEFAULT_TX_HTHRESH      0
 #define IGB_DEFAULT_TX_WTHRESH      0
 
+#define IGB_HKEY_MAX_INDEX 10
+
 /* Bit shift and mask */
 #define IGB_4_BIT_WIDTH  (CHAR_BIT / 2)
 #define IGB_4_BIT_MASK   RTE_LEN2MASK(IGB_4_BIT_WIDTH, uint8_t)
@@ -1377,6 +1379,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		/* Should not happen */
 		break;
 	}
+	dev_info->hash_key_size = IGB_HKEY_MAX_INDEX * sizeof(uint32_t);
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IGB_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-15 15:01  3%       ` Thomas Monjalon
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 2/6] e1000: fill the " Helin Zhang
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

To support querying hash key size per port, an new field of
'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
hash key size in bytes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 1 +
 1 file changed, 1 insertion(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* Moved the newly added element right after 'uint16_t reta_size', where it
  was a padding. So it will not break any ABI compatibility, and no need to
  disable it by default.

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..bce152d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -918,6 +918,7 @@ struct rte_eth_dev_info {
 	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
 	uint16_t reta_size;
 	/**< Device redirection table size, the total number of entries. */
+	uint8_t hash_key_size; /**< Hash key size in bytes */
 	/** Bit mask of RSS offloads, the bit offset also means flow type */
 	uint64_t flow_type_rss_offloads;
 	struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 16:14  5%             ` Thomas Monjalon
@ 2015-06-12  7:24  5%               ` Panu Matilainen
  2015-06-12  7:43  3%                 ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-12  7:24 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier MATZ, O'Driscoll, Tim, Zhang, Helin,
	nhorman
  Cc: dev

On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> 2015-06-10 16:32, Olivier MATZ:
>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>>>> In order to unify the packet type, the field of 'packet_type' in
>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
>>>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
>>>>> KNI should be right mapped to 'struct rte_mbuf', it should be
>>>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
>>>>> by default, as 'struct rte_mbuf' changed.
>>>>> To avoid breaking ABI compatibility, all the changes would be
>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>>>
>>>> What are the plans for this compile-time option in the future?
>>>>
>>>> I wonder what are the benefits of having this option in terms
>>>> of ABI compatibility: when it is disabled, it is ABI-compatible but
>>>> the packet-type feature is not present, and when it is enabled we
>>>> have the feature but it breaks the compatibility.
>>>>
>>>> In my opinion, the v5 is preferable: for this kind of features, I
>>>> don't see how the ABI can be preserved, and I think packet-type
>>>> won't be the only feature that will modify the mbuf structure. I think
>>>> the process described here should be applied:
>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>>>
>>>> (starting from "Some ABI changes may be too significant to reasonably
>>>> maintain multiple versions of").
>>>
>>> This is just like the change that Steve (Cunming) Liang submitted for
>>> Interrupt Mode. We have the same problem in both cases: we want to find
>>> a way to get the features included, but need to comply with our ABI
>>> policy. So, in both cases, the proposal is to add a config option to
>>> enable the change by default, so we maintain backward compatibility.
>>> Users that want these changes, and are willing to accept the
>>> associated ABI change, have to specifically enable them.
>>>
>>> We can note in the Deprecation Notices in the Release Notes for 2.1
>>> that these config options will be removed in 2.2. The features will
>>> then be enabled by default.
>>>
>>> This seems like a good compromise which allows us to get these changes
>>> into 2.1 but avoids breaking the ABI policy.
>>
>> Sorry for the late answer.
>>
>> After some thoughts on this topic, I understand that having a
>> compile-time option is perhaps a good compromise between
>> keeping compatibility and having new features earlier.
>>
>> I'm just afraid about having one #ifdef in the code for
>> each new feature that cannot keep the ABI compatibility.
>> What do you think about having one option -- let's call
>> it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
>> and that would surround any new feature that breaks the
>> ABI?
>>
>> This would have several advantages:
>> - only 2 cases (on or off), the combinatorial is smaller than
>>     having one option per feature
>> - all next features breaking the abi can be identified by a grep
>> - the code inside the #ifdef can be enabled in a simple operation
>>     by Thomas after each release.
>>
>> Thomas, any comment?
>
> As previously discussed (1to1) with Olivier, I think that's a good proposal
> to introduce changes breaking deeply the ABI.
>
> Let's sum up the current policy:
> 1/ For changes which have a limited impact on the ABI, the backward compatibility
> must be kept during 1 release including the notice in doc/guides/rel_notes/abi.rst.
> 2/ For important changes like mbuf rework, there was an agreement on skipping the
> backward compatibility after having 3 acknowledgements and an 1-release long notice.
> Then the ABI numbering must be incremented.
>
> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second case.
> In order to be adopted, a patch for the file doc/guides/rel_notes/abi.rst must
> be submitted and strongly acknowledged.
>
> The ABI numbering must be also clearly explained:
> 1/ Should we have different libraries version number depending of CONFIG_RTE_NEXT_ABI?
> It seems straightforward to use "ifeq" when LIBABIVER in the Makefiles

An incompatible ABI must be reflected by a soname change, otherwise the 
whole library versioning is irrelevant.

> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in the .map files?
> Maybe we should remove these files and generate them with some preprocessing.
>
> Neil, as the ABI policy author, what is your opinion?

I'm not Neil but my 5c...

Working around ABI compatibility policy via config options seems like a 
slippery slope. Going forward this will likely mean there are always two 
different ABIs for any given version, and the thought of keeping track 
of it all in a truly compatible manner makes my head hurt.

That said its easy to understand the desire to move faster than the ABI 
policy allows. In a project where so many structs are in the open it 
gets hard to do much anything at all without breaking the ABI.

The issue could be mitigated somewhat by reserving some space at the end 
of the structs eg when the ABI needs to be changed anyway, but it has 
obvious downsides as well. The other options I see tend to revolve 
around changing release policies one way or the other: releasing ABI 
compatible micro versions between minor versions and relaxing the ABI 
policy a bit, or just releasing new minor versions more often than the 
current cycle.

	- Panu -

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH 2/3] kni: remove deprecated functions
  @ 2015-06-12  6:20  3%   ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-12  6:20 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger

On 06/12/2015 08:18 AM, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> These functions were tagged as deprecated in 2.0 so they can be
> removed in 2.1
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   app/test/Makefile        |  6 ------
>   app/test/test_kni.c      | 36 --------------------------------
>   lib/librte_kni/rte_kni.c | 50 --------------------------------------------
>   lib/librte_kni/rte_kni.h | 54 ------------------------------------------------
>   4 files changed, 146 deletions(-)
>
[...]
> diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
> index 603e2cd..f65ce24 100644
> --- a/lib/librte_kni/rte_kni.h
> +++ b/lib/librte_kni/rte_kni.h
> @@ -129,30 +129,6 @@ extern struct rte_kni *rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
>   				     struct rte_kni_ops *ops);
>
>   /**
> - * It create a KNI device for specific port.
> - *
> - * Note: It is deprecated and just for backward compatibility.
> - *
> - * @param port_id
> - *  Port ID.
> - * @param mbuf_size
> - *  mbuf size.
> - * @param pktmbuf_pool
> - *  The mempool for allocting mbufs for packets.
> - * @param ops
> - *  The pointer to the callbacks for the KNI kernel requests.
> - *
> - * @return
> - *  - The pointer to the context of a KNI interface.
> - *  - NULL indicate error.
> - */
> -extern struct rte_kni *rte_kni_create(uint8_t port_id,
> -				      unsigned mbuf_size,
> -				      struct rte_mempool *pktmbuf_pool,
> -				      struct rte_kni_ops *ops) \
> -				      __attribute__ ((deprecated));
> -
> -/**
>    * Release KNI interface according to the context. It will also release the
>    * paired KNI interface in kernel space. All processing on the specific KNI
>    * context need to be stopped before calling this interface.
> @@ -221,21 +197,6 @@ extern unsigned rte_kni_tx_burst(struct rte_kni *kni,
>   		struct rte_mbuf **mbufs, unsigned num);
>
>   /**
> - * Get the port id from KNI interface.
> - *
> - * Note: It is deprecated and just for backward compatibility.
> - *
> - * @param kni
> - *  The KNI interface context.
> - *
> - * @return
> - *  On success: The port id.
> - *  On failure: ~0x0
> - */
> -extern uint8_t rte_kni_get_port_id(struct rte_kni *kni) \
> -				__attribute__ ((deprecated));
> -
> -/**
>    * Get the KNI context of its name.
>    *
>    * @param name
> @@ -248,21 +209,6 @@ extern uint8_t rte_kni_get_port_id(struct rte_kni *kni) \
>   extern struct rte_kni *rte_kni_get(const char *name);
>
>   /**
> - * Get the KNI context of the specific port.
> - *
> - * Note: It is deprecated and just for backward compatibility.
> - *
> - * @param port_id
> - *  the port id.
> - *
> - * @return
> - *  On success: Pointer to KNI interface.
> - *  On failure: NULL
> - */
> -extern struct rte_kni *rte_kni_info_get(uint8_t port_id) \
> -				__attribute__ ((deprecated));
> -
> -/**
>    * Register KNI request handling for a specified port,and it can
>    * be called by master process or slave process.
>    *
>

These symbols need to be removed from rte_kni_version.map too, and since 
its an ABI break, the library soname needs a bump as well.

	- Panu -

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
  2015-06-04 10:38  3%     ` Ananyev, Konstantin
@ 2015-06-12  6:06  0%       ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-12  6:06 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, June 4, 2015 6:38 PM
> To: Zhang, Helin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash
> key size
> 
> Hi Helin,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Helin Zhang
> > Sent: Thursday, June 04, 2015 8:34 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying
> > hash key size
> >
> > To support querying hash key size per port, an new field of
> > 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> > hash key size in bytes.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > ---
> >  lib/librte_ether/rte_ethdev.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > v2 changes:
> > * Disabled the code changes by default, to avoid breaking ABI compatibility.
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 16dbe00..bdebc87 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -916,6 +916,9 @@ struct rte_eth_dev_info {
> >  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> >  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> >  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> > +#ifdef RTE_QUERY_HASH_KEY_SIZE
> > +	uint8_t hash_key_size; /**< Hash key size in bytes */ #endif
> >  	uint16_t reta_size;
> >  	/**< Device redirection table size, the total number of entries. */
> >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> 
> Why do you need to introduce an #ifdef RTE_QUERY_HASH_KEY_SIZE around
> your code?
> Why not to have it always on?
> Is it because of not breaking ABI for 2.1?
> But here, I suppose there would be no breakage anyway:
> 
> struct rte_eth_dev_info {
> ...
>         uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
>         uint16_t reta_size;
>         /**< Device redirection table size, the total number of entries. */
>         /** Bit mask of RSS offloads, the bit offset also means flow type */
>         uint64_t flow_type_rss_offloads;
>         struct rte_eth_rxconf default_rxconf;
> 
> 
> so between 'reta_size' and 'flow_type_rss_offloads', there is a 2 bytes gap.
> Wonder, why not put it there?
Oh, yes, you are totally right. There should be a 2 bytes padding there.
I will rework it with that. Thanks!

Helin

> 
> Konstantin
> 
> > --
> > 1.9.3

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] rte_ring: remove deprecated functions
  @ 2015-06-12  5:46  3%   ` Panu Matilainen
  2015-06-12 14:00  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-12  5:46 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger

On 06/12/2015 08:18 AM, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> These were deprecated in 2.0 so remove them from 2.1
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
>   drivers/net/ring/rte_eth_ring_version.map |  2 --
>   2 files changed, 57 deletions(-)
>
[...]
> diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> index 8ad107d..0875e25 100644
> --- a/drivers/net/ring/rte_eth_ring_version.map
> +++ b/drivers/net/ring/rte_eth_ring_version.map
> @@ -2,8 +2,6 @@ DPDK_2.0 {
>   	global:
>
>   	rte_eth_from_rings;
> -	rte_eth_ring_pair_attach;
> -	rte_eth_ring_pair_create;
>
>   	local: *;
>   };

Removing symbols is an ABI break so it additionally requires a soname 
bump for this library.

In addition, simply due to being the first library to do so, it'll also 
then break the combined shared library as is currently is. Mind you, 
this is not an objection at all, the need to change to a linker script 
approach has always been a matter of time.

	- Panu -

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 15:39  0%             ` Ananyev, Konstantin
@ 2015-06-12  3:22  0%               ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-12  3:22 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier MATZ, O'Driscoll, Tim, Thomas Monjalon
  Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, June 10, 2015 11:40 PM
> To: Olivier MATZ; O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> Hi Olivier,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > Sent: Wednesday, June 10, 2015 3:33 PM
> > To: O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> > rte_mbuf
> >
> > Hi Tim, Helin,
> >
> > On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > >> Sent: Monday, June 1, 2015 9:15 AM
> > >> To: Zhang, Helin; dev@dpdk.org
> > >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> > >> in rte_mbuf
> > >>
> > >> Hi Helin,
> > >>
> > >> +CC Neil
> > >>
> > >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> > >>> In order to unify the packet type, the field of 'packet_type' in
> > >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> > >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> > >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> > >>> KNI should be right mapped to 'struct rte_mbuf', it should be
> > >>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> > >>> by default, as 'struct rte_mbuf' changed.
> > >>> To avoid breaking ABI compatibility, all the changes would be
> > >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> > >>
> > >> What are the plans for this compile-time option in the future?
> > >>
> > >> I wonder what are the benefits of having this option in terms of
> > >> ABI compatibility: when it is disabled, it is ABI-compatible but
> > >> the packet-type feature is not present, and when it is enabled we
> > >> have the feature but it breaks the compatibility.
> > >>
> > >> In my opinion, the v5 is preferable: for this kind of features, I
> > >> don't see how the ABI can be preserved, and I think packet-type
> > >> won't be the only feature that will modify the mbuf structure. I
> > >> think the process described here should be applied:
> > >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> > >>
> > >> (starting from "Some ABI changes may be too significant to
> > >> reasonably maintain multiple versions of").
> > >>
> > >>
> > >> Regards,
> > >> Olivier
> > >>
> > >
> > > This is just like the change that Steve (Cunming) Liang submitted
> > > for Interrupt Mode. We have the same problem in both cases: we
> > want to find a way to get the features included, but need to comply
> > with our ABI policy. So, in both cases, the proposal is to add a
> > config option to enable the change by default, so we maintain backward
> compatibility. Users that want these changes, and are willing to accept the
> associated ABI change, have to specifically enable them.
> > >
> > > We can note in the Deprecation Notices in the Release Notes for 2.1
> > > that these config options will be removed in 2.2. The features
> > will then be enabled by default.
> > >
> > > This seems like a good compromise which allows us to get these changes into
> 2.1 but avoids breaking the ABI policy.
> >
> > Sorry for the late answer.
> >
> > After some thoughts on this topic, I understand that having a
> > compile-time option is perhaps a good compromise between keeping
> > compatibility and having new features earlier.
> >
> > I'm just afraid about having one #ifdef in the code for each new
> > feature that cannot keep the ABI compatibility.
> > What do you think about having one option -- let's call it
> > "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would
> > surround any new feature that breaks the ABI?
> 
> I am not Tim/Helin, but really like that idea :) Konstantin

It seems more guys like Oliver's idea of introducing CONFIG_RTE_NEXT_ABI. Any objections?
If none, I will rework my patches with that.

- Helin

> 
> 
> >
> > This would have several advantages:
> > - only 2 cases (on or off), the combinatorial is smaller than
> >    having one option per feature
> > - all next features breaking the abi can be identified by a grep
> > - the code inside the #ifdef can be enabled in a simple operation
> >    by Thomas after each release.
> >
> > Thomas, any comment?
> >
> > Regards,
> > Olivier
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
  @ 2015-06-11 14:35  3%       ` Marc Sune
  0 siblings, 0 replies; 200+ results
From: Marc Sune @ 2015-06-11 14:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



On 11/06/15 11:08, Thomas Monjalon wrote:
> 2015-06-08 10:50, Marc Sune:
>> On 29/05/15 20:23, Thomas Monjalon wrote:
>>> 2015-05-27 11:15, Marc Sune:
>>>> On 27/05/15 06:02, Thomas Monjalon wrote:
>>>>>> +#define ETH_SPEED_CAP_10M_HD	(1 << 0)  /*< 10 Mbps half-duplex> */
>>>>>> +#define ETH_SPEED_CAP_10M_FD	(1 << 1)  /*< 10 Mbps full-duplex> */
>>>>>> +#define ETH_SPEED_CAP_100M_HD	(1 << 2)  /*< 100 Mbps half-duplex> */
>>>>>> +#define ETH_SPEED_CAP_100M_FD	(1 << 3)  /*< 100 Mbps full-duplex> */
>>>>>> +#define ETH_SPEED_CAP_1G	(1 << 4)  /*< 1 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_2_5G	(1 << 5)  /*< 2.5 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_5G	(1 << 6)  /*< 5 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_10G	(1 << 7)  /*< 10 Mbps > */
>>>>>> +#define ETH_SPEED_CAP_20G	(1 << 8)  /*< 20 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_25G	(1 << 9)  /*< 25 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_40G	(1 << 10)  /*< 40 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_50G	(1 << 11)  /*< 50 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_56G	(1 << 12)  /*< 56 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_100G	(1 << 13)  /*< 100 Gbps > */
>>>>> We should note that rte_eth_link is using ETH_LINK_SPEED_* constants
>>>>> which are not some bitmaps so we have to create these new constants.
>>>> Yes, I can add that to the patch description (1/2).
>>>>
>>>>> Furthermore, rte_eth_link.link_speed is an uint16_t so it is limited
>>>>> to 40G. Should we use some constant bitmaps here also?
>>>> I also thought about converting link_speed into a bitmap to unify the
>>>> constants before starting the patch (there is redundancy), but I wanted
>>>> to be minimally invasive; changing link to a bitmap can break existing apps.
>>>>
>>>> I can also merge them if we think is a better idea.
>>> Maybe. Someone against this idea?
>> Me. I tried implementing this unified speed constantss, but the problem
>> is that for the capabilities full-duplex/half-duplex speeds are unrolled
>> (e.g. 100M_HD/100_FD). There is no generic 100M to set a specific speed,
> Or we can define ETH_SPEED_CAP_100M and ETH_SPEED_CAP_100M_FD.
> Is it possible to have a NIC doing 100M_FD but not 100M_HD?

Did not check in detail, but I guess this is mostly legacy NICs, not 
supported by DPDK anyway, and that is safe to assume 10M/100M meaning 
supports both HD/FD.

>
>> so if you want a fiex speed and duplex auto-negociation witht the
>> current set of constants, it would look weird; e.g.
>> link_speed=ETH_SPEED_100M_HD and then set
>> link_duplex=ETH_LINK_AUTONEG_DUPLEX):
>>
>>    232 struct rte_eth_link {
>>    233         uint16_t link_speed;      /**< ETH_LINK_SPEED_[10, 100,
>> 1000, 10000] */
>>    234         uint16_t link_duplex;     /**< ETH_LINK_[HALF_DUPLEX,
>> FULL_DUPLEX] */
>>    235         uint8_t  link_status : 1; /**< 1 -> link up, 0 -> link
>> down */
>>    236 }__attribute__((aligned(8)));     /**< aligned for atomic64
>> read/write */
>>
>> There is another minor point, which is when setting the speed in
>> rte_eth_conf:
>>
>>    840 struct rte_eth_conf {
>>    841         uint16_t link_speed;
>>    842         /**< ETH_LINK_SPEED_10[0|00|000], or 0 for autonegotation */
>>
>> 0 is used for speed auto-negociation, but 0 is also used in the
>> capabilities bitmap to indicate no PHY_MEDIA (virtual interface). I
>> would have to define something like:
>>
>> 906 #define ETH_SPEED_NOT_PHY   (0)  /*< No phy media > */
>> 907 #define ETH_SPEED_AUTONEG   (0)  /*< Autonegociate speed > */
> Or something like SPEED_UNDEFINED

Ok. I will prepare the patch and circulate a v3.

After briefly chatting offline with Thomas, it seems I was not clearly 
stating in my original v1 that this patch is targeting DPDK v2.2, due to 
ABI(and API) issues.

It is, and so I will hold v3 until 2.2 window starts, to make it more clear.

thanks
Marc
>
>> And use (only) NOT_PHY for a capabilities and _AUTONEG for rte_eth_conf.
>>
>> The options I see:
>>
>> a) add to the the list of the current speeds generic 10M/100M/1G speeds
>> without HD/FD, and just use these speeds in rte_eth_conf.
>> b) leave them separated.
>>
>> I would vote for b), since the a) is not completely clean.
>> Opinions&other alternatives welcome.
>>
>> Marc

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics
  2015-06-08 14:50  5% [dpdk-dev] [PATCH] doc: guidelines for library statistics Cristian Dumitrescu
@ 2015-06-11 12:05  0% ` Thomas Monjalon
  2015-06-15 21:46  4%   ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-11 12:05 UTC (permalink / raw)
  To: Cristian Dumitrescu; +Cc: dev

Hi Cristian,

Thanks for trying to make a policy clearer.
We need to make a decision in the coming week.
Below are comments on the style and content.

2015-06-08 15:50, Cristian Dumitrescu:
>  doc/guides/guidelines/statistics.rst | 42 ++++++++++++++++++++++++++++++++++++

Maybe we should have a more general file like design.rst.
In order to have a lot of readers of such guidelines, they must be concise.

Please wrap lines to be not too long and/or split lines after the end of a sentence.

> +Library Statistics
> +==================
> +
> +Description
> +-----------
> +
> +This document describes the guidelines for DPDK library-level statistics counter support. This includes guidelines for turning library statistics on and off, requirements for preventing ABI changes when library statistics are turned on and off, etc.

Should we consider that driver stats and lib stats are different in DPDK? Why?

> +Motivation to allow the application to turn library statistics on and off
> +-------------------------------------------------------------------------
> +
> +It is highly recommended that each library provides statistics counters to allow the application to monitor the library-level run-time events. Typical counters are: number of packets received/dropped/transmitted, number of buffers allocated/freed, number of occurrences for specific events, etc.
> +
> +Since the resources consumed for library-level statistics counter collection have to be spent out of the application budget and the counters collected by some libraries might not be relevant for the current application, in order to avoid any unwanted waste of resources and/or performance for the application, the application is to decide at build time whether the collection of library-level statistics counters should be turned on or off for each library individually.

It would be good to have acknowledgements or other opinions on this.
Some of them were expressed in other threads. Please comment here.

> +Library-level statistics counters can be relevant or not for specific applications:
> +* For application A, counters maintained by library X are always relevant and the application needs to use them to implement certain features, as traffic accounting, logging, application-level statistics, etc. In this case, the application requires that collection of statistics counters for library X is always turned on;
> +* For application B, counters maintained by library X are only useful during the application debug stage and not relevant once debug phase is over. In this case, the application may decide to turn on the collection of library X statistics counters during the debug phase and later on turn them off;

Users of binary package have not this choice.

> +* For application C, counters maintained by library X are not relevant at all. It might me that the application maintains its own set of statistics counters that monitor a different set of run-time events than library X (e.g. number of connection requests, number of active users, etc). It might also be that application uses multiple libraries (library X, library Y, etc) and it is interested in the statistics counters of library Y, but not in those of library X. In this case, the application may decide to turn the collection of statistics counters off for library X and on for library Y.
> +
> +The statistics collection consumes a certain amount of CPU resources (cycles, cache bandwidth, memory bandwidth, etc) that depends on:
> +* Number of libraries used by the current application that have statistics counters collection turned on;
> +* Number of statistics counters maintained by each library per object type instance (e.g. per port, table, pipeline, thread, etc);
> +* Number of instances created for each object type supported by each library;
> +* Complexity of the statistics logic collection for each counter: when only some occurrences of a specific event are valid, several conditional branches might involved in the decision of whether the current occurrence of the event should be counted or not (e.g. on the event of packet reception, only TCP packets with destination port within a certain range should be recorded), etc.
> +
> +Mechanism to allow the application to turn library statistics on and off
> +------------------------------------------------------------------------
> +
> +Each library that provides statistics counters should provide a single build time flag that decides whether the statistics counter collection is enabled or not for this library. This flag should be exposed as a variable within the DPDK configuration file. When this flag is set, all the counters supported by current library are collected; when this flag is cleared, none of the counters supported by the current library are collected:
> +
> +	#DPDK file “./config/common_linuxapp”, “./config/common_bsdapp”, etc
> +	CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS=y/n

Why not simply CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_STATS (without COLLECT)?

> +The default value for this DPDK configuration file variable (either “yes” or “no”) is left at the decision of each library.
> +
> +Prevention of ABI changes due to library statistics support
> +-----------------------------------------------------------
> +
> +The layout of data structures and prototype of functions that are part of the library API should not be affected by whether the collection of statistics counters is turned on or off for the current library. In practical terms, this means that space is always allocated in the API data structures for statistics counters and the statistics related API functions are always built into the code, regardless of whether the statistics counter collection is turned on or off for the current library.
> +
> +When the collection of statistics counters for the current library is turned off, the counters retrieved through the statistics related API functions should have the default value of zero.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 2/7] lib_vhost: Support multiple queues in virtio dev
  @ 2015-06-11  9:54  5%     ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-11  9:54 UTC (permalink / raw)
  To: Ouyang Changchun, dev

On 06/10/2015 08:52 AM, Ouyang Changchun wrote:
> Each virtio device could have multiple queues, say 2 or 4, at most 8.
> Enabling this feature allows virtio device/port on guest has the ability to
> use different vCPU to receive/transmit packets from/to each queue.
>
> In multiple queues mode, virtio device readiness means all queues of
> this virtio device are ready, cleanup/destroy a virtio device also
> requires clearing all queues belong to it.
>
> Changes in v2:
>    - remove the q_num_set api
>    - add the qp_num_get api
>    - determine the queue pair num from qemu message
>    - rework for reset owner message handler
>    - dynamically alloc mem for dev virtqueue
>    - queue pair num could be 0x8000
>    - fix checkpatch errors
>
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> ---
>   lib/librte_vhost/rte_virtio_net.h             |  10 +-
>   lib/librte_vhost/vhost-net.h                  |   1 +
>   lib/librte_vhost/vhost_rxtx.c                 |  32 ++---
>   lib/librte_vhost/vhost_user/vhost-net-user.c  |   4 +-
>   lib/librte_vhost/vhost_user/virtio-net-user.c |  76 +++++++++---
>   lib/librte_vhost/vhost_user/virtio-net-user.h |   2 +
>   lib/librte_vhost/virtio-net.c                 | 161 +++++++++++++++++---------
>   7 files changed, 197 insertions(+), 89 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
> index 5d38185..92b4bfa 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -59,7 +59,6 @@ struct rte_mbuf;
>   /* Backend value set by guest. */
>   #define VIRTIO_DEV_STOPPED -1
>
> -
>   /* Enum for virtqueue management. */
>   enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>
> @@ -96,13 +95,14 @@ struct vhost_virtqueue {
>    * Device structure contains all configuration information relating to the device.
>    */
>   struct virtio_net {
> -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
>   	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
> +	struct vhost_virtqueue	**virtqueue;    /**< Contains all virtqueue information. */
>   	uint64_t		features;	/**< Negotiated feature set. */
>   	uint64_t		device_fh;	/**< device identifier. */
>   	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
>   #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>   	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
> +	uint32_t                num_virt_queues;
>   	void			*priv;		/**< private context */
>   } __rte_cache_aligned;
>
> @@ -220,4 +220,10 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
>   uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>   	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
>

Unfortunately this is an ABI break, NAK. Ditto for other changes to 
struct virtio_net in patch 3/7 in this series. See 
http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst for the 
ABI policy.

There's plenty of discussion around the ABI going on at the moment, 
including this thread: http://dpdk.org/ml/archives/dev/2015-June/018456.html

	- Panu -

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v3 0/7] support i40e QinQ stripping and insertion
  2015-06-11  7:03  3%     ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
  2015-06-11  7:03  2%       ` [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan " Helin Zhang
@ 2015-06-11  7:25  0%       ` Wu, Jingjing
  1 sibling, 0 replies; 200+ results
From: Wu, Jingjing @ 2015-06-11  7:25 UTC (permalink / raw)
  To: Zhang, Helin, dev

Acked-by: Jingjing Wu <jingjing.wu@intel.com>


> -----Original Message-----
> From: Zhang, Helin
> Sent: Thursday, June 11, 2015 3:04 PM
> To: dev@dpdk.org
> Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; Richardson,
> Bruce; olivier.matz@6wind.com; Zhang, Helin
> Subject: [PATCH v3 0/7] support i40e QinQ stripping and insertion
> 
> As i40e hardware can be reconfigured to support QinQ stripping and insertion,
> this patch set is to enable that together with using the reserved 16 bits in
> 'struct rte_mbuf' for the second vlan tag. Corresponding command is added
> in testpmd for testing.
> Note that no need to rework vPMD, as nothings used in it changed.
> 
> v2 changes:
> * Added more commit logs of which commit it fix for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.
> 
> v3 changes:
> * update documentation (Testpmd Application User Guide).
> 
> Helin Zhang (7):
>   ixgbe: remove a discarded source line
>   mbuf: use the reserved 16 bits for double vlan
>   i40e: support double vlan stripping and insertion
>   i40evf: add supported offload capability flags
>   app/testpmd: add test cases for qinq stripping and insertion
>   examples/ipv4_multicast: support double vlan stripping and insertion
>   doc: update testpmd command
> 
>  app/test-pmd/cmdline.c                      | 78 ++++++++++++++++++++++++---
>  app/test-pmd/config.c                       | 21 +++++++-
>  app/test-pmd/flowgen.c                      |  4 +-
>  app/test-pmd/macfwd.c                       |  3 ++
>  app/test-pmd/macswap.c                      |  3 ++
>  app/test-pmd/rxonly.c                       |  3 ++
>  app/test-pmd/testpmd.h                      |  6 ++-
>  app/test-pmd/txonly.c                       |  8 ++-
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 14 ++++-
>  drivers/net/i40e/i40e_ethdev.c              | 52 ++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev_vf.c           | 13 +++++
>  drivers/net/i40e/i40e_rxtx.c                | 81 ++++++++++++++++++-----------
>  drivers/net/ixgbe/ixgbe_rxtx.c              |  1 -
>  examples/ipv4_multicast/main.c              |  1 +
>  lib/librte_ether/rte_ethdev.h               |  2 +
>  lib/librte_mbuf/rte_mbuf.h                  | 10 +++-
>  16 files changed, 255 insertions(+), 45 deletions(-)
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan stripping and insertion
  2015-06-11  7:03  3%     ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
@ 2015-06-11  7:03  2%       ` Helin Zhang
  2015-06-11  7:25  0%       ` [dpdk-dev] [PATCH v3 0/7] support i40e QinQ " Wu, Jingjing
  1 sibling, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-11  7:03 UTC (permalink / raw)
  To: dev

It configures specific registers to enable double vlan stripping
on RX side and insertion on TX side.
The RX descriptors will be parsed, the vlan tags and flags will be
saved to corresponding mbuf fields if vlan tag is detected.
The TX descriptors will be configured according to the
configurations in mbufs, to trigger the hardware insertion of
double vlan tags for each packets sent out.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev_vf.c |  6 +++
 drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
 lib/librte_ether/rte_ethdev.h     |  2 +
 4 files changed, 112 insertions(+), 29 deletions(-)

v2 changes:
* Kept the original RX/TX offload flags as they were, added new
  flags after with new bit masks, for ABI compatibility.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..7593a70 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -211,6 +211,7 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
 				void *arg);
 static void i40e_configure_registers(struct i40e_hw *hw);
 static void i40e_hw_init(struct i40e_hw *hw);
+static int i40e_config_qinq(struct i40e_hw *hw, struct i40e_vsi *vsi);
 
 static const struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
@@ -1529,11 +1530,13 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = dev->pci_dev->max_vfs;
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
 		DEV_RX_OFFLOAD_IPV4_CKSUM |
 		DEV_RX_OFFLOAD_UDP_CKSUM |
 		DEV_RX_OFFLOAD_TCP_CKSUM;
 	dev_info->tx_offload_capa =
 		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT |
 		DEV_TX_OFFLOAD_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_UDP_CKSUM |
 		DEV_TX_OFFLOAD_TCP_CKSUM |
@@ -3056,6 +3059,7 @@ i40e_vsi_setup(struct i40e_pf *pf,
 		 * macvlan filter which is expected and cannot be removed.
 		 */
 		i40e_update_default_filter_setting(vsi);
+		i40e_config_qinq(hw, vsi);
 	} else if (type == I40E_VSI_SRIOV) {
 		memset(&ctxt, 0, sizeof(ctxt));
 		/**
@@ -3096,6 +3100,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
 		 * Since VSI is not created yet, only configure parameter,
 		 * will add vsi below.
 		 */
+
+		i40e_config_qinq(hw, vsi);
 	} else if (type == I40E_VSI_VMDQ2) {
 		memset(&ctxt, 0, sizeof(ctxt));
 		/*
@@ -5697,3 +5703,49 @@ i40e_configure_registers(struct i40e_hw *hw)
 			"0x%"PRIx32, reg_table[i].val, reg_table[i].addr);
 	}
 }
+
+#define I40E_VSI_TSR(_i)            (0x00050800 + ((_i) * 4))
+#define I40E_VSI_TSR_QINQ_CONFIG    0xc030
+#define I40E_VSI_L2TAGSTXVALID(_i)  (0x00042800 + ((_i) * 4))
+#define I40E_VSI_L2TAGSTXVALID_QINQ 0xab
+static int
+i40e_config_qinq(struct i40e_hw *hw, struct i40e_vsi *vsi)
+{
+	uint32_t reg;
+	int ret;
+
+	if (vsi->vsi_id >= I40E_MAX_NUM_VSIS) {
+		PMD_DRV_LOG(ERR, "VSI ID exceeds the maximum");
+		return -EINVAL;
+	}
+
+	/* Configure for double VLAN RX stripping */
+	reg = I40E_READ_REG(hw, I40E_VSI_TSR(vsi->vsi_id));
+	if ((reg & I40E_VSI_TSR_QINQ_CONFIG) != I40E_VSI_TSR_QINQ_CONFIG) {
+		reg |= I40E_VSI_TSR_QINQ_CONFIG;
+		ret = i40e_aq_debug_write_register(hw,
+						   I40E_VSI_TSR(vsi->vsi_id),
+						   reg, NULL);
+		if (ret < 0) {
+			PMD_DRV_LOG(ERR, "Failed to update VSI_TSR[%d]",
+				    vsi->vsi_id);
+			return I40E_ERR_CONFIG;
+		}
+	}
+
+	/* Configure for double VLAN TX insertion */
+	reg = I40E_READ_REG(hw, I40E_VSI_L2TAGSTXVALID(vsi->vsi_id));
+	if ((reg & 0xff) != I40E_VSI_L2TAGSTXVALID_QINQ) {
+		reg = I40E_VSI_L2TAGSTXVALID_QINQ;
+		ret = i40e_aq_debug_write_register(hw,
+						   I40E_VSI_L2TAGSTXVALID(
+						   vsi->vsi_id), reg, NULL);
+		if (ret < 0) {
+			PMD_DRV_LOG(ERR, "Failed to update "
+				"VSI_L2TAGSTXVALID[%d]", vsi->vsi_id);
+			return I40E_ERR_CONFIG;
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 4f4404e..3ae2553 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1672,6 +1672,12 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 2de0ac4..b2e1d6d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -94,18 +94,44 @@ static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
 
+static inline void
+i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
+{
+	if (rte_le_to_cpu_64(rxdp->wb.qword1.status_error_len) &
+		(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) {
+		mb->ol_flags |= PKT_RX_VLAN_PKT;
+		mb->vlan_tci =
+			rte_le_to_cpu_16(rxdp->wb.qword0.lo_dword.l2tag1);
+		PMD_RX_LOG(DEBUG, "Descriptor l2tag1: %u",
+			   rte_le_to_cpu_16(rxdp->wb.qword0.lo_dword.l2tag1));
+	} else {
+		mb->vlan_tci = 0;
+	}
+#ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+	if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) &
+		(1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) {
+		mb->ol_flags |= PKT_RX_QINQ_PKT;
+		mb->vlan_tci_outer = mb->vlan_tci;
+		mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2);
+		PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
+			   rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_1),
+			   rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2));
+	} else {
+		mb->vlan_tci_outer = 0;
+	}
+#endif
+	PMD_RX_LOG(DEBUG, "Mbuf vlan_tci: %u, vlan_tci_outer: %u",
+		   mb->vlan_tci, mb->vlan_tci_outer);
+}
+
 /* Translate the rx descriptor status to pkt flags */
 static inline uint64_t
 i40e_rxd_status_to_pkt_flags(uint64_t qword)
 {
 	uint64_t flags;
 
-	/* Check if VLAN packet */
-	flags = qword & (1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-							PKT_RX_VLAN_PKT : 0;
-
 	/* Check if RSS_HASH */
-	flags |= (((qword >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
+	flags = (((qword >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
 					I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
 			I40E_RX_DESC_FLTSTAT_RSS_HASH) ? PKT_RX_RSS_HASH : 0;
 
@@ -696,16 +722,12 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			mb = rxep[j].mbuf;
 			qword1 = rte_le_to_cpu_64(\
 				rxdp[j].wb.qword1.status_error_len);
-			rx_status = (qword1 & I40E_RXD_QW1_STATUS_MASK) >>
-						I40E_RXD_QW1_STATUS_SHIFT;
 			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
 				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
-			mb->vlan_tci = rx_status &
-				(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-			rte_le_to_cpu_16(\
-				rxdp[j].wb.qword0.lo_dword.l2tag1) : 0;
+			mb->ol_flags = 0;
+			i40e_rxd_to_vlan_tci(mb, &rxdp[j]);
 			pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -719,7 +741,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			if (pkt_flags & PKT_RX_FDIR)
 				pkt_flags |= i40e_rxd_build_fdir(&rxdp[j], mb);
 
-			mb->ol_flags = pkt_flags;
+			mb->ol_flags |= pkt_flags;
 		}
 
 		for (j = 0; j < I40E_LOOK_AHEAD; j++)
@@ -945,10 +967,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = rx_packet_len;
 		rxm->data_len = rx_packet_len;
 		rxm->port = rxq->port_id;
-
-		rxm->vlan_tci = rx_status &
-			(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
+		rxm->ol_flags = 0;
+		i40e_rxd_to_vlan_tci(rxm, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -960,7 +980,7 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (pkt_flags & PKT_RX_FDIR)
 			pkt_flags |= i40e_rxd_build_fdir(&rxd, rxm);
 
-		rxm->ol_flags = pkt_flags;
+		rxm->ol_flags |= pkt_flags;
 
 		rx_pkts[nb_rx++] = rxm;
 	}
@@ -1105,9 +1125,8 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		}
 
 		first_seg->port = rxq->port_id;
-		first_seg->vlan_tci = (rx_status &
-			(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
-			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
+		first_seg->ol_flags = 0;
+		i40e_rxd_to_vlan_tci(first_seg, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -1120,7 +1139,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		if (pkt_flags & PKT_RX_FDIR)
 			pkt_flags |= i40e_rxd_build_fdir(&rxd, rxm);
 
-		first_seg->ol_flags = pkt_flags;
+		first_seg->ol_flags |= pkt_flags;
 
 		/* Prefetch data of first segment, if configured to do so. */
 		rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
@@ -1158,17 +1177,15 @@ i40e_recv_scattered_pkts(void *rx_queue,
 static inline uint16_t
 i40e_calc_context_desc(uint64_t flags)
 {
-	uint64_t mask = 0ULL;
-
-	mask |= (PKT_TX_OUTER_IP_CKSUM | PKT_TX_TCP_SEG);
+	static uint64_t mask = PKT_TX_OUTER_IP_CKSUM |
+		PKT_TX_TCP_SEG |
+		PKT_TX_QINQ_PKT;
 
 #ifdef RTE_LIBRTE_IEEE1588
 	mask |= PKT_TX_IEEE1588_TMST;
 #endif
-	if (flags & mask)
-		return 1;
 
-	return 0;
+	return ((flags & mask) ? 1 : 0);
 }
 
 /* set i40e TSO context descriptor */
@@ -1289,9 +1306,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Descriptor based VLAN insertion */
-		if (ol_flags & PKT_TX_VLAN_PKT) {
+		if (ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_QINQ_PKT)) {
 			tx_flags |= tx_pkt->vlan_tci <<
-					I40E_TX_FLAG_L2TAG1_SHIFT;
+				I40E_TX_FLAG_L2TAG1_SHIFT;
 			tx_flags |= I40E_TX_FLAG_INSERT_VLAN;
 			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
 			td_tag = (tx_flags & I40E_TX_FLAG_L2TAG1_MASK) >>
@@ -1339,6 +1356,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			ctx_txd->tunneling_params =
 				rte_cpu_to_le_32(cd_tunneling_params);
+			if (ol_flags & PKT_TX_QINQ_PKT) {
+				cd_l2tag2 = tx_pkt->vlan_tci_outer;
+				cd_type_cmd_tso_mss |=
+					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
+						I40E_TXD_CTX_QW1_CMD_SHIFT);
+			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->type_cmd_tso_mss =
 				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..892280c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -887,6 +887,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_UDP_CKSUM   0x00000004
 #define DEV_RX_OFFLOAD_TCP_CKSUM   0x00000008
 #define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
+#define DEV_RX_OFFLOAD_QINQ_STRIP  0x00000020
 
 /**
  * TX offload capabilities of a device.
@@ -899,6 +900,7 @@ struct rte_eth_conf {
 #define DEV_TX_OFFLOAD_TCP_TSO     0x00000020
 #define DEV_TX_OFFLOAD_UDP_TSO     0x00000040
 #define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000080 /**< Used for tunneling packet. */
+#define DEV_TX_OFFLOAD_QINQ_INSERT 0x00000100
 
 struct rte_eth_dev_info {
 	struct rte_pci_device *pci_dev; /**< Device PCI information. */
-- 
1.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 0/7] support i40e QinQ stripping and insertion
  2015-06-02  3:16  3%   ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Helin Zhang
                       ` (2 preceding siblings ...)
  2015-06-08  7:40  0%     ` Olivier MATZ
@ 2015-06-11  7:03  3%     ` Helin Zhang
  2015-06-11  7:03  2%       ` [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan " Helin Zhang
  2015-06-11  7:25  0%       ` [dpdk-dev] [PATCH v3 0/7] support i40e QinQ " Wu, Jingjing
  3 siblings, 2 replies; 200+ results
From: Helin Zhang @ 2015-06-11  7:03 UTC (permalink / raw)
  To: dev

As i40e hardware can be reconfigured to support QinQ stripping
and insertion, this patch set is to enable that together with
using the reserved 16 bits in 'struct rte_mbuf' for the second
vlan tag. Corresponding command is added in testpmd for testing.
Note that no need to rework vPMD, as nothings used in it changed.

v2 changes:
* Added more commit logs of which commit it fix for.
* Fixed a typo.
* Kept the original RX/TX offload flags as they were, added new
  flags after with new bit masks, for ABI compatibility.
* Supported double vlan stripping/insertion in examples/ipv4_multicast.

v3 changes:
* update documentation (Testpmd Application User Guide).

Helin Zhang (7):
  ixgbe: remove a discarded source line
  mbuf: use the reserved 16 bits for double vlan
  i40e: support double vlan stripping and insertion
  i40evf: add supported offload capability flags
  app/testpmd: add test cases for qinq stripping and insertion
  examples/ipv4_multicast: support double vlan stripping and insertion
  doc: update testpmd command

 app/test-pmd/cmdline.c                      | 78 ++++++++++++++++++++++++---
 app/test-pmd/config.c                       | 21 +++++++-
 app/test-pmd/flowgen.c                      |  4 +-
 app/test-pmd/macfwd.c                       |  3 ++
 app/test-pmd/macswap.c                      |  3 ++
 app/test-pmd/rxonly.c                       |  3 ++
 app/test-pmd/testpmd.h                      |  6 ++-
 app/test-pmd/txonly.c                       |  8 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 14 ++++-
 drivers/net/i40e/i40e_ethdev.c              | 52 ++++++++++++++++++
 drivers/net/i40e/i40e_ethdev_vf.c           | 13 +++++
 drivers/net/i40e/i40e_rxtx.c                | 81 ++++++++++++++++++-----------
 drivers/net/ixgbe/ixgbe_rxtx.c              |  1 -
 examples/ipv4_multicast/main.c              |  1 +
 lib/librte_ether/rte_ethdev.h               |  2 +
 lib/librte_mbuf/rte_mbuf.h                  | 10 +++-
 16 files changed, 255 insertions(+), 45 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 14:32  4%           ` Olivier MATZ
  2015-06-10 14:51  0%             ` Zhang, Helin
  2015-06-10 15:39  0%             ` Ananyev, Konstantin
@ 2015-06-10 16:14  5%             ` Thomas Monjalon
  2015-06-12  7:24  5%               ` Panu Matilainen
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-10 16:14 UTC (permalink / raw)
  To: Olivier MATZ, O'Driscoll, Tim, Zhang, Helin, nhorman; +Cc: dev

2015-06-10 16:32, Olivier MATZ:
> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>> In order to unify the packet type, the field of 'packet_type' in
> >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> >>> KNI should be right mapped to 'struct rte_mbuf', it should be
> >>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> >>> by default, as 'struct rte_mbuf' changed.
> >>> To avoid breaking ABI compatibility, all the changes would be
> >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>
> >> What are the plans for this compile-time option in the future?
> >>
> >> I wonder what are the benefits of having this option in terms
> >> of ABI compatibility: when it is disabled, it is ABI-compatible but
> >> the packet-type feature is not present, and when it is enabled we
> >> have the feature but it breaks the compatibility.
> >>
> >> In my opinion, the v5 is preferable: for this kind of features, I
> >> don't see how the ABI can be preserved, and I think packet-type
> >> won't be the only feature that will modify the mbuf structure. I think
> >> the process described here should be applied:
> >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>
> >> (starting from "Some ABI changes may be too significant to reasonably
> >> maintain multiple versions of").
> >
> > This is just like the change that Steve (Cunming) Liang submitted for
> > Interrupt Mode. We have the same problem in both cases: we want to find
> > a way to get the features included, but need to comply with our ABI
> > policy. So, in both cases, the proposal is to add a config option to
> > enable the change by default, so we maintain backward compatibility.
> > Users that want these changes, and are willing to accept the
> > associated ABI change, have to specifically enable them.
> >
> > We can note in the Deprecation Notices in the Release Notes for 2.1
> > that these config options will be removed in 2.2. The features will
> > then be enabled by default.
> >
> > This seems like a good compromise which allows us to get these changes
> > into 2.1 but avoids breaking the ABI policy.
> 
> Sorry for the late answer.
> 
> After some thoughts on this topic, I understand that having a
> compile-time option is perhaps a good compromise between
> keeping compatibility and having new features earlier.
> 
> I'm just afraid about having one #ifdef in the code for
> each new feature that cannot keep the ABI compatibility.
> What do you think about having one option -- let's call
> it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
> and that would surround any new feature that breaks the
> ABI?
> 
> This would have several advantages:
> - only 2 cases (on or off), the combinatorial is smaller than
>    having one option per feature
> - all next features breaking the abi can be identified by a grep
> - the code inside the #ifdef can be enabled in a simple operation
>    by Thomas after each release.
> 
> Thomas, any comment?

As previously discussed (1to1) with Olivier, I think that's a good proposal
to introduce changes breaking deeply the ABI.

Let's sum up the current policy:
1/ For changes which have a limited impact on the ABI, the backward compatibility
must be kept during 1 release including the notice in doc/guides/rel_notes/abi.rst.
2/ For important changes like mbuf rework, there was an agreement on skipping the
backward compatibility after having 3 acknowledgements and an 1-release long notice.
Then the ABI numbering must be incremented.

This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second case.
In order to be adopted, a patch for the file doc/guides/rel_notes/abi.rst must
be submitted and strongly acknowledged.

The ABI numbering must be also clearly explained:
1/ Should we have different libraries version number depending of CONFIG_RTE_NEXT_ABI?
It seems straightforward to use "ifeq" when LIBABIVER in the Makefiles
2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in the .map files?
Maybe we should remove these files and generate them with some preprocessing.

Neil, as the ABI policy author, what is your opinion?

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 14:32  4%           ` Olivier MATZ
  2015-06-10 14:51  0%             ` Zhang, Helin
@ 2015-06-10 15:39  0%             ` Ananyev, Konstantin
  2015-06-12  3:22  0%               ` Zhang, Helin
  2015-06-10 16:14  5%             ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-06-10 15:39 UTC (permalink / raw)
  To: Olivier MATZ, O'Driscoll, Tim, Zhang, Helin, dev

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Wednesday, June 10, 2015 3:33 PM
> To: O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
> 
> Hi Tim, Helin,
> 
> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, June 1, 2015 9:15 AM
> >> To: Zhang, Helin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> >> rte_mbuf
> >>
> >> Hi Helin,
> >>
> >> +CC Neil
> >>
> >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>> In order to unify the packet type, the field of 'packet_type' in
> >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> >>> KNI should be right mapped to 'struct rte_mbuf', it should be
> >>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> >>> by default, as 'struct rte_mbuf' changed.
> >>> To avoid breaking ABI compatibility, all the changes would be
> >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>
> >> What are the plans for this compile-time option in the future?
> >>
> >> I wonder what are the benefits of having this option in terms
> >> of ABI compatibility: when it is disabled, it is ABI-compatible but
> >> the packet-type feature is not present, and when it is enabled we
> >> have the feature but it breaks the compatibility.
> >>
> >> In my opinion, the v5 is preferable: for this kind of features, I
> >> don't see how the ABI can be preserved, and I think packet-type
> >> won't be the only feature that will modify the mbuf structure. I think
> >> the process described here should be applied:
> >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>
> >> (starting from "Some ABI changes may be too significant to reasonably
> >> maintain multiple versions of").
> >>
> >>
> >> Regards,
> >> Olivier
> >>
> >
> > This is just like the change that Steve (Cunming) Liang submitted for Interrupt Mode. We have the same problem in both cases: we
> want to find a way to get the features included, but need to comply with our ABI policy. So, in both cases, the proposal is to add a
> config option to enable the change by default, so we maintain backward compatibility. Users that want these changes, and are willing
> to accept the associated ABI change, have to specifically enable them.
> >
> > We can note in the Deprecation Notices in the Release Notes for 2.1 that these config options will be removed in 2.2. The features
> will then be enabled by default.
> >
> > This seems like a good compromise which allows us to get these changes into 2.1 but avoids breaking the ABI policy.
> 
> Sorry for the late answer.
> 
> After some thoughts on this topic, I understand that having a
> compile-time option is perhaps a good compromise between
> keeping compatibility and having new features earlier.
> 
> I'm just afraid about having one #ifdef in the code for
> each new feature that cannot keep the ABI compatibility.
> What do you think about having one option -- let's call
> it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
> and that would surround any new feature that breaks the
> ABI?

I am not Tim/Helin, but really like that idea :)
Konstantin


> 
> This would have several advantages:
> - only 2 cases (on or off), the combinatorial is smaller than
>    having one option per feature
> - all next features breaking the abi can be identified by a grep
> - the code inside the #ifdef can be enabled in a simple operation
>    by Thomas after each release.
> 
> Thomas, any comment?
> 
> Regards,
> Olivier
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 14:32  4%           ` Olivier MATZ
@ 2015-06-10 14:51  0%             ` Zhang, Helin
  2015-06-10 15:39  0%             ` Ananyev, Konstantin
  2015-06-10 16:14  5%             ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-10 14:51 UTC (permalink / raw)
  To: Olivier MATZ, O'Driscoll, Tim, dev

Hi Oliver

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, June 10, 2015 10:33 PM
> To: O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> Cc: Thomas Monjalon
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> Hi Tim, Helin,
> 
> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, June 1, 2015 9:15 AM
> >> To: Zhang, Helin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> >> in rte_mbuf
> >>
> >> Hi Helin,
> >>
> >> +CC Neil
> >>
> >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>> In order to unify the packet type, the field of 'packet_type' in
> >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for KNI
> >>> should be right mapped to 'struct rte_mbuf', it should be modified
> >>> accordingly. In addition, Vector PMD of ixgbe is disabled by
> >>> default, as 'struct rte_mbuf' changed.
> >>> To avoid breaking ABI compatibility, all the changes would be
> >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>
> >> What are the plans for this compile-time option in the future?
> >>
> >> I wonder what are the benefits of having this option in terms of ABI
> >> compatibility: when it is disabled, it is ABI-compatible but the
> >> packet-type feature is not present, and when it is enabled we have
> >> the feature but it breaks the compatibility.
> >>
> >> In my opinion, the v5 is preferable: for this kind of features, I
> >> don't see how the ABI can be preserved, and I think packet-type won't
> >> be the only feature that will modify the mbuf structure. I think the
> >> process described here should be applied:
> >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>
> >> (starting from "Some ABI changes may be too significant to reasonably
> >> maintain multiple versions of").
> >>
> >>
> >> Regards,
> >> Olivier
> >>
> >
> > This is just like the change that Steve (Cunming) Liang submitted for Interrupt
> Mode. We have the same problem in both cases: we want to find a way to get
> the features included, but need to comply with our ABI policy. So, in both cases,
> the proposal is to add a config option to enable the change by default, so we
> maintain backward compatibility. Users that want these changes, and are willing
> to accept the associated ABI change, have to specifically enable them.
> >
> > We can note in the Deprecation Notices in the Release Notes for 2.1 that these
> config options will be removed in 2.2. The features will then be enabled by
> default.
> >
> > This seems like a good compromise which allows us to get these changes into
> 2.1 but avoids breaking the ABI policy.
> 
> Sorry for the late answer.
> 
> After some thoughts on this topic, I understand that having a compile-time
> option is perhaps a good compromise between keeping compatibility and having
> new features earlier.
> 
> I'm just afraid about having one #ifdef in the code for each new feature that
> cannot keep the ABI compatibility.
> What do you think about having one option -- let's call it
> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would surround
> any new feature that breaks the ABI?
Will we allow this type of workaround for a long time? If yes, agree with your good idea.

Regards,
Helin

> 
> This would have several advantages:
> - only 2 cases (on or off), the combinatorial is smaller than
>    having one option per feature
> - all next features breaking the abi can be identified by a grep
> - the code inside the #ifdef can be enabled in a simple operation
>    by Thomas after each release.
> 
> Thomas, any comment?
> 
> Regards,
> Olivier
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-02 13:27  4%         ` O'Driscoll, Tim
@ 2015-06-10 14:32  4%           ` Olivier MATZ
  2015-06-10 14:51  0%             ` Zhang, Helin
                               ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Olivier MATZ @ 2015-06-10 14:32 UTC (permalink / raw)
  To: O'Driscoll, Tim, Zhang, Helin, dev

Hi Tim, Helin,

On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>> Sent: Monday, June 1, 2015 9:15 AM
>> To: Zhang, Helin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
>> rte_mbuf
>>
>> Hi Helin,
>>
>> +CC Neil
>>
>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>> In order to unify the packet type, the field of 'packet_type' in
>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
>>> KNI should be right mapped to 'struct rte_mbuf', it should be
>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
>>> by default, as 'struct rte_mbuf' changed.
>>> To avoid breaking ABI compatibility, all the changes would be
>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>
>> What are the plans for this compile-time option in the future?
>>
>> I wonder what are the benefits of having this option in terms
>> of ABI compatibility: when it is disabled, it is ABI-compatible but
>> the packet-type feature is not present, and when it is enabled we
>> have the feature but it breaks the compatibility.
>>
>> In my opinion, the v5 is preferable: for this kind of features, I
>> don't see how the ABI can be preserved, and I think packet-type
>> won't be the only feature that will modify the mbuf structure. I think
>> the process described here should be applied:
>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>
>> (starting from "Some ABI changes may be too significant to reasonably
>> maintain multiple versions of").
>>
>>
>> Regards,
>> Olivier
>>
>
> This is just like the change that Steve (Cunming) Liang submitted for Interrupt Mode. We have the same problem in both cases: we want to find a way to get the features included, but need to comply with our ABI policy. So, in both cases, the proposal is to add a config option to enable the change by default, so we maintain backward compatibility. Users that want these changes, and are willing to accept the associated ABI change, have to specifically enable them.
>
> We can note in the Deprecation Notices in the Release Notes for 2.1 that these config options will be removed in 2.2. The features will then be enabled by default.
>
> This seems like a good compromise which allows us to get these changes into 2.1 but avoids breaking the ABI policy.

Sorry for the late answer.

After some thoughts on this topic, I understand that having a
compile-time option is perhaps a good compromise between
keeping compatibility and having new features earlier.

I'm just afraid about having one #ifdef in the code for
each new feature that cannot keep the ABI compatibility.
What do you think about having one option -- let's call
it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
and that would surround any new feature that breaks the
ABI?

This would have several advantages:
- only 2 cases (on or off), the combinatorial is smaller than
   having one option per feature
- all next features breaking the abi can be identified by a grep
- the code inside the #ifdef can be enabled in a simple operation
   by Thomas after each release.

Thomas, any comment?

Regards,
Olivier

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD
  2015-06-08  5:28  4%           ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  2015-06-08  5:29  2%             ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-06-08  5:29 10%             ` [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-06-09 23:59  0%             ` Stephen Hemminger
  2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-06-09 23:59 UTC (permalink / raw)
  To: Cunming Liang; +Cc: dev, liang-min.wang

On Mon,  8 Jun 2015 13:28:57 +0800
Cunming Liang <cunming.liang@intel.com> wrote:

> v12 changes
>  - bsd cleanup for unused variable warning
>  - fix awkward line split in debug message
> 
> v11 changes
>  - typo cleanup and check kernel style
> 
> v10 changes
>  - code rework to return actual error code
>  - bug fix for lsc when using uio_pci_generic
> 
> v9 changes
>  - code rework to fix open comment
>  - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
>  - new patch to turn off the feature by default so as to avoid v2.1 abi broken
> 
> v8 changes
>  - remove condition check for only vfio-msix
>  - add multiplex intr support when only one intr vector allowed
>  - lsc and rxq interrupt runtime enable decision
>  - add safe event delete while the event wakeup execution happens
> 
> v7 changes
>  - decouple epoll event and intr operation
>  - add condition check in the case intr vector is disabled
>  - renaming some APIs
> 
> v6 changes
>  - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
>  - rewrite rte_intr_rx_wait/rte_intr_rx_set.
>  - using vector number instead of queue_id as interrupt API params.
>  - patch reorder and split.
> 
> v5 changes
>  - Rebase the patchset onto the HEAD
>  - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
>  - Export wait-for-rx interrupt function for shared libraries
>  - Split-off a new patch file for changed struct rte_intr_handle that
>    other patches depend on, to avoid breaking git bisect
>  - Change sample applicaiton to accomodate EAL function spec change
>    accordingly
> 
> v4 changes
>  - Export interrupt enable/disable functions for shared libraries
>  - Adjust position of new-added structure fields and functions to
>    avoid breaking ABI
>  
> v3 changes
>  - Add return value for interrupt enable/disable functions
>  - Move spinlok from PMD to L3fwd-power
>  - Remove unnecessary variables in e1000_mac_info
>  - Fix miscelleous review comments
>  
> v2 changes
>  - Fix compilation issue in Makefile for missed header file.
>  - Consolidate internal and community review comments of v1 patch set.
>  
> The patch series introduce low-latency one-shot rx interrupt into DPDK with
> polling and interrupt mode switch control example.
>  
> DPDK userspace interrupt notification and handling mechanism is based on UIO
> with below limitation:
> 1) It is designed to handle LSC interrupt only with inefficient suspended
>    pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
>    which then wakes up DPDK polling thread). In this way, it introduces
>    non-deterministic wakeup latency for DPDK polling thread as well as packet
>    latency if it is used to handle Rx interrupt.
> 2) UIO only supports a single interrupt vector which has to been shared by
>    LSC interrupt and interrupts assigned to dedicated rx queues.
>  
> This patchset includes below features:
> 1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
> 2) Build on top of the VFIO mechanism instead of UIO, so it could support
>    up to 64 interrupt vectors for rx queue interrupts.
> 3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
>    VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
>    user space.
> 4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
>    switch algorithms in L3fwd-power example.
> 
> Known limitations:
> 1) It does not work for UIO due to a single interrupt eventfd shared by LSC
>    and rx queue interrupt handlers causes a mess. [FIXED]
> 2) LSC interrupt is not supported by VF driver, so it is by default disabled
>    in L3fwd-power now. Feel free to turn in on if you want to support both LSC
>    and rx queue interrupts on a PF.
> 
> Cunming Liang (14):
>   eal/linux: add interrupt vectors support in intr_handle
>   eal/linux: add rte_epoll_wait/ctl support
>   eal/linux: add API to set rx interrupt event monitor
>   eal/linux: fix comments typo on vfio msi
>   eal/linux: add interrupt vectors handling on VFIO
>   eal/linux: standalone intr event fd create support
>   eal/linux: fix lsc read error in uio_pci_generic
>   eal/bsd: dummy for new intr definition
>   eal/bsd: fix inappropriate linuxapp referred in bsd
>   ethdev: add rx intr enable, disable and ctl functions
>   ixgbe: enable rx queue interrupts for both PF and VF
>   igb: enable rx queue interrupts for PF
>   l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
>     switch
>   abi: fix v2.1 abi broken issue
> 
>  drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
>  drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
>  examples/l3fwd-power/main.c                        | 206 ++++++--
>  lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  30 ++
>  .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  91 +++-
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
>  lib/librte_ether/rte_ethdev.c                      | 109 +++++
>  lib/librte_ether/rte_ethdev.h                      | 132 ++++++
>  lib/librte_ether/rte_ether_version.map             |   4 +
>  13 files changed, 1871 insertions(+), 128 deletions(-)
> 

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

This still needs more work in lots more drivers (like bonding) and in several
other subsystems (like pipeline) before it is widely useful. But this is
a great first step.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: guidelines for library statistics
@ 2015-06-08 14:50  5% Cristian Dumitrescu
  2015-06-11 12:05  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Cristian Dumitrescu @ 2015-06-08 14:50 UTC (permalink / raw)
  To: dev

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |  1 +
 doc/guides/guidelines/statistics.rst | 42 ++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..32c6020
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,42 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter support. This includes guidelines for turning library statistics on and off, requirements for preventing ABI changes when library statistics are turned on and off, etc.
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow the application to monitor the library-level run-time events. Typical counters are: number of packets received/dropped/transmitted, number of buffers allocated/freed, number of occurrences for specific events, etc.
+
+Since the resources consumed for library-level statistics counter collection have to be spent out of the application budget and the counters collected by some libraries might not be relevant for the current application, in order to avoid any unwanted waste of resources and/or performance for the application, the application is to decide at build time whether the collection of library-level statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific applications:
+* For application A, counters maintained by library X are always relevant and the application needs to use them to implement certain features, as traffic accounting, logging, application-level statistics, etc. In this case, the application requires that collection of statistics counters for library X is always turned on;
+* For application B, counters maintained by library X are only useful during the application debug stage and not relevant once debug phase is over. In this case, the application may decide to turn on the collection of library X statistics counters during the debug phase and later on turn them off;
+* For application C, counters maintained by library X are not relevant at all. It might me that the application maintains its own set of statistics counters that monitor a different set of run-time events than library X (e.g. number of connection requests, number of active users, etc). It might also be that application uses multiple libraries (library X, library Y, etc) and it is interested in the statistics counters of library Y, but not in those of library X. In this case, the application may decide to turn the collection of statistics counters off for library X and on for library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles, cache bandwidth, memory bandwidth, etc) that depends on:
+* Number of libraries used by the current application that have statistics counters collection turned on;
+* Number of statistics counters maintained by each library per object type instance (e.g. per port, table, pipeline, thread, etc);
+* Number of instances created for each object type supported by each library;
+* Complexity of the statistics logic collection for each counter: when only some occurrences of a specific event are valid, several conditional branches might involved in the decision of whether the current occurrence of the event should be counted or not (e.g. on the event of packet reception, only TCP packets with destination port within a certain range should be recorded), etc.
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that provides statistics counters should provide a single build time flag that decides whether the statistics counter collection is enabled or not for this library. This flag should be exposed as a variable within the DPDK configuration file. When this flag is set, all the counters supported by current library are collected; when this flag is cleared, none of the counters supported by the current library are collected:
+
+	#DPDK file “./config/common_linuxapp”, “./config/common_bsdapp”, etc
+	CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS=y/n
+
+The default value for this DPDK configuration file variable (either “yes” or “no”) is left at the decision of each library.
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the library API should not be affected by whether the collection of statistics counters is turned on or off for the current library. In practical terms, this means that space is always allocated in the API data structures for statistics counters and the statistics related API functions are always built into the code, regardless of whether the statistics counter collection is turned on or off for the current library.
+
+When the collection of statistics counters for the current library is turned off, the counters retrieved through the statistics related API functions should have the default value of zero.
-- 
1.8.5.3

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
  2015-06-02  3:16  3%   ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Helin Zhang
  2015-06-02  3:16  2%     ` [dpdk-dev] [PATCH v2 3/6] i40e: support double vlan " Helin Zhang
  2015-06-02  7:37  0%     ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Liu, Jijiang
@ 2015-06-08  7:40  0%     ` Olivier MATZ
  2015-06-11  7:03  3%     ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
  3 siblings, 0 replies; 200+ results
From: Olivier MATZ @ 2015-06-08  7:40 UTC (permalink / raw)
  To: Helin Zhang, dev

Hi Helin,

On 06/02/2015 05:16 AM, Helin Zhang wrote:
> As i40e hardware can be reconfigured to support QinQ stripping and
> insertion, this patch set is to enable that together with using the
> reserved 16 bits in 'struct rte_mbuf' for the second vlan tag.
> Corresponding command is added in testpmd for testing.
> Note that no need to rework vPMD, as nothings used in it changed.
> 
> v2 changes:
> * Added more commit logs of which commit it fix for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
  2015-06-02  7:37  0%     ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Liu, Jijiang
@ 2015-06-08  7:32  0%       ` Cao, Min
  0 siblings, 0 replies; 200+ results
From: Cao, Min @ 2015-06-08  7:32 UTC (permalink / raw)
  To: Liu, Jijiang, Zhang, Helin, dev

Tested-by: Min Cao <min.cao@intel.com>

- OS: Fedora20  3.11.10-301
- GCC: gcc version 4.8.2 20131212
- CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
- NIC: Ethernet controller: Intel Corporation Device 1572 (rev 01)
- Default x86_64-native-linuxapp-gcc configuration
- Total 2 cases, 2 passed, 0 failed

- Case: double vlan filter
- Case: double vlan insertion 

> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, June 2, 2015 11:16 AM
> To: dev@dpdk.org
> Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; 
> Richardson, Bruce; olivier.matz@6wind.com; Zhang, Helin
> Subject: [PATCH v2 0/6] support i40e QinQ stripping and insertion
> 
> As i40e hardware can be reconfigured to support QinQ stripping and 
> insertion, this patch set is to enable that together with using the 
> reserved 16 bits in 'struct rte_mbuf' for the second vlan tag.
> Corresponding command is added in testpmd for testing.
> Note that no need to rework vPMD, as nothings used in it changed.
> 
> v2 changes:
> * Added more commit logs of which commit it fix for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.
> 
> Helin Zhang (6):
>   ixgbe: remove a discarded source line
>   mbuf: use the reserved 16 bits for double vlan
>   i40e: support double vlan stripping and insertion
>   i40evf: add supported offload capability flags
>   app/testpmd: add test cases for qinq stripping and insertion
>   examples/ipv4_multicast: support double vlan stripping and insertion
> 
>  app/test-pmd/cmdline.c            | 78 +++++++++++++++++++++++++++++++++----
>  app/test-pmd/config.c             | 21 +++++++++-
>  app/test-pmd/flowgen.c            |  4 +-
>  app/test-pmd/macfwd.c             |  3 ++
>  app/test-pmd/macswap.c            |  3 ++
>  app/test-pmd/rxonly.c             |  3 ++
>  app/test-pmd/testpmd.h            |  6 ++-
>  app/test-pmd/txonly.c             |  8 +++-
>  drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev_vf.c | 13 +++++++
>  drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
>  drivers/net/ixgbe/ixgbe_rxtx.c    |  1 -
>  examples/ipv4_multicast/main.c    |  1 +
>  lib/librte_ether/rte_ethdev.h     |  2 +
>  lib/librte_mbuf/rte_mbuf.h        | 10 ++++-
>  15 files changed, 243 insertions(+), 43 deletions(-)
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue
  2015-06-08  5:28  4%           ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  2015-06-08  5:29  2%             ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-06-08  5:29 10%             ` Cunming Liang
  2015-06-09 23:59  0%             ` [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD Stephen Hemminger
  2 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-08  5:29 UTC (permalink / raw)
  To: dev, shemming; +Cc: liang-min.wang

RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid ABI(unannounced) broken in v2.1.
The usrs should make sure understand the impact before turning on the feature.
There are two abi changes required in this interrupt patch set.
They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 v9 Acked-by: vincent jardin <vincent.jardin@6wind.com>

 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1894,6 +1913,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3750,6 +3770,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3791,6 +3812,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3800,18 +3822,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_EAL_RX_INTR
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3868,6 +3893,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index bcec971..3a70af6 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1481,7 +1487,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1514,6 +1522,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1532,6 +1541,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1619,9 +1629,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1727,12 +1739,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2335,6 +2349,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_EAL_RX_INTR
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2345,6 +2360,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3127,7 +3143,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3160,6 +3178,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3177,7 +3196,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3223,19 +3242,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3246,11 +3269,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3834,6 +3859,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3902,21 +3928,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3927,6 +3957,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3937,18 +3968,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4000,6 +4034,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 538bb93..3b4054c 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,6 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
 	},
 };
 
@@ -889,7 +888,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index ba4640a..11dc59b 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -51,9 +51,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index d7a5403..f81d553 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_EAL_RX_INTR
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_EAL_RX_INTR
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -876,6 +884,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -918,6 +927,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1056,6 +1066,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1168,3 +1179,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 912cc50..a2056bd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_EAL_RX_INTR
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_EAL_RX_INTR
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 27a87f5..3f6e1f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3352,6 +3353,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c199d32..8bea68d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_EAL_RX_INTR
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2943,8 +2945,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2967,9 +2981,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions
  2015-06-08  5:28  4%           ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
@ 2015-06-08  5:29  2%             ` Cunming Liang
  2015-06-08  5:29 10%             ` [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue Cunming Liang
  2015-06-09 23:59  0%             ` [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD Stephen Hemminger
  2 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-08  5:29 UTC (permalink / raw)
  To: dev, shemming; +Cc: liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addtion, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per port or per queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add addtion check for EEXIT

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..27a87f5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1386,6 +1396,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2868,6 +2882,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD
  2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
                             ` (2 preceding siblings ...)
  2015-06-05  8:59  0%           ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Zhou, Danny
@ 2015-06-08  5:28  4%           ` Cunming Liang
  2015-06-08  5:29  2%             ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
                               ` (2 more replies)
  3 siblings, 3 replies; 200+ results
From: Cunming Liang @ 2015-06-08  5:28 UTC (permalink / raw)
  To: dev, shemming; +Cc: liang-min.wang

v12 changes
 - bsd cleanup for unused variable warning
 - fix awkward line split in debug message

v11 changes
 - typo cleanup and check kernel style

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by default so as to avoid v2.1 abi broken

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (14):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  eal/bsd: fix inappropriate linuxapp referred in bsd
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 206 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  30 ++
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  91 +++-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1871 insertions(+), 128 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 2/7] eal: memzone allocated by malloc
  @ 2015-06-06 10:32  1%   ` Sergio Gonzalez Monroy
  0 siblings, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-06 10:32 UTC (permalink / raw)
  To: dev

In the current memory hierarchy, memsegs are groups of physically contiguous
hugepages, memzones are slices of memsegs and malloc further slices memzones
into smaller memory chunks.

This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones would call malloc internally for memory allocation while
maintaining its ABI.

It would be possible to free memzones and therefore any other structure
based on memzones, ie. mempools

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c        | 273 ++++++----------------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   2 +-
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/include/rte_memory.h        |   1 +
 lib/librte_eal/common/malloc_elem.c               |  68 ++++--
 lib/librte_eal/common/malloc_elem.h               |  14 +-
 lib/librte_eal/common/malloc_heap.c               | 140 ++++++-----
 lib/librte_eal/common/malloc_heap.h               |   6 +-
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 9 files changed, 197 insertions(+), 317 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 888f9e5..742f6c9 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -50,11 +50,10 @@
 #include <rte_string_fns.h>
 #include <rte_common.h>
 
+#include "malloc_heap.h"
+#include "malloc_elem.h"
 #include "eal_private.h"
 
-/* internal copy of free memory segments */
-static struct rte_memseg *free_memseg = NULL;
-
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
@@ -68,8 +67,9 @@ memzone_lookup_thread_unsafe(const char *name)
 	 * the algorithm is not optimal (linear), but there are few
 	 * zones and this function should be called at init only
 	 */
-	for (i = 0; i < RTE_MAX_MEMZONE && mcfg->memzone[i].addr != NULL; i++) {
-		if (!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
+	for (i = 0; i < RTE_MAX_MEMZONE; i++) {
+		if (mcfg->memzone[i].addr != NULL &&
+				!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
 			return &mcfg->memzone[i];
 	}
 
@@ -88,39 +88,45 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 			len, socket_id, flags, RTE_CACHE_LINE_SIZE);
 }
 
-/*
- * Helper function for memzone_reserve_aligned_thread_unsafe().
- * Calculate address offset from the start of the segment.
- * Align offset in that way that it satisfy istart alignmnet and
- * buffer of the  requested length would not cross specified boundary.
- */
-static inline phys_addr_t
-align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align,
-	size_t bound)
+/* Find the heap with the greatest free block size */
+static void
+find_heap_max_free_elem(int *s, size_t *len, unsigned align)
 {
-	phys_addr_t addr_offset, bmask, end, start;
-	size_t step;
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-	step = RTE_MAX(align, bound);
-	bmask = ~((phys_addr_t)bound - 1);
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* calculate offset to closest alignment */
-	start = RTE_ALIGN_CEIL(ms->phys_addr, align);
-	addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size > *len) {
+			*len = stats.greatest_free_size;
+			*s = i;
+		}
+	}
+	*len -= (MALLOC_ELEM_OVERHEAD + align);
+}
 
-	while (addr_offset + len < ms->len) {
+/* Find a heap that can allocate the requested size */
+static void
+find_heap_suitable(int *s, size_t len, unsigned align)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-		/* check, do we meet boundary condition */
-		end = start + len - (len != 0);
-		if ((start & bmask) == (end & bmask))
-			break;
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-		/* calculate next offset */
-		start = RTE_ALIGN_CEIL(start + 1, step);
-		addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size >= len + MALLOC_ELEM_OVERHEAD + align) {
+			*s = i;
+			break;
+		}
 	}
-
-	return (addr_offset);
 }
 
 static const struct rte_memzone *
@@ -128,13 +134,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
 	struct rte_mem_config *mcfg;
-	unsigned i = 0;
-	int memseg_idx = -1;
-	uint64_t addr_offset, seg_offset = 0;
 	size_t requested_len;
-	size_t memseg_len = 0;
-	phys_addr_t memseg_physaddr;
-	void *memseg_addr;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -166,7 +166,6 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (align < RTE_CACHE_LINE_SIZE)
 		align = RTE_CACHE_LINE_SIZE;
 
-
 	/* align length on cache boundary. Check for overflow before doing so */
 	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
 		rte_errno = EINVAL; /* requested size too big */
@@ -180,129 +179,50 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	requested_len = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE,  len);
 
 	/* check that boundary condition is valid */
-	if (bound != 0 &&
-			(requested_len > bound || !rte_is_power_of_2(bound))) {
+	if (bound != 0 && (requested_len > bound || !rte_is_power_of_2(bound))) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	/* find the smallest segment matching requirements */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		/* last segment */
-		if (free_memseg[i].addr == NULL)
-			break;
+	if (len == 0) {
+		if (bound != 0)
+			requested_len = bound;
+		else
+			requested_len = 0;
+	}
 
-		/* empty segment, skip it */
-		if (free_memseg[i].len == 0)
-			continue;
-
-		/* bad socket ID */
-		if (socket_id != SOCKET_ID_ANY &&
-		    free_memseg[i].socket_id != SOCKET_ID_ANY &&
-		    socket_id != free_memseg[i].socket_id)
-			continue;
-
-		/*
-		 * calculate offset to closest alignment that
-		 * meets boundary conditions.
-		 */
-		addr_offset = align_phys_boundary(free_memseg + i,
-			requested_len, align, bound);
-
-		/* check len */
-		if ((requested_len + addr_offset) > free_memseg[i].len)
-			continue;
-
-		/* check flags for hugepage sizes */
-		if ((flags & RTE_MEMZONE_2MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_1G)
-			continue;
-		if ((flags & RTE_MEMZONE_1GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_2M)
-			continue;
-		if ((flags & RTE_MEMZONE_16MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16G)
-			continue;
-		if ((flags & RTE_MEMZONE_16GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16M)
-			continue;
-
-		/* this segment is the best until now */
-		if (memseg_idx == -1) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-		/* find the biggest contiguous zone */
-		else if (len == 0) {
-			if (free_memseg[i].len > memseg_len) {
-				memseg_idx = i;
-				memseg_len = free_memseg[i].len;
-				seg_offset = addr_offset;
-			}
-		}
-		/*
-		 * find the smallest (we already checked that current
-		 * zone length is > len
-		 */
-		else if (free_memseg[i].len + align < memseg_len ||
-				(free_memseg[i].len <= memseg_len + align &&
-				addr_offset < seg_offset)) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
+	if (socket_id == SOCKET_ID_ANY) {
+		if (requested_len == 0)
+			find_heap_max_free_elem(&socket_id, &requested_len, align);
+		else
+			find_heap_suitable(&socket_id, requested_len, align);
+
+		if (socket_id == SOCKET_ID_ANY) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
 	}
 
-	/* no segment found */
-	if (memseg_idx == -1) {
-		/*
-		 * If RTE_MEMZONE_SIZE_HINT_ONLY flag is specified,
-		 * try allocating again without the size parameter otherwise -fail.
-		 */
-		if ((flags & RTE_MEMZONE_SIZE_HINT_ONLY)  &&
-		    ((flags & RTE_MEMZONE_1GB) || (flags & RTE_MEMZONE_2MB)
-		|| (flags & RTE_MEMZONE_16MB) || (flags & RTE_MEMZONE_16GB)))
-			return memzone_reserve_aligned_thread_unsafe(name,
-				len, socket_id, 0, align, bound);
-
+	/* allocate memory on heap */
+	void *mz_addr = malloc_heap_alloc(&mcfg->malloc_heaps[socket_id], NULL,
+			requested_len, flags, align, bound);
+	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
 
-	/* save aligned physical and virtual addresses */
-	memseg_physaddr = free_memseg[memseg_idx].phys_addr + seg_offset;
-	memseg_addr = RTE_PTR_ADD(free_memseg[memseg_idx].addr,
-			(uintptr_t) seg_offset);
-
-	/* if we are looking for a biggest memzone */
-	if (len == 0) {
-		if (bound == 0)
-			requested_len = memseg_len - seg_offset;
-		else
-			requested_len = RTE_ALIGN_CEIL(memseg_physaddr + 1,
-				bound) - memseg_physaddr;
-	}
-
-	/* set length to correct value */
-	len = (size_t)seg_offset + requested_len;
-
-	/* update our internal state */
-	free_memseg[memseg_idx].len -= len;
-	free_memseg[memseg_idx].phys_addr += len;
-	free_memseg[memseg_idx].addr =
-		(char *)free_memseg[memseg_idx].addr + len;
+	const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
 
 	/* fill the zone in config */
 	struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
-	mz->phys_addr = memseg_physaddr;
-	mz->addr = memseg_addr;
-	mz->len = requested_len;
-	mz->hugepage_sz = free_memseg[memseg_idx].hugepage_sz;
-	mz->socket_id = free_memseg[memseg_idx].socket_id;
+	mz->phys_addr = rte_malloc_virt2phy(mz_addr);
+	mz->addr = mz_addr;
+	mz->len = (requested_len == 0? elem->size: requested_len);
+	mz->hugepage_sz = elem->ms->hugepage_sz;
+	mz->socket_id = elem->ms->socket_id;
 	mz->flags = 0;
-	mz->memseg_id = memseg_idx;
+	mz->memseg_id = elem->ms - rte_eal_get_configuration()->mem_config->memseg;
 
 	return mz;
 }
@@ -419,45 +339,6 @@ rte_memzone_dump(FILE *f)
 }
 
 /*
- * called by init: modify the free memseg list to have cache-aligned
- * addresses and cache-aligned lengths
- */
-static int
-memseg_sanitize(struct rte_memseg *memseg)
-{
-	unsigned phys_align;
-	unsigned virt_align;
-	unsigned off;
-
-	phys_align = memseg->phys_addr & RTE_CACHE_LINE_MASK;
-	virt_align = (unsigned long)memseg->addr & RTE_CACHE_LINE_MASK;
-
-	/*
-	 * sanity check: phys_addr and addr must have the same
-	 * alignment
-	 */
-	if (phys_align != virt_align)
-		return -1;
-
-	/* memseg is really too small, don't bother with it */
-	if (memseg->len < (2 * RTE_CACHE_LINE_SIZE)) {
-		memseg->len = 0;
-		return 0;
-	}
-
-	/* align start address */
-	off = (RTE_CACHE_LINE_SIZE - phys_align) & RTE_CACHE_LINE_MASK;
-	memseg->phys_addr += off;
-	memseg->addr = (char *)memseg->addr + off;
-	memseg->len -= off;
-
-	/* align end address */
-	memseg->len &= ~((uint64_t)RTE_CACHE_LINE_MASK);
-
-	return 0;
-}
-
-/*
  * Init the memzone subsystem
  */
 int
@@ -465,14 +346,10 @@ rte_eal_memzone_init(void)
 {
 	struct rte_mem_config *mcfg;
 	const struct rte_memseg *memseg;
-	unsigned i = 0;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* mirror the runtime memsegs from config */
-	free_memseg = mcfg->free_memseg;
-
 	/* secondary processes don't need to initialise anything */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return 0;
@@ -485,33 +362,13 @@ rte_eal_memzone_init(void)
 
 	rte_rwlock_write_lock(&mcfg->mlock);
 
-	/* fill in uninitialized free_memsegs */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (memseg[i].addr == NULL)
-			break;
-		if (free_memseg[i].addr != NULL)
-			continue;
-		memcpy(&free_memseg[i], &memseg[i], sizeof(struct rte_memseg));
-	}
-
-	/* make all zones cache-aligned */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (free_memseg[i].addr == NULL)
-			break;
-		if (memseg_sanitize(&free_memseg[i]) < 0) {
-			RTE_LOG(ERR, EAL, "%s(): Sanity check failed\n", __func__);
-			rte_rwlock_write_unlock(&mcfg->mlock);
-			return -1;
-		}
-	}
-
 	/* delete all zones */
 	mcfg->memzone_idx = 0;
 	memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
 
 	rte_rwlock_write_unlock(&mcfg->mlock);
 
-	return 0;
+	return rte_eal_malloc_heap_init();
 }
 
 /* Walk all reserved memory zones */
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 34f5abc..055212a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -73,7 +73,7 @@ struct rte_mem_config {
 	struct rte_memseg memseg[RTE_MAX_MEMSEG];    /**< Physmem descriptors. */
 	struct rte_memzone memzone[RTE_MAX_MEMZONE]; /**< Memzone descriptors. */
 
-	/* Runtime Physmem descriptors. */
+	/* Runtime Physmem descriptors - NOT USED */
 	struct rte_memseg free_memseg[RTE_MAX_MEMSEG];
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index 716216f..b270356 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -40,7 +40,7 @@
 #include <rte_memory.h>
 
 /* Number of free lists per heap, grouped by size. */
-#define RTE_HEAP_NUM_FREELISTS  5
+#define RTE_HEAP_NUM_FREELISTS  13
 
 /**
  * Structure to hold malloc heap
@@ -48,7 +48,6 @@
 struct malloc_heap {
 	rte_spinlock_t lock;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
-	unsigned mz_count;
 	unsigned alloc_count;
 	size_t total_size;
 } __rte_cache_aligned;
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 7f8103f..675b630 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -100,6 +100,7 @@ struct rte_memseg {
 	 /**< store segment MFNs */
 	uint64_t mfn[DOM0_NUM_MEMBLOCK];
 #endif
+	uint8_t used;               /**< Used by a heap */
 } __attribute__((__packed__));
 
 /**
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index a5e1248..b54ee33 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -37,7 +37,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
 #include <rte_per_lcore.h>
@@ -56,10 +55,10 @@
  */
 void
 malloc_elem_init(struct malloc_elem *elem,
-		struct malloc_heap *heap, const struct rte_memzone *mz, size_t size)
+		struct malloc_heap *heap, const struct rte_memseg *ms, size_t size)
 {
 	elem->heap = heap;
-	elem->mz = mz;
+	elem->ms = ms;
 	elem->prev = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
@@ -70,12 +69,12 @@ malloc_elem_init(struct malloc_elem *elem,
 }
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
 {
-	malloc_elem_init(elem, prev->heap, prev->mz, 0);
+	malloc_elem_init(elem, prev->heap, prev->ms, 0);
 	elem->prev = prev;
 	elem->state = ELEM_BUSY; /* mark busy so its never merged */
 }
@@ -86,12 +85,24 @@ malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
  * fit, return NULL.
  */
 static void *
-elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
+elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	const uintptr_t end_pt = (uintptr_t)elem +
+	const size_t bmask = ~(bound - 1);
+	uintptr_t end_pt = (uintptr_t)elem +
 			elem->size - MALLOC_ELEM_TRAILER_LEN;
-	const uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
-	const uintptr_t new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
+	uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+	uintptr_t new_elem_start;
+
+	/* check boundary */
+	if ((new_data_start & bmask) != ((end_pt - 1) & bmask)) {
+		end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
+		new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+		if (((end_pt - 1) & bmask) != (new_data_start & bmask))
+			return NULL;
+	}
+
+	new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
 
 	/* if the new start point is before the exist start, it won't fit */
 	return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
@@ -102,9 +113,10 @@ elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
  * alignment request from the current element
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,	unsigned align,
+		size_t bound)
 {
-	return elem_start_pt(elem, size, align) != NULL;
+	return elem_start_pt(elem, size, align, bound) != NULL;
 }
 
 /*
@@ -115,10 +127,10 @@ static void
 split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 {
 	struct malloc_elem *next_elem = RTE_PTR_ADD(elem, elem->size);
-	const unsigned old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
-	const unsigned new_elem_size = elem->size - old_elem_size;
+	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
+	const size_t new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->mz, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->ms, new_elem_size);
 	split_pt->prev = elem;
 	next_elem->prev = split_pt;
 	elem->size = old_elem_size;
@@ -168,8 +180,9 @@ malloc_elem_free_list_index(size_t size)
 void
 malloc_elem_free_list_insert(struct malloc_elem *elem)
 {
-	size_t idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
+	size_t idx;
 
+	idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
 	elem->state = ELEM_FREE;
 	LIST_INSERT_HEAD(&elem->heap->free_head[idx], elem, free_list);
 }
@@ -190,12 +203,26 @@ elem_free_list_remove(struct malloc_elem *elem)
  * is not done here, as it's done there previously.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	struct malloc_elem *new_elem = elem_start_pt(elem, size, align);
-	const unsigned old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
+	const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	const size_t trailer_size = elem->size - old_elem_size - size -
+		MALLOC_ELEM_OVERHEAD;
+
+	elem_free_list_remove(elem);
 
-	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE){
+	if (trailer_size > MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+		/* split it, too much free space after elem */
+		struct malloc_elem *new_free_elem =
+				RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);
+
+		split_elem(elem, new_free_elem);
+		malloc_elem_free_list_insert(new_free_elem);
+	}
+
+	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
 		/* don't split it, pad the element instead */
 		elem->state = ELEM_BUSY;
 		elem->pad = old_elem_size;
@@ -208,8 +235,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 			new_elem->size = elem->size - elem->pad;
 			set_header(new_elem);
 		}
-		/* remove element from free list */
-		elem_free_list_remove(elem);
 
 		return new_elem;
 	}
@@ -219,7 +244,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 	 * Re-insert original element, in case its new size makes it
 	 * belong on a different list.
 	 */
-	elem_free_list_remove(elem);
 	split_elem(elem, new_elem);
 	new_elem->state = ELEM_BUSY;
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index 9790b1a..e05d2ea 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -47,9 +47,9 @@ enum elem_state {
 
 struct malloc_elem {
 	struct malloc_heap *heap;
-	struct malloc_elem *volatile prev;      /* points to prev elem in memzone */
+	struct malloc_elem *volatile prev;      /* points to prev elem in memseg */
 	LIST_ENTRY(malloc_elem) free_list;      /* list of free elements in heap */
-	const struct rte_memzone *mz;
+	const struct rte_memseg *ms;
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
@@ -136,11 +136,11 @@ malloc_elem_from_data(const void *data)
 void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
-		const struct rte_memzone *mz,
+		const struct rte_memseg *ms,
 		size_t size);
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem,
@@ -151,14 +151,16 @@ malloc_elem_mkend(struct malloc_elem *elem,
  * of the requested size and with the requested alignment
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * reserve a block of data in an existing malloc_elem. If the malloc_elem
  * is much larger than the data block requested, we split the element in two.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_alloc(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * free a malloc_elem block by adding it to the free list. If the
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index defb903..4a423c1 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -39,7 +39,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_launch.h>
@@ -54,123 +53,105 @@
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
-/* since the memzone size starts with a digit, it will appear unquoted in
- * rte_config.h, so quote it so it can be passed to rte_str_to_size */
-#define MALLOC_MEMZONE_SIZE RTE_STR(RTE_MALLOC_MEMZONE_SIZE)
-
-/*
- * returns the configuration setting for the memzone size as a size_t value
- */
-static inline size_t
-get_malloc_memzone_size(void)
+static unsigned
+check_hugepage_sz(unsigned flags, size_t hugepage_sz)
 {
-	return rte_str_to_size(MALLOC_MEMZONE_SIZE);
+	unsigned ret = 1;
+
+	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16MB) && hugepage_sz == RTE_PGSIZE_16G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16GB) && hugepage_sz == RTE_PGSIZE_16M)
+		ret = 0;
+
+	return ret;
 }
 
 /*
- * reserve an extra memory zone and make it available for use by a particular
- * heap. This reserves the zone and sets a dummy malloc_elem header at the end
+ * Expand the heap with a memseg.
+ * This reserves the zone and sets a dummy malloc_elem header at the end
  * to prevent overflow. The rest of the zone is added to free list as a single
  * large free block
  */
-static int
-malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
+static void
+malloc_heap_add_memseg(struct malloc_heap *heap, struct rte_memseg *ms)
 {
-	const unsigned mz_flags = 0;
-	const size_t block_size = get_malloc_memzone_size();
-	/* ensure the data we want to allocate will fit in the memzone */
-	const size_t min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
-	const struct rte_memzone *mz = NULL;
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	unsigned numa_socket = heap - mcfg->malloc_heaps;
-
-	size_t mz_size = min_size;
-	if (mz_size < block_size)
-		mz_size = block_size;
-
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	snprintf(mz_name, sizeof(mz_name), "MALLOC_S%u_HEAP_%u",
-		     numa_socket, heap->mz_count++);
-
-	/* try getting a block. if we fail and we don't need as big a block
-	 * as given in the config, we can shrink our request and try again
-	 */
-	do {
-		mz = rte_memzone_reserve(mz_name, mz_size, numa_socket,
-					 mz_flags);
-		if (mz == NULL)
-			mz_size /= 2;
-	} while (mz == NULL && mz_size > min_size);
-	if (mz == NULL)
-		return -1;
-
 	/* allocate the memory block headers, one at end, one at start */
-	struct malloc_elem *start_elem = (struct malloc_elem *)mz->addr;
-	struct malloc_elem *end_elem = RTE_PTR_ADD(mz->addr,
-			mz_size - MALLOC_ELEM_OVERHEAD);
+	struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
+	struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
+			ms->len - MALLOC_ELEM_OVERHEAD);
 	end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
 
-	const unsigned elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
-	malloc_elem_init(start_elem, heap, mz, elem_size);
+	const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
+	malloc_elem_init(start_elem, heap, ms, elem_size);
 	malloc_elem_mkend(end_elem, start_elem);
 	malloc_elem_free_list_insert(start_elem);
 
-	/* increase heap total size by size of new memzone */
-	heap->total_size+=mz_size - MALLOC_ELEM_OVERHEAD;
-	return 0;
+	heap->total_size += elem_size;
 }
 
 /*
  * Iterates through the freelist for a heap to find a free element
  * which can store data of the required size and with the requested alignment.
+ * If size is 0, find the biggest available elem.
  * Returns null on failure, or pointer to element on success.
  */
 static struct malloc_elem *
-find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
+find_suitable_element(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
 	size_t idx;
-	struct malloc_elem *elem;
+	struct malloc_elem *elem, *alt_elem = NULL;
 
 	for (idx = malloc_elem_free_list_index(size);
-		idx < RTE_HEAP_NUM_FREELISTS; idx++)
-	{
+			idx < RTE_HEAP_NUM_FREELISTS; idx++) {
 		for (elem = LIST_FIRST(&heap->free_head[idx]);
-			!!elem; elem = LIST_NEXT(elem, free_list))
-		{
-			if (malloc_elem_can_hold(elem, size, align))
-				return elem;
+				!!elem; elem = LIST_NEXT(elem, free_list)) {
+			if (malloc_elem_can_hold(elem, size, align, bound)) {
+				if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
+					return elem;
+				else
+					alt_elem = elem;
+			}
 		}
 	}
+
+	if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		return alt_elem;
+
 	return NULL;
 }
 
 /*
- * Main function called by malloc to allocate a block of memory from the
- * heap. It locks the free list, scans it, and adds a new memzone if the
- * scan fails. Once the new memzone is added, it re-scans and should return
+ * Main function to allocate a block of memory from the heap.
+ * It locks the free list, scans it, and adds a new memseg if the
+ * scan fails. Once the new memseg is added, it re-scans and should return
  * the new element after releasing the lock.
  */
 void *
 malloc_heap_alloc(struct malloc_heap *heap,
-		const char *type __attribute__((unused)), size_t size, unsigned align)
+		const char *type __attribute__((unused)), size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
+	struct malloc_elem *elem;
+
 	size = RTE_CACHE_LINE_ROUNDUP(size);
 	align = RTE_CACHE_LINE_ROUNDUP(align);
+
 	rte_spinlock_lock(&heap->lock);
-	struct malloc_elem *elem = find_suitable_element(heap, size, align);
-	if (elem == NULL){
-		if ((malloc_heap_add_memzone(heap, size, align)) == 0)
-			elem = find_suitable_element(heap, size, align);
-	}
 
-	if (elem != NULL){
-		elem = malloc_elem_alloc(elem, size, align);
+	elem = find_suitable_element(heap, size, flags, align, bound);
+	if (elem != NULL) {
+		elem = malloc_elem_alloc(elem, size, align, bound);
 		/* increase heap's count of allocated elements */
 		heap->alloc_count++;
 	}
 	rte_spinlock_unlock(&heap->lock);
-	return elem == NULL ? NULL : (void *)(&elem[1]);
 
+	return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
 /*
@@ -207,3 +188,20 @@ malloc_heap_get_stats(const struct malloc_heap *heap,
 	return 0;
 }
 
+int
+rte_eal_malloc_heap_init(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned ms_cnt;
+	struct rte_memseg *ms;
+
+	if (mcfg == NULL)
+		return -1;
+
+	for (ms = &mcfg->memseg[0], ms_cnt = 0;
+			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
+			ms_cnt++, ms++)
+		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+
+	return 0;
+}
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a47136d..3ccbef0 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -53,15 +53,15 @@ malloc_get_numa_socket(void)
 }
 
 void *
-malloc_heap_alloc(struct malloc_heap *heap, const char *type,
-		size_t size, unsigned align);
+malloc_heap_alloc(struct malloc_heap *heap,	const char *type, size_t size,
+		unsigned flags, size_t align, size_t bound);
 
 int
 malloc_heap_get_stats(const struct malloc_heap *heap,
 		struct rte_malloc_socket_stats *socket_stats);
 
 int
-rte_eal_heap_memzone_init(void);
+rte_eal_malloc_heap_init(void);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index c313a57..54c2bd8 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -39,7 +39,6 @@
 
 #include <rte_memcpy.h>
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_branch_prediction.h>
@@ -87,7 +86,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 		return NULL;
 
 	ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
-				size, align == 0 ? 1 : align);
+				size, 0, align == 0 ? 1 : align, 0);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
@@ -98,7 +97,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 			continue;
 
 		ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
-					size, align == 0 ? 1 : align);
+					size, 0, align == 0 ? 1 : align, 0);
 		if (ret != NULL)
 			return ret;
 	}
@@ -256,5 +255,5 @@ rte_malloc_virt2phy(const void *addr)
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->mz->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->mz->addr);
+	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
-- 
1.9.3

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
  2015-06-05 14:55  2% [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros Daniel Mrzyglod
@ 2015-06-05 15:31  0% ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-05 15:31 UTC (permalink / raw)
  To: Mrzyglod, DanielX T, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Friday, June 5, 2015 3:55 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
> 
> Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
> meta-data fields.
> Forcing aligned accesses is not really required, so this is removing an
> unneeded constraint.
> This issue was met during testing of the new version of the ip_pipeline
> application. There is no performance impact.
> This change has no ABI impact, as the previous code that uses aligned
> accesses continues to run without any issues.
> 
> Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>


Ack-ed by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
@ 2015-06-05 14:55  2% Daniel Mrzyglod
  2015-06-05 15:31  0% ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Daniel Mrzyglod @ 2015-06-05 14:55 UTC (permalink / raw)
  To: dev

Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
meta-data fields.
Forcing aligned accesses is not really required, so this is removing an
unneeded constraint.
This issue was met during testing of the new version of the ip_pipeline
application. There is no performance impact.
This change has no ABI impact, as the previous code that uses aligned
accesses continues to run without any issues.

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c      |  8 --------
 lib/librte_port/rte_port.h              | 26 +++++++++++++-------------
 lib/librte_table/rte_table_array.c      |  4 +---
 lib/librte_table/rte_table_hash_ext.c   | 13 -------------
 lib/librte_table/rte_table_hash_key16.c | 24 ------------------------
 lib/librte_table/rte_table_hash_key32.c | 24 ------------------------
 lib/librte_table/rte_table_hash_key8.c  | 24 ------------------------
 lib/librte_table/rte_table_hash_lru.c   | 13 -------------
 lib/librte_table/rte_table_lpm.c        |  4 ----
 lib/librte_table/rte_table_lpm_ipv6.c   |  4 ----
 10 files changed, 14 insertions(+), 130 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 36d92c9..b777cf1 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -175,14 +175,6 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 		return -EINVAL;
 	}
 
-	/* offset_port_id */
-	if (params->offset_port_id & 0x3) {
-		RTE_LOG(ERR, PIPELINE,
-			"%s: Incorrect value for parameter offset_port_id\n",
-			__func__);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
diff --git a/lib/librte_port/rte_port.h b/lib/librte_port/rte_port.h
index d84e5a1..c3a0cca 100644
--- a/lib/librte_port/rte_port.h
+++ b/lib/librte_port/rte_port.h
@@ -54,23 +54,23 @@ extern "C" {
  * Macros to allow accessing metadata stored in the mbuf headroom
  * just beyond the end of the mbuf data structure returned by a port
  */
-#define RTE_MBUF_METADATA_UINT8(mbuf, offset)              \
-	(((uint8_t *)&(mbuf)[1])[offset])
-#define RTE_MBUF_METADATA_UINT16(mbuf, offset)             \
-	(((uint16_t *)&(mbuf)[1])[offset/sizeof(uint16_t)])
-#define RTE_MBUF_METADATA_UINT32(mbuf, offset)             \
-	(((uint32_t *)&(mbuf)[1])[offset/sizeof(uint32_t)])
-#define RTE_MBUF_METADATA_UINT64(mbuf, offset)             \
-	(((uint64_t *)&(mbuf)[1])[offset/sizeof(uint64_t)])
-
 #define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)          \
-	(&RTE_MBUF_METADATA_UINT8(mbuf, offset))
+	(&((uint8_t *) &(mbuf)[1])[offset])
 #define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)         \
-	(&RTE_MBUF_METADATA_UINT16(mbuf, offset))
+	((uint16_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
 #define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)         \
-	(&RTE_MBUF_METADATA_UINT32(mbuf, offset))
+	((uint32_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
 #define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)         \
-	(&RTE_MBUF_METADATA_UINT64(mbuf, offset))
+	((uint64_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
+
+#define RTE_MBUF_METADATA_UINT8(mbuf, offset)              \
+	(* RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT16(mbuf, offset)             \
+	(* RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT32(mbuf, offset)             \
+	(* RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT64(mbuf, offset)             \
+	(* RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset))
 /**@}*/
 
 /*
diff --git a/lib/librte_table/rte_table_array.c b/lib/librte_table/rte_table_array.c
index c031070..b00ca67 100644
--- a/lib/librte_table/rte_table_array.c
+++ b/lib/librte_table/rte_table_array.c
@@ -66,10 +66,8 @@ rte_table_array_create(void *params, int socket_id, uint32_t entry_size)
 	/* Check input parameters */
 	if ((p == NULL) ||
 	    (p->n_entries == 0) ||
-		(!rte_is_power_of_2(p->n_entries)) ||
-		((p->offset & 0x3) != 0)) {
+		(!rte_is_power_of_2(p->n_entries)))
 		return NULL;
-	}
 
 	/* Memory allocation */
 	total_cl_size = (sizeof(struct rte_table_array) +
diff --git a/lib/librte_table/rte_table_hash_ext.c b/lib/librte_table/rte_table_hash_ext.c
index 66e416b..73beeaf 100644
--- a/lib/librte_table/rte_table_hash_ext.c
+++ b/lib/librte_table/rte_table_hash_ext.c
@@ -149,19 +149,6 @@ check_params_create(struct rte_table_hash_ext_params *params)
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: signature_offset invalid value\n",
-			__func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: key_offset invalid value\n", __func__);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c
index f87ea0e..67a4249 100644
--- a/lib/librte_table/rte_table_hash_key16.c
+++ b/lib/librte_table/rte_table_hash_key16.c
@@ -89,18 +89,6 @@ check_params_create_lru(struct rte_table_hash_key16_lru_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature_offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key_offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE,
@@ -307,18 +295,6 @@ check_params_create_ext(struct rte_table_hash_key16_ext_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE,
diff --git a/lib/librte_table/rte_table_hash_key32.c b/lib/librte_table/rte_table_hash_key32.c
index 6790594..1fdb75d 100644
--- a/lib/librte_table/rte_table_hash_key32.c
+++ b/lib/librte_table/rte_table_hash_key32.c
@@ -89,18 +89,6 @@ check_params_create_lru(struct rte_table_hash_key32_lru_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
@@ -309,18 +297,6 @@ check_params_create_ext(struct rte_table_hash_key32_ext_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
diff --git a/lib/librte_table/rte_table_hash_key8.c b/lib/librte_table/rte_table_hash_key8.c
index 6803eb2..4dfa3c8 100644
--- a/lib/librte_table/rte_table_hash_key8.c
+++ b/lib/librte_table/rte_table_hash_key8.c
@@ -86,18 +86,6 @@ check_params_create_lru(struct rte_table_hash_key8_lru_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature_offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key_offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
@@ -300,18 +288,6 @@ check_params_create_ext(struct rte_table_hash_key8_ext_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature_offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key_offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
diff --git a/lib/librte_table/rte_table_hash_lru.c b/lib/librte_table/rte_table_hash_lru.c
index c9a8afd..b5393f0 100644
--- a/lib/librte_table/rte_table_hash_lru.c
+++ b/lib/librte_table/rte_table_hash_lru.c
@@ -126,19 +126,6 @@ check_params_create(struct rte_table_hash_lru_params *params)
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: signature_offset invalid value\n",
-			__func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: key_offset invalid value\n", __func__);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
diff --git a/lib/librte_table/rte_table_lpm.c b/lib/librte_table/rte_table_lpm.c
index 64c684d..3f60672 100644
--- a/lib/librte_table/rte_table_lpm.c
+++ b/lib/librte_table/rte_table_lpm.c
@@ -87,10 +87,6 @@ rte_table_lpm_create(void *params, int socket_id, uint32_t entry_size)
 			__func__);
 		return NULL;
 	}
-	if ((p->offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: Invalid offset\n", __func__);
-		return NULL;
-	}
 
 	entry_size = RTE_ALIGN(entry_size, sizeof(uint64_t));
 
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index ce4ddc0..df83ecf 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -93,10 +93,6 @@ rte_table_lpm_ipv6_create(void *params, int socket_id, uint32_t entry_size)
 			__func__);
 		return NULL;
 	}
-	if ((p->offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: Invalid offset\n", __func__);
-		return NULL;
-	}
 
 	entry_size = RTE_ALIGN(entry_size, sizeof(uint64_t));
 
-- 
2.1.0

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size
  2015-06-05  6:21  3%     ` Zhang, Helin
@ 2015-06-05 10:30  0%       ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-06-05 10:30 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Fri, Jun 05, 2015 at 06:21:52AM +0000, Zhang, Helin wrote:
> Hi Neil
> 
> Yes, thank you very much for the comments!
> I realized the ABI issue after I sent out the patch. I think even I put the new one the end of this structure, it may also have issue.
> I'd like to have this change announced and then get it merged. That means I'd like to make this change and follow the policy and process.
> 
> Regards,
> Helin
> 

Ok, sounds good.

Thanks!
Neil

> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Thursday, June 4, 2015 9:05 PM
> > To: Zhang, Helin
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key
> > size
> > 
> > On Thu, Jun 04, 2015 at 09:00:33AM +0800, Helin Zhang wrote:
> > > To support querying hash key size per port, an new field of
> > > 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> > > hash key size in bytes.
> > >
> > > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 16dbe00..004b05a 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -916,6 +916,7 @@ struct rte_eth_dev_info {
> > >  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> > >  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> > >  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> > > +	uint8_t hash_key_size; /**< Hash key size in bytes */
> > >  	uint16_t reta_size;
> > >  	/**< Device redirection table size, the total number of entries. */
> > >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> > > --
> > > 1.9.3
> > >
> > >
> > 
> > You'll need to at least move this to the end of the structure to avoid ABI breakage,
> > but even then, since the examples statically allocate this struct on the stack, you
> > need to worry about previously compiled applications not having enough space
> > allocated.  Is there a hole in the struct that this can fit into to avoid changing the
> > other member offsets?
> > Neil
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD
  2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
  2015-06-05  8:20  2%           ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-06-05  8:20 10%           ` [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-06-05  8:59  0%           ` Zhou, Danny
  2015-06-08  5:28  4%           ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  3 siblings, 0 replies; 200+ results
From: Zhou, Danny @ 2015-06-05  8:59 UTC (permalink / raw)
  To: Liang, Cunming, dev; +Cc: shemming, Wang, Liang-min

Acked-by: Danny Zhou <danny.zhou@intel.com>

> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, June 05, 2015 4:20 PM
> To: dev@dpdk.org
> Cc: shemming@brocade.com; david.marchand@6wind.com; thomas.monjalon@6wind.com; Zhou, Danny; Wang, Liang-min;
> Richardson, Bruce; Liu, Yong; nhorman@tuxdriver.com; Liang, Cunming
> Subject: [PATCH v11 00/13] Interrupt mode PMD
> 
> v11 changes
>  - typo cleanup and check kernel style
> 
> v10 changes
>  - code rework to return actual error code
>  - bug fix for lsc when using uio_pci_generic
> 
> v9 changes
>  - code rework to fix open comment
>  - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
>  - new patch to turn off the feature by default so as to avoid v2.1 abi broken
> 
> v8 changes
>  - remove condition check for only vfio-msix
>  - add multiplex intr support when only one intr vector allowed
>  - lsc and rxq interrupt runtime enable decision
>  - add safe event delete while the event wakeup execution happens
> 
> v7 changes
>  - decouple epoll event and intr operation
>  - add condition check in the case intr vector is disabled
>  - renaming some APIs
> 
> v6 changes
>  - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
>  - rewrite rte_intr_rx_wait/rte_intr_rx_set.
>  - using vector number instead of queue_id as interrupt API params.
>  - patch reorder and split.
> 
> v5 changes
>  - Rebase the patchset onto the HEAD
>  - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
>  - Export wait-for-rx interrupt function for shared libraries
>  - Split-off a new patch file for changed struct rte_intr_handle that
>    other patches depend on, to avoid breaking git bisect
>  - Change sample applicaiton to accomodate EAL function spec change
>    accordingly
> 
> v4 changes
>  - Export interrupt enable/disable functions for shared libraries
>  - Adjust position of new-added structure fields and functions to
>    avoid breaking ABI
> 
> v3 changes
>  - Add return value for interrupt enable/disable functions
>  - Move spinlok from PMD to L3fwd-power
>  - Remove unnecessary variables in e1000_mac_info
>  - Fix miscelleous review comments
> 
> v2 changes
>  - Fix compilation issue in Makefile for missed header file.
>  - Consolidate internal and community review comments of v1 patch set.
> 
> The patch series introduce low-latency one-shot rx interrupt into DPDK with
> polling and interrupt mode switch control example.
> 
> DPDK userspace interrupt notification and handling mechanism is based on UIO
> with below limitation:
> 1) It is designed to handle LSC interrupt only with inefficient suspended
>    pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
>    which then wakes up DPDK polling thread). In this way, it introduces
>    non-deterministic wakeup latency for DPDK polling thread as well as packet
>    latency if it is used to handle Rx interrupt.
> 2) UIO only supports a single interrupt vector which has to been shared by
>    LSC interrupt and interrupts assigned to dedicated rx queues.
> 
> This patchset includes below features:
> 1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
> 2) Build on top of the VFIO mechanism instead of UIO, so it could support
>    up to 64 interrupt vectors for rx queue interrupts.
> 3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
>    VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
>    user space.
> 4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
>    switch algorithms in L3fwd-power example.
> 
> Known limitations:
> 1) It does not work for UIO due to a single interrupt eventfd shared by LSC
>    and rx queue interrupt handlers causes a mess. [FIXED]
> 2) LSC interrupt is not supported by VF driver, so it is by default disabled
>    in L3fwd-power now. Feel free to turn in on if you want to support both LSC
>    and rx queue interrupts on a PF.
> 
> Cunming Liang (13):
>   eal/linux: add interrupt vectors support in intr_handle
>   eal/linux: add rte_epoll_wait/ctl support
>   eal/linux: add API to set rx interrupt event monitor
>   eal/linux: fix comments typo on vfio msi
>   eal/linux: add interrupt vectors handling on VFIO
>   eal/linux: standalone intr event fd create support
>   eal/linux: fix lsc read error in uio_pci_generic
>   eal/bsd: dummy for new intr definition
>   ethdev: add rx intr enable, disable and ctl functions
>   ixgbe: enable rx queue interrupts for both PF and VF
>   igb: enable rx queue interrupts for PF
>   l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
>     switch
>   abi: fix v2.1 abi broken issue
> 
>  drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
>  drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
>  examples/l3fwd-power/main.c                        | 206 ++++++--
>  lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
>  .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
>  lib/librte_ether/rte_ethdev.c                      | 109 +++++
>  lib/librte_ether/rte_ethdev.h                      | 132 ++++++
>  lib/librte_ether/rte_ether_version.map             |   4 +
>  13 files changed, 1853 insertions(+), 125 deletions(-)
> 
> --
> 1.8.1.4

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue
  2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
  2015-06-05  8:20  2%           ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-06-05  8:20 10%           ` Cunming Liang
  2015-06-05  8:59  0%           ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Zhou, Danny
  2015-06-08  5:28  4%           ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  3 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-05  8:20 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid ABI(unannounced) broken in v2.1.
The users should make sure understand the impact before turning on the feature.
There are two abi changes required in this interrupt patch set.
They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 v9 Acked-by: vincent jardin <vincent.jardin@6wind.com>

 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1894,6 +1913,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3750,6 +3770,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3791,6 +3812,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3800,18 +3822,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_EAL_RX_INTR
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3868,6 +3893,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index bcec971..3a70af6 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1481,7 +1487,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1514,6 +1522,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1532,6 +1541,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1619,9 +1629,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1727,12 +1739,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2335,6 +2349,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_EAL_RX_INTR
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2345,6 +2360,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3127,7 +3143,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3160,6 +3178,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3177,7 +3196,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3223,19 +3242,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3246,11 +3269,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3834,6 +3859,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3902,21 +3928,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3927,6 +3957,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3937,18 +3968,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4000,6 +4034,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 538bb93..3b4054c 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,6 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
 	},
 };
 
@@ -889,7 +888,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index fc2c46b..f0f6a3f 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -49,9 +49,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 300ebb1..efab896 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_EAL_RX_INTR
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_EAL_RX_INTR
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -876,6 +884,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -919,6 +928,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1057,6 +1067,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1168,3 +1179,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 912cc50..a2056bd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_EAL_RX_INTR
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_EAL_RX_INTR
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 27a87f5..3f6e1f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3352,6 +3353,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c199d32..8bea68d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_EAL_RX_INTR
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2943,8 +2945,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2967,9 +2981,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions
  2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
@ 2015-06-05  8:20  2%           ` Cunming Liang
  2015-06-05  8:20 10%           ` [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue Cunming Liang
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-05  8:20 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addtion, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per port or per queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>

fix by http://www.dpdk.org/dev/patchwork/patch/4784/
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add addtion check for EEXIT

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..27a87f5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1386,6 +1396,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2868,6 +2882,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD
  2015-06-02  6:53  4%       ` [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
  2015-06-02  6:53  2%         ` [dpdk-dev] [PATCH v10 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-06-02  6:53 10%         ` [dpdk-dev] [PATCH v10 13/13] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-06-05  8:19  4%         ` Cunming Liang
  2015-06-05  8:20  2%           ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
                             ` (3 more replies)
  2 siblings, 4 replies; 200+ results
From: Cunming Liang @ 2015-06-05  8:19 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

v11 changes
 - typo cleanup and check kernel style 

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by default so as to avoid v2.1 abi broken

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (13):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 206 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1853 insertions(+), 125 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 0/4] enable mirror functionality in i40e driver
  @ 2015-06-05  8:16  3% ` Jingjing Wu
  0 siblings, 0 replies; 200+ results
From: Jingjing Wu @ 2015-06-05  8:16 UTC (permalink / raw)
  To: dev

This patch set enables mirror functionality in i40e driver, and redefines structure and macros used to configure mirror.
 
v2 changes:
 - correct comments style
 - add doc change
 
v3 changes:
 - change the mirror rule type to support bit mask and avoid ABI broken
 - fix code style

Jingjing Wu (4):
  ethdev: rename rte_eth_vmdq_mirror_conf
  ethdev: redefine the mirror type
  i40e: enable mirror functionality in i40e driver
  doc: modify the command about mirror in testpmd guide

 app/test-pmd/cmdline.c                      |  62 +++---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   8 +-
 drivers/net/i40e/i40e_ethdev.c              | 334 ++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev.h              |  23 ++
 drivers/net/ixgbe/ixgbe_ethdev.c            |  64 ++++--
 drivers/net/ixgbe/ixgbe_ethdev.h            |   4 +-
 lib/librte_ether/rte_ethdev.c               |  28 +--
 lib/librte_ether/rte_ethdev.h               |  30 +--
 8 files changed, 467 insertions(+), 86 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] abi: announce abi changes plan for interrupt mode
@ 2015-06-05  7:40 23% Cunming Liang
  0 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-05  7:40 UTC (permalink / raw)
  To: dev, nhorman; +Cc: shemming

It announces the planned ABI changes for interrupt mode on v2.2. 
The feature will turn off by default so as to avoid v2.1 ABI broken.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..4c9bf85 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The ABI changes are planned for struct rte_intr_handle and struct rte_eth_conf in order to support interrupt mode feature. The upcoming release 2.1 will not contain these ABI changes by default, but release 2.2 will, and no backwards compatibility is planed due to the additional interrupt mode feature enabling. Binaries using this library build prior to version 2.2 will require updating and recompilation.
-- 
1.8.1.4

^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size
  2015-06-04 13:05  3%   ` Neil Horman
@ 2015-06-05  6:21  3%     ` Zhang, Helin
  2015-06-05 10:30  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Zhang, Helin @ 2015-06-05  6:21 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

Hi Neil

Yes, thank you very much for the comments!
I realized the ABI issue after I sent out the patch. I think even I put the new one the end of this structure, it may also have issue.
I'd like to have this change announced and then get it merged. That means I'd like to make this change and follow the policy and process.

Regards,
Helin

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, June 4, 2015 9:05 PM
> To: Zhang, Helin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key
> size
> 
> On Thu, Jun 04, 2015 at 09:00:33AM +0800, Helin Zhang wrote:
> > To support querying hash key size per port, an new field of
> > 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> > hash key size in bytes.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > ---
> >  lib/librte_ether/rte_ethdev.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 16dbe00..004b05a 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -916,6 +916,7 @@ struct rte_eth_dev_info {
> >  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> >  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> >  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> > +	uint8_t hash_key_size; /**< Hash key size in bytes */
> >  	uint16_t reta_size;
> >  	/**< Device redirection table size, the total number of entries. */
> >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> > --
> > 1.9.3
> >
> >
> 
> You'll need to at least move this to the end of the structure to avoid ABI breakage,
> but even then, since the examples statically allocate this struct on the stack, you
> need to worry about previously compiled applications not having enough space
> allocated.  Is there a hole in the struct that this can fit into to avoid changing the
> other member offsets?
> Neil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 8/9] mk, scripts: remove useless blank lines
  2015-06-04 14:43  3% [dpdk-dev] [PATCH 0/9] whitespace cleanups Stephen Hemminger
@ 2015-06-04 14:43 14% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-06-04 14:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Signed-off-by: Stephen Hemminger <stephen@neworkplumber.org>
---
 mk/rte.extapp.mk                         | 1 -
 mk/rte.extlib.mk                         | 1 -
 mk/rte.extobj.mk                         | 1 -
 mk/toolchain/gcc/rte.toolchain-compat.mk | 1 -
 mk/toolchain/icc/rte.toolchain-compat.mk | 1 -
 scripts/gen-config-h.sh                  | 1 -
 scripts/validate-abi.sh                  | 2 --
 7 files changed, 8 deletions(-)

diff --git a/mk/rte.extapp.mk b/mk/rte.extapp.mk
index 40ff82c..b4d1ef6 100644
--- a/mk/rte.extapp.mk
+++ b/mk/rte.extapp.mk
@@ -50,4 +50,3 @@ all:
 else
 include $(RTE_SDK)/mk/rte.app.mk
 endif
-
diff --git a/mk/rte.extlib.mk b/mk/rte.extlib.mk
index ac5e84f..ba066bc 100644
--- a/mk/rte.extlib.mk
+++ b/mk/rte.extlib.mk
@@ -50,4 +50,3 @@ all:
 else
 include $(RTE_SDK)/mk/rte.lib.mk
 endif
-
diff --git a/mk/rte.extobj.mk b/mk/rte.extobj.mk
index cb2f996..253de28 100644
--- a/mk/rte.extobj.mk
+++ b/mk/rte.extobj.mk
@@ -50,4 +50,3 @@ all:
 else
 include $(RTE_SDK)/mk/rte.obj.mk
 endif
-
diff --git a/mk/toolchain/gcc/rte.toolchain-compat.mk b/mk/toolchain/gcc/rte.toolchain-compat.mk
index 05aa37f..61bb5b7 100644
--- a/mk/toolchain/gcc/rte.toolchain-compat.mk
+++ b/mk/toolchain/gcc/rte.toolchain-compat.mk
@@ -84,4 +84,3 @@ else
 		MACHINE_CFLAGS := $(filter-out -march% -mtune% -msse%,$(MACHINE_CFLAGS))
 	endif
 endif
-
diff --git a/mk/toolchain/icc/rte.toolchain-compat.mk b/mk/toolchain/icc/rte.toolchain-compat.mk
index 621afcd..4134466 100644
--- a/mk/toolchain/icc/rte.toolchain-compat.mk
+++ b/mk/toolchain/icc/rte.toolchain-compat.mk
@@ -73,4 +73,3 @@ else
 		MACHINE_CFLAGS := $(patsubst -march=%,-xSSE3,$(MACHINE_CFLAGS))
 	endif
 endif
-
diff --git a/scripts/gen-config-h.sh b/scripts/gen-config-h.sh
index d36efd6..1a2436c 100755
--- a/scripts/gen-config-h.sh
+++ b/scripts/gen-config-h.sh
@@ -42,4 +42,3 @@ sed 's,CONFIG_\(.*\)=\(.*\)$,#undef \1\
 #define \1 \2,' |
 sed 's,\# CONFIG_\(.*\) is not set$,#undef \1,'
 echo "#endif /* __RTE_CONFIG_H */"
-
diff --git a/scripts/validate-abi.sh b/scripts/validate-abi.sh
index 369ea8a..1747b8b 100755
--- a/scripts/validate-abi.sh
+++ b/scripts/validate-abi.sh
@@ -241,5 +241,3 @@ done
 git reset --hard
 log "INFO" "ABI CHECK COMPLETE.  REPORTS ARE IN compat_report directory"
 cleanup_and_exit 0
-
-
-- 
2.1.4

^ permalink raw reply	[relevance 14%]

* [dpdk-dev] [PATCH 0/9] whitespace cleanups
@ 2015-06-04 14:43  3% Stephen Hemminger
  2015-06-04 14:43 14% ` [dpdk-dev] [PATCH 8/9] mk, scripts: remove useless blank lines Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-04 14:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Ran the current code base through a script which:
  - removes trailing whitespace
  - removes space before tabs
  - removes blank lines at end of file


Stephen Hemminger (9):
  kni: fix whitespace
  eal: fix whitespace
  cmdline: fix whitespace
  vhost: fix trailing whitespace
  lib: fix misc whitespace
  app: fix whitespace
  examples: fix whitespace
  mk, scripts: remove useless blank lines
  drivers: fix whitespace

 app/cmdline_test/cmdline_test.py                   |  3 +-
 app/cmdline_test/cmdline_test_data.py              |  1 -
 app/test-pmd/csumonly.c                            |  1 -
 app/test-pmd/mempool_anon.c                        |  4 +-
 app/test-pmd/testpmd.c                             |  8 ++--
 app/test/autotest.py                               |  1 -
 app/test/autotest_data.py                          | 28 ++++++-------
 app/test/autotest_runner.py                        | 41 ++++++++++---------
 app/test/autotest_test_funcs.py                    |  1 -
 app/test/process.h                                 | 12 +++---
 app/test/test_acl.h                                |  4 +-
 app/test/test_sched.c                              |  8 ++--
 drivers/net/e1000/em_rxtx.c                        |  1 -
 drivers/net/e1000/igb_rxtx.c                       |  1 -
 drivers/net/pcap/rte_eth_pcap.c                    |  2 +-
 examples/cmdline/commands.c                        |  1 -
 .../config_files/shumway/dh89xxcc_qa_dev0.conf     |  2 -
 .../config_files/shumway/dh89xxcc_qa_dev1.conf     |  2 -
 .../config_files/stargo/dh89xxcc_qa_dev0.conf      |  2 -
 examples/kni/main.c                                |  1 -
 examples/l2fwd/main.c                              |  1 -
 examples/l3fwd-power/main.c                        |  8 ++--
 .../client_server_mp/mp_server/args.c              |  1 -
 examples/netmap_compat/lib/compat_netmap.c         |  6 +--
 examples/qos_sched/app_thread.c                    |  2 -
 examples/qos_sched/args.c                          |  1 -
 examples/qos_sched/cfg_file.c                      |  2 -
 examples/qos_sched/main.c                          |  1 -
 examples/qos_sched/main.h                          |  2 +-
 examples/qos_sched/stats.c                         |  1 -
 examples/quota_watermark/qw/init.c                 |  2 +-
 examples/quota_watermark/qw/init.h                 |  1 -
 examples/vhost/main.c                              |  9 ++---
 examples/vhost_xen/main.c                          | 16 ++++----
 examples/vhost_xen/vhost_monitor.c                 |  6 +--
 lib/librte_cmdline/cmdline_cirbuf.c                |  1 -
 lib/librte_cmdline/cmdline_parse.c                 |  1 -
 lib/librte_cmdline/cmdline_rdline.c                |  1 -
 lib/librte_compat/rte_compat.h                     |  8 ++--
 lib/librte_eal/bsdapp/contigmem/contigmem.c        |  1 -
 lib/librte_eal/bsdapp/eal/Makefile                 |  1 -
 lib/librte_eal/bsdapp/eal/eal.c                    |  1 -
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  1 -
 lib/librte_eal/common/eal_common_hexdump.c         |  3 +-
 lib/librte_eal/common/eal_common_launch.c          |  1 -
 lib/librte_eal/common/eal_common_log.c             |  1 -
 lib/librte_eal/common/include/generic/rte_cycles.h |  8 ++--
 lib/librte_eal/common/include/rte_hexdump.h        | 20 +++++-----
 lib/librte_eal/common/include/rte_interrupts.h     |  1 -
 lib/librte_eal/common/include/rte_memory.h         |  2 +-
 lib/librte_eal/common/include/rte_pci.h            | 10 ++---
 lib/librte_eal/linuxapp/eal/Makefile               |  1 -
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  1 -
 .../linuxapp/kni/ethtool/igb/e1000_api.c           |  1 -
 .../linuxapp/kni/ethtool/igb/e1000_manage.c        |  2 -
 .../linuxapp/kni/ethtool/igb/e1000_mbx.c           |  1 -
 .../linuxapp/kni/ethtool/igb/e1000_nvm.c           |  2 -
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h      |  2 +-
 .../linuxapp/kni/ethtool/igb/igb_debugfs.c         |  1 -
 .../linuxapp/kni/ethtool/igb/igb_ethtool.c         | 26 ++++++------
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c |  2 +-
 .../linuxapp/kni/ethtool/igb/igb_param.c           |  3 +-
 .../linuxapp/kni/ethtool/igb/igb_procfs.c          | 46 +++++++++++-----------
 .../linuxapp/kni/ethtool/igb/igb_regtest.h         |  2 -
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_vmdq.c |  1 -
 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h  |  4 +-
 .../linuxapp/kni/ethtool/igb/kcompat_ethtool.c     | 11 +++---
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_82599.c       |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_api.c         |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_common.c      |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_main.c        |  5 ---
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_sriov.h       |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_x540.c        |  7 ++--
 .../linuxapp/kni/ethtool/ixgbe/kcompat.h           |  2 +-
 lib/librte_eal/linuxapp/kni/kni_dev.h              |  1 -
 lib/librte_eal/linuxapp/kni/kni_misc.c             |  1 -
 lib/librte_eal/linuxapp/kni/kni_vhost.c            |  3 +-
 lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c    |  2 +-
 lib/librte_hash/rte_fbk_hash.c                     |  1 -
 lib/librte_kni/rte_kni.h                           |  1 -
 lib/librte_lpm/rte_lpm.c                           |  1 -
 lib/librte_malloc/malloc_heap.c                    |  1 -
 lib/librte_vhost/libvirt/qemu-wrap.py              | 13 +++---
 lib/librte_vhost/vhost_rxtx.c                      |  2 +-
 mk/rte.extapp.mk                                   |  1 -
 mk/rte.extlib.mk                                   |  1 -
 mk/rte.extobj.mk                                   |  1 -
 mk/toolchain/gcc/rte.toolchain-compat.mk           |  1 -
 mk/toolchain/icc/rte.toolchain-compat.mk           |  1 -
 scripts/gen-config-h.sh                            |  1 -
 scripts/validate-abi.sh                            |  2 -
 91 files changed, 160 insertions(+), 242 deletions(-)

-- 
2.1.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size
  @ 2015-06-04 13:05  3%   ` Neil Horman
  2015-06-05  6:21  3%     ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-06-04 13:05 UTC (permalink / raw)
  To: Helin Zhang; +Cc: dev

On Thu, Jun 04, 2015 at 09:00:33AM +0800, Helin Zhang wrote:
> To support querying hash key size per port, an new field of
> 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> hash key size in bytes.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16dbe00..004b05a 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -916,6 +916,7 @@ struct rte_eth_dev_info {
>  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
>  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
>  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> +	uint8_t hash_key_size; /**< Hash key size in bytes */
>  	uint16_t reta_size;
>  	/**< Device redirection table size, the total number of entries. */
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> -- 
> 1.9.3
> 
> 

You'll need to at least move this to the end of the structure to avoid ABI
breakage, but even then, since the examples statically allocate this struct on
the stack, you need to worry about previously compiled applications not having
enough space allocated.  Is there a hole in the struct that this can fit into to
avoid changing the other member offsets?
Neil

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size Helin Zhang
@ 2015-06-04 10:38  3%     ` Ananyev, Konstantin
  2015-06-12  6:06  0%       ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-06-04 10:38 UTC (permalink / raw)
  To: Zhang, Helin, dev

Hi Helin,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Helin Zhang
> Sent: Thursday, June 04, 2015 8:34 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
> 
> To support querying hash key size per port, an new field of
> 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> hash key size in bytes.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> v2 changes:
> * Disabled the code changes by default, to avoid breaking ABI compatibility.
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16dbe00..bdebc87 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -916,6 +916,9 @@ struct rte_eth_dev_info {
>  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
>  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
>  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> +#ifdef RTE_QUERY_HASH_KEY_SIZE
> +	uint8_t hash_key_size; /**< Hash key size in bytes */
> +#endif
>  	uint16_t reta_size;
>  	/**< Device redirection table size, the total number of entries. */
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */

Why do you need to introduce an #ifdef RTE_QUERY_HASH_KEY_SIZE
around your code?
Why not to have it always on?
Is it because of not breaking ABI for 2.1?
But here, I suppose there would be no breakage anyway:

struct rte_eth_dev_info {
...
        uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
        uint16_t reta_size;
        /**< Device redirection table size, the total number of entries. */
        /** Bit mask of RSS offloads, the bit offset also means flow type */
        uint64_t flow_type_rss_offloads;
        struct rte_eth_rxconf default_rxconf;
 

so between 'reta_size' and 'flow_type_rss_offloads', there is a 2 bytes gap.
Wonder, why not put it there?

Konstantin

> --
> 1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/6] query hash key size in byte
    @ 2015-06-04  7:33  3% ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size Helin Zhang
                     ` (6 more replies)
  1 sibling, 7 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

As different hardware has different hash key sizes, querying it (in byte)
per port was asked by users. Otherwise there is no convenient way to know
the size of hash key which should be prepared.

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

Helin Zhang (6):
  ethdev: add an field for querying hash key size
  e1000: fill the hash key size
  fm10k: fill the hash key size
  i40e: fill the hash key size
  ixgbe: fill the hash key size
  app/testpmd: show the hash key size

 app/test-pmd/config.c             | 4 ++++
 drivers/net/e1000/igb_ethdev.c    | 5 +++++
 drivers/net/fm10k/fm10k_ethdev.c  | 3 +++
 drivers/net/i40e/i40e_ethdev.c    | 4 ++++
 drivers/net/i40e/i40e_ethdev_vf.c | 4 ++++
 drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++++
 lib/librte_ether/rte_ethdev.h     | 3 +++
 7 files changed, 28 insertions(+)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 6/6] app/testpmd: show the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (4 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

As querying hash key size in byte was supported, it can be shown
in testpmd after getting the device information if not zero.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pmd/config.c | 4 ++++
 1 file changed, 4 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f788ed5..a9ec065 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -361,6 +361,10 @@ port_infos_display(portid_t port_id)
 
 	memset(&dev_info, 0, sizeof(dev_info));
 	rte_eth_dev_info_get(port_id, &dev_info);
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	if (dev_info.hash_key_size > 0)
+		printf("Hash key size in bytes: %u\n", dev_info.hash_key_size);
+#endif
 	if (dev_info.reta_size > 0)
 		printf("Redirection table size: %u\n", dev_info.reta_size);
 	if (!dev_info.flow_type_rss_offloads)
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 5/6] ixgbe: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (3 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: show " Helin Zhang
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 +++++
 1 file changed, 5 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..b6f2574 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -116,6 +116,8 @@
 
 #define IXGBE_QUEUE_STAT_COUNTERS (sizeof(hw_stats->qprc) / sizeof(hw_stats->qprc[0]))
 
+#define IXGBE_HKEY_MAX_INDEX 10
+
 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
@@ -2052,6 +2054,9 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
 				ETH_TXQ_FLAGS_NOOFFLOADS,
 	};
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
+#endif
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
 }
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 4/6] i40e: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (2 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 3/6] fm10k: " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Helin Zhang
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 4 ++++
 drivers/net/i40e/i40e_ethdev_vf.c | 4 ++++
 2 files changed, 8 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..63c76a6 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1540,6 +1540,10 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_SCTP_CKSUM |
 		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_TCP_TSO;
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
+#endif
 	dev_info->reta_size = pf->hash_lut_size;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 9f92a2f..486b394 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1641,6 +1641,10 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
+#endif
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 2/6] e1000: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 3/6] fm10k: " Helin Zhang
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_ethdev.c | 5 +++++
 1 file changed, 5 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..b85b786 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -68,6 +68,8 @@
 #define IGB_DEFAULT_TX_HTHRESH      0
 #define IGB_DEFAULT_TX_WTHRESH      0
 
+#define IGB_HKEY_MAX_INDEX 10
+
 /* Bit shift and mask */
 #define IGB_4_BIT_WIDTH  (CHAR_BIT / 2)
 #define IGB_4_BIT_MASK   RTE_LEN2MASK(IGB_4_BIT_WIDTH, uint8_t)
@@ -1377,6 +1379,9 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		/* Should not happen */
 		break;
 	}
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = IGB_HKEY_MAX_INDEX * sizeof(uint32_t);
+#endif
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IGB_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 3/6] fm10k: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 2/6] e1000: fill the " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Helin Zhang
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into the
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 87852ed..043c60a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -767,6 +767,9 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		DEV_RX_OFFLOAD_UDP_CKSUM  |
 		DEV_RX_OFFLOAD_TCP_CKSUM;
 	dev_info->tx_offload_capa    = 0;
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = FM10K_RSSRK_SIZE * sizeof(uint32_t);
+#endif
 	dev_info->reta_size = FM10K_MAX_RSS_INDICES;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04 10:38  3%     ` Ananyev, Konstantin
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 2/6] e1000: fill the " Helin Zhang
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

To support querying hash key size per port, an new field of
'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
hash key size in bytes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 3 +++
 1 file changed, 3 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..bdebc87 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -916,6 +916,9 @@ struct rte_eth_dev_info {
 	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
 	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
 	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	uint8_t hash_key_size; /**< Hash key size in bytes */
+#endif
 	uint16_t reta_size;
 	/**< Device redirection table size, the total number of entries. */
 	/** Bit mask of RSS offloads, the bit offset also means flow type */
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] add support for HTM lock elision for x86
       [not found]     ` <CADNuJVpeKa9-R7WHkoCzw82vpYd=3XmhOoz2JfGsFLzDW+F5UQ@mail.gmail.com>
@ 2015-06-02 13:39  0%   ` Dementiev, Roman
  0 siblings, 0 replies; 200+ results
From: Dementiev, Roman @ 2015-06-02 13:39 UTC (permalink / raw)
  To: Jay Rolette; +Cc: DPDK

Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen, Deutschland
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk
Registergericht: Muenchen HRB 47456
Ust.-IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
From yong.liu@intel.com  Tue Jun  2 16:09:11 2015
Return-Path: <yong.liu@intel.com>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id 23E59C334
 for <dev@dpdk.org>; Tue,  2 Jun 2015 16:09:09 +0200 (CEST)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga101.fm.intel.com with ESMTP; 02 Jun 2015 07:08:47 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.13,540,1427785200"; d="scan'208";a="735637182"
Received: from kmsmsx152.gar.corp.intel.com ([172.21.73.87])
 by fmsmga002.fm.intel.com with ESMTP; 02 Jun 2015 07:08:45 -0700
Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by
 KMSMSX152.gar.corp.intel.com (172.21.73.87) with Microsoft SMTP Server (TLS)
 id 14.3.224.2; Tue, 2 Jun 2015 22:08:44 +0800
Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.23]) by
 shsmsx102.ccr.corp.intel.com ([169.254.2.109]) with mapi id 14.03.0224.002;
 Tue, 2 Jun 2015 22:08:43 +0800
From: "Liu, Yong" <yong.liu@intel.com>
To: "Liang, Cunming" <cunming.liang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [PATCH v10 00/13] Interrupt mode PMD
Thread-Index: AQHQnQDjRUxaR9oYIk2Xx85CWpKmRJ2ZQSHw
Date: Tue, 2 Jun 2015 14:08:43 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E10E37490@SHSMSX103.ccr.corp.intel.com>
References: <1432889125-20255-1-git-send-email-cunming.liang@intel.com>
 <1433228006-24661-1-git-send-email-cunming.liang@intel.com>
In-Reply-To: <1433228006-24661-1-git-send-email-cunming.liang@intel.com>
Accept-Language: zh-CN, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 02 Jun 2015 14:09:11 -0000

Tested-by: Yong Liu <yong.liu@intel.com>

- Tested Commit: 7c4c66bf666b8059ed0ad2f2478ef349b3272f51
- OS: Fedora20 3.15.5
- GCC: gcc version 4.8.3 20140911
- CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
- NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb]
- NIC: Intel Corporation I350 Gigabit Network Connection [8086:1521]
- Default x86_64-native-linuxapp-gcc configuration
- Prerequisites: vfio related case request vt-d enable in bios and IOMMU enable in kernel 
- Total 17 cases, 17 passed, 0 failed

- Case: pf_lsc_igbuio_legacy
  Description: check when pf bound to igb_uio with legacy mode, link status change interrupt can be normally handled
  Command / instruction:
    Insmod igb_uio driver with legacy interrupt mode
      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko intr_mode=legacy
    Change port config to lsc enable and rxq disable in l3fwd-power/main.c
    Build l3fwd-power and start l3fwd-power with 2 ports  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Change tester port0 link down and verify link down detected on dut port0 
      Port 0: link down
    Change tester port0 link up and verify link up detected on dut port0
      Port 0: link up
    Change tester port1 link down and verify link down detected on dut port1 
      Port 1: link down
    Change tester port1 link up and verify link up detected on dut port1
      Port 1: link up
  
    Change port config to lsc enable and rxq enable in l3fwd-power/main.c
    Build l3fwd-power and start l3fwd-power with 2 ports
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify lsc disabled for can't enable lsc and rxq in the same time when pf bound to igb_uio
      lsc won't enable because of no intr multiplex
  
- Case: pf_lsc_igbuio_msix
  Description: check when pf bound to igb_uio with msix mode, link status change interrupt can be normally handled
  Command / instruction:
    Insmod igb_uio driver with msix interrupt mode
      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko intr_mode=msix
    Verify link status can be normally handled like previous case pf_lsc_igbuio_legacy.

- Case: pf_lsc_vfio_legacy
  Description: check when pf bound to vfio with legacy mode, link status change interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
      modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Change port config to lsc enable and rxq disable in l3fwd-power/main.c
    Start l3fwd-power with vfio legacy mode
      l3fwd-power -c 0x6 -n 3 --vfio-intr=legacy -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Check link status change interrupt can be normally handled like previous case.
	
    Change port config to lsc enable and rxq enable in l3fwd-power/main.c
    Start l3fwd-power with vfio legacy mode
      l3fwd-power -c 0x6 -n 3 --vfio-intr=legacy -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify lsc disabled for can't enable lsc and rxq in the same time with legacy mode.
	
- Case: pf_lsc_vfio_msi
  Description: check when pf bound to vfio with msi mode, link status change interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
      modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Change port config to lsc enable and rxq disable in l3fwd-power/main.c
    Start l3fwd-power with vfio msi mode
      l3fwd-power -c 0x6 -n 3 --vfio-intr=msi -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Check link status change interrupt can be normally handled like previous case.
	
    Change port config to lsc enable and rxq enable in l3fwd-power/main.c
    Start l3fwd-power with vfio msi mode
      l3fwd-power -c 0x6 -n 3 --vfio-intr=msi -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify lsc disabled for can't enable lsc and rxq in the same time with legacy mode.

- Case: pf_lsc_vfio_msix
  Description: check when pf bound to vfio with msix mode, link status change interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
      modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Change port config to lsc enable and rxq disable in l3fwd-power/main.c
    Start l3fwd-power with vfio msix mode
      l3fwd-power -c 0x6 -n 3 --vfio-intr=msix -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Check link status change interrupt can be normally handled like previous case.
	
    Change port config to lsc enable and rxq enable in l3fwd-power/main.c
    Start l3fwd-power with vfio msix mode
      l3fwd-power -c 0x6 -n 3 --vfio-intr=msix -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Check link status change interrupt can be normally handled like previous case.
	
- Case: pf_rxq_on_vfio_msix
  Description: check when pf bound to vfio with default msix mode, receive packet interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
          modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Send packet from tester port0 and verify dut core1 wakeup and then sleep.
      lcore 1 is waked up from rx interrupt on port 0 queue 0
      lcore 1 sleeps until interrupt triggers
    Send packet from tester port1 and verify dut core2 wakeup and then sleep.
      lcore 2 is waked up from rx interrupt on port 1 queue 0
      lcore 2 sleeps until interrupt triggers
	  
- Case: pf_rxq_on_vfio_msi
  Description: check when pf bound to vfio with msi mode, receive packet interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
          modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 --vfio-intr=msi -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify packet interrupt can be normally handled like previous case pf_rxq_on_vfio_msix.
  
- Case: pf_rxq_on_vfio_legacy
  Description: check when pf bound to vfio with legacy mode, receive packet interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
      modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 --vfio-intr=legacy -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify packet interrupt can be normally handled like previous case pf_rxq_on_vfio_msix.

- Case: pf_onecore_on_vfio
  Description: check when all pf devices bound to one core, receive packet interrupt can be normally handled
  Command / instruction:
    Do prerequisites for vfio driver then bind device to vfio-driver
      modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:00.0 08:00.1
    Start l3fwd-power with 2 ports and 1 cores.  
      l3fwd-power -c 0x2 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,1)"
    Verify packet interrupt can be normally handled like previous case pf_rxq_on_vfio_msix.
	
- Case: pf_multiqueue_on_vfio
  Description: check when pf device has mulit queues, receive packet interrupt can be normally handled
  Command / instruction:
    Start l3fwd-power with 2 ports and 4 cores.
      l3fwd-power -c 0x100000e -n 3 -- -p 0x3 -P --config="(0,0,1),(0,1,2),(1,0,3),(1,1,24)"
    Send enough packets with different destination ip address.
	  sendp([Ether()/IP(dst="127.0.0.X")/UDP()/Raw('0'*18)], iface="p786p1")
    Verify all cores wakeup and then sleep as expected.

- Case: pf_maxqueue_on_vfio
  Description: check when pf device has maximum queues, receive packet interrupt can be normally handled
  Command / instruction:
    Start l3fwd-power with 2 ports and 32 cores [only for niantic], different nic has different maximum rx queues
      l3fwd-power -c 0x3fdfe3fdfe -n 3 -- -p 0x3 -P --config="(0,0,1),(0,1,21),(0,2,2),(0,3,22),\
          (0,4,3),(0,5,23),(0,6,24),(0,7,4),(0,8,25),(0,9,5),(0,10,26),(0,11,6),(0,12,27),(0,13,7),\
          (0,14,8),(0,15,28),(1,0,10),(1,1,30),(1,2,11),(1,3,31),(1,4,32),(1,5,12),(1,6,33),(1,7,13),\
          (1,8,34),(1,9,14),(1,10,35),(1,11,15),(1,12,16),(1,13,36),(1,14,17),(1,15,37),"
    Send enough packets with different destination ip address.
	  sendp([Ether()/IP(dst="127.0.0.X")/UDP()/Raw('0'*18)], iface="p786p1")
    Verify all cores wakeup and then sleep as expected.	
	
- Case: pf_rxq_on_igbuio_legacy
  Description: check when pf bound to igb_uio with legacy mode, receive packet interrupt can be normally handled
  Command / instruction:
    Insmod igb_uio driver with legacy interrupt mode
      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko intr_mode=legacy
      ./tools/dpdk_nic_bind.py --bind=igb_uio 08:00.0 08:00.1  
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify packet interrupt can be normally handled like previous case pf_rxq_on_vfio_msix.
  
- Case: pf_rxq_on_igbuio_msix
  Description: check when pf bound to igb_uio with msix mode, receive packet interrupt can be normally handled
  Command / instruction:
    Insmod igb_uio driver with msix interrupt mode
      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko intr_mode=msix
      ./tools/dpdk_nic_bind.py --bind=igb_uio 08:00.0 08:00.1  
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify packet interrupt can be normally handled like previous case pf_rxq_on_vfio_msix.

- Case: pf_rxq_on_uiopcigeneric
  Description: check when pf bound to uio_pci_generic, receive packet interrupt can be normally handled
  Command / instruction:
    Insmod uio_pci_generic driver and bind pf device on it.
      ./tools/dpdk_nic_bind.py --bind=uio_pci_generic 08:00.0 08:00.1  
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify packet interrupt can be normally handled like previous case pf_rxq_on_vfio_msix.	

- Case: pf_lsc_on_uiopcigeneric
  Description: check when pf bound to uio_pci_generic, link status changed interrupt can be normally handled
  Command / instruction:
    Insmod uio_pci_generic driver and bind pf device on it.
      ./tools/dpdk_nic_bind.py --bind=uio_pci_generic 08:00.0 08:00.1  
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Change tester port0 link down and verify link down detected on dut port0 
      Port 0: link down
    Change tester port0 link up and verify link up detected on dut port0
      Port 0: link up
    Change tester port1 link down and verify link down detected on dut port1 
      Port 1: link down
    Change tester port1 link up and verify link up detected on dut port1
      Port 1: link up
 
    Change port config to lsc enable and rxq enable in l3fwd-power/main.c
    Build l3fwd-power and start l3fwd-power with 2 ports
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Verify lsc disabled for can't enable lsc and rxq in the same time when pf bound to uio_pci_generic
      lsc won't enable because of no intr multiplex	
	
- Case: vf_in_vm_rxq
  Description: check when vf bound to igb_uio in virtual machine, receive packet interrupt can be normally handled
               Only support niantic by now.
  Command / instruction:
    Create vf devices and bound into virtual machine
      echo 1 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_numvfs
      echo 1 > /sys/bus/pci/devices/0000\:08\:00.1/sriov_numvfs
      virsh
      virsh # nodedev-dettach pci_0000_08_10_0
      virsh # nodedev-dettach pci_0000_08_10_1
    Start virtual machine and bind vf devices to driver igb_uio.
      ./tools/dpdk_nic_bind.py --bind=igb_uio eth1 eth2
    Change port config to lsc disable and rxq enable in l3fwd-power/main.c	
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Send packet from tester port0 with promisc mac and verify vm core1 wakeup and then sleep.
      lcore 1 is waked up from rx interrupt on port 0 queue 0
      lcore 1 sleeps until interrupt triggers
    Send packet from tester port1 with promisc mac and verify vm core2 wakeup and then sleep.
      lcore 2 is waked up from rx interrupt on port 1 queue 0
      lcore 2 sleeps until interrupt triggers
	  
- Case: vf_in_host_rxq
  Description: check when vf bound to vfio with msix mode, receive packet interrupt can be normally handled
               Only support niantic by now.
  Command / instruction:
    Create vf devices and bound to vfio
      echo 1 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_numvfs
      echo 1 > /sys/bus/pci/devices/0000\:08\:00.1/sriov_numvfs
      modprobe vfio
      modprobe vfio-pci
      ./tools/dpdk_nic_bind.py --bind=vfio-pci 08:10.0 08:10.1
    Start l3fwd-power with 2 ports and 2 cores.  
      l3fwd-power -c 0x6 -n 3 -- -p 0x3 -P --config="(0,0,1),(1,0,2)"
    Send packet from tester port0 with promisc mac and verify dut core1 wakeup and then sleep.
      lcore 1 is waked up from rx interrupt on port 0 queue 0
      lcore 1 sleeps until interrupt triggers
    Send packet from tester port1 with promisc mac and verify dut core2 wakeup and then sleep.
      lcore 2 is waked up from rx interrupt on port 1 queue 0
      lcore 2 sleeps until interrupt triggers

> -----Original Message-----
> From: Liang, Cunming
> Sent: Tuesday, June 02, 2015 2:53 PM
> To: dev@dpdk.org
> Cc: shemming@brocade.com; david.marchand@6wind.com;
> thomas.monjalon@6wind.com; Zhou, Danny; Wang, Liang-min; Richardson, Bruce;
> Liu, Yong; nhorman@tuxdriver.com; Liang, Cunming
> Subject: [PATCH v10 00/13] Interrupt mode PMD
> 
> v10 changes
>  - code rework to return actual error code
>  - bug fix for lsc when using uio_pci_generic
> 
> v9 changes
>  - code rework to fix open comment
>  - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
>  - new patch to turn off the feature by defalut so as to avoid v2.1 abi
> broken
> 
> v8 changes
>  - remove condition check for only vfio-msix
>  - add multiplex intr support when only one intr vector allowed
>  - lsc and rxq interrupt runtime enable decision
>  - add safe event delete while the event wakeup execution happens
> 
> v7 changes
>  - decouple epoll event and intr operation
>  - add condition check in the case intr vector is disabled
>  - renaming some APIs
> 
> v6 changes
>  - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
>  - rewrite rte_intr_rx_wait/rte_intr_rx_set.
>  - using vector number instead of queue_id as interrupt API params.
>  - patch reorder and split.
> 
> v5 changes
>  - Rebase the patchset onto the HEAD
>  - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
>  - Export wait-for-rx interrupt function for shared libraries
>  - Split-off a new patch file for changed struct rte_intr_handle that
>    other patches depend on, to avoid breaking git bisect
>  - Change sample applicaiton to accomodate EAL function spec change
>    accordingly
> 
> v4 changes
>  - Export interrupt enable/disable functions for shared libraries
>  - Adjust position of new-added structure fields and functions to
>    avoid breaking ABI
> 
> v3 changes
>  - Add return value for interrupt enable/disable functions
>  - Move spinlok from PMD to L3fwd-power
>  - Remove unnecessary variables in e1000_mac_info
>  - Fix miscelleous review comments
> 
> v2 changes
>  - Fix compilation issue in Makefile for missed header file.
>  - Consolidate internal and community review comments of v1 patch set.
> 
> The patch series introduce low-latency one-shot rx interrupt into DPDK
> with
> polling and interrupt mode switch control example.
> 
> DPDK userspace interrupt notification and handling mechanism is based on
> UIO
> with below limitation:
> 1) It is designed to handle LSC interrupt only with inefficient suspended
>    pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling
> thread
>    which then wakes up DPDK polling thread). In this way, it introduces
>    non-deterministic wakeup latency for DPDK polling thread as well as
> packet
>    latency if it is used to handle Rx interrupt.
> 2) UIO only supports a single interrupt vector which has to been shared by
>    LSC interrupt and interrupts assigned to dedicated rx queues.
> 
> This patchset includes below features:
> 1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF
> only).
> 2) Build on top of the VFIO mechanism instead of UIO, so it could support
>    up to 64 interrupt vectors for rx queue interrupts.
> 3) Have 1 DPDK polling thread handle per Rx queue interrupt with a
> dedicated
>    VFIO eventfd, which eliminates non-deterministic pthread wakeup latency
> in
>    user space.
> 4) Demonstrate interrupts control APIs and userspace NAIP-like
> polling/interrupt
>    switch algorithms in L3fwd-power example.
> 
> Known limitations:
> 1) It does not work for UIO due to a single interrupt eventfd shared by
> LSC
>    and rx queue interrupt handlers causes a mess. [FIXED]
> 2) LSC interrupt is not supported by VF driver, so it is by default
> disabled
>    in L3fwd-power now. Feel free to turn in on if you want to support both
> LSC
>    and rx queue interrupts on a PF.
> 
> Cunming Liang (13):
>   eal/linux: add interrupt vectors support in intr_handle
>   eal/linux: add rte_epoll_wait/ctl support
>   eal/linux: add API to set rx interrupt event monitor
>   eal/linux: fix comments typo on vfio msi
>   eal/linux: add interrupt vectors handling on VFIO
>   eal/linux: standalone intr event fd create support
>   eal/linux: fix lsc read error in uio_pci_generic
>   eal/bsd: dummy for new intr definition
>   ethdev: add rx intr enable, disable and ctl functions
>   ixgbe: enable rx queue interrupts for both PF and VF
>   igb: enable rx queue interrupts for PF
>   l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
>     switch
>   abi: fix v2.1 abi broken issue
> 
>  drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
>  drivers/net/ixgbe/ixgbe_ethdev.c                   | 519
> ++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
>  examples/l3fwd-power/main.c                        | 206 ++++++--
>  lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
>  .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 360 ++++++++++++--
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
>  lib/librte_ether/rte_ethdev.c                      | 109 +++++
>  lib/librte_ether/rte_ethdev.h                      | 132 ++++++
>  lib/librte_ether/rte_ether_version.map             |   4 +
>  13 files changed, 1852 insertions(+), 125 deletions(-)
> 
> --
> 1.8.1.4

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-01  8:14  5%       ` Olivier MATZ
@ 2015-06-02 13:27  4%         ` O'Driscoll, Tim
  2015-06-10 14:32  4%           ` Olivier MATZ
  0 siblings, 1 reply; 200+ results
From: O'Driscoll, Tim @ 2015-06-02 13:27 UTC (permalink / raw)
  To: Olivier MATZ, Zhang, Helin, dev


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Monday, June 1, 2015 9:15 AM
> To: Zhang, Helin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> Hi Helin,
> 
> +CC Neil
> 
> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> > In order to unify the packet type, the field of 'packet_type' in
> > 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> > Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> > support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> > KNI should be right mapped to 'struct rte_mbuf', it should be
> > modified accordingly. In addition, Vector PMD of ixgbe is disabled
> > by default, as 'struct rte_mbuf' changed.
> > To avoid breaking ABI compatibility, all the changes would be
> > enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> 
> What are the plans for this compile-time option in the future?
> 
> I wonder what are the benefits of having this option in terms
> of ABI compatibility: when it is disabled, it is ABI-compatible but
> the packet-type feature is not present, and when it is enabled we
> have the feature but it breaks the compatibility.
> 
> In my opinion, the v5 is preferable: for this kind of features, I
> don't see how the ABI can be preserved, and I think packet-type
> won't be the only feature that will modify the mbuf structure. I think
> the process described here should be applied:
> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> 
> (starting from "Some ABI changes may be too significant to reasonably
> maintain multiple versions of").
> 
> 
> Regards,
> Olivier
> 

This is just like the change that Steve (Cunming) Liang submitted for Interrupt Mode. We have the same problem in both cases: we want to find a way to get the features included, but need to comply with our ABI policy. So, in both cases, the proposal is to add a config option to enable the change by default, so we maintain backward compatibility. Users that want these changes, and are willing to accept the associated ABI change, have to specifically enable them.

We can note in the Deprecation Notices in the Release Notes for 2.1 that these config options will be removed in 2.2. The features will then be enabled by default.

This seems like a good compromise which allows us to get these changes into 2.1 but avoids breaking the ABI policy.


Tim

> 
> 
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > ---
> >  config/common_linuxapp                             |  2 +-
> >  .../linuxapp/eal/include/exec-env/rte_kni_common.h |  6 ++++++
> >  lib/librte_mbuf/rte_mbuf.h                         | 23
> ++++++++++++++++++++++
> >  3 files changed, 30 insertions(+), 1 deletion(-)
> >
> > v2 changes:
> > * Enlarged the packet_type field from 16 bits to 32 bits.
> > * Redefined the packet type sub-fields.
> > * Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf
> changes.
> >
> > v3 changes:
> > * Put the mbuf layout changes into a single patch.
> > * Disabled vector ixgbe PMD by default, as mbuf layout changed.
> >
> > v5 changes:
> > * Re-worded the commit logs.
> >
> > v6 changes:
> > * Disabled the code changes for unified packet type by default, to
> >   avoid breaking ABI compatibility.
> >
> > diff --git a/config/common_linuxapp b/config/common_linuxapp
> > index 0078dc9..6b067c7 100644
> > --- a/config/common_linuxapp
> > +++ b/config/common_linuxapp
> > @@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
> >  CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
> >  CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
> >  CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
> > -CONFIG_RTE_IXGBE_INC_VECTOR=y
> > +CONFIG_RTE_IXGBE_INC_VECTOR=n
> >  CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
> >
> >  #
> > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-
> env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-
> env/rte_kni_common.h
> > index 1e55c2d..7a2abbb 100644
> > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > @@ -117,9 +117,15 @@ struct rte_kni_mbuf {
> >  	uint16_t data_off;      /**< Start address of data in segment
> buffer. */
> >  	char pad1[4];
> >  	uint64_t ol_flags;      /**< Offload features. */
> > +#ifdef RTE_UNIFIED_PKT_TYPE
> > +	char pad2[4];
> > +	uint32_t pkt_len;       /**< Total pkt len: sum of all segment
> data_len. */
> > +	uint16_t data_len;      /**< Amount of data in segment buffer. */
> > +#else
> >  	char pad2[2];
> >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment
> data_len. */
> > +#endif
> >
> >  	/* fields on second cache line */
> >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index ab6de67..a8662c2 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -269,6 +269,28 @@ struct rte_mbuf {
> >  	/* remaining bytes are set on RX when pulling packet from
> descriptor */
> >  	MARKER rx_descriptor_fields1;
> >
> > +#ifdef RTE_UNIFIED_PKT_TYPE
> > +	/*
> > +	 * The packet type, which is the combination of outer/inner L2, L3,
> L4
> > +	 * and tunnel types.
> > +	 */
> > +	union {
> > +		uint32_t packet_type; /**< L2/L3/L4 and tunnel information.
> */
> > +		struct {
> > +			uint32_t l2_type:4; /**< (Outer) L2 type. */
> > +			uint32_t l3_type:4; /**< (Outer) L3 type. */
> > +			uint32_t l4_type:4; /**< (Outer) L4 type. */
> > +			uint32_t tun_type:4; /**< Tunnel type. */
> > +			uint32_t inner_l2_type:4; /**< Inner L2 type. */
> > +			uint32_t inner_l3_type:4; /**< Inner L3 type. */
> > +			uint32_t inner_l4_type:4; /**< Inner L4 type. */
> > +		};
> > +	};
> > +
> > +	uint32_t pkt_len;         /**< Total pkt len: sum of all segments.
> */
> > +	uint16_t data_len;        /**< Amount of data in segment buffer. */
> > +	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU
> order) */
> > +#else
> >  	/**
> >  	 * The packet type, which is used to indicate ordinary packet and
> also
> >  	 * tunneled packet format, i.e. each number is represented a type
> of
> > @@ -280,6 +302,7 @@ struct rte_mbuf {
> >  	uint32_t pkt_len;         /**< Total pkt len: sum of all segments.
> */
> >  	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU
> order) */
> >  	uint16_t reserved;
> > +#endif
> >  	union {
> >  		uint32_t rss;     /**< RSS hash result if RSS enabled */
> >  		struct {
> >

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
  2015-06-02  3:16  3%   ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Helin Zhang
  2015-06-02  3:16  2%     ` [dpdk-dev] [PATCH v2 3/6] i40e: support double vlan " Helin Zhang
@ 2015-06-02  7:37  0%     ` Liu, Jijiang
  2015-06-08  7:32  0%       ` Cao, Min
  2015-06-08  7:40  0%     ` Olivier MATZ
  2015-06-11  7:03  3%     ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
  3 siblings, 1 reply; 200+ results
From: Liu, Jijiang @ 2015-06-02  7:37 UTC (permalink / raw)
  To: Zhang, Helin, dev


Acked-by: Jijiang Liu <Jijiang.liu@intel.com>


> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, June 2, 2015 11:16 AM
> To: dev@dpdk.org
> Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; Richardson, Bruce;
> olivier.matz@6wind.com; Zhang, Helin
> Subject: [PATCH v2 0/6] support i40e QinQ stripping and insertion
> 
> As i40e hardware can be reconfigured to support QinQ stripping and insertion,
> this patch set is to enable that together with using the reserved 16 bits in 'struct
> rte_mbuf' for the second vlan tag.
> Corresponding command is added in testpmd for testing.
> Note that no need to rework vPMD, as nothings used in it changed.
> 
> v2 changes:
> * Added more commit logs of which commit it fix for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.
> 
> Helin Zhang (6):
>   ixgbe: remove a discarded source line
>   mbuf: use the reserved 16 bits for double vlan
>   i40e: support double vlan stripping and insertion
>   i40evf: add supported offload capability flags
>   app/testpmd: add test cases for qinq stripping and insertion
>   examples/ipv4_multicast: support double vlan stripping and insertion
> 
>  app/test-pmd/cmdline.c            | 78 +++++++++++++++++++++++++++++++++----
>  app/test-pmd/config.c             | 21 +++++++++-
>  app/test-pmd/flowgen.c            |  4 +-
>  app/test-pmd/macfwd.c             |  3 ++
>  app/test-pmd/macswap.c            |  3 ++
>  app/test-pmd/rxonly.c             |  3 ++
>  app/test-pmd/testpmd.h            |  6 ++-
>  app/test-pmd/txonly.c             |  8 +++-
>  drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev_vf.c | 13 +++++++
>  drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
>  drivers/net/ixgbe/ixgbe_rxtx.c    |  1 -
>  examples/ipv4_multicast/main.c    |  1 +
>  lib/librte_ether/rte_ethdev.h     |  2 +
>  lib/librte_mbuf/rte_mbuf.h        | 10 ++++-
>  15 files changed, 243 insertions(+), 43 deletions(-)
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v10 13/13] abi: fix v2.1 abi broken issue
  2015-06-02  6:53  4%       ` [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
  2015-06-02  6:53  2%         ` [dpdk-dev] [PATCH v10 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-06-02  6:53 10%         ` Cunming Liang
  2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
  2 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-02  6:53 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid ABI(unannounced) broken in v2.1.
The usrs should make sure understand the impact before turning on the feature.
There are two abi changes required in this interrupt patch set.
They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 v9 Acked-by: vincent jardin <vincent.jardin@6wind.com>

 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1894,6 +1913,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3750,6 +3770,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3791,6 +3812,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3800,18 +3822,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_EAL_RX_INTR
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3868,6 +3893,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index bcec971..3a70af6 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1481,7 +1487,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1514,6 +1522,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1532,6 +1541,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1619,9 +1629,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1727,12 +1739,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2335,6 +2349,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_EAL_RX_INTR
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2345,6 +2360,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3127,7 +3143,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3160,6 +3178,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3177,7 +3196,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3223,19 +3242,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3246,11 +3269,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3834,6 +3859,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3902,21 +3928,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3927,6 +3957,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3937,18 +3968,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4000,6 +4034,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 538bb93..3b4054c 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,6 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
 	},
 };
 
@@ -889,7 +888,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index fc2c46b..f0f6a3f 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -49,9 +49,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 0f902b2..09a8fd2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_EAL_RX_INTR
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_EAL_RX_INTR
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -876,6 +884,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -919,6 +928,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1056,6 +1066,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1167,3 +1178,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 7c8a62b..5390b21 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_EAL_RX_INTR
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_EAL_RX_INTR
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 846d7f8..823eb46 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3282,6 +3282,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3353,6 +3354,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c199d32..8bea68d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_EAL_RX_INTR
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2943,8 +2945,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2967,9 +2981,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH v10 09/13] ethdev: add rx intr enable, disable and ctl functions
  2015-06-02  6:53  4%       ` [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
@ 2015-06-02  6:53  2%         ` Cunming Liang
  2015-06-02  6:53 10%         ` [dpdk-dev] [PATCH v10 13/13] abi: fix v2.1 abi broken issue Cunming Liang
  2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
  2 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-02  6:53 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addtion, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per port or per queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add addtion check for EEXIT

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 024fe8b..846d7f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1386,6 +1396,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2868,6 +2882,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD
  2015-05-29  8:45  4%     ` [dpdk-dev] [PATCH v9 00/12] Interrupt mode PMD Cunming Liang
  2015-05-29  8:45  2%       ` [dpdk-dev] [PATCH v9 08/12] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-05-29  8:45 10%       ` [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-06-02  6:53  4%       ` Cunming Liang
  2015-06-02  6:53  2%         ` [dpdk-dev] [PATCH v10 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
                           ` (2 more replies)
  2 siblings, 3 replies; 200+ results
From: Cunming Liang @ 2015-06-02  6:53 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by defalut so as to avoid v2.1 abi broken

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (13):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 206 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 360 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1852 insertions(+), 125 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 3/6] i40e: support double vlan stripping and insertion
  2015-06-02  3:16  3%   ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Helin Zhang
@ 2015-06-02  3:16  2%     ` Helin Zhang
  2015-06-02  7:37  0%     ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Liu, Jijiang
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-02  3:16 UTC (permalink / raw)
  To: dev

It configures specific registers to enable double vlan stripping
on RX side and insertion on TX side.
The RX descriptors will be parsed, the vlan tags and flags will be
saved to corresponding mbuf fields if vlan tag is detected.
The TX descriptors will be configured according to the
configurations in mbufs, to trigger the hardware insertion of
double vlan tags for each packets sent out.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev_vf.c |  6 +++
 drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
 lib/librte_ether/rte_ethdev.h     |  2 +
 4 files changed, 112 insertions(+), 29 deletions(-)

v2 changes:
* Kept the original RX/TX offload flags as they were, added new
  flags after with new bit masks, for ABI compatibility.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..7593a70 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -211,6 +211,7 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
 				void *arg);
 static void i40e_configure_registers(struct i40e_hw *hw);
 static void i40e_hw_init(struct i40e_hw *hw);
+static int i40e_config_qinq(struct i40e_hw *hw, struct i40e_vsi *vsi);
 
 static const struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
@@ -1529,11 +1530,13 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = dev->pci_dev->max_vfs;
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
 		DEV_RX_OFFLOAD_IPV4_CKSUM |
 		DEV_RX_OFFLOAD_UDP_CKSUM |
 		DEV_RX_OFFLOAD_TCP_CKSUM;
 	dev_info->tx_offload_capa =
 		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT |
 		DEV_TX_OFFLOAD_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_UDP_CKSUM |
 		DEV_TX_OFFLOAD_TCP_CKSUM |
@@ -3056,6 +3059,7 @@ i40e_vsi_setup(struct i40e_pf *pf,
 		 * macvlan filter which is expected and cannot be removed.
 		 */
 		i40e_update_default_filter_setting(vsi);
+		i40e_config_qinq(hw, vsi);
 	} else if (type == I40E_VSI_SRIOV) {
 		memset(&ctxt, 0, sizeof(ctxt));
 		/**
@@ -3096,6 +3100,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
 		 * Since VSI is not created yet, only configure parameter,
 		 * will add vsi below.
 		 */
+
+		i40e_config_qinq(hw, vsi);
 	} else if (type == I40E_VSI_VMDQ2) {
 		memset(&ctxt, 0, sizeof(ctxt));
 		/*
@@ -5697,3 +5703,49 @@ i40e_configure_registers(struct i40e_hw *hw)
 			"0x%"PRIx32, reg_table[i].val, reg_table[i].addr);
 	}
 }
+
+#define I40E_VSI_TSR(_i)            (0x00050800 + ((_i) * 4))
+#define I40E_VSI_TSR_QINQ_CONFIG    0xc030
+#define I40E_VSI_L2TAGSTXVALID(_i)  (0x00042800 + ((_i) * 4))
+#define I40E_VSI_L2TAGSTXVALID_QINQ 0xab
+static int
+i40e_config_qinq(struct i40e_hw *hw, struct i40e_vsi *vsi)
+{
+	uint32_t reg;
+	int ret;
+
+	if (vsi->vsi_id >= I40E_MAX_NUM_VSIS) {
+		PMD_DRV_LOG(ERR, "VSI ID exceeds the maximum");
+		return -EINVAL;
+	}
+
+	/* Configure for double VLAN RX stripping */
+	reg = I40E_READ_REG(hw, I40E_VSI_TSR(vsi->vsi_id));
+	if ((reg & I40E_VSI_TSR_QINQ_CONFIG) != I40E_VSI_TSR_QINQ_CONFIG) {
+		reg |= I40E_VSI_TSR_QINQ_CONFIG;
+		ret = i40e_aq_debug_write_register(hw,
+						   I40E_VSI_TSR(vsi->vsi_id),
+						   reg, NULL);
+		if (ret < 0) {
+			PMD_DRV_LOG(ERR, "Failed to update VSI_TSR[%d]",
+				    vsi->vsi_id);
+			return I40E_ERR_CONFIG;
+		}
+	}
+
+	/* Configure for double VLAN TX insertion */
+	reg = I40E_READ_REG(hw, I40E_VSI_L2TAGSTXVALID(vsi->vsi_id));
+	if ((reg & 0xff) != I40E_VSI_L2TAGSTXVALID_QINQ) {
+		reg = I40E_VSI_L2TAGSTXVALID_QINQ;
+		ret = i40e_aq_debug_write_register(hw,
+						   I40E_VSI_L2TAGSTXVALID(
+						   vsi->vsi_id), reg, NULL);
+		if (ret < 0) {
+			PMD_DRV_LOG(ERR, "Failed to update "
+				"VSI_L2TAGSTXVALID[%d]", vsi->vsi_id);
+			return I40E_ERR_CONFIG;
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 9f92a2f..1a4d088 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1643,6 +1643,12 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 787f0bd..442494e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -95,18 +95,44 @@ static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
 
+static inline void
+i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
+{
+	if (rte_le_to_cpu_64(rxdp->wb.qword1.status_error_len) &
+		(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) {
+		mb->ol_flags |= PKT_RX_VLAN_PKT;
+		mb->vlan_tci =
+			rte_le_to_cpu_16(rxdp->wb.qword0.lo_dword.l2tag1);
+		PMD_RX_LOG(DEBUG, "Descriptor l2tag1: %u",
+			   rte_le_to_cpu_16(rxdp->wb.qword0.lo_dword.l2tag1));
+	} else {
+		mb->vlan_tci = 0;
+	}
+#ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+	if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) &
+		(1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) {
+		mb->ol_flags |= PKT_RX_QINQ_PKT;
+		mb->vlan_tci_outer = mb->vlan_tci;
+		mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2);
+		PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
+			   rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_1),
+			   rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2));
+	} else {
+		mb->vlan_tci_outer = 0;
+	}
+#endif
+	PMD_RX_LOG(DEBUG, "Mbuf vlan_tci: %u, vlan_tci_outer: %u",
+		   mb->vlan_tci, mb->vlan_tci_outer);
+}
+
 /* Translate the rx descriptor status to pkt flags */
 static inline uint64_t
 i40e_rxd_status_to_pkt_flags(uint64_t qword)
 {
 	uint64_t flags;
 
-	/* Check if VLAN packet */
-	flags = qword & (1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-							PKT_RX_VLAN_PKT : 0;
-
 	/* Check if RSS_HASH */
-	flags |= (((qword >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
+	flags = (((qword >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
 					I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
 			I40E_RX_DESC_FLTSTAT_RSS_HASH) ? PKT_RX_RSS_HASH : 0;
 
@@ -697,16 +723,12 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			mb = rxep[j].mbuf;
 			qword1 = rte_le_to_cpu_64(\
 				rxdp[j].wb.qword1.status_error_len);
-			rx_status = (qword1 & I40E_RXD_QW1_STATUS_MASK) >>
-						I40E_RXD_QW1_STATUS_SHIFT;
 			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
 				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
-			mb->vlan_tci = rx_status &
-				(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-			rte_le_to_cpu_16(\
-				rxdp[j].wb.qword0.lo_dword.l2tag1) : 0;
+			mb->ol_flags = 0;
+			i40e_rxd_to_vlan_tci(mb, &rxdp[j]);
 			pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -720,7 +742,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			if (pkt_flags & PKT_RX_FDIR)
 				pkt_flags |= i40e_rxd_build_fdir(&rxdp[j], mb);
 
-			mb->ol_flags = pkt_flags;
+			mb->ol_flags |= pkt_flags;
 		}
 
 		for (j = 0; j < I40E_LOOK_AHEAD; j++)
@@ -946,10 +968,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = rx_packet_len;
 		rxm->data_len = rx_packet_len;
 		rxm->port = rxq->port_id;
-
-		rxm->vlan_tci = rx_status &
-			(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
+		rxm->ol_flags = 0;
+		i40e_rxd_to_vlan_tci(rxm, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -961,7 +981,7 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (pkt_flags & PKT_RX_FDIR)
 			pkt_flags |= i40e_rxd_build_fdir(&rxd, rxm);
 
-		rxm->ol_flags = pkt_flags;
+		rxm->ol_flags |= pkt_flags;
 
 		rx_pkts[nb_rx++] = rxm;
 	}
@@ -1106,9 +1126,8 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		}
 
 		first_seg->port = rxq->port_id;
-		first_seg->vlan_tci = (rx_status &
-			(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
-			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
+		first_seg->ol_flags = 0;
+		i40e_rxd_to_vlan_tci(first_seg, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -1121,7 +1140,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		if (pkt_flags & PKT_RX_FDIR)
 			pkt_flags |= i40e_rxd_build_fdir(&rxd, rxm);
 
-		first_seg->ol_flags = pkt_flags;
+		first_seg->ol_flags |= pkt_flags;
 
 		/* Prefetch data of first segment, if configured to do so. */
 		rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
@@ -1159,17 +1178,15 @@ i40e_recv_scattered_pkts(void *rx_queue,
 static inline uint16_t
 i40e_calc_context_desc(uint64_t flags)
 {
-	uint64_t mask = 0ULL;
-
-	mask |= (PKT_TX_OUTER_IP_CKSUM | PKT_TX_TCP_SEG);
+	static uint64_t mask = PKT_TX_OUTER_IP_CKSUM |
+		PKT_TX_TCP_SEG |
+		PKT_TX_QINQ_PKT;
 
 #ifdef RTE_LIBRTE_IEEE1588
 	mask |= PKT_TX_IEEE1588_TMST;
 #endif
-	if (flags & mask)
-		return 1;
 
-	return 0;
+	return ((flags & mask) ? 1 : 0);
 }
 
 /* set i40e TSO context descriptor */
@@ -1290,9 +1307,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Descriptor based VLAN insertion */
-		if (ol_flags & PKT_TX_VLAN_PKT) {
+		if (ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_QINQ_PKT)) {
 			tx_flags |= tx_pkt->vlan_tci <<
-					I40E_TX_FLAG_L2TAG1_SHIFT;
+				I40E_TX_FLAG_L2TAG1_SHIFT;
 			tx_flags |= I40E_TX_FLAG_INSERT_VLAN;
 			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
 			td_tag = (tx_flags & I40E_TX_FLAG_L2TAG1_MASK) >>
@@ -1340,6 +1357,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			ctx_txd->tunneling_params =
 				rte_cpu_to_le_32(cd_tunneling_params);
+			if (ol_flags & PKT_TX_QINQ_PKT) {
+				cd_l2tag2 = tx_pkt->vlan_tci_outer;
+				cd_type_cmd_tso_mss |=
+					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
+						I40E_TXD_CTX_QW1_CMD_SHIFT);
+			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->type_cmd_tso_mss =
 				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..892280c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -887,6 +887,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_UDP_CKSUM   0x00000004
 #define DEV_RX_OFFLOAD_TCP_CKSUM   0x00000008
 #define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
+#define DEV_RX_OFFLOAD_QINQ_STRIP  0x00000020
 
 /**
  * TX offload capabilities of a device.
@@ -899,6 +900,7 @@ struct rte_eth_conf {
 #define DEV_TX_OFFLOAD_TCP_TSO     0x00000020
 #define DEV_TX_OFFLOAD_UDP_TSO     0x00000040
 #define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000080 /**< Used for tunneling packet. */
+#define DEV_TX_OFFLOAD_QINQ_INSERT 0x00000100
 
 struct rte_eth_dev_info {
 	struct rte_pci_device *pci_dev; /**< Device PCI information. */
-- 
1.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
      @ 2015-06-02  3:16  3%   ` Helin Zhang
  2015-06-02  3:16  2%     ` [dpdk-dev] [PATCH v2 3/6] i40e: support double vlan " Helin Zhang
                       ` (3 more replies)
  2 siblings, 4 replies; 200+ results
From: Helin Zhang @ 2015-06-02  3:16 UTC (permalink / raw)
  To: dev

As i40e hardware can be reconfigured to support QinQ stripping and
insertion, this patch set is to enable that together with using the
reserved 16 bits in 'struct rte_mbuf' for the second vlan tag.
Corresponding command is added in testpmd for testing.
Note that no need to rework vPMD, as nothings used in it changed.

v2 changes:
* Added more commit logs of which commit it fix for.
* Fixed a typo.
* Kept the original RX/TX offload flags as they were, added new
  flags after with new bit masks, for ABI compatibility.
* Supported double vlan stripping/insertion in examples/ipv4_multicast.

Helin Zhang (6):
  ixgbe: remove a discarded source line
  mbuf: use the reserved 16 bits for double vlan
  i40e: support double vlan stripping and insertion
  i40evf: add supported offload capability flags
  app/testpmd: add test cases for qinq stripping and insertion
  examples/ipv4_multicast: support double vlan stripping and insertion

 app/test-pmd/cmdline.c            | 78 +++++++++++++++++++++++++++++++++----
 app/test-pmd/config.c             | 21 +++++++++-
 app/test-pmd/flowgen.c            |  4 +-
 app/test-pmd/macfwd.c             |  3 ++
 app/test-pmd/macswap.c            |  3 ++
 app/test-pmd/rxonly.c             |  3 ++
 app/test-pmd/testpmd.h            |  6 ++-
 app/test-pmd/txonly.c             |  8 +++-
 drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev_vf.c | 13 +++++++
 drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
 drivers/net/ixgbe/ixgbe_rxtx.c    |  1 -
 examples/ipv4_multicast/main.c    |  1 +
 lib/librte_ether/rte_ethdev.h     |  2 +
 lib/librte_mbuf/rte_mbuf.h        | 10 ++++-
 15 files changed, 243 insertions(+), 43 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 3/5] i40e: support double vlan stripping and insertion
  2015-06-01  8:50  3%     ` Olivier MATZ
@ 2015-06-02  2:45  0%       ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-02  2:45 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: dev



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, June 1, 2015 4:51 PM
> To: Zhang, Helin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 3/5] i40e: support double vlan stripping and
> insertion
> 
> Hi Helin,
> 
> On 05/26/2015 10:36 AM, Helin Zhang wrote:
> > It configures specific registers to enable double vlan stripping on RX
> > side and insertion on TX side.
> > The RX descriptors will be parsed, the vlan tags and flags will be
> > saved to corresponding mbuf fields if vlan tag is detected.
> > The TX descriptors will be configured according to the configurations
> > in mbufs, to trigger the hardware insertion of double vlan tags for
> > each packets sent out.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> 
> > [...]
> 
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 16dbe00..b26670e 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -882,23 +882,25 @@ struct rte_eth_conf {
> >  /**
> >   * RX offload capabilities of a device.
> >   */
> > -#define DEV_RX_OFFLOAD_VLAN_STRIP  0x00000001 -#define
> > DEV_RX_OFFLOAD_IPV4_CKSUM  0x00000002
> > -#define DEV_RX_OFFLOAD_UDP_CKSUM   0x00000004
> > -#define DEV_RX_OFFLOAD_TCP_CKSUM   0x00000008
> > -#define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
> > +#define DEV_RX_OFFLOAD_VLAN_STRIP       0x00000001
> > +#define DEV_RX_OFFLOAD_QINQ_STRIP       0x00000002
> > +#define DEV_RX_OFFLOAD_IPV4_CKSUM       0x00000004
> > +#define DEV_RX_OFFLOAD_UDP_CKSUM        0x00000008
> > +#define DEV_RX_OFFLOAD_TCP_CKSUM        0x00000010
> > +#define DEV_RX_OFFLOAD_TCP_LRO          0x00000020
> >
> >  /**
> >   * TX offload capabilities of a device.
> >   */
> > -#define DEV_TX_OFFLOAD_VLAN_INSERT 0x00000001 -#define
> > DEV_TX_OFFLOAD_IPV4_CKSUM  0x00000002
> > -#define DEV_TX_OFFLOAD_UDP_CKSUM   0x00000004
> > -#define DEV_TX_OFFLOAD_TCP_CKSUM   0x00000008
> > -#define DEV_TX_OFFLOAD_SCTP_CKSUM  0x00000010
> > -#define DEV_TX_OFFLOAD_TCP_TSO     0x00000020
> > -#define DEV_TX_OFFLOAD_UDP_TSO     0x00000040
> > -#define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000080 /**< Used for
> > tunneling packet. */
> > +#define DEV_TX_OFFLOAD_VLAN_INSERT      0x00000001
> > +#define DEV_TX_OFFLOAD_QINQ_INSERT      0x00000002
> > +#define DEV_TX_OFFLOAD_IPV4_CKSUM       0x00000004
> > +#define DEV_TX_OFFLOAD_UDP_CKSUM        0x00000008
> > +#define DEV_TX_OFFLOAD_TCP_CKSUM        0x00000010
> > +#define DEV_TX_OFFLOAD_SCTP_CKSUM       0x00000020
> > +#define DEV_TX_OFFLOAD_TCP_TSO          0x00000040
> > +#define DEV_TX_OFFLOAD_UDP_TSO          0x00000080
> > +#define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000100
> >
> >  struct rte_eth_dev_info {
> >  	struct rte_pci_device *pci_dev; /**< Device PCI information. */
> >
> 
> It's probably better to add the new flags after the others for ABI compat reasons.
Agree, will fix it.

Thanks,
Helin

> 
> 
> Regards,
> Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-06-01 13:27  5%             ` Stephen Hemminger
@ 2015-06-02  2:14  5%               ` Liang, Cunming
  0 siblings, 0 replies; 200+ results
From: Liang, Cunming @ 2015-06-02  2:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

On 6/1/2015 9:27 PM, Stephen Hemminger wrote:
> On Mon, 1 Jun 2015 16:48:01 +0800
> "Liang, Cunming" <cunming.liang@intel.com> wrote:
>
>> Hi Stephen,
>>
>> On 5/29/2015 11:27 PM, Stephen Hemminger wrote:
>>> On Fri, 29 May 2015 16:45:25 +0800
>>> Cunming Liang <cunming.liang@intel.com> wrote:
>>>   
>>>> +#ifdef RTE_EAL_RX_INTR
>>>> +extern int
>>>>    rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
>>>> +#else
>>>> +static inline int
>>>> +rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
>>>> +{
>>>> +	RTE_SET_USED(port_id);
>>>> +	RTE_SET_USED(epfd);
>>>> +	RTE_SET_USED(op);
>>>> +	RTE_SET_USED(data);
>>>> +	return -1;
>>>> +}
>>>> +#endif
>>> Doing ABI compatibility is good but hard.
>>>
>>> I think it would be better not to provide the functions for rx_intr_ctl unless
>>> the feature was configured on. That way anyone using them with incorrect config
>>> would detect failure at build time, rather than run time.
>> I tend to not agree. For rx_intr_ctl/rx_intr_ctl_q, no matter w/ or w/o
>> RTE_EAL_RX_INTR, it's necessary to check the return value.
>> The failure return shall cause application give up using epoll waiting
>> on the specified epfd for the port, and then degraded to pure polling mode.
>> So I think these failure should be handled by the caller.
> It is always best to fail as early in the development process as possible.
> What possible benefit could there be from allowing application to be linked
> and run with incorrect configuration.
In fact you'll always detect failure at build time if your application 
insist to call rx_intr_ctl with port_conf.intr_conf.rxq=1.
As port_conf.intr_conf.rxq is not defined when RTE_EAL_RX_INTR is not 
defined.
If you ignore port_conf.intr_conf.rxq, no matter RTE_EAL_RX_INTR is 
defined or not, you always will get failure return when calling rx_intr_ctl.
So I think the behavior doesn't break 'fail as early in the development 
process as possible'.
And I'd like to expose all new APIs in this version, if we don't provide 
this API, what about rte_intr_rx_ctl or others? They're probably used by 
other user application as well.

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-06-01 14:11  5%         ` Stephen Hemminger
@ 2015-06-01 14:18  5%           ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-06-01 14:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, liang-min.wang

Never mind, had wrong version of one of the patches.

On Mon, Jun 1, 2015 at 7:11 AM, Stephen Hemminger <shemming@brocade.com>
wrote:

> On Fri, 29 May 2015 16:45:25 +0800
> Cunming Liang <cunming.liang@intel.com> wrote:
>
> > RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid
> ABI(unannounced) broken in v2.1.
> > The usrs should make sure understand the impact before turning on the
> feature.
> > There are two abi changes required in this interrupt patch set.
> > They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.
> >
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > ---
>
> While merging for testing I discovered another minor issue.
>
> The patch order here is a problem. The intermediate steps won't build
> until
> this last patch is applied.
>
> In order to allow git bisect to be useful, it is important that every
> commit
> done in the upstream version build and work. This series does not seem to
> build until this last patch is applied. Maybe it should just be first?
>

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-05-29  8:45 10%       ` [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue Cunming Liang
  2015-05-29 15:27  9%         ` Stephen Hemminger
  2015-05-29 15:36  5%         ` Vincent JARDIN
@ 2015-06-01 14:11  5%         ` Stephen Hemminger
  2015-06-01 14:18  5%           ` Stephen Hemminger
  2 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-01 14:11 UTC (permalink / raw)
  To: Cunming Liang; +Cc: dev, liang-min.wang

On Fri, 29 May 2015 16:45:25 +0800
Cunming Liang <cunming.liang@intel.com> wrote:

> RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid ABI(unannounced) broken in v2.1.
> The usrs should make sure understand the impact before turning on the feature.
> There are two abi changes required in this interrupt patch set.
> They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.
> 
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> ---

While merging for testing I discovered another minor issue.

The patch order here is a problem. The intermediate steps won't build  until
this last patch is applied.

In order to allow git bisect to be useful, it is important that every commit
done in the upstream version build and work. This series does not seem to
build until this last patch is applied. Maybe it should just be first?

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-06-01  8:48  9%           ` Liang, Cunming
@ 2015-06-01 13:27  5%             ` Stephen Hemminger
  2015-06-02  2:14  5%               ` Liang, Cunming
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-01 13:27 UTC (permalink / raw)
  To: Liang, Cunming; +Cc: dev

On Mon, 1 Jun 2015 16:48:01 +0800
"Liang, Cunming" <cunming.liang@intel.com> wrote:

> Hi Stephen,
> 
> On 5/29/2015 11:27 PM, Stephen Hemminger wrote:
> > On Fri, 29 May 2015 16:45:25 +0800
> > Cunming Liang <cunming.liang@intel.com> wrote:
> >  
> >> +#ifdef RTE_EAL_RX_INTR
> >> +extern int
> >>   rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
> >> +#else
> >> +static inline int
> >> +rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
> >> +{
> >> +	RTE_SET_USED(port_id);
> >> +	RTE_SET_USED(epfd);
> >> +	RTE_SET_USED(op);
> >> +	RTE_SET_USED(data);
> >> +	return -1;
> >> +}
> >> +#endif  
> > Doing ABI compatibility is good but hard.
> >
> > I think it would be better not to provide the functions for rx_intr_ctl unless
> > the feature was configured on. That way anyone using them with incorrect config
> > would detect failure at build time, rather than run time.  
> I tend to not agree. For rx_intr_ctl/rx_intr_ctl_q, no matter w/ or w/o 
> RTE_EAL_RX_INTR, it's necessary to check the return value.
> The failure return shall cause application give up using epoll waiting 
> on the specified epfd for the port, and then degraded to pure polling mode.
> So I think these failure should be handled by the caller.

It is always best to fail as early in the development process as possible.
What possible benefit could there be from allowing application to be linked
and run with incorrect configuration.

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH 3/5] i40e: support double vlan stripping and insertion
  @ 2015-06-01  8:50  3%     ` Olivier MATZ
  2015-06-02  2:45  0%       ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Olivier MATZ @ 2015-06-01  8:50 UTC (permalink / raw)
  To: Helin Zhang, dev

Hi Helin,

On 05/26/2015 10:36 AM, Helin Zhang wrote:
> It configures specific registers to enable double vlan stripping
> on RX side and insertion on TX side.
> The RX descriptors will be parsed, the vlan tags and flags will be
> saved to corresponding mbuf fields if vlan tag is detected.
> The TX descriptors will be configured according to the
> configurations in mbufs, to trigger the hardware insertion of
> double vlan tags for each packets sent out.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>

> [...]

> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16dbe00..b26670e 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -882,23 +882,25 @@ struct rte_eth_conf {
>  /**
>   * RX offload capabilities of a device.
>   */
> -#define DEV_RX_OFFLOAD_VLAN_STRIP  0x00000001
> -#define DEV_RX_OFFLOAD_IPV4_CKSUM  0x00000002
> -#define DEV_RX_OFFLOAD_UDP_CKSUM   0x00000004
> -#define DEV_RX_OFFLOAD_TCP_CKSUM   0x00000008
> -#define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
> +#define DEV_RX_OFFLOAD_VLAN_STRIP       0x00000001
> +#define DEV_RX_OFFLOAD_QINQ_STRIP       0x00000002
> +#define DEV_RX_OFFLOAD_IPV4_CKSUM       0x00000004
> +#define DEV_RX_OFFLOAD_UDP_CKSUM        0x00000008
> +#define DEV_RX_OFFLOAD_TCP_CKSUM        0x00000010
> +#define DEV_RX_OFFLOAD_TCP_LRO          0x00000020
>  
>  /**
>   * TX offload capabilities of a device.
>   */
> -#define DEV_TX_OFFLOAD_VLAN_INSERT 0x00000001
> -#define DEV_TX_OFFLOAD_IPV4_CKSUM  0x00000002
> -#define DEV_TX_OFFLOAD_UDP_CKSUM   0x00000004
> -#define DEV_TX_OFFLOAD_TCP_CKSUM   0x00000008
> -#define DEV_TX_OFFLOAD_SCTP_CKSUM  0x00000010
> -#define DEV_TX_OFFLOAD_TCP_TSO     0x00000020
> -#define DEV_TX_OFFLOAD_UDP_TSO     0x00000040
> -#define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000080 /**< Used for tunneling packet. */
> +#define DEV_TX_OFFLOAD_VLAN_INSERT      0x00000001
> +#define DEV_TX_OFFLOAD_QINQ_INSERT      0x00000002
> +#define DEV_TX_OFFLOAD_IPV4_CKSUM       0x00000004
> +#define DEV_TX_OFFLOAD_UDP_CKSUM        0x00000008
> +#define DEV_TX_OFFLOAD_TCP_CKSUM        0x00000010
> +#define DEV_TX_OFFLOAD_SCTP_CKSUM       0x00000020
> +#define DEV_TX_OFFLOAD_TCP_TSO          0x00000040
> +#define DEV_TX_OFFLOAD_UDP_TSO          0x00000080
> +#define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000100
>  
>  struct rte_eth_dev_info {
>  	struct rte_pci_device *pci_dev; /**< Device PCI information. */
> 

It's probably better to add the new flags after the others
for ABI compat reasons.


Regards,
Olivier

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-05-29 15:27  9%         ` Stephen Hemminger
@ 2015-06-01  8:48  9%           ` Liang, Cunming
  2015-06-01 13:27  5%             ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Liang, Cunming @ 2015-06-01  8:48 UTC (permalink / raw)
  To: Stephen Hemminger, dev

Hi Stephen,

On 5/29/2015 11:27 PM, Stephen Hemminger wrote:
> On Fri, 29 May 2015 16:45:25 +0800
> Cunming Liang <cunming.liang@intel.com> wrote:
>
>> +#ifdef RTE_EAL_RX_INTR
>> +extern int
>>   rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
>> +#else
>> +static inline int
>> +rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
>> +{
>> +	RTE_SET_USED(port_id);
>> +	RTE_SET_USED(epfd);
>> +	RTE_SET_USED(op);
>> +	RTE_SET_USED(data);
>> +	return -1;
>> +}
>> +#endif
> Doing ABI compatibility is good but hard.
>
> I think it would be better not to provide the functions for rx_intr_ctl unless
> the feature was configured on. That way anyone using them with incorrect config
> would detect failure at build time, rather than run time.
I tend to not agree. For rx_intr_ctl/rx_intr_ctl_q, no matter w/ or w/o 
RTE_EAL_RX_INTR, it's necessary to check the return value.
The failure return shall cause application give up using epoll waiting 
on the specified epfd for the port, and then degraded to pure polling mode.
So I think these failure should be handled by the caller.
>
> Also, doesn't some doc file have to be updated for the announcement?
Yes, the ABI section in release note (doc/guides/rel_notes/abi.rst) 
shall update for the announcement.

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
@ 2015-06-01  8:14  5%       ` Olivier MATZ
  2015-06-02 13:27  4%         ` O'Driscoll, Tim
  0 siblings, 1 reply; 200+ results
From: Olivier MATZ @ 2015-06-01  8:14 UTC (permalink / raw)
  To: Helin Zhang, dev

Hi Helin,

+CC Neil

On 06/01/2015 09:33 AM, Helin Zhang wrote:
> In order to unify the packet type, the field of 'packet_type' in
> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> KNI should be right mapped to 'struct rte_mbuf', it should be
> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> by default, as 'struct rte_mbuf' changed.
> To avoid breaking ABI compatibility, all the changes would be
> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

What are the plans for this compile-time option in the future?

I wonder what are the benefits of having this option in terms
of ABI compatibility: when it is disabled, it is ABI-compatible but
the packet-type feature is not present, and when it is enabled we
have the feature but it breaks the compatibility.

In my opinion, the v5 is preferable: for this kind of features, I
don't see how the ABI can be preserved, and I think packet-type
won't be the only feature that will modify the mbuf structure. I think
the process described here should be applied:
http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst

(starting from "Some ABI changes may be too significant to reasonably
maintain multiple versions of").


Regards,
Olivier



> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> ---
>  config/common_linuxapp                             |  2 +-
>  .../linuxapp/eal/include/exec-env/rte_kni_common.h |  6 ++++++
>  lib/librte_mbuf/rte_mbuf.h                         | 23 ++++++++++++++++++++++
>  3 files changed, 30 insertions(+), 1 deletion(-)
> 
> v2 changes:
> * Enlarged the packet_type field from 16 bits to 32 bits.
> * Redefined the packet type sub-fields.
> * Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
> 
> v3 changes:
> * Put the mbuf layout changes into a single patch.
> * Disabled vector ixgbe PMD by default, as mbuf layout changed.
> 
> v5 changes:
> * Re-worded the commit logs.
> 
> v6 changes:
> * Disabled the code changes for unified packet type by default, to
>   avoid breaking ABI compatibility.
> 
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 0078dc9..6b067c7 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
>  CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
>  CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
> -CONFIG_RTE_IXGBE_INC_VECTOR=y
> +CONFIG_RTE_IXGBE_INC_VECTOR=n
>  CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
>  
>  #
> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> index 1e55c2d..7a2abbb 100644
> --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> @@ -117,9 +117,15 @@ struct rte_kni_mbuf {
>  	uint16_t data_off;      /**< Start address of data in segment buffer. */
>  	char pad1[4];
>  	uint64_t ol_flags;      /**< Offload features. */
> +#ifdef RTE_UNIFIED_PKT_TYPE
> +	char pad2[4];
> +	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> +	uint16_t data_len;      /**< Amount of data in segment buffer. */
> +#else
>  	char pad2[2];
>  	uint16_t data_len;      /**< Amount of data in segment buffer. */
>  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> +#endif
>  
>  	/* fields on second cache line */
>  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index ab6de67..a8662c2 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -269,6 +269,28 @@ struct rte_mbuf {
>  	/* remaining bytes are set on RX when pulling packet from descriptor */
>  	MARKER rx_descriptor_fields1;
>  
> +#ifdef RTE_UNIFIED_PKT_TYPE
> +	/*
> +	 * The packet type, which is the combination of outer/inner L2, L3, L4
> +	 * and tunnel types.
> +	 */
> +	union {
> +		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
> +		struct {
> +			uint32_t l2_type:4; /**< (Outer) L2 type. */
> +			uint32_t l3_type:4; /**< (Outer) L3 type. */
> +			uint32_t l4_type:4; /**< (Outer) L4 type. */
> +			uint32_t tun_type:4; /**< Tunnel type. */
> +			uint32_t inner_l2_type:4; /**< Inner L2 type. */
> +			uint32_t inner_l3_type:4; /**< Inner L3 type. */
> +			uint32_t inner_l4_type:4; /**< Inner L4 type. */
> +		};
> +	};
> +
> +	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
> +	uint16_t data_len;        /**< Amount of data in segment buffer. */
> +	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
> +#else
>  	/**
>  	 * The packet type, which is used to indicate ordinary packet and also
>  	 * tunneled packet format, i.e. each number is represented a type of
> @@ -280,6 +302,7 @@ struct rte_mbuf {
>  	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
>  	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
>  	uint16_t reserved;
> +#endif
>  	union {
>  		uint32_t rss;     /**< RSS hash result if RSS enabled */
>  		struct {
> 

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v6 17/18] examples/l3fwd: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (15 preceding siblings ...)
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 16/18] examples/l3fwd-power: " Helin Zhang
@ 2015-06-01  7:34  3%     ` Helin Zhang
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 18/18] mbuf: remove old packet type bit masks Helin Zhang
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:34 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd/main.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 120 insertions(+), 3 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v3 changes:
* Minor bug fixes and enhancements.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index e32512e..72d9ab7 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -955,7 +955,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		/* Handle IPv4 headers.*/
 		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char *) +
 				sizeof(struct ether_hdr));
@@ -989,8 +993,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
-
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 	} else {
+#endif
 		/* Handle IPv6 headers.*/
 		struct ipv6_hdr *ipv6_hdr;
 
@@ -1011,8 +1018,13 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else
+		/* Free the mbuf that contains non-IPV4/IPV6 packet */
+		rte_pktmbuf_free(m);
+#else
 	}
-
+#endif
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1036,12 +1048,19 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
  * to BAD_PORT value.
  */
 static inline __attribute__((always_inline)) void
+#ifdef RTE_UNIFIED_PKT_TYPE
+rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
+#else
 rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t flags)
+#endif
 {
 	uint8_t ihl;
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(ptype)) {
+#else
 	if ((flags & PKT_RX_IPV4_HDR) != 0) {
-
+#endif
 		ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL;
 
 		ipv4_hdr->time_to_live--;
@@ -1071,11 +1090,19 @@ get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	if (pkt->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		if (rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
 				&next_hop) != 0)
 			next_hop = portid;
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (pkt->ol_flags & PKT_RX_IPV6_HDR) {
+#endif
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 		if (rte_lpm6_lookup(qconf->ipv6_lookup_struct,
@@ -1109,12 +1136,52 @@ process_packet(struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	ve = val_eth[dp];
 
 	dst_port[0] = dp;
+#ifdef RTE_UNIFIED_PKT_TYPE
+	rfc1812_process(ipv4_hdr, dst_port, pkt->packet_type);
+#else
 	rfc1812_process(ipv4_hdr, dst_port, pkt->ol_flags);
+#endif
 
 	te =  _mm_blend_epi16(te, ve, MASK_ETH);
 	_mm_store_si128((__m128i *)eth_hdr, te);
 }
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+/*
+ * Read packet_type and destination IPV4 addresses from 4 mbufs.
+ */
+static inline void
+processx4_step1(struct rte_mbuf *pkt[FWDSTEP],
+		__m128i *dip,
+		uint32_t *ipv4_flag)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ether_hdr *eth_hdr;
+	uint32_t x0, x1, x2, x3;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[0], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x0 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] = pkt[0]->packet_type & RTE_PTYPE_L3_IPV4;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[1], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x1 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[1]->packet_type;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[2], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x2 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[2]->packet_type;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[3], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x3 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[3]->packet_type;
+
+	dip[0] = _mm_set_epi32(x3, x2, x1, x0);
+}
+#else /* RTE_UNIFIED_PKT_TYPE */
 /*
  * Read ol_flags and destination IPV4 addresses from 4 mbufs.
  */
@@ -1147,14 +1214,24 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
 
 	dip[0] = _mm_set_epi32(x3, x2, x1, x0);
 }
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 /*
  * Lookup into LPM for destination port.
  * If lookup fails, use incoming port (portid) as destination port.
  */
 static inline void
+#ifdef RTE_UNIFIED_PKT_TYPE
+processx4_step2(const struct lcore_conf *qconf,
+		__m128i dip,
+		uint32_t ipv4_flag,
+		uint8_t portid,
+		struct rte_mbuf *pkt[FWDSTEP],
+		uint16_t dprt[FWDSTEP])
+#else
 processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
 	uint8_t portid, struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP])
+#endif /* RTE_UNIFIED_PKT_TYPE */
 {
 	rte_xmm_t dst;
 	const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
@@ -1164,7 +1241,11 @@ processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
 	dip = _mm_shuffle_epi8(dip, bswap_mask);
 
 	/* if all 4 packets are IPV4. */
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (likely(ipv4_flag)) {
+#else
 	if (likely(flag != 0)) {
+#endif
 		rte_lpm_lookupx4(qconf->ipv4_lookup_struct, dip, dprt, portid);
 	} else {
 		dst.x = dip;
@@ -1214,6 +1295,16 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 	_mm_store_si128(p[2], te[2]);
 	_mm_store_si128(p[3], te[3]);
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[0] + 1),
+		&dst_port[0], pkt[0]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[1] + 1),
+		&dst_port[1], pkt[1]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[2] + 1),
+		&dst_port[2], pkt[2]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[3] + 1),
+		&dst_port[3], pkt[3]->packet_type);
+#else /* RTE_UNIFIED_PKT_TYPE */
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[0] + 1),
 		&dst_port[0], pkt[0]->ol_flags);
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[1] + 1),
@@ -1222,6 +1313,7 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 		&dst_port[2], pkt[2]->ol_flags);
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[3] + 1),
 		&dst_port[3], pkt[3]->ol_flags);
+#endif /* RTE_UNIFIED_PKT_TYPE */
 }
 
 /*
@@ -1408,7 +1500,11 @@ main_loop(__attribute__((unused)) void *dummy)
 	uint16_t *lp;
 	uint16_t dst_port[MAX_PKT_BURST];
 	__m128i dip[MAX_PKT_BURST / FWDSTEP];
+#ifdef RTE_UNIFIED_PKT_TYPE
+	uint32_t ipv4_flag[MAX_PKT_BURST / FWDSTEP];
+#else
 	uint32_t flag[MAX_PKT_BURST / FWDSTEP];
+#endif
 	uint16_t pnum[MAX_PKT_BURST + 1];
 #endif
 
@@ -1478,6 +1574,18 @@ main_loop(__attribute__((unused)) void *dummy)
 				 */
 				int32_t n = RTE_ALIGN_FLOOR(nb_rx, 4);
 				for (j = 0; j < n ; j+=4) {
+#ifdef RTE_UNIFIED_PKT_TYPE
+					uint32_t pkt_type =
+						pkts_burst[j]->packet_type &
+						pkts_burst[j+1]->packet_type &
+						pkts_burst[j+2]->packet_type &
+						pkts_burst[j+3]->packet_type;
+					if (pkt_type & RTE_PTYPE_L3_IPV4) {
+						simple_ipv4_fwd_4pkts(
+						&pkts_burst[j], portid, qconf);
+					} else if (pkt_type &
+						RTE_PTYPE_L3_IPV6) {
+#else /* RTE_UNIFIED_PKT_TYPE */
 					uint32_t ol_flag = pkts_burst[j]->ol_flags
 							& pkts_burst[j+1]->ol_flags
 							& pkts_burst[j+2]->ol_flags
@@ -1486,6 +1594,7 @@ main_loop(__attribute__((unused)) void *dummy)
 						simple_ipv4_fwd_4pkts(&pkts_burst[j],
 									portid, qconf);
 					} else if (ol_flag & PKT_RX_IPV6_HDR) {
+#endif /* RTE_UNIFIED_PKT_TYPE */
 						simple_ipv6_fwd_4pkts(&pkts_burst[j],
 									portid, qconf);
 					} else {
@@ -1510,13 +1619,21 @@ main_loop(__attribute__((unused)) void *dummy)
 			for (j = 0; j != k; j += FWDSTEP) {
 				processx4_step1(&pkts_burst[j],
 					&dip[j / FWDSTEP],
+#ifdef RTE_UNIFIED_PKT_TYPE
+					&ipv4_flag[j / FWDSTEP]);
+#else
 					&flag[j / FWDSTEP]);
+#endif
 			}
 
 			k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
 			for (j = 0; j != k; j += FWDSTEP) {
 				processx4_step2(qconf, dip[j / FWDSTEP],
+#ifdef RTE_UNIFIED_PKT_TYPE
+					ipv4_flag[j / FWDSTEP], portid,
+#else
 					flag[j / FWDSTEP], portid,
+#endif
 					&pkts_burst[j], &dst_port[j]);
 			}
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (11 preceding siblings ...)
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 12/18] app/test: Remove useless code Helin Zhang
@ 2015-06-01  7:34  4%     ` Helin Zhang
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 14/18] examples/ip_reassembly: " Helin Zhang
                       ` (4 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:34 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/ip_fragmentation/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 0922ba6..5eccecc 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -283,7 +283,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 	len = qconf->tx_mbufs[port_out].len;
 
 	/* if this is an IPv4 packet */
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		struct ipv4_hdr *ip_hdr;
 		uint32_t ip_dst;
 		/* Read the lookup key (i.e. ip_dst) from the input packet */
@@ -317,9 +321,14 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 			if (unlikely (len2 < 0))
 				return;
 		}
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+		/* if this is an IPv6 packet */
+#else
 	}
 	/* if this is an IPv6 packet */
 	else if (m->ol_flags & PKT_RX_IPV6_HDR) {
+#endif
 		struct ipv6_hdr *ip_hdr;
 
 		ipv6 = 1;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 18/18] mbuf: remove old packet type bit masks
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (16 preceding siblings ...)
  2015-06-01  7:34  3%     ` [dpdk-dev] [PATCH v6 17/18] examples/l3fwd: " Helin Zhang
@ 2015-06-01  7:34  4%     ` Helin Zhang
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:34 UTC (permalink / raw)
  To: dev

As unified packet types are used instead, those old bit masks and
the relevant macros for packet type indication need to be removed.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 4 ++++
 lib/librte_mbuf/rte_mbuf.h | 4 ++++
 2 files changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.
* Redefined the bit masks for packet RX offload flags.

v5 changes:
* Rolled back the bit masks of RX flags, for ABI compatibility.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index f506517..0b3a4fc 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -251,14 +251,18 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
 	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
 	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+#ifndef RTE_UNIFIED_PKT_TYPE
 	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
 	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
 	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
 	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+#endif /* RTE_UNIFIED_PKT_TYPE */
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+#ifndef RTE_UNIFIED_PKT_TYPE
 	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
 	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+#endif /* RTE_UNIFIED_PKT_TYPE */
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 94e51cd..d82fc8e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,14 +91,18 @@ extern "C" {
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR     (0ULL << 0)  /**< Hardware processing error. */
 #define PKT_RX_MAC_ERR       (0ULL << 0)  /**< MAC error. */
+#ifndef RTE_UNIFIED_PKT_TYPE
 #define PKT_RX_IPV4_HDR      (1ULL << 5)  /**< RX packet with IPv4 header. */
 #define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with extended IPv4 header. */
 #define PKT_RX_IPV6_HDR      (1ULL << 7)  /**< RX packet with IPv6 header. */
 #define PKT_RX_IPV6_HDR_EXT  (1ULL << 8)  /**< RX packet with extended IPv6 header. */
+#endif /* RTE_UNIFIED_PKT_TYPE */
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
+#ifndef RTE_UNIFIED_PKT_TYPE
 #define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 header.*/
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
+#endif /* RTE_UNIFIED_PKT_TYPE */
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
 /* add new RX flags here */
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 16/18] examples/l3fwd-power: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (14 preceding siblings ...)
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 15/18] examples/l3fwd-acl: " Helin Zhang
@ 2015-06-01  7:34  4%     ` Helin Zhang
  2015-06-01  7:34  3%     ` [dpdk-dev] [PATCH v6 17/18] examples/l3fwd: " Helin Zhang
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 18/18] mbuf: remove old packet type bit masks Helin Zhang
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:34 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd-power/main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 6ac342b..e27ad4e 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -635,7 +635,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		/* Handle IPv4 headers.*/
 		ipv4_hdr =
 			(struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char*)
@@ -670,8 +674,12 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 	}
 	else {
+#endif
 		/* Handle IPv6 headers.*/
 #if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH)
 		struct ipv6_hdr *ipv6_hdr;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 15/18] examples/l3fwd-acl: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (13 preceding siblings ...)
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 14/18] examples/ip_reassembly: " Helin Zhang
@ 2015-06-01  7:34  4%     ` Helin Zhang
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 16/18] examples/l3fwd-power: " Helin Zhang
                       ` (2 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:34 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd-acl/main.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index a5d4f25..2da8bf1 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -645,10 +645,13 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 	struct ipv4_hdr *ipv4_hdr;
 	struct rte_mbuf *pkt = pkts_in[index];
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
 
 	if (type == PKT_RX_IPV4_HDR) {
-
+#endif
 		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
 			unsigned char *) + sizeof(struct ether_hdr));
 
@@ -667,9 +670,11 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 			/* Not a valid IPv4 packet */
 			rte_pktmbuf_free(pkt);
 		}
-
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (type == PKT_RX_IPV6_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
 		acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -687,17 +692,22 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 {
 	struct rte_mbuf *pkt = pkts_in[index];
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
 
 	if (type == PKT_RX_IPV4_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv4[acl->num_ipv4] = MBUF_IPV4_2PROTO(pkt);
 		acl->m_ipv4[(acl->num_ipv4)++] = pkt;
 
-
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (type == PKT_RX_IPV6_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
 		acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -745,10 +755,17 @@ send_one_packet(struct rte_mbuf *m, uint32_t res)
 		/* in the ACL list, drop it */
 #ifdef L3FWDACL_DEBUG
 		if ((res & ACL_DENY_SIGNATURE) != 0) {
+#ifdef RTE_UNIFIED_PKT_TYPE
+			if (RTE_ETH_IS_IPV4_HDR(m->packet_type))
+				dump_acl4_rule(m, res);
+			else if (RTE_ETH_IS_IPV6_HDR(m->packet_type))
+				dump_acl6_rule(m, res);
+#else
 			if (m->ol_flags & PKT_RX_IPV4_HDR)
 				dump_acl4_rule(m, res);
 			else
 				dump_acl6_rule(m, res);
+#endif /* RTE_UNIFIED_PKT_TYPE */
 		}
 #endif
 		rte_pktmbuf_free(m);
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 14/18] examples/ip_reassembly: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (12 preceding siblings ...)
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
@ 2015-06-01  7:34  4%     ` Helin Zhang
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 15/18] examples/l3fwd-acl: " Helin Zhang
                       ` (3 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:34 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/ip_reassembly/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 9ecb6f9..cb131f6 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -356,7 +356,11 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	dst_port = portid;
 
 	/* if packet is IPv4 */
+#ifdef RTE_UNIFIED_PKT_TYPE
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & (PKT_RX_IPV4_HDR)) {
+#endif
 		struct ipv4_hdr *ip_hdr;
 		uint32_t ip_dst;
 
@@ -396,9 +400,14 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
+#ifdef RTE_UNIFIED_PKT_TYPE
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+		/* if packet is IPv6 */
+#else
 	}
 	/* if packet is IPv6 */
 	else if (m->ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) {
+#endif
 		struct ipv6_extension_fragment *frag_hdr;
 		struct ipv6_hdr *ip_hdr;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 12/18] app/test: Remove useless code
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (10 preceding siblings ...)
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 11/18] app/testpmd: " Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
                       ` (5 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

Severl useless code lines are added accidently, which blocks packet
type unification. They should be deleted at all.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test/packet_burst_generator.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

v4 changes:
* Removed several useless code lines which block packet type unification.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/app/test/packet_burst_generator.c b/app/test/packet_burst_generator.c
index b46eed7..6b1bbb5 100644
--- a/app/test/packet_burst_generator.c
+++ b/app/test/packet_burst_generator.c
@@ -272,19 +272,21 @@ nomore_mbuf:
 		if (ipv4) {
 			pkt->vlan_tci  = ETHER_TYPE_IPv4;
 			pkt->l3_len = sizeof(struct ipv4_hdr);
-
+#ifndef RTE_UNIFIED_PKT_TYPE
 			if (vlan_enabled)
 				pkt->ol_flags = PKT_RX_IPV4_HDR | PKT_RX_VLAN_PKT;
 			else
 				pkt->ol_flags = PKT_RX_IPV4_HDR;
+#endif
 		} else {
 			pkt->vlan_tci  = ETHER_TYPE_IPv6;
 			pkt->l3_len = sizeof(struct ipv6_hdr);
-
+#ifndef RTE_UNIFIED_PKT_TYPE
 			if (vlan_enabled)
 				pkt->ol_flags = PKT_RX_IPV6_HDR | PKT_RX_VLAN_PKT;
 			else
 				pkt->ol_flags = PKT_RX_IPV6_HDR;
+#endif
 		}
 
 		pkts_burst[nb_pkt] = pkt;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 09/18] fm10k: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (7 preceding siblings ...)
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 08/18] vmxnet3: " Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 10/18] app/test-pipeline: " Helin Zhang
                       ` (8 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

v4 changes:
* Supported unified packet type of fm10k from v4.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 56df6cd..71a7f5d 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -68,12 +68,37 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 static inline void
 rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d)
 {
+#ifdef RTE_UNIFIED_PKT_TYPE
+	static const uint32_t
+		ptype_table[FM10K_RXD_PKTTYPE_MASK >> FM10K_RXD_PKTTYPE_SHIFT]
+			__rte_cache_aligned = {
+		[FM10K_PKTTYPE_OTHER] = RTE_PTYPE_L2_MAC,
+		[FM10K_PKTTYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+		[FM10K_PKTTYPE_IPV4_EX] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[FM10K_PKTTYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+		[FM10K_PKTTYPE_IPV6_EX] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[FM10K_PKTTYPE_IPV4 | FM10K_PKTTYPE_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[FM10K_PKTTYPE_IPV6 | FM10K_PKTTYPE_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[FM10K_PKTTYPE_IPV4 | FM10K_PKTTYPE_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[FM10K_PKTTYPE_IPV6 | FM10K_PKTTYPE_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+	};
+
+	m->packet_type = ptype_table[(d->w.pkt_info & FM10K_RXD_PKTTYPE_MASK)
+						>> FM10K_RXD_PKTTYPE_SHIFT];
+#else /* RTE_UNIFIED_PKT_TYPE */
 	uint16_t ptype;
 	static const uint16_t pt_lut[] = { 0,
 		PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT,
 		PKT_RX_IPV6_HDR, PKT_RX_IPV6_HDR_EXT,
 		0, 0, 0
 	};
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 	if (d->w.pkt_info & FM10K_RXD_RSSTYPE_MASK)
 		m->ol_flags |= PKT_RX_RSS_HASH;
@@ -97,9 +122,11 @@ rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d)
 	if (unlikely(d->d.staterr & FM10K_RXD_STATUS_RXE))
 		m->ol_flags |= PKT_RX_RECIP_ERR;
 
+#ifndef RTE_UNIFIED_PKT_TYPE
 	ptype = (d->d.data & FM10K_RXD_PKTTYPE_MASK_L3) >>
 						FM10K_RXD_PKTTYPE_SHIFT;
 	m->ol_flags |= pt_lut[(uint8_t)ptype];
+#endif
 }
 
 uint16_t
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 11/18] app/testpmd: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (9 preceding siblings ...)
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 10/18] app/test-pipeline: " Helin Zhang
@ 2015-06-01  7:33  3%     ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 12/18] app/test: Remove useless code Helin Zhang
                       ` (6 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c |  14 ++++
 app/test-pmd/rxonly.c   | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v4 changes:
* Added printing logs of packet types of each received packet in rxonly mode.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index c180ff2..43ab6f8 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -202,8 +202,14 @@ parse_ethernet(struct ether_hdr *eth_hdr, struct testpmd_offload_info *info)
 
 /* Parse a vxlan header */
 static void
+#ifdef RTE_UNIFIED_PKT_TYPE
+parse_vxlan(struct udp_hdr *udp_hdr,
+	    struct testpmd_offload_info *info,
+	    uint32_t pkt_type)
+#else
 parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
 	uint64_t mbuf_olflags)
+#endif
 {
 	struct ether_hdr *eth_hdr;
 
@@ -211,8 +217,12 @@ parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
 	 * (rfc7348) or that the rx offload flag is set (i40e only
 	 * currently) */
 	if (udp_hdr->dst_port != _htons(4789) &&
+#ifdef RTE_UNIFIED_PKT_TYPE
+		RTE_ETH_IS_TUNNEL_PKT(pkt_type) == 0)
+#else
 		(mbuf_olflags & (PKT_RX_TUNNEL_IPV4_HDR |
 			PKT_RX_TUNNEL_IPV6_HDR)) == 0)
+#endif
 		return;
 
 	info->is_tunnel = 1;
@@ -549,7 +559,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				struct udp_hdr *udp_hdr;
 				udp_hdr = (struct udp_hdr *)((char *)l3_hdr +
 					info.l3_len);
+#ifdef RTE_UNIFIED_PKT_TYPE
+				parse_vxlan(udp_hdr, &info, m->packet_type);
+#else
 				parse_vxlan(udp_hdr, &info, m->ol_flags);
+#endif
 			} else if (info.l4_proto == IPPROTO_GRE) {
 				struct simple_gre_hdr *gre_hdr;
 				gre_hdr = (struct simple_gre_hdr *)
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index ac56090..e6767be 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -91,7 +91,11 @@ pkt_burst_receive(struct fwd_stream *fs)
 	uint64_t ol_flags;
 	uint16_t nb_rx;
 	uint16_t i, packet_type;
+#ifdef RTE_UNIFIED_PKT_TYPE
+	uint16_t is_encapsulation;
+#else
 	uint64_t is_encapsulation;
+#endif
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -135,8 +139,12 @@ pkt_burst_receive(struct fwd_stream *fs)
 		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
+#else
 		is_encapsulation = ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
 				PKT_RX_TUNNEL_IPV6_HDR);
+#endif
 
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
@@ -160,6 +168,177 @@ pkt_burst_receive(struct fwd_stream *fs)
 		}
 		if (ol_flags & PKT_RX_VLAN_PKT)
 			printf(" - VLAN tci=0x%x", mb->vlan_tci);
+#ifdef RTE_UNIFIED_PKT_TYPE
+		if (mb->packet_type) {
+			uint32_t ptype;
+
+			/* (outer) L2 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L2_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L2_MAC:
+				printf(" - (outer) L2 type: MAC");
+				break;
+			case RTE_PTYPE_L2_MAC_TIMESYNC:
+				printf(" - (outer) L2 type: MAC Timesync");
+				break;
+			case RTE_PTYPE_L2_ARP:
+				printf(" - (outer) L2 type: ARP");
+				break;
+			case RTE_PTYPE_L2_LLDP:
+				printf(" - (outer) L2 type: LLDP");
+				break;
+			default:
+				printf(" - (outer) L2 type: Unknown");
+				break;
+			}
+
+			/* (outer) L3 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L3_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L3_IPV4:
+				printf(" - (outer) L3 type: IPV4");
+				break;
+			case RTE_PTYPE_L3_IPV4_EXT:
+				printf(" - (outer) L3 type: IPV4_EXT");
+				break;
+			case RTE_PTYPE_L3_IPV6:
+				printf(" - (outer) L3 type: IPV6");
+				break;
+			case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+				printf(" - (outer) L3 type: IPV4_EXT_UNKNOWN");
+				break;
+			case RTE_PTYPE_L3_IPV6_EXT:
+				printf(" - (outer) L3 type: IPV6_EXT");
+				break;
+			case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+				printf(" - (outer) L3 type: IPV6_EXT_UNKNOWN");
+				break;
+			default:
+				printf(" - (outer) L3 type: Unknown");
+				break;
+			}
+
+			/* (outer) L4 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L4_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L4_TCP:
+				printf(" - (outer) L4 type: TCP");
+				break;
+			case RTE_PTYPE_L4_UDP:
+				printf(" - (outer) L4 type: UDP");
+				break;
+			case RTE_PTYPE_L4_FRAG:
+				printf(" - (outer) L4 type: L4_FRAG");
+				break;
+			case RTE_PTYPE_L4_SCTP:
+				printf(" - (outer) L4 type: SCTP");
+				break;
+			case RTE_PTYPE_L4_ICMP:
+				printf(" - (outer) L4 type: ICMP");
+				break;
+			case RTE_PTYPE_L4_NONFRAG:
+				printf(" - (outer) L4 type: L4_NONFRAG");
+				break;
+			default:
+				printf(" - (outer) L4 type: Unknown");
+				break;
+			}
+
+			/* packet tunnel type */
+			ptype = mb->packet_type & RTE_PTYPE_TUNNEL_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_TUNNEL_IP:
+				printf(" - Tunnel type: IP");
+				break;
+			case RTE_PTYPE_TUNNEL_GRE:
+				printf(" - Tunnel type: GRE");
+				break;
+			case RTE_PTYPE_TUNNEL_VXLAN:
+				printf(" - Tunnel type: VXLAN");
+				break;
+			case RTE_PTYPE_TUNNEL_NVGRE:
+				printf(" - Tunnel type: NVGRE");
+				break;
+			case RTE_PTYPE_TUNNEL_GENEVE:
+				printf(" - Tunnel type: GENEVE");
+				break;
+			case RTE_PTYPE_TUNNEL_GRENAT:
+				printf(" - Tunnel type: GRENAT");
+				break;
+			default:
+				printf(" - Tunnel type: Unkown");
+				break;
+			}
+
+			/* inner L2 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_L2_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L2_MAC:
+				printf(" - Inner L2 type: MAC");
+				break;
+			case RTE_PTYPE_INNER_L2_MAC_VLAN:
+				printf(" - Inner L2 type: MAC_VLAN");
+				break;
+			default:
+				printf(" - Inner L2 type: Unknown");
+				break;
+			}
+
+			/* inner L3 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_INNER_L3_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L3_IPV4:
+				printf(" - Inner L3 type: IPV4");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV4_EXT:
+				printf(" - Inner L3 type: IPV4_EXT");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6:
+				printf(" - Inner L3 type: IPV6");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+				printf(" - Inner L3 type: IPV4_EXT_UNKNOWN");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6_EXT:
+				printf(" - Inner L3 type: IPV6_EXT");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+				printf(" - Inner L3 type: IPV6_EXT_UNKOWN");
+				break;
+			default:
+				printf(" - Inner L3 type: Unkown");
+				break;
+			}
+
+			/* inner L4 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_L4_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L4_TCP:
+				printf(" - Inner L4 type: TCP");
+				break;
+			case RTE_PTYPE_INNER_L4_UDP:
+				printf(" - Inner L4 type: UDP");
+				break;
+			case RTE_PTYPE_INNER_L4_FRAG:
+				printf(" - Inner L4 type: L4_FRAG");
+				break;
+			case RTE_PTYPE_INNER_L4_SCTP:
+				printf(" - Inner L4 type: SCTP");
+				break;
+			case RTE_PTYPE_INNER_L4_ICMP:
+				printf(" - Inner L4 type: ICMP");
+				break;
+			case RTE_PTYPE_INNER_L4_NONFRAG:
+				printf(" - Inner L4 type: L4_NONFRAG");
+				break;
+			default:
+				printf(" - Inner L4 type: Unknown");
+				break;
+			}
+			printf("\n");
+		} else
+			printf("Unknown packet type\n");
+#endif /* RTE_UNIFIED_PKT_TYPE */
 		if (is_encapsulation) {
 			struct ipv4_hdr *ipv4_hdr;
 			struct ipv6_hdr *ipv6_hdr;
@@ -173,7 +352,11 @@ pkt_burst_receive(struct fwd_stream *fs)
 			l2_len  = sizeof(struct ether_hdr);
 
 			 /* Do not support ipv4 option field */
+#ifdef RTE_UNIFIED_PKT_TYPE
+			if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
+#else
 			if (ol_flags & PKT_RX_TUNNEL_IPV4_HDR) {
+#endif
 				l3_len = sizeof(struct ipv4_hdr);
 				ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 06/18] i40e: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (4 preceding siblings ...)
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 05/18] ixgbe: " Helin Zhang
@ 2015-06-01  7:33  3%     ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 07/18] enic: " Helin Zhang
                       ` (11 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_rxtx.c | 528 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 528 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 787f0bd..e20c98d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -151,6 +151,514 @@ i40e_rxd_error_to_pkt_flags(uint64_t qword)
 	return flags;
 }
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+/* For each value it means, datasheet of hardware can tell more details */
+static inline uint32_t
+i40e_rxd_pkt_type_mapping(uint8_t ptype)
+{
+	static const uint32_t ptype_table[UINT8_MAX] __rte_cache_aligned = {
+		/* L2 types */
+		/* [0] reserved */
+		[1] = RTE_PTYPE_L2_MAC,
+		[2] = RTE_PTYPE_L2_MAC_TIMESYNC,
+		/* [3] - [5] reserved */
+		[6] = RTE_PTYPE_L2_LLDP,
+		/* [7] - [10] reserved */
+		[11] = RTE_PTYPE_L2_ARP,
+		/* [12] - [21] reserved */
+
+		/* Non tunneled IPv4 */
+		[22] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_FRAG,
+		[23] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_NONFRAG,
+		[24] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_UDP,
+		/* [25] reserved */
+		[26] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_TCP,
+		[27] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_SCTP,
+		[28] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_ICMP,
+
+		/* IPv4 --> IPv4 */
+		[29] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[30] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[31] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [32] reserved */
+		[33] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[34] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[35] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> IPv6 */
+		[36] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[37] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[38] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [39] reserved */
+		[40] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[41] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[42] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN */
+		[43] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> IPv4 */
+		[44] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[45] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[46] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [47] reserved */
+		[48] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[49] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[50] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> IPv6 */
+		[51] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[52] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[53] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [54] reserved */
+		[55] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[56] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[57] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC */
+		[58] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4 */
+		[59] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[60] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[61] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [62] reserved */
+		[63] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[64] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[65] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv6 */
+		[66] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[67] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[68] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [69] reserved */
+		[70] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[71] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[72] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN */
+		[73] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv4 */
+		[74] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[75] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[76] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [77] reserved */
+		[78] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[79] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[80] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv6 */
+		[81] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[82] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[83] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [84] reserved */
+		[85] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[86] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[87] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* Non tunneled IPv6 */
+		[88] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_FRAG,
+		[89] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_NONFRAG,
+		[90] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_UDP,
+		/* [91] reserved */
+		[92] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_TCP,
+		[93] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_SCTP,
+		[94] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_ICMP,
+
+		/* IPv6 --> IPv4 */
+		[95] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[96] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[97] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [98] reserved */
+		[99] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[100] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[101] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> IPv6 */
+		[102] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[103] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[104] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [105] reserved */
+		[106] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[107] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[108] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN */
+		[109] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> IPv4 */
+		[110] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[111] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[112] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [113] reserved */
+		[114] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[115] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[116] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> IPv6 */
+		[117] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[118] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[119] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [120] reserved */
+		[121] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[122] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[123] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC */
+		[124] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC --> IPv4 */
+		[125] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[126] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[127] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [128] reserved */
+		[129] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[130] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[131] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC --> IPv6 */
+		[132] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[133] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[134] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [135] reserved */
+		[136] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[137] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[138] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN */
+		[139] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv4 */
+		[140] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[141] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[142] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [143] reserved */
+		[144] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[145] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[146] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv6 */
+		[147] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[148] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[149] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [150] reserved */
+		[151] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[152] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[153] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* All others reserved */
+	};
+
+	return ptype_table[ptype];
+}
+#else /* RTE_UNIFIED_PKT_TYPE */
 /* Translate pkt types to pkt flags */
 static inline uint64_t
 i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
@@ -418,6 +926,7 @@ i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
 
 	return ip_ptype_map[ptype];
 }
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 #define I40E_RX_DESC_EXT_STATUS_FLEXBH_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FD_ID  0x01
@@ -709,11 +1218,18 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 				rxdp[j].wb.qword0.lo_dword.l2tag1) : 0;
 			pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_UNIFIED_PKT_TYPE
+			mb->packet_type =
+				i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+						I40E_RXD_QW1_PTYPE_MASK) >>
+						I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 			pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 
 			mb->packet_type = (uint16_t)((qword1 &
 					I40E_RXD_QW1_PTYPE_MASK) >>
 					I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_UNIFIED_PKT_TYPE */
 			if (pkt_flags & PKT_RX_RSS_HASH)
 				mb->hash.rss = rte_le_to_cpu_32(\
 					rxdp[j].wb.qword0.hi_dword.rss);
@@ -952,9 +1468,15 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_UNIFIED_PKT_TYPE
+		rxm->packet_type =
+			i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+			I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 		rxm->packet_type = (uint16_t)((qword1 & I40E_RXD_QW1_PTYPE_MASK) >>
 				I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_UNIFIED_PKT_TYPE */
 		if (pkt_flags & PKT_RX_RSS_HASH)
 			rxm->hash.rss =
 				rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
@@ -1111,10 +1633,16 @@ i40e_recv_scattered_pkts(void *rx_queue,
 			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_UNIFIED_PKT_TYPE
+		first_seg->packet_type =
+			i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+			I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 		first_seg->packet_type = (uint16_t)((qword1 &
 					I40E_RXD_QW1_PTYPE_MASK) >>
 					I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_UNIFIED_PKT_TYPE */
 		if (pkt_flags & PKT_RX_RSS_HASH)
 			rxm->hash.rss =
 				rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 10/18] app/test-pipeline: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (8 preceding siblings ...)
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 09/18] fm10k: " Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 11/18] app/testpmd: " Helin Zhang
                       ` (7 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pipeline/pipeline_hash.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/app/test-pipeline/pipeline_hash.c b/app/test-pipeline/pipeline_hash.c
index 4598ad4..bc84920 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -459,20 +459,33 @@ app_main_loop_rx_metadata(void) {
 			signature = RTE_MBUF_METADATA_UINT32_PTR(m, 0);
 			key = RTE_MBUF_METADATA_UINT8_PTR(m, 32);
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+			if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 			if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 				ip_hdr = (struct ipv4_hdr *)
 					&m_data[sizeof(struct ether_hdr)];
 				ip_dst = ip_hdr->dst_addr;
 
 				k32 = (uint32_t *) key;
 				k32[0] = ip_dst & 0xFFFFFF00;
+#ifdef RTE_UNIFIED_PKT_TYPE
+			} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 			} else {
+#endif
 				ipv6_hdr = (struct ipv6_hdr *)
 					&m_data[sizeof(struct ether_hdr)];
 				ipv6_dst = ipv6_hdr->dst_addr;
 
 				memcpy(key, ipv6_dst, 16);
+#ifdef RTE_UNIFIED_PKT_TYPE
+			} else
+				continue;
+#else
 			}
+#endif
 
 			*signature = test_hash(key, 0, 0);
 		}
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 08/18] vmxnet3: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (6 preceding siblings ...)
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 07/18] enic: " Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 09/18] fm10k: " Helin Zhang
                       ` (9 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index a1eac45..89b600b 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -649,9 +649,17 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
 
 			if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
+#ifdef RTE_UNIFIED_PKT_TYPE
+				rxm->packet_type = RTE_PTYPE_L3_IPV4_EXT;
+#else
 				rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+#endif
 			else
+#ifdef RTE_UNIFIED_PKT_TYPE
+				rxm->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 				rxm->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 
 			if (!rcd->cnc) {
 				if (!rcd->ipc)
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 07/18] enic: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (5 preceding siblings ...)
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 06/18] i40e: " Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 08/18] vmxnet3: " Helin Zhang
                       ` (10 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/enic/enic_main.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 15313c2..50cd8c9 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -423,7 +423,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 		rx_pkt->pkt_len = bytes_written;
 
 		if (ipv4) {
+#ifdef RTE_UNIFIED_PKT_TYPE
+			rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 			rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 			if (!csum_not_calc) {
 				if (unlikely(!ipv4_csum_ok))
 					rx_pkt->ol_flags |= PKT_RX_IP_CKSUM_BAD;
@@ -432,7 +436,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 					rx_pkt->ol_flags |= PKT_RX_L4_CKSUM_BAD;
 			}
 		} else if (ipv6)
+#ifdef RTE_UNIFIED_PKT_TYPE
+			rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
+#else
 			rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 	} else {
 		/* Header split */
 		if (sop && !eop) {
@@ -445,7 +453,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 				*rx_pkt_bucket = rx_pkt;
 				rx_pkt->pkt_len = bytes_written;
 				if (ipv4) {
+#ifdef RTE_UNIFIED_PKT_TYPE
+					rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 					rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 					if (!csum_not_calc) {
 						if (unlikely(!ipv4_csum_ok))
 							rx_pkt->ol_flags |=
@@ -457,13 +469,22 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 							    PKT_RX_L4_CKSUM_BAD;
 					}
 				} else if (ipv6)
+#ifdef RTE_UNIFIED_PKT_TYPE
+					rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
+#else
 					rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 			} else {
 				/* Payload */
 				hdr_rx_pkt = *rx_pkt_bucket;
 				hdr_rx_pkt->pkt_len += bytes_written;
 				if (ipv4) {
+#ifdef RTE_UNIFIED_PKT_TYPE
+					hdr_rx_pkt->packet_type =
+						RTE_PTYPE_L3_IPV4;
+#else
 					hdr_rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 					if (!csum_not_calc) {
 						if (unlikely(!ipv4_csum_ok))
 							hdr_rx_pkt->ol_flags |=
@@ -475,7 +496,12 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 							    PKT_RX_L4_CKSUM_BAD;
 					}
 				} else if (ipv6)
+#ifdef RTE_UNIFIED_PKT_TYPE
+					hdr_rx_pkt->packet_type =
+						RTE_PTYPE_L3_IPV6;
+#else
 					hdr_rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 
 			}
 		}
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 04/18] e1000: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (2 preceding siblings ...)
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 03/18] mbuf: add definitions of unified packet types Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 05/18] ixgbe: " Helin Zhang
                       ` (13 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_rxtx.c | 102 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index f586311..112b876 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -590,6 +590,99 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
  *  RX functions
  *
  **********************************************************************/
+#ifdef RTE_UNIFIED_PKT_TYPE
+#define IGB_PACKET_TYPE_IPV4              0X01
+#define IGB_PACKET_TYPE_IPV4_TCP          0X11
+#define IGB_PACKET_TYPE_IPV4_UDP          0X21
+#define IGB_PACKET_TYPE_IPV4_SCTP         0X41
+#define IGB_PACKET_TYPE_IPV4_EXT          0X03
+#define IGB_PACKET_TYPE_IPV4_EXT_SCTP     0X43
+#define IGB_PACKET_TYPE_IPV6              0X04
+#define IGB_PACKET_TYPE_IPV6_TCP          0X14
+#define IGB_PACKET_TYPE_IPV6_UDP          0X24
+#define IGB_PACKET_TYPE_IPV6_EXT          0X0C
+#define IGB_PACKET_TYPE_IPV6_EXT_TCP      0X1C
+#define IGB_PACKET_TYPE_IPV6_EXT_UDP      0X2C
+#define IGB_PACKET_TYPE_IPV4_IPV6         0X05
+#define IGB_PACKET_TYPE_IPV4_IPV6_TCP     0X15
+#define IGB_PACKET_TYPE_IPV4_IPV6_UDP     0X25
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT     0X0D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IGB_PACKET_TYPE_MAX               0X80
+#define IGB_PACKET_TYPE_MASK              0X7F
+#define IGB_PACKET_TYPE_SHIFT             0X04
+static inline uint32_t
+igb_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+	static const uint32_t
+		ptype_table[IGB_PACKET_TYPE_MAX] __rte_cache_aligned = {
+		[IGB_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+		[IGB_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[IGB_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+		[IGB_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6,
+		[IGB_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT,
+		[IGB_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+		[IGB_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_UDP] =  RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+		[IGB_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+		[IGB_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+	};
+	if (unlikely(pkt_info & E1000_RXDADV_PKTTYPE_ETQF))
+		return RTE_PTYPE_UNKNOWN;
+
+	pkt_info = (pkt_info >> IGB_PACKET_TYPE_SHIFT) & IGB_PACKET_TYPE_MASK;
+
+	return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
+{
+	uint64_t pkt_flags = ((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH;
+
+#if defined(RTE_LIBRTE_IEEE1588)
+	static uint32_t ip_pkt_etqf_map[8] = {
+		0, 0, 0, PKT_RX_IEEE1588_PTP,
+		0, 0, 0, 0,
+	};
+
+	pkt_flags |= ip_pkt_etqf_map[(hl_tp_rs >> 4) & 0x07];
+#endif
+
+	return pkt_flags;
+}
+#else /* RTE_UNIFIED_PKT_TYPE */
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
@@ -617,6 +710,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 #endif
 	return pkt_flags | (((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH);
 }
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 static inline uint64_t
 rx_desc_status_to_pkt_flags(uint32_t rx_status)
@@ -790,6 +884,10 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		rxm->ol_flags = pkt_flags;
+#ifdef RTE_UNIFIED_PKT_TYPE
+		rxm->packet_type = igb_rxd_pkt_info_to_pkt_type(rxd.wb.lower.
+						lo_dword.hs_rss.pkt_info);
+#endif
 
 		/*
 		 * Store the mbuf address into the next entry of the array
@@ -1024,6 +1122,10 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		first_seg->ol_flags = pkt_flags;
+#ifdef RTE_UNIFIED_PKT_TYPE
+		first_seg->packet_type = igb_rxd_pkt_info_to_pkt_type(rxd.wb.
+					lower.lo_dword.hs_rss.pkt_info);
+#endif
 
 		/* Prefetch data of first segment, if configured to do so. */
 		rte_packet_prefetch((char *)first_seg->buf_addr +
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 05/18] ixgbe: replace bit mask based packet type with unified packet type
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
                       ` (3 preceding siblings ...)
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
@ 2015-06-01  7:33  3%     ` Helin Zhang
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 06/18] i40e: " Helin Zhang
                       ` (12 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify packet type among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
Note that around 2.5% performance drop (64B) was observed of doing
4 ports (1 port per 82599 card) IO forwarding on the same SNB core.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 163 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 4f9ab22..c4d9b02 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -855,6 +855,110 @@ end_of_tx:
  *  RX functions
  *
  **********************************************************************/
+#ifdef RTE_UNIFIED_PKT_TYPE
+#define IXGBE_PACKET_TYPE_IPV4              0X01
+#define IXGBE_PACKET_TYPE_IPV4_TCP          0X11
+#define IXGBE_PACKET_TYPE_IPV4_UDP          0X21
+#define IXGBE_PACKET_TYPE_IPV4_SCTP         0X41
+#define IXGBE_PACKET_TYPE_IPV4_EXT          0X03
+#define IXGBE_PACKET_TYPE_IPV4_EXT_SCTP     0X43
+#define IXGBE_PACKET_TYPE_IPV6              0X04
+#define IXGBE_PACKET_TYPE_IPV6_TCP          0X14
+#define IXGBE_PACKET_TYPE_IPV6_UDP          0X24
+#define IXGBE_PACKET_TYPE_IPV6_EXT          0X0C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_TCP      0X1C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_UDP      0X2C
+#define IXGBE_PACKET_TYPE_IPV4_IPV6         0X05
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_TCP     0X15
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_UDP     0X25
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT     0X0D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IXGBE_PACKET_TYPE_MAX               0X80
+#define IXGBE_PACKET_TYPE_MASK              0X7F
+#define IXGBE_PACKET_TYPE_SHIFT             0X04
+static inline uint32_t
+ixgbe_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+	static const uint32_t
+		ptype_table[IXGBE_PACKET_TYPE_MAX] __rte_cache_aligned = {
+		[IXGBE_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4,
+		[IXGBE_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[IXGBE_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6,
+		[IXGBE_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT,
+		[IXGBE_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+		[IXGBE_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+	};
+	if (unlikely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+		return RTE_PTYPE_UNKNOWN;
+
+	pkt_info = (pkt_info >> IXGBE_PACKET_TYPE_SHIFT) &
+				IXGBE_PACKET_TYPE_MASK;
+
+	return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+ixgbe_rxd_pkt_info_to_pkt_flags(uint16_t pkt_info)
+{
+	static uint64_t ip_rss_types_map[16] __rte_cache_aligned = {
+		0, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH,
+		0, PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH,
+		PKT_RX_RSS_HASH, 0, 0, 0,
+		0, 0, 0,  PKT_RX_FDIR,
+	};
+#ifdef RTE_LIBRTE_IEEE1588
+	static uint64_t ip_pkt_etqf_map[8] = {
+		0, 0, 0, PKT_RX_IEEE1588_PTP,
+		0, 0, 0, 0,
+	};
+
+	if (likely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+		return ip_pkt_etqf_map[(pkt_info >> 4) & 0X07] |
+				ip_rss_types_map[pkt_info & 0XF];
+	else
+		return ip_rss_types_map[pkt_info & 0XF];
+#else
+	return ip_rss_types_map[pkt_info & 0XF];
+#endif
+}
+#else /* RTE_UNIFIED_PKT_TYPE */
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
@@ -890,6 +994,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 #endif
 	return pkt_flags | ip_rss_types_map[hl_tp_rs & 0xF];
 }
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 static inline uint64_t
 rx_desc_status_to_pkt_flags(uint32_t rx_status)
@@ -945,7 +1050,13 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
 	uint64_t pkt_flags;
+#ifdef RTE_UNIFIED_PKT_TYPE
+	int nb_dd;
+	uint32_t s[LOOK_AHEAD];
+	uint16_t pkt_info[LOOK_AHEAD];
+#else
 	int s[LOOK_AHEAD], nb_dd;
+#endif /* RTE_UNIFIED_PKT_TYPE */
 	int i, j, nb_rx = 0;
 
 
@@ -968,6 +1079,12 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 		for (j = LOOK_AHEAD-1; j >= 0; --j)
 			s[j] = rxdp[j].wb.upper.status_error;
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+		for (j = LOOK_AHEAD-1; j >= 0; --j)
+			pkt_info[j] = rxdp[j].wb.lower.lo_dword.
+						hs_rss.pkt_info;
+#endif /* RTE_UNIFIED_PKT_TYPE */
+
 		/* Compute how many status bits were set */
 		nb_dd = 0;
 		for (j = 0; j < LOOK_AHEAD; ++j)
@@ -985,12 +1102,22 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 			mb->vlan_tci = rte_le_to_cpu_16(rxdp[j].wb.upper.vlan);
 
 			/* convert descriptor fields to rte mbuf flags */
+#ifdef RTE_UNIFIED_PKT_TYPE
+			pkt_flags = rx_desc_status_to_pkt_flags(s[j]);
+			pkt_flags |= rx_desc_error_to_pkt_flags(s[j]);
+			pkt_flags |=
+				ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info[j]);
+			mb->ol_flags = pkt_flags;
+			mb->packet_type =
+				ixgbe_rxd_pkt_info_to_pkt_type(pkt_info[j]);
+#else /* RTE_UNIFIED_PKT_TYPE */
 			pkt_flags  = rx_desc_hlen_type_rss_to_pkt_flags(
 					rxdp[j].wb.lower.lo_dword.data);
 			/* reuse status field from scan list */
 			pkt_flags |= rx_desc_status_to_pkt_flags(s[j]);
 			pkt_flags |= rx_desc_error_to_pkt_flags(s[j]);
 			mb->ol_flags = pkt_flags;
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 			if (likely(pkt_flags & PKT_RX_RSS_HASH))
 				mb->hash.rss = rxdp[j].wb.lower.hi_dword.rss;
@@ -1207,7 +1334,11 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	union ixgbe_adv_rx_desc rxd;
 	uint64_t dma_addr;
 	uint32_t staterr;
+#ifdef RTE_UNIFIED_PKT_TYPE
+	uint32_t pkt_info;
+#else
 	uint32_t hlen_type_rss;
+#endif
 	uint16_t pkt_len;
 	uint16_t rx_id;
 	uint16_t nb_rx;
@@ -1325,6 +1456,19 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		rxm->data_len = pkt_len;
 		rxm->port = rxq->port_id;
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+		pkt_info = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.hs_rss.
+								pkt_info);
+		/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
+		rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
+
+		pkt_flags = rx_desc_status_to_pkt_flags(staterr);
+		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
+		pkt_flags = pkt_flags |
+			ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info);
+		rxm->ol_flags = pkt_flags;
+		rxm->packet_type = ixgbe_rxd_pkt_info_to_pkt_type(pkt_info);
+#else /* RTE_UNIFIED_PKT_TYPE */
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
 		rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
@@ -1333,6 +1477,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		rxm->ol_flags = pkt_flags;
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 		if (likely(pkt_flags & PKT_RX_RSS_HASH))
 			rxm->hash.rss = rxd.wb.lower.hi_dword.rss;
@@ -1406,6 +1551,23 @@ ixgbe_fill_cluster_head_buf(
 	uint8_t port_id,
 	uint32_t staterr)
 {
+#ifdef RTE_UNIFIED_PKT_TYPE
+	uint16_t pkt_info;
+	uint64_t pkt_flags;
+
+	head->port = port_id;
+
+	/* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is
+	 * set in the pkt_flags field.
+	 */
+	head->vlan_tci = rte_le_to_cpu_16(desc->wb.upper.vlan);
+	pkt_info = rte_le_to_cpu_32(desc->wb.lower.lo_dword.hs_rss.pkt_info);
+	pkt_flags = rx_desc_status_to_pkt_flags(staterr);
+	pkt_flags |= rx_desc_error_to_pkt_flags(staterr);
+	pkt_flags |= ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info);
+	head->ol_flags = pkt_flags;
+	head->packet_type = ixgbe_rxd_pkt_info_to_pkt_type(pkt_info);
+#else /* RTE_UNIFIED_PKT_TYPE */
 	uint32_t hlen_type_rss;
 	uint64_t pkt_flags;
 
@@ -1421,6 +1583,7 @@ ixgbe_fill_cluster_head_buf(
 	pkt_flags |= rx_desc_status_to_pkt_flags(staterr);
 	pkt_flags |= rx_desc_error_to_pkt_flags(staterr);
 	head->ol_flags = pkt_flags;
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 	if (likely(pkt_flags & PKT_RX_RSS_HASH))
 		head->hash.rss = rte_le_to_cpu_32(desc->wb.lower.hi_dword.rss);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 03/18] mbuf: add definitions of unified packet types
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
@ 2015-06-01  7:33  3%     ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
                       ` (14 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

As there are only 6 bit flags in ol_flags for indicating packet
types, which is not enough to describe all the possible packet
types hardware can recognize. For example, i40e hardware can
recognize more than 150 packet types. Unified packet type is
composed of L2 type, L3 type, L4 type, tunnel type, inner L2 type,
inner L3 type and inner L4 type fields, and can be stored in
'struct rte_mbuf' of 32 bits field 'packet_type'.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 487 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 487 insertions(+)

v3 changes:
* Put the definitions of unified packet type into a single patch.

v4 changes:
* Added detailed description of each packet types.

v5 changes:
* Re-worded the commit logs.
* Added more detailed description for all packet types, together with examples.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a8662c2..94e51cd 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -195,6 +195,493 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+/*
+ * 32 bits are divided into several fields to mark packet types. Note that
+ * each field is indexical.
+ * - Bit 3:0 is for L2 types.
+ * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
+ * - Bit 11:8 is for L4 or outer L4 (for tunneling case) types.
+ * - Bit 15:12 is for tunnel types.
+ * - Bit 19:16 is for inner L2 types.
+ * - Bit 23:20 is for inner L3 types.
+ * - Bit 27:24 is for inner L4 types.
+ * - Bit 31:28 is reserved.
+ *
+ * To be compatible with Vector PMD, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT,
+ * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
+ * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
+ *
+ * Note that L3 types values are selected for checking IPV4/IPV6 header from
+ * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
+ * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
+ *
+ * Note that the packet types of the same packet recognized by different
+ * hardware may be different, as different hardware may have different
+ * capability of packet type recognition.
+ *
+ * examples:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=0x29
+ * | 'version'=6, 'next header'=0x3A
+ * | 'ICMPv6 header'>
+ * will be recognized on i40e hardware as packet type combination of,
+ * RTE_PTYPE_L2_MAC |
+ * RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+ * RTE_PTYPE_TUNNEL_IP |
+ * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_INNER_L4_ICMP.
+ *
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x2F
+ * | 'GRE header'
+ * | 'version'=6, 'next header'=0x11
+ * | 'UDP header'>
+ * will be recognized on i40e hardware as packet type combination of,
+ * RTE_PTYPE_L2_MAC |
+ * RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_TUNNEL_GRENAT |
+ * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_INNER_L4_UDP.
+ */
+#define RTE_PTYPE_UNKNOWN                   0x00000000
+/**
+ * MAC (Media Access Control) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=[0x0800|0x86DD|others]>
+ */
+#define RTE_PTYPE_L2_MAC                    0x00000001
+/**
+ * MAC (Media Access Control) packet type for time sync.
+ *
+ * Packet format:
+ * <'ether type'=0x88F7>
+ */
+#define RTE_PTYPE_L2_MAC_TIMESYNC           0x00000002
+/**
+ * ARP (Address Resolution Protocol) packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0806>
+ */
+#define RTE_PTYPE_L2_ARP                    0x00000003
+/**
+ * LLDP (Link Layer Discovery Protocol) packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x88CC>
+ */
+#define RTE_PTYPE_L2_LLDP                   0x00000004
+/**
+ * Mask of layer 2 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L2_MASK                   0x0000000f
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and does not contain any
+ * header option.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=5>
+ */
+#define RTE_PTYPE_L3_IPV4                   0x00000010
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and contains header
+ * options.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[6-15], 'options'>
+ */
+#define RTE_PTYPE_L3_IPV4_EXT               0x00000030
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and does not contain any
+ * extension header.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x3B>
+ */
+#define RTE_PTYPE_L3_IPV6                   0x00000040
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and may or maynot contain
+ * header options.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[5-15], <'options'>>
+ */
+#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x00000090
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and contains extension
+ * headers.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   'extension headers'>
+ */
+#define RTE_PTYPE_L3_IPV6_EXT               0x000000c0
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and may or maynot contain
+ * extension headers.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   <'extension headers'>>
+ */
+#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x000000e0
+/**
+ * Mask of layer 3 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L3_MASK                   0x000000f0
+/**
+ * TCP (Transmission Control Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=6, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=6>
+ */
+#define RTE_PTYPE_L4_TCP                    0x00000100
+/**
+ * UDP (User Datagram Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17>
+ */
+#define RTE_PTYPE_L4_UDP                    0x00000200
+/**
+ * Fragmented IP (Internet Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * It refers to those packets of any IP types, which can be recognized as
+ * fragmented. A fragmented packet cannot be recognized as any other L4 types
+ * (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP,
+ * RTE_PTYPE_L4_NONFRAG).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'MF'=1>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=44>
+ */
+#define RTE_PTYPE_L4_FRAG                   0x00000300
+/**
+ * SCTP (Stream Control Transmission Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=132, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=132>
+ */
+#define RTE_PTYPE_L4_SCTP                   0x00000400
+/**
+ * ICMP (Internet Control Message Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=1, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=1>
+ */
+#define RTE_PTYPE_L4_ICMP                   0x00000500
+/**
+ * Non-fragmented IP (Internet Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * It refers to those packets of any IP types, while cannot be recognized as
+ * any of above L4 types (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
+ * RTE_PTYPE_L4_FRAG, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'!=[6|17|44|132|1]>
+ */
+#define RTE_PTYPE_L4_NONFRAG                0x00000600
+/**
+ * Mask of layer 4 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L4_MASK                   0x00000f00
+/**
+ * IP (Internet Protocol) in IP (Internet Protocol) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=[4|41]>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[4|41]>
+ */
+#define RTE_PTYPE_TUNNEL_IP                 0x00001000
+/**
+ * GRE (Generic Routing Encapsulation) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47>
+ */
+#define RTE_PTYPE_TUNNEL_GRE                0x00002000
+/**
+ * VXLAN (Virtual eXtensible Local Area Network) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=4798>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=4798>
+ */
+#define RTE_PTYPE_TUNNEL_VXLAN              0x00003000
+/**
+ * NVGRE (Network Virtualization using Generic Routing Encapsulation) tunneling
+ * packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47
+ * | 'protocol type'=0x6558>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47
+ * | 'protocol type'=0x6558'>
+ */
+#define RTE_PTYPE_TUNNEL_NVGRE              0x00004000
+/**
+ * GENEVE (Generic Network Virtualization Encapsulation) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=6081>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=6081>
+ */
+#define RTE_PTYPE_TUNNEL_GENEVE             0x00005000
+/**
+ * Tunneling packet type of Teredo, VXLAN (Virtual eXtensible Local Area
+ * Network) or GRE (Generic Routing Encapsulation) could be recognized as this
+ * packet type, if they can not be recognized independently as of hardware
+ * capability.
+ */
+#define RTE_PTYPE_TUNNEL_GRENAT             0x00006000
+/**
+ * Mask of tunneling packet types.
+ */
+#define RTE_PTYPE_TUNNEL_MASK               0x0000f000
+/**
+ * MAC (Media Access Control) packet type.
+ * It is used for inner packet type only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=[0x800|0x86DD]>
+ */
+#define RTE_PTYPE_INNER_L2_MAC              0x00010000
+/**
+ * MAC (Media Access Control) packet type with VLAN (Virtual Local Area
+ * Network) tag.
+ *
+ * Packet format (inner only):
+ * <'ether type'=[0x800|0x86DD], vlan=[1-4095]>
+ */
+#define RTE_PTYPE_INNER_L2_MAC_VLAN         0x00020000
+/**
+ * Mask of inner layer 2 packet types.
+ */
+#define RTE_PTYPE_INNER_L2_MASK             0x000f0000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and does not contain any header option.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=5>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4             0x00100000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and contains header options.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[6-15], 'options'>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x00200000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and does not contain any extension header.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x3B>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6             0x00300000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and may or maynot contain header options.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[5-15], <'options'>>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x00400000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and contains extension headers.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   'extension headers'>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x00500000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and may or maynot contain extension
+ * headers.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   <'extension headers'>>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x00600000
+/**
+ * Mask of inner layer 3 packet types.
+ */
+#define RTE_PTYPE_INNER_INNER_L3_MASK       0x00f00000
+/**
+ * TCP (Transmission Control Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=6, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=6>
+ */
+#define RTE_PTYPE_INNER_L4_TCP              0x01000000
+/**
+ * UDP (User Datagram Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17>
+ */
+#define RTE_PTYPE_INNER_L4_UDP              0x02000000
+/**
+ * Fragmented IP (Internet Protocol) packet type.
+ * It is used for inner packet only, and may or maynot have layer 4 packet.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'MF'=1>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=44>
+ */
+#define RTE_PTYPE_INNER_L4_FRAG             0x03000000
+/**
+ * SCTP (Stream Control Transmission Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=132, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=132>
+ */
+#define RTE_PTYPE_INNER_L4_SCTP             0x04000000
+/**
+ * ICMP (Internet Control Message Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=1, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=1>
+ */
+#define RTE_PTYPE_INNER_L4_ICMP             0x05000000
+/**
+ * Non-fragmented IP (Internet Protocol) packet type.
+ * It is used for inner packet only, and may or maynot have other unknown layer
+ * 4 packet types.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'!=[6|17|44|132|1]>
+ */
+#define RTE_PTYPE_INNER_L4_NONFRAG          0x06000000
+/**
+ * Mask of inner layer 4 packet types.
+ */
+#define RTE_PTYPE_INNER_L4_MASK             0x0f000000
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
+ * determin if it is an IPV4 packet.
+ */
+#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 6 is selected to be used for IPv4 only. Then checking bit 6 can
+ * determin if it is an IPV4 packet.
+ */
+#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
+
+/* Check if it is a tunneling packet */
+#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & RTE_PTYPE_TUNNEL_MASK)
+#endif /* RTE_UNIFIED_PKT_TYPE */
+
 /**
  * Get the name of a RX offload flag
  *
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 02/18] ixgbe: support unified packet type in vectorized PMD
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
@ 2015-06-01  7:33  3%     ` Helin Zhang
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 03/18] mbuf: add definitions of unified packet types Helin Zhang
                       ` (15 subsequent siblings)
  17 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

To unify the packet type, bit masks of packet type for ol_flags are
replaced. In addition, more packet types (UDP, TCP and SCTP) are
supported in vectorized ixgbe PMD.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
Note that around 2% performance drop (64B) was observed of doing 4
ports (1 port per 82599 card) IO forwarding on the same SNB core.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 config/common_linuxapp             |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 75 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 3 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v3 changes:
* Put vector ixgbe changes right after mbuf changes.
* Enabled vector ixgbe PMD by default together with changes for updated
  vector PMD.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 6b067c7..0078dc9 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_IXGBE_INC_VECTOR=y
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
 
 #
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index abd10f6..382c949 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -134,6 +134,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
  */
 #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+#define OLFLAGS_MASK_V  (((uint64_t)PKT_RX_VLAN_PKT << 48) | \
+			((uint64_t)PKT_RX_VLAN_PKT << 32) | \
+			((uint64_t)PKT_RX_VLAN_PKT << 16) | \
+			((uint64_t)PKT_RX_VLAN_PKT))
+#else
 #define OLFLAGS_MASK     ((uint16_t)(PKT_RX_VLAN_PKT | PKT_RX_IPV4_HDR |\
 				     PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\
 				     PKT_RX_IPV6_HDR_EXT))
@@ -142,11 +148,26 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 			  ((uint64_t)OLFLAGS_MASK << 16) | \
 			  ((uint64_t)OLFLAGS_MASK))
 #define PTYPE_SHIFT    (1)
+#endif /* RTE_UNIFIED_PKT_TYPE */
+
 #define VTAG_SHIFT     (3)
 
 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
+#ifdef RTE_UNIFIED_PKT_TYPE
+	__m128i vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT);
+	vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V;
+#else
 	__m128i ptype0, ptype1, vtag0, vtag1;
 	union {
 		uint16_t e[4];
@@ -166,6 +187,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 
 	ptype1 = _mm_or_si128(ptype1, vtag1);
 	vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V;
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 	rx_pkts[0]->ol_flags = vol.e[0];
 	rx_pkts[1]->ol_flags = vol.e[1];
@@ -196,6 +218,18 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	int pos;
 	uint64_t var;
 	__m128i shuf_msk;
+#ifdef RTE_UNIFIED_PKT_TYPE
+	__m128i crc_adjust = _mm_set_epi16(
+				0, 0, 0,    /* ignore non-length fields */
+				-rxq->crc_len, /* sub crc on data_len */
+				0,          /* ignore high-16bits of pkt_len */
+				-rxq->crc_len, /* sub crc on pkt_len */
+				0, 0            /* ignore pkt_type field */
+			);
+	__m128i dd_check, eop_check;
+	__m128i desc_mask = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF,
+					  0xFFFFFFFF, 0xFFFF07F0);
+#else
 	__m128i crc_adjust = _mm_set_epi16(
 				0, 0, 0, 0, /* ignore non-length fields */
 				0,          /* ignore high-16bits of pkt_len */
@@ -204,6 +238,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				0            /* ignore pkt_type field */
 			);
 	__m128i dd_check, eop_check;
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 	if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST))
 		return 0;
@@ -232,6 +267,18 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
 
 	/* mask to shuffle from desc. to mbuf */
+#ifdef RTE_UNIFIED_PKT_TYPE
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		1,           /* octet 1, 8 bits pkt_type field */
+		0            /* octet 0, 4 bits offset 4 pkt_type field */
+		);
+#else
 	shuf_msk = _mm_set_epi8(
 		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
 		0xFF, 0xFF,  /* skip high 16 bits vlan_macip, zero out */
@@ -241,18 +288,28 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		13, 12,      /* octet 12~13, 16 bits data_len */
 		0xFF, 0xFF   /* skip pkt_type field */
 		);
+#endif /* RTE_UNIFIED_PKT_TYPE */
 
 	/* Cache is empty -> need to scan the buffer rings, but first move
 	 * the next 'n' mbufs into the cache */
 	sw_ring = &rxq->sw_ring[rxq->rx_tail];
 
-	/*
-	 * A. load 4 packet in one loop
+#ifdef RTE_UNIFIED_PKT_TYPE
+	/* A. load 4 packet in one loop
+	 * [A*. mask out 4 unused dirty field in desc]
 	 * B. copy 4 mbuf point from swring to rx_pkts
 	 * C. calc the number of DD bits among the 4 packets
 	 * [C*. extract the end-of-packet bit, if requested]
 	 * D. fill info. from desc to mbuf
 	 */
+#else
+	/* A. load 4 packet in one loop
+	 * B. copy 4 mbuf point from swring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info. from desc to mbuf
+	 */
+#endif /* RTE_UNIFIED_PKT_TYPE */
 	for (pos = 0, nb_pkts_recd = 0; pos < RTE_IXGBE_VPMD_RX_BURST;
 			pos += RTE_IXGBE_DESCS_PER_LOOP,
 			rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
@@ -289,6 +346,16 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+		/* A* mask out 0~3 bits RSS type */
+		descs[3] = _mm_and_si128(descs[3], desc_mask);
+		descs[2] = _mm_and_si128(descs[2], desc_mask);
+
+		/* A* mask out 0~3 bits RSS type */
+		descs[1] = _mm_and_si128(descs[1], desc_mask);
+		descs[0] = _mm_and_si128(descs[0], desc_mask);
+#endif /* RTE_UNIFIED_PKT_TYPE */
+
 		/* avoid compiler reorder optimization */
 		rte_compiler_barrier();
 
@@ -301,7 +368,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* C.1 4=>2 filter staterr info only */
 		sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+		/* set ol_flags with vlan packet type */
+#else
 		/* set ol_flags with packet type and vlan tag */
+#endif /* RTE_UNIFIED_PKT_TYPE */
 		desc_to_olflags_v(descs, &rx_pkts[pos]);
 
 		/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
@ 2015-06-01  7:33  4%     ` Helin Zhang
  2015-06-01  8:14  5%       ` Olivier MATZ
  2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
                       ` (16 subsequent siblings)
  17 siblings, 1 reply; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

In order to unify the packet type, the field of 'packet_type' in
'struct rte_mbuf' needs to be extended from 16 to 32 bits.
Accordingly, some fields in 'struct rte_mbuf' are re-organized to
support this change for Vector PMD. As 'struct rte_kni_mbuf' for
KNI should be right mapped to 'struct rte_mbuf', it should be
modified accordingly. In addition, Vector PMD of ixgbe is disabled
by default, as 'struct rte_mbuf' changed.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 config/common_linuxapp                             |  2 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |  6 ++++++
 lib/librte_mbuf/rte_mbuf.h                         | 23 ++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Disabled vector ixgbe PMD by default, as mbuf layout changed.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0078dc9..6b067c7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_IXGBE_INC_VECTOR=y
+CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
 
 #
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 1e55c2d..7a2abbb 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -117,9 +117,15 @@ struct rte_kni_mbuf {
 	uint16_t data_off;      /**< Start address of data in segment buffer. */
 	char pad1[4];
 	uint64_t ol_flags;      /**< Offload features. */
+#ifdef RTE_UNIFIED_PKT_TYPE
+	char pad2[4];
+	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
+	uint16_t data_len;      /**< Amount of data in segment buffer. */
+#else
 	char pad2[2];
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
+#endif
 
 	/* fields on second cache line */
 	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ab6de67..a8662c2 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -269,6 +269,28 @@ struct rte_mbuf {
 	/* remaining bytes are set on RX when pulling packet from descriptor */
 	MARKER rx_descriptor_fields1;
 
+#ifdef RTE_UNIFIED_PKT_TYPE
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types.
+	 */
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
+#else
 	/**
 	 * The packet type, which is used to indicate ordinary packet and also
 	 * tunneled packet format, i.e. each number is represented a type of
@@ -280,6 +302,7 @@ struct rte_mbuf {
 	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
 	uint16_t reserved;
+#endif
 	union {
 		uint32_t rss;     /**< RSS hash result if RSS enabled */
 		struct {
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v6 00/18] unified packet type
  2015-05-22  8:44  2% ` [dpdk-dev] [PATCH v5 " Helin Zhang
    2015-05-22  8:44  3%   ` [dpdk-dev] [PATCH v5 18/18] mbuf: remove old packet type bit masks Helin Zhang
@ 2015-06-01  7:33  4%   ` Helin Zhang
  2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
                       ` (17 more replies)
  2 siblings, 18 replies; 200+ results
From: Helin Zhang @ 2015-06-01  7:33 UTC (permalink / raw)
  To: dev

Currently only 6 bits which are stored in ol_flags are used to indicate
the packet types. This is not enough, as some NIC hardware can recognize
quite a lot of packet types, e.g i40e hardware can recognize more than 150
packet types. Hiding those packet types hides hardware offload capabilities
which could be quite useful for improving performance and for end users. So
an unified packet types are needed to support all possible PMDs. A 16 bits
packet_type in mbuf structure can be changed to 32 bits and used for this
purpose. In addition, all packet types stored in ol_flag field should be
deleted at all, and 6 bits of ol_flags can be save as the benifit.

Initially, 32 bits of packet_type can be divided into several sub fields to
indicate different packet type information of a packet. The initial design
is to divide those bits into fields for L2 types, L3 types, L4 types, tunnel
types, inner L2 types, inner L3 types and inner L4 types. All PMDs should
translate the offloaded packet types into these 7 fields of information, for
user applications.

To avoid breaking ABI compatibility, currently all the code changes for
unified packet type are disabled at compile time by default. Users can
enable it manually by defining the macro of RTE_UNIFIED_PKT_TYPE. The code
changes will be valid by default in a future release, and the old version
will be deleted accordingly, after the ABI change process is done.

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
* Used redefined packet types and enlarged packet_type field for all PMDs
  and corresponding applications.
* Removed changes in bond and its relevant application, as there is no need
  at all according to the recent bond changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Put vector ixgbe changes right after mbuf changes.
* Disabled vector ixgbe PMD by default, as mbuf layout changed, and then
  re-enabled it after vector ixgbe PMD updated.
* Put the definitions of unified packet type into a single patch.
* Minor bug fixes and enhancements in l3fwd example.

v4 changes:
* Added detailed description of each packet types.
* Supported unified packet type of fm10k.
* Added printing logs of packet types of each received packet for rxonly
  mode in testpmd.
* Removed several useless code lines which block packet type unification from
  app/test/packet_burst_generator.c.

v5 changes:
* Added more detailed description for each packet types, together with examples.
* Rolled back the macro definitions of RX packet flags, for ABI compitability.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

Helin Zhang (18):
  mbuf: redefine packet_type in rte_mbuf
  ixgbe: support unified packet type in vectorized PMD
  mbuf: add definitions of unified packet types
  e1000: replace bit mask based packet type with unified packet type
  ixgbe: replace bit mask based packet type with unified packet type
  i40e: replace bit mask based packet type with unified packet type
  enic: replace bit mask based packet type with unified packet type
  vmxnet3: replace bit mask based packet type with unified packet type
  fm10k: replace bit mask based packet type with unified packet type
  app/test-pipeline: replace bit mask based packet type with unified
    packet type
  app/testpmd: replace bit mask based packet type with unified packet
    type
  app/test: Remove useless code
  examples/ip_fragmentation: replace bit mask based packet type with
    unified packet type
  examples/ip_reassembly: replace bit mask based packet type with
    unified packet type
  examples/l3fwd-acl: replace bit mask based packet type with unified
    packet type
  examples/l3fwd-power: replace bit mask based packet type with unified
    packet type
  examples/l3fwd: replace bit mask based packet type with unified packet
    type
  mbuf: remove old packet type bit masks

 app/test-pipeline/pipeline_hash.c                  |  13 +
 app/test-pmd/csumonly.c                            |  14 +
 app/test-pmd/rxonly.c                              | 183 +++++++
 app/test/packet_burst_generator.c                  |   6 +-
 drivers/net/e1000/igb_rxtx.c                       | 102 ++++
 drivers/net/enic/enic_main.c                       |  26 +
 drivers/net/fm10k/fm10k_rxtx.c                     |  27 ++
 drivers/net/i40e/i40e_rxtx.c                       | 528 +++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.c                     | 163 +++++++
 drivers/net/ixgbe/ixgbe_rxtx_vec.c                 |  75 ++-
 drivers/net/vmxnet3/vmxnet3_rxtx.c                 |   8 +
 examples/ip_fragmentation/main.c                   |   9 +
 examples/ip_reassembly/main.c                      |   9 +
 examples/l3fwd-acl/main.c                          |  29 +-
 examples/l3fwd-power/main.c                        |   8 +
 examples/l3fwd/main.c                              | 123 ++++-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |   6 +
 lib/librte_mbuf/rte_mbuf.c                         |   4 +
 lib/librte_mbuf/rte_mbuf.h                         | 514 ++++++++++++++++++++
 19 files changed, 1834 insertions(+), 13 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs
  2015-05-31 16:48  3%           ` Stephen Hemminger
  2015-05-31 17:30  0%             ` Wang, Liang-min
@ 2015-05-31 18:31  0%             ` Wang, Liang-min
  1 sibling, 0 replies; 200+ results
From: Wang, Liang-min @ 2015-05-31 18:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev


>>On Sat, 30 May 2015 19:40:46 +0000
>>"Wang, Liang-min" <liang-min.wang@intel.com> wrote:
>>
>>> 
> >>On Sat, 30 May 2015 16:16:01 +0000
> >>"Wang, Liang-min" <liang-min.wang@intel.com> wrote:
> >>
> >> >>The design decision is to keep ethdev as THE interface for all the external API, so ethtool APIs are designed based upon ethdev API. At the meantime, the ethtool APIs are designed to enable users to migrate designs based upon kernel-space ethtool. The open/close/start are put in place to enable quick migration.
> >>>
> >>>But there is no open/close/start in ethtool in kernel.
> >>>Anyway ethtool is currently on the disfavored list from kernel developers.
> >>>What about netlink or something better?
> >>>
> >>>Remember each new API creates more long term compatiablity and ABI issues.
> >>>So I am against introducing any new API that does the same thing as existing API's.
> >>
>>> Just to clarify APIs supported by this ethtool api: there are 
>>> net_open
> >>and net_stop and no net_start. Both functions are put in place to 
> >>support net_device_ops::ndo_open and net_device_ops::ndo_close as 
> >>defined in linux/netdevice.h
>>
>>
>>I get the feeling there is some use case you are not telling the list about.
>>What kind of application would use this api only. Why or how would DPDK application be involved in net_device_ops. If you are planning on putting DPDK in the kernel there are lots of other issues >>including kernel ABI stability and licensing that need to be dealt with.
>
>(I'm manually adding ">" through my email, outlook, to make my reply. I apology if I make any mistake on adding ">" in wrong place) No, we don't plan to put DPDK into kernel space, and this patch has >nothing to do with bifurcated driver that was announced for DPDK 2.0 then got scrubbed (or deferred). In contrary, the entire ethtool API (more support is coming) is designed to assist applications that >were designed based upon kernel ethtool to migrate into user-space driver based DPDK libraries. Being said that, as you are aware the kernel version of ndo_open/ndo_close is more than just start and >stop device. The initial implementation is to provide minimum functionality (strip off all the kernel related state management). In the future release (we need comments like yours), we will continue >make improvement. So this new API can be another alternative for applications to run device management.

Just to clarify my last reply: when I said "... based upon kernel ethtool to migrate ...", I was referring to both ethtool and net_device_op.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs
  2015-05-31 16:48  3%           ` Stephen Hemminger
@ 2015-05-31 17:30  0%             ` Wang, Liang-min
  2015-05-31 18:31  0%             ` Wang, Liang-min
  1 sibling, 0 replies; 200+ results
From: Wang, Liang-min @ 2015-05-31 17:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev


>On Sat, 30 May 2015 19:40:46 +0000
>"Wang, Liang-min" <liang-min.wang@intel.com> wrote:
>
>> 
> >On Sat, 30 May 2015 16:16:01 +0000
> >"Wang, Liang-min" <liang-min.wang@intel.com> wrote:
> >
> >> >The design decision is to keep ethdev as THE interface for all the external API, so ethtool APIs are designed based upon ethdev API. At the meantime, the ethtool APIs are designed to enable users to migrate designs based upon kernel-space ethtool. The open/close/start are put in place to enable quick migration.
> >>
> >>But there is no open/close/start in ethtool in kernel.
> >>Anyway ethtool is currently on the disfavored list from kernel developers.
> >>What about netlink or something better?
> >>
> >>Remember each new API creates more long term compatiablity and ABI issues.
> >>So I am against introducing any new API that does the same thing as existing API's.
> >
>> Just to clarify APIs supported by this ethtool api: there are net_open 
> >and net_stop and no net_start. Both functions are put in place to 
> >support net_device_ops::ndo_open and net_device_ops::ndo_close as 
> >defined in linux/netdevice.h
>
>
>I get the feeling there is some use case you are not telling the list about.
>What kind of application would use this api only. Why or how would DPDK application be involved in net_device_ops. If you are planning on putting DPDK in the kernel there are lots of other issues >including kernel ABI stability and licensing that need to be dealt with.

(I'm manually adding ">" through my email, outlook, to make my reply. I apology if I make any mistake on adding ">" in wrong place)
No, we don't plan to put DPDK into kernel space, and this patch has nothing to do with bifurcated driver that was announced for DPDK 2.0 then got scrubbed (or deferred). In contrary, the entire ethtool API (more support is coming) is designed to assist applications that were designed based upon kernel ethtool to migrate into user-space driver based DPDK libraries. Being said that, as you are aware the kernel version of ndo_open/ndo_close is more than just start and stop device. The initial implementation is to provide minimum functionality (strip off all the kernel related state management). In the future release (we need comments like yours), we will continue make improvement. So this new API can be another alternative for applications to run device management.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs
  2015-05-30 19:40  0%         ` Wang, Liang-min
@ 2015-05-31 16:48  3%           ` Stephen Hemminger
  2015-05-31 17:30  0%             ` Wang, Liang-min
  2015-05-31 18:31  0%             ` Wang, Liang-min
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2015-05-31 16:48 UTC (permalink / raw)
  To: Wang, Liang-min; +Cc: dev

On Sat, 30 May 2015 19:40:46 +0000
"Wang, Liang-min" <liang-min.wang@intel.com> wrote:

> 
> On Sat, 30 May 2015 16:16:01 +0000
> "Wang, Liang-min" <liang-min.wang@intel.com> wrote:
> 
> > >The design decision is to keep ethdev as THE interface for all the external API, so ethtool APIs are designed based upon ethdev API. At the meantime, the ethtool APIs are designed to enable users to migrate designs based upon kernel-space ethtool. The open/close/start are put in place to enable quick migration.
> >
> >But there is no open/close/start in ethtool in kernel.
> >Anyway ethtool is currently on the disfavored list from kernel developers.
> >What about netlink or something better?
> >
> >Remember each new API creates more long term compatiablity and ABI issues.
> >So I am against introducing any new API that does the same thing as existing API's.
> 
> Just to clarify APIs supported by this ethtool api: there are net_open and net_stop and no net_start. Both functions are put in place to support net_device_ops::ndo_open and net_device_ops::ndo_close as defined in linux/netdevice.h


I get the feeling there is some use case you are not telling the list about.
What kind of application would use this api only. Why or how would DPDK application
be involved in net_device_ops. If you are planning on putting DPDK in the kernel
there are lots of other issues including kernel ABI stability and licensing
that need to be dealt with.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs
  2015-05-30 19:26  3%       ` Stephen Hemminger
@ 2015-05-30 19:40  0%         ` Wang, Liang-min
  2015-05-31 16:48  3%           ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Wang, Liang-min @ 2015-05-30 19:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev


On Sat, 30 May 2015 16:16:01 +0000
"Wang, Liang-min" <liang-min.wang@intel.com> wrote:

> >The design decision is to keep ethdev as THE interface for all the external API, so ethtool APIs are designed based upon ethdev API. At the meantime, the ethtool APIs are designed to enable users to migrate designs based upon kernel-space ethtool. The open/close/start are put in place to enable quick migration.
>
>But there is no open/close/start in ethtool in kernel.
>Anyway ethtool is currently on the disfavored list from kernel developers.
>What about netlink or something better?
>
>Remember each new API creates more long term compatiablity and ABI issues.
>So I am against introducing any new API that does the same thing as existing API's.

Just to clarify APIs supported by this ethtool api: there are net_open and net_stop and no net_start. Both functions are put in place to support net_device_ops::ndo_open and net_device_ops::ndo_close as defined in linux/netdevice.h

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs
  @ 2015-05-30 19:26  3%       ` Stephen Hemminger
  2015-05-30 19:40  0%         ` Wang, Liang-min
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-30 19:26 UTC (permalink / raw)
  To: Wang, Liang-min; +Cc: dev

On Sat, 30 May 2015 16:16:01 +0000
"Wang, Liang-min" <liang-min.wang@intel.com> wrote:

> The design decision is to keep ethdev as THE interface for all the external API, so ethtool APIs are designed based upon ethdev API. At the meantime, the ethtool APIs are designed to enable users to migrate designs based upon kernel-space ethtool. The open/close/start are put in place to enable quick migration.

But there is no open/close/start in ethtool in kernel.
Anyway ethtool is currently on the disfavored list from kernel developers.
What about netlink or something better?

Remember each new API creates more long term compatiablity and ABI issues.
So I am against introducing any new API that does the same thing as existing API's.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] pmd: change initialization to indicate pci drivers
@ 2015-05-29 15:47  3% Stephen Hemminger
  2015-06-12  9:13  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-29 15:47 UTC (permalink / raw)
  To: dev

Upcoming drivers will need to be able to support other bus types.
This is a transparent change to how struct eth_driver is initialized.
It has not function or ABI layout impact, but makes adding a later
bus type (Xen, Hyper-V, ...) much easier.

Signed-off-by: Stpehen Hemminger <stephen@networkplumber.org>
---
 drivers/net/e1000/em_ethdev.c        | 2 +-
 drivers/net/e1000/igb_ethdev.c       | 4 ++--
 drivers/net/enic/enic_ethdev.c       | 2 +-
 drivers/net/fm10k/fm10k_ethdev.c     | 2 +-
 drivers/net/i40e/i40e_ethdev.c       | 2 +-
 drivers/net/i40e/i40e_ethdev_vf.c    | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c     | 4 ++--
 drivers/net/virtio/virtio_ethdev.c   | 2 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +-
 9 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index d28030e..9960c70 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -281,7 +281,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_em_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..bd29bc0 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -680,7 +680,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_igb_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
@@ -693,7 +693,7 @@ static struct eth_driver rte_igb_pmd = {
  * virtual function driver struct
  */
 static struct eth_driver rte_igbvf_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 69ad01b..8280cea 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -609,7 +609,7 @@ static int eth_enicpmd_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_enic_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_enic_pmd",
 		.id_table = pci_id_enic_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 275c19c..3de40e6 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1841,7 +1841,7 @@ static const struct rte_pci_id pci_id_fm10k_map[] = {
 };
 
 static struct eth_driver rte_pmd_fm10k = {
-	{
+	.pci_drv = {
 		.name = "rte_pmd_fm10k",
 		.id_table = pci_id_fm10k_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index fb64027..f283cb3 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -265,7 +265,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 };
 
 static struct eth_driver rte_i40e_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_i40e_pmd",
 		.id_table = pci_id_i40e_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 9f92a2f..c730bc6 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1199,7 +1199,7 @@ i40evf_dev_init(struct rte_eth_dev *eth_dev)
  * virtual function driver struct
  */
 static struct eth_driver rte_i40evf_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_i40evf_pmd",
 		.id_table = pci_id_i40evf_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..ca65103 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1087,7 +1087,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_ixgbe_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
@@ -1100,7 +1100,7 @@ static struct eth_driver rte_ixgbe_pmd = {
  * virtual function driver struct
  */
 static struct eth_driver rte_ixgbevf_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index f74e413..4353ce2 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1238,7 +1238,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_virtio_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_virtio_pmd",
 		.id_table = pci_id_virtio_map,
 	},
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 1685ce4..0db2a04 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -261,7 +261,7 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
 }
 
 static struct eth_driver rte_vmxnet3_pmd = {
-	{
+	.pci_drv = {
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
 		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
-- 
2.1.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-05-29  8:45 10%       ` [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue Cunming Liang
  2015-05-29 15:27  9%         ` Stephen Hemminger
@ 2015-05-29 15:36  5%         ` Vincent JARDIN
  2015-06-01 14:11  5%         ` Stephen Hemminger
  2 siblings, 0 replies; 200+ results
From: Vincent JARDIN @ 2015-05-29 15:36 UTC (permalink / raw)
  To: Cunming Liang, dev; +Cc: liang-min.wang, shemming

On 29/05/2015 10:45, Cunming Liang wrote:
> RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid ABI(unannounced) broken in v2.1.
> The usrs should make sure understand the impact before turning on the feature.
> There are two abi changes required in this interrupt patch set.
> They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.
>
> Signed-off-by: Cunming Liang<cunming.liang@intel.com>

Acked-by: vincent jardin <vincent.jardin@6wind.com>

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-05-29  8:45 10%       ` [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-05-29 15:27  9%         ` Stephen Hemminger
  2015-06-01  8:48  9%           ` Liang, Cunming
  2015-05-29 15:36  5%         ` Vincent JARDIN
  2015-06-01 14:11  5%         ` Stephen Hemminger
  2 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-29 15:27 UTC (permalink / raw)
  To: dev

On Fri, 29 May 2015 16:45:25 +0800
Cunming Liang <cunming.liang@intel.com> wrote:

> +#ifdef RTE_EAL_RX_INTR
> +extern int
>  rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
> +#else
> +static inline int
> +rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
> +{
> +	RTE_SET_USED(port_id);
> +	RTE_SET_USED(epfd);
> +	RTE_SET_USED(op);
> +	RTE_SET_USED(data);
> +	return -1;
> +}
> +#endif

Doing ABI compatibility is good but hard.

I think it would be better not to provide the functions for rx_intr_ctl unless
the feature was configured on. That way anyone using them with incorrect config
would detect failure at build time, rather than run time.

Also, doesn't some doc file have to be updated for the announcement?

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
  2015-05-21 17:58  4%           ` Neil Horman
  2015-05-21 18:21  3%             ` Stephen Hemminger
       [not found]                 ` <20150521111400.2a04a196@urahara>
@ 2015-05-29  8:56  5%             ` Liang, Cunming
  2 siblings, 0 replies; 200+ results
From: Liang, Cunming @ 2015-05-29  8:56 UTC (permalink / raw)
  To: Neil Horman, Stephen Hemminger; +Cc: dev, liang-min.wang

Hi Neil,

On 5/22/2015 1:58 AM, Neil Horman wrote:
> On Thu, May 21, 2015 at 10:43:00AM -0700, Stephen Hemminger wrote:
>> On Thu, 21 May 2015 06:32:02 -0400
>> Neil Horman <nhorman@tuxdriver.com> wrote:
>>
>>> On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
>>>> The patch adds interrupt vectors support in rte_intr_handle.
>>>> 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
>>>> Those event fds are stored in efds[].
>>>> 'intr_vec' is reserved for device driver to initialize the vector mapping table.
>>>> When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
>>>>
>>>> Signed-off-by: Danny Zhou <danny.zhou@intel.com>
>>>> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
>>>> ---
>>>> v7 changes:
>>>>   - add eptrs[], it's used to store the register rte_epoll_event instances.
>>>>   - add vec_en, to log the vector capability status.
>>>>
>>>> v6 changes:
>>>>   - add mapping table between irq vector number and queue id.
>>>>
>>>> v5 changes:
>>>>   - Create this new patch file for changed struct rte_intr_handle that
>>>>     other patches depend on, to avoid breaking git bisect.
>>>>
>>>>   lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
>>>> index 6a159c7..27174df 100644
>>>> --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
>>>> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
>>>> @@ -38,6 +38,8 @@
>>>>   #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
>>>>   #define _RTE_LINUXAPP_INTERRUPTS_H_
>>>>   
>>>> +#define RTE_MAX_RXTX_INTR_VEC_ID     32
>>>> +
>>>>   enum rte_intr_handle_type {
>>>>   	RTE_INTR_HANDLE_UNKNOWN = 0,
>>>>   	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
>>>> @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
>>>>   	RTE_INTR_HANDLE_MAX
>>>>   };
>>>>   
>>>> +struct rte_epoll_event;
>>>> +
>>>>   /** Handle for interrupts. */
>>>>   struct rte_intr_handle {
>>>>   	union {
>>>> @@ -57,6 +61,12 @@ struct rte_intr_handle {
>>>>   	};
>>>>   	int fd;	 /**< interrupt event file descriptor */
>>>>   	enum rte_intr_handle_type type;  /**< handle type */
>>>> +	uint32_t max_intr;               /**< max interrupt requested */
>>>> +	uint32_t nb_efd;                 /**< number of available efds */
>>>> +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
>>>> +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
>>>> +					 /**< intr vector epoll event ptr */
>>>> +	int *intr_vec;                   /**< intr vector number array */
>>>>   };
>>>>     
>>> This is going to be ABI breaking if this from test_interrupts.c:
>>> static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];
>>>
>>> is a plausible way of using this structure.  Even putting the data at the end of
>>> the structure won't help, as the array indicies are off
>> This needs to go in 2.0 and 2.0 has to have new ABI anyway.
>>
> We've already released 2.0, I think you mean 2.1, but 2.1 can't have a new ABI
> because we didn't announce it in 1.8.  The earliest we can update the ABI
> (according to the ABI docs) at this point is 2.2, since we need to announce the
> change in 2.1, then make it in 2.2
>
> Neil
>
I'll follow your guidance to send a separate patch to announce the ABI 
changes in this release.
For this code patch series, I propose to turn off the whole feature by 
default so as to avoid ABI broken in this release.
On next release v2.2, I'll send another cleanup patch to remove the 
feature macro.
In this way, we either won't block the feature code review or won't 
break ABI.
For users who are still going to use the feature in v2.1, they shall 
make sure aware of the impact of ABI changes and take risk to turn on 
the feature manually.
The v9 patch will include this part. Does it sound good to you?

Thanks,
Steve

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue
  2015-05-29  8:45  4%     ` [dpdk-dev] [PATCH v9 00/12] Interrupt mode PMD Cunming Liang
  2015-05-29  8:45  2%       ` [dpdk-dev] [PATCH v9 08/12] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-05-29  8:45 10%       ` Cunming Liang
  2015-05-29 15:27  9%         ` Stephen Hemminger
                           ` (2 more replies)
  2015-06-02  6:53  4%       ` [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
  2 siblings, 3 replies; 200+ results
From: Cunming Liang @ 2015-05-29  8:45 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

RTE_EAL_RX_INTR will be removed from v2.2. It's only used to avoid ABI(unannounced) broken in v2.1.
The usrs should make sure understand the impact before turning on the feature.
There are two abi changes required in this interrupt patch set.
They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  4 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 183 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1894,6 +1913,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3750,6 +3770,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3791,6 +3812,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3800,18 +3822,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_EAL_RX_INTR
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3868,6 +3893,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 798bb85..8c7bc99 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1481,7 +1487,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1514,6 +1522,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1532,6 +1541,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -1;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1619,9 +1629,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1727,12 +1739,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2335,6 +2349,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_EAL_RX_INTR
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2345,6 +2360,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3127,7 +3143,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3160,6 +3178,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3177,7 +3196,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3223,19 +3242,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3246,11 +3269,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3834,6 +3859,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3902,21 +3928,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3927,6 +3957,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3937,18 +3968,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4000,6 +4034,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 538bb93..86ff3e9 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,7 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
+		.rxq = 1,
 	},
 };
 
@@ -889,7 +889,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index fc2c46b..f0f6a3f 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -49,9 +49,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 1b80359..abc2062 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_EAL_RX_INTR
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_EAL_RX_INTR
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -875,6 +883,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -917,6 +926,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1054,6 +1064,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1165,3 +1176,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 7c8a62b..5390b21 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_EAL_RX_INTR
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_EAL_RX_INTR
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 846d7f8..823eb46 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3282,6 +3282,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3353,6 +3354,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c199d32..8bea68d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_EAL_RX_INTR
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2943,8 +2945,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2967,9 +2981,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH v9 08/12] ethdev: add rx intr enable, disable and ctl functions
  2015-05-29  8:45  4%     ` [dpdk-dev] [PATCH v9 00/12] Interrupt mode PMD Cunming Liang
@ 2015-05-29  8:45  2%       ` Cunming Liang
  2015-05-29  8:45 10%       ` [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue Cunming Liang
  2015-06-02  6:53  4%       ` [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
  2 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-05-29  8:45 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addtion, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per port or per queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add addtion check for EEXIT

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 024fe8b..846d7f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1386,6 +1396,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2868,6 +2882,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v9 00/12] Interrupt mode PMD
  2015-05-21  8:55  2%   ` [dpdk-dev] [PATCH v8 00/11] Interrupt mode PMD Cunming Liang
    2015-05-21  8:56  2%     ` [dpdk-dev] [PATCH v8 08/11] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-05-29  8:45  4%     ` Cunming Liang
  2015-05-29  8:45  2%       ` [dpdk-dev] [PATCH v9 08/12] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
                         ` (2 more replies)
  2 siblings, 3 replies; 200+ results
From: Cunming Liang @ 2015-05-29  8:45 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by defalut so as to avoid v2.1 abi broken

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (12):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/bsd: dummy for new intr definition
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 207 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 358 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1851 insertions(+), 125 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
  2015-05-28 19:32  3%         ` Dumitrescu, Cristian
@ 2015-05-28 21:41  3%           ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-05-28 21:41 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev

On Thu, 28 May 2015 19:32:32 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> This is just adding  a new field at the end of an API data structure. Based on input from multiple people and after reviewing the rules listed on http://dpdk.org/doc/guides/rel_notes/abi.html , I think this is an acceptable change. There are other patches in flight on this mailing list that are in the same situation. Any typical/well behaved application will not break due to this change.

Expanding a structure can be okay but:
  1. The allocation will have to always take within the library.
     If you let application put structure on stack or allocate on it's own, the ABI would break.

  2. The structure must not be used as a return by reference.
     For example, this would break if sizeof(struct my_stats) changed.

     void foo() {
             struct my_stats stats;
	     int i_will_get_clobbered;
	...
		rte_dpdk_get_stats(obj, &stats)
	}

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
  2015-05-26 21:57  0%       ` Stephen Hemminger
@ 2015-05-28 19:32  3%         ` Dumitrescu, Cristian
  2015-05-28 21:41  3%           ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2015-05-28 19:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, May 26, 2015 10:58 PM
> To: Dumitrescu, Cristian
> Cc: Gajdzica, MaciejX T; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing
> table stats
> 
> On Tue, 26 May 2015 21:40:42 +0000
> "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> > > Hemminger
> > > Sent: Tuesday, May 26, 2015 3:58 PM
> > > To: Gajdzica, MaciejX T
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for
> storing
> > > table stats
> > >
> > > On Tue, 26 May 2015 14:39:38 +0200
> > > Maciej Gajdzica <maciejx.t.gajdzica@intel.com> wrote:
> > >
> > > > +
> > > >  /** Lookup table interface defining the lookup table operation */
> > > >  struct rte_table_ops {
> > > >  	rte_table_op_create f_create;       /**< Create */
> > > > @@ -194,6 +218,7 @@ struct rte_table_ops {
> > > >  	rte_table_op_entry_add f_add;       /**< Entry add */
> > > >  	rte_table_op_entry_delete f_delete; /**< Entry delete */
> > > >  	rte_table_op_lookup f_lookup;       /**< Lookup */
> > > > +	rte_table_op_stats_read f_stats;	/**< Stats */
> > > >  };
> > >
> > > Another good idea, which is an ABI change.
> >
> > This is simply adding a new API function, this is not changing any function
> prototype. There is no change required in the map file of this library. Is there
> anything we should have done and we did not do?
> >
> 
> But if I built an external set of code which had rte_table_ops (don't worry I
> haven't)
> and that binary ran with the new definition, the core code it table would
> reference
> outside the (old version) of rte_table_ops structure and find garbage.

This is just adding  a new field at the end of an API data structure. Based on input from multiple people and after reviewing the rules listed on http://dpdk.org/doc/guides/rel_notes/abi.html , I think this is an acceptable change. There are other patches in flight on this mailing list that are in the same situation. Any typical/well behaved application will not break due to this change.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 4/5] rte_sched: hide structure of port hierarchy
  2015-05-27 18:10  3% [dpdk-dev] [PATCH v4 0/5] rte_sched: cleanup and API enhancements Stephen Hemminger
@ 2015-05-27 18:10  3% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-05-27 18:10 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev

Right now the scheduler hierarchy is encoded as a bitfield
that is visible as part of the ABI. This creates an barrier
limiting future expansion of the hierarchy.

As a transistional step. hide the actual layout of the hierarchy
and mark the exposed structure as deprecated. This will allow for
expansion in later release.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/librte_sched/rte_sched.c           | 54 ++++++++++++++++++++++++++++++++++
 lib/librte_sched/rte_sched.h           | 54 ++++++++++------------------------
 lib/librte_sched/rte_sched_version.map |  3 ++
 3 files changed, 73 insertions(+), 38 deletions(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index b1655b4..9c9419d 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -184,6 +184,21 @@ enum grinder_state {
 	e_GRINDER_READ_MBUF
 };
 
+/*
+ * Path through the scheduler hierarchy used by the scheduler enqueue
+ * operation to identify the destination queue for the current
+ * packet. Stored in the field pkt.hash.sched of struct rte_mbuf of
+ * each packet, typically written by the classification stage and read
+ * by scheduler enqueue.
+ */
+struct __rte_sched_port_hierarchy {
+	uint32_t queue:2;                /**< Queue ID (0 .. 3) */
+	uint32_t traffic_class:2;        /**< Traffic class ID (0 .. 3)*/
+	uint32_t pipe:20;                /**< Pipe ID */
+	uint32_t subport:6;              /**< Subport ID */
+	uint32_t color:2;                /**< Color */
+};
+
 struct rte_sched_grinder {
 	/* Pipe cache */
 	uint16_t pcache_qmask[RTE_SCHED_GRINDER_PCACHE_SIZE];
@@ -910,6 +925,45 @@ rte_sched_pipe_config(struct rte_sched_port *port,
 	return 0;
 }
 
+void
+rte_sched_port_pkt_write(struct rte_mbuf *pkt,
+			 uint32_t subport, uint32_t pipe, uint32_t traffic_class,
+			 uint32_t queue, enum rte_meter_color color)
+{
+	struct __rte_sched_port_hierarchy *sched
+		= (struct __rte_sched_port_hierarchy *) &pkt->hash.sched;
+
+	sched->color = (uint32_t) color;
+	sched->subport = subport;
+	sched->pipe = pipe;
+	sched->traffic_class = traffic_class;
+	sched->queue = queue;
+}
+
+void
+rte_sched_port_pkt_read_tree_path(const struct rte_mbuf *pkt,
+				  uint32_t *subport, uint32_t *pipe,
+				  uint32_t *traffic_class, uint32_t *queue)
+{
+	const struct __rte_sched_port_hierarchy *sched
+		= (const struct __rte_sched_port_hierarchy *) &pkt->hash.sched;
+
+	*subport = sched->subport;
+	*pipe = sched->pipe;
+	*traffic_class = sched->traffic_class;
+	*queue = sched->queue;
+}
+
+
+enum rte_meter_color
+rte_sched_port_pkt_read_color(const struct rte_mbuf *pkt)
+{
+	const struct __rte_sched_port_hierarchy *sched
+		= (const struct __rte_sched_port_hierarchy *) &pkt->hash.sched;
+
+	return (enum rte_meter_color) sched->color;
+}
+
 int
 rte_sched_subport_read_stats(struct rte_sched_port *port,
 	uint32_t subport_id,
diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
index e6bba22..f7c0b8e 100644
--- a/lib/librte_sched/rte_sched.h
+++ b/lib/librte_sched/rte_sched.h
@@ -195,17 +195,19 @@ struct rte_sched_port_params {
 #endif
 };
 
-/** Path through the scheduler hierarchy used by the scheduler enqueue operation to
-identify the destination queue for the current packet. Stored in the field hash.sched
-of struct rte_mbuf of each packet, typically written by the classification stage and read by
-scheduler enqueue.*/
+/*
+ * Path through scheduler hierarchy
+ *
+ * Note: direct access to internal bitfields is deprecated to allow for future expansion.
+ * Use rte_sched_port_pkt_read/write API instead
+ */
 struct rte_sched_port_hierarchy {
 	uint32_t queue:2;                /**< Queue ID (0 .. 3) */
 	uint32_t traffic_class:2;        /**< Traffic class ID (0 .. 3)*/
 	uint32_t pipe:20;                /**< Pipe ID */
 	uint32_t subport:6;              /**< Subport ID */
 	uint32_t color:2;                /**< Color */
-};
+} __attribute__ ((deprecated));
 
 /*
  * Configuration
@@ -328,11 +330,6 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
 	struct rte_sched_queue_stats *stats,
 	uint16_t *qlen);
 
-/*
- * Run-time
- *
- ***/
-
 /**
  * Scheduler hierarchy path write to packet descriptor. Typically called by the
  * packet classification stage.
@@ -348,18 +345,10 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
  * @param queue
  *   Queue ID within pipe traffic class (0 .. 3)
  */
-static inline void
+void
 rte_sched_port_pkt_write(struct rte_mbuf *pkt,
-	uint32_t subport, uint32_t pipe, uint32_t traffic_class, uint32_t queue, enum rte_meter_color color)
-{
-	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
-
-	sched->color = (uint32_t) color;
-	sched->subport = subport;
-	sched->pipe = pipe;
-	sched->traffic_class = traffic_class;
-	sched->queue = queue;
-}
+			 uint32_t subport, uint32_t pipe, uint32_t traffic_class,
+			 uint32_t queue, enum rte_meter_color color);
 
 /**
  * Scheduler hierarchy path read from packet descriptor (struct rte_mbuf). Typically
@@ -378,24 +367,13 @@ rte_sched_port_pkt_write(struct rte_mbuf *pkt,
  *   Queue ID within pipe traffic class (0 .. 3)
  *
  */
-static inline void
-rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t *subport, uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue)
-{
-	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
-
-	*subport = sched->subport;
-	*pipe = sched->pipe;
-	*traffic_class = sched->traffic_class;
-	*queue = sched->queue;
-}
-
-static inline enum rte_meter_color
-rte_sched_port_pkt_read_color(struct rte_mbuf *pkt)
-{
-	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
+void
+rte_sched_port_pkt_read_tree_path(const struct rte_mbuf *pkt,
+				  uint32_t *subport, uint32_t *pipe,
+				  uint32_t *traffic_class, uint32_t *queue);
 
-	return (enum rte_meter_color) sched->color;
-}
+enum rte_meter_color
+rte_sched_port_pkt_read_color(const struct rte_mbuf *pkt);
 
 /**
  * Hierarchical scheduler port enqueue. Writes up to n_pkts to port scheduler and
diff --git a/lib/librte_sched/rte_sched_version.map b/lib/librte_sched/rte_sched_version.map
index 9f74e8b..6626a74 100644
--- a/lib/librte_sched/rte_sched_version.map
+++ b/lib/librte_sched/rte_sched_version.map
@@ -17,6 +17,9 @@ DPDK_2.0 {
 	rte_sched_queue_read_stats;
 	rte_sched_subport_config;
 	rte_sched_subport_read_stats;
+	rte_sched_port_pkt_write;
+	rte_sched_port_pkt_read_tree_path;
+	rte_sched_port_pkt_read_color;
 
 	local: *;
 };
-- 
2.1.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 0/5] rte_sched: cleanup and API enhancements
@ 2015-05-27 18:10  3% Stephen Hemminger
  2015-05-27 18:10  3% ` [dpdk-dev] [PATCH 4/5] rte_sched: hide structure of port hierarchy Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-27 18:10 UTC (permalink / raw)
  To: cristian.dumitrescu; +Cc: dev

This fixes some small issues with rte_sched API and sets stage
for enhancements in later release. Unfortunately, several things
can not be done now because of the ABI rules.

Stephen Hemminger (5):
  rte_sched: make RED optional at runtime
  rte_sched: don't put tabs in log messages
  rte_sched: use correct log level
  rte_sched: hide structure of port hierarchy
  rte_sched: allow reading without clearing

 app/test/test_sched.c                  |   4 +-
 lib/librte_sched/rte_sched.c           | 157 +++++++++++++++++++++++++--------
 lib/librte_sched/rte_sched.h           |  89 +++++++++----------
 lib/librte_sched/rte_sched_version.map |   5 ++
 4 files changed, 171 insertions(+), 84 deletions(-)

-- 
2.1.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/4] kni: add function to query the name of a kni object
       [not found]         ` <5565D195.9040701@bisdn.de>
@ 2015-05-27 15:36  3%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-05-27 15:36 UTC (permalink / raw)
  To: Marc Sune; +Cc: dev

On Wed, May 27, 2015 at 04:15:49PM +0200, Marc Sune wrote:
> 
> 
> On 27/05/15 15:55, Bruce Richardson wrote:
> >On Wed, May 27, 2015 at 03:52:34PM +0200, Marc Sune wrote:
> >>
> >>On 27/05/15 15:47, Bruce Richardson wrote:
> >>>When a KNI object is created, a name is assigned to it which is stored
> >>>internally. There is also an API function to look up a KNI object by
> >>>name, but there is no API to query the current name of an existing
> >>>KNI object. This patch adds just such an API.
> >>>
> >>>Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >>>---
> >>>  lib/librte_kni/rte_kni.c           |  6 ++++++
> >>>  lib/librte_kni/rte_kni.h           | 10 ++++++++++
> >>>  lib/librte_kni/rte_kni_version.map |  1 +
> >>>  3 files changed, 17 insertions(+)
> >>>
> >>>diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> >>>index 4e70fa0..c5a0089 100644
> >>>--- a/lib/librte_kni/rte_kni.c
> >>>+++ b/lib/librte_kni/rte_kni.c
> >>>@@ -674,6 +674,12 @@ rte_kni_get(const char *name)
> >>>  	return NULL;
> >>>  }
> >>>+const char *
> >>>+rte_kni_get_name(const struct rte_kni *kni)
> >>>+{
> >>>+	return kni->name;
> >>>+}
> >>Since a pointer to the kni context (struct rte_kni) is exposed to the user
> >>(rte_kni_get() and rte_kni_alloc ()), and the field is directly in the
> >>struct, is this API call really necessary? I would only see this necessary
> >>if the API would only expose a handle, like a port_id for ethdev
> >>
> >>Marc
> >The structure definition is in rte_kni.c, not in the header file, so applications
> >can't read the name directly. In other words, the create API just exposes a handle.
> >[The structure in the header is the conf structure, not the full kni struct]
> 
> Ops, you are right. I overlooked that. What about:
> 
> extern void rte_kni_get_config(const struct rte_kni *kni, struct
> rte_kni_conf* conf);
> 
> which fills in (copies) the fields of conf would allow to recover the
> original configuration, including the name? It is closer
> rte_eth_dev_info_get (unfortunately rte_kni_info_get is taken by the
> deprecated API), and would work if we add more params to rte_kni_conf.
> 
> Thanks
> marc
> 

Given the issues that have been flagged recently around ABI compatibility, I
don't think I'd introduce such an API. If provided like you describe, it makes
the calling application very dependent upon the size and structure of the
conf structure. Having specific function like this to return specific values
is safer that way.

An alternative is to have a function which returns the conf structure as
a return value, rather than as an in-out arg. That would allow us to add
fields to the end of the structure without breaking applications using that particular
function. This would require us to store the entire conf structure inside the
rte_kni structure though, so we can return a const pointer to it. This is a bit more
invasive of a change.

Overall, I prefer the query-single value option, as I think it's generally the
best-practice for working with structures going forward. [Neil, please correct
me if I'm wrong here!]. If however, you see a current need for looking at the
other fields inside the KNI, I can see about changing things to return the
whole config structure as a const pointer.

/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
  2015-05-22 16:52  5%                 ` Stephen Hemminger
@ 2015-05-27 10:33  4%                   ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-05-27 10:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, liang-min.wang

On Fri, May 22, 2015 at 09:52:06AM -0700, Stephen Hemminger wrote:
> On Fri, 22 May 2015 00:05:36 +0000
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Thu, May 21, 2015 at 11:14:00AM -0700, Stephen Hemminger wrote:
> > > On Thu, 21 May 2015 13:58:46 -0400
> > > Neil Horman <nhorman@tuxdriver.com> wrote:
> > > 
> > > > On Thu, May 21, 2015 at 10:43:00AM -0700, Stephen Hemminger wrote:
> > > > > On Thu, 21 May 2015 06:32:02 -0400
> > > > > Neil Horman <nhorman@tuxdriver.com> wrote:
> > > > > 
> > > > > > On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
> > > > > > > The patch adds interrupt vectors support in rte_intr_handle.
> > > > > > > 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
> > > > > > > Those event fds are stored in efds[].
> > > > > > > 'intr_vec' is reserved for device driver to initialize the vector mapping table.
> > > > > > > When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
> > > > > > > 
> > > > > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > > ---
> > > > > > > v7 changes:
> > > > > > >  - add eptrs[], it's used to store the register rte_epoll_event instances.
> > > > > > >  - add vec_en, to log the vector capability status.
> > > > > > > 
> > > > > > > v6 changes:
> > > > > > >  - add mapping table between irq vector number and queue id.
> > > > > > > 
> > > > > > > v5 changes:
> > > > > > >  - Create this new patch file for changed struct rte_intr_handle that
> > > > > > >    other patches depend on, to avoid breaking git bisect.
> > > > > > > 
> > > > > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
> > > > > > >  1 file changed, 10 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > > > index 6a159c7..27174df 100644
> > > > > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > > > @@ -38,6 +38,8 @@
> > > > > > >  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
> > > > > > >  #define _RTE_LINUXAPP_INTERRUPTS_H_
> > > > > > >  
> > > > > > > +#define RTE_MAX_RXTX_INTR_VEC_ID     32
> > > > > > > +
> > > > > > >  enum rte_intr_handle_type {
> > > > > > >  	RTE_INTR_HANDLE_UNKNOWN = 0,
> > > > > > >  	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
> > > > > > > @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
> > > > > > >  	RTE_INTR_HANDLE_MAX
> > > > > > >  };
> > > > > > >  
> > > > > > > +struct rte_epoll_event;
> > > > > > > +
> > > > > > >  /** Handle for interrupts. */
> > > > > > >  struct rte_intr_handle {
> > > > > > >  	union {
> > > > > > > @@ -57,6 +61,12 @@ struct rte_intr_handle {
> > > > > > >  	};
> > > > > > >  	int fd;	 /**< interrupt event file descriptor */
> > > > > > >  	enum rte_intr_handle_type type;  /**< handle type */
> > > > > > > +	uint32_t max_intr;               /**< max interrupt requested */
> > > > > > > +	uint32_t nb_efd;                 /**< number of available efds */
> > > > > > > +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
> > > > > > > +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
> > > > > > > +					 /**< intr vector epoll event ptr */
> > > > > > > +	int *intr_vec;                   /**< intr vector number array */
> > > > > > >  };
> > > > > > >    
> > > > > > 
> > > > > > This is going to be ABI breaking if this from test_interrupts.c:
> > > > > > static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];
> > > > > > 
> > > > > > is a plausible way of using this structure.  Even putting the data at the end of
> > > > > > the structure won't help, as the array indicies are off
> > > > > 
> > > > > This needs to go in 2.0 and 2.0 has to have new ABI anyway.
> > > > > 
> > > > We've already released 2.0, I think you mean 2.1, but 2.1 can't have a new ABI
> > > > because we didn't announce it in 1.8.  The earliest we can update the ABI
> > > > (according to the ABI docs) at this point is 2.2, since we need to announce the
> > > > change in 2.1, then make it in 2.2
> > > > 
> > > > Neil
> > > > 
> > > 
> > > Then just skip 2.1 (or make it a trivial doc change only dummy release),
> > > and call it 2.2.
> > > 
> > > I guess we need to proactively say every .x release will have new ABI.
> > > Sorry, this is a project under development.
> > > 
> > Sorry, NAK.  I didn't go through all the trouble of creating an ABI
> > infrastructure just to throw it out the window on some rubber stamp.  We decided
> > on the rules, we need to stick to them.  We have large projects that rely on
> > DPDK now (OVS primarily), and we owe it to them to not just go completely throw
> > out the ABI every release.  We have a process for doing it, lets follow it.
> > 
> > Neil
> > 
> 
> I meant, that close and ship existing 2.1 code base early and open 2.2 early
> to keep things rolling. But in general this project needs x.x.y releases
> with ABI stability, and just admit that x.x releases will not have stable ABI.
> That is reality now.
> 
I'm  not opposed to doing that, though the purpose of the proposed cadence was
to give sufficient notice to downstream consumers of DPDK that ABI changes were
comming.  As long as the time delta beweeen 2.X and 2.X+1 is sufficient for
consumers to have time to react and update their applications I'm ok with it
(which I know is subjective, but I'm willing to experiment there).

> A lot of the ABI problem is that the code does not do a good job of hiding.
> And also does not sepearte driver ABI from user ABI. There are things like
> structure of PCI and interrupt handles that the user from library point
> of view should not care about, but drivers will need to.
> 
I agree.  I had hoped that implementing an ABI process would help drive
improvements in this area, but it hasn't seemed to yet.

Neil

> 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
  2015-05-26 15:46  3%           ` Ananyev, Konstantin
@ 2015-05-27  1:07  0%             ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-05-27  1:07 UTC (permalink / raw)
  To: Ananyev, Konstantin, Stephen Hemminger; +Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, May 26, 2015 11:46 PM
> To: Stephen Hemminger
> Cc: Zhang, Helin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for
> double vlan
> 
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Tuesday, May 26, 2015 4:35 PM
> > To: Ananyev, Konstantin
> > Cc: Zhang, Helin; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for
> > double vlan
> >
> > On Tue, 26 May 2015 15:02:51 +0000
> > "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> >
> > > Hi Stephen,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> > > > Hemminger
> > > > Sent: Tuesday, May 26, 2015 3:55 PM
> > > > To: Zhang, Helin
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits
> > > > for double vlan
> > > >
> > > > On Tue, 26 May 2015 16:36:37 +0800 Helin Zhang
> > > > <helin.zhang@intel.com> wrote:
> > > >
> > > > > Use the reserved 16 bits in rte_mbuf structure for the outer
> > > > > vlan, also add QinQ offloading flags for both RX and TX sides.
> > > > >
> > > > > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > > >
> > > > Yet another change that is much needed, but breaks ABI
> compatibility.
> > >
> > > Why do you think it breaks ABI compatibility?
> > > As I can see, it uses field that was reserved.
> > > Konstantin
> >
> > Because an application maybe assuming something or reusing the
> reserved fields.
> 
> But properly behaving application, shouldn't do that right?
> And for misbehaving ones, why should we care about them?
For any reserved bits, I think all application users should avoid touching it,
as it is reserved for future use, or some special reason. Otherwise,
un-predicted behavior can be expected.

Regards,
Helin

> 
> > Yes, it would be dumb of application to do that but from absolute ABI
> > point of view it is a change.
> 
> So, in theory,  even adding a new field to the end of rte_mbuf is an ABI
> breakage?
> Konstantin

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
  2015-05-26 21:40  0%     ` Dumitrescu, Cristian
@ 2015-05-26 21:57  0%       ` Stephen Hemminger
  2015-05-28 19:32  3%         ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-26 21:57 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev

On Tue, 26 May 2015 21:40:42 +0000
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com> wrote:

> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> > Hemminger
> > Sent: Tuesday, May 26, 2015 3:58 PM
> > To: Gajdzica, MaciejX T
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing
> > table stats
> > 
> > On Tue, 26 May 2015 14:39:38 +0200
> > Maciej Gajdzica <maciejx.t.gajdzica@intel.com> wrote:
> > 
> > > +
> > >  /** Lookup table interface defining the lookup table operation */
> > >  struct rte_table_ops {
> > >  	rte_table_op_create f_create;       /**< Create */
> > > @@ -194,6 +218,7 @@ struct rte_table_ops {
> > >  	rte_table_op_entry_add f_add;       /**< Entry add */
> > >  	rte_table_op_entry_delete f_delete; /**< Entry delete */
> > >  	rte_table_op_lookup f_lookup;       /**< Lookup */
> > > +	rte_table_op_stats_read f_stats;	/**< Stats */
> > >  };
> > 
> > Another good idea, which is an ABI change.
> 
> This is simply adding a new API function, this is not changing any function prototype. There is no change required in the map file of this library. Is there anything we should have done and we did not do?
> 

But if I built an external set of code which had rte_table_ops (don't worry I haven't)
and that binary ran with the new definition, the core code it table would reference
outside the (old version) of rte_table_ops structure and find garbage.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
  2015-05-26 14:57  3%   ` Stephen Hemminger
@ 2015-05-26 21:40  0%     ` Dumitrescu, Cristian
  2015-05-26 21:57  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2015-05-26 21:40 UTC (permalink / raw)
  To: Stephen Hemminger, Gajdzica, MaciejX T; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Tuesday, May 26, 2015 3:58 PM
> To: Gajdzica, MaciejX T
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing
> table stats
> 
> On Tue, 26 May 2015 14:39:38 +0200
> Maciej Gajdzica <maciejx.t.gajdzica@intel.com> wrote:
> 
> > +
> >  /** Lookup table interface defining the lookup table operation */
> >  struct rte_table_ops {
> >  	rte_table_op_create f_create;       /**< Create */
> > @@ -194,6 +218,7 @@ struct rte_table_ops {
> >  	rte_table_op_entry_add f_add;       /**< Entry add */
> >  	rte_table_op_entry_delete f_delete; /**< Entry delete */
> >  	rte_table_op_lookup f_lookup;       /**< Lookup */
> > +	rte_table_op_stats_read f_stats;	/**< Stats */
> >  };
> 
> Another good idea, which is an ABI change.

This is simply adding a new API function, this is not changing any function prototype. There is no change required in the map file of this library. Is there anything we should have done and we did not do?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
  2015-05-26 15:35  3%         ` Stephen Hemminger
@ 2015-05-26 15:46  3%           ` Ananyev, Konstantin
  2015-05-27  1:07  0%             ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-05-26 15:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, May 26, 2015 4:35 PM
> To: Ananyev, Konstantin
> Cc: Zhang, Helin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
> 
> On Tue, 26 May 2015 15:02:51 +0000
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> 
> > Hi Stephen,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > > Sent: Tuesday, May 26, 2015 3:55 PM
> > > To: Zhang, Helin
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
> > >
> > > On Tue, 26 May 2015 16:36:37 +0800
> > > Helin Zhang <helin.zhang@intel.com> wrote:
> > >
> > > > Use the reserved 16 bits in rte_mbuf structure for the outer vlan,
> > > > also add QinQ offloading flags for both RX and TX sides.
> > > >
> > > > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > >
> > > Yet another change that is much needed, but breaks ABI compatibility.
> >
> > Why do you think it breaks ABI compatibility?
> > As I can see, it uses field that was reserved.
> > Konstantin
> 
> Because an application maybe assuming something or reusing the reserved fields.

But properly behaving application, shouldn't do that right?
And for misbehaving ones, why should we care about them?

> Yes, it would be dumb of application to do that but from absolute ABI point
> of view it is a change.

So, in theory,  even adding a new field to the end of rte_mbuf is an ABI breakage?
Konstantin

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
  2015-05-26 15:02  3%       ` Ananyev, Konstantin
@ 2015-05-26 15:35  3%         ` Stephen Hemminger
  2015-05-26 15:46  3%           ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-26 15:35 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Tue, 26 May 2015 15:02:51 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> Hi Stephen,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Tuesday, May 26, 2015 3:55 PM
> > To: Zhang, Helin
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
> > 
> > On Tue, 26 May 2015 16:36:37 +0800
> > Helin Zhang <helin.zhang@intel.com> wrote:
> > 
> > > Use the reserved 16 bits in rte_mbuf structure for the outer vlan,
> > > also add QinQ offloading flags for both RX and TX sides.
> > >
> > > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > 
> > Yet another change that is much needed, but breaks ABI compatibility.
> 
> Why do you think it breaks ABI compatibility?
> As I can see, it uses field that was reserved.
> Konstantin

Because an application maybe assuming something or reusing the reserved fields.
Yes, it would be dumb of application to do that but from absolute ABI point
of view it is a change.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC PATCH 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
  2015-05-26 15:03  3%   ` Stephen Hemminger
@ 2015-05-26 15:09  0%     ` Marc Sune
  0 siblings, 0 replies; 200+ results
From: Marc Sune @ 2015-05-26 15:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev



On 26/05/15 17:03, Stephen Hemminger wrote:
> On Tue, 12 May 2015 01:45:45 +0200
> Marc Sune <marc.sune@bisdn.de> wrote:
>
>> +/**
>> + * Ethernet device information
>> + */
>>   struct rte_eth_dev_info {
>>   	struct rte_pci_device *pci_dev; /**< Device PCI information. */
>>   	const char *driver_name; /**< Device Driver name. */
>> @@ -924,6 +947,7 @@ struct rte_eth_dev_info {
>>   	uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
>>   	uint16_t vmdq_queue_num;  /**< Queue number for VMDQ pools. */
>>   	uint16_t vmdq_pool_base;  /**< First ID of VMDQ pools. */
>> +	uint16_t speed_capa;  /**< Supported speeds bitmap. */
>>   };
>>   
> Since you are changing size of key structure, this is an ABI change.

Yes. This means target would be 2.2?

I will send the new version anyway to further discuss, and will rebase 
again once necessary.

Marc

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
  2015-05-26 14:55  3%     ` Stephen Hemminger
  2015-05-26 15:00  0%       ` Zhang, Helin
@ 2015-05-26 15:02  3%       ` Ananyev, Konstantin
  2015-05-26 15:35  3%         ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-05-26 15:02 UTC (permalink / raw)
  To: Stephen Hemminger, Zhang, Helin; +Cc: dev

Hi Stephen,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Tuesday, May 26, 2015 3:55 PM
> To: Zhang, Helin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
> 
> On Tue, 26 May 2015 16:36:37 +0800
> Helin Zhang <helin.zhang@intel.com> wrote:
> 
> > Use the reserved 16 bits in rte_mbuf structure for the outer vlan,
> > also add QinQ offloading flags for both RX and TX sides.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> 
> Yet another change that is much needed, but breaks ABI compatibility.

Why do you think it breaks ABI compatibility?
As I can see, it uses field that was reserved.
Konstantin

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC PATCH 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
  @ 2015-05-26 15:03  3%   ` Stephen Hemminger
  2015-05-26 15:09  0%     ` Marc Sune
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-26 15:03 UTC (permalink / raw)
  To: Marc Sune; +Cc: dev

On Tue, 12 May 2015 01:45:45 +0200
Marc Sune <marc.sune@bisdn.de> wrote:

> +/**
> + * Ethernet device information
> + */
>  struct rte_eth_dev_info {
>  	struct rte_pci_device *pci_dev; /**< Device PCI information. */
>  	const char *driver_name; /**< Device Driver name. */
> @@ -924,6 +947,7 @@ struct rte_eth_dev_info {
>  	uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
>  	uint16_t vmdq_queue_num;  /**< Queue number for VMDQ pools. */
>  	uint16_t vmdq_pool_base;  /**< First ID of VMDQ pools. */
> +	uint16_t speed_capa;  /**< Supported speeds bitmap. */
>  };
>  

Since you are changing size of key structure, this is an ABI change.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
  2015-05-26 14:55  3%     ` Stephen Hemminger
@ 2015-05-26 15:00  0%       ` Zhang, Helin
  2015-05-26 15:02  3%       ` Ananyev, Konstantin
  1 sibling, 0 replies; 200+ results
From: Zhang, Helin @ 2015-05-26 15:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, May 26, 2015 10:55 PM
> To: Zhang, Helin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for
> double vlan
> 
> On Tue, 26 May 2015 16:36:37 +0800
> Helin Zhang <helin.zhang@intel.com> wrote:
> 
> > Use the reserved 16 bits in rte_mbuf structure for the outer vlan,
> > also add QinQ offloading flags for both RX and TX sides.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> 
> Yet another change that is much needed, but breaks ABI compatibility.
Even just use the reserved 16 bits? It seems yes.
Would it be acceptable to use the original name of 'reserved' for the outer vlan?
And then announce the name change, and rename it one release after?

Regards,
Helin

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats
  @ 2015-05-26 14:57  3%   ` Stephen Hemminger
  2015-05-26 21:40  0%     ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-26 14:57 UTC (permalink / raw)
  To: Maciej Gajdzica; +Cc: dev

On Tue, 26 May 2015 14:39:38 +0200
Maciej Gajdzica <maciejx.t.gajdzica@intel.com> wrote:

> +
>  /** Lookup table interface defining the lookup table operation */
>  struct rte_table_ops {
>  	rte_table_op_create f_create;       /**< Create */
> @@ -194,6 +218,7 @@ struct rte_table_ops {
>  	rte_table_op_entry_add f_add;       /**< Entry add */
>  	rte_table_op_entry_delete f_delete; /**< Entry delete */
>  	rte_table_op_lookup f_lookup;       /**< Lookup */
> +	rte_table_op_stats_read f_stats;	/**< Stats */
>  };

Another good idea, which is an ABI change.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan
  @ 2015-05-26 14:55  3%     ` Stephen Hemminger
  2015-05-26 15:00  0%       ` Zhang, Helin
  2015-05-26 15:02  3%       ` Ananyev, Konstantin
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2015-05-26 14:55 UTC (permalink / raw)
  To: Helin Zhang; +Cc: dev

On Tue, 26 May 2015 16:36:37 +0800
Helin Zhang <helin.zhang@intel.com> wrote:

> Use the reserved 16 bits in rte_mbuf structure for the outer vlan,
> also add QinQ offloading flags for both RX and TX sides.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>

Yet another change that is much needed, but breaks ABI compatibility.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
       [not found]                   ` <40594e9e6e0543afa11e4dbd90e59b22@BRMWP-EXMB11.corp.brocade.com>
@ 2015-05-22 16:52  5%                 ` Stephen Hemminger
  2015-05-27 10:33  4%                   ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-22 16:52 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev, liang-min.wang

On Fri, 22 May 2015 00:05:36 +0000
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Thu, May 21, 2015 at 11:14:00AM -0700, Stephen Hemminger wrote:
> > On Thu, 21 May 2015 13:58:46 -0400
> > Neil Horman <nhorman@tuxdriver.com> wrote:
> > 
> > > On Thu, May 21, 2015 at 10:43:00AM -0700, Stephen Hemminger wrote:
> > > > On Thu, 21 May 2015 06:32:02 -0400
> > > > Neil Horman <nhorman@tuxdriver.com> wrote:
> > > > 
> > > > > On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
> > > > > > The patch adds interrupt vectors support in rte_intr_handle.
> > > > > > 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
> > > > > > Those event fds are stored in efds[].
> > > > > > 'intr_vec' is reserved for device driver to initialize the vector mapping table.
> > > > > > When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
> > > > > > 
> > > > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > ---
> > > > > > v7 changes:
> > > > > >  - add eptrs[], it's used to store the register rte_epoll_event instances.
> > > > > >  - add vec_en, to log the vector capability status.
> > > > > > 
> > > > > > v6 changes:
> > > > > >  - add mapping table between irq vector number and queue id.
> > > > > > 
> > > > > > v5 changes:
> > > > > >  - Create this new patch file for changed struct rte_intr_handle that
> > > > > >    other patches depend on, to avoid breaking git bisect.
> > > > > > 
> > > > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > > index 6a159c7..27174df 100644
> > > > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > > @@ -38,6 +38,8 @@
> > > > > >  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
> > > > > >  #define _RTE_LINUXAPP_INTERRUPTS_H_
> > > > > >  
> > > > > > +#define RTE_MAX_RXTX_INTR_VEC_ID     32
> > > > > > +
> > > > > >  enum rte_intr_handle_type {
> > > > > >  	RTE_INTR_HANDLE_UNKNOWN = 0,
> > > > > >  	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
> > > > > > @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
> > > > > >  	RTE_INTR_HANDLE_MAX
> > > > > >  };
> > > > > >  
> > > > > > +struct rte_epoll_event;
> > > > > > +
> > > > > >  /** Handle for interrupts. */
> > > > > >  struct rte_intr_handle {
> > > > > >  	union {
> > > > > > @@ -57,6 +61,12 @@ struct rte_intr_handle {
> > > > > >  	};
> > > > > >  	int fd;	 /**< interrupt event file descriptor */
> > > > > >  	enum rte_intr_handle_type type;  /**< handle type */
> > > > > > +	uint32_t max_intr;               /**< max interrupt requested */
> > > > > > +	uint32_t nb_efd;                 /**< number of available efds */
> > > > > > +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
> > > > > > +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
> > > > > > +					 /**< intr vector epoll event ptr */
> > > > > > +	int *intr_vec;                   /**< intr vector number array */
> > > > > >  };
> > > > > >    
> > > > > 
> > > > > This is going to be ABI breaking if this from test_interrupts.c:
> > > > > static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];
> > > > > 
> > > > > is a plausible way of using this structure.  Even putting the data at the end of
> > > > > the structure won't help, as the array indicies are off
> > > > 
> > > > This needs to go in 2.0 and 2.0 has to have new ABI anyway.
> > > > 
> > > We've already released 2.0, I think you mean 2.1, but 2.1 can't have a new ABI
> > > because we didn't announce it in 1.8.  The earliest we can update the ABI
> > > (according to the ABI docs) at this point is 2.2, since we need to announce the
> > > change in 2.1, then make it in 2.2
> > > 
> > > Neil
> > > 
> > 
> > Then just skip 2.1 (or make it a trivial doc change only dummy release),
> > and call it 2.2.
> > 
> > I guess we need to proactively say every .x release will have new ABI.
> > Sorry, this is a project under development.
> > 
> Sorry, NAK.  I didn't go through all the trouble of creating an ABI
> infrastructure just to throw it out the window on some rubber stamp.  We decided
> on the rules, we need to stick to them.  We have large projects that rely on
> DPDK now (OVS primarily), and we owe it to them to not just go completely throw
> out the ABI every release.  We have a process for doing it, lets follow it.
> 
> Neil
> 

I meant, that close and ship existing 2.1 code base early and open 2.2 early
to keep things rolling. But in general this project needs x.x.y releases
with ABI stability, and just admit that x.x releases will not have stable ABI.
That is reality now.

A lot of the ABI problem is that the code does not do a good job of hiding.
And also does not sepearte driver ABI from user ABI. There are things like
structure of PCI and interrupt handles that the user from library point
of view should not care about, but drivers will need to.

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v5 01/18] mbuf: redefine packet_type in rte_mbuf
  @ 2015-05-22 10:09  3%     ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-05-22 10:09 UTC (permalink / raw)
  To: Helin Zhang; +Cc: dev

On Fri, May 22, 2015 at 04:44:07PM +0800, Helin Zhang wrote:
> In order to unify the packet type, the field of 'packet_type' in
> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> KNI should be right mapped to 'struct rte_mbuf', it should be
> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> by default, as 'struct rte_mbuf' changed.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> ---
>  config/common_linuxapp                             |  2 +-
>  .../linuxapp/eal/include/exec-env/rte_kni_common.h |  4 ++--
>  lib/librte_mbuf/rte_mbuf.h                         | 23 +++++++++++++++-------
>  3 files changed, 19 insertions(+), 10 deletions(-)
> 
> v2 changes:
> * Enlarged the packet_type field from 16 bits to 32 bits.
> * Redefined the packet type sub-fields.
> * Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
> 
> v3 changes:
> * Put the mbuf layout changes into a single patch.
> * Disabled vector ixgbe PMD by default, as mbuf layout changed.
> 
> v5 changes:
> * Re-worded the commit logs.
> 
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 0078dc9..6b067c7 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
>  CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
>  CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
> -CONFIG_RTE_IXGBE_INC_VECTOR=y
> +CONFIG_RTE_IXGBE_INC_VECTOR=n
>  CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
>  
>  #
> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> index 1e55c2d..bd1cc09 100644
> --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> @@ -117,9 +117,9 @@ struct rte_kni_mbuf {
>  	uint16_t data_off;      /**< Start address of data in segment buffer. */
>  	char pad1[4];
>  	uint64_t ol_flags;      /**< Offload features. */
> -	char pad2[2];
> -	uint16_t data_len;      /**< Amount of data in segment buffer. */
> +	char pad2[4];
>  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> +	uint16_t data_len;      /**< Amount of data in segment buffer. */
>  
>  	/* fields on second cache line */
>  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index ab6de67..c2b1463 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -269,17 +269,26 @@ struct rte_mbuf {
>  	/* remaining bytes are set on RX when pulling packet from descriptor */
>  	MARKER rx_descriptor_fields1;
>  
> -	/**
> -	 * The packet type, which is used to indicate ordinary packet and also
> -	 * tunneled packet format, i.e. each number is represented a type of
> -	 * packet.
> +	/*
> +	 * The packet type, which is the combination of outer/inner L2, L3, L4
> +	 * and tunnel types.
>  	 */
> -	uint16_t packet_type;
> +	union {
> +		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
> +		struct {
> +			uint32_t l2_type:4; /**< (Outer) L2 type. */
> +			uint32_t l3_type:4; /**< (Outer) L3 type. */
> +			uint32_t l4_type:4; /**< (Outer) L4 type. */
> +			uint32_t tun_type:4; /**< Tunnel type. */
> +			uint32_t inner_l2_type:4; /**< Inner L2 type. */
> +			uint32_t inner_l3_type:4; /**< Inner L3 type. */
> +			uint32_t inner_l4_type:4; /**< Inner L4 type. */
> +		};
> +	};
>  
> -	uint16_t data_len;        /**< Amount of data in segment buffer. */
>  	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
> +	uint16_t data_len;        /**< Amount of data in segment buffer. */
>  	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
> -	uint16_t reserved;
>  	union {
>  		uint32_t rss;     /**< RSS hash result if RSS enabled */
>  		struct {


ABI Compatibility process?

Neil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 18/18] mbuf: remove old packet type bit masks
  2015-05-22  8:44  2% ` [dpdk-dev] [PATCH v5 " Helin Zhang
  @ 2015-05-22  8:44  3%   ` Helin Zhang
  2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
  2 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-05-22  8:44 UTC (permalink / raw)
  To: dev

As unified packet types are used instead, those old bit masks and
the relevant macros for packet type indication need to be removed.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 6 ------
 lib/librte_mbuf/rte_mbuf.h | 6 ------
 2 files changed, 12 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.
* Redefined the bit masks for packet RX offload flags.

v5 changes:
* Rolled back the bit masks of RX flags, for ABI compatibility.

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index f506517..78688f7 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -251,14 +251,8 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
 	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
 	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
-	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
-	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
-	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
-	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
-	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
-	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 6a26172..aea9ba8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,14 +91,8 @@ extern "C" {
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR     (0ULL << 0)  /**< Hardware processing error. */
 #define PKT_RX_MAC_ERR       (0ULL << 0)  /**< MAC error. */
-#define PKT_RX_IPV4_HDR      (1ULL << 5)  /**< RX packet with IPv4 header. */
-#define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with extended IPv4 header. */
-#define PKT_RX_IPV6_HDR      (1ULL << 7)  /**< RX packet with IPv6 header. */
-#define PKT_RX_IPV6_HDR_EXT  (1ULL << 8)  /**< RX packet with extended IPv6 header. */
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
-#define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 header.*/
-#define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
 /* add new RX flags here */
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 00/18] unified packet type
  @ 2015-05-22  8:44  2% ` Helin Zhang
                       ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Helin Zhang @ 2015-05-22  8:44 UTC (permalink / raw)
  To: dev

Currently only 6 bits which are stored in ol_flags are used to indicate
the packet types. This is not enough, as some NIC hardware can recognize
quite a lot of packet types, e.g i40e hardware can recognize more than 150
packet types. Hiding those packet types hides hardware offload capabilities
which could be quite useful for improving performance and for end users. So
an unified packet types are needed to support all possible PMDs. A 16 bits
packet_type in mbuf structure can be changed to 32 bits and used for this
purpose. In addition, all packet types stored in ol_flag field should be
deleted at all, and 6 bits of ol_flags can be save as the benifit.

Initially, 32 bits of packet_type can be divided into several sub fields to
indicate different packet type information of a packet. The initial design
is to divide those bits into fields for L2 types, L3 types, L4 types, tunnel
types, inner L2 types, inner L3 types and inner L4 types. All PMDs should
translate the offloaded packet types into these 7 fields of information, for
user applications.

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
* Used redefined packet types and enlarged packet_type field for all PMDs
  and corresponding applications.
* Removed changes in bond and its relevant application, as there is no need
  at all according to the recent bond changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Put vector ixgbe changes right after mbuf changes.
* Disabled vector ixgbe PMD by default, as mbuf layout changed, and then
  re-enabled it after vector ixgbe PMD updated.
* Put the definitions of unified packet type into a single patch.
* Minor bug fixes and enhancements in l3fwd example.

v4 changes:
* Added detailed description of each packet types.
* Supported unified packet type of fm10k.
* Added printing logs of packet types of each received packet for rxonly
  mode in testpmd.
* Removed several useless code lines which block packet type unification from
  app/test/packet_burst_generator.c.

v5 changes:
* Added more detailed description for each packet types, together with examples.
* Rolled back the macro definitions of RX packet flags, for ABI compitability.

Helin Zhang (18):
  mbuf: redefine packet_type in rte_mbuf
  ixgbe: support unified packet type in vectorized PMD
  mbuf: add definitions of unified packet types
  e1000: replace bit mask based packet type with unified packet type
  ixgbe: replace bit mask based packet type with unified packet type
  i40e: replace bit mask based packet type with unified packet type
  enic: replace bit mask based packet type with unified packet type
  vmxnet3: replace bit mask based packet type with unified packet type
  fm10k: replace bit mask based packet type with unified packet type
  app/test-pipeline: replace bit mask based packet type with unified
    packet type
  app/testpmd: replace bit mask based packet type with unified packet
    type
  app/test: Remove useless code
  examples/ip_fragmentation: replace bit mask based packet type with
    unified packet type
  examples/ip_reassembly: replace bit mask based packet type with
    unified packet type
  examples/l3fwd-acl: replace bit mask based packet type with unified
    packet type
  examples/l3fwd-power: replace bit mask based packet type with unified
    packet type
  examples/l3fwd: replace bit mask based packet type with unified packet
    type
  mbuf: remove old packet type bit masks

 app/test-pipeline/pipeline_hash.c                  |   7 +-
 app/test-pmd/csumonly.c                            |  10 +-
 app/test-pmd/rxonly.c                              | 178 ++++-
 app/test/packet_burst_generator.c                  |  10 -
 examples/ip_fragmentation/main.c                   |   7 +-
 examples/ip_reassembly/main.c                      |   7 +-
 examples/l3fwd-acl/main.c                          |  19 +-
 examples/l3fwd-power/main.c                        |   5 +-
 examples/l3fwd/main.c                              |  71 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |   4 +-
 lib/librte_mbuf/rte_mbuf.c                         |   6 -
 lib/librte_mbuf/rte_mbuf.h                         | 514 +++++++++++++-
 lib/librte_pmd_e1000/igb_rxtx.c                    |  98 ++-
 lib/librte_pmd_enic/enic_main.c                    |  14 +-
 lib/librte_pmd_fm10k/fm10k_rxtx.c                  |  30 +-
 lib/librte_pmd_i40e/i40e_rxtx.c                    | 786 ++++++++++++++-------
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c                  | 139 +++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c              |  49 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c              |   4 +-
 19 files changed, 1498 insertions(+), 460 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
       [not found]                 ` <20150521111400.2a04a196@urahara>
@ 2015-05-22  0:05  4%               ` Neil Horman
       [not found]                   ` <40594e9e6e0543afa11e4dbd90e59b22@BRMWP-EXMB11.corp.brocade.com>
  1 sibling, 0 replies; 200+ results
From: Neil Horman @ 2015-05-22  0:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, liang-min.wang

On Thu, May 21, 2015 at 11:14:00AM -0700, Stephen Hemminger wrote:
> On Thu, 21 May 2015 13:58:46 -0400
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Thu, May 21, 2015 at 10:43:00AM -0700, Stephen Hemminger wrote:
> > > On Thu, 21 May 2015 06:32:02 -0400
> > > Neil Horman <nhorman@tuxdriver.com> wrote:
> > > 
> > > > On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
> > > > > The patch adds interrupt vectors support in rte_intr_handle.
> > > > > 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
> > > > > Those event fds are stored in efds[].
> > > > > 'intr_vec' is reserved for device driver to initialize the vector mapping table.
> > > > > When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
> > > > > 
> > > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > ---
> > > > > v7 changes:
> > > > >  - add eptrs[], it's used to store the register rte_epoll_event instances.
> > > > >  - add vec_en, to log the vector capability status.
> > > > > 
> > > > > v6 changes:
> > > > >  - add mapping table between irq vector number and queue id.
> > > > > 
> > > > > v5 changes:
> > > > >  - Create this new patch file for changed struct rte_intr_handle that
> > > > >    other patches depend on, to avoid breaking git bisect.
> > > > > 
> > > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > index 6a159c7..27174df 100644
> > > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > > @@ -38,6 +38,8 @@
> > > > >  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
> > > > >  #define _RTE_LINUXAPP_INTERRUPTS_H_
> > > > >  
> > > > > +#define RTE_MAX_RXTX_INTR_VEC_ID     32
> > > > > +
> > > > >  enum rte_intr_handle_type {
> > > > >  	RTE_INTR_HANDLE_UNKNOWN = 0,
> > > > >  	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
> > > > > @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
> > > > >  	RTE_INTR_HANDLE_MAX
> > > > >  };
> > > > >  
> > > > > +struct rte_epoll_event;
> > > > > +
> > > > >  /** Handle for interrupts. */
> > > > >  struct rte_intr_handle {
> > > > >  	union {
> > > > > @@ -57,6 +61,12 @@ struct rte_intr_handle {
> > > > >  	};
> > > > >  	int fd;	 /**< interrupt event file descriptor */
> > > > >  	enum rte_intr_handle_type type;  /**< handle type */
> > > > > +	uint32_t max_intr;               /**< max interrupt requested */
> > > > > +	uint32_t nb_efd;                 /**< number of available efds */
> > > > > +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
> > > > > +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
> > > > > +					 /**< intr vector epoll event ptr */
> > > > > +	int *intr_vec;                   /**< intr vector number array */
> > > > >  };
> > > > >    
> > > > 
> > > > This is going to be ABI breaking if this from test_interrupts.c:
> > > > static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];
> > > > 
> > > > is a plausible way of using this structure.  Even putting the data at the end of
> > > > the structure won't help, as the array indicies are off
> > > 
> > > This needs to go in 2.0 and 2.0 has to have new ABI anyway.
> > > 
> > We've already released 2.0, I think you mean 2.1, but 2.1 can't have a new ABI
> > because we didn't announce it in 1.8.  The earliest we can update the ABI
> > (according to the ABI docs) at this point is 2.2, since we need to announce the
> > change in 2.1, then make it in 2.2
> > 
> > Neil
> > 
> 
> Then just skip 2.1 (or make it a trivial doc change only dummy release),
> and call it 2.2.
> 
> I guess we need to proactively say every .x release will have new ABI.
> Sorry, this is a project under development.
> 
Sorry, NAK.  I didn't go through all the trouble of creating an ABI
infrastructure just to throw it out the window on some rubber stamp.  We decided
on the rules, we need to stick to them.  We have large projects that rely on
DPDK now (OVS primarily), and we owe it to them to not just go completely throw
out the ABI every release.  We have a process for doing it, lets follow it.

Neil

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
  2015-05-21 17:58  4%           ` Neil Horman
@ 2015-05-21 18:21  3%             ` Stephen Hemminger
       [not found]                 ` <20150521111400.2a04a196@urahara>
  2015-05-29  8:56  5%             ` Liang, Cunming
  2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-05-21 18:21 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev, liang-min.wang

On Thu, 21 May 2015 13:58:46 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Thu, May 21, 2015 at 10:43:00AM -0700, Stephen Hemminger wrote:
> > On Thu, 21 May 2015 06:32:02 -0400
> > Neil Horman <nhorman@tuxdriver.com> wrote:
> > 
> > > On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
> > > > The patch adds interrupt vectors support in rte_intr_handle.
> > > > 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
> > > > Those event fds are stored in efds[].
> > > > 'intr_vec' is reserved for device driver to initialize the vector mapping table.
> > > > When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
> > > > 
> > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > ---
> > > > v7 changes:
> > > >  - add eptrs[], it's used to store the register rte_epoll_event instances.
> > > >  - add vec_en, to log the vector capability status.
> > > > 
> > > > v6 changes:
> > > >  - add mapping table between irq vector number and queue id.
> > > > 
> > > > v5 changes:
> > > >  - Create this new patch file for changed struct rte_intr_handle that
> > > >    other patches depend on, to avoid breaking git bisect.
> > > > 
> > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > index 6a159c7..27174df 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > > @@ -38,6 +38,8 @@
> > > >  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
> > > >  #define _RTE_LINUXAPP_INTERRUPTS_H_
> > > >  
> > > > +#define RTE_MAX_RXTX_INTR_VEC_ID     32
> > > > +
> > > >  enum rte_intr_handle_type {
> > > >  	RTE_INTR_HANDLE_UNKNOWN = 0,
> > > >  	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
> > > > @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
> > > >  	RTE_INTR_HANDLE_MAX
> > > >  };
> > > >  
> > > > +struct rte_epoll_event;
> > > > +
> > > >  /** Handle for interrupts. */
> > > >  struct rte_intr_handle {
> > > >  	union {
> > > > @@ -57,6 +61,12 @@ struct rte_intr_handle {
> > > >  	};
> > > >  	int fd;	 /**< interrupt event file descriptor */
> > > >  	enum rte_intr_handle_type type;  /**< handle type */
> > > > +	uint32_t max_intr;               /**< max interrupt requested */
> > > > +	uint32_t nb_efd;                 /**< number of available efds */
> > > > +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
> > > > +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
> > > > +					 /**< intr vector epoll event ptr */
> > > > +	int *intr_vec;                   /**< intr vector number array */
> > > >  };
> > > >    
> > > 
> > > This is going to be ABI breaking if this from test_interrupts.c:
> > > static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];
> > > 
> > > is a plausible way of using this structure.  Even putting the data at the end of
> > > the structure won't help, as the array indicies are off
> > 
> > This needs to go in 2.0 and 2.0 has to have new ABI anyway.
> > 
> We've already released 2.0, I think you mean 2.1, but 2.1 can't have a new ABI
> because we didn't announce it in 1.8.  The earliest we can update the ABI
> (according to the ABI docs) at this point is 2.2, since we need to announce the
> change in 2.1, then make it in 2.2
> 
> Neil
> 

Then just skip 2.1 (or make it a trivial doc change only dummy release),
and call it 2.2.

I guess we need to proactively say every .x release will have new ABI.
Sorry, this is a project under development.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
       [not found]             ` <20150521104300.00757b4e@urahara>
@ 2015-05-21 17:58  4%           ` Neil Horman
  2015-05-21 18:21  3%             ` Stephen Hemminger
                               ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Neil Horman @ 2015-05-21 17:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, liang-min.wang

On Thu, May 21, 2015 at 10:43:00AM -0700, Stephen Hemminger wrote:
> On Thu, 21 May 2015 06:32:02 -0400
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
> > > The patch adds interrupt vectors support in rte_intr_handle.
> > > 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
> > > Those event fds are stored in efds[].
> > > 'intr_vec' is reserved for device driver to initialize the vector mapping table.
> > > When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
> > > 
> > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > ---
> > > v7 changes:
> > >  - add eptrs[], it's used to store the register rte_epoll_event instances.
> > >  - add vec_en, to log the vector capability status.
> > > 
> > > v6 changes:
> > >  - add mapping table between irq vector number and queue id.
> > > 
> > > v5 changes:
> > >  - Create this new patch file for changed struct rte_intr_handle that
> > >    other patches depend on, to avoid breaking git bisect.
> > > 
> > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > index 6a159c7..27174df 100644
> > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> > > @@ -38,6 +38,8 @@
> > >  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
> > >  #define _RTE_LINUXAPP_INTERRUPTS_H_
> > >  
> > > +#define RTE_MAX_RXTX_INTR_VEC_ID     32
> > > +
> > >  enum rte_intr_handle_type {
> > >  	RTE_INTR_HANDLE_UNKNOWN = 0,
> > >  	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
> > > @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
> > >  	RTE_INTR_HANDLE_MAX
> > >  };
> > >  
> > > +struct rte_epoll_event;
> > > +
> > >  /** Handle for interrupts. */
> > >  struct rte_intr_handle {
> > >  	union {
> > > @@ -57,6 +61,12 @@ struct rte_intr_handle {
> > >  	};
> > >  	int fd;	 /**< interrupt event file descriptor */
> > >  	enum rte_intr_handle_type type;  /**< handle type */
> > > +	uint32_t max_intr;               /**< max interrupt requested */
> > > +	uint32_t nb_efd;                 /**< number of available efds */
> > > +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
> > > +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
> > > +					 /**< intr vector epoll event ptr */
> > > +	int *intr_vec;                   /**< intr vector number array */
> > >  };
> > >    
> > 
> > This is going to be ABI breaking if this from test_interrupts.c:
> > static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];
> > 
> > is a plausible way of using this structure.  Even putting the data at the end of
> > the structure won't help, as the array indicies are off
> 
> This needs to go in 2.0 and 2.0 has to have new ABI anyway.
> 
We've already released 2.0, I think you mean 2.1, but 2.1 can't have a new ABI
because we didn't announce it in 1.8.  The earliest we can update the ABI
(according to the ABI docs) at this point is 2.2, since we need to announce the
change in 2.1, then make it in 2.2

Neil

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
  @ 2015-05-21 12:12  3%             ` Richardson, Bruce
  0 siblings, 0 replies; 200+ results
From: Richardson, Bruce @ 2015-05-21 12:12 UTC (permalink / raw)
  To: Neil Horman, Marc Sune; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, May 20, 2015 7:47 PM
> To: Marc Sune
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type
> 
> On Wed, May 20, 2015 at 07:01:02PM +0200, Marc Sune wrote:
> >
> >
> > On 20/05/15 12:28, Neil Horman wrote:
> > >On Wed, May 20, 2015 at 12:05:00PM +0200, Marc Sune wrote:
> > >>
> > >>On 20/05/15 10:31, Thomas Monjalon wrote:
> > >>>2015-05-19 12:31, Bruce Richardson:
> > >>>>On Mon, May 11, 2015 at 05:29:39PM +0100, Bruce Richardson wrote:
> > >>>>>Hi all,
> > >>>>>
> > >>>>>after a small amount of offline discussion with Marc Sune, here
> > >>>>>is an alternative proposal for a higher-level interface - aka
> > >>>>>pktdev - to allow a common Rx/Tx API across device types handling
> > >>>>>mbufs [for now, ethdev, ring and KNI]. The key code is in the
> > >>>>>first patch fo the set - the second is an example of a trivial
> usecase.
> > >>>>>
> > >>>>>What is different about this to previously:
> > >>>>>* wrapper class, so no changes to any existing ring, ethdev
> > >>>>>implementations
> > >>>>>* use of function pointers for RX/TX with an API that maps to
> ethdev
> > >>>>>   - this means there is little/no additional overhead for ethdev
> calls
> > >>>>>   - inline special case for rings, to accelerate that. Since we
> are at a
> > >>>>>     higher level, we can special case process some things if
> appropriate. This
> > >>>>>     means the impact to ring ops is one (predictable) branch per
> > >>>>>burst
> > >>>>>* elimination of the queue abstraction. For the ring and KNI, there
> is no
> > >>>>>   concept of queues, so we just wrap the functions directly (no
> need even for
> > >>>>>   wrapper functions, the api's match so we can call directly).
> This also
> > >>>>>   means:
> > >>>>>   - adding in features per-queue, is far easier as we don't need
> to worry about
> > >>>>>     having arrays of multiple queues. For example:
> > >>>>>   - adding in buffering on TX (or RX) is easier since again we
> only have a
> > >>>>>     single queue.
> > >>>>>* thread safety is made easier using a wrapper. For a MP ring, we
> can create
> > >>>>>   multiple pktdevs around it, and each thread will then be able to
> use their
> > >>>>>   own copy, with their own buffering etc.
> > >>>>>
> > >>>>>However, at this point, I'm just looking for general feedback on
> > >>>>>this as an approach. I think it's quite flexible - even more so
> > >>>>>than the earlier proposal we had. It's less proscriptive and
> doesn't make any demands on any other libs.
> > >>>>>
> > >>>>>Comments/thoughts welcome.
> > >>>>Any comments on this RFC before I see about investing further time
> > >>>>in it to clean it up a bit and submit as a non-RFC patchset for
> merge in 2.1?
> > >>>I would say there are 2 possible approaches for KNI and ring
> handling:
> > >>>1/ You Bruce, Marc and Keith are advocating for a layer on top of
> > >>>ethdev, ring, KNI and possibly other devices, which uses mbuf. The
> > >>>set of functions is simpler than ethdev but the data structure is
> > >>>mbuf which is related to ethdev layer.
> > >>>2/ Konstantin and Neil talked about keeping mbuf for ethdev layer
> > >>>and related libs only. Ring and KNI could have an ethdev API with a
> > >>>reduced set of implemented functions. Crypto devices could adopt a
> > >>>specific crypto API and an ethdev API at the same time.
> > >>I don't fully understand which APIs you meant by non-ethdev. This
> > >>pktdev wrapper proposal abstracts RX and TX functions only, and all
> > >>of these are using mbufs as the packet buffer abstraction right now
> anyway (ethdev).
> > >>
> > >He's referring to future device classes (like crypto devices), which
> > >ostensibly would make use of the pktdev API.  My argument (and I
> > >think Thomas') is that if a bit of hardware can be made to operate as
> > >a packet sending/receiving device, then its just as reasonable to use
> > >the existing ethdev api rather than some other restricted version of
> > >it (pktdev)
> > >
> > >>This approach does not preclude that different libraries expose
> > >>other API calls. In fact they will have to; setup the port/device
> > >>... It is just a higher level API, so that you don't have to check
> > >>the type of port in your DPDK application I/O loop, minimizing user's
> code.
> > >>
> > >No argument there.  But if thats the case (and I agree that it is),
> > >an application will implicitly have to know what what type of device
> > >it is, because it (the application) will need to understand the
> specific API it is writing to.
> > >
> > >>Or were you in 2) thinking about creating a different "packet buffer"
> > >>abstraction, independent from the ethdev, and then map the different
> > >>port specifics (e.g. mbuf) to this new abstraction?
> > >>
> > >My argument was to just leave the ethdev api alone.  If a device
> > >class can be made to look like a packet forwarding device, then use
> > >the existing ethdev api to implement it.
> > >
> > >>>I feel it's cleaner, more generic and more maintainable to have
> > >>>drivers implementing one or several stable APIs instead of having
> > >>>some restricted wrappers to update.
> > >>This would be a separate library _on top_ of the existing APIs, and
> > >>it has the advantage to simplify the DPDK user's application code
> > >>when an application needs to deal with several types of port, as
> > >>shown in the example that Bruce provided in PATCH #2.
> > >>
> > >But thats already the purpose of the ethdev api.  Different types of
> > >hardware/software can be made to look like the same thing (an ethdev)
> > >from an application standpoint.  Adding this pktdev layer does
> > >nothing but that, add a layer.  If you want restricted functionality
> > >of an interface, thats ok, ethdev offers that ability.  unimplemented
> > >methods in a pmd cause the ethdev api to return EOPNOTSUP to the
> > >calling application, so the application knows when a given ethdev can't
> do some aspect of what an ethdev is.
> >
> > Hi Neil,
> >
> > Thanks for the clarifications. Now I understand the concern Thomas
> > expressed. Using ethdev API (port-ids) was actually my first
> > suggestion
> > here:
> >
> > http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
> >
> > And to be honest, what I was expecting when I was reading for the
> > first time DPDK's APIs. It is indeed an option. However, if we take a
> look at the API:
> >
> > http://www.dpdk.org/doc/api/rte__ethdev_8h.html
> >
> > none of the API calls, except the burst RX/TX and, perhaps, the
> > callbacks, would be used by devices other than NICs. It seems going a
> > bit too far using it, but ofc possible.
> >
> So, I'll make 3 counter-arguments here:
> 
> 1) To your point about the ethdev api being much larger than what a non-
> ethernet device could use, I'll tacitly agree, but indicate that its not
> relevant.  If you want a bit of hardware that isn't a network interface to
> behave like a network interface, then there are going to be alot of
> aspects of a network interface that it just can't do.  Thats true
> regardless of how you implement that.  In the pktdev model, you prevent
> those operations from being an option at all, while in the current ethdev
> model, you simply get a return code of EOPNOTSUP, and the application does
> the right thing (which is to say, it understands that this hardware
> doesn't need that aspect of network card mangement and goes on with its
> day).  I assert that, because we already have the ethdev api, its a lower
> time investment to simply reuse it
> 
> 2) To the implication that we aren't working with NICs here, you're
> correct.  As you note in your previous message, the pktdev interface is in
> no way the end all and be all of device model design.  You will need to
> add other api calls to manage the device.  If thats the case, then don't
> shoehorn any one particular aspect of the API to fit a device model that
> the device doesn't conform to.
> Design the API so that it best reflects the hardware behavior.
> 
> 
> 3) An addendum to the point about hardware not being a NIC (and you didn't
> make this point directly above, but I think you may have mentioned it
> previously), sometimes you want a device to behave like another device for
> the purposes of using generic code to talk to several device types.  While
> this is true, this is a case for device translation and use, not for
> carving out parts of an api to make something more generic.  The use case
> I cited previously was an ipsec tunnel.  An ipsec tunnel uses
> cryptography, and crypto device apis to encrypt decrypt packet data.  The
> common way to implement this is to design a crypto api that accepts a
> block of data in a way most condusive to the hardware, and then implement
> a network driver (that uses whatever ethernet api, in this case the ethdev
> api), to integrate with the network datapath.  With this model, the ipsec
> tunnel uses the full range of the ethdev api (or a good deal more of it),
> and the crypto api is optimized to work with crypo acceleration hardware.
> 
> > In essence, rte_ether(rte_ethdev.h) right now has: i) NIC setup;
> > general configuration, queue config, fdir, offloads, hw stuff like
> > leds... ii) RX/TX routines and callbacks iii) Stats and queue stats
> > iv) other utils for ethernet stuff (rte_ether.h)
> >
> The key that I'm taking away here is 'right now'.  Its already written, so
> theres no work involved in implementing it for new devices.
> 
> > i) is clearly HW specific, and does only apply to NICs/ASICs (e.g.
> > FM10k)
> Ok, so it only applies to NIC's, thats fine.  If you want to write a
> driver that leaves those methods for the pmd set to NULL, the ethdev
> library will correctly return EOPNOTSUPP to the calling applications.
> 
> > while ii) and iii) are things that could be abstracted beyond NICs,
> > like KNI, rte_ring, crypto... (iv could be moved into some
> > utils/protocol parsing libraries).
> >
> Right again, so let those device types implement the appropriate portions
> of the pmd driver structure that match to what they support.  EVerything
> else is handled by the ethdev library automatically.
> 
> > Perhaps these two groups could be split into two different libraries
> > and then ii) and iii) together would be something like ~ rte_pktdev
> > (stats are missing on the proposed patch), while i) would be
> > rte_ether, or rte_nic if we think it is a better name.
> >
> The point I'm trying to get to is, why split at all?  Theres just no need
> that I can see. The example I would set here is the dummy driver in linux.
> Its a net device that only serves to act as a sink for network packets.
> It still uses the network driver interface, but of the 65-ish methods that
> the netdevice model in linux offers, it implements 8 (or approximately
> 12%).  The other unused method are just that, unused, and thats ok.
> Applications that try to do things like set flow director options, or
> speed/duplex options gets a return code that effectively says "This device
> can't do that", and thats ok.  Thats what we need to be doing here.
> Instead of finding a way to codify the subset of functionality that other
> devices might be able to implement, for those cases where we want other
> hardware to act like a netdevice, lets just let those devices pick and
> choose what to implement, and the interface we already have will
> communicate with applications appropriately.
> 
> Regards
> Neil
> 

Hi Neil,

First off, a note on the naming and the basic concept: this proposal is not trying to make everything look like NIC, rather we are trying to make a bunch of different components appear as generic sources/sinks for pkts or mbufs. From my point of view, it's an important difference.

Be that as it may, I'd like to first deal with the whole idea of the application needing to know about the type of the underlying device. For me, this is a critical point. Applications - such as all our sample apps - have essentially two parts:
* an initialization and control part
* a data-path part.
These two parts are very, very different in what they do. The initialization part - which e.g. in testpmd continues on in the form of the cmdline interface as a control part - does the initial setup of devices/rings/etc. and potentially makes use of the full APIs provided by the ethdev interface. It's also not performance critical, as evidenced by the fact that the APIs used there have additional checks for valid input etc. The second, data-path part, is entirely the part that this proposal is targeting. This data-path is completely separate in the application, is highly performance sensitive, and rarely, if ever needs to know or care about the actual source of its data. So the idea behind this library is that you can write your initialization control parts of your app as-now, fully aware of the underlying types involved, and without ever using rx/tx burst. Then when you have the various devices and DPDK objects set up, you spawn your data-path threads and pass each one the set of input and outputs it needs, in the form of your generic packet source objects.

This distinction is also why I'm not particularly interested in the ability to pass in different objects via cmdline, as is done now with pcap/ring PMDs. That's ok when you want the initialization part of your app to be oblivious to see everything under a common abstraction, but when it's only the data path you want to work with generic packet objects that's unnecessary, and the initialization path should be able to convert any of the required input/output sources to a generic type using a single API call. [This doesn't rule out specifying different inputs/outputs on the commandline, it's just you can specify them as their native types, rather than hiding them under a common API at the control-path level].

As for what that abstraction should be. There are a number of issues I see with ethdev - as it is right now, as that common abstraction.

1. The use of port-ids. I think port ids are fine for numbering physical ports, but I think pointers are better for passing around objects to be worked on by the data path. What is more concerning [than my opinion on numbers vs pointers :-)] is the fact that we are limited to 256 port ids. Yes, that can be changed, but the impacts are massive. To change the type, we would break the ABI for every single ethdev API, as well as likely other functions too. Furthermore, increasing the size of the port id would require a change to the internals of the mbuf structure, which would lead to the ABI being broken for any function that uses mbufs. By adopting an API, such as proposed, which uses pointers, we avoid the problem, as port ids would only apply to ethdevs.

2. Simplicity. While you say that its fine for an ethdev not to implement all the functions in the ethdev API, to create a proper PMD like you are proposing involves a good deal more work than using the proposed pktdev abstraction. If it's to appear like a proper NIC to the control paths, as well as the init paths - which seems to be what you imply - you really do need to implement additional functions like queue setup, and start and stop. While it's true that the library can return -ENOSUP on an unsupported function, I don't believe any of our sample apps are set up to check for this on NIC setup, and therefore I would hazard a guess that real-world customer apps aren't set up to handle it either.

3. Performance for rings. While not applicable for all cases, the performance of the rings under an ethdev abstraction would not be the same as here. For example, when polling on an empty ring for packets, the current time taken by our ring rx/tx functions, is literally a few cycles (as tested by the rings autotest). If these functions cannot be inlined, that cycle count goes up to 3x what it is now. [I observed this previously when doing reworking of the rings code, and the code-size led to icc no longer doing inlining. In that case, the gcc code for empty polling was indeed 3 times faster than the icc version. Adding forced inlining made things equal again]. This metric of empty polling may seem trivial i.e. "if there are no packets, why does it matter how long it takes?", but is important in real-world cases where you are pulling packets from multiple sources, and your application is only currently dealing with input on one of them. [Often tested to see how an application handles in a single-flow situation - an metric our customers do look at]. Even in the non-empty situation, for smaller packet bursts, the overhead of the function call may slow things down. [For larger bursts, e.g. 32, the effect should not be noticeable, I suspect].

The only other final point I'd make here is that what is proposed is not proscriptive - whatever a future API for handling other device types, such as crypto devices, may look like can be decided separately from this pktdev implementation. Whether one chooses pktdev or ethdev as a common abstraction layer type, the decision of whether or not a particular object type is allowed to be made look like that common type can be made entirely independently, and based upon whether or not such a type-conversion makes sense.

Regards,
/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle
  @ 2015-05-21 10:32  3%       ` Neil Horman
       [not found]             ` <20150521104300.00757b4e@urahara>
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-05-21 10:32 UTC (permalink / raw)
  To: Cunming Liang; +Cc: dev, liang-min.wang, shemming

On Thu, May 21, 2015 at 04:55:53PM +0800, Cunming Liang wrote:
> The patch adds interrupt vectors support in rte_intr_handle.
> 'vec_en' is set when interrupt vectors are detected and associated event fds are set.
> Those event fds are stored in efds[].
> 'intr_vec' is reserved for device driver to initialize the vector mapping table.
> When the event fds add to a specified epoll instance, 'elist' will hold the rte_epoll_event object pointer.
> 
> Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> ---
> v7 changes:
>  - add eptrs[], it's used to store the register rte_epoll_event instances.
>  - add vec_en, to log the vector capability status.
> 
> v6 changes:
>  - add mapping table between irq vector number and queue id.
> 
> v5 changes:
>  - Create this new patch file for changed struct rte_intr_handle that
>    other patches depend on, to avoid breaking git bisect.
> 
>  lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> index 6a159c7..27174df 100644
> --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
> @@ -38,6 +38,8 @@
>  #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
>  #define _RTE_LINUXAPP_INTERRUPTS_H_
>  
> +#define RTE_MAX_RXTX_INTR_VEC_ID     32
> +
>  enum rte_intr_handle_type {
>  	RTE_INTR_HANDLE_UNKNOWN = 0,
>  	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
> @@ -48,6 +50,8 @@ enum rte_intr_handle_type {
>  	RTE_INTR_HANDLE_MAX
>  };
>  
> +struct rte_epoll_event;
> +
>  /** Handle for interrupts. */
>  struct rte_intr_handle {
>  	union {
> @@ -57,6 +61,12 @@ struct rte_intr_handle {
>  	};
>  	int fd;	 /**< interrupt event file descriptor */
>  	enum rte_intr_handle_type type;  /**< handle type */
> +	uint32_t max_intr;               /**< max interrupt requested */
> +	uint32_t nb_efd;                 /**< number of available efds */
> +	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
> +	struct rte_epoll_event *elist[RTE_MAX_RXTX_INTR_VEC_ID];
> +					 /**< intr vector epoll event ptr */
> +	int *intr_vec;                   /**< intr vector number array */
>  };
>  

This is going to be ABI breaking if this from test_interrupts.c:
static struct rte_intr_handle intr_handles[TEST_INTERRUPT_HANDLE_MAX];

is a plausible way of using this structure.  Even putting the data at the end of
the structure won't help, as the array indicies are off
Neil

>  #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
> -- 
> 1.8.1.4
> 
> 

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 08/11] ethdev: add rx intr enable, disable and ctl functions
  2015-05-21  8:55  2%   ` [dpdk-dev] [PATCH v8 00/11] Interrupt mode PMD Cunming Liang
  @ 2015-05-21  8:56  2%     ` Cunming Liang
  2015-05-29  8:45  4%     ` [dpdk-dev] [PATCH v9 00/12] Interrupt mode PMD Cunming Liang
  2 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-05-21  8:56 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addtion, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per port or per queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v8 changes
 - add addtion check for EEXIT

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 127 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 +++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 235 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 024fe8b..1a47d9a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,133 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec,
+				     data, rte_eth_dev_socket_id(port_id));
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %d q %d rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec,
+			     data, rte_eth_dev_socket_id(port_id));
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %d q %d rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4648290..e5efec0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -829,6 +829,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1034,6 +1036,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1385,6 +1395,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2867,6 +2881,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v8 00/11] Interrupt mode PMD
  2015-05-05  5:39  3% ` [dpdk-dev] From: Cunming Liang <cunming.liang@intel.com> Cunming Liang
  2015-05-05  5:39  2%   ` [dpdk-dev] [PATCH v7 07/10] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-05-21  8:55  2%   ` Cunming Liang
                         ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: Cunming Liang @ 2015-05-21  8:55 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess.
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (11):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/bsd: dummy for new intr definition
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch

 examples/l3fwd-power/main.c                        | 207 +++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  20 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  77 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 351 +++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 160 +++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 127 ++++++
 lib/librte_ether/rte_ethdev.h                      | 104 +++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 lib/librte_pmd_e1000/igb_ethdev.c                  | 292 +++++++++++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                | 482 ++++++++++++++++++++-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h                |   4 +
 13 files changed, 1715 insertions(+), 126 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] Technical Steering Committee (TSC)
  @ 2015-05-19 20:21  3%         ` O'Driscoll, Tim
  0 siblings, 0 replies; 200+ results
From: O'Driscoll, Tim @ 2015-05-19 20:21 UTC (permalink / raw)
  To: Neil Horman, Thomas Monjalon; +Cc: dev


> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> 
> On Tue, May 19, 2015 at 05:45:05PM +0200, Thomas Monjalon wrote:
> > 2015-05-19 11:34, Neil Horman:
> > > On Tue, May 19, 2015 at 07:43:14AM -0700, Stephen Hemminger wrote:
> > > > > Composition of the TSC should reflect contributions to the
> project, but be
> > > > > balanced so that no single party has an undue influence. It
> should also be
> > > > > kept to a manageable size(maybe 7?).
> > > > >
> > > > > The TSC should elect its own chair, who would have the deciding
> vote in
> > > > > the event that the TSC was deadlocked. Once in place, the TSC
> should
> > > > > approve any new members.
> > > > >
> > > > > Specific details on membership can be discussed and agreed
> later, if we
> > > > > agree on the creation of a TSC.
> > > >
> > > > TSC should be limited to those individuals and companies that have
> > > > contributed in a non-trivial way to the DPDK distributed code
> base.
> > > > It should not be a users group, or place for network vendors who
> take but
> > > > never give back.
> > > >
> > > +1
> > >
> > > It should also endavour to only act as a fallback body for any
> issues commonly
> > > handled by the development communtiy (patch acceptance/review, etc)
> >
> > I agree that it should be a fallback.
> > And I'm wondering how useful it would be: have we ever known such
> discussion or
> > conflict without finding a solution or a consensus?
> Well, I suppose the jury is still out on that, since there are ongoing
> problems,
> in the form of patch latency, and such.  But for the most part, no,
> problems
> tend to reach consensus resolution IMO

It's true that there aren't many obvious examples, although as Neil points out there are still some things that are ongoing. One issue that springs to mind where we didn't reach consensus was inclusion of ABI Versioning in 1.8. It was subsequently included in 2.0, but there were people who believed it should have been in 1.8.

The other issue with having to reach consensus on everything is that it tends to a lowest common denominator approach, and can slow things down. There are times where a clear decision and then everybody moving forward is preferable.

> > By the way, is there a TSC in Linux netdev?
> >
> No, but there are TSC for many projects, including Openshift,
> freedesktop.org,
> etc.
> 
> Neil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 14/19] virtio: move virtio PMD to drivers/net
  @ 2015-05-15 15:56  1%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-05-15 15:56 UTC (permalink / raw)
  To: dev

Move virtio PMD to drivers/net directory

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/Makefile                             |    2 +-
 drivers/net/virtio/Makefile                      |   60 +
 drivers/net/virtio/rte_pmd_virtio_version.map    |    4 +
 drivers/net/virtio/virtio_ethdev.c               | 1504 ++++++++++++++++++++++
 drivers/net/virtio/virtio_ethdev.h               |  124 ++
 drivers/net/virtio/virtio_logs.h                 |   70 +
 drivers/net/virtio/virtio_pci.c                  |  147 +++
 drivers/net/virtio/virtio_pci.h                  |  270 ++++
 drivers/net/virtio/virtio_ring.h                 |  163 +++
 drivers/net/virtio/virtio_rxtx.c                 |  815 ++++++++++++
 drivers/net/virtio/virtqueue.c                   |   70 +
 drivers/net/virtio/virtqueue.h                   |  325 +++++
 lib/Makefile                                     |    1 -
 lib/librte_pmd_virtio/Makefile                   |   60 -
 lib/librte_pmd_virtio/rte_pmd_virtio_version.map |    4 -
 lib/librte_pmd_virtio/virtio_ethdev.c            | 1504 ----------------------
 lib/librte_pmd_virtio/virtio_ethdev.h            |  124 --
 lib/librte_pmd_virtio/virtio_logs.h              |   70 -
 lib/librte_pmd_virtio/virtio_pci.c               |  147 ---
 lib/librte_pmd_virtio/virtio_pci.h               |  270 ----
 lib/librte_pmd_virtio/virtio_ring.h              |  163 ---
 lib/librte_pmd_virtio/virtio_rxtx.c              |  815 ------------
 lib/librte_pmd_virtio/virtqueue.c                |   70 -
 lib/librte_pmd_virtio/virtqueue.h                |  325 -----
 24 files changed, 3553 insertions(+), 3554 deletions(-)
 create mode 100644 drivers/net/virtio/Makefile
 create mode 100644 drivers/net/virtio/rte_pmd_virtio_version.map
 create mode 100644 drivers/net/virtio/virtio_ethdev.c
 create mode 100644 drivers/net/virtio/virtio_ethdev.h
 create mode 100644 drivers/net/virtio/virtio_logs.h
 create mode 100644 drivers/net/virtio/virtio_pci.c
 create mode 100644 drivers/net/virtio/virtio_pci.h
 create mode 100644 drivers/net/virtio/virtio_ring.h
 create mode 100644 drivers/net/virtio/virtio_rxtx.c
 create mode 100644 drivers/net/virtio/virtqueue.c
 create mode 100644 drivers/net/virtio/virtqueue.h
 delete mode 100644 lib/librte_pmd_virtio/Makefile
 delete mode 100644 lib/librte_pmd_virtio/rte_pmd_virtio_version.map
 delete mode 100644 lib/librte_pmd_virtio/virtio_ethdev.c
 delete mode 100644 lib/librte_pmd_virtio/virtio_ethdev.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_logs.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_pci.c
 delete mode 100644 lib/librte_pmd_virtio/virtio_pci.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_ring.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_rxtx.c
 delete mode 100644 lib/librte_pmd_virtio/virtqueue.c
 delete mode 100644 lib/librte_pmd_virtio/virtqueue.h

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 267d386..48dd328 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -42,7 +42,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += ring
-#DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
+DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 #DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 #DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
new file mode 100644
index 0000000..21ff7e5
--- /dev/null
+++ b/drivers/net/virtio/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_virtio.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_virtio_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
+
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_net lib/librte_malloc
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/virtio/rte_pmd_virtio_version.map b/drivers/net/virtio/rte_pmd_virtio_version.map
new file mode 100644
index 0000000..ef35398
--- /dev/null
+++ b/drivers/net/virtio/rte_pmd_virtio_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+
+	local: *;
+};
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
new file mode 100644
index 0000000..e63dbfb
--- /dev/null
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -0,0 +1,1504 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#ifdef RTE_EXEC_ENV_LINUXAPP
+#include <dirent.h>
+#include <fcntl.h>
+#endif
+
+#include <rte_ethdev.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_pci.h>
+#include <rte_ether.h>
+#include <rte_common.h>
+
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_dev.h>
+
+#include "virtio_ethdev.h"
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+#include "virtqueue.h"
+
+
+static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
+static int  virtio_dev_configure(struct rte_eth_dev *dev);
+static int  virtio_dev_start(struct rte_eth_dev *dev);
+static void virtio_dev_stop(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
+static void virtio_dev_info_get(struct rte_eth_dev *dev,
+				struct rte_eth_dev_info *dev_info);
+static int virtio_dev_link_update(struct rte_eth_dev *dev,
+	__rte_unused int wait_to_complete);
+
+static void virtio_set_hwaddr(struct virtio_hw *hw);
+static void virtio_get_hwaddr(struct virtio_hw *hw);
+
+static void virtio_dev_rx_queue_release(__rte_unused void *rxq);
+static void virtio_dev_tx_queue_release(__rte_unused void *txq);
+
+static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
+static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
+static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
+static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
+				uint16_t vlan_id, int on);
+static void virtio_mac_addr_add(struct rte_eth_dev *dev,
+				struct ether_addr *mac_addr,
+				uint32_t index, uint32_t vmdq __rte_unused);
+static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+static void virtio_mac_addr_set(struct rte_eth_dev *dev,
+				struct ether_addr *mac_addr);
+
+static int virtio_dev_queue_stats_mapping_set(
+	__rte_unused struct rte_eth_dev *eth_dev,
+	__rte_unused uint16_t queue_id,
+	__rte_unused uint8_t stat_idx,
+	__rte_unused uint8_t is_rx);
+
+/*
+ * The set of PCI devices this driver supports
+ */
+static const struct rte_pci_id pci_id_virtio_map[] = {
+
+#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
+#include "rte_pci_dev_ids.h"
+
+{ .vendor_id = 0, /* sentinel */ },
+};
+
+static int
+virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
+		int *dlen, int pkt_num)
+{
+	uint16_t head = vq->vq_desc_head_idx, i;
+	int k, sum = 0;
+	virtio_net_ctrl_ack status = ~0;
+	struct virtio_pmd_ctrl result;
+
+	ctrl->status = status;
+
+	if (!vq->hw->cvq) {
+		PMD_INIT_LOG(ERR,
+			     "%s(): Control queue is not supported.",
+			     __func__);
+		return -1;
+	}
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
+		"vq->hw->cvq = %p vq = %p",
+		vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+	if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1))
+		return -1;
+
+	memcpy(vq->virtio_net_hdr_mz->addr, ctrl,
+		sizeof(struct virtio_pmd_ctrl));
+
+	/*
+	 * Format is enforced in qemu code:
+	 * One TX packet for header;
+	 * At least one TX packet per argument;
+	 * One RX packet for ACK.
+	 */
+	vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
+	vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+	vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_free_cnt--;
+	i = vq->vq_ring.desc[head].next;
+
+	for (k = 0; k < pkt_num; k++) {
+		vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
+		vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+			+ sizeof(struct virtio_net_ctrl_hdr)
+			+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
+		vq->vq_ring.desc[i].len = dlen[k];
+		sum += dlen[k];
+		vq->vq_free_cnt--;
+		i = vq->vq_ring.desc[i].next;
+	}
+
+	vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+	vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+			+ sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_ring.desc[i].len = sizeof(ctrl->status);
+	vq->vq_free_cnt--;
+
+	vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
+
+	vq_update_avail_ring(vq, head);
+	vq_update_avail_idx(vq);
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index);
+
+	virtqueue_notify(vq);
+
+	rte_rmb();
+	while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+		rte_rmb();
+		usleep(100);
+	}
+
+	while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
+		uint32_t idx, desc_idx, used_idx;
+		struct vring_used_elem *uep;
+
+		used_idx = (uint32_t)(vq->vq_used_cons_idx
+				& (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+		idx = (uint32_t) uep->id;
+		desc_idx = idx;
+
+		while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
+			desc_idx = vq->vq_ring.desc[desc_idx].next;
+			vq->vq_free_cnt++;
+		}
+
+		vq->vq_ring.desc[desc_idx].next = vq->vq_desc_head_idx;
+		vq->vq_desc_head_idx = idx;
+
+		vq->vq_used_cons_idx++;
+		vq->vq_free_cnt++;
+	}
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d",
+			vq->vq_free_cnt, vq->vq_desc_head_idx);
+
+	memcpy(&result, vq->virtio_net_hdr_mz->addr,
+			sizeof(struct virtio_pmd_ctrl));
+
+	return result.status;
+}
+
+static int
+virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
+	memcpy(ctrl.data, &nb_queues, sizeof(uint16_t));
+
+	dlen[0] = sizeof(uint16_t);
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret) {
+		PMD_INIT_LOG(ERR, "Multiqueue configured but send command "
+			  "failed, this is too late now...");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int virtio_dev_queue_setup(struct rte_eth_dev *dev,
+			int queue_type,
+			uint16_t queue_idx,
+			uint16_t  vtpci_queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			struct virtqueue **pvq)
+{
+	char vq_name[VIRTQUEUE_MAX_NAME_SZ];
+	const struct rte_memzone *mz;
+	uint16_t vq_size;
+	int size;
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtqueue  *vq = NULL;
+
+	/* Write the virtqueue index to the Queue Select Field */
+	VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
+	PMD_INIT_LOG(DEBUG, "selecting queue: %d", vtpci_queue_idx);
+
+	/*
+	 * Read the virtqueue size from the Queue Size field
+	 * Always power of 2 and if 0 virtqueue does not exist
+	 */
+	vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
+	PMD_INIT_LOG(DEBUG, "vq_size: %d nb_desc:%d", vq_size, nb_desc);
+	if (nb_desc == 0)
+		nb_desc = vq_size;
+	if (vq_size == 0) {
+		PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
+		return -EINVAL;
+	} else if (!rte_is_power_of_2(vq_size)) {
+		PMD_INIT_LOG(ERR, "%s: virtqueue size is not powerof 2", __func__);
+		return -EINVAL;
+	} else if (nb_desc != vq_size) {
+		PMD_INIT_LOG(ERR, "Warning: nb_desc(%d) is not equal to vq size (%d), fall to vq size",
+			nb_desc, vq_size);
+		nb_desc = vq_size;
+	}
+
+	if (queue_type == VTNET_RQ) {
+		snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
+			dev->data->port_id, queue_idx);
+		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+	} else if (queue_type == VTNET_TQ) {
+		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
+			dev->data->port_id, queue_idx);
+		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+	} else if (queue_type == VTNET_CQ) {
+		snprintf(vq_name, sizeof(vq_name), "port%d_cvq",
+			dev->data->port_id);
+		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+			vq_size * sizeof(struct vq_desc_extra),
+			RTE_CACHE_LINE_SIZE);
+	}
+	if (vq == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
+		return (-ENOMEM);
+	}
+
+	vq->hw = hw;
+	vq->port_id = dev->data->port_id;
+	vq->queue_id = queue_idx;
+	vq->vq_queue_index = vtpci_queue_idx;
+	vq->vq_nentries = vq_size;
+	vq->vq_free_cnt = vq_size;
+
+	/*
+	 * Reserve a memzone for vring elements
+	 */
+	size = vring_size(vq_size, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
+	PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d", size, vq->vq_ring_size);
+
+	mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
+		socket_id, 0, VIRTIO_PCI_VRING_ALIGN);
+	if (mz == NULL) {
+		rte_free(vq);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((mz->phys_addr + vq->vq_ring_size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+		rte_free(vq);
+		return -ENOMEM;
+	}
+
+	memset(mz->addr, 0, sizeof(mz->len));
+	vq->mz = mz;
+	vq->vq_ring_mem = mz->phys_addr;
+	vq->vq_ring_virt_mem = mz->addr;
+	PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:      0x%"PRIx64, (uint64_t)mz->phys_addr);
+	PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, (uint64_t)mz->addr);
+	vq->virtio_net_hdr_mz  = NULL;
+	vq->virtio_net_hdr_mem = 0;
+
+	if (queue_type == VTNET_TQ) {
+		/*
+		 * For each xmit packet, allocate a virtio_net_hdr
+		 */
+		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
+			dev->data->port_id, queue_idx);
+		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
+			vq_size * hw->vtnet_hdr_size,
+			socket_id, 0, RTE_CACHE_LINE_SIZE);
+		if (vq->virtio_net_hdr_mz == NULL) {
+			rte_free(vq);
+			return -ENOMEM;
+		}
+		vq->virtio_net_hdr_mem =
+			vq->virtio_net_hdr_mz->phys_addr;
+		memset(vq->virtio_net_hdr_mz->addr, 0,
+			vq_size * hw->vtnet_hdr_size);
+	} else if (queue_type == VTNET_CQ) {
+		/* Allocate a page for control vq command, data and status */
+		snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
+			dev->data->port_id);
+		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
+			PAGE_SIZE, socket_id, 0, RTE_CACHE_LINE_SIZE);
+		if (vq->virtio_net_hdr_mz == NULL) {
+			rte_free(vq);
+			return -ENOMEM;
+		}
+		vq->virtio_net_hdr_mem =
+			vq->virtio_net_hdr_mz->phys_addr;
+		memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
+	}
+
+	/*
+	 * Set guest physical address of the virtqueue
+	 * in VIRTIO_PCI_QUEUE_PFN config register of device
+	 */
+	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
+			mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	*pvq = vq;
+	return 0;
+}
+
+static int
+virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx,
+		uint32_t socket_id)
+{
+	struct virtqueue *vq;
+	uint16_t nb_desc = 0;
+	int ret;
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
+			vtpci_queue_idx, nb_desc, socket_id, &vq);
+
+	if (ret < 0) {
+		PMD_INIT_LOG(ERR, "control vq initialization failed");
+		return ret;
+	}
+
+	hw->cvq = vq;
+	return 0;
+}
+
+static void
+virtio_dev_close(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct rte_pci_device *pci_dev = dev->pci_dev;
+
+	PMD_INIT_LOG(DEBUG, "virtio_dev_close");
+
+	/* reset the NIC */
+	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+	vtpci_reset(hw);
+	hw->started = 0;
+	virtio_dev_free_mbufs(dev);
+}
+
+static void
+virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+	ctrl.data[0] = 1;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to enable promisc");
+}
+
+static void
+virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+	ctrl.data[0] = 0;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to disable promisc");
+}
+
+static void
+virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+	ctrl.data[0] = 1;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to enable allmulticast");
+}
+
+static void
+virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+	ctrl.data[0] = 0;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
+}
+
+/*
+ * dev_ops for virtio, bare necessities for basic operation
+ */
+static const struct eth_dev_ops virtio_eth_dev_ops = {
+	.dev_configure           = virtio_dev_configure,
+	.dev_start               = virtio_dev_start,
+	.dev_stop                = virtio_dev_stop,
+	.dev_close               = virtio_dev_close,
+	.promiscuous_enable      = virtio_dev_promiscuous_enable,
+	.promiscuous_disable     = virtio_dev_promiscuous_disable,
+	.allmulticast_enable     = virtio_dev_allmulticast_enable,
+	.allmulticast_disable    = virtio_dev_allmulticast_disable,
+
+	.dev_infos_get           = virtio_dev_info_get,
+	.stats_get               = virtio_dev_stats_get,
+	.stats_reset             = virtio_dev_stats_reset,
+	.link_update             = virtio_dev_link_update,
+	.rx_queue_setup          = virtio_dev_rx_queue_setup,
+	/* meaningfull only to multiple queue */
+	.rx_queue_release        = virtio_dev_rx_queue_release,
+	.tx_queue_setup          = virtio_dev_tx_queue_setup,
+	/* meaningfull only to multiple queue */
+	.tx_queue_release        = virtio_dev_tx_queue_release,
+	/* collect stats per queue */
+	.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
+	.vlan_filter_set         = virtio_vlan_filter_set,
+	.mac_addr_add            = virtio_mac_addr_add,
+	.mac_addr_remove         = virtio_mac_addr_remove,
+	.mac_addr_set            = virtio_mac_addr_set,
+};
+
+static inline int
+virtio_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+				struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = link;
+	struct rte_eth_link *src = &(dev->data->dev_link);
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+			*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+/**
+ * Atomically writes the link status information into global
+ * structure rte_eth_dev.
+ *
+ * @param dev
+ *   - Pointer to the structure rte_eth_dev to read from.
+ *   - Pointer to the buffer to be saved with the link status.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, negative value.
+ */
+static inline int
+virtio_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = &(dev->data->dev_link);
+	struct rte_eth_link *src = link;
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+					*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+static void
+virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		const struct virtqueue *txvq = dev->data->tx_queues[i];
+		if (txvq == NULL)
+			continue;
+
+		stats->opackets += txvq->packets;
+		stats->obytes += txvq->bytes;
+		stats->oerrors += txvq->errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txvq->packets;
+			stats->q_obytes[i] = txvq->bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		const struct virtqueue *rxvq = dev->data->rx_queues[i];
+		if (rxvq == NULL)
+			continue;
+
+		stats->ipackets += rxvq->packets;
+		stats->ibytes += rxvq->bytes;
+		stats->ierrors += rxvq->errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxvq->packets;
+			stats->q_ibytes[i] = rxvq->bytes;
+		}
+	}
+
+	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
+}
+
+static void
+virtio_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct virtqueue *txvq = dev->data->tx_queues[i];
+		if (txvq == NULL)
+			continue;
+
+		txvq->packets = 0;
+		txvq->bytes = 0;
+		txvq->errors = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct virtqueue *rxvq = dev->data->rx_queues[i];
+		if (rxvq == NULL)
+			continue;
+
+		rxvq->packets = 0;
+		rxvq->bytes = 0;
+		rxvq->errors = 0;
+	}
+
+	dev->data->rx_mbuf_alloc_failed = 0;
+}
+
+static void
+virtio_set_hwaddr(struct virtio_hw *hw)
+{
+	vtpci_write_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&hw->mac_addr, ETHER_ADDR_LEN);
+}
+
+static void
+virtio_get_hwaddr(struct virtio_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC)) {
+		vtpci_read_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&hw->mac_addr, ETHER_ADDR_LEN);
+	} else {
+		eth_random_addr(&hw->mac_addr[0]);
+		virtio_set_hwaddr(hw);
+	}
+}
+
+static int
+virtio_mac_table_set(struct virtio_hw *hw,
+		     const struct virtio_net_ctrl_mac *uc,
+		     const struct virtio_net_ctrl_mac *mc)
+{
+	struct virtio_pmd_ctrl ctrl;
+	int err, len[2];
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+
+	len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
+	memcpy(ctrl.data, uc, len[0]);
+
+	len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
+	memcpy(ctrl.data + len[0], mc, len[1]);
+
+	err = virtio_send_command(hw->cvq, &ctrl, len, 2);
+	if (err != 0)
+		PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
+
+	return err;
+}
+
+static void
+virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		    uint32_t index, uint32_t vmdq __rte_unused)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	const struct ether_addr *addrs = dev->data->mac_addrs;
+	unsigned int i;
+	struct virtio_net_ctrl_mac *uc, *mc;
+
+	if (index >= VIRTIO_MAX_MAC_ADDRS) {
+		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+		return;
+	}
+
+	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
+	uc->entries = 0;
+	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
+	mc->entries = 0;
+
+	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+		const struct ether_addr *addr
+			= (i == index) ? mac_addr : addrs + i;
+		struct virtio_net_ctrl_mac *tbl
+			= is_multicast_ether_addr(addr) ? mc : uc;
+
+		memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
+	}
+
+	virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct ether_addr *addrs = dev->data->mac_addrs;
+	struct virtio_net_ctrl_mac *uc, *mc;
+	unsigned int i;
+
+	if (index >= VIRTIO_MAX_MAC_ADDRS) {
+		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+		return;
+	}
+
+	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
+	uc->entries = 0;
+	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
+	mc->entries = 0;
+
+	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+		struct virtio_net_ctrl_mac *tbl;
+
+		if (i == index || is_zero_ether_addr(addrs + i))
+			continue;
+
+		tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
+		memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_ADDR_LEN);
+	}
+
+	virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
+
+	/* Use atomic update if available */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+		struct virtio_pmd_ctrl ctrl;
+		int len = ETHER_ADDR_LEN;
+
+		ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+		ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
+
+		memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
+		virtio_send_command(hw->cvq, &ctrl, &len, 1);
+	} else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
+		virtio_set_hwaddr(hw);
+}
+
+static int
+virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int len;
+
+	if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
+		return -ENOTSUP;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
+	ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
+	memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
+	len = sizeof(vlan_id);
+
+	return virtio_send_command(hw->cvq, &ctrl, &len, 1);
+}
+
+static void
+virtio_negotiate_features(struct virtio_hw *hw)
+{
+	uint32_t host_features, mask;
+
+	/* checksum offload not implemented */
+	mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
+
+	/* TSO and LRO are only available when their corresponding
+	 * checksum offload feature is also negotiated.
+	 */
+	mask |= VIRTIO_NET_F_HOST_TSO4 | VIRTIO_NET_F_HOST_TSO6 | VIRTIO_NET_F_HOST_ECN;
+	mask |= VIRTIO_NET_F_GUEST_TSO4 | VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN;
+	mask |= VTNET_LRO_FEATURES;
+
+	/* not negotiating INDIRECT descriptor table support */
+	mask |= VIRTIO_RING_F_INDIRECT_DESC;
+
+	/* Prepare guest_features: feature that driver wants to support */
+	hw->guest_features = VTNET_FEATURES & ~mask;
+	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
+		hw->guest_features);
+
+	/* Read device(host) feature bits */
+	host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+	PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
+		host_features);
+
+	/*
+	 * Negotiate features: Subset of device feature bits are written back
+	 * guest feature bits.
+	 */
+	hw->guest_features = vtpci_negotiate_features(hw, host_features);
+	PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
+		hw->guest_features);
+}
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+static int
+parse_sysfs_value(const char *filename, unsigned long *val)
+{
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
+			     __func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
+			     __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoul(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
+			     __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
+
+static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
+			unsigned int *uio_num)
+{
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+	snprintf(dirname, sizeof(dirname),
+		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
+		     loc->domain, loc->bus, loc->devid, loc->function);
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		snprintf(dirname, sizeof(dirname),
+			     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
+			     loc->domain, loc->bus, loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		/* first try uio%d */
+		errno = 0;
+		*uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
+			break;
+		}
+
+		/* then try uio:uio%d */
+		errno = 0;
+		*uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			snprintf(buf, buflen, "%s/uio:uio%u", dirname,
+				     *uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL) {
+		PMD_INIT_LOG(ERR, "Could not find uio resource");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+virtio_has_msix(const struct rte_pci_addr *loc)
+{
+	DIR *d;
+	char dirname[PATH_MAX];
+
+	snprintf(dirname, sizeof(dirname),
+		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
+		     loc->domain, loc->bus, loc->devid, loc->function);
+
+	d = opendir(dirname);
+	if (d)
+		closedir(d);
+
+	return (d != NULL);
+}
+
+/* Extract I/O port numbers from sysfs */
+static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
+{
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	unsigned long start, size;
+	unsigned int uio_num;
+
+	if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0)
+		return -1;
+
+	/* get portio size */
+	snprintf(filename, sizeof(filename),
+		     "%s/portio/port0/size", dirname);
+	if (parse_sysfs_value(filename, &size) < 0) {
+		PMD_INIT_LOG(ERR, "%s(): cannot parse size",
+			     __func__);
+		return -1;
+	}
+
+	/* get portio start */
+	snprintf(filename, sizeof(filename),
+		 "%s/portio/port0/start", dirname);
+	if (parse_sysfs_value(filename, &start) < 0) {
+		PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
+			     __func__);
+		return -1;
+	}
+	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
+	pci_dev->mem_resource[0].len =  (uint64_t)size;
+	PMD_INIT_LOG(DEBUG,
+		     "PCI Port IO found start=0x%lx with size=0x%lx",
+		     start, size);
+
+	/* save fd */
+	memset(dirname, 0, sizeof(dirname));
+	snprintf(dirname, sizeof(dirname), "/dev/uio%u", uio_num);
+	pci_dev->intr_handle.fd = open(dirname, O_RDWR);
+	if (pci_dev->intr_handle.fd < 0) {
+		PMD_INIT_LOG(ERR, "Cannot open %s: %s\n",
+			dirname, strerror(errno));
+		return -1;
+	}
+
+	pci_dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+	pci_dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
+
+	return 0;
+}
+
+/* Extract port I/O numbers from proc/ioports */
+static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
+{
+	uint16_t start, end;
+	int size;
+	FILE *fp;
+	char *line = NULL;
+	char pci_id[16];
+	int found = 0;
+	size_t linesz;
+
+	snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
+		 pci_dev->addr.domain,
+		 pci_dev->addr.bus,
+		 pci_dev->addr.devid,
+		 pci_dev->addr.function);
+
+	fp = fopen("/proc/ioports", "r");
+	if (fp == NULL) {
+		PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+		return -1;
+	}
+
+	while (getdelim(&line, &linesz, '\n', fp) > 0) {
+		char *ptr = line;
+		char *left;
+		int n;
+
+		n = strcspn(ptr, ":");
+		ptr[n] = 0;
+		left = &ptr[n+1];
+
+		while (*left && isspace(*left))
+			left++;
+
+		if (!strncmp(left, pci_id, strlen(pci_id))) {
+			found = 1;
+
+			while (*ptr && isspace(*ptr))
+				ptr++;
+
+			sscanf(ptr, "%04hx-%04hx", &start, &end);
+			size = end - start + 1;
+
+			break;
+		}
+	}
+
+	free(line);
+	fclose(fp);
+
+	if (!found)
+		return -1;
+
+	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
+	pci_dev->mem_resource[0].len =  (uint64_t)size;
+	PMD_INIT_LOG(DEBUG,
+		"PCI Port IO found start=0x%x with size=0x%x",
+		start, size);
+
+	/* can't support lsc interrupt without uio */
+	pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+
+	return 0;
+}
+
+/* Extract I/O port numbers from sysfs */
+static int virtio_resource_init(struct rte_pci_device *pci_dev)
+{
+	if (virtio_resource_init_by_uio(pci_dev) == 0)
+		return 0;
+	else
+		return virtio_resource_init_by_ioports(pci_dev);
+}
+
+#else
+static int
+virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
+{
+	/* nic_uio does not enable interrupts, return 0 (false). */
+	return 0;
+}
+
+static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused)
+{
+	/* no setup required */
+	return 0;
+}
+#endif
+
+/*
+ * Process Virtio Config changed interrupt and call the callback
+ * if link state changed.
+ */
+static void
+virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+			 void *param)
+{
+	struct rte_eth_dev *dev = param;
+	struct virtio_hw *hw = dev->data->dev_private;
+	uint8_t isr;
+
+	/* Read interrupt status which clears interrupt */
+	isr = vtpci_isr(hw);
+	PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
+
+	if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+		PMD_DRV_LOG(ERR, "interrupt enable failed");
+
+	if (isr & VIRTIO_PCI_ISR_CONFIG) {
+		if (virtio_dev_link_update(dev, 0) == 0)
+			_rte_eth_dev_callback_process(dev,
+						      RTE_ETH_EVENT_INTR_LSC);
+	}
+
+}
+
+static void
+rx_func_get(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+		eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
+	else
+		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
+}
+
+/*
+ * This function is based on probe() function in virtio_pci.c
+ * It returns 0 on success.
+ */
+static int
+eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	struct virtio_net_config *config;
+	struct virtio_net_config local_config;
+	uint32_t offset_conf = sizeof(config->mac);
+	struct rte_pci_device *pci_dev;
+
+	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
+
+	eth_dev->dev_ops = &virtio_eth_dev_ops;
+	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		rx_func_get(eth_dev);
+		return 0;
+	}
+
+	/* Allocate memory for storing MAC addresses */
+	eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR,
+			"Failed to allocate %d bytes needed to store MAC addresses",
+			ETHER_ADDR_LEN);
+		return -ENOMEM;
+	}
+
+	pci_dev = eth_dev->pci_dev;
+	if (virtio_resource_init(pci_dev) < 0)
+		return -1;
+
+	hw->use_msix = virtio_has_msix(&pci_dev->addr);
+	hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+
+	/* Reset the device although not necessary at startup */
+	vtpci_reset(hw);
+
+	/* Tell the host we've noticed this device. */
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
+	/* Tell the host we've known how to drive the device. */
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+	virtio_negotiate_features(hw);
+
+	rx_func_get(eth_dev);
+
+	/* Setting up rx_header size for the device */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
+
+	/* Copy the permanent MAC address to: virtio_hw */
+	virtio_get_hwaddr(hw);
+	ether_addr_copy((struct ether_addr *) hw->mac_addr,
+			&eth_dev->data->mac_addrs[0]);
+	PMD_INIT_LOG(DEBUG,
+		     "PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
+		     hw->mac_addr[0], hw->mac_addr[1], hw->mac_addr[2],
+		     hw->mac_addr[3], hw->mac_addr[4], hw->mac_addr[5]);
+
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		config = &local_config;
+
+		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+			offset_conf += sizeof(config->status);
+		} else {
+			PMD_INIT_LOG(DEBUG,
+				     "VIRTIO_NET_F_STATUS is not supported");
+			config->status = 0;
+		}
+
+		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
+			offset_conf += sizeof(config->max_virtqueue_pairs);
+		} else {
+			PMD_INIT_LOG(DEBUG,
+				     "VIRTIO_NET_F_MQ is not supported");
+			config->max_virtqueue_pairs = 1;
+		}
+
+		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
+
+		hw->max_rx_queues =
+			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
+			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
+		hw->max_tx_queues =
+			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
+			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
+
+		virtio_dev_cq_queue_setup(eth_dev,
+					config->max_virtqueue_pairs * 2,
+					SOCKET_ID_ANY);
+
+		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
+				config->max_virtqueue_pairs);
+		PMD_INIT_LOG(DEBUG, "config->status=%d", config->status);
+		PMD_INIT_LOG(DEBUG,
+				"PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
+				config->mac[0], config->mac[1],
+				config->mac[2], config->mac[3],
+				config->mac[4], config->mac[5]);
+	} else {
+		hw->max_rx_queues = 1;
+		hw->max_tx_queues = 1;
+	}
+
+	eth_dev->data->nb_rx_queues = hw->max_rx_queues;
+	eth_dev->data->nb_tx_queues = hw->max_tx_queues;
+
+	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
+			hw->max_rx_queues, hw->max_tx_queues);
+	PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
+			eth_dev->data->port_id, pci_dev->id.vendor_id,
+			pci_dev->id.device_id);
+
+	/* Setup interrupt callback  */
+	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+		rte_intr_callback_register(&pci_dev->intr_handle,
+				   virtio_interrupt_handler, eth_dev);
+
+	virtio_dev_cq_start(eth_dev);
+
+	return 0;
+}
+
+static struct eth_driver rte_virtio_pmd = {
+	{
+		.name = "rte_virtio_pmd",
+		.id_table = pci_id_virtio_map,
+	},
+	.eth_dev_init = eth_virtio_dev_init,
+	.dev_private_size = sizeof(struct virtio_hw),
+};
+
+/*
+ * Driver initialization routine.
+ * Invoked once at EAL init time.
+ * Register itself as the [Poll Mode] Driver of PCI virtio devices.
+ * Returns 0 on success.
+ */
+static int
+rte_virtio_pmd_init(const char *name __rte_unused,
+		    const char *param __rte_unused)
+{
+	if (rte_eal_iopl_init() != 0) {
+		PMD_INIT_LOG(ERR, "IOPL call failed - cannot use virtio PMD");
+		return -1;
+	}
+
+	rte_eth_driver_register(&rte_virtio_pmd);
+	return 0;
+}
+
+/*
+ * Only 1 queue is supported, no queue release related operation
+ */
+static void
+virtio_dev_rx_queue_release(__rte_unused void *rxq)
+{
+}
+
+static void
+virtio_dev_tx_queue_release(__rte_unused void *txq)
+{
+}
+
+/*
+ * Configure virtio device
+ * It returns 0 on success.
+ */
+static int
+virtio_dev_configure(struct rte_eth_dev *dev)
+{
+	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct rte_pci_device *pci_dev = dev->pci_dev;
+
+	PMD_INIT_LOG(DEBUG, "configure");
+
+	if (rxmode->hw_ip_checksum) {
+		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
+		return (-EINVAL);
+	}
+
+	hw->vlan_strip = rxmode->hw_vlan_strip;
+
+	if (rxmode->hw_vlan_filter
+	    && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
+		PMD_DRV_LOG(NOTICE,
+			    "vlan filtering not available on this host");
+		return -ENOTSUP;
+	}
+
+	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+		if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
+			PMD_DRV_LOG(ERR, "failed to set config vector");
+			return -EBUSY;
+		}
+
+	return 0;
+}
+
+
+static int
+virtio_dev_start(struct rte_eth_dev *dev)
+{
+	uint16_t nb_queues, i;
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct rte_pci_device *pci_dev = dev->pci_dev;
+
+	/* check if lsc interrupt feature is enabled */
+	if ((dev->data->dev_conf.intr_conf.lsc) &&
+		(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
+		if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+			PMD_DRV_LOG(ERR, "link status not supported by host");
+			return -ENOTSUP;
+		}
+
+		if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+			PMD_DRV_LOG(ERR, "interrupt enable failed");
+			return -EIO;
+		}
+	}
+
+	/* Initialize Link state */
+	virtio_dev_link_update(dev, 0);
+
+	/* On restart after stop do not touch queues */
+	if (hw->started)
+		return 0;
+
+	/* Do final configuration before rx/tx engine starts */
+	virtio_dev_rxtx_start(dev);
+	vtpci_reinit_complete(hw);
+
+	hw->started = 1;
+
+	/*Notify the backend
+	 *Otherwise the tap backend might already stop its queue due to fullness.
+	 *vhost backend will have no chance to be waked up
+	 */
+	nb_queues = dev->data->nb_rx_queues;
+	if (nb_queues > 1) {
+		if (virtio_set_multiple_queues(dev, nb_queues) != 0)
+			return -EINVAL;
+	}
+
+	PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
+
+	for (i = 0; i < nb_queues; i++)
+		virtqueue_notify(dev->data->rx_queues[i]);
+
+	PMD_INIT_LOG(DEBUG, "Notified backend at initialization");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++)
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+
+	return 0;
+}
+
+static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)
+{
+	struct rte_mbuf *buf;
+	int i, mbuf_num = 0;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		PMD_INIT_LOG(DEBUG,
+			     "Before freeing rxq[%d] used and unused buf", i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+
+		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
+					dev->data->rx_queues[i])) != NULL) {
+			rte_pktmbuf_free(buf);
+			mbuf_num++;
+		}
+
+		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
+		PMD_INIT_LOG(DEBUG,
+			     "After freeing rxq[%d] used and unused buf", i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+	}
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		PMD_INIT_LOG(DEBUG,
+			     "Before freeing txq[%d] used and unused bufs",
+			     i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+
+		mbuf_num = 0;
+		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
+					dev->data->tx_queues[i])) != NULL) {
+			rte_pktmbuf_free(buf);
+
+			mbuf_num++;
+		}
+
+		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
+		PMD_INIT_LOG(DEBUG,
+			     "After freeing txq[%d] used and unused buf", i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+	}
+}
+
+/*
+ * Stop device: disable interrupt and mark link down
+ */
+static void
+virtio_dev_stop(struct rte_eth_dev *dev)
+{
+	struct rte_eth_link link;
+
+	PMD_INIT_LOG(DEBUG, "stop");
+
+	if (dev->data->dev_conf.intr_conf.lsc)
+		rte_intr_disable(&dev->pci_dev->intr_handle);
+
+	memset(&link, 0, sizeof(link));
+	virtio_dev_atomic_write_link_status(dev, &link);
+}
+
+static int
+virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct rte_eth_link link, old;
+	uint16_t status;
+	struct virtio_hw *hw = dev->data->dev_private;
+	memset(&link, 0, sizeof(link));
+	virtio_dev_atomic_read_link_status(dev, &link);
+	old = link;
+	link.link_duplex = FULL_DUPLEX;
+	link.link_speed  = SPEED_10G;
+
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+		PMD_INIT_LOG(DEBUG, "Get link status from hw");
+		vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, status),
+				&status, sizeof(status));
+		if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
+			link.link_status = 0;
+			PMD_INIT_LOG(DEBUG, "Port %d is down",
+				     dev->data->port_id);
+		} else {
+			link.link_status = 1;
+			PMD_INIT_LOG(DEBUG, "Port %d is up",
+				     dev->data->port_id);
+		}
+	} else {
+		link.link_status = 1;   /* Link up */
+	}
+	virtio_dev_atomic_write_link_status(dev, &link);
+
+	return (old.link_status == link.link_status) ? -1 : 0;
+}
+
+static void
+virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	dev_info->driver_name = dev->driver->pci_drv.name;
+	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
+	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
+	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
+	};
+}
+
+/*
+ * It enables testpmd to collect per queue stats.
+ */
+static int
+virtio_dev_queue_stats_mapping_set(__rte_unused struct rte_eth_dev *eth_dev,
+__rte_unused uint16_t queue_id, __rte_unused uint8_t stat_idx,
+__rte_unused uint8_t is_rx)
+{
+	return 0;
+}
+
+static struct rte_driver rte_virtio_driver = {
+	.type = PMD_PDEV,
+	.init = rte_virtio_pmd_init,
+};
+
+PMD_REGISTER_DRIVER(rte_virtio_driver);
diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
new file mode 100644
index 0000000..e6d4533
--- /dev/null
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -0,0 +1,124 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_ETHDEV_H_
+#define _VIRTIO_ETHDEV_H_
+
+#include <stdint.h>
+
+#include "virtio_pci.h"
+
+#define SPEED_10	10
+#define SPEED_100	100
+#define SPEED_1000	1000
+#define SPEED_10G	10000
+#define HALF_DUPLEX	1
+#define FULL_DUPLEX	2
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
+#define VIRTIO_MAX_RX_QUEUES 128
+#define VIRTIO_MAX_TX_QUEUES 128
+#define VIRTIO_MAX_MAC_ADDRS 64
+#define VIRTIO_MIN_RX_BUFSIZE 64
+#define VIRTIO_MAX_RX_PKTLEN  9728
+
+/* Features desired/implemented by this driver. */
+#define VTNET_FEATURES \
+	(VIRTIO_NET_F_MAC       | \
+	VIRTIO_NET_F_STATUS     | \
+	VIRTIO_NET_F_MQ         | \
+	VIRTIO_NET_F_CTRL_MAC_ADDR | \
+	VIRTIO_NET_F_CTRL_VQ    | \
+	VIRTIO_NET_F_CTRL_RX    | \
+	VIRTIO_NET_F_CTRL_VLAN  | \
+	VIRTIO_NET_F_CSUM       | \
+	VIRTIO_NET_F_HOST_TSO4  | \
+	VIRTIO_NET_F_HOST_TSO6  | \
+	VIRTIO_NET_F_HOST_ECN   | \
+	VIRTIO_NET_F_GUEST_CSUM | \
+	VIRTIO_NET_F_GUEST_TSO4 | \
+	VIRTIO_NET_F_GUEST_TSO6 | \
+	VIRTIO_NET_F_GUEST_ECN  | \
+	VIRTIO_NET_F_MRG_RXBUF  | \
+	VIRTIO_RING_F_INDIRECT_DESC)
+
+/*
+ * CQ function prototype
+ */
+void virtio_dev_cq_start(struct rte_eth_dev *dev);
+
+/*
+ * RX/TX function prototypes
+ */
+void virtio_dev_rxtx_start(struct rte_eth_dev *dev);
+
+int virtio_dev_queue_setup(struct rte_eth_dev *dev,
+			int queue_type,
+			uint16_t queue_idx,
+			uint16_t  vtpci_queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			struct virtqueue **pvq);
+
+int  virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		struct rte_mempool *mb_pool);
+
+int  virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		uint16_t nb_tx_desc, unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf);
+
+uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
+
+/*
+ * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
+ * frames larger than 1514 bytes. We do not yet support software LRO
+ * via tcp_lro_rx().
+ */
+#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
+			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
+
+
+#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/net/virtio/virtio_logs.h b/drivers/net/virtio/virtio_logs.h
new file mode 100644
index 0000000..d6c33f7
--- /dev/null
+++ b/drivers/net/virtio/virtio_logs.h
@@ -0,0 +1,70 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while(0)
+#define PMD_INIT_FUNC_TRACE() do { } while(0)
+#endif
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt , __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while(0)
+#endif
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt , __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while(0)
+#endif
+
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt , __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while(0)
+#endif
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
new file mode 100644
index 0000000..2245bec
--- /dev/null
+++ b/drivers/net/virtio/virtio_pci.c
@@ -0,0 +1,147 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+
+static uint8_t vtpci_get_status(struct virtio_hw *);
+
+void
+vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+		void *dst, int length)
+{
+	uint64_t off;
+	uint8_t *d;
+	int size;
+
+	off = VIRTIO_PCI_CONFIG(hw) + offset;
+	for (d = dst; length > 0; d += size, off += size, length -= size) {
+		if (length >= 4) {
+			size = 4;
+			*(uint32_t *)d = VIRTIO_READ_REG_4(hw, off);
+		} else if (length >= 2) {
+			size = 2;
+			*(uint16_t *)d = VIRTIO_READ_REG_2(hw, off);
+		} else {
+			size = 1;
+			*d = VIRTIO_READ_REG_1(hw, off);
+		}
+	}
+}
+
+void
+vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
+		void *src, int length)
+{
+	uint64_t off;
+	uint8_t *s;
+	int size;
+
+	off = VIRTIO_PCI_CONFIG(hw) + offset;
+	for (s = src; length > 0; s += size, off += size, length -= size) {
+		if (length >= 4) {
+			size = 4;
+			VIRTIO_WRITE_REG_4(hw, off, *(uint32_t *)s);
+		} else if (length >= 2) {
+			size = 2;
+			VIRTIO_WRITE_REG_2(hw, off, *(uint16_t *)s);
+		} else {
+			size = 1;
+			VIRTIO_WRITE_REG_1(hw, off, *s);
+		}
+	}
+}
+
+uint32_t
+vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
+{
+	uint32_t features;
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+
+	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
+	return features;
+}
+
+
+void
+vtpci_reset(struct virtio_hw *hw)
+{
+	/*
+	 * Setting the status to RESET sets the host device to
+	 * the original, uninitialized state.
+	 */
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	vtpci_get_status(hw);
+}
+
+void
+vtpci_reinit_complete(struct virtio_hw *hw)
+{
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+static uint8_t
+vtpci_get_status(struct virtio_hw *hw)
+{
+	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
+}
+
+void
+vtpci_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status = (uint8_t)(status | vtpci_get_status(hw));
+
+	VIRTIO_WRITE_REG_1(hw, VIRTIO_PCI_STATUS, status);
+}
+
+uint8_t
+vtpci_isr(struct virtio_hw *hw)
+{
+
+	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_ISR);
+}
+
+
+/* Enable one vector (0) for Link State Intrerrupt */
+uint16_t
+vtpci_irq_config(struct virtio_hw *hw, uint16_t vec)
+{
+	VIRTIO_WRITE_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR, vec);
+	return VIRTIO_READ_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR);
+}
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
new file mode 100644
index 0000000..64d9c34
--- /dev/null
+++ b/drivers/net/virtio/virtio_pci.h
@@ -0,0 +1,270 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <stdint.h>
+
+#ifdef __FreeBSD__
+#include <sys/types.h>
+#include <machine/cpufunc.h>
+#else
+#include <sys/io.h>
+#endif
+
+#include <rte_ethdev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_PCI_VENDORID     0x1AF4
+#define VIRTIO_PCI_DEVICEID_MIN 0x1000
+#define VIRTIO_PCI_DEVICEID_MAX 0x103F
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
+				      * also clears the register (8, RO) */
+/* Only if MSIX is enabled: */
+#define VIRTIO_MSI_CONFIG_VECTOR  20 /* configuration change vector (16, RW) */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22 /* vector for selected VQ notifications
+				      (16, RW) */
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* VirtIO device IDs. */
+#define VIRTIO_ID_NETWORK  0x01
+#define VIRTIO_ID_BLOCK    0x02
+#define VIRTIO_ID_CONSOLE  0x03
+#define VIRTIO_ID_ENTROPY  0x04
+#define VIRTIO_ID_BALLOON  0x05
+#define VIRTIO_ID_IOMEMORY 0x06
+#define VIRTIO_ID_9P       0x09
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Generate interrupt when the virtqueue ring is
+ * completely used, even if we've suppressed them.
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY (1 << 24)
+
+/*
+ * The guest should never negotiate this feature; it
+ * is used to detect faulty drivers.
+ */
+#define VIRTIO_F_BAD_FEATURE (1 << 30)
+
+/*
+ * Some VirtIO feature bits (currently bits 28 through 31) are
+ * reserved for the transport being used (eg. virtio_ring), the
+ * rest are per-device feature bits.
+ */
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END   32
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* The feature bitmap for virtio net */
+#define VIRTIO_NET_F_CSUM       0x00001 /* Host handles pkts w/ partial csum */
+#define VIRTIO_NET_F_GUEST_CSUM 0x00002 /* Guest handles pkts w/ partial csum*/
+#define VIRTIO_NET_F_MAC        0x00020 /* Host has given MAC address. */
+#define VIRTIO_NET_F_GSO        0x00040 /* Host handles pkts w/ any GSO type */
+#define VIRTIO_NET_F_GUEST_TSO4 0x00080 /* Guest can handle TSOv4 in. */
+#define VIRTIO_NET_F_GUEST_TSO6 0x00100 /* Guest can handle TSOv6 in. */
+#define VIRTIO_NET_F_GUEST_ECN  0x00200 /* Guest can handle TSO[6] w/ ECN in.*/
+#define VIRTIO_NET_F_GUEST_UFO  0x00400 /* Guest can handle UFO in. */
+#define VIRTIO_NET_F_HOST_TSO4  0x00800 /* Host can handle TSOv4 in. */
+#define VIRTIO_NET_F_HOST_TSO6  0x01000 /* Host can handle TSOv6 in. */
+#define VIRTIO_NET_F_HOST_ECN   0x02000 /* Host can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_HOST_UFO   0x04000 /* Host can handle UFO in. */
+#define VIRTIO_NET_F_MRG_RXBUF  0x08000 /* Host can merge receive buffers. */
+#define VIRTIO_NET_F_STATUS     0x10000 /* virtio_net_config.status available*/
+#define VIRTIO_NET_F_CTRL_VQ    0x20000 /* Control channel available */
+#define VIRTIO_NET_F_CTRL_RX    0x40000 /* Control channel RX mode support */
+#define VIRTIO_NET_F_CTRL_VLAN  0x80000 /* Control channel VLAN filtering */
+#define VIRTIO_NET_F_CTRL_RX_EXTRA  0x100000 /* Extra RX mode control support */
+#define VIRTIO_RING_F_INDIRECT_DESC 0x10000000 /* Support for indirect buffer descriptors. */
+/* The guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ * The host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX 0x20000000
+
+#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
+
+/*
+ * Maximum number of virtqueues per device.
+ */
+#define VIRTIO_MAX_VIRTQUEUES 8
+
+struct virtio_hw {
+	struct virtqueue *cvq;
+	uint32_t    io_base;
+	uint32_t    guest_features;
+	uint32_t    max_tx_queues;
+	uint32_t    max_rx_queues;
+	uint16_t    vtnet_hdr_size;
+	uint8_t	    vlan_strip;
+	uint8_t	    use_msix;
+	uint8_t     started;
+	uint8_t     mac_addr[ETHER_ADDR_LEN];
+};
+
+/*
+ * This structure is just a reference to read
+ * net device specific config space; it just a chodu structure
+ *
+ */
+struct virtio_net_config {
+	/* The config defining mac address (if VIRTIO_NET_F_MAC) */
+	uint8_t    mac[ETHER_ADDR_LEN];
+	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
+	uint16_t   status;
+	uint16_t   max_virtqueue_pairs;
+} __attribute__((packed));
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+#ifdef __FreeBSD__
+
+static inline void
+outb_p(unsigned char data, unsigned int port)
+{
+
+	outb(port, (u_char)data);
+}
+
+static inline void
+outw_p(unsigned short data, unsigned int port)
+{
+	outw(port, (u_short)data);
+}
+
+static inline void
+outl_p(unsigned int data, unsigned int port)
+{
+	outl(port, (u_int)data);
+}
+#endif
+
+#define VIRTIO_PCI_REG_ADDR(hw, reg) \
+	(unsigned short)((hw)->io_base + (reg))
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+	inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+	outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+#define VIRTIO_READ_REG_2(hw, reg) \
+	inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+	outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+#define VIRTIO_READ_REG_4(hw, reg) \
+	inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+	outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+static inline int
+vtpci_with_feature(struct virtio_hw *hw, uint32_t feature)
+{
+	return (hw->guest_features & feature) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+void vtpci_reset(struct virtio_hw *);
+
+void vtpci_reinit_complete(struct virtio_hw *);
+
+void vtpci_set_status(struct virtio_hw *, uint8_t);
+
+uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
+
+void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);
+
+void vtpci_read_dev_config(struct virtio_hw *, uint64_t, void *, int);
+
+uint8_t vtpci_isr(struct virtio_hw *);
+
+uint16_t vtpci_irq_config(struct virtio_hw *, uint16_t);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/net/virtio/virtio_ring.h b/drivers/net/virtio/virtio_ring.h
new file mode 100644
index 0000000..a16c499
--- /dev/null
+++ b/drivers/net/virtio/virtio_ring.h
@@ -0,0 +1,163 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers. */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.  */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next". */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline int
+vring_size(unsigned int num, unsigned long align)
+{
+	int size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other size, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
new file mode 100644
index 0000000..3ff275c
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -0,0 +1,815 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
+#define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
+#else
+#define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
+#endif
+
+static void
+vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
+{
+	struct vring_desc *dp, *dp_tail;
+	struct vq_desc_extra *dxp;
+	uint16_t desc_idx_last = desc_idx;
+
+	dp  = &vq->vq_ring.desc[desc_idx];
+	dxp = &vq->vq_descx[desc_idx];
+	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt + dxp->ndescs);
+	if ((dp->flags & VRING_DESC_F_INDIRECT) == 0) {
+		while (dp->flags & VRING_DESC_F_NEXT) {
+			desc_idx_last = dp->next;
+			dp = &vq->vq_ring.desc[dp->next];
+		}
+	}
+	dxp->ndescs = 0;
+
+	/*
+	 * We must append the existing free chain, if any, to the end of
+	 * newly freed chain. If the virtqueue was completely used, then
+	 * head would be VQ_RING_DESC_CHAIN_END (ASSERTed above).
+	 */
+	if (vq->vq_desc_tail_idx == VQ_RING_DESC_CHAIN_END) {
+		vq->vq_desc_head_idx = desc_idx;
+	} else {
+		dp_tail = &vq->vq_ring.desc[vq->vq_desc_tail_idx];
+		dp_tail->next = desc_idx;
+	}
+
+	vq->vq_desc_tail_idx = desc_idx_last;
+	dp->next = VQ_RING_DESC_CHAIN_END;
+}
+
+static uint16_t
+virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts,
+			   uint32_t *len, uint16_t num)
+{
+	struct vring_used_elem *uep;
+	struct rte_mbuf *cookie;
+	uint16_t used_idx, desc_idx;
+	uint16_t i;
+
+	/*  Caller does the check */
+	for (i = 0; i < num ; i++) {
+		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+		desc_idx = (uint16_t) uep->id;
+		len[i] = uep->len;
+		cookie = (struct rte_mbuf *)vq->vq_descx[desc_idx].cookie;
+
+		if (unlikely(cookie == NULL)) {
+			PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u\n",
+				vq->vq_used_cons_idx);
+			break;
+		}
+
+		rte_prefetch0(cookie);
+		rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *));
+		rx_pkts[i]  = cookie;
+		vq->vq_used_cons_idx++;
+		vq_ring_free_chain(vq, desc_idx);
+		vq->vq_descx[desc_idx].cookie = NULL;
+	}
+
+	return i;
+}
+
+#ifndef DEFAULT_TX_FREE_THRESH
+#define DEFAULT_TX_FREE_THRESH 32
+#endif
+
+/* Cleanup from completed transmits. */
+static void
+virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
+{
+	uint16_t i, used_idx, desc_idx;
+	for (i = 0; i < num; i++) {
+		struct vring_used_elem *uep;
+		struct vq_desc_extra *dxp;
+
+		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+
+		desc_idx = (uint16_t) uep->id;
+		dxp = &vq->vq_descx[desc_idx];
+		vq->vq_used_cons_idx++;
+		vq_ring_free_chain(vq, desc_idx);
+
+		if (dxp->cookie != NULL) {
+			rte_pktmbuf_free(dxp->cookie);
+			dxp->cookie = NULL;
+		}
+	}
+}
+
+
+static inline int
+virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct virtio_hw *hw = vq->hw;
+	struct vring_desc *start_dp;
+	uint16_t needed = 1;
+	uint16_t head_idx, idx;
+
+	if (unlikely(vq->vq_free_cnt == 0))
+		return -ENOSPC;
+	if (unlikely(vq->vq_free_cnt < needed))
+		return -EMSGSIZE;
+
+	head_idx = vq->vq_desc_head_idx;
+	if (unlikely(head_idx >= vq->vq_nentries))
+		return -EFAULT;
+
+	idx = head_idx;
+	dxp = &vq->vq_descx[idx];
+	dxp->cookie = (void *)cookie;
+	dxp->ndescs = needed;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[idx].addr =
+		(uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
+		- hw->vtnet_hdr_size);
+	start_dp[idx].len =
+		cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
+	start_dp[idx].flags =  VRING_DESC_F_WRITE;
+	idx = start_dp[idx].next;
+	vq->vq_desc_head_idx = idx;
+	if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
+		vq->vq_desc_tail_idx = idx;
+	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed);
+	vq_update_avail_ring(vq, head_idx);
+
+	return 0;
+}
+
+static int
+virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t seg_num = cookie->nb_segs;
+	uint16_t needed = 1 + seg_num;
+	uint16_t head_idx, idx;
+	uint16_t head_size = txvq->hw->vtnet_hdr_size;
+
+	if (unlikely(txvq->vq_free_cnt == 0))
+		return -ENOSPC;
+	if (unlikely(txvq->vq_free_cnt < needed))
+		return -EMSGSIZE;
+	head_idx = txvq->vq_desc_head_idx;
+	if (unlikely(head_idx >= txvq->vq_nentries))
+		return -EFAULT;
+
+	idx = head_idx;
+	dxp = &txvq->vq_descx[idx];
+	dxp->cookie = (void *)cookie;
+	dxp->ndescs = needed;
+
+	start_dp = txvq->vq_ring.desc;
+	start_dp[idx].addr =
+		txvq->virtio_net_hdr_mem + idx * head_size;
+	start_dp[idx].len = (uint32_t)head_size;
+	start_dp[idx].flags = VRING_DESC_F_NEXT;
+
+	for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
+		idx = start_dp[idx].next;
+		start_dp[idx].addr  = RTE_MBUF_DATA_DMA_ADDR(cookie);
+		start_dp[idx].len   = cookie->data_len;
+		start_dp[idx].flags = VRING_DESC_F_NEXT;
+		cookie = cookie->next;
+	}
+
+	start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
+	idx = start_dp[idx].next;
+	txvq->vq_desc_head_idx = idx;
+	if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
+		txvq->vq_desc_tail_idx = idx;
+	txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed);
+	vq_update_avail_ring(txvq, head_idx);
+
+	return 0;
+}
+
+static inline struct rte_mbuf *
+rte_rxmbuf_alloc(struct rte_mempool *mp)
+{
+	struct rte_mbuf *m;
+
+	m = __rte_mbuf_raw_alloc(mp);
+	__rte_mbuf_sanity_check_raw(m, 0);
+
+	return m;
+}
+
+static void
+virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
+{
+	struct rte_mbuf *m;
+	int i, nbufs, error, size = vq->vq_nentries;
+	struct vring *vr = &vq->vq_ring;
+	uint8_t *ring_mem = vq->vq_ring_virt_mem;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/*
+	 * Reinitialise since virtio port might have been stopped and restarted
+	 */
+	memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
+	vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_used_cons_idx = 0;
+	vq->vq_desc_head_idx = 0;
+	vq->vq_avail_idx = 0;
+	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+	vq->vq_free_cnt = vq->vq_nentries;
+	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
+
+	/* Chain all the descriptors in the ring with an END */
+	for (i = 0; i < size - 1; i++)
+		vr->desc[i].next = (uint16_t)(i + 1);
+	vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
+
+	/*
+	 * Disable device(host) interrupting guest
+	 */
+	virtqueue_disable_intr(vq);
+
+	/* Only rx virtqueue needs mbufs to be allocated at initialization */
+	if (queue_type == VTNET_RQ) {
+		if (vq->mpool == NULL)
+			rte_exit(EXIT_FAILURE,
+			"Cannot allocate initial mbufs for rx virtqueue");
+
+		/* Allocate blank mbufs for the each rx descriptor */
+		nbufs = 0;
+		error = ENOSPC;
+		while (!virtqueue_full(vq)) {
+			m = rte_rxmbuf_alloc(vq->mpool);
+			if (m == NULL)
+				break;
+
+			/******************************************
+			*         Enqueue allocated buffers        *
+			*******************************************/
+			error = virtqueue_enqueue_recv_refill(vq, m);
+
+			if (error) {
+				rte_pktmbuf_free(m);
+				break;
+			}
+			nbufs++;
+		}
+
+		vq_update_avail_idx(vq);
+
+		PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
+
+		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
+			vq->vq_queue_index);
+		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
+			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	} else if (queue_type == VTNET_TQ) {
+		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
+			vq->vq_queue_index);
+		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
+			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	} else {
+		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
+			vq->vq_queue_index);
+		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
+			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	}
+}
+
+void
+virtio_dev_cq_start(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	if (hw->cvq) {
+		virtio_dev_vring_start(hw->cvq, VTNET_CQ);
+		VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq);
+	}
+}
+
+void
+virtio_dev_rxtx_start(struct rte_eth_dev *dev)
+{
+	/*
+	 * Start receive and transmit vrings
+	 * -	Setup vring structure for all queues
+	 * -	Initialize descriptor for the rx vring
+	 * -	Allocate blank mbufs for the each rx descriptor
+	 *
+	 */
+	int i;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Start rx vring. */
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		virtio_dev_vring_start(dev->data->rx_queues[i], VTNET_RQ);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+	}
+
+	/* Start tx vring. */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		virtio_dev_vring_start(dev->data->tx_queues[i], VTNET_TQ);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+	}
+}
+
+int
+virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
+			uint16_t queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			__rte_unused const struct rte_eth_rxconf *rx_conf,
+			struct rte_mempool *mp)
+{
+	uint16_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_RQ_QUEUE_IDX;
+	struct virtqueue *vq;
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+	ret = virtio_dev_queue_setup(dev, VTNET_RQ, queue_idx, vtpci_queue_idx,
+			nb_desc, socket_id, &vq);
+	if (ret < 0) {
+		PMD_INIT_LOG(ERR, "tvq initialization failed");
+		return ret;
+	}
+
+	/* Create mempool for rx mbuf allocation */
+	vq->mpool = mp;
+
+	dev->data->rx_queues[queue_idx] = vq;
+	return 0;
+}
+
+/*
+ * struct rte_eth_dev *dev: Used to update dev
+ * uint16_t nb_desc: Defaults to values read from config space
+ * unsigned int socket_id: Used to allocate memzone
+ * const struct rte_eth_txconf *tx_conf: Used to setup tx engine
+ * uint16_t queue_idx: Just used as an index in dev txq list
+ */
+int
+virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
+			uint16_t queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			const struct rte_eth_txconf *tx_conf)
+{
+	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+	struct virtqueue *vq;
+	uint16_t tx_free_thresh;
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
+	    != ETH_TXQ_FLAGS_NOXSUMS) {
+		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
+		return -EINVAL;
+	}
+
+	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
+			nb_desc, socket_id, &vq);
+	if (ret < 0) {
+		PMD_INIT_LOG(ERR, "rvq initialization failed");
+		return ret;
+	}
+
+	tx_free_thresh = tx_conf->tx_free_thresh;
+	if (tx_free_thresh == 0)
+		tx_free_thresh =
+			RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
+
+	if (tx_free_thresh >= (vq->vq_nentries - 3)) {
+		RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+			"number of TX entries minus 3 (%u)."
+			" (tx_free_thresh=%u port=%u queue=%u)\n",
+			vq->vq_nentries - 3,
+			tx_free_thresh, dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	vq->vq_free_thresh = tx_free_thresh;
+
+	dev->data->tx_queues[queue_idx] = vq;
+	return 0;
+}
+
+static void
+virtio_discard_rxbuf(struct virtqueue *vq, struct rte_mbuf *m)
+{
+	int error;
+	/*
+	 * Requeue the discarded mbuf. This should always be
+	 * successful since it was just dequeued.
+	 */
+	error = virtqueue_enqueue_recv_refill(vq, m);
+	if (unlikely(error)) {
+		RTE_LOG(ERR, PMD, "cannot requeue discarded mbuf");
+		rte_pktmbuf_free(m);
+	}
+}
+
+#define VIRTIO_MBUF_BURST_SZ 64
+#define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
+uint16_t
+virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	struct virtio_hw *hw;
+	struct rte_mbuf *rxm, *new_mbuf;
+	uint16_t nb_used, num, nb_rx;
+	uint32_t len[VIRTIO_MBUF_BURST_SZ];
+	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
+	int error;
+	uint32_t i, nb_enqueued;
+	const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
+
+	nb_used = VIRTQUEUE_NUSED(rxvq);
+
+	virtio_rmb();
+
+	num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
+	num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : VIRTIO_MBUF_BURST_SZ);
+	if (likely(num > DESC_PER_CACHELINE))
+		num = num - ((rxvq->vq_used_cons_idx + num) % DESC_PER_CACHELINE);
+
+	if (num == 0)
+		return 0;
+
+	num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
+	PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
+
+	hw = rxvq->hw;
+	nb_rx = 0;
+	nb_enqueued = 0;
+
+	for (i = 0; i < num ; i++) {
+		rxm = rcv_pkts[i];
+
+		PMD_RX_LOG(DEBUG, "packet len:%d", len[i]);
+
+		if (unlikely(len[i] < hdr_size + ETHER_HDR_LEN)) {
+			PMD_RX_LOG(ERR, "Packet drop");
+			nb_enqueued++;
+			virtio_discard_rxbuf(rxvq, rxm);
+			rxvq->errors++;
+			continue;
+		}
+
+		rxm->port = rxvq->port_id;
+		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+
+		rxm->nb_segs = 1;
+		rxm->next = NULL;
+		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
+		rxm->data_len = (uint16_t)(len[i] - hdr_size);
+
+		if (hw->vlan_strip)
+			rte_vlan_strip(rxm);
+
+		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
+
+		rx_pkts[nb_rx++] = rxm;
+		rxvq->bytes += rx_pkts[nb_rx - 1]->pkt_len;
+	}
+
+	rxvq->packets += nb_rx;
+
+	/* Allocate new mbuf for the used descriptor */
+	error = ENOSPC;
+	while (likely(!virtqueue_full(rxvq))) {
+		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
+		if (unlikely(new_mbuf == NULL)) {
+			struct rte_eth_dev *dev
+				= &rte_eth_devices[rxvq->port_id];
+			dev->data->rx_mbuf_alloc_failed++;
+			break;
+		}
+		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
+		if (unlikely(error)) {
+			rte_pktmbuf_free(new_mbuf);
+			break;
+		}
+		nb_enqueued++;
+	}
+
+	if (likely(nb_enqueued)) {
+		vq_update_avail_idx(rxvq);
+
+		if (unlikely(virtqueue_kick_prepare(rxvq))) {
+			virtqueue_notify(rxvq);
+			PMD_RX_LOG(DEBUG, "Notified\n");
+		}
+	}
+
+	return nb_rx;
+}
+
+uint16_t
+virtio_recv_mergeable_pkts(void *rx_queue,
+			struct rte_mbuf **rx_pkts,
+			uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	struct virtio_hw *hw;
+	struct rte_mbuf *rxm, *new_mbuf;
+	uint16_t nb_used, num, nb_rx;
+	uint32_t len[VIRTIO_MBUF_BURST_SZ];
+	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
+	struct rte_mbuf *prev;
+	int error;
+	uint32_t i, nb_enqueued;
+	uint32_t seg_num;
+	uint16_t extra_idx;
+	uint32_t seg_res;
+	const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+
+	nb_used = VIRTQUEUE_NUSED(rxvq);
+
+	virtio_rmb();
+
+	if (nb_used == 0)
+		return 0;
+
+	PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
+
+	hw = rxvq->hw;
+	nb_rx = 0;
+	i = 0;
+	nb_enqueued = 0;
+	seg_num = 0;
+	extra_idx = 0;
+	seg_res = 0;
+
+	while (i < nb_used) {
+		struct virtio_net_hdr_mrg_rxbuf *header;
+
+		if (nb_rx == nb_pkts)
+			break;
+
+		num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, 1);
+		if (num != 1)
+			continue;
+
+		i++;
+
+		PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
+		PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
+
+		rxm = rcv_pkts[0];
+
+		if (unlikely(len[0] < hdr_size + ETHER_HDR_LEN)) {
+			PMD_RX_LOG(ERR, "Packet drop\n");
+			nb_enqueued++;
+			virtio_discard_rxbuf(rxvq, rxm);
+			rxvq->errors++;
+			continue;
+		}
+
+		header = (struct virtio_net_hdr_mrg_rxbuf *)((char *)rxm->buf_addr +
+			RTE_PKTMBUF_HEADROOM - hdr_size);
+		seg_num = header->num_buffers;
+
+		if (seg_num == 0)
+			seg_num = 1;
+
+		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->nb_segs = seg_num;
+		rxm->next = NULL;
+		rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
+		rxm->data_len = (uint16_t)(len[0] - hdr_size);
+
+		rxm->port = rxvq->port_id;
+		rx_pkts[nb_rx] = rxm;
+		prev = rxm;
+
+		seg_res = seg_num - 1;
+
+		while (seg_res != 0) {
+			/*
+			 * Get extra segments for current uncompleted packet.
+			 */
+			uint16_t  rcv_cnt =
+				RTE_MIN(seg_res, RTE_DIM(rcv_pkts));
+			if (likely(VIRTQUEUE_NUSED(rxvq) >= rcv_cnt)) {
+				uint32_t rx_num =
+					virtqueue_dequeue_burst_rx(rxvq,
+					rcv_pkts, len, rcv_cnt);
+				i += rx_num;
+				rcv_cnt = rx_num;
+			} else {
+				PMD_RX_LOG(ERR,
+					"No enough segments for packet.\n");
+				nb_enqueued++;
+				virtio_discard_rxbuf(rxvq, rxm);
+				rxvq->errors++;
+				break;
+			}
+
+			extra_idx = 0;
+
+			while (extra_idx < rcv_cnt) {
+				rxm = rcv_pkts[extra_idx];
+
+				rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
+				rxm->next = NULL;
+				rxm->pkt_len = (uint32_t)(len[extra_idx]);
+				rxm->data_len = (uint16_t)(len[extra_idx]);
+
+				if (prev)
+					prev->next = rxm;
+
+				prev = rxm;
+				rx_pkts[nb_rx]->pkt_len += rxm->pkt_len;
+				extra_idx++;
+			};
+			seg_res -= rcv_cnt;
+		}
+
+		if (hw->vlan_strip)
+			rte_vlan_strip(rx_pkts[nb_rx]);
+
+		VIRTIO_DUMP_PACKET(rx_pkts[nb_rx],
+			rx_pkts[nb_rx]->data_len);
+
+		rxvq->bytes += rx_pkts[nb_rx]->pkt_len;
+		nb_rx++;
+	}
+
+	rxvq->packets += nb_rx;
+
+	/* Allocate new mbuf for the used descriptor */
+	error = ENOSPC;
+	while (likely(!virtqueue_full(rxvq))) {
+		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
+		if (unlikely(new_mbuf == NULL)) {
+			struct rte_eth_dev *dev
+				= &rte_eth_devices[rxvq->port_id];
+			dev->data->rx_mbuf_alloc_failed++;
+			break;
+		}
+		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
+		if (unlikely(error)) {
+			rte_pktmbuf_free(new_mbuf);
+			break;
+		}
+		nb_enqueued++;
+	}
+
+	if (likely(nb_enqueued)) {
+		vq_update_avail_idx(rxvq);
+
+		if (unlikely(virtqueue_kick_prepare(rxvq))) {
+			virtqueue_notify(rxvq);
+			PMD_RX_LOG(DEBUG, "Notified");
+		}
+	}
+
+	return nb_rx;
+}
+
+uint16_t
+virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	struct rte_mbuf *txm;
+	uint16_t nb_used, nb_tx;
+	int error;
+
+	if (unlikely(nb_pkts < 1))
+		return nb_pkts;
+
+	PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
+	nb_used = VIRTQUEUE_NUSED(txvq);
+
+	virtio_rmb();
+	if (likely(nb_used > txvq->vq_free_thresh))
+		virtio_xmit_cleanup(txvq, nb_used);
+
+	nb_tx = 0;
+
+	while (nb_tx < nb_pkts) {
+		/* Need one more descriptor for virtio header. */
+		int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
+
+		/*Positive value indicates it need free vring descriptors */
+		if (unlikely(need > 0)) {
+			nb_used = VIRTQUEUE_NUSED(txvq);
+			virtio_rmb();
+			need = RTE_MIN(need, (int)nb_used);
+
+			virtio_xmit_cleanup(txvq, need);
+			need = (int)tx_pkts[nb_tx]->nb_segs -
+				txvq->vq_free_cnt + 1;
+		}
+
+		/*
+		 * Zero or negative value indicates it has enough free
+		 * descriptors to use for transmitting.
+		 */
+		if (likely(need <= 0)) {
+			txm = tx_pkts[nb_tx];
+
+			/* Do VLAN tag insertion */
+			if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
+				error = rte_vlan_insert(&txm);
+				if (unlikely(error)) {
+					rte_pktmbuf_free(txm);
+					++nb_tx;
+					continue;
+				}
+			}
+
+			/* Enqueue Packet buffers */
+			error = virtqueue_enqueue_xmit(txvq, txm);
+			if (unlikely(error)) {
+				if (error == ENOSPC)
+					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
+				else if (error == EMSGSIZE)
+					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
+				else
+					PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
+				break;
+			}
+			nb_tx++;
+			txvq->bytes += txm->pkt_len;
+		} else {
+			PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
+			break;
+		}
+	}
+
+	txvq->packets += nb_tx;
+
+	if (likely(nb_tx)) {
+		vq_update_avail_idx(txvq);
+
+		if (unlikely(virtqueue_kick_prepare(txvq))) {
+			virtqueue_notify(txvq);
+			PMD_TX_LOG(DEBUG, "Notified backend after xmit");
+		}
+	}
+
+	return nb_tx;
+}
diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
new file mode 100644
index 0000000..8a3005f
--- /dev/null
+++ b/drivers/net/virtio/virtqueue.c
@@ -0,0 +1,70 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+#include "virtqueue.h"
+#include "virtio_logs.h"
+#include "virtio_pci.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+/*
+ * Two types of mbuf to be cleaned:
+ * 1) mbuf that has been consumed by backend but not used by virtio.
+ * 2) mbuf that hasn't been consued by backend.
+ */
+struct rte_mbuf *
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_mbuf *cookie;
+	int idx;
+
+	for (idx = 0; idx < vq->vq_nentries; idx++) {
+		if ((cookie = vq->vq_descx[idx].cookie) != NULL) {
+			vq->vq_descx[idx].cookie = NULL;
+			return cookie;
+		}
+	}
+	return NULL;
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
new file mode 100644
index 0000000..9d6079e
--- /dev/null
+++ b/drivers/net/virtio/virtqueue.h
@@ -0,0 +1,325 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ * This driver is for virtio_pci on SMP and therefore can assume
+ * weaker (compiler barriers)
+ */
+#define virtio_mb()	rte_mb()
+#define virtio_rmb()	rte_compiler_barrier()
+#define virtio_wmb()	rte_compiler_barrier()
+
+#ifdef RTE_PMD_PACKET_PREFETCH
+#define rte_packet_prefetch(p)  rte_prefetch1(p)
+#else
+#define rte_packet_prefetch(p)  do {} while(0)
+#endif
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+#define RTE_MBUF_DATA_DMA_ADDR(mb) \
+	(uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
+
+#define VTNET_SQ_RQ_QUEUE_IDX 0
+#define VTNET_SQ_TQ_QUEUE_IDX 1
+#define VTNET_SQ_CQ_QUEUE_IDX 2
+
+enum { VTNET_RQ = 0, VTNET_TQ = 1, VTNET_CQ = 2 };
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+/**
+ * Control the RX mode, ie. promiscuous, allmulti, etc...
+ * All commands require an "out" sg entry containing a 1 byte
+ * state value, zero = disable, non-zero = enable.  Commands
+ * 0 and 1 are supported with the VIRTIO_NET_F_CTRL_RX feature.
+ * Commands 2-5 are added with VIRTIO_NET_F_CTRL_RX_EXTRA.
+ */
+#define VIRTIO_NET_CTRL_RX              0
+#define VIRTIO_NET_CTRL_RX_PROMISC      0
+#define VIRTIO_NET_CTRL_RX_ALLMULTI     1
+#define VIRTIO_NET_CTRL_RX_ALLUNI       2
+#define VIRTIO_NET_CTRL_RX_NOMULTI      3
+#define VIRTIO_NET_CTRL_RX_NOUNI        4
+#define VIRTIO_NET_CTRL_RX_NOBCAST      5
+
+/**
+ * Control the MAC
+ *
+ * The MAC filter table is managed by the hypervisor, the guest should
+ * assume the size is infinite.  Filtering should be considered
+ * non-perfect, ie. based on hypervisor resources, the guest may
+ * received packets from sources not specified in the filter list.
+ *
+ * In addition to the class/cmd header, the TABLE_SET command requires
+ * two out scatterlists.  Each contains a 4 byte count of entries followed
+ * by a concatenated byte stream of the ETH_ALEN MAC addresses.  The
+ * first sg list contains unicast addresses, the second is for multicast.
+ * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
+ * is available.
+ *
+ * The ADDR_SET command requests one out scatterlist, it contains a
+ * 6 bytes MAC address. This functionality is present if the
+ * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
+ */
+struct virtio_net_ctrl_mac {
+	uint32_t entries;
+	uint8_t macs[][ETHER_ADDR_LEN];
+} __attribute__((__packed__));
+
+#define VIRTIO_NET_CTRL_MAC    1
+ #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
+
+/**
+ * Control VLAN filtering
+ *
+ * The VLAN filter table is controlled via a simple ADD/DEL interface.
+ * VLAN IDs not added may be filtered by the hypervisor.  Del is the
+ * opposite of add.  Both commands expect an out entry containing a 2
+ * byte VLAN ID.  VLAN filtering is available with the
+ * VIRTIO_NET_F_CTRL_VLAN feature bit.
+ */
+#define VIRTIO_NET_CTRL_VLAN     2
+#define VIRTIO_NET_CTRL_VLAN_ADD 0
+#define VIRTIO_NET_CTRL_VLAN_DEL 1
+
+struct virtio_net_ctrl_hdr {
+	uint8_t class;
+	uint8_t cmd;
+} __attribute__((packed));
+
+typedef uint8_t virtio_net_ctrl_ack;
+
+#define VIRTIO_NET_OK     0
+#define VIRTIO_NET_ERR    1
+
+#define VIRTIO_MAX_CTRL_DATA 2048
+
+struct virtio_pmd_ctrl {
+	struct virtio_net_ctrl_hdr hdr;
+	virtio_net_ctrl_ack status;
+	uint8_t data[VIRTIO_MAX_CTRL_DATA];
+};
+
+struct virtqueue {
+	struct virtio_hw         *hw;     /**< virtio_hw structure pointer. */
+	const struct rte_memzone *mz;     /**< mem zone to populate RX ring. */
+	const struct rte_memzone *virtio_net_hdr_mz; /**< memzone to populate hdr. */
+	struct rte_mempool       *mpool;  /**< mempool for mbuf allocation */
+	uint16_t    queue_id;             /**< DPDK queue index. */
+	uint8_t     port_id;              /**< Device port identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+	uint16_t    vq_free_thresh; /**< free threshold */
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
+
+	/* Statistics */
+	uint64_t	packets;
+	uint64_t	bytes;
+	uint64_t	errors;
+
+	struct vq_desc_extra {
+		void              *cookie;
+		uint16_t          ndescs;
+	} vq_descx[0];
+};
+
+/* If multiqueue is provided by host, then we suppport it. */
+#ifndef VIRTIO_NET_F_MQ
+/* Device supports Receive Flow Steering */
+#define VIRTIO_NET_F_MQ 0x400000
+#define VIRTIO_NET_CTRL_MQ   4
+#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
+#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
+#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
+#endif
+#ifndef VIRTIO_NET_F_CTRL_MAC_ADDR
+#define VIRTIO_NET_F_CTRL_MAC_ADDR 0x800000
+#define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
+#endif
+
+/**
+ * This is the first element of the scatter-gather list.  If you don't
+ * specify GSO or CSUM features, you can simply ignore the header.
+ */
+struct virtio_net_hdr {
+#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
+	uint8_t flags;
+#define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
+#define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
+#define VIRTIO_NET_HDR_GSO_UDP      3    /**< GSO frame, IPv4 UDP (UFO) */
+#define VIRTIO_NET_HDR_GSO_TCPV6    4    /**< GSO frame, IPv6 TCP */
+#define VIRTIO_NET_HDR_GSO_ECN      0x80 /**< TCP has ECN set */
+	uint8_t gso_type;
+	uint16_t hdr_len;     /**< Ethernet + IP + tcp/udp hdrs */
+	uint16_t gso_size;    /**< Bytes to append to hdr_len per frame */
+	uint16_t csum_start;  /**< Position to start checksumming from */
+	uint16_t csum_offset; /**< Offset after that to place checksum */
+};
+
+/**
+ * This is the version of the header to use when the MRG_RXBUF
+ * feature has been negotiated.
+ */
+struct virtio_net_hdr_mrg_rxbuf {
+	struct   virtio_net_hdr hdr;
+	uint16_t num_buffers; /**< Number of merged rx buffers */
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+/**
+ *  Dump virtqueue internal structures, for debug purpose only.
+ */
+void virtqueue_dump(struct virtqueue *vq);
+/**
+ *  Get all mbufs to be freed.
+ */
+struct rte_mbuf *virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notificaiton is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_NOTIFY, vq->vq_queue_index);
+}
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	PMD_INIT_LOG(DEBUG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
diff --git a/lib/Makefile b/lib/Makefile
index 0bb87ba..3e5cab0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,7 +41,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_TIMER) += librte_timer
 DIRS-$(CONFIG_RTE_LIBRTE_CFGFILE) += librte_cfgfile
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
 DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
-DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
diff --git a/lib/librte_pmd_virtio/Makefile b/lib/librte_pmd_virtio/Makefile
deleted file mode 100644
index 21ff7e5..0000000
--- a/lib/librte_pmd_virtio/Makefile
+++ /dev/null
@@ -1,60 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-#
-# library name
-#
-LIB = librte_pmd_virtio.a
-
-CFLAGS += -O3
-CFLAGS += $(WERROR_FLAGS)
-
-EXPORT_MAP := rte_pmd_virtio_version.map
-
-LIBABIVER := 1
-
-#
-# all source are stored in SRCS-y
-#
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
-
-# this lib depends upon:
-DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
-DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
-DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_net lib/librte_malloc
-
-include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pmd_virtio/rte_pmd_virtio_version.map b/lib/librte_pmd_virtio/rte_pmd_virtio_version.map
deleted file mode 100644
index ef35398..0000000
--- a/lib/librte_pmd_virtio/rte_pmd_virtio_version.map
+++ /dev/null
@@ -1,4 +0,0 @@
-DPDK_2.0 {
-
-	local: *;
-};
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
deleted file mode 100644
index e63dbfb..0000000
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ /dev/null
@@ -1,1504 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdint.h>
-#include <string.h>
-#include <stdio.h>
-#include <errno.h>
-#include <unistd.h>
-#ifdef RTE_EXEC_ENV_LINUXAPP
-#include <dirent.h>
-#include <fcntl.h>
-#endif
-
-#include <rte_ethdev.h>
-#include <rte_memcpy.h>
-#include <rte_string_fns.h>
-#include <rte_memzone.h>
-#include <rte_malloc.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_pci.h>
-#include <rte_ether.h>
-#include <rte_common.h>
-
-#include <rte_memory.h>
-#include <rte_eal.h>
-#include <rte_dev.h>
-
-#include "virtio_ethdev.h"
-#include "virtio_pci.h"
-#include "virtio_logs.h"
-#include "virtqueue.h"
-
-
-static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
-static int  virtio_dev_configure(struct rte_eth_dev *dev);
-static int  virtio_dev_start(struct rte_eth_dev *dev);
-static void virtio_dev_stop(struct rte_eth_dev *dev);
-static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
-static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
-static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
-static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
-static void virtio_dev_info_get(struct rte_eth_dev *dev,
-				struct rte_eth_dev_info *dev_info);
-static int virtio_dev_link_update(struct rte_eth_dev *dev,
-	__rte_unused int wait_to_complete);
-
-static void virtio_set_hwaddr(struct virtio_hw *hw);
-static void virtio_get_hwaddr(struct virtio_hw *hw);
-
-static void virtio_dev_rx_queue_release(__rte_unused void *rxq);
-static void virtio_dev_tx_queue_release(__rte_unused void *txq);
-
-static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
-static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
-static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
-static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
-				uint16_t vlan_id, int on);
-static void virtio_mac_addr_add(struct rte_eth_dev *dev,
-				struct ether_addr *mac_addr,
-				uint32_t index, uint32_t vmdq __rte_unused);
-static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
-static void virtio_mac_addr_set(struct rte_eth_dev *dev,
-				struct ether_addr *mac_addr);
-
-static int virtio_dev_queue_stats_mapping_set(
-	__rte_unused struct rte_eth_dev *eth_dev,
-	__rte_unused uint16_t queue_id,
-	__rte_unused uint8_t stat_idx,
-	__rte_unused uint8_t is_rx);
-
-/*
- * The set of PCI devices this driver supports
- */
-static const struct rte_pci_id pci_id_virtio_map[] = {
-
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
-#include "rte_pci_dev_ids.h"
-
-{ .vendor_id = 0, /* sentinel */ },
-};
-
-static int
-virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
-		int *dlen, int pkt_num)
-{
-	uint16_t head = vq->vq_desc_head_idx, i;
-	int k, sum = 0;
-	virtio_net_ctrl_ack status = ~0;
-	struct virtio_pmd_ctrl result;
-
-	ctrl->status = status;
-
-	if (!vq->hw->cvq) {
-		PMD_INIT_LOG(ERR,
-			     "%s(): Control queue is not supported.",
-			     __func__);
-		return -1;
-	}
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
-		"vq->hw->cvq = %p vq = %p",
-		vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
-
-	if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1))
-		return -1;
-
-	memcpy(vq->virtio_net_hdr_mz->addr, ctrl,
-		sizeof(struct virtio_pmd_ctrl));
-
-	/*
-	 * Format is enforced in qemu code:
-	 * One TX packet for header;
-	 * At least one TX packet per argument;
-	 * One RX packet for ACK.
-	 */
-	vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
-	vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
-	vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_free_cnt--;
-	i = vq->vq_ring.desc[head].next;
-
-	for (k = 0; k < pkt_num; k++) {
-		vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
-		vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
-			+ sizeof(struct virtio_net_ctrl_hdr)
-			+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
-		vq->vq_ring.desc[i].len = dlen[k];
-		sum += dlen[k];
-		vq->vq_free_cnt--;
-		i = vq->vq_ring.desc[i].next;
-	}
-
-	vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
-	vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
-			+ sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_ring.desc[i].len = sizeof(ctrl->status);
-	vq->vq_free_cnt--;
-
-	vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
-
-	vq_update_avail_ring(vq, head);
-	vq_update_avail_idx(vq);
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index);
-
-	virtqueue_notify(vq);
-
-	rte_rmb();
-	while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
-		rte_rmb();
-		usleep(100);
-	}
-
-	while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
-		uint32_t idx, desc_idx, used_idx;
-		struct vring_used_elem *uep;
-
-		used_idx = (uint32_t)(vq->vq_used_cons_idx
-				& (vq->vq_nentries - 1));
-		uep = &vq->vq_ring.used->ring[used_idx];
-		idx = (uint32_t) uep->id;
-		desc_idx = idx;
-
-		while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
-			desc_idx = vq->vq_ring.desc[desc_idx].next;
-			vq->vq_free_cnt++;
-		}
-
-		vq->vq_ring.desc[desc_idx].next = vq->vq_desc_head_idx;
-		vq->vq_desc_head_idx = idx;
-
-		vq->vq_used_cons_idx++;
-		vq->vq_free_cnt++;
-	}
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d",
-			vq->vq_free_cnt, vq->vq_desc_head_idx);
-
-	memcpy(&result, vq->virtio_net_hdr_mz->addr,
-			sizeof(struct virtio_pmd_ctrl));
-
-	return result.status;
-}
-
-static int
-virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
-	memcpy(ctrl.data, &nb_queues, sizeof(uint16_t));
-
-	dlen[0] = sizeof(uint16_t);
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret) {
-		PMD_INIT_LOG(ERR, "Multiqueue configured but send command "
-			  "failed, this is too late now...");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-int virtio_dev_queue_setup(struct rte_eth_dev *dev,
-			int queue_type,
-			uint16_t queue_idx,
-			uint16_t  vtpci_queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			struct virtqueue **pvq)
-{
-	char vq_name[VIRTQUEUE_MAX_NAME_SZ];
-	const struct rte_memzone *mz;
-	uint16_t vq_size;
-	int size;
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtqueue  *vq = NULL;
-
-	/* Write the virtqueue index to the Queue Select Field */
-	VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
-	PMD_INIT_LOG(DEBUG, "selecting queue: %d", vtpci_queue_idx);
-
-	/*
-	 * Read the virtqueue size from the Queue Size field
-	 * Always power of 2 and if 0 virtqueue does not exist
-	 */
-	vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
-	PMD_INIT_LOG(DEBUG, "vq_size: %d nb_desc:%d", vq_size, nb_desc);
-	if (nb_desc == 0)
-		nb_desc = vq_size;
-	if (vq_size == 0) {
-		PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
-		return -EINVAL;
-	} else if (!rte_is_power_of_2(vq_size)) {
-		PMD_INIT_LOG(ERR, "%s: virtqueue size is not powerof 2", __func__);
-		return -EINVAL;
-	} else if (nb_desc != vq_size) {
-		PMD_INIT_LOG(ERR, "Warning: nb_desc(%d) is not equal to vq size (%d), fall to vq size",
-			nb_desc, vq_size);
-		nb_desc = vq_size;
-	}
-
-	if (queue_type == VTNET_RQ) {
-		snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
-			dev->data->port_id, queue_idx);
-		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
-	} else if (queue_type == VTNET_TQ) {
-		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
-			dev->data->port_id, queue_idx);
-		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
-	} else if (queue_type == VTNET_CQ) {
-		snprintf(vq_name, sizeof(vq_name), "port%d_cvq",
-			dev->data->port_id);
-		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-			vq_size * sizeof(struct vq_desc_extra),
-			RTE_CACHE_LINE_SIZE);
-	}
-	if (vq == NULL) {
-		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
-		return (-ENOMEM);
-	}
-
-	vq->hw = hw;
-	vq->port_id = dev->data->port_id;
-	vq->queue_id = queue_idx;
-	vq->vq_queue_index = vtpci_queue_idx;
-	vq->vq_nentries = vq_size;
-	vq->vq_free_cnt = vq_size;
-
-	/*
-	 * Reserve a memzone for vring elements
-	 */
-	size = vring_size(vq_size, VIRTIO_PCI_VRING_ALIGN);
-	vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
-	PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d", size, vq->vq_ring_size);
-
-	mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
-		socket_id, 0, VIRTIO_PCI_VRING_ALIGN);
-	if (mz == NULL) {
-		rte_free(vq);
-		return -ENOMEM;
-	}
-
-	/*
-	 * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
-	 * and only accepts 32 bit page frame number.
-	 * Check if the allocated physical memory exceeds 16TB.
-	 */
-	if ((mz->phys_addr + vq->vq_ring_size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
-		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
-		rte_free(vq);
-		return -ENOMEM;
-	}
-
-	memset(mz->addr, 0, sizeof(mz->len));
-	vq->mz = mz;
-	vq->vq_ring_mem = mz->phys_addr;
-	vq->vq_ring_virt_mem = mz->addr;
-	PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:      0x%"PRIx64, (uint64_t)mz->phys_addr);
-	PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, (uint64_t)mz->addr);
-	vq->virtio_net_hdr_mz  = NULL;
-	vq->virtio_net_hdr_mem = 0;
-
-	if (queue_type == VTNET_TQ) {
-		/*
-		 * For each xmit packet, allocate a virtio_net_hdr
-		 */
-		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
-			dev->data->port_id, queue_idx);
-		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
-			vq_size * hw->vtnet_hdr_size,
-			socket_id, 0, RTE_CACHE_LINE_SIZE);
-		if (vq->virtio_net_hdr_mz == NULL) {
-			rte_free(vq);
-			return -ENOMEM;
-		}
-		vq->virtio_net_hdr_mem =
-			vq->virtio_net_hdr_mz->phys_addr;
-		memset(vq->virtio_net_hdr_mz->addr, 0,
-			vq_size * hw->vtnet_hdr_size);
-	} else if (queue_type == VTNET_CQ) {
-		/* Allocate a page for control vq command, data and status */
-		snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
-			dev->data->port_id);
-		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
-			PAGE_SIZE, socket_id, 0, RTE_CACHE_LINE_SIZE);
-		if (vq->virtio_net_hdr_mz == NULL) {
-			rte_free(vq);
-			return -ENOMEM;
-		}
-		vq->virtio_net_hdr_mem =
-			vq->virtio_net_hdr_mz->phys_addr;
-		memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
-	}
-
-	/*
-	 * Set guest physical address of the virtqueue
-	 * in VIRTIO_PCI_QUEUE_PFN config register of device
-	 */
-	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
-			mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	*pvq = vq;
-	return 0;
-}
-
-static int
-virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx,
-		uint32_t socket_id)
-{
-	struct virtqueue *vq;
-	uint16_t nb_desc = 0;
-	int ret;
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	PMD_INIT_FUNC_TRACE();
-	ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
-			vtpci_queue_idx, nb_desc, socket_id, &vq);
-
-	if (ret < 0) {
-		PMD_INIT_LOG(ERR, "control vq initialization failed");
-		return ret;
-	}
-
-	hw->cvq = vq;
-	return 0;
-}
-
-static void
-virtio_dev_close(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct rte_pci_device *pci_dev = dev->pci_dev;
-
-	PMD_INIT_LOG(DEBUG, "virtio_dev_close");
-
-	/* reset the NIC */
-	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
-	vtpci_reset(hw);
-	hw->started = 0;
-	virtio_dev_free_mbufs(dev);
-}
-
-static void
-virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
-	ctrl.data[0] = 1;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to enable promisc");
-}
-
-static void
-virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
-	ctrl.data[0] = 0;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to disable promisc");
-}
-
-static void
-virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
-	ctrl.data[0] = 1;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to enable allmulticast");
-}
-
-static void
-virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
-	ctrl.data[0] = 0;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
-}
-
-/*
- * dev_ops for virtio, bare necessities for basic operation
- */
-static const struct eth_dev_ops virtio_eth_dev_ops = {
-	.dev_configure           = virtio_dev_configure,
-	.dev_start               = virtio_dev_start,
-	.dev_stop                = virtio_dev_stop,
-	.dev_close               = virtio_dev_close,
-	.promiscuous_enable      = virtio_dev_promiscuous_enable,
-	.promiscuous_disable     = virtio_dev_promiscuous_disable,
-	.allmulticast_enable     = virtio_dev_allmulticast_enable,
-	.allmulticast_disable    = virtio_dev_allmulticast_disable,
-
-	.dev_infos_get           = virtio_dev_info_get,
-	.stats_get               = virtio_dev_stats_get,
-	.stats_reset             = virtio_dev_stats_reset,
-	.link_update             = virtio_dev_link_update,
-	.rx_queue_setup          = virtio_dev_rx_queue_setup,
-	/* meaningfull only to multiple queue */
-	.rx_queue_release        = virtio_dev_rx_queue_release,
-	.tx_queue_setup          = virtio_dev_tx_queue_setup,
-	/* meaningfull only to multiple queue */
-	.tx_queue_release        = virtio_dev_tx_queue_release,
-	/* collect stats per queue */
-	.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
-	.vlan_filter_set         = virtio_vlan_filter_set,
-	.mac_addr_add            = virtio_mac_addr_add,
-	.mac_addr_remove         = virtio_mac_addr_remove,
-	.mac_addr_set            = virtio_mac_addr_set,
-};
-
-static inline int
-virtio_dev_atomic_read_link_status(struct rte_eth_dev *dev,
-				struct rte_eth_link *link)
-{
-	struct rte_eth_link *dst = link;
-	struct rte_eth_link *src = &(dev->data->dev_link);
-
-	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
-			*(uint64_t *)src) == 0)
-		return -1;
-
-	return 0;
-}
-
-/**
- * Atomically writes the link status information into global
- * structure rte_eth_dev.
- *
- * @param dev
- *   - Pointer to the structure rte_eth_dev to read from.
- *   - Pointer to the buffer to be saved with the link status.
- *
- * @return
- *   - On success, zero.
- *   - On failure, negative value.
- */
-static inline int
-virtio_dev_atomic_write_link_status(struct rte_eth_dev *dev,
-		struct rte_eth_link *link)
-{
-	struct rte_eth_link *dst = &(dev->data->dev_link);
-	struct rte_eth_link *src = link;
-
-	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
-					*(uint64_t *)src) == 0)
-		return -1;
-
-	return 0;
-}
-
-static void
-virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
-{
-	unsigned i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		const struct virtqueue *txvq = dev->data->tx_queues[i];
-		if (txvq == NULL)
-			continue;
-
-		stats->opackets += txvq->packets;
-		stats->obytes += txvq->bytes;
-		stats->oerrors += txvq->errors;
-
-		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			stats->q_opackets[i] = txvq->packets;
-			stats->q_obytes[i] = txvq->bytes;
-		}
-	}
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		const struct virtqueue *rxvq = dev->data->rx_queues[i];
-		if (rxvq == NULL)
-			continue;
-
-		stats->ipackets += rxvq->packets;
-		stats->ibytes += rxvq->bytes;
-		stats->ierrors += rxvq->errors;
-
-		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			stats->q_ipackets[i] = rxvq->packets;
-			stats->q_ibytes[i] = rxvq->bytes;
-		}
-	}
-
-	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
-}
-
-static void
-virtio_dev_stats_reset(struct rte_eth_dev *dev)
-{
-	unsigned int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		struct virtqueue *txvq = dev->data->tx_queues[i];
-		if (txvq == NULL)
-			continue;
-
-		txvq->packets = 0;
-		txvq->bytes = 0;
-		txvq->errors = 0;
-	}
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		struct virtqueue *rxvq = dev->data->rx_queues[i];
-		if (rxvq == NULL)
-			continue;
-
-		rxvq->packets = 0;
-		rxvq->bytes = 0;
-		rxvq->errors = 0;
-	}
-
-	dev->data->rx_mbuf_alloc_failed = 0;
-}
-
-static void
-virtio_set_hwaddr(struct virtio_hw *hw)
-{
-	vtpci_write_dev_config(hw,
-			offsetof(struct virtio_net_config, mac),
-			&hw->mac_addr, ETHER_ADDR_LEN);
-}
-
-static void
-virtio_get_hwaddr(struct virtio_hw *hw)
-{
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC)) {
-		vtpci_read_dev_config(hw,
-			offsetof(struct virtio_net_config, mac),
-			&hw->mac_addr, ETHER_ADDR_LEN);
-	} else {
-		eth_random_addr(&hw->mac_addr[0]);
-		virtio_set_hwaddr(hw);
-	}
-}
-
-static int
-virtio_mac_table_set(struct virtio_hw *hw,
-		     const struct virtio_net_ctrl_mac *uc,
-		     const struct virtio_net_ctrl_mac *mc)
-{
-	struct virtio_pmd_ctrl ctrl;
-	int err, len[2];
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
-
-	len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
-	memcpy(ctrl.data, uc, len[0]);
-
-	len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
-	memcpy(ctrl.data + len[0], mc, len[1]);
-
-	err = virtio_send_command(hw->cvq, &ctrl, len, 2);
-	if (err != 0)
-		PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
-
-	return err;
-}
-
-static void
-virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
-		    uint32_t index, uint32_t vmdq __rte_unused)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	const struct ether_addr *addrs = dev->data->mac_addrs;
-	unsigned int i;
-	struct virtio_net_ctrl_mac *uc, *mc;
-
-	if (index >= VIRTIO_MAX_MAC_ADDRS) {
-		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
-		return;
-	}
-
-	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
-	uc->entries = 0;
-	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
-	mc->entries = 0;
-
-	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
-		const struct ether_addr *addr
-			= (i == index) ? mac_addr : addrs + i;
-		struct virtio_net_ctrl_mac *tbl
-			= is_multicast_ether_addr(addr) ? mc : uc;
-
-		memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
-	}
-
-	virtio_mac_table_set(hw, uc, mc);
-}
-
-static void
-virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct ether_addr *addrs = dev->data->mac_addrs;
-	struct virtio_net_ctrl_mac *uc, *mc;
-	unsigned int i;
-
-	if (index >= VIRTIO_MAX_MAC_ADDRS) {
-		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
-		return;
-	}
-
-	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
-	uc->entries = 0;
-	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
-	mc->entries = 0;
-
-	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
-		struct virtio_net_ctrl_mac *tbl;
-
-		if (i == index || is_zero_ether_addr(addrs + i))
-			continue;
-
-		tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
-		memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_ADDR_LEN);
-	}
-
-	virtio_mac_table_set(hw, uc, mc);
-}
-
-static void
-virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
-
-	/* Use atomic update if available */
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
-		struct virtio_pmd_ctrl ctrl;
-		int len = ETHER_ADDR_LEN;
-
-		ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
-		ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
-
-		memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
-		virtio_send_command(hw->cvq, &ctrl, &len, 1);
-	} else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
-		virtio_set_hwaddr(hw);
-}
-
-static int
-virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int len;
-
-	if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
-		return -ENOTSUP;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
-	ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
-	memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
-	len = sizeof(vlan_id);
-
-	return virtio_send_command(hw->cvq, &ctrl, &len, 1);
-}
-
-static void
-virtio_negotiate_features(struct virtio_hw *hw)
-{
-	uint32_t host_features, mask;
-
-	/* checksum offload not implemented */
-	mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
-
-	/* TSO and LRO are only available when their corresponding
-	 * checksum offload feature is also negotiated.
-	 */
-	mask |= VIRTIO_NET_F_HOST_TSO4 | VIRTIO_NET_F_HOST_TSO6 | VIRTIO_NET_F_HOST_ECN;
-	mask |= VIRTIO_NET_F_GUEST_TSO4 | VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN;
-	mask |= VTNET_LRO_FEATURES;
-
-	/* not negotiating INDIRECT descriptor table support */
-	mask |= VIRTIO_RING_F_INDIRECT_DESC;
-
-	/* Prepare guest_features: feature that driver wants to support */
-	hw->guest_features = VTNET_FEATURES & ~mask;
-	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
-		hw->guest_features);
-
-	/* Read device(host) feature bits */
-	host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
-	PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
-		host_features);
-
-	/*
-	 * Negotiate features: Subset of device feature bits are written back
-	 * guest feature bits.
-	 */
-	hw->guest_features = vtpci_negotiate_features(hw, host_features);
-	PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
-		hw->guest_features);
-}
-
-#ifdef RTE_EXEC_ENV_LINUXAPP
-static int
-parse_sysfs_value(const char *filename, unsigned long *val)
-{
-	FILE *f;
-	char buf[BUFSIZ];
-	char *end = NULL;
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
-			     __func__, filename);
-		return -1;
-	}
-
-	if (fgets(buf, sizeof(buf), f) == NULL) {
-		PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
-			     __func__, filename);
-		fclose(f);
-		return -1;
-	}
-	*val = strtoul(buf, &end, 0);
-	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-		PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
-			     __func__, filename);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-	return 0;
-}
-
-static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
-			unsigned int *uio_num)
-{
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-	snprintf(dirname, sizeof(dirname),
-		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-		     loc->domain, loc->bus, loc->devid, loc->function);
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		snprintf(dirname, sizeof(dirname),
-			     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-			     loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1;
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		*uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		*uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			snprintf(buf, buflen, "%s/uio:uio%u", dirname,
-				     *uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL) {
-		PMD_INIT_LOG(ERR, "Could not find uio resource");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-virtio_has_msix(const struct rte_pci_addr *loc)
-{
-	DIR *d;
-	char dirname[PATH_MAX];
-
-	snprintf(dirname, sizeof(dirname),
-		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
-		     loc->domain, loc->bus, loc->devid, loc->function);
-
-	d = opendir(dirname);
-	if (d)
-		closedir(d);
-
-	return (d != NULL);
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
-{
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	unsigned long start, size;
-	unsigned int uio_num;
-
-	if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0)
-		return -1;
-
-	/* get portio size */
-	snprintf(filename, sizeof(filename),
-		     "%s/portio/port0/size", dirname);
-	if (parse_sysfs_value(filename, &size) < 0) {
-		PMD_INIT_LOG(ERR, "%s(): cannot parse size",
-			     __func__);
-		return -1;
-	}
-
-	/* get portio start */
-	snprintf(filename, sizeof(filename),
-		 "%s/portio/port0/start", dirname);
-	if (parse_sysfs_value(filename, &start) < 0) {
-		PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
-			     __func__);
-		return -1;
-	}
-	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
-	pci_dev->mem_resource[0].len =  (uint64_t)size;
-	PMD_INIT_LOG(DEBUG,
-		     "PCI Port IO found start=0x%lx with size=0x%lx",
-		     start, size);
-
-	/* save fd */
-	memset(dirname, 0, sizeof(dirname));
-	snprintf(dirname, sizeof(dirname), "/dev/uio%u", uio_num);
-	pci_dev->intr_handle.fd = open(dirname, O_RDWR);
-	if (pci_dev->intr_handle.fd < 0) {
-		PMD_INIT_LOG(ERR, "Cannot open %s: %s\n",
-			dirname, strerror(errno));
-		return -1;
-	}
-
-	pci_dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-	pci_dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
-
-	return 0;
-}
-
-/* Extract port I/O numbers from proc/ioports */
-static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
-{
-	uint16_t start, end;
-	int size;
-	FILE *fp;
-	char *line = NULL;
-	char pci_id[16];
-	int found = 0;
-	size_t linesz;
-
-	snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
-		 pci_dev->addr.domain,
-		 pci_dev->addr.bus,
-		 pci_dev->addr.devid,
-		 pci_dev->addr.function);
-
-	fp = fopen("/proc/ioports", "r");
-	if (fp == NULL) {
-		PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
-		return -1;
-	}
-
-	while (getdelim(&line, &linesz, '\n', fp) > 0) {
-		char *ptr = line;
-		char *left;
-		int n;
-
-		n = strcspn(ptr, ":");
-		ptr[n] = 0;
-		left = &ptr[n+1];
-
-		while (*left && isspace(*left))
-			left++;
-
-		if (!strncmp(left, pci_id, strlen(pci_id))) {
-			found = 1;
-
-			while (*ptr && isspace(*ptr))
-				ptr++;
-
-			sscanf(ptr, "%04hx-%04hx", &start, &end);
-			size = end - start + 1;
-
-			break;
-		}
-	}
-
-	free(line);
-	fclose(fp);
-
-	if (!found)
-		return -1;
-
-	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
-	pci_dev->mem_resource[0].len =  (uint64_t)size;
-	PMD_INIT_LOG(DEBUG,
-		"PCI Port IO found start=0x%x with size=0x%x",
-		start, size);
-
-	/* can't support lsc interrupt without uio */
-	pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
-
-	return 0;
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init(struct rte_pci_device *pci_dev)
-{
-	if (virtio_resource_init_by_uio(pci_dev) == 0)
-		return 0;
-	else
-		return virtio_resource_init_by_ioports(pci_dev);
-}
-
-#else
-static int
-virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
-{
-	/* nic_uio does not enable interrupts, return 0 (false). */
-	return 0;
-}
-
-static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused)
-{
-	/* no setup required */
-	return 0;
-}
-#endif
-
-/*
- * Process Virtio Config changed interrupt and call the callback
- * if link state changed.
- */
-static void
-virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
-			 void *param)
-{
-	struct rte_eth_dev *dev = param;
-	struct virtio_hw *hw = dev->data->dev_private;
-	uint8_t isr;
-
-	/* Read interrupt status which clears interrupt */
-	isr = vtpci_isr(hw);
-	PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
-
-	if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
-		PMD_DRV_LOG(ERR, "interrupt enable failed");
-
-	if (isr & VIRTIO_PCI_ISR_CONFIG) {
-		if (virtio_dev_link_update(dev, 0) == 0)
-			_rte_eth_dev_callback_process(dev,
-						      RTE_ETH_EVENT_INTR_LSC);
-	}
-
-}
-
-static void
-rx_func_get(struct rte_eth_dev *eth_dev)
-{
-	struct virtio_hw *hw = eth_dev->data->dev_private;
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
-		eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
-	else
-		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
-}
-
-/*
- * This function is based on probe() function in virtio_pci.c
- * It returns 0 on success.
- */
-static int
-eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
-{
-	struct virtio_hw *hw = eth_dev->data->dev_private;
-	struct virtio_net_config *config;
-	struct virtio_net_config local_config;
-	uint32_t offset_conf = sizeof(config->mac);
-	struct rte_pci_device *pci_dev;
-
-	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
-
-	eth_dev->dev_ops = &virtio_eth_dev_ops;
-	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		rx_func_get(eth_dev);
-		return 0;
-	}
-
-	/* Allocate memory for storing MAC addresses */
-	eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
-	if (eth_dev->data->mac_addrs == NULL) {
-		PMD_INIT_LOG(ERR,
-			"Failed to allocate %d bytes needed to store MAC addresses",
-			ETHER_ADDR_LEN);
-		return -ENOMEM;
-	}
-
-	pci_dev = eth_dev->pci_dev;
-	if (virtio_resource_init(pci_dev) < 0)
-		return -1;
-
-	hw->use_msix = virtio_has_msix(&pci_dev->addr);
-	hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
-
-	/* Reset the device although not necessary at startup */
-	vtpci_reset(hw);
-
-	/* Tell the host we've noticed this device. */
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
-	/* Tell the host we've known how to drive the device. */
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-	virtio_negotiate_features(hw);
-
-	rx_func_get(eth_dev);
-
-	/* Setting up rx_header size for the device */
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
-		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-	else
-		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
-
-	/* Copy the permanent MAC address to: virtio_hw */
-	virtio_get_hwaddr(hw);
-	ether_addr_copy((struct ether_addr *) hw->mac_addr,
-			&eth_dev->data->mac_addrs[0]);
-	PMD_INIT_LOG(DEBUG,
-		     "PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
-		     hw->mac_addr[0], hw->mac_addr[1], hw->mac_addr[2],
-		     hw->mac_addr[3], hw->mac_addr[4], hw->mac_addr[5]);
-
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
-		config = &local_config;
-
-		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			offset_conf += sizeof(config->status);
-		} else {
-			PMD_INIT_LOG(DEBUG,
-				     "VIRTIO_NET_F_STATUS is not supported");
-			config->status = 0;
-		}
-
-		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
-			offset_conf += sizeof(config->max_virtqueue_pairs);
-		} else {
-			PMD_INIT_LOG(DEBUG,
-				     "VIRTIO_NET_F_MQ is not supported");
-			config->max_virtqueue_pairs = 1;
-		}
-
-		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
-
-		hw->max_rx_queues =
-			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-		hw->max_tx_queues =
-			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-		virtio_dev_cq_queue_setup(eth_dev,
-					config->max_virtqueue_pairs * 2,
-					SOCKET_ID_ANY);
-
-		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
-				config->max_virtqueue_pairs);
-		PMD_INIT_LOG(DEBUG, "config->status=%d", config->status);
-		PMD_INIT_LOG(DEBUG,
-				"PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
-				config->mac[0], config->mac[1],
-				config->mac[2], config->mac[3],
-				config->mac[4], config->mac[5]);
-	} else {
-		hw->max_rx_queues = 1;
-		hw->max_tx_queues = 1;
-	}
-
-	eth_dev->data->nb_rx_queues = hw->max_rx_queues;
-	eth_dev->data->nb_tx_queues = hw->max_tx_queues;
-
-	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-			hw->max_rx_queues, hw->max_tx_queues);
-	PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
-			eth_dev->data->port_id, pci_dev->id.vendor_id,
-			pci_dev->id.device_id);
-
-	/* Setup interrupt callback  */
-	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-		rte_intr_callback_register(&pci_dev->intr_handle,
-				   virtio_interrupt_handler, eth_dev);
-
-	virtio_dev_cq_start(eth_dev);
-
-	return 0;
-}
-
-static struct eth_driver rte_virtio_pmd = {
-	{
-		.name = "rte_virtio_pmd",
-		.id_table = pci_id_virtio_map,
-	},
-	.eth_dev_init = eth_virtio_dev_init,
-	.dev_private_size = sizeof(struct virtio_hw),
-};
-
-/*
- * Driver initialization routine.
- * Invoked once at EAL init time.
- * Register itself as the [Poll Mode] Driver of PCI virtio devices.
- * Returns 0 on success.
- */
-static int
-rte_virtio_pmd_init(const char *name __rte_unused,
-		    const char *param __rte_unused)
-{
-	if (rte_eal_iopl_init() != 0) {
-		PMD_INIT_LOG(ERR, "IOPL call failed - cannot use virtio PMD");
-		return -1;
-	}
-
-	rte_eth_driver_register(&rte_virtio_pmd);
-	return 0;
-}
-
-/*
- * Only 1 queue is supported, no queue release related operation
- */
-static void
-virtio_dev_rx_queue_release(__rte_unused void *rxq)
-{
-}
-
-static void
-virtio_dev_tx_queue_release(__rte_unused void *txq)
-{
-}
-
-/*
- * Configure virtio device
- * It returns 0 on success.
- */
-static int
-virtio_dev_configure(struct rte_eth_dev *dev)
-{
-	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct rte_pci_device *pci_dev = dev->pci_dev;
-
-	PMD_INIT_LOG(DEBUG, "configure");
-
-	if (rxmode->hw_ip_checksum) {
-		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
-		return (-EINVAL);
-	}
-
-	hw->vlan_strip = rxmode->hw_vlan_strip;
-
-	if (rxmode->hw_vlan_filter
-	    && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
-		PMD_DRV_LOG(NOTICE,
-			    "vlan filtering not available on this host");
-		return -ENOTSUP;
-	}
-
-	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-		if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
-			PMD_DRV_LOG(ERR, "failed to set config vector");
-			return -EBUSY;
-		}
-
-	return 0;
-}
-
-
-static int
-virtio_dev_start(struct rte_eth_dev *dev)
-{
-	uint16_t nb_queues, i;
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct rte_pci_device *pci_dev = dev->pci_dev;
-
-	/* check if lsc interrupt feature is enabled */
-	if ((dev->data->dev_conf.intr_conf.lsc) &&
-		(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
-		if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			PMD_DRV_LOG(ERR, "link status not supported by host");
-			return -ENOTSUP;
-		}
-
-		if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
-			PMD_DRV_LOG(ERR, "interrupt enable failed");
-			return -EIO;
-		}
-	}
-
-	/* Initialize Link state */
-	virtio_dev_link_update(dev, 0);
-
-	/* On restart after stop do not touch queues */
-	if (hw->started)
-		return 0;
-
-	/* Do final configuration before rx/tx engine starts */
-	virtio_dev_rxtx_start(dev);
-	vtpci_reinit_complete(hw);
-
-	hw->started = 1;
-
-	/*Notify the backend
-	 *Otherwise the tap backend might already stop its queue due to fullness.
-	 *vhost backend will have no chance to be waked up
-	 */
-	nb_queues = dev->data->nb_rx_queues;
-	if (nb_queues > 1) {
-		if (virtio_set_multiple_queues(dev, nb_queues) != 0)
-			return -EINVAL;
-	}
-
-	PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
-
-	for (i = 0; i < nb_queues; i++)
-		virtqueue_notify(dev->data->rx_queues[i]);
-
-	PMD_INIT_LOG(DEBUG, "Notified backend at initialization");
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++)
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++)
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-
-	return 0;
-}
-
-static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)
-{
-	struct rte_mbuf *buf;
-	int i, mbuf_num = 0;
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		PMD_INIT_LOG(DEBUG,
-			     "Before freeing rxq[%d] used and unused buf", i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-
-		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
-					dev->data->rx_queues[i])) != NULL) {
-			rte_pktmbuf_free(buf);
-			mbuf_num++;
-		}
-
-		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
-		PMD_INIT_LOG(DEBUG,
-			     "After freeing rxq[%d] used and unused buf", i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-	}
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		PMD_INIT_LOG(DEBUG,
-			     "Before freeing txq[%d] used and unused bufs",
-			     i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-
-		mbuf_num = 0;
-		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
-					dev->data->tx_queues[i])) != NULL) {
-			rte_pktmbuf_free(buf);
-
-			mbuf_num++;
-		}
-
-		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
-		PMD_INIT_LOG(DEBUG,
-			     "After freeing txq[%d] used and unused buf", i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-	}
-}
-
-/*
- * Stop device: disable interrupt and mark link down
- */
-static void
-virtio_dev_stop(struct rte_eth_dev *dev)
-{
-	struct rte_eth_link link;
-
-	PMD_INIT_LOG(DEBUG, "stop");
-
-	if (dev->data->dev_conf.intr_conf.lsc)
-		rte_intr_disable(&dev->pci_dev->intr_handle);
-
-	memset(&link, 0, sizeof(link));
-	virtio_dev_atomic_write_link_status(dev, &link);
-}
-
-static int
-virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
-{
-	struct rte_eth_link link, old;
-	uint16_t status;
-	struct virtio_hw *hw = dev->data->dev_private;
-	memset(&link, 0, sizeof(link));
-	virtio_dev_atomic_read_link_status(dev, &link);
-	old = link;
-	link.link_duplex = FULL_DUPLEX;
-	link.link_speed  = SPEED_10G;
-
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-		PMD_INIT_LOG(DEBUG, "Get link status from hw");
-		vtpci_read_dev_config(hw,
-				offsetof(struct virtio_net_config, status),
-				&status, sizeof(status));
-		if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
-			link.link_status = 0;
-			PMD_INIT_LOG(DEBUG, "Port %d is down",
-				     dev->data->port_id);
-		} else {
-			link.link_status = 1;
-			PMD_INIT_LOG(DEBUG, "Port %d is up",
-				     dev->data->port_id);
-		}
-	} else {
-		link.link_status = 1;   /* Link up */
-	}
-	virtio_dev_atomic_write_link_status(dev, &link);
-
-	return (old.link_status == link.link_status) ? -1 : 0;
-}
-
-static void
-virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	dev_info->driver_name = dev->driver->pci_drv.name;
-	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
-	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
-	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
-	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
-	dev_info->default_txconf = (struct rte_eth_txconf) {
-		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
-	};
-}
-
-/*
- * It enables testpmd to collect per queue stats.
- */
-static int
-virtio_dev_queue_stats_mapping_set(__rte_unused struct rte_eth_dev *eth_dev,
-__rte_unused uint16_t queue_id, __rte_unused uint8_t stat_idx,
-__rte_unused uint8_t is_rx)
-{
-	return 0;
-}
-
-static struct rte_driver rte_virtio_driver = {
-	.type = PMD_PDEV,
-	.init = rte_virtio_pmd_init,
-};
-
-PMD_REGISTER_DRIVER(rte_virtio_driver);
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.h b/lib/librte_pmd_virtio/virtio_ethdev.h
deleted file mode 100644
index e6d4533..0000000
--- a/lib/librte_pmd_virtio/virtio_ethdev.h
+++ /dev/null
@@ -1,124 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_ETHDEV_H_
-#define _VIRTIO_ETHDEV_H_
-
-#include <stdint.h>
-
-#include "virtio_pci.h"
-
-#define SPEED_10	10
-#define SPEED_100	100
-#define SPEED_1000	1000
-#define SPEED_10G	10000
-#define HALF_DUPLEX	1
-#define FULL_DUPLEX	2
-
-#ifndef PAGE_SIZE
-#define PAGE_SIZE 4096
-#endif
-
-#define VIRTIO_MAX_RX_QUEUES 128
-#define VIRTIO_MAX_TX_QUEUES 128
-#define VIRTIO_MAX_MAC_ADDRS 64
-#define VIRTIO_MIN_RX_BUFSIZE 64
-#define VIRTIO_MAX_RX_PKTLEN  9728
-
-/* Features desired/implemented by this driver. */
-#define VTNET_FEATURES \
-	(VIRTIO_NET_F_MAC       | \
-	VIRTIO_NET_F_STATUS     | \
-	VIRTIO_NET_F_MQ         | \
-	VIRTIO_NET_F_CTRL_MAC_ADDR | \
-	VIRTIO_NET_F_CTRL_VQ    | \
-	VIRTIO_NET_F_CTRL_RX    | \
-	VIRTIO_NET_F_CTRL_VLAN  | \
-	VIRTIO_NET_F_CSUM       | \
-	VIRTIO_NET_F_HOST_TSO4  | \
-	VIRTIO_NET_F_HOST_TSO6  | \
-	VIRTIO_NET_F_HOST_ECN   | \
-	VIRTIO_NET_F_GUEST_CSUM | \
-	VIRTIO_NET_F_GUEST_TSO4 | \
-	VIRTIO_NET_F_GUEST_TSO6 | \
-	VIRTIO_NET_F_GUEST_ECN  | \
-	VIRTIO_NET_F_MRG_RXBUF  | \
-	VIRTIO_RING_F_INDIRECT_DESC)
-
-/*
- * CQ function prototype
- */
-void virtio_dev_cq_start(struct rte_eth_dev *dev);
-
-/*
- * RX/TX function prototypes
- */
-void virtio_dev_rxtx_start(struct rte_eth_dev *dev);
-
-int virtio_dev_queue_setup(struct rte_eth_dev *dev,
-			int queue_type,
-			uint16_t queue_idx,
-			uint16_t  vtpci_queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			struct virtqueue **pvq);
-
-int  virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
-		uint16_t nb_rx_desc, unsigned int socket_id,
-		const struct rte_eth_rxconf *rx_conf,
-		struct rte_mempool *mb_pool);
-
-int  virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
-		uint16_t nb_tx_desc, unsigned int socket_id,
-		const struct rte_eth_txconf *tx_conf);
-
-uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
-		uint16_t nb_pkts);
-
-uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
-		uint16_t nb_pkts);
-
-uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
-		uint16_t nb_pkts);
-
-
-/*
- * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
- * frames larger than 1514 bytes. We do not yet support software LRO
- * via tcp_lro_rx().
- */
-#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
-			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
-
-
-#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_logs.h b/lib/librte_pmd_virtio/virtio_logs.h
deleted file mode 100644
index d6c33f7..0000000
--- a/lib/librte_pmd_virtio/virtio_logs.h
+++ /dev/null
@@ -1,70 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_LOGS_H_
-#define _VIRTIO_LOGS_H_
-
-#include <rte_log.h>
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_INIT
-#define PMD_INIT_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
-#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
-#else
-#define PMD_INIT_LOG(level, fmt, args...) do { } while(0)
-#define PMD_INIT_FUNC_TRACE() do { } while(0)
-#endif
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_RX
-#define PMD_RX_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s() rx: " fmt , __func__, ## args)
-#else
-#define PMD_RX_LOG(level, fmt, args...) do { } while(0)
-#endif
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_TX
-#define PMD_TX_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s() tx: " fmt , __func__, ## args)
-#else
-#define PMD_TX_LOG(level, fmt, args...) do { } while(0)
-#endif
-
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DRIVER
-#define PMD_DRV_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s(): " fmt , __func__, ## args)
-#else
-#define PMD_DRV_LOG(level, fmt, args...) do { } while(0)
-#endif
-
-#endif /* _VIRTIO_LOGS_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_pci.c b/lib/librte_pmd_virtio/virtio_pci.c
deleted file mode 100644
index 2245bec..0000000
--- a/lib/librte_pmd_virtio/virtio_pci.c
+++ /dev/null
@@ -1,147 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-#include <stdint.h>
-
-#include "virtio_pci.h"
-#include "virtio_logs.h"
-
-static uint8_t vtpci_get_status(struct virtio_hw *);
-
-void
-vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
-		void *dst, int length)
-{
-	uint64_t off;
-	uint8_t *d;
-	int size;
-
-	off = VIRTIO_PCI_CONFIG(hw) + offset;
-	for (d = dst; length > 0; d += size, off += size, length -= size) {
-		if (length >= 4) {
-			size = 4;
-			*(uint32_t *)d = VIRTIO_READ_REG_4(hw, off);
-		} else if (length >= 2) {
-			size = 2;
-			*(uint16_t *)d = VIRTIO_READ_REG_2(hw, off);
-		} else {
-			size = 1;
-			*d = VIRTIO_READ_REG_1(hw, off);
-		}
-	}
-}
-
-void
-vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
-		void *src, int length)
-{
-	uint64_t off;
-	uint8_t *s;
-	int size;
-
-	off = VIRTIO_PCI_CONFIG(hw) + offset;
-	for (s = src; length > 0; s += size, off += size, length -= size) {
-		if (length >= 4) {
-			size = 4;
-			VIRTIO_WRITE_REG_4(hw, off, *(uint32_t *)s);
-		} else if (length >= 2) {
-			size = 2;
-			VIRTIO_WRITE_REG_2(hw, off, *(uint16_t *)s);
-		} else {
-			size = 1;
-			VIRTIO_WRITE_REG_1(hw, off, *s);
-		}
-	}
-}
-
-uint32_t
-vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
-{
-	uint32_t features;
-	/*
-	 * Limit negotiated features to what the driver, virtqueue, and
-	 * host all support.
-	 */
-	features = host_features & hw->guest_features;
-
-	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
-	return features;
-}
-
-
-void
-vtpci_reset(struct virtio_hw *hw)
-{
-	/*
-	 * Setting the status to RESET sets the host device to
-	 * the original, uninitialized state.
-	 */
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
-	vtpci_get_status(hw);
-}
-
-void
-vtpci_reinit_complete(struct virtio_hw *hw)
-{
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
-}
-
-static uint8_t
-vtpci_get_status(struct virtio_hw *hw)
-{
-	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
-}
-
-void
-vtpci_set_status(struct virtio_hw *hw, uint8_t status)
-{
-	if (status != VIRTIO_CONFIG_STATUS_RESET)
-		status = (uint8_t)(status | vtpci_get_status(hw));
-
-	VIRTIO_WRITE_REG_1(hw, VIRTIO_PCI_STATUS, status);
-}
-
-uint8_t
-vtpci_isr(struct virtio_hw *hw)
-{
-
-	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_ISR);
-}
-
-
-/* Enable one vector (0) for Link State Intrerrupt */
-uint16_t
-vtpci_irq_config(struct virtio_hw *hw, uint16_t vec)
-{
-	VIRTIO_WRITE_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR, vec);
-	return VIRTIO_READ_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR);
-}
diff --git a/lib/librte_pmd_virtio/virtio_pci.h b/lib/librte_pmd_virtio/virtio_pci.h
deleted file mode 100644
index 64d9c34..0000000
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ /dev/null
@@ -1,270 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_PCI_H_
-#define _VIRTIO_PCI_H_
-
-#include <stdint.h>
-
-#ifdef __FreeBSD__
-#include <sys/types.h>
-#include <machine/cpufunc.h>
-#else
-#include <sys/io.h>
-#endif
-
-#include <rte_ethdev.h>
-
-struct virtqueue;
-
-/* VirtIO PCI vendor/device ID. */
-#define VIRTIO_PCI_VENDORID     0x1AF4
-#define VIRTIO_PCI_DEVICEID_MIN 0x1000
-#define VIRTIO_PCI_DEVICEID_MAX 0x103F
-
-/* VirtIO ABI version, this must match exactly. */
-#define VIRTIO_PCI_ABI_VERSION 0
-
-/*
- * VirtIO Header, located in BAR 0.
- */
-#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
-#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
-#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
-#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
-#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
-#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
-#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
-#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
-				      * also clears the register (8, RO) */
-/* Only if MSIX is enabled: */
-#define VIRTIO_MSI_CONFIG_VECTOR  20 /* configuration change vector (16, RW) */
-#define VIRTIO_MSI_QUEUE_VECTOR	  22 /* vector for selected VQ notifications
-				      (16, RW) */
-
-/* The bit of the ISR which indicates a device has an interrupt. */
-#define VIRTIO_PCI_ISR_INTR   0x1
-/* The bit of the ISR which indicates a device configuration change. */
-#define VIRTIO_PCI_ISR_CONFIG 0x2
-/* Vector value used to disable MSI for queue. */
-#define VIRTIO_MSI_NO_VECTOR 0xFFFF
-
-/* VirtIO device IDs. */
-#define VIRTIO_ID_NETWORK  0x01
-#define VIRTIO_ID_BLOCK    0x02
-#define VIRTIO_ID_CONSOLE  0x03
-#define VIRTIO_ID_ENTROPY  0x04
-#define VIRTIO_ID_BALLOON  0x05
-#define VIRTIO_ID_IOMEMORY 0x06
-#define VIRTIO_ID_9P       0x09
-
-/* Status byte for guest to report progress. */
-#define VIRTIO_CONFIG_STATUS_RESET     0x00
-#define VIRTIO_CONFIG_STATUS_ACK       0x01
-#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
-#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
-#define VIRTIO_CONFIG_STATUS_FAILED    0x80
-
-/*
- * Generate interrupt when the virtqueue ring is
- * completely used, even if we've suppressed them.
- */
-#define VIRTIO_F_NOTIFY_ON_EMPTY (1 << 24)
-
-/*
- * The guest should never negotiate this feature; it
- * is used to detect faulty drivers.
- */
-#define VIRTIO_F_BAD_FEATURE (1 << 30)
-
-/*
- * Some VirtIO feature bits (currently bits 28 through 31) are
- * reserved for the transport being used (eg. virtio_ring), the
- * rest are per-device feature bits.
- */
-#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END   32
-
-/*
- * Each virtqueue indirect descriptor list must be physically contiguous.
- * To allow us to malloc(9) each list individually, limit the number
- * supported to what will fit in one page. With 4KB pages, this is a limit
- * of 256 descriptors. If there is ever a need for more, we can switch to
- * contigmalloc(9) for the larger allocations, similar to what
- * bus_dmamem_alloc(9) does.
- *
- * Note the sizeof(struct vring_desc) is 16 bytes.
- */
-#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
-
-/* The feature bitmap for virtio net */
-#define VIRTIO_NET_F_CSUM       0x00001 /* Host handles pkts w/ partial csum */
-#define VIRTIO_NET_F_GUEST_CSUM 0x00002 /* Guest handles pkts w/ partial csum*/
-#define VIRTIO_NET_F_MAC        0x00020 /* Host has given MAC address. */
-#define VIRTIO_NET_F_GSO        0x00040 /* Host handles pkts w/ any GSO type */
-#define VIRTIO_NET_F_GUEST_TSO4 0x00080 /* Guest can handle TSOv4 in. */
-#define VIRTIO_NET_F_GUEST_TSO6 0x00100 /* Guest can handle TSOv6 in. */
-#define VIRTIO_NET_F_GUEST_ECN  0x00200 /* Guest can handle TSO[6] w/ ECN in.*/
-#define VIRTIO_NET_F_GUEST_UFO  0x00400 /* Guest can handle UFO in. */
-#define VIRTIO_NET_F_HOST_TSO4  0x00800 /* Host can handle TSOv4 in. */
-#define VIRTIO_NET_F_HOST_TSO6  0x01000 /* Host can handle TSOv6 in. */
-#define VIRTIO_NET_F_HOST_ECN   0x02000 /* Host can handle TSO[6] w/ ECN in. */
-#define VIRTIO_NET_F_HOST_UFO   0x04000 /* Host can handle UFO in. */
-#define VIRTIO_NET_F_MRG_RXBUF  0x08000 /* Host can merge receive buffers. */
-#define VIRTIO_NET_F_STATUS     0x10000 /* virtio_net_config.status available*/
-#define VIRTIO_NET_F_CTRL_VQ    0x20000 /* Control channel available */
-#define VIRTIO_NET_F_CTRL_RX    0x40000 /* Control channel RX mode support */
-#define VIRTIO_NET_F_CTRL_VLAN  0x80000 /* Control channel VLAN filtering */
-#define VIRTIO_NET_F_CTRL_RX_EXTRA  0x100000 /* Extra RX mode control support */
-#define VIRTIO_RING_F_INDIRECT_DESC 0x10000000 /* Support for indirect buffer descriptors. */
-/* The guest publishes the used index for which it expects an interrupt
- * at the end of the avail ring. Host should ignore the avail->flags field.
- * The host publishes the avail index for which it expects a kick
- * at the end of the used ring. Guest should ignore the used->flags field.
- */
-#define VIRTIO_RING_F_EVENT_IDX 0x20000000
-
-#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
-
-/*
- * Maximum number of virtqueues per device.
- */
-#define VIRTIO_MAX_VIRTQUEUES 8
-
-struct virtio_hw {
-	struct virtqueue *cvq;
-	uint32_t    io_base;
-	uint32_t    guest_features;
-	uint32_t    max_tx_queues;
-	uint32_t    max_rx_queues;
-	uint16_t    vtnet_hdr_size;
-	uint8_t	    vlan_strip;
-	uint8_t	    use_msix;
-	uint8_t     started;
-	uint8_t     mac_addr[ETHER_ADDR_LEN];
-};
-
-/*
- * This structure is just a reference to read
- * net device specific config space; it just a chodu structure
- *
- */
-struct virtio_net_config {
-	/* The config defining mac address (if VIRTIO_NET_F_MAC) */
-	uint8_t    mac[ETHER_ADDR_LEN];
-	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
-	uint16_t   status;
-	uint16_t   max_virtqueue_pairs;
-} __attribute__((packed));
-
-/*
- * The remaining space is defined by each driver as the per-driver
- * configuration space.
- */
-#define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)
-
-/*
- * How many bits to shift physical queue address written to QUEUE_PFN.
- * 12 is historical, and due to x86 page size.
- */
-#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
-
-/* The alignment to use between consumer and producer parts of vring. */
-#define VIRTIO_PCI_VRING_ALIGN 4096
-
-#ifdef __FreeBSD__
-
-static inline void
-outb_p(unsigned char data, unsigned int port)
-{
-
-	outb(port, (u_char)data);
-}
-
-static inline void
-outw_p(unsigned short data, unsigned int port)
-{
-	outw(port, (u_short)data);
-}
-
-static inline void
-outl_p(unsigned int data, unsigned int port)
-{
-	outl(port, (u_int)data);
-}
-#endif
-
-#define VIRTIO_PCI_REG_ADDR(hw, reg) \
-	(unsigned short)((hw)->io_base + (reg))
-
-#define VIRTIO_READ_REG_1(hw, reg) \
-	inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_1(hw, reg, value) \
-	outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_2(hw, reg) \
-	inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_2(hw, reg, value) \
-	outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_4(hw, reg) \
-	inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_4(hw, reg, value) \
-	outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-static inline int
-vtpci_with_feature(struct virtio_hw *hw, uint32_t feature)
-{
-	return (hw->guest_features & feature) != 0;
-}
-
-/*
- * Function declaration from virtio_pci.c
- */
-void vtpci_reset(struct virtio_hw *);
-
-void vtpci_reinit_complete(struct virtio_hw *);
-
-void vtpci_set_status(struct virtio_hw *, uint8_t);
-
-uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
-
-void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);
-
-void vtpci_read_dev_config(struct virtio_hw *, uint64_t, void *, int);
-
-uint8_t vtpci_isr(struct virtio_hw *);
-
-uint16_t vtpci_irq_config(struct virtio_hw *, uint16_t);
-
-#endif /* _VIRTIO_PCI_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_ring.h b/lib/librte_pmd_virtio/virtio_ring.h
deleted file mode 100644
index a16c499..0000000
--- a/lib/librte_pmd_virtio/virtio_ring.h
+++ /dev/null
@@ -1,163 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_RING_H_
-#define _VIRTIO_RING_H_
-
-#include <stdint.h>
-
-#include <rte_common.h>
-
-/* This marks a buffer as continuing via the next field. */
-#define VRING_DESC_F_NEXT       1
-/* This marks a buffer as write-only (otherwise read-only). */
-#define VRING_DESC_F_WRITE      2
-/* This means the buffer contains a list of buffer descriptors. */
-#define VRING_DESC_F_INDIRECT   4
-
-/* The Host uses this in used->flags to advise the Guest: don't kick me
- * when you add a buffer.  It's unreliable, so it's simply an
- * optimization.  Guest will still kick if it's out of buffers. */
-#define VRING_USED_F_NO_NOTIFY  1
-/* The Guest uses this in avail->flags to advise the Host: don't
- * interrupt me when you consume a buffer.  It's unreliable, so it's
- * simply an optimization.  */
-#define VRING_AVAIL_F_NO_INTERRUPT  1
-
-/* VirtIO ring descriptors: 16 bytes.
- * These can chain together via "next". */
-struct vring_desc {
-	uint64_t addr;  /*  Address (guest-physical). */
-	uint32_t len;   /* Length. */
-	uint16_t flags; /* The flags as indicated above. */
-	uint16_t next;  /* We chain unused descriptors via this. */
-};
-
-struct vring_avail {
-	uint16_t flags;
-	uint16_t idx;
-	uint16_t ring[0];
-};
-
-/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
-struct vring_used_elem {
-	/* Index of start of used descriptor chain. */
-	uint32_t id;
-	/* Total length of the descriptor chain which was written to. */
-	uint32_t len;
-};
-
-struct vring_used {
-	uint16_t flags;
-	uint16_t idx;
-	struct vring_used_elem ring[0];
-};
-
-struct vring {
-	unsigned int num;
-	struct vring_desc  *desc;
-	struct vring_avail *avail;
-	struct vring_used  *used;
-};
-
-/* The standard layout for the ring is a continuous chunk of memory which
- * looks like this.  We assume num is a power of 2.
- *
- * struct vring {
- *      // The actual descriptors (16 bytes each)
- *      struct vring_desc desc[num];
- *
- *      // A ring of available descriptor heads with free-running index.
- *      __u16 avail_flags;
- *      __u16 avail_idx;
- *      __u16 available[num];
- *      __u16 used_event_idx;
- *
- *      // Padding to the next align boundary.
- *      char pad[];
- *
- *      // A ring of used descriptor heads with free-running index.
- *      __u16 used_flags;
- *      __u16 used_idx;
- *      struct vring_used_elem used[num];
- *      __u16 avail_event_idx;
- * };
- *
- * NOTE: for VirtIO PCI, align is 4096.
- */
-
-/*
- * We publish the used event index at the end of the available ring, and vice
- * versa. They are at the end for backwards compatibility.
- */
-#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
-#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
-
-static inline int
-vring_size(unsigned int num, unsigned long align)
-{
-	int size;
-
-	size = num * sizeof(struct vring_desc);
-	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
-	size = RTE_ALIGN_CEIL(size, align);
-	size += sizeof(struct vring_used) +
-		(num * sizeof(struct vring_used_elem));
-	return size;
-}
-
-static inline void
-vring_init(struct vring *vr, unsigned int num, uint8_t *p,
-	unsigned long align)
-{
-	vr->num = num;
-	vr->desc = (struct vring_desc *) p;
-	vr->avail = (struct vring_avail *) (p +
-		num * sizeof(struct vring_desc));
-	vr->used = (void *)
-		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
-}
-
-/*
- * The following is used with VIRTIO_RING_F_EVENT_IDX.
- * Assuming a given event_idx value from the other size, if we have
- * just incremented index from old to new_idx, should we trigger an
- * event?
- */
-static inline int
-vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
-{
-	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
-}
-
-#endif /* _VIRTIO_RING_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c b/lib/librte_pmd_virtio/virtio_rxtx.c
deleted file mode 100644
index 3ff275c..0000000
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ /dev/null
@@ -1,815 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <errno.h>
-
-#include <rte_cycles.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_branch_prediction.h>
-#include <rte_mempool.h>
-#include <rte_malloc.h>
-#include <rte_mbuf.h>
-#include <rte_ether.h>
-#include <rte_ethdev.h>
-#include <rte_prefetch.h>
-#include <rte_string_fns.h>
-#include <rte_errno.h>
-#include <rte_byteorder.h>
-
-#include "virtio_logs.h"
-#include "virtio_ethdev.h"
-#include "virtqueue.h"
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
-#define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
-#else
-#define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
-#endif
-
-static void
-vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
-{
-	struct vring_desc *dp, *dp_tail;
-	struct vq_desc_extra *dxp;
-	uint16_t desc_idx_last = desc_idx;
-
-	dp  = &vq->vq_ring.desc[desc_idx];
-	dxp = &vq->vq_descx[desc_idx];
-	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt + dxp->ndescs);
-	if ((dp->flags & VRING_DESC_F_INDIRECT) == 0) {
-		while (dp->flags & VRING_DESC_F_NEXT) {
-			desc_idx_last = dp->next;
-			dp = &vq->vq_ring.desc[dp->next];
-		}
-	}
-	dxp->ndescs = 0;
-
-	/*
-	 * We must append the existing free chain, if any, to the end of
-	 * newly freed chain. If the virtqueue was completely used, then
-	 * head would be VQ_RING_DESC_CHAIN_END (ASSERTed above).
-	 */
-	if (vq->vq_desc_tail_idx == VQ_RING_DESC_CHAIN_END) {
-		vq->vq_desc_head_idx = desc_idx;
-	} else {
-		dp_tail = &vq->vq_ring.desc[vq->vq_desc_tail_idx];
-		dp_tail->next = desc_idx;
-	}
-
-	vq->vq_desc_tail_idx = desc_idx_last;
-	dp->next = VQ_RING_DESC_CHAIN_END;
-}
-
-static uint16_t
-virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts,
-			   uint32_t *len, uint16_t num)
-{
-	struct vring_used_elem *uep;
-	struct rte_mbuf *cookie;
-	uint16_t used_idx, desc_idx;
-	uint16_t i;
-
-	/*  Caller does the check */
-	for (i = 0; i < num ; i++) {
-		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-		uep = &vq->vq_ring.used->ring[used_idx];
-		desc_idx = (uint16_t) uep->id;
-		len[i] = uep->len;
-		cookie = (struct rte_mbuf *)vq->vq_descx[desc_idx].cookie;
-
-		if (unlikely(cookie == NULL)) {
-			PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u\n",
-				vq->vq_used_cons_idx);
-			break;
-		}
-
-		rte_prefetch0(cookie);
-		rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *));
-		rx_pkts[i]  = cookie;
-		vq->vq_used_cons_idx++;
-		vq_ring_free_chain(vq, desc_idx);
-		vq->vq_descx[desc_idx].cookie = NULL;
-	}
-
-	return i;
-}
-
-#ifndef DEFAULT_TX_FREE_THRESH
-#define DEFAULT_TX_FREE_THRESH 32
-#endif
-
-/* Cleanup from completed transmits. */
-static void
-virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
-{
-	uint16_t i, used_idx, desc_idx;
-	for (i = 0; i < num; i++) {
-		struct vring_used_elem *uep;
-		struct vq_desc_extra *dxp;
-
-		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-		uep = &vq->vq_ring.used->ring[used_idx];
-
-		desc_idx = (uint16_t) uep->id;
-		dxp = &vq->vq_descx[desc_idx];
-		vq->vq_used_cons_idx++;
-		vq_ring_free_chain(vq, desc_idx);
-
-		if (dxp->cookie != NULL) {
-			rte_pktmbuf_free(dxp->cookie);
-			dxp->cookie = NULL;
-		}
-	}
-}
-
-
-static inline int
-virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
-{
-	struct vq_desc_extra *dxp;
-	struct virtio_hw *hw = vq->hw;
-	struct vring_desc *start_dp;
-	uint16_t needed = 1;
-	uint16_t head_idx, idx;
-
-	if (unlikely(vq->vq_free_cnt == 0))
-		return -ENOSPC;
-	if (unlikely(vq->vq_free_cnt < needed))
-		return -EMSGSIZE;
-
-	head_idx = vq->vq_desc_head_idx;
-	if (unlikely(head_idx >= vq->vq_nentries))
-		return -EFAULT;
-
-	idx = head_idx;
-	dxp = &vq->vq_descx[idx];
-	dxp->cookie = (void *)cookie;
-	dxp->ndescs = needed;
-
-	start_dp = vq->vq_ring.desc;
-	start_dp[idx].addr =
-		(uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
-		- hw->vtnet_hdr_size);
-	start_dp[idx].len =
-		cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
-	start_dp[idx].flags =  VRING_DESC_F_WRITE;
-	idx = start_dp[idx].next;
-	vq->vq_desc_head_idx = idx;
-	if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
-		vq->vq_desc_tail_idx = idx;
-	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed);
-	vq_update_avail_ring(vq, head_idx);
-
-	return 0;
-}
-
-static int
-virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
-{
-	struct vq_desc_extra *dxp;
-	struct vring_desc *start_dp;
-	uint16_t seg_num = cookie->nb_segs;
-	uint16_t needed = 1 + seg_num;
-	uint16_t head_idx, idx;
-	uint16_t head_size = txvq->hw->vtnet_hdr_size;
-
-	if (unlikely(txvq->vq_free_cnt == 0))
-		return -ENOSPC;
-	if (unlikely(txvq->vq_free_cnt < needed))
-		return -EMSGSIZE;
-	head_idx = txvq->vq_desc_head_idx;
-	if (unlikely(head_idx >= txvq->vq_nentries))
-		return -EFAULT;
-
-	idx = head_idx;
-	dxp = &txvq->vq_descx[idx];
-	dxp->cookie = (void *)cookie;
-	dxp->ndescs = needed;
-
-	start_dp = txvq->vq_ring.desc;
-	start_dp[idx].addr =
-		txvq->virtio_net_hdr_mem + idx * head_size;
-	start_dp[idx].len = (uint32_t)head_size;
-	start_dp[idx].flags = VRING_DESC_F_NEXT;
-
-	for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
-		idx = start_dp[idx].next;
-		start_dp[idx].addr  = RTE_MBUF_DATA_DMA_ADDR(cookie);
-		start_dp[idx].len   = cookie->data_len;
-		start_dp[idx].flags = VRING_DESC_F_NEXT;
-		cookie = cookie->next;
-	}
-
-	start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
-	idx = start_dp[idx].next;
-	txvq->vq_desc_head_idx = idx;
-	if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
-		txvq->vq_desc_tail_idx = idx;
-	txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed);
-	vq_update_avail_ring(txvq, head_idx);
-
-	return 0;
-}
-
-static inline struct rte_mbuf *
-rte_rxmbuf_alloc(struct rte_mempool *mp)
-{
-	struct rte_mbuf *m;
-
-	m = __rte_mbuf_raw_alloc(mp);
-	__rte_mbuf_sanity_check_raw(m, 0);
-
-	return m;
-}
-
-static void
-virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
-{
-	struct rte_mbuf *m;
-	int i, nbufs, error, size = vq->vq_nentries;
-	struct vring *vr = &vq->vq_ring;
-	uint8_t *ring_mem = vq->vq_ring_virt_mem;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/*
-	 * Reinitialise since virtio port might have been stopped and restarted
-	 */
-	memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
-	vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
-	vq->vq_used_cons_idx = 0;
-	vq->vq_desc_head_idx = 0;
-	vq->vq_avail_idx = 0;
-	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
-	vq->vq_free_cnt = vq->vq_nentries;
-	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
-
-	/* Chain all the descriptors in the ring with an END */
-	for (i = 0; i < size - 1; i++)
-		vr->desc[i].next = (uint16_t)(i + 1);
-	vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
-
-	/*
-	 * Disable device(host) interrupting guest
-	 */
-	virtqueue_disable_intr(vq);
-
-	/* Only rx virtqueue needs mbufs to be allocated at initialization */
-	if (queue_type == VTNET_RQ) {
-		if (vq->mpool == NULL)
-			rte_exit(EXIT_FAILURE,
-			"Cannot allocate initial mbufs for rx virtqueue");
-
-		/* Allocate blank mbufs for the each rx descriptor */
-		nbufs = 0;
-		error = ENOSPC;
-		while (!virtqueue_full(vq)) {
-			m = rte_rxmbuf_alloc(vq->mpool);
-			if (m == NULL)
-				break;
-
-			/******************************************
-			*         Enqueue allocated buffers        *
-			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
-			if (error) {
-				rte_pktmbuf_free(m);
-				break;
-			}
-			nbufs++;
-		}
-
-		vq_update_avail_idx(vq);
-
-		PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
-
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	} else if (queue_type == VTNET_TQ) {
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	} else {
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	}
-}
-
-void
-virtio_dev_cq_start(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	if (hw->cvq) {
-		virtio_dev_vring_start(hw->cvq, VTNET_CQ);
-		VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq);
-	}
-}
-
-void
-virtio_dev_rxtx_start(struct rte_eth_dev *dev)
-{
-	/*
-	 * Start receive and transmit vrings
-	 * -	Setup vring structure for all queues
-	 * -	Initialize descriptor for the rx vring
-	 * -	Allocate blank mbufs for the each rx descriptor
-	 *
-	 */
-	int i;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* Start rx vring. */
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		virtio_dev_vring_start(dev->data->rx_queues[i], VTNET_RQ);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-	}
-
-	/* Start tx vring. */
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		virtio_dev_vring_start(dev->data->tx_queues[i], VTNET_TQ);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-	}
-}
-
-int
-virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
-			uint16_t queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			__rte_unused const struct rte_eth_rxconf *rx_conf,
-			struct rte_mempool *mp)
-{
-	uint16_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_RQ_QUEUE_IDX;
-	struct virtqueue *vq;
-	int ret;
-
-	PMD_INIT_FUNC_TRACE();
-	ret = virtio_dev_queue_setup(dev, VTNET_RQ, queue_idx, vtpci_queue_idx,
-			nb_desc, socket_id, &vq);
-	if (ret < 0) {
-		PMD_INIT_LOG(ERR, "tvq initialization failed");
-		return ret;
-	}
-
-	/* Create mempool for rx mbuf allocation */
-	vq->mpool = mp;
-
-	dev->data->rx_queues[queue_idx] = vq;
-	return 0;
-}
-
-/*
- * struct rte_eth_dev *dev: Used to update dev
- * uint16_t nb_desc: Defaults to values read from config space
- * unsigned int socket_id: Used to allocate memzone
- * const struct rte_eth_txconf *tx_conf: Used to setup tx engine
- * uint16_t queue_idx: Just used as an index in dev txq list
- */
-int
-virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
-			uint16_t queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			const struct rte_eth_txconf *tx_conf)
-{
-	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
-	struct virtqueue *vq;
-	uint16_t tx_free_thresh;
-	int ret;
-
-	PMD_INIT_FUNC_TRACE();
-
-	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
-	    != ETH_TXQ_FLAGS_NOXSUMS) {
-		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
-		return -EINVAL;
-	}
-
-	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
-			nb_desc, socket_id, &vq);
-	if (ret < 0) {
-		PMD_INIT_LOG(ERR, "rvq initialization failed");
-		return ret;
-	}
-
-	tx_free_thresh = tx_conf->tx_free_thresh;
-	if (tx_free_thresh == 0)
-		tx_free_thresh =
-			RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
-
-	if (tx_free_thresh >= (vq->vq_nentries - 3)) {
-		RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
-			"number of TX entries minus 3 (%u)."
-			" (tx_free_thresh=%u port=%u queue=%u)\n",
-			vq->vq_nentries - 3,
-			tx_free_thresh, dev->data->port_id, queue_idx);
-		return -EINVAL;
-	}
-
-	vq->vq_free_thresh = tx_free_thresh;
-
-	dev->data->tx_queues[queue_idx] = vq;
-	return 0;
-}
-
-static void
-virtio_discard_rxbuf(struct virtqueue *vq, struct rte_mbuf *m)
-{
-	int error;
-	/*
-	 * Requeue the discarded mbuf. This should always be
-	 * successful since it was just dequeued.
-	 */
-	error = virtqueue_enqueue_recv_refill(vq, m);
-	if (unlikely(error)) {
-		RTE_LOG(ERR, PMD, "cannot requeue discarded mbuf");
-		rte_pktmbuf_free(m);
-	}
-}
-
-#define VIRTIO_MBUF_BURST_SZ 64
-#define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
-uint16_t
-virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
-{
-	struct virtqueue *rxvq = rx_queue;
-	struct virtio_hw *hw;
-	struct rte_mbuf *rxm, *new_mbuf;
-	uint16_t nb_used, num, nb_rx;
-	uint32_t len[VIRTIO_MBUF_BURST_SZ];
-	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
-	int error;
-	uint32_t i, nb_enqueued;
-	const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
-
-	nb_used = VIRTQUEUE_NUSED(rxvq);
-
-	virtio_rmb();
-
-	num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
-	num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : VIRTIO_MBUF_BURST_SZ);
-	if (likely(num > DESC_PER_CACHELINE))
-		num = num - ((rxvq->vq_used_cons_idx + num) % DESC_PER_CACHELINE);
-
-	if (num == 0)
-		return 0;
-
-	num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
-	PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
-
-	hw = rxvq->hw;
-	nb_rx = 0;
-	nb_enqueued = 0;
-
-	for (i = 0; i < num ; i++) {
-		rxm = rcv_pkts[i];
-
-		PMD_RX_LOG(DEBUG, "packet len:%d", len[i]);
-
-		if (unlikely(len[i] < hdr_size + ETHER_HDR_LEN)) {
-			PMD_RX_LOG(ERR, "Packet drop");
-			nb_enqueued++;
-			virtio_discard_rxbuf(rxvq, rxm);
-			rxvq->errors++;
-			continue;
-		}
-
-		rxm->port = rxvq->port_id;
-		rxm->data_off = RTE_PKTMBUF_HEADROOM;
-
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
-		rxm->data_len = (uint16_t)(len[i] - hdr_size);
-
-		if (hw->vlan_strip)
-			rte_vlan_strip(rxm);
-
-		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
-
-		rx_pkts[nb_rx++] = rxm;
-		rxvq->bytes += rx_pkts[nb_rx - 1]->pkt_len;
-	}
-
-	rxvq->packets += nb_rx;
-
-	/* Allocate new mbuf for the used descriptor */
-	error = ENOSPC;
-	while (likely(!virtqueue_full(rxvq))) {
-		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
-		if (unlikely(new_mbuf == NULL)) {
-			struct rte_eth_dev *dev
-				= &rte_eth_devices[rxvq->port_id];
-			dev->data->rx_mbuf_alloc_failed++;
-			break;
-		}
-		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
-		if (unlikely(error)) {
-			rte_pktmbuf_free(new_mbuf);
-			break;
-		}
-		nb_enqueued++;
-	}
-
-	if (likely(nb_enqueued)) {
-		vq_update_avail_idx(rxvq);
-
-		if (unlikely(virtqueue_kick_prepare(rxvq))) {
-			virtqueue_notify(rxvq);
-			PMD_RX_LOG(DEBUG, "Notified\n");
-		}
-	}
-
-	return nb_rx;
-}
-
-uint16_t
-virtio_recv_mergeable_pkts(void *rx_queue,
-			struct rte_mbuf **rx_pkts,
-			uint16_t nb_pkts)
-{
-	struct virtqueue *rxvq = rx_queue;
-	struct virtio_hw *hw;
-	struct rte_mbuf *rxm, *new_mbuf;
-	uint16_t nb_used, num, nb_rx;
-	uint32_t len[VIRTIO_MBUF_BURST_SZ];
-	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
-	struct rte_mbuf *prev;
-	int error;
-	uint32_t i, nb_enqueued;
-	uint32_t seg_num;
-	uint16_t extra_idx;
-	uint32_t seg_res;
-	const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-
-	nb_used = VIRTQUEUE_NUSED(rxvq);
-
-	virtio_rmb();
-
-	if (nb_used == 0)
-		return 0;
-
-	PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
-
-	hw = rxvq->hw;
-	nb_rx = 0;
-	i = 0;
-	nb_enqueued = 0;
-	seg_num = 0;
-	extra_idx = 0;
-	seg_res = 0;
-
-	while (i < nb_used) {
-		struct virtio_net_hdr_mrg_rxbuf *header;
-
-		if (nb_rx == nb_pkts)
-			break;
-
-		num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, 1);
-		if (num != 1)
-			continue;
-
-		i++;
-
-		PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
-		PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
-
-		rxm = rcv_pkts[0];
-
-		if (unlikely(len[0] < hdr_size + ETHER_HDR_LEN)) {
-			PMD_RX_LOG(ERR, "Packet drop\n");
-			nb_enqueued++;
-			virtio_discard_rxbuf(rxvq, rxm);
-			rxvq->errors++;
-			continue;
-		}
-
-		header = (struct virtio_net_hdr_mrg_rxbuf *)((char *)rxm->buf_addr +
-			RTE_PKTMBUF_HEADROOM - hdr_size);
-		seg_num = header->num_buffers;
-
-		if (seg_num == 0)
-			seg_num = 1;
-
-		rxm->data_off = RTE_PKTMBUF_HEADROOM;
-		rxm->nb_segs = seg_num;
-		rxm->next = NULL;
-		rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
-		rxm->data_len = (uint16_t)(len[0] - hdr_size);
-
-		rxm->port = rxvq->port_id;
-		rx_pkts[nb_rx] = rxm;
-		prev = rxm;
-
-		seg_res = seg_num - 1;
-
-		while (seg_res != 0) {
-			/*
-			 * Get extra segments for current uncompleted packet.
-			 */
-			uint16_t  rcv_cnt =
-				RTE_MIN(seg_res, RTE_DIM(rcv_pkts));
-			if (likely(VIRTQUEUE_NUSED(rxvq) >= rcv_cnt)) {
-				uint32_t rx_num =
-					virtqueue_dequeue_burst_rx(rxvq,
-					rcv_pkts, len, rcv_cnt);
-				i += rx_num;
-				rcv_cnt = rx_num;
-			} else {
-				PMD_RX_LOG(ERR,
-					"No enough segments for packet.\n");
-				nb_enqueued++;
-				virtio_discard_rxbuf(rxvq, rxm);
-				rxvq->errors++;
-				break;
-			}
-
-			extra_idx = 0;
-
-			while (extra_idx < rcv_cnt) {
-				rxm = rcv_pkts[extra_idx];
-
-				rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
-				rxm->next = NULL;
-				rxm->pkt_len = (uint32_t)(len[extra_idx]);
-				rxm->data_len = (uint16_t)(len[extra_idx]);
-
-				if (prev)
-					prev->next = rxm;
-
-				prev = rxm;
-				rx_pkts[nb_rx]->pkt_len += rxm->pkt_len;
-				extra_idx++;
-			};
-			seg_res -= rcv_cnt;
-		}
-
-		if (hw->vlan_strip)
-			rte_vlan_strip(rx_pkts[nb_rx]);
-
-		VIRTIO_DUMP_PACKET(rx_pkts[nb_rx],
-			rx_pkts[nb_rx]->data_len);
-
-		rxvq->bytes += rx_pkts[nb_rx]->pkt_len;
-		nb_rx++;
-	}
-
-	rxvq->packets += nb_rx;
-
-	/* Allocate new mbuf for the used descriptor */
-	error = ENOSPC;
-	while (likely(!virtqueue_full(rxvq))) {
-		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
-		if (unlikely(new_mbuf == NULL)) {
-			struct rte_eth_dev *dev
-				= &rte_eth_devices[rxvq->port_id];
-			dev->data->rx_mbuf_alloc_failed++;
-			break;
-		}
-		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
-		if (unlikely(error)) {
-			rte_pktmbuf_free(new_mbuf);
-			break;
-		}
-		nb_enqueued++;
-	}
-
-	if (likely(nb_enqueued)) {
-		vq_update_avail_idx(rxvq);
-
-		if (unlikely(virtqueue_kick_prepare(rxvq))) {
-			virtqueue_notify(rxvq);
-			PMD_RX_LOG(DEBUG, "Notified");
-		}
-	}
-
-	return nb_rx;
-}
-
-uint16_t
-virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-	struct virtqueue *txvq = tx_queue;
-	struct rte_mbuf *txm;
-	uint16_t nb_used, nb_tx;
-	int error;
-
-	if (unlikely(nb_pkts < 1))
-		return nb_pkts;
-
-	PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
-	nb_used = VIRTQUEUE_NUSED(txvq);
-
-	virtio_rmb();
-	if (likely(nb_used > txvq->vq_free_thresh))
-		virtio_xmit_cleanup(txvq, nb_used);
-
-	nb_tx = 0;
-
-	while (nb_tx < nb_pkts) {
-		/* Need one more descriptor for virtio header. */
-		int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
-
-		/*Positive value indicates it need free vring descriptors */
-		if (unlikely(need > 0)) {
-			nb_used = VIRTQUEUE_NUSED(txvq);
-			virtio_rmb();
-			need = RTE_MIN(need, (int)nb_used);
-
-			virtio_xmit_cleanup(txvq, need);
-			need = (int)tx_pkts[nb_tx]->nb_segs -
-				txvq->vq_free_cnt + 1;
-		}
-
-		/*
-		 * Zero or negative value indicates it has enough free
-		 * descriptors to use for transmitting.
-		 */
-		if (likely(need <= 0)) {
-			txm = tx_pkts[nb_tx];
-
-			/* Do VLAN tag insertion */
-			if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
-				error = rte_vlan_insert(&txm);
-				if (unlikely(error)) {
-					rte_pktmbuf_free(txm);
-					++nb_tx;
-					continue;
-				}
-			}
-
-			/* Enqueue Packet buffers */
-			error = virtqueue_enqueue_xmit(txvq, txm);
-			if (unlikely(error)) {
-				if (error == ENOSPC)
-					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
-				else if (error == EMSGSIZE)
-					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
-				else
-					PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
-				break;
-			}
-			nb_tx++;
-			txvq->bytes += txm->pkt_len;
-		} else {
-			PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
-			break;
-		}
-	}
-
-	txvq->packets += nb_tx;
-
-	if (likely(nb_tx)) {
-		vq_update_avail_idx(txvq);
-
-		if (unlikely(virtqueue_kick_prepare(txvq))) {
-			virtqueue_notify(txvq);
-			PMD_TX_LOG(DEBUG, "Notified backend after xmit");
-		}
-	}
-
-	return nb_tx;
-}
diff --git a/lib/librte_pmd_virtio/virtqueue.c b/lib/librte_pmd_virtio/virtqueue.c
deleted file mode 100644
index 8a3005f..0000000
--- a/lib/librte_pmd_virtio/virtqueue.c
+++ /dev/null
@@ -1,70 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-#include <stdint.h>
-
-#include <rte_mbuf.h>
-
-#include "virtqueue.h"
-#include "virtio_logs.h"
-#include "virtio_pci.h"
-
-void
-virtqueue_disable_intr(struct virtqueue *vq)
-{
-	/*
-	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
-	 * not to interrupt when it consumes packets
-	 * Note: this is only considered a hint to the host
-	 */
-	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
-}
-
-/*
- * Two types of mbuf to be cleaned:
- * 1) mbuf that has been consumed by backend but not used by virtio.
- * 2) mbuf that hasn't been consued by backend.
- */
-struct rte_mbuf *
-virtqueue_detatch_unused(struct virtqueue *vq)
-{
-	struct rte_mbuf *cookie;
-	int idx;
-
-	for (idx = 0; idx < vq->vq_nentries; idx++) {
-		if ((cookie = vq->vq_descx[idx].cookie) != NULL) {
-			vq->vq_descx[idx].cookie = NULL;
-			return cookie;
-		}
-	}
-	return NULL;
-}
diff --git a/lib/librte_pmd_virtio/virtqueue.h b/lib/librte_pmd_virtio/virtqueue.h
deleted file mode 100644
index 9d6079e..0000000
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ /dev/null
@@ -1,325 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTQUEUE_H_
-#define _VIRTQUEUE_H_
-
-#include <stdint.h>
-
-#include <rte_atomic.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_mempool.h>
-
-#include "virtio_pci.h"
-#include "virtio_ring.h"
-#include "virtio_logs.h"
-
-struct rte_mbuf;
-
-/*
- * Per virtio_config.h in Linux.
- *     For virtio_pci on SMP, we don't need to order with respect to MMIO
- *     accesses through relaxed memory I/O windows, so smp_mb() et al are
- *     sufficient.
- *
- * This driver is for virtio_pci on SMP and therefore can assume
- * weaker (compiler barriers)
- */
-#define virtio_mb()	rte_mb()
-#define virtio_rmb()	rte_compiler_barrier()
-#define virtio_wmb()	rte_compiler_barrier()
-
-#ifdef RTE_PMD_PACKET_PREFETCH
-#define rte_packet_prefetch(p)  rte_prefetch1(p)
-#else
-#define rte_packet_prefetch(p)  do {} while(0)
-#endif
-
-#define VIRTQUEUE_MAX_NAME_SZ 32
-
-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-	(uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
-#define VTNET_SQ_RQ_QUEUE_IDX 0
-#define VTNET_SQ_TQ_QUEUE_IDX 1
-#define VTNET_SQ_CQ_QUEUE_IDX 2
-
-enum { VTNET_RQ = 0, VTNET_TQ = 1, VTNET_CQ = 2 };
-/**
- * The maximum virtqueue size is 2^15. Use that value as the end of
- * descriptor chain terminator since it will never be a valid index
- * in the descriptor table. This is used to verify we are correctly
- * handling vq_free_cnt.
- */
-#define VQ_RING_DESC_CHAIN_END 32768
-
-/**
- * Control the RX mode, ie. promiscuous, allmulti, etc...
- * All commands require an "out" sg entry containing a 1 byte
- * state value, zero = disable, non-zero = enable.  Commands
- * 0 and 1 are supported with the VIRTIO_NET_F_CTRL_RX feature.
- * Commands 2-5 are added with VIRTIO_NET_F_CTRL_RX_EXTRA.
- */
-#define VIRTIO_NET_CTRL_RX              0
-#define VIRTIO_NET_CTRL_RX_PROMISC      0
-#define VIRTIO_NET_CTRL_RX_ALLMULTI     1
-#define VIRTIO_NET_CTRL_RX_ALLUNI       2
-#define VIRTIO_NET_CTRL_RX_NOMULTI      3
-#define VIRTIO_NET_CTRL_RX_NOUNI        4
-#define VIRTIO_NET_CTRL_RX_NOBCAST      5
-
-/**
- * Control the MAC
- *
- * The MAC filter table is managed by the hypervisor, the guest should
- * assume the size is infinite.  Filtering should be considered
- * non-perfect, ie. based on hypervisor resources, the guest may
- * received packets from sources not specified in the filter list.
- *
- * In addition to the class/cmd header, the TABLE_SET command requires
- * two out scatterlists.  Each contains a 4 byte count of entries followed
- * by a concatenated byte stream of the ETH_ALEN MAC addresses.  The
- * first sg list contains unicast addresses, the second is for multicast.
- * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
- * is available.
- *
- * The ADDR_SET command requests one out scatterlist, it contains a
- * 6 bytes MAC address. This functionality is present if the
- * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
- */
-struct virtio_net_ctrl_mac {
-	uint32_t entries;
-	uint8_t macs[][ETHER_ADDR_LEN];
-} __attribute__((__packed__));
-
-#define VIRTIO_NET_CTRL_MAC    1
- #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0
- #define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
-
-/**
- * Control VLAN filtering
- *
- * The VLAN filter table is controlled via a simple ADD/DEL interface.
- * VLAN IDs not added may be filtered by the hypervisor.  Del is the
- * opposite of add.  Both commands expect an out entry containing a 2
- * byte VLAN ID.  VLAN filtering is available with the
- * VIRTIO_NET_F_CTRL_VLAN feature bit.
- */
-#define VIRTIO_NET_CTRL_VLAN     2
-#define VIRTIO_NET_CTRL_VLAN_ADD 0
-#define VIRTIO_NET_CTRL_VLAN_DEL 1
-
-struct virtio_net_ctrl_hdr {
-	uint8_t class;
-	uint8_t cmd;
-} __attribute__((packed));
-
-typedef uint8_t virtio_net_ctrl_ack;
-
-#define VIRTIO_NET_OK     0
-#define VIRTIO_NET_ERR    1
-
-#define VIRTIO_MAX_CTRL_DATA 2048
-
-struct virtio_pmd_ctrl {
-	struct virtio_net_ctrl_hdr hdr;
-	virtio_net_ctrl_ack status;
-	uint8_t data[VIRTIO_MAX_CTRL_DATA];
-};
-
-struct virtqueue {
-	struct virtio_hw         *hw;     /**< virtio_hw structure pointer. */
-	const struct rte_memzone *mz;     /**< mem zone to populate RX ring. */
-	const struct rte_memzone *virtio_net_hdr_mz; /**< memzone to populate hdr. */
-	struct rte_mempool       *mpool;  /**< mempool for mbuf allocation */
-	uint16_t    queue_id;             /**< DPDK queue index. */
-	uint8_t     port_id;              /**< Device port identifier. */
-	uint16_t    vq_queue_index;       /**< PCI queue index */
-
-	void        *vq_ring_virt_mem;    /**< linear address of vring*/
-	unsigned int vq_ring_size;
-	phys_addr_t vq_ring_mem;          /**< physical address of vring */
-
-	struct vring vq_ring;    /**< vring keeping desc, used and avail */
-	uint16_t    vq_free_cnt; /**< num of desc available */
-	uint16_t    vq_nentries; /**< vring desc numbers */
-	uint16_t    vq_free_thresh; /**< free threshold */
-	/**
-	 * Head of the free chain in the descriptor table. If
-	 * there are no free descriptors, this will be set to
-	 * VQ_RING_DESC_CHAIN_END.
-	 */
-	uint16_t  vq_desc_head_idx;
-	uint16_t  vq_desc_tail_idx;
-	/**
-	 * Last consumed descriptor in the used table,
-	 * trails vq_ring.used->idx.
-	 */
-	uint16_t vq_used_cons_idx;
-	uint16_t vq_avail_idx;
-	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
-
-	/* Statistics */
-	uint64_t	packets;
-	uint64_t	bytes;
-	uint64_t	errors;
-
-	struct vq_desc_extra {
-		void              *cookie;
-		uint16_t          ndescs;
-	} vq_descx[0];
-};
-
-/* If multiqueue is provided by host, then we suppport it. */
-#ifndef VIRTIO_NET_F_MQ
-/* Device supports Receive Flow Steering */
-#define VIRTIO_NET_F_MQ 0x400000
-#define VIRTIO_NET_CTRL_MQ   4
-#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
-#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
-#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
-#endif
-#ifndef VIRTIO_NET_F_CTRL_MAC_ADDR
-#define VIRTIO_NET_F_CTRL_MAC_ADDR 0x800000
-#define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
-#endif
-
-/**
- * This is the first element of the scatter-gather list.  If you don't
- * specify GSO or CSUM features, you can simply ignore the header.
- */
-struct virtio_net_hdr {
-#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
-	uint8_t flags;
-#define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
-#define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
-#define VIRTIO_NET_HDR_GSO_UDP      3    /**< GSO frame, IPv4 UDP (UFO) */
-#define VIRTIO_NET_HDR_GSO_TCPV6    4    /**< GSO frame, IPv6 TCP */
-#define VIRTIO_NET_HDR_GSO_ECN      0x80 /**< TCP has ECN set */
-	uint8_t gso_type;
-	uint16_t hdr_len;     /**< Ethernet + IP + tcp/udp hdrs */
-	uint16_t gso_size;    /**< Bytes to append to hdr_len per frame */
-	uint16_t csum_start;  /**< Position to start checksumming from */
-	uint16_t csum_offset; /**< Offset after that to place checksum */
-};
-
-/**
- * This is the version of the header to use when the MRG_RXBUF
- * feature has been negotiated.
- */
-struct virtio_net_hdr_mrg_rxbuf {
-	struct   virtio_net_hdr hdr;
-	uint16_t num_buffers; /**< Number of merged rx buffers */
-};
-
-/**
- * Tell the backend not to interrupt us.
- */
-void virtqueue_disable_intr(struct virtqueue *vq);
-/**
- *  Dump virtqueue internal structures, for debug purpose only.
- */
-void virtqueue_dump(struct virtqueue *vq);
-/**
- *  Get all mbufs to be freed.
- */
-struct rte_mbuf *virtqueue_detatch_unused(struct virtqueue *vq);
-
-static inline int
-virtqueue_full(const struct virtqueue *vq)
-{
-	return vq->vq_free_cnt == 0;
-}
-
-#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
-
-static inline void
-vq_update_avail_idx(struct virtqueue *vq)
-{
-	virtio_wmb();
-	vq->vq_ring.avail->idx = vq->vq_avail_idx;
-}
-
-static inline void
-vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
-{
-	uint16_t avail_idx;
-	/*
-	 * Place the head of the descriptor chain into the next slot and make
-	 * it usable to the host. The chain is made available now rather than
-	 * deferring to virtqueue_notify() in the hopes that if the host is
-	 * currently running on another CPU, we can keep it processing the new
-	 * descriptor.
-	 */
-	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
-	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
-	vq->vq_avail_idx++;
-}
-
-static inline int
-virtqueue_kick_prepare(struct virtqueue *vq)
-{
-	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
-}
-
-static inline void
-virtqueue_notify(struct virtqueue *vq)
-{
-	/*
-	 * Ensure updated avail->idx is visible to host.
-	 * For virtio on IA, the notificaiton is through io port operation
-	 * which is a serialization instruction itself.
-	 */
-	VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_NOTIFY, vq->vq_queue_index);
-}
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
-#define VIRTQUEUE_DUMP(vq) do { \
-	uint16_t used_idx, nused; \
-	used_idx = (vq)->vq_ring.used->idx; \
-	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
-	PMD_INIT_LOG(DEBUG, \
-	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
-	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
-	  " avail.flags=0x%x; used.flags=0x%x", \
-	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
-	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
-	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
-	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
-} while (0)
-#else
-#define VIRTQUEUE_DUMP(vq) do { } while (0)
-#endif
-
-#endif /* _VIRTQUEUE_H_ */
-- 
2.1.0

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v4 0/6] update jhash function
  2015-05-13 13:52  0%     ` De Lara Guarch, Pablo
@ 2015-05-13 14:20  0%       ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-05-13 14:20 UTC (permalink / raw)
  To: De Lara Guarch, Pablo; +Cc: dev

On Wed, May 13, 2015 at 01:52:33PM +0000, De Lara Guarch, Pablo wrote:
> Hi Neil,
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Tuesday, May 12, 2015 4:33 PM
> > To: De Lara Guarch, Pablo
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v4 0/6] update jhash function
> > 
> > On Tue, May 12, 2015 at 12:02:32PM +0100, Pablo de Lara wrote:
> > > Jenkins hash function was developed originally in 1996,
> > > and was integrated in first versions of DPDK.
> > > The function has been improved in 2006,
> > > achieving up to 60% better performance, compared to the original one.
> > >
> > > This patchset updates the current jhash in DPDK,
> > > including two new functions that generate two hashes from a single key.
> > >
> > > It also separates the existing hash function performance tests to
> > > another file, to make it quicker to run.
> > >
> > > changes in v4:
> > > - Simplify key alignment checks
> > > - Include missing x86 arch check
> > >
> > > changes in v3:
> > >
> > > - Update rte_jhash_1word, rte_jhash_2words and rte_jhash_3words
> > >   functions
> > >
> > > changes in v2:
> > >
> > > - Split single commit in three commits, one that updates the existing
> > functions
> > >   and another that adds two new functions and use one of those functions
> > >   as a base to be called by the other ones.
> > > - Remove some unnecessary ifdefs in the code.
> > > - Add new macros to help on the reutilization of constants
> > > - Separate hash function performance tests to another file
> > >   and improve cycle measurements.
> > > - Rename existing function rte_jhash2 to rte_jhash_32b
> > >   (something more meaninful) and mark rte_jhash2 as
> > >   deprecated
> > >
> > > Pablo de Lara (6):
> > >   test/hash: move hash function perf tests to separate file
> > >   test/hash: improve accuracy on cycle measurements
> > >   hash: update jhash function with the latest available
> > >   hash: add two new functions to jhash library
> > >   hash: remove duplicated code
> > >   hash: rename rte_jhash2 to rte_jhash_32b
> > >
> > >  app/test/Makefile               |    1 +
> > >  app/test/test_func_reentrancy.c |    2 +-
> > >  app/test/test_hash.c            |    4 +-
> > >  app/test/test_hash_func_perf.c  |  145 +++++++++++++++++
> > >  app/test/test_hash_perf.c       |   71 +--------
> > >  lib/librte_hash/rte_jhash.h     |  338 +++++++++++++++++++++++++++++-
> > ---------
> > >  6 files changed, 402 insertions(+), 159 deletions(-)
> > >  create mode 100644 app/test/test_hash_func_perf.c
> > >
> > > --
> > > 1.7.4.1
> > >
> > >
> > did you run this through the ABI checker?  I see you're removing several
> > symbols
> > that will likely need to go through the ABI deprecation process.
> > 
> > Neil
> 
> I had not run it, but I just did. I see no problems on librte_hash
> (but I see some on rte_ethdev.h, due to another commit).
> 
> Anyway, I renamed two functions to be more meaningful, but those functions are "static inline", 
> so I am not sure exactly what the deprecation process is for those.
> What I did was leaving the original function that calls the same function as the new renamed one,
> but adds a line warning that the functions is deprecated.
> 
> Is that OK or should I do it differently?
> 
As long as their all static inline and binaries that are already compiled can
continue to access the data structures they reference at the member offsets
encoded to them at compile time, you should be ok.

Thanks!
Neil

> Thanks!
> Pablo
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 0/6] update jhash function
  2015-05-12 15:33  4%   ` Neil Horman
@ 2015-05-13 13:52  0%     ` De Lara Guarch, Pablo
  2015-05-13 14:20  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: De Lara Guarch, Pablo @ 2015-05-13 13:52 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

Hi Neil,

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, May 12, 2015 4:33 PM
> To: De Lara Guarch, Pablo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 0/6] update jhash function
> 
> On Tue, May 12, 2015 at 12:02:32PM +0100, Pablo de Lara wrote:
> > Jenkins hash function was developed originally in 1996,
> > and was integrated in first versions of DPDK.
> > The function has been improved in 2006,
> > achieving up to 60% better performance, compared to the original one.
> >
> > This patchset updates the current jhash in DPDK,
> > including two new functions that generate two hashes from a single key.
> >
> > It also separates the existing hash function performance tests to
> > another file, to make it quicker to run.
> >
> > changes in v4:
> > - Simplify key alignment checks
> > - Include missing x86 arch check
> >
> > changes in v3:
> >
> > - Update rte_jhash_1word, rte_jhash_2words and rte_jhash_3words
> >   functions
> >
> > changes in v2:
> >
> > - Split single commit in three commits, one that updates the existing
> functions
> >   and another that adds two new functions and use one of those functions
> >   as a base to be called by the other ones.
> > - Remove some unnecessary ifdefs in the code.
> > - Add new macros to help on the reutilization of constants
> > - Separate hash function performance tests to another file
> >   and improve cycle measurements.
> > - Rename existing function rte_jhash2 to rte_jhash_32b
> >   (something more meaninful) and mark rte_jhash2 as
> >   deprecated
> >
> > Pablo de Lara (6):
> >   test/hash: move hash function perf tests to separate file
> >   test/hash: improve accuracy on cycle measurements
> >   hash: update jhash function with the latest available
> >   hash: add two new functions to jhash library
> >   hash: remove duplicated code
> >   hash: rename rte_jhash2 to rte_jhash_32b
> >
> >  app/test/Makefile               |    1 +
> >  app/test/test_func_reentrancy.c |    2 +-
> >  app/test/test_hash.c            |    4 +-
> >  app/test/test_hash_func_perf.c  |  145 +++++++++++++++++
> >  app/test/test_hash_perf.c       |   71 +--------
> >  lib/librte_hash/rte_jhash.h     |  338 +++++++++++++++++++++++++++++-
> ---------
> >  6 files changed, 402 insertions(+), 159 deletions(-)
> >  create mode 100644 app/test/test_hash_func_perf.c
> >
> > --
> > 1.7.4.1
> >
> >
> did you run this through the ABI checker?  I see you're removing several
> symbols
> that will likely need to go through the ABI deprecation process.
> 
> Neil

I had not run it, but I just did. I see no problems on librte_hash
(but I see some on rte_ethdev.h, due to another commit).

Anyway, I renamed two functions to be more meaningful, but those functions are "static inline", 
so I am not sure exactly what the deprecation process is for those.
What I did was leaving the original function that calls the same function as the new renamed one,
but adds a line warning that the functions is deprecated.

Is that OK or should I do it differently?

Thanks!
Pablo

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 0/3] eal: uio irq fixes and enhancements
  2015-05-13  8:57  3%   ` Bruce Richardson
@ 2015-05-13  9:32  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-05-13  9:32 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

2015-05-13 09:57, Bruce Richardson:
> On Tue, May 12, 2015 at 10:02:20PM +0200, Thomas Monjalon wrote:
> > 2015-04-28 09:36, Stephen Hemminger:
> > > This set of patches starts out with fixing a regression where
> > > uio_pci_generic broke link state interrupt, then adds better
> > > management of PCI config space.
> > > 
> > > Will leave up to document writers to update various release
> > > notes and API manuals as they see fit.
> > > 
> > > Also, needs what ever shared library map file updates which
> > > maybe required when using dynamic libraries. But that should
> > > not stop acceptance of this patch set.
> > 
> > No, an incomplete patch cannot be accepted.
> > There are several solutions:
> > - Siobhan and Neil accept to work on doc and .map file
> > - You provide a good v2
> > - Someone else finish this patchset
> > - The bug remains (not a solution)
> 
> Merge patch one on it's own to fix the issue? I don't think patch 1 requires
> any further doc or ABI map file changes, does it?

Yes you're right.
First patch is now applied.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 0/3] eal: uio irq fixes and enhancements
  @ 2015-05-13  8:57  3%   ` Bruce Richardson
  2015-05-13  9:32  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-05-13  8:57 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Tue, May 12, 2015 at 10:02:20PM +0200, Thomas Monjalon wrote:
> 2015-04-28 09:36, Stephen Hemminger:
> > This set of patches starts out with fixing a regression where
> > uio_pci_generic broke link state interrupt, then adds better
> > management of PCI config space.
> > 
> > Will leave up to document writers to update various release
> > notes and API manuals as they see fit.
> > 
> > Also, needs what ever shared library map file updates which
> > maybe required when using dynamic libraries. But that should
> > not stop acceptance of this patch set.
> 
> No, an incomplete patch cannot be accepted.
> There are several solutions:
> - Siobhan and Neil accept to work on doc and .map file
> - You provide a good v2
> - Someone else finish this patchset
> - The bug remains (not a solution)

Merge patch one on it's own to fix the issue? I don't think patch 1 requires
any further doc or ABI map file changes, does it?

/Bruce

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 14/19] virtio: move virtio PMD to drivers directory
  @ 2015-05-12 17:05  1%   ` Bruce Richardson
    1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2015-05-12 17:05 UTC (permalink / raw)
  To: dev

Move virtio PMD to drivers directory

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/Makefile                                 |    2 +-
 drivers/virtio/Makefile                          |   60 +
 drivers/virtio/rte_pmd_virtio_version.map        |    4 +
 drivers/virtio/virtio_ethdev.c                   | 1504 ++++++++++++++++++++++
 drivers/virtio/virtio_ethdev.h                   |  124 ++
 drivers/virtio/virtio_logs.h                     |   70 +
 drivers/virtio/virtio_pci.c                      |  147 +++
 drivers/virtio/virtio_pci.h                      |  270 ++++
 drivers/virtio/virtio_ring.h                     |  163 +++
 drivers/virtio/virtio_rxtx.c                     |  815 ++++++++++++
 drivers/virtio/virtqueue.c                       |   70 +
 drivers/virtio/virtqueue.h                       |  325 +++++
 lib/Makefile                                     |    1 -
 lib/librte_pmd_virtio/Makefile                   |   60 -
 lib/librte_pmd_virtio/rte_pmd_virtio_version.map |    4 -
 lib/librte_pmd_virtio/virtio_ethdev.c            | 1504 ----------------------
 lib/librte_pmd_virtio/virtio_ethdev.h            |  124 --
 lib/librte_pmd_virtio/virtio_logs.h              |   70 -
 lib/librte_pmd_virtio/virtio_pci.c               |  147 ---
 lib/librte_pmd_virtio/virtio_pci.h               |  270 ----
 lib/librte_pmd_virtio/virtio_ring.h              |  163 ---
 lib/librte_pmd_virtio/virtio_rxtx.c              |  815 ------------
 lib/librte_pmd_virtio/virtqueue.c                |   70 -
 lib/librte_pmd_virtio/virtqueue.h                |  325 -----
 24 files changed, 3553 insertions(+), 3554 deletions(-)
 create mode 100644 drivers/virtio/Makefile
 create mode 100644 drivers/virtio/rte_pmd_virtio_version.map
 create mode 100644 drivers/virtio/virtio_ethdev.c
 create mode 100644 drivers/virtio/virtio_ethdev.h
 create mode 100644 drivers/virtio/virtio_logs.h
 create mode 100644 drivers/virtio/virtio_pci.c
 create mode 100644 drivers/virtio/virtio_pci.h
 create mode 100644 drivers/virtio/virtio_ring.h
 create mode 100644 drivers/virtio/virtio_rxtx.c
 create mode 100644 drivers/virtio/virtqueue.c
 create mode 100644 drivers/virtio/virtqueue.h
 delete mode 100644 lib/librte_pmd_virtio/Makefile
 delete mode 100644 lib/librte_pmd_virtio/rte_pmd_virtio_version.map
 delete mode 100644 lib/librte_pmd_virtio/virtio_ethdev.c
 delete mode 100644 lib/librte_pmd_virtio/virtio_ethdev.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_logs.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_pci.c
 delete mode 100644 lib/librte_pmd_virtio/virtio_pci.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_ring.h
 delete mode 100644 lib/librte_pmd_virtio/virtio_rxtx.c
 delete mode 100644 lib/librte_pmd_virtio/virtqueue.c
 delete mode 100644 lib/librte_pmd_virtio/virtqueue.h

diff --git a/drivers/Makefile b/drivers/Makefile
index 567d77f..7d848e1 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -42,7 +42,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += ring
-#DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
+DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 #DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 #DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
new file mode 100644
index 0000000..21ff7e5
--- /dev/null
+++ b/drivers/virtio/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_virtio.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_virtio_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
+
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_net lib/librte_malloc
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/virtio/rte_pmd_virtio_version.map b/drivers/virtio/rte_pmd_virtio_version.map
new file mode 100644
index 0000000..ef35398
--- /dev/null
+++ b/drivers/virtio/rte_pmd_virtio_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+
+	local: *;
+};
diff --git a/drivers/virtio/virtio_ethdev.c b/drivers/virtio/virtio_ethdev.c
new file mode 100644
index 0000000..e63dbfb
--- /dev/null
+++ b/drivers/virtio/virtio_ethdev.c
@@ -0,0 +1,1504 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#ifdef RTE_EXEC_ENV_LINUXAPP
+#include <dirent.h>
+#include <fcntl.h>
+#endif
+
+#include <rte_ethdev.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_pci.h>
+#include <rte_ether.h>
+#include <rte_common.h>
+
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_dev.h>
+
+#include "virtio_ethdev.h"
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+#include "virtqueue.h"
+
+
+static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
+static int  virtio_dev_configure(struct rte_eth_dev *dev);
+static int  virtio_dev_start(struct rte_eth_dev *dev);
+static void virtio_dev_stop(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
+static void virtio_dev_info_get(struct rte_eth_dev *dev,
+				struct rte_eth_dev_info *dev_info);
+static int virtio_dev_link_update(struct rte_eth_dev *dev,
+	__rte_unused int wait_to_complete);
+
+static void virtio_set_hwaddr(struct virtio_hw *hw);
+static void virtio_get_hwaddr(struct virtio_hw *hw);
+
+static void virtio_dev_rx_queue_release(__rte_unused void *rxq);
+static void virtio_dev_tx_queue_release(__rte_unused void *txq);
+
+static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
+static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
+static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
+static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
+				uint16_t vlan_id, int on);
+static void virtio_mac_addr_add(struct rte_eth_dev *dev,
+				struct ether_addr *mac_addr,
+				uint32_t index, uint32_t vmdq __rte_unused);
+static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+static void virtio_mac_addr_set(struct rte_eth_dev *dev,
+				struct ether_addr *mac_addr);
+
+static int virtio_dev_queue_stats_mapping_set(
+	__rte_unused struct rte_eth_dev *eth_dev,
+	__rte_unused uint16_t queue_id,
+	__rte_unused uint8_t stat_idx,
+	__rte_unused uint8_t is_rx);
+
+/*
+ * The set of PCI devices this driver supports
+ */
+static const struct rte_pci_id pci_id_virtio_map[] = {
+
+#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
+#include "rte_pci_dev_ids.h"
+
+{ .vendor_id = 0, /* sentinel */ },
+};
+
+static int
+virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
+		int *dlen, int pkt_num)
+{
+	uint16_t head = vq->vq_desc_head_idx, i;
+	int k, sum = 0;
+	virtio_net_ctrl_ack status = ~0;
+	struct virtio_pmd_ctrl result;
+
+	ctrl->status = status;
+
+	if (!vq->hw->cvq) {
+		PMD_INIT_LOG(ERR,
+			     "%s(): Control queue is not supported.",
+			     __func__);
+		return -1;
+	}
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
+		"vq->hw->cvq = %p vq = %p",
+		vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+	if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1))
+		return -1;
+
+	memcpy(vq->virtio_net_hdr_mz->addr, ctrl,
+		sizeof(struct virtio_pmd_ctrl));
+
+	/*
+	 * Format is enforced in qemu code:
+	 * One TX packet for header;
+	 * At least one TX packet per argument;
+	 * One RX packet for ACK.
+	 */
+	vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
+	vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+	vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_free_cnt--;
+	i = vq->vq_ring.desc[head].next;
+
+	for (k = 0; k < pkt_num; k++) {
+		vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
+		vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+			+ sizeof(struct virtio_net_ctrl_hdr)
+			+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
+		vq->vq_ring.desc[i].len = dlen[k];
+		sum += dlen[k];
+		vq->vq_free_cnt--;
+		i = vq->vq_ring.desc[i].next;
+	}
+
+	vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+	vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+			+ sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_ring.desc[i].len = sizeof(ctrl->status);
+	vq->vq_free_cnt--;
+
+	vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
+
+	vq_update_avail_ring(vq, head);
+	vq_update_avail_idx(vq);
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index);
+
+	virtqueue_notify(vq);
+
+	rte_rmb();
+	while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+		rte_rmb();
+		usleep(100);
+	}
+
+	while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
+		uint32_t idx, desc_idx, used_idx;
+		struct vring_used_elem *uep;
+
+		used_idx = (uint32_t)(vq->vq_used_cons_idx
+				& (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+		idx = (uint32_t) uep->id;
+		desc_idx = idx;
+
+		while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
+			desc_idx = vq->vq_ring.desc[desc_idx].next;
+			vq->vq_free_cnt++;
+		}
+
+		vq->vq_ring.desc[desc_idx].next = vq->vq_desc_head_idx;
+		vq->vq_desc_head_idx = idx;
+
+		vq->vq_used_cons_idx++;
+		vq->vq_free_cnt++;
+	}
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d",
+			vq->vq_free_cnt, vq->vq_desc_head_idx);
+
+	memcpy(&result, vq->virtio_net_hdr_mz->addr,
+			sizeof(struct virtio_pmd_ctrl));
+
+	return result.status;
+}
+
+static int
+virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
+	memcpy(ctrl.data, &nb_queues, sizeof(uint16_t));
+
+	dlen[0] = sizeof(uint16_t);
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret) {
+		PMD_INIT_LOG(ERR, "Multiqueue configured but send command "
+			  "failed, this is too late now...");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int virtio_dev_queue_setup(struct rte_eth_dev *dev,
+			int queue_type,
+			uint16_t queue_idx,
+			uint16_t  vtpci_queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			struct virtqueue **pvq)
+{
+	char vq_name[VIRTQUEUE_MAX_NAME_SZ];
+	const struct rte_memzone *mz;
+	uint16_t vq_size;
+	int size;
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtqueue  *vq = NULL;
+
+	/* Write the virtqueue index to the Queue Select Field */
+	VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
+	PMD_INIT_LOG(DEBUG, "selecting queue: %d", vtpci_queue_idx);
+
+	/*
+	 * Read the virtqueue size from the Queue Size field
+	 * Always power of 2 and if 0 virtqueue does not exist
+	 */
+	vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
+	PMD_INIT_LOG(DEBUG, "vq_size: %d nb_desc:%d", vq_size, nb_desc);
+	if (nb_desc == 0)
+		nb_desc = vq_size;
+	if (vq_size == 0) {
+		PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
+		return -EINVAL;
+	} else if (!rte_is_power_of_2(vq_size)) {
+		PMD_INIT_LOG(ERR, "%s: virtqueue size is not powerof 2", __func__);
+		return -EINVAL;
+	} else if (nb_desc != vq_size) {
+		PMD_INIT_LOG(ERR, "Warning: nb_desc(%d) is not equal to vq size (%d), fall to vq size",
+			nb_desc, vq_size);
+		nb_desc = vq_size;
+	}
+
+	if (queue_type == VTNET_RQ) {
+		snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
+			dev->data->port_id, queue_idx);
+		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+	} else if (queue_type == VTNET_TQ) {
+		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
+			dev->data->port_id, queue_idx);
+		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
+	} else if (queue_type == VTNET_CQ) {
+		snprintf(vq_name, sizeof(vq_name), "port%d_cvq",
+			dev->data->port_id);
+		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+			vq_size * sizeof(struct vq_desc_extra),
+			RTE_CACHE_LINE_SIZE);
+	}
+	if (vq == NULL) {
+		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
+		return (-ENOMEM);
+	}
+
+	vq->hw = hw;
+	vq->port_id = dev->data->port_id;
+	vq->queue_id = queue_idx;
+	vq->vq_queue_index = vtpci_queue_idx;
+	vq->vq_nentries = vq_size;
+	vq->vq_free_cnt = vq_size;
+
+	/*
+	 * Reserve a memzone for vring elements
+	 */
+	size = vring_size(vq_size, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
+	PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d", size, vq->vq_ring_size);
+
+	mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
+		socket_id, 0, VIRTIO_PCI_VRING_ALIGN);
+	if (mz == NULL) {
+		rte_free(vq);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((mz->phys_addr + vq->vq_ring_size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+		rte_free(vq);
+		return -ENOMEM;
+	}
+
+	memset(mz->addr, 0, sizeof(mz->len));
+	vq->mz = mz;
+	vq->vq_ring_mem = mz->phys_addr;
+	vq->vq_ring_virt_mem = mz->addr;
+	PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:      0x%"PRIx64, (uint64_t)mz->phys_addr);
+	PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, (uint64_t)mz->addr);
+	vq->virtio_net_hdr_mz  = NULL;
+	vq->virtio_net_hdr_mem = 0;
+
+	if (queue_type == VTNET_TQ) {
+		/*
+		 * For each xmit packet, allocate a virtio_net_hdr
+		 */
+		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
+			dev->data->port_id, queue_idx);
+		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
+			vq_size * hw->vtnet_hdr_size,
+			socket_id, 0, RTE_CACHE_LINE_SIZE);
+		if (vq->virtio_net_hdr_mz == NULL) {
+			rte_free(vq);
+			return -ENOMEM;
+		}
+		vq->virtio_net_hdr_mem =
+			vq->virtio_net_hdr_mz->phys_addr;
+		memset(vq->virtio_net_hdr_mz->addr, 0,
+			vq_size * hw->vtnet_hdr_size);
+	} else if (queue_type == VTNET_CQ) {
+		/* Allocate a page for control vq command, data and status */
+		snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
+			dev->data->port_id);
+		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
+			PAGE_SIZE, socket_id, 0, RTE_CACHE_LINE_SIZE);
+		if (vq->virtio_net_hdr_mz == NULL) {
+			rte_free(vq);
+			return -ENOMEM;
+		}
+		vq->virtio_net_hdr_mem =
+			vq->virtio_net_hdr_mz->phys_addr;
+		memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
+	}
+
+	/*
+	 * Set guest physical address of the virtqueue
+	 * in VIRTIO_PCI_QUEUE_PFN config register of device
+	 */
+	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
+			mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	*pvq = vq;
+	return 0;
+}
+
+static int
+virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx,
+		uint32_t socket_id)
+{
+	struct virtqueue *vq;
+	uint16_t nb_desc = 0;
+	int ret;
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	PMD_INIT_FUNC_TRACE();
+	ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
+			vtpci_queue_idx, nb_desc, socket_id, &vq);
+
+	if (ret < 0) {
+		PMD_INIT_LOG(ERR, "control vq initialization failed");
+		return ret;
+	}
+
+	hw->cvq = vq;
+	return 0;
+}
+
+static void
+virtio_dev_close(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct rte_pci_device *pci_dev = dev->pci_dev;
+
+	PMD_INIT_LOG(DEBUG, "virtio_dev_close");
+
+	/* reset the NIC */
+	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+	vtpci_reset(hw);
+	hw->started = 0;
+	virtio_dev_free_mbufs(dev);
+}
+
+static void
+virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+	ctrl.data[0] = 1;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to enable promisc");
+}
+
+static void
+virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+	ctrl.data[0] = 0;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to disable promisc");
+}
+
+static void
+virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+	ctrl.data[0] = 1;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to enable allmulticast");
+}
+
+static void
+virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+	int ret;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+	ctrl.data[0] = 0;
+	dlen[0] = 1;
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+	if (ret)
+		PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
+}
+
+/*
+ * dev_ops for virtio, bare necessities for basic operation
+ */
+static const struct eth_dev_ops virtio_eth_dev_ops = {
+	.dev_configure           = virtio_dev_configure,
+	.dev_start               = virtio_dev_start,
+	.dev_stop                = virtio_dev_stop,
+	.dev_close               = virtio_dev_close,
+	.promiscuous_enable      = virtio_dev_promiscuous_enable,
+	.promiscuous_disable     = virtio_dev_promiscuous_disable,
+	.allmulticast_enable     = virtio_dev_allmulticast_enable,
+	.allmulticast_disable    = virtio_dev_allmulticast_disable,
+
+	.dev_infos_get           = virtio_dev_info_get,
+	.stats_get               = virtio_dev_stats_get,
+	.stats_reset             = virtio_dev_stats_reset,
+	.link_update             = virtio_dev_link_update,
+	.rx_queue_setup          = virtio_dev_rx_queue_setup,
+	/* meaningfull only to multiple queue */
+	.rx_queue_release        = virtio_dev_rx_queue_release,
+	.tx_queue_setup          = virtio_dev_tx_queue_setup,
+	/* meaningfull only to multiple queue */
+	.tx_queue_release        = virtio_dev_tx_queue_release,
+	/* collect stats per queue */
+	.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
+	.vlan_filter_set         = virtio_vlan_filter_set,
+	.mac_addr_add            = virtio_mac_addr_add,
+	.mac_addr_remove         = virtio_mac_addr_remove,
+	.mac_addr_set            = virtio_mac_addr_set,
+};
+
+static inline int
+virtio_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+				struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = link;
+	struct rte_eth_link *src = &(dev->data->dev_link);
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+			*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+/**
+ * Atomically writes the link status information into global
+ * structure rte_eth_dev.
+ *
+ * @param dev
+ *   - Pointer to the structure rte_eth_dev to read from.
+ *   - Pointer to the buffer to be saved with the link status.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, negative value.
+ */
+static inline int
+virtio_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+		struct rte_eth_link *link)
+{
+	struct rte_eth_link *dst = &(dev->data->dev_link);
+	struct rte_eth_link *src = link;
+
+	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+					*(uint64_t *)src) == 0)
+		return -1;
+
+	return 0;
+}
+
+static void
+virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	unsigned i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		const struct virtqueue *txvq = dev->data->tx_queues[i];
+		if (txvq == NULL)
+			continue;
+
+		stats->opackets += txvq->packets;
+		stats->obytes += txvq->bytes;
+		stats->oerrors += txvq->errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_opackets[i] = txvq->packets;
+			stats->q_obytes[i] = txvq->bytes;
+		}
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		const struct virtqueue *rxvq = dev->data->rx_queues[i];
+		if (rxvq == NULL)
+			continue;
+
+		stats->ipackets += rxvq->packets;
+		stats->ibytes += rxvq->bytes;
+		stats->ierrors += rxvq->errors;
+
+		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			stats->q_ipackets[i] = rxvq->packets;
+			stats->q_ibytes[i] = rxvq->bytes;
+		}
+	}
+
+	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
+}
+
+static void
+virtio_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct virtqueue *txvq = dev->data->tx_queues[i];
+		if (txvq == NULL)
+			continue;
+
+		txvq->packets = 0;
+		txvq->bytes = 0;
+		txvq->errors = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct virtqueue *rxvq = dev->data->rx_queues[i];
+		if (rxvq == NULL)
+			continue;
+
+		rxvq->packets = 0;
+		rxvq->bytes = 0;
+		rxvq->errors = 0;
+	}
+
+	dev->data->rx_mbuf_alloc_failed = 0;
+}
+
+static void
+virtio_set_hwaddr(struct virtio_hw *hw)
+{
+	vtpci_write_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&hw->mac_addr, ETHER_ADDR_LEN);
+}
+
+static void
+virtio_get_hwaddr(struct virtio_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC)) {
+		vtpci_read_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&hw->mac_addr, ETHER_ADDR_LEN);
+	} else {
+		eth_random_addr(&hw->mac_addr[0]);
+		virtio_set_hwaddr(hw);
+	}
+}
+
+static int
+virtio_mac_table_set(struct virtio_hw *hw,
+		     const struct virtio_net_ctrl_mac *uc,
+		     const struct virtio_net_ctrl_mac *mc)
+{
+	struct virtio_pmd_ctrl ctrl;
+	int err, len[2];
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+
+	len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
+	memcpy(ctrl.data, uc, len[0]);
+
+	len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
+	memcpy(ctrl.data + len[0], mc, len[1]);
+
+	err = virtio_send_command(hw->cvq, &ctrl, len, 2);
+	if (err != 0)
+		PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
+
+	return err;
+}
+
+static void
+virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		    uint32_t index, uint32_t vmdq __rte_unused)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	const struct ether_addr *addrs = dev->data->mac_addrs;
+	unsigned int i;
+	struct virtio_net_ctrl_mac *uc, *mc;
+
+	if (index >= VIRTIO_MAX_MAC_ADDRS) {
+		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+		return;
+	}
+
+	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
+	uc->entries = 0;
+	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
+	mc->entries = 0;
+
+	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+		const struct ether_addr *addr
+			= (i == index) ? mac_addr : addrs + i;
+		struct virtio_net_ctrl_mac *tbl
+			= is_multicast_ether_addr(addr) ? mc : uc;
+
+		memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
+	}
+
+	virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct ether_addr *addrs = dev->data->mac_addrs;
+	struct virtio_net_ctrl_mac *uc, *mc;
+	unsigned int i;
+
+	if (index >= VIRTIO_MAX_MAC_ADDRS) {
+		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+		return;
+	}
+
+	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
+	uc->entries = 0;
+	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
+	mc->entries = 0;
+
+	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+		struct virtio_net_ctrl_mac *tbl;
+
+		if (i == index || is_zero_ether_addr(addrs + i))
+			continue;
+
+		tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
+		memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_ADDR_LEN);
+	}
+
+	virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
+
+	/* Use atomic update if available */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+		struct virtio_pmd_ctrl ctrl;
+		int len = ETHER_ADDR_LEN;
+
+		ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+		ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
+
+		memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
+		virtio_send_command(hw->cvq, &ctrl, &len, 1);
+	} else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
+		virtio_set_hwaddr(hw);
+}
+
+static int
+virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct virtio_pmd_ctrl ctrl;
+	int len;
+
+	if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
+		return -ENOTSUP;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
+	ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
+	memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
+	len = sizeof(vlan_id);
+
+	return virtio_send_command(hw->cvq, &ctrl, &len, 1);
+}
+
+static void
+virtio_negotiate_features(struct virtio_hw *hw)
+{
+	uint32_t host_features, mask;
+
+	/* checksum offload not implemented */
+	mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
+
+	/* TSO and LRO are only available when their corresponding
+	 * checksum offload feature is also negotiated.
+	 */
+	mask |= VIRTIO_NET_F_HOST_TSO4 | VIRTIO_NET_F_HOST_TSO6 | VIRTIO_NET_F_HOST_ECN;
+	mask |= VIRTIO_NET_F_GUEST_TSO4 | VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN;
+	mask |= VTNET_LRO_FEATURES;
+
+	/* not negotiating INDIRECT descriptor table support */
+	mask |= VIRTIO_RING_F_INDIRECT_DESC;
+
+	/* Prepare guest_features: feature that driver wants to support */
+	hw->guest_features = VTNET_FEATURES & ~mask;
+	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
+		hw->guest_features);
+
+	/* Read device(host) feature bits */
+	host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+	PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
+		host_features);
+
+	/*
+	 * Negotiate features: Subset of device feature bits are written back
+	 * guest feature bits.
+	 */
+	hw->guest_features = vtpci_negotiate_features(hw, host_features);
+	PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
+		hw->guest_features);
+}
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+static int
+parse_sysfs_value(const char *filename, unsigned long *val)
+{
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
+			     __func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
+			     __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoul(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
+			     __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
+
+static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
+			unsigned int *uio_num)
+{
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+	snprintf(dirname, sizeof(dirname),
+		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
+		     loc->domain, loc->bus, loc->devid, loc->function);
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		snprintf(dirname, sizeof(dirname),
+			     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
+			     loc->domain, loc->bus, loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		/* first try uio%d */
+		errno = 0;
+		*uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
+			break;
+		}
+
+		/* then try uio:uio%d */
+		errno = 0;
+		*uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			snprintf(buf, buflen, "%s/uio:uio%u", dirname,
+				     *uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL) {
+		PMD_INIT_LOG(ERR, "Could not find uio resource");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+virtio_has_msix(const struct rte_pci_addr *loc)
+{
+	DIR *d;
+	char dirname[PATH_MAX];
+
+	snprintf(dirname, sizeof(dirname),
+		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
+		     loc->domain, loc->bus, loc->devid, loc->function);
+
+	d = opendir(dirname);
+	if (d)
+		closedir(d);
+
+	return (d != NULL);
+}
+
+/* Extract I/O port numbers from sysfs */
+static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
+{
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	unsigned long start, size;
+	unsigned int uio_num;
+
+	if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0)
+		return -1;
+
+	/* get portio size */
+	snprintf(filename, sizeof(filename),
+		     "%s/portio/port0/size", dirname);
+	if (parse_sysfs_value(filename, &size) < 0) {
+		PMD_INIT_LOG(ERR, "%s(): cannot parse size",
+			     __func__);
+		return -1;
+	}
+
+	/* get portio start */
+	snprintf(filename, sizeof(filename),
+		 "%s/portio/port0/start", dirname);
+	if (parse_sysfs_value(filename, &start) < 0) {
+		PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
+			     __func__);
+		return -1;
+	}
+	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
+	pci_dev->mem_resource[0].len =  (uint64_t)size;
+	PMD_INIT_LOG(DEBUG,
+		     "PCI Port IO found start=0x%lx with size=0x%lx",
+		     start, size);
+
+	/* save fd */
+	memset(dirname, 0, sizeof(dirname));
+	snprintf(dirname, sizeof(dirname), "/dev/uio%u", uio_num);
+	pci_dev->intr_handle.fd = open(dirname, O_RDWR);
+	if (pci_dev->intr_handle.fd < 0) {
+		PMD_INIT_LOG(ERR, "Cannot open %s: %s\n",
+			dirname, strerror(errno));
+		return -1;
+	}
+
+	pci_dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+	pci_dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
+
+	return 0;
+}
+
+/* Extract port I/O numbers from proc/ioports */
+static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
+{
+	uint16_t start, end;
+	int size;
+	FILE *fp;
+	char *line = NULL;
+	char pci_id[16];
+	int found = 0;
+	size_t linesz;
+
+	snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
+		 pci_dev->addr.domain,
+		 pci_dev->addr.bus,
+		 pci_dev->addr.devid,
+		 pci_dev->addr.function);
+
+	fp = fopen("/proc/ioports", "r");
+	if (fp == NULL) {
+		PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+		return -1;
+	}
+
+	while (getdelim(&line, &linesz, '\n', fp) > 0) {
+		char *ptr = line;
+		char *left;
+		int n;
+
+		n = strcspn(ptr, ":");
+		ptr[n] = 0;
+		left = &ptr[n+1];
+
+		while (*left && isspace(*left))
+			left++;
+
+		if (!strncmp(left, pci_id, strlen(pci_id))) {
+			found = 1;
+
+			while (*ptr && isspace(*ptr))
+				ptr++;
+
+			sscanf(ptr, "%04hx-%04hx", &start, &end);
+			size = end - start + 1;
+
+			break;
+		}
+	}
+
+	free(line);
+	fclose(fp);
+
+	if (!found)
+		return -1;
+
+	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
+	pci_dev->mem_resource[0].len =  (uint64_t)size;
+	PMD_INIT_LOG(DEBUG,
+		"PCI Port IO found start=0x%x with size=0x%x",
+		start, size);
+
+	/* can't support lsc interrupt without uio */
+	pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
+
+	return 0;
+}
+
+/* Extract I/O port numbers from sysfs */
+static int virtio_resource_init(struct rte_pci_device *pci_dev)
+{
+	if (virtio_resource_init_by_uio(pci_dev) == 0)
+		return 0;
+	else
+		return virtio_resource_init_by_ioports(pci_dev);
+}
+
+#else
+static int
+virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
+{
+	/* nic_uio does not enable interrupts, return 0 (false). */
+	return 0;
+}
+
+static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused)
+{
+	/* no setup required */
+	return 0;
+}
+#endif
+
+/*
+ * Process Virtio Config changed interrupt and call the callback
+ * if link state changed.
+ */
+static void
+virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+			 void *param)
+{
+	struct rte_eth_dev *dev = param;
+	struct virtio_hw *hw = dev->data->dev_private;
+	uint8_t isr;
+
+	/* Read interrupt status which clears interrupt */
+	isr = vtpci_isr(hw);
+	PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
+
+	if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+		PMD_DRV_LOG(ERR, "interrupt enable failed");
+
+	if (isr & VIRTIO_PCI_ISR_CONFIG) {
+		if (virtio_dev_link_update(dev, 0) == 0)
+			_rte_eth_dev_callback_process(dev,
+						      RTE_ETH_EVENT_INTR_LSC);
+	}
+
+}
+
+static void
+rx_func_get(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+		eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
+	else
+		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
+}
+
+/*
+ * This function is based on probe() function in virtio_pci.c
+ * It returns 0 on success.
+ */
+static int
+eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
+{
+	struct virtio_hw *hw = eth_dev->data->dev_private;
+	struct virtio_net_config *config;
+	struct virtio_net_config local_config;
+	uint32_t offset_conf = sizeof(config->mac);
+	struct rte_pci_device *pci_dev;
+
+	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
+
+	eth_dev->dev_ops = &virtio_eth_dev_ops;
+	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		rx_func_get(eth_dev);
+		return 0;
+	}
+
+	/* Allocate memory for storing MAC addresses */
+	eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
+	if (eth_dev->data->mac_addrs == NULL) {
+		PMD_INIT_LOG(ERR,
+			"Failed to allocate %d bytes needed to store MAC addresses",
+			ETHER_ADDR_LEN);
+		return -ENOMEM;
+	}
+
+	pci_dev = eth_dev->pci_dev;
+	if (virtio_resource_init(pci_dev) < 0)
+		return -1;
+
+	hw->use_msix = virtio_has_msix(&pci_dev->addr);
+	hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
+
+	/* Reset the device although not necessary at startup */
+	vtpci_reset(hw);
+
+	/* Tell the host we've noticed this device. */
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
+	/* Tell the host we've known how to drive the device. */
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+	virtio_negotiate_features(hw);
+
+	rx_func_get(eth_dev);
+
+	/* Setting up rx_header size for the device */
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
+
+	/* Copy the permanent MAC address to: virtio_hw */
+	virtio_get_hwaddr(hw);
+	ether_addr_copy((struct ether_addr *) hw->mac_addr,
+			&eth_dev->data->mac_addrs[0]);
+	PMD_INIT_LOG(DEBUG,
+		     "PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
+		     hw->mac_addr[0], hw->mac_addr[1], hw->mac_addr[2],
+		     hw->mac_addr[3], hw->mac_addr[4], hw->mac_addr[5]);
+
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		config = &local_config;
+
+		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+			offset_conf += sizeof(config->status);
+		} else {
+			PMD_INIT_LOG(DEBUG,
+				     "VIRTIO_NET_F_STATUS is not supported");
+			config->status = 0;
+		}
+
+		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
+			offset_conf += sizeof(config->max_virtqueue_pairs);
+		} else {
+			PMD_INIT_LOG(DEBUG,
+				     "VIRTIO_NET_F_MQ is not supported");
+			config->max_virtqueue_pairs = 1;
+		}
+
+		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
+
+		hw->max_rx_queues =
+			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
+			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
+		hw->max_tx_queues =
+			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
+			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
+
+		virtio_dev_cq_queue_setup(eth_dev,
+					config->max_virtqueue_pairs * 2,
+					SOCKET_ID_ANY);
+
+		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
+				config->max_virtqueue_pairs);
+		PMD_INIT_LOG(DEBUG, "config->status=%d", config->status);
+		PMD_INIT_LOG(DEBUG,
+				"PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
+				config->mac[0], config->mac[1],
+				config->mac[2], config->mac[3],
+				config->mac[4], config->mac[5]);
+	} else {
+		hw->max_rx_queues = 1;
+		hw->max_tx_queues = 1;
+	}
+
+	eth_dev->data->nb_rx_queues = hw->max_rx_queues;
+	eth_dev->data->nb_tx_queues = hw->max_tx_queues;
+
+	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
+			hw->max_rx_queues, hw->max_tx_queues);
+	PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
+			eth_dev->data->port_id, pci_dev->id.vendor_id,
+			pci_dev->id.device_id);
+
+	/* Setup interrupt callback  */
+	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+		rte_intr_callback_register(&pci_dev->intr_handle,
+				   virtio_interrupt_handler, eth_dev);
+
+	virtio_dev_cq_start(eth_dev);
+
+	return 0;
+}
+
+static struct eth_driver rte_virtio_pmd = {
+	{
+		.name = "rte_virtio_pmd",
+		.id_table = pci_id_virtio_map,
+	},
+	.eth_dev_init = eth_virtio_dev_init,
+	.dev_private_size = sizeof(struct virtio_hw),
+};
+
+/*
+ * Driver initialization routine.
+ * Invoked once at EAL init time.
+ * Register itself as the [Poll Mode] Driver of PCI virtio devices.
+ * Returns 0 on success.
+ */
+static int
+rte_virtio_pmd_init(const char *name __rte_unused,
+		    const char *param __rte_unused)
+{
+	if (rte_eal_iopl_init() != 0) {
+		PMD_INIT_LOG(ERR, "IOPL call failed - cannot use virtio PMD");
+		return -1;
+	}
+
+	rte_eth_driver_register(&rte_virtio_pmd);
+	return 0;
+}
+
+/*
+ * Only 1 queue is supported, no queue release related operation
+ */
+static void
+virtio_dev_rx_queue_release(__rte_unused void *rxq)
+{
+}
+
+static void
+virtio_dev_tx_queue_release(__rte_unused void *txq)
+{
+}
+
+/*
+ * Configure virtio device
+ * It returns 0 on success.
+ */
+static int
+virtio_dev_configure(struct rte_eth_dev *dev)
+{
+	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct rte_pci_device *pci_dev = dev->pci_dev;
+
+	PMD_INIT_LOG(DEBUG, "configure");
+
+	if (rxmode->hw_ip_checksum) {
+		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
+		return (-EINVAL);
+	}
+
+	hw->vlan_strip = rxmode->hw_vlan_strip;
+
+	if (rxmode->hw_vlan_filter
+	    && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
+		PMD_DRV_LOG(NOTICE,
+			    "vlan filtering not available on this host");
+		return -ENOTSUP;
+	}
+
+	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
+		if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
+			PMD_DRV_LOG(ERR, "failed to set config vector");
+			return -EBUSY;
+		}
+
+	return 0;
+}
+
+
+static int
+virtio_dev_start(struct rte_eth_dev *dev)
+{
+	uint16_t nb_queues, i;
+	struct virtio_hw *hw = dev->data->dev_private;
+	struct rte_pci_device *pci_dev = dev->pci_dev;
+
+	/* check if lsc interrupt feature is enabled */
+	if ((dev->data->dev_conf.intr_conf.lsc) &&
+		(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
+		if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+			PMD_DRV_LOG(ERR, "link status not supported by host");
+			return -ENOTSUP;
+		}
+
+		if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+			PMD_DRV_LOG(ERR, "interrupt enable failed");
+			return -EIO;
+		}
+	}
+
+	/* Initialize Link state */
+	virtio_dev_link_update(dev, 0);
+
+	/* On restart after stop do not touch queues */
+	if (hw->started)
+		return 0;
+
+	/* Do final configuration before rx/tx engine starts */
+	virtio_dev_rxtx_start(dev);
+	vtpci_reinit_complete(hw);
+
+	hw->started = 1;
+
+	/*Notify the backend
+	 *Otherwise the tap backend might already stop its queue due to fullness.
+	 *vhost backend will have no chance to be waked up
+	 */
+	nb_queues = dev->data->nb_rx_queues;
+	if (nb_queues > 1) {
+		if (virtio_set_multiple_queues(dev, nb_queues) != 0)
+			return -EINVAL;
+	}
+
+	PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
+
+	for (i = 0; i < nb_queues; i++)
+		virtqueue_notify(dev->data->rx_queues[i]);
+
+	PMD_INIT_LOG(DEBUG, "Notified backend at initialization");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++)
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+
+	return 0;
+}
+
+static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)
+{
+	struct rte_mbuf *buf;
+	int i, mbuf_num = 0;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		PMD_INIT_LOG(DEBUG,
+			     "Before freeing rxq[%d] used and unused buf", i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+
+		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
+					dev->data->rx_queues[i])) != NULL) {
+			rte_pktmbuf_free(buf);
+			mbuf_num++;
+		}
+
+		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
+		PMD_INIT_LOG(DEBUG,
+			     "After freeing rxq[%d] used and unused buf", i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+	}
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		PMD_INIT_LOG(DEBUG,
+			     "Before freeing txq[%d] used and unused bufs",
+			     i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+
+		mbuf_num = 0;
+		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
+					dev->data->tx_queues[i])) != NULL) {
+			rte_pktmbuf_free(buf);
+
+			mbuf_num++;
+		}
+
+		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
+		PMD_INIT_LOG(DEBUG,
+			     "After freeing txq[%d] used and unused buf", i);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+	}
+}
+
+/*
+ * Stop device: disable interrupt and mark link down
+ */
+static void
+virtio_dev_stop(struct rte_eth_dev *dev)
+{
+	struct rte_eth_link link;
+
+	PMD_INIT_LOG(DEBUG, "stop");
+
+	if (dev->data->dev_conf.intr_conf.lsc)
+		rte_intr_disable(&dev->pci_dev->intr_handle);
+
+	memset(&link, 0, sizeof(link));
+	virtio_dev_atomic_write_link_status(dev, &link);
+}
+
+static int
+virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
+{
+	struct rte_eth_link link, old;
+	uint16_t status;
+	struct virtio_hw *hw = dev->data->dev_private;
+	memset(&link, 0, sizeof(link));
+	virtio_dev_atomic_read_link_status(dev, &link);
+	old = link;
+	link.link_duplex = FULL_DUPLEX;
+	link.link_speed  = SPEED_10G;
+
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+		PMD_INIT_LOG(DEBUG, "Get link status from hw");
+		vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, status),
+				&status, sizeof(status));
+		if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
+			link.link_status = 0;
+			PMD_INIT_LOG(DEBUG, "Port %d is down",
+				     dev->data->port_id);
+		} else {
+			link.link_status = 1;
+			PMD_INIT_LOG(DEBUG, "Port %d is up",
+				     dev->data->port_id);
+		}
+	} else {
+		link.link_status = 1;   /* Link up */
+	}
+	virtio_dev_atomic_write_link_status(dev, &link);
+
+	return (old.link_status == link.link_status) ? -1 : 0;
+}
+
+static void
+virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	dev_info->driver_name = dev->driver->pci_drv.name;
+	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
+	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
+	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
+	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
+	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
+	dev_info->default_txconf = (struct rte_eth_txconf) {
+		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
+	};
+}
+
+/*
+ * It enables testpmd to collect per queue stats.
+ */
+static int
+virtio_dev_queue_stats_mapping_set(__rte_unused struct rte_eth_dev *eth_dev,
+__rte_unused uint16_t queue_id, __rte_unused uint8_t stat_idx,
+__rte_unused uint8_t is_rx)
+{
+	return 0;
+}
+
+static struct rte_driver rte_virtio_driver = {
+	.type = PMD_PDEV,
+	.init = rte_virtio_pmd_init,
+};
+
+PMD_REGISTER_DRIVER(rte_virtio_driver);
diff --git a/drivers/virtio/virtio_ethdev.h b/drivers/virtio/virtio_ethdev.h
new file mode 100644
index 0000000..e6d4533
--- /dev/null
+++ b/drivers/virtio/virtio_ethdev.h
@@ -0,0 +1,124 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_ETHDEV_H_
+#define _VIRTIO_ETHDEV_H_
+
+#include <stdint.h>
+
+#include "virtio_pci.h"
+
+#define SPEED_10	10
+#define SPEED_100	100
+#define SPEED_1000	1000
+#define SPEED_10G	10000
+#define HALF_DUPLEX	1
+#define FULL_DUPLEX	2
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
+#define VIRTIO_MAX_RX_QUEUES 128
+#define VIRTIO_MAX_TX_QUEUES 128
+#define VIRTIO_MAX_MAC_ADDRS 64
+#define VIRTIO_MIN_RX_BUFSIZE 64
+#define VIRTIO_MAX_RX_PKTLEN  9728
+
+/* Features desired/implemented by this driver. */
+#define VTNET_FEATURES \
+	(VIRTIO_NET_F_MAC       | \
+	VIRTIO_NET_F_STATUS     | \
+	VIRTIO_NET_F_MQ         | \
+	VIRTIO_NET_F_CTRL_MAC_ADDR | \
+	VIRTIO_NET_F_CTRL_VQ    | \
+	VIRTIO_NET_F_CTRL_RX    | \
+	VIRTIO_NET_F_CTRL_VLAN  | \
+	VIRTIO_NET_F_CSUM       | \
+	VIRTIO_NET_F_HOST_TSO4  | \
+	VIRTIO_NET_F_HOST_TSO6  | \
+	VIRTIO_NET_F_HOST_ECN   | \
+	VIRTIO_NET_F_GUEST_CSUM | \
+	VIRTIO_NET_F_GUEST_TSO4 | \
+	VIRTIO_NET_F_GUEST_TSO6 | \
+	VIRTIO_NET_F_GUEST_ECN  | \
+	VIRTIO_NET_F_MRG_RXBUF  | \
+	VIRTIO_RING_F_INDIRECT_DESC)
+
+/*
+ * CQ function prototype
+ */
+void virtio_dev_cq_start(struct rte_eth_dev *dev);
+
+/*
+ * RX/TX function prototypes
+ */
+void virtio_dev_rxtx_start(struct rte_eth_dev *dev);
+
+int virtio_dev_queue_setup(struct rte_eth_dev *dev,
+			int queue_type,
+			uint16_t queue_idx,
+			uint16_t  vtpci_queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			struct virtqueue **pvq);
+
+int  virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+		uint16_t nb_rx_desc, unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		struct rte_mempool *mb_pool);
+
+int  virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
+		uint16_t nb_tx_desc, unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf);
+
+uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts);
+
+
+/*
+ * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
+ * frames larger than 1514 bytes. We do not yet support software LRO
+ * via tcp_lro_rx().
+ */
+#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
+			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
+
+
+#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/drivers/virtio/virtio_logs.h b/drivers/virtio/virtio_logs.h
new file mode 100644
index 0000000..d6c33f7
--- /dev/null
+++ b/drivers/virtio/virtio_logs.h
@@ -0,0 +1,70 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while(0)
+#define PMD_INIT_FUNC_TRACE() do { } while(0)
+#endif
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt , __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while(0)
+#endif
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt , __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while(0)
+#endif
+
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt , __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while(0)
+#endif
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
new file mode 100644
index 0000000..2245bec
--- /dev/null
+++ b/drivers/virtio/virtio_pci.c
@@ -0,0 +1,147 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+
+#include "virtio_pci.h"
+#include "virtio_logs.h"
+
+static uint8_t vtpci_get_status(struct virtio_hw *);
+
+void
+vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+		void *dst, int length)
+{
+	uint64_t off;
+	uint8_t *d;
+	int size;
+
+	off = VIRTIO_PCI_CONFIG(hw) + offset;
+	for (d = dst; length > 0; d += size, off += size, length -= size) {
+		if (length >= 4) {
+			size = 4;
+			*(uint32_t *)d = VIRTIO_READ_REG_4(hw, off);
+		} else if (length >= 2) {
+			size = 2;
+			*(uint16_t *)d = VIRTIO_READ_REG_2(hw, off);
+		} else {
+			size = 1;
+			*d = VIRTIO_READ_REG_1(hw, off);
+		}
+	}
+}
+
+void
+vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
+		void *src, int length)
+{
+	uint64_t off;
+	uint8_t *s;
+	int size;
+
+	off = VIRTIO_PCI_CONFIG(hw) + offset;
+	for (s = src; length > 0; s += size, off += size, length -= size) {
+		if (length >= 4) {
+			size = 4;
+			VIRTIO_WRITE_REG_4(hw, off, *(uint32_t *)s);
+		} else if (length >= 2) {
+			size = 2;
+			VIRTIO_WRITE_REG_2(hw, off, *(uint16_t *)s);
+		} else {
+			size = 1;
+			VIRTIO_WRITE_REG_1(hw, off, *s);
+		}
+	}
+}
+
+uint32_t
+vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
+{
+	uint32_t features;
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+
+	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
+	return features;
+}
+
+
+void
+vtpci_reset(struct virtio_hw *hw)
+{
+	/*
+	 * Setting the status to RESET sets the host device to
+	 * the original, uninitialized state.
+	 */
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	vtpci_get_status(hw);
+}
+
+void
+vtpci_reinit_complete(struct virtio_hw *hw)
+{
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+static uint8_t
+vtpci_get_status(struct virtio_hw *hw)
+{
+	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
+}
+
+void
+vtpci_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status = (uint8_t)(status | vtpci_get_status(hw));
+
+	VIRTIO_WRITE_REG_1(hw, VIRTIO_PCI_STATUS, status);
+}
+
+uint8_t
+vtpci_isr(struct virtio_hw *hw)
+{
+
+	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_ISR);
+}
+
+
+/* Enable one vector (0) for Link State Intrerrupt */
+uint16_t
+vtpci_irq_config(struct virtio_hw *hw, uint16_t vec)
+{
+	VIRTIO_WRITE_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR, vec);
+	return VIRTIO_READ_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR);
+}
diff --git a/drivers/virtio/virtio_pci.h b/drivers/virtio/virtio_pci.h
new file mode 100644
index 0000000..64d9c34
--- /dev/null
+++ b/drivers/virtio/virtio_pci.h
@@ -0,0 +1,270 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <stdint.h>
+
+#ifdef __FreeBSD__
+#include <sys/types.h>
+#include <machine/cpufunc.h>
+#else
+#include <sys/io.h>
+#endif
+
+#include <rte_ethdev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_PCI_VENDORID     0x1AF4
+#define VIRTIO_PCI_DEVICEID_MIN 0x1000
+#define VIRTIO_PCI_DEVICEID_MAX 0x103F
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
+				      * also clears the register (8, RO) */
+/* Only if MSIX is enabled: */
+#define VIRTIO_MSI_CONFIG_VECTOR  20 /* configuration change vector (16, RW) */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22 /* vector for selected VQ notifications
+				      (16, RW) */
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* VirtIO device IDs. */
+#define VIRTIO_ID_NETWORK  0x01
+#define VIRTIO_ID_BLOCK    0x02
+#define VIRTIO_ID_CONSOLE  0x03
+#define VIRTIO_ID_ENTROPY  0x04
+#define VIRTIO_ID_BALLOON  0x05
+#define VIRTIO_ID_IOMEMORY 0x06
+#define VIRTIO_ID_9P       0x09
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Generate interrupt when the virtqueue ring is
+ * completely used, even if we've suppressed them.
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY (1 << 24)
+
+/*
+ * The guest should never negotiate this feature; it
+ * is used to detect faulty drivers.
+ */
+#define VIRTIO_F_BAD_FEATURE (1 << 30)
+
+/*
+ * Some VirtIO feature bits (currently bits 28 through 31) are
+ * reserved for the transport being used (eg. virtio_ring), the
+ * rest are per-device feature bits.
+ */
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END   32
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* The feature bitmap for virtio net */
+#define VIRTIO_NET_F_CSUM       0x00001 /* Host handles pkts w/ partial csum */
+#define VIRTIO_NET_F_GUEST_CSUM 0x00002 /* Guest handles pkts w/ partial csum*/
+#define VIRTIO_NET_F_MAC        0x00020 /* Host has given MAC address. */
+#define VIRTIO_NET_F_GSO        0x00040 /* Host handles pkts w/ any GSO type */
+#define VIRTIO_NET_F_GUEST_TSO4 0x00080 /* Guest can handle TSOv4 in. */
+#define VIRTIO_NET_F_GUEST_TSO6 0x00100 /* Guest can handle TSOv6 in. */
+#define VIRTIO_NET_F_GUEST_ECN  0x00200 /* Guest can handle TSO[6] w/ ECN in.*/
+#define VIRTIO_NET_F_GUEST_UFO  0x00400 /* Guest can handle UFO in. */
+#define VIRTIO_NET_F_HOST_TSO4  0x00800 /* Host can handle TSOv4 in. */
+#define VIRTIO_NET_F_HOST_TSO6  0x01000 /* Host can handle TSOv6 in. */
+#define VIRTIO_NET_F_HOST_ECN   0x02000 /* Host can handle TSO[6] w/ ECN in. */
+#define VIRTIO_NET_F_HOST_UFO   0x04000 /* Host can handle UFO in. */
+#define VIRTIO_NET_F_MRG_RXBUF  0x08000 /* Host can merge receive buffers. */
+#define VIRTIO_NET_F_STATUS     0x10000 /* virtio_net_config.status available*/
+#define VIRTIO_NET_F_CTRL_VQ    0x20000 /* Control channel available */
+#define VIRTIO_NET_F_CTRL_RX    0x40000 /* Control channel RX mode support */
+#define VIRTIO_NET_F_CTRL_VLAN  0x80000 /* Control channel VLAN filtering */
+#define VIRTIO_NET_F_CTRL_RX_EXTRA  0x100000 /* Extra RX mode control support */
+#define VIRTIO_RING_F_INDIRECT_DESC 0x10000000 /* Support for indirect buffer descriptors. */
+/* The guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ * The host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX 0x20000000
+
+#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
+
+/*
+ * Maximum number of virtqueues per device.
+ */
+#define VIRTIO_MAX_VIRTQUEUES 8
+
+struct virtio_hw {
+	struct virtqueue *cvq;
+	uint32_t    io_base;
+	uint32_t    guest_features;
+	uint32_t    max_tx_queues;
+	uint32_t    max_rx_queues;
+	uint16_t    vtnet_hdr_size;
+	uint8_t	    vlan_strip;
+	uint8_t	    use_msix;
+	uint8_t     started;
+	uint8_t     mac_addr[ETHER_ADDR_LEN];
+};
+
+/*
+ * This structure is just a reference to read
+ * net device specific config space; it just a chodu structure
+ *
+ */
+struct virtio_net_config {
+	/* The config defining mac address (if VIRTIO_NET_F_MAC) */
+	uint8_t    mac[ETHER_ADDR_LEN];
+	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
+	uint16_t   status;
+	uint16_t   max_virtqueue_pairs;
+} __attribute__((packed));
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+#ifdef __FreeBSD__
+
+static inline void
+outb_p(unsigned char data, unsigned int port)
+{
+
+	outb(port, (u_char)data);
+}
+
+static inline void
+outw_p(unsigned short data, unsigned int port)
+{
+	outw(port, (u_short)data);
+}
+
+static inline void
+outl_p(unsigned int data, unsigned int port)
+{
+	outl(port, (u_int)data);
+}
+#endif
+
+#define VIRTIO_PCI_REG_ADDR(hw, reg) \
+	(unsigned short)((hw)->io_base + (reg))
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+	inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+	outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+#define VIRTIO_READ_REG_2(hw, reg) \
+	inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+	outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+#define VIRTIO_READ_REG_4(hw, reg) \
+	inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+	outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+static inline int
+vtpci_with_feature(struct virtio_hw *hw, uint32_t feature)
+{
+	return (hw->guest_features & feature) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+void vtpci_reset(struct virtio_hw *);
+
+void vtpci_reinit_complete(struct virtio_hw *);
+
+void vtpci_set_status(struct virtio_hw *, uint8_t);
+
+uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
+
+void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);
+
+void vtpci_read_dev_config(struct virtio_hw *, uint64_t, void *, int);
+
+uint8_t vtpci_isr(struct virtio_hw *);
+
+uint16_t vtpci_irq_config(struct virtio_hw *, uint16_t);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/virtio/virtio_ring.h b/drivers/virtio/virtio_ring.h
new file mode 100644
index 0000000..a16c499
--- /dev/null
+++ b/drivers/virtio/virtio_ring.h
@@ -0,0 +1,163 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers. */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.  */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next". */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline int
+vring_size(unsigned int num, unsigned long align)
+{
+	int size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other size, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/virtio/virtio_rxtx.c b/drivers/virtio/virtio_rxtx.c
new file mode 100644
index 0000000..3ff275c
--- /dev/null
+++ b/drivers/virtio/virtio_rxtx.c
@@ -0,0 +1,815 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <rte_cycles.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_branch_prediction.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_prefetch.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_byteorder.h>
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
+#define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
+#else
+#define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
+#endif
+
+static void
+vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
+{
+	struct vring_desc *dp, *dp_tail;
+	struct vq_desc_extra *dxp;
+	uint16_t desc_idx_last = desc_idx;
+
+	dp  = &vq->vq_ring.desc[desc_idx];
+	dxp = &vq->vq_descx[desc_idx];
+	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt + dxp->ndescs);
+	if ((dp->flags & VRING_DESC_F_INDIRECT) == 0) {
+		while (dp->flags & VRING_DESC_F_NEXT) {
+			desc_idx_last = dp->next;
+			dp = &vq->vq_ring.desc[dp->next];
+		}
+	}
+	dxp->ndescs = 0;
+
+	/*
+	 * We must append the existing free chain, if any, to the end of
+	 * newly freed chain. If the virtqueue was completely used, then
+	 * head would be VQ_RING_DESC_CHAIN_END (ASSERTed above).
+	 */
+	if (vq->vq_desc_tail_idx == VQ_RING_DESC_CHAIN_END) {
+		vq->vq_desc_head_idx = desc_idx;
+	} else {
+		dp_tail = &vq->vq_ring.desc[vq->vq_desc_tail_idx];
+		dp_tail->next = desc_idx;
+	}
+
+	vq->vq_desc_tail_idx = desc_idx_last;
+	dp->next = VQ_RING_DESC_CHAIN_END;
+}
+
+static uint16_t
+virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts,
+			   uint32_t *len, uint16_t num)
+{
+	struct vring_used_elem *uep;
+	struct rte_mbuf *cookie;
+	uint16_t used_idx, desc_idx;
+	uint16_t i;
+
+	/*  Caller does the check */
+	for (i = 0; i < num ; i++) {
+		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+		desc_idx = (uint16_t) uep->id;
+		len[i] = uep->len;
+		cookie = (struct rte_mbuf *)vq->vq_descx[desc_idx].cookie;
+
+		if (unlikely(cookie == NULL)) {
+			PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u\n",
+				vq->vq_used_cons_idx);
+			break;
+		}
+
+		rte_prefetch0(cookie);
+		rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *));
+		rx_pkts[i]  = cookie;
+		vq->vq_used_cons_idx++;
+		vq_ring_free_chain(vq, desc_idx);
+		vq->vq_descx[desc_idx].cookie = NULL;
+	}
+
+	return i;
+}
+
+#ifndef DEFAULT_TX_FREE_THRESH
+#define DEFAULT_TX_FREE_THRESH 32
+#endif
+
+/* Cleanup from completed transmits. */
+static void
+virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
+{
+	uint16_t i, used_idx, desc_idx;
+	for (i = 0; i < num; i++) {
+		struct vring_used_elem *uep;
+		struct vq_desc_extra *dxp;
+
+		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+
+		desc_idx = (uint16_t) uep->id;
+		dxp = &vq->vq_descx[desc_idx];
+		vq->vq_used_cons_idx++;
+		vq_ring_free_chain(vq, desc_idx);
+
+		if (dxp->cookie != NULL) {
+			rte_pktmbuf_free(dxp->cookie);
+			dxp->cookie = NULL;
+		}
+	}
+}
+
+
+static inline int
+virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct virtio_hw *hw = vq->hw;
+	struct vring_desc *start_dp;
+	uint16_t needed = 1;
+	uint16_t head_idx, idx;
+
+	if (unlikely(vq->vq_free_cnt == 0))
+		return -ENOSPC;
+	if (unlikely(vq->vq_free_cnt < needed))
+		return -EMSGSIZE;
+
+	head_idx = vq->vq_desc_head_idx;
+	if (unlikely(head_idx >= vq->vq_nentries))
+		return -EFAULT;
+
+	idx = head_idx;
+	dxp = &vq->vq_descx[idx];
+	dxp->cookie = (void *)cookie;
+	dxp->ndescs = needed;
+
+	start_dp = vq->vq_ring.desc;
+	start_dp[idx].addr =
+		(uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
+		- hw->vtnet_hdr_size);
+	start_dp[idx].len =
+		cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
+	start_dp[idx].flags =  VRING_DESC_F_WRITE;
+	idx = start_dp[idx].next;
+	vq->vq_desc_head_idx = idx;
+	if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
+		vq->vq_desc_tail_idx = idx;
+	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed);
+	vq_update_avail_ring(vq, head_idx);
+
+	return 0;
+}
+
+static int
+virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
+{
+	struct vq_desc_extra *dxp;
+	struct vring_desc *start_dp;
+	uint16_t seg_num = cookie->nb_segs;
+	uint16_t needed = 1 + seg_num;
+	uint16_t head_idx, idx;
+	uint16_t head_size = txvq->hw->vtnet_hdr_size;
+
+	if (unlikely(txvq->vq_free_cnt == 0))
+		return -ENOSPC;
+	if (unlikely(txvq->vq_free_cnt < needed))
+		return -EMSGSIZE;
+	head_idx = txvq->vq_desc_head_idx;
+	if (unlikely(head_idx >= txvq->vq_nentries))
+		return -EFAULT;
+
+	idx = head_idx;
+	dxp = &txvq->vq_descx[idx];
+	dxp->cookie = (void *)cookie;
+	dxp->ndescs = needed;
+
+	start_dp = txvq->vq_ring.desc;
+	start_dp[idx].addr =
+		txvq->virtio_net_hdr_mem + idx * head_size;
+	start_dp[idx].len = (uint32_t)head_size;
+	start_dp[idx].flags = VRING_DESC_F_NEXT;
+
+	for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
+		idx = start_dp[idx].next;
+		start_dp[idx].addr  = RTE_MBUF_DATA_DMA_ADDR(cookie);
+		start_dp[idx].len   = cookie->data_len;
+		start_dp[idx].flags = VRING_DESC_F_NEXT;
+		cookie = cookie->next;
+	}
+
+	start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
+	idx = start_dp[idx].next;
+	txvq->vq_desc_head_idx = idx;
+	if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
+		txvq->vq_desc_tail_idx = idx;
+	txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed);
+	vq_update_avail_ring(txvq, head_idx);
+
+	return 0;
+}
+
+static inline struct rte_mbuf *
+rte_rxmbuf_alloc(struct rte_mempool *mp)
+{
+	struct rte_mbuf *m;
+
+	m = __rte_mbuf_raw_alloc(mp);
+	__rte_mbuf_sanity_check_raw(m, 0);
+
+	return m;
+}
+
+static void
+virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
+{
+	struct rte_mbuf *m;
+	int i, nbufs, error, size = vq->vq_nentries;
+	struct vring *vr = &vq->vq_ring;
+	uint8_t *ring_mem = vq->vq_ring_virt_mem;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/*
+	 * Reinitialise since virtio port might have been stopped and restarted
+	 */
+	memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
+	vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_used_cons_idx = 0;
+	vq->vq_desc_head_idx = 0;
+	vq->vq_avail_idx = 0;
+	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+	vq->vq_free_cnt = vq->vq_nentries;
+	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
+
+	/* Chain all the descriptors in the ring with an END */
+	for (i = 0; i < size - 1; i++)
+		vr->desc[i].next = (uint16_t)(i + 1);
+	vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
+
+	/*
+	 * Disable device(host) interrupting guest
+	 */
+	virtqueue_disable_intr(vq);
+
+	/* Only rx virtqueue needs mbufs to be allocated at initialization */
+	if (queue_type == VTNET_RQ) {
+		if (vq->mpool == NULL)
+			rte_exit(EXIT_FAILURE,
+			"Cannot allocate initial mbufs for rx virtqueue");
+
+		/* Allocate blank mbufs for the each rx descriptor */
+		nbufs = 0;
+		error = ENOSPC;
+		while (!virtqueue_full(vq)) {
+			m = rte_rxmbuf_alloc(vq->mpool);
+			if (m == NULL)
+				break;
+
+			/******************************************
+			*         Enqueue allocated buffers        *
+			*******************************************/
+			error = virtqueue_enqueue_recv_refill(vq, m);
+
+			if (error) {
+				rte_pktmbuf_free(m);
+				break;
+			}
+			nbufs++;
+		}
+
+		vq_update_avail_idx(vq);
+
+		PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
+
+		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
+			vq->vq_queue_index);
+		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
+			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	} else if (queue_type == VTNET_TQ) {
+		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
+			vq->vq_queue_index);
+		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
+			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	} else {
+		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
+			vq->vq_queue_index);
+		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
+			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+	}
+}
+
+void
+virtio_dev_cq_start(struct rte_eth_dev *dev)
+{
+	struct virtio_hw *hw = dev->data->dev_private;
+
+	if (hw->cvq) {
+		virtio_dev_vring_start(hw->cvq, VTNET_CQ);
+		VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq);
+	}
+}
+
+void
+virtio_dev_rxtx_start(struct rte_eth_dev *dev)
+{
+	/*
+	 * Start receive and transmit vrings
+	 * -	Setup vring structure for all queues
+	 * -	Initialize descriptor for the rx vring
+	 * -	Allocate blank mbufs for the each rx descriptor
+	 *
+	 */
+	int i;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Start rx vring. */
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		virtio_dev_vring_start(dev->data->rx_queues[i], VTNET_RQ);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
+	}
+
+	/* Start tx vring. */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		virtio_dev_vring_start(dev->data->tx_queues[i], VTNET_TQ);
+		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
+	}
+}
+
+int
+virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
+			uint16_t queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			__rte_unused const struct rte_eth_rxconf *rx_conf,
+			struct rte_mempool *mp)
+{
+	uint16_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_RQ_QUEUE_IDX;
+	struct virtqueue *vq;
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+	ret = virtio_dev_queue_setup(dev, VTNET_RQ, queue_idx, vtpci_queue_idx,
+			nb_desc, socket_id, &vq);
+	if (ret < 0) {
+		PMD_INIT_LOG(ERR, "tvq initialization failed");
+		return ret;
+	}
+
+	/* Create mempool for rx mbuf allocation */
+	vq->mpool = mp;
+
+	dev->data->rx_queues[queue_idx] = vq;
+	return 0;
+}
+
+/*
+ * struct rte_eth_dev *dev: Used to update dev
+ * uint16_t nb_desc: Defaults to values read from config space
+ * unsigned int socket_id: Used to allocate memzone
+ * const struct rte_eth_txconf *tx_conf: Used to setup tx engine
+ * uint16_t queue_idx: Just used as an index in dev txq list
+ */
+int
+virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
+			uint16_t queue_idx,
+			uint16_t nb_desc,
+			unsigned int socket_id,
+			const struct rte_eth_txconf *tx_conf)
+{
+	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+	struct virtqueue *vq;
+	uint16_t tx_free_thresh;
+	int ret;
+
+	PMD_INIT_FUNC_TRACE();
+
+	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
+	    != ETH_TXQ_FLAGS_NOXSUMS) {
+		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
+		return -EINVAL;
+	}
+
+	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
+			nb_desc, socket_id, &vq);
+	if (ret < 0) {
+		PMD_INIT_LOG(ERR, "rvq initialization failed");
+		return ret;
+	}
+
+	tx_free_thresh = tx_conf->tx_free_thresh;
+	if (tx_free_thresh == 0)
+		tx_free_thresh =
+			RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
+
+	if (tx_free_thresh >= (vq->vq_nentries - 3)) {
+		RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+			"number of TX entries minus 3 (%u)."
+			" (tx_free_thresh=%u port=%u queue=%u)\n",
+			vq->vq_nentries - 3,
+			tx_free_thresh, dev->data->port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	vq->vq_free_thresh = tx_free_thresh;
+
+	dev->data->tx_queues[queue_idx] = vq;
+	return 0;
+}
+
+static void
+virtio_discard_rxbuf(struct virtqueue *vq, struct rte_mbuf *m)
+{
+	int error;
+	/*
+	 * Requeue the discarded mbuf. This should always be
+	 * successful since it was just dequeued.
+	 */
+	error = virtqueue_enqueue_recv_refill(vq, m);
+	if (unlikely(error)) {
+		RTE_LOG(ERR, PMD, "cannot requeue discarded mbuf");
+		rte_pktmbuf_free(m);
+	}
+}
+
+#define VIRTIO_MBUF_BURST_SZ 64
+#define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
+uint16_t
+virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	struct virtio_hw *hw;
+	struct rte_mbuf *rxm, *new_mbuf;
+	uint16_t nb_used, num, nb_rx;
+	uint32_t len[VIRTIO_MBUF_BURST_SZ];
+	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
+	int error;
+	uint32_t i, nb_enqueued;
+	const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
+
+	nb_used = VIRTQUEUE_NUSED(rxvq);
+
+	virtio_rmb();
+
+	num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
+	num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : VIRTIO_MBUF_BURST_SZ);
+	if (likely(num > DESC_PER_CACHELINE))
+		num = num - ((rxvq->vq_used_cons_idx + num) % DESC_PER_CACHELINE);
+
+	if (num == 0)
+		return 0;
+
+	num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
+	PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
+
+	hw = rxvq->hw;
+	nb_rx = 0;
+	nb_enqueued = 0;
+
+	for (i = 0; i < num ; i++) {
+		rxm = rcv_pkts[i];
+
+		PMD_RX_LOG(DEBUG, "packet len:%d", len[i]);
+
+		if (unlikely(len[i] < hdr_size + ETHER_HDR_LEN)) {
+			PMD_RX_LOG(ERR, "Packet drop");
+			nb_enqueued++;
+			virtio_discard_rxbuf(rxvq, rxm);
+			rxvq->errors++;
+			continue;
+		}
+
+		rxm->port = rxvq->port_id;
+		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+
+		rxm->nb_segs = 1;
+		rxm->next = NULL;
+		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
+		rxm->data_len = (uint16_t)(len[i] - hdr_size);
+
+		if (hw->vlan_strip)
+			rte_vlan_strip(rxm);
+
+		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
+
+		rx_pkts[nb_rx++] = rxm;
+		rxvq->bytes += rx_pkts[nb_rx - 1]->pkt_len;
+	}
+
+	rxvq->packets += nb_rx;
+
+	/* Allocate new mbuf for the used descriptor */
+	error = ENOSPC;
+	while (likely(!virtqueue_full(rxvq))) {
+		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
+		if (unlikely(new_mbuf == NULL)) {
+			struct rte_eth_dev *dev
+				= &rte_eth_devices[rxvq->port_id];
+			dev->data->rx_mbuf_alloc_failed++;
+			break;
+		}
+		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
+		if (unlikely(error)) {
+			rte_pktmbuf_free(new_mbuf);
+			break;
+		}
+		nb_enqueued++;
+	}
+
+	if (likely(nb_enqueued)) {
+		vq_update_avail_idx(rxvq);
+
+		if (unlikely(virtqueue_kick_prepare(rxvq))) {
+			virtqueue_notify(rxvq);
+			PMD_RX_LOG(DEBUG, "Notified\n");
+		}
+	}
+
+	return nb_rx;
+}
+
+uint16_t
+virtio_recv_mergeable_pkts(void *rx_queue,
+			struct rte_mbuf **rx_pkts,
+			uint16_t nb_pkts)
+{
+	struct virtqueue *rxvq = rx_queue;
+	struct virtio_hw *hw;
+	struct rte_mbuf *rxm, *new_mbuf;
+	uint16_t nb_used, num, nb_rx;
+	uint32_t len[VIRTIO_MBUF_BURST_SZ];
+	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
+	struct rte_mbuf *prev;
+	int error;
+	uint32_t i, nb_enqueued;
+	uint32_t seg_num;
+	uint16_t extra_idx;
+	uint32_t seg_res;
+	const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+
+	nb_used = VIRTQUEUE_NUSED(rxvq);
+
+	virtio_rmb();
+
+	if (nb_used == 0)
+		return 0;
+
+	PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
+
+	hw = rxvq->hw;
+	nb_rx = 0;
+	i = 0;
+	nb_enqueued = 0;
+	seg_num = 0;
+	extra_idx = 0;
+	seg_res = 0;
+
+	while (i < nb_used) {
+		struct virtio_net_hdr_mrg_rxbuf *header;
+
+		if (nb_rx == nb_pkts)
+			break;
+
+		num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, 1);
+		if (num != 1)
+			continue;
+
+		i++;
+
+		PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
+		PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
+
+		rxm = rcv_pkts[0];
+
+		if (unlikely(len[0] < hdr_size + ETHER_HDR_LEN)) {
+			PMD_RX_LOG(ERR, "Packet drop\n");
+			nb_enqueued++;
+			virtio_discard_rxbuf(rxvq, rxm);
+			rxvq->errors++;
+			continue;
+		}
+
+		header = (struct virtio_net_hdr_mrg_rxbuf *)((char *)rxm->buf_addr +
+			RTE_PKTMBUF_HEADROOM - hdr_size);
+		seg_num = header->num_buffers;
+
+		if (seg_num == 0)
+			seg_num = 1;
+
+		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->nb_segs = seg_num;
+		rxm->next = NULL;
+		rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
+		rxm->data_len = (uint16_t)(len[0] - hdr_size);
+
+		rxm->port = rxvq->port_id;
+		rx_pkts[nb_rx] = rxm;
+		prev = rxm;
+
+		seg_res = seg_num - 1;
+
+		while (seg_res != 0) {
+			/*
+			 * Get extra segments for current uncompleted packet.
+			 */
+			uint16_t  rcv_cnt =
+				RTE_MIN(seg_res, RTE_DIM(rcv_pkts));
+			if (likely(VIRTQUEUE_NUSED(rxvq) >= rcv_cnt)) {
+				uint32_t rx_num =
+					virtqueue_dequeue_burst_rx(rxvq,
+					rcv_pkts, len, rcv_cnt);
+				i += rx_num;
+				rcv_cnt = rx_num;
+			} else {
+				PMD_RX_LOG(ERR,
+					"No enough segments for packet.\n");
+				nb_enqueued++;
+				virtio_discard_rxbuf(rxvq, rxm);
+				rxvq->errors++;
+				break;
+			}
+
+			extra_idx = 0;
+
+			while (extra_idx < rcv_cnt) {
+				rxm = rcv_pkts[extra_idx];
+
+				rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
+				rxm->next = NULL;
+				rxm->pkt_len = (uint32_t)(len[extra_idx]);
+				rxm->data_len = (uint16_t)(len[extra_idx]);
+
+				if (prev)
+					prev->next = rxm;
+
+				prev = rxm;
+				rx_pkts[nb_rx]->pkt_len += rxm->pkt_len;
+				extra_idx++;
+			};
+			seg_res -= rcv_cnt;
+		}
+
+		if (hw->vlan_strip)
+			rte_vlan_strip(rx_pkts[nb_rx]);
+
+		VIRTIO_DUMP_PACKET(rx_pkts[nb_rx],
+			rx_pkts[nb_rx]->data_len);
+
+		rxvq->bytes += rx_pkts[nb_rx]->pkt_len;
+		nb_rx++;
+	}
+
+	rxvq->packets += nb_rx;
+
+	/* Allocate new mbuf for the used descriptor */
+	error = ENOSPC;
+	while (likely(!virtqueue_full(rxvq))) {
+		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
+		if (unlikely(new_mbuf == NULL)) {
+			struct rte_eth_dev *dev
+				= &rte_eth_devices[rxvq->port_id];
+			dev->data->rx_mbuf_alloc_failed++;
+			break;
+		}
+		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
+		if (unlikely(error)) {
+			rte_pktmbuf_free(new_mbuf);
+			break;
+		}
+		nb_enqueued++;
+	}
+
+	if (likely(nb_enqueued)) {
+		vq_update_avail_idx(rxvq);
+
+		if (unlikely(virtqueue_kick_prepare(rxvq))) {
+			virtqueue_notify(rxvq);
+			PMD_RX_LOG(DEBUG, "Notified");
+		}
+	}
+
+	return nb_rx;
+}
+
+uint16_t
+virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct virtqueue *txvq = tx_queue;
+	struct rte_mbuf *txm;
+	uint16_t nb_used, nb_tx;
+	int error;
+
+	if (unlikely(nb_pkts < 1))
+		return nb_pkts;
+
+	PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
+	nb_used = VIRTQUEUE_NUSED(txvq);
+
+	virtio_rmb();
+	if (likely(nb_used > txvq->vq_free_thresh))
+		virtio_xmit_cleanup(txvq, nb_used);
+
+	nb_tx = 0;
+
+	while (nb_tx < nb_pkts) {
+		/* Need one more descriptor for virtio header. */
+		int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
+
+		/*Positive value indicates it need free vring descriptors */
+		if (unlikely(need > 0)) {
+			nb_used = VIRTQUEUE_NUSED(txvq);
+			virtio_rmb();
+			need = RTE_MIN(need, (int)nb_used);
+
+			virtio_xmit_cleanup(txvq, need);
+			need = (int)tx_pkts[nb_tx]->nb_segs -
+				txvq->vq_free_cnt + 1;
+		}
+
+		/*
+		 * Zero or negative value indicates it has enough free
+		 * descriptors to use for transmitting.
+		 */
+		if (likely(need <= 0)) {
+			txm = tx_pkts[nb_tx];
+
+			/* Do VLAN tag insertion */
+			if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
+				error = rte_vlan_insert(&txm);
+				if (unlikely(error)) {
+					rte_pktmbuf_free(txm);
+					++nb_tx;
+					continue;
+				}
+			}
+
+			/* Enqueue Packet buffers */
+			error = virtqueue_enqueue_xmit(txvq, txm);
+			if (unlikely(error)) {
+				if (error == ENOSPC)
+					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
+				else if (error == EMSGSIZE)
+					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
+				else
+					PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
+				break;
+			}
+			nb_tx++;
+			txvq->bytes += txm->pkt_len;
+		} else {
+			PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
+			break;
+		}
+	}
+
+	txvq->packets += nb_tx;
+
+	if (likely(nb_tx)) {
+		vq_update_avail_idx(txvq);
+
+		if (unlikely(virtqueue_kick_prepare(txvq))) {
+			virtqueue_notify(txvq);
+			PMD_TX_LOG(DEBUG, "Notified backend after xmit");
+		}
+	}
+
+	return nb_tx;
+}
diff --git a/drivers/virtio/virtqueue.c b/drivers/virtio/virtqueue.c
new file mode 100644
index 0000000..8a3005f
--- /dev/null
+++ b/drivers/virtio/virtqueue.c
@@ -0,0 +1,70 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+
+#include "virtqueue.h"
+#include "virtio_logs.h"
+#include "virtio_pci.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+/*
+ * Two types of mbuf to be cleaned:
+ * 1) mbuf that has been consumed by backend but not used by virtio.
+ * 2) mbuf that hasn't been consued by backend.
+ */
+struct rte_mbuf *
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_mbuf *cookie;
+	int idx;
+
+	for (idx = 0; idx < vq->vq_nentries; idx++) {
+		if ((cookie = vq->vq_descx[idx].cookie) != NULL) {
+			vq->vq_descx[idx].cookie = NULL;
+			return cookie;
+		}
+	}
+	return NULL;
+}
diff --git a/drivers/virtio/virtqueue.h b/drivers/virtio/virtqueue.h
new file mode 100644
index 0000000..9d6079e
--- /dev/null
+++ b/drivers/virtio/virtqueue.h
@@ -0,0 +1,325 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ * This driver is for virtio_pci on SMP and therefore can assume
+ * weaker (compiler barriers)
+ */
+#define virtio_mb()	rte_mb()
+#define virtio_rmb()	rte_compiler_barrier()
+#define virtio_wmb()	rte_compiler_barrier()
+
+#ifdef RTE_PMD_PACKET_PREFETCH
+#define rte_packet_prefetch(p)  rte_prefetch1(p)
+#else
+#define rte_packet_prefetch(p)  do {} while(0)
+#endif
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+#define RTE_MBUF_DATA_DMA_ADDR(mb) \
+	(uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
+
+#define VTNET_SQ_RQ_QUEUE_IDX 0
+#define VTNET_SQ_TQ_QUEUE_IDX 1
+#define VTNET_SQ_CQ_QUEUE_IDX 2
+
+enum { VTNET_RQ = 0, VTNET_TQ = 1, VTNET_CQ = 2 };
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+/**
+ * Control the RX mode, ie. promiscuous, allmulti, etc...
+ * All commands require an "out" sg entry containing a 1 byte
+ * state value, zero = disable, non-zero = enable.  Commands
+ * 0 and 1 are supported with the VIRTIO_NET_F_CTRL_RX feature.
+ * Commands 2-5 are added with VIRTIO_NET_F_CTRL_RX_EXTRA.
+ */
+#define VIRTIO_NET_CTRL_RX              0
+#define VIRTIO_NET_CTRL_RX_PROMISC      0
+#define VIRTIO_NET_CTRL_RX_ALLMULTI     1
+#define VIRTIO_NET_CTRL_RX_ALLUNI       2
+#define VIRTIO_NET_CTRL_RX_NOMULTI      3
+#define VIRTIO_NET_CTRL_RX_NOUNI        4
+#define VIRTIO_NET_CTRL_RX_NOBCAST      5
+
+/**
+ * Control the MAC
+ *
+ * The MAC filter table is managed by the hypervisor, the guest should
+ * assume the size is infinite.  Filtering should be considered
+ * non-perfect, ie. based on hypervisor resources, the guest may
+ * received packets from sources not specified in the filter list.
+ *
+ * In addition to the class/cmd header, the TABLE_SET command requires
+ * two out scatterlists.  Each contains a 4 byte count of entries followed
+ * by a concatenated byte stream of the ETH_ALEN MAC addresses.  The
+ * first sg list contains unicast addresses, the second is for multicast.
+ * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
+ * is available.
+ *
+ * The ADDR_SET command requests one out scatterlist, it contains a
+ * 6 bytes MAC address. This functionality is present if the
+ * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
+ */
+struct virtio_net_ctrl_mac {
+	uint32_t entries;
+	uint8_t macs[][ETHER_ADDR_LEN];
+} __attribute__((__packed__));
+
+#define VIRTIO_NET_CTRL_MAC    1
+ #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0
+ #define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
+
+/**
+ * Control VLAN filtering
+ *
+ * The VLAN filter table is controlled via a simple ADD/DEL interface.
+ * VLAN IDs not added may be filtered by the hypervisor.  Del is the
+ * opposite of add.  Both commands expect an out entry containing a 2
+ * byte VLAN ID.  VLAN filtering is available with the
+ * VIRTIO_NET_F_CTRL_VLAN feature bit.
+ */
+#define VIRTIO_NET_CTRL_VLAN     2
+#define VIRTIO_NET_CTRL_VLAN_ADD 0
+#define VIRTIO_NET_CTRL_VLAN_DEL 1
+
+struct virtio_net_ctrl_hdr {
+	uint8_t class;
+	uint8_t cmd;
+} __attribute__((packed));
+
+typedef uint8_t virtio_net_ctrl_ack;
+
+#define VIRTIO_NET_OK     0
+#define VIRTIO_NET_ERR    1
+
+#define VIRTIO_MAX_CTRL_DATA 2048
+
+struct virtio_pmd_ctrl {
+	struct virtio_net_ctrl_hdr hdr;
+	virtio_net_ctrl_ack status;
+	uint8_t data[VIRTIO_MAX_CTRL_DATA];
+};
+
+struct virtqueue {
+	struct virtio_hw         *hw;     /**< virtio_hw structure pointer. */
+	const struct rte_memzone *mz;     /**< mem zone to populate RX ring. */
+	const struct rte_memzone *virtio_net_hdr_mz; /**< memzone to populate hdr. */
+	struct rte_mempool       *mpool;  /**< mempool for mbuf allocation */
+	uint16_t    queue_id;             /**< DPDK queue index. */
+	uint8_t     port_id;              /**< Device port identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+	uint16_t    vq_free_thresh; /**< free threshold */
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
+
+	/* Statistics */
+	uint64_t	packets;
+	uint64_t	bytes;
+	uint64_t	errors;
+
+	struct vq_desc_extra {
+		void              *cookie;
+		uint16_t          ndescs;
+	} vq_descx[0];
+};
+
+/* If multiqueue is provided by host, then we suppport it. */
+#ifndef VIRTIO_NET_F_MQ
+/* Device supports Receive Flow Steering */
+#define VIRTIO_NET_F_MQ 0x400000
+#define VIRTIO_NET_CTRL_MQ   4
+#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
+#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
+#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
+#endif
+#ifndef VIRTIO_NET_F_CTRL_MAC_ADDR
+#define VIRTIO_NET_F_CTRL_MAC_ADDR 0x800000
+#define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
+#endif
+
+/**
+ * This is the first element of the scatter-gather list.  If you don't
+ * specify GSO or CSUM features, you can simply ignore the header.
+ */
+struct virtio_net_hdr {
+#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
+	uint8_t flags;
+#define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
+#define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
+#define VIRTIO_NET_HDR_GSO_UDP      3    /**< GSO frame, IPv4 UDP (UFO) */
+#define VIRTIO_NET_HDR_GSO_TCPV6    4    /**< GSO frame, IPv6 TCP */
+#define VIRTIO_NET_HDR_GSO_ECN      0x80 /**< TCP has ECN set */
+	uint8_t gso_type;
+	uint16_t hdr_len;     /**< Ethernet + IP + tcp/udp hdrs */
+	uint16_t gso_size;    /**< Bytes to append to hdr_len per frame */
+	uint16_t csum_start;  /**< Position to start checksumming from */
+	uint16_t csum_offset; /**< Offset after that to place checksum */
+};
+
+/**
+ * This is the version of the header to use when the MRG_RXBUF
+ * feature has been negotiated.
+ */
+struct virtio_net_hdr_mrg_rxbuf {
+	struct   virtio_net_hdr hdr;
+	uint16_t num_buffers; /**< Number of merged rx buffers */
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+/**
+ *  Dump virtqueue internal structures, for debug purpose only.
+ */
+void virtqueue_dump(struct virtqueue *vq);
+/**
+ *  Get all mbufs to be freed.
+ */
+struct rte_mbuf *virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notificaiton is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_NOTIFY, vq->vq_queue_index);
+}
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	PMD_INIT_LOG(DEBUG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
diff --git a/lib/Makefile b/lib/Makefile
index 68b6706..d0e7fa4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,7 +41,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_TIMER) += librte_timer
 DIRS-$(CONFIG_RTE_LIBRTE_CFGFILE) += librte_cfgfile
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
 DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
-DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
 DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
diff --git a/lib/librte_pmd_virtio/Makefile b/lib/librte_pmd_virtio/Makefile
deleted file mode 100644
index 21ff7e5..0000000
--- a/lib/librte_pmd_virtio/Makefile
+++ /dev/null
@@ -1,60 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-#
-# library name
-#
-LIB = librte_pmd_virtio.a
-
-CFLAGS += -O3
-CFLAGS += $(WERROR_FLAGS)
-
-EXPORT_MAP := rte_pmd_virtio_version.map
-
-LIBABIVER := 1
-
-#
-# all source are stored in SRCS-y
-#
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
-
-# this lib depends upon:
-DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
-DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
-DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_net lib/librte_malloc
-
-include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pmd_virtio/rte_pmd_virtio_version.map b/lib/librte_pmd_virtio/rte_pmd_virtio_version.map
deleted file mode 100644
index ef35398..0000000
--- a/lib/librte_pmd_virtio/rte_pmd_virtio_version.map
+++ /dev/null
@@ -1,4 +0,0 @@
-DPDK_2.0 {
-
-	local: *;
-};
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
deleted file mode 100644
index e63dbfb..0000000
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ /dev/null
@@ -1,1504 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdint.h>
-#include <string.h>
-#include <stdio.h>
-#include <errno.h>
-#include <unistd.h>
-#ifdef RTE_EXEC_ENV_LINUXAPP
-#include <dirent.h>
-#include <fcntl.h>
-#endif
-
-#include <rte_ethdev.h>
-#include <rte_memcpy.h>
-#include <rte_string_fns.h>
-#include <rte_memzone.h>
-#include <rte_malloc.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_pci.h>
-#include <rte_ether.h>
-#include <rte_common.h>
-
-#include <rte_memory.h>
-#include <rte_eal.h>
-#include <rte_dev.h>
-
-#include "virtio_ethdev.h"
-#include "virtio_pci.h"
-#include "virtio_logs.h"
-#include "virtqueue.h"
-
-
-static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
-static int  virtio_dev_configure(struct rte_eth_dev *dev);
-static int  virtio_dev_start(struct rte_eth_dev *dev);
-static void virtio_dev_stop(struct rte_eth_dev *dev);
-static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
-static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
-static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
-static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
-static void virtio_dev_info_get(struct rte_eth_dev *dev,
-				struct rte_eth_dev_info *dev_info);
-static int virtio_dev_link_update(struct rte_eth_dev *dev,
-	__rte_unused int wait_to_complete);
-
-static void virtio_set_hwaddr(struct virtio_hw *hw);
-static void virtio_get_hwaddr(struct virtio_hw *hw);
-
-static void virtio_dev_rx_queue_release(__rte_unused void *rxq);
-static void virtio_dev_tx_queue_release(__rte_unused void *txq);
-
-static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
-static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
-static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
-static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
-				uint16_t vlan_id, int on);
-static void virtio_mac_addr_add(struct rte_eth_dev *dev,
-				struct ether_addr *mac_addr,
-				uint32_t index, uint32_t vmdq __rte_unused);
-static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
-static void virtio_mac_addr_set(struct rte_eth_dev *dev,
-				struct ether_addr *mac_addr);
-
-static int virtio_dev_queue_stats_mapping_set(
-	__rte_unused struct rte_eth_dev *eth_dev,
-	__rte_unused uint16_t queue_id,
-	__rte_unused uint8_t stat_idx,
-	__rte_unused uint8_t is_rx);
-
-/*
- * The set of PCI devices this driver supports
- */
-static const struct rte_pci_id pci_id_virtio_map[] = {
-
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
-#include "rte_pci_dev_ids.h"
-
-{ .vendor_id = 0, /* sentinel */ },
-};
-
-static int
-virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
-		int *dlen, int pkt_num)
-{
-	uint16_t head = vq->vq_desc_head_idx, i;
-	int k, sum = 0;
-	virtio_net_ctrl_ack status = ~0;
-	struct virtio_pmd_ctrl result;
-
-	ctrl->status = status;
-
-	if (!vq->hw->cvq) {
-		PMD_INIT_LOG(ERR,
-			     "%s(): Control queue is not supported.",
-			     __func__);
-		return -1;
-	}
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
-		"vq->hw->cvq = %p vq = %p",
-		vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
-
-	if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1))
-		return -1;
-
-	memcpy(vq->virtio_net_hdr_mz->addr, ctrl,
-		sizeof(struct virtio_pmd_ctrl));
-
-	/*
-	 * Format is enforced in qemu code:
-	 * One TX packet for header;
-	 * At least one TX packet per argument;
-	 * One RX packet for ACK.
-	 */
-	vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
-	vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
-	vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_free_cnt--;
-	i = vq->vq_ring.desc[head].next;
-
-	for (k = 0; k < pkt_num; k++) {
-		vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
-		vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
-			+ sizeof(struct virtio_net_ctrl_hdr)
-			+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
-		vq->vq_ring.desc[i].len = dlen[k];
-		sum += dlen[k];
-		vq->vq_free_cnt--;
-		i = vq->vq_ring.desc[i].next;
-	}
-
-	vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
-	vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
-			+ sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_ring.desc[i].len = sizeof(ctrl->status);
-	vq->vq_free_cnt--;
-
-	vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
-
-	vq_update_avail_ring(vq, head);
-	vq_update_avail_idx(vq);
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index);
-
-	virtqueue_notify(vq);
-
-	rte_rmb();
-	while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
-		rte_rmb();
-		usleep(100);
-	}
-
-	while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
-		uint32_t idx, desc_idx, used_idx;
-		struct vring_used_elem *uep;
-
-		used_idx = (uint32_t)(vq->vq_used_cons_idx
-				& (vq->vq_nentries - 1));
-		uep = &vq->vq_ring.used->ring[used_idx];
-		idx = (uint32_t) uep->id;
-		desc_idx = idx;
-
-		while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
-			desc_idx = vq->vq_ring.desc[desc_idx].next;
-			vq->vq_free_cnt++;
-		}
-
-		vq->vq_ring.desc[desc_idx].next = vq->vq_desc_head_idx;
-		vq->vq_desc_head_idx = idx;
-
-		vq->vq_used_cons_idx++;
-		vq->vq_free_cnt++;
-	}
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d",
-			vq->vq_free_cnt, vq->vq_desc_head_idx);
-
-	memcpy(&result, vq->virtio_net_hdr_mz->addr,
-			sizeof(struct virtio_pmd_ctrl));
-
-	return result.status;
-}
-
-static int
-virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
-	memcpy(ctrl.data, &nb_queues, sizeof(uint16_t));
-
-	dlen[0] = sizeof(uint16_t);
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret) {
-		PMD_INIT_LOG(ERR, "Multiqueue configured but send command "
-			  "failed, this is too late now...");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-int virtio_dev_queue_setup(struct rte_eth_dev *dev,
-			int queue_type,
-			uint16_t queue_idx,
-			uint16_t  vtpci_queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			struct virtqueue **pvq)
-{
-	char vq_name[VIRTQUEUE_MAX_NAME_SZ];
-	const struct rte_memzone *mz;
-	uint16_t vq_size;
-	int size;
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtqueue  *vq = NULL;
-
-	/* Write the virtqueue index to the Queue Select Field */
-	VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
-	PMD_INIT_LOG(DEBUG, "selecting queue: %d", vtpci_queue_idx);
-
-	/*
-	 * Read the virtqueue size from the Queue Size field
-	 * Always power of 2 and if 0 virtqueue does not exist
-	 */
-	vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
-	PMD_INIT_LOG(DEBUG, "vq_size: %d nb_desc:%d", vq_size, nb_desc);
-	if (nb_desc == 0)
-		nb_desc = vq_size;
-	if (vq_size == 0) {
-		PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
-		return -EINVAL;
-	} else if (!rte_is_power_of_2(vq_size)) {
-		PMD_INIT_LOG(ERR, "%s: virtqueue size is not powerof 2", __func__);
-		return -EINVAL;
-	} else if (nb_desc != vq_size) {
-		PMD_INIT_LOG(ERR, "Warning: nb_desc(%d) is not equal to vq size (%d), fall to vq size",
-			nb_desc, vq_size);
-		nb_desc = vq_size;
-	}
-
-	if (queue_type == VTNET_RQ) {
-		snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
-			dev->data->port_id, queue_idx);
-		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
-	} else if (queue_type == VTNET_TQ) {
-		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
-			dev->data->port_id, queue_idx);
-		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-			vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
-	} else if (queue_type == VTNET_CQ) {
-		snprintf(vq_name, sizeof(vq_name), "port%d_cvq",
-			dev->data->port_id);
-		vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-			vq_size * sizeof(struct vq_desc_extra),
-			RTE_CACHE_LINE_SIZE);
-	}
-	if (vq == NULL) {
-		PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
-		return (-ENOMEM);
-	}
-
-	vq->hw = hw;
-	vq->port_id = dev->data->port_id;
-	vq->queue_id = queue_idx;
-	vq->vq_queue_index = vtpci_queue_idx;
-	vq->vq_nentries = vq_size;
-	vq->vq_free_cnt = vq_size;
-
-	/*
-	 * Reserve a memzone for vring elements
-	 */
-	size = vring_size(vq_size, VIRTIO_PCI_VRING_ALIGN);
-	vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
-	PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d", size, vq->vq_ring_size);
-
-	mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
-		socket_id, 0, VIRTIO_PCI_VRING_ALIGN);
-	if (mz == NULL) {
-		rte_free(vq);
-		return -ENOMEM;
-	}
-
-	/*
-	 * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
-	 * and only accepts 32 bit page frame number.
-	 * Check if the allocated physical memory exceeds 16TB.
-	 */
-	if ((mz->phys_addr + vq->vq_ring_size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
-		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
-		rte_free(vq);
-		return -ENOMEM;
-	}
-
-	memset(mz->addr, 0, sizeof(mz->len));
-	vq->mz = mz;
-	vq->vq_ring_mem = mz->phys_addr;
-	vq->vq_ring_virt_mem = mz->addr;
-	PMD_INIT_LOG(DEBUG, "vq->vq_ring_mem:      0x%"PRIx64, (uint64_t)mz->phys_addr);
-	PMD_INIT_LOG(DEBUG, "vq->vq_ring_virt_mem: 0x%"PRIx64, (uint64_t)mz->addr);
-	vq->virtio_net_hdr_mz  = NULL;
-	vq->virtio_net_hdr_mem = 0;
-
-	if (queue_type == VTNET_TQ) {
-		/*
-		 * For each xmit packet, allocate a virtio_net_hdr
-		 */
-		snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
-			dev->data->port_id, queue_idx);
-		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
-			vq_size * hw->vtnet_hdr_size,
-			socket_id, 0, RTE_CACHE_LINE_SIZE);
-		if (vq->virtio_net_hdr_mz == NULL) {
-			rte_free(vq);
-			return -ENOMEM;
-		}
-		vq->virtio_net_hdr_mem =
-			vq->virtio_net_hdr_mz->phys_addr;
-		memset(vq->virtio_net_hdr_mz->addr, 0,
-			vq_size * hw->vtnet_hdr_size);
-	} else if (queue_type == VTNET_CQ) {
-		/* Allocate a page for control vq command, data and status */
-		snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
-			dev->data->port_id);
-		vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
-			PAGE_SIZE, socket_id, 0, RTE_CACHE_LINE_SIZE);
-		if (vq->virtio_net_hdr_mz == NULL) {
-			rte_free(vq);
-			return -ENOMEM;
-		}
-		vq->virtio_net_hdr_mem =
-			vq->virtio_net_hdr_mz->phys_addr;
-		memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
-	}
-
-	/*
-	 * Set guest physical address of the virtqueue
-	 * in VIRTIO_PCI_QUEUE_PFN config register of device
-	 */
-	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
-			mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	*pvq = vq;
-	return 0;
-}
-
-static int
-virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx,
-		uint32_t socket_id)
-{
-	struct virtqueue *vq;
-	uint16_t nb_desc = 0;
-	int ret;
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	PMD_INIT_FUNC_TRACE();
-	ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
-			vtpci_queue_idx, nb_desc, socket_id, &vq);
-
-	if (ret < 0) {
-		PMD_INIT_LOG(ERR, "control vq initialization failed");
-		return ret;
-	}
-
-	hw->cvq = vq;
-	return 0;
-}
-
-static void
-virtio_dev_close(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct rte_pci_device *pci_dev = dev->pci_dev;
-
-	PMD_INIT_LOG(DEBUG, "virtio_dev_close");
-
-	/* reset the NIC */
-	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-		vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
-	vtpci_reset(hw);
-	hw->started = 0;
-	virtio_dev_free_mbufs(dev);
-}
-
-static void
-virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
-	ctrl.data[0] = 1;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to enable promisc");
-}
-
-static void
-virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
-	ctrl.data[0] = 0;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to disable promisc");
-}
-
-static void
-virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
-	ctrl.data[0] = 1;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to enable allmulticast");
-}
-
-static void
-virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int dlen[1];
-	int ret;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
-	ctrl.data[0] = 0;
-	dlen[0] = 1;
-
-	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
-
-	if (ret)
-		PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
-}
-
-/*
- * dev_ops for virtio, bare necessities for basic operation
- */
-static const struct eth_dev_ops virtio_eth_dev_ops = {
-	.dev_configure           = virtio_dev_configure,
-	.dev_start               = virtio_dev_start,
-	.dev_stop                = virtio_dev_stop,
-	.dev_close               = virtio_dev_close,
-	.promiscuous_enable      = virtio_dev_promiscuous_enable,
-	.promiscuous_disable     = virtio_dev_promiscuous_disable,
-	.allmulticast_enable     = virtio_dev_allmulticast_enable,
-	.allmulticast_disable    = virtio_dev_allmulticast_disable,
-
-	.dev_infos_get           = virtio_dev_info_get,
-	.stats_get               = virtio_dev_stats_get,
-	.stats_reset             = virtio_dev_stats_reset,
-	.link_update             = virtio_dev_link_update,
-	.rx_queue_setup          = virtio_dev_rx_queue_setup,
-	/* meaningfull only to multiple queue */
-	.rx_queue_release        = virtio_dev_rx_queue_release,
-	.tx_queue_setup          = virtio_dev_tx_queue_setup,
-	/* meaningfull only to multiple queue */
-	.tx_queue_release        = virtio_dev_tx_queue_release,
-	/* collect stats per queue */
-	.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
-	.vlan_filter_set         = virtio_vlan_filter_set,
-	.mac_addr_add            = virtio_mac_addr_add,
-	.mac_addr_remove         = virtio_mac_addr_remove,
-	.mac_addr_set            = virtio_mac_addr_set,
-};
-
-static inline int
-virtio_dev_atomic_read_link_status(struct rte_eth_dev *dev,
-				struct rte_eth_link *link)
-{
-	struct rte_eth_link *dst = link;
-	struct rte_eth_link *src = &(dev->data->dev_link);
-
-	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
-			*(uint64_t *)src) == 0)
-		return -1;
-
-	return 0;
-}
-
-/**
- * Atomically writes the link status information into global
- * structure rte_eth_dev.
- *
- * @param dev
- *   - Pointer to the structure rte_eth_dev to read from.
- *   - Pointer to the buffer to be saved with the link status.
- *
- * @return
- *   - On success, zero.
- *   - On failure, negative value.
- */
-static inline int
-virtio_dev_atomic_write_link_status(struct rte_eth_dev *dev,
-		struct rte_eth_link *link)
-{
-	struct rte_eth_link *dst = &(dev->data->dev_link);
-	struct rte_eth_link *src = link;
-
-	if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
-					*(uint64_t *)src) == 0)
-		return -1;
-
-	return 0;
-}
-
-static void
-virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
-{
-	unsigned i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		const struct virtqueue *txvq = dev->data->tx_queues[i];
-		if (txvq == NULL)
-			continue;
-
-		stats->opackets += txvq->packets;
-		stats->obytes += txvq->bytes;
-		stats->oerrors += txvq->errors;
-
-		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			stats->q_opackets[i] = txvq->packets;
-			stats->q_obytes[i] = txvq->bytes;
-		}
-	}
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		const struct virtqueue *rxvq = dev->data->rx_queues[i];
-		if (rxvq == NULL)
-			continue;
-
-		stats->ipackets += rxvq->packets;
-		stats->ibytes += rxvq->bytes;
-		stats->ierrors += rxvq->errors;
-
-		if (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			stats->q_ipackets[i] = rxvq->packets;
-			stats->q_ibytes[i] = rxvq->bytes;
-		}
-	}
-
-	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
-}
-
-static void
-virtio_dev_stats_reset(struct rte_eth_dev *dev)
-{
-	unsigned int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		struct virtqueue *txvq = dev->data->tx_queues[i];
-		if (txvq == NULL)
-			continue;
-
-		txvq->packets = 0;
-		txvq->bytes = 0;
-		txvq->errors = 0;
-	}
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		struct virtqueue *rxvq = dev->data->rx_queues[i];
-		if (rxvq == NULL)
-			continue;
-
-		rxvq->packets = 0;
-		rxvq->bytes = 0;
-		rxvq->errors = 0;
-	}
-
-	dev->data->rx_mbuf_alloc_failed = 0;
-}
-
-static void
-virtio_set_hwaddr(struct virtio_hw *hw)
-{
-	vtpci_write_dev_config(hw,
-			offsetof(struct virtio_net_config, mac),
-			&hw->mac_addr, ETHER_ADDR_LEN);
-}
-
-static void
-virtio_get_hwaddr(struct virtio_hw *hw)
-{
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC)) {
-		vtpci_read_dev_config(hw,
-			offsetof(struct virtio_net_config, mac),
-			&hw->mac_addr, ETHER_ADDR_LEN);
-	} else {
-		eth_random_addr(&hw->mac_addr[0]);
-		virtio_set_hwaddr(hw);
-	}
-}
-
-static int
-virtio_mac_table_set(struct virtio_hw *hw,
-		     const struct virtio_net_ctrl_mac *uc,
-		     const struct virtio_net_ctrl_mac *mc)
-{
-	struct virtio_pmd_ctrl ctrl;
-	int err, len[2];
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
-	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
-
-	len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
-	memcpy(ctrl.data, uc, len[0]);
-
-	len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
-	memcpy(ctrl.data + len[0], mc, len[1]);
-
-	err = virtio_send_command(hw->cvq, &ctrl, len, 2);
-	if (err != 0)
-		PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
-
-	return err;
-}
-
-static void
-virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
-		    uint32_t index, uint32_t vmdq __rte_unused)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	const struct ether_addr *addrs = dev->data->mac_addrs;
-	unsigned int i;
-	struct virtio_net_ctrl_mac *uc, *mc;
-
-	if (index >= VIRTIO_MAX_MAC_ADDRS) {
-		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
-		return;
-	}
-
-	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
-	uc->entries = 0;
-	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
-	mc->entries = 0;
-
-	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
-		const struct ether_addr *addr
-			= (i == index) ? mac_addr : addrs + i;
-		struct virtio_net_ctrl_mac *tbl
-			= is_multicast_ether_addr(addr) ? mc : uc;
-
-		memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
-	}
-
-	virtio_mac_table_set(hw, uc, mc);
-}
-
-static void
-virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct ether_addr *addrs = dev->data->mac_addrs;
-	struct virtio_net_ctrl_mac *uc, *mc;
-	unsigned int i;
-
-	if (index >= VIRTIO_MAX_MAC_ADDRS) {
-		PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
-		return;
-	}
-
-	uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(uc->entries));
-	uc->entries = 0;
-	mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + sizeof(mc->entries));
-	mc->entries = 0;
-
-	for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
-		struct virtio_net_ctrl_mac *tbl;
-
-		if (i == index || is_zero_ether_addr(addrs + i))
-			continue;
-
-		tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
-		memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_ADDR_LEN);
-	}
-
-	virtio_mac_table_set(hw, uc, mc);
-}
-
-static void
-virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
-
-	/* Use atomic update if available */
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
-		struct virtio_pmd_ctrl ctrl;
-		int len = ETHER_ADDR_LEN;
-
-		ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
-		ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
-
-		memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
-		virtio_send_command(hw->cvq, &ctrl, &len, 1);
-	} else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
-		virtio_set_hwaddr(hw);
-}
-
-static int
-virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct virtio_pmd_ctrl ctrl;
-	int len;
-
-	if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
-		return -ENOTSUP;
-
-	ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
-	ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
-	memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
-	len = sizeof(vlan_id);
-
-	return virtio_send_command(hw->cvq, &ctrl, &len, 1);
-}
-
-static void
-virtio_negotiate_features(struct virtio_hw *hw)
-{
-	uint32_t host_features, mask;
-
-	/* checksum offload not implemented */
-	mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
-
-	/* TSO and LRO are only available when their corresponding
-	 * checksum offload feature is also negotiated.
-	 */
-	mask |= VIRTIO_NET_F_HOST_TSO4 | VIRTIO_NET_F_HOST_TSO6 | VIRTIO_NET_F_HOST_ECN;
-	mask |= VIRTIO_NET_F_GUEST_TSO4 | VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN;
-	mask |= VTNET_LRO_FEATURES;
-
-	/* not negotiating INDIRECT descriptor table support */
-	mask |= VIRTIO_RING_F_INDIRECT_DESC;
-
-	/* Prepare guest_features: feature that driver wants to support */
-	hw->guest_features = VTNET_FEATURES & ~mask;
-	PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
-		hw->guest_features);
-
-	/* Read device(host) feature bits */
-	host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
-	PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
-		host_features);
-
-	/*
-	 * Negotiate features: Subset of device feature bits are written back
-	 * guest feature bits.
-	 */
-	hw->guest_features = vtpci_negotiate_features(hw, host_features);
-	PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
-		hw->guest_features);
-}
-
-#ifdef RTE_EXEC_ENV_LINUXAPP
-static int
-parse_sysfs_value(const char *filename, unsigned long *val)
-{
-	FILE *f;
-	char buf[BUFSIZ];
-	char *end = NULL;
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
-			     __func__, filename);
-		return -1;
-	}
-
-	if (fgets(buf, sizeof(buf), f) == NULL) {
-		PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
-			     __func__, filename);
-		fclose(f);
-		return -1;
-	}
-	*val = strtoul(buf, &end, 0);
-	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-		PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
-			     __func__, filename);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-	return 0;
-}
-
-static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
-			unsigned int *uio_num)
-{
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-	snprintf(dirname, sizeof(dirname),
-		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-		     loc->domain, loc->bus, loc->devid, loc->function);
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		snprintf(dirname, sizeof(dirname),
-			     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-			     loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1;
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		*uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		*uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			snprintf(buf, buflen, "%s/uio:uio%u", dirname,
-				     *uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL) {
-		PMD_INIT_LOG(ERR, "Could not find uio resource");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-virtio_has_msix(const struct rte_pci_addr *loc)
-{
-	DIR *d;
-	char dirname[PATH_MAX];
-
-	snprintf(dirname, sizeof(dirname),
-		     SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
-		     loc->domain, loc->bus, loc->devid, loc->function);
-
-	d = opendir(dirname);
-	if (d)
-		closedir(d);
-
-	return (d != NULL);
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
-{
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	unsigned long start, size;
-	unsigned int uio_num;
-
-	if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0)
-		return -1;
-
-	/* get portio size */
-	snprintf(filename, sizeof(filename),
-		     "%s/portio/port0/size", dirname);
-	if (parse_sysfs_value(filename, &size) < 0) {
-		PMD_INIT_LOG(ERR, "%s(): cannot parse size",
-			     __func__);
-		return -1;
-	}
-
-	/* get portio start */
-	snprintf(filename, sizeof(filename),
-		 "%s/portio/port0/start", dirname);
-	if (parse_sysfs_value(filename, &start) < 0) {
-		PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
-			     __func__);
-		return -1;
-	}
-	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
-	pci_dev->mem_resource[0].len =  (uint64_t)size;
-	PMD_INIT_LOG(DEBUG,
-		     "PCI Port IO found start=0x%lx with size=0x%lx",
-		     start, size);
-
-	/* save fd */
-	memset(dirname, 0, sizeof(dirname));
-	snprintf(dirname, sizeof(dirname), "/dev/uio%u", uio_num);
-	pci_dev->intr_handle.fd = open(dirname, O_RDWR);
-	if (pci_dev->intr_handle.fd < 0) {
-		PMD_INIT_LOG(ERR, "Cannot open %s: %s\n",
-			dirname, strerror(errno));
-		return -1;
-	}
-
-	pci_dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-	pci_dev->driver->drv_flags |= RTE_PCI_DRV_INTR_LSC;
-
-	return 0;
-}
-
-/* Extract port I/O numbers from proc/ioports */
-static int virtio_resource_init_by_ioports(struct rte_pci_device *pci_dev)
-{
-	uint16_t start, end;
-	int size;
-	FILE *fp;
-	char *line = NULL;
-	char pci_id[16];
-	int found = 0;
-	size_t linesz;
-
-	snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
-		 pci_dev->addr.domain,
-		 pci_dev->addr.bus,
-		 pci_dev->addr.devid,
-		 pci_dev->addr.function);
-
-	fp = fopen("/proc/ioports", "r");
-	if (fp == NULL) {
-		PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
-		return -1;
-	}
-
-	while (getdelim(&line, &linesz, '\n', fp) > 0) {
-		char *ptr = line;
-		char *left;
-		int n;
-
-		n = strcspn(ptr, ":");
-		ptr[n] = 0;
-		left = &ptr[n+1];
-
-		while (*left && isspace(*left))
-			left++;
-
-		if (!strncmp(left, pci_id, strlen(pci_id))) {
-			found = 1;
-
-			while (*ptr && isspace(*ptr))
-				ptr++;
-
-			sscanf(ptr, "%04hx-%04hx", &start, &end);
-			size = end - start + 1;
-
-			break;
-		}
-	}
-
-	free(line);
-	fclose(fp);
-
-	if (!found)
-		return -1;
-
-	pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
-	pci_dev->mem_resource[0].len =  (uint64_t)size;
-	PMD_INIT_LOG(DEBUG,
-		"PCI Port IO found start=0x%x with size=0x%x",
-		start, size);
-
-	/* can't support lsc interrupt without uio */
-	pci_dev->driver->drv_flags &= ~RTE_PCI_DRV_INTR_LSC;
-
-	return 0;
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init(struct rte_pci_device *pci_dev)
-{
-	if (virtio_resource_init_by_uio(pci_dev) == 0)
-		return 0;
-	else
-		return virtio_resource_init_by_ioports(pci_dev);
-}
-
-#else
-static int
-virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
-{
-	/* nic_uio does not enable interrupts, return 0 (false). */
-	return 0;
-}
-
-static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused)
-{
-	/* no setup required */
-	return 0;
-}
-#endif
-
-/*
- * Process Virtio Config changed interrupt and call the callback
- * if link state changed.
- */
-static void
-virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
-			 void *param)
-{
-	struct rte_eth_dev *dev = param;
-	struct virtio_hw *hw = dev->data->dev_private;
-	uint8_t isr;
-
-	/* Read interrupt status which clears interrupt */
-	isr = vtpci_isr(hw);
-	PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
-
-	if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
-		PMD_DRV_LOG(ERR, "interrupt enable failed");
-
-	if (isr & VIRTIO_PCI_ISR_CONFIG) {
-		if (virtio_dev_link_update(dev, 0) == 0)
-			_rte_eth_dev_callback_process(dev,
-						      RTE_ETH_EVENT_INTR_LSC);
-	}
-
-}
-
-static void
-rx_func_get(struct rte_eth_dev *eth_dev)
-{
-	struct virtio_hw *hw = eth_dev->data->dev_private;
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
-		eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
-	else
-		eth_dev->rx_pkt_burst = &virtio_recv_pkts;
-}
-
-/*
- * This function is based on probe() function in virtio_pci.c
- * It returns 0 on success.
- */
-static int
-eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
-{
-	struct virtio_hw *hw = eth_dev->data->dev_private;
-	struct virtio_net_config *config;
-	struct virtio_net_config local_config;
-	uint32_t offset_conf = sizeof(config->mac);
-	struct rte_pci_device *pci_dev;
-
-	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
-
-	eth_dev->dev_ops = &virtio_eth_dev_ops;
-	eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		rx_func_get(eth_dev);
-		return 0;
-	}
-
-	/* Allocate memory for storing MAC addresses */
-	eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
-	if (eth_dev->data->mac_addrs == NULL) {
-		PMD_INIT_LOG(ERR,
-			"Failed to allocate %d bytes needed to store MAC addresses",
-			ETHER_ADDR_LEN);
-		return -ENOMEM;
-	}
-
-	pci_dev = eth_dev->pci_dev;
-	if (virtio_resource_init(pci_dev) < 0)
-		return -1;
-
-	hw->use_msix = virtio_has_msix(&pci_dev->addr);
-	hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
-
-	/* Reset the device although not necessary at startup */
-	vtpci_reset(hw);
-
-	/* Tell the host we've noticed this device. */
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
-	/* Tell the host we've known how to drive the device. */
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-	virtio_negotiate_features(hw);
-
-	rx_func_get(eth_dev);
-
-	/* Setting up rx_header size for the device */
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
-		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-	else
-		hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
-
-	/* Copy the permanent MAC address to: virtio_hw */
-	virtio_get_hwaddr(hw);
-	ether_addr_copy((struct ether_addr *) hw->mac_addr,
-			&eth_dev->data->mac_addrs[0]);
-	PMD_INIT_LOG(DEBUG,
-		     "PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
-		     hw->mac_addr[0], hw->mac_addr[1], hw->mac_addr[2],
-		     hw->mac_addr[3], hw->mac_addr[4], hw->mac_addr[5]);
-
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
-		config = &local_config;
-
-		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			offset_conf += sizeof(config->status);
-		} else {
-			PMD_INIT_LOG(DEBUG,
-				     "VIRTIO_NET_F_STATUS is not supported");
-			config->status = 0;
-		}
-
-		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
-			offset_conf += sizeof(config->max_virtqueue_pairs);
-		} else {
-			PMD_INIT_LOG(DEBUG,
-				     "VIRTIO_NET_F_MQ is not supported");
-			config->max_virtqueue_pairs = 1;
-		}
-
-		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
-
-		hw->max_rx_queues =
-			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-		hw->max_tx_queues =
-			(VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ?
-			VIRTIO_MAX_TX_QUEUES : config->max_virtqueue_pairs;
-
-		virtio_dev_cq_queue_setup(eth_dev,
-					config->max_virtqueue_pairs * 2,
-					SOCKET_ID_ANY);
-
-		PMD_INIT_LOG(DEBUG, "config->max_virtqueue_pairs=%d",
-				config->max_virtqueue_pairs);
-		PMD_INIT_LOG(DEBUG, "config->status=%d", config->status);
-		PMD_INIT_LOG(DEBUG,
-				"PORT MAC: %02X:%02X:%02X:%02X:%02X:%02X",
-				config->mac[0], config->mac[1],
-				config->mac[2], config->mac[3],
-				config->mac[4], config->mac[5]);
-	} else {
-		hw->max_rx_queues = 1;
-		hw->max_tx_queues = 1;
-	}
-
-	eth_dev->data->nb_rx_queues = hw->max_rx_queues;
-	eth_dev->data->nb_tx_queues = hw->max_tx_queues;
-
-	PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
-			hw->max_rx_queues, hw->max_tx_queues);
-	PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
-			eth_dev->data->port_id, pci_dev->id.vendor_id,
-			pci_dev->id.device_id);
-
-	/* Setup interrupt callback  */
-	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-		rte_intr_callback_register(&pci_dev->intr_handle,
-				   virtio_interrupt_handler, eth_dev);
-
-	virtio_dev_cq_start(eth_dev);
-
-	return 0;
-}
-
-static struct eth_driver rte_virtio_pmd = {
-	{
-		.name = "rte_virtio_pmd",
-		.id_table = pci_id_virtio_map,
-	},
-	.eth_dev_init = eth_virtio_dev_init,
-	.dev_private_size = sizeof(struct virtio_hw),
-};
-
-/*
- * Driver initialization routine.
- * Invoked once at EAL init time.
- * Register itself as the [Poll Mode] Driver of PCI virtio devices.
- * Returns 0 on success.
- */
-static int
-rte_virtio_pmd_init(const char *name __rte_unused,
-		    const char *param __rte_unused)
-{
-	if (rte_eal_iopl_init() != 0) {
-		PMD_INIT_LOG(ERR, "IOPL call failed - cannot use virtio PMD");
-		return -1;
-	}
-
-	rte_eth_driver_register(&rte_virtio_pmd);
-	return 0;
-}
-
-/*
- * Only 1 queue is supported, no queue release related operation
- */
-static void
-virtio_dev_rx_queue_release(__rte_unused void *rxq)
-{
-}
-
-static void
-virtio_dev_tx_queue_release(__rte_unused void *txq)
-{
-}
-
-/*
- * Configure virtio device
- * It returns 0 on success.
- */
-static int
-virtio_dev_configure(struct rte_eth_dev *dev)
-{
-	const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct rte_pci_device *pci_dev = dev->pci_dev;
-
-	PMD_INIT_LOG(DEBUG, "configure");
-
-	if (rxmode->hw_ip_checksum) {
-		PMD_DRV_LOG(ERR, "HW IP checksum not supported");
-		return (-EINVAL);
-	}
-
-	hw->vlan_strip = rxmode->hw_vlan_strip;
-
-	if (rxmode->hw_vlan_filter
-	    && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
-		PMD_DRV_LOG(NOTICE,
-			    "vlan filtering not available on this host");
-		return -ENOTSUP;
-	}
-
-	if (pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)
-		if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
-			PMD_DRV_LOG(ERR, "failed to set config vector");
-			return -EBUSY;
-		}
-
-	return 0;
-}
-
-
-static int
-virtio_dev_start(struct rte_eth_dev *dev)
-{
-	uint16_t nb_queues, i;
-	struct virtio_hw *hw = dev->data->dev_private;
-	struct rte_pci_device *pci_dev = dev->pci_dev;
-
-	/* check if lsc interrupt feature is enabled */
-	if ((dev->data->dev_conf.intr_conf.lsc) &&
-		(pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC)) {
-		if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			PMD_DRV_LOG(ERR, "link status not supported by host");
-			return -ENOTSUP;
-		}
-
-		if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
-			PMD_DRV_LOG(ERR, "interrupt enable failed");
-			return -EIO;
-		}
-	}
-
-	/* Initialize Link state */
-	virtio_dev_link_update(dev, 0);
-
-	/* On restart after stop do not touch queues */
-	if (hw->started)
-		return 0;
-
-	/* Do final configuration before rx/tx engine starts */
-	virtio_dev_rxtx_start(dev);
-	vtpci_reinit_complete(hw);
-
-	hw->started = 1;
-
-	/*Notify the backend
-	 *Otherwise the tap backend might already stop its queue due to fullness.
-	 *vhost backend will have no chance to be waked up
-	 */
-	nb_queues = dev->data->nb_rx_queues;
-	if (nb_queues > 1) {
-		if (virtio_set_multiple_queues(dev, nb_queues) != 0)
-			return -EINVAL;
-	}
-
-	PMD_INIT_LOG(DEBUG, "nb_queues=%d", nb_queues);
-
-	for (i = 0; i < nb_queues; i++)
-		virtqueue_notify(dev->data->rx_queues[i]);
-
-	PMD_INIT_LOG(DEBUG, "Notified backend at initialization");
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++)
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++)
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-
-	return 0;
-}
-
-static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)
-{
-	struct rte_mbuf *buf;
-	int i, mbuf_num = 0;
-
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		PMD_INIT_LOG(DEBUG,
-			     "Before freeing rxq[%d] used and unused buf", i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-
-		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
-					dev->data->rx_queues[i])) != NULL) {
-			rte_pktmbuf_free(buf);
-			mbuf_num++;
-		}
-
-		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
-		PMD_INIT_LOG(DEBUG,
-			     "After freeing rxq[%d] used and unused buf", i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-	}
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		PMD_INIT_LOG(DEBUG,
-			     "Before freeing txq[%d] used and unused bufs",
-			     i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-
-		mbuf_num = 0;
-		while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
-					dev->data->tx_queues[i])) != NULL) {
-			rte_pktmbuf_free(buf);
-
-			mbuf_num++;
-		}
-
-		PMD_INIT_LOG(DEBUG, "free %d mbufs", mbuf_num);
-		PMD_INIT_LOG(DEBUG,
-			     "After freeing txq[%d] used and unused buf", i);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-	}
-}
-
-/*
- * Stop device: disable interrupt and mark link down
- */
-static void
-virtio_dev_stop(struct rte_eth_dev *dev)
-{
-	struct rte_eth_link link;
-
-	PMD_INIT_LOG(DEBUG, "stop");
-
-	if (dev->data->dev_conf.intr_conf.lsc)
-		rte_intr_disable(&dev->pci_dev->intr_handle);
-
-	memset(&link, 0, sizeof(link));
-	virtio_dev_atomic_write_link_status(dev, &link);
-}
-
-static int
-virtio_dev_link_update(struct rte_eth_dev *dev, __rte_unused int wait_to_complete)
-{
-	struct rte_eth_link link, old;
-	uint16_t status;
-	struct virtio_hw *hw = dev->data->dev_private;
-	memset(&link, 0, sizeof(link));
-	virtio_dev_atomic_read_link_status(dev, &link);
-	old = link;
-	link.link_duplex = FULL_DUPLEX;
-	link.link_speed  = SPEED_10G;
-
-	if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-		PMD_INIT_LOG(DEBUG, "Get link status from hw");
-		vtpci_read_dev_config(hw,
-				offsetof(struct virtio_net_config, status),
-				&status, sizeof(status));
-		if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
-			link.link_status = 0;
-			PMD_INIT_LOG(DEBUG, "Port %d is down",
-				     dev->data->port_id);
-		} else {
-			link.link_status = 1;
-			PMD_INIT_LOG(DEBUG, "Port %d is up",
-				     dev->data->port_id);
-		}
-	} else {
-		link.link_status = 1;   /* Link up */
-	}
-	virtio_dev_atomic_write_link_status(dev, &link);
-
-	return (old.link_status == link.link_status) ? -1 : 0;
-}
-
-static void
-virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	dev_info->driver_name = dev->driver->pci_drv.name;
-	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
-	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
-	dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
-	dev_info->max_rx_pktlen = VIRTIO_MAX_RX_PKTLEN;
-	dev_info->max_mac_addrs = VIRTIO_MAX_MAC_ADDRS;
-	dev_info->default_txconf = (struct rte_eth_txconf) {
-		.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS
-	};
-}
-
-/*
- * It enables testpmd to collect per queue stats.
- */
-static int
-virtio_dev_queue_stats_mapping_set(__rte_unused struct rte_eth_dev *eth_dev,
-__rte_unused uint16_t queue_id, __rte_unused uint8_t stat_idx,
-__rte_unused uint8_t is_rx)
-{
-	return 0;
-}
-
-static struct rte_driver rte_virtio_driver = {
-	.type = PMD_PDEV,
-	.init = rte_virtio_pmd_init,
-};
-
-PMD_REGISTER_DRIVER(rte_virtio_driver);
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.h b/lib/librte_pmd_virtio/virtio_ethdev.h
deleted file mode 100644
index e6d4533..0000000
--- a/lib/librte_pmd_virtio/virtio_ethdev.h
+++ /dev/null
@@ -1,124 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_ETHDEV_H_
-#define _VIRTIO_ETHDEV_H_
-
-#include <stdint.h>
-
-#include "virtio_pci.h"
-
-#define SPEED_10	10
-#define SPEED_100	100
-#define SPEED_1000	1000
-#define SPEED_10G	10000
-#define HALF_DUPLEX	1
-#define FULL_DUPLEX	2
-
-#ifndef PAGE_SIZE
-#define PAGE_SIZE 4096
-#endif
-
-#define VIRTIO_MAX_RX_QUEUES 128
-#define VIRTIO_MAX_TX_QUEUES 128
-#define VIRTIO_MAX_MAC_ADDRS 64
-#define VIRTIO_MIN_RX_BUFSIZE 64
-#define VIRTIO_MAX_RX_PKTLEN  9728
-
-/* Features desired/implemented by this driver. */
-#define VTNET_FEATURES \
-	(VIRTIO_NET_F_MAC       | \
-	VIRTIO_NET_F_STATUS     | \
-	VIRTIO_NET_F_MQ         | \
-	VIRTIO_NET_F_CTRL_MAC_ADDR | \
-	VIRTIO_NET_F_CTRL_VQ    | \
-	VIRTIO_NET_F_CTRL_RX    | \
-	VIRTIO_NET_F_CTRL_VLAN  | \
-	VIRTIO_NET_F_CSUM       | \
-	VIRTIO_NET_F_HOST_TSO4  | \
-	VIRTIO_NET_F_HOST_TSO6  | \
-	VIRTIO_NET_F_HOST_ECN   | \
-	VIRTIO_NET_F_GUEST_CSUM | \
-	VIRTIO_NET_F_GUEST_TSO4 | \
-	VIRTIO_NET_F_GUEST_TSO6 | \
-	VIRTIO_NET_F_GUEST_ECN  | \
-	VIRTIO_NET_F_MRG_RXBUF  | \
-	VIRTIO_RING_F_INDIRECT_DESC)
-
-/*
- * CQ function prototype
- */
-void virtio_dev_cq_start(struct rte_eth_dev *dev);
-
-/*
- * RX/TX function prototypes
- */
-void virtio_dev_rxtx_start(struct rte_eth_dev *dev);
-
-int virtio_dev_queue_setup(struct rte_eth_dev *dev,
-			int queue_type,
-			uint16_t queue_idx,
-			uint16_t  vtpci_queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			struct virtqueue **pvq);
-
-int  virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
-		uint16_t nb_rx_desc, unsigned int socket_id,
-		const struct rte_eth_rxconf *rx_conf,
-		struct rte_mempool *mb_pool);
-
-int  virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
-		uint16_t nb_tx_desc, unsigned int socket_id,
-		const struct rte_eth_txconf *tx_conf);
-
-uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
-		uint16_t nb_pkts);
-
-uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
-		uint16_t nb_pkts);
-
-uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
-		uint16_t nb_pkts);
-
-
-/*
- * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
- * frames larger than 1514 bytes. We do not yet support software LRO
- * via tcp_lro_rx().
- */
-#define VTNET_LRO_FEATURES (VIRTIO_NET_F_GUEST_TSO4 | \
-			    VIRTIO_NET_F_GUEST_TSO6 | VIRTIO_NET_F_GUEST_ECN)
-
-
-#endif /* _VIRTIO_ETHDEV_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_logs.h b/lib/librte_pmd_virtio/virtio_logs.h
deleted file mode 100644
index d6c33f7..0000000
--- a/lib/librte_pmd_virtio/virtio_logs.h
+++ /dev/null
@@ -1,70 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_LOGS_H_
-#define _VIRTIO_LOGS_H_
-
-#include <rte_log.h>
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_INIT
-#define PMD_INIT_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
-#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
-#else
-#define PMD_INIT_LOG(level, fmt, args...) do { } while(0)
-#define PMD_INIT_FUNC_TRACE() do { } while(0)
-#endif
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_RX
-#define PMD_RX_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s() rx: " fmt , __func__, ## args)
-#else
-#define PMD_RX_LOG(level, fmt, args...) do { } while(0)
-#endif
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_TX
-#define PMD_TX_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s() tx: " fmt , __func__, ## args)
-#else
-#define PMD_TX_LOG(level, fmt, args...) do { } while(0)
-#endif
-
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DRIVER
-#define PMD_DRV_LOG(level, fmt, args...) \
-	RTE_LOG(level, PMD, "%s(): " fmt , __func__, ## args)
-#else
-#define PMD_DRV_LOG(level, fmt, args...) do { } while(0)
-#endif
-
-#endif /* _VIRTIO_LOGS_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_pci.c b/lib/librte_pmd_virtio/virtio_pci.c
deleted file mode 100644
index 2245bec..0000000
--- a/lib/librte_pmd_virtio/virtio_pci.c
+++ /dev/null
@@ -1,147 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-#include <stdint.h>
-
-#include "virtio_pci.h"
-#include "virtio_logs.h"
-
-static uint8_t vtpci_get_status(struct virtio_hw *);
-
-void
-vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
-		void *dst, int length)
-{
-	uint64_t off;
-	uint8_t *d;
-	int size;
-
-	off = VIRTIO_PCI_CONFIG(hw) + offset;
-	for (d = dst; length > 0; d += size, off += size, length -= size) {
-		if (length >= 4) {
-			size = 4;
-			*(uint32_t *)d = VIRTIO_READ_REG_4(hw, off);
-		} else if (length >= 2) {
-			size = 2;
-			*(uint16_t *)d = VIRTIO_READ_REG_2(hw, off);
-		} else {
-			size = 1;
-			*d = VIRTIO_READ_REG_1(hw, off);
-		}
-	}
-}
-
-void
-vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
-		void *src, int length)
-{
-	uint64_t off;
-	uint8_t *s;
-	int size;
-
-	off = VIRTIO_PCI_CONFIG(hw) + offset;
-	for (s = src; length > 0; s += size, off += size, length -= size) {
-		if (length >= 4) {
-			size = 4;
-			VIRTIO_WRITE_REG_4(hw, off, *(uint32_t *)s);
-		} else if (length >= 2) {
-			size = 2;
-			VIRTIO_WRITE_REG_2(hw, off, *(uint16_t *)s);
-		} else {
-			size = 1;
-			VIRTIO_WRITE_REG_1(hw, off, *s);
-		}
-	}
-}
-
-uint32_t
-vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
-{
-	uint32_t features;
-	/*
-	 * Limit negotiated features to what the driver, virtqueue, and
-	 * host all support.
-	 */
-	features = host_features & hw->guest_features;
-
-	VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
-	return features;
-}
-
-
-void
-vtpci_reset(struct virtio_hw *hw)
-{
-	/*
-	 * Setting the status to RESET sets the host device to
-	 * the original, uninitialized state.
-	 */
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
-	vtpci_get_status(hw);
-}
-
-void
-vtpci_reinit_complete(struct virtio_hw *hw)
-{
-	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
-}
-
-static uint8_t
-vtpci_get_status(struct virtio_hw *hw)
-{
-	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
-}
-
-void
-vtpci_set_status(struct virtio_hw *hw, uint8_t status)
-{
-	if (status != VIRTIO_CONFIG_STATUS_RESET)
-		status = (uint8_t)(status | vtpci_get_status(hw));
-
-	VIRTIO_WRITE_REG_1(hw, VIRTIO_PCI_STATUS, status);
-}
-
-uint8_t
-vtpci_isr(struct virtio_hw *hw)
-{
-
-	return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_ISR);
-}
-
-
-/* Enable one vector (0) for Link State Intrerrupt */
-uint16_t
-vtpci_irq_config(struct virtio_hw *hw, uint16_t vec)
-{
-	VIRTIO_WRITE_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR, vec);
-	return VIRTIO_READ_REG_2(hw, VIRTIO_MSI_CONFIG_VECTOR);
-}
diff --git a/lib/librte_pmd_virtio/virtio_pci.h b/lib/librte_pmd_virtio/virtio_pci.h
deleted file mode 100644
index 64d9c34..0000000
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ /dev/null
@@ -1,270 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_PCI_H_
-#define _VIRTIO_PCI_H_
-
-#include <stdint.h>
-
-#ifdef __FreeBSD__
-#include <sys/types.h>
-#include <machine/cpufunc.h>
-#else
-#include <sys/io.h>
-#endif
-
-#include <rte_ethdev.h>
-
-struct virtqueue;
-
-/* VirtIO PCI vendor/device ID. */
-#define VIRTIO_PCI_VENDORID     0x1AF4
-#define VIRTIO_PCI_DEVICEID_MIN 0x1000
-#define VIRTIO_PCI_DEVICEID_MAX 0x103F
-
-/* VirtIO ABI version, this must match exactly. */
-#define VIRTIO_PCI_ABI_VERSION 0
-
-/*
- * VirtIO Header, located in BAR 0.
- */
-#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
-#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
-#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
-#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
-#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
-#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
-#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
-#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
-				      * also clears the register (8, RO) */
-/* Only if MSIX is enabled: */
-#define VIRTIO_MSI_CONFIG_VECTOR  20 /* configuration change vector (16, RW) */
-#define VIRTIO_MSI_QUEUE_VECTOR	  22 /* vector for selected VQ notifications
-				      (16, RW) */
-
-/* The bit of the ISR which indicates a device has an interrupt. */
-#define VIRTIO_PCI_ISR_INTR   0x1
-/* The bit of the ISR which indicates a device configuration change. */
-#define VIRTIO_PCI_ISR_CONFIG 0x2
-/* Vector value used to disable MSI for queue. */
-#define VIRTIO_MSI_NO_VECTOR 0xFFFF
-
-/* VirtIO device IDs. */
-#define VIRTIO_ID_NETWORK  0x01
-#define VIRTIO_ID_BLOCK    0x02
-#define VIRTIO_ID_CONSOLE  0x03
-#define VIRTIO_ID_ENTROPY  0x04
-#define VIRTIO_ID_BALLOON  0x05
-#define VIRTIO_ID_IOMEMORY 0x06
-#define VIRTIO_ID_9P       0x09
-
-/* Status byte for guest to report progress. */
-#define VIRTIO_CONFIG_STATUS_RESET     0x00
-#define VIRTIO_CONFIG_STATUS_ACK       0x01
-#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
-#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
-#define VIRTIO_CONFIG_STATUS_FAILED    0x80
-
-/*
- * Generate interrupt when the virtqueue ring is
- * completely used, even if we've suppressed them.
- */
-#define VIRTIO_F_NOTIFY_ON_EMPTY (1 << 24)
-
-/*
- * The guest should never negotiate this feature; it
- * is used to detect faulty drivers.
- */
-#define VIRTIO_F_BAD_FEATURE (1 << 30)
-
-/*
- * Some VirtIO feature bits (currently bits 28 through 31) are
- * reserved for the transport being used (eg. virtio_ring), the
- * rest are per-device feature bits.
- */
-#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END   32
-
-/*
- * Each virtqueue indirect descriptor list must be physically contiguous.
- * To allow us to malloc(9) each list individually, limit the number
- * supported to what will fit in one page. With 4KB pages, this is a limit
- * of 256 descriptors. If there is ever a need for more, we can switch to
- * contigmalloc(9) for the larger allocations, similar to what
- * bus_dmamem_alloc(9) does.
- *
- * Note the sizeof(struct vring_desc) is 16 bytes.
- */
-#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
-
-/* The feature bitmap for virtio net */
-#define VIRTIO_NET_F_CSUM       0x00001 /* Host handles pkts w/ partial csum */
-#define VIRTIO_NET_F_GUEST_CSUM 0x00002 /* Guest handles pkts w/ partial csum*/
-#define VIRTIO_NET_F_MAC        0x00020 /* Host has given MAC address. */
-#define VIRTIO_NET_F_GSO        0x00040 /* Host handles pkts w/ any GSO type */
-#define VIRTIO_NET_F_GUEST_TSO4 0x00080 /* Guest can handle TSOv4 in. */
-#define VIRTIO_NET_F_GUEST_TSO6 0x00100 /* Guest can handle TSOv6 in. */
-#define VIRTIO_NET_F_GUEST_ECN  0x00200 /* Guest can handle TSO[6] w/ ECN in.*/
-#define VIRTIO_NET_F_GUEST_UFO  0x00400 /* Guest can handle UFO in. */
-#define VIRTIO_NET_F_HOST_TSO4  0x00800 /* Host can handle TSOv4 in. */
-#define VIRTIO_NET_F_HOST_TSO6  0x01000 /* Host can handle TSOv6 in. */
-#define VIRTIO_NET_F_HOST_ECN   0x02000 /* Host can handle TSO[6] w/ ECN in. */
-#define VIRTIO_NET_F_HOST_UFO   0x04000 /* Host can handle UFO in. */
-#define VIRTIO_NET_F_MRG_RXBUF  0x08000 /* Host can merge receive buffers. */
-#define VIRTIO_NET_F_STATUS     0x10000 /* virtio_net_config.status available*/
-#define VIRTIO_NET_F_CTRL_VQ    0x20000 /* Control channel available */
-#define VIRTIO_NET_F_CTRL_RX    0x40000 /* Control channel RX mode support */
-#define VIRTIO_NET_F_CTRL_VLAN  0x80000 /* Control channel VLAN filtering */
-#define VIRTIO_NET_F_CTRL_RX_EXTRA  0x100000 /* Extra RX mode control support */
-#define VIRTIO_RING_F_INDIRECT_DESC 0x10000000 /* Support for indirect buffer descriptors. */
-/* The guest publishes the used index for which it expects an interrupt
- * at the end of the avail ring. Host should ignore the avail->flags field.
- * The host publishes the avail index for which it expects a kick
- * at the end of the used ring. Guest should ignore the used->flags field.
- */
-#define VIRTIO_RING_F_EVENT_IDX 0x20000000
-
-#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
-
-/*
- * Maximum number of virtqueues per device.
- */
-#define VIRTIO_MAX_VIRTQUEUES 8
-
-struct virtio_hw {
-	struct virtqueue *cvq;
-	uint32_t    io_base;
-	uint32_t    guest_features;
-	uint32_t    max_tx_queues;
-	uint32_t    max_rx_queues;
-	uint16_t    vtnet_hdr_size;
-	uint8_t	    vlan_strip;
-	uint8_t	    use_msix;
-	uint8_t     started;
-	uint8_t     mac_addr[ETHER_ADDR_LEN];
-};
-
-/*
- * This structure is just a reference to read
- * net device specific config space; it just a chodu structure
- *
- */
-struct virtio_net_config {
-	/* The config defining mac address (if VIRTIO_NET_F_MAC) */
-	uint8_t    mac[ETHER_ADDR_LEN];
-	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
-	uint16_t   status;
-	uint16_t   max_virtqueue_pairs;
-} __attribute__((packed));
-
-/*
- * The remaining space is defined by each driver as the per-driver
- * configuration space.
- */
-#define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)
-
-/*
- * How many bits to shift physical queue address written to QUEUE_PFN.
- * 12 is historical, and due to x86 page size.
- */
-#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
-
-/* The alignment to use between consumer and producer parts of vring. */
-#define VIRTIO_PCI_VRING_ALIGN 4096
-
-#ifdef __FreeBSD__
-
-static inline void
-outb_p(unsigned char data, unsigned int port)
-{
-
-	outb(port, (u_char)data);
-}
-
-static inline void
-outw_p(unsigned short data, unsigned int port)
-{
-	outw(port, (u_short)data);
-}
-
-static inline void
-outl_p(unsigned int data, unsigned int port)
-{
-	outl(port, (u_int)data);
-}
-#endif
-
-#define VIRTIO_PCI_REG_ADDR(hw, reg) \
-	(unsigned short)((hw)->io_base + (reg))
-
-#define VIRTIO_READ_REG_1(hw, reg) \
-	inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_1(hw, reg, value) \
-	outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_2(hw, reg) \
-	inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_2(hw, reg, value) \
-	outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_4(hw, reg) \
-	inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_4(hw, reg, value) \
-	outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-static inline int
-vtpci_with_feature(struct virtio_hw *hw, uint32_t feature)
-{
-	return (hw->guest_features & feature) != 0;
-}
-
-/*
- * Function declaration from virtio_pci.c
- */
-void vtpci_reset(struct virtio_hw *);
-
-void vtpci_reinit_complete(struct virtio_hw *);
-
-void vtpci_set_status(struct virtio_hw *, uint8_t);
-
-uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
-
-void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);
-
-void vtpci_read_dev_config(struct virtio_hw *, uint64_t, void *, int);
-
-uint8_t vtpci_isr(struct virtio_hw *);
-
-uint16_t vtpci_irq_config(struct virtio_hw *, uint16_t);
-
-#endif /* _VIRTIO_PCI_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_ring.h b/lib/librte_pmd_virtio/virtio_ring.h
deleted file mode 100644
index a16c499..0000000
--- a/lib/librte_pmd_virtio/virtio_ring.h
+++ /dev/null
@@ -1,163 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_RING_H_
-#define _VIRTIO_RING_H_
-
-#include <stdint.h>
-
-#include <rte_common.h>
-
-/* This marks a buffer as continuing via the next field. */
-#define VRING_DESC_F_NEXT       1
-/* This marks a buffer as write-only (otherwise read-only). */
-#define VRING_DESC_F_WRITE      2
-/* This means the buffer contains a list of buffer descriptors. */
-#define VRING_DESC_F_INDIRECT   4
-
-/* The Host uses this in used->flags to advise the Guest: don't kick me
- * when you add a buffer.  It's unreliable, so it's simply an
- * optimization.  Guest will still kick if it's out of buffers. */
-#define VRING_USED_F_NO_NOTIFY  1
-/* The Guest uses this in avail->flags to advise the Host: don't
- * interrupt me when you consume a buffer.  It's unreliable, so it's
- * simply an optimization.  */
-#define VRING_AVAIL_F_NO_INTERRUPT  1
-
-/* VirtIO ring descriptors: 16 bytes.
- * These can chain together via "next". */
-struct vring_desc {
-	uint64_t addr;  /*  Address (guest-physical). */
-	uint32_t len;   /* Length. */
-	uint16_t flags; /* The flags as indicated above. */
-	uint16_t next;  /* We chain unused descriptors via this. */
-};
-
-struct vring_avail {
-	uint16_t flags;
-	uint16_t idx;
-	uint16_t ring[0];
-};
-
-/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
-struct vring_used_elem {
-	/* Index of start of used descriptor chain. */
-	uint32_t id;
-	/* Total length of the descriptor chain which was written to. */
-	uint32_t len;
-};
-
-struct vring_used {
-	uint16_t flags;
-	uint16_t idx;
-	struct vring_used_elem ring[0];
-};
-
-struct vring {
-	unsigned int num;
-	struct vring_desc  *desc;
-	struct vring_avail *avail;
-	struct vring_used  *used;
-};
-
-/* The standard layout for the ring is a continuous chunk of memory which
- * looks like this.  We assume num is a power of 2.
- *
- * struct vring {
- *      // The actual descriptors (16 bytes each)
- *      struct vring_desc desc[num];
- *
- *      // A ring of available descriptor heads with free-running index.
- *      __u16 avail_flags;
- *      __u16 avail_idx;
- *      __u16 available[num];
- *      __u16 used_event_idx;
- *
- *      // Padding to the next align boundary.
- *      char pad[];
- *
- *      // A ring of used descriptor heads with free-running index.
- *      __u16 used_flags;
- *      __u16 used_idx;
- *      struct vring_used_elem used[num];
- *      __u16 avail_event_idx;
- * };
- *
- * NOTE: for VirtIO PCI, align is 4096.
- */
-
-/*
- * We publish the used event index at the end of the available ring, and vice
- * versa. They are at the end for backwards compatibility.
- */
-#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
-#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
-
-static inline int
-vring_size(unsigned int num, unsigned long align)
-{
-	int size;
-
-	size = num * sizeof(struct vring_desc);
-	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
-	size = RTE_ALIGN_CEIL(size, align);
-	size += sizeof(struct vring_used) +
-		(num * sizeof(struct vring_used_elem));
-	return size;
-}
-
-static inline void
-vring_init(struct vring *vr, unsigned int num, uint8_t *p,
-	unsigned long align)
-{
-	vr->num = num;
-	vr->desc = (struct vring_desc *) p;
-	vr->avail = (struct vring_avail *) (p +
-		num * sizeof(struct vring_desc));
-	vr->used = (void *)
-		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
-}
-
-/*
- * The following is used with VIRTIO_RING_F_EVENT_IDX.
- * Assuming a given event_idx value from the other size, if we have
- * just incremented index from old to new_idx, should we trigger an
- * event?
- */
-static inline int
-vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
-{
-	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
-}
-
-#endif /* _VIRTIO_RING_H_ */
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c b/lib/librte_pmd_virtio/virtio_rxtx.c
deleted file mode 100644
index 3ff275c..0000000
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ /dev/null
@@ -1,815 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <errno.h>
-
-#include <rte_cycles.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_branch_prediction.h>
-#include <rte_mempool.h>
-#include <rte_malloc.h>
-#include <rte_mbuf.h>
-#include <rte_ether.h>
-#include <rte_ethdev.h>
-#include <rte_prefetch.h>
-#include <rte_string_fns.h>
-#include <rte_errno.h>
-#include <rte_byteorder.h>
-
-#include "virtio_logs.h"
-#include "virtio_ethdev.h"
-#include "virtqueue.h"
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
-#define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
-#else
-#define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
-#endif
-
-static void
-vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
-{
-	struct vring_desc *dp, *dp_tail;
-	struct vq_desc_extra *dxp;
-	uint16_t desc_idx_last = desc_idx;
-
-	dp  = &vq->vq_ring.desc[desc_idx];
-	dxp = &vq->vq_descx[desc_idx];
-	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt + dxp->ndescs);
-	if ((dp->flags & VRING_DESC_F_INDIRECT) == 0) {
-		while (dp->flags & VRING_DESC_F_NEXT) {
-			desc_idx_last = dp->next;
-			dp = &vq->vq_ring.desc[dp->next];
-		}
-	}
-	dxp->ndescs = 0;
-
-	/*
-	 * We must append the existing free chain, if any, to the end of
-	 * newly freed chain. If the virtqueue was completely used, then
-	 * head would be VQ_RING_DESC_CHAIN_END (ASSERTed above).
-	 */
-	if (vq->vq_desc_tail_idx == VQ_RING_DESC_CHAIN_END) {
-		vq->vq_desc_head_idx = desc_idx;
-	} else {
-		dp_tail = &vq->vq_ring.desc[vq->vq_desc_tail_idx];
-		dp_tail->next = desc_idx;
-	}
-
-	vq->vq_desc_tail_idx = desc_idx_last;
-	dp->next = VQ_RING_DESC_CHAIN_END;
-}
-
-static uint16_t
-virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts,
-			   uint32_t *len, uint16_t num)
-{
-	struct vring_used_elem *uep;
-	struct rte_mbuf *cookie;
-	uint16_t used_idx, desc_idx;
-	uint16_t i;
-
-	/*  Caller does the check */
-	for (i = 0; i < num ; i++) {
-		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-		uep = &vq->vq_ring.used->ring[used_idx];
-		desc_idx = (uint16_t) uep->id;
-		len[i] = uep->len;
-		cookie = (struct rte_mbuf *)vq->vq_descx[desc_idx].cookie;
-
-		if (unlikely(cookie == NULL)) {
-			PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u\n",
-				vq->vq_used_cons_idx);
-			break;
-		}
-
-		rte_prefetch0(cookie);
-		rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *));
-		rx_pkts[i]  = cookie;
-		vq->vq_used_cons_idx++;
-		vq_ring_free_chain(vq, desc_idx);
-		vq->vq_descx[desc_idx].cookie = NULL;
-	}
-
-	return i;
-}
-
-#ifndef DEFAULT_TX_FREE_THRESH
-#define DEFAULT_TX_FREE_THRESH 32
-#endif
-
-/* Cleanup from completed transmits. */
-static void
-virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
-{
-	uint16_t i, used_idx, desc_idx;
-	for (i = 0; i < num; i++) {
-		struct vring_used_elem *uep;
-		struct vq_desc_extra *dxp;
-
-		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-		uep = &vq->vq_ring.used->ring[used_idx];
-
-		desc_idx = (uint16_t) uep->id;
-		dxp = &vq->vq_descx[desc_idx];
-		vq->vq_used_cons_idx++;
-		vq_ring_free_chain(vq, desc_idx);
-
-		if (dxp->cookie != NULL) {
-			rte_pktmbuf_free(dxp->cookie);
-			dxp->cookie = NULL;
-		}
-	}
-}
-
-
-static inline int
-virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct rte_mbuf *cookie)
-{
-	struct vq_desc_extra *dxp;
-	struct virtio_hw *hw = vq->hw;
-	struct vring_desc *start_dp;
-	uint16_t needed = 1;
-	uint16_t head_idx, idx;
-
-	if (unlikely(vq->vq_free_cnt == 0))
-		return -ENOSPC;
-	if (unlikely(vq->vq_free_cnt < needed))
-		return -EMSGSIZE;
-
-	head_idx = vq->vq_desc_head_idx;
-	if (unlikely(head_idx >= vq->vq_nentries))
-		return -EFAULT;
-
-	idx = head_idx;
-	dxp = &vq->vq_descx[idx];
-	dxp->cookie = (void *)cookie;
-	dxp->ndescs = needed;
-
-	start_dp = vq->vq_ring.desc;
-	start_dp[idx].addr =
-		(uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
-		- hw->vtnet_hdr_size);
-	start_dp[idx].len =
-		cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
-	start_dp[idx].flags =  VRING_DESC_F_WRITE;
-	idx = start_dp[idx].next;
-	vq->vq_desc_head_idx = idx;
-	if (vq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
-		vq->vq_desc_tail_idx = idx;
-	vq->vq_free_cnt = (uint16_t)(vq->vq_free_cnt - needed);
-	vq_update_avail_ring(vq, head_idx);
-
-	return 0;
-}
-
-static int
-virtqueue_enqueue_xmit(struct virtqueue *txvq, struct rte_mbuf *cookie)
-{
-	struct vq_desc_extra *dxp;
-	struct vring_desc *start_dp;
-	uint16_t seg_num = cookie->nb_segs;
-	uint16_t needed = 1 + seg_num;
-	uint16_t head_idx, idx;
-	uint16_t head_size = txvq->hw->vtnet_hdr_size;
-
-	if (unlikely(txvq->vq_free_cnt == 0))
-		return -ENOSPC;
-	if (unlikely(txvq->vq_free_cnt < needed))
-		return -EMSGSIZE;
-	head_idx = txvq->vq_desc_head_idx;
-	if (unlikely(head_idx >= txvq->vq_nentries))
-		return -EFAULT;
-
-	idx = head_idx;
-	dxp = &txvq->vq_descx[idx];
-	dxp->cookie = (void *)cookie;
-	dxp->ndescs = needed;
-
-	start_dp = txvq->vq_ring.desc;
-	start_dp[idx].addr =
-		txvq->virtio_net_hdr_mem + idx * head_size;
-	start_dp[idx].len = (uint32_t)head_size;
-	start_dp[idx].flags = VRING_DESC_F_NEXT;
-
-	for (; ((seg_num > 0) && (cookie != NULL)); seg_num--) {
-		idx = start_dp[idx].next;
-		start_dp[idx].addr  = RTE_MBUF_DATA_DMA_ADDR(cookie);
-		start_dp[idx].len   = cookie->data_len;
-		start_dp[idx].flags = VRING_DESC_F_NEXT;
-		cookie = cookie->next;
-	}
-
-	start_dp[idx].flags &= ~VRING_DESC_F_NEXT;
-	idx = start_dp[idx].next;
-	txvq->vq_desc_head_idx = idx;
-	if (txvq->vq_desc_head_idx == VQ_RING_DESC_CHAIN_END)
-		txvq->vq_desc_tail_idx = idx;
-	txvq->vq_free_cnt = (uint16_t)(txvq->vq_free_cnt - needed);
-	vq_update_avail_ring(txvq, head_idx);
-
-	return 0;
-}
-
-static inline struct rte_mbuf *
-rte_rxmbuf_alloc(struct rte_mempool *mp)
-{
-	struct rte_mbuf *m;
-
-	m = __rte_mbuf_raw_alloc(mp);
-	__rte_mbuf_sanity_check_raw(m, 0);
-
-	return m;
-}
-
-static void
-virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
-{
-	struct rte_mbuf *m;
-	int i, nbufs, error, size = vq->vq_nentries;
-	struct vring *vr = &vq->vq_ring;
-	uint8_t *ring_mem = vq->vq_ring_virt_mem;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/*
-	 * Reinitialise since virtio port might have been stopped and restarted
-	 */
-	memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
-	vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
-	vq->vq_used_cons_idx = 0;
-	vq->vq_desc_head_idx = 0;
-	vq->vq_avail_idx = 0;
-	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
-	vq->vq_free_cnt = vq->vq_nentries;
-	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
-
-	/* Chain all the descriptors in the ring with an END */
-	for (i = 0; i < size - 1; i++)
-		vr->desc[i].next = (uint16_t)(i + 1);
-	vr->desc[i].next = VQ_RING_DESC_CHAIN_END;
-
-	/*
-	 * Disable device(host) interrupting guest
-	 */
-	virtqueue_disable_intr(vq);
-
-	/* Only rx virtqueue needs mbufs to be allocated at initialization */
-	if (queue_type == VTNET_RQ) {
-		if (vq->mpool == NULL)
-			rte_exit(EXIT_FAILURE,
-			"Cannot allocate initial mbufs for rx virtqueue");
-
-		/* Allocate blank mbufs for the each rx descriptor */
-		nbufs = 0;
-		error = ENOSPC;
-		while (!virtqueue_full(vq)) {
-			m = rte_rxmbuf_alloc(vq->mpool);
-			if (m == NULL)
-				break;
-
-			/******************************************
-			*         Enqueue allocated buffers        *
-			*******************************************/
-			error = virtqueue_enqueue_recv_refill(vq, m);
-
-			if (error) {
-				rte_pktmbuf_free(m);
-				break;
-			}
-			nbufs++;
-		}
-
-		vq_update_avail_idx(vq);
-
-		PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
-
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	} else if (queue_type == VTNET_TQ) {
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	} else {
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	}
-}
-
-void
-virtio_dev_cq_start(struct rte_eth_dev *dev)
-{
-	struct virtio_hw *hw = dev->data->dev_private;
-
-	if (hw->cvq) {
-		virtio_dev_vring_start(hw->cvq, VTNET_CQ);
-		VIRTQUEUE_DUMP((struct virtqueue *)hw->cvq);
-	}
-}
-
-void
-virtio_dev_rxtx_start(struct rte_eth_dev *dev)
-{
-	/*
-	 * Start receive and transmit vrings
-	 * -	Setup vring structure for all queues
-	 * -	Initialize descriptor for the rx vring
-	 * -	Allocate blank mbufs for the each rx descriptor
-	 *
-	 */
-	int i;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* Start rx vring. */
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		virtio_dev_vring_start(dev->data->rx_queues[i], VTNET_RQ);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->rx_queues[i]);
-	}
-
-	/* Start tx vring. */
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		virtio_dev_vring_start(dev->data->tx_queues[i], VTNET_TQ);
-		VIRTQUEUE_DUMP((struct virtqueue *)dev->data->tx_queues[i]);
-	}
-}
-
-int
-virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
-			uint16_t queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			__rte_unused const struct rte_eth_rxconf *rx_conf,
-			struct rte_mempool *mp)
-{
-	uint16_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_RQ_QUEUE_IDX;
-	struct virtqueue *vq;
-	int ret;
-
-	PMD_INIT_FUNC_TRACE();
-	ret = virtio_dev_queue_setup(dev, VTNET_RQ, queue_idx, vtpci_queue_idx,
-			nb_desc, socket_id, &vq);
-	if (ret < 0) {
-		PMD_INIT_LOG(ERR, "tvq initialization failed");
-		return ret;
-	}
-
-	/* Create mempool for rx mbuf allocation */
-	vq->mpool = mp;
-
-	dev->data->rx_queues[queue_idx] = vq;
-	return 0;
-}
-
-/*
- * struct rte_eth_dev *dev: Used to update dev
- * uint16_t nb_desc: Defaults to values read from config space
- * unsigned int socket_id: Used to allocate memzone
- * const struct rte_eth_txconf *tx_conf: Used to setup tx engine
- * uint16_t queue_idx: Just used as an index in dev txq list
- */
-int
-virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
-			uint16_t queue_idx,
-			uint16_t nb_desc,
-			unsigned int socket_id,
-			const struct rte_eth_txconf *tx_conf)
-{
-	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
-	struct virtqueue *vq;
-	uint16_t tx_free_thresh;
-	int ret;
-
-	PMD_INIT_FUNC_TRACE();
-
-	if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
-	    != ETH_TXQ_FLAGS_NOXSUMS) {
-		PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
-		return -EINVAL;
-	}
-
-	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
-			nb_desc, socket_id, &vq);
-	if (ret < 0) {
-		PMD_INIT_LOG(ERR, "rvq initialization failed");
-		return ret;
-	}
-
-	tx_free_thresh = tx_conf->tx_free_thresh;
-	if (tx_free_thresh == 0)
-		tx_free_thresh =
-			RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
-
-	if (tx_free_thresh >= (vq->vq_nentries - 3)) {
-		RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
-			"number of TX entries minus 3 (%u)."
-			" (tx_free_thresh=%u port=%u queue=%u)\n",
-			vq->vq_nentries - 3,
-			tx_free_thresh, dev->data->port_id, queue_idx);
-		return -EINVAL;
-	}
-
-	vq->vq_free_thresh = tx_free_thresh;
-
-	dev->data->tx_queues[queue_idx] = vq;
-	return 0;
-}
-
-static void
-virtio_discard_rxbuf(struct virtqueue *vq, struct rte_mbuf *m)
-{
-	int error;
-	/*
-	 * Requeue the discarded mbuf. This should always be
-	 * successful since it was just dequeued.
-	 */
-	error = virtqueue_enqueue_recv_refill(vq, m);
-	if (unlikely(error)) {
-		RTE_LOG(ERR, PMD, "cannot requeue discarded mbuf");
-		rte_pktmbuf_free(m);
-	}
-}
-
-#define VIRTIO_MBUF_BURST_SZ 64
-#define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
-uint16_t
-virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
-{
-	struct virtqueue *rxvq = rx_queue;
-	struct virtio_hw *hw;
-	struct rte_mbuf *rxm, *new_mbuf;
-	uint16_t nb_used, num, nb_rx;
-	uint32_t len[VIRTIO_MBUF_BURST_SZ];
-	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
-	int error;
-	uint32_t i, nb_enqueued;
-	const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
-
-	nb_used = VIRTQUEUE_NUSED(rxvq);
-
-	virtio_rmb();
-
-	num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
-	num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : VIRTIO_MBUF_BURST_SZ);
-	if (likely(num > DESC_PER_CACHELINE))
-		num = num - ((rxvq->vq_used_cons_idx + num) % DESC_PER_CACHELINE);
-
-	if (num == 0)
-		return 0;
-
-	num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
-	PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
-
-	hw = rxvq->hw;
-	nb_rx = 0;
-	nb_enqueued = 0;
-
-	for (i = 0; i < num ; i++) {
-		rxm = rcv_pkts[i];
-
-		PMD_RX_LOG(DEBUG, "packet len:%d", len[i]);
-
-		if (unlikely(len[i] < hdr_size + ETHER_HDR_LEN)) {
-			PMD_RX_LOG(ERR, "Packet drop");
-			nb_enqueued++;
-			virtio_discard_rxbuf(rxvq, rxm);
-			rxvq->errors++;
-			continue;
-		}
-
-		rxm->port = rxvq->port_id;
-		rxm->data_off = RTE_PKTMBUF_HEADROOM;
-
-		rxm->nb_segs = 1;
-		rxm->next = NULL;
-		rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
-		rxm->data_len = (uint16_t)(len[i] - hdr_size);
-
-		if (hw->vlan_strip)
-			rte_vlan_strip(rxm);
-
-		VIRTIO_DUMP_PACKET(rxm, rxm->data_len);
-
-		rx_pkts[nb_rx++] = rxm;
-		rxvq->bytes += rx_pkts[nb_rx - 1]->pkt_len;
-	}
-
-	rxvq->packets += nb_rx;
-
-	/* Allocate new mbuf for the used descriptor */
-	error = ENOSPC;
-	while (likely(!virtqueue_full(rxvq))) {
-		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
-		if (unlikely(new_mbuf == NULL)) {
-			struct rte_eth_dev *dev
-				= &rte_eth_devices[rxvq->port_id];
-			dev->data->rx_mbuf_alloc_failed++;
-			break;
-		}
-		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
-		if (unlikely(error)) {
-			rte_pktmbuf_free(new_mbuf);
-			break;
-		}
-		nb_enqueued++;
-	}
-
-	if (likely(nb_enqueued)) {
-		vq_update_avail_idx(rxvq);
-
-		if (unlikely(virtqueue_kick_prepare(rxvq))) {
-			virtqueue_notify(rxvq);
-			PMD_RX_LOG(DEBUG, "Notified\n");
-		}
-	}
-
-	return nb_rx;
-}
-
-uint16_t
-virtio_recv_mergeable_pkts(void *rx_queue,
-			struct rte_mbuf **rx_pkts,
-			uint16_t nb_pkts)
-{
-	struct virtqueue *rxvq = rx_queue;
-	struct virtio_hw *hw;
-	struct rte_mbuf *rxm, *new_mbuf;
-	uint16_t nb_used, num, nb_rx;
-	uint32_t len[VIRTIO_MBUF_BURST_SZ];
-	struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
-	struct rte_mbuf *prev;
-	int error;
-	uint32_t i, nb_enqueued;
-	uint32_t seg_num;
-	uint16_t extra_idx;
-	uint32_t seg_res;
-	const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-
-	nb_used = VIRTQUEUE_NUSED(rxvq);
-
-	virtio_rmb();
-
-	if (nb_used == 0)
-		return 0;
-
-	PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
-
-	hw = rxvq->hw;
-	nb_rx = 0;
-	i = 0;
-	nb_enqueued = 0;
-	seg_num = 0;
-	extra_idx = 0;
-	seg_res = 0;
-
-	while (i < nb_used) {
-		struct virtio_net_hdr_mrg_rxbuf *header;
-
-		if (nb_rx == nb_pkts)
-			break;
-
-		num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, 1);
-		if (num != 1)
-			continue;
-
-		i++;
-
-		PMD_RX_LOG(DEBUG, "dequeue:%d\n", num);
-		PMD_RX_LOG(DEBUG, "packet len:%d\n", len[0]);
-
-		rxm = rcv_pkts[0];
-
-		if (unlikely(len[0] < hdr_size + ETHER_HDR_LEN)) {
-			PMD_RX_LOG(ERR, "Packet drop\n");
-			nb_enqueued++;
-			virtio_discard_rxbuf(rxvq, rxm);
-			rxvq->errors++;
-			continue;
-		}
-
-		header = (struct virtio_net_hdr_mrg_rxbuf *)((char *)rxm->buf_addr +
-			RTE_PKTMBUF_HEADROOM - hdr_size);
-		seg_num = header->num_buffers;
-
-		if (seg_num == 0)
-			seg_num = 1;
-
-		rxm->data_off = RTE_PKTMBUF_HEADROOM;
-		rxm->nb_segs = seg_num;
-		rxm->next = NULL;
-		rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
-		rxm->data_len = (uint16_t)(len[0] - hdr_size);
-
-		rxm->port = rxvq->port_id;
-		rx_pkts[nb_rx] = rxm;
-		prev = rxm;
-
-		seg_res = seg_num - 1;
-
-		while (seg_res != 0) {
-			/*
-			 * Get extra segments for current uncompleted packet.
-			 */
-			uint16_t  rcv_cnt =
-				RTE_MIN(seg_res, RTE_DIM(rcv_pkts));
-			if (likely(VIRTQUEUE_NUSED(rxvq) >= rcv_cnt)) {
-				uint32_t rx_num =
-					virtqueue_dequeue_burst_rx(rxvq,
-					rcv_pkts, len, rcv_cnt);
-				i += rx_num;
-				rcv_cnt = rx_num;
-			} else {
-				PMD_RX_LOG(ERR,
-					"No enough segments for packet.\n");
-				nb_enqueued++;
-				virtio_discard_rxbuf(rxvq, rxm);
-				rxvq->errors++;
-				break;
-			}
-
-			extra_idx = 0;
-
-			while (extra_idx < rcv_cnt) {
-				rxm = rcv_pkts[extra_idx];
-
-				rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
-				rxm->next = NULL;
-				rxm->pkt_len = (uint32_t)(len[extra_idx]);
-				rxm->data_len = (uint16_t)(len[extra_idx]);
-
-				if (prev)
-					prev->next = rxm;
-
-				prev = rxm;
-				rx_pkts[nb_rx]->pkt_len += rxm->pkt_len;
-				extra_idx++;
-			};
-			seg_res -= rcv_cnt;
-		}
-
-		if (hw->vlan_strip)
-			rte_vlan_strip(rx_pkts[nb_rx]);
-
-		VIRTIO_DUMP_PACKET(rx_pkts[nb_rx],
-			rx_pkts[nb_rx]->data_len);
-
-		rxvq->bytes += rx_pkts[nb_rx]->pkt_len;
-		nb_rx++;
-	}
-
-	rxvq->packets += nb_rx;
-
-	/* Allocate new mbuf for the used descriptor */
-	error = ENOSPC;
-	while (likely(!virtqueue_full(rxvq))) {
-		new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
-		if (unlikely(new_mbuf == NULL)) {
-			struct rte_eth_dev *dev
-				= &rte_eth_devices[rxvq->port_id];
-			dev->data->rx_mbuf_alloc_failed++;
-			break;
-		}
-		error = virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
-		if (unlikely(error)) {
-			rte_pktmbuf_free(new_mbuf);
-			break;
-		}
-		nb_enqueued++;
-	}
-
-	if (likely(nb_enqueued)) {
-		vq_update_avail_idx(rxvq);
-
-		if (unlikely(virtqueue_kick_prepare(rxvq))) {
-			virtqueue_notify(rxvq);
-			PMD_RX_LOG(DEBUG, "Notified");
-		}
-	}
-
-	return nb_rx;
-}
-
-uint16_t
-virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
-{
-	struct virtqueue *txvq = tx_queue;
-	struct rte_mbuf *txm;
-	uint16_t nb_used, nb_tx;
-	int error;
-
-	if (unlikely(nb_pkts < 1))
-		return nb_pkts;
-
-	PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
-	nb_used = VIRTQUEUE_NUSED(txvq);
-
-	virtio_rmb();
-	if (likely(nb_used > txvq->vq_free_thresh))
-		virtio_xmit_cleanup(txvq, nb_used);
-
-	nb_tx = 0;
-
-	while (nb_tx < nb_pkts) {
-		/* Need one more descriptor for virtio header. */
-		int need = tx_pkts[nb_tx]->nb_segs - txvq->vq_free_cnt + 1;
-
-		/*Positive value indicates it need free vring descriptors */
-		if (unlikely(need > 0)) {
-			nb_used = VIRTQUEUE_NUSED(txvq);
-			virtio_rmb();
-			need = RTE_MIN(need, (int)nb_used);
-
-			virtio_xmit_cleanup(txvq, need);
-			need = (int)tx_pkts[nb_tx]->nb_segs -
-				txvq->vq_free_cnt + 1;
-		}
-
-		/*
-		 * Zero or negative value indicates it has enough free
-		 * descriptors to use for transmitting.
-		 */
-		if (likely(need <= 0)) {
-			txm = tx_pkts[nb_tx];
-
-			/* Do VLAN tag insertion */
-			if (unlikely(txm->ol_flags & PKT_TX_VLAN_PKT)) {
-				error = rte_vlan_insert(&txm);
-				if (unlikely(error)) {
-					rte_pktmbuf_free(txm);
-					++nb_tx;
-					continue;
-				}
-			}
-
-			/* Enqueue Packet buffers */
-			error = virtqueue_enqueue_xmit(txvq, txm);
-			if (unlikely(error)) {
-				if (error == ENOSPC)
-					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count = 0");
-				else if (error == EMSGSIZE)
-					PMD_TX_LOG(ERR, "virtqueue_enqueue Free count < 1");
-				else
-					PMD_TX_LOG(ERR, "virtqueue_enqueue error: %d", error);
-				break;
-			}
-			nb_tx++;
-			txvq->bytes += txm->pkt_len;
-		} else {
-			PMD_TX_LOG(ERR, "No free tx descriptors to transmit");
-			break;
-		}
-	}
-
-	txvq->packets += nb_tx;
-
-	if (likely(nb_tx)) {
-		vq_update_avail_idx(txvq);
-
-		if (unlikely(virtqueue_kick_prepare(txvq))) {
-			virtqueue_notify(txvq);
-			PMD_TX_LOG(DEBUG, "Notified backend after xmit");
-		}
-	}
-
-	return nb_tx;
-}
diff --git a/lib/librte_pmd_virtio/virtqueue.c b/lib/librte_pmd_virtio/virtqueue.c
deleted file mode 100644
index 8a3005f..0000000
--- a/lib/librte_pmd_virtio/virtqueue.c
+++ /dev/null
@@ -1,70 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-#include <stdint.h>
-
-#include <rte_mbuf.h>
-
-#include "virtqueue.h"
-#include "virtio_logs.h"
-#include "virtio_pci.h"
-
-void
-virtqueue_disable_intr(struct virtqueue *vq)
-{
-	/*
-	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
-	 * not to interrupt when it consumes packets
-	 * Note: this is only considered a hint to the host
-	 */
-	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
-}
-
-/*
- * Two types of mbuf to be cleaned:
- * 1) mbuf that has been consumed by backend but not used by virtio.
- * 2) mbuf that hasn't been consued by backend.
- */
-struct rte_mbuf *
-virtqueue_detatch_unused(struct virtqueue *vq)
-{
-	struct rte_mbuf *cookie;
-	int idx;
-
-	for (idx = 0; idx < vq->vq_nentries; idx++) {
-		if ((cookie = vq->vq_descx[idx].cookie) != NULL) {
-			vq->vq_descx[idx].cookie = NULL;
-			return cookie;
-		}
-	}
-	return NULL;
-}
diff --git a/lib/librte_pmd_virtio/virtqueue.h b/lib/librte_pmd_virtio/virtqueue.h
deleted file mode 100644
index 9d6079e..0000000
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ /dev/null
@@ -1,325 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTQUEUE_H_
-#define _VIRTQUEUE_H_
-
-#include <stdint.h>
-
-#include <rte_atomic.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_mempool.h>
-
-#include "virtio_pci.h"
-#include "virtio_ring.h"
-#include "virtio_logs.h"
-
-struct rte_mbuf;
-
-/*
- * Per virtio_config.h in Linux.
- *     For virtio_pci on SMP, we don't need to order with respect to MMIO
- *     accesses through relaxed memory I/O windows, so smp_mb() et al are
- *     sufficient.
- *
- * This driver is for virtio_pci on SMP and therefore can assume
- * weaker (compiler barriers)
- */
-#define virtio_mb()	rte_mb()
-#define virtio_rmb()	rte_compiler_barrier()
-#define virtio_wmb()	rte_compiler_barrier()
-
-#ifdef RTE_PMD_PACKET_PREFETCH
-#define rte_packet_prefetch(p)  rte_prefetch1(p)
-#else
-#define rte_packet_prefetch(p)  do {} while(0)
-#endif
-
-#define VIRTQUEUE_MAX_NAME_SZ 32
-
-#define RTE_MBUF_DATA_DMA_ADDR(mb) \
-	(uint64_t) ((mb)->buf_physaddr + (mb)->data_off)
-
-#define VTNET_SQ_RQ_QUEUE_IDX 0
-#define VTNET_SQ_TQ_QUEUE_IDX 1
-#define VTNET_SQ_CQ_QUEUE_IDX 2
-
-enum { VTNET_RQ = 0, VTNET_TQ = 1, VTNET_CQ = 2 };
-/**
- * The maximum virtqueue size is 2^15. Use that value as the end of
- * descriptor chain terminator since it will never be a valid index
- * in the descriptor table. This is used to verify we are correctly
- * handling vq_free_cnt.
- */
-#define VQ_RING_DESC_CHAIN_END 32768
-
-/**
- * Control the RX mode, ie. promiscuous, allmulti, etc...
- * All commands require an "out" sg entry containing a 1 byte
- * state value, zero = disable, non-zero = enable.  Commands
- * 0 and 1 are supported with the VIRTIO_NET_F_CTRL_RX feature.
- * Commands 2-5 are added with VIRTIO_NET_F_CTRL_RX_EXTRA.
- */
-#define VIRTIO_NET_CTRL_RX              0
-#define VIRTIO_NET_CTRL_RX_PROMISC      0
-#define VIRTIO_NET_CTRL_RX_ALLMULTI     1
-#define VIRTIO_NET_CTRL_RX_ALLUNI       2
-#define VIRTIO_NET_CTRL_RX_NOMULTI      3
-#define VIRTIO_NET_CTRL_RX_NOUNI        4
-#define VIRTIO_NET_CTRL_RX_NOBCAST      5
-
-/**
- * Control the MAC
- *
- * The MAC filter table is managed by the hypervisor, the guest should
- * assume the size is infinite.  Filtering should be considered
- * non-perfect, ie. based on hypervisor resources, the guest may
- * received packets from sources not specified in the filter list.
- *
- * In addition to the class/cmd header, the TABLE_SET command requires
- * two out scatterlists.  Each contains a 4 byte count of entries followed
- * by a concatenated byte stream of the ETH_ALEN MAC addresses.  The
- * first sg list contains unicast addresses, the second is for multicast.
- * This functionality is present if the VIRTIO_NET_F_CTRL_RX feature
- * is available.
- *
- * The ADDR_SET command requests one out scatterlist, it contains a
- * 6 bytes MAC address. This functionality is present if the
- * VIRTIO_NET_F_CTRL_MAC_ADDR feature is available.
- */
-struct virtio_net_ctrl_mac {
-	uint32_t entries;
-	uint8_t macs[][ETHER_ADDR_LEN];
-} __attribute__((__packed__));
-
-#define VIRTIO_NET_CTRL_MAC    1
- #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0
- #define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
-
-/**
- * Control VLAN filtering
- *
- * The VLAN filter table is controlled via a simple ADD/DEL interface.
- * VLAN IDs not added may be filtered by the hypervisor.  Del is the
- * opposite of add.  Both commands expect an out entry containing a 2
- * byte VLAN ID.  VLAN filtering is available with the
- * VIRTIO_NET_F_CTRL_VLAN feature bit.
- */
-#define VIRTIO_NET_CTRL_VLAN     2
-#define VIRTIO_NET_CTRL_VLAN_ADD 0
-#define VIRTIO_NET_CTRL_VLAN_DEL 1
-
-struct virtio_net_ctrl_hdr {
-	uint8_t class;
-	uint8_t cmd;
-} __attribute__((packed));
-
-typedef uint8_t virtio_net_ctrl_ack;
-
-#define VIRTIO_NET_OK     0
-#define VIRTIO_NET_ERR    1
-
-#define VIRTIO_MAX_CTRL_DATA 2048
-
-struct virtio_pmd_ctrl {
-	struct virtio_net_ctrl_hdr hdr;
-	virtio_net_ctrl_ack status;
-	uint8_t data[VIRTIO_MAX_CTRL_DATA];
-};
-
-struct virtqueue {
-	struct virtio_hw         *hw;     /**< virtio_hw structure pointer. */
-	const struct rte_memzone *mz;     /**< mem zone to populate RX ring. */
-	const struct rte_memzone *virtio_net_hdr_mz; /**< memzone to populate hdr. */
-	struct rte_mempool       *mpool;  /**< mempool for mbuf allocation */
-	uint16_t    queue_id;             /**< DPDK queue index. */
-	uint8_t     port_id;              /**< Device port identifier. */
-	uint16_t    vq_queue_index;       /**< PCI queue index */
-
-	void        *vq_ring_virt_mem;    /**< linear address of vring*/
-	unsigned int vq_ring_size;
-	phys_addr_t vq_ring_mem;          /**< physical address of vring */
-
-	struct vring vq_ring;    /**< vring keeping desc, used and avail */
-	uint16_t    vq_free_cnt; /**< num of desc available */
-	uint16_t    vq_nentries; /**< vring desc numbers */
-	uint16_t    vq_free_thresh; /**< free threshold */
-	/**
-	 * Head of the free chain in the descriptor table. If
-	 * there are no free descriptors, this will be set to
-	 * VQ_RING_DESC_CHAIN_END.
-	 */
-	uint16_t  vq_desc_head_idx;
-	uint16_t  vq_desc_tail_idx;
-	/**
-	 * Last consumed descriptor in the used table,
-	 * trails vq_ring.used->idx.
-	 */
-	uint16_t vq_used_cons_idx;
-	uint16_t vq_avail_idx;
-	phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */
-
-	/* Statistics */
-	uint64_t	packets;
-	uint64_t	bytes;
-	uint64_t	errors;
-
-	struct vq_desc_extra {
-		void              *cookie;
-		uint16_t          ndescs;
-	} vq_descx[0];
-};
-
-/* If multiqueue is provided by host, then we suppport it. */
-#ifndef VIRTIO_NET_F_MQ
-/* Device supports Receive Flow Steering */
-#define VIRTIO_NET_F_MQ 0x400000
-#define VIRTIO_NET_CTRL_MQ   4
-#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0
-#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1
-#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
-#endif
-#ifndef VIRTIO_NET_F_CTRL_MAC_ADDR
-#define VIRTIO_NET_F_CTRL_MAC_ADDR 0x800000
-#define VIRTIO_NET_CTRL_MAC_ADDR_SET         1
-#endif
-
-/**
- * This is the first element of the scatter-gather list.  If you don't
- * specify GSO or CSUM features, you can simply ignore the header.
- */
-struct virtio_net_hdr {
-#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1    /**< Use csum_start,csum_offset*/
-	uint8_t flags;
-#define VIRTIO_NET_HDR_GSO_NONE     0    /**< Not a GSO frame */
-#define VIRTIO_NET_HDR_GSO_TCPV4    1    /**< GSO frame, IPv4 TCP (TSO) */
-#define VIRTIO_NET_HDR_GSO_UDP      3    /**< GSO frame, IPv4 UDP (UFO) */
-#define VIRTIO_NET_HDR_GSO_TCPV6    4    /**< GSO frame, IPv6 TCP */
-#define VIRTIO_NET_HDR_GSO_ECN      0x80 /**< TCP has ECN set */
-	uint8_t gso_type;
-	uint16_t hdr_len;     /**< Ethernet + IP + tcp/udp hdrs */
-	uint16_t gso_size;    /**< Bytes to append to hdr_len per frame */
-	uint16_t csum_start;  /**< Position to start checksumming from */
-	uint16_t csum_offset; /**< Offset after that to place checksum */
-};
-
-/**
- * This is the version of the header to use when the MRG_RXBUF
- * feature has been negotiated.
- */
-struct virtio_net_hdr_mrg_rxbuf {
-	struct   virtio_net_hdr hdr;
-	uint16_t num_buffers; /**< Number of merged rx buffers */
-};
-
-/**
- * Tell the backend not to interrupt us.
- */
-void virtqueue_disable_intr(struct virtqueue *vq);
-/**
- *  Dump virtqueue internal structures, for debug purpose only.
- */
-void virtqueue_dump(struct virtqueue *vq);
-/**
- *  Get all mbufs to be freed.
- */
-struct rte_mbuf *virtqueue_detatch_unused(struct virtqueue *vq);
-
-static inline int
-virtqueue_full(const struct virtqueue *vq)
-{
-	return vq->vq_free_cnt == 0;
-}
-
-#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
-
-static inline void
-vq_update_avail_idx(struct virtqueue *vq)
-{
-	virtio_wmb();
-	vq->vq_ring.avail->idx = vq->vq_avail_idx;
-}
-
-static inline void
-vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
-{
-	uint16_t avail_idx;
-	/*
-	 * Place the head of the descriptor chain into the next slot and make
-	 * it usable to the host. The chain is made available now rather than
-	 * deferring to virtqueue_notify() in the hopes that if the host is
-	 * currently running on another CPU, we can keep it processing the new
-	 * descriptor.
-	 */
-	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
-	vq->vq_ring.avail->ring[avail_idx] = desc_idx;
-	vq->vq_avail_idx++;
-}
-
-static inline int
-virtqueue_kick_prepare(struct virtqueue *vq)
-{
-	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
-}
-
-static inline void
-virtqueue_notify(struct virtqueue *vq)
-{
-	/*
-	 * Ensure updated avail->idx is visible to host.
-	 * For virtio on IA, the notificaiton is through io port operation
-	 * which is a serialization instruction itself.
-	 */
-	VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_NOTIFY, vq->vq_queue_index);
-}
-
-#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
-#define VIRTQUEUE_DUMP(vq) do { \
-	uint16_t used_idx, nused; \
-	used_idx = (vq)->vq_ring.used->idx; \
-	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
-	PMD_INIT_LOG(DEBUG, \
-	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
-	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
-	  " avail.flags=0x%x; used.flags=0x%x", \
-	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
-	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
-	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
-	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
-} while (0)
-#else
-#define VIRTQUEUE_DUMP(vq) do { } while (0)
-#endif
-
-#endif /* _VIRTQUEUE_H_ */
-- 
2.1.0

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [RFC PATCH 0/2] dynamic memzones
  2015-05-08 16:37  4% [dpdk-dev] [RFC PATCH 0/2] dynamic memzones Sergio Gonzalez Monroy
  2015-05-08 16:37  1% ` [dpdk-dev] [RFC PATCH 2/2] eal: memzone allocated by malloc Sergio Gonzalez Monroy
@ 2015-05-12 16:30  0% ` Olivier MATZ
    2 siblings, 0 replies; 200+ results
From: Olivier MATZ @ 2015-05-12 16:30 UTC (permalink / raw)
  To: Sergio Gonzalez Monroy, dev

Hi Sergio,

On 05/08/2015 06:37 PM, Sergio Gonzalez Monroy wrote:
> Please NOTE that this series is meant to illustrate an idea/approach and start
> discussion on the topic.
>
> Current implemetation allows reserving/creating memzones but not the opposite
> (unreserve/delete). This affects mempools and other memzone based objects.
>
>  From my point of view, implementing unreserve functionality for memzones would
> look like malloc over memsegs.
> Thus, this approach moves malloc inside eal (which in turn removes a circular
> dependency), where malloc heaps are composed of memsegs.
> We keep both malloc and memzone APIs as they are, but memzones allocate its
> memory by calling malloc_heap_alloc (there would be some ABI changes, see below).
> Some extra functionality is required in malloc to allow for boundary constrained
> memory requests.
> In summary, currently malloc is based on memzones, and with this approach
> memzones are based on malloc.
>
> An alternative would be to move malloc internals (malloc_heap, malloc_elem)
> to the eal, but keeping the malloc library as is, where malloc is based on
> memzones. This way we could avoid ABI changes while keeping the existing
> circular dependency between malloc and eal.
>
> TODOs:
>   - Implement memzone_unreserve, simply call rte_malloc_free.
>   - Implement mempool_delete, simply call rte_memzone_unreserve.
>   - Init heaps with all available memsegs at once.
>   - Review symbols in version map.
>
> ABI changes:
>   - Removed support for rte_memzone_reserve_xxxx with len=0 (not needed?).
>   - Removed librte_malloc as single library (linker script as work around?).
>
> IDEAS FOR FUTURE WORK:
>   - More control over requested memory, ie. shared/private, phys_contig, etc.
>     One of the goals would be trying to reduce the need of physically contiguous
>     memory when not required.
>   - Attach/unattach hugepages at runtime (faster VM migration).
>   - Improve malloc algorithm? ie. jemalloc (or any other).
>
>
> Any comments/toughts and/or different approaches are welcome.

I like the idea and I don't see any issue on the principle. It
will clearly help to have dynamic pools or rings.

(I didn't dive in the second patch very deep, it's just a
high-level thought).

Regards,
Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 0/6] update jhash function
  @ 2015-05-12 15:33  4%   ` Neil Horman
  2015-05-13 13:52  0%     ` De Lara Guarch, Pablo
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-05-12 15:33 UTC (permalink / raw)
  To: Pablo de Lara; +Cc: dev

On Tue, May 12, 2015 at 12:02:32PM +0100, Pablo de Lara wrote:
> Jenkins hash function was developed originally in 1996,
> and was integrated in first versions of DPDK.
> The function has been improved in 2006,
> achieving up to 60% better performance, compared to the original one.
> 
> This patchset updates the current jhash in DPDK,
> including two new functions that generate two hashes from a single key.
> 
> It also separates the existing hash function performance tests to
> another file, to make it quicker to run.
> 
> changes in v4:
> - Simplify key alignment checks
> - Include missing x86 arch check
> 
> changes in v3:
> 
> - Update rte_jhash_1word, rte_jhash_2words and rte_jhash_3words
>   functions
> 
> changes in v2:
> 
> - Split single commit in three commits, one that updates the existing functions
>   and another that adds two new functions and use one of those functions
>   as a base to be called by the other ones.
> - Remove some unnecessary ifdefs in the code.
> - Add new macros to help on the reutilization of constants
> - Separate hash function performance tests to another file
>   and improve cycle measurements.
> - Rename existing function rte_jhash2 to rte_jhash_32b
>   (something more meaninful) and mark rte_jhash2 as
>   deprecated
> 
> Pablo de Lara (6):
>   test/hash: move hash function perf tests to separate file
>   test/hash: improve accuracy on cycle measurements
>   hash: update jhash function with the latest available
>   hash: add two new functions to jhash library
>   hash: remove duplicated code
>   hash: rename rte_jhash2 to rte_jhash_32b
> 
>  app/test/Makefile               |    1 +
>  app/test/test_func_reentrancy.c |    2 +-
>  app/test/test_hash.c            |    4 +-
>  app/test/test_hash_func_perf.c  |  145 +++++++++++++++++
>  app/test/test_hash_perf.c       |   71 +--------
>  lib/librte_hash/rte_jhash.h     |  338 +++++++++++++++++++++++++++++----------
>  6 files changed, 402 insertions(+), 159 deletions(-)
>  create mode 100644 app/test/test_hash_func_perf.c
> 
> -- 
> 1.7.4.1
> 
> 
did you run this through the ABI checker?  I see you're removing several symbols
that will likely need to go through the ABI deprecation process.

Neil

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/6] rte_sched: expand scheduler hierarchy for more VLAN's
  2015-05-11 17:32  4%     ` Stephen Hemminger
@ 2015-05-11 17:43  4%       ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-05-11 17:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

On Mon, May 11, 2015 at 10:32:59AM -0700, Stephen Hemminger wrote:
> On Mon, 11 May 2015 17:20:07 +0000
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Have you run this through the ABI checker?  Seems like this would alter lots of
> > pointer offsets.
> > Neil
> 
> No, I have not run it through ABI checker.
> It would change the ABI for applications using qos_sched but will not
> change layout of mbuf.
> 
> But my assumption was that as part of release process the ABI version
> would change rather than doing for each patch that gets merged.
> 

You're correct that the ABI version can change, but the process is to make an
update to doc/guides/rel_notes/abi.rst documenting the proposed changed, wait
for that to be published in an official release, then make the change for the
following release.  That way downstream adopters have some lead time to prep for
upstream changes.

Neil

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/6] rte_sched: expand scheduler hierarchy for more VLAN's
       [not found]       ` <8edea4c81f624728bb5f0476b680c410@BRMWP-EXMB11.corp.brocade.com>
@ 2015-05-11 17:32  4%     ` Stephen Hemminger
  2015-05-11 17:43  4%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-05-11 17:32 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev, Stephen Hemminger

On Mon, 11 May 2015 17:20:07 +0000
Neil Horman <nhorman@tuxdriver.com> wrote:

> Have you run this through the ABI checker?  Seems like this would alter lots of
> pointer offsets.
> Neil

No, I have not run it through ABI checker.
It would change the ABI for applications using qos_sched but will not
change layout of mbuf.

But my assumption was that as part of release process the ABI version
would change rather than doing for each patch that gets merged.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/6] rte_sched: expand scheduler hierarchy for more VLAN's
  @ 2015-05-11 17:20  3%   ` Neil Horman
       [not found]       ` <8edea4c81f624728bb5f0476b680c410@BRMWP-EXMB11.corp.brocade.com>
  1 sibling, 0 replies; 200+ results
From: Neil Horman @ 2015-05-11 17:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

On Mon, May 11, 2015 at 10:07:47AM -0700, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
> 
> The QoS subport is limited to 8 bits in original code.
> But customers demanded ability to support full number of VLAN's (4096)
> therefore use the full part of the tag field of mbuf.
> 
> Resize the pipe as well to allow for more pipes in future and
> avoid expensive bitfield access.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/librte_mbuf/rte_mbuf.h   |  5 ++++-
>  lib/librte_sched/rte_sched.h | 38 ++++++++++++++++++++++++--------------
>  2 files changed, 28 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index ab6de67..cc0658d 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -295,7 +295,10 @@ struct rte_mbuf {
>  			/**< First 4 flexible bytes or FD ID, dependent on
>  			     PKT_RX_FDIR_* flag in ol_flags. */
>  		} fdir;           /**< Filter identifier if FDIR enabled */
> -		uint32_t sched;   /**< Hierarchical scheduler */
> +		struct {
> +			uint32_t lo;
> +			uint32_t hi;
> +		} sched;          /**< Hierarchical scheduler */
>  		uint32_t usr;	  /**< User defined tags. See rte_distributor_process() */
>  	} hash;                   /**< hash information */
>  
> diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
> index e6bba22..bf5ef8d 100644
> --- a/lib/librte_sched/rte_sched.h
> +++ b/lib/librte_sched/rte_sched.h
> @@ -195,16 +195,20 @@ struct rte_sched_port_params {
>  #endif
>  };
>  
> -/** Path through the scheduler hierarchy used by the scheduler enqueue operation to
> -identify the destination queue for the current packet. Stored in the field hash.sched
> -of struct rte_mbuf of each packet, typically written by the classification stage and read by
> -scheduler enqueue.*/
> +/*
> + * Path through the scheduler hierarchy used by the scheduler enqueue
> + * operation to identify the destination queue for the current
> + * packet. Stored in the field pkt.hash.sched of struct rte_mbuf of
> + * each packet, typically written by the classification stage and read
> + * by scheduler enqueue.
> + */
>  struct rte_sched_port_hierarchy {
> -	uint32_t queue:2;                /**< Queue ID (0 .. 3) */
> -	uint32_t traffic_class:2;        /**< Traffic class ID (0 .. 3)*/
> -	uint32_t pipe:20;                /**< Pipe ID */
> -	uint32_t subport:6;              /**< Subport ID */
> -	uint32_t color:2;                /**< Color */
> +	uint16_t queue:2;		 /**< Queue ID (0 .. 3) */
> +	uint16_t traffic_class:2;	 /**< Traffic class ID (0 .. 3)*/
> +	uint16_t color:2;		 /**< Color */
> +	uint16_t unused:10;
> +	uint16_t subport;		 /**< Subport ID */
> +	uint32_t pipe;			 /**< Pipe ID */
>  };
Have you run this through the ABI checker?  Seems like this would alter lots of
pointer offsets.
Neil

>  
>  /*
> @@ -350,12 +354,15 @@ rte_sched_queue_read_stats(struct rte_sched_port *port,
>   */
>  static inline void
>  rte_sched_port_pkt_write(struct rte_mbuf *pkt,
> -	uint32_t subport, uint32_t pipe, uint32_t traffic_class, uint32_t queue, enum rte_meter_color color)
> +			 uint32_t subport, uint32_t pipe,
> +			 uint32_t traffic_class,
> +			 uint32_t queue, enum rte_meter_color color)
>  {
> -	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
> +	struct rte_sched_port_hierarchy *sched
> +		= (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
>  
> -	sched->color = (uint32_t) color;
>  	sched->subport = subport;
> +	sched->color = (uint32_t) color;
>  	sched->pipe = pipe;
>  	sched->traffic_class = traffic_class;
>  	sched->queue = queue;
> @@ -379,9 +386,12 @@ rte_sched_port_pkt_write(struct rte_mbuf *pkt,
>   *
>   */
>  static inline void
> -rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t *subport, uint32_t *pipe, uint32_t *traffic_class, uint32_t *queue)
> +rte_sched_port_pkt_read_tree_path(struct rte_mbuf *pkt, uint32_t *subport,
> +				  uint32_t *pipe, uint32_t *traffic_class,
> +				  uint32_t *queue)
>  {
> -	struct rte_sched_port_hierarchy *sched = (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
> +	struct rte_sched_port_hierarchy *sched
> +		= (struct rte_sched_port_hierarchy *) &pkt->hash.sched;
>  
>  	*subport = sched->subport;
>  	*pipe = sched->pipe;
> -- 
> 2.1.4
> 
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 4/6] rte_sched: allow reading without clearing
  @ 2015-05-11 12:53  3%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-05-11 12:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2015-04-29 10:04, Stephen Hemminger:
> The rte_sched statistics API should allow reading statistics without
> clearing. Make auto-clear optional.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_sched.c        |  4 ++--
>  examples/qos_sched/stats.c   | 16 +++++++++++-----
>  lib/librte_sched/rte_sched.c | 44 ++++++++++++++++++++++----------------------
>  lib/librte_sched/rte_sched.h | 18 ++++++++++--------
[...]

This API change needs more adjustments in the example app:

examples/qos_sched/stats.c: In function ‘subport_stat’:
examples/qos_sched/stats.c:263:9: error: too few arguments to function ‘rte_sched_subport_read_stats’
         rte_sched_subport_read_stats(port, subport_id, &stats, tc_ov);
         ^
examples/qos_sched/stats.c: In function ‘pipe_stat’:
examples/qos_sched/stats.c:309:25: error: too few arguments to function ‘rte_sched_queue_read_stats’
                         rte_sched_queue_read_stats(port, queue_id + (i * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + j), &stats, &qlen);
                         ^

[...]
> --- a/lib/librte_sched/rte_sched.h
> +++ b/lib/librte_sched/rte_sched.h
> @@ -308,14 +308,15 @@ rte_sched_port_get_memory_footprint(struct rte_sched_port_params *params);
>   * @param tc_ov
>   *   Pointer to pre-allocated 4-entry array where the oversubscription status for
>   *   each of the 4 subport traffic classes should be stored.
> + * @parm clear
> + *   Reset statistics after read
>   * @return
>   *   0 upon success, error code otherwise
>   */
>  int
> -rte_sched_subport_read_stats(struct rte_sched_port *port,
> -	uint32_t subport_id,
> -	struct rte_sched_subport_stats *stats,
> -	uint32_t *tc_ov);
> +rte_sched_subport_read_stats(struct rte_sched_port *port, uint32_t subport_id,
> +			     struct rte_sched_subport_stats *stats,
> +			     uint32_t *tc_ov, int clear);
>  
>  /**
>   * Hierarchical scheduler queue statistics read
> @@ -329,14 +330,15 @@ rte_sched_subport_read_stats(struct rte_sched_port *port,
>   *   counters should be stored
>   * @param qlen
>   *   Pointer to pre-allocated variable where the current queue length should be stored.
> + * @parm clear
> + *   Reset statistics after read
>   * @return
>   *   0 upon success, error code otherwise
>   */
>  int
> -rte_sched_queue_read_stats(struct rte_sched_port *port,
> -	uint32_t queue_id,
> -	struct rte_sched_queue_stats *stats,
> -	uint16_t *qlen);
> +rte_sched_queue_read_stats(struct rte_sched_port *port, uint32_t queue_id,
> +			   struct rte_sched_queue_stats *stats,
> +			   uint16_t *qlen, int clear);
>  
>  /*
>   * Run-time
> 

What about ABI versioning? compatibility?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [RFC PATCH 0/2] dynamic memzones
@ 2015-05-08 16:37  4% Sergio Gonzalez Monroy
  2015-05-08 16:37  1% ` [dpdk-dev] [RFC PATCH 2/2] eal: memzone allocated by malloc Sergio Gonzalez Monroy
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-05-08 16:37 UTC (permalink / raw)
  To: dev

Please NOTE that this series is meant to illustrate an idea/approach and start
discussion on the topic.

Current implemetation allows reserving/creating memzones but not the opposite
(unreserve/delete). This affects mempools and other memzone based objects.

>From my point of view, implementing unreserve functionality for memzones would
look like malloc over memsegs.
Thus, this approach moves malloc inside eal (which in turn removes a circular
dependency), where malloc heaps are composed of memsegs.
We keep both malloc and memzone APIs as they are, but memzones allocate its
memory by calling malloc_heap_alloc (there would be some ABI changes, see below).
Some extra functionality is required in malloc to allow for boundary constrained
memory requests.
In summary, currently malloc is based on memzones, and with this approach
memzones are based on malloc.

An alternative would be to move malloc internals (malloc_heap, malloc_elem)
to the eal, but keeping the malloc library as is, where malloc is based on
memzones. This way we could avoid ABI changes while keeping the existing
circular dependency between malloc and eal.

TODOs:
 - Implement memzone_unreserve, simply call rte_malloc_free.
 - Implement mempool_delete, simply call rte_memzone_unreserve.
 - Init heaps with all available memsegs at once.
 - Review symbols in version map.

ABI changes:
 - Removed support for rte_memzone_reserve_xxxx with len=0 (not needed?).
 - Removed librte_malloc as single library (linker script as work around?).

IDEAS FOR FUTURE WORK:
 - More control over requested memory, ie. shared/private, phys_contig, etc.
   One of the goals would be trying to reduce the need of physically contiguous
   memory when not required.
 - Attach/unattach hugepages at runtime (faster VM migration).
 - Improve malloc algorithm? ie. jemalloc (or any other).


Any comments/toughts and/or different approaches are welcome.


Sergio Gonzalez Monroy (2):
  eal: move librte_malloc to eal/common
  eal: memzone allocated by malloc

 config/common_bsdapp                            |   9 +-
 config/common_linuxapp                          |   9 +-
 lib/Makefile                                    |   1 -
 lib/librte_acl/Makefile                         |   2 +-
 lib/librte_eal/bsdapp/eal/Makefile              |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  18 ++
 lib/librte_eal/common/Makefile                  |   1 +
 lib/librte_eal/common/eal_common_memzone.c      | 233 ++--------------
 lib/librte_eal/common/include/rte_malloc.h      | 342 ++++++++++++++++++++++++
 lib/librte_eal/common/include/rte_malloc_heap.h |   4 +-
 lib/librte_eal/common/include/rte_memory.h      |   1 +
 lib/librte_eal/common/malloc_elem.c             | 342 ++++++++++++++++++++++++
 lib/librte_eal/common/malloc_elem.h             | 192 +++++++++++++
 lib/librte_eal/common/malloc_heap.c             | 287 ++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h             |  70 +++++
 lib/librte_eal/common/rte_malloc.c              | 259 ++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile            |   4 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  18 ++
 lib/librte_hash/Makefile                        |   2 +-
 lib/librte_lpm/Makefile                         |   2 +-
 lib/librte_malloc/Makefile                      |  52 ----
 lib/librte_malloc/malloc_elem.c                 | 320 ----------------------
 lib/librte_malloc/malloc_elem.h                 | 190 -------------
 lib/librte_malloc/malloc_heap.c                 | 209 ---------------
 lib/librte_malloc/malloc_heap.h                 |  70 -----
 lib/librte_malloc/rte_malloc.c                  | 260 ------------------
 lib/librte_malloc/rte_malloc.h                  | 342 ------------------------
 lib/librte_malloc/rte_malloc_version.map        |  19 --
 lib/librte_mempool/Makefile                     |   2 -
 lib/librte_pmd_af_packet/Makefile               |   1 -
 lib/librte_pmd_bond/Makefile                    |   1 -
 lib/librte_pmd_e1000/Makefile                   |   2 +-
 lib/librte_pmd_enic/Makefile                    |   2 +-
 lib/librte_pmd_fm10k/Makefile                   |   2 +-
 lib/librte_pmd_i40e/Makefile                    |   2 +-
 lib/librte_pmd_ixgbe/Makefile                   |   2 +-
 lib/librte_pmd_mlx4/Makefile                    |   1 -
 lib/librte_pmd_null/Makefile                    |   1 -
 lib/librte_pmd_pcap/Makefile                    |   1 -
 lib/librte_pmd_virtio/Makefile                  |   2 +-
 lib/librte_pmd_vmxnet3/Makefile                 |   2 +-
 lib/librte_pmd_xenvirt/Makefile                 |   2 +-
 lib/librte_port/Makefile                        |   1 -
 lib/librte_ring/Makefile                        |   3 +-
 lib/librte_table/Makefile                       |   1 -
 45 files changed, 1571 insertions(+), 1719 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_malloc.h
 create mode 100644 lib/librte_eal/common/malloc_elem.c
 create mode 100644 lib/librte_eal/common/malloc_elem.h
 create mode 100644 lib/librte_eal/common/malloc_heap.c
 create mode 100644 lib/librte_eal/common/malloc_heap.h
 create mode 100644 lib/librte_eal/common/rte_malloc.c
 delete mode 100644 lib/librte_malloc/Makefile
 delete mode 100644 lib/librte_malloc/malloc_elem.c
 delete mode 100644 lib/librte_malloc/malloc_elem.h
 delete mode 100644 lib/librte_malloc/malloc_heap.c
 delete mode 100644 lib/librte_malloc/malloc_heap.h
 delete mode 100644 lib/librte_malloc/rte_malloc.c
 delete mode 100644 lib/librte_malloc/rte_malloc.h
 delete mode 100644 lib/librte_malloc/rte_malloc_version.map

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [RFC PATCH 2/2] eal: memzone allocated by malloc
  2015-05-08 16:37  4% [dpdk-dev] [RFC PATCH 0/2] dynamic memzones Sergio Gonzalez Monroy
@ 2015-05-08 16:37  1% ` Sergio Gonzalez Monroy
  2015-05-12 16:30  0% ` [dpdk-dev] [RFC PATCH 0/2] dynamic memzones Olivier MATZ
    2 siblings, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-05-08 16:37 UTC (permalink / raw)
  To: dev

In the current memory hierarchy, memsegs are groups of physically contiguous
hugepages, memzone are slices of memsegs and malloc further slices memzones
into smaller memory chunks.

This patch modifies malloc so it slices/partitions memsegs instead of memzones.
Thus memzones would call malloc internally for memoy allocation while
maintaining its ABI. The only exception is the reserving a memzone with
len=0 is not supported anymore.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c      | 233 ++----------------------
 lib/librte_eal/common/include/rte_malloc_heap.h |   4 +-
 lib/librte_eal/common/include/rte_memory.h      |   1 +
 lib/librte_eal/common/malloc_elem.c             |  60 ++++--
 lib/librte_eal/common/malloc_elem.h             |  14 +-
 lib/librte_eal/common/malloc_heap.c             | 188 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h             |   4 +-
 lib/librte_eal/common/rte_malloc.c              |   7 +-
 8 files changed, 207 insertions(+), 304 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 888f9e5..3dc8133 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -50,11 +50,10 @@
 #include <rte_string_fns.h>
 #include <rte_common.h>
 
+#include "malloc_heap.h"
+#include "malloc_elem.h"
 #include "eal_private.h"
 
-/* internal copy of free memory segments */
-static struct rte_memseg *free_memseg = NULL;
-
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
@@ -88,53 +87,12 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 			len, socket_id, flags, RTE_CACHE_LINE_SIZE);
 }
 
-/*
- * Helper function for memzone_reserve_aligned_thread_unsafe().
- * Calculate address offset from the start of the segment.
- * Align offset in that way that it satisfy istart alignmnet and
- * buffer of the  requested length would not cross specified boundary.
- */
-static inline phys_addr_t
-align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align,
-	size_t bound)
-{
-	phys_addr_t addr_offset, bmask, end, start;
-	size_t step;
-
-	step = RTE_MAX(align, bound);
-	bmask = ~((phys_addr_t)bound - 1);
-
-	/* calculate offset to closest alignment */
-	start = RTE_ALIGN_CEIL(ms->phys_addr, align);
-	addr_offset = start - ms->phys_addr;
-
-	while (addr_offset + len < ms->len) {
-
-		/* check, do we meet boundary condition */
-		end = start + len - (len != 0);
-		if ((start & bmask) == (end & bmask))
-			break;
-
-		/* calculate next offset */
-		start = RTE_ALIGN_CEIL(start + 1, step);
-		addr_offset = start - ms->phys_addr;
-	}
-
-	return (addr_offset);
-}
-
 static const struct rte_memzone *
 memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
 	struct rte_mem_config *mcfg;
-	unsigned i = 0;
-	int memseg_idx = -1;
-	uint64_t addr_offset, seg_offset = 0;
 	size_t requested_len;
-	size_t memseg_len = 0;
-	phys_addr_t memseg_physaddr;
-	void *memseg_addr;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -166,10 +124,10 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (align < RTE_CACHE_LINE_SIZE)
 		align = RTE_CACHE_LINE_SIZE;
 
-
-	/* align length on cache boundary. Check for overflow before doing so */
-	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
-		rte_errno = EINVAL; /* requested size too big */
+	/* align length on cache boundary. Check for overflow before doing so
+	 * FIXME need to update API doc regarding len value*/
+	if ((len > SIZE_MAX - RTE_CACHE_LINE_MASK) || (len == 0)){
+		rte_errno = EINVAL;
 		return NULL;
 	}
 
@@ -186,123 +144,29 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	/* find the smallest segment matching requirements */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		/* last segment */
-		if (free_memseg[i].addr == NULL)
-			break;
-
-		/* empty segment, skip it */
-		if (free_memseg[i].len == 0)
-			continue;
-
-		/* bad socket ID */
-		if (socket_id != SOCKET_ID_ANY &&
-		    free_memseg[i].socket_id != SOCKET_ID_ANY &&
-		    socket_id != free_memseg[i].socket_id)
-			continue;
-
-		/*
-		 * calculate offset to closest alignment that
-		 * meets boundary conditions.
-		 */
-		addr_offset = align_phys_boundary(free_memseg + i,
-			requested_len, align, bound);
-
-		/* check len */
-		if ((requested_len + addr_offset) > free_memseg[i].len)
-			continue;
-
-		/* check flags for hugepage sizes */
-		if ((flags & RTE_MEMZONE_2MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_1G)
-			continue;
-		if ((flags & RTE_MEMZONE_1GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_2M)
-			continue;
-		if ((flags & RTE_MEMZONE_16MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16G)
-			continue;
-		if ((flags & RTE_MEMZONE_16GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16M)
-			continue;
-
-		/* this segment is the best until now */
-		if (memseg_idx == -1) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-		/* find the biggest contiguous zone */
-		else if (len == 0) {
-			if (free_memseg[i].len > memseg_len) {
-				memseg_idx = i;
-				memseg_len = free_memseg[i].len;
-				seg_offset = addr_offset;
-			}
-		}
-		/*
-		 * find the smallest (we already checked that current
-		 * zone length is > len
-		 */
-		else if (free_memseg[i].len + align < memseg_len ||
-				(free_memseg[i].len <= memseg_len + align &&
-				addr_offset < seg_offset)) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-	}
 
-	/* no segment found */
-	if (memseg_idx == -1) {
-		/*
-		 * If RTE_MEMZONE_SIZE_HINT_ONLY flag is specified,
-		 * try allocating again without the size parameter otherwise -fail.
-		 */
-		if ((flags & RTE_MEMZONE_SIZE_HINT_ONLY)  &&
-		    ((flags & RTE_MEMZONE_1GB) || (flags & RTE_MEMZONE_2MB)
-		|| (flags & RTE_MEMZONE_16MB) || (flags & RTE_MEMZONE_16GB)))
-			return memzone_reserve_aligned_thread_unsafe(name,
-				len, socket_id, 0, align, bound);
+	/* get socket heap */
 
+	/* allocate memory on heap */
+	void *mz_addr = malloc_heap_alloc(&mcfg->malloc_heaps[socket_id], NULL,
+			requested_len, flags, align, bound);
+	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
 
-	/* save aligned physical and virtual addresses */
-	memseg_physaddr = free_memseg[memseg_idx].phys_addr + seg_offset;
-	memseg_addr = RTE_PTR_ADD(free_memseg[memseg_idx].addr,
-			(uintptr_t) seg_offset);
-
-	/* if we are looking for a biggest memzone */
-	if (len == 0) {
-		if (bound == 0)
-			requested_len = memseg_len - seg_offset;
-		else
-			requested_len = RTE_ALIGN_CEIL(memseg_physaddr + 1,
-				bound) - memseg_physaddr;
-	}
-
-	/* set length to correct value */
-	len = (size_t)seg_offset + requested_len;
-
-	/* update our internal state */
-	free_memseg[memseg_idx].len -= len;
-	free_memseg[memseg_idx].phys_addr += len;
-	free_memseg[memseg_idx].addr =
-		(char *)free_memseg[memseg_idx].addr + len;
+	const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
 
 	/* fill the zone in config */
 	struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
-	mz->phys_addr = memseg_physaddr;
-	mz->addr = memseg_addr;
+	mz->phys_addr = rte_malloc_virt2phy(mz_addr);
+	mz->addr = mz_addr;
 	mz->len = requested_len;
-	mz->hugepage_sz = free_memseg[memseg_idx].hugepage_sz;
-	mz->socket_id = free_memseg[memseg_idx].socket_id;
+	mz->hugepage_sz = elem->ms->hugepage_sz;
+	mz->socket_id = elem->ms->socket_id;
 	mz->flags = 0;
-	mz->memseg_id = memseg_idx;
+	mz->memseg_id = elem->ms - rte_eal_get_configuration()->mem_config->memseg;
 
 	return mz;
 }
@@ -419,45 +283,6 @@ rte_memzone_dump(FILE *f)
 }
 
 /*
- * called by init: modify the free memseg list to have cache-aligned
- * addresses and cache-aligned lengths
- */
-static int
-memseg_sanitize(struct rte_memseg *memseg)
-{
-	unsigned phys_align;
-	unsigned virt_align;
-	unsigned off;
-
-	phys_align = memseg->phys_addr & RTE_CACHE_LINE_MASK;
-	virt_align = (unsigned long)memseg->addr & RTE_CACHE_LINE_MASK;
-
-	/*
-	 * sanity check: phys_addr and addr must have the same
-	 * alignment
-	 */
-	if (phys_align != virt_align)
-		return -1;
-
-	/* memseg is really too small, don't bother with it */
-	if (memseg->len < (2 * RTE_CACHE_LINE_SIZE)) {
-		memseg->len = 0;
-		return 0;
-	}
-
-	/* align start address */
-	off = (RTE_CACHE_LINE_SIZE - phys_align) & RTE_CACHE_LINE_MASK;
-	memseg->phys_addr += off;
-	memseg->addr = (char *)memseg->addr + off;
-	memseg->len -= off;
-
-	/* align end address */
-	memseg->len &= ~((uint64_t)RTE_CACHE_LINE_MASK);
-
-	return 0;
-}
-
-/*
  * Init the memzone subsystem
  */
 int
@@ -465,14 +290,10 @@ rte_eal_memzone_init(void)
 {
 	struct rte_mem_config *mcfg;
 	const struct rte_memseg *memseg;
-	unsigned i = 0;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* mirror the runtime memsegs from config */
-	free_memseg = mcfg->free_memseg;
-
 	/* secondary processes don't need to initialise anything */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return 0;
@@ -485,26 +306,6 @@ rte_eal_memzone_init(void)
 
 	rte_rwlock_write_lock(&mcfg->mlock);
 
-	/* fill in uninitialized free_memsegs */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (memseg[i].addr == NULL)
-			break;
-		if (free_memseg[i].addr != NULL)
-			continue;
-		memcpy(&free_memseg[i], &memseg[i], sizeof(struct rte_memseg));
-	}
-
-	/* make all zones cache-aligned */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (free_memseg[i].addr == NULL)
-			break;
-		if (memseg_sanitize(&free_memseg[i]) < 0) {
-			RTE_LOG(ERR, EAL, "%s(): Sanity check failed\n", __func__);
-			rte_rwlock_write_unlock(&mcfg->mlock);
-			return -1;
-		}
-	}
-
 	/* delete all zones */
 	mcfg->memzone_idx = 0;
 	memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index 716216f..5333348 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -40,7 +40,7 @@
 #include <rte_memory.h>
 
 /* Number of free lists per heap, grouped by size. */
-#define RTE_HEAP_NUM_FREELISTS  5
+#define RTE_HEAP_NUM_FREELISTS  10
 
 /**
  * Structure to hold malloc heap
@@ -48,7 +48,7 @@
 struct malloc_heap {
 	rte_spinlock_t lock;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
-	unsigned mz_count;
+	unsigned ms_count;
 	unsigned alloc_count;
 	size_t total_size;
 } __rte_cache_aligned;
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 7f8103f..ab13d04 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -100,6 +100,7 @@ struct rte_memseg {
 	 /**< store segment MFNs */
 	uint64_t mfn[DOM0_NUM_MEMBLOCK];
 #endif
+	uint8_t used;               /**< already used by a heap */
 } __attribute__((__packed__));
 
 /**
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index a5e1248..5e95abb 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -37,7 +37,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
 #include <rte_per_lcore.h>
@@ -56,10 +55,10 @@
  */
 void
 malloc_elem_init(struct malloc_elem *elem,
-		struct malloc_heap *heap, const struct rte_memzone *mz, size_t size)
+		struct malloc_heap *heap, const struct rte_memseg *ms, size_t size)
 {
 	elem->heap = heap;
-	elem->mz = mz;
+	elem->ms = ms;
 	elem->prev = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
@@ -70,12 +69,12 @@ malloc_elem_init(struct malloc_elem *elem,
 }
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
 {
-	malloc_elem_init(elem, prev->heap, prev->mz, 0);
+	malloc_elem_init(elem, prev->heap, prev->ms, 0);
 	elem->prev = prev;
 	elem->state = ELEM_BUSY; /* mark busy so its never merged */
 }
@@ -86,12 +85,24 @@ malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
  * fit, return NULL.
  */
 static void *
-elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
+elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	const uintptr_t end_pt = (uintptr_t)elem +
+	const size_t bmask = ~(bound - 1);
+	uintptr_t end_pt = (uintptr_t)elem +
 			elem->size - MALLOC_ELEM_TRAILER_LEN;
-	const uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
-	const uintptr_t new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
+	uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+	uintptr_t new_elem_start;
+
+	/* check boundary */
+	if ((new_data_start & bmask) != (end_pt & bmask)) {
+		end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
+		new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+		if ((end_pt & bmask) != (new_data_start & bmask))
+			return NULL;
+	}
+
+	new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
 
 	/* if the new start point is before the exist start, it won't fit */
 	return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
@@ -102,9 +113,10 @@ elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
  * alignment request from the current element
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,	unsigned align,
+		size_t bound)
 {
-	return elem_start_pt(elem, size, align) != NULL;
+	return elem_start_pt(elem, size, align, bound) != NULL;
 }
 
 /*
@@ -118,7 +130,7 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 	const unsigned old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
 	const unsigned new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->mz, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->ms, new_elem_size);
 	split_pt->prev = elem;
 	next_elem->prev = split_pt;
 	elem->size = old_elem_size;
@@ -190,12 +202,25 @@ elem_free_list_remove(struct malloc_elem *elem)
  * is not done here, as it's done there previously.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	struct malloc_elem *new_elem = elem_start_pt(elem, size, align);
-	const unsigned old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
+	const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	const size_t trailer_size = elem->size - old_elem_size - size;
+
+	elem_free_list_remove(elem);
 
-	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE){
+	if (trailer_size > MALLOC_ELEM_OVERHEAD * 2 + MIN_DATA_SIZE) {
+		/* split it, too much free space after elem */
+		struct malloc_elem *new_free_elem =
+				RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);
+
+		split_elem(elem, new_free_elem);
+		malloc_elem_free_list_insert(new_free_elem);
+	}
+
+	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
 		/* don't split it, pad the element instead */
 		elem->state = ELEM_BUSY;
 		elem->pad = old_elem_size;
@@ -208,8 +233,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 			new_elem->size = elem->size - elem->pad;
 			set_header(new_elem);
 		}
-		/* remove element from free list */
-		elem_free_list_remove(elem);
 
 		return new_elem;
 	}
@@ -219,7 +242,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 	 * Re-insert original element, in case its new size makes it
 	 * belong on a different list.
 	 */
-	elem_free_list_remove(elem);
 	split_elem(elem, new_elem);
 	new_elem->state = ELEM_BUSY;
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index 9790b1a..e05d2ea 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -47,9 +47,9 @@ enum elem_state {
 
 struct malloc_elem {
 	struct malloc_heap *heap;
-	struct malloc_elem *volatile prev;      /* points to prev elem in memzone */
+	struct malloc_elem *volatile prev;      /* points to prev elem in memseg */
 	LIST_ENTRY(malloc_elem) free_list;      /* list of free elements in heap */
-	const struct rte_memzone *mz;
+	const struct rte_memseg *ms;
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
@@ -136,11 +136,11 @@ malloc_elem_from_data(const void *data)
 void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
-		const struct rte_memzone *mz,
+		const struct rte_memseg *ms,
 		size_t size);
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem,
@@ -151,14 +151,16 @@ malloc_elem_mkend(struct malloc_elem *elem,
  * of the requested size and with the requested alignment
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * reserve a block of data in an existing malloc_elem. If the malloc_elem
  * is much larger than the data block requested, we split the element in two.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_alloc(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * free a malloc_elem block by adding it to the free list. If the
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index defb903..b79e0e9 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -39,7 +39,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_launch.h>
@@ -54,69 +53,136 @@
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
-/* since the memzone size starts with a digit, it will appear unquoted in
- * rte_config.h, so quote it so it can be passed to rte_str_to_size */
-#define MALLOC_MEMZONE_SIZE RTE_STR(RTE_MALLOC_MEMZONE_SIZE)
+static unsigned
+check_hugepage_sz(unsigned flags, size_t hugepage_sz)
+{
+	unsigned ret = 1;
 
-/*
- * returns the configuration setting for the memzone size as a size_t value
- */
-static inline size_t
-get_malloc_memzone_size(void)
+	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16MB) && hugepage_sz == RTE_PGSIZE_16G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16GB) && hugepage_sz == RTE_PGSIZE_16M)
+		ret = 0;
+
+	return ret;
+}
+
+static struct rte_memseg*
+find_suitable_memseg(int socket_id, size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
-	return rte_str_to_size(MALLOC_MEMZONE_SIZE);
+	struct rte_memseg *ms = rte_eal_get_configuration()->mem_config->memseg;
+	uintptr_t data_end, data_start;
+	size_t bmask = ~(bound - 1);
+	unsigned i;
+	int ms_idx = -1, alt_ms_idx = -1;
+	size_t ms_len = 0, alt_ms_len = 0;
+	size_t min_size;
+
+	min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		/* last segment */
+		if (ms[i].addr == NULL)
+			break;
+
+		/* in use */
+		if (ms[i].used)
+			continue;
+
+		/* bad socket ID */
+		if (socket_id != SOCKET_ID_ANY && ms[i].socket_id != SOCKET_ID_ANY &&
+		    socket_id != ms[i].socket_id)
+			continue;
+
+		/* check len */
+		if (min_size > ms[i].len)
+			continue;
+
+		/* check boundary */
+		data_end = (uintptr_t)ms[i].addr + ms[i].len -
+			MALLOC_ELEM_OVERHEAD - MALLOC_ELEM_TRAILER_LEN ;
+		data_end = RTE_ALIGN_FLOOR(data_end, RTE_CACHE_LINE_SIZE);
+		data_start = RTE_ALIGN_FLOOR((data_end - size), align);
+		if ((data_end & bmask) != (data_start & bmask)) {
+			/* check we have enough space before boudnary */
+			data_end = RTE_ALIGN_FLOOR(data_end, bound);
+			data_start = RTE_ALIGN_FLOOR((data_end - size), align);
+			if (((data_end & bmask) != (data_start & bmask)) ||
+					((uintptr_t)ms[i].addr > (data_start - MALLOC_ELEM_HEADER_LEN)))
+				continue;
+		}
+
+		/* at this point, we have a memseg */
+
+		/* keep best memseg found */
+		if ((alt_ms_idx == -1) ||
+			(ms[i].len < alt_ms_len)) {
+			alt_ms_idx = i;
+			alt_ms_len = ms[i].len;
+		}
+
+		/* check flags for hugepage sizes */
+		if (!check_hugepage_sz(flags, ms[i].hugepage_sz))
+			continue;
+
+		/* keep best memseg found with requested hugepage size */
+		if ((ms_idx == -1) ||
+			(ms[i].len < ms_len)) {
+			ms_idx = i;
+			ms_len = ms[i].len;
+		}
+	}
+
+	if ((ms_idx == -1) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		ms_idx = alt_ms_idx;
+
+	if (ms_idx == -1)
+		return NULL;
+
+	return &ms[ms_idx];
 }
 
+/* This function expects correct values:
+ * - size: >= RTE_CACHE_LINE_SIZE
+ * - align: power_of_two && >= RTE_CACHE_LINE_SIZE
+ * - bound: power_of_two && >= size
+ */
 /*
+ * find a suitable memory segment available to expand the heap
  * reserve an extra memory zone and make it available for use by a particular
  * heap. This reserves the zone and sets a dummy malloc_elem header at the end
  * to prevent overflow. The rest of the zone is added to free list as a single
  * large free block
  */
 static int
-malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
+malloc_heap_add_memseg(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
-	const unsigned mz_flags = 0;
-	const size_t block_size = get_malloc_memzone_size();
-	/* ensure the data we want to allocate will fit in the memzone */
-	const size_t min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
-	const struct rte_memzone *mz = NULL;
+	struct rte_memseg *ms = NULL;
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	unsigned numa_socket = heap - mcfg->malloc_heaps;
-
-	size_t mz_size = min_size;
-	if (mz_size < block_size)
-		mz_size = block_size;
-
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	snprintf(mz_name, sizeof(mz_name), "MALLOC_S%u_HEAP_%u",
-		     numa_socket, heap->mz_count++);
-
-	/* try getting a block. if we fail and we don't need as big a block
-	 * as given in the config, we can shrink our request and try again
-	 */
-	do {
-		mz = rte_memzone_reserve(mz_name, mz_size, numa_socket,
-					 mz_flags);
-		if (mz == NULL)
-			mz_size /= 2;
-	} while (mz == NULL && mz_size > min_size);
-	if (mz == NULL)
+	unsigned socket_id = heap - mcfg->malloc_heaps;
+
+	ms = find_suitable_memseg(socket_id, size, flags, align, bound);
+	if (ms == NULL)
 		return -1;
 
+	ms->used = 1;
 	/* allocate the memory block headers, one at end, one at start */
-	struct malloc_elem *start_elem = (struct malloc_elem *)mz->addr;
-	struct malloc_elem *end_elem = RTE_PTR_ADD(mz->addr,
-			mz_size - MALLOC_ELEM_OVERHEAD);
+	struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
+	struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
+			ms->len - MALLOC_ELEM_OVERHEAD);
 	end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
 
 	const unsigned elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
-	malloc_elem_init(start_elem, heap, mz, elem_size);
+	malloc_elem_init(start_elem, heap, ms, elem_size);
 	malloc_elem_mkend(end_elem, start_elem);
 	malloc_elem_free_list_insert(start_elem);
 
-	/* increase heap total size by size of new memzone */
-	heap->total_size+=mz_size - MALLOC_ELEM_OVERHEAD;
+	heap->total_size += ms->len - MALLOC_ELEM_OVERHEAD;
 	return 0;
 }
 
@@ -126,10 +192,11 @@ malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
  * Returns null on failure, or pointer to element on success.
  */
 static struct malloc_elem *
-find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
+find_suitable_element(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
 	size_t idx;
-	struct malloc_elem *elem;
+	struct malloc_elem *elem, *alt_elem = NULL;
 
 	for (idx = malloc_elem_free_list_index(size);
 		idx < RTE_HEAP_NUM_FREELISTS; idx++)
@@ -137,40 +204,51 @@ find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
 		for (elem = LIST_FIRST(&heap->free_head[idx]);
 			!!elem; elem = LIST_NEXT(elem, free_list))
 		{
-			if (malloc_elem_can_hold(elem, size, align))
-				return elem;
+			if (malloc_elem_can_hold(elem, size, align, bound)) {
+				if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
+					return elem;
+				else
+					alt_elem = elem;
+			}
 		}
 	}
+
+	if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		return alt_elem;
+
 	return NULL;
 }
 
 /*
- * Main function called by malloc to allocate a block of memory from the
- * heap. It locks the free list, scans it, and adds a new memzone if the
- * scan fails. Once the new memzone is added, it re-scans and should return
+ * Main function to allocate a block of memory from the heap.
+ * It locks the free list, scans it, and adds a new memseg if the
+ * scan fails. Once the new memseg is added, it re-scans and should return
  * the new element after releasing the lock.
  */
 void *
 malloc_heap_alloc(struct malloc_heap *heap,
-		const char *type __attribute__((unused)), size_t size, unsigned align)
+		const char *type __attribute__((unused)), size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
 	size = RTE_CACHE_LINE_ROUNDUP(size);
 	align = RTE_CACHE_LINE_ROUNDUP(align);
+
 	rte_spinlock_lock(&heap->lock);
-	struct malloc_elem *elem = find_suitable_element(heap, size, align);
+	struct malloc_elem *elem = find_suitable_element(heap, size, flags,
+			align, bound);
 	if (elem == NULL){
-		if ((malloc_heap_add_memzone(heap, size, align)) == 0)
-			elem = find_suitable_element(heap, size, align);
+		if ((malloc_heap_add_memseg(heap, size, flags, align, bound)) == 0)
+			elem = find_suitable_element(heap, size, flags, align, bound);
 	}
 
 	if (elem != NULL){
-		elem = malloc_elem_alloc(elem, size, align);
+		elem = malloc_elem_alloc(elem, size, align, bound);
 		/* increase heap's count of allocated elements */
 		heap->alloc_count++;
 	}
 	rte_spinlock_unlock(&heap->lock);
-	return elem == NULL ? NULL : (void *)(&elem[1]);
 
+	return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
 /*
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a47136d..4ba3353 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -53,8 +53,8 @@ malloc_get_numa_socket(void)
 }
 
 void *
-malloc_heap_alloc(struct malloc_heap *heap, const char *type,
-		size_t size, unsigned align);
+malloc_heap_alloc(struct malloc_heap *heap,	const char *type, size_t size,
+		unsigned flags, size_t align, size_t bound);
 
 int
 malloc_heap_get_stats(const struct malloc_heap *heap,
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index c313a57..54c2bd8 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -39,7 +39,6 @@
 
 #include <rte_memcpy.h>
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_branch_prediction.h>
@@ -87,7 +86,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 		return NULL;
 
 	ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
-				size, align == 0 ? 1 : align);
+				size, 0, align == 0 ? 1 : align, 0);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
@@ -98,7 +97,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 			continue;
 
 		ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
-					size, align == 0 ? 1 : align);
+					size, 0, align == 0 ? 1 : align, 0);
 		if (ret != NULL)
 			return ret;
 	}
@@ -256,5 +255,5 @@ rte_malloc_virt2phy(const void *addr)
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->mz->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->mz->addr);
+	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
-- 
1.9.3

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v7 00/10] Interrupt mode PMD
    2015-05-05  5:39  3% ` [dpdk-dev] From: Cunming Liang <cunming.liang@intel.com> Cunming Liang
@ 2015-05-05  5:53  3% ` Cunming Liang
  1 sibling, 0 replies; 200+ results
From: Cunming Liang @ 2015-05-05  5:53 UTC (permalink / raw)
  To: dev; +Cc: shemming

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess.
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (10):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/bsd: dummy for new intr definition
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch

 examples/l3fwd-power/main.c                        | 206 ++++++++--
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |   6 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 232 +++++++++--
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  12 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  97 +++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   4 +
 lib/librte_ether/rte_ethdev.c                      | 132 +++++++
 lib/librte_ether/rte_ethdev.h                      | 104 +++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 lib/librte_pmd_e1000/e1000_ethdev.h                |   3 +
 lib/librte_pmd_e1000/igb_ethdev.c                  | 256 +++++++++++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                | 425 ++++++++++++++++++++-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h                |   7 +
 13 files changed, 1394 insertions(+), 94 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 07/10] ethdev: add rx intr enable, disable and ctl functions
  2015-05-05  5:39  3% ` [dpdk-dev] From: Cunming Liang <cunming.liang@intel.com> Cunming Liang
@ 2015-05-05  5:39  2%   ` Cunming Liang
  2015-05-21  8:55  2%   ` [dpdk-dev] [PATCH v8 00/11] Interrupt mode PMD Cunming Liang
  1 sibling, 0 replies; 200+ results
From: Cunming Liang @ 2015-05-05  5:39 UTC (permalink / raw)
  To: dev; +Cc: shemming

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addtion, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per port or per queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 132 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 +
 3 files changed, 240 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 024fe8b..cdde14c 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,138 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		if (intr_handle->intr_vec[qid] < 0) {
+			PMD_DEBUG_TRACE("RX Intr vector invalid on %d\n", qid);
+			continue;
+		}
+
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec,
+				     data, rte_eth_dev_socket_id(port_id));
+		if (rc) {
+			PMD_DEBUG_TRACE("p %d q %d rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec || intr_handle->intr_vec[queue_id] < 0) {
+		PMD_DEBUG_TRACE("RX Intr vector unset on %d\n", rx_queue_id);
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec,
+			     data, rte_eth_dev_socket_id(port_id));
+	if (rc) {
+		PMD_DEBUG_TRACE("p %d q %d rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (dev == NULL) {
+		PMD_DEBUG_TRACE("Invalid port device\n");
+		return -ENODEV;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 4648290..e5efec0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -829,6 +829,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1034,6 +1036,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1385,6 +1395,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2867,6 +2881,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] From: Cunming Liang <cunming.liang@intel.com>
  @ 2015-05-05  5:39  3% ` Cunming Liang
  2015-05-05  5:39  2%   ` [dpdk-dev] [PATCH v7 07/10] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-05-21  8:55  2%   ` [dpdk-dev] [PATCH v8 00/11] Interrupt mode PMD Cunming Liang
  2015-05-05  5:53  3% ` [dpdk-dev] [PATCH v7 00/10] " Cunming Liang
  1 sibling, 2 replies; 200+ results
From: Cunming Liang @ 2015-05-05  5:39 UTC (permalink / raw)
  To: dev; +Cc: shemming

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample applicaiton to accomodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector which has to been shared by
   LSC interrupt and interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupts control APIs and userspace NAIP-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess.
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (10):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/bsd: dummy for new intr definition
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch

 examples/l3fwd-power/main.c                        | 206 ++++++++--
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |   6 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 232 +++++++++--
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  12 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  97 +++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   4 +
 lib/librte_ether/rte_ethdev.c                      | 132 +++++++
 lib/librte_ether/rte_ethdev.h                      | 104 +++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 lib/librte_pmd_e1000/e1000_ethdev.h                |   3 +
 lib/librte_pmd_e1000/igb_ethdev.c                  | 256 +++++++++++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                | 425 ++++++++++++++++++++-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h                |   7 +
 13 files changed, 1394 insertions(+), 94 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-30 16:00  3%                               ` Neil Horman
@ 2015-05-01  0:15  4%                                 ` Ravi Kerur
  0 siblings, 0 replies; 200+ results
From: Ravi Kerur @ 2015-05-01  0:15 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Thu, Apr 30, 2015 at 9:00 AM, Neil Horman <nhorman@tuxdriver.com> wrote:

> On Wed, Apr 29, 2015 at 10:47:04AM -0700, Ravi Kerur wrote:
> > > > I tried to run validate-abi.sh on BSD but ran into errors. If there
> is a
> > > > way to check against BSD please let me know.
> > > >
> > > The ABI checker should work on BSD as far as I know, since it only
> relies
> > > on
> > > dwarf information in the output binary.  What errors are you seeing?
> > >
> >
> > dpdk-bsd:/home/rkerur/dpdk-validate-abi-1/dpdk # sh
> > ./scripts/validate-abi.sh v2.0.0-rc3 v2.0.0-abi
> x86_64-native-bsdapp-clang
> > mktemp: illegal option -- p
> Ah, bsd mktemp doesn't support the -p option.  I'll see if I can fix that.
>

I think there are couple of other issues I found

freeBSD sed is different from Linux (GNU sed) and I get following errors
with the script

"sed 1 command c expects \ followed by text".

I have to use gsed (GNU sed) in freeBSD to get rid of that error and
similarly freeBSD uses gmake instead of make.  I have made those minor
changes and sending them with this email as an attachment.


> > usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
> >        mktemp [-d] [-q] [-u] -t prefix
> > Cant find abi-compliance-checker utility
> >
> > abi-compliance-checker is installed as shown below.
> >
> > dpdk-bsd:/home/rkerur/dpdk-validate-abi-1/dpdk # pkg install
> > devel/abi-compliance-checker
> > Updating FreeBSD repository catalogue...
> > FreeBSD repository is up-to-date.
> > All repositories are up-to-date.
> > Checking integrity... done (0 conflicting)
> > The most recent version of packages are already installed
> >
>
> Whats the path for abi-compliance checker there?  It would seem that the
> binary
> isn't in your path, as which isn't locating it.
>

I am using regular freeBSD port install which doesn't install in any
/usr/bin or /usr/local/bin. I finally decided to install both abi-dumper
and abi-compliance-checker from source, compile and install it in correct
directory. Above error is fixed after that, however, abi utilities use
"eu-readelf" and I can't find that utility to install in freeBSD. I get
following errors

ERROR: can't find "eu-readelf" command

freeBSD has only readelf. Please let me know if there is a way to get rid
of this error.

Thanks,
Ravi

>
> > >
> > > Neil
> > >
> > >
>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-29 17:47  5%                             ` Ravi Kerur
@ 2015-04-30 16:00  3%                               ` Neil Horman
  2015-05-01  0:15  4%                                 ` Ravi Kerur
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-04-30 16:00 UTC (permalink / raw)
  To: Ravi Kerur; +Cc: dev

On Wed, Apr 29, 2015 at 10:47:04AM -0700, Ravi Kerur wrote:
> > > I tried to run validate-abi.sh on BSD but ran into errors. If there is a
> > > way to check against BSD please let me know.
> > >
> > The ABI checker should work on BSD as far as I know, since it only relies
> > on
> > dwarf information in the output binary.  What errors are you seeing?
> >
> 
> dpdk-bsd:/home/rkerur/dpdk-validate-abi-1/dpdk # sh
> ./scripts/validate-abi.sh v2.0.0-rc3 v2.0.0-abi x86_64-native-bsdapp-clang
> mktemp: illegal option -- p
Ah, bsd mktemp doesn't support the -p option.  I'll see if I can fix that.

> usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
>        mktemp [-d] [-q] [-u] -t prefix
> Cant find abi-compliance-checker utility
> 
> abi-compliance-checker is installed as shown below.
> 
> dpdk-bsd:/home/rkerur/dpdk-validate-abi-1/dpdk # pkg install
> devel/abi-compliance-checker
> Updating FreeBSD repository catalogue...
> FreeBSD repository is up-to-date.
> All repositories are up-to-date.
> Checking integrity... done (0 conflicting)
> The most recent version of packages are already installed
> 

Whats the path for abi-compliance checker there?  It would seem that the binary
isn't in your path, as which isn't locating it.
Neil

> 
> >
> > Neil
> >
> >

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] gmake test on freeBSD
  2015-04-29  8:29  0% ` [dpdk-dev] gmake test on freeBSD Bruce Richardson
@ 2015-04-29 17:58  0%   ` Ravi Kerur
  0 siblings, 0 replies; 200+ results
From: Ravi Kerur @ 2015-04-29 17:58 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Wed, Apr 29, 2015 at 1:29 AM, Bruce Richardson <
bruce.richardson@intel.com> wrote:

> On Tue, Apr 28, 2015 at 06:15:53PM -0700, Ravi Kerur wrote:
> > DPDK team,
> >
> > Is there a automated tests to run on freeBSD similar to Linux (make
> test).
> >
> > I ran "gmake test T=x86_64-native-bsdapp-clang CC=clang" I get following
> > output
> >
> > /usr/home/rkerur/dpdk-validate-abi-1/dpdk/build/app/test -c f -n 4
> >
> > Test name                      Test result                      Test
> > Total
> >
> ================================================================================
> > Start group_1:                 Fail [Can't run]              [00m 00s]
> > Timer autotest:                Fail [Can't run]              [00m 00s]
> > Debug autotest:                Fail [Can't run]              [00m 00s]
> > Errno autotest:                Fail [Can't run]              [00m 00s]
> > Meter autotest:                Fail [Can't run]              [00m 00s]
> > Common autotest:               Fail [Can't run]              [00m 00s]
> > Dump log history:              Fail [Can't run]              [00m 00s]
> > ...
> > Start memcpy_perf:             Fail [No prompt]              [00m 00s]
> > Memcpy performance autotest:   Fail [No prompt]              [00m 00s]
> [00m
> > 01s]
> > Start hash_perf:               Fail [No prompt]              [00m 00s]
> > Hash performance autotest:     Fail [No prompt]              [00m 00s]
> [00m
> > 01s]
> > Start power:                   Fail [No prompt]              [00m 00s]
> > Power autotest:                Fail [No prompt]              [00m 00s]
> [00m
> > 01s]
> > ...
> >
> > I have contigmem and nic_uio installed. I know some applications are
> > linuxapp specific but wanted to know if there is a similar automated test
> > tool like Linux?
> >
> > Thanks,
> > Ravi
>
> There is no separate test tool for FreeBSD. Unfortunately there are a
> number of little
> things that don't really work on FreeBSD - and this looks to be one of
> them. We
> probably need to look to fix this.
>
> Thanks Bruce. Is it due to missing infra for BSD or some minor fixes in
app/test?

> /Bruce
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-29 10:04  3%                           ` Neil Horman
@ 2015-04-29 17:47  5%                             ` Ravi Kerur
  2015-04-30 16:00  3%                               ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ravi Kerur @ 2015-04-29 17:47 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

> > I tried to run validate-abi.sh on BSD but ran into errors. If there is a
> > way to check against BSD please let me know.
> >
> The ABI checker should work on BSD as far as I know, since it only relies
> on
> dwarf information in the output binary.  What errors are you seeing?
>

dpdk-bsd:/home/rkerur/dpdk-validate-abi-1/dpdk # sh
./scripts/validate-abi.sh v2.0.0-rc3 v2.0.0-abi x86_64-native-bsdapp-clang
mktemp: illegal option -- p
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
Cant find abi-compliance-checker utility

abi-compliance-checker is installed as shown below.

dpdk-bsd:/home/rkerur/dpdk-validate-abi-1/dpdk # pkg install
devel/abi-compliance-checker
Updating FreeBSD repository catalogue...
FreeBSD repository is up-to-date.
All repositories are up-to-date.
Checking integrity... done (0 conflicting)
The most recent version of packages are already installed


>
> Neil
>
>

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL
  2015-04-28 23:46  4% [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL Ravi Kerur
  2015-04-28 23:46  2% ` [dpdk-dev] [PATCH v8 1/6] Move common functions in eal_thread.c Ravi Kerur
@ 2015-04-29 10:14  0% ` Neil Horman
  1 sibling, 0 replies; 200+ results
From: Neil Horman @ 2015-04-29 10:14 UTC (permalink / raw)
  To: Ravi Kerur; +Cc: dev

On Tue, Apr 28, 2015 at 04:46:21PM -0700, Ravi Kerur wrote:
> Changes in v8 includes
> Re-ordering source file compilation to fix ABI warning.
> Ran validate-abi against x86_64-native-linuxapp-gcc,
> x86_64-native-linuxapp-clang and x86_64-ivshmem-linuxapp-gcc
> environments.
> 
> Testing:
> Linux - Ubuntu x86_64 14.04
> Compilation successful (x86_64-native-linuxapp-gcc and
> x86_64-native-linuxapp-clang).
> "make test" results match baseline code.
> testpmd utility on I217/I218 Intel chipset.
> 
> FreeBSD 10.0 x86_64
> Compilation successful (x86_64-native-bsdapp-gcc and
> x86_64-native-bsdapp-clang).
> Tested with helloworld, timer and cmdline examples.
> 
> Ravi Kerur (6):
>   Move common functions in eal_thread.c
>   Move common functions in eal.c
>   Move common functions in eal_lcore.c
>   Move common functions in eal_timer.c
>   Move common functions in eal_memory.c
>   Move common functions in eal_pci.c
> 
>  lib/librte_eal/bsdapp/eal/Makefile           |   9 +-
>  lib/librte_eal/bsdapp/eal/eal.c              | 271 +++---------------------
>  lib/librte_eal/bsdapp/eal/eal_lcore.c        |  72 ++-----
>  lib/librte_eal/bsdapp/eal/eal_memory.c       |  47 ++---
>  lib/librte_eal/bsdapp/eal/eal_pci.c          |  72 +------
>  lib/librte_eal/bsdapp/eal/eal_thread.c       | 152 --------------
>  lib/librte_eal/bsdapp/eal/eal_timer.c        |  52 +----
>  lib/librte_eal/common/eal_common_app_usage.c |  63 ++++++
>  lib/librte_eal/common/eal_common_lcore.c     | 107 ++++++++++
>  lib/librte_eal/common/eal_common_mem_cfg.c   | 224 ++++++++++++++++++++
>  lib/librte_eal/common/eal_common_memory.c    |  38 +++-
>  lib/librte_eal/common/eal_common_pci.c       |  72 +++++++
>  lib/librte_eal/common/eal_common_proc_type.c |  58 ++++++
>  lib/librte_eal/common/eal_common_sysfs.c     | 148 ++++++++++++++
>  lib/librte_eal/common/eal_common_thread.c    | 147 ++++++++++++-
>  lib/librte_eal/common/eal_common_timer.c     | 102 +++++++++
>  lib/librte_eal/common/eal_hugepages.h        |   1 +
>  lib/librte_eal/common/eal_private.h          | 171 +++++++++++++++-
>  lib/librte_eal/common/include/rte_eal.h      |   4 +
>  lib/librte_eal/linuxapp/eal/Makefile         |  10 +-
>  lib/librte_eal/linuxapp/eal/eal.c            | 296 ++++-----------------------
>  lib/librte_eal/linuxapp/eal/eal_lcore.c      |  66 +-----
>  lib/librte_eal/linuxapp/eal/eal_memory.c     |  36 +---
>  lib/librte_eal/linuxapp/eal/eal_pci.c        |  75 +------
>  lib/librte_eal/linuxapp/eal/eal_thread.c     | 152 +-------------
>  lib/librte_eal/linuxapp/eal/eal_timer.c      |  55 +----
>  26 files changed, 1277 insertions(+), 1223 deletions(-)
>  create mode 100644 lib/librte_eal/common/eal_common_app_usage.c
>  create mode 100644 lib/librte_eal/common/eal_common_lcore.c
>  create mode 100644 lib/librte_eal/common/eal_common_mem_cfg.c
>  create mode 100644 lib/librte_eal/common/eal_common_proc_type.c
>  create mode 100644 lib/librte_eal/common/eal_common_sysfs.c
>  create mode 100644 lib/librte_eal/common/eal_common_timer.c
> 
> -- 
> 1.9.1
> 
> 

Series
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-28 23:52  4%                         ` Ravi Kerur
@ 2015-04-29 10:04  3%                           ` Neil Horman
  2015-04-29 17:47  5%                             ` Ravi Kerur
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-04-29 10:04 UTC (permalink / raw)
  To: Ravi Kerur; +Cc: dev

On Tue, Apr 28, 2015 at 04:52:37PM -0700, Ravi Kerur wrote:
> On Tue, Apr 28, 2015 at 12:35 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Mon, Apr 27, 2015 at 03:39:41PM -0700, Ravi Kerur wrote:
> > > On Mon, Apr 27, 2015 at 6:44 AM, Neil Horman <nhorman@tuxdriver.com>
> > wrote:
> > >
> > > > On Sat, Apr 25, 2015 at 05:09:01PM -0700, Ravi Kerur wrote:
> > > > > On Sat, Apr 25, 2015 at 6:02 AM, Neil Horman <nhorman@tuxdriver.com>
> > > > wrote:
> > > > >
> > > > > > On Sat, Apr 25, 2015 at 08:32:42AM -0400, Neil Horman wrote:
> > > > > > > On Fri, Apr 24, 2015 at 06:45:06PM -0700, Ravi Kerur wrote:
> > > > > > > > On Fri, Apr 24, 2015 at 2:24 PM, Ravi Kerur <rkerur@gmail.com>
> > > > wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Apr 24, 2015 at 12:51 PM, Neil Horman <
> > > > nhorman@tuxdriver.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> On Fri, Apr 24, 2015 at 12:21:23PM -0700, Ravi Kerur wrote:
> > > > > > > > >> > On Fri, Apr 24, 2015 at 11:53 AM, Neil Horman <
> > > > > > nhorman@tuxdriver.com>
> > > > > > > > >> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > On Fri, Apr 24, 2015 at 09:45:24AM -0700, Ravi Kerur
> > wrote:
> > > > > > > > >> > > > On Fri, Apr 24, 2015 at 8:22 AM, Neil Horman <
> > > > > > nhorman@tuxdriver.com
> > > > > > > > >> >
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > On Fri, Apr 24, 2015 at 08:14:04AM -0700, Ravi Kerur
> > > > wrote:
> > > > > > > > >> > > > > > On Fri, Apr 24, 2015 at 6:51 AM, Neil Horman <
> > > > > > > > >> nhorman@tuxdriver.com>
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > > On Thu, Apr 23, 2015 at 02:35:31PM -0700, Ravi
> > Kerur
> > > > > > wrote:
> > > > > > > > >> > > > > > > > Changes in v7
> > > > > > > > >> > > > > > > > Remove _setname_ pthread calls.
> > > > > > > > >> > > > > > > > Use rte_gettid() API in RTE_LOG to print
> > > > thread_id.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Changes in v6
> > > > > > > > >> > > > > > > > Remove RTE_EXEC_ENV_BSDAPP from
> > > > eal_common_thread.c
> > > > > > file.
> > > > > > > > >> > > > > > > > Add pthread_setname_np/pthread_set_name_np for
> > > > > > Linux/FreeBSD
> > > > > > > > >> > > > > > > > respectively. Plan to use _getname_ in RTE_LOG
> > > > when
> > > > > > > > >> available.
> > > > > > > > >> > > > > > > > Use existing rte_get_systid() in RTE_LOG to
> > print
> > > > > > thread_id.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Changes in v5
> > > > > > > > >> > > > > > > > Rebase to latest code.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Changes in v4
> > > > > > > > >> > > > > > > > None
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Changes in v3
> > > > > > > > >> > > > > > > > Changed subject to be more explicit on file
> > name
> > > > > > inclusion.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Changes in v2
> > > > > > > > >> > > > > > > > None
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Changes in v1
> > > > > > > > >> > > > > > > > eal_thread.c has minor differences between
> > Linux
> > > > and
> > > > > > BSD,
> > > > > > > > >> move
> > > > > > > > >> > > > > > > > entire file into common directory.
> > > > > > > > >> > > > > > > > Use RTE_EXEC_ENV_BSDAPP to differentiate on
> > minor
> > > > > > > > >> differences.
> > > > > > > > >> > > > > > > > Rename eal_thread.c to eal_common_thread.c
> > > > > > > > >> > > > > > > > Makefile changes to reflect file move and name
> > > > change.
> > > > > > > > >> > > > > > > > Fix checkpatch warnings.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Signed-off-by: Ravi Kerur <rkerur@gmail.com>
> > > > > > > > >> > > > > > > > ---
> > > > > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/Makefile        |
> >  2
> > > > +-
> > > > > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/eal_thread.c    |
> > 152
> > > > > > > > >> > > > > > > ------------------------------
> > > > > > > > >> > > > > > > >  lib/librte_eal/common/eal_common_thread.c |
> > 147
> > > > > > > > >> > > > > > > ++++++++++++++++++++++++++++-
> > > > > > > > >> > > > > > > >  lib/librte_eal/linuxapp/eal/eal_thread.c  |
> > 152
> > > > > > > > >> > > > > > > +-----------------------------
> > > > > > > > >> > > > > > > >  4 files changed, 148 insertions(+), 305
> > > > deletions(-)
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > diff --git
> > a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > > >> > > > > > > > index 2357cfa..55971b9 100644
> > > > > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > > >> > > > > > > > @@ -87,7 +87,7 @@ CFLAGS_eal_common_log.o :=
> > > > > > -D_GNU_SOURCE
> > > > > > > > >> > > > > > > >  # workaround for a gcc bug with noreturn
> > > > attribute
> > > > > > > > >> > > > > > > >  #
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> > > > > > > > >> > > > > > > >  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > > > > > > > >> > > > > > > > -CFLAGS_eal_thread.o += -Wno-return-type
> > > > > > > > >> > > > > > > > +CFLAGS_eal_common_thread.o +=
> > -Wno-return-type
> > > > > > > > >> > > > > > > >  CFLAGS_eal_hpet.o += -Wno-return-type
> > > > > > > > >> > > > > > > >  endif
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > diff --git
> > > > a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > > >> > > > > > > > index 9a03437..5714b8f 100644
> > > > > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > > >> > > > > > > > @@ -35,163 +35,11 @@
> > > > > > > > >> > > > > > > >  #include <stdio.h>
> > > > > > > > >> > > > > > > >  #include <stdlib.h>
> > > > > > > > >> > > > > > > >  #include <stdint.h>
> > > > > > > > >> > > > > > > > -#include <unistd.h>
> > > > > > > > >> > > > > > > > -#include <sched.h>
> > > > > > > > >> > > > > > > > -#include <pthread_np.h>
> > > > > > > > >> > > > > > > > -#include <sys/queue.h>
> > > > > > > > >> > > > > > > >  #include <sys/thr.h>
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > -#include <rte_debug.h>
> > > > > > > > >> > > > > > > > -#include <rte_atomic.h>
> > > > > > > > >> > > > > > > > -#include <rte_launch.h>
> > > > > > > > >> > > > > > > > -#include <rte_log.h>
> > > > > > > > >> > > > > > > > -#include <rte_memory.h>
> > > > > > > > >> > > > > > > > -#include <rte_memzone.h>
> > > > > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > > > > >> > > > > > > > -#include <rte_eal.h>
> > > > > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > > > > >> > > > > > > > -#include <rte_lcore.h>
> > > > > > > > >> > > > > > > > -
> > > > > > > > >> > > > > > > >  #include "eal_private.h"
> > > > > > > > >> > > > > > > >  #include "eal_thread.h"
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > > > > > LCORE_ID_ANY;
> > > > > > > > >> > > > > > > NAK, these are exported symbols, you can't
> > remove
> > > > them
> > > > > > without
> > > > > > > > >> > > going
> > > > > > > > >> > > > > > > through the
> > > > > > > > >> > > > > > > deprecation process.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > They are not removed/deleted, they are moved from
> > > > > > eal_thread.c
> > > > > > > > >> to
> > > > > > > > >> > > > > > eal_common_thread.c file since it is common to
> > both
> > > > Linux
> > > > > > and
> > > > > > > > >> BSD.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > Then perhaps you forgot to export the symbol?  Its
> > > > showing
> > > > > > up as
> > > > > > > > >> > > removed
> > > > > > > > >> > > > > on the
> > > > > > > > >> > > > > ABI checker utility.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Neil
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > > > Can you please show me in the current code where it is
> > > > being
> > > > > > > > >> exported? I
> > > > > > > > >> > > > have only moved definitions to _common_ files, not
> > sure
> > > > why it
> > > > > > > > >> should be
> > > > > > > > >> > > > exported now.  I searched in the current code for
> > > > > > > > >> RTE_DEFINE_PER_LCORE
> > > > > > > > >> > > >
> > > > > > > > >> > > > #home/rkerur/dpdk-tmp/dpdk# grep -ir
> > RTE_DEFINE_PER_LCORE
> > > > *
> > > > > > > > >> > > > app/test/test_per_lcore.c:static
> > > > > > RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > > >> test) =
> > > > > > > > >> > > > 0x12345678;
> > > > > > > > >> > > >
> > > > > > > > >>
> > > > > >
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > > > > >> > > >
> > > > > > > > >>
> > > > > >
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >>
> > > > > >
> > > >
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > > > > >> > > > _cpuset);
> > > > > > > > >> > > >
> > > > > > > > >>
> > > > > >
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > > > > >> > > >
> > > > > > > > >>
> > > > > >
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > > > > >> > > >
> > > > > > > > >>
> > > > > >
> > > >
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > > > > >> > > > _cpuset);
> > > > > > > > >> > > > lib/librte_eal/common/include/rte_per_lcore.h:#define
> > > > > > > > >> > > > RTE_DEFINE_PER_LCORE(type, name)            \
> > > > > > > > >> > > > lib/librte_eal/common/include/rte_eal.h:    static
> > > > > > > > >> > > > RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
> > > > > > > > >> > > >
> > > > > > lib/librte_eal/common/eal_common_errno.c:RTE_DEFINE_PER_LCORE(int,
> > > > > > > > >> > > > _rte_errno);
> > > > > > > > >> > > > lib/librte_eal/common/eal_common_errno.c:    static
> > > > > > > > >> > > > RTE_DEFINE_PER_LCORE(char[RETVAL_SZ], retval);
> > > > > > > > >> > > >
> > > > > > > > >> > > >
> > > > > > > > >> > > > > > Thanks
> > > > > > > > >> > > > > > Ravi
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Regards
> > > > > > > > >> > > > > > > Neil
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > Its exported in the version map file:
> > > > > > > > >> > >  per_lcore__lcore_id;
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > Thanks Neil, I checked and both linux and bsd
> > > > rte_eal_version.map
> > > > > > have
> > > > > > > > >> it.
> > > > > > > > >> > I compared .map file between "changed code" and the
> > original,
> > > > > > they are
> > > > > > > > >> same
> > > > > > > > >> > for both linux and bsd. In fact you had ACK'd v4 version
> > of
> > > > this
> > > > > > patch
> > > > > > > > >> > series and no major changes after that. Please let me
> > know if
> > > > I
> > > > > > missed
> > > > > > > > >> > something.
> > > > > > > > >> >
> > > > > > > > >> I did, and I'm retracting that, because I didn't think to
> > check
> > > > the
> > > > > > ABI
> > > > > > > > >> compatibility on this.  But I ran it throught the ABI
> > checking
> > > > > > script
> > > > > > > > >> this and
> > > > > > > > >> this error popped out.  You should run it as well, its in
> > the
> > > > > > scripts
> > > > > > > > >> directory.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> I see in your first patch you removed it and re-added it in
> > the
> > > > > > common
> > > > > > > > >> section.
> > > > > > > > >> But something about how its building is causing it to not
> > show
> > > > up
> > > > > > as an
> > > > > > > > >> exported
> > > > > > > > >> symbol, which is problematic, as other applications are
> > going to
> > > > > > want
> > > > > > > > >> access to
> > > > > > > > >> it.
> > > > > > > > >>
> > > > > > > > >> It also possible that the ABI checker is throwing a false
> > > > positive,
> > > > > > but
> > > > > > > > >> either
> > > > > > > > >> way, it needs to be looked into prior to moving forward with
> > > > this.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > I did following things.
> > > > > > > > >
> > > > > > > > > Put a tag (v2.0.0-before-common-eal)  before EAL common
> > functions
> > > > > > changes
> > > > > > > > > for commit (3c0c807038ad642f4be7deb9370293c39d12f029 net:
> > remove
> > > > > > unneeded
> > > > > > > > > include)
> > > > > > > > >
> > > > > > > > > Put a tag (v2.0.0-common-eal) after EAL common functions
> > changes
> > > > for
> > > > > > > > > commit (25737e5a7212630a7b5d8ca756860a062f403789 Move common
> > > > > > functions in
> > > > > > > > > eal_pci.c)
> > > > > > > > >
> > > > > > > > > Ran validate-abi against x86_64-native-linuxapp-gcc and
> > > > > > > > >
> > > > > > > > > v2.0.0-rc3 and v2.0.0-before-common-eal, html report for
> > > > > > librte_eal.so
> > > > > > > > > shows removed symbols for "per_lcore__cpuset"
> > > > > > > > >
> > > > > > > > > v2.0.0-rc3 and v2.0.0-common-eal, html report for
> > librte_eal.so
> > > > shows
> > > > > > > > > removed symbols for "per_lcore__cpuset"
> > > > > > > > >
> > > > > > > > > Removed symbol is different from what you have reported and
> > in my
> > > > > > case I
> > > > > > > > > see it even before my commit. If you are interested I can
> > unicast
> > > > > > you html
> > > > > > > > > report file. Please let me know how to proceed.
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > I did some experiment and found some interesting things.  I
> > will
> > > > take
> > > > > > eal.c
> > > > > > > > as an example
> > > > > > > >
> > > > > > > > eal.c is split into eal_common_sysfs.c eal_common_mem_cfg.c
> > > > > > > > eal_common_proc_type.c and eal_common_app_usage.c. In
> > > > > > linuxapp/eal/Makefile
> > > > > > > > if I compile new files right after eal.c as shown below
> > > > > > > >
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) := eal.c
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_sysfs.c
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_mem_cfg.c
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) +=
> > eal_common_proc_type.c
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) +=
> > eal_common_app_usage.c
> > > > > > > > ...
> > > > > > > >
> > > > > > > > validate-abi results matches baseline. Instead if i place new
> > > > _common_
> > > > > > > > files in common area in linuxapp/eal/Makefile as shown below
> > > > > > > >
> > > > > > > > # from common dir
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_memzone.c
> > > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_log.c
> > > > > > > > ...
> > > > > > > >
> > > > > > > > validate-abi reports problem in binary compatibility and source
> > > > > > > > compatiblity
> > > > > > > >
> > > > > > > > eal_filesystem.h, librte_eal.so.1
> > > > > > > >  [+] eal_parse_sysfs_value ( char const* filename, unsigned
> > long*
> > > > val )
> > > > > > > >  @@ DPDK_2.0 (2)
> > > > > > > >
> > > > > > > > I believe files in common and linuxapp directory are compiled
> > same
> > > > way
> > > > > > so
> > > > > > > > not sure why placement in makefile makes difference.
> > > > > > > >
> > > > > > > > Could this be false-positive from validate-abi script??
> > > > > > > >
> > > > > > > It could be, yes.  Though I'm more inclined to think that
> > perhaps in
> > > > the
> > > > > > new
> > > > > > > version of the code we're not generating ithe same dwarf
> > information
> > > > out
> > > > > > of it.
> > > > > > > In fact for some reason, I've checked both the build before and
> > after
> > > > > > your
> > > > > > > patch series, and the exported CFLAGS aren't getting passed to
> > the
> > > > build
> > > > > > > properly, implying that we're not building all the code in the
> > > > validator
> > > > > > with
> > > > > > > the -g flag, which the validator need to function properly.  I'm
> > > > looking
> > > > > > into
> > > > > > > that
> > > > > > > Neil
> > > > > > >
> > > > > > >
> > > > > > Found the problem, I was stupidly reading the report incorrectly.
> > The
> > > > > > problem
> > > > > > regarding _lcore_id is a source compatibilty issue (because the
> > symbol
> > > > > > moved to
> > > > > > a new location), which is irrelevant to us.  Its not in any way a
> > > > binary
> > > > > > compat
> > > > > > problem, which is what we care about.  Sorry for the noise.
> > > > > >
> > > > > > I do still have a few concerns about some changed calling
> > conventions
> > > > with
> > > > > > a few
> > > > > > other functions, which I'll look into on monday.
> > > > > >
> > > > > >
> > > > > Please let me know your inputs on changed calling conventions. Most
> > of
> > > > them
> > > > > can be fixed by re-arranging moved code in _common_ files and order
> > of
> > > > > compilation.
> > > > >
> > > > If moving the order of compliation around fixes the problem, then I am
> > > > reasonably convinced that it is, if not a false positive, a minor issue
> > > > with the
> > > > compilers dwarf information (The compiler just can't sanely change the
> > > > location
> > > > in which parameters are passed).  If you make those changes, I'll ACK
> > > > them, and
> > > > look into whats going on with the calling conventions
> > > >
> > >
> > > Issues like the one shown below are taken care by reordering the code
> > > compilation.
> > >
> > > eal_parse_sysfs_value ( char const* filename, unsigned long* val )
> > >
> > > Change
> > > The parameter filename became passed on stack instead of rdi register
> > >
> > > Effect
> > > Violation of the calling convention. This may result in crash or
> > incorrect
> > > behavior of applications.
> > >
> > > Last one that is left out is in
> > >
> > > rte_thread_set_affinity ( rte_cpuset_t* p1 )
> > >
> > > Change
> > > The parameter *p1* became passed in *rdi* register instead of stack.
> > >
> > > Effect
> > > Violation of the calling convention. This may result in crash or
> > incorrect
> > > behavior of applications.
> > >
> > > After checking abi-0.99.pdf (x86-64.org) looks like for
> > > "rte_thread_set_affinity" new code is doing the right thing by passing
> > the
> > > parameter in "rdi" register since pointer is classified as
> > "integer_class".
> > > Nothing needs to be fixed here. After you confirm that warning can be
> > > ignored I will work on sending new revision.
> > >
> > ACK then, send the new revision, this appears to be a false positive.
> >
> > Thanks for taking the time to confirm.
> >
> 
> Thanks Neil. I have sent v8 which fixes ABI warnings. I have tested it with
> x86_64-native-linuxapp-gcc, x86_64-native-linuxapp-clang and
> x86_64-ivshmem-gcc targets. ABI results look fine to me.
> 
> I tried to run validate-abi.sh on BSD but ran into errors. If there is a
> way to check against BSD please let me know.
> 
The ABI checker should work on BSD as far as I know, since it only relies on
dwarf information in the output binary.  What errors are you seeing?

Neil

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] gmake test on freeBSD
       [not found]     <CAFb4SLBGcR1EHL5FkJ7r6-7mqWR9UJ7GLD2cm18SJ8AuoWu_Og@mail.gmail.com>
@ 2015-04-29  8:29  0% ` Bruce Richardson
  2015-04-29 17:58  0%   ` Ravi Kerur
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-04-29  8:29 UTC (permalink / raw)
  To: Ravi Kerur; +Cc: dev

On Tue, Apr 28, 2015 at 06:15:53PM -0700, Ravi Kerur wrote:
> DPDK team,
> 
> Is there a automated tests to run on freeBSD similar to Linux (make test).
> 
> I ran "gmake test T=x86_64-native-bsdapp-clang CC=clang" I get following
> output
> 
> /usr/home/rkerur/dpdk-validate-abi-1/dpdk/build/app/test -c f -n 4
> 
> Test name                      Test result                      Test
> Total
> ================================================================================
> Start group_1:                 Fail [Can't run]              [00m 00s]
> Timer autotest:                Fail [Can't run]              [00m 00s]
> Debug autotest:                Fail [Can't run]              [00m 00s]
> Errno autotest:                Fail [Can't run]              [00m 00s]
> Meter autotest:                Fail [Can't run]              [00m 00s]
> Common autotest:               Fail [Can't run]              [00m 00s]
> Dump log history:              Fail [Can't run]              [00m 00s]
> ...
> Start memcpy_perf:             Fail [No prompt]              [00m 00s]
> Memcpy performance autotest:   Fail [No prompt]              [00m 00s] [00m
> 01s]
> Start hash_perf:               Fail [No prompt]              [00m 00s]
> Hash performance autotest:     Fail [No prompt]              [00m 00s] [00m
> 01s]
> Start power:                   Fail [No prompt]              [00m 00s]
> Power autotest:                Fail [No prompt]              [00m 00s] [00m
> 01s]
> ...
> 
> I have contigmem and nic_uio installed. I know some applications are
> linuxapp specific but wanted to know if there is a similar automated test
> tool like Linux?
> 
> Thanks,
> Ravi

There is no separate test tool for FreeBSD. Unfortunately there are a number of little
things that don't really work on FreeBSD - and this looks to be one of them. We
probably need to look to fix this.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] pcap: utilize underlying real interface properties
@ 2015-04-29  0:30  2% Nicolás Pernas Maradei
  0 siblings, 0 replies; 200+ results
From: Nicolás Pernas Maradei @ 2015-04-29  0:30 UTC (permalink / raw)
  To: dev, tero.aho

Hi Tero,

Just a few comments on one of your patches - see inline comments below. Interesting features btw.

Nico.

-- 
Nicolás Pernas Maradei


On 27 February 2015 at 13:43:14, dev-request@dpdk.org (dev-request@dpdk.org) wrote:

Message: 5 
Date: Fri, 27 Feb 2015 15:42:38 +0200 
From: Tero Aho <tero.aho@coriant.com> 
To: <dev@dpdk.org> 
Subject: [dpdk-dev] [PATCH 1/3] pcap: utilize underlying real 
interface	properties 
Message-ID: <1425044560-23397-2-git-send-email-tero.aho@coriant.com> 
Content-Type: text/plain 

These changes set pcap interface mac address to the real underlying 
interface address instead of the default one. Also real interface link 
status, speed and duplex are reported when eth_link_update is called 
for the pcap interface. 

Signed-off-by: Tero Aho <tero.aho@coriant.com> 
--- 
lib/librte_pmd_pcap/rte_eth_pcap.c | 51 +++++++++++++++++++++++++++++++++++--- 
1 file changed, 47 insertions(+), 4 deletions(-) 

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c 
index 5e94930..289af28 100644 
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c 
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c 
@@ -43,6 +43,11 @@ 
#include <rte_dev.h> 

#include <net/if.h> 
+#include <sys/socket.h> 
+#include <sys/ioctl.h> 
+#include <string.h> 
+#include <linux/ethtool.h> 
+#include <linux/sockios.h> 

#include <pcap.h> 

@@ -102,6 +107,8 @@ struct pmd_internals { 
unsigned nb_tx_queues; 
int if_index; 
int single_iface; 
+ const char *if_name; 
+ int if_fd; 
}; 

const char *valid_arguments[] = { 
@@ -451,6 +458,26 @@ static int 
eth_link_update(struct rte_eth_dev *dev __rte_unused, 
*dev is being used. Remove __rte_unused


int wait_to_complete __rte_unused) 
{ 
+ struct ifreq ifr; 
+ struct ethtool_cmd cmd; 
+ struct pmd_internals *internals = dev->data->dev_private; 
+ 
+ if (internals->if_name && (internals->if_fd != -1)) { 
+ /* get link status, speed and duplex from the underlying interface */ 
+ 
+ strncpy(ifr.ifr_name, internals->if_name, sizeof(ifr.ifr_name)-1); 
+ ifr.ifr_name[sizeof(ifr.ifr_name)-1] = 0; 
Use snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), “%s”, internals->if_name) instead. It’s safer and cleaner.


+ if (!ioctl(internals->if_fd, SIOCGIFFLAGS, &ifr)) 
+ dev->data->dev_link.link_status = (ifr.ifr_flags & IFF_UP) ? 1 : 0; 
+ 
+ cmd.cmd = ETHTOOL_GSET; 
+ ifr.ifr_data = (void *)&cmd; 
+ if (!ioctl(internals->if_fd, SIOCETHTOOL, &ifr)) { 
+ dev->data->dev_link.link_speed = ethtool_cmd_speed(&cmd); 
+ dev->data->dev_link.link_duplex = 
+ cmd.duplex ? ETH_LINK_FULL_DUPLEX : ETH_LINK_HALF_DUPLEX; 
+ } 
+ } 
return 0; 
} 

@@ -736,11 +763,24 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues, 
(*internals)->nb_rx_queues = nb_rx_queues; 
(*internals)->nb_tx_queues = nb_tx_queues; 

- if (pair == NULL) 
+ if (pair == NULL) { 
(*internals)->if_index = 0; 
- else 
+ } else { 
+ /* use real inteface mac addr, save name and fd for eth_link_update */ 
(*internals)->if_index = if_nametoindex(pair->value); 
+ (*internals)->if_name = strdup(pair->value); 
+ (*internals)->if_fd = socket(AF_INET, SOCK_DGRAM, 0); 
I see you are using a socket and ioctl calls to get the info you need from the interface. I’m not a big fan of opening a socket at this point just to get some parameters of the NIC. I’d rather reading those from sysfs. Is there a reason why you’d prefer to open a socket?

These would be the files you’d need to open and read to the get the info you are looking for. 

# cat /sys/class/net/eth0/address
# cat /sys/class/net/eth0/duplex
# cat /sys/class/net/eth0/speed

In my opinion the code would be cleaner doing it this way. DPDK already manipulates sysfs in other places too. 
What do you think?


+ if ((*internals)->if_fd != -1) { 
+ struct ifreq ifr; 
+ strncpy(ifr.ifr_name, pair->value, sizeof(ifr.ifr_name)-1); 
+ ifr.ifr_name[sizeof(ifr.ifr_name)-1] = 0; 
Use snprintf() like before.


+ if (!ioctl((*internals)->if_fd, SIOCGIFHWADDR, &ifr)) { 
+ data->mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN, 0, numa_node); 
+ if (data->mac_addrs) 
+ rte_memcpy(data->mac_addrs, ifr.ifr_addr.sa_data, ETHER_ADDR_LEN); 
+ } 
+ } 
+ } 
pci_dev->numa_node = numa_node; 

data->dev_private = *internals; 
@@ -749,7 +789,8 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues, 
data->nb_rx_queues = (uint16_t)nb_rx_queues; 
data->nb_tx_queues = (uint16_t)nb_tx_queues; 
data->dev_link = pmd_link; 
- data->mac_addrs = &eth_addr; 
+ if (data->mac_addrs == NULL) 
+ data->mac_addrs = &eth_addr; 
strncpy(data->name, 
(*eth_dev)->data->name, strlen((*eth_dev)->data->name)); 

@@ -758,6 +799,8 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues, 
(*eth_dev)->pci_dev = pci_dev; 
(*eth_dev)->driver = &rte_pcap_pmd; 

+ eth_link_update((*eth_dev), 0); 
+ 
return 0; 

error: if (data) 
-- 
1.9.1 


============================================================ 
The information contained in this message may be privileged 
and confidential and protected from disclosure. If the reader 
of this message is not the intended recipient, or an employee 
or agent responsible for delivering this message to the 
intended recipient, you are hereby notified that any reproduction, 
dissemination or distribution of this communication is strictly 
prohibited. If you have received this communication in error, 
please notify us immediately by replying to the message and 
deleting it from your computer. Thank you. Coriant-Tellabs 
============================================================ 


------------------------------ 

Subject: Digest Footer 

_______________________________________________ 
dev mailing list 
dev@dpdk.org 
http://dpdk.org/ml/listinfo/dev 


------------------------------ 

End of dev Digest, Vol 29, Issue 99 
*********************************** 
From rkerur@gmail.com  Wed Apr 29 03:15:55 2015
Return-Path: <rkerur@gmail.com>
Received: from mail-ob0-f177.google.com (mail-ob0-f177.google.com
 [209.85.214.177]) by dpdk.org (Postfix) with ESMTP id 5986EC740
 for <dev@dpdk.org>; Wed, 29 Apr 2015 03:15:55 +0200 (CEST)
Received: by obcux3 with SMTP id ux3so9679357obc.2
 for <dev@dpdk.org>; Tue, 28 Apr 2015 18:15:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s 120113;
 h=mime-version:date:message-id:subject:from:to:content-type;
 bh=MZ4S6lBJpl67DyjGKoevJH7hj1P2/ahlV1nfxWJ6CPw=;
 b=MlHdvUs4IlMw5URRnd8FCPs1PxkZyuqSk6Wem3+AbXe0sX3eOHd3SDgmk2/oSL9TPc
 Lnrs+5BGw+bS9WPki9UN2tg6QokPszOxpfxfodEpqa0PgWFeRZbRrqE5fEwJbcFNkeyD
 E1SatooWejmZst7JVyACc+lnno8tm4KVhebLtnX7uCsUzrRZ4cTIwY/S26NhCSpUzDjO
 HU4BQgzDMMShyYavYUeK8TIaRfVz3EnQs1yBeEJFNZrDvu8fucl4CKGofmJHu94a1+cs
 eIO0DM1sqnBQstBzuVH21PkmBmn2f2CgZ04G0HzmanmWvwhYi+fEnskb12ZJ1fnE/Xs2
 eH0w=MIME-Version: 1.0
X-Received: by 10.182.39.168 with SMTP id q8mr16869793obk.23.1430270153684;
 Tue, 28 Apr 2015 18:15:53 -0700 (PDT)
Received: by 10.202.179.195 with HTTP; Tue, 28 Apr 2015 18:15:53 -0700 (PDT)
Date: Tue, 28 Apr 2015 18:15:53 -0700
Message-ID: <CAFb4SLBGcR1EHL5FkJ7r6-7mqWR9UJ7GLD2cm18SJ8AuoWu_Og@mail.gmail.com>
From: Ravi Kerur <rkerur@gmail.com>
To: "dev@dpdk.org" <dev@dpdk.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: [dpdk-dev] gmake test on freeBSD
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Apr 2015 01:15:55 -0000

DPDK team,

Is there a automated tests to run on freeBSD similar to Linux (make test).

I ran "gmake test T=x86_64-native-bsdapp-clang CC=clang" I get following
output

/usr/home/rkerur/dpdk-validate-abi-1/dpdk/build/app/test -c f -n 4

Test name                      Test result                      Test
Total
===============================================================================Start group_1:                 Fail [Can't run]              [00m 00s]
Timer autotest:                Fail [Can't run]              [00m 00s]
Debug autotest:                Fail [Can't run]              [00m 00s]
Errno autotest:                Fail [Can't run]              [00m 00s]
Meter autotest:                Fail [Can't run]              [00m 00s]
Common autotest:               Fail [Can't run]              [00m 00s]
Dump log history:              Fail [Can't run]              [00m 00s]
...
Start memcpy_perf:             Fail [No prompt]              [00m 00s]
Memcpy performance autotest:   Fail [No prompt]              [00m 00s] [00m
01s]
Start hash_perf:               Fail [No prompt]              [00m 00s]
Hash performance autotest:     Fail [No prompt]              [00m 00s] [00m
01s]
Start power:                   Fail [No prompt]              [00m 00s]
Power autotest:                Fail [No prompt]              [00m 00s] [00m
01s]
...

I have contigmem and nic_uio installed. I know some applications are
linuxapp specific but wanted to know if there is a similar automated test
tool like Linux?

Thanks,
Ravi

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-28 19:35  0%                       ` Neil Horman
@ 2015-04-28 23:52  4%                         ` Ravi Kerur
  2015-04-29 10:04  3%                           ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ravi Kerur @ 2015-04-28 23:52 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Tue, Apr 28, 2015 at 12:35 PM, Neil Horman <nhorman@tuxdriver.com> wrote:

> On Mon, Apr 27, 2015 at 03:39:41PM -0700, Ravi Kerur wrote:
> > On Mon, Apr 27, 2015 at 6:44 AM, Neil Horman <nhorman@tuxdriver.com>
> wrote:
> >
> > > On Sat, Apr 25, 2015 at 05:09:01PM -0700, Ravi Kerur wrote:
> > > > On Sat, Apr 25, 2015 at 6:02 AM, Neil Horman <nhorman@tuxdriver.com>
> > > wrote:
> > > >
> > > > > On Sat, Apr 25, 2015 at 08:32:42AM -0400, Neil Horman wrote:
> > > > > > On Fri, Apr 24, 2015 at 06:45:06PM -0700, Ravi Kerur wrote:
> > > > > > > On Fri, Apr 24, 2015 at 2:24 PM, Ravi Kerur <rkerur@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Apr 24, 2015 at 12:51 PM, Neil Horman <
> > > nhorman@tuxdriver.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> On Fri, Apr 24, 2015 at 12:21:23PM -0700, Ravi Kerur wrote:
> > > > > > > >> > On Fri, Apr 24, 2015 at 11:53 AM, Neil Horman <
> > > > > nhorman@tuxdriver.com>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > > On Fri, Apr 24, 2015 at 09:45:24AM -0700, Ravi Kerur
> wrote:
> > > > > > > >> > > > On Fri, Apr 24, 2015 at 8:22 AM, Neil Horman <
> > > > > nhorman@tuxdriver.com
> > > > > > > >> >
> > > > > > > >> > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > On Fri, Apr 24, 2015 at 08:14:04AM -0700, Ravi Kerur
> > > wrote:
> > > > > > > >> > > > > > On Fri, Apr 24, 2015 at 6:51 AM, Neil Horman <
> > > > > > > >> nhorman@tuxdriver.com>
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > > On Thu, Apr 23, 2015 at 02:35:31PM -0700, Ravi
> Kerur
> > > > > wrote:
> > > > > > > >> > > > > > > > Changes in v7
> > > > > > > >> > > > > > > > Remove _setname_ pthread calls.
> > > > > > > >> > > > > > > > Use rte_gettid() API in RTE_LOG to print
> > > thread_id.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Changes in v6
> > > > > > > >> > > > > > > > Remove RTE_EXEC_ENV_BSDAPP from
> > > eal_common_thread.c
> > > > > file.
> > > > > > > >> > > > > > > > Add pthread_setname_np/pthread_set_name_np for
> > > > > Linux/FreeBSD
> > > > > > > >> > > > > > > > respectively. Plan to use _getname_ in RTE_LOG
> > > when
> > > > > > > >> available.
> > > > > > > >> > > > > > > > Use existing rte_get_systid() in RTE_LOG to
> print
> > > > > thread_id.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Changes in v5
> > > > > > > >> > > > > > > > Rebase to latest code.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Changes in v4
> > > > > > > >> > > > > > > > None
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Changes in v3
> > > > > > > >> > > > > > > > Changed subject to be more explicit on file
> name
> > > > > inclusion.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Changes in v2
> > > > > > > >> > > > > > > > None
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Changes in v1
> > > > > > > >> > > > > > > > eal_thread.c has minor differences between
> Linux
> > > and
> > > > > BSD,
> > > > > > > >> move
> > > > > > > >> > > > > > > > entire file into common directory.
> > > > > > > >> > > > > > > > Use RTE_EXEC_ENV_BSDAPP to differentiate on
> minor
> > > > > > > >> differences.
> > > > > > > >> > > > > > > > Rename eal_thread.c to eal_common_thread.c
> > > > > > > >> > > > > > > > Makefile changes to reflect file move and name
> > > change.
> > > > > > > >> > > > > > > > Fix checkpatch warnings.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Signed-off-by: Ravi Kerur <rkerur@gmail.com>
> > > > > > > >> > > > > > > > ---
> > > > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/Makefile        |
>  2
> > > +-
> > > > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/eal_thread.c    |
> 152
> > > > > > > >> > > > > > > ------------------------------
> > > > > > > >> > > > > > > >  lib/librte_eal/common/eal_common_thread.c |
> 147
> > > > > > > >> > > > > > > ++++++++++++++++++++++++++++-
> > > > > > > >> > > > > > > >  lib/librte_eal/linuxapp/eal/eal_thread.c  |
> 152
> > > > > > > >> > > > > > > +-----------------------------
> > > > > > > >> > > > > > > >  4 files changed, 148 insertions(+), 305
> > > deletions(-)
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > diff --git
> a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > >> > > > > > > > index 2357cfa..55971b9 100644
> > > > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > > >> > > > > > > > @@ -87,7 +87,7 @@ CFLAGS_eal_common_log.o :=
> > > > > -D_GNU_SOURCE
> > > > > > > >> > > > > > > >  # workaround for a gcc bug with noreturn
> > > attribute
> > > > > > > >> > > > > > > >  #
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> > > > > > > >> > > > > > > >  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > > > > > > >> > > > > > > > -CFLAGS_eal_thread.o += -Wno-return-type
> > > > > > > >> > > > > > > > +CFLAGS_eal_common_thread.o +=
> -Wno-return-type
> > > > > > > >> > > > > > > >  CFLAGS_eal_hpet.o += -Wno-return-type
> > > > > > > >> > > > > > > >  endif
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > diff --git
> > > a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > >> > > > > > > > index 9a03437..5714b8f 100644
> > > > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > > >> > > > > > > > @@ -35,163 +35,11 @@
> > > > > > > >> > > > > > > >  #include <stdio.h>
> > > > > > > >> > > > > > > >  #include <stdlib.h>
> > > > > > > >> > > > > > > >  #include <stdint.h>
> > > > > > > >> > > > > > > > -#include <unistd.h>
> > > > > > > >> > > > > > > > -#include <sched.h>
> > > > > > > >> > > > > > > > -#include <pthread_np.h>
> > > > > > > >> > > > > > > > -#include <sys/queue.h>
> > > > > > > >> > > > > > > >  #include <sys/thr.h>
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > -#include <rte_debug.h>
> > > > > > > >> > > > > > > > -#include <rte_atomic.h>
> > > > > > > >> > > > > > > > -#include <rte_launch.h>
> > > > > > > >> > > > > > > > -#include <rte_log.h>
> > > > > > > >> > > > > > > > -#include <rte_memory.h>
> > > > > > > >> > > > > > > > -#include <rte_memzone.h>
> > > > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > > > >> > > > > > > > -#include <rte_eal.h>
> > > > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > > > >> > > > > > > > -#include <rte_lcore.h>
> > > > > > > >> > > > > > > > -
> > > > > > > >> > > > > > > >  #include "eal_private.h"
> > > > > > > >> > > > > > > >  #include "eal_thread.h"
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > > > > LCORE_ID_ANY;
> > > > > > > >> > > > > > > NAK, these are exported symbols, you can't
> remove
> > > them
> > > > > without
> > > > > > > >> > > going
> > > > > > > >> > > > > > > through the
> > > > > > > >> > > > > > > deprecation process.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > They are not removed/deleted, they are moved from
> > > > > eal_thread.c
> > > > > > > >> to
> > > > > > > >> > > > > > eal_common_thread.c file since it is common to
> both
> > > Linux
> > > > > and
> > > > > > > >> BSD.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > Then perhaps you forgot to export the symbol?  Its
> > > showing
> > > > > up as
> > > > > > > >> > > removed
> > > > > > > >> > > > > on the
> > > > > > > >> > > > > ABI checker utility.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Neil
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > > > Can you please show me in the current code where it is
> > > being
> > > > > > > >> exported? I
> > > > > > > >> > > > have only moved definitions to _common_ files, not
> sure
> > > why it
> > > > > > > >> should be
> > > > > > > >> > > > exported now.  I searched in the current code for
> > > > > > > >> RTE_DEFINE_PER_LCORE
> > > > > > > >> > > >
> > > > > > > >> > > > #home/rkerur/dpdk-tmp/dpdk# grep -ir
> RTE_DEFINE_PER_LCORE
> > > *
> > > > > > > >> > > > app/test/test_per_lcore.c:static
> > > > > RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > >> test) =
> > > > > > > >> > > > 0x12345678;
> > > > > > > >> > > >
> > > > > > > >>
> > > > >
> lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > > > >> > > >
> > > > > > > >>
> > > > >
> lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >>
> > > > >
> > >
> lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > > > >> > > > _cpuset);
> > > > > > > >> > > >
> > > > > > > >>
> > > > >
> lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > > > >> > > >
> > > > > > > >>
> > > > >
> lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > > > >> > > >
> > > > > > > >>
> > > > >
> > >
> lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > > > >> > > > _cpuset);
> > > > > > > >> > > > lib/librte_eal/common/include/rte_per_lcore.h:#define
> > > > > > > >> > > > RTE_DEFINE_PER_LCORE(type, name)            \
> > > > > > > >> > > > lib/librte_eal/common/include/rte_eal.h:    static
> > > > > > > >> > > > RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
> > > > > > > >> > > >
> > > > > lib/librte_eal/common/eal_common_errno.c:RTE_DEFINE_PER_LCORE(int,
> > > > > > > >> > > > _rte_errno);
> > > > > > > >> > > > lib/librte_eal/common/eal_common_errno.c:    static
> > > > > > > >> > > > RTE_DEFINE_PER_LCORE(char[RETVAL_SZ], retval);
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > > > Thanks
> > > > > > > >> > > > > > Ravi
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regards
> > > > > > > >> > > > > > > Neil
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > Its exported in the version map file:
> > > > > > > >> > >  per_lcore__lcore_id;
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > Thanks Neil, I checked and both linux and bsd
> > > rte_eal_version.map
> > > > > have
> > > > > > > >> it.
> > > > > > > >> > I compared .map file between "changed code" and the
> original,
> > > > > they are
> > > > > > > >> same
> > > > > > > >> > for both linux and bsd. In fact you had ACK'd v4 version
> of
> > > this
> > > > > patch
> > > > > > > >> > series and no major changes after that. Please let me
> know if
> > > I
> > > > > missed
> > > > > > > >> > something.
> > > > > > > >> >
> > > > > > > >> I did, and I'm retracting that, because I didn't think to
> check
> > > the
> > > > > ABI
> > > > > > > >> compatibility on this.  But I ran it throught the ABI
> checking
> > > > > script
> > > > > > > >> this and
> > > > > > > >> this error popped out.  You should run it as well, its in
> the
> > > > > scripts
> > > > > > > >> directory.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I see in your first patch you removed it and re-added it in
> the
> > > > > common
> > > > > > > >> section.
> > > > > > > >> But something about how its building is causing it to not
> show
> > > up
> > > > > as an
> > > > > > > >> exported
> > > > > > > >> symbol, which is problematic, as other applications are
> going to
> > > > > want
> > > > > > > >> access to
> > > > > > > >> it.
> > > > > > > >>
> > > > > > > >> It also possible that the ABI checker is throwing a false
> > > positive,
> > > > > but
> > > > > > > >> either
> > > > > > > >> way, it needs to be looked into prior to moving forward with
> > > this.
> > > > > > > >>
> > > > > > > >>
> > > > > > > > I did following things.
> > > > > > > >
> > > > > > > > Put a tag (v2.0.0-before-common-eal)  before EAL common
> functions
> > > > > changes
> > > > > > > > for commit (3c0c807038ad642f4be7deb9370293c39d12f029 net:
> remove
> > > > > unneeded
> > > > > > > > include)
> > > > > > > >
> > > > > > > > Put a tag (v2.0.0-common-eal) after EAL common functions
> changes
> > > for
> > > > > > > > commit (25737e5a7212630a7b5d8ca756860a062f403789 Move common
> > > > > functions in
> > > > > > > > eal_pci.c)
> > > > > > > >
> > > > > > > > Ran validate-abi against x86_64-native-linuxapp-gcc and
> > > > > > > >
> > > > > > > > v2.0.0-rc3 and v2.0.0-before-common-eal, html report for
> > > > > librte_eal.so
> > > > > > > > shows removed symbols for "per_lcore__cpuset"
> > > > > > > >
> > > > > > > > v2.0.0-rc3 and v2.0.0-common-eal, html report for
> librte_eal.so
> > > shows
> > > > > > > > removed symbols for "per_lcore__cpuset"
> > > > > > > >
> > > > > > > > Removed symbol is different from what you have reported and
> in my
> > > > > case I
> > > > > > > > see it even before my commit. If you are interested I can
> unicast
> > > > > you html
> > > > > > > > report file. Please let me know how to proceed.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > I did some experiment and found some interesting things.  I
> will
> > > take
> > > > > eal.c
> > > > > > > as an example
> > > > > > >
> > > > > > > eal.c is split into eal_common_sysfs.c eal_common_mem_cfg.c
> > > > > > > eal_common_proc_type.c and eal_common_app_usage.c. In
> > > > > linuxapp/eal/Makefile
> > > > > > > if I compile new files right after eal.c as shown below
> > > > > > >
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) := eal.c
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_sysfs.c
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_mem_cfg.c
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) +=
> eal_common_proc_type.c
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) +=
> eal_common_app_usage.c
> > > > > > > ...
> > > > > > >
> > > > > > > validate-abi results matches baseline. Instead if i place new
> > > _common_
> > > > > > > files in common area in linuxapp/eal/Makefile as shown below
> > > > > > >
> > > > > > > # from common dir
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_memzone.c
> > > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_log.c
> > > > > > > ...
> > > > > > >
> > > > > > > validate-abi reports problem in binary compatibility and source
> > > > > > > compatiblity
> > > > > > >
> > > > > > > eal_filesystem.h, librte_eal.so.1
> > > > > > >  [+] eal_parse_sysfs_value ( char const* filename, unsigned
> long*
> > > val )
> > > > > > >  @@ DPDK_2.0 (2)
> > > > > > >
> > > > > > > I believe files in common and linuxapp directory are compiled
> same
> > > way
> > > > > so
> > > > > > > not sure why placement in makefile makes difference.
> > > > > > >
> > > > > > > Could this be false-positive from validate-abi script??
> > > > > > >
> > > > > > It could be, yes.  Though I'm more inclined to think that
> perhaps in
> > > the
> > > > > new
> > > > > > version of the code we're not generating ithe same dwarf
> information
> > > out
> > > > > of it.
> > > > > > In fact for some reason, I've checked both the build before and
> after
> > > > > your
> > > > > > patch series, and the exported CFLAGS aren't getting passed to
> the
> > > build
> > > > > > properly, implying that we're not building all the code in the
> > > validator
> > > > > with
> > > > > > the -g flag, which the validator need to function properly.  I'm
> > > looking
> > > > > into
> > > > > > that
> > > > > > Neil
> > > > > >
> > > > > >
> > > > > Found the problem, I was stupidly reading the report incorrectly.
> The
> > > > > problem
> > > > > regarding _lcore_id is a source compatibilty issue (because the
> symbol
> > > > > moved to
> > > > > a new location), which is irrelevant to us.  Its not in any way a
> > > binary
> > > > > compat
> > > > > problem, which is what we care about.  Sorry for the noise.
> > > > >
> > > > > I do still have a few concerns about some changed calling
> conventions
> > > with
> > > > > a few
> > > > > other functions, which I'll look into on monday.
> > > > >
> > > > >
> > > > Please let me know your inputs on changed calling conventions. Most
> of
> > > them
> > > > can be fixed by re-arranging moved code in _common_ files and order
> of
> > > > compilation.
> > > >
> > > If moving the order of compliation around fixes the problem, then I am
> > > reasonably convinced that it is, if not a false positive, a minor issue
> > > with the
> > > compilers dwarf information (The compiler just can't sanely change the
> > > location
> > > in which parameters are passed).  If you make those changes, I'll ACK
> > > them, and
> > > look into whats going on with the calling conventions
> > >
> >
> > Issues like the one shown below are taken care by reordering the code
> > compilation.
> >
> > eal_parse_sysfs_value ( char const* filename, unsigned long* val )
> >
> > Change
> > The parameter filename became passed on stack instead of rdi register
> >
> > Effect
> > Violation of the calling convention. This may result in crash or
> incorrect
> > behavior of applications.
> >
> > Last one that is left out is in
> >
> > rte_thread_set_affinity ( rte_cpuset_t* p1 )
> >
> > Change
> > The parameter *p1* became passed in *rdi* register instead of stack.
> >
> > Effect
> > Violation of the calling convention. This may result in crash or
> incorrect
> > behavior of applications.
> >
> > After checking abi-0.99.pdf (x86-64.org) looks like for
> > "rte_thread_set_affinity" new code is doing the right thing by passing
> the
> > parameter in "rdi" register since pointer is classified as
> "integer_class".
> > Nothing needs to be fixed here. After you confirm that warning can be
> > ignored I will work on sending new revision.
> >
> ACK then, send the new revision, this appears to be a false positive.
>
> Thanks for taking the time to confirm.
>

Thanks Neil. I have sent v8 which fixes ABI warnings. I have tested it with
x86_64-native-linuxapp-gcc, x86_64-native-linuxapp-clang and
x86_64-ivshmem-gcc targets. ABI results look fine to me.

I tried to run validate-abi.sh on BSD but ran into errors. If there is a
way to check against BSD please let me know.

>
> Best
> Neil
>
> > Thanks,
> > Ravi
> >
> >
> > > Thanks!
> > > Neil
> > >
> > > > Thanks,
> > > > Ravi
> > > >
> > > > Regards
> > > > > Neil
> > > > >
> > > > >
> > >
>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 2/6] Move common functions in eal.c
  2015-04-28 23:46  2% ` [dpdk-dev] [PATCH v8 1/6] Move common functions in eal_thread.c Ravi Kerur
@ 2015-04-28 23:46  1%   ` Ravi Kerur
  0 siblings, 0 replies; 200+ results
From: Ravi Kerur @ 2015-04-28 23:46 UTC (permalink / raw)
  To: dev

Changes in v8
Fix ABI warnings by reordering compilation of
eal_common_sysfs.c
eal_common_mem_cfg.c
eal_common_proc_type.c
eal_common_app_usage.c

Changes in v7
Fix compilation errors in clang.

Changes in v6
Split eal_common_system.c and eal_common_runtime.c into
eal_common_sysfs.c
eal_common_mem_cfg.c
eal_common_proc_type.c
eal_common_app_usage.c
based on functionality.

Changes in v5
Rebase to latest code.

Changes in v4
Remove eal_externs.h file, instead use  _get_ and _set_ APIS
to access those variables.
Split eal_common.c into eal_common_system.c and
and eal_common_runtime.c
rte_eal prefix functions are moved to _runtime_ and
eal prefix functions are moved to _system_ files respectively.

Changes in v3
Changed subject to be more explicit on file name inclusion.

Changes in v2
In function rte_eal_config_create remove #ifdef _BSDAPP_
and initialize mem_cfg_addr unconditionally.

Changes in v1
Move common functions in eal.c to librte_eal/common/eal_common.c.

Following functions are moved to eal_common.c file.

struct rte_config *rte_eal_get_configuration(void);
int eal_parse_sysfs_value(const char *filename, unsigned long *val);
static void rte_eal_config_create(void);
enum rte_proc_type_t eal_proc_type_detect(void);
void rte_eal_config_init(void);
rte_usage_hook_t rte_set_application_usage_hook(rte_usage_hook_t
usage_func);
inline size_t eal_get_hugepage_mem_size(void);
void eal_check_mem_on_local_socket(void);
int sync_func(__attribute__((unused)) void *arg);
inline void rte_eal_mcfg_complete(void);
int rte_eal_has_hugepages(void);
enum rte_lcore_role_t rte_eal_lcore_role(unsigned lcore_id);
enum rte_proc_type_t rte_eal_process_type(void);

Makefile changes to reflect new files added.
Fix checkpatch warnings and errors.

Signed-off-by: Ravi Kerur <rkerur@gmail.com>
---
 lib/librte_eal/bsdapp/eal/Makefile           |   4 +
 lib/librte_eal/bsdapp/eal/eal.c              | 271 +++---------------------
 lib/librte_eal/common/eal_common_app_usage.c |  63 ++++++
 lib/librte_eal/common/eal_common_mem_cfg.c   | 224 ++++++++++++++++++++
 lib/librte_eal/common/eal_common_proc_type.c |  58 ++++++
 lib/librte_eal/common/eal_common_sysfs.c     | 148 ++++++++++++++
 lib/librte_eal/common/eal_hugepages.h        |   1 +
 lib/librte_eal/common/eal_private.h          |  78 +++++++
 lib/librte_eal/common/include/rte_eal.h      |   4 +
 lib/librte_eal/linuxapp/eal/Makefile         |   4 +
 lib/librte_eal/linuxapp/eal/eal.c            | 296 ++++-----------------------
 11 files changed, 660 insertions(+), 491 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_app_usage.c
 create mode 100644 lib/librte_eal/common/eal_common_mem_cfg.c
 create mode 100644 lib/librte_eal/common/eal_common_proc_type.c
 create mode 100644 lib/librte_eal/common/eal_common_sysfs.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index b7ca47c..67abc54 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -52,6 +52,10 @@ LIBABIVER := 1
 
 # specific to linuxapp exec-env
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) := eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_sysfs.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_mem_cfg.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_proc_type.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_app_usage.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_memory.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_hugepage_info.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_thread.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 43e8a47..a9b1f38 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -80,29 +80,6 @@
 #include "eal_hugepages.h"
 #include "eal_options.h"
 
-#define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
-
-/* Allow the application to print its usage message too if set */
-static rte_usage_hook_t	rte_application_usage_hook = NULL;
-/* early configuration structure, when memory config is not mmapped */
-static struct rte_mem_config early_mem_config;
-
-/* define fd variable here, because file needs to be kept open for the
- * duration of the program, as we hold a write lock on it in the primary proc */
-static int mem_cfg_fd = -1;
-
-static struct flock wr_lock = {
-		.l_type = F_WRLCK,
-		.l_whence = SEEK_SET,
-		.l_start = offsetof(struct rte_mem_config, memseg),
-		.l_len = sizeof(early_mem_config.memseg),
-};
-
-/* Address of global and public configuration */
-static struct rte_config rte_config = {
-		.mem_config = &early_mem_config,
-};
-
 /* internal configuration (per-core) */
 struct lcore_config lcore_config[RTE_MAX_LCORE];
 
@@ -112,160 +89,57 @@ struct internal_config internal_config;
 /* used by rte_rdtsc() */
 int rte_cycles_vmware_tsc_map;
 
-/* Return a pointer to the configuration structure */
-struct rte_config *
-rte_eal_get_configuration(void)
-{
-	return &rte_config;
-}
-
-/* parse a sysfs (or other) file containing one integer value */
-int
-eal_parse_sysfs_value(const char *filename, unsigned long *val)
-{
-	FILE *f;
-	char buf[BUFSIZ];
-	char *end = NULL;
-
-	if ((f = fopen(filename, "r")) == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-			__func__, filename);
-		return -1;
-	}
-
-	if (fgets(buf, sizeof(buf), f) == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-			__func__, filename);
-		fclose(f);
-		return -1;
-	}
-	*val = strtoul(buf, &end, 0);
-	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-				__func__, filename);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-	return 0;
-}
-
-
-/* create memory configuration in shared/mmap memory. Take out
- * a write lock on the memsegs, so we can auto-detect primary/secondary.
- * This means we never close the file while running (auto-close on exit).
- * We also don't lock the whole file, so that in future we can use read-locks
- * on other parts, e.g. memzones, to detect if there are running secondary
- * processes. */
-static void
-rte_eal_config_create(void)
+inline void *
+rte_eal_get_mem_cfg_addr(void)
 {
-	void *rte_mem_cfg_addr;
-	int retval;
-
-	const char *pathname = eal_runtime_config_path();
-
-	if (internal_config.no_shconf)
-		return;
-
-	if (mem_cfg_fd < 0){
-		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
-		if (mem_cfg_fd < 0)
-			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
-	}
-
-	retval = ftruncate(mem_cfg_fd, sizeof(*rte_config.mem_config));
-	if (retval < 0){
-		close(mem_cfg_fd);
-		rte_panic("Cannot resize '%s' for rte_mem_config\n", pathname);
-	}
-
-	retval = fcntl(mem_cfg_fd, F_SETLK, &wr_lock);
-	if (retval < 0){
-		close(mem_cfg_fd);
-		rte_exit(EXIT_FAILURE, "Cannot create lock on '%s'. Is another primary "
-				"process running?\n", pathname);
-	}
-
-	rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
-				PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
-
-	if (rte_mem_cfg_addr == MAP_FAILED){
-		rte_panic("Cannot mmap memory for rte_config\n");
-	}
-	memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
-	rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+	return NULL;
 }
 
 /* attach to an existing shared memory config */
-static void
+void
 rte_eal_config_attach(void)
 {
-	void *rte_mem_cfg_addr;
+	struct rte_mem_config *mem_config;
+	struct rte_config *rte_config;
 	const char *pathname = eal_runtime_config_path();
+	int *mem_cfg_fd = eal_get_mem_cfg_fd();
 
 	if (internal_config.no_shconf)
 		return;
 
-	if (mem_cfg_fd < 0){
-		mem_cfg_fd = open(pathname, O_RDWR);
-		if (mem_cfg_fd < 0)
+	rte_config = rte_eal_get_configuration();
+	if (rte_config == NULL)
+		return;
+
+	if (*mem_cfg_fd < 0) {
+		*mem_cfg_fd = open(pathname, O_RDWR);
+		if (*mem_cfg_fd < 0)
 			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
 	}
 
-	rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
-				PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
-	close(mem_cfg_fd);
-	if (rte_mem_cfg_addr == MAP_FAILED)
+	mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
+				PROT_READ | PROT_WRITE,
+				MAP_SHARED, *mem_cfg_fd, 0);
+	close(*mem_cfg_fd);
+	if (mem_config == MAP_FAILED)
 		rte_panic("Cannot mmap memory for rte_config\n");
 
-	rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
-}
-
-/* Detect if we are a primary or a secondary process */
-enum rte_proc_type_t
-eal_proc_type_detect(void)
-{
-	enum rte_proc_type_t ptype = RTE_PROC_PRIMARY;
-	const char *pathname = eal_runtime_config_path();
-
-	/* if we can open the file but not get a write-lock we are a secondary
-	 * process. NOTE: if we get a file handle back, we keep that open
-	 * and don't close it to prevent a race condition between multiple opens */
-	if (((mem_cfg_fd = open(pathname, O_RDWR)) >= 0) &&
-			(fcntl(mem_cfg_fd, F_SETLK, &wr_lock) < 0))
-		ptype = RTE_PROC_SECONDARY;
-
-	RTE_LOG(INFO, EAL, "Auto-detected process type: %s\n",
-			ptype == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
-
-	return ptype;
+	rte_config->mem_config = mem_config;
 }
 
-/* Sets up rte_config structure with the pointer to shared memory config.*/
-static void
-rte_config_init(void)
+/* NOP for BSD */
+void
+rte_eal_config_reattach(void)
 {
-	rte_config.process_type = internal_config.process_type;
-
-	switch (rte_config.process_type){
-	case RTE_PROC_PRIMARY:
-		rte_eal_config_create();
-		break;
-	case RTE_PROC_SECONDARY:
-		rte_eal_config_attach();
-		rte_eal_mcfg_wait_complete(rte_config.mem_config);
-		break;
-	case RTE_PROC_AUTO:
-	case RTE_PROC_INVALID:
-		rte_panic("Invalid process type\n");
-	}
 }
 
 /* display usage */
 static void
 eal_usage(const char *prgname)
 {
+	rte_usage_hook_t rte_application_usage_hook =
+		rte_get_application_usage_hook();
+
 	printf("\nUsage: %s ", prgname);
 	eal_common_usage();
 	/* Allow the application to print its usage message too if hook is set */
@@ -275,37 +149,6 @@ eal_usage(const char *prgname)
 	}
 }
 
-/* Set a per-application usage message */
-rte_usage_hook_t
-rte_set_application_usage_hook( rte_usage_hook_t usage_func )
-{
-	rte_usage_hook_t	old_func;
-
-	/* Will be NULL on the first call to denote the last usage routine. */
-	old_func					= rte_application_usage_hook;
-	rte_application_usage_hook	= usage_func;
-
-	return old_func;
-}
-
-static inline size_t
-eal_get_hugepage_mem_size(void)
-{
-	uint64_t size = 0;
-	unsigned i, j;
-
-	for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
-		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
-		if (hpi->hugedir != NULL) {
-			for (j = 0; j < RTE_MAX_NUMA_NODES; j++) {
-				size += hpi->hugepage_sz * hpi->num_pages[j];
-			}
-		}
-	}
-
-	return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
-}
-
 /* Parse the argument given in the command line of the application */
 static int
 eal_parse_args(int argc, char **argv)
@@ -378,45 +221,6 @@ eal_parse_args(int argc, char **argv)
 	return ret;
 }
 
-static void
-eal_check_mem_on_local_socket(void)
-{
-	const struct rte_memseg *ms;
-	int i, socket_id;
-
-	socket_id = rte_lcore_to_socket_id(rte_config.master_lcore);
-
-	ms = rte_eal_get_physmem_layout();
-
-	for (i = 0; i < RTE_MAX_MEMSEG; i++)
-		if (ms[i].socket_id == socket_id &&
-				ms[i].len > 0)
-			return;
-
-	RTE_LOG(WARNING, EAL, "WARNING: Master core has no "
-			"memory on local socket!\n");
-}
-
-static int
-sync_func(__attribute__((unused)) void *arg)
-{
-	return 0;
-}
-
-inline static void
-rte_eal_mcfg_complete(void)
-{
-	/* ALL shared mem_config related INIT DONE */
-	if (rte_config.process_type == RTE_PROC_PRIMARY)
-		rte_config.mem_config->magic = RTE_MAGIC;
-}
-
-/* return non-zero if hugepages are enabled. */
-int rte_eal_has_hugepages(void)
-{
-	return !internal_config.no_hugetlbfs;
-}
-
 /* Abstraction for port I/0 privilege */
 int
 rte_eal_iopl_init(void)
@@ -437,8 +241,13 @@ rte_eal_init(int argc, char **argv)
 	int i, fctret, ret;
 	pthread_t thread_id;
 	static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
+	struct rte_config *rte_config;
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 
+	rte_config = rte_eal_get_configuration();
+	if (rte_config == NULL)
+		return -1;
+
 	if (!rte_atomic32_test_and_set(&run_once))
 		return -1;
 
@@ -512,12 +321,12 @@ rte_eal_init(int argc, char **argv)
 
 	rte_eal_mcfg_complete();
 
-	eal_thread_init_master(rte_config.master_lcore);
+	eal_thread_init_master(rte_config->master_lcore);
 
 	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
 
 	RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s%s])\n",
-		rte_config.master_lcore, thread_id, cpuset,
+		rte_config->master_lcore, thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
 	if (rte_eal_dev_init() < 0)
@@ -556,17 +365,3 @@ rte_eal_init(int argc, char **argv)
 
 	return fctret;
 }
-
-/* get core role */
-enum rte_lcore_role_t
-rte_eal_lcore_role(unsigned lcore_id)
-{
-	return (rte_config.lcore_role[lcore_id]);
-}
-
-enum rte_proc_type_t
-rte_eal_process_type(void)
-{
-	return (rte_config.process_type);
-}
-
diff --git a/lib/librte_eal/common/eal_common_app_usage.c b/lib/librte_eal/common/eal_common_app_usage.c
new file mode 100644
index 0000000..5f64d35
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_app_usage.c
@@ -0,0 +1,63 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2014 6WIND S.A.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+
+#include "eal_private.h"
+
+/* Allow the application to print its usage message too if set */
+rte_usage_hook_t rte_application_usage_hook = NULL;
+
+/* Get per-application usage message */
+rte_usage_hook_t
+rte_get_application_usage_hook(void)
+{
+	return rte_application_usage_hook;
+}
+
+/* Set a per-application usage message */
+rte_usage_hook_t
+rte_set_application_usage_hook(rte_usage_hook_t usage_func)
+{
+	rte_usage_hook_t	old_func;
+
+	/* Will be NULL on the first call to denote the last usage routine. */
+	old_func	= rte_application_usage_hook;
+	rte_application_usage_hook	= usage_func;
+
+	return old_func;
+}
diff --git a/lib/librte_eal/common/eal_common_mem_cfg.c b/lib/librte_eal/common/eal_common_mem_cfg.c
new file mode 100644
index 0000000..c8bf218
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_mem_cfg.c
@@ -0,0 +1,224 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2014 6WIND S.A.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdarg.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <syslog.h>
+#include <getopt.h>
+#include <sys/file.h>
+#include <stddef.h>
+#include <errno.h>
+#include <limits.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <sys/queue.h>
+
+#include <rte_debug.h>
+#include <rte_eal_memconfig.h>
+#include <rte_log.h>
+
+#include "eal_private.h"
+#include "eal_thread.h"
+#include "eal_internal_cfg.h"
+#include "eal_filesystem.h"
+#include "eal_options.h"
+
+/* early configuration structure, when memory config is not mmapped */
+static struct rte_mem_config early_mem_config;
+
+/* define fd variable here, because file needs to be kept open for the
+ * duration of the program, as we hold a write lock on it in the primary proc */
+static int mem_cfg_fd = -1;
+
+static struct flock wr_lock = {
+		.l_type = F_WRLCK,
+		.l_whence = SEEK_SET,
+		.l_start = offsetof(struct rte_mem_config, memseg),
+		.l_len = sizeof(((struct rte_mem_config *)0)->memseg),
+};
+
+/* Address of global and public configuration */
+static struct rte_config rte_config = {
+		.mem_config = &early_mem_config,
+};
+
+/* Return a pointer to the configuration structure */
+struct rte_config *
+rte_eal_get_configuration(void)
+{
+	return &rte_config;
+}
+
+/* Return memory config file descriptor */
+int*
+eal_get_mem_cfg_fd(void)
+{
+	return &mem_cfg_fd;
+}
+
+/* get core role */
+enum rte_lcore_role_t
+rte_eal_lcore_role(unsigned lcore_id)
+{
+	return rte_config.lcore_role[lcore_id];
+}
+
+/* create memory configuration in shared/mmap memory. Take out
+ * a write lock on the memsegs, so we can auto-detect primary/secondary.
+ * This means we never close the file while running (auto-close on exit).
+ * We also don't lock the whole file, so that in future we can use read-locks
+ * on other parts, e.g. memzones, to detect if there are running secondary
+ * processes. */
+static void
+rte_eal_config_create(void)
+{
+	void *rte_mem_cfg_addr;
+	int retval;
+
+	const char *pathname = eal_runtime_config_path();
+
+	if (internal_config.no_shconf)
+		return;
+
+	rte_mem_cfg_addr = rte_eal_get_mem_cfg_addr();
+
+	if (mem_cfg_fd < 0) {
+		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
+		if (mem_cfg_fd < 0)
+			rte_panic("Cannot open '%s' for rte_mem_config\n",
+					pathname);
+	}
+
+	retval = eal_ftruncate_and_fcntl(sizeof(*rte_config.mem_config));
+
+	if (retval == -1) {
+		close(mem_cfg_fd);
+		rte_panic("Cannot resize '%s' for rte_mem_config\n", pathname);
+	} else if (retval == -2) {
+		close(mem_cfg_fd);
+		rte_exit(EXIT_FAILURE, "Cannot create lock on '%s'. "
+			"Is another primary process running?\n", pathname);
+	}
+
+	rte_mem_cfg_addr = mmap(rte_mem_cfg_addr,
+			sizeof(*rte_config.mem_config), PROT_READ | PROT_WRITE,
+			MAP_SHARED, mem_cfg_fd, 0);
+
+	if (rte_mem_cfg_addr == MAP_FAILED)
+		rte_panic("Cannot mmap memory for rte_config\n");
+
+	memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
+	rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+
+	/* store address of the config in the config itself so that secondary
+	 * processes could later map the config into this exact location
+	 */
+	rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+}
+
+/* Sets up rte_config structure with the pointer to shared memory config.*/
+void
+rte_config_init(void)
+{
+	rte_config.process_type = internal_config.process_type;
+
+	switch (rte_config.process_type) {
+	case RTE_PROC_PRIMARY:
+		rte_eal_config_create();
+		break;
+	case RTE_PROC_SECONDARY:
+		rte_eal_config_attach();
+		rte_eal_mcfg_wait_complete(rte_config.mem_config);
+		rte_eal_config_reattach();
+		break;
+	case RTE_PROC_AUTO:
+	case RTE_PROC_INVALID:
+		rte_panic("Invalid process type\n");
+	}
+}
+
+inline void
+rte_eal_mcfg_complete(void)
+{
+	/* ALL shared mem_config related INIT DONE */
+	if (rte_config.process_type == RTE_PROC_PRIMARY)
+		rte_config.mem_config->magic = RTE_MAGIC;
+}
+
+/* Detect if we are a primary or a secondary process */
+enum rte_proc_type_t
+eal_proc_type_detect(void)
+{
+	enum rte_proc_type_t ptype = RTE_PROC_PRIMARY;
+	const char *pathname = eal_runtime_config_path();
+
+	/* if we can open the file but not get a write-lock we are
+	 * a secondary process. NOTE: if we get a file handle back,
+	 * we keep that open and don't close it to prevent a race
+	 * condition between multiple opens
+	 */
+	mem_cfg_fd = open(pathname, O_RDWR);
+	if ((mem_cfg_fd >= 0) &&
+			(fcntl(mem_cfg_fd, F_SETLK, &wr_lock) < 0))
+		ptype = RTE_PROC_SECONDARY;
+
+	RTE_LOG(INFO, EAL, "Auto-detected process type: %s\n",
+			ptype == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
+
+	return ptype;
+}
+
+/*
+ * Perform ftruncate and fcntl operations on
+ * memory config file descriptor.
+ */
+int
+eal_ftruncate_and_fcntl(size_t size)
+{
+	int retval;
+
+	retval = ftruncate(mem_cfg_fd, size);
+	if (retval < 0)
+		return -1;
+
+	retval = fcntl(mem_cfg_fd, F_SETLK, &wr_lock);
+	if (retval < 0)
+		return -2;
+	return 0;
+}
diff --git a/lib/librte_eal/common/eal_common_proc_type.c b/lib/librte_eal/common/eal_common_proc_type.c
new file mode 100644
index 0000000..f8bb47f
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_proc_type.c
@@ -0,0 +1,58 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2014 6WIND S.A.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdarg.h>
+#include <unistd.h>
+
+#include "eal_private.h"
+
+#include <rte_log.h>
+
+enum rte_proc_type_t
+rte_eal_process_type(void)
+{
+	struct rte_config *rte_config =
+		rte_eal_get_configuration();
+
+	if (rte_config == NULL) {
+		RTE_LOG(WARNING, EAL, "WARNING: rte_config NULL!\n");
+		return RTE_PROC_INVALID;
+	}
+
+	return rte_config->process_type;
+}
diff --git a/lib/librte_eal/common/eal_common_sysfs.c b/lib/librte_eal/common/eal_common_sysfs.c
new file mode 100644
index 0000000..e4dcd55
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_sysfs.c
@@ -0,0 +1,148 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2014 6WIND S.A.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdarg.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <syslog.h>
+#include <getopt.h>
+#include <sys/file.h>
+#include <stddef.h>
+#include <errno.h>
+#include <limits.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <sys/queue.h>
+
+#include "eal_private.h"
+#include "eal_internal_cfg.h"
+#include "eal_filesystem.h"
+#include "eal_hugepages.h"
+#include "eal_options.h"
+
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+
+/* parse a sysfs (or other) file containing one integer value */
+int
+eal_parse_sysfs_value(const char *filename, unsigned long *val)
+{
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
+			__func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
+			__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoul(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
+
+inline size_t
+eal_get_hugepage_mem_size(void)
+{
+	uint64_t size = 0;
+	unsigned i, j;
+
+	for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
+		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
+
+		if (hpi->hugedir != NULL) {
+			for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
+				size += hpi->hugepage_sz * hpi->num_pages[j];
+		}
+	}
+
+	return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
+}
+
+void
+eal_check_mem_on_local_socket(void)
+{
+	const struct rte_memseg *ms;
+	int i, socket_id;
+	struct rte_config *rte_config =
+		rte_eal_get_configuration();
+
+	if (rte_config == NULL) {
+		RTE_LOG(WARNING, EAL, "WARNING: rte_config NULL!\n");
+		return;
+	}
+
+	socket_id = rte_lcore_to_socket_id(rte_config->master_lcore);
+
+	ms = rte_eal_get_physmem_layout();
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++)
+		if (ms[i].socket_id == socket_id &&
+				ms[i].len > 0)
+			return;
+
+	RTE_LOG(WARNING, EAL, "WARNING: Master core has no "
+			"memory on local socket!\n");
+}
+
+int
+sync_func(__attribute__((unused)) void *arg)
+{
+	return 0;
+}
+
+/* return non-zero if hugepages are enabled. */
+int rte_eal_has_hugepages(void)
+{
+	return !internal_config.no_hugetlbfs;
+}
diff --git a/lib/librte_eal/common/eal_hugepages.h b/lib/librte_eal/common/eal_hugepages.h
index 38edac0..d79ef8a 100644
--- a/lib/librte_eal/common/eal_hugepages.h
+++ b/lib/librte_eal/common/eal_hugepages.h
@@ -63,5 +63,6 @@ struct hugepage_file {
  * for the EAL to use
  */
 int eal_hugepage_info_init(void);
+size_t eal_get_hugepage_mem_size(void);
 
 #endif /* EAL_HUGEPAGES_H */
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4acf5a0..bcf603f 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -36,6 +36,8 @@
 
 #include <stdio.h>
 
+#include <rte_eal.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -232,4 +234,80 @@ int rte_eal_dev_init(void);
  */
 int rte_eal_check_module(const char *module_name);
 
+/**
+ * This function sets up rte_config structure
+ *
+ * This function is private to the EAL.
+ */
+void rte_config_init(void);
+
+/**
+ * This function checks memory on local socket(NUMA)
+ *
+ * This function is private to the EAL.
+ */
+void eal_check_mem_on_local_socket(void);
+
+/**
+ * This function updates shared mem_config INIT DONE
+ *
+ * This function is private to the EAL.
+ */
+void rte_eal_mcfg_complete(void);
+
+/**
+ *
+ * This function is private to the EAL.
+ */
+int sync_func(__attribute__((unused)) void *arg);
+
+/**
+ *
+ * This function is private to the EAL.
+ */
+void *rte_eal_get_mem_cfg_addr(void);
+
+/**
+ * Return a pointer to the configuration structure
+ *
+ * This function is private to the EAL.
+ */
+struct rte_config *rte_eal_get_configuration(void);
+
+/**
+ * Return memory config file descriptor
+ *
+ * This function is private to the EAL.
+ */
+int *eal_get_mem_cfg_fd(void);
+
+/**
+ * Perform ftruncate and fcntl operations on
+ * memory config file descriptor.
+ *
+ * This function is private to the EAL.
+ */
+int eal_ftruncate_and_fcntl(size_t size);
+
+/**
+ * Get per-application usage message
+ *
+ * This function is private to the EAL.
+ */
+rte_usage_hook_t rte_get_application_usage_hook(void);
+
+/**
+ * This function attaches shared memory config
+ *
+ * This function is private to the EAL.
+ */
+void rte_eal_config_attach(void);
+
+/**
+ * This function reattaches shared memory config
+ *
+ * This function is private to the EAL.
+ */
+void rte_eal_config_reattach(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 1385a73..daf2ee0 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -51,6 +51,10 @@ extern "C" {
 
 #define RTE_MAGIC 19820526 /**< Magic number written by the main partition when ready. */
 
+#define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
+
+#define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
+
 /**
  * The lcore role (used in RTE or not).
  */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index f11ef59..8e872e0 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -55,6 +55,10 @@ CFLAGS += $(WERROR_FLAGS) -O3
 
 # specific to linuxapp exec-env
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) := eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_sysfs.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_mem_cfg.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_proc_type.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_app_usage.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_hugepage_info.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_memory.c
 ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index bd770cf..a7de8e0 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -84,13 +84,6 @@
 #include "eal_hugepages.h"
 #include "eal_options.h"
 
-#define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
-
-#define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
-
-/* Allow the application to print its usage message too if set */
-static rte_usage_hook_t	rte_application_usage_hook = NULL;
-
 TAILQ_HEAD(shared_driver_list, shared_driver);
 
 /* Definition for shared object drivers. */
@@ -105,25 +98,6 @@ struct shared_driver {
 static struct shared_driver_list solib_list =
 TAILQ_HEAD_INITIALIZER(solib_list);
 
-/* early configuration structure, when memory config is not mmapped */
-static struct rte_mem_config early_mem_config;
-
-/* define fd variable here, because file needs to be kept open for the
- * duration of the program, as we hold a write lock on it in the primary proc */
-static int mem_cfg_fd = -1;
-
-static struct flock wr_lock = {
-		.l_type = F_WRLCK,
-		.l_whence = SEEK_SET,
-		.l_start = offsetof(struct rte_mem_config, memseg),
-		.l_len = sizeof(early_mem_config.memseg),
-};
-
-/* Address of global and public configuration */
-static struct rte_config rte_config = {
-		.mem_config = &early_mem_config,
-};
-
 /* internal configuration (per-core) */
 struct lcore_config lcore_config[RTE_MAX_LCORE];
 
@@ -133,196 +107,85 @@ struct internal_config internal_config;
 /* used by rte_rdtsc() */
 int rte_cycles_vmware_tsc_map;
 
-/* Return a pointer to the configuration structure */
-struct rte_config *
-rte_eal_get_configuration(void)
-{
-	return &rte_config;
-}
-
-/* parse a sysfs (or other) file containing one integer value */
-int
-eal_parse_sysfs_value(const char *filename, unsigned long *val)
-{
-	FILE *f;
-	char buf[BUFSIZ];
-	char *end = NULL;
-
-	if ((f = fopen(filename, "r")) == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-			__func__, filename);
-		return -1;
-	}
-
-	if (fgets(buf, sizeof(buf), f) == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-			__func__, filename);
-		fclose(f);
-		return -1;
-	}
-	*val = strtoul(buf, &end, 0);
-	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-				__func__, filename);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-	return 0;
-}
-
-
-/* create memory configuration in shared/mmap memory. Take out
- * a write lock on the memsegs, so we can auto-detect primary/secondary.
- * This means we never close the file while running (auto-close on exit).
- * We also don't lock the whole file, so that in future we can use read-locks
- * on other parts, e.g. memzones, to detect if there are running secondary
- * processes. */
-static void
-rte_eal_config_create(void)
+inline void *
+rte_eal_get_mem_cfg_addr(void)
 {
-	void *rte_mem_cfg_addr;
-	int retval;
-
-	const char *pathname = eal_runtime_config_path();
-
-	if (internal_config.no_shconf)
-		return;
+	void *mem_cfg_addr;
 
-	/* map the config before hugepage address so that we don't waste a page */
 	if (internal_config.base_virtaddr != 0)
-		rte_mem_cfg_addr = (void *)
+		mem_cfg_addr = (void *)
 			RTE_ALIGN_FLOOR(internal_config.base_virtaddr -
 			sizeof(struct rte_mem_config), sysconf(_SC_PAGE_SIZE));
 	else
-		rte_mem_cfg_addr = NULL;
-
-	if (mem_cfg_fd < 0){
-		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
-		if (mem_cfg_fd < 0)
-			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
-	}
-
-	retval = ftruncate(mem_cfg_fd, sizeof(*rte_config.mem_config));
-	if (retval < 0){
-		close(mem_cfg_fd);
-		rte_panic("Cannot resize '%s' for rte_mem_config\n", pathname);
-	}
-
-	retval = fcntl(mem_cfg_fd, F_SETLK, &wr_lock);
-	if (retval < 0){
-		close(mem_cfg_fd);
-		rte_exit(EXIT_FAILURE, "Cannot create lock on '%s'. Is another primary "
-				"process running?\n", pathname);
-	}
-
-	rte_mem_cfg_addr = mmap(rte_mem_cfg_addr, sizeof(*rte_config.mem_config),
-				PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
-
-	if (rte_mem_cfg_addr == MAP_FAILED){
-		rte_panic("Cannot mmap memory for rte_config\n");
-	}
-	memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
-	rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
-
-	/* store address of the config in the config itself so that secondary
-	 * processes could later map the config into this exact location */
-	rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+		mem_cfg_addr = NULL;
 
+	return mem_cfg_addr;
 }
 
 /* attach to an existing shared memory config */
-static void
+void
 rte_eal_config_attach(void)
 {
 	struct rte_mem_config *mem_config;
+	struct rte_config *rte_config;
+	int *mem_cfg_fd = eal_get_mem_cfg_fd();
 
 	const char *pathname = eal_runtime_config_path();
 
 	if (internal_config.no_shconf)
 		return;
 
-	if (mem_cfg_fd < 0){
-		mem_cfg_fd = open(pathname, O_RDWR);
-		if (mem_cfg_fd < 0)
+	rte_config = rte_eal_get_configuration();
+	if (rte_config == NULL)
+		return;
+
+	if (*mem_cfg_fd < 0) {
+		*mem_cfg_fd = open(pathname, O_RDWR);
+		if (*mem_cfg_fd < 0)
 			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
 	}
 
 	/* map it as read-only first */
 	mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
-			PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
+			PROT_READ, MAP_SHARED, *mem_cfg_fd, 0);
 	if (mem_config == MAP_FAILED)
 		rte_panic("Cannot mmap memory for rte_config\n");
 
-	rte_config.mem_config = mem_config;
+	rte_config->mem_config = mem_config;
 }
 
 /* reattach the shared config at exact memory location primary process has it */
-static void
+void
 rte_eal_config_reattach(void)
 {
 	struct rte_mem_config *mem_config;
 	void *rte_mem_cfg_addr;
+	struct rte_config *rte_config;
+	int *mem_cfg_fd = eal_get_mem_cfg_fd();
 
 	if (internal_config.no_shconf)
 		return;
 
+	rte_config = rte_eal_get_configuration();
+	if (rte_config == NULL)
+		return;
+
 	/* save the address primary process has mapped shared config to */
-	rte_mem_cfg_addr = (void *) (uintptr_t) rte_config.mem_config->mem_cfg_addr;
+	rte_mem_cfg_addr =
+		(void *) (uintptr_t) rte_config->mem_config->mem_cfg_addr;
 
 	/* unmap original config */
-	munmap(rte_config.mem_config, sizeof(struct rte_mem_config));
+	munmap(rte_config->mem_config, sizeof(struct rte_mem_config));
 
 	/* remap the config at proper address */
 	mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
-			sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
-			mem_cfg_fd, 0);
-	close(mem_cfg_fd);
+			sizeof(*mem_config), PROT_READ | PROT_WRITE,
+			MAP_SHARED, *mem_cfg_fd, 0);
+	close(*mem_cfg_fd);
 	if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
 		rte_panic("Cannot mmap memory for rte_config\n");
 
-	rte_config.mem_config = mem_config;
-}
-
-/* Detect if we are a primary or a secondary process */
-enum rte_proc_type_t
-eal_proc_type_detect(void)
-{
-	enum rte_proc_type_t ptype = RTE_PROC_PRIMARY;
-	const char *pathname = eal_runtime_config_path();
-
-	/* if we can open the file but not get a write-lock we are a secondary
-	 * process. NOTE: if we get a file handle back, we keep that open
-	 * and don't close it to prevent a race condition between multiple opens */
-	if (((mem_cfg_fd = open(pathname, O_RDWR)) >= 0) &&
-			(fcntl(mem_cfg_fd, F_SETLK, &wr_lock) < 0))
-		ptype = RTE_PROC_SECONDARY;
-
-	RTE_LOG(INFO, EAL, "Auto-detected process type: %s\n",
-			ptype == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
-
-	return ptype;
-}
-
-/* Sets up rte_config structure with the pointer to shared memory config.*/
-static void
-rte_config_init(void)
-{
-	rte_config.process_type = internal_config.process_type;
-
-	switch (rte_config.process_type){
-	case RTE_PROC_PRIMARY:
-		rte_eal_config_create();
-		break;
-	case RTE_PROC_SECONDARY:
-		rte_eal_config_attach();
-		rte_eal_mcfg_wait_complete(rte_config.mem_config);
-		rte_eal_config_reattach();
-		break;
-	case RTE_PROC_AUTO:
-	case RTE_PROC_INVALID:
-		rte_panic("Invalid process type\n");
-	}
+	rte_config->mem_config = mem_config;
 }
 
 /* Unlocks hugepage directories that were locked by eal_hugepage_info_init */
@@ -348,6 +211,9 @@ eal_hugedirs_unlock(void)
 static void
 eal_usage(const char *prgname)
 {
+	rte_usage_hook_t rte_application_usage_hook =
+		rte_get_application_usage_hook();
+
 	printf("\nUsage: %s ", prgname);
 	eal_common_usage();
 	printf("EAL Linux options:\n"
@@ -367,19 +233,6 @@ eal_usage(const char *prgname)
 	}
 }
 
-/* Set a per-application usage message */
-rte_usage_hook_t
-rte_set_application_usage_hook( rte_usage_hook_t usage_func )
-{
-	rte_usage_hook_t	old_func;
-
-	/* Will be NULL on the first call to denote the last usage routine. */
-	old_func					= rte_application_usage_hook;
-	rte_application_usage_hook	= usage_func;
-
-	return old_func;
-}
-
 static int
 eal_parse_socket_mem(char *socket_mem)
 {
@@ -481,24 +334,6 @@ eal_parse_vfio_intr(const char *mode)
 	return -1;
 }
 
-static inline size_t
-eal_get_hugepage_mem_size(void)
-{
-	uint64_t size = 0;
-	unsigned i, j;
-
-	for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
-		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
-		if (hpi->hugedir != NULL) {
-			for (j = 0; j < RTE_MAX_NUMA_NODES; j++) {
-				size += hpi->hugepage_sz * hpi->num_pages[j];
-			}
-		}
-	}
-
-	return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
-}
-
 /* Parse the argument given in the command line of the application */
 static int
 eal_parse_args(int argc, char **argv)
@@ -645,39 +480,6 @@ eal_parse_args(int argc, char **argv)
 	return ret;
 }
 
-static void
-eal_check_mem_on_local_socket(void)
-{
-	const struct rte_memseg *ms;
-	int i, socket_id;
-
-	socket_id = rte_lcore_to_socket_id(rte_config.master_lcore);
-
-	ms = rte_eal_get_physmem_layout();
-
-	for (i = 0; i < RTE_MAX_MEMSEG; i++)
-		if (ms[i].socket_id == socket_id &&
-				ms[i].len > 0)
-			return;
-
-	RTE_LOG(WARNING, EAL, "WARNING: Master core has no "
-			"memory on local socket!\n");
-}
-
-static int
-sync_func(__attribute__((unused)) void *arg)
-{
-	return 0;
-}
-
-inline static void
-rte_eal_mcfg_complete(void)
-{
-	/* ALL shared mem_config related INIT DONE */
-	if (rte_config.process_type == RTE_PROC_PRIMARY)
-		rte_config.mem_config->magic = RTE_MAGIC;
-}
-
 /*
  * Request iopl privilege for all RPL, returns 0 on success
  * iopl() call is mostly for the i386 architecture. For other architectures,
@@ -706,6 +508,12 @@ rte_eal_init(int argc, char **argv)
 	const char *logid;
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 
+	struct rte_config *rte_config;
+
+	rte_config = rte_eal_get_configuration();
+	if (rte_config == NULL)
+		return -1;
+
 	if (!rte_atomic32_test_and_set(&run_once))
 		return -1;
 
@@ -803,12 +611,12 @@ rte_eal_init(int argc, char **argv)
 			RTE_LOG(WARNING, EAL, "%s\n", dlerror());
 	}
 
-	eal_thread_init_master(rte_config.master_lcore);
+	eal_thread_init_master(rte_config->master_lcore);
 
 	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
 
 	RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%x;cpuset=[%s%s])\n",
-		rte_config.master_lcore, (int)thread_id, cpuset,
+		rte_config->master_lcore, (int)thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
 	if (rte_eal_dev_init() < 0)
@@ -848,24 +656,6 @@ rte_eal_init(int argc, char **argv)
 	return fctret;
 }
 
-/* get core role */
-enum rte_lcore_role_t
-rte_eal_lcore_role(unsigned lcore_id)
-{
-	return (rte_config.lcore_role[lcore_id]);
-}
-
-enum rte_proc_type_t
-rte_eal_process_type(void)
-{
-	return (rte_config.process_type);
-}
-
-int rte_eal_has_hugepages(void)
-{
-	return ! internal_config.no_hugetlbfs;
-}
-
 int
 rte_eal_check_module(const char *module_name)
 {
-- 
1.9.1

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v8 1/6] Move common functions in eal_thread.c
  2015-04-28 23:46  4% [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL Ravi Kerur
@ 2015-04-28 23:46  2% ` Ravi Kerur
  2015-04-28 23:46  1%   ` [dpdk-dev] [PATCH v8 2/6] Move common functions in eal.c Ravi Kerur
  2015-04-29 10:14  0% ` [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL Neil Horman
  1 sibling, 1 reply; 200+ results
From: Ravi Kerur @ 2015-04-28 23:46 UTC (permalink / raw)
  To: dev

Changes in v8
Fixed ABI warnings by reordering compilation of
eal_common_thread.c.

Changes in v7
Remove _setname_ pthread calls.
Use rte_gettid() API in RTE_LOG to print thread_id.

Changes in v6
Remove RTE_EXEC_ENV_BSDAPP from eal_common_thread.c file.
Add pthread_setname_np/pthread_set_name_np for Linux/FreeBSD
respectively. Plan to use _getname_ in RTE_LOG when available.
Use existing rte_get_systid() in RTE_LOG to print thread_id.

Changes in v5
Rebase to latest code.

Changes in v4
None

Changes in v3
Changed subject to be more explicit on file name inclusion.

Changes in v2
None

Changes in v1
eal_thread.c has minor differences between Linux and BSD, move
entire file into common directory.
Use RTE_EXEC_ENV_BSDAPP to differentiate on minor differences.
Rename eal_thread.c to eal_common_thread.c
Makefile changes to reflect file move and name change.
Fix checkpatch warnings.

Signed-off-by: Ravi Kerur <rkerur@gmail.com>
---
 lib/librte_eal/bsdapp/eal/Makefile        |   3 +-
 lib/librte_eal/bsdapp/eal/eal_thread.c    | 152 ------------------------------
 lib/librte_eal/common/eal_common_thread.c | 147 ++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile      |   3 +-
 lib/librte_eal/linuxapp/eal/eal_thread.c  | 152 +-----------------------------
 5 files changed, 151 insertions(+), 306 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 2357cfa..b7ca47c 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -55,6 +55,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) := eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_memory.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_hugepage_info.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_thread.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_debug.c
@@ -77,7 +78,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_hexdump.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_devargs.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_options.c
-SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_thread.c
 
 CFLAGS_eal.o := -D_GNU_SOURCE
 #CFLAGS_eal_thread.o := -D_GNU_SOURCE
@@ -88,6 +88,7 @@ CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
+CFLAGS_eal_common_thread.o += -Wno-return-type
 CFLAGS_eal_hpet.o += -Wno-return-type
 endif
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 9a03437..5714b8f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -35,163 +35,11 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdint.h>
-#include <unistd.h>
-#include <sched.h>
-#include <pthread_np.h>
-#include <sys/queue.h>
 #include <sys/thr.h>
 
-#include <rte_debug.h>
-#include <rte_atomic.h>
-#include <rte_launch.h>
-#include <rte_log.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_per_lcore.h>
-#include <rte_eal.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
-
 #include "eal_private.h"
 #include "eal_thread.h"
 
-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY;
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
-RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
-
-/*
- * Send a message to a slave lcore identified by slave_id to call a
- * function f with argument arg. Once the execution is done, the
- * remote lcore switch in FINISHED state.
- */
-int
-rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
-{
-	int n;
-	char c = 0;
-	int m2s = lcore_config[slave_id].pipe_master2slave[1];
-	int s2m = lcore_config[slave_id].pipe_slave2master[0];
-
-	if (lcore_config[slave_id].state != WAIT)
-		return -EBUSY;
-
-	lcore_config[slave_id].f = f;
-	lcore_config[slave_id].arg = arg;
-
-	/* send message */
-	n = 0;
-	while (n == 0 || (n < 0 && errno == EINTR))
-		n = write(m2s, &c, 1);
-	if (n < 0)
-		rte_panic("cannot write on configuration pipe\n");
-
-	/* wait ack */
-	do {
-		n = read(s2m, &c, 1);
-	} while (n < 0 && errno == EINTR);
-
-	if (n <= 0)
-		rte_panic("cannot read on configuration pipe\n");
-
-	return 0;
-}
-
-/* set affinity for current thread */
-static int
-eal_thread_set_affinity(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-
-	/* acquire system unique id  */
-	rte_gettid();
-
-	/* update EAL thread core affinity */
-	return rte_thread_set_affinity(&lcore_config[lcore_id].cpuset);
-}
-
-void eal_thread_init_master(unsigned lcore_id)
-{
-	/* set the lcore ID in per-lcore memory area */
-	RTE_PER_LCORE(_lcore_id) = lcore_id;
-
-	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
-}
-
-/* main loop of threads */
-__attribute__((noreturn)) void *
-eal_thread_loop(__attribute__((unused)) void *arg)
-{
-	char c;
-	int n, ret;
-	unsigned lcore_id;
-	pthread_t thread_id;
-	int m2s, s2m;
-	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
-
-	thread_id = pthread_self();
-
-	/* retrieve our lcore_id from the configuration structure */
-	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (thread_id == lcore_config[lcore_id].thread_id)
-			break;
-	}
-	if (lcore_id == RTE_MAX_LCORE)
-		rte_panic("cannot retrieve lcore id\n");
-
-	m2s = lcore_config[lcore_id].pipe_master2slave[0];
-	s2m = lcore_config[lcore_id].pipe_slave2master[1];
-
-	/* set the lcore ID in per-lcore memory area */
-	RTE_PER_LCORE(_lcore_id) = lcore_id;
-
-	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
-
-	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
-
-	RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%p;cpuset=[%s%s])\n",
-		lcore_id, thread_id, cpuset, ret == 0 ? "" : "...");
-
-	/* read on our pipe to get commands */
-	while (1) {
-		void *fct_arg;
-
-		/* wait command */
-		do {
-			n = read(m2s, &c, 1);
-		} while (n < 0 && errno == EINTR);
-
-		if (n <= 0)
-			rte_panic("cannot read on configuration pipe\n");
-
-		lcore_config[lcore_id].state = RUNNING;
-
-		/* send ack */
-		n = 0;
-		while (n == 0 || (n < 0 && errno == EINTR))
-			n = write(s2m, &c, 1);
-		if (n < 0)
-			rte_panic("cannot write on configuration pipe\n");
-
-		if (lcore_config[lcore_id].f == NULL)
-			rte_panic("NULL function pointer\n");
-
-		/* call the function and store the return value */
-		fct_arg = lcore_config[lcore_id].arg;
-		ret = lcore_config[lcore_id].f(fct_arg);
-		lcore_config[lcore_id].ret = ret;
-		rte_wmb();
-		lcore_config[lcore_id].state = FINISHED;
-	}
-
-	/* never reached */
-	/* pthread_exit(NULL); */
-	/* return NULL; */
-}
-
 /* require calling thread tid by gettid() */
 int rte_sys_gettid(void)
 {
diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 2405e93..5e55401 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -31,11 +31,12 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdint.h>
 #include <unistd.h>
-#include <pthread.h>
+#include <sys/queue.h>
 #include <sched.h>
 #include <assert.h>
 #include <string.h>
@@ -43,10 +44,21 @@
 #include <rte_lcore.h>
 #include <rte_memory.h>
 #include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_atomic.h>
+#include <rte_launch.h>
+#include <rte_memzone.h>
+#include <rte_per_lcore.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
 
+#include "eal_private.h"
 #include "eal_thread.h"
 
 RTE_DECLARE_PER_LCORE(unsigned , _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
+RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
 
 unsigned rte_socket_id(void)
 {
@@ -155,3 +167,136 @@ exit:
 
 	return ret;
 }
+
+/*
+ * Send a message to a slave lcore identified by slave_id to call a
+ * function f with argument arg. Once the execution is done, the
+ * remote lcore switch in FINISHED state.
+ */
+int
+rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
+{
+	int n;
+	char c = 0;
+	int m2s = lcore_config[slave_id].pipe_master2slave[1];
+	int s2m = lcore_config[slave_id].pipe_slave2master[0];
+
+	if (lcore_config[slave_id].state != WAIT)
+		return -EBUSY;
+
+	lcore_config[slave_id].f = f;
+	lcore_config[slave_id].arg = arg;
+
+	/* send message */
+	n = 0;
+	while (n == 0 || (n < 0 && errno == EINTR))
+		n = write(m2s, &c, 1);
+	if (n < 0)
+		rte_panic("cannot write on configuration pipe\n");
+
+	/* wait ack */
+	do {
+		n = read(s2m, &c, 1);
+	} while (n < 0 && errno == EINTR);
+
+	if (n <= 0)
+		rte_panic("cannot read on configuration pipe\n");
+
+	return 0;
+}
+
+/* set affinity for current EAL thread */
+static int
+eal_thread_set_affinity(void)
+{
+	unsigned lcore_id = rte_lcore_id();
+
+	/* acquire system unique id  */
+	rte_gettid();
+
+	/* update EAL thread core affinity */
+	return rte_thread_set_affinity(&lcore_config[lcore_id].cpuset);
+}
+
+void eal_thread_init_master(unsigned lcore_id)
+{
+	/* set the lcore ID in per-lcore memory area */
+	RTE_PER_LCORE(_lcore_id) = lcore_id;
+
+	/* set CPU affinity */
+	if (eal_thread_set_affinity() < 0)
+		rte_panic("cannot set affinity\n");
+}
+
+/* main loop of threads */
+__attribute__((noreturn)) void *
+eal_thread_loop(__attribute__((unused)) void *arg)
+{
+	char c;
+	int n, ret;
+	unsigned lcore_id;
+	pthread_t thread_id;
+	int m2s, s2m;
+	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
+
+	thread_id = pthread_self();
+
+	/* retrieve our lcore_id from the configuration structure */
+	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+		if (thread_id == lcore_config[lcore_id].thread_id)
+			break;
+	}
+	if (lcore_id == RTE_MAX_LCORE)
+		rte_panic("cannot retrieve lcore id\n");
+
+	m2s = lcore_config[lcore_id].pipe_master2slave[0];
+	s2m = lcore_config[lcore_id].pipe_slave2master[1];
+
+	/* set the lcore ID in per-lcore memory area */
+	RTE_PER_LCORE(_lcore_id) = lcore_id;
+
+	/* set CPU affinity */
+	if (eal_thread_set_affinity() < 0)
+		rte_panic("cannot set affinity\n");
+
+	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
+
+	RTE_LOG(DEBUG, EAL, "lcore %u is ready (thread=%d;cpuset=[%s%s])\n",
+		lcore_id, rte_gettid(), cpuset, ret == 0 ? "" : "...");
+
+	/* read on our pipe to get commands */
+	while (1) {
+		void *fct_arg;
+
+		/* wait command */
+		do {
+			n = read(m2s, &c, 1);
+		} while (n < 0 && errno == EINTR);
+
+		if (n <= 0)
+			rte_panic("cannot read on configuration pipe\n");
+
+		lcore_config[lcore_id].state = RUNNING;
+
+		/* send ack */
+		n = 0;
+		while (n == 0 || (n < 0 && errno == EINTR))
+			n = write(s2m, &c, 1);
+		if (n < 0)
+			rte_panic("cannot write on configuration pipe\n");
+
+		if (lcore_config[lcore_id].f == NULL)
+			rte_panic("NULL function pointer\n");
+
+		/* call the function and store the return value */
+		fct_arg = lcore_config[lcore_id].arg;
+		ret = lcore_config[lcore_id].f(fct_arg);
+		lcore_config[lcore_id].ret = ret;
+		rte_wmb();
+		lcore_config[lcore_id].state = FINISHED;
+	}
+
+	/* never reached */
+	/* pthread_exit(NULL); */
+	/* return NULL; */
+}
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 01f7b70..f11ef59 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -61,6 +61,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_xen_memory.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
@@ -89,7 +90,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_hexdump.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_devargs.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_options.c
-SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_thread.c
 
 CFLAGS_eal.o := -D_GNU_SOURCE
 CFLAGS_eal_interrupts.o := -D_GNU_SOURCE
@@ -109,6 +109,7 @@ CFLAGS_eal_common_thread.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
+CFLAGS_eal_common_thread.o += -Wno-return-type
 endif
 
 INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 18bd8e0..51dca37 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -34,163 +34,13 @@
 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <stdint.h>
 #include <unistd.h>
-#include <pthread.h>
-#include <sched.h>
-#include <sys/queue.h>
 #include <sys/syscall.h>
 
-#include <rte_debug.h>
-#include <rte_atomic.h>
-#include <rte_launch.h>
-#include <rte_log.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_per_lcore.h>
-#include <rte_eal.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
-
 #include "eal_private.h"
 #include "eal_thread.h"
 
-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY;
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
-RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
-
-/*
- * Send a message to a slave lcore identified by slave_id to call a
- * function f with argument arg. Once the execution is done, the
- * remote lcore switch in FINISHED state.
- */
-int
-rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id)
-{
-	int n;
-	char c = 0;
-	int m2s = lcore_config[slave_id].pipe_master2slave[1];
-	int s2m = lcore_config[slave_id].pipe_slave2master[0];
-
-	if (lcore_config[slave_id].state != WAIT)
-		return -EBUSY;
-
-	lcore_config[slave_id].f = f;
-	lcore_config[slave_id].arg = arg;
-
-	/* send message */
-	n = 0;
-	while (n == 0 || (n < 0 && errno == EINTR))
-		n = write(m2s, &c, 1);
-	if (n < 0)
-		rte_panic("cannot write on configuration pipe\n");
-
-	/* wait ack */
-	do {
-		n = read(s2m, &c, 1);
-	} while (n < 0 && errno == EINTR);
-
-	if (n <= 0)
-		rte_panic("cannot read on configuration pipe\n");
-
-	return 0;
-}
-
-/* set affinity for current EAL thread */
-static int
-eal_thread_set_affinity(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-
-	/* acquire system unique id  */
-	rte_gettid();
-
-	/* update EAL thread core affinity */
-	return rte_thread_set_affinity(&lcore_config[lcore_id].cpuset);
-}
-
-void eal_thread_init_master(unsigned lcore_id)
-{
-	/* set the lcore ID in per-lcore memory area */
-	RTE_PER_LCORE(_lcore_id) = lcore_id;
-
-	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
-}
-
-/* main loop of threads */
-__attribute__((noreturn)) void *
-eal_thread_loop(__attribute__((unused)) void *arg)
-{
-	char c;
-	int n, ret;
-	unsigned lcore_id;
-	pthread_t thread_id;
-	int m2s, s2m;
-	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
-
-	thread_id = pthread_self();
-
-	/* retrieve our lcore_id from the configuration structure */
-	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (thread_id == lcore_config[lcore_id].thread_id)
-			break;
-	}
-	if (lcore_id == RTE_MAX_LCORE)
-		rte_panic("cannot retrieve lcore id\n");
-
-	m2s = lcore_config[lcore_id].pipe_master2slave[0];
-	s2m = lcore_config[lcore_id].pipe_slave2master[1];
-
-	/* set the lcore ID in per-lcore memory area */
-	RTE_PER_LCORE(_lcore_id) = lcore_id;
-
-	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
-
-	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
-
-	RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%x;cpuset=[%s%s])\n",
-		lcore_id, (int)thread_id, cpuset, ret == 0 ? "" : "...");
-
-	/* read on our pipe to get commands */
-	while (1) {
-		void *fct_arg;
-
-		/* wait command */
-		do {
-			n = read(m2s, &c, 1);
-		} while (n < 0 && errno == EINTR);
-
-		if (n <= 0)
-			rte_panic("cannot read on configuration pipe\n");
-
-		lcore_config[lcore_id].state = RUNNING;
-
-		/* send ack */
-		n = 0;
-		while (n == 0 || (n < 0 && errno == EINTR))
-			n = write(s2m, &c, 1);
-		if (n < 0)
-			rte_panic("cannot write on configuration pipe\n");
-
-		if (lcore_config[lcore_id].f == NULL)
-			rte_panic("NULL function pointer\n");
-
-		/* call the function and store the return value */
-		fct_arg = lcore_config[lcore_id].arg;
-		ret = lcore_config[lcore_id].f(fct_arg);
-		lcore_config[lcore_id].ret = ret;
-		rte_wmb();
-		lcore_config[lcore_id].state = FINISHED;
-	}
-
-	/* never reached */
-	/* pthread_exit(NULL); */
-	/* return NULL; */
-}
+#include <rte_log.h>
 
 /* require calling thread tid by gettid() */
 int rte_sys_gettid(void)
-- 
1.9.1

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL
@ 2015-04-28 23:46  4% Ravi Kerur
  2015-04-28 23:46  2% ` [dpdk-dev] [PATCH v8 1/6] Move common functions in eal_thread.c Ravi Kerur
  2015-04-29 10:14  0% ` [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL Neil Horman
  0 siblings, 2 replies; 200+ results
From: Ravi Kerur @ 2015-04-28 23:46 UTC (permalink / raw)
  To: dev

Changes in v8 includes
Re-ordering source file compilation to fix ABI warning.
Ran validate-abi against x86_64-native-linuxapp-gcc,
x86_64-native-linuxapp-clang and x86_64-ivshmem-linuxapp-gcc
environments.

Testing:
Linux - Ubuntu x86_64 14.04
Compilation successful (x86_64-native-linuxapp-gcc and
x86_64-native-linuxapp-clang).
"make test" results match baseline code.
testpmd utility on I217/I218 Intel chipset.

FreeBSD 10.0 x86_64
Compilation successful (x86_64-native-bsdapp-gcc and
x86_64-native-bsdapp-clang).
Tested with helloworld, timer and cmdline examples.

Ravi Kerur (6):
  Move common functions in eal_thread.c
  Move common functions in eal.c
  Move common functions in eal_lcore.c
  Move common functions in eal_timer.c
  Move common functions in eal_memory.c
  Move common functions in eal_pci.c

 lib/librte_eal/bsdapp/eal/Makefile           |   9 +-
 lib/librte_eal/bsdapp/eal/eal.c              | 271 +++---------------------
 lib/librte_eal/bsdapp/eal/eal_lcore.c        |  72 ++-----
 lib/librte_eal/bsdapp/eal/eal_memory.c       |  47 ++---
 lib/librte_eal/bsdapp/eal/eal_pci.c          |  72 +------
 lib/librte_eal/bsdapp/eal/eal_thread.c       | 152 --------------
 lib/librte_eal/bsdapp/eal/eal_timer.c        |  52 +----
 lib/librte_eal/common/eal_common_app_usage.c |  63 ++++++
 lib/librte_eal/common/eal_common_lcore.c     | 107 ++++++++++
 lib/librte_eal/common/eal_common_mem_cfg.c   | 224 ++++++++++++++++++++
 lib/librte_eal/common/eal_common_memory.c    |  38 +++-
 lib/librte_eal/common/eal_common_pci.c       |  72 +++++++
 lib/librte_eal/common/eal_common_proc_type.c |  58 ++++++
 lib/librte_eal/common/eal_common_sysfs.c     | 148 ++++++++++++++
 lib/librte_eal/common/eal_common_thread.c    | 147 ++++++++++++-
 lib/librte_eal/common/eal_common_timer.c     | 102 +++++++++
 lib/librte_eal/common/eal_hugepages.h        |   1 +
 lib/librte_eal/common/eal_private.h          | 171 +++++++++++++++-
 lib/librte_eal/common/include/rte_eal.h      |   4 +
 lib/librte_eal/linuxapp/eal/Makefile         |  10 +-
 lib/librte_eal/linuxapp/eal/eal.c            | 296 ++++-----------------------
 lib/librte_eal/linuxapp/eal/eal_lcore.c      |  66 +-----
 lib/librte_eal/linuxapp/eal/eal_memory.c     |  36 +---
 lib/librte_eal/linuxapp/eal/eal_pci.c        |  75 +------
 lib/librte_eal/linuxapp/eal/eal_thread.c     | 152 +-------------
 lib/librte_eal/linuxapp/eal/eal_timer.c      |  55 +----
 26 files changed, 1277 insertions(+), 1223 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_app_usage.c
 create mode 100644 lib/librte_eal/common/eal_common_lcore.c
 create mode 100644 lib/librte_eal/common/eal_common_mem_cfg.c
 create mode 100644 lib/librte_eal/common/eal_common_proc_type.c
 create mode 100644 lib/librte_eal/common/eal_common_sysfs.c
 create mode 100644 lib/librte_eal/common/eal_common_timer.c

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-27 22:39  3%                     ` Ravi Kerur
@ 2015-04-28 19:35  0%                       ` Neil Horman
  2015-04-28 23:52  4%                         ` Ravi Kerur
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-04-28 19:35 UTC (permalink / raw)
  To: Ravi Kerur; +Cc: dev

On Mon, Apr 27, 2015 at 03:39:41PM -0700, Ravi Kerur wrote:
> On Mon, Apr 27, 2015 at 6:44 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Sat, Apr 25, 2015 at 05:09:01PM -0700, Ravi Kerur wrote:
> > > On Sat, Apr 25, 2015 at 6:02 AM, Neil Horman <nhorman@tuxdriver.com>
> > wrote:
> > >
> > > > On Sat, Apr 25, 2015 at 08:32:42AM -0400, Neil Horman wrote:
> > > > > On Fri, Apr 24, 2015 at 06:45:06PM -0700, Ravi Kerur wrote:
> > > > > > On Fri, Apr 24, 2015 at 2:24 PM, Ravi Kerur <rkerur@gmail.com>
> > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 24, 2015 at 12:51 PM, Neil Horman <
> > nhorman@tuxdriver.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> On Fri, Apr 24, 2015 at 12:21:23PM -0700, Ravi Kerur wrote:
> > > > > > >> > On Fri, Apr 24, 2015 at 11:53 AM, Neil Horman <
> > > > nhorman@tuxdriver.com>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > > On Fri, Apr 24, 2015 at 09:45:24AM -0700, Ravi Kerur wrote:
> > > > > > >> > > > On Fri, Apr 24, 2015 at 8:22 AM, Neil Horman <
> > > > nhorman@tuxdriver.com
> > > > > > >> >
> > > > > > >> > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > On Fri, Apr 24, 2015 at 08:14:04AM -0700, Ravi Kerur
> > wrote:
> > > > > > >> > > > > > On Fri, Apr 24, 2015 at 6:51 AM, Neil Horman <
> > > > > > >> nhorman@tuxdriver.com>
> > > > > > >> > > > > wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > > > On Thu, Apr 23, 2015 at 02:35:31PM -0700, Ravi Kerur
> > > > wrote:
> > > > > > >> > > > > > > > Changes in v7
> > > > > > >> > > > > > > > Remove _setname_ pthread calls.
> > > > > > >> > > > > > > > Use rte_gettid() API in RTE_LOG to print
> > thread_id.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Changes in v6
> > > > > > >> > > > > > > > Remove RTE_EXEC_ENV_BSDAPP from
> > eal_common_thread.c
> > > > file.
> > > > > > >> > > > > > > > Add pthread_setname_np/pthread_set_name_np for
> > > > Linux/FreeBSD
> > > > > > >> > > > > > > > respectively. Plan to use _getname_ in RTE_LOG
> > when
> > > > > > >> available.
> > > > > > >> > > > > > > > Use existing rte_get_systid() in RTE_LOG to print
> > > > thread_id.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Changes in v5
> > > > > > >> > > > > > > > Rebase to latest code.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Changes in v4
> > > > > > >> > > > > > > > None
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Changes in v3
> > > > > > >> > > > > > > > Changed subject to be more explicit on file name
> > > > inclusion.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Changes in v2
> > > > > > >> > > > > > > > None
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Changes in v1
> > > > > > >> > > > > > > > eal_thread.c has minor differences between Linux
> > and
> > > > BSD,
> > > > > > >> move
> > > > > > >> > > > > > > > entire file into common directory.
> > > > > > >> > > > > > > > Use RTE_EXEC_ENV_BSDAPP to differentiate on minor
> > > > > > >> differences.
> > > > > > >> > > > > > > > Rename eal_thread.c to eal_common_thread.c
> > > > > > >> > > > > > > > Makefile changes to reflect file move and name
> > change.
> > > > > > >> > > > > > > > Fix checkpatch warnings.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Signed-off-by: Ravi Kerur <rkerur@gmail.com>
> > > > > > >> > > > > > > > ---
> > > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/Makefile        |   2
> > +-
> > > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/eal_thread.c    | 152
> > > > > > >> > > > > > > ------------------------------
> > > > > > >> > > > > > > >  lib/librte_eal/common/eal_common_thread.c | 147
> > > > > > >> > > > > > > ++++++++++++++++++++++++++++-
> > > > > > >> > > > > > > >  lib/librte_eal/linuxapp/eal/eal_thread.c  | 152
> > > > > > >> > > > > > > +-----------------------------
> > > > > > >> > > > > > > >  4 files changed, 148 insertions(+), 305
> > deletions(-)
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > diff --git a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > >> > > > > > > > index 2357cfa..55971b9 100644
> > > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > > >> > > > > > > > @@ -87,7 +87,7 @@ CFLAGS_eal_common_log.o :=
> > > > -D_GNU_SOURCE
> > > > > > >> > > > > > > >  # workaround for a gcc bug with noreturn
> > attribute
> > > > > > >> > > > > > > >  #
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> > > > > > >> > > > > > > >  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > > > > > >> > > > > > > > -CFLAGS_eal_thread.o += -Wno-return-type
> > > > > > >> > > > > > > > +CFLAGS_eal_common_thread.o += -Wno-return-type
> > > > > > >> > > > > > > >  CFLAGS_eal_hpet.o += -Wno-return-type
> > > > > > >> > > > > > > >  endif
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > diff --git
> > a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > >> > > > > > > > index 9a03437..5714b8f 100644
> > > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > > >> > > > > > > > @@ -35,163 +35,11 @@
> > > > > > >> > > > > > > >  #include <stdio.h>
> > > > > > >> > > > > > > >  #include <stdlib.h>
> > > > > > >> > > > > > > >  #include <stdint.h>
> > > > > > >> > > > > > > > -#include <unistd.h>
> > > > > > >> > > > > > > > -#include <sched.h>
> > > > > > >> > > > > > > > -#include <pthread_np.h>
> > > > > > >> > > > > > > > -#include <sys/queue.h>
> > > > > > >> > > > > > > >  #include <sys/thr.h>
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > -#include <rte_debug.h>
> > > > > > >> > > > > > > > -#include <rte_atomic.h>
> > > > > > >> > > > > > > > -#include <rte_launch.h>
> > > > > > >> > > > > > > > -#include <rte_log.h>
> > > > > > >> > > > > > > > -#include <rte_memory.h>
> > > > > > >> > > > > > > > -#include <rte_memzone.h>
> > > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > > >> > > > > > > > -#include <rte_eal.h>
> > > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > > >> > > > > > > > -#include <rte_lcore.h>
> > > > > > >> > > > > > > > -
> > > > > > >> > > > > > > >  #include "eal_private.h"
> > > > > > >> > > > > > > >  #include "eal_thread.h"
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > > > LCORE_ID_ANY;
> > > > > > >> > > > > > > NAK, these are exported symbols, you can't remove
> > them
> > > > without
> > > > > > >> > > going
> > > > > > >> > > > > > > through the
> > > > > > >> > > > > > > deprecation process.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > > They are not removed/deleted, they are moved from
> > > > eal_thread.c
> > > > > > >> to
> > > > > > >> > > > > > eal_common_thread.c file since it is common to both
> > Linux
> > > > and
> > > > > > >> BSD.
> > > > > > >> > > > > >
> > > > > > >> > > > > Then perhaps you forgot to export the symbol?  Its
> > showing
> > > > up as
> > > > > > >> > > removed
> > > > > > >> > > > > on the
> > > > > > >> > > > > ABI checker utility.
> > > > > > >> > > > >
> > > > > > >> > > > > Neil
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > > Can you please show me in the current code where it is
> > being
> > > > > > >> exported? I
> > > > > > >> > > > have only moved definitions to _common_ files, not sure
> > why it
> > > > > > >> should be
> > > > > > >> > > > exported now.  I searched in the current code for
> > > > > > >> RTE_DEFINE_PER_LCORE
> > > > > > >> > > >
> > > > > > >> > > > #home/rkerur/dpdk-tmp/dpdk# grep -ir RTE_DEFINE_PER_LCORE
> > *
> > > > > > >> > > > app/test/test_per_lcore.c:static
> > > > RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > >> test) =
> > > > > > >> > > > 0x12345678;
> > > > > > >> > > >
> > > > > > >>
> > > > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > > >> > > >
> > > > > > >>
> > > > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >>
> > > >
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > > >> > > > _cpuset);
> > > > > > >> > > >
> > > > > > >>
> > > > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > > >> > > >
> > > > > > >>
> > > > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > > >> > > >
> > > > > > >>
> > > >
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > > >> > > > _cpuset);
> > > > > > >> > > > lib/librte_eal/common/include/rte_per_lcore.h:#define
> > > > > > >> > > > RTE_DEFINE_PER_LCORE(type, name)            \
> > > > > > >> > > > lib/librte_eal/common/include/rte_eal.h:    static
> > > > > > >> > > > RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
> > > > > > >> > > >
> > > > lib/librte_eal/common/eal_common_errno.c:RTE_DEFINE_PER_LCORE(int,
> > > > > > >> > > > _rte_errno);
> > > > > > >> > > > lib/librte_eal/common/eal_common_errno.c:    static
> > > > > > >> > > > RTE_DEFINE_PER_LCORE(char[RETVAL_SZ], retval);
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > > > Thanks
> > > > > > >> > > > > > Ravi
> > > > > > >> > > > > >
> > > > > > >> > > > > > Regards
> > > > > > >> > > > > > > Neil
> > > > > > >> > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > >
> > > > > > >> > > Its exported in the version map file:
> > > > > > >> > >  per_lcore__lcore_id;
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > Thanks Neil, I checked and both linux and bsd
> > rte_eal_version.map
> > > > have
> > > > > > >> it.
> > > > > > >> > I compared .map file between "changed code" and the original,
> > > > they are
> > > > > > >> same
> > > > > > >> > for both linux and bsd. In fact you had ACK'd v4 version of
> > this
> > > > patch
> > > > > > >> > series and no major changes after that. Please let me know if
> > I
> > > > missed
> > > > > > >> > something.
> > > > > > >> >
> > > > > > >> I did, and I'm retracting that, because I didn't think to check
> > the
> > > > ABI
> > > > > > >> compatibility on this.  But I ran it throught the ABI checking
> > > > script
> > > > > > >> this and
> > > > > > >> this error popped out.  You should run it as well, its in the
> > > > scripts
> > > > > > >> directory.
> > > > > > >>
> > > > > > >>
> > > > > > >> I see in your first patch you removed it and re-added it in the
> > > > common
> > > > > > >> section.
> > > > > > >> But something about how its building is causing it to not show
> > up
> > > > as an
> > > > > > >> exported
> > > > > > >> symbol, which is problematic, as other applications are going to
> > > > want
> > > > > > >> access to
> > > > > > >> it.
> > > > > > >>
> > > > > > >> It also possible that the ABI checker is throwing a false
> > positive,
> > > > but
> > > > > > >> either
> > > > > > >> way, it needs to be looked into prior to moving forward with
> > this.
> > > > > > >>
> > > > > > >>
> > > > > > > I did following things.
> > > > > > >
> > > > > > > Put a tag (v2.0.0-before-common-eal)  before EAL common functions
> > > > changes
> > > > > > > for commit (3c0c807038ad642f4be7deb9370293c39d12f029 net: remove
> > > > unneeded
> > > > > > > include)
> > > > > > >
> > > > > > > Put a tag (v2.0.0-common-eal) after EAL common functions changes
> > for
> > > > > > > commit (25737e5a7212630a7b5d8ca756860a062f403789 Move common
> > > > functions in
> > > > > > > eal_pci.c)
> > > > > > >
> > > > > > > Ran validate-abi against x86_64-native-linuxapp-gcc and
> > > > > > >
> > > > > > > v2.0.0-rc3 and v2.0.0-before-common-eal, html report for
> > > > librte_eal.so
> > > > > > > shows removed symbols for "per_lcore__cpuset"
> > > > > > >
> > > > > > > v2.0.0-rc3 and v2.0.0-common-eal, html report for librte_eal.so
> > shows
> > > > > > > removed symbols for "per_lcore__cpuset"
> > > > > > >
> > > > > > > Removed symbol is different from what you have reported and in my
> > > > case I
> > > > > > > see it even before my commit. If you are interested I can unicast
> > > > you html
> > > > > > > report file. Please let me know how to proceed.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I did some experiment and found some interesting things.  I will
> > take
> > > > eal.c
> > > > > > as an example
> > > > > >
> > > > > > eal.c is split into eal_common_sysfs.c eal_common_mem_cfg.c
> > > > > > eal_common_proc_type.c and eal_common_app_usage.c. In
> > > > linuxapp/eal/Makefile
> > > > > > if I compile new files right after eal.c as shown below
> > > > > >
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) := eal.c
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_sysfs.c
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_mem_cfg.c
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_proc_type.c
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_app_usage.c
> > > > > > ...
> > > > > >
> > > > > > validate-abi results matches baseline. Instead if i place new
> > _common_
> > > > > > files in common area in linuxapp/eal/Makefile as shown below
> > > > > >
> > > > > > # from common dir
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_memzone.c
> > > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_log.c
> > > > > > ...
> > > > > >
> > > > > > validate-abi reports problem in binary compatibility and source
> > > > > > compatiblity
> > > > > >
> > > > > > eal_filesystem.h, librte_eal.so.1
> > > > > >  [+] eal_parse_sysfs_value ( char const* filename, unsigned long*
> > val )
> > > > > >  @@ DPDK_2.0 (2)
> > > > > >
> > > > > > I believe files in common and linuxapp directory are compiled same
> > way
> > > > so
> > > > > > not sure why placement in makefile makes difference.
> > > > > >
> > > > > > Could this be false-positive from validate-abi script??
> > > > > >
> > > > > It could be, yes.  Though I'm more inclined to think that perhaps in
> > the
> > > > new
> > > > > version of the code we're not generating ithe same dwarf information
> > out
> > > > of it.
> > > > > In fact for some reason, I've checked both the build before and after
> > > > your
> > > > > patch series, and the exported CFLAGS aren't getting passed to the
> > build
> > > > > properly, implying that we're not building all the code in the
> > validator
> > > > with
> > > > > the -g flag, which the validator need to function properly.  I'm
> > looking
> > > > into
> > > > > that
> > > > > Neil
> > > > >
> > > > >
> > > > Found the problem, I was stupidly reading the report incorrectly.  The
> > > > problem
> > > > regarding _lcore_id is a source compatibilty issue (because the symbol
> > > > moved to
> > > > a new location), which is irrelevant to us.  Its not in any way a
> > binary
> > > > compat
> > > > problem, which is what we care about.  Sorry for the noise.
> > > >
> > > > I do still have a few concerns about some changed calling conventions
> > with
> > > > a few
> > > > other functions, which I'll look into on monday.
> > > >
> > > >
> > > Please let me know your inputs on changed calling conventions. Most of
> > them
> > > can be fixed by re-arranging moved code in _common_ files and order of
> > > compilation.
> > >
> > If moving the order of compliation around fixes the problem, then I am
> > reasonably convinced that it is, if not a false positive, a minor issue
> > with the
> > compilers dwarf information (The compiler just can't sanely change the
> > location
> > in which parameters are passed).  If you make those changes, I'll ACK
> > them, and
> > look into whats going on with the calling conventions
> >
> 
> Issues like the one shown below are taken care by reordering the code
> compilation.
> 
> eal_parse_sysfs_value ( char const* filename, unsigned long* val )
> 
> Change
> The parameter filename became passed on stack instead of rdi register
> 
> Effect
> Violation of the calling convention. This may result in crash or incorrect
> behavior of applications.
> 
> Last one that is left out is in
> 
> rte_thread_set_affinity ( rte_cpuset_t* p1 )
> 
> Change
> The parameter *p1* became passed in *rdi* register instead of stack.
> 
> Effect
> Violation of the calling convention. This may result in crash or incorrect
> behavior of applications.
> 
> After checking abi-0.99.pdf (x86-64.org) looks like for
> "rte_thread_set_affinity" new code is doing the right thing by passing the
> parameter in "rdi" register since pointer is classified as "integer_class".
> Nothing needs to be fixed here. After you confirm that warning can be
> ignored I will work on sending new revision.
> 
ACK then, send the new revision, this appears to be a false positive.

Thanks for taking the time to confirm.

Best
Neil

> Thanks,
> Ravi
> 
> 
> > Thanks!
> > Neil
> >
> > > Thanks,
> > > Ravi
> > >
> > > Regards
> > > > Neil
> > > >
> > > >
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  2015-04-27 13:44  0%                   ` Neil Horman
@ 2015-04-27 22:39  3%                     ` Ravi Kerur
  2015-04-28 19:35  0%                       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ravi Kerur @ 2015-04-27 22:39 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On Mon, Apr 27, 2015 at 6:44 AM, Neil Horman <nhorman@tuxdriver.com> wrote:

> On Sat, Apr 25, 2015 at 05:09:01PM -0700, Ravi Kerur wrote:
> > On Sat, Apr 25, 2015 at 6:02 AM, Neil Horman <nhorman@tuxdriver.com>
> wrote:
> >
> > > On Sat, Apr 25, 2015 at 08:32:42AM -0400, Neil Horman wrote:
> > > > On Fri, Apr 24, 2015 at 06:45:06PM -0700, Ravi Kerur wrote:
> > > > > On Fri, Apr 24, 2015 at 2:24 PM, Ravi Kerur <rkerur@gmail.com>
> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 24, 2015 at 12:51 PM, Neil Horman <
> nhorman@tuxdriver.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > >> On Fri, Apr 24, 2015 at 12:21:23PM -0700, Ravi Kerur wrote:
> > > > > >> > On Fri, Apr 24, 2015 at 11:53 AM, Neil Horman <
> > > nhorman@tuxdriver.com>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > On Fri, Apr 24, 2015 at 09:45:24AM -0700, Ravi Kerur wrote:
> > > > > >> > > > On Fri, Apr 24, 2015 at 8:22 AM, Neil Horman <
> > > nhorman@tuxdriver.com
> > > > > >> >
> > > > > >> > > wrote:
> > > > > >> > > >
> > > > > >> > > > > On Fri, Apr 24, 2015 at 08:14:04AM -0700, Ravi Kerur
> wrote:
> > > > > >> > > > > > On Fri, Apr 24, 2015 at 6:51 AM, Neil Horman <
> > > > > >> nhorman@tuxdriver.com>
> > > > > >> > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > On Thu, Apr 23, 2015 at 02:35:31PM -0700, Ravi Kerur
> > > wrote:
> > > > > >> > > > > > > > Changes in v7
> > > > > >> > > > > > > > Remove _setname_ pthread calls.
> > > > > >> > > > > > > > Use rte_gettid() API in RTE_LOG to print
> thread_id.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Changes in v6
> > > > > >> > > > > > > > Remove RTE_EXEC_ENV_BSDAPP from
> eal_common_thread.c
> > > file.
> > > > > >> > > > > > > > Add pthread_setname_np/pthread_set_name_np for
> > > Linux/FreeBSD
> > > > > >> > > > > > > > respectively. Plan to use _getname_ in RTE_LOG
> when
> > > > > >> available.
> > > > > >> > > > > > > > Use existing rte_get_systid() in RTE_LOG to print
> > > thread_id.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Changes in v5
> > > > > >> > > > > > > > Rebase to latest code.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Changes in v4
> > > > > >> > > > > > > > None
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Changes in v3
> > > > > >> > > > > > > > Changed subject to be more explicit on file name
> > > inclusion.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Changes in v2
> > > > > >> > > > > > > > None
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Changes in v1
> > > > > >> > > > > > > > eal_thread.c has minor differences between Linux
> and
> > > BSD,
> > > > > >> move
> > > > > >> > > > > > > > entire file into common directory.
> > > > > >> > > > > > > > Use RTE_EXEC_ENV_BSDAPP to differentiate on minor
> > > > > >> differences.
> > > > > >> > > > > > > > Rename eal_thread.c to eal_common_thread.c
> > > > > >> > > > > > > > Makefile changes to reflect file move and name
> change.
> > > > > >> > > > > > > > Fix checkpatch warnings.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Signed-off-by: Ravi Kerur <rkerur@gmail.com>
> > > > > >> > > > > > > > ---
> > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/Makefile        |   2
> +-
> > > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/eal_thread.c    | 152
> > > > > >> > > > > > > ------------------------------
> > > > > >> > > > > > > >  lib/librte_eal/common/eal_common_thread.c | 147
> > > > > >> > > > > > > ++++++++++++++++++++++++++++-
> > > > > >> > > > > > > >  lib/librte_eal/linuxapp/eal/eal_thread.c  | 152
> > > > > >> > > > > > > +-----------------------------
> > > > > >> > > > > > > >  4 files changed, 148 insertions(+), 305
> deletions(-)
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > diff --git a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > >> > > > > > > > index 2357cfa..55971b9 100644
> > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/Makefile
> > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/Makefile
> > > > > >> > > > > > > > @@ -87,7 +87,7 @@ CFLAGS_eal_common_log.o :=
> > > -D_GNU_SOURCE
> > > > > >> > > > > > > >  # workaround for a gcc bug with noreturn
> attribute
> > > > > >> > > > > > > >  #
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> > > > > >> > > > > > > >  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > > > > >> > > > > > > > -CFLAGS_eal_thread.o += -Wno-return-type
> > > > > >> > > > > > > > +CFLAGS_eal_common_thread.o += -Wno-return-type
> > > > > >> > > > > > > >  CFLAGS_eal_hpet.o += -Wno-return-type
> > > > > >> > > > > > > >  endif
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > diff --git
> a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > >> > > > > > > > index 9a03437..5714b8f 100644
> > > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > > >> > > > > > > > @@ -35,163 +35,11 @@
> > > > > >> > > > > > > >  #include <stdio.h>
> > > > > >> > > > > > > >  #include <stdlib.h>
> > > > > >> > > > > > > >  #include <stdint.h>
> > > > > >> > > > > > > > -#include <unistd.h>
> > > > > >> > > > > > > > -#include <sched.h>
> > > > > >> > > > > > > > -#include <pthread_np.h>
> > > > > >> > > > > > > > -#include <sys/queue.h>
> > > > > >> > > > > > > >  #include <sys/thr.h>
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > -#include <rte_debug.h>
> > > > > >> > > > > > > > -#include <rte_atomic.h>
> > > > > >> > > > > > > > -#include <rte_launch.h>
> > > > > >> > > > > > > > -#include <rte_log.h>
> > > > > >> > > > > > > > -#include <rte_memory.h>
> > > > > >> > > > > > > > -#include <rte_memzone.h>
> > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > >> > > > > > > > -#include <rte_eal.h>
> > > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > > >> > > > > > > > -#include <rte_lcore.h>
> > > > > >> > > > > > > > -
> > > > > >> > > > > > > >  #include "eal_private.h"
> > > > > >> > > > > > > >  #include "eal_thread.h"
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > > LCORE_ID_ANY;
> > > > > >> > > > > > > NAK, these are exported symbols, you can't remove
> them
> > > without
> > > > > >> > > going
> > > > > >> > > > > > > through the
> > > > > >> > > > > > > deprecation process.
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > They are not removed/deleted, they are moved from
> > > eal_thread.c
> > > > > >> to
> > > > > >> > > > > > eal_common_thread.c file since it is common to both
> Linux
> > > and
> > > > > >> BSD.
> > > > > >> > > > > >
> > > > > >> > > > > Then perhaps you forgot to export the symbol?  Its
> showing
> > > up as
> > > > > >> > > removed
> > > > > >> > > > > on the
> > > > > >> > > > > ABI checker utility.
> > > > > >> > > > >
> > > > > >> > > > > Neil
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > > Can you please show me in the current code where it is
> being
> > > > > >> exported? I
> > > > > >> > > > have only moved definitions to _common_ files, not sure
> why it
> > > > > >> should be
> > > > > >> > > > exported now.  I searched in the current code for
> > > > > >> RTE_DEFINE_PER_LCORE
> > > > > >> > > >
> > > > > >> > > > #home/rkerur/dpdk-tmp/dpdk# grep -ir RTE_DEFINE_PER_LCORE
> *
> > > > > >> > > > app/test/test_per_lcore.c:static
> > > RTE_DEFINE_PER_LCORE(unsigned,
> > > > > >> test) =
> > > > > >> > > > 0x12345678;
> > > > > >> > > >
> > > > > >>
> > > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > >> > > >
> > > > > >>
> > > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > >> > > >
> > > > > >> > >
> > > > > >>
> > >
> lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > >> > > > _cpuset);
> > > > > >> > > >
> > > > > >>
> > > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > > >> > > >
> > > > > >>
> > > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > > >> > > >
> > > > > >>
> > >
> lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > > >> > > > _cpuset);
> > > > > >> > > > lib/librte_eal/common/include/rte_per_lcore.h:#define
> > > > > >> > > > RTE_DEFINE_PER_LCORE(type, name)            \
> > > > > >> > > > lib/librte_eal/common/include/rte_eal.h:    static
> > > > > >> > > > RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
> > > > > >> > > >
> > > lib/librte_eal/common/eal_common_errno.c:RTE_DEFINE_PER_LCORE(int,
> > > > > >> > > > _rte_errno);
> > > > > >> > > > lib/librte_eal/common/eal_common_errno.c:    static
> > > > > >> > > > RTE_DEFINE_PER_LCORE(char[RETVAL_SZ], retval);
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > > > Thanks
> > > > > >> > > > > > Ravi
> > > > > >> > > > > >
> > > > > >> > > > > > Regards
> > > > > >> > > > > > > Neil
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > >
> > > > > >> > > Its exported in the version map file:
> > > > > >> > >  per_lcore__lcore_id;
> > > > > >> > >
> > > > > >> > >
> > > > > >> > Thanks Neil, I checked and both linux and bsd
> rte_eal_version.map
> > > have
> > > > > >> it.
> > > > > >> > I compared .map file between "changed code" and the original,
> > > they are
> > > > > >> same
> > > > > >> > for both linux and bsd. In fact you had ACK'd v4 version of
> this
> > > patch
> > > > > >> > series and no major changes after that. Please let me know if
> I
> > > missed
> > > > > >> > something.
> > > > > >> >
> > > > > >> I did, and I'm retracting that, because I didn't think to check
> the
> > > ABI
> > > > > >> compatibility on this.  But I ran it throught the ABI checking
> > > script
> > > > > >> this and
> > > > > >> this error popped out.  You should run it as well, its in the
> > > scripts
> > > > > >> directory.
> > > > > >>
> > > > > >>
> > > > > >> I see in your first patch you removed it and re-added it in the
> > > common
> > > > > >> section.
> > > > > >> But something about how its building is causing it to not show
> up
> > > as an
> > > > > >> exported
> > > > > >> symbol, which is problematic, as other applications are going to
> > > want
> > > > > >> access to
> > > > > >> it.
> > > > > >>
> > > > > >> It also possible that the ABI checker is throwing a false
> positive,
> > > but
> > > > > >> either
> > > > > >> way, it needs to be looked into prior to moving forward with
> this.
> > > > > >>
> > > > > >>
> > > > > > I did following things.
> > > > > >
> > > > > > Put a tag (v2.0.0-before-common-eal)  before EAL common functions
> > > changes
> > > > > > for commit (3c0c807038ad642f4be7deb9370293c39d12f029 net: remove
> > > unneeded
> > > > > > include)
> > > > > >
> > > > > > Put a tag (v2.0.0-common-eal) after EAL common functions changes
> for
> > > > > > commit (25737e5a7212630a7b5d8ca756860a062f403789 Move common
> > > functions in
> > > > > > eal_pci.c)
> > > > > >
> > > > > > Ran validate-abi against x86_64-native-linuxapp-gcc and
> > > > > >
> > > > > > v2.0.0-rc3 and v2.0.0-before-common-eal, html report for
> > > librte_eal.so
> > > > > > shows removed symbols for "per_lcore__cpuset"
> > > > > >
> > > > > > v2.0.0-rc3 and v2.0.0-common-eal, html report for librte_eal.so
> shows
> > > > > > removed symbols for "per_lcore__cpuset"
> > > > > >
> > > > > > Removed symbol is different from what you have reported and in my
> > > case I
> > > > > > see it even before my commit. If you are interested I can unicast
> > > you html
> > > > > > report file. Please let me know how to proceed.
> > > > > >
> > > > > >
> > > > >
> > > > > I did some experiment and found some interesting things.  I will
> take
> > > eal.c
> > > > > as an example
> > > > >
> > > > > eal.c is split into eal_common_sysfs.c eal_common_mem_cfg.c
> > > > > eal_common_proc_type.c and eal_common_app_usage.c. In
> > > linuxapp/eal/Makefile
> > > > > if I compile new files right after eal.c as shown below
> > > > >
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) := eal.c
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_sysfs.c
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_mem_cfg.c
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_proc_type.c
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_app_usage.c
> > > > > ...
> > > > >
> > > > > validate-abi results matches baseline. Instead if i place new
> _common_
> > > > > files in common area in linuxapp/eal/Makefile as shown below
> > > > >
> > > > > # from common dir
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_memzone.c
> > > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_log.c
> > > > > ...
> > > > >
> > > > > validate-abi reports problem in binary compatibility and source
> > > > > compatiblity
> > > > >
> > > > > eal_filesystem.h, librte_eal.so.1
> > > > >  [+] eal_parse_sysfs_value ( char const* filename, unsigned long*
> val )
> > > > >  @@ DPDK_2.0 (2)
> > > > >
> > > > > I believe files in common and linuxapp directory are compiled same
> way
> > > so
> > > > > not sure why placement in makefile makes difference.
> > > > >
> > > > > Could this be false-positive from validate-abi script??
> > > > >
> > > > It could be, yes.  Though I'm more inclined to think that perhaps in
> the
> > > new
> > > > version of the code we're not generating ithe same dwarf information
> out
> > > of it.
> > > > In fact for some reason, I've checked both the build before and after
> > > your
> > > > patch series, and the exported CFLAGS aren't getting passed to the
> build
> > > > properly, implying that we're not building all the code in the
> validator
> > > with
> > > > the -g flag, which the validator need to function properly.  I'm
> looking
> > > into
> > > > that
> > > > Neil
> > > >
> > > >
> > > Found the problem, I was stupidly reading the report incorrectly.  The
> > > problem
> > > regarding _lcore_id is a source compatibilty issue (because the symbol
> > > moved to
> > > a new location), which is irrelevant to us.  Its not in any way a
> binary
> > > compat
> > > problem, which is what we care about.  Sorry for the noise.
> > >
> > > I do still have a few concerns about some changed calling conventions
> with
> > > a few
> > > other functions, which I'll look into on monday.
> > >
> > >
> > Please let me know your inputs on changed calling conventions. Most of
> them
> > can be fixed by re-arranging moved code in _common_ files and order of
> > compilation.
> >
> If moving the order of compliation around fixes the problem, then I am
> reasonably convinced that it is, if not a false positive, a minor issue
> with the
> compilers dwarf information (The compiler just can't sanely change the
> location
> in which parameters are passed).  If you make those changes, I'll ACK
> them, and
> look into whats going on with the calling conventions
>

Issues like the one shown below are taken care by reordering the code
compilation.

eal_parse_sysfs_value ( char const* filename, unsigned long* val )

Change
The parameter filename became passed on stack instead of rdi register

Effect
Violation of the calling convention. This may result in crash or incorrect
behavior of applications.

Last one that is left out is in

rte_thread_set_affinity ( rte_cpuset_t* p1 )

Change
The parameter *p1* became passed in *rdi* register instead of stack.

Effect
Violation of the calling convention. This may result in crash or incorrect
behavior of applications.

After checking abi-0.99.pdf (x86-64.org) looks like for
"rte_thread_set_affinity" new code is doing the right thing by passing the
parameter in "rdi" register since pointer is classified as "integer_class".
Nothing needs to be fixed here. After you confirm that warning can be
ignored I will work on sending new revision.

Thanks,
Ravi


> Thanks!
> Neil
>
> > Thanks,
> > Ravi
> >
> > Regards
> > > Neil
> > >
> > >
>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC PATCH] ethdev: remove old flow director API
  @ 2015-04-27 16:08  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-04-27 16:08 UTC (permalink / raw)
  To: Venky Venkatesan, Neil Horman; +Cc: dev

2015-04-20 09:45, Venky Venkatesan:
> On 04/20/2015 09:33 AM, Neil Horman wrote:
> > On Mon, Apr 20, 2015 at 04:11:43PM +0200, Thomas Monjalon wrote:
> >> It's time to remove this old API.
> >> It seems some work is still needed to rely only on eth_ctrl API.
> >> At least ixgbe, i40e and testpmd must be fixed.
> >> Jingjing, do you think it's possible to remove all these structures
> >> from rte_ethdev.h?
> >>
> >> Thanks
> >>
> > NAK.
> >
> > I'm certainly not opposed to removing the API's if they are truly no longer
> > needed.  But they have been codified as part of the ABI, so the deprecation
> > schedule needs to be followed.  Given what you've said above, it seems like that
> > might be worthwhile anyway, as it will provide the needed runway to allow users
> > to convert to the new API.
> >
> > Neil
> +1 NAK. Agree with Neil.

+1 Agree with you :)

The goal of this RFC proposal is to see how to progress on API cleanup.
There are actually 2 parts:
1/ The flow director functions of rte_ethdev.h were only used for enic in
DPDK 2.0. We could set a deprecation notice to remove them in DPDK 2.2.
2/ Some associated structures are also used for rte_eth_conf.
My question was to check how it would be relevant to remove this rte_fdir_conf.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Beyond DPDK 2.0
  2015-04-27 13:39  0%                 ` Wiles, Keith
@ 2015-04-27 15:34  0%                   ` Marc Sune
  0 siblings, 0 replies; 200+ results
From: Marc Sune @ 2015-04-27 15:34 UTC (permalink / raw)
  To: Wiles, Keith, Neil Horman; +Cc: dev



On 27/04/15 15:39, Wiles, Keith wrote:
>
> On 4/27/15, 4:52 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
>
>>
>> On 27/04/15 03:41, Wiles, Keith wrote:
>>> On 4/26/15, 4:56 PM, "Neil Horman" <nhorman@tuxdriver.com> wrote:
>>>
>>>> On Sat, Apr 25, 2015 at 04:08:23PM +0000, Wiles, Keith wrote:
>>>>> On 4/25/15, 8:30 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
>>>>>
>>>>>> On 24/04/15 19:51, Matthew Hall wrote:
>>>>>>> On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
>>>>>>>> I can tell you that if DPDK were GPL-based, my company wouldn't be
>>>>>>>> using
>>>>>>>> it. I suspect we wouldn't be the only ones...
>>>>>>>>
>>>>>>>> Jay
>>>>>>> I could second this, from the past employer where I used it. Right
>>>>> now
>>>>>>> I am
>>>>>>> using it in an open source app, I have a bit of GPL here and there
>>>>> but
>>>>>>> I'm
>>>>>>> trying to get rid of it or confine it to separate address spaces,
>>>>> where
>>>>>>> it
>>>>>>> won't impact the core code written around DPDK, as I don't want to
>>>>> cause
>>>>>>> headaches for any downstream users I attract someday.
>>>>>>>
>>>>>>> Hard-core GPL would not be possible for most. LGPL could be
>>>>>>> possible,
>>>>>>> but I
>>>>>>> don't think it could be worth the relicensing headache for that
>>>>>>> small
>>>>>>> change.
>>>>>>>
>>>>>>> Instead we should make the patch process as easy as humanly possible
>>>>> so
>>>>>>> people
>>>>>>> are encouraged to send us the fixes and not cart them around their
>>>>>>> companies
>>>>>>> constantly.
>>>>> +1 and besides the GPL or LGPL ship has sailed IMHO and we can not go
>>>>> back.
>>>> Actually, IANAL, but I think we can.  The BSD license allows us to fork
>>>> and
>>>> relicense the code I think, under GPL or any other license.  I'm not
>>>> advocating
>>>> for that mind you, just suggesting that its possible should it ever
>>>> become
>>>> needed.
>>>>
>>>>>> I agree. My feeling is that as the number of patches in the mailing
>>>>> list
>>>>>> grows, keeping track of them gets more and more complicated.
>>>>>> Patchwork
>>>>>> website was a way to try to address this issue. I think it was an
>>>>>> improvement, but to be honest, patchwork lacks a lot of
>>>>>> functionality,
>>>>>> such as properly tracking multiple versions of the patch (superseding
>>>>>> them automatically), and it lacks some filtering capabilities e.g.
>>>>>> per
>>>>>> user, per tag/label or library, automatically track if it has been
>>>>>> merged, give an overall status of the pending vs merged patches, set
>>>>>> milestones... Is there any alternative tool or improved version for
>>>>> that?
>>>>>
>>>> Agreed, this has come up before, off list unfortunately.  The volume of
>>>> patches
>>>> seems to be increasing at such a rate that a single maintainer has
>>>> difficulty
>>>> keeping up.  I proposed that the workload be split out to multiple
>>>> subtrees,
>>>> with prefixes being added to patch subjects on the list for local
>>>> filtering to
>>>> stem the tide.  Specifically I had proposed that the PMD's be split
>>>> into a
>>>> separate subtree, but that received pushback in favor of having each
>>>> library
>>>> having its own separate subtree, with a pilot program being made out of
>>>> the I40e
>>>> driver (which you might note sends pull requests to the list now).  I'd
>>>> still
>>>> like to see all PMD's come under a single subtree, but thats likely an
>>>> argument
>>>> for later.
>>>>
>>>> That said, Do you think that this patch latency is really a contributor
>>>> to low
>>>> project participation?  It definately a problem, but it seems to me
>>>> that
>>>> this
>>>> sort of issue would lead to people trying to parcitipate, then giving
>>>> up
>>>> (i.e.
>>>> we would see 1-2 emails from an individual, then not see them again).
>>>> I'd need
>>>> to look through the mailing list for such a pattern, but anecdotally
>>>> I've
>>>> not
>>>> seen that happen.  The problem you describe above is definately a
>>>> problem, but
>>>> its one for those individuals who are participating, not for those who
>>>> are
>>>> simply choosing not to.  And I think we need to address both.
>>>>
>>>>> I agree patchwork has some limitation, but I think the biggest issue
>>>>> is
>>>>> keeping up with the patches. Getting patches introduced into the main
>>>>> line
>>>>> is very slow. A patch submitted today may not get applied for weeks or
>>>>> months, then when another person submits a patch he is starting to
>>>>> run a
>>>>> very high risk of having to redo that patch, because a pervious patch
>>>>> makes his fail weeks/months later. I would love to see a better tool
>>>>> then
>>>>> patchwork, but the biggest issue is we have a huge backlog of patches.
>>>>> Personally I am not sure how Thomas or any is able to keep up with the
>>>>> patches.
>>>>>
>>>> This is absolutely a problem.  I'd like to think, more than a tool like
>>>> patchwork, a subtree organization to allow some modicum of parallel
>>>> review and
>>>> integration would really be a benefit here.
>>> Subtrees could work, but the real problem I think is the number of
>>> committers must be higher then one. Something like GitHub (and I assume
>>> Linux Foundation) have a method to add committers to a project. In the
>>> case of GitHub they just have to have a free GitHub account and they can
>>> become committers of the project buying the owner of the project enables
>>> them.
>>>
>>> On GitHub they have personal accounts and organization accounts I know
>>> only about the personal accounts, but they allow for 5 private repos and
>>> any number of public repos. The organization account has a lot of extra
>>> features that seem better for a DPDK community IMO and should be the one
>>> we use if we decide it is the right direction. We can always give it a
>>> shot for while and keep the dpdk.org and use dev@dpdk.org and its repo
>>> mirrored from GitHub as a transition phase. This way we can fall back to
>>> dpdk.org or move one to something else if we like.
>>>
>>> https://help.github.com/categories/organizations/
>>>
>>> The developers could still send patches via email list, but creating a
>>> repo and forking dpdk is easy, then send a pull request.
>> For the github "community" or free service, organization accounts just
>> allow you to set teams, where each time can be assigned to one or more
>> repositories. The differences are summarized here:
>>
>> https://help.github.com/articles/what-s-the-difference-between-user-and-or
>> ganization-accounts/
>>
>> And the permission schema, per team, is summarized here:
>>
>> https://help.github.com/articles/permission-levels-for-an-organization-rep
>> ository/
>>
>> Some limitations: i) only if the team has write permissions (IOW push
>> permissions) you can manage issues ii) there cannot be per-branch ACLs.
> I was assuming the organization GitHub is just to allow more then one
> admin/maintainers along with teams if needed. I would assume the repos are
> still public and others are allowed to fork or pull the repos. I think of
> the org version is just extra controls on top of a personal repo like
> design. The org/personal one should appear to the
> non-maintainers/admins/owner as a normal repo on GitHub, correct?

Right

>
> The GitHub organization is built for open-source and you can still have
> private repos, but then you start to have a cost depending on the number
> of private repos you want. If you do not have a lot of private repos then
> you should have no cost (I think). I do not see any reason for private
> repos, but I guest we could have some and we get 5 free and 10 is $25 per
> month.

I don't see the reason either, and I don't know why private repos would 
be useful here.

>>>
>>>>> The other problem I see is how patches are agreed on to be included in
>>>>> the
>>>>> mainline. Today it is just an ACK or a NAK on the mailing list. Then I
>>>>> see
>>>>> what I think to be only a few people ACKing or NAKing patches. This
>>>>> process has a lot of problems from a patch being ignore for some
>>>>> reason
>>>>> or
>>>>> someone having negative feed back on very minor detail or no way to
>>>>> push a
>>>>> patch forward a single NAK or comment.
>>>>>
>>>> So, this is an interesting issue in ideal meritocracies.  Currently
>>>> is/should be
>>>> looking for ACKs/NAK/s from the individuals listed in the MAINTAINER
>>>> files, and
>>>> those people should be the definitive subject matter experts on the
>>>> code
>>>> they
>>>> cover.  As such, I would agrue that they should be entitled to a
>>>> modicum
>>>> of
>>>> stylistic/trivial leeway.  That is to say, if they choose to block a
>>>> patch
>>>> around a very minor detail, then between them changing their position,
>>>> and the
>>>> patch author changing the code, the latter is likely the easier course
>>>> of
>>>> action, especially if the author can't make an argument for their
>>>> position.
>>>> That said, if such patch blockage becomes so egregious that individuals
>>>> stop
>>>> contributing, that needs to be known as well.  If you as a patch
>>>> author:
>>>>
>>>> 1) Have tried to submit patches
>>>> 2) Had them blocked for what you consider trivial reasons
>>>> 3) Plan to not contribute further because of this
>>>> 4) Still rely on the DPDK for your product
>>>>
>>>> Please, say something.  People in charge need to know when they're
>>>> pushing
>>>> contributors away.
>>>>
>>>> FWIW, I've tried to do some correlation between the git history and the
>>>> mailing
>>>> list.  I need to do more searches, but I have a feeling that early on,
>>>> the
>>>> majority of people who stopped contributing, did so because their
>>>> patches
>>>> weren't expressely blocked, but rather because they were simply
>>>> ignored.
>>>> No one
>>>> working on DPDK bothered to review those patches, and so they never got
>>>> merged.
>>>> Hopefully that problem has been addressed somewhat now.
>> I agree 100%
>>>>> I would like to see some type of layering process to allow patches to
>>>>> be
>>>>> applied in a timely manner a few weeks not months or completely
>>>>> ignored.
>>>>> Maybe some type of voting is reasonable, but we need to do something
>>>>> to
>>>>> turn around the patches in clean reasonable manner.
>>>>>
>>>>> Think we need some type of group meeting every week to look at the
>>>>> patches
>>>>> and determining which ones get applied, this gives quick feedback to
>>>>> the
>>>>> submitter as to the status of the patch.
>>>>>
>>>> I think a group meeting is going to be way too much overhead to manage
>>>> properly.
>>>> You'll get different people every week with agenda that may not line up
>>>> with
>>>> code quality, which is really what the review is meant to provide.  I
>>>> think
>>> I was only suggesting the maintainers attend the meeting. Of course they
>>> have to attend or have someone attend for them, just to get the voting
>>> done. If you do not attend then you do not get to vote or something like
>>> that is reasonable. Not that we should try and define the process here.
>>>
>>>> perhaps a better approach would be to require that that code owners
>>>> from
>>>> the
>>>> maintainer file provide and ACK/NAK on their patches within 3-4 days,
>>>> and
>>>> require a corresponding tree maintainer to apply the patch within 7 or
>>>> so.  That
>>>> would cap our patch latency.  Likewise, if a patch slips in creating a
>>>> regression, the author needs to be alerted and given a time window in
>>>> which to
>>>> fix the problem before the offending patch is reverted during the QE
>>>> cycle.
>>>>
>>>>
>>>>>> On the other side, since user questions, community discussions and
>>>>>> development happens in the same mailing list, things get really
>>>>>> complicated, specially for users seeking for help. Even though I
>>>>>> think
>>>>>> the average skills of the users of DPDK is generally higher than in
>>>>>> other software projects, if DPDK wants to attract more users, having
>>>>>> a
>>>>>> better user support is key, IMHO.
>>>>>>
>>>>>> So I would see with good eyes a separation between, at least,
>>>>>> dpdk-user
>>>>>> and dpdk-dev.
>>>> I wouldn't argue with this separation, seems like a reasonable
>>>> approach.
>>>>
>>>>> I do not remember seeing too many users on the list and making a list
>>>>> just
>>>>> for then is OK if everyone is fine with a list that has very few
>>>>> emails.
>>>>>> If the number of patches keeps growing, splitting the "dev" mailing
>>>>>> lists into different categories (eal and common, pmds, higher level
>>>>>> abstractions...) could be an option. However, this last point opens a
>>>>>> lot of questions on how to minimize interference between the
>>>>>> different
>>>>>> parts and API/ABI compatibility during the development.
>>>>> I believe if we just make sure we use tags in the subject line then we
>>>>> can
>>>>> have our email clients do the splitting of the emails instead of
>>>>> adding
>>>>> more emails lists.
>>>>>
>>>> Agreed
>> I think it is a good idea too. Maybe we can standardize some format e.g.
>> [TAG][PATCH vX], or something like that.
>>
>>>>>>> Perhaps it means having some ReviewBoard type of tools, a clone in
>>>>>>> Github or
>>>>>>> Bitbucket where the less hardcore kernel-workflow types could send
>>>>> back
>>>>>>> their
>>>>>>> small bug fixes a bit more easily, this kind of stuff. Google has
>>>>> been
>>>>>>> getting
>>>>>>> good uptake since they moved most of their open source across to
>>>>> Github,
>>>>>>> because the contribution workflow was more convenient than Google
>>>>> Code
>>>>>>> was.
>>>>> I like GitHub it is a much better designed tool then patchwork, plus
>>>>> it
>>>>> could get more eyes as it is very well know to the developer community
>>>>> in
>>>>> general. I feel GitHub has many advantages over the current systems in
>>>>> place but, it does not solve the all patch issues.
>>>>>
>>>> Github is actually a bit irritating for this sort of thing, as it
>>>> presumes a web
>>>> based interface for discussion.  They have some modicum of email
>>>> forwarding
>>>> enabled, but it has never quite worked right, or integrated properly.
>> An alternative to githubs and bitbuckets is a self-hosted forge, like
>> gitlab:
>>
>> https://about.gitlab.com/
>>
>> To be honest, I mostly work on open-source repositories, and in our
>> organization we use only gitlab for private repositories, so I haven't
>> played that much with it. But it seems to do its job and has almost all
>> of the features of the "community" github, if not more. I don't know if
>> you can even integrate it with github's accounts somehow, to prevent to
>> have to register.
>>
>> However, one of the important points of using github/bitbucket is
>> visibility and ease the contribution process. By using an self-hosted
>> solution, even if it is similar to github and well advertised in DPDK's
>> website, you kind of loose part of that advantage.
> I would suggest we use GitHub then picking yet another not as well know
> Git Repo system, if we decide to change.

I agree. I was just pointing out this as an option instead of 
github/bitbucket. Basically to (still) self-host the repository and tools.

>>> Email forwarding has seemed to work for me and in one case it took a bit
>>> to have GitHub stop sending me emails on a repo I did not want anymore
>>> :-)
>>>>> The only way we can get patch issues resolved is to put a bit more
>>>>> process
>>>>> in place.
>>>>>> Although I agree, we have to be careful on how github or bitbucket is
>>>>>> used. Having issues or even (e.g. github) pull requests *in addition*
>>>>> to
>>>>>> the normal contribution workflow can be a nightmare to deal with, in
>>>>>> terms of synchronization and preventing double work. So I guess
>>>>>> setting
>>>>>> up an official github or bitbucket mirror would be fine, via some
>>>>> simple
>>>>>> cronjob, but I guess it would end-up not using PRs or issues in
>>>>>> github
>>>>>> like the Linux kernel does.
>>>> 100% agree, we can't be split about this.  Allowing contributions from
>>>> n
>>>> channels just means most developers will only see/reviews 1/nth of the
>>>> patches
>>>> of interest to them.
>>> If we setup a GitHub or some other site, we would need to make Github
>>> the
>>> primary site to remove this type of problem IMO.
>> You mean changing the workflow from email based to issues and pull-req
>> or github pull req? Do you really think this is possible?
> Yes, I think pull-req is the standard GitHub method as everyone needs a
> repo anyway. If we can figure out how to integrate the email patches that
> would be great.

I think it is quite complicated. It needs to be completely seemless or 
it won't work, and we will have part of the discussions in the mailing 
list, and part in the pull-req issues.

I would think it the other way around => pull requests are "echoed" to 
the mailing list to be discussed there, and always CCed (how) to the 
issue to capture the discussion there too. Not trivial at all.

marc

>>>>>   From what I can tell GitHub seems to be a better solution for a free
>>>>> open
>>>>> environment. Bitbucket I have never used and GitHub seems more popular
>>>>> from one article I read.
>>>>>
>>>>>
>>>>>
>>>>> https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UT
>>>>> F-
>>>>> 8#
>>>>> q=bitbucket%20vs%20github
>>>>>
>>>>>
>>>>>> Btw, is this github organization already registered by Intel or some
>>>>>> other company of the community?
>>>>>>
>>>>>> https://github.com/dpdk
>>>>>>
>>> I was hoping someone would own up to the GitHub dpdk site.
>> Just wanted to know if this was the case. But, even if that would not be
>> the case, I *guess* that, as it happens with other services like
>> twitter, facebook..., Intel could claim the user, since it has the
>> registered trademark.
>>
>> marc
>>
>>>>>> Marc
>>>>> If we can used the above that would be great, but a name like
>>>>> Œdpdk-community¹ or something could work too.
>>>>>
>>>>> We can host the web site here and have many sub-projects like
>>>>> Pktgen-DPDK
>>>>> :-) under the same page. Not to say anything bad about our current web
>>>>> pages as I find it difficult to use sometimes and find things like
>>>>> patchwork link. Maintaining a web site is a full time job and GitHub
>>>>> does
>>>>> maintain the site, plus we can collaborate on host web page on the
>>>>> GitHub
>>>>> site easier.
>>>>>
>>>>> Moving to the Linux Foundation is an option as well as it is very well
>>>>> know and has some nice ways to get your project promoted. It does
>>>>> have a
>>>>> few drawbacks in process handling and cost to state a few. The process
>>>>> model is all ready defined, which is good and bad it just depends on
>>>>> your
>>>>> needs IMO.
>>>>>
>>>>> Regards,
>>>>> ++Keith
>>>>>
>>>>>>> Matthew.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Beyond DPDK 2.0
  2015-04-27 10:29  0%               ` Neil Horman
@ 2015-04-27 13:50  0%                 ` Wiles, Keith
  0 siblings, 0 replies; 200+ results
From: Wiles, Keith @ 2015-04-27 13:50 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev



On 4/27/15, 5:29 AM, "Neil Horman" <nhorman@tuxdriver.com> wrote:

>On Mon, Apr 27, 2015 at 01:41:11AM +0000, Wiles, Keith wrote:
>> 
>> 
>> On 4/26/15, 4:56 PM, "Neil Horman" <nhorman@tuxdriver.com> wrote:
>> 
>> >On Sat, Apr 25, 2015 at 04:08:23PM +0000, Wiles, Keith wrote:
>> >> 
>> >> 
>> >> On 4/25/15, 8:30 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
>> >> 
>> >> >
>> >> >
>> >> >On 24/04/15 19:51, Matthew Hall wrote:
>> >> >> On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
>> >> >>> I can tell you that if DPDK were GPL-based, my company wouldn't
>>be
>> >> >>>using
>> >> >>> it. I suspect we wouldn't be the only ones...
>> >> >>>
>> >> >>> Jay
>> >> >> I could second this, from the past employer where I used it. Right
>> >>now
>> >> >>I am
>> >> >> using it in an open source app, I have a bit of GPL here and there
>> >>but
>> >> >>I'm
>> >> >> trying to get rid of it or confine it to separate address spaces,
>> >>where
>> >> >>it
>> >> >> won't impact the core code written around DPDK, as I don't want to
>> >>cause
>> >> >> headaches for any downstream users I attract someday.
>> >> >>
>> >> >> Hard-core GPL would not be possible for most. LGPL could be
>>possible,
>> >> >>but I
>> >> >> don't think it could be worth the relicensing headache for that
>>small
>> >> >>change.
>> >> >>
>> >> >> Instead we should make the patch process as easy as humanly
>>possible
>> >>so
>> >> >>people
>> >> >> are encouraged to send us the fixes and not cart them around their
>> >> >>companies
>> >> >> constantly.
>> >> 
>> >> +1 and besides the GPL or LGPL ship has sailed IMHO and we can not go
>> >>back.
>> >Actually, IANAL, but I think we can.  The BSD license allows us to fork
>> >and
>> >relicense the code I think, under GPL or any other license.  I'm not
>> >advocating
>> >for that mind you, just suggesting that its possible should it ever
>>become
>> >needed.
>> >
>> >> >
>> >> >I agree. My feeling is that as the number of patches in the mailing
>> >>list
>> >> >grows, keeping track of them gets more and more complicated.
>>Patchwork
>> >> >website was a way to try to address this issue. I think it was an
>> >> >improvement, but to be honest, patchwork lacks a lot of
>>functionality,
>> >> >such as properly tracking multiple versions of the patch
>>(superseding
>> >> >them automatically), and it lacks some filtering capabilities e.g.
>>per
>> >> >user, per tag/label or library, automatically track if it has been
>> >> >merged, give an overall status of the pending vs merged patches, set
>> >> >milestones... Is there any alternative tool or improved version for
>> >>that?
>> >> 
>> >Agreed, this has come up before, off list unfortunately.  The volume of
>> >patches
>> >seems to be increasing at such a rate that a single maintainer has
>> >difficulty
>> >keeping up.  I proposed that the workload be split out to multiple
>> >subtrees,
>> >with prefixes being added to patch subjects on the list for local
>> >filtering to
>> >stem the tide.  Specifically I had proposed that the PMD's be split
>>into a
>> >separate subtree, but that received pushback in favor of having each
>> >library
>> >having its own separate subtree, with a pilot program being made out of
>> >the I40e
>> >driver (which you might note sends pull requests to the list now).  I'd
>> >still
>> >like to see all PMD's come under a single subtree, but thats likely an
>> >argument
>> >for later.
>> >
>> >That said, Do you think that this patch latency is really a contributor
>> >to low
>> >project participation?  It definately a problem, but it seems to me
>>that
>> >this
>> >sort of issue would lead to people trying to parcitipate, then giving
>>up
>> >(i.e.
>> >we would see 1-2 emails from an individual, then not see them again).
>> >I'd need
>> >to look through the mailing list for such a pattern, but anecdotally
>>I've
>> >not
>> >seen that happen.  The problem you describe above is definately a
>> >problem, but
>> >its one for those individuals who are participating, not for those who
>>are
>> >simply choosing not to.  And I think we need to address both.
>> >
>> >> I agree patchwork has some limitation, but I think the biggest issue
>>is
>> >> keeping up with the patches. Getting patches introduced into the main
>> >>line
>> >> is very slow. A patch submitted today may not get applied for weeks
>>or
>> >> months, then when another person submits a patch he is starting to
>>run a
>> >> very high risk of having to redo that patch, because a pervious patch
>> >> makes his fail weeks/months later. I would love to see a better tool
>> >>then
>> >> patchwork, but the biggest issue is we have a huge backlog of
>>patches.
>> >> Personally I am not sure how Thomas or any is able to keep up with
>>the
>> >> patches.
>> >> 
>> >This is absolutely a problem.  I'd like to think, more than a tool like
>> >patchwork, a subtree organization to allow some modicum of parallel
>> >review and
>> >integration would really be a benefit here.
>> Subtrees could work, but the real problem I think is the number of
>> committers must be higher then one. Something like GitHub (and I assume
>> Linux Foundation) have a method to add committers to a project. In the
>> case of GitHub they just have to have a free GitHub account and they can
>> become committers of the project buying the owner of the project enables
>> them.
>> 
>> On GitHub they have personal accounts and organization accounts I know
>> only about the personal accounts, but they allow for 5 private repos and
>> any number of public repos. The organization account has a lot of extra
>> features that seem better for a DPDK community IMO and should be the one
>> we use if we decide it is the right direction. We can always give it a
>> shot for while and keep the dpdk.org and use dev@dpdk.org and its repo
>> mirrored from GitHub as a transition phase. This way we can fall back to
>> dpdk.org or move one to something else if we like.
>> 
>> https://help.github.com/categories/organizations/
>> 
>> The developers could still send patches via email list, but creating a
>> repo and forking dpdk is easy, then send a pull request.
>> 
>I'm not opposed to github per-se, but nothing described above is unique to
>github. Theres no reason we can't allow multiple comitters to the current
>tree
>as hosted on the current server, we just have to configure it as such.
>
>And FWIW, the assumption is that, with multiple subtrees, you implicitly
>have
>multiple comitters, assuming that pull requests from those subtree
>maintainers
>are trusted by the top level tree maintainer.
>
>In fact I feel somewhat better about that model as it provides a nice
>stairstep
>integration path for new features.
>
>Not explicitly opposed to a movement to github, I just feel like it may
>not
>address the problem at hand.

As I see your concerns is creating multiple repos or splitting up the
current repo, which can be done in a single GitHub org account and they
all appear on the page. This way we can move the current other repos like
Pktgen to this location and everyone sees all of the repos in a much
easier way IMO. The org account at GitHub gives you the multiple
committers and even teams. I see we only need one team today for DPDK repo
and then we have something like Pktgen as a different team and so on.
>
>> 
>> >
>> >> The other problem I see is how patches are agreed on to be included
>>in
>> >>the
>> >> mainline. Today it is just an ACK or a NAK on the mailing list. Then
>>I
>> >>see
>> >> what I think to be only a few people ACKing or NAKing patches. This
>> >> process has a lot of problems from a patch being ignore for some
>>reason
>> >>or
>> >> someone having negative feed back on very minor detail or no way to
>> >>push a
>> >> patch forward a single NAK or comment.
>> >> 
>> >
>> >So, this is an interesting issue in ideal meritocracies.  Currently
>> >is/should be
>> >looking for ACKs/NAK/s from the individuals listed in the MAINTAINER
>> >files, and
>> >those people should be the definitive subject matter experts on the
>>code
>> >they
>> >cover.  As such, I would agrue that they should be entitled to a
>>modicum
>> >of
>> >stylistic/trivial leeway.  That is to say, if they choose to block a
>>patch
>> >around a very minor detail, then between them changing their position,
>> >and the
>> >patch author changing the code, the latter is likely the easier course
>>of
>> >action, especially if the author can't make an argument for their
>> >position.
>> >That said, if such patch blockage becomes so egregious that individuals
>> >stop
>> >contributing, that needs to be known as well.  If you as a patch
>>author:
>> >
>> >1) Have tried to submit patches
>> >2) Had them blocked for what you consider trivial reasons
>> >3) Plan to not contribute further because of this
>> >4) Still rely on the DPDK for your product
>> >
>> >Please, say something.  People in charge need to know when they're
>>pushing
>> >contributors away.
>> >
>> >FWIW, I've tried to do some correlation between the git history and the
>> >mailing
>> >list.  I need to do more searches, but I have a feeling that early on,
>>the
>> >majority of people who stopped contributing, did so because their
>>patches
>> >weren't expressely blocked, but rather because they were simply
>>ignored.
>> >No one
>> >working on DPDK bothered to review those patches, and so they never got
>> >merged.
>> >Hopefully that problem has been addressed somewhat now.
>> >
>> >> I would like to see some type of layering process to allow patches
>>to be
>> >> applied in a timely manner a few weeks not months or completely
>>ignored.
>> >> Maybe some type of voting is reasonable, but we need to do something
>>to
>> >> turn around the patches in clean reasonable manner.
>> >> 
>> >> Think we need some type of group meeting every week to look at the
>> >>patches
>> >> and determining which ones get applied, this gives quick feedback to
>>the
>> >> submitter as to the status of the patch.
>> >> 
>> >I think a group meeting is going to be way too much overhead to manage
>> >properly.
>> >You'll get different people every week with agenda that may not line up
>> >with
>> >code quality, which is really what the review is meant to provide.  I
>> >think
>> 
>> I was only suggesting the maintainers attend the meeting. Of course they
>> have to attend or have someone attend for them, just to get the voting
>> done. If you do not attend then you do not get to vote or something like
>> that is reasonable. Not that we should try and define the process here.
>> 
>If you use multiple subtrees, theres no need for a meeting, or any sort of
>defiend process for voting, theres only an implicitly defined heirarchy of
>acceptance in bundled changesets.  If a desire of the community is to see
>more
>efficient review and lower changeset acceptance latency, it seems to be
>that a
>weekly meeting of any sort is somewhat anathema to that.

That is fine if you do not want a meeting, but just trying to figure out
the best solution to the problems.
>
>> >perhaps a better approach would be to require that that code owners
>>from
>> >the
>> >maintainer file provide and ACK/NAK on their patches within 3-4 days,
>>and
>> >require a corresponding tree maintainer to apply the patch within 7 or
>> >so.  That
>> >would cap our patch latency.  Likewise, if a patch slips in creating a
>> >regression, the author needs to be alerted and given a time window in
>> >which to
>> >fix the problem before the offending patch is reverted during the QE
>> >cycle.
>> >
>> >
>> >> >
>> >> >On the other side, since user questions, community discussions and
>> >> >development happens in the same mailing list, things get really
>> >> >complicated, specially for users seeking for help. Even though I
>>think
>> >> >the average skills of the users of DPDK is generally higher than in
>> >> >other software projects, if DPDK wants to attract more users,
>>having a
>> >> >better user support is key, IMHO.
>> >> >
>> >> >So I would see with good eyes a separation between, at least,
>>dpdk-user
>> >> >and dpdk-dev.
>> >> 
>> >I wouldn't argue with this separation, seems like a reasonable
>>approach.
>> >
>> >> I do not remember seeing too many users on the list and making a list
>> >>just
>> >> for then is OK if everyone is fine with a list that has very few
>>emails.
>> >> >
>> >> >If the number of patches keeps growing, splitting the "dev" mailing
>> >> >lists into different categories (eal and common, pmds, higher level
>> >> >abstractions...) could be an option. However, this last point opens
>>a
>> >> >lot of questions on how to minimize interference between the
>>different
>> >> >parts and API/ABI compatibility during the development.
>> >> 
>> >> I believe if we just make sure we use tags in the subject line then
>>we
>> >>can
>> >> have our email clients do the splitting of the emails instead of
>>adding
>> >> more emails lists.
>> >> 
>> >Agreed
>> >
>> >> >
>> >> >>
>> >> >> Perhaps it means having some ReviewBoard type of tools, a clone in
>> >> >>Github or
>> >> >> Bitbucket where the less hardcore kernel-workflow types could send
>> >>back
>> >> >>their
>> >> >> small bug fixes a bit more easily, this kind of stuff. Google has
>> >>been
>> >> >>getting
>> >> >> good uptake since they moved most of their open source across to
>> >>Github,
>> >> >> because the contribution workflow was more convenient than Google
>> >>Code
>> >> >>was.
>> >> 
>> >> I like GitHub it is a much better designed tool then patchwork, plus
>>it
>> >> could get more eyes as it is very well know to the developer
>>community
>> >>in
>> >> general. I feel GitHub has many advantages over the current systems
>>in
>> >> place but, it does not solve the all patch issues.
>> >> 
>> >Github is actually a bit irritating for this sort of thing, as it
>> >presumes a web
>> >based interface for discussion.  They have some modicum of email
>> >forwarding
>> >enabled, but it has never quite worked right, or integrated properly.
>> 
>> Email forwarding has seemed to work for me and in one case it took a bit
>> to have GitHub stop sending me emails on a repo I did not want anymore
>>:-)
>
>Forwarding works fine, its responding that doesn't usually work well.
>Emails
>from pull requests and issues are forwarded from a 'do not reply' address
>and
>responding requires that you visit the github page in a browser.  Thats
>especially trying with patch review, as there is no real concept of
>patches on a
>list, only pull requests which require that you pull a new branch in and
>review
>it.  Once you have to leave your MUA, your efficiency quickly goes down,
>which
>is what we're trying to avoid here.

OK, I see your point and yes that could be an issue, but not a huge impact
in efficiency IMO. Not everyone and not all of the time will you need to
go to the web site.
>
>
>> >
>> >> The only way we can get patch issues resolved is to put a bit more
>> >>process
>> >> in place.
>> >> >
>> >> >Although I agree, we have to be careful on how github or bitbucket
>>is
>> >> >used. Having issues or even (e.g. github) pull requests *in
>>addition*
>> >>to
>> >> >the normal contribution workflow can be a nightmare to deal with, in
>> >> >terms of synchronization and preventing double work. So I guess
>>setting
>> >> >up an official github or bitbucket mirror would be fine, via some
>> >>simple
>> >> >cronjob, but I guess it would end-up not using PRs or issues in
>>github
>> >> >like the Linux kernel does.
>> >> 
>> >100% agree, we can't be split about this.  Allowing contributions from
>>n
>> >channels just means most developers will only see/reviews 1/nth of the
>> >patches
>> >of interest to them.
>> 
>> If we setup a GitHub or some other site, we would need to make Github
>>the
>> primary site to remove this type of problem IMO.
>> >
>> >> From what I can tell GitHub seems to be a better solution for a free
>> >>open
>> >> environment. Bitbucket I have never used and GitHub seems more
>>popular
>> >> from one article I read.
>> >> 
>> >> 
>> 
>>>>https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UT
>>>>F-
>> >>8#
>> >> q=bitbucket%20vs%20github
>> >> 
>> >> 
>> >> >Btw, is this github organization already registered by Intel or some
>> >> >other company of the community?
>> >> >
>> >> >https://github.com/dpdk
>> >> >
>> 
>> I was hoping someone would own up to the GitHub dpdk site.
>> 
>Hmm, looks almost defunct.  If no one steps up, perhaps reporting abuse
>for
>camping on the name might be worthwhile?  That would at least get their
>attention.

+1
>
>Neil
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c
  @ 2015-04-27 13:44  0%                   ` Neil Horman
  2015-04-27 22:39  3%                     ` Ravi Kerur
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-04-27 13:44 UTC (permalink / raw)
  To: Ravi Kerur; +Cc: dev

On Sat, Apr 25, 2015 at 05:09:01PM -0700, Ravi Kerur wrote:
> On Sat, Apr 25, 2015 at 6:02 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Sat, Apr 25, 2015 at 08:32:42AM -0400, Neil Horman wrote:
> > > On Fri, Apr 24, 2015 at 06:45:06PM -0700, Ravi Kerur wrote:
> > > > On Fri, Apr 24, 2015 at 2:24 PM, Ravi Kerur <rkerur@gmail.com> wrote:
> > > >
> > > > >
> > > > >
> > > > > On Fri, Apr 24, 2015 at 12:51 PM, Neil Horman <nhorman@tuxdriver.com
> > >
> > > > > wrote:
> > > > >
> > > > >> On Fri, Apr 24, 2015 at 12:21:23PM -0700, Ravi Kerur wrote:
> > > > >> > On Fri, Apr 24, 2015 at 11:53 AM, Neil Horman <
> > nhorman@tuxdriver.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > On Fri, Apr 24, 2015 at 09:45:24AM -0700, Ravi Kerur wrote:
> > > > >> > > > On Fri, Apr 24, 2015 at 8:22 AM, Neil Horman <
> > nhorman@tuxdriver.com
> > > > >> >
> > > > >> > > wrote:
> > > > >> > > >
> > > > >> > > > > On Fri, Apr 24, 2015 at 08:14:04AM -0700, Ravi Kerur wrote:
> > > > >> > > > > > On Fri, Apr 24, 2015 at 6:51 AM, Neil Horman <
> > > > >> nhorman@tuxdriver.com>
> > > > >> > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > On Thu, Apr 23, 2015 at 02:35:31PM -0700, Ravi Kerur
> > wrote:
> > > > >> > > > > > > > Changes in v7
> > > > >> > > > > > > > Remove _setname_ pthread calls.
> > > > >> > > > > > > > Use rte_gettid() API in RTE_LOG to print thread_id.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Changes in v6
> > > > >> > > > > > > > Remove RTE_EXEC_ENV_BSDAPP from eal_common_thread.c
> > file.
> > > > >> > > > > > > > Add pthread_setname_np/pthread_set_name_np for
> > Linux/FreeBSD
> > > > >> > > > > > > > respectively. Plan to use _getname_ in RTE_LOG when
> > > > >> available.
> > > > >> > > > > > > > Use existing rte_get_systid() in RTE_LOG to print
> > thread_id.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Changes in v5
> > > > >> > > > > > > > Rebase to latest code.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Changes in v4
> > > > >> > > > > > > > None
> > > > >> > > > > > > >
> > > > >> > > > > > > > Changes in v3
> > > > >> > > > > > > > Changed subject to be more explicit on file name
> > inclusion.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Changes in v2
> > > > >> > > > > > > > None
> > > > >> > > > > > > >
> > > > >> > > > > > > > Changes in v1
> > > > >> > > > > > > > eal_thread.c has minor differences between Linux and
> > BSD,
> > > > >> move
> > > > >> > > > > > > > entire file into common directory.
> > > > >> > > > > > > > Use RTE_EXEC_ENV_BSDAPP to differentiate on minor
> > > > >> differences.
> > > > >> > > > > > > > Rename eal_thread.c to eal_common_thread.c
> > > > >> > > > > > > > Makefile changes to reflect file move and name change.
> > > > >> > > > > > > > Fix checkpatch warnings.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Signed-off-by: Ravi Kerur <rkerur@gmail.com>
> > > > >> > > > > > > > ---
> > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/Makefile        |   2 +-
> > > > >> > > > > > > >  lib/librte_eal/bsdapp/eal/eal_thread.c    | 152
> > > > >> > > > > > > ------------------------------
> > > > >> > > > > > > >  lib/librte_eal/common/eal_common_thread.c | 147
> > > > >> > > > > > > ++++++++++++++++++++++++++++-
> > > > >> > > > > > > >  lib/librte_eal/linuxapp/eal/eal_thread.c  | 152
> > > > >> > > > > > > +-----------------------------
> > > > >> > > > > > > >  4 files changed, 148 insertions(+), 305 deletions(-)
> > > > >> > > > > > > >
> > > > >> > > > > > > > diff --git a/lib/librte_eal/bsdapp/eal/Makefile
> > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/Makefile
> > > > >> > > > > > > > index 2357cfa..55971b9 100644
> > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/Makefile
> > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/Makefile
> > > > >> > > > > > > > @@ -87,7 +87,7 @@ CFLAGS_eal_common_log.o :=
> > -D_GNU_SOURCE
> > > > >> > > > > > > >  # workaround for a gcc bug with noreturn attribute
> > > > >> > > > > > > >  # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> > > > >> > > > > > > >  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > > > >> > > > > > > > -CFLAGS_eal_thread.o += -Wno-return-type
> > > > >> > > > > > > > +CFLAGS_eal_common_thread.o += -Wno-return-type
> > > > >> > > > > > > >  CFLAGS_eal_hpet.o += -Wno-return-type
> > > > >> > > > > > > >  endif
> > > > >> > > > > > > >
> > > > >> > > > > > > > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > >> > > > > > > b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > >> > > > > > > > index 9a03437..5714b8f 100644
> > > > >> > > > > > > > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > >> > > > > > > > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > > > >> > > > > > > > @@ -35,163 +35,11 @@
> > > > >> > > > > > > >  #include <stdio.h>
> > > > >> > > > > > > >  #include <stdlib.h>
> > > > >> > > > > > > >  #include <stdint.h>
> > > > >> > > > > > > > -#include <unistd.h>
> > > > >> > > > > > > > -#include <sched.h>
> > > > >> > > > > > > > -#include <pthread_np.h>
> > > > >> > > > > > > > -#include <sys/queue.h>
> > > > >> > > > > > > >  #include <sys/thr.h>
> > > > >> > > > > > > >
> > > > >> > > > > > > > -#include <rte_debug.h>
> > > > >> > > > > > > > -#include <rte_atomic.h>
> > > > >> > > > > > > > -#include <rte_launch.h>
> > > > >> > > > > > > > -#include <rte_log.h>
> > > > >> > > > > > > > -#include <rte_memory.h>
> > > > >> > > > > > > > -#include <rte_memzone.h>
> > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > >> > > > > > > > -#include <rte_eal.h>
> > > > >> > > > > > > > -#include <rte_per_lcore.h>
> > > > >> > > > > > > > -#include <rte_lcore.h>
> > > > >> > > > > > > > -
> > > > >> > > > > > > >  #include "eal_private.h"
> > > > >> > > > > > > >  #include "eal_thread.h"
> > > > >> > > > > > > >
> > > > >> > > > > > > > -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > LCORE_ID_ANY;
> > > > >> > > > > > > NAK, these are exported symbols, you can't remove them
> > without
> > > > >> > > going
> > > > >> > > > > > > through the
> > > > >> > > > > > > deprecation process.
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > They are not removed/deleted, they are moved from
> > eal_thread.c
> > > > >> to
> > > > >> > > > > > eal_common_thread.c file since it is common to both Linux
> > and
> > > > >> BSD.
> > > > >> > > > > >
> > > > >> > > > > Then perhaps you forgot to export the symbol?  Its showing
> > up as
> > > > >> > > removed
> > > > >> > > > > on the
> > > > >> > > > > ABI checker utility.
> > > > >> > > > >
> > > > >> > > > > Neil
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > > Can you please show me in the current code where it is being
> > > > >> exported? I
> > > > >> > > > have only moved definitions to _common_ files, not sure why it
> > > > >> should be
> > > > >> > > > exported now.  I searched in the current code for
> > > > >> RTE_DEFINE_PER_LCORE
> > > > >> > > >
> > > > >> > > > #home/rkerur/dpdk-tmp/dpdk# grep -ir RTE_DEFINE_PER_LCORE *
> > > > >> > > > app/test/test_per_lcore.c:static
> > RTE_DEFINE_PER_LCORE(unsigned,
> > > > >> test) =
> > > > >> > > > 0x12345678;
> > > > >> > > >
> > > > >>
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > >> > > >
> > > > >>
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > >> > > >
> > > > >> > >
> > > > >>
> > lib/librte_eal/linuxapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > >> > > > _cpuset);
> > > > >> > > >
> > > > >>
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > >> > > > _lcore_id) = LCORE_ID_ANY;
> > > > >> > > >
> > > > >>
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(unsigned,
> > > > >> > > > _socket_id) = (unsigned)SOCKET_ID_ANY;
> > > > >> > > >
> > > > >>
> > lib/librte_eal/bsdapp/eal/eal_thread.c:RTE_DEFINE_PER_LCORE(rte_cpuset_t,
> > > > >> > > > _cpuset);
> > > > >> > > > lib/librte_eal/common/include/rte_per_lcore.h:#define
> > > > >> > > > RTE_DEFINE_PER_LCORE(type, name)            \
> > > > >> > > > lib/librte_eal/common/include/rte_eal.h:    static
> > > > >> > > > RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
> > > > >> > > >
> > lib/librte_eal/common/eal_common_errno.c:RTE_DEFINE_PER_LCORE(int,
> > > > >> > > > _rte_errno);
> > > > >> > > > lib/librte_eal/common/eal_common_errno.c:    static
> > > > >> > > > RTE_DEFINE_PER_LCORE(char[RETVAL_SZ], retval);
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > > > Thanks
> > > > >> > > > > > Ravi
> > > > >> > > > > >
> > > > >> > > > > > Regards
> > > > >> > > > > > > Neil
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > >
> > > > >> > > Its exported in the version map file:
> > > > >> > >  per_lcore__lcore_id;
> > > > >> > >
> > > > >> > >
> > > > >> > Thanks Neil, I checked and both linux and bsd rte_eal_version.map
> > have
> > > > >> it.
> > > > >> > I compared .map file between "changed code" and the original,
> > they are
> > > > >> same
> > > > >> > for both linux and bsd. In fact you had ACK'd v4 version of this
> > patch
> > > > >> > series and no major changes after that. Please let me know if I
> > missed
> > > > >> > something.
> > > > >> >
> > > > >> I did, and I'm retracting that, because I didn't think to check the
> > ABI
> > > > >> compatibility on this.  But I ran it throught the ABI checking
> > script
> > > > >> this and
> > > > >> this error popped out.  You should run it as well, its in the
> > scripts
> > > > >> directory.
> > > > >>
> > > > >>
> > > > >> I see in your first patch you removed it and re-added it in the
> > common
> > > > >> section.
> > > > >> But something about how its building is causing it to not show up
> > as an
> > > > >> exported
> > > > >> symbol, which is problematic, as other applications are going to
> > want
> > > > >> access to
> > > > >> it.
> > > > >>
> > > > >> It also possible that the ABI checker is throwing a false positive,
> > but
> > > > >> either
> > > > >> way, it needs to be looked into prior to moving forward with this.
> > > > >>
> > > > >>
> > > > > I did following things.
> > > > >
> > > > > Put a tag (v2.0.0-before-common-eal)  before EAL common functions
> > changes
> > > > > for commit (3c0c807038ad642f4be7deb9370293c39d12f029 net: remove
> > unneeded
> > > > > include)
> > > > >
> > > > > Put a tag (v2.0.0-common-eal) after EAL common functions changes for
> > > > > commit (25737e5a7212630a7b5d8ca756860a062f403789 Move common
> > functions in
> > > > > eal_pci.c)
> > > > >
> > > > > Ran validate-abi against x86_64-native-linuxapp-gcc and
> > > > >
> > > > > v2.0.0-rc3 and v2.0.0-before-common-eal, html report for
> > librte_eal.so
> > > > > shows removed symbols for "per_lcore__cpuset"
> > > > >
> > > > > v2.0.0-rc3 and v2.0.0-common-eal, html report for librte_eal.so shows
> > > > > removed symbols for "per_lcore__cpuset"
> > > > >
> > > > > Removed symbol is different from what you have reported and in my
> > case I
> > > > > see it even before my commit. If you are interested I can unicast
> > you html
> > > > > report file. Please let me know how to proceed.
> > > > >
> > > > >
> > > >
> > > > I did some experiment and found some interesting things.  I will take
> > eal.c
> > > > as an example
> > > >
> > > > eal.c is split into eal_common_sysfs.c eal_common_mem_cfg.c
> > > > eal_common_proc_type.c and eal_common_app_usage.c. In
> > linuxapp/eal/Makefile
> > > > if I compile new files right after eal.c as shown below
> > > >
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) := eal.c
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_sysfs.c
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_mem_cfg.c
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_proc_type.c
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_app_usage.c
> > > > ...
> > > >
> > > > validate-abi results matches baseline. Instead if i place new _common_
> > > > files in common area in linuxapp/eal/Makefile as shown below
> > > >
> > > > # from common dir
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_memzone.c
> > > > SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_log.c
> > > > ...
> > > >
> > > > validate-abi reports problem in binary compatibility and source
> > > > compatiblity
> > > >
> > > > eal_filesystem.h, librte_eal.so.1
> > > >  [+] eal_parse_sysfs_value ( char const* filename, unsigned long* val )
> > > >  @@ DPDK_2.0 (2)
> > > >
> > > > I believe files in common and linuxapp directory are compiled same way
> > so
> > > > not sure why placement in makefile makes difference.
> > > >
> > > > Could this be false-positive from validate-abi script??
> > > >
> > > It could be, yes.  Though I'm more inclined to think that perhaps in the
> > new
> > > version of the code we're not generating ithe same dwarf information out
> > of it.
> > > In fact for some reason, I've checked both the build before and after
> > your
> > > patch series, and the exported CFLAGS aren't getting passed to the build
> > > properly, implying that we're not building all the code in the validator
> > with
> > > the -g flag, which the validator need to function properly.  I'm looking
> > into
> > > that
> > > Neil
> > >
> > >
> > Found the problem, I was stupidly reading the report incorrectly.  The
> > problem
> > regarding _lcore_id is a source compatibilty issue (because the symbol
> > moved to
> > a new location), which is irrelevant to us.  Its not in any way a binary
> > compat
> > problem, which is what we care about.  Sorry for the noise.
> >
> > I do still have a few concerns about some changed calling conventions with
> > a few
> > other functions, which I'll look into on monday.
> >
> >
> Please let me know your inputs on changed calling conventions. Most of them
> can be fixed by re-arranging moved code in _common_ files and order of
> compilation.
> 
If moving the order of compliation around fixes the problem, then I am
reasonably convinced that it is, if not a false positive, a minor issue with the
compilers dwarf information (The compiler just can't sanely change the location
in which parameters are passed).  If you make those changes, I'll ACK them, and
look into whats going on with the calling conventions

Thanks!
Neil

> Thanks,
> Ravi
> 
> Regards
> > Neil
> >
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Beyond DPDK 2.0
  2015-04-27  9:52  0%               ` Marc Sune
@ 2015-04-27 13:39  0%                 ` Wiles, Keith
  2015-04-27 15:34  0%                   ` Marc Sune
  0 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2015-04-27 13:39 UTC (permalink / raw)
  To: Marc Sune, Neil Horman; +Cc: dev



On 4/27/15, 4:52 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:

>
>
>On 27/04/15 03:41, Wiles, Keith wrote:
>>
>> On 4/26/15, 4:56 PM, "Neil Horman" <nhorman@tuxdriver.com> wrote:
>>
>>> On Sat, Apr 25, 2015 at 04:08:23PM +0000, Wiles, Keith wrote:
>>>>
>>>> On 4/25/15, 8:30 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
>>>>
>>>>>
>>>>> On 24/04/15 19:51, Matthew Hall wrote:
>>>>>> On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
>>>>>>> I can tell you that if DPDK were GPL-based, my company wouldn't be
>>>>>>> using
>>>>>>> it. I suspect we wouldn't be the only ones...
>>>>>>>
>>>>>>> Jay
>>>>>> I could second this, from the past employer where I used it. Right
>>>> now
>>>>>> I am
>>>>>> using it in an open source app, I have a bit of GPL here and there
>>>> but
>>>>>> I'm
>>>>>> trying to get rid of it or confine it to separate address spaces,
>>>> where
>>>>>> it
>>>>>> won't impact the core code written around DPDK, as I don't want to
>>>> cause
>>>>>> headaches for any downstream users I attract someday.
>>>>>>
>>>>>> Hard-core GPL would not be possible for most. LGPL could be
>>>>>>possible,
>>>>>> but I
>>>>>> don't think it could be worth the relicensing headache for that
>>>>>>small
>>>>>> change.
>>>>>>
>>>>>> Instead we should make the patch process as easy as humanly possible
>>>> so
>>>>>> people
>>>>>> are encouraged to send us the fixes and not cart them around their
>>>>>> companies
>>>>>> constantly.
>>>> +1 and besides the GPL or LGPL ship has sailed IMHO and we can not go
>>>> back.
>>> Actually, IANAL, but I think we can.  The BSD license allows us to fork
>>> and
>>> relicense the code I think, under GPL or any other license.  I'm not
>>> advocating
>>> for that mind you, just suggesting that its possible should it ever
>>>become
>>> needed.
>>>
>>>>> I agree. My feeling is that as the number of patches in the mailing
>>>> list
>>>>> grows, keeping track of them gets more and more complicated.
>>>>>Patchwork
>>>>> website was a way to try to address this issue. I think it was an
>>>>> improvement, but to be honest, patchwork lacks a lot of
>>>>>functionality,
>>>>> such as properly tracking multiple versions of the patch (superseding
>>>>> them automatically), and it lacks some filtering capabilities e.g.
>>>>>per
>>>>> user, per tag/label or library, automatically track if it has been
>>>>> merged, give an overall status of the pending vs merged patches, set
>>>>> milestones... Is there any alternative tool or improved version for
>>>> that?
>>>>
>>> Agreed, this has come up before, off list unfortunately.  The volume of
>>> patches
>>> seems to be increasing at such a rate that a single maintainer has
>>> difficulty
>>> keeping up.  I proposed that the workload be split out to multiple
>>> subtrees,
>>> with prefixes being added to patch subjects on the list for local
>>> filtering to
>>> stem the tide.  Specifically I had proposed that the PMD's be split
>>>into a
>>> separate subtree, but that received pushback in favor of having each
>>> library
>>> having its own separate subtree, with a pilot program being made out of
>>> the I40e
>>> driver (which you might note sends pull requests to the list now).  I'd
>>> still
>>> like to see all PMD's come under a single subtree, but thats likely an
>>> argument
>>> for later.
>>>
>>> That said, Do you think that this patch latency is really a contributor
>>> to low
>>> project participation?  It definately a problem, but it seems to me
>>>that
>>> this
>>> sort of issue would lead to people trying to parcitipate, then giving
>>>up
>>> (i.e.
>>> we would see 1-2 emails from an individual, then not see them again).
>>> I'd need
>>> to look through the mailing list for such a pattern, but anecdotally
>>>I've
>>> not
>>> seen that happen.  The problem you describe above is definately a
>>> problem, but
>>> its one for those individuals who are participating, not for those who
>>>are
>>> simply choosing not to.  And I think we need to address both.
>>>
>>>> I agree patchwork has some limitation, but I think the biggest issue
>>>>is
>>>> keeping up with the patches. Getting patches introduced into the main
>>>> line
>>>> is very slow. A patch submitted today may not get applied for weeks or
>>>> months, then when another person submits a patch he is starting to
>>>>run a
>>>> very high risk of having to redo that patch, because a pervious patch
>>>> makes his fail weeks/months later. I would love to see a better tool
>>>> then
>>>> patchwork, but the biggest issue is we have a huge backlog of patches.
>>>> Personally I am not sure how Thomas or any is able to keep up with the
>>>> patches.
>>>>
>>> This is absolutely a problem.  I'd like to think, more than a tool like
>>> patchwork, a subtree organization to allow some modicum of parallel
>>> review and
>>> integration would really be a benefit here.
>> Subtrees could work, but the real problem I think is the number of
>> committers must be higher then one. Something like GitHub (and I assume
>> Linux Foundation) have a method to add committers to a project. In the
>> case of GitHub they just have to have a free GitHub account and they can
>> become committers of the project buying the owner of the project enables
>> them.
>>
>> On GitHub they have personal accounts and organization accounts I know
>> only about the personal accounts, but they allow for 5 private repos and
>> any number of public repos. The organization account has a lot of extra
>> features that seem better for a DPDK community IMO and should be the one
>> we use if we decide it is the right direction. We can always give it a
>> shot for while and keep the dpdk.org and use dev@dpdk.org and its repo
>> mirrored from GitHub as a transition phase. This way we can fall back to
>> dpdk.org or move one to something else if we like.
>>
>> https://help.github.com/categories/organizations/
>>
>> The developers could still send patches via email list, but creating a
>> repo and forking dpdk is easy, then send a pull request.
>
>For the github "community" or free service, organization accounts just
>allow you to set teams, where each time can be assigned to one or more
>repositories. The differences are summarized here:
>
>https://help.github.com/articles/what-s-the-difference-between-user-and-or
>ganization-accounts/
>
>And the permission schema, per team, is summarized here:
>
>https://help.github.com/articles/permission-levels-for-an-organization-rep
>ository/
>
>Some limitations: i) only if the team has write permissions (IOW push
>permissions) you can manage issues ii) there cannot be per-branch ACLs.

I was assuming the organization GitHub is just to allow more then one
admin/maintainers along with teams if needed. I would assume the repos are
still public and others are allowed to fork or pull the repos. I think of
the org version is just extra controls on top of a personal repo like
design. The org/personal one should appear to the
non-maintainers/admins/owner as a normal repo on GitHub, correct?

The GitHub organization is built for open-source and you can still have
private repos, but then you start to have a cost depending on the number
of private repos you want. If you do not have a lot of private repos then
you should have no cost (I think). I do not see any reason for private
repos, but I guest we could have some and we get 5 free and 10 is $25 per
month.
>
>>
>>
>>>> The other problem I see is how patches are agreed on to be included in
>>>> the
>>>> mainline. Today it is just an ACK or a NAK on the mailing list. Then I
>>>> see
>>>> what I think to be only a few people ACKing or NAKing patches. This
>>>> process has a lot of problems from a patch being ignore for some
>>>>reason
>>>> or
>>>> someone having negative feed back on very minor detail or no way to
>>>> push a
>>>> patch forward a single NAK or comment.
>>>>
>>> So, this is an interesting issue in ideal meritocracies.  Currently
>>> is/should be
>>> looking for ACKs/NAK/s from the individuals listed in the MAINTAINER
>>> files, and
>>> those people should be the definitive subject matter experts on the
>>>code
>>> they
>>> cover.  As such, I would agrue that they should be entitled to a
>>>modicum
>>> of
>>> stylistic/trivial leeway.  That is to say, if they choose to block a
>>>patch
>>> around a very minor detail, then between them changing their position,
>>> and the
>>> patch author changing the code, the latter is likely the easier course
>>>of
>>> action, especially if the author can't make an argument for their
>>> position.
>>> That said, if such patch blockage becomes so egregious that individuals
>>> stop
>>> contributing, that needs to be known as well.  If you as a patch
>>>author:
>>>
>>> 1) Have tried to submit patches
>>> 2) Had them blocked for what you consider trivial reasons
>>> 3) Plan to not contribute further because of this
>>> 4) Still rely on the DPDK for your product
>>>
>>> Please, say something.  People in charge need to know when they're
>>>pushing
>>> contributors away.
>>>
>>> FWIW, I've tried to do some correlation between the git history and the
>>> mailing
>>> list.  I need to do more searches, but I have a feeling that early on,
>>>the
>>> majority of people who stopped contributing, did so because their
>>>patches
>>> weren't expressely blocked, but rather because they were simply
>>>ignored.
>>> No one
>>> working on DPDK bothered to review those patches, and so they never got
>>> merged.
>>> Hopefully that problem has been addressed somewhat now.
>I agree 100%
>>>
>>>> I would like to see some type of layering process to allow patches to
>>>>be
>>>> applied in a timely manner a few weeks not months or completely
>>>>ignored.
>>>> Maybe some type of voting is reasonable, but we need to do something
>>>>to
>>>> turn around the patches in clean reasonable manner.
>>>>
>>>> Think we need some type of group meeting every week to look at the
>>>> patches
>>>> and determining which ones get applied, this gives quick feedback to
>>>>the
>>>> submitter as to the status of the patch.
>>>>
>>> I think a group meeting is going to be way too much overhead to manage
>>> properly.
>>> You'll get different people every week with agenda that may not line up
>>> with
>>> code quality, which is really what the review is meant to provide.  I
>>> think
>> I was only suggesting the maintainers attend the meeting. Of course they
>> have to attend or have someone attend for them, just to get the voting
>> done. If you do not attend then you do not get to vote or something like
>> that is reasonable. Not that we should try and define the process here.
>>
>>> perhaps a better approach would be to require that that code owners
>>>from
>>> the
>>> maintainer file provide and ACK/NAK on their patches within 3-4 days,
>>>and
>>> require a corresponding tree maintainer to apply the patch within 7 or
>>> so.  That
>>> would cap our patch latency.  Likewise, if a patch slips in creating a
>>> regression, the author needs to be alerted and given a time window in
>>> which to
>>> fix the problem before the offending patch is reverted during the QE
>>> cycle.
>>>
>>>
>>>>> On the other side, since user questions, community discussions and
>>>>> development happens in the same mailing list, things get really
>>>>> complicated, specially for users seeking for help. Even though I
>>>>>think
>>>>> the average skills of the users of DPDK is generally higher than in
>>>>> other software projects, if DPDK wants to attract more users, having
>>>>>a
>>>>> better user support is key, IMHO.
>>>>>
>>>>> So I would see with good eyes a separation between, at least,
>>>>>dpdk-user
>>>>> and dpdk-dev.
>>> I wouldn't argue with this separation, seems like a reasonable
>>>approach.
>>>
>>>> I do not remember seeing too many users on the list and making a list
>>>> just
>>>> for then is OK if everyone is fine with a list that has very few
>>>>emails.
>>>>> If the number of patches keeps growing, splitting the "dev" mailing
>>>>> lists into different categories (eal and common, pmds, higher level
>>>>> abstractions...) could be an option. However, this last point opens a
>>>>> lot of questions on how to minimize interference between the
>>>>>different
>>>>> parts and API/ABI compatibility during the development.
>>>> I believe if we just make sure we use tags in the subject line then we
>>>> can
>>>> have our email clients do the splitting of the emails instead of
>>>>adding
>>>> more emails lists.
>>>>
>>> Agreed
>
>I think it is a good idea too. Maybe we can standardize some format e.g.
>[TAG][PATCH vX], or something like that.
>
>>>
>>>>>> Perhaps it means having some ReviewBoard type of tools, a clone in
>>>>>> Github or
>>>>>> Bitbucket where the less hardcore kernel-workflow types could send
>>>> back
>>>>>> their
>>>>>> small bug fixes a bit more easily, this kind of stuff. Google has
>>>> been
>>>>>> getting
>>>>>> good uptake since they moved most of their open source across to
>>>> Github,
>>>>>> because the contribution workflow was more convenient than Google
>>>> Code
>>>>>> was.
>>>> I like GitHub it is a much better designed tool then patchwork, plus
>>>>it
>>>> could get more eyes as it is very well know to the developer community
>>>> in
>>>> general. I feel GitHub has many advantages over the current systems in
>>>> place but, it does not solve the all patch issues.
>>>>
>>> Github is actually a bit irritating for this sort of thing, as it
>>> presumes a web
>>> based interface for discussion.  They have some modicum of email
>>> forwarding
>>> enabled, but it has never quite worked right, or integrated properly.
>
>An alternative to githubs and bitbuckets is a self-hosted forge, like
>gitlab:
>
>https://about.gitlab.com/
>
>To be honest, I mostly work on open-source repositories, and in our
>organization we use only gitlab for private repositories, so I haven't
>played that much with it. But it seems to do its job and has almost all
>of the features of the "community" github, if not more. I don't know if
>you can even integrate it with github's accounts somehow, to prevent to
>have to register.
>
>However, one of the important points of using github/bitbucket is
>visibility and ease the contribution process. By using an self-hosted
>solution, even if it is similar to github and well advertised in DPDK's
>website, you kind of loose part of that advantage.

I would suggest we use GitHub then picking yet another not as well know
Git Repo system, if we decide to change.
>
>> Email forwarding has seemed to work for me and in one case it took a bit
>> to have GitHub stop sending me emails on a repo I did not want anymore
>>:-)
>>>> The only way we can get patch issues resolved is to put a bit more
>>>> process
>>>> in place.
>>>>> Although I agree, we have to be careful on how github or bitbucket is
>>>>> used. Having issues or even (e.g. github) pull requests *in addition*
>>>> to
>>>>> the normal contribution workflow can be a nightmare to deal with, in
>>>>> terms of synchronization and preventing double work. So I guess
>>>>>setting
>>>>> up an official github or bitbucket mirror would be fine, via some
>>>> simple
>>>>> cronjob, but I guess it would end-up not using PRs or issues in
>>>>>github
>>>>> like the Linux kernel does.
>>> 100% agree, we can't be split about this.  Allowing contributions from
>>>n
>>> channels just means most developers will only see/reviews 1/nth of the
>>> patches
>>> of interest to them.
>> If we setup a GitHub or some other site, we would need to make Github
>>the
>> primary site to remove this type of problem IMO.
>
>You mean changing the workflow from email based to issues and pull-req
>or github pull req? Do you really think this is possible?

Yes, I think pull-req is the standard GitHub method as everyone needs a
repo anyway. If we can figure out how to integrate the email patches that
would be great.
>
>>>>  From what I can tell GitHub seems to be a better solution for a free
>>>> open
>>>> environment. Bitbucket I have never used and GitHub seems more popular
>>>> from one article I read.
>>>>
>>>>
>>>> 
>>>>https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UT
>>>>F-
>>>> 8#
>>>> q=bitbucket%20vs%20github
>>>>
>>>>
>>>>> Btw, is this github organization already registered by Intel or some
>>>>> other company of the community?
>>>>>
>>>>> https://github.com/dpdk
>>>>>
>> I was hoping someone would own up to the GitHub dpdk site.
>
>Just wanted to know if this was the case. But, even if that would not be
>the case, I *guess* that, as it happens with other services like
>twitter, facebook..., Intel could claim the user, since it has the
>registered trademark.
>
>marc
>
>>
>>>>> Marc
>>>> If we can used the above that would be great, but a name like
>>>> Œdpdk-community¹ or something could work too.
>>>>
>>>> We can host the web site here and have many sub-projects like
>>>> Pktgen-DPDK
>>>> :-) under the same page. Not to say anything bad about our current web
>>>> pages as I find it difficult to use sometimes and find things like
>>>> patchwork link. Maintaining a web site is a full time job and GitHub
>>>> does
>>>> maintain the site, plus we can collaborate on host web page on the
>>>> GitHub
>>>> site easier.
>>>>
>>>> Moving to the Linux Foundation is an option as well as it is very well
>>>> know and has some nice ways to get your project promoted. It does
>>>>have a
>>>> few drawbacks in process handling and cost to state a few. The process
>>>> model is all ready defined, which is good and bad it just depends on
>>>> your
>>>> needs IMO.
>>>>
>>>> Regards,
>>>> ++Keith
>>>>
>>>>>> Matthew.
>>>>
>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Beyond DPDK 2.0
       [not found]                 ` <D162FA4E.1DED8%keith.wiles@intel.com>
  2015-04-27  9:52  0%               ` Marc Sune
@ 2015-04-27 10:29  0%               ` Neil Horman
  2015-04-27 13:50  0%                 ` Wiles, Keith
  1 sibling, 1 reply; 200+ results
From: Neil Horman @ 2015-04-27 10:29 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: dev

On Mon, Apr 27, 2015 at 01:41:11AM +0000, Wiles, Keith wrote:
> 
> 
> On 4/26/15, 4:56 PM, "Neil Horman" <nhorman@tuxdriver.com> wrote:
> 
> >On Sat, Apr 25, 2015 at 04:08:23PM +0000, Wiles, Keith wrote:
> >> 
> >> 
> >> On 4/25/15, 8:30 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
> >> 
> >> >
> >> >
> >> >On 24/04/15 19:51, Matthew Hall wrote:
> >> >> On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
> >> >>> I can tell you that if DPDK were GPL-based, my company wouldn't be
> >> >>>using
> >> >>> it. I suspect we wouldn't be the only ones...
> >> >>>
> >> >>> Jay
> >> >> I could second this, from the past employer where I used it. Right
> >>now
> >> >>I am
> >> >> using it in an open source app, I have a bit of GPL here and there
> >>but
> >> >>I'm
> >> >> trying to get rid of it or confine it to separate address spaces,
> >>where
> >> >>it
> >> >> won't impact the core code written around DPDK, as I don't want to
> >>cause
> >> >> headaches for any downstream users I attract someday.
> >> >>
> >> >> Hard-core GPL would not be possible for most. LGPL could be possible,
> >> >>but I
> >> >> don't think it could be worth the relicensing headache for that small
> >> >>change.
> >> >>
> >> >> Instead we should make the patch process as easy as humanly possible
> >>so
> >> >>people
> >> >> are encouraged to send us the fixes and not cart them around their
> >> >>companies
> >> >> constantly.
> >> 
> >> +1 and besides the GPL or LGPL ship has sailed IMHO and we can not go
> >>back.
> >Actually, IANAL, but I think we can.  The BSD license allows us to fork
> >and
> >relicense the code I think, under GPL or any other license.  I'm not
> >advocating
> >for that mind you, just suggesting that its possible should it ever become
> >needed.
> >
> >> >
> >> >I agree. My feeling is that as the number of patches in the mailing
> >>list
> >> >grows, keeping track of them gets more and more complicated. Patchwork
> >> >website was a way to try to address this issue. I think it was an
> >> >improvement, but to be honest, patchwork lacks a lot of functionality,
> >> >such as properly tracking multiple versions of the patch (superseding
> >> >them automatically), and it lacks some filtering capabilities e.g. per
> >> >user, per tag/label or library, automatically track if it has been
> >> >merged, give an overall status of the pending vs merged patches, set
> >> >milestones... Is there any alternative tool or improved version for
> >>that?
> >> 
> >Agreed, this has come up before, off list unfortunately.  The volume of
> >patches
> >seems to be increasing at such a rate that a single maintainer has
> >difficulty
> >keeping up.  I proposed that the workload be split out to multiple
> >subtrees,
> >with prefixes being added to patch subjects on the list for local
> >filtering to
> >stem the tide.  Specifically I had proposed that the PMD's be split into a
> >separate subtree, but that received pushback in favor of having each
> >library
> >having its own separate subtree, with a pilot program being made out of
> >the I40e
> >driver (which you might note sends pull requests to the list now).  I'd
> >still
> >like to see all PMD's come under a single subtree, but thats likely an
> >argument
> >for later.
> >
> >That said, Do you think that this patch latency is really a contributor
> >to low
> >project participation?  It definately a problem, but it seems to me that
> >this
> >sort of issue would lead to people trying to parcitipate, then giving up
> >(i.e.
> >we would see 1-2 emails from an individual, then not see them again).
> >I'd need
> >to look through the mailing list for such a pattern, but anecdotally I've
> >not
> >seen that happen.  The problem you describe above is definately a
> >problem, but
> >its one for those individuals who are participating, not for those who are
> >simply choosing not to.  And I think we need to address both.
> >
> >> I agree patchwork has some limitation, but I think the biggest issue is
> >> keeping up with the patches. Getting patches introduced into the main
> >>line
> >> is very slow. A patch submitted today may not get applied for weeks or
> >> months, then when another person submits a patch he is starting to run a
> >> very high risk of having to redo that patch, because a pervious patch
> >> makes his fail weeks/months later. I would love to see a better tool
> >>then
> >> patchwork, but the biggest issue is we have a huge backlog of patches.
> >> Personally I am not sure how Thomas or any is able to keep up with the
> >> patches.
> >> 
> >This is absolutely a problem.  I'd like to think, more than a tool like
> >patchwork, a subtree organization to allow some modicum of parallel
> >review and
> >integration would really be a benefit here.
> Subtrees could work, but the real problem I think is the number of
> committers must be higher then one. Something like GitHub (and I assume
> Linux Foundation) have a method to add committers to a project. In the
> case of GitHub they just have to have a free GitHub account and they can
> become committers of the project buying the owner of the project enables
> them.
> 
> On GitHub they have personal accounts and organization accounts I know
> only about the personal accounts, but they allow for 5 private repos and
> any number of public repos. The organization account has a lot of extra
> features that seem better for a DPDK community IMO and should be the one
> we use if we decide it is the right direction. We can always give it a
> shot for while and keep the dpdk.org and use dev@dpdk.org and its repo
> mirrored from GitHub as a transition phase. This way we can fall back to
> dpdk.org or move one to something else if we like.
> 
> https://help.github.com/categories/organizations/
> 
> The developers could still send patches via email list, but creating a
> repo and forking dpdk is easy, then send a pull request.
> 
I'm not opposed to github per-se, but nothing described above is unique to
github. Theres no reason we can't allow multiple comitters to the current tree
as hosted on the current server, we just have to configure it as such.

And FWIW, the assumption is that, with multiple subtrees, you implicitly have
multiple comitters, assuming that pull requests from those subtree maintainers
are trusted by the top level tree maintainer.

In fact I feel somewhat better about that model as it provides a nice stairstep
integration path for new features.

Not explicitly opposed to a movement to github, I just feel like it may not
address the problem at hand.

> 
> >
> >> The other problem I see is how patches are agreed on to be included in
> >>the
> >> mainline. Today it is just an ACK or a NAK on the mailing list. Then I
> >>see
> >> what I think to be only a few people ACKing or NAKing patches. This
> >> process has a lot of problems from a patch being ignore for some reason
> >>or
> >> someone having negative feed back on very minor detail or no way to
> >>push a
> >> patch forward a single NAK or comment.
> >> 
> >
> >So, this is an interesting issue in ideal meritocracies.  Currently
> >is/should be
> >looking for ACKs/NAK/s from the individuals listed in the MAINTAINER
> >files, and
> >those people should be the definitive subject matter experts on the code
> >they
> >cover.  As such, I would agrue that they should be entitled to a modicum
> >of
> >stylistic/trivial leeway.  That is to say, if they choose to block a patch
> >around a very minor detail, then between them changing their position,
> >and the
> >patch author changing the code, the latter is likely the easier course of
> >action, especially if the author can't make an argument for their
> >position.
> >That said, if such patch blockage becomes so egregious that individuals
> >stop
> >contributing, that needs to be known as well.  If you as a patch author:
> >
> >1) Have tried to submit patches
> >2) Had them blocked for what you consider trivial reasons
> >3) Plan to not contribute further because of this
> >4) Still rely on the DPDK for your product
> >
> >Please, say something.  People in charge need to know when they're pushing
> >contributors away.
> >
> >FWIW, I've tried to do some correlation between the git history and the
> >mailing
> >list.  I need to do more searches, but I have a feeling that early on, the
> >majority of people who stopped contributing, did so because their patches
> >weren't expressely blocked, but rather because they were simply ignored.
> >No one
> >working on DPDK bothered to review those patches, and so they never got
> >merged.
> >Hopefully that problem has been addressed somewhat now.
> >
> >> I would like to see some type of layering process to allow patches to be
> >> applied in a timely manner a few weeks not months or completely ignored.
> >> Maybe some type of voting is reasonable, but we need to do something to
> >> turn around the patches in clean reasonable manner.
> >> 
> >> Think we need some type of group meeting every week to look at the
> >>patches
> >> and determining which ones get applied, this gives quick feedback to the
> >> submitter as to the status of the patch.
> >> 
> >I think a group meeting is going to be way too much overhead to manage
> >properly.
> >You'll get different people every week with agenda that may not line up
> >with
> >code quality, which is really what the review is meant to provide.  I
> >think
> 
> I was only suggesting the maintainers attend the meeting. Of course they
> have to attend or have someone attend for them, just to get the voting
> done. If you do not attend then you do not get to vote or something like
> that is reasonable. Not that we should try and define the process here.
> 
If you use multiple subtrees, theres no need for a meeting, or any sort of
defiend process for voting, theres only an implicitly defined heirarchy of
acceptance in bundled changesets.  If a desire of the community is to see more
efficient review and lower changeset acceptance latency, it seems to be that a
weekly meeting of any sort is somewhat anathema to that.  

> >perhaps a better approach would be to require that that code owners from
> >the
> >maintainer file provide and ACK/NAK on their patches within 3-4 days, and
> >require a corresponding tree maintainer to apply the patch within 7 or
> >so.  That
> >would cap our patch latency.  Likewise, if a patch slips in creating a
> >regression, the author needs to be alerted and given a time window in
> >which to
> >fix the problem before the offending patch is reverted during the QE
> >cycle.
> >
> >
> >> >
> >> >On the other side, since user questions, community discussions and
> >> >development happens in the same mailing list, things get really
> >> >complicated, specially for users seeking for help. Even though I think
> >> >the average skills of the users of DPDK is generally higher than in
> >> >other software projects, if DPDK wants to attract more users, having a
> >> >better user support is key, IMHO.
> >> >
> >> >So I would see with good eyes a separation between, at least, dpdk-user
> >> >and dpdk-dev.
> >> 
> >I wouldn't argue with this separation, seems like a reasonable approach.
> >
> >> I do not remember seeing too many users on the list and making a list
> >>just
> >> for then is OK if everyone is fine with a list that has very few emails.
> >> >
> >> >If the number of patches keeps growing, splitting the "dev" mailing
> >> >lists into different categories (eal and common, pmds, higher level
> >> >abstractions...) could be an option. However, this last point opens a
> >> >lot of questions on how to minimize interference between the different
> >> >parts and API/ABI compatibility during the development.
> >> 
> >> I believe if we just make sure we use tags in the subject line then we
> >>can
> >> have our email clients do the splitting of the emails instead of adding
> >> more emails lists.
> >> 
> >Agreed
> >
> >> >
> >> >>
> >> >> Perhaps it means having some ReviewBoard type of tools, a clone in
> >> >>Github or
> >> >> Bitbucket where the less hardcore kernel-workflow types could send
> >>back
> >> >>their
> >> >> small bug fixes a bit more easily, this kind of stuff. Google has
> >>been
> >> >>getting
> >> >> good uptake since they moved most of their open source across to
> >>Github,
> >> >> because the contribution workflow was more convenient than Google
> >>Code
> >> >>was.
> >> 
> >> I like GitHub it is a much better designed tool then patchwork, plus it
> >> could get more eyes as it is very well know to the developer community
> >>in
> >> general. I feel GitHub has many advantages over the current systems in
> >> place but, it does not solve the all patch issues.
> >> 
> >Github is actually a bit irritating for this sort of thing, as it
> >presumes a web
> >based interface for discussion.  They have some modicum of email
> >forwarding
> >enabled, but it has never quite worked right, or integrated properly.
> 
> Email forwarding has seemed to work for me and in one case it took a bit
> to have GitHub stop sending me emails on a repo I did not want anymore :-)

Forwarding works fine, its responding that doesn't usually work well.  Emails
from pull requests and issues are forwarded from a 'do not reply' address and
responding requires that you visit the github page in a browser.  Thats
especially trying with patch review, as there is no real concept of patches on a
list, only pull requests which require that you pull a new branch in and review
it.  Once you have to leave your MUA, your efficiency quickly goes down, which
is what we're trying to avoid here.


> >
> >> The only way we can get patch issues resolved is to put a bit more
> >>process
> >> in place.
> >> >
> >> >Although I agree, we have to be careful on how github or bitbucket is
> >> >used. Having issues or even (e.g. github) pull requests *in addition*
> >>to
> >> >the normal contribution workflow can be a nightmare to deal with, in
> >> >terms of synchronization and preventing double work. So I guess setting
> >> >up an official github or bitbucket mirror would be fine, via some
> >>simple
> >> >cronjob, but I guess it would end-up not using PRs or issues in github
> >> >like the Linux kernel does.
> >> 
> >100% agree, we can't be split about this.  Allowing contributions from n
> >channels just means most developers will only see/reviews 1/nth of the
> >patches
> >of interest to them.
> 
> If we setup a GitHub or some other site, we would need to make Github the
> primary site to remove this type of problem IMO.
> >
> >> From what I can tell GitHub seems to be a better solution for a free
> >>open
> >> environment. Bitbucket I have never used and GitHub seems more popular
> >> from one article I read.
> >> 
> >> 
> >>https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-
> >>8#
> >> q=bitbucket%20vs%20github
> >> 
> >> 
> >> >Btw, is this github organization already registered by Intel or some
> >> >other company of the community?
> >> >
> >> >https://github.com/dpdk
> >> >
> 
> I was hoping someone would own up to the GitHub dpdk site.
> 
Hmm, looks almost defunct.  If no one steps up, perhaps reporting abuse for
camping on the name might be worthwhile?  That would at least get their
attention.

Neil

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Beyond DPDK 2.0
       [not found]                 ` <D162FA4E.1DED8%keith.wiles@intel.com>
@ 2015-04-27  9:52  0%               ` Marc Sune
  2015-04-27 13:39  0%                 ` Wiles, Keith
  2015-04-27 10:29  0%               ` Neil Horman
  1 sibling, 1 reply; 200+ results
From: Marc Sune @ 2015-04-27  9:52 UTC (permalink / raw)
  To: Wiles, Keith, Neil Horman; +Cc: dev



On 27/04/15 03:41, Wiles, Keith wrote:
>
> On 4/26/15, 4:56 PM, "Neil Horman" <nhorman@tuxdriver.com> wrote:
>
>> On Sat, Apr 25, 2015 at 04:08:23PM +0000, Wiles, Keith wrote:
>>>
>>> On 4/25/15, 8:30 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
>>>
>>>>
>>>> On 24/04/15 19:51, Matthew Hall wrote:
>>>>> On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
>>>>>> I can tell you that if DPDK were GPL-based, my company wouldn't be
>>>>>> using
>>>>>> it. I suspect we wouldn't be the only ones...
>>>>>>
>>>>>> Jay
>>>>> I could second this, from the past employer where I used it. Right
>>> now
>>>>> I am
>>>>> using it in an open source app, I have a bit of GPL here and there
>>> but
>>>>> I'm
>>>>> trying to get rid of it or confine it to separate address spaces,
>>> where
>>>>> it
>>>>> won't impact the core code written around DPDK, as I don't want to
>>> cause
>>>>> headaches for any downstream users I attract someday.
>>>>>
>>>>> Hard-core GPL would not be possible for most. LGPL could be possible,
>>>>> but I
>>>>> don't think it could be worth the relicensing headache for that small
>>>>> change.
>>>>>
>>>>> Instead we should make the patch process as easy as humanly possible
>>> so
>>>>> people
>>>>> are encouraged to send us the fixes and not cart them around their
>>>>> companies
>>>>> constantly.
>>> +1 and besides the GPL or LGPL ship has sailed IMHO and we can not go
>>> back.
>> Actually, IANAL, but I think we can.  The BSD license allows us to fork
>> and
>> relicense the code I think, under GPL or any other license.  I'm not
>> advocating
>> for that mind you, just suggesting that its possible should it ever become
>> needed.
>>
>>>> I agree. My feeling is that as the number of patches in the mailing
>>> list
>>>> grows, keeping track of them gets more and more complicated. Patchwork
>>>> website was a way to try to address this issue. I think it was an
>>>> improvement, but to be honest, patchwork lacks a lot of functionality,
>>>> such as properly tracking multiple versions of the patch (superseding
>>>> them automatically), and it lacks some filtering capabilities e.g. per
>>>> user, per tag/label or library, automatically track if it has been
>>>> merged, give an overall status of the pending vs merged patches, set
>>>> milestones... Is there any alternative tool or improved version for
>>> that?
>>>
>> Agreed, this has come up before, off list unfortunately.  The volume of
>> patches
>> seems to be increasing at such a rate that a single maintainer has
>> difficulty
>> keeping up.  I proposed that the workload be split out to multiple
>> subtrees,
>> with prefixes being added to patch subjects on the list for local
>> filtering to
>> stem the tide.  Specifically I had proposed that the PMD's be split into a
>> separate subtree, but that received pushback in favor of having each
>> library
>> having its own separate subtree, with a pilot program being made out of
>> the I40e
>> driver (which you might note sends pull requests to the list now).  I'd
>> still
>> like to see all PMD's come under a single subtree, but thats likely an
>> argument
>> for later.
>>
>> That said, Do you think that this patch latency is really a contributor
>> to low
>> project participation?  It definately a problem, but it seems to me that
>> this
>> sort of issue would lead to people trying to parcitipate, then giving up
>> (i.e.
>> we would see 1-2 emails from an individual, then not see them again).
>> I'd need
>> to look through the mailing list for such a pattern, but anecdotally I've
>> not
>> seen that happen.  The problem you describe above is definately a
>> problem, but
>> its one for those individuals who are participating, not for those who are
>> simply choosing not to.  And I think we need to address both.
>>
>>> I agree patchwork has some limitation, but I think the biggest issue is
>>> keeping up with the patches. Getting patches introduced into the main
>>> line
>>> is very slow. A patch submitted today may not get applied for weeks or
>>> months, then when another person submits a patch he is starting to run a
>>> very high risk of having to redo that patch, because a pervious patch
>>> makes his fail weeks/months later. I would love to see a better tool
>>> then
>>> patchwork, but the biggest issue is we have a huge backlog of patches.
>>> Personally I am not sure how Thomas or any is able to keep up with the
>>> patches.
>>>
>> This is absolutely a problem.  I'd like to think, more than a tool like
>> patchwork, a subtree organization to allow some modicum of parallel
>> review and
>> integration would really be a benefit here.
> Subtrees could work, but the real problem I think is the number of
> committers must be higher then one. Something like GitHub (and I assume
> Linux Foundation) have a method to add committers to a project. In the
> case of GitHub they just have to have a free GitHub account and they can
> become committers of the project buying the owner of the project enables
> them.
>
> On GitHub they have personal accounts and organization accounts I know
> only about the personal accounts, but they allow for 5 private repos and
> any number of public repos. The organization account has a lot of extra
> features that seem better for a DPDK community IMO and should be the one
> we use if we decide it is the right direction. We can always give it a
> shot for while and keep the dpdk.org and use dev@dpdk.org and its repo
> mirrored from GitHub as a transition phase. This way we can fall back to
> dpdk.org or move one to something else if we like.
>
> https://help.github.com/categories/organizations/
>
> The developers could still send patches via email list, but creating a
> repo and forking dpdk is easy, then send a pull request.

For the github "community" or free service, organization accounts just 
allow you to set teams, where each time can be assigned to one or more 
repositories. The differences are summarized here:

https://help.github.com/articles/what-s-the-difference-between-user-and-organization-accounts/

And the permission schema, per team, is summarized here:

https://help.github.com/articles/permission-levels-for-an-organization-repository/

Some limitations: i) only if the team has write permissions (IOW push 
permissions) you can manage issues ii) there cannot be per-branch ACLs.

>
>
>>> The other problem I see is how patches are agreed on to be included in
>>> the
>>> mainline. Today it is just an ACK or a NAK on the mailing list. Then I
>>> see
>>> what I think to be only a few people ACKing or NAKing patches. This
>>> process has a lot of problems from a patch being ignore for some reason
>>> or
>>> someone having negative feed back on very minor detail or no way to
>>> push a
>>> patch forward a single NAK or comment.
>>>
>> So, this is an interesting issue in ideal meritocracies.  Currently
>> is/should be
>> looking for ACKs/NAK/s from the individuals listed in the MAINTAINER
>> files, and
>> those people should be the definitive subject matter experts on the code
>> they
>> cover.  As such, I would agrue that they should be entitled to a modicum
>> of
>> stylistic/trivial leeway.  That is to say, if they choose to block a patch
>> around a very minor detail, then between them changing their position,
>> and the
>> patch author changing the code, the latter is likely the easier course of
>> action, especially if the author can't make an argument for their
>> position.
>> That said, if such patch blockage becomes so egregious that individuals
>> stop
>> contributing, that needs to be known as well.  If you as a patch author:
>>
>> 1) Have tried to submit patches
>> 2) Had them blocked for what you consider trivial reasons
>> 3) Plan to not contribute further because of this
>> 4) Still rely on the DPDK for your product
>>
>> Please, say something.  People in charge need to know when they're pushing
>> contributors away.
>>
>> FWIW, I've tried to do some correlation between the git history and the
>> mailing
>> list.  I need to do more searches, but I have a feeling that early on, the
>> majority of people who stopped contributing, did so because their patches
>> weren't expressely blocked, but rather because they were simply ignored.
>> No one
>> working on DPDK bothered to review those patches, and so they never got
>> merged.
>> Hopefully that problem has been addressed somewhat now.
I agree 100%
>>
>>> I would like to see some type of layering process to allow patches to be
>>> applied in a timely manner a few weeks not months or completely ignored.
>>> Maybe some type of voting is reasonable, but we need to do something to
>>> turn around the patches in clean reasonable manner.
>>>
>>> Think we need some type of group meeting every week to look at the
>>> patches
>>> and determining which ones get applied, this gives quick feedback to the
>>> submitter as to the status of the patch.
>>>
>> I think a group meeting is going to be way too much overhead to manage
>> properly.
>> You'll get different people every week with agenda that may not line up
>> with
>> code quality, which is really what the review is meant to provide.  I
>> think
> I was only suggesting the maintainers attend the meeting. Of course they
> have to attend or have someone attend for them, just to get the voting
> done. If you do not attend then you do not get to vote or something like
> that is reasonable. Not that we should try and define the process here.
>
>> perhaps a better approach would be to require that that code owners from
>> the
>> maintainer file provide and ACK/NAK on their patches within 3-4 days, and
>> require a corresponding tree maintainer to apply the patch within 7 or
>> so.  That
>> would cap our patch latency.  Likewise, if a patch slips in creating a
>> regression, the author needs to be alerted and given a time window in
>> which to
>> fix the problem before the offending patch is reverted during the QE
>> cycle.
>>
>>
>>>> On the other side, since user questions, community discussions and
>>>> development happens in the same mailing list, things get really
>>>> complicated, specially for users seeking for help. Even though I think
>>>> the average skills of the users of DPDK is generally higher than in
>>>> other software projects, if DPDK wants to attract more users, having a
>>>> better user support is key, IMHO.
>>>>
>>>> So I would see with good eyes a separation between, at least, dpdk-user
>>>> and dpdk-dev.
>> I wouldn't argue with this separation, seems like a reasonable approach.
>>
>>> I do not remember seeing too many users on the list and making a list
>>> just
>>> for then is OK if everyone is fine with a list that has very few emails.
>>>> If the number of patches keeps growing, splitting the "dev" mailing
>>>> lists into different categories (eal and common, pmds, higher level
>>>> abstractions...) could be an option. However, this last point opens a
>>>> lot of questions on how to minimize interference between the different
>>>> parts and API/ABI compatibility during the development.
>>> I believe if we just make sure we use tags in the subject line then we
>>> can
>>> have our email clients do the splitting of the emails instead of adding
>>> more emails lists.
>>>
>> Agreed

I think it is a good idea too. Maybe we can standardize some format e.g. 
[TAG][PATCH vX], or something like that.

>>
>>>>> Perhaps it means having some ReviewBoard type of tools, a clone in
>>>>> Github or
>>>>> Bitbucket where the less hardcore kernel-workflow types could send
>>> back
>>>>> their
>>>>> small bug fixes a bit more easily, this kind of stuff. Google has
>>> been
>>>>> getting
>>>>> good uptake since they moved most of their open source across to
>>> Github,
>>>>> because the contribution workflow was more convenient than Google
>>> Code
>>>>> was.
>>> I like GitHub it is a much better designed tool then patchwork, plus it
>>> could get more eyes as it is very well know to the developer community
>>> in
>>> general. I feel GitHub has many advantages over the current systems in
>>> place but, it does not solve the all patch issues.
>>>
>> Github is actually a bit irritating for this sort of thing, as it
>> presumes a web
>> based interface for discussion.  They have some modicum of email
>> forwarding
>> enabled, but it has never quite worked right, or integrated properly.

An alternative to githubs and bitbuckets is a self-hosted forge, like 
gitlab:

https://about.gitlab.com/

To be honest, I mostly work on open-source repositories, and in our 
organization we use only gitlab for private repositories, so I haven't 
played that much with it. But it seems to do its job and has almost all 
of the features of the "community" github, if not more. I don't know if 
you can even integrate it with github's accounts somehow, to prevent to 
have to register.

However, one of the important points of using github/bitbucket is 
visibility and ease the contribution process. By using an self-hosted 
solution, even if it is similar to github and well advertised in DPDK's 
website, you kind of loose part of that advantage.

> Email forwarding has seemed to work for me and in one case it took a bit
> to have GitHub stop sending me emails on a repo I did not want anymore :-)
>>> The only way we can get patch issues resolved is to put a bit more
>>> process
>>> in place.
>>>> Although I agree, we have to be careful on how github or bitbucket is
>>>> used. Having issues or even (e.g. github) pull requests *in addition*
>>> to
>>>> the normal contribution workflow can be a nightmare to deal with, in
>>>> terms of synchronization and preventing double work. So I guess setting
>>>> up an official github or bitbucket mirror would be fine, via some
>>> simple
>>>> cronjob, but I guess it would end-up not using PRs or issues in github
>>>> like the Linux kernel does.
>> 100% agree, we can't be split about this.  Allowing contributions from n
>> channels just means most developers will only see/reviews 1/nth of the
>> patches
>> of interest to them.
> If we setup a GitHub or some other site, we would need to make Github the
> primary site to remove this type of problem IMO.

You mean changing the workflow from email based to issues and pull-req 
or github pull req? Do you really think this is possible?

>>>  From what I can tell GitHub seems to be a better solution for a free
>>> open
>>> environment. Bitbucket I have never used and GitHub seems more popular
>>> from one article I read.
>>>
>>>
>>> https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-
>>> 8#
>>> q=bitbucket%20vs%20github
>>>
>>>
>>>> Btw, is this github organization already registered by Intel or some
>>>> other company of the community?
>>>>
>>>> https://github.com/dpdk
>>>>
> I was hoping someone would own up to the GitHub dpdk site.

Just wanted to know if this was the case. But, even if that would not be 
the case, I *guess* that, as it happens with other services like 
twitter, facebook..., Intel could claim the user, since it has the 
registered trademark.

marc

>
>>>> Marc
>>> If we can used the above that would be great, but a name like
>>> Œdpdk-community¹ or something could work too.
>>>
>>> We can host the web site here and have many sub-projects like
>>> Pktgen-DPDK
>>> :-) under the same page. Not to say anything bad about our current web
>>> pages as I find it difficult to use sometimes and find things like
>>> patchwork link. Maintaining a web site is a full time job and GitHub
>>> does
>>> maintain the site, plus we can collaborate on host web page on the
>>> GitHub
>>> site easier.
>>>
>>> Moving to the Linux Foundation is an option as well as it is very well
>>> know and has some nice ways to get your project promoted. It does have a
>>> few drawbacks in process handling and cost to state a few. The process
>>> model is all ready defined, which is good and bad it just depends on
>>> your
>>> needs IMO.
>>>
>>> Regards,
>>> ++Keith
>>>
>>>>> Matthew.
>>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Beyond DPDK 2.0
  @ 2015-04-26 21:56  0%           ` Neil Horman
       [not found]                 ` <D162FA4E.1DED8%keith.wiles@intel.com>
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-04-26 21:56 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: dev

On Sat, Apr 25, 2015 at 04:08:23PM +0000, Wiles, Keith wrote:
> 
> 
> On 4/25/15, 8:30 AM, "Marc Sune" <marc.sune@bisdn.de> wrote:
> 
> >
> >
> >On 24/04/15 19:51, Matthew Hall wrote:
> >> On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
> >>> I can tell you that if DPDK were GPL-based, my company wouldn't be
> >>>using
> >>> it. I suspect we wouldn't be the only ones...
> >>>
> >>> Jay
> >> I could second this, from the past employer where I used it. Right now
> >>I am
> >> using it in an open source app, I have a bit of GPL here and there but
> >>I'm
> >> trying to get rid of it or confine it to separate address spaces, where
> >>it
> >> won't impact the core code written around DPDK, as I don't want to cause
> >> headaches for any downstream users I attract someday.
> >>
> >> Hard-core GPL would not be possible for most. LGPL could be possible,
> >>but I
> >> don't think it could be worth the relicensing headache for that small
> >>change.
> >>
> >> Instead we should make the patch process as easy as humanly possible so
> >>people
> >> are encouraged to send us the fixes and not cart them around their
> >>companies
> >> constantly.
> 
> +1 and besides the GPL or LGPL ship has sailed IMHO and we can not go back.
Actually, IANAL, but I think we can.  The BSD license allows us to fork and
relicense the code I think, under GPL or any other license.  I'm not advocating
for that mind you, just suggesting that its possible should it ever become
needed.

> >
> >I agree. My feeling is that as the number of patches in the mailing list
> >grows, keeping track of them gets more and more complicated. Patchwork
> >website was a way to try to address this issue. I think it was an
> >improvement, but to be honest, patchwork lacks a lot of functionality,
> >such as properly tracking multiple versions of the patch (superseding
> >them automatically), and it lacks some filtering capabilities e.g. per
> >user, per tag/label or library, automatically track if it has been
> >merged, give an overall status of the pending vs merged patches, set
> >milestones... Is there any alternative tool or improved version for that?
> 
Agreed, this has come up before, off list unfortunately.  The volume of patches
seems to be increasing at such a rate that a single maintainer has difficulty
keeping up.  I proposed that the workload be split out to multiple subtrees,
with prefixes being added to patch subjects on the list for local filtering to
stem the tide.  Specifically I had proposed that the PMD's be split into a
separate subtree, but that received pushback in favor of having each library
having its own separate subtree, with a pilot program being made out of the I40e
driver (which you might note sends pull requests to the list now).  I'd still
like to see all PMD's come under a single subtree, but thats likely an argument
for later.

That said, Do you think that this patch latency is really a contributor to low
project participation?  It definately a problem, but it seems to me that this
sort of issue would lead to people trying to parcitipate, then giving up (i.e.
we would see 1-2 emails from an individual, then not see them again).  I'd need
to look through the mailing list for such a pattern, but anecdotally I've not
seen that happen.  The problem you describe above is definately a problem, but
its one for those individuals who are participating, not for those who are
simply choosing not to.  And I think we need to address both.

> I agree patchwork has some limitation, but I think the biggest issue is
> keeping up with the patches. Getting patches introduced into the main line
> is very slow. A patch submitted today may not get applied for weeks or
> months, then when another person submits a patch he is starting to run a
> very high risk of having to redo that patch, because a pervious patch
> makes his fail weeks/months later. I would love to see a better tool then
> patchwork, but the biggest issue is we have a huge backlog of patches.
> Personally I am not sure how Thomas or any is able to keep up with the
> patches.
> 
This is absolutely a problem.  I'd like to think, more than a tool like
patchwork, a subtree organization to allow some modicum of parallel review and
integration would really be a benefit here.

> The other problem I see is how patches are agreed on to be included in the
> mainline. Today it is just an ACK or a NAK on the mailing list. Then I see
> what I think to be only a few people ACKing or NAKing patches. This
> process has a lot of problems from a patch being ignore for some reason or
> someone having negative feed back on very minor detail or no way to push a
> patch forward a single NAK or comment.
> 

So, this is an interesting issue in ideal meritocracies.  Currently is/should be
looking for ACKs/NAK/s from the individuals listed in the MAINTAINER files, and
those people should be the definitive subject matter experts on the code they
cover.  As such, I would agrue that they should be entitled to a modicum of
stylistic/trivial leeway.  That is to say, if they choose to block a patch
around a very minor detail, then between them changing their position, and the
patch author changing the code, the latter is likely the easier course of
action, especially if the author can't make an argument for their position.
That said, if such patch blockage becomes so egregious that individuals stop
contributing, that needs to be known as well.  If you as a patch author:

1) Have tried to submit patches 
2) Had them blocked for what you consider trivial reasons
3) Plan to not contribute further because of this
4) Still rely on the DPDK for your product

Please, say something.  People in charge need to know when they're pushing
contributors away.

FWIW, I've tried to do some correlation between the git history and the mailing
list.  I need to do more searches, but I have a feeling that early on, the
majority of people who stopped contributing, did so because their patches
weren't expressely blocked, but rather because they were simply ignored.  No one
working on DPDK bothered to review those patches, and so they never got merged.
Hopefully that problem has been addressed somewhat now.

> I would like to see some type of layering process to allow patches to be
> applied in a timely manner a few weeks not months or completely ignored.
> Maybe some type of voting is reasonable, but we need to do something to
> turn around the patches in clean reasonable manner.
> 
> Think we need some type of group meeting every week to look at the patches
> and determining which ones get applied, this gives quick feedback to the
> submitter as to the status of the patch.
> 
I think a group meeting is going to be way too much overhead to manage properly.
You'll get different people every week with agenda that may not line up with
code quality, which is really what the review is meant to provide.  I think
perhaps a better approach would be to require that that code owners from the
maintainer file provide and ACK/NAK on their patches within 3-4 days, and
require a corresponding tree maintainer to apply the patch within 7 or so.  That
would cap our patch latency.  Likewise, if a patch slips in creating a
regression, the author needs to be alerted and given a time window in which to
fix the problem before the offending patch is reverted during the QE cycle.


> >
> >On the other side, since user questions, community discussions and
> >development happens in the same mailing list, things get really
> >complicated, specially for users seeking for help. Even though I think
> >the average skills of the users of DPDK is generally higher than in
> >other software projects, if DPDK wants to attract more users, having a
> >better user support is key, IMHO.
> >
> >So I would see with good eyes a separation between, at least, dpdk-user
> >and dpdk-dev.
> 
I wouldn't argue with this separation, seems like a reasonable approach.

> I do not remember seeing too many users on the list and making a list just
> for then is OK if everyone is fine with a list that has very few emails.
> >
> >If the number of patches keeps growing, splitting the "dev" mailing
> >lists into different categories (eal and common, pmds, higher level
> >abstractions...) could be an option. However, this last point opens a
> >lot of questions on how to minimize interference between the different
> >parts and API/ABI compatibility during the development.
> 
> I believe if we just make sure we use tags in the subject line then we can
> have our email clients do the splitting of the emails instead of adding
> more emails lists.
> 
Agreed

> >
> >>
> >> Perhaps it means having some ReviewBoard type of tools, a clone in
> >>Github or
> >> Bitbucket where the less hardcore kernel-workflow types could send back
> >>their
> >> small bug fixes a bit more easily, this kind of stuff. Google has been
> >>getting
> >> good uptake since they moved most of their open source across to Github,
> >> because the contribution workflow was more convenient than Google Code
> >>was.
> 
> I like GitHub it is a much better designed tool then patchwork, plus it
> could get more eyes as it is very well know to the developer community in
> general. I feel GitHub has many advantages over the current systems in
> place but, it does not solve the all patch issues.
> 
Github is actually a bit irritating for this sort of thing, as it presumes a web
based interface for discussion.  They have some modicum of email forwarding
enabled, but it has never quite worked right, or integrated properly.

> The only way we can get patch issues resolved is to put a bit more process
> in place.
> >
> >Although I agree, we have to be careful on how github or bitbucket is
> >used. Having issues or even (e.g. github) pull requests *in addition* to
> >the normal contribution workflow can be a nightmare to deal with, in
> >terms of synchronization and preventing double work. So I guess setting
> >up an official github or bitbucket mirror would be fine, via some simple
> >cronjob, but I guess it would end-up not using PRs or issues in github
> >like the Linux kernel does.
> 
100% agree, we can't be split about this.  Allowing contributions from n
channels just means most developers will only see/reviews 1/nth of the patches
of interest to them.

> From what I can tell GitHub seems to be a better solution for a free open
> environment. Bitbucket I have never used and GitHub seems more popular
> from one article I read.
> 
> https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#
> q=bitbucket%20vs%20github
> 
> 
> >Btw, is this github organization already registered by Intel or some
> >other company of the community?
> >
> >https://github.com/dpdk
> >
> >Marc
> 
> If we can used the above that would be great, but a name like
> Œdpdk-community¹ or something could work too.
> 
> We can host the web site here and have many sub-projects like Pktgen-DPDK
> :-) under the same page. Not to say anything bad about our current web
> pages as I find it difficult to use sometimes and find things like
> patchwork link. Maintaining a web site is a full time job and GitHub does
> maintain the site, plus we can collaborate on host web page on the GitHub
> site easier.
> 
> Moving to the Linux Foundation is an option as well as it is very well
> know and has some nice ways to get your project promoted. It does have a
> few drawbacks in process handling and cost to state a few. The process
> model is all ready defined, which is good and bad it just depends on your
> needs IMO.
> 
> Regards,
> ++Keith
> 
> >
> >>
> >> Matthew.
> >
> 
> 

^ permalink raw reply	[relevance 0%]

Results 13201-13400 of ~18000   |  | reverse | sort options + mbox downloads above
-- links below jump to the message on this page --
2015-02-27  4:56     [dpdk-dev] [PATCH v6 0/8] Interrupt mode PMD Cunming Liang
2015-05-05  5:39  3% ` [dpdk-dev] From: Cunming Liang <cunming.liang@intel.com> Cunming Liang
2015-05-05  5:39  2%   ` [dpdk-dev] [PATCH v7 07/10] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-05-21  8:55  2%   ` [dpdk-dev] [PATCH v8 00/11] Interrupt mode PMD Cunming Liang
2015-05-21  8:55         ` [dpdk-dev] [PATCH v8 01/11] eal/linux: add interrupt vectors support in intr_handle Cunming Liang
2015-05-21 10:32  3%       ` Neil Horman
     [not found]             ` <20150521104300.00757b4e@urahara>
2015-05-21 17:58  4%           ` Neil Horman
2015-05-21 18:21  3%             ` Stephen Hemminger
     [not found]                 ` <20150521111400.2a04a196@urahara>
2015-05-22  0:05  4%               ` Neil Horman
     [not found]                   ` <40594e9e6e0543afa11e4dbd90e59b22@BRMWP-EXMB11.corp.brocade.com>
2015-05-22 16:52  5%                 ` Stephen Hemminger
2015-05-27 10:33  4%                   ` Neil Horman
2015-05-29  8:56  5%             ` Liang, Cunming
2015-05-21  8:56  2%     ` [dpdk-dev] [PATCH v8 08/11] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-05-29  8:45  4%     ` [dpdk-dev] [PATCH v9 00/12] Interrupt mode PMD Cunming Liang
2015-05-29  8:45  2%       ` [dpdk-dev] [PATCH v9 08/12] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-05-29  8:45 10%       ` [dpdk-dev] [PATCH v9 12/12] abi: fix v2.1 abi broken issue Cunming Liang
2015-05-29 15:27  9%         ` Stephen Hemminger
2015-06-01  8:48  9%           ` Liang, Cunming
2015-06-01 13:27  5%             ` Stephen Hemminger
2015-06-02  2:14  5%               ` Liang, Cunming
2015-05-29 15:36  5%         ` Vincent JARDIN
2015-06-01 14:11  5%         ` Stephen Hemminger
2015-06-01 14:18  5%           ` Stephen Hemminger
2015-06-02  6:53  4%       ` [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
2015-06-02  6:53  2%         ` [dpdk-dev] [PATCH v10 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-06-02  6:53 10%         ` [dpdk-dev] [PATCH v10 13/13] abi: fix v2.1 abi broken issue Cunming Liang
2015-06-05  8:19  4%         ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Cunming Liang
2015-06-05  8:20  2%           ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-06-05  8:20 10%           ` [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue Cunming Liang
2015-06-05  8:59  0%           ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Zhou, Danny
2015-06-08  5:28  4%           ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
2015-06-08  5:29  2%             ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-06-08  5:29 10%             ` [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue Cunming Liang
2015-06-09 23:59  0%             ` [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD Stephen Hemminger
2015-05-05  5:53  3% ` [dpdk-dev] [PATCH v7 00/10] " Cunming Liang
2015-02-27 13:11     [dpdk-dev] [PATCH v4 00/18] unified packet type Helin Zhang
2015-05-22  8:44  2% ` [dpdk-dev] [PATCH v5 " Helin Zhang
2015-05-22  8:44       ` [dpdk-dev] [PATCH v5 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
2015-05-22 10:09  3%     ` Neil Horman
2015-05-22  8:44  3%   ` [dpdk-dev] [PATCH v5 18/18] mbuf: remove old packet type bit masks Helin Zhang
2015-06-01  7:33  4%   ` [dpdk-dev] [PATCH v6 00/18] unified packet type Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
2015-06-01  8:14  5%       ` Olivier MATZ
2015-06-02 13:27  4%         ` O'Driscoll, Tim
2015-06-10 14:32  4%           ` Olivier MATZ
2015-06-10 14:51  0%             ` Zhang, Helin
2015-06-10 15:39  0%             ` Ananyev, Konstantin
2015-06-12  3:22  0%               ` Zhang, Helin
2015-06-10 16:14  5%             ` Thomas Monjalon
2015-06-12  7:24  5%               ` Panu Matilainen
2015-06-12  7:43  3%                 ` Zhang, Helin
2015-06-12  8:15  4%                   ` Panu Matilainen
2015-06-12  8:28  3%                     ` Zhang, Helin
2015-06-12  9:00  3%                       ` Panu Matilainen
2015-06-12  9:07  4%                       ` Bruce Richardson
2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 03/18] mbuf: add definitions of unified packet types Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 05/18] ixgbe: " Helin Zhang
2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 06/18] i40e: " Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 07/18] enic: " Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 08/18] vmxnet3: " Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 09/18] fm10k: " Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 10/18] app/test-pipeline: " Helin Zhang
2015-06-01  7:33  3%     ` [dpdk-dev] [PATCH v6 11/18] app/testpmd: " Helin Zhang
2015-06-01  7:33  4%     ` [dpdk-dev] [PATCH v6 12/18] app/test: Remove useless code Helin Zhang
2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 14/18] examples/ip_reassembly: " Helin Zhang
2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 15/18] examples/l3fwd-acl: " Helin Zhang
2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 16/18] examples/l3fwd-power: " Helin Zhang
2015-06-01  7:34  3%     ` [dpdk-dev] [PATCH v6 17/18] examples/l3fwd: " Helin Zhang
2015-06-01  7:34  4%     ` [dpdk-dev] [PATCH v6 18/18] mbuf: remove old packet type bit masks Helin Zhang
2015-04-16 10:38     [dpdk-dev] Beyond DPDK 2.0 O'Driscoll, Tim
2015-04-24  7:47     ` Luke Gorrie
2015-04-24 17:39       ` Jay Rolette
2015-04-24 17:51         ` Matthew Hall
2015-04-25 13:30           ` Marc Sune
2015-04-25 16:08             ` Wiles, Keith
2015-04-26 21:56  0%           ` Neil Horman
     [not found]                 ` <D162FA4E.1DED8%keith.wiles@intel.com>
2015-04-27  9:52  0%               ` Marc Sune
2015-04-27 13:39  0%                 ` Wiles, Keith
2015-04-27 15:34  0%                   ` Marc Sune
2015-04-27 10:29  0%               ` Neil Horman
2015-04-27 13:50  0%                 ` Wiles, Keith
2015-04-20 14:11     [dpdk-dev] [RFC PATCH] ethdev: remove old flow director API Thomas Monjalon
2015-04-20 16:33     ` Neil Horman
2015-04-20 16:45       ` Venky Venkatesan
2015-04-27 16:08  0%     ` Thomas Monjalon
2015-04-24 15:22     [dpdk-dev] [PATCH v7 1/6] Move common functions in eal_thread.c Neil Horman
2015-04-24 16:45     ` Ravi Kerur
2015-04-24 18:53       ` Neil Horman
2015-04-24 19:21         ` Ravi Kerur
2015-04-24 19:51           ` Neil Horman
2015-04-24 21:24             ` Ravi Kerur
2015-04-25  1:45               ` Ravi Kerur
2015-04-25 12:32                 ` Neil Horman
2015-04-25 13:02                   ` Neil Horman
2015-04-26  0:09                     ` Ravi Kerur
2015-04-27 13:44  0%                   ` Neil Horman
2015-04-27 22:39  3%                     ` Ravi Kerur
2015-04-28 19:35  0%                       ` Neil Horman
2015-04-28 23:52  4%                         ` Ravi Kerur
2015-04-29 10:04  3%                           ` Neil Horman
2015-04-29 17:47  5%                             ` Ravi Kerur
2015-04-30 16:00  3%                               ` Neil Horman
2015-05-01  0:15  4%                                 ` Ravi Kerur
2015-04-28 16:36     [dpdk-dev] [PATCH 0/3] eal: uio irq fixes and enhancements Stephen Hemminger
2015-05-12 20:02     ` Thomas Monjalon
2015-05-13  8:57  3%   ` Bruce Richardson
2015-05-13  9:32  0%     ` Thomas Monjalon
2015-04-28 23:46  4% [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL Ravi Kerur
2015-04-28 23:46  2% ` [dpdk-dev] [PATCH v8 1/6] Move common functions in eal_thread.c Ravi Kerur
2015-04-28 23:46  1%   ` [dpdk-dev] [PATCH v8 2/6] Move common functions in eal.c Ravi Kerur
2015-04-29 10:14  0% ` [dpdk-dev] [PATCH v8 0/6] Move common functions in EAL Neil Horman
2015-04-29  0:30  2% [dpdk-dev] [PATCH 1/3] pcap: utilize underlying real interface properties Nicolás Pernas Maradei
     [not found]     <CAFb4SLBGcR1EHL5FkJ7r6-7mqWR9UJ7GLD2cm18SJ8AuoWu_Og@mail.gmail.com>
2015-04-29  8:29  0% ` [dpdk-dev] gmake test on freeBSD Bruce Richardson
2015-04-29 17:58  0%   ` Ravi Kerur
2015-04-29 17:04     [dpdk-dev] [PATCH 0/6] rte_sched: patches against 2.o Stephen Hemminger
2015-04-29 17:04     ` [dpdk-dev] [PATCH 4/6] rte_sched: allow reading without clearing Stephen Hemminger
2015-05-11 12:53  3%   ` Thomas Monjalon
2015-05-05  2:32     [dpdk-dev] [PATCH RFC 0/6] support of QinQ stripping and insertion of i40e Helin Zhang
2015-05-26  8:36     ` [dpdk-dev] [PATCH 0/5] support i40e QinQ stripping and insertion Helin Zhang
2015-05-26  8:36       ` [dpdk-dev] [PATCH 2/5] mbuf: use the reserved 16 bits for double vlan Helin Zhang
2015-05-26 14:55  3%     ` Stephen Hemminger
2015-05-26 15:00  0%       ` Zhang, Helin
2015-05-26 15:02  3%       ` Ananyev, Konstantin
2015-05-26 15:35  3%         ` Stephen Hemminger
2015-05-26 15:46  3%           ` Ananyev, Konstantin
2015-05-27  1:07  0%             ` Zhang, Helin
2015-05-26  8:36       ` [dpdk-dev] [PATCH 3/5] i40e: support double vlan stripping and insertion Helin Zhang
2015-06-01  8:50  3%     ` Olivier MATZ
2015-06-02  2:45  0%       ` Zhang, Helin
2015-06-02  3:16  3%   ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Helin Zhang
2015-06-02  3:16  2%     ` [dpdk-dev] [PATCH v2 3/6] i40e: support double vlan " Helin Zhang
2015-06-02  7:37  0%     ` [dpdk-dev] [PATCH v2 0/6] support i40e QinQ " Liu, Jijiang
2015-06-08  7:32  0%       ` Cao, Min
2015-06-08  7:40  0%     ` Olivier MATZ
2015-06-11  7:03  3%     ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
2015-06-11  7:03  2%       ` [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan " Helin Zhang
2015-06-11  7:25  0%       ` [dpdk-dev] [PATCH v3 0/7] support i40e QinQ " Wu, Jingjing
2015-05-05 14:43     [dpdk-dev] [PATCH v3 0/6] update jhash function Pablo de Lara
2015-05-12 11:02     ` [dpdk-dev] [PATCH v4 " Pablo de Lara
2015-05-12 15:33  4%   ` Neil Horman
2015-05-13 13:52  0%     ` De Lara Guarch, Pablo
2015-05-13 14:20  0%       ` Neil Horman
2015-05-07 15:35     [dpdk-dev] [RFC PATCH 0/2] Move PMDs out of lib directory Bruce Richardson
2015-05-12 17:04     ` [dpdk-dev] [PATCH 00/19] Move PMDs to drivers directory Bruce Richardson
2015-05-12 17:05  1%   ` [dpdk-dev] [PATCH 14/19] virtio: move virtio PMD " Bruce Richardson
2015-05-15 15:56       ` [dpdk-dev] [PATCH v2 00/19] Move PMDs " Bruce Richardson
2015-05-15 15:56  1%     ` [dpdk-dev] [PATCH v2 14/19] virtio: move virtio PMD to drivers/net Bruce Richardson
2015-05-08 16:37  4% [dpdk-dev] [RFC PATCH 0/2] dynamic memzones Sergio Gonzalez Monroy
2015-05-08 16:37  1% ` [dpdk-dev] [RFC PATCH 2/2] eal: memzone allocated by malloc Sergio Gonzalez Monroy
2015-05-12 16:30  0% ` [dpdk-dev] [RFC PATCH 0/2] dynamic memzones Olivier MATZ
2015-06-06 10:32     ` [dpdk-dev] [PATCH v2 0/7] dynamic memzone Sergio Gonzalez Monroy
2015-06-06 10:32  1%   ` [dpdk-dev] [PATCH v2 2/7] eal: memzone allocated by malloc Sergio Gonzalez Monroy
2015-05-11  3:46     [dpdk-dev] [PATCH 0/6] extend flow director to support L2_paylod type and VF filtering in i40e driver Jingjing Wu
2015-05-11  3:46     ` [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow director in VFs Jingjing Wu
2015-06-12 16:45  3%   ` Thomas Monjalon
2015-06-15  7:14  3%     ` Wu, Jingjing
2015-06-16  3:43  3% ` [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type Jingjing Wu
2015-05-11 16:29     [dpdk-dev] [RFC PATCHv2 0/2] pktdev as wrapper type Bruce Richardson
2015-05-19 11:31     ` Bruce Richardson
2015-05-20  8:31       ` Thomas Monjalon
2015-05-20 10:05         ` Marc Sune
2015-05-20 10:28           ` Neil Horman
2015-05-20 17:01             ` Marc Sune
2015-05-20 18:47               ` Neil Horman
2015-05-21 12:12  3%             ` Richardson, Bruce
2015-05-11 17:07     [dpdk-dev] [PATCH v3 0/6] rte_sched: cleanups and API changes Stephen Hemminger
2015-05-11 17:07     ` [dpdk-dev] [PATCH 2/6] rte_sched: expand scheduler hierarchy for more VLAN's Stephen Hemminger
2015-05-11 17:20  3%   ` Neil Horman
     [not found]       ` <8edea4c81f624728bb5f0476b680c410@BRMWP-EXMB11.corp.brocade.com>
2015-05-11 17:32  4%     ` Stephen Hemminger
2015-05-11 17:43  4%       ` Neil Horman
2015-05-11 23:45     [dpdk-dev] [RFC PATCH 0/2] ethdev: add port speed capability bitmap Marc Sune
2015-05-11 23:45     ` [dpdk-dev] [RFC PATCH 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info Marc Sune
2015-05-26 15:03  3%   ` Stephen Hemminger
2015-05-26 15:09  0%     ` Marc Sune
2015-05-29 18:23     ` [dpdk-dev] [PATCH v2 " Thomas Monjalon
2015-06-08  8:50       ` Marc Sune
2015-06-11  9:08         ` Thomas Monjalon
2015-06-11 14:35  3%       ` Marc Sune
2015-05-14 20:55     [dpdk-dev] Technical Steering Committee (TSC) O'Driscoll, Tim
2015-05-19 14:43     ` Stephen Hemminger
2015-05-19 15:34       ` Neil Horman
2015-05-19 15:45         ` Thomas Monjalon
2015-05-19 17:34           ` Neil Horman
2015-05-19 20:21  3%         ` O'Driscoll, Tim
2015-05-21  7:49     [dpdk-dev] [PATCH 0/6] Support multiple queues in vhost Ouyang Changchun
2015-06-10  5:52     ` [dpdk-dev] [PATCH v2 0/7] " Ouyang Changchun
2015-06-10  5:52       ` [dpdk-dev] [PATCH v2 2/7] lib_vhost: Support multiple queues in virtio dev Ouyang Changchun
2015-06-11  9:54  5%     ` Panu Matilainen
2015-05-26 12:39     [dpdk-dev] [PATCH v3 00/10] table: added table statistics Maciej Gajdzica
2015-05-26 12:39     ` [dpdk-dev] [PATCH v3 01/10] table: added structure for storing table stats Maciej Gajdzica
2015-05-26 14:57  3%   ` Stephen Hemminger
2015-05-26 21:40  0%     ` Dumitrescu, Cristian
2015-05-26 21:57  0%       ` Stephen Hemminger
2015-05-28 19:32  3%         ` Dumitrescu, Cristian
2015-05-28 21:41  3%           ` Stephen Hemminger
2015-05-27 13:47     [dpdk-dev] [PATCH 1/4] kni: add function to query the name of a kni object Bruce Richardson
2015-05-27 13:52     ` Marc Sune
2015-05-27 13:55       ` Bruce Richardson
     [not found]         ` <5565D195.9040701@bisdn.de>
2015-05-27 15:36  3%       ` Bruce Richardson
2015-05-27 18:10  3% [dpdk-dev] [PATCH v4 0/5] rte_sched: cleanup and API enhancements Stephen Hemminger
2015-05-27 18:10  3% ` [dpdk-dev] [PATCH 4/5] rte_sched: hide structure of port hierarchy Stephen Hemminger
2015-05-29 15:47  3% [dpdk-dev] [PATCH] pmd: change initialization to indicate pci drivers Stephen Hemminger
2015-06-12  9:13  0% ` Thomas Monjalon
2015-05-30  0:37     [dpdk-dev] [PATCH 0/2] User-space Ethtool Liang-Min Larry Wang
2015-05-30  0:37     ` [dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs Liang-Min Larry Wang
2015-05-30 15:48       ` Stephen Hemminger
2015-05-30 16:16         ` Wang, Liang-min
2015-05-30 19:26  3%       ` Stephen Hemminger
2015-05-30 19:40  0%         ` Wang, Liang-min
2015-05-31 16:48  3%           ` Stephen Hemminger
2015-05-31 17:30  0%             ` Wang, Liang-min
2015-05-31 18:31  0%             ` Wang, Liang-min
2015-06-10 15:09     ` [dpdk-dev] [PATCH v4 0/4] User-space Ethtool Liang-Min Larry Wang
2015-06-10 15:09       ` [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info Liang-Min Larry Wang
2015-06-11 12:26         ` Ananyev, Konstantin
2015-06-11 12:57           ` Wang, Liang-min
2015-06-11 13:07             ` Ananyev, Konstantin
2015-06-11 21:51               ` Wang, Liang-min
2015-06-12 12:30                 ` Ananyev, Konstantin
2015-06-15 13:26                   ` Wang, Liang-min
2015-06-15 13:45                     ` Ananyev, Konstantin
2015-06-15 16:05                       ` David Harton (dharton)
2015-06-15 18:23  3%                     ` Ananyev, Konstantin
2015-06-02  7:55     [dpdk-dev] [PATCH v2 0/4] enable mirror functionality in i40e driver Jingjing Wu
2015-06-05  8:16  3% ` [dpdk-dev] [PATCH v3 " Jingjing Wu
2015-06-02 13:11     [dpdk-dev] add support for HTM lock elision for x86 Roman Dementiev
     [not found]     ` <CADNuJVpeKa9-R7WHkoCzw82vpYd=3XmhOoz2JfGsFLzDW+F5UQ@mail.gmail.com>
2015-06-02 13:39  0%   ` Dementiev, Roman
2015-06-04  1:00     [dpdk-dev] [PATCH 0/6] query hash key size in byte Helin Zhang
2015-06-04  1:00     ` [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size Helin Zhang
2015-06-04 13:05  3%   ` Neil Horman
2015-06-05  6:21  3%     ` Zhang, Helin
2015-06-05 10:30  0%       ` Neil Horman
2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size Helin Zhang
2015-06-04 10:38  3%     ` Ananyev, Konstantin
2015-06-12  6:06  0%       ` Zhang, Helin
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 2/6] e1000: fill the " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 3/6] fm10k: " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: show " Helin Zhang
2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
2015-06-15 15:01  3%       ` Thomas Monjalon
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 2/6] e1000: fill the " Helin Zhang
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Helin Zhang
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Helin Zhang
2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Helin Zhang
2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 6/6] app/testpmd: show " Helin Zhang
2015-06-12  9:31  0%     ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Ananyev, Konstantin
2015-06-04 14:43  3% [dpdk-dev] [PATCH 0/9] whitespace cleanups Stephen Hemminger
2015-06-04 14:43 14% ` [dpdk-dev] [PATCH 8/9] mk, scripts: remove useless blank lines Stephen Hemminger
2015-06-05  7:40 23% [dpdk-dev] [PATCH v1] abi: announce abi changes plan for interrupt mode Cunming Liang
2015-06-05 14:55  2% [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros Daniel Mrzyglod
2015-06-05 15:31  0% ` Dumitrescu, Cristian
2015-06-08 14:50  5% [dpdk-dev] [PATCH] doc: guidelines for library statistics Cristian Dumitrescu
2015-06-11 12:05  0% ` Thomas Monjalon
2015-06-15 21:46  4%   ` Dumitrescu, Cristian
2015-06-12  5:18     [dpdk-dev] [PATCH 0/3] do deprecation in 2.1 Stephen Hemminger
2015-06-12  5:18     ` [dpdk-dev] [PATCH 1/3] rte_ring: remove deprecated functions Stephen Hemminger
2015-06-12  5:46  3%   ` Panu Matilainen
2015-06-12 14:00  0%     ` Bruce Richardson
2015-06-12  5:18     ` [dpdk-dev] [PATCH 2/3] kni: " Stephen Hemminger
2015-06-12  6:20  3%   ` Panu Matilainen
2015-06-12 11:28  3% [dpdk-dev] [PATCH 0/4] ethdev: Add checks for function support in driver Bruce Richardson
2015-06-12 11:28     ` [dpdk-dev] [PATCH 4/4] ethdev: check support for rx_queue_count and descriptor_done fns Bruce Richardson
2015-06-12 17:32       ` Roger B. Melton
2015-06-15 10:14  4%     ` Bruce Richardson
2015-06-15 16:51     [dpdk-dev] [PATCH 0/3 v2] remove code marked as deprecated in 2.0 Stephen Hemminger
2015-06-15 16:51     ` [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions Stephen Hemminger
2015-06-16 13:52  3%   ` Bruce Richardson
2015-06-16 23:05  0%     ` Stephen Hemminger
2015-06-16 23:37  0%       ` Thomas Monjalon
     [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
2015-06-17  0:39  0%         ` Stephen Hemminger
2015-06-17  7:29  0%           ` Panu Matilainen
2015-06-15 16:51     ` [dpdk-dev] [PATCH 3/3] acl: mark " Stephen Hemminger
2015-06-17  7:59  4%   ` Panu Matilainen
2015-06-15 22:07  5% [dpdk-dev] [PATCH v2] doc: guidelines for library statistics Cristian Dumitrescu
2015-06-16  1:38 23% [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues Ouyang Changchun
2015-06-16 10:36  5% ` Neil Horman
2015-06-16 13:14  5% [dpdk-dev] [PATCH v2] doc: guidelines for library statistics Cristian Dumitrescu
2015-06-16 13:15  5% [dpdk-dev] [PATCH v3] " Cristian Dumitrescu
2015-06-16 13:35  5% [dpdk-dev] [PATCH v4] " Cristian Dumitrescu
2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
2015-06-17  4:36  9% ` Matthew Hall
2015-06-17  5:28  8%   ` Stephen Hemminger
2015-06-17  8:23  9%     ` Marc Sune
2015-06-17  3:36 23% [dpdk-dev] [PATCH] abi: announce abi changes plan for struct rte_eth_fdir_flow_ext Jingjing Wu
2015-06-17  5:48 20% [dpdk-dev] [PATCH] doc: announce ABI changes planned for unified packet type Helin Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).