DPDK patches and discussions
* Re: [dpdk-dev] [RFC v3] latencystats: added new library for latency stats
  @ 2016-10-18 10:44  3%   ` Pattan, Reshma
  0 siblings, 0 replies; 200+ results
From: Pattan, Reshma @ 2016-10-18 10:44 UTC
  To: dev

Hi,

The latency stats RFC below adds a new field to the second cache line of the mbuf. This is not an ABI break,
but I would like to emphasize this point so you can have a look at the changes and provide comments.

Thanks,
Reshma

> -----Original Message-----
> From: Pattan, Reshma
> Sent: Monday, October 17, 2016 2:40 PM
> To: dev@dpdk.org
> Cc: Pattan, Reshma <reshma.pattan@intel.com>
> Subject: [RFC v3] latencystats: added new library for latency stats
> 
> The library is designed to calculate latency stats and report them to the
> application when queried. It measures the minimum, average and maximum
> latencies and the jitter, in nanoseconds.
> The current implementation supports global latency stats, i.e. per-application stats.
> 
> Added a new field to the mbuf struct to mark the packet arrival time on Rx;
> this timestamp is used to measure the latency on Tx.
> 
> Modified dpdk-procinfo process to display the new stats.
> 
> APIs:
> 
> Added APIs to initialize and uninitialize latency stats calculation.
> Added API to retrieve latency stats names and values.
> 
> Functionality:
> 
> *Library will register ethdev Rx/Tx callbacks for each active port/queue
> combination.
> *Library will register latency stats names with the new stats library, which
> is still under design.
> *Rx packets will be timestamped at each sampling interval.
> *On the Tx side, timestamped packets will be considered for calculating the
> minimum, maximum and average latencies and the jitter.
> *Average latency is calculated by summing the latencies measured for all
> timestamped packets and dividing by the total number of timestamped packets
> (a rough sketch of this follows below).
> *Minimum and maximum latencies are the lowest and highest latency values
> observed so far.
> *Jitter is calculated from the inter-packet delay variation.
> *Measured stats can be retrieved via the get API of the library, or by
> calling the generic get API of the new stats library; in that case a
> callback is provided to update the stats in the new stats library.
> 
> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
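
As a rough sketch of the Tx-side math from the functionality list above
(variable names and the RFC 3550-style jitter smoothing are assumptions for
illustration, not the actual library code; all values in nanoseconds):

  #include <stdint.h>

  static uint64_t min_ns = UINT64_MAX, max_ns, sum_ns, nb_pkts;
  static int64_t prev_ns = -1, jitter_ns;

  static void
  latency_update(uint64_t rx_ts, uint64_t tx_ts)
  {
          uint64_t lat = tx_ts - rx_ts;   /* Tx time minus mbuf Rx timestamp */

          if (lat < min_ns)
                  min_ns = lat;
          if (lat > max_ns)
                  max_ns = lat;
          sum_ns += lat;
          nb_pkts++;                      /* average = sum_ns / nb_pkts */

          if (prev_ns >= 0) {             /* inter-packet delay variation */
                  int64_t d = (int64_t)lat - prev_ns;

                  if (d < 0)
                          d = -d;
                  jitter_ns += (d - jitter_ns) / 16;
          }
          prev_ns = (int64_t)lat;
  }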


* Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization
  2016-10-17 17:29  0%           ` Shreyansh Jain
@ 2016-10-18  9:23  0%             ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2016-10-18  9:23 UTC
  To: Shreyansh Jain, Thomas Monjalon
  Cc: dev, viktorin, David Marchand, Hemant Agrawal

On 10/17/2016 6:29 PM, Shreyansh Jain wrote:
> Hi Ferruh,
> 
>> -----Original Message-----
>> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
>> Sent: Monday, October 17, 2016 7:13 PM
>> To: Shreyansh Jain <shreyansh.jain@nxp.com>; Thomas Monjalon
>> <thomas.monjalon@6wind.com>
>> Cc: dev@dpdk.org; viktorin@rehivetech.com; David Marchand
>> <david.marchand@6wind.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
>> Subject: Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device
>> generalization
>>
>> On 10/5/2016 12:57 PM, Shreyansh Jain wrote:
>>> Hi Thomas,
>>>
>>> On Tuesday 04 October 2016 01:12 PM, Thomas Monjalon wrote:
>>>> 2016-10-04 12:21, Shreyansh Jain:
>>>>> Hi Thomas,
>>>>>
>>>>> On Monday 03 October 2016 07:58 PM, Thomas Monjalon wrote:
>>>>>> Applied, thanks everybody for the great (re)work!
>>>>>
>>>>> Thanks!
>>>>>
>>> [...]
>>> [...]
>>>>>
>>>>> It can be merged with changes for:
>>>>>   - drv_name
>>>>>   - EAL_ before _REGISTER_ macros
>>>>>   - eth_driver => rte_driver naming
>>>>
>>>> Good.
>>>> Could you make it this week, please?
>>>>
>>>
>>> Certainly. At least some of those I can send within this week :)
>>>
>>
>>
>> I caught this while running the ABI validation script today; I think this
>> patch should increase the LIBABIVER of:
>> - lib/librte_cryptodev
>> - lib/librte_eal
>> - lib/librte_ether
>  
> Should I be referring to [1] for understanding how/when to change the LIBABIVER?
> 
> [1] http://dpdk.org/doc/guides/contributing/versioning.html

Yes, this is the document.

Briefly, if a library becomes incompatible with existing applications,
LIBABIVER needs to be increased to indicate this.

Increasing LIBABIVER changes the dynamic library name and soname, which
stops existing applications from loading the new library. Without the
increase, an app may start running but can segfault or generate wrong
values. So increasing LIBABIVER informs users and prevents surprises.
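
A minimal example of such a bump, assuming the current version were 4 (the
actual number differs per library):

  # lib/librte_ether/Makefile (version number illustrative)
  LIBABIVER := 5

The shared object is then built with a .so.5 soname instead of .so.4, so the
dynamic linker refuses to load it into applications linked against the old
ABI instead of letting them misbehave at run time.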


* Re: [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support
  2016-09-30 14:00  4% ` [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
  2016-09-30 15:03  0%   ` Pattan, Reshma
@ 2016-10-18  7:57  0%   ` Sergio Gonzalez Monroy
  1 sibling, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2016-10-18  7:57 UTC
  To: Marcin Kerlin, dev; +Cc: pablo.de.lara.guarch, thomas.monjalon

On 30/09/2016 15:00, Marcin Kerlin wrote:
> This patch ensures device data is not overwritten in multi-process
> applications.
>
> 1) The library changes keep the entries of the array rte_eth_dev_data[],
> shared between all processes, consistent. A secondary process adds new
> entries in free slots instead of overwriting existing entries.
>
> 2) The testpmd changes allow a secondary process to attach to the mempool
> created by the primary process rather than creating a new one and, on quit
> or forced quit, to free its device data from the shared array
> rte_eth_dev_data[].
>
> -------------------------
> How to reproduce the bug:
>
> 1) Run primary process:
> ./testpmd -c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0
> --proc-type=primary --file-prefix=xz1 -- -i
>
> (gdb) print rte_eth_devices[0].data.name
> $52 = "3:0.1"
> (gdb) print rte_eth_devices[1].data.name
> $53 = "3:0.0"
>
> 2) Run secondary process:
> ./testpmd -c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0
> --vdev 'eth_pcap0,rx_pcap=/var/log/device1.pcap, tx_pcap=/var/log/device2.pcap'
> --proc-type=secondary --file-prefix=xz1 -- -i
>
> (gdb) print rte_eth_devices[0].data.name
> $52 = "eth_pcap0"
> (gdb) print rte_eth_devices[1].data.name
> $53 = "eth_pcap1"
>
> 3) Go back to the primary and re-check:
> (gdb) print rte_eth_devices[0].data.name
> $54 = "eth_pcap0"
> (gdb) print rte_eth_devices[1].data.name
> $55 = "eth_pcap1"
>
> This means the secondary process overwrites the data of the primary process.
>
> This patch fixes it; now if we go back to the primary and re-check,
> everything is fine:
> (gdb) print rte_eth_devices[0].data.name
> $56 = "3:0.1"
> (gdb) print rte_eth_devices[1].data.name
> $57 = "3:0.0"
>
> So after this fix, the rte_eth_dev_data[] structure keeps all entries one
> after the other instead of overwriting them:
> (gdb) print rte_eth_dev_data[0].name
> $52 = "3:0.1"
> (gdb) print rte_eth_dev_data[1].name
> $53 = "3:0.0"
> (gdb) print rte_eth_dev_data[2].name
> $54 = "eth_pcap0"
> (gdb) print rte_eth_dev_data[3].name
> $55 = "eth_pcap1"
> and so on; further entries are appended at the next indexes.
>
> If the secondary process is shut down, its entries are also deleted from the array:
> (gdb) print rte_eth_dev_data[0].name
> $52 = "3:0.1"
> (gdb) print rte_eth_dev_data[1].name
> $53 = "3:0.0"
> (gdb) print rte_eth_dev_data[2].name
> $54 = ""
> (gdb) print rte_eth_dev_data[3].name
> $55 = ""
> This also allows indexes 2 and 3 to be reused by the next process.
> -------------------------
>
> Breaking ABI:
> The changes in librte_ether extend the existing structure rte_eth_dev_data
> with a new lock field. The reason is that this structure is shared between
> all processes, so it must be protected against concurrent writes from two
> different processes.
>
> Tomasz Kulasek sent an announce of this ABI change in librte_ether on
> 21 July 2016. I would like to join this ABI break, if it is possible.
>
> v2:
> * fix syntax error in version script
> v3:
> * changed scope of function
> * improved description
> v4:
> * fix syntax error in version script
> v5:
> * fix header file
>
> Marcin Kerlin (2):
>    librte_ether: add protection against overwrite device data
>    app/testpmd: improve handling of multiprocess
>
>   app/test-pmd/testpmd.c                 | 37 +++++++++++++-
>   app/test-pmd/testpmd.h                 |  1 +
>   lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
>   lib/librte_ether/rte_ethdev.h          | 12 +++++
>   lib/librte_ether/rte_ether_version.map |  6 +++
>   5 files changed, 136 insertions(+), 10 deletions(-)
>

NACK series for 16.11

The patch would break the use case where primary and secondary processes
share the same PCI device.

Overall, as Thomas has already mentioned, we need further discussion on the
use cases and scope of DPDK multi-process.
This could be a good topic for the upcoming DPDK Userspace event.

Sergio
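
For reference, a sketch of the free-slot scan described in the cover letter
(the helper name is hypothetical; rte_eth_dev_data[] and RTE_MAX_ETHPORTS are
the existing symbols in lib/librte_ether):

  /* Return the first unused entry of the shared rte_eth_dev_data[] array
   * instead of starting at index 0 and overwriting entries owned by
   * another process. An empty name marks an unused slot. */
  static uint8_t
  eth_dev_data_free_slot(void)
  {
          uint8_t i;

          for (i = 0; i < RTE_MAX_ETHPORTS; i++)
                  if (rte_eth_dev_data[i].name[0] == '\0')
                          return i;
          return RTE_MAX_ETHPORTS; /* array full */
  }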


* Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization
  2016-10-17 13:43  3%         ` Ferruh Yigit
@ 2016-10-17 17:29  0%           ` Shreyansh Jain
  2016-10-18  9:23  0%             ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-10-17 17:29 UTC
  To: Ferruh Yigit, Thomas Monjalon
  Cc: dev, viktorin, David Marchand, Hemant Agrawal

Hi Ferruh,

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Monday, October 17, 2016 7:13 PM
> To: Shreyansh Jain <shreyansh.jain@nxp.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; viktorin@rehivetech.com; David Marchand
> <david.marchand@6wind.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
> Subject: Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device
> generalization
> 
> On 10/5/2016 12:57 PM, Shreyansh Jain wrote:
> > Hi Thomas,
> >
> > On Tuesday 04 October 2016 01:12 PM, Thomas Monjalon wrote:
> >> 2016-10-04 12:21, Shreyansh Jain:
> >>> Hi Thomas,
> >>>
> >>> On Monday 03 October 2016 07:58 PM, Thomas Monjalon wrote:
> >>>> Applied, thanks everybody for the great (re)work!
> >>>
> >>> Thanks!
> >>>
> > [...]
> > [...]
> >>>
> >>> It can be merged with changes for:
> >>>   - drv_name
> >>>   - EAL_ before _REGISTER_ macros
> >>>   - eth_driver => rte_driver naming
> >>
> >> Good.
> >> Could you make it this week, please?
> >>
> >
> > Certainly. At least some of those I can send within this week :)
> >
> 
> 
> I caught this while running the ABI validation script today; I think this
> patch should increase the LIBABIVER of:
> - lib/librte_cryptodev
> - lib/librte_eal
> - lib/librte_ether
 
Should I be referring to [1] for understanding how/when to change the LIBABIVER?

[1] http://dpdk.org/doc/guides/contributing/versioning.html

> 
> Thanks,
> ferruh


* Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization
  @ 2016-10-17 13:43  3%         ` Ferruh Yigit
  2016-10-17 17:29  0%           ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-10-17 13:43 UTC
  To: Shreyansh Jain, Thomas Monjalon
  Cc: dev, viktorin, David Marchand, hemant.agrawal

On 10/5/2016 12:57 PM, Shreyansh Jain wrote:
> Hi Thomas,
> 
> On Tuesday 04 October 2016 01:12 PM, Thomas Monjalon wrote:
>> 2016-10-04 12:21, Shreyansh Jain:
>>> Hi Thomas,
>>>
>>> On Monday 03 October 2016 07:58 PM, Thomas Monjalon wrote:
>>>> Applied, thanks everybody for the great (re)work!
>>>
>>> Thanks!
>>>
> [...]
> [...]
>>>
>>> It can be merged with changes for:
>>>   - drv_name
>>>   - EAL_ before _REGISTER_ macros
>>>   - eth_driver => rte_driver naming
>>
>> Good.
>> Could you make it this week, please?
>>
> 
> Certainly. At least some of those I can send within this week :)
> 


I caught this while running the ABI validation script today; I think this
patch should increase the LIBABIVER of:
- lib/librte_cryptodev
- lib/librte_eal
- lib/librte_ether

Thanks,
ferruh


* Re: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API
  2016-10-11  8:21  3%     ` Adrien Mazarguil
@ 2016-10-12  2:38  0%       ` Zhao1, Wei
  0 siblings, 0 replies; 200+ results
From: Zhao1, Wei @ 2016-10-12  2:38 UTC
  To: Adrien Mazarguil; +Cc: dev

Hi Adrien Mazarguil,

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, October 11, 2016 4:21 PM
> To: Zhao1, Wei <wei.zhao1@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification
> API
> 
> Hi Wei,
> 
> On Tue, Oct 11, 2016 at 01:47:53AM +0000, Zhao1, Wei wrote:
> > Hi Adrien Mazarguil,
> >      The struct rte_flow_action_rss in rte_flow.txt has a member
> > rss_conf which is a pointer type; what is the advantage of using a pointer?
> > Why not use a struct rte_eth_rss_conf rss_conf member directly, as the
> > rte_flow_item_ipv4/rte_flow_item_ipv6 struct members do?
> >
> > Thank you.
> >
> >  struct rte_flow_action_rss {
> > 	struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
> > 	uint16_t queues; /**< Number of entries in queue[]. */
> > 	uint16_t queue[]; /**< Queues indices to use. */ };
> 
> Well I thought it made sharing flow RSS configuration with its counterpart in
> struct rte_eth_conf easier (this pointer should even be const). Also, while
> ABI breakage would still occur if rte_eth_rss_conf happened to be modified,
> the impact on this API would be limited as it would not cause a change in
> structure size. We'd ideally need some kind of version field to be completely
> safe but I guess that would be somewhat overkill.
> 
> Now considering this API was written without an initial implementation, all
> structure definitions that do not make sense are still open to debate, we can
> adjust them as needed.
> 
> --
> Adrien Mazarguil
> 6WIND

Your explanation seems very reasonable to me; a structure pointer is a well-established approach in this situation.
Thank you!


* Re: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API
  @ 2016-10-11  8:21  3%     ` Adrien Mazarguil
  2016-10-12  2:38  0%       ` Zhao1, Wei
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-10-11  8:21 UTC
  To: Zhao1, Wei; +Cc: dev

Hi Wei,

On Tue, Oct 11, 2016 at 01:47:53AM +0000, Zhao1, Wei wrote:
> Hi Adrien Mazarguil,
>      The struct rte_flow_action_rss in rte_flow.txt has a member rss_conf which is a pointer type; what is the advantage of using a pointer?
> Why not use a struct rte_eth_rss_conf rss_conf member directly, as the rte_flow_item_ipv4/rte_flow_item_ipv6 struct members do?
> 
> Thank you.
> 
>  struct rte_flow_action_rss {
> 	struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
> 	uint16_t queues; /**< Number of entries in queue[]. */
> 	uint16_t queue[]; /**< Queues indices to use. */
> };

Well I thought it made sharing flow RSS configuration with its counterpart
in struct rte_eth_conf easier (this pointer should even be const). Also,
while ABI breakage would still occur if rte_eth_rss_conf happened to be
modified, the impact on this API would be limited as it would not cause a
change in structure size. We'd ideally need some kind of version field to be
completely safe but I guess that would be somewhat overkill.

Now considering this API was written without an initial implementation, all
structure definitions that do not make sense are still open to debate, we
can adjust them as needed.

-- 
Adrien Mazarguil
6WIND
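
As an illustration of the sharing described above, a rule's RSS action can
point at the rte_eth_rss_conf already embedded in the port configuration
(sketch only; heap allocation is used because of the flexible queue[] array):

  #include <stdlib.h>

  static struct rte_flow_action_rss *
  make_rss_action(struct rte_eth_conf *dev_conf)
  {
          struct rte_flow_action_rss *rss;
          uint16_t q;

          rss = malloc(sizeof(*rss) + 4 * sizeof(rss->queue[0]));
          if (rss == NULL)
                  return NULL;
          /* Shared with the device configuration, not copied. */
          rss->rss_conf = &dev_conf->rx_adv_conf.rss_conf;
          rss->queues = 4;
          for (q = 0; q < 4; q++)
                  rss->queue[q] = q;
          return rss;
  }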


* [dpdk-dev] [PATCH v3] mk: gcc -march support for intel processors code names
  2016-08-22 14:19  7% ` [dpdk-dev] [PATCH v2] " Reshma Pattan
@ 2016-10-10 21:33  8%   ` Reshma Pattan
  0 siblings, 0 replies; 200+ results
From: Reshma Pattan @ 2016-10-10 21:33 UTC
  To: dev; +Cc: Reshma Pattan

The GCC 4.9 -march option supports the Intel code names for processors,
for example -march=silvermont, -march=broadwell.
The RTE_MACHINE config flag can be used to pass a code name to
the compiler as the -march flag.

The release notes are updated.

The Linux and FreeBSD getting started guides are updated to recommend
gcc version 4.9 and above.

Some of the gmake command examples in the sample application guide and
driver guides are updated to gcc version 4.9.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
---
 doc/guides/freebsd_gsg/build_dpdk.rst        | 4 ++--
 doc/guides/freebsd_gsg/build_sample_apps.rst | 6 +++---
 doc/guides/linux_gsg/sys_reqs.rst            | 6 +++---
 doc/guides/nics/bnx2x.rst                    | 4 ++--
 doc/guides/nics/qede.rst                     | 2 +-
 doc/guides/rel_notes/release_16_11.rst       | 5 +++++
 mk/target/generic/rte.vars.mk                | 4 ++++
 7 files changed, 20 insertions(+), 11 deletions(-)

v3:
Reverted changes of mk/toolchain/gcc/rte.toolchain-compat.mk.

v2:
Updated the Linux and FreeBSD GSG guides, the sample application guide and
other driver docs to recommend gcc version 4.9 and above.

diff --git a/doc/guides/freebsd_gsg/build_dpdk.rst b/doc/guides/freebsd_gsg/build_dpdk.rst
index 27f21de..24a9f87 100644
--- a/doc/guides/freebsd_gsg/build_dpdk.rst
+++ b/doc/guides/freebsd_gsg/build_dpdk.rst
@@ -88,7 +88,7 @@ The ports required and their locations are as follows:
 For compiling and using the DPDK with gcc, the compiler must be installed
 from the ports collection:
 
-* gcc: version 4.8 is recommended ``/usr/ports/lang/gcc48``.
+* gcc: version 4.9 is recommended ``/usr/ports/lang/gcc49``.
   Ensure that ``CPU_OPTS`` is selected (default is OFF).
 
 When running the make config-recursive command, a dialog may be presented to the
@@ -164,7 +164,7 @@ For example to compile for FreeBSD use:
    If the compiler binary to be used does not correspond to that given in the
    TOOLCHAIN part of the target, the compiler command may need to be explicitly
    specified. For example, if compiling for gcc, where the gcc binary is called
-   gcc4.8, the command would need to be ``gmake install T=<target> CC=gcc4.8``.
+   gcc4.9, the command would need to be ``gmake install T=<target> CC=gcc4.9``.
 
 Browsing the Installed DPDK Environment Target
 ----------------------------------------------
diff --git a/doc/guides/freebsd_gsg/build_sample_apps.rst b/doc/guides/freebsd_gsg/build_sample_apps.rst
index 2662303..fffc4c0 100644
--- a/doc/guides/freebsd_gsg/build_sample_apps.rst
+++ b/doc/guides/freebsd_gsg/build_sample_apps.rst
@@ -54,7 +54,7 @@ the following variables must be exported:
 
 The following is an example of creating the ``helloworld`` application, which runs
 in the DPDK FreeBSD environment. While the example demonstrates compiling
-using gcc version 4.8, compiling with clang will be similar, except that the ``CC=``
+using gcc version 4.9, compiling with clang will be similar, except that the ``CC=``
 parameter can probably be omitted. The ``helloworld`` example may be found in the
 ``${RTE_SDK}/examples`` directory.
 
@@ -72,7 +72,7 @@ in the build directory.
     setenv RTE_SDK $HOME/DPDK
     setenv RTE_TARGET x86_64-native-bsdapp-gcc
 
-    gmake CC=gcc48
+    gmake CC=gcc49
       CC main.o
       LD helloworld
       INSTALL-APP helloworld
@@ -96,7 +96,7 @@ in the build directory.
     cd my_rte_app/
     setenv RTE_TARGET x86_64-native-bsdapp-gcc
 
-    gmake CC=gcc48
+    gmake CC=gcc49
       CC main.o
       LD helloworld
       INSTALL-APP helloworld
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index b321544..3d74342 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -61,8 +61,8 @@ Compilation of the DPDK
 
 *   coreutils: ``cmp``, ``sed``, ``grep``, ``arch``, etc.
 
-*   gcc: versions 4.5.x or later is recommended for ``i686/x86_64``. Versions 4.8.x or later is recommended
-    for ``ppc_64`` and ``x86_x32`` ABI. On some distributions, some specific compiler flags and linker flags are enabled by
+*   gcc: versions 4.9 or later is recommended for all platforms.
+    On some distributions, some specific compiler flags and linker flags are enabled by
     default and affect performance (``-fstack-protector``, for example). Please refer to the documentation
     of your distribution and to ``gcc -dumpspecs``.
 
@@ -82,7 +82,7 @@ Compilation of the DPDK
 .. note::
 
     x86_x32 ABI is currently supported with distribution packages only on Ubuntu
-    higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.8+.
+    higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.
 
 .. note::
 
diff --git a/doc/guides/nics/bnx2x.rst b/doc/guides/nics/bnx2x.rst
index 6453168..6d1768a 100644
--- a/doc/guides/nics/bnx2x.rst
+++ b/doc/guides/nics/bnx2x.rst
@@ -162,7 +162,7 @@ To compile BNX2X PMD for FreeBSD x86_64 gcc target, run the following "gmake"
 command::
 
    cd <DPDK-source-directory>
-   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc48 CC=gcc48
+   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc49 CC=gcc49
 
 To compile BNX2X PMD for FreeBSD x86_64 gcc target, run the following "gmake"
 command:
@@ -170,7 +170,7 @@ command:
 .. code-block:: console
 
    cd <DPDK-source-directory>
-   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc48 CC=gcc48
+   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc49 CC=gcc49
 
 Linux
 -----
diff --git a/doc/guides/nics/qede.rst b/doc/guides/nics/qede.rst
index 53d749c..3af755e 100644
--- a/doc/guides/nics/qede.rst
+++ b/doc/guides/nics/qede.rst
@@ -150,7 +150,7 @@ command::
 
    cd <DPDK-source-directory>
    gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=\
-                                        /usr/local/lib/gcc48 CC=gcc48
+                                        /usr/local/lib/gcc49 CC=gcc49
 
 
 Sample Application Notes
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 905186a..f55bac4 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -89,6 +89,11 @@ New Features
   * AES CBC IV generation with cipher forward function
   * AES GCM/CTR mode
 
+* **Added support for new gcc -march option.**
+
+  The GCC 4.9 ``-march`` option supports the Intel processor code names.
+  The config option ``RTE_MACHINE`` can be used to pass code names to the compiler as ``-march`` flag.
+
 
 Resolved Issues
 ---------------
diff --git a/mk/target/generic/rte.vars.mk b/mk/target/generic/rte.vars.mk
index 75a616a..b31e426 100644
--- a/mk/target/generic/rte.vars.mk
+++ b/mk/target/generic/rte.vars.mk
@@ -50,7 +50,11 @@
 #   - can define CPU_ASFLAGS variable (overriden by cmdline value) that
 #     overrides the one defined in arch.
 #
+ifneq ($(wildcard $(RTE_SDK)/mk/machine/$(RTE_MACHINE)/rte.vars.mk),)
 include $(RTE_SDK)/mk/machine/$(RTE_MACHINE)/rte.vars.mk
+else
+MACHINE_CFLAGS := -march=$(RTE_MACHINE)
+endif
 
 #
 # arch:
-- 
2.7.4
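
Illustrative usage of the fallback added above: setting, e.g.,

  CONFIG_RTE_MACHINE="broadwell"

in the target config (e.g. config/defconfig_x86_64-native-linuxapp-gcc) now
results in MACHINE_CFLAGS := -march=broadwell, since no
mk/machine/broadwell/rte.vars.mk exists.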


* Re: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API
@ 2016-10-10  9:42  0% Zhao1, Wei
    0 siblings, 1 reply; 200+ results
From: Zhao1, Wei @ 2016-10-10  9:42 UTC
  To: Adrien Mazarguil, dev

Hi Adrien Mazarguil,

In your v2 version of rte_flow.txt, there is an action type RTE_FLOW_ACTION_TYPE_MARK, but there is no definition of struct rte_flow_action_mark.
There is, however, a definition of struct rte_flow_action_id. Is it a typo or intended usage?

Thank you.

struct rte_flow_action_id {
	uint32_t id; /**< 32 bit value to return with packets. */
};
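
(Following the v2 changelog quoted below, which notes the ID action was
converted to MARK and FLAG actions, the definition was presumably meant to
read as follows; this is an inference, not text from the draft header:)

struct rte_flow_action_mark {
	uint32_t id; /**< 32 bit value to return with packets. */
};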

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Saturday, August 20, 2016 3:33 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API
> 
> Hi All,
> 
> Thanks to many for the positive and constructive feedback I've received so
> far. Here is the updated specification (v0.7) at last.
> 
> I've attempted to address as many comments as possible but could not
> process them all just yet. A new section "Future evolutions" has been
> added for the remaining topics.
> 
> This series adds rte_flow.h to the DPDK tree. Next time I will attempt to
> convert the specification as a documentation commit part of the patchset
> and actually implement API functions.
> 
> I think including the entire document here makes it easier to annotate on
> the ML, apologies in advance for the resulting traffic.
> 
> Finally I'm off for the next two weeks, do not expect replies from me in
> the meantime.
> 
> Updates are also available online:
> 
> HTML version:
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
> 
> PDF version:
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
> 
> Related draft header file (also in the next patch):
>  https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
> 
> Git tree:
>  https://github.com/6WIND/rte_flow
> 
> Changes from v1:
> 
>  Specification:
> 
>  - Settled on [generic] "flow interface" / "flow API" as the name of this
>    framework, matches the rte_flow prefix better.
>  - Minor wording changes in several places.
>  - Partially added egress (TX) support.
>  - Added "unrecoverable errors" as another consequence of overlapping
>    rules.
>  - Described flow rules groups and their interaction with flow rule
>    priorities.
>  - Fully described PF and VF meta pattern items so they are not open to
>    interpretation anymore.
>  - Removed the SIGNATURE meta pattern item as its description was too
>    vague, may be re-added later if necessary.
>  - Added the PORT pattern item to apply rules to non-default physical
>    ports.
>  - Entirely redefined the RAW pattern item.
>  - Fixed tag error in the ETH item definition.
>  - Updated protocol definitions (IPV4, IPV6, ICMP, UDP).
>  - Added missing protocols (SCTP, VXLAN).
>  - Converted ID action to MARK and FLAG actions, described interaction
>    with the RSS hash result in mbufs.
>  - Updated COUNT query structure to retrieve the number of bytes.
>  - Updated VF action.
>  - Documented negative item and action types, those will be used for
>    dynamic types generated at run-time.
>  - Added blurb about IPv4 options and IPv6 extension headers matching.
>  - Updated function definitions.
>  - Documented a flush method to remove all rules on a given port at once.
>  - Documented the verbose error reporting interface.
>  - Documented how the private interface for PMD use will work.
>  - Documented expected behavior between successive port initializations.
>  - Documented expected behavior for ports not under DPDK control.
>  - Updated API migration section.
>  - Added future evolutions section.
> 
>  Header file:
> 
>  - Not a draft anymore and can be used as-is for preliminary
>    implementations.
>  - Flow rule attributes (group, priority, etc) now have their own
>    structure provided separately to API functions (struct rte_flow_attr).
>  - Group and priority interactions have been documented.
>  - Added PORT item.
>  - Removed SIGNATURE item.
>  - Defined ICMP, SCTP and VXLAN items.
>  - Redefined PF, VF, RAW, IPV4, IPV6, UDP and TCP items.
>  - Fixed tag error in the ETH item definition.
>  - Converted ID action to MARK and FLAG actions, described interaction
>    with the RSS hash result in mbufs.
>  - Updated COUNT query structure.
>  - Updated VF action.
>  - Added verbose errors interface.
>  - Updated function prototypes according to the above.
>  - Defined rte_flow_flush().
> 
> --------
> 
> ======================
> Generic flow interface
> ======================
> 
> .. footer::
> 
>    v0.7
> 
> .. contents::
> .. sectnum::
> .. raw:: pdf
> 
>    PageBreak
> 
> Overview
> ========
> 
> DPDK provides several competing interfaces added over time to perform packet
> matching and related actions such as filtering and classification.
> 
> They must be extended to implement the features supported by newer devices
> in order to expose them to applications, however the current design has
> several drawbacks:
> 
> - Complicated filter combinations which have not been hard-coded cannot be
>   expressed.
> - Prone to API/ABI breakage when new features must be added to an existing
>   filter type, which frequently happens.
> 
> From an application point of view:
> 
> - Having disparate interfaces, all optional and lacking in features does not
>   make this API easy to use.
> - Seemingly arbitrary built-in limitations of filter types based on the
>   device they were initially designed for.
> - Undefined relationship between different filter types.
> - High complexity, considerable undocumented and/or undefined behavior.
> 
> Considering the growing number of devices supported by DPDK, adding a new
> filter type each time a new feature must be implemented is not sustainable
> in the long term. Applications not written to target a specific device
> cannot really benefit from such an API.
> 
> For these reasons, this document defines an extensible unified API that
> encompasses and supersedes these legacy filter types.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Current API
> ===========
> 
> Rationale
> ---------
> 
> The reason several competing (and mostly overlapping) filtering APIs are
> present in DPDK is due to its nature as a thin layer between hardware and
> software.
> 
> Each subsequent interface has been added to better match the capabilities
> and limitations of the latest supported device, which usually happened to
> need an incompatible configuration approach. Because of this, many ended up
> device-centric and not usable by applications that were not written for that
> particular device.
> 
> This document is not the first attempt to address this proliferation issue,
> in fact a lot of work has already been done both to create a more generic
> interface while somewhat keeping compatibility with legacy ones through a
> common call interface (``rte_eth_dev_filter_ctrl()`` with the
> ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).
> 
> Today, these previously incompatible interfaces are known as filter types
> (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).
> 
> However while trivial to extend with new types, it only shifted the
> underlying problem as applications still need to be written for one kind of
> filter type, which, as described in the following sections, is not
> necessarily implemented by all PMDs that support filtering.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Filter types
> ------------
> 
> This section summarizes the capabilities of each filter type.
> 
> Although the following list is exhaustive, the description of individual
> types may contain inaccuracies due to the lack of documentation or usage
> examples.
> 
> Note: names are prefixed with ``RTE_ETH_FILTER_``.
> 
> ``MACVLAN``
> ~~~~~~~~~~~
> 
> Matching:
> 
> - L2 source/destination addresses.
> - Optional 802.1Q VLAN ID.
> - Masking individual fields on a rule basis is not supported.
> 
> Action:
> 
> - Packets are redirected either to a given VF device using its ID or to the
>   PF.
> 
> ``ETHERTYPE``
> ~~~~~~~~~~~~~
> 
> Matching:
> 
> - L2 source/destination addresses (optional).
> - Ethertype (no VLAN ID?).
> - Masking individual fields on a rule basis is not supported.
> 
> Action:
> 
> - Receive packets on a given queue.
> - Drop packets.
> 
> ``FLEXIBLE``
> ~~~~~~~~~~~~
> 
> Matching:
> 
> - At most 128 consecutive bytes anywhere in packets.
> - Masking is supported with byte granularity.
> - Priorities are supported (relative to this filter type, undefined
>   otherwise).
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> ``SYN``
> ~~~~~~~
> 
> Matching:
> 
> - TCP SYN packets only.
> - One high priority bit can be set to give the highest possible priority to
>   this type when other filters with different types are configured.
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> ``NTUPLE``
> ~~~~~~~~~~
> 
> Matching:
> 
> - Source/destination IPv4 addresses (optional in 2-tuple mode).
> - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
> - L4 protocol (2 and 5-tuple modes).
> - Masking individual fields is supported.
> - TCP flags.
> - Up to 7 levels of priority relative to this filter type, undefined
>   otherwise.
> - No IPv6.
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> ``TUNNEL``
> ~~~~~~~~~~
> 
> Matching:
> 
> - Outer L2 source/destination addresses.
> - Inner L2 source/destination addresses.
> - Inner VLAN ID.
> - IPv4/IPv6 source (destination?) address.
> - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
>   802.1BR E-Tag).
> - Tenant ID for tunneling protocols that have one.
> - Any combination of the above can be specified.
> - Masking individual fields on a rule basis is not supported.
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``FDIR``
> ~~~~~~~~
> 
> Queries:
> 
> - Device capabilities and limitations.
> - Device statistics about configured filters (resource usage, collisions).
> - Device configuration (matching input set and masks)
> 
> Matching:
> 
> - Device mode of operation: none (to disable filtering), signature
>   (hash-based dispatching from masked fields) or perfect (either MAC VLAN or
>   tunnel).
> - L2 Ethertype.
> - Outer L2 destination address (MAC VLAN mode).
> - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
>   (tunnel mode).
> - IPv4 source/destination addresses, ToS, TTL and protocol fields.
> - IPv6 source/destination addresses, TC, protocol and hop limits fields.
> - UDP source/destination IPv4/IPv6 and ports.
> - TCP source/destination IPv4/IPv6 and ports.
> - SCTP source/destination IPv4/IPv6, ports and verification tag field.
> - Note, only one protocol type at once (either only L2 Ethertype, basic
>   IPv6, IPv4+UDP, IPv4+TCP and so on).
> - VLAN TCI (extended API).
> - At most 16 bytes to match in payload (extended API). A global device
>   look-up table specifies for each possible protocol layer (unknown, raw,
>   L2, L3, L4) the offset to use for each byte (they do not need to be
>   contiguous) and the related bit-mask.
> - Whether packet is addressed to PF or VF, in that case its ID can be
>   matched as well (extended API).
> - Masking most of the above fields is supported, but simultaneously affects
>   all filters configured on a device.
> - Input set can be modified in a similar fashion for a given device to
>   ignore individual fields of filters (i.e. do not match the destination
>   address in a IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
>   macros). Configuring this also affects RSS processing on **i40e**.
> - Filters can also provide 32 bits of arbitrary data to return as part of
>   matched packets.
> 
> Action:
> 
> - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
> - **RTE_ETH_FDIR_REJECT**: drop packet immediately.
> - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
>   otherwise process it with subsequent filters.
> - For accepted packets and if requested by filter, either 32 bits of
>   arbitrary data and four bytes of matched payload (only in case of flex
>   bytes matching), or eight bytes of matched payload (flex also) are added
>   to meta data.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``HASH``
> ~~~~~~~~
> 
> Not an actual filter type. Provides and retrieves the global device
> configuration (per port or entire NIC) for hash functions and their
> properties.
> 
> Hash function selection: "default" (keep current), XOR or Toeplitz.
> 
> This function can be configured per flow type (**RTE_ETH_FLOW_**
> definitions), supported types are:
> 
> - Unknown.
> - Raw.
> - Fragmented or non-fragmented IPv4.
> - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
> - Fragmented or non-fragmented IPv6.
> - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
> - L2 payload.
> - IPv6 with extensions.
> - IPv6 with L4 (TCP, UDP) and extensions.
> 
> ``L2_TUNNEL``
> ~~~~~~~~~~~~~
> 
> Matching:
> 
> - All packets received on a given port.
> 
> Action:
> 
> - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
>   802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
>   is implemented at the moment).
> - VF ID to use for tag insertion (currently unused).
> - Destination pool for tag based forwarding (pools are IDs that can be
>   affected to ports, duplication occurs if the same ID is shared by several
>   ports of the same NIC).
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Driver support
> --------------
> 
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> bnx2x
> cxgbe
> e1000            yes       yes      yes yes
> ena
> enic                                                  yes
> fm10k
> i40e     yes     yes                           yes    yes  yes
> ixgbe            yes                yes yes           yes       yes
> mlx4
> mlx5                                                  yes
> szedata2
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> 
> Flow director
> -------------
> 
> Flow director (FDIR) is the name of the most capable filter type, which
> covers most features offered by others. As such, it is the most widespread
> in PMDs that support filtering (i.e. all of them besides **e1000**).
> 
> It is also the only type that allows an arbitrary 32 bits value provided by
> applications to be attached to a filter and returned with matching packets
> instead of relying on the destination queue to recognize flows.
> 
> Unfortunately, even FDIR requires applications to be aware of low-level
> capabilities and limitations (most of which come directly from **ixgbe** and
> **i40e**):
> 
> - Bit-masks are set globally per device (port?), not per filter.
> - Configuration state is not expected to be saved by the driver, and
>   stopping/restarting a port requires the application to perform it again
>   (API documentation is also unclear about this).
> - Monolithic approach with ABI issues as soon as a new kind of flow or
>   combination needs to be supported.
> - Cryptic global statistics/counters.
> - Unclear about how priorities are managed; filters seem to be arranged as a
>   linked list in hardware (possibly related to configuration order).
> 
> Packet alteration
> -----------------
> 
> One interesting feature is that the L2 tunnel filter type implements the
> ability to alter incoming packets through a filter (in this case to
> encapsulate them), thus the **mlx5** flow encap/decap features are not a
> foreign concept.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Proposed API
> ============
> 
> Terminology
> -----------
> 
> - **Flow API**: overall framework affecting the fate of selected packets,
>   covers everything described in this document.
> - **Filtering API**: an alias for *Flow API*.
> - **Matching pattern**: properties to look for in packets, a combination of
>   any number of items.
> - **Pattern item**: part of a pattern that either matches packet data
>   (protocol header, payload or derived information), or specifies properties
>   of the pattern itself.
> - **Actions**: what needs to be done when a packet is matched by a pattern.
> - **Flow rule**: this is the result of combining a *matching pattern* with
>   *actions*.
> - **Filter rule**: a less generic term than *flow rule*, can otherwise be
>   used interchangeably.
> - **Hit**: a flow rule is said to be *hit* when processing a matching
>   packet.
> 
> Requirements
> ------------
> 
> As described in the previous section, there is a growing need for a common
> method to configure filtering and related actions in a hardware independent
> fashion.
> 
> The flow API should not disallow any filter combination by design and must
> remain as simple as possible to use. It can simply be defined as a method to
> perform one or several actions on selected packets.
> 
> PMDs are aware of the capabilities of the device they manage and should be
> responsible for preventing unsupported or conflicting combinations.
> 
> This approach is fundamentally different as it places most of the burden on
> the software side of the PMD instead of having device capabilities directly
> mapped to API functions, then expecting applications to work around ensuing
> compatibility issues.
> 
> Requirements for a new API:
> 
> - Flexible and extensible without causing API/ABI problems for existing
>   applications.
> - Should be unambiguous and easy to use.
> - Support existing filtering features and actions listed in `Filter types`_.
> - Support packet alteration.
> - In case of overlapping filters, their priority should be well documented.
> - Support filter queries (for example to retrieve counters).
> - Support egress (TX) matching and specific actions.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> High level design
> -----------------
> 
> The chosen approach to make filtering as generic as possible is by
> expressing matching patterns through lists of items instead of the flat
> structures used in DPDK today, enabling combinations that are not predefined
> and thus being more versatile.
> 
> Flow rules can have several distinct actions (such as counting,
> encapsulating, decapsulating before redirecting packets to a particular
> queue, etc.), instead of relying on several rules to achieve this and having
> applications deal with hardware implementation details regarding their
> order.
> 
> Support for different priority levels on a rule basis is provided, for
> example in order to force a more specific rule to come before a more generic
> one for packets matched by both, however hardware support for more than a
> single priority level cannot be guaranteed. When supported, the number of
> available priority levels is usually low, which is why they can also be
> implemented in software by PMDs (e.g. missing priority levels may be
> emulated by reordering rules).
> 
> In order to remain as hardware agnostic as possible, by default all rules
> are considered to have the same priority, which means that the order between
> overlapping rules (when a packet is matched by several filters) is
> undefined, packet duplication or unrecoverable errors may even occur as a
> result.
> 
> PMDs may refuse to create overlapping rules at a given priority level when
> they can be detected (e.g. if a pattern matches an existing filter).
> 
> Thus predictable results for a given priority level can only be achieved
> with non-overlapping rules, using perfect matching on all protocol layers.
> 
> Flow rules can also be grouped, the flow rule priority is specific to the
> group they belong to. All flow rules in a given group are thus processed
> either before or after another group.
> 
> Support for multiple actions per rule may be implemented internally on top
> of non-default hardware priorities, as a result both features may not be
> simultaneously available to applications.
> 
> Considering that allowed pattern/actions combinations cannot be known in
> advance and would result in an impractically large number of capabilities to
> expose, a method is provided to validate a given rule from the current
> device configuration state without actually adding it (akin to a "dry run"
> mode).
> 
> This enables applications to check if the rule types they need are supported
> at initialization time, before starting their data path. This method can be
> used anytime, its only requirement being that the resources needed by a rule
> must exist (e.g. a target RX queue must be configured first).
> 
> Each defined rule is associated with an opaque handle managed by the PMD,
> applications are responsible for keeping it. These can be used for queries
> and rules management, such as retrieving counters or other data and
> destroying them.
> 
> To avoid resource leaks on the PMD side, handles must be explicitly
> destroyed by the application before releasing associated resources such as
> queues and ports.
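
A minimal lifecycle sketch for the above, using the function names from the
draft header (the parameters shown here are indicative and may not match the
draft exactly):

  struct rte_flow_error error;
  struct rte_flow *flow = NULL;

  /* "Dry run": check support before the data path starts. */
  if (rte_flow_validate(port_id, &attr, &pattern, &actions, &error) == 0)
          flow = rte_flow_create(port_id, &attr, &pattern, &actions, &error);

  /* ... data path runs; the opaque handle can be used for queries ... */

  if (flow != NULL)
          rte_flow_destroy(port_id, flow, &error); /* before releasing queues/ports */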
> 
> Integration
> -----------
> 
> To avoid ABI breakage, this new interface will be implemented through the
> existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
> **RTE_ETH_FILTER_GENERIC** as a new filter type.
> 
> However a public front-end API described in `Rules management`_ will
> be added as the preferred method to use it.
> 
> Once discussions with the community have converged to a definite API, legacy
> filter types should be deprecated and a deadline defined to remove their
> support entirely.
> 
> PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
> drop filtering support entirely. Less maintained PMDs for older hardware may
> lose support at this point.
> 
> The notion of filter type will then be deprecated and subsequently dropped
> to avoid confusion between both frameworks.
> 
> Implementation details
> ======================
> 
> Flow rule
> ---------
> 
> A flow rule is the combination of a matching pattern with a list of actions,
> and is the basis of this API.
> 
> They also have several other attributes described in the following sections.
> 
> Groups
> ~~~~~~
> 
> Flow rules can be grouped by assigning them a common group number. Lower
> values have higher priority. Group 0 has the highest priority.
> 
> Although optional, applications are encouraged to group similar rules as
> much as possible to fully take advantage of hardware capabilities
> (e.g. optimized matching) and work around limitations (e.g. a single pattern
> type possibly allowed in a given group).
> 
> Note that support for more than a single group is not guaranteed.
> 
> Priorities
> ~~~~~~~~~~
> 
> A priority level can be assigned to a flow rule. Like groups, lower values
> denote higher priority, with 0 as the maximum.
> 
> A rule with priority 0 in group 8 is always matched after a rule with
> priority 8 in group 0.
> 
> Group and priority levels are arbitrary and up to the application, they do
> not need to be contiguous nor start from 0, however the maximum number
> varies between devices and may be affected by existing flow rules.
> 
> If a packet is matched by several rules of a given group for a given
> priority level, the outcome is undefined. It can take any path, may be
> duplicated or even cause unrecoverable errors.
> 
> Note that support for more than a single priority level is not guaranteed.
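
As a sketch, these attributes map onto the separate struct rte_flow_attr
mentioned in the v2 changelog (field names assumed, not taken from the draft
header):

  struct rte_flow_attr attr = {
          .group = 0,    /* group 0: highest-priority group */
          .priority = 8, /* matched after priority 0 rules of the same group */
          .ingress = 1,  /* see "Traffic direction" below */
  };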
> 
> Traffic direction
> ~~~~~~~~~~~~~~~~~
> 
> Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
> 
> Several pattern items and actions are valid and can be used in both
> directions. Those valid for only one direction are described as such.
> 
> Specifying both directions at once is not recommended but may be valid in
> some cases, such as incrementing the same counter twice.
> 
> Not specifying any direction is currently an error.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Matching pattern
> ~~~~~~~~~~~~~~~~
> 
> A matching pattern comprises any number of items of various types.
> 
> Items are arranged in a list to form a matching pattern for packets. They
> fall in two categories:
> 
> - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, SCTP, VXLAN
>   and so on), usually associated with a specification structure. These must
>   be stacked in the same order as the protocol layers to match, starting
>   from L2.
> 
> - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF, PORT
>   and so on), often without a specification structure. Since they are meta
>   data that does not match packet contents, these can be specified anywhere
>   within item lists without affecting the protocol matching items.
> 
> Most item specifications can be optionally paired with a mask to narrow the
> specific fields or bits to be matched.
> 
> - Items are defined with ``struct rte_flow_item``.
> - Patterns are defined with ``struct rte_flow_pattern``.
> 
> Example of an item specification matching an Ethernet header:
> 
> +-----------------------------------------+
> | Ethernet                                |
> +==========+=========+====================+
> | ``spec`` | ``src`` | ``00:01:02:03:04`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:2a:66:00:01`` |
> +----------+---------+--------------------+
> | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:00:00:00:ff`` |
> +----------+---------+--------------------+
> 
> Non-masked bits stand for any value, Ethernet headers with the following
> properties are thus matched:
> 
> - ``src``: ``??:01:02:03:??``
> - ``dst``: ``??:??:??:??:01``
> 
> Except for meta types that do not need one, ``spec`` must be a valid pointer
> to a structure of the related item type. A ``mask`` of the same type can be
> provided to tell which bits in ``spec`` are to be matched.
> 
> A mask is normally only needed for ``spec`` fields matching packet data,
> ignored otherwise. See individual item types for more information.
> 
> A ``NULL`` mask pointer is allowed and is similar to matching with a full
> mask (all ones) on the ``spec`` fields supported by hardware; the remaining
> fields are ignored (all zeroes), so there is no error checking for
> unsupported fields.
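
A sketch of the Ethernet example above expressed with these structures (field
names assumed from the draft header; the table shows five address octets, so
a sixth zero octet is padded here):

  struct rte_flow_item_eth eth_spec = {
          .dst = { .addr_bytes = { 0x00, 0x2a, 0x66, 0x00, 0x01, 0x00 } },
          .src = { .addr_bytes = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x00 } },
  };
  struct rte_flow_item_eth eth_mask = {
          .dst = { .addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0xff, 0x00 } },
          .src = { .addr_bytes = { 0x00, 0xff, 0xff, 0xff, 0x00, 0x00 } },
  };
  struct rte_flow_item item = {
          .type = RTE_FLOW_ITEM_TYPE_ETH,
          .spec = &eth_spec,
          .mask = &eth_mask,
  };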
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Matching pattern items for packet data must be naturally stacked (ordered
> from lowest to highest protocol layer), as in the following examples:
> 
> +--------------+
> | TCPv4 as L4  |
> +===+==========+
> | 0 | Ethernet |
> +---+----------+
> | 1 | IPv4     |
> +---+----------+
> | 2 | TCP      |
> +---+----------+
> 
> +----------------+
> | TCPv6 in VXLAN |
> +===+============+
> | 0 | Ethernet   |
> +---+------------+
> | 1 | IPv4       |
> +---+------------+
> | 2 | UDP        |
> +---+------------+
> | 3 | VXLAN      |
> +---+------------+
> | 4 | Ethernet   |
> +---+------------+
> | 5 | IPv6       |
> +---+------------+
> | 6 | TCP        |
> +---+------------+
> 
> +-----------------------------+
> | TCPv4 as L4 with meta items |
> +===+=========================+
> | 0 | VOID                    |
> +---+-------------------------+
> | 1 | Ethernet                |
> +---+-------------------------+
> | 2 | VOID                    |
> +---+-------------------------+
> | 3 | IPv4                    |
> +---+-------------------------+
> | 4 | TCP                     |
> +---+-------------------------+
> | 5 | VOID                    |
> +---+-------------------------+
> | 6 | VOID                    |
> +---+-------------------------+
> 
> The above example shows how meta items do not affect packet data matching
> items, as long as those remain stacked properly. The resulting matching
> pattern is identical to "TCPv4 as L4".
> 
> +----------------+
> | UDPv6 anywhere |
> +===+============+
> | 0 | IPv6       |
> +---+------------+
> | 1 | UDP        |
> +---+------------+
> 
> If supported by the PMD, omitting one or several protocol layers at the
> bottom of the stack as in the above example (missing an Ethernet
> specification) enables hardware to look anywhere in packets.
> 
> This is an alias for specifying `ANY`_ with ``min = 0`` and ``max = 0``
> properties as the first item.
> 
> It is unspecified whether the payload of supported encapsulations
> (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
> inner, outer or both packets.
> 
> +---------------------+
> | Invalid, missing L3 |
> +===+=================+
> | 0 | Ethernet        |
> +---+-----------------+
> | 1 | UDP             |
> +---+-----------------+
> 
> The above pattern is invalid due to a missing L3 specification between L2
> and L4. It is only allowed at the bottom and at the top of the stack.
> 
> Meta item types
> ~~~~~~~~~~~~~~~
> 
> These do not match packet data but affect how the pattern is processed, most
> of them do not need a specification structure. This particularity allows
> them to be specified anywhere without affecting other item types.
> 
> ``END``
> ^^^^^^^
> 
> End marker for item lists. Prevents further processing of items, thereby
> ending the pattern.
> 
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | END                |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> ``VOID``
> ^^^^^^^^
> 
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
> 
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | VOID               |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> One usage example for this type is generating rules that share a common
> prefix quickly without reallocating memory, only by updating item types:
> 
> +------------------------+
> | TCP, UDP or ICMP as L4 |
> +===+====================+
> | 0 | Ethernet           |
> +---+--------------------+
> | 1 | IPv4               |
> +---+------+------+------+
> | 2 | UDP  | VOID | VOID |
> +---+------+------+------+
> | 3 | VOID | TCP  | VOID |
> +---+------+------+------+
> | 4 | VOID | VOID | ICMP |
> +---+------+------+------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``INVERT``
> ^^^^^^^^^^
> 
> Inverted matching, i.e. process packets that do not match the pattern.
> 
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | INVERT             |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> Usage example in order to match non-TCPv4 packets only:
> 
> +--------------------+
> | Anything but TCPv4 |
> +===+================+
> | 0 | INVERT         |
> +---+----------------+
> | 1 | Ethernet       |
> +---+----------------+
> | 2 | IPv4           |
> +---+----------------+
> | 3 | TCP            |
> +---+----------------+
> 
> ``PF``
> ^^^^^^
> 
> Matches packets addressed to the physical function of the device.
> 
> If the underlying device function differs from the one that would normally
> receive the matched traffic, specifying this item prevents it from reaching
> that device unless the flow rule contains a `PF (action)`_. Packets are not
> duplicated between device instances by default.
> 
> - Likely to return an error or never match any traffic if applied to a VF
>   device.
> - Can be combined with any number of `VF`_ items to match both PF and VF
>   traffic.
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | PF                 |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> ``VF``
> ^^^^^^
> 
> Matches packets addressed to a virtual function ID of the device.
> 
> If the underlying device function differs from the one that would normally
> receive the matched traffic, specifying this item prevents it from reaching
> that device unless the flow rule contains a `VF (action)`_. Packets are not
> duplicated between device instances by default.
> 
> - Likely to return an error or never match any traffic if this causes a VF
>   device to match traffic addressed to a different VF.
> - Can be specified multiple times to match traffic addressed to several VFs.
> - Can be combined with a `PF`_ item to match both PF and VF traffic.
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
> 
> +-------------------------------------------------+
> | VF                                              |
> +==========+=========+============================+
> | ``spec`` | ``any`` | ignore the specified VF ID |
> |          +---------+----------------------------+
> |          | ``vf``  | destination VF ID          |
> +----------+---------+----------------------------+
> | ``mask`` | ignored                              |
> +----------+--------------------------------------+
> 
> ``PORT``
> ^^^^^^^^
> 
> Matches packets coming from the specified physical port of the underlying
> device.
> 
> The first PORT item overrides the physical port normally associated with the
> specified DPDK input port (port_id). This item can be provided several times
> to match additional physical ports.
> 
> Note that physical ports are not necessarily tied to DPDK input ports
> (port_id) when those are not under DPDK control. Possible values are
> specific to each device, they are not necessarily indexed from zero and may
> not be contiguous.
> 
> As a device property, the list of allowed values as well as the value
> associated with a port_id should be retrieved by other means.
> 
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
> 
> +--------------------------------------------+
> | PORT                                       |
> +==========+===========+=====================+
> | ``spec`` | ``index`` | physical port index |
> +----------+-----------+---------------------+
> | ``mask`` | ignored                         |
> +----------+---------------------------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Data matching item types
> ~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Most of these are basically protocol header definitions with associated
> bit-masks. They must be specified (stacked) from lowest to highest protocol
> layer.
> 
> The following list is not exhaustive as new protocols will be added in the
> future.
> 
> ``ANY``
> ^^^^^^^
> 
> Matches any protocol in place of the current layer, a single ANY may also
> stand for several protocol layers.
> 
> This is usually specified as the first pattern item when looking for a
> protocol anywhere in a packet.
> 
> - A maximum value of **0** requests matching any number of protocol layers
>   above or equal to the minimum value, a maximum value lower than the
>   minimum one is otherwise invalid.
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
> 
> +-----------------------------------------------------------------------+
> | ANY                                                                   |
> +==========+=========+==================================================+
> | ``spec`` | ``min`` | minimum number of layers covered                 |
> |          +---------+--------------------------------------------------+
> |          | ``max`` | maximum number of layers covered, 0 for infinity |
> +----------+---------+--------------------------------------------------+
> | ``mask`` | ignored                                                    |
> +----------+------------------------------------------------------------+
> 
> Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
> and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
> or IPv6) matched by the second ANY specification:
> 
> +----------------------------------+
> | TCP in VXLAN with wildcards      |
> +===+==============================+
> | 0 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 1 | ANY | ``spec`` | ``min`` | 2 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 2 |
> +---+-----+----------+---------+---+
> | 2 | VXLAN                        |
> +---+------------------------------+
> | 3 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 4 | ANY | ``spec`` | ``min`` | 1 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 1 |
> +---+-----+----------+---------+---+
> | 5 | TCP                          |
> +---+------------------------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``RAW``
> ^^^^^^^
> 
> Matches a byte string of a given length at a given offset.
> 
> Offset is either absolute (using the start of the packet) or relative to the
> end of the previous matched item in the stack, in which case negative values
> are allowed.
> 
> If search is enabled, offset is used as the starting point. The search area
> can be delimited by setting limit to a nonzero value, which is the maximum
> number of bytes after offset where the pattern may start.
> 
> Matching a zero-length pattern is allowed, doing so resets the relative
> offset for subsequent items.
> 
> - ``mask`` only affects the pattern field.
> 
> +---------------------------------------------------------------------------+
> | RAW                                                                       |
> +==========+==============+=================================================+
> | ``spec`` | ``relative`` | look for pattern after the previous item        |
> |          +--------------+-------------------------------------------------+
> |          | ``search``   | search pattern from offset (see also ``limit``) |
> |          +--------------+-------------------------------------------------+
> |          | ``reserved`` | reserved, must be set to zero                   |
> |          +--------------+-------------------------------------------------+
> |          | ``offset``   | absolute or relative offset for ``pattern``     |
> |          +--------------+-------------------------------------------------+
> |          | ``limit``    | search area limit for start of ``pattern``      |
> |          +--------------+-------------------------------------------------+
> |          | ``length``   | ``pattern`` length                              |
> |          +--------------+-------------------------------------------------+
> |          | ``pattern``  | byte string to look for                         |
> +----------+--------------+-------------------------------------------------+
> | ``mask`` | ``relative`` | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``search``   | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``reserved`` | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``offset``   | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``limit``    | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``length``   | ignored                                         |
> |          +--------------+-------------------------------------------------+
> |          | ``pattern``  | bit-mask of the same byte length as ``pattern`` |
> +----------+--------------+-------------------------------------------------+
> 
> Example pattern looking for several strings at various offsets of a UDP
> payload, using combined RAW items:
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> +-------------------------------------------+
> | UDP payload matching                      |
> +===+=======================================+
> | 0 | Ethernet                              |
> +---+---------------------------------------+
> | 1 | IPv4                                  |
> +---+---------------------------------------+
> | 2 | UDP                                   |
> +---+-----+----------+--------------+-------+
> | 3 | RAW | ``spec`` | ``relative`` | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``search``   | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``offset``   | 10    |
> |   |     |          +--------------+-------+
> |   |     |          | ``limit``    | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``length``   | 3     |
> |   |     |          +--------------+-------+
> |   |     |          | ``pattern``  | "foo" |
> +---+-----+----------+--------------+-------+
> | 4 | RAW | ``spec`` | ``relative`` | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``search``   | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``offset``   | 20    |
> |   |     |          +--------------+-------+
> |   |     |          | ``limit``    | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``length``   | 3     |
> |   |     |          +--------------+-------+
> |   |     |          | ``pattern``  | "bar" |
> +---+-----+----------+--------------+-------+
> | 5 | RAW | ``spec`` | ``relative`` | 1     |
> |   |     |          +--------------+-------+
> |   |     |          | ``search``   | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``offset``   | -29   |
> |   |     |          +--------------+-------+
> |   |     |          | ``limit``    | 0     |
> |   |     |          +--------------+-------+
> |   |     |          | ``length``   | 3     |
> |   |     |          +--------------+-------+
> |   |     |          | ``pattern``  | "baz" |
> +---+-----+----------+--------------+-------+
> 
> This translates to:
> 
> - Locate "foo" at least 10 bytes deep inside UDP payload.
> - Locate "bar" after "foo" plus 20 bytes.
> - Locate "baz" after "bar" minus 29 bytes.
> 
> Such a packet may be represented as follows (not to scale)::
> 
>  0                     >= 10 B           == 20 B
>  |                  |<--------->|     |<--------->|
>  |                  |           |     |           |
>  |-----|------|-----|-----|-----|-----|-----------|-----|------|
>  | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
>  |-----|------|-----|-----|-----|-----|-----------|-----|------|
>                           |                             |
>                           |<--------------------------->|
>                                       == 29 B
> 
> Note that matching subsequent pattern items would resume after "baz", not
> "bar" since matching is always performed after the previous item of the
> stack.
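> 
> A sketch of item 3 above (the ``struct rte_flow_item_raw`` name and a
> pointer-typed ``pattern`` field are assumptions, other fields as listed
> in the table)::
> 
>  struct rte_flow_item_raw foo_spec = {
>      .relative = 1, /* offset counted from the end of the previous item */
>      .search = 1,   /* scan for the pattern instead of an exact offset */
>      .offset = 10,
>      .limit = 0,    /* no limit on the search area */
>      .length = 3,
>      .pattern = (const uint8_t *)"foo",
>  };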
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``ETH``
> ^^^^^^^
> 
> Matches an Ethernet header.
> 
> - ``dst``: destination MAC.
> - ``src``: source MAC.
> - ``type``: EtherType.
> - ``tags``: number of 802.1Q/ad tags defined.
> - ``tag[]``: 802.1Q/ad tag definitions, outermost first. For each one:
> 
>  - ``tpid``: Tag protocol identifier.
>  - ``tci``: Tag control information.
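> 
> For illustration only, a spec/mask pair matching a single destination MAC
> might be initialized as follows (the ``struct rte_flow_item_eth`` name and
> an ``ether_addr`` type for ``dst`` are assumptions)::
> 
>  struct rte_flow_item_eth eth_spec = {
>      .dst.addr_bytes = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 },
>  };
>  struct rte_flow_item_eth eth_mask = {
>      .dst.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
>  };
>  struct rte_flow_item item = {
>      .type = RTE_FLOW_ITEM_TYPE_ETH,
>      .spec = &eth_spec,
>      .mask = &eth_mask, /* match dst exactly, ignore all other fields */
>  };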
> 
> ``IPV4``
> ^^^^^^^^
> 
> Matches an IPv4 header.
> 
> Note: IPv4 options are handled by dedicated pattern items.
> 
> - ``hdr``: IPv4 header definition (``rte_ip.h``).
> 
> ``IPV6``
> ^^^^^^^^
> 
> Matches an IPv6 header.
> 
> Note: IPv6 options are handled by dedicated pattern items.
> 
> - ``hdr``: IPv6 header definition (``rte_ip.h``).
> 
> ``ICMP``
> ^^^^^^^^
> 
> Matches an ICMP header.
> 
> - ``hdr``: ICMP header definition (``rte_icmp.h``).
> 
> ``UDP``
> ^^^^^^^
> 
> Matches a UDP header.
> 
> - ``hdr``: UDP header definition (``rte_udp.h``).
> 
> ``TCP``
> ^^^^^^^
> 
> Matches a TCP header.
> 
> - ``hdr``: TCP header definition (``rte_tcp.h``).
> 
> ``SCTP``
> ^^^^^^^^
> 
> Matches an SCTP header.
> 
> - ``hdr``: SCTP header definition (``rte_sctp.h``).
> 
> ``VXLAN``
> ^^^^^^^^^
> 
> Matches a VXLAN header (RFC 7348).
> 
> - ``flags``: normally 0x08 (I flag).
> - ``rsvd0``: reserved, normally 0x000000.
> - ``vni``: VXLAN network identifier.
> - ``rsvd1``: reserved, normally 0x00.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Actions
> ~~~~~~~
> 
> Each possible action is represented by a type. Some have associated
> configuration structures. Several actions combined in a list can be
> assigned to a flow rule. That list is not ordered.
> 
> At least one action must be defined in a flow rule in order to do
> something with matched packets.
> 
> - Actions are defined with ``struct rte_flow_action``.
> - A list of actions is defined with ``struct rte_flow_actions``.
> 
> They fall into three categories:
> 
> - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
>   processing matched packets by subsequent flow rules, unless overridden
>   with PASSTHRU.
> 
> - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for
>   additional processing by subsequent flow rules.
> 
> - Other non terminating meta actions that do not affect the fate of packets
>   (END, VOID, MARK, FLAG, COUNT).
> 
> When several actions are combined in a flow rule, they should all have
> different types (e.g. dropping a packet twice is not possible). The defined
> behavior is for PMDs to only take into account the last action of a given
> type found in the list. PMDs still perform error checking on the entire
> list.
> 
> *Note that PASSTHRU is the only action having the ability to override a
> terminating rule.*
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Example of an action that redirects packets to queue index 10:
> 
> +----------------+
> | QUEUE          |
> +===========+====+
> | ``queue`` | 10 |
> +-----------+----+
> 
> Example action lists; their order is not significant, applications must
> consider all actions to be performed simultaneously:
> 
> +----------------+
> | Count and drop |
> +=======+========+
> | COUNT |        |
> +-------+--------+
> | DROP  |        |
> +-------+--------+
> 
> +--------------------------+
> | Tag, count and redirect  |
> +=======+===========+======+
> | MARK  | ``mark``  | 0x2a |
> +-------+-----------+------+
> | COUNT |                  |
> +-------+-----------+------+
> | QUEUE | ``queue`` | 10   |
> +-------+-----------+------+
> 
> +-----------------------+
> | Redirect to queue 5   |
> +=======+===============+
> | DROP  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
> 
> In the above example, considering both actions are performed simultaneously,
> the end result is that only QUEUE has any effect.
> 
> +-----------------------+
> | Redirect to queue 3   |
> +=======+===========+===+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
> | VOID  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 3 |
> +-------+-----------+---+
> 
> As previously described, only the last action of a given type found in the
> list is taken into account. The above example also shows that VOID is
> ignored.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Action types
> ~~~~~~~~~~~~
> 
> Common action types are described in this section. Like pattern item types,
> this list is not exhaustive as new actions will be added in the future.
> 
> ``END`` (action)
> ^^^^^^^^^^^^^^^^
> 
> End marker for action lists. Prevents further processing of actions, thereby
> ending the list.
> 
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - No configurable property.
> 
> +---------------+
> | END           |
> +===============+
> | no properties |
> +---------------+
> 
> ``VOID`` (action)
> ^^^^^^^^^^^^^^^^^
> 
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
> 
> - PMD support is mandatory.
> - No configurable property.
> 
> +---------------+
> | VOID          |
> +===============+
> | no properties |
> +---------------+
> 
> ``PASSTHRU``
> ^^^^^^^^^^^^
> 
> Leaves packets up for additional processing by subsequent flow rules. This
> is the default when a rule does not contain a terminating action, but can be
> specified to force a rule to become non-terminating.
> 
> - No configurable property.
> 
> +---------------+
> | PASSTHRU      |
> +===============+
> | no properties |
> +---------------+
> 
> Example to copy a packet to a queue and continue processing by subsequent
> flow rules:
> 
> +--------------------------+
> | Copy to queue 8          |
> +==========+===============+
> | PASSTHRU |               |
> +----------+-----------+---+
> | QUEUE    | ``queue`` | 8 |
> +----------+-----------+---+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``MARK``
> ^^^^^^^^
> 
> Attaches a 32 bit value to packets.
> 
> This value is arbitrary and application-defined. For compatibility with FDIR
> it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
> also set in ``ol_flags``.
> 
> +------------------------------------------------+
> | MARK                                           |
> +==========+=====================================+
> | ``mark`` | 32 bit value to return with packets |
> +----------+-------------------------------------+
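> 
> On the receive side, a sketch of how an application could retrieve the
> mark through the standard mbuf fields described above (``handle_mark()``
> is a hypothetical application hook)::
> 
>  /* RX path: check whether this packet was tagged by a MARK action. */
>  if (mbuf->ol_flags & PKT_RX_FDIR_ID)
>      handle_mark(mbuf->hash.fdir.hi);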
> 
> ``FLAG``
> ^^^^^^^^
> 
> Flag packets. Similar to `MARK`_ but only affects ``ol_flags``.
> 
> Note: a distinctive flag must be defined for it.
> 
> +---------------+
> | FLAG          |
> +===============+
> | no properties |
> +---------------+
> 
> ``QUEUE``
> ^^^^^^^^^
> 
> Assigns packets to a given queue index.
> 
> - Terminating by default.
> 
> +--------------------------------+
> | QUEUE                          |
> +===========+====================+
> | ``queue`` | queue index to use |
> +-----------+--------------------+
> 
> ``DROP``
> ^^^^^^^^
> 
> Drop packets.
> 
> - No configurable property.
> - Terminating by default.
> - PASSTHRU overrides this action if both are specified.
> 
> +---------------+
> | DROP          |
> +===============+
> | no properties |
> +---------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``COUNT``
> ^^^^^^^^^
> 
> Enables counters for this rule.
> 
> These counters can be retrieved and reset through ``rte_flow_query()``, see
> ``struct rte_flow_query_count``.
> 
> - Counters can be retrieved with ``rte_flow_query()``.
> - No configurable property.
> 
> +---------------+
> | COUNT         |
> +===============+
> | no properties |
> +---------------+
> 
> Query structure to retrieve and reset flow rule counters:
> 
> +---------------------------------------------------------+
> | COUNT query                                             |
> +===============+=====+===================================+
> | ``reset``     | in  | reset counter after query         |
> +---------------+-----+-----------------------------------+
> | ``hits_set``  | out | ``hits`` field is set             |
> +---------------+-----+-----------------------------------+
> | ``bytes_set`` | out | ``bytes`` field is set            |
> +---------------+-----+-----------------------------------+
> | ``hits``      | out | number of hits for this rule      |
> +---------------+-----+-----------------------------------+
> | ``bytes``     | out | number of bytes through this rule |
> +---------------+-----+-----------------------------------+
> 
> ``DUP``
> ^^^^^^^
> 
> Duplicates packets to a given queue index.
> 
> This is normally combined with QUEUE, however when used alone, it is
> actually similar to QUEUE + PASSTHRU.
> 
> - Non-terminating by default.
> 
> +------------------------------------------------+
> | DUP                                            |
> +===========+====================================+
> | ``queue`` | queue index to duplicate packet to |
> +-----------+------------------------------------+
> 
> ``RSS``
> ^^^^^^^
> 
> Similar to QUEUE, except RSS is additionally performed on packets to spread
> them among several queues according to the provided parameters.
> 
> Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
> however it conflicts with the `MARK`_ action as they share the same
> space. When both actions are specified, the RSS hash is discarded and
> ``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
> structure should eventually evolve to store both.
> 
> - Terminating by default.
> 
> +---------------------------------------------+
> | RSS                                         |
> +==============+==============================+
> | ``rss_conf`` | RSS parameters               |
> +--------------+------------------------------+
> | ``queues``   | number of entries in queue[] |
> +--------------+------------------------------+
> | ``queue[]``  | queue indices to use         |
> +--------------+------------------------------+
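> 
> A minimal configuration sketch, assuming the structure is named
> ``struct rte_flow_action_rss`` and that ``queue[]`` is a flexible array
> member (both assumptions)::
> 
>  struct rte_flow_action_rss *rss;
>  uint16_t i;
> 
>  /* queue[] assumed flexible, hence the dynamic allocation. */
>  rss = calloc(1, sizeof(*rss) + 4 * sizeof(rss->queue[0]));
>  if (rss != NULL) {
>      rss->rss_conf = NULL; /* assumption: NULL requests defaults */
>      rss->queues = 4;
>      for (i = 0; i < 4; i++)
>          rss->queue[i] = i; /* spread among RX queues 0-3 */
>  }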
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``PF`` (action)
> ^^^^^^^^^^^^^^^
> 
> Redirects packets to the physical function (PF) of the current device.
> 
> - No configurable property.
> - Terminating by default.
> 
> +---------------+
> | PF            |
> +===============+
> | no properties |
> +---------------+
> 
> ``VF`` (action)
> ^^^^^^^^^^^^^^^
> 
> Redirects packets to a virtual function (VF) of the current device.
> 
> Packets matched by a VF pattern item can be redirected to their original VF
> ID instead of the specified one. This parameter may not be available and is
> not guaranteed to work properly if the VF part is matched by a prior flow
> rule or if packets are not addressed to a VF in the first place.
> 
> - Terminating by default.
> 
> +-----------------------------------------------+
> | VF                                            |
> +==============+================================+
> | ``original`` | use original VF ID if possible |
> +--------------+--------------------------------+
> | ``vf``       | VF ID to redirect packets to   |
> +--------------+--------------------------------+
> 
> Negative types
> ~~~~~~~~~~~~~~
> 
> All specified pattern items (``enum rte_flow_item_type``) and actions
> (``enum rte_flow_action_type``) use positive identifiers.
> 
> The negative space is reserved for dynamic types generated by PMDs during
> run-time, PMDs may encounter them as a result but do not have to accept the
> negative types they did not generate.
> 
> The method to generate them has not been specified yet.
> 
> Planned types
> ~~~~~~~~~~~~~
> 
> Pattern item types will be added as new protocols are implemented.
> 
> Variable headers will be supported through dedicated pattern items, for
> example in order to match specific IPv4 options and IPv6 extension headers;
> these would be stacked behind IPv4/IPv6 items.
> 
> Other action types are planned but not defined yet. These actions will add
> the ability to alter matched packets in several ways, such as performing
> encapsulation/decapsulation of tunnel headers on specific flows.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Rules management
> ----------------
> 
> A simple API with few functions is provided to fully manage flows.
> 
> Each created flow rule is associated with an opaque, PMD-specific handle
> pointer. The application is responsible for keeping it until the rule is
> destroyed.
> 
> Flow rules are represented by ``struct rte_flow`` objects.
> 
> Validation
> ~~~~~~~~~~
> 
> Given that expressing a definite set of device capabilities with this API is
> not practical, a dedicated function is provided to check if a flow rule is
> supported and can be created.
> 
> ::
> 
>  int
>  rte_flow_validate(uint8_t port_id,
>                    const struct rte_flow_attr *attr,
>                    const struct rte_flow_pattern *pattern,
>                    const struct rte_flow_actions *actions,
>                    struct rte_flow_error *error);
> 
> While this function has no effect on the target device, the flow rule is
> validated against its current configuration state and the returned value
> should be considered valid by the caller for that state only.
> 
> The returned value is guaranteed to remain valid only as long as no
> successful calls to rte_flow_create() or rte_flow_destroy() are made in the
> meantime and no device parameter affecting flow rules in any way is
> modified, due to possible collisions or resource limitations (although in
> such cases ``EINVAL`` should not be returned).
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``attr``: flow rule attributes.
> - ``pattern``: pattern specification.
> - ``actions``: actions associated with the flow definition.
> - ``error``: perform verbose error reporting if not NULL.
> 
> Return value:
> 
> - **0** if flow rule is valid and can be created. A negative errno value
>   otherwise (``rte_errno`` is also set), the following errors are defined.
> - ``-ENOSYS``: underlying device does not support this functionality.
> - ``-EINVAL``: unknown or invalid rule specification.
> - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
>   bit-masks are unsupported).
> - ``-EEXIST``: collision with an existing rule.
> - ``-ENOMEM``: not enough resources.
> - ``-EBUSY``: action cannot be performed due to busy device resources, may
>   succeed if the affected queues or even the entire port are in a stopped
>   state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
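> 
> Usage sketch, with ``attr``, ``pattern`` and ``actions`` assumed to be
> already filled in by the application::
> 
>  struct rte_flow_error error;
>  int ret;
> 
>  ret = rte_flow_validate(port_id, &attr, &pattern, &actions, &error);
>  if (ret == -ENOTSUP)
>      printf("rule is valid but unsupported: %s\n",
>             error.message ? error.message : "(no message)");
>  else if (ret != 0)
>      printf("rule cannot be created (%d): %s\n", -ret,
>             error.message ? error.message : "(no message)");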
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Creation
> ~~~~~~~~
> 
> Creating a flow rule is similar to validating one, except the rule is
> actually created and a handle returned.
> 
> ::
> 
>  struct rte_flow *
>  rte_flow_create(uint8_t port_id,
>                  const struct rte_flow_attr *attr,
>                  const struct rte_flow_pattern *pattern,
>                  const struct rte_flow_actions *actions,
>                  struct rte_flow_error *error);
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``attr``: flow rule attributes.
> - ``pattern``: pattern specification.
> - ``actions``: actions associated with the flow definition.
> - ``error``: perform verbose error reporting if not NULL.
> 
> Return value:
> 
> A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
> to the positive version of one of the error codes defined for
> ``rte_flow_validate()``.
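> 
> Usage sketch following a successful validation (same ``attr``,
> ``pattern`` and ``actions`` as above)::
> 
>  struct rte_flow *flow;
>  struct rte_flow_error error;
> 
>  flow = rte_flow_create(port_id, &attr, &pattern, &actions, &error);
>  if (flow == NULL)
>      printf("flow creation failed (%d): %s\n", rte_errno,
>             error.message ? error.message : "(no message)");
>  /* Otherwise keep the handle, it is needed to destroy the rule later. */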
> 
> Destruction
> ~~~~~~~~~~~
> 
> Flow rule destruction is not automatic, and a queue or a port should not be
> released if any are still attached to them. Applications must take care of
> performing this step before releasing resources.
> 
> ::
> 
>  int
>  rte_flow_destroy(uint8_t port_id,
>                   struct rte_flow *flow,
>                   struct rte_flow_error *error);
> 
> 
> Failure to destroy a flow rule handle may occur when other flow rules depend
> on it, and destroying it would result in an inconsistent state.
> 
> This function is only guaranteed to succeed if handles are destroyed in
> reverse order of their creation.
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``flow``: flow rule handle to destroy.
> - ``error``: perform verbose error reporting if not NULL.
> 
> Return value:
> 
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>   set.
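> 
> Destruction sketch, assuming the application keeps its ``n`` created
> handles in a ``flows[]`` array, newest last::
> 
>  struct rte_flow_error error;
> 
>  /* Reverse order of creation is the only order guaranteed to work. */
>  while (n--) {
>      if (rte_flow_destroy(port_id, flows[n], &error) != 0)
>          printf("failed to destroy rule %u: %s\n", n,
>                 error.message ? error.message : "(no message)");
>  }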
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Flush
> ~~~~~
> 
> Convenience function to destroy all flow rule handles associated with a
> port. They are released as with successive calls to ``rte_flow_destroy()``.
> 
> ::
> 
>  int
>  rte_flow_flush(uint8_t port_id,
>                 struct rte_flow_error *error);
> 
> In the unlikely event of failure, handles are still considered destroyed and
> no longer valid but the port must be assumed to be in an inconsistent state.
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``error``: perform verbose error reporting if not NULL.
> 
> Return value:
> 
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>   set.
> 
> Query
> ~~~~~
> 
> Query an existing flow rule.
> 
> This function allows retrieving flow-specific data such as counters. Data
> is gathered by special actions which must be present in the flow rule
> definition.
> 
> ::
> 
>  int
>  rte_flow_query(uint8_t port_id,
>                 struct rte_flow *flow,
>                 enum rte_flow_action_type action,
>                 void *data,
>                 struct rte_flow_error *error);
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``flow``: flow rule handle to query.
> - ``action``: action type to query.
> - ``data``: pointer to storage for the associated query data type.
> - ``error``: perform verbose error reporting if not NULL.
> 
> Return value:
> 
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>   set.
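> 
> Query sketch retrieving and resetting the counters of a rule containing a
> `COUNT`_ action (the ``RTE_FLOW_ACTION_TYPE_COUNT`` name and 64-bit
> counter fields are assumptions)::
> 
>  struct rte_flow_query_count count = { .reset = 1 };
>  struct rte_flow_error error;
> 
>  if (rte_flow_query(port_id, flow, RTE_FLOW_ACTION_TYPE_COUNT,
>                     &count, &error) == 0) {
>      if (count.hits_set)
>          printf("hits: %" PRIu64 "\n", count.hits);
>      if (count.bytes_set)
>          printf("bytes: %" PRIu64 "\n", count.bytes);
>  }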
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Verbose error reporting
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> The defined *errno* values may not be accurate enough for users or
> application developers who want to investigate issues related to flow rules
> management. A dedicated error object is defined for this purpose::
> 
>  enum rte_flow_error_type {
>      RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
>      RTE_FLOW_ERROR_TYPE_UNDEFINED, /**< Cause is undefined. */
>      RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
>      RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
>      RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
>      RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
>      RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
>      RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure itself. */
>      RTE_FLOW_ERROR_TYPE_PATTERN_MAX, /**< Pattern length (max field). */
>      RTE_FLOW_ERROR_TYPE_PATTERN_ITEM, /**< Specific pattern item. */
>      RTE_FLOW_ERROR_TYPE_PATTERN, /**< Pattern structure itself. */
>      RTE_FLOW_ERROR_TYPE_ACTION_MAX, /**< Number of actions (max field). */
>      RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
>      RTE_FLOW_ERROR_TYPE_ACTIONS, /**< Actions structure itself. */
>  };
> 
>  struct rte_flow_error {
>      enum rte_flow_error_type type; /**< Cause field and error types. */
>      void *cause; /**< Object responsible for the error. */
>      const char *message; /**< Human-readable error message. */
>  };
> 
> Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
> the remaining fields can be ignored. Other error types describe the object
> type pointed to by ``cause``.
> 
> If non-NULL, ``cause`` points to the object responsible for the error. For a
> flow rule, this may be a pattern item or an individual action.
> 
> If non-NULL, ``message`` provides a human-readable error message.
> 
> This object is normally allocated by applications and set by PMDs. The
> message points to a constant string which does not need to be freed by the
> application; however, its pointer can be considered valid only as long as
> its associated DPDK port remains configured. Closing the underlying device
> or unloading the PMD invalidates it.
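> 
> A small helper built only on the fields defined above (sketch)::
> 
>  static void
>  report_flow_error(const struct rte_flow_error *error)
>  {
>      if (error->type == RTE_FLOW_ERROR_TYPE_NONE)
>          return;
>      printf("flow error type %d, cause %p: %s\n",
>             (int)error->type, error->cause,
>             error->message ? error->message : "(no message)");
>  }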
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> PMD interface
> ~~~~~~~~~~~~~
> 
> This specification focuses on the public-facing interface, which must be
> fully defined from the start to avoid a re-design later as it is subject to
> API and ABI versioning constraints.
> 
> No such issue exists with the internal interface for use by poll-mode
> drivers which can evolve independently, hence this section only outlines how
> requests are processed by PMDs.
> 
> Public functions are mapped more or less directly to PMD operation
> callbacks, thus:
> 
> - Public API functions do not process flow rule definitions at all before
>   calling PMD callbacks (no basic error checking, no validation
>   whatsoever). They only make sure these callbacks are non-NULL or return
>   the ``ENOSYS`` (function not supported) error.
> 
> - DPDK does not keep track of flow rule definitions or flow rule objects
>   automatically. Applications may keep track of the former and must keep
>   track of the latter. PMDs may also do it for internal needs, however this
>   cannot be relied on by applications.
> 
> The private interface will provide helper functions to perform common tasks
> such as parsing, validating and keeping track of flow rule specifications to
> avoid redundant code in PMDs and ease implementation.
> 
> Its contents are currently largely undefined since at least one PMD
> implementation is necessary first. PMD maintainers are encouraged to share
> as much generic code as possible.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Caveats
> -------
> 
> - Flow rules are not maintained between successive port initializations. An
>   application exiting without releasing them and restarting must re-create
>   them from scratch.
> 
> - API operations are synchronous and blocking (``EAGAIN`` cannot be
>   returned).
> 
> - There is no provision for reentrancy/multi-thread safety, although nothing
>   should prevent different devices from being configured at the same
>   time. PMDs may protect their control path functions accordingly.
> 
> - Stopping the data path (TX/RX) should not be necessary when managing flow
>   rules. If this cannot be achieved naturally or with workarounds (such as
>   temporarily replacing the burst function pointers), an appropriate error
>   code must be returned (``EBUSY``).
> 
> - PMDs, not applications, are responsible for maintaining flow rules
>   configuration when stopping and restarting a port or performing other
>   actions which may affect them. They can only be destroyed explicitly.
> 
> For devices exposing multiple ports sharing global settings affected by flow
> rules:
> 
> - All ports under DPDK control must behave consistently, PMDs are
>   responsible for making sure that existing flow rules on a port are not
>   affected by other ports.
> 
> - Ports not under DPDK control (unaffected or handled by other applications)
>   are user's responsibility. They may affect existing flow rules and cause
>   undefined behavior. PMDs aware of this may prevent flow rules creation
>   altogether in such cases.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Compatibility
> -------------
> 
> No known hardware implementation supports all the features described in this
> document.
> 
> Unsupported features or combinations are not expected to be fully emulated
> in software by PMDs for performance reasons. Partially supported features
> may be completed in software as long as hardware performs most of the work
> (such as queue redirection and packet recognition).
> 
> However PMDs are expected to do their best to satisfy application requests
> by working around hardware limitations as long as doing so does not affect
> the behavior of existing flow rules.
> 
> The following sections provide a few examples of such cases, they are based
> on limitations built into the previous APIs.
> 
> Global bit-masks
> ~~~~~~~~~~~~~~~~
> 
> Each flow rule comes with its own, per-layer bit-masks, while hardware may
> support only a single, device-wide bit-mask for a given layer type, so that
> two IPv4 rules cannot use different bit-masks.
> 
> The expected behavior in this case is that PMDs automatically configure
> global bit-masks according to the needs of the first created flow rule.
> 
> Subsequent rules are allowed only if their bit-masks match those, the
> ``EEXIST`` error code should be returned otherwise.
> 
> Unsupported layer types
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> Many protocols can be simulated by crafting patterns with the `RAW`_ type.
> 
> PMDs can rely on this capability to simulate support for protocols with
> fixed headers not directly recognized by hardware.
> 
> ``ANY`` pattern item
> ~~~~~~~~~~~~~~~~~~~~
> 
> This pattern item stands for anything, which can be difficult to translate
> to something hardware would understand, particularly if followed by more
> specific types.
> 
> Consider the following pattern:
> 
> +---+--------------------------------+
> | 0 | ETHER                          |
> +---+--------------------------------+
> | 1 | ANY (``min`` = 1, ``max`` = 1) |
> +---+--------------------------------+
> | 2 | TCP                            |
> +---+--------------------------------+
> 
> Knowing that TCP does not make sense with something other than IPv4 and IPv6
> as L3, such a pattern may be translated to two flow rules instead:
> 
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV4 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
> 
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV6 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
> 
> Note that as soon as an ANY rule covers several layers, this approach may
> yield a large number of hidden flow rules. It is thus suggested to only
> support the most common scenarios (anything as L2 and/or L3).
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Unsupported actions
> ~~~~~~~~~~~~~~~~~~~
> 
> - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
>   tagging (`MARK`_ or `FLAG`_) may be implemented in software as long as the
>   target queue is used by a single rule.
> 
> - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
>   rules combining `QUEUE`_ and `PASSTHRU`_.
> 
> - When a single target queue is provided, `RSS`_ can also be implemented
>   through `QUEUE`_.
> 
> Flow rules priority
> ~~~~~~~~~~~~~~~~~~~
> 
> While it would naturally make sense, flow rules cannot be assumed to be
> processed by hardware in the same order as their creation for several
> reasons:
> 
> - They may be managed internally as a tree or a hash table instead of a
>   list.
> - Removing a flow rule before adding another one can either put the new rule
>   at the end of the list or reuse a freed entry.
> - Duplication may occur when packets are matched by several rules.
> 
> For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> predictable behavior is only guaranteed by using different priority levels.
> 
> Priority levels are not necessarily implemented in hardware, or may be
> severely limited (e.g. a single priority bit).
> 
> For these reasons, priority levels may be implemented purely in software by
> PMDs.
> 
> - For devices expecting flow rules to be added in the correct order, PMDs
>   may destroy and re-create existing rules after adding a new one with
>   a higher priority.
> 
> - A configurable number of dummy or empty rules can be created at
>   initialization time to save high priority slots for later.
> 
> - In order to save priority levels, PMDs may evaluate whether rules are
>   likely to collide and adjust their priority accordingly.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> API migration
> =============
> 
> Exhaustive list of deprecated filter types and how to convert them to
> generic flow rules.
> 
> ``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
> ---------------------------------------
> 
> `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
> (action)`_ or `PF (action)`_ terminating action.
> 
> +------------------------------------+
> | MACVLAN                            |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | VF,     |
> |   |     +----------+-----+ PF      |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
> 
> ``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
> ----------------------------------------------
> 
> `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
> a terminating action.
> 
> +------------------------------------+
> | ETHERTYPE                          |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | QUEUE,  |
> |   |     +----------+-----+ DROP    |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
> 
> ``FLEXIBLE`` to ``RAW`` → ``QUEUE``
> -----------------------------------
> 
> `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
> terminating action and a defined priority level.
> 
> +------------------------------------+
> | FLEXIBLE                           |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | RAW | ``spec`` | any | QUEUE   |
> |   |     +----------+-----+         |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
> 
> ``SYN`` to ``TCP`` → ``QUEUE``
> ------------------------------
> 
> `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
> `QUEUE`_ as the terminating action.
> 
> Priority level can be set to simulate the high priority bit.
> 
> +---------------------------------------------+
> | SYN                                         |
> +-----------------------------------+---------+
> | Pattern                           | Actions |
> +===+======+==========+=============+=========+
> | 0 | ETH  | ``spec`` | empty       | QUEUE   |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 1 | IPV4 | ``spec`` | empty       |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | ``syn`` = 1 |         |
> +---+------+----------+-------------+---------+
> 
> ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
> ----------------------------------------------------
> 
> `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
> `UDP`_ as L4 and `QUEUE`_ as the terminating action.
> 
> A priority level can be specified as well.
> 
> +---------------------------------------+
> | NTUPLE                                |
> +-----------------------------+---------+
> | Pattern                     | Actions |
> +===+======+==========+=======+=========+
> | 0 | ETH  | ``spec`` | empty | QUEUE   |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | empty |         |
> +---+------+----------+-------+         |
> | 1 | IPV4 | ``spec`` | any   |         |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+         |
> | 2 | TCP, | ``spec`` | any   |         |
> |   | UDP  +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+---------+
> 
> ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
> ---------------------------------------------------------------------------
> 
> `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.
> 
> In the following table, `ANY`_ is used to cover the optional L4.
> 
> +------------------------------------------------+
> | TUNNEL                                         |
> +--------------------------------------+---------+
> | Pattern                              | Actions |
> +===+=========+==========+=============+=========+
> | 0 | ETH     | ``spec`` | any         | QUEUE   |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 1 | IPV4,   | ``spec`` | any         |         |
> |   | IPV6    +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 2 | ANY     | ``spec`` | ``min`` = 0 |         |
> |   |         |          +-------------+         |
> |   |         |          | ``max`` = 0 |         |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | N/A         |         |
> +---+---------+----------+-------------+         |
> | 3 | VXLAN,  | ``spec`` | any         |         |
> |   | GENEVE, +----------+-------------+         |
> |   | TEREDO, | ``mask`` | any         |         |
> |   | NVGRE,  |          |             |         |
> |   | GRE,    |          |             |         |
> |   | ...     |          |             |         |
> +---+---------+----------+-------------+---------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
> ---------------------------------------------------------------
> 
> `FDIR`_ is more complex than any other type; there are several methods to
> emulate its functionality. It is summarized for the most part in the table
> below.
> 
> A few features are intentionally not supported:
> 
> - The ability to configure the matching input set and masks for the entire
>   device, PMDs should take care of it automatically according to the
>   requested flow rules.
> 
>   For example if a device supports only one bit-mask per protocol type,
>   source/destination IPv4 address bit-masks can be made immutable by the
>   first created rule. Subsequent IPv4 or TCPv4 rules can only be created if
>   they are compatible.
> 
>   Note that only protocol bit-masks affected by existing flow rules are
>   immutable, others can be changed later. They become mutable again after
>   the related flow rules are destroyed.
> 
> - Returning four or eight bytes of matched data when using flex bytes
>   filtering. Although a specific action could implement it, it conflicts
>   with the much more useful 32 bits tagging on devices that support it.
> 
> - Side effects on RSS processing of the entire device. Flow rules that
>   conflict with the current device configuration should not be
>   allowed. Similarly, device configuration should not be allowed when it
>   affects existing flow rules.
> 
> - Device modes of operation. "none" is unsupported since filtering cannot be
>   disabled as long as a flow rule is present.
> 
> - "MAC VLAN" or "tunnel" perfect matching modes should be automatically
> set
>   according to the created flow rules.
> 
> - Signature mode of operation is not defined but could be handled through a
>   specific item type if needed.
> 
> +----------------------------------------------+
> | FDIR                                         |
> +---------------------------------+------------+
> | Pattern                         | Actions    |
> +===+============+==========+=====+============+
> | 0 | ETH,       | ``spec`` | any | QUEUE,     |
> |   | RAW        +----------+-----+ DROP,      |
> |   |            | ``mask`` | any | PASSTHRU   |
> +---+------------+----------+-----+------------+
> | 1 | IPV4,      | ``spec`` | any | MARK       |
> |   | IPV6       +----------+-----+ (optional) |
> |   |            | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 2 | TCP,       | ``spec`` | any |            |
> |   | UDP,       +----------+-----+            |
> |   | SCTP       | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 3 | VF,        | ``spec`` | any |            |
> |   | PF         +----------+-----+            |
> |   | (optional) | ``mask`` | any |            |
> +---+------------+----------+-----+------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``HASH``
> ~~~~~~~~
> 
> There is no counterpart to this filter type because it translates to a
> global device setting instead of a pattern item. Device settings are
> automatically set according to the created flow rules.
> 
> ``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> All packets are matched. This type alters incoming packets to encapsulate
> them in a chosen tunnel type, optionally redirecting them to a VF as well.
> 
> The destination pool for tag based forwarding can be emulated with other
> flow rules using `DUP`_ as the action.
> 
> +----------------------------------------+
> | L2_TUNNEL                              |
> +---------------------------+------------+
> | Pattern                   | Actions    |
> +===+======+==========+=====+============+
> | 0 | VOID | ``spec`` | N/A | VXLAN,     |
> |   |      |          |     | GENEVE,    |
> |   |      |          |     | ...        |
> |   |      +----------+-----+------------+
> |   |      | ``mask`` | N/A | VF         |
> |   |      |          |     | (optional) |
> +---+------+----------+-----+------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Future evolutions
> =================
> 
> - Describing dedicated testpmd commands to control and validate this API.
> 
> - A method to optimize generic flow rules with specific pattern items and
>   action types generated on the fly by PMDs. DPDK will assign negative
>   numbers to these in order to not collide with the existing types. See
>   `Negative types`_.
> 
> - Adding specific egress pattern items and actions as described in `Traffic
>   direction`_.
> 
> - Optional software fallback when PMDs are unable to handle requested flow
>   rules so applications do not have to implement their own.
> 
> - Ranges in addition to bit-masks. Ranges are more generic in many ways as
>   they interpret values. For instance only ranges make sense to cover
>   several TCP or UDP ports. These will probably be defined on a pattern item
>   basis.
> 
> --------
> 
> Adrien Mazarguil (1):
>   ethdev: introduce generic flow API
> 
>  lib/librte_ether/Makefile   |   2 +
>  lib/librte_ether/rte_flow.h | 941 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 943 insertions(+)
>  create mode 100644 lib/librte_ether/rte_flow.h
> 
> --
> 2.1.4


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 1/7] vhost: simplify memory regions handling
  @ 2016-10-09  7:27  3%     ` Yuanhan Liu
  0 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2016-10-09  7:27 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Yuanhan Liu

Due to historical reasons (vhost-cuse came before vhost-user), some
fields for maintaining the vhost-user memory mappings (such as the mmapped
address and size, with which we can then unmap on destroy) are kept in
the "orig_region_map" struct, a structure that is defined only in the
vhost-user source file.

The right way to go is to remove that structure and move all those fields
into the virtio_memory_region struct. But we simply couldn't do that
before, because it would have broken the ABI.

Now, thanks to the ABI refactoring, it is no longer a blocking issue.
And here it goes: this patch removes orig_region_map and redefines
virtio_memory_region to include all necessary info.

With that, we can simplify the guest/host address conversion a bit.
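
For illustration, a typical caller (sketch only, with desc standing for
a vring descriptor) now simply reads:

    uint64_t vva = gpa_to_vva(dev, desc->addr);

    if (vva == 0)
        return -1; /* gpa not covered by any guest memory region */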

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  49 ++++++------
 lib/librte_vhost/vhost_user.c | 173 +++++++++++++++++-------------------------
 2 files changed, 91 insertions(+), 131 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c2dfc3c..df2107b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -143,12 +143,14 @@ struct virtio_net {
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
  */
-struct virtio_memory_regions {
-	uint64_t guest_phys_address;
-	uint64_t guest_phys_address_end;
-	uint64_t memory_size;
-	uint64_t userspace_address;
-	uint64_t address_offset;
+struct virtio_memory_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
 };
 
 
@@ -156,12 +158,8 @@ struct virtio_memory_regions {
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-	/* Base QEMU userspace address of the memory file. */
-	uint64_t base_address;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
 	uint32_t nregions;
-	struct virtio_memory_regions regions[0];
+	struct virtio_memory_region regions[0];
 };
 
 
@@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
 #define MAX_VHOST_DEVICE	1024
 extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
-/**
- * Function to convert guest physical addresses to vhost virtual addresses.
- * This is used to convert guest virtio buffer addresses.
- */
+/* Convert guest physical Address to host virtual address */
 static inline uint64_t __attribute__((always_inline))
-gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
+gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
 {
-	struct virtio_memory_regions *region;
-	uint32_t regionidx;
-	uint64_t vhost_va = 0;
-
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((guest_pa >= region->guest_phys_address) &&
-			(guest_pa <= region->guest_phys_address_end)) {
-			vhost_va = region->address_offset + guest_pa;
-			break;
+	struct virtio_memory_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 struct virtio_net_device_ops const *notify_ops;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index eee99e9..49585b8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 };
 
-struct orig_region_map {
-	int fd;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
-	uint64_t blksz;
-};
-
-#define orig_region(ptr, nregions) \
-	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
-		sizeof(struct virtio_memory) + \
-		sizeof(struct virtio_memory_regions) * (nregions)))
-
 static uint64_t
 get_blk_size(int fd)
 {
@@ -99,18 +87,17 @@ get_blk_size(int fd)
 static void
 free_mem_region(struct virtio_net *dev)
 {
-	struct orig_region_map *region;
-	unsigned int idx;
+	uint32_t i;
+	struct virtio_memory_region *reg;
 
 	if (!dev || !dev->mem)
 		return;
 
-	region = orig_region(dev->mem, dev->mem->nregions);
-	for (idx = 0; idx < dev->mem->nregions; idx++) {
-		if (region[idx].mapped_address) {
-			munmap((void *)(uintptr_t)region[idx].mapped_address,
-					region[idx].mapped_size);
-			close(region[idx].fd);
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr) {
+			munmap(reg->mmap_addr, reg->mmap_size);
+			close(reg->fd);
 		}
 	}
 }
@@ -120,7 +107,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
 {
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 	if (dev->log_addr) {
@@ -286,25 +273,23 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
  * used to convert the ring addresses to our address space.
  */
 static uint64_t
-qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
+qva_to_vva(struct virtio_net *dev, uint64_t qva)
 {
-	struct virtio_memory_regions *region;
-	uint64_t vhost_va = 0;
-	uint32_t regionidx = 0;
+	struct virtio_memory_region *reg;
+	uint32_t i;
 
 	/* Find the region where the address lives. */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((qemu_va >= region->userspace_address) &&
-			(qemu_va <= region->userspace_address +
-			region->memory_size)) {
-			vhost_va = qemu_va + region->guest_phys_address +
-				region->address_offset -
-				region->userspace_address;
-			break;
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+
+		if (qva >= reg->guest_user_addr &&
+		    qva <  reg->guest_user_addr + reg->size) {
+			return qva - reg->guest_user_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 /*
@@ -391,11 +376,13 @@ static int
 vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
 	struct VhostUserMemory memory = pmsg->payload.memory;
-	struct virtio_memory_regions *pregion;
-	uint64_t mapped_address, mapped_size;
-	unsigned int idx = 0;
-	struct orig_region_map *pregion_orig;
+	struct virtio_memory_region *reg;
+	void *mmap_addr;
+	uint64_t mmap_size;
+	uint64_t mmap_offset;
 	uint64_t alignment;
+	uint32_t i;
+	int fd;
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
@@ -405,14 +392,12 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 
-	dev->mem = calloc(1,
-		sizeof(struct virtio_memory) +
-		sizeof(struct virtio_memory_regions) * memory.nregions +
-		sizeof(struct orig_region_map) * memory.nregions);
+	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_region) * memory.nregions, 0);
 	if (dev->mem == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to allocate memory for dev->mem\n",
@@ -421,22 +406,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	}
 	dev->mem->nregions = memory.nregions;
 
-	pregion_orig = orig_region(dev->mem, memory.nregions);
-	for (idx = 0; idx < memory.nregions; idx++) {
-		pregion = &dev->mem->regions[idx];
-		pregion->guest_phys_address =
-			memory.regions[idx].guest_phys_addr;
-		pregion->guest_phys_address_end =
-			memory.regions[idx].guest_phys_addr +
-			memory.regions[idx].memory_size;
-		pregion->memory_size =
-			memory.regions[idx].memory_size;
-		pregion->userspace_address =
-			memory.regions[idx].userspace_addr;
-
-		/* This is ugly */
-		mapped_size = memory.regions[idx].memory_size +
-			memory.regions[idx].mmap_offset;
+	for (i = 0; i < memory.nregions; i++) {
+		fd  = pmsg->fds[i];
+		reg = &dev->mem->regions[i];
+
+		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
+		reg->guest_user_addr = memory.regions[i].userspace_addr;
+		reg->size            = memory.regions[i].memory_size;
+		reg->fd              = fd;
+
+		mmap_offset = memory.regions[i].mmap_offset;
+		mmap_size   = reg->size + mmap_offset;
 
 		/* mmap() without flag of MAP_ANONYMOUS, should be called
 		 * with length argument aligned with hugepagesz at older
@@ -446,67 +426,52 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 		 * to avoid failure, make sure in caller to keep length
 		 * aligned.
 		 */
-		alignment = get_blk_size(pmsg->fds[idx]);
+		alignment = get_blk_size(fd);
 		if (alignment == (uint64_t)-1) {
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"couldn't get hugepage size through fstat\n");
 			goto err_mmap;
 		}
-		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
 
-		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
-			mapped_size,
-			PROT_READ | PROT_WRITE, MAP_SHARED,
-			pmsg->fds[idx],
-			0);
+		mmap_addr = mmap(NULL, mmap_size,
+				 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
-			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
-			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-			mapped_size, memory.regions[idx].mmap_offset,
-			alignment);
-
-		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
+		if (mmap_addr == MAP_FAILED) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"mmap qemu guest failed.\n");
+				"mmap region %u failed.\n", i);
 			goto err_mmap;
 		}
 
-		pregion_orig[idx].mapped_address = mapped_address;
-		pregion_orig[idx].mapped_size = mapped_size;
-		pregion_orig[idx].blksz = alignment;
-		pregion_orig[idx].fd = pmsg->fds[idx];
-
-		mapped_address +=  memory.regions[idx].mmap_offset;
+		reg->mmap_addr = mmap_addr;
+		reg->mmap_size = mmap_size;
+		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
+				      mmap_offset;
 
-		pregion->address_offset = mapped_address -
-			pregion->guest_phys_address;
-
-		if (memory.regions[idx].guest_phys_addr == 0) {
-			dev->mem->base_address =
-				memory.regions[idx].userspace_addr;
-			dev->mem->mapped_address =
-				pregion->address_offset;
-		}
-
-		LOG_DEBUG(VHOST_CONFIG,
-			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
-			idx,
-			(void *)(uintptr_t)pregion->guest_phys_address,
-			(void *)(uintptr_t)pregion->userspace_address,
-			 pregion->memory_size);
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"guest memory region %u, size: 0x%" PRIx64 "\n"
+			"\t guest physical addr: 0x%" PRIx64 "\n"
+			"\t guest virtual  addr: 0x%" PRIx64 "\n"
+			"\t host  virtual  addr: 0x%" PRIx64 "\n"
+			"\t mmap addr : 0x%" PRIx64 "\n"
+			"\t mmap size : 0x%" PRIx64 "\n"
+			"\t mmap align: 0x%" PRIx64 "\n"
+			"\t mmap off  : 0x%" PRIx64 "\n",
+			i, reg->size,
+			reg->guest_phys_addr,
+			reg->guest_user_addr,
+			reg->host_user_addr,
+			(uint64_t)(uintptr_t)mmap_addr,
+			mmap_size,
+			alignment,
+			mmap_offset);
 	}
 
 	return 0;
 
 err_mmap:
-	while (idx--) {
-		munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
-				pregion_orig[idx].mapped_size);
-		close(pregion_orig[idx].fd);
-	}
-	free(dev->mem);
+	free_mem_region(dev);
+	rte_free(dev->mem);
 	dev->mem = NULL;
 	return -1;
 }
-- 
1.9.0
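
For readers following the refactor: with guest_user_addr and
host_user_addr stored per region, the reworked qva_to_vva() reduces to
a bounds check plus a constant per-region offset. A minimal standalone
sketch of that scheme (the struct mirrors the patch; the sample
addresses are made up for illustration):

#include <stdint.h>
#include <stdio.h>

struct virtio_memory_region {
	uint64_t guest_user_addr; /* QEMU process virtual address */
	uint64_t host_user_addr;  /* our mmap()ed virtual address */
	uint64_t size;
};

static uint64_t
qva_to_vva(const struct virtio_memory_region *regions, uint32_t nregions,
	   uint64_t qva)
{
	uint32_t i;

	for (i = 0; i < nregions; i++) {
		const struct virtio_memory_region *reg = &regions[i];

		/* same memory, two virtual mappings: translate by the
		 * delta between them */
		if (qva >= reg->guest_user_addr &&
		    qva < reg->guest_user_addr + reg->size)
			return qva - reg->guest_user_addr +
			       reg->host_user_addr;
	}
	return 0; /* address not backed by any region */
}

int
main(void)
{
	struct virtio_memory_region regs[] = {
		{ 0x7f0000000000ULL, 0x7e0000000000ULL, 0x40000000ULL },
	};

	printf("vva: 0x%llx\n", (unsigned long long)
	       qva_to_vva(regs, 1, 0x7f0000001000ULL));
	return 0;
}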

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance
@ 2016-10-09  3:16 13% Qiming Yang
  0 siblings, 0 replies; 200+ results
From: Qiming Yang @ 2016-10-09  3:16 UTC (permalink / raw)
  To: dev; +Cc: Qiming Yang

This patch adds a notice announcing an ABI change, planned for the
17.02 release, that lets the ethtool app get the NIC firmware version.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 845d2aa..60bd7ed 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,7 @@ Deprecation Notices
 * API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
   structures. The member ``file_name`` data type will be changed from
   ``char *`` to ``const char *``. This change targets release 16.11.
+
+* In 17.02 an ABI change is planned: the ``rte_eth_dev_info`` structure
+  will be extended with a new member ``fw_version`` in order to store
+  the NIC firmware version.
-- 
2.7.4
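
For illustration, this is how an application might read the announced
member once it exists. The member's type is an assumption here (the
notice does not specify it), so treat fw_version as a hypothetical
fixed-size string until the 17.02 patches land:

#include <stdio.h>
#include <rte_ethdev.h>

static void
print_fw_version(uint8_t port_id)
{
	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port_id, &dev_info);
	/* dev_info.fw_version is the assumed new member */
	printf("port %u firmware: %s\n", port_id, dev_info.fw_version);
}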

^ permalink raw reply	[relevance 13%]

* Re: [dpdk-dev] [PATCH v5 01/13] librte_ether: modify internal callback function
  2016-10-06 14:56  0%       ` Thomas Monjalon
@ 2016-10-06 15:32  0%         ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2016-10-06 15:32 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Shah, Rahul R, Lu, Wenzhuo, az5157, jerin.jacob

Hi Thomas,

<snip>

> > > Subject: Re: [dpdk-dev] [PATCH v5 01/13] librte_ether: modify
> > > internal callback function
> > >
> > > 2016-10-06 12:26, Bernard Iremonger:
> > > >  void
> > > >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > > > -	enum rte_eth_event_type event)
> > > > +	enum rte_eth_event_type event, void *param)
> > >
> > > You need to squash the patches updating the calls to this function.
> > > Otherwise, this patch does not compile.
> >
> > I will have to squash everything into one patch, separate patches will not
> compile.
> 
> No you can keep a separate patch for the VF event in ixgbe.

I have 4 patches at present

librte_ether
net/ixgbe
drivers/net
app/test

Would this be acceptable, or do you just want everything squashed into librte_ether except for net/ixgbe?
 
 
> > > [...]
> > > > +		if (param != NULL)
> > > > +			dev_cb.cb_arg = (void *) param;
> > >
> > > You are overriding the user parameter.
> >
> > Yes, we want to update the user parameter for the
> > RTE_ETH_EVENT_VF_MBOX

I have renamed param to cb_arg to make it clearer what is happening.


> > > As it is only for a new event, it can be described in the register
> > > API that the user param won't be returned for this event.
> >
> > I will add a description in the rte_eth_dev_callback_register  function.
> >
> > > But a better design would be to add a new parameter to the callback.
> > > However it will be an API breakage.
> >
> > I do not want to break the ABI at this point.
> 
> Yes, but it can be considered for a later change.

Yes, ok
 
> > > > +	RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> > >
> > > Sorry I do not parse well this line.
> > > The event name is VF_MBOX and the comment is about the callback
> > > processing this event on PF side?
> > > I would suggest this kind of comment: "message from VF received by PF"
> >
> > Ok.
> >
> > >
> > > [...]
> > > >   *  Pointer to struct rte_eth_dev.
> > > >   * @param event
> > > >   *  Eth device interrupt event type.
> > > > + * @param param
> > > > + *  Parameter to pass back to user application.
> > > > + *  Allows the user application to decide if a particular
> > > > + function
> > > > + *  is permitted.
> > >
> > > In a more generic case, the parameter gives some details about the
> event.
> 
> Please consider a rewording here, thanks.
 
I have reworded it here; I hope it is clearer.

Regards,

Bernard.
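
To make the cb_arg override concrete, a sketch of the application side
under discussion: for RTE_ETH_EVENT_VF_MBOX the pointer passed at
registration is replaced by event data supplied by the PF driver. The
struct below is a made-up placeholder; the real layout is defined by
the ixgbe patches in this series:

#include <stdio.h>
#include <rte_ethdev.h>

struct vf_mbox_event {	/* hypothetical event data */
	uint16_t vf;	/* VF that sent the mailbox message */
	uint32_t msg;	/* message type */
	int allowed;	/* application decision: permit the request? */
};

static void
vf_mbox_cb(uint8_t port_id, enum rte_eth_event_type event, void *cb_arg)
{
	struct vf_mbox_event *ev = cb_arg;	/* overridden by the PMD */

	if (event != RTE_ETH_EVENT_VF_MBOX)
		return;
	printf("port %u: VF %u sent msg %u\n", port_id, ev->vf, ev->msg);
	ev->allowed = 1;	/* tell the PF driver to proceed */
}

static void
setup(uint8_t port_id)
{
	/* the NULL given here is NOT what the callback receives for
	 * RTE_ETH_EVENT_VF_MBOX, per the behaviour discussed above */
	rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_VF_MBOX,
				      vf_mbox_cb, NULL);
}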

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 01/13] librte_ether: modify internal callback function
  2016-10-06 14:33  3%     ` Iremonger, Bernard
@ 2016-10-06 14:56  0%       ` Thomas Monjalon
  2016-10-06 15:32  0%         ` Iremonger, Bernard
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-10-06 14:56 UTC (permalink / raw)
  To: Iremonger, Bernard; +Cc: dev, Shah, Rahul R, Lu, Wenzhuo, az5157, jerin.jacob

2016-10-06 14:33, Iremonger, Bernard:
> Hi Thomas,
> 
> > Subject: Re: [dpdk-dev] [PATCH v5 01/13] librte_ether: modify internal
> > callback function
> > 
> > 2016-10-06 12:26, Bernard Iremonger:
> > >  void
> > >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > > -	enum rte_eth_event_type event)
> > > +	enum rte_eth_event_type event, void *param)
> > 
> > You need to squash the patches updating the calls to this function.
> > Otherwise, this patch does not compile.
> 
> I will have to squash everything into one patch, separate patches will not compile.

No, you can keep a separate patch for the VF event in ixgbe.

> > [...]
> > > +		if (param != NULL)
> > > +			dev_cb.cb_arg = (void *) param;
> > 
> > You are overriding the user parameter.
> 
> Yes, we want to update the user parameter for the RTE_ETH_EVENT_VF_MBOX
> 
> > As it is only for a new event, it can be described in the register API that the
> > user param won't be returned for this event.
> 
> I will add a description in the rte_eth_dev_callback_register  function.
> 
> > But a better design would be to add a new parameter to the callback.
> > However it will be an API breakage.
> 
> I do not want to break the ABI at this point.

Yes, but it can be considered for a later change.

> > > +	RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> > 
> > Sorry I do not parse well this line.
> > The event name is VF_MBOX and the comment is about the callback
> > processing this event on PF side?
> > I would suggest this kind of comment: "message from VF received by PF"
> 
> Ok.
> 
> > 
> > [...]
> > >   *  Pointer to struct rte_eth_dev.
> > >   * @param event
> > >   *  Eth device interrupt event type.
> > > + * @param param
> > > + *  Parameter to pass back to user application.
> > > + *  Allows the user application to decide if a particular function
> > > + *  is permitted.
> > 
> > In a more generic case, the parameter gives some details about the event.

Please consider a rewording here, thanks.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 01/13] librte_ether: modify internal callback function
  @ 2016-10-06 14:33  3%     ` Iremonger, Bernard
  2016-10-06 14:56  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Iremonger, Bernard @ 2016-10-06 14:33 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Shah, Rahul R, Lu, Wenzhuo, az5157, jerin.jacob

Hi Thomas,

> Subject: Re: [dpdk-dev] [PATCH v5 01/13] librte_ether: modify internal
> callback function
> 
> 2016-10-06 12:26, Bernard Iremonger:
> >  void
> >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > -	enum rte_eth_event_type event)
> > +	enum rte_eth_event_type event, void *param)
> 
> You need to squash the patches updating the calls to this function.
> Otherwise, this patch does not compile.

I will have to squash everything into one patch; separate patches will not compile.
 
> [...]
> > +		if (param != NULL)
> > +			dev_cb.cb_arg = (void *) param;
> 
> You are overriding the user parameter.


Yes, we want to update the user parameter for the RTE_ETH_EVENT_VF_MBOX event.

> As it is only for a new event, it can be described in the register API that the
> user param won't be returned for this event.

I will add a description in the rte_eth_dev_callback_register function.

> But a better design would be to add a new parameter to the callback.
> However it will be an API breakage.

I do not want to break the ABI at this point.

> 
> > +	RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> 
> Sorry I do not parse well this line.
> The event name is VF_MBOX and the comment is about the callback
> processing this event on PF side?
> I would suggest this kind of comment: "message from VF received by PF"

Ok.

> 
> [...]
> >   *  Pointer to struct rte_eth_dev.
> >   * @param event
> >   *  Eth device interrupt event type.
> > + * @param param
> > + *  Parameter to pass back to user application.
> > + *  Allows the user application to decide if a particular function
> > + *  is permitted.
> 
> In a more generic case, the parameter gives some details about the event.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions
  2016-10-05 17:04  4%     ` Iremonger, Bernard
@ 2016-10-05 17:19  3%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-10-05 17:19 UTC (permalink / raw)
  To: Iremonger, Bernard; +Cc: dev, Shah, Rahul R, Lu, Wenzhuo, az5157, jerin.jacob

2016-10-05 17:04, Iremonger, Bernard:
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -2510,6 +2510,20 @@ void
> > >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > >  	enum rte_eth_event_type event)
> > >  {
> > > +	return _rte_eth_dev_callback_process_generic(dev, event, NULL); }
> > > +
> > > +void
> > > +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> > > +	enum rte_eth_event_type event, void *param) {
> > > +	return _rte_eth_dev_callback_process_generic(dev, event, param);
> > }
> > 
> > This function is just adding a parameter, compared to the legacy
> > _rte_eth_dev_callback_process.
> > Why calling it process_vf?
> 
> The parameter is just being added for the VF event, the handling of the other events is unchanged.
> 
> > And by the way, why not just replacing the legacy function?
> > As it is a driver interface, there is no ABI restriction.
> 
> I thought there would be an ABI issue if the legacy function is replaced.
> The _rte_eth_dev_callback_process is exported in DPDK 2.2 and used in the following PMD's, lib and app:
> 
> app/test/virtual_pmd
> drivers/net/e1000
> drivers/net/ixgbe
> drivers/net/mlx5
> drivers/net/vhost
> drivers/net/virtio
> lib/librte_ether
> 
>  Adding a parameter to _rte_eth_dev_callback_process()  will impact all of the above.
> Will this cause an ABI issue?

No, because the ABI is for applications (Application Binary Interface).
Here you are just changing the driver interface, and we have no commitment
to maintain the compatibility of this interface for external drivers.

> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -3026,6 +3026,7 @@ enum rte_eth_event_type {
> > >  				/**< queue state event (enabled/disabled)
> > */
> > >  	RTE_ETH_EVENT_INTR_RESET,
> > >  			/**< reset interrupt event, sent to VF on PF reset */
> > > +	RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> > >  	RTE_ETH_EVENT_MAX       /**< max value of this enum */
> > >  };
> > 
> > Either we choose to have a "generic" VF event well documented, or it is just
> > a specific event with a tip on where to find the doc.
> > Here we need at least to know how to handle the argument.
> 
> It is a specific event for VF to PF messages, details on the function and arguments are in the rte_ethdev.h file.

No, I think it is only explained in the ixgbe code.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions
  2016-10-05 16:10  3%   ` Thomas Monjalon
@ 2016-10-05 17:04  4%     ` Iremonger, Bernard
  2016-10-05 17:19  3%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Iremonger, Bernard @ 2016-10-05 17:04 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Shah, Rahul R, Lu, Wenzhuo, az5157, jerin.jacob

Hi Thomas,

<snip>

> Subject: Re: [dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback
> functions
> 
> 2016-10-04 15:52, Bernard Iremonger:
> > add _rte_eth_dev_callback_process_vf function.
> > add _rte_eth_dev_callback_process_generic function
> >
> > Adding a callback to the user application on VF to PF mailbox message,
> > allows passing information to the application controlling the PF when
> > a VF mailbox event message is received, such as VF reset.
> 
> I have some difficulties to parse this explanation.
> Please could you reword it and precise the direction of the message and the
> use case context?

I will reword the explanation and add use case context.
 
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -2510,6 +2510,20 @@ void
> >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> >  	enum rte_eth_event_type event)
> >  {
> > +	return _rte_eth_dev_callback_process_generic(dev, event, NULL); }
> > +
> > +void
> > +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> > +	enum rte_eth_event_type event, void *param) {
> > +	return _rte_eth_dev_callback_process_generic(dev, event, param);
> }
> 
> This function is just adding a parameter, compared to the legacy
> _rte_eth_dev_callback_process.
> Why calling it process_vf?

The parameter is just being added for the VF event; the handling of the other events is unchanged.

> And by the way, why not just replacing the legacy function?
> As it is a driver interface, there is no ABI restriction.

I thought there would be an ABI issue if the legacy function were replaced.
_rte_eth_dev_callback_process has been exported since DPDK 2.2 and is used in the following PMDs, lib and app:

app/test/virtual_pmd
drivers/net/e1000
drivers/net/ixgbe
drivers/net/mlx5
drivers/net/vhost
drivers/net/virtio
lib/librte_ether

Adding a parameter to _rte_eth_dev_callback_process() will impact all of the above.
Will this cause an ABI issue?

> > +
> > +void
> > +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> > +	enum rte_eth_event_type event, void *param) {
> [...]
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -3026,6 +3026,7 @@ enum rte_eth_event_type {
> >  				/**< queue state event (enabled/disabled)
> */
> >  	RTE_ETH_EVENT_INTR_RESET,
> >  			/**< reset interrupt event, sent to VF on PF reset */
> > +	RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> >  	RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >  };
> 
> Either we choose to have a "generic" VF event well documented, or it is just
> a specific event with a tip on where to find the doc.
> Here we need at least to know how to handle the argument.

It is a specific event for VF-to-PF messages; details on the function and arguments are in the rte_ethdev.h file.
 
> > +/**
> > + * @internal Executes all the user application registered callbacks. Used
> by:
> > + * _rte_eth_dev_callback_process and
> _rte_eth_dev_callback_process_vf
> > + * It is for DPDK internal user only. User application should not
> > +call it
> > + * directly.
> > + *
> > + * @param dev
> > + *  Pointer to struct rte_eth_dev.
> > + * @param event
> > + *  Eth device interrupt event type.
> > + *
> > + * @param param
> > + *  parameters to pass back to user application.
> > + *
> > + * @return
> > + *  void
> > + */
> > +void
> > +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> > +				enum rte_eth_event_type event, void
> *param);
> 
> This is really an internal function and should not be exported at all.

Both new functions are internal; I will make them static and remove them from the map file.
When the functions are made static, should the function declarations be moved from rte_ethdev.h to rte_ethdev.c?

Thanks for the review.

Regards,

Bernard.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions
  @ 2016-10-05 16:10  3%   ` Thomas Monjalon
  2016-10-05 17:04  4%     ` Iremonger, Bernard
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-10-05 16:10 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev, rahul.r.shah, wenzhuo.lu, az5157, jerin.jacob

2016-10-04 15:52, Bernard Iremonger:
> add _rte_eth_dev_callback_process_vf function.
> add _rte_eth_dev_callback_process_generic function
> 
> Adding a callback to the user application on VF to PF mailbox message,
> allows passing information to the application controlling the PF
> when a VF mailbox event message is received, such as VF reset.

I have some difficulty parsing this explanation.
Could you please reword it and clarify the direction of the message
and the use-case context?

> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2510,6 +2510,20 @@ void
>  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>  	enum rte_eth_event_type event)
>  {
> +	return _rte_eth_dev_callback_process_generic(dev, event, NULL);
> +}
> +
> +void
> +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> +	enum rte_eth_event_type event, void *param)
> +{
> +	return _rte_eth_dev_callback_process_generic(dev, event, param);
> +}

This function just adds a parameter compared to the legacy
_rte_eth_dev_callback_process.
Why call it process_vf?
And by the way, why not just replace the legacy function?
As this is a driver interface, there is no ABI restriction.

> +
> +void
> +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> +	enum rte_eth_event_type event, void *param)
> +{
[...]
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -3026,6 +3026,7 @@ enum rte_eth_event_type {
>  				/**< queue state event (enabled/disabled) */
>  	RTE_ETH_EVENT_INTR_RESET,
>  			/**< reset interrupt event, sent to VF on PF reset */
> +	RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
>  	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>  };

Either we choose to have a "generic" VF event that is well documented,
or it is just a specific event with a tip on where to find the doc.
Here we at least need to know how to handle the argument.

> +/**
> + * @internal Executes all the user application registered callbacks. Used by:
> + * _rte_eth_dev_callback_process and _rte_eth_dev_callback_process_vf
> + * It is for DPDK internal user only. User application should not call it
> + * directly.
> + *
> + * @param dev
> + *  Pointer to struct rte_eth_dev.
> + * @param event
> + *  Eth device interrupt event type.
> + *
> + * @param param
> + *  parameters to pass back to user application.
> + *
> + * @return
> + *  void
> + */
> +void
> +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> +				enum rte_eth_event_type event, void *param);

This is really an internal function and should not be exported at all.
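
For reference, the consolidated shape suggested above would look
roughly like the existing implementation with the extra parameter
folded in. A sketch only, not committed code; the list head, field
and lock names follow the 16.x rte_ethdev.c internals and may differ:

void
_rte_eth_dev_callback_process(struct rte_eth_dev *dev,
	enum rte_eth_event_type event, void *param)
{
	struct rte_eth_dev_callback *cb_lst;
	struct rte_eth_dev_callback dev_cb;

	rte_spinlock_lock(&rte_eth_dev_cb_lock);
	TAILQ_FOREACH(cb_lst, &dev->link_intr_cbs, next) {
		if (cb_lst->cb_fn == NULL || cb_lst->event != event)
			continue;
		dev_cb = *cb_lst;
		cb_lst->active = 1;
		if (param != NULL)
			dev_cb.cb_arg = param;	/* override user arg */

		rte_spinlock_unlock(&rte_eth_dev_cb_lock);
		dev_cb.cb_fn(dev->data->port_id, dev_cb.event,
			     dev_cb.cb_arg);
		rte_spinlock_lock(&rte_eth_dev_cb_lock);
		cb_lst->active = 0;
	}
	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
}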

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC 0/7] changing mbuf pool handler
  2016-10-05 11:49  0%       ` Hemant Agrawal
@ 2016-10-05 13:15  0%         ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2016-10-05 13:15 UTC (permalink / raw)
  To: Hemant Agrawal, Olivier Matz, dev; +Cc: jerin.jacob



On 5/10/2016 12:49 PM, Hemant Agrawal wrote:
> Hi Olivier,
>
>> -----Original Message-----
>> From: Hunt, David [mailto:david.hunt@intel.com]
>> Hi Olivier,
>>
>>
>> On 3/10/2016 4:49 PM, Olivier Matz wrote:
>>> Hi Hemant,
>>>
>>> Thank you for your feedback.
>>>
>>> On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
>>>> Hi Olivier
>>>>
>>>> On 9/19/2016 7:12 PM, Olivier Matz wrote:
>>>>> Hello,
>>>>>
>>>>> Following discussion from [1] ("usages issue with external mempool").
>>>>>
>>>>> This is a tentative to make the mempool_ops feature introduced by
>>>>> David Hunt [2] more widely used by applications.
>>>>>
>>>>> It applies on top of a minor fix in mbuf lib [3].
>>>>>
>>>>> To sumarize the needs (please comment if I did not got it properly):
>>>>>
>>>>> - new hw-assisted mempool handlers will soon be introduced
>>>>> - to make use of it, the new mempool API [4]
>> (rte_mempool_create_empty,
>>>>>     rte_mempool_populate, ...) has to be used
>>>>> - the legacy mempool API (rte_mempool_create) does not allow to
>> change
>>>>>     the mempool ops. The default is "ring_<s|m>p_<s|m>c" depending on
>>>>>     flags.
>>>>> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
>>>>>     them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
>>>>>     ("ring_mp_mc")
>>>>> - today, most (if not all) applications and examples use either
>>>>>     rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
>>>>>     pool, making it difficult to take advantage of this feature with
>>>>>     existing apps.
>>>>>
>>>>> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
>>>>> rte_mempool_create(), forcing the applications to use the new API,
>>>>> which is more flexible. But after digging a bit, it appeared that
>>>>> rte_mempool_create() is widely used, and not only for mbufs.
>>>>> Deprecating it would have a big impact on applications, and
>>>>> replacing it with the new API would be overkill in many use-cases.
>>>> I agree with the proposal.
>>>>
>>>>> So I finally tried the following approach (inspired from a
>>>>> suggestion Jerin [5]):
>>>>>
>>>>> - add a new mempool_ops parameter to rte_pktmbuf_pool_create().
>> This
>>>>>     unfortunatelly breaks the API, but I implemented an ABI compat layer.
>>>>>     If the patch is accepted, we could discuss how to announce/schedule
>>>>>     the API change.
>>>>> - update the applications and documentation to prefer
>>>>>     rte_pktmbuf_pool_create() as much as possible
>>>>> - update most used examples (testpmd, l2fwd, l3fwd) to add a new
>> command
>>>>>     line argument to select the mempool handler
>>>>>
>>>>> I hope the external applications would then switch to
>>>>> rte_pktmbuf_pool_create(), since it supports most of the use-cases
>>>>> (even priv_size != 0, since we can call rte_mempool_obj_iter() after) .
>>>>>
>>>> I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
>>>> void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
>>>> consolidated wrapper will almost make it certain that applications
>>>> will not try to use rte_mempool_create for packet buffers.
>>> The patch changes the example applications. I'm not sure I understand
>>> why adding these arguments would force application to not use
>>> rte_mempool_create() for packet buffers. Do you have a application in
>> mind?
>>> For the mempool_ops parameter, we must pass it at init because we need
>>> to know the mempool handler before populating the pool. For object
>>> initialization, it can be done after, so I thought it was better to
>>> reduce the number of arguments to avoid to fall in the
>>> mempool_create() syndrom :)
>> I also agree with the proposal. Looks cleaner.
>>
>> I would lean to the side of keeping the parameters to the minimum, i.e.
>> not adding *obj_cb and *obj_cb_arg into rte_pktmbuf_pool_create.
>> Developers always have the option of going with rte_mempool_create if they
>> need more fine-grained control.
> [Hemant] The implementations with hw offloaded mempools don't want developer using *rte_mempool_create* for packet buffer pools.
> This API does not work for hw offloaded mempool.
>
> Also, *rte_mempool_create_empty* - may not be convenient for many application, as it requires calling  4+ APIs.
>
> Olivier is not in favor of deprecating the *rte_mempool_create*.   I agree with concerns raised by him.
>
> Essentially, I was suggesting to upgrade * rte_pktmbuf_pool_create* to be like *rte_mempool_create*  for packet buffers exclusively.
>
> This will provide a clear segregation for API usages w.r.t the packet buffer pool vs all other type of mempools.

Yes, it does sound like we need those extra parameters on 
rte_pktmbuf_pool_create.

Regards,
Dave.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC 0/7] changing mbuf pool handler
  2016-10-05  9:41  0%     ` Hunt, David
@ 2016-10-05 11:49  0%       ` Hemant Agrawal
  2016-10-05 13:15  0%         ` Hunt, David
  0 siblings, 1 reply; 200+ results
From: Hemant Agrawal @ 2016-10-05 11:49 UTC (permalink / raw)
  To: Hunt, David, Olivier Matz, dev; +Cc: jerin.jacob

Hi Olivier,

> -----Original Message-----
> From: Hunt, David [mailto:david.hunt@intel.com]
> Hi Olivier,
> 
> 
> On 3/10/2016 4:49 PM, Olivier Matz wrote:
> > Hi Hemant,
> >
> > Thank you for your feedback.
> >
> > On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
> >> Hi Olivier
> >>
> >> On 9/19/2016 7:12 PM, Olivier Matz wrote:
> >>> Hello,
> >>>
> >>> Following discussion from [1] ("usages issue with external mempool").
> >>>
> >>> This is a tentative to make the mempool_ops feature introduced by
> >>> David Hunt [2] more widely used by applications.
> >>>
> >>> It applies on top of a minor fix in mbuf lib [3].
> >>>
> >>> To sumarize the needs (please comment if I did not got it properly):
> >>>
> >>> - new hw-assisted mempool handlers will soon be introduced
> >>> - to make use of it, the new mempool API [4]
> (rte_mempool_create_empty,
> >>>    rte_mempool_populate, ...) has to be used
> >>> - the legacy mempool API (rte_mempool_create) does not allow to
> change
> >>>    the mempool ops. The default is "ring_<s|m>p_<s|m>c" depending on
> >>>    flags.
> >>> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
> >>>    them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
> >>>    ("ring_mp_mc")
> >>> - today, most (if not all) applications and examples use either
> >>>    rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
> >>>    pool, making it difficult to take advantage of this feature with
> >>>    existing apps.
> >>>
> >>> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
> >>> rte_mempool_create(), forcing the applications to use the new API,
> >>> which is more flexible. But after digging a bit, it appeared that
> >>> rte_mempool_create() is widely used, and not only for mbufs.
> >>> Deprecating it would have a big impact on applications, and
> >>> replacing it with the new API would be overkill in many use-cases.
> >> I agree with the proposal.
> >>
> >>> So I finally tried the following approach (inspired from a
> >>> suggestion Jerin [5]):
> >>>
> >>> - add a new mempool_ops parameter to rte_pktmbuf_pool_create().
> This
> >>>    unfortunatelly breaks the API, but I implemented an ABI compat layer.
> >>>    If the patch is accepted, we could discuss how to announce/schedule
> >>>    the API change.
> >>> - update the applications and documentation to prefer
> >>>    rte_pktmbuf_pool_create() as much as possible
> >>> - update most used examples (testpmd, l2fwd, l3fwd) to add a new
> command
> >>>    line argument to select the mempool handler
> >>>
> >>> I hope the external applications would then switch to
> >>> rte_pktmbuf_pool_create(), since it supports most of the use-cases
> >>> (even priv_size != 0, since we can call rte_mempool_obj_iter() after) .
> >>>
> >> I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
> >> void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
> >> consolidated wrapper will almost make it certain that applications
> >> will not try to use rte_mempool_create for packet buffers.
> > The patch changes the example applications. I'm not sure I understand
> > why adding these arguments would force application to not use
> > rte_mempool_create() for packet buffers. Do you have a application in
> mind?
> >
> > For the mempool_ops parameter, we must pass it at init because we need
> > to know the mempool handler before populating the pool. For object
> > initialization, it can be done after, so I thought it was better to
> > reduce the number of arguments to avoid to fall in the
> > mempool_create() syndrom :)
> 
> I also agree with the proposal. Looks cleaner.
> 
> I would lean to the side of keeping the parameters to the minimum, i.e.
> not adding *obj_cb and *obj_cb_arg into rte_pktmbuf_pool_create.
> Developers always have the option of going with rte_mempool_create if they
> need more fine-grained control.

[Hemant] The implementations with hw-offloaded mempools don't want developers using *rte_mempool_create* for packet buffer pools.
That API does not work for hw-offloaded mempools.

Also, *rte_mempool_create_empty* may not be convenient for many applications, as it requires calling 4+ APIs.

Olivier is not in favor of deprecating *rte_mempool_create*. I agree with the concerns he raised.

Essentially, I was suggesting upgrading *rte_pktmbuf_pool_create* to be like *rte_mempool_create*, but for packet buffers exclusively.

This will provide a clear segregation of API usage between packet buffer pools and all other types of mempools.


Regards,
Hemant

> 
> Regards,
> Dave.
> 
> > Any other opinions?
> >
> > Regards,
> > Olivier
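
For context, the "4+ APIs" sequence referred to above is roughly what
rte_pktmbuf_pool_create() already does internally. A sketch against
the 16.07 mempool API (per-lcore cache disabled and error paths kept
minimal for brevity):

#include <rte_mbuf.h>
#include <rte_mempool.h>

static struct rte_mempool *
mbuf_pool_with_ops(const char *name, unsigned int n,
		   const char *ops_name, int socket_id)
{
	struct rte_pktmbuf_pool_private priv = {
		.mbuf_data_room_size = RTE_MBUF_DEFAULT_BUF_SIZE,
		.mbuf_priv_size = 0,
	};
	struct rte_mempool *mp;

	mp = rte_mempool_create_empty(name, n,
		sizeof(struct rte_mbuf) + RTE_MBUF_DEFAULT_BUF_SIZE,
		0 /* cache */, sizeof(priv), socket_id, 0);
	if (mp == NULL)
		return NULL;

	/* the step the legacy wrappers cannot express */
	if (rte_mempool_set_ops_byname(mp, ops_name, NULL) < 0)
		goto fail;

	rte_pktmbuf_pool_init(mp, &priv);

	if (rte_mempool_populate_default(mp) < 0)
		goto fail;

	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
	return mp;
fail:
	rte_mempool_free(mp);
	return NULL;
}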

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC 0/7] changing mbuf pool handler
  2016-10-03 15:49  0%   ` Olivier Matz
@ 2016-10-05  9:41  0%     ` Hunt, David
  2016-10-05 11:49  0%       ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Hunt, David @ 2016-10-05  9:41 UTC (permalink / raw)
  To: Olivier Matz, Hemant Agrawal, dev; +Cc: jerin.jacob

Hi Olivier,


On 3/10/2016 4:49 PM, Olivier Matz wrote:
> Hi Hemant,
>
> Thank you for your feedback.
>
> On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
>> Hi Olivier
>>
>> On 9/19/2016 7:12 PM, Olivier Matz wrote:
>>> Hello,
>>>
>>> Following discussion from [1] ("usages issue with external mempool").
>>>
>>> This is a tentative to make the mempool_ops feature introduced
>>> by David Hunt [2] more widely used by applications.
>>>
>>> It applies on top of a minor fix in mbuf lib [3].
>>>
>>> To sumarize the needs (please comment if I did not got it properly):
>>>
>>> - new hw-assisted mempool handlers will soon be introduced
>>> - to make use of it, the new mempool API [4] (rte_mempool_create_empty,
>>>    rte_mempool_populate, ...) has to be used
>>> - the legacy mempool API (rte_mempool_create) does not allow to change
>>>    the mempool ops. The default is "ring_<s|m>p_<s|m>c" depending on
>>>    flags.
>>> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
>>>    them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
>>>    ("ring_mp_mc")
>>> - today, most (if not all) applications and examples use either
>>>    rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
>>>    pool, making it difficult to take advantage of this feature with
>>>    existing apps.
>>>
>>> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
>>> rte_mempool_create(), forcing the applications to use the new API, which
>>> is more flexible. But after digging a bit, it appeared that
>>> rte_mempool_create() is widely used, and not only for mbufs. Deprecating
>>> it would have a big impact on applications, and replacing it with the
>>> new API would be overkill in many use-cases.
>> I agree with the proposal.
>>
>>> So I finally tried the following approach (inspired from a suggestion
>>> Jerin [5]):
>>>
>>> - add a new mempool_ops parameter to rte_pktmbuf_pool_create(). This
>>>    unfortunatelly breaks the API, but I implemented an ABI compat layer.
>>>    If the patch is accepted, we could discuss how to announce/schedule
>>>    the API change.
>>> - update the applications and documentation to prefer
>>>    rte_pktmbuf_pool_create() as much as possible
>>> - update most used examples (testpmd, l2fwd, l3fwd) to add a new command
>>>    line argument to select the mempool handler
>>>
>>> I hope the external applications would then switch to
>>> rte_pktmbuf_pool_create(), since it supports most of the use-cases (even
>>> priv_size != 0, since we can call rte_mempool_obj_iter() after) .
>>>
>> I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
>> void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
>> consolidated wrapper will almost make it certain that applications will
>> not try to use rte_mempool_create for packet buffers.
> The patch changes the example applications. I'm not sure I understand
> why adding these arguments would force application to not use
> rte_mempool_create() for packet buffers. Do you have a application in mind?
>
> For the mempool_ops parameter, we must pass it at init because we need
> to know the mempool handler before populating the pool. For object
> initialization, it can be done after, so I thought it was better to
> reduce the number of arguments to avoid to fall in the mempool_create()
> syndrom :)

I also agree with the proposal. Looks cleaner.

I would lean to the side of keeping the parameters to the minimum, i.e. 
not adding *obj_cb and *obj_cb_arg into rte_pktmbuf_pool_create. 
Developers always have the option of going with rte_mempool_create if 
they need more fine-grained control.

Regards,
Dave.

> Any other opinions?
>
> Regards,
> Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC 0/7] changing mbuf pool handler
  2016-09-22 11:52  0% ` Hemant Agrawal
@ 2016-10-03 15:49  0%   ` Olivier Matz
  2016-10-05  9:41  0%     ` Hunt, David
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-10-03 15:49 UTC (permalink / raw)
  To: Hemant Agrawal, dev; +Cc: jerin.jacob, david.hunt

Hi Hemant,

Thank you for your feedback.

On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
> Hi Olivier
> 
> On 9/19/2016 7:12 PM, Olivier Matz wrote:
>> Hello,
>>
>> Following discussion from [1] ("usages issue with external mempool").
>>
>> This is a tentative to make the mempool_ops feature introduced
>> by David Hunt [2] more widely used by applications.
>>
>> It applies on top of a minor fix in mbuf lib [3].
>>
>> To sumarize the needs (please comment if I did not got it properly):
>>
>> - new hw-assisted mempool handlers will soon be introduced
>> - to make use of it, the new mempool API [4] (rte_mempool_create_empty,
>>   rte_mempool_populate, ...) has to be used
>> - the legacy mempool API (rte_mempool_create) does not allow to change
>>   the mempool ops. The default is "ring_<s|m>p_<s|m>c" depending on
>>   flags.
>> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
>>   them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
>>   ("ring_mp_mc")
>> - today, most (if not all) applications and examples use either
>>   rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
>>   pool, making it difficult to take advantage of this feature with
>>   existing apps.
>>
>> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
>> rte_mempool_create(), forcing the applications to use the new API, which
>> is more flexible. But after digging a bit, it appeared that
>> rte_mempool_create() is widely used, and not only for mbufs. Deprecating
>> it would have a big impact on applications, and replacing it with the
>> new API would be overkill in many use-cases.
> 
> I agree with the proposal.
> 
>>
>> So I finally tried the following approach (inspired from a suggestion
>> Jerin [5]):
>>
>> - add a new mempool_ops parameter to rte_pktmbuf_pool_create(). This
>>   unfortunatelly breaks the API, but I implemented an ABI compat layer.
>>   If the patch is accepted, we could discuss how to announce/schedule
>>   the API change.
>> - update the applications and documentation to prefer
>>   rte_pktmbuf_pool_create() as much as possible
>> - update most used examples (testpmd, l2fwd, l3fwd) to add a new command
>>   line argument to select the mempool handler
>>
>> I hope the external applications would then switch to
>> rte_pktmbuf_pool_create(), since it supports most of the use-cases (even
>> priv_size != 0, since we can call rte_mempool_obj_iter() after) .
>>
> 
> I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
> void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
> consolidated wrapper will almost make it certain that applications will
> not try to use rte_mempool_create for packet buffers.

The patch changes the example applications. I'm not sure I understand
why adding these arguments would force applications not to use
rte_mempool_create() for packet buffers. Do you have an application in mind?

For the mempool_ops parameter, we must pass it at init because we need
to know the mempool handler before populating the pool. Object
initialization can be done afterwards, so I thought it was better to
reduce the number of arguments, to avoid falling into the
mempool_create() syndrome :)

Any other opinions?

Regards,
Olivier
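
The shape of the proposal, for illustration only -- the exact
prototype is defined by the RFC patches, so the parameter name and
position below are assumptions:

#include <stdint.h>

struct rte_mempool;

/* hypothetical extended prototype: the only addition over today's
 * rte_pktmbuf_pool_create() is the mempool ops name */
struct rte_mempool *
rte_pktmbuf_pool_create(const char *name, unsigned int n,
			unsigned int cache_size, uint16_t priv_size,
			uint16_t data_room_size, const char *ops_name,
			int socket_id);

static struct rte_mempool *
make_pool(void)
{
	/* NULL ops_name would keep RTE_MBUF_DEFAULT_MEMPOOL_OPS;
	 * 2176 is RTE_MBUF_DEFAULT_BUF_SIZE in current releases */
	return rte_pktmbuf_pool_create("mbuf_pool", 8192, 256, 0, 2176,
				       "ring_mp_mc", 0 /* socket */);
}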

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: check cpu flags at init
  2016-09-29 20:42  0%     ` Aaron Conole
@ 2016-10-03 14:13  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-10-03 14:13 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dev, Aaron Conole

2016-09-29 16:42, Aaron Conole:
> Flavio Leitner <fbl@sysclose.org> writes:
> 
> > On Mon, Sep 26, 2016 at 11:43:37AM -0400, Aaron Conole wrote:
> >> My only concern is whether this change would be considered ABI
> >> breaking.  I wouldn't think so, since it doesn't seem as though an
> >> application would want to call this explicitly (and is spelled out as
> >> such), but I can't be sure that it isn't already included in the
> >> standard application API, and therefore needs to go through the change
> >> process.
> >
> > I didn't want to change the original behavior more than needed.
> >
> > I think another patch would be necessary to change the whole EAL
> > initialization because there's a bunch of rte_panic() there which
> > aren't friendly with callers either.

Yes please, we need to remove all those panic/exit calls.

> Okay makes sense.
> 
> Acked-by: Aaron Conole <aconole@redhat.com>

Applied, thanks
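
For readers wanting the caller-friendly pattern discussed above, a
sketch of probing a flag and failing gracefully instead of panicking;
RTE_CPUFLAG_SSE4_2 is just an x86 example flag:

#include <stdio.h>
#include <rte_cpuflags.h>

static int
check_required_flags(void)
{
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2) <= 0) {
		fprintf(stderr, "CPU lacks SSE4.2; refusing to start\n");
		return -1;	/* let the caller decide, no rte_panic() */
	}
	return 0;
}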

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support
  2016-09-30 14:00  4% ` [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
@ 2016-09-30 15:03  0%   ` Pattan, Reshma
  2016-10-18  7:57  0%   ` Sergio Gonzalez Monroy
  1 sibling, 0 replies; 200+ results
From: Pattan, Reshma @ 2016-09-30 15:03 UTC (permalink / raw)
  To: Kerlin, MarcinX, dev
  Cc: De Lara Guarch, Pablo, thomas.monjalon, Kerlin, MarcinX



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcin Kerlin
> Sent: Friday, September 30, 2016 3:01 PM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> thomas.monjalon@6wind.com; Kerlin, MarcinX <marcinx.kerlin@intel.com>
> Subject: [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support
> 
> This patch ensure not overwrite device data in the multiprocess application.
> 
> 1)Changes in the library introduces continuity in array rte_eth_dev_data[] shared
> between all processes. Secondary process adds new entries in free space instead
> of overwriting existing entries.
> 
> 2)Changes in application testpmd allow secondary process to attach the
> mempool created by primary process rather than create new and in the case of
> quit or force quit to free devices data from shared array rte_eth_dev_data[].
> 
> -------------------------
> How to reproduce the bug:
> 
> 1) Run primary process:
> ./testpmd -c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0 --proc-
> type=primary --file-prefix=xz1 -- -i
> 
> (gdb) print rte_eth_devices[0].data.name
> $52 = "3:0.1"
> (gdb) print rte_eth_devices[1].data.name
> $53 = "3:0.0"
> 
> 2) Run secondary process:
> ./testpmd -c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0 --vdev
> 'eth_pcap0,rx_pcap=/var/log/device1.pcap, tx_pcap=/var/log/device2.pcap'
> --proc-type=secondary --file-prefix=xz1 -- -i
> 
> (gdb) print rte_eth_devices[0].data.name
> $52 = "eth_pcap0"
> (gdb) print rte_eth_devices[1].data.name
> $53 = "eth_pcap1"
> 
> 3) Go back to the primary and re-check:
> (gdb) print rte_eth_devices[0].data.name
> $54 = "eth_pcap0"
> (gdb) print rte_eth_devices[1].data.name
> $55 = "eth_pcap1"
> 
> It means that secondary process overwrite data of primary process.
> 
> This patch fix it and now if we go back to the primary and re-check then
> everything is fine:
> (gdb) print rte_eth_devices[0].data.name
> $56 = "3:0.1"
> (gdb) print rte_eth_devices[1].data.name
> $57 = "3:0.0"
> 
> So after this fix structure rte_eth_dev_data[] will keep all data one after the
> other instead of overwriting:
> (gdb) print rte_eth_dev_data[0].name
> $52 = "3:0.1"
> (gdb) print rte_eth_dev_data[1].name
> $53 = "3:0.0"
> (gdb) print rte_eth_dev_data[2].name
> $54 = "eth_pcap0"
> (gdb) print rte_eth_dev_data[3].name
> $55 = "eth_pcap1"
> and so on will be append in the next indexes
> 
> If secondary process will be turned off then also will be deleted from array:
> (gdb) print rte_eth_dev_data[0].name
> $52 = "3:0.1"
> (gdb) print rte_eth_dev_data[1].name
> $53 = "3:0.0"
> (gdb) print rte_eth_dev_data[2].name
> $54 = ""
> (gdb) print rte_eth_dev_data[3].name
> $55 = ""
> this also allows re-use index 2 and 3 for next another process
> -------------------------
> 
> Breaking ABI:
> Changes in the library librte_ether causes extending existing structure
> rte_eth_dev_data with a new field lock. The reason is that this structure is
> sharing between all the processes so it should be protected against attempting
> to write from two different processes.
> 
> Tomasz Kulasek sent announce ABI change in librte_ether on 21 July 2016.
> I would like to join to this breaking ABI, if it is possible.
> 
> v2:
> * fix syntax error in version script
> v3:
> * changed scope of function
> * improved description
> v4:
> * fix syntax error in version script
> v5:
> * fix header file
> 
> Marcin Kerlin (2):
>   librte_ether: add protection against overwrite device data
>   app/testpmd: improve handling of multiprocess
> 
>  app/test-pmd/testpmd.c                 | 37 +++++++++++++-
>  app/test-pmd/testpmd.h                 |  1 +
>  lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
>  lib/librte_ether/rte_ethdev.h          | 12 +++++
>  lib/librte_ether/rte_ether_version.map |  6 +++
>  5 files changed, 136 insertions(+), 10 deletions(-)
> 
> --
> 1.9.1

Acked-by: Reshma Pattan <reshma.pattan@intel.com>
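
For context, the secondary-process mempool attach described in the
cover letter boils down to a lookup by name instead of a create; a
sketch (the pool name is a placeholder, and the primary-side creation
call is omitted):

#include <stddef.h>
#include <rte_eal.h>
#include <rte_mempool.h>

static struct rte_mempool *
get_mbuf_pool(const char *name)
{
	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
		return rte_mempool_lookup(name);	/* attach existing */

	/* primary process: create the pool as before */
	return NULL;
}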

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support
  @ 2016-09-30 14:00  4% ` Marcin Kerlin
  2016-09-30 15:03  0%   ` Pattan, Reshma
  2016-10-18  7:57  0%   ` Sergio Gonzalez Monroy
  0 siblings, 2 replies; 200+ results
From: Marcin Kerlin @ 2016-09-30 14:00 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch, thomas.monjalon, Marcin Kerlin

This patch ensures device data is not overwritten in multiprocess applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries
in free space instead of overwriting existing entries.

2) Changes in the testpmd application allow a secondary process to attach
the mempool created by the primary process rather than create a new one,
and, on quit or force quit, to free its device data from the shared
rte_eth_dev_data[] array.

-------------------------
How to reproduce the bug:

1) Run primary process:
./testpmd -c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0
--proc-type=primary --file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$53 = "3:0.0"

2) Run secondary process:
./testpmd -c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0 
--vdev 'eth_pcap0,rx_pcap=/var/log/device1.pcap, tx_pcap=/var/log/device2.pcap'
--proc-type=secondary --file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$53 = "eth_pcap1"

3) Go back to the primary and re-check:
(gdb) print rte_eth_devices[0].data.name
$54 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$55 = "eth_pcap1"

This means the secondary process overwrites data belonging to the primary process.

This patch fixes it; now if we go back to the primary and re-check,
everything is fine:
(gdb) print rte_eth_devices[0].data.name
$56 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$57 = "3:0.0"

After this fix, the rte_eth_dev_data[] array keeps all entries one after
another instead of overwriting them:
(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = "eth_pcap0"
(gdb) print rte_eth_dev_data[3].name
$55 = "eth_pcap1"
and so on; further entries are appended at the next indexes.

If the secondary process is shut down, its entries are also deleted from the array:
(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = ""
(gdb) print rte_eth_dev_data[3].name
$55 = ""
this also allows indexes 2 and 3 to be reused by the next process.
-------------------------

Breaking ABI:
The changes in librte_ether extend the existing rte_eth_dev_data
structure with a new lock field. The reason is that this structure is
shared between all processes, so it should be protected against
concurrent writes from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join that ABI break, if possible.

v2:
* fix syntax error in version script
v3:
* changed scope of function
* improved description
v4:
* fix syntax error in version script
v5:
* fix header file

Marcin Kerlin (2):
  librte_ether: add protection against overwrite device data
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 37 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 12 +++++
 lib/librte_ether/rte_ether_version.map |  6 +++
 5 files changed, 136 insertions(+), 10 deletions(-)

-- 
1.9.1
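
A rough sketch of the allocation scheme this cover letter describes;
the real patch lives in rte_ethdev.c and differs in detail, so the
structure and names below are stand-ins, with the per-entry lock
being the new field the series adds:

#include <rte_spinlock.h>

#define MAX_PORTS 32		/* stand-in for RTE_MAX_ETHPORTS */

struct dev_data {		/* stand-in for rte_eth_dev_data entries */
	char name[64];		/* empty name == free slot */
	rte_spinlock_t lock;	/* the new field from this series */
};

static struct dev_data dev_data[MAX_PORTS]; /* really in shared memory */

static int
find_free_slot(void)
{
	int i;

	for (i = 0; i < MAX_PORTS; i++) {
		rte_spinlock_lock(&dev_data[i].lock);
		if (dev_data[i].name[0] == '\0') {
			dev_data[i].name[0] = '.';	/* claim before
							 * unlock so another
							 * process cannot take
							 * the same slot */
			rte_spinlock_unlock(&dev_data[i].lock);
			return i;
		}
		rte_spinlock_unlock(&dev_data[i].lock);
	}
	return -1;	/* array full */
}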

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] eal: check cpu flags at init
  2016-09-27 18:32  0%   ` Flavio Leitner
@ 2016-09-29 20:42  0%     ` Aaron Conole
  2016-10-03 14:13  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Aaron Conole @ 2016-09-29 20:42 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dpdk

Flavio Leitner <fbl@sysclose.org> writes:

> On Mon, Sep 26, 2016 at 11:43:37AM -0400, Aaron Conole wrote:
>> My only concern is whether this change would be considered ABI
>> breaking.  I wouldn't think so, since it doesn't seem as though an
>> application would want to call this explicitly (and is spelled out as
>> such), but I can't be sure that it isn't already included in the
>> standard application API, and therefore needs to go through the change
>> process.
>
> I didn't want to change the original behavior more than needed.
>
> I think another patch would be necessary to change the whole EAL
> initialization because there's a bunch of rte_panic() there which
> aren't friendly with callers either.

Okay, makes sense.

Acked-by: Aaron Conole <aconole@redhat.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH V2 1/2] net/virtio: support modern device id
@ 2016-09-28  8:25  4% Jason Wang
  0 siblings, 0 replies; 200+ results
From: Jason Wang @ 2016-09-28  8:25 UTC (permalink / raw)
  To: dev; +Cc: huawei.xie, yuanhan.liu, mst, vkaplans, Jason Wang

Add modern device id and rename VIRTIO_PCI_DEVICEID_MIN to
VIRTIO_PCI_LEGACY_DEVICEID_NET. While at it, remove unused macros too.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 3 ++-
 drivers/net/virtio/virtio_pci.h    | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index ef0d6ee..bb6181d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -103,7 +103,8 @@ static int virtio_dev_queue_stats_mapping_set(
  * The set of PCI devices this driver supports
  */
 static const struct rte_pci_id pci_id_virtio_map[] = {
-	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, VIRTIO_PCI_DEVICEID_MIN) },
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, VIRTIO_PCI_LEGACY_DEVICEID_NET) },
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, VIRTIO_PCI_MODERN_DEVICEID_NET) },
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..3430a39 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -44,8 +44,8 @@ struct virtnet_ctl;
 
 /* VirtIO PCI vendor/device ID. */
 #define VIRTIO_PCI_VENDORID     0x1AF4
-#define VIRTIO_PCI_DEVICEID_MIN 0x1000
-#define VIRTIO_PCI_DEVICEID_MAX 0x103F
+#define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000
+#define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041
 
 /* VirtIO ABI version, this must match exactly. */
 #define VIRTIO_PCI_ABI_VERSION 0
-- 
2.7.4
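
For readers wondering where 0x1041 comes from: the virtio 1.0 spec
gives modern devices a PCI device ID of 0x1040 plus the virtio device
type (network is type 1), while transitional/legacy devices keep the
0x1000-0x103f range that the removed _MIN/_MAX macros described:

/* virtio 1.0: modern ID = 0x1040 + device type */
#define VIRTIO_ID_NETWORK		1
#define VIRTIO_PCI_MODERN_DEVICEID_NET	(0x1040 + VIRTIO_ID_NETWORK) /* 0x1041 */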

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] eal: check cpu flags at init
  2016-09-26 15:43  3% ` Aaron Conole
@ 2016-09-27 18:32  0%   ` Flavio Leitner
  2016-09-29 20:42  0%     ` Aaron Conole
  0 siblings, 1 reply; 200+ results
From: Flavio Leitner @ 2016-09-27 18:32 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dpdk

On Mon, Sep 26, 2016 at 11:43:37AM -0400, Aaron Conole wrote:
> My only concern is whether this change would be considered ABI
> breaking.  I wouldn't think so, since it doesn't seem as though an
> application would want to call this explicitly (and is spelled out as
> such), but I can't be sure that it isn't already included in the
> standard application API, and therefore needs to go through the change
> process.

I didn't want to change the original behavior more than needed.

I think another patch would be necessary to change the whole EAL
initialization, because there are a bunch of rte_panic() calls there
which aren't friendly to callers either.

-- 
fbl

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 0/2] app/testpmd: improve multiprocess support
    2016-09-27 10:29  4% ` [dpdk-dev] [PATCH v4 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
@ 2016-09-27 11:13  4% ` Marcin Kerlin
  1 sibling, 0 replies; 200+ results
From: Marcin Kerlin @ 2016-09-27 11:13 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch, thomas.monjalon, Marcin Kerlin

This patch ensures device data is not overwritten in multiprocess applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries
in free space instead of overwriting existing entries.

2) Changes in the testpmd application allow a secondary process to attach
the mempool created by the primary process rather than create a new one,
and, on quit or force quit, to free its device data from the shared
rte_eth_dev_data[] array.

-------------------------
How to reproduce the bug:

1) Run primary process:
-c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0 --proc-type=primary 
--file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$53 = "3:0.0"

2) Run secondary process:
-c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0 -b 01:00.0 -b 01:00.0
--vdev 'eth_pcap0,rx_pcap=/var/log/device1.pcap,tx_pcap=/var/log/device2.pcap'
--proc-type=secondary --file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$53 = "eth_pcap1"

3) Go back to the primary and re-check:
(gdb) print rte_eth_devices[0].data.name
$54 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$55 = "eth_pcap1"

This means the secondary process overwrites data belonging to the primary process.

This patch fixes it; now if we go back to the primary and re-check,
everything is fine:
(gdb) print rte_eth_devices[0].data.name
$56 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$57 = "3:0.0"

After this fix, the rte_eth_dev_data[] array keeps all entries one
after another instead of overwriting them:

(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = "eth_pcap0"
(gdb) print rte_eth_dev_data[3].name
$55 = "eth_pcap1"
and so on; further entries are appended at the next indexes.

If the secondary process is shut down, its entries are also deleted from the array:
(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = ""
(gdb) print rte_eth_dev_data[3].name
$55 = ""
this also allows indexes 2 and 3 to be reused by the next process.
-------------------------

Breaking ABI:
The changes in librte_ether extend the existing rte_eth_dev_data
structure with a new lock field. The reason is that this structure is
shared between all processes, so it should be protected against
concurrent writes from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join that ABI break, if possible.

v2:
* fix syntax error in version script
v3:
* changed scope of function
* improved description
v4:
* fix syntax error in version script

Marcin Kerlin (2):
  librte_ether: add protection against overwrite device data
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 36 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 24 +++++++++
 lib/librte_ether/rte_ether_version.map |  6 +++
 5 files changed, 147 insertions(+), 10 deletions(-)

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4 0/2] app/testpmd: improve multiprocess support
  @ 2016-09-27 10:29  4% ` Marcin Kerlin
  2016-09-27 11:13  4% ` Marcin Kerlin
  1 sibling, 0 replies; 200+ results
From: Marcin Kerlin @ 2016-09-27 10:29 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch, thomas.monjalon, Marcin Kerlin

This patch ensures that device data is not overwritten in multiprocess
applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries
in free slots instead of overwriting existing entries.

2) Changes in the testpmd application allow a secondary process to attach
to the mempool created by the primary process rather than create a new
one, and, in the case of quit or force quit, to free device data from the
shared rte_eth_dev_data[] array.

-------------------------
How to reproduce the bug:

1) Run primary process:
-c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0 --proc-type=primary 
--file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$53 = "3:0.0"

2) Run secondary process:
-c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0 -b 01:00.0 -b 01:00.0
--vdev 'eth_pcap0,rx_pcap=/var/log/device1.pcap,tx_pcap=/var/log/device2.pcap'
--proc-type=secondary --file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$53 = "eth_pcap1"

3) Go back to the primary and re-check:
(gdb) print rte_eth_devices[0].data.name
$54 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$55 = "eth_pcap1"

This means that the secondary process overwrites the data of the primary process.

This patch fixes it, and now if we go back to the primary and re-check,
everything is fine:
(gdb) print rte_eth_devices[0].data.name
$56 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$57 = "3:0.0"

So after this fix, the rte_eth_dev_data[] array keeps all entries one
after the other instead of overwriting:

(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = "eth_pcap0"
(gdb) print rte_eth_dev_data[3].name
$55 = "eth_pcap1"
and so on; further entries are appended at the next indexes.

If the secondary process is shut down, its entries are also deleted from the array:
(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = ""
(gdb) print rte_eth_dev_data[3].name
$55 = ""
this also allows indexes 2 and 3 to be re-used by the next process
-------------------------

Breaking ABI:
Changes in the librte_ether library extend the existing rte_eth_dev_data
structure with a new field, lock. The reason is that this structure is
shared between all processes, so it should be protected against
simultaneous writes from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join that ABI break, if possible.

v2:
* fix syntax error in version script
v3:
* changed scope of function
* improved description
v4:
* fix syntax error in version script

Marcin Kerlin (2):
  librte_ether: add protection against overwrite device data
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 36 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 24 +++++++++
 lib/librte_ether/rte_ether_version.map |  6 +++
 5 files changed, 147 insertions(+), 10 deletions(-)

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] eal: check cpu flags at init
  @ 2016-09-26 15:43  3% ` Aaron Conole
  2016-09-27 18:32  0%   ` Flavio Leitner
  0 siblings, 1 reply; 200+ results
From: Aaron Conole @ 2016-09-26 15:43 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dpdk

Flavio Leitner <fbl@sysclose.org> writes:

> An application might be linked to DPDK but not really use it,
> so move the cpu flag check to the EAL initialization instead.
>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  lib/librte_eal/bsdapp/eal/eal.c             | 3 +++
>  lib/librte_eal/common/eal_common_cpuflags.c | 6 ------
>  lib/librte_eal/linuxapp/eal/eal.c           | 3 +++
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index a0c8f8c..c4b22af 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -496,6 +496,9 @@ rte_eal_init(int argc, char **argv)
>  	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
>  	char thread_name[RTE_MAX_THREAD_NAME_LEN];
>  
> +	/* checks if the machine is adequate */
> +	rte_cpu_check_supported();
> +

I think it makes sense to return a result here; after all, since this
is no longer a *constructor*, we can actually handle a failure case.

So maybe the following diff:

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index ecb1240..eccf5f8 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -38,15 +38,9 @@
 
 /**
  * Checks if the machine is adequate for running the binary. If it is not, the
- * program exits with status 1.
- * The function attribute forces this function to be called before main(). But
- * with ICC, the check is generated by the compiler.
+ * function returns ENOTSUP.
  */
-#ifndef __INTEL_COMPILER
-void __attribute__ ((__constructor__))
-#else
-void
-#endif
+int
 rte_cpu_check_supported(void)
 {
 	/* This is generated at compile-time by the build system */
@@ -63,14 +57,15 @@ rte_cpu_check_supported(void)
 			fprintf(stderr,
 				"ERROR: CPU feature flag lookup failed with error %d\n",
 				ret);
-			exit(1);
+			return ENOTSUP;
 		}
 		if (!ret) {
 			fprintf(stderr,
 			        "ERROR: This system does not support \"%s\".\n"
 			        "Please check that RTE_MACHINE is set correctly.\n",
 			        rte_cpu_get_flag_name(compile_time_flags[i]));
-			exit(1);
+			return ENOTSUP;
 		}
 	}
+	return 0;
 }
diff --git a/lib/librte_eal/common/include/generic/rte_cpuflags.h b/lib/librte_eal/common/include/generic/rte_cpuflags.h
index 71321f3..6e4eb5a 100644
--- a/lib/librte_eal/common/include/generic/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/generic/rte_cpuflags.h
@@ -79,7 +79,7 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
  * that were specified at compile time. It is called automatically within the
  * EAL, so does not need to be used by applications.
  */
-void
+int
 rte_cpu_check_supported(void);
 
 #endif /* _RTE_CPUFLAGS_H_ */
--

and then change these hunks to:

if (rte_cpu_check_supported() != 0) {
	return -1;
}
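
With that, an application linked against DPDK would get a clean failure
from rte_eal_init() instead of an exit(1) from a constructor before
main() even runs. Roughly (a sketch; the error message is invented):

#include <stdio.h>
#include <rte_eal.h>

int main(int argc, char **argv)
{
	/* rte_eal_init() now reports the unsupported-CPU case as an
	 * ordinary init failure that the caller can handle. */
	if (rte_eal_init(argc, argv) < 0) {
		fprintf(stderr, "EAL init failed (unsupported CPU?)\n");
		return 1;
	}
	/* ... */
	return 0;
}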

My only concern is whether this change would be considered ABI
breaking.  I wouldn't think so, since it doesn't seem as though an
application would want to call this explicitly (and is spelled out as
such), but I can't be sure that it isn't already included in the
standard application API, and therefore needs to go through the change
process.

My $.02

-Aaron

>  	if (!rte_atomic32_test_and_set(&run_once))
>  		return -1;
>  
> diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
> index ecb1240..b5f76f7 100644
> --- a/lib/librte_eal/common/eal_common_cpuflags.c
> +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> @@ -39,14 +39,8 @@
>  /**
>   * Checks if the machine is adequate for running the binary. If it is not, the
>   * program exits with status 1.
> - * The function attribute forces this function to be called before main(). But
> - * with ICC, the check is generated by the compiler.
>   */
> -#ifndef __INTEL_COMPILER
> -void __attribute__ ((__constructor__))
> -#else
>  void
> -#endif
>  rte_cpu_check_supported(void)
>  {
>  	/* This is generated at compile-time by the build system */
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index d5b81a3..4e88cfc 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -740,6 +740,9 @@ rte_eal_init(int argc, char **argv)
>  	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
>  	char thread_name[RTE_MAX_THREAD_NAME_LEN];
>  
> +	/* checks if the machine is adequate */
> +	rte_cpu_check_supported();
> +
>  	if (!rte_atomic32_test_and_set(&run_once))
>  		return -1;

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 0/2] app/testpmd: improve multiprocess support
  @ 2016-09-26 14:53  4% ` Marcin Kerlin
  0 siblings, 0 replies; 200+ results
From: Marcin Kerlin @ 2016-09-26 14:53 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch, thomas.monjalon, Marcin Kerlin

This patch ensures that device data is not overwritten in multiprocess
applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries
in free slots instead of overwriting existing entries.

2) Changes in the testpmd application allow a secondary process to attach
to the mempool created by the primary process rather than create a new
one, and, in the case of quit or force quit, to free device data from the
shared rte_eth_dev_data[] array.

-------------------------
How to reproduce the bug:

1) Run primary process:
-c 0xf -n 4 --socket-mem='512,0' -w 03:00.1 -w 03:00.0 --proc-type=primary 
--file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$53 = "3:0.0"

2) Run secondary process:
-c 0xf0 --socket-mem='512,0' -n 4 -v -b 03:00.1 -b 03:00.0 -b 01:00.0 -b 01:00.0
--vdev 'eth_pcap0,rx_pcap=/var/log/device1.pcap,tx_pcap=/var/log/device2.pcap'
--proc-type=secondary --file-prefix=xz1 -- -i

(gdb) print rte_eth_devices[0].data.name
$52 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$53 = "eth_pcap1"

3) Go back to the primary and re-check:
(gdb) print rte_eth_devices[0].data.name
$54 = "eth_pcap0"
(gdb) print rte_eth_devices[1].data.name
$55 = "eth_pcap1"

This means that the secondary process overwrites the data of the primary process.

This patch fixes it, and now if we go back to the primary and re-check,
everything is fine:
(gdb) print rte_eth_devices[0].data.name
$56 = "3:0.1"
(gdb) print rte_eth_devices[1].data.name
$57 = "3:0.0"

So after this fix, the rte_eth_dev_data[] array keeps all entries one
after the other instead of overwriting:

(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = "eth_pcap0"
(gdb) print rte_eth_dev_data[3].name
$55 = "eth_pcap1"
and so on; further entries are appended at the next indexes.

If the secondary process is shut down, its entries are also deleted from the array:
(gdb) print rte_eth_dev_data[0].name
$52 = "3:0.1"
(gdb) print rte_eth_dev_data[1].name
$53 = "3:0.0"
(gdb) print rte_eth_dev_data[2].name
$54 = ""
(gdb) print rte_eth_dev_data[3].name
$55 = ""
this also allows indexes 2 and 3 to be re-used by the next process
-------------------------

Breaking ABI:
Changes in the librte_ether library extend the existing rte_eth_dev_data
structure with a new field, lock. The reason is that this structure is
shared between all processes, so it should be protected against
simultaneous writes from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join that ABI break, if possible.

v2:
* fix syntax error in version script
v3:
* changed scope of function
* improved description

Marcin Kerlin (2):
  librte_ether: ensure not overwrite device data in mp app
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 36 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 24 +++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++
 5 files changed, 148 insertions(+), 10 deletions(-)

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: announce ABI changes in filtering support
@ 2016-09-23 11:22 13% Your Name
  0 siblings, 0 replies; 200+ results
From: Your Name @ 2016-09-23 11:22 UTC (permalink / raw)
  To: dev; +Cc: Laura Stroe

From: Laura Stroe <laura.stroe@intel.com>

This patch adds a notice that the ABI for the filter types
functionality will be enhanced in the 17.02 release with
a new operation available to manipulate the tunnel filters:
replacing filter types.

Signed-off-by: Laura Stroe <laura.stroe@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1a3831f..1cd1d2c 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -57,3 +57,12 @@ Deprecation Notices
 * API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
   structures. The member ``file_name`` data type will be changed from
   ``char *`` to ``const char *``. This change targets release 16.11.
+
+* In 17.02 ABI changes are planned: the ``rte_filter_op`` enum will be extended
+  with a new member RTE_ETH_FILTER_REPLACE in order to facilitate
+  the new operation - replacing the tunnel filters,
+  the ``rte_eth_tunnel_filter_conf`` structure will be extended with a new field
+  ``filter_type_replace`` handling the bitmask combination of the filter types
+  defined by the values ETH_TUNNEL_FILTER_XX,
+  define new values for Outer VLAN and Outer Ethertype filters
+  ETH_TUNNEL_FILTER_OVLAN and ETH_TUNNEL_FILTER_OETH.
-- 
2.5.5

^ permalink raw reply	[relevance 13%]

* [dpdk-dev] [PATCH v2 1/7] vhost: simplify memory regions handling
  @ 2016-09-23  4:13  3%   ` Yuanhan Liu
    1 sibling, 0 replies; 200+ results
From: Yuanhan Liu @ 2016-09-23  4:13 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Yuanhan Liu

Due to historical reasons (vhost-cuse came before vhost-user), some
fields for maintaining the vhost-user memory mappings (such as the
mmapped address and size, with which we can then unmap on destroy) are
kept in the "orig_region_map" struct, a structure that is defined only
in the vhost-user source file.

The right way to go is to remove that structure and move all those fields
into the virtio_memory_region struct. But we simply couldn't do that
before, because it would have broken the ABI.

Now, thanks to the ABI refactoring, it's no longer a blocking issue.
And here it goes: this patch removes orig_region_map and redefines
virtio_memory_region to include all necessary info.

With that, we can simplify the guest/host address conversion a bit.
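
In other words, for a guest physical address gpa that falls inside a
region reg, the conversion becomes simple per-region arithmetic:

	vva = gpa - reg->guest_phys_addr + reg->host_user_addr

where host_user_addr already accounts for the region's mmap offset.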

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  49 ++++++------
 lib/librte_vhost/vhost_user.c | 173 +++++++++++++++++-------------------------
 2 files changed, 91 insertions(+), 131 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c2dfc3c..df2107b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -143,12 +143,14 @@ struct virtio_net {
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
  */
-struct virtio_memory_regions {
-	uint64_t guest_phys_address;
-	uint64_t guest_phys_address_end;
-	uint64_t memory_size;
-	uint64_t userspace_address;
-	uint64_t address_offset;
+struct virtio_memory_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
 };
 
 
@@ -156,12 +158,8 @@ struct virtio_memory_regions {
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-	/* Base QEMU userspace address of the memory file. */
-	uint64_t base_address;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
 	uint32_t nregions;
-	struct virtio_memory_regions regions[0];
+	struct virtio_memory_region regions[0];
 };
 
 
@@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
 #define MAX_VHOST_DEVICE	1024
 extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
-/**
- * Function to convert guest physical addresses to vhost virtual addresses.
- * This is used to convert guest virtio buffer addresses.
- */
+/* Convert guest physical Address to host virtual address */
 static inline uint64_t __attribute__((always_inline))
-gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
+gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
 {
-	struct virtio_memory_regions *region;
-	uint32_t regionidx;
-	uint64_t vhost_va = 0;
-
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((guest_pa >= region->guest_phys_address) &&
-			(guest_pa <= region->guest_phys_address_end)) {
-			vhost_va = region->address_offset + guest_pa;
-			break;
+	struct virtio_memory_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 struct virtio_net_device_ops const *notify_ops;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index eee99e9..49585b8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 };
 
-struct orig_region_map {
-	int fd;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
-	uint64_t blksz;
-};
-
-#define orig_region(ptr, nregions) \
-	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
-		sizeof(struct virtio_memory) + \
-		sizeof(struct virtio_memory_regions) * (nregions)))
-
 static uint64_t
 get_blk_size(int fd)
 {
@@ -99,18 +87,17 @@ get_blk_size(int fd)
 static void
 free_mem_region(struct virtio_net *dev)
 {
-	struct orig_region_map *region;
-	unsigned int idx;
+	uint32_t i;
+	struct virtio_memory_region *reg;
 
 	if (!dev || !dev->mem)
 		return;
 
-	region = orig_region(dev->mem, dev->mem->nregions);
-	for (idx = 0; idx < dev->mem->nregions; idx++) {
-		if (region[idx].mapped_address) {
-			munmap((void *)(uintptr_t)region[idx].mapped_address,
-					region[idx].mapped_size);
-			close(region[idx].fd);
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr) {
+			munmap(reg->mmap_addr, reg->mmap_size);
+			close(reg->fd);
 		}
 	}
 }
@@ -120,7 +107,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
 {
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 	if (dev->log_addr) {
@@ -286,25 +273,23 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
  * used to convert the ring addresses to our address space.
  */
 static uint64_t
-qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
+qva_to_vva(struct virtio_net *dev, uint64_t qva)
 {
-	struct virtio_memory_regions *region;
-	uint64_t vhost_va = 0;
-	uint32_t regionidx = 0;
+	struct virtio_memory_region *reg;
+	uint32_t i;
 
 	/* Find the region where the address lives. */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((qemu_va >= region->userspace_address) &&
-			(qemu_va <= region->userspace_address +
-			region->memory_size)) {
-			vhost_va = qemu_va + region->guest_phys_address +
-				region->address_offset -
-				region->userspace_address;
-			break;
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+
+		if (qva >= reg->guest_user_addr &&
+		    qva <  reg->guest_user_addr + reg->size) {
+			return qva - reg->guest_user_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 /*
@@ -391,11 +376,13 @@ static int
 vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
 	struct VhostUserMemory memory = pmsg->payload.memory;
-	struct virtio_memory_regions *pregion;
-	uint64_t mapped_address, mapped_size;
-	unsigned int idx = 0;
-	struct orig_region_map *pregion_orig;
+	struct virtio_memory_region *reg;
+	void *mmap_addr;
+	uint64_t mmap_size;
+	uint64_t mmap_offset;
 	uint64_t alignment;
+	uint32_t i;
+	int fd;
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
@@ -405,14 +392,12 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 
-	dev->mem = calloc(1,
-		sizeof(struct virtio_memory) +
-		sizeof(struct virtio_memory_regions) * memory.nregions +
-		sizeof(struct orig_region_map) * memory.nregions);
+	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_region) * memory.nregions, 0);
 	if (dev->mem == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to allocate memory for dev->mem\n",
@@ -421,22 +406,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	}
 	dev->mem->nregions = memory.nregions;
 
-	pregion_orig = orig_region(dev->mem, memory.nregions);
-	for (idx = 0; idx < memory.nregions; idx++) {
-		pregion = &dev->mem->regions[idx];
-		pregion->guest_phys_address =
-			memory.regions[idx].guest_phys_addr;
-		pregion->guest_phys_address_end =
-			memory.regions[idx].guest_phys_addr +
-			memory.regions[idx].memory_size;
-		pregion->memory_size =
-			memory.regions[idx].memory_size;
-		pregion->userspace_address =
-			memory.regions[idx].userspace_addr;
-
-		/* This is ugly */
-		mapped_size = memory.regions[idx].memory_size +
-			memory.regions[idx].mmap_offset;
+	for (i = 0; i < memory.nregions; i++) {
+		fd  = pmsg->fds[i];
+		reg = &dev->mem->regions[i];
+
+		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
+		reg->guest_user_addr = memory.regions[i].userspace_addr;
+		reg->size            = memory.regions[i].memory_size;
+		reg->fd              = fd;
+
+		mmap_offset = memory.regions[i].mmap_offset;
+		mmap_size   = reg->size + mmap_offset;
 
 		/* mmap() without flag of MAP_ANONYMOUS, should be called
 		 * with length argument aligned with hugepagesz at older
@@ -446,67 +426,52 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 		 * to avoid failure, make sure in caller to keep length
 		 * aligned.
 		 */
-		alignment = get_blk_size(pmsg->fds[idx]);
+		alignment = get_blk_size(fd);
 		if (alignment == (uint64_t)-1) {
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"couldn't get hugepage size through fstat\n");
 			goto err_mmap;
 		}
-		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
 
-		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
-			mapped_size,
-			PROT_READ | PROT_WRITE, MAP_SHARED,
-			pmsg->fds[idx],
-			0);
+		mmap_addr = mmap(NULL, mmap_size,
+				 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
-			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
-			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-			mapped_size, memory.regions[idx].mmap_offset,
-			alignment);
-
-		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
+		if (mmap_addr == MAP_FAILED) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"mmap qemu guest failed.\n");
+				"mmap region %u failed.\n", i);
 			goto err_mmap;
 		}
 
-		pregion_orig[idx].mapped_address = mapped_address;
-		pregion_orig[idx].mapped_size = mapped_size;
-		pregion_orig[idx].blksz = alignment;
-		pregion_orig[idx].fd = pmsg->fds[idx];
-
-		mapped_address +=  memory.regions[idx].mmap_offset;
+		reg->mmap_addr = mmap_addr;
+		reg->mmap_size = mmap_size;
+		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
+				      mmap_offset;
 
-		pregion->address_offset = mapped_address -
-			pregion->guest_phys_address;
-
-		if (memory.regions[idx].guest_phys_addr == 0) {
-			dev->mem->base_address =
-				memory.regions[idx].userspace_addr;
-			dev->mem->mapped_address =
-				pregion->address_offset;
-		}
-
-		LOG_DEBUG(VHOST_CONFIG,
-			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
-			idx,
-			(void *)(uintptr_t)pregion->guest_phys_address,
-			(void *)(uintptr_t)pregion->userspace_address,
-			 pregion->memory_size);
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"guest memory region %u, size: 0x%" PRIx64 "\n"
+			"\t guest physical addr: 0x%" PRIx64 "\n"
+			"\t guest virtual  addr: 0x%" PRIx64 "\n"
+			"\t host  virtual  addr: 0x%" PRIx64 "\n"
+			"\t mmap addr : 0x%" PRIx64 "\n"
+			"\t mmap size : 0x%" PRIx64 "\n"
+			"\t mmap align: 0x%" PRIx64 "\n"
+			"\t mmap off  : 0x%" PRIx64 "\n",
+			i, reg->size,
+			reg->guest_phys_addr,
+			reg->guest_user_addr,
+			reg->host_user_addr,
+			(uint64_t)(uintptr_t)mmap_addr,
+			mmap_size,
+			alignment,
+			mmap_offset);
 	}
 
 	return 0;
 
 err_mmap:
-	while (idx--) {
-		munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
-				pregion_orig[idx].mapped_size);
-		close(pregion_orig[idx].fd);
-	}
-	free(dev->mem);
+	free_mem_region(dev);
+	rte_free(dev->mem);
 	dev->mem = NULL;
 	return -1;
 }
-- 
1.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC 0/7] changing mbuf pool handler
  2016-09-19 13:42  2% [dpdk-dev] [RFC 0/7] changing mbuf pool handler Olivier Matz
@ 2016-09-22 11:52  0% ` Hemant Agrawal
  2016-10-03 15:49  0%   ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Hemant Agrawal @ 2016-09-22 11:52 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jerin.jacob, david.hunt

Hi Olivier

On 9/19/2016 7:12 PM, Olivier Matz wrote:
> Hello,
>
> Following discussion from [1] ("usages issue with external mempool").
>
> This is an attempt to make the mempool_ops feature introduced
> by David Hunt [2] more widely used by applications.
>
> It applies on top of a minor fix in mbuf lib [3].
>
> To summarize the needs (please comment if I did not get it right):
>
> - new hw-assisted mempool handlers will soon be introduced
> - to make use of it, the new mempool API [4] (rte_mempool_create_empty,
>   rte_mempool_populate, ...) has to be used
> - the legacy mempool API (rte_mempool_create) does not allow to change
>   the mempool ops. The default is "ring_<s|m>p_<s|m>c" depending on
>   flags.
> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
>   them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
>   ("ring_mp_mc")
> - today, most (if not all) applications and examples use either
>   rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
>   pool, making it difficult to take advantage of this feature with
>   existing apps.
>
> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
> rte_mempool_create(), forcing the applications to use the new API, which
> is more flexible. But after digging a bit, it appeared that
> rte_mempool_create() is widely used, and not only for mbufs. Deprecating
> it would have a big impact on applications, and replacing it with the
> new API would be overkill in many use-cases.

I agree with the proposal.

>
> So I finally tried the following approach (inspired from a suggestion
> Jerin [5]):
>
> - add a new mempool_ops parameter to rte_pktmbuf_pool_create(). This
>   unfortunately breaks the API, but I implemented an ABI compat layer.
>   If the patch is accepted, we could discuss how to announce/schedule
>   the API change.
> - update the applications and documentation to prefer
>   rte_pktmbuf_pool_create() as much as possible
> - update most used examples (testpmd, l2fwd, l3fwd) to add a new command
>   line argument to select the mempool handler
>
> I hope the external applications would then switch to
> rte_pktmbuf_pool_create(), since it supports most of the use-cases (even
> priv_size != 0, since we can call rte_mempool_obj_iter() afterwards).
>

I would still prefer that you add the "rte_mempool_obj_cb_t *obj_cb,
void *obj_cb_arg" arguments to "rte_pktmbuf_pool_create". This single
consolidated wrapper will make it almost certain that applications will
not try to use rte_mempool_create for packet buffers.
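
For illustration, the consolidated wrapper could then look like this (a
hypothetical prototype; the parameter order is an assumption):

struct rte_mempool *
rte_pktmbuf_pool_create(const char *name, unsigned int n,
	unsigned int cache_size, uint16_t priv_size,
	uint16_t data_room_size, int socket_id,
	const char *ops_name,
	rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg);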



> Comments are of course welcome. Note: the patchset is not really
> tested yet.
>
>
> Thanks,
> Olivier
>
> [1] http://dpdk.org/ml/archives/dev/2016-July/044734.html
> [2] http://dpdk.org/ml/archives/dev/2016-June/042423.html
> [3] http://www.dpdk.org/dev/patchwork/patch/15923/
> [4] http://dpdk.org/ml/archives/dev/2016-May/039229.html
> [5] http://dpdk.org/ml/archives/dev/2016-July/044779.html
>
>
> Olivier Matz (7):
>   mbuf: set the handler at mbuf pool creation
>   mbuf: use helper to create the pool
>   testpmd: new parameter to set mbuf pool ops
>   l3fwd: rework long options parsing
>   l3fwd: new parameter to set mbuf pool ops
>   l2fwd: rework long options parsing
>   l2fwd: new parameter to set mbuf pool ops
>
>  app/pdump/main.c                                   |   2 +-
>  app/test-pipeline/init.c                           |   3 +-
>  app/test-pmd/parameters.c                          |   5 +
>  app/test-pmd/testpmd.c                             |  16 +-
>  app/test-pmd/testpmd.h                             |   1 +
>  app/test/test_cryptodev.c                          |   2 +-
>  app/test/test_cryptodev_perf.c                     |   2 +-
>  app/test/test_distributor.c                        |   2 +-
>  app/test/test_distributor_perf.c                   |   2 +-
>  app/test/test_kni.c                                |   2 +-
>  app/test/test_link_bonding.c                       |   2 +-
>  app/test/test_link_bonding_mode4.c                 |   2 +-
>  app/test/test_link_bonding_rssconf.c               |  11 +-
>  app/test/test_mbuf.c                               |   6 +-
>  app/test/test_pmd_perf.c                           |   3 +-
>  app/test/test_pmd_ring.c                           |   2 +-
>  app/test/test_reorder.c                            |   2 +-
>  app/test/test_sched.c                              |   2 +-
>  app/test/test_table.c                              |   2 +-
>  doc/guides/prog_guide/mbuf_lib.rst                 |   2 +-
>  doc/guides/sample_app_ug/ip_reassembly.rst         |  13 +-
>  doc/guides/sample_app_ug/ipv4_multicast.rst        |  12 +-
>  doc/guides/sample_app_ug/l2_forward_job_stats.rst  |  33 ++--
>  .../sample_app_ug/l2_forward_real_virtual.rst      |  26 ++-
>  doc/guides/sample_app_ug/ptpclient.rst             |  12 +-
>  doc/guides/sample_app_ug/quota_watermark.rst       |  26 ++-
>  drivers/net/bonding/rte_eth_bond_8023ad.c          |  13 +-
>  drivers/net/bonding/rte_eth_bond_alb.c             |   2 +-
>  examples/bond/main.c                               |   2 +-
>  examples/distributor/main.c                        |   2 +-
>  examples/dpdk_qat/main.c                           |   3 +-
>  examples/ethtool/ethtool-app/main.c                |   4 +-
>  examples/exception_path/main.c                     |   3 +-
>  examples/ip_fragmentation/main.c                   |   4 +-
>  examples/ip_pipeline/init.c                        |  19 ++-
>  examples/ip_reassembly/main.c                      |  16 +-
>  examples/ipsec-secgw/ipsec-secgw.c                 |   2 +-
>  examples/ipv4_multicast/main.c                     |   6 +-
>  examples/kni/main.c                                |   2 +-
>  examples/l2fwd-cat/l2fwd-cat.c                     |   3 +-
>  examples/l2fwd-crypto/main.c                       |   2 +-
>  examples/l2fwd-jobstats/main.c                     |   2 +-
>  examples/l2fwd-keepalive/main.c                    |   2 +-
>  examples/l2fwd/main.c                              |  36 ++++-
>  examples/l3fwd-acl/main.c                          |   2 +-
>  examples/l3fwd-power/main.c                        |   2 +-
>  examples/l3fwd-vf/main.c                           |   2 +-
>  examples/l3fwd/main.c                              | 180 +++++++++++----------
>  examples/link_status_interrupt/main.c              |   2 +-
>  examples/load_balancer/init.c                      |   2 +-
>  .../client_server_mp/mp_server/init.c              |   3 +-
>  examples/multi_process/l2fwd_fork/main.c           |  14 +-
>  examples/multi_process/symmetric_mp/main.c         |   2 +-
>  examples/netmap_compat/bridge/bridge.c             |   2 +-
>  examples/packet_ordering/main.c                    |   2 +-
>  examples/performance-thread/l3fwd-thread/main.c    |   2 +-
>  examples/ptpclient/ptpclient.c                     |   3 +-
>  examples/qos_meter/main.c                          |   2 +-
>  examples/qos_sched/init.c                          |   2 +-
>  examples/quota_watermark/qw/main.c                 |   2 +-
>  examples/rxtx_callbacks/main.c                     |   2 +-
>  examples/skeleton/basicfwd.c                       |   3 +-
>  examples/tep_termination/main.c                    |  17 +-
>  examples/vhost/main.c                              |   2 +-
>  examples/vhost_xen/main.c                          |   2 +-
>  examples/vmdq/main.c                               |   2 +-
>  examples/vmdq_dcb/main.c                           |   2 +-
>  lib/librte_mbuf/rte_mbuf.c                         |  34 +++-
>  lib/librte_mbuf/rte_mbuf.h                         |  44 +++--
>  lib/librte_mbuf/rte_mbuf_version.map               |   7 +
>  70 files changed, 366 insertions(+), 289 deletions(-)
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] examples/ethtool: enhance ethtool app in i40e
  @ 2016-09-21 10:20  3% ` Remy Horton
  0 siblings, 0 replies; 200+ results
From: Remy Horton @ 2016-09-21 10:20 UTC (permalink / raw)
  To: dev; +Cc: Qiming Yang


On 20/09/2016 08:30, Qiming Yang wrote:
> Currently, when running examples/ethtool, drvinfo cannot show the firmware
> information. From a customer's point of view, it would be better if we
> could show the bus-info and firmware-version, the same as the kernel
> version of ethtool does. We need to add a variable in struct
> rte_eth_dev_info to get the fw version.
> I'm interested in:
> a) whether this approach is appropriate?
> b) whether a change of the API would be preferred?

The approach is nice and clean, but the changes to rte_eth_dev_info are 
an ABI break.

One alternative is to add a new function (and associated driver function 
pointers) to fetch the version string, but my feeling is that if one is 
to go down that road, they may as well generalise it into a 
driver-info-querying mini-API.
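
For example, something along these lines (a hypothetical shape only; the
names and signatures are invented for illustration):

/* new ethdev API */
int rte_eth_dev_fw_version_get(uint8_t port_id,
			       char *fw_version, size_t fw_size);

/* matching driver callback in eth_dev_ops */
typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
				    char *fw_version, size_t fw_size);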

..Remy

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 0/3] add API's for VF management
  2016-09-16 14:15  3%   ` Bernard Iremonger
@ 2016-09-21 10:20  3%     ` Bernard Iremonger
  0 siblings, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-09-21 10:20 UTC (permalink / raw)
  To: dev, rahul.r.shah, wenzhuo.lu, az5157; +Cc: Bernard Iremonger

This patchset contains new DPDK API's requested by AT&T for use
with the Virtual Function Daemon (VFD).

The need to configure and manage VFs on a NIC has grown to the
point where AT&T have developed a DPDK-based tool, VFD, to do this.

This patch set adds API extensions to DPDK VF configuration.

Eight new functions have been added to the eth_dev_ops structure.
Corresponding functions have been added to the ixgbe PMD for the
Intel 82599 NIC.

Changes have been made to testpmd to facilitate testing of the new API's.
The testpmd documentation has been updated to document the testpmd changes.

Note:
Adding new functions to the eth_dev_ops structure will cause an
ABI breakage.

Changes in v4:
rebase to latest master branch.
The rte_eth_dev_vf_ping API has been dropped as it is a workaround for a bug.
The rte_eth_dev_set_vf_vlan_strip API has been renamed to
rte_eth_dev_set_vf_vlan_stripq.

Changes in v3:
rebase to latest master branch.
drop patches for callback functions
revise VF id checks in new librte_ether functions
revise testpmd commands for new API's

Changes in V2:
rebase to latest master branch.
fix compile  error with clang.

Bernard Iremonger (3):
  librte_ether: add API's for VF management
  net/ixgbe: add functions for VF management
  app/test_pmd: add tests for new API's

 app/test-pmd/cmdline.c                      | 644 ++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  62 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c            | 138 ++++++
 lib/librte_ether/rte_ethdev.c               | 169 ++++++++
 lib/librte_ether/rte_ethdev.h               | 195 ++++++++-
 lib/librte_ether/rte_ether_version.map      |  13 +
 6 files changed, 1217 insertions(+), 4 deletions(-)

-- 
2.9.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/2] app/testpmd: improve multiprocess support
    2016-09-20 14:06  4% ` [dpdk-dev] [PATCH v2 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
@ 2016-09-20 14:31  4% ` Marcin Kerlin
  1 sibling, 0 replies; 200+ results
From: Marcin Kerlin @ 2016-09-20 14:31 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, pablo.de.lara.guarch, Marcin Kerlin

This patch ensures that device data is not overwritten in multiprocess
applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries
in free slots instead of overwriting existing entries.

2) Changes in the testpmd application allow a secondary process to attach
to the mempool created by the primary process rather than create a new
one, and, in the case of quit or force quit, to free device data from the
shared rte_eth_dev_data[] array.

Breaking ABI:
Changes in the librte_ether library extend the existing rte_eth_dev_data
structure with a new field, lock. The reason is that this structure is
shared between all processes, so it should be protected against
simultaneous writes from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join that ABI break, if possible.

v2:
* fix syntax error in version script

Marcin Kerlin (2):
  librte_ether: ensure not overwrite device data in mp app
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 36 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 24 +++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++
 5 files changed, 148 insertions(+), 10 deletions(-)

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 0/2] app/testpmd: improve multiprocess support
  @ 2016-09-20 14:06  4% ` Marcin Kerlin
  2016-09-20 14:31  4% ` Marcin Kerlin
  1 sibling, 0 replies; 200+ results
From: Marcin Kerlin @ 2016-09-20 14:06 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, pablo.de.lara.guarch, Marcin Kerlin

This patch ensures that device data is not overwritten in multiprocess
applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries
in free slots instead of overwriting existing entries.

2) Changes in the testpmd application allow a secondary process to attach
to the mempool created by the primary process rather than create a new
one, and, in the case of quit or force quit, to free device data from the
shared rte_eth_dev_data[] array.

Breaking ABI:
Changes in the librte_ether library extend the existing rte_eth_dev_data
structure with a new field, lock. The reason is that this structure is
shared between all processes, so it should be protected against
simultaneous writes from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join that ABI break, if possible.

v2:
* fix syntax error in version script

Marcin Kerlin (2):
  librte_ether: ensure not overwrite device data in mp app
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 36 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 24 +++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++
 5 files changed, 148 insertions(+), 10 deletions(-)

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v11 01/24] eal: remove duplicate function declaration
  @ 2016-09-20 12:41  3%   ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-09-20 12:41 UTC (permalink / raw)
  To: dev
  Cc: viktorin, David Marchand, hemant.agrawal, Thomas Monjalon,
	Shreyansh Jain

From: David Marchand <david.marchand@6wind.com>

rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
introduction.
This function has been exported in ABI, so remove it from eal_private.h

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>
---
 lib/librte_eal/common/eal_private.h | 7 -------
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 19f7535..ca1aec6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -237,13 +237,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);
 
 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index d5b81a3..9412983 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [RFC 0/7] changing mbuf pool handler
@ 2016-09-19 13:42  2% Olivier Matz
  2016-09-22 11:52  0% ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-09-19 13:42 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob, hemant.agrawal, david.hunt

Hello,

Following discussion from [1] ("usages issue with external mempool").

This is an attempt to make the mempool_ops feature introduced
by David Hunt [2] more widely used by applications.

It applies on top of a minor fix in mbuf lib [3].

To summarize the needs (please comment if I did not get it right):

- new hw-assisted mempool handlers will soon be introduced
- to make use of it, the new mempool API [4] (rte_mempool_create_empty,
  rte_mempool_populate, ...) has to be used (see the sketch after this
  list)
- the legacy mempool API (rte_mempool_create) does not allow to change
  the mempool ops. The default is "ring_<s|m>p_<s|m>c" depending on
  flags.
- the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
  them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
  ("ring_mp_mc")
- today, most (if not all) applications and examples use either
  rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
  pool, making it difficult to take advantage of this feature with
  existing apps.
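
For reference, the flexible path that the legacy helpers cannot reach
today looks roughly like this (a sketch using the new mempool API;
"my_hw_handler" is a placeholder handler name, error checks omitted):

	mp = rte_mempool_create_empty("mbuf_pool", 8192,
		sizeof(struct rte_mbuf) + RTE_MBUF_DEFAULT_BUF_SIZE,
		256, sizeof(struct rte_pktmbuf_pool_private),
		rte_socket_id(), 0);
	rte_mempool_set_ops_byname(mp, "my_hw_handler", NULL);
	rte_pktmbuf_pool_init(mp, NULL);
	rte_mempool_populate_default(mp);
	rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);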

My initial idea was to deprecate both rte_pktmbuf_pool_create() and
rte_mempool_create(), forcing the applications to use the new API, which
is more flexible. But after digging a bit, it appeared that
rte_mempool_create() is widely used, and not only for mbufs. Deprecating
it would have a big impact on applications, and replacing it with the
new API would be overkill in many use-cases.

So I finally tried the following approach (inspired from a suggestion
Jerin [5]):

- add a new mempool_ops parameter to rte_pktmbuf_pool_create(). This
  unfortunately breaks the API, but I implemented an ABI compat layer.
  If the patch is accepted, we could discuss how to announce/schedule
  the API change.
- update the applications and documentation to prefer
  rte_pktmbuf_pool_create() as much as possible
- update most used examples (testpmd, l2fwd, l3fwd) to add a new command
  line argument to select the mempool handler

I hope the external applications would then switch to
rte_pktmbuf_pool_create(), since it supports most of the use-cases (even
priv_size != 0, since we can call rte_mempool_obj_iter() afterwards).
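
For example, selecting a hw-assisted handler at pool creation would then
look like this (a sketch based on the prototype proposed in this series;
"my_hw_handler" is again a placeholder name):

	mp = rte_pktmbuf_pool_create("mbuf_pool", 8192, 256, 0,
		RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id(),
		"my_hw_handler");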

Comments are of course welcome. Note: the patchset is not really
tested yet.


Thanks,
Olivier

[1] http://dpdk.org/ml/archives/dev/2016-July/044734.html
[2] http://dpdk.org/ml/archives/dev/2016-June/042423.html
[3] http://www.dpdk.org/dev/patchwork/patch/15923/
[4] http://dpdk.org/ml/archives/dev/2016-May/039229.html
[5] http://dpdk.org/ml/archives/dev/2016-July/044779.html


Olivier Matz (7):
  mbuf: set the handler at mbuf pool creation
  mbuf: use helper to create the pool
  testpmd: new parameter to set mbuf pool ops
  l3fwd: rework long options parsing
  l3fwd: new parameter to set mbuf pool ops
  l2fwd: rework long options parsing
  l2fwd: new parameter to set mbuf pool ops

 app/pdump/main.c                                   |   2 +-
 app/test-pipeline/init.c                           |   3 +-
 app/test-pmd/parameters.c                          |   5 +
 app/test-pmd/testpmd.c                             |  16 +-
 app/test-pmd/testpmd.h                             |   1 +
 app/test/test_cryptodev.c                          |   2 +-
 app/test/test_cryptodev_perf.c                     |   2 +-
 app/test/test_distributor.c                        |   2 +-
 app/test/test_distributor_perf.c                   |   2 +-
 app/test/test_kni.c                                |   2 +-
 app/test/test_link_bonding.c                       |   2 +-
 app/test/test_link_bonding_mode4.c                 |   2 +-
 app/test/test_link_bonding_rssconf.c               |  11 +-
 app/test/test_mbuf.c                               |   6 +-
 app/test/test_pmd_perf.c                           |   3 +-
 app/test/test_pmd_ring.c                           |   2 +-
 app/test/test_reorder.c                            |   2 +-
 app/test/test_sched.c                              |   2 +-
 app/test/test_table.c                              |   2 +-
 doc/guides/prog_guide/mbuf_lib.rst                 |   2 +-
 doc/guides/sample_app_ug/ip_reassembly.rst         |  13 +-
 doc/guides/sample_app_ug/ipv4_multicast.rst        |  12 +-
 doc/guides/sample_app_ug/l2_forward_job_stats.rst  |  33 ++--
 .../sample_app_ug/l2_forward_real_virtual.rst      |  26 ++-
 doc/guides/sample_app_ug/ptpclient.rst             |  12 +-
 doc/guides/sample_app_ug/quota_watermark.rst       |  26 ++-
 drivers/net/bonding/rte_eth_bond_8023ad.c          |  13 +-
 drivers/net/bonding/rte_eth_bond_alb.c             |   2 +-
 examples/bond/main.c                               |   2 +-
 examples/distributor/main.c                        |   2 +-
 examples/dpdk_qat/main.c                           |   3 +-
 examples/ethtool/ethtool-app/main.c                |   4 +-
 examples/exception_path/main.c                     |   3 +-
 examples/ip_fragmentation/main.c                   |   4 +-
 examples/ip_pipeline/init.c                        |  19 ++-
 examples/ip_reassembly/main.c                      |  16 +-
 examples/ipsec-secgw/ipsec-secgw.c                 |   2 +-
 examples/ipv4_multicast/main.c                     |   6 +-
 examples/kni/main.c                                |   2 +-
 examples/l2fwd-cat/l2fwd-cat.c                     |   3 +-
 examples/l2fwd-crypto/main.c                       |   2 +-
 examples/l2fwd-jobstats/main.c                     |   2 +-
 examples/l2fwd-keepalive/main.c                    |   2 +-
 examples/l2fwd/main.c                              |  36 ++++-
 examples/l3fwd-acl/main.c                          |   2 +-
 examples/l3fwd-power/main.c                        |   2 +-
 examples/l3fwd-vf/main.c                           |   2 +-
 examples/l3fwd/main.c                              | 180 +++++++++++----------
 examples/link_status_interrupt/main.c              |   2 +-
 examples/load_balancer/init.c                      |   2 +-
 .../client_server_mp/mp_server/init.c              |   3 +-
 examples/multi_process/l2fwd_fork/main.c           |  14 +-
 examples/multi_process/symmetric_mp/main.c         |   2 +-
 examples/netmap_compat/bridge/bridge.c             |   2 +-
 examples/packet_ordering/main.c                    |   2 +-
 examples/performance-thread/l3fwd-thread/main.c    |   2 +-
 examples/ptpclient/ptpclient.c                     |   3 +-
 examples/qos_meter/main.c                          |   2 +-
 examples/qos_sched/init.c                          |   2 +-
 examples/quota_watermark/qw/main.c                 |   2 +-
 examples/rxtx_callbacks/main.c                     |   2 +-
 examples/skeleton/basicfwd.c                       |   3 +-
 examples/tep_termination/main.c                    |  17 +-
 examples/vhost/main.c                              |   2 +-
 examples/vhost_xen/main.c                          |   2 +-
 examples/vmdq/main.c                               |   2 +-
 examples/vmdq_dcb/main.c                           |   2 +-
 lib/librte_mbuf/rte_mbuf.c                         |  34 +++-
 lib/librte_mbuf/rte_mbuf.h                         |  44 +++--
 lib/librte_mbuf/rte_mbuf_version.map               |   7 +
 70 files changed, 366 insertions(+), 289 deletions(-)

-- 
2.8.1

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 0/3] add API's for VF management
  2016-08-26  9:10  3% ` [dpdk-dev] [RFC PATCH v2 " Bernard Iremonger
  2016-09-09  8:49  0%   ` Pattan, Reshma
  2016-09-16 11:05  3%   ` [dpdk-dev] [PATCH v3 0/3] " Bernard Iremonger
@ 2016-09-16 14:15  3%   ` Bernard Iremonger
  2016-09-21 10:20  3%     ` [dpdk-dev] [PATCH v4 " Bernard Iremonger
  2 siblings, 1 reply; 200+ results
From: Bernard Iremonger @ 2016-09-16 14:15 UTC (permalink / raw)
  To: dev, rahul.r.shah, wenzhuo.lu, az5157; +Cc: Bernard Iremonger

This patchset contains new DPDK API's requested by AT&T for use
with the Virtual Function Daemon (VFD).

The need to configure and manage VFs on a NIC has grown to the
point where AT&T have developed a DPDK-based tool, VFD, to do this.

This patch set adds API extensions to DPDK VF configuration.

Nine new functions have been added to the eth_dev_ops structure.
Corresponding functions have been added to the ixgbe PMD for the
Intel 82599 NIC.

Changes have been made to testpmd to facilitate testing of the new API's.
The testpmd documentation has been updated to document the testpmd changes.

Note:
Adding new functions to the eth_dev_ops structure will cause an
ABI breakage.

Changes in v3:
rebase to latest master branch.
drop patches for callback functions
revise VF id checks in new librte_ether functions
revise testpmd commands for new API's

Changes in V2:
rebase to latest master branch.
fix compile  error with clang.

Bernard Iremonger (3):
  librte_ether: add API's for VF management
  net/ixgbe: add functions for VF management
  app/test_pmd: add tests for new API's

 app/test-pmd/cmdline.c                      | 707 ++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  70 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c            | 166 +++++++
 lib/librte_ether/rte_ethdev.c               | 192 ++++++++
 lib/librte_ether/rte_ethdev.h               | 217 ++++++++-
 lib/librte_ether/rte_ether_version.map      |  14 +
 6 files changed, 1362 insertions(+), 4 deletions(-)

-- 
2.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v10 02/25] eal: remove duplicate function declaration
  2016-09-16  4:29  3%   ` [dpdk-dev] [PATCH v10 02/25] eal: remove duplicate function declaration Shreyansh Jain
@ 2016-09-16 11:42  0%     ` Jan Viktorin
  0 siblings, 0 replies; 200+ results
From: Jan Viktorin @ 2016-09-16 11:42 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, David Marchand, hemant.agrawal, Thomas Monjalon

On Fri, 16 Sep 2016 09:59:37 +0530
Shreyansh Jain <shreyansh.jain@nxp.com> wrote:

> From: David Marchand <david.marchand@6wind.com>
> 
> rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
> introduction.
> This function has been exported in ABI, so remove it from eal_private.h
> 
> Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
> 
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 0/3] add API's for VF management
  2016-08-26  9:10  3% ` [dpdk-dev] [RFC PATCH v2 " Bernard Iremonger
  2016-09-09  8:49  0%   ` Pattan, Reshma
@ 2016-09-16 11:05  3%   ` Bernard Iremonger
  2016-09-16 14:15  3%   ` Bernard Iremonger
  2 siblings, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-09-16 11:05 UTC (permalink / raw)
  To: dev, rahul.r.shah, wenzhuo.lu, az5157; +Cc: Bernard Iremonger

This patchset contains new DPDK API's requested by AT&T for use
with the Virtual Function Daemon (VFD).

The need to configure and manage VFs on a NIC has grown to the
point where AT&T have developed a DPDK-based tool, VFD, to do this.

This patch set adds API extensions to DPDK VF configuration.

Nine new functions have been added to the eth_dev_ops structure.
Corresponding functions have been added to the ixgbe PMD for the
Intel 82599 NIC.

Changes have been made to testpmd to facilitate testing of the new API's.
The testpmd documentation has been updated to document the testpmd changes.

Note:
Adding new functions to the eth_dev_ops structure will cause an
ABI breakage.

Changes in v3:
rebase to latest master branch.
drop patches for callback functions
revise VF id checks in new librte_ether functions
revise testpmd commands for new API's

Changes in V2:
rebase to latest master branch.
fix compile  error with clang.

Bernard Iremonger (3):
  librte_ether: add API's for VF management
  net/ixgbe: add functions for VF management
  app/test_pmd: add tests for new API's

 app/test-pmd/cmdline.c                      | 707 ++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  70 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c            | 166 +++++++
 lib/librte_ether/rte_ethdev.c               | 192 ++++++++
 lib/librte_ether/rte_ethdev.h               | 217 ++++++++-
 lib/librte_ether/rte_ether_version.map      |  14 +
 6 files changed, 1362 insertions(+), 4 deletions(-)

-- 
2.9.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v10 02/25] eal: remove duplicate function declaration
  @ 2016-09-16  4:29  3%   ` Shreyansh Jain
  2016-09-16 11:42  0%     ` Jan Viktorin
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-09-16  4:29 UTC (permalink / raw)
  To: dev
  Cc: viktorin, David Marchand, hemant.agrawal, Thomas Monjalon,
	Shreyansh Jain

From: David Marchand <david.marchand@6wind.com>

rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
introduction.
This function has been exported in ABI, so remove it from eal_private.h

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 lib/librte_eal/common/eal_private.h | 7 -------
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 19f7535..ca1aec6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -237,13 +237,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);
 
 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index d5b81a3..9412983 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] scripts: disable optimization for ABI validation
  2016-08-26 15:06 19% [dpdk-dev] [PATCH] scripts: disable optimization for ABI validation Ferruh Yigit
@ 2016-09-15 14:23  4% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-09-15 14:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Neil Horman

2016-08-26 16:06, Ferruh Yigit:
> abi-dumper gives the following warning:
> WARNING: incompatible build option detected: -O3
> 
> Although this patch won't fix the warning, it ensures the code is compiled
> with optimization disabled.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] vhost: change the vhost library to a common framework which can support more VIRTIO devices
  2016-09-13 13:24  0%   ` Thomas Monjalon
@ 2016-09-13 13:49  3%     ` Yuanhan Liu
  0 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2016-09-13 13:49 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Changpeng Liu, dev, james.r.harris

On Tue, Sep 13, 2016 at 03:24:53PM +0200, Thomas Monjalon wrote:
> 2016-09-13 20:58, Yuanhan Liu:
> > rte_virtio_net.h is the header file that will be exported for applications.
> > Every change there would mean either an API or ABI breakage. Thus, we
> > should try to avoid touching it. That's not to mention that you added yet
> > another header file, rte_virtio_dev.h.
> > 
> > I confess that the rte_virtio_net.h filename isn't quite right: it sticks
> > to virtio-net too tightly. We could perhaps rename it to rte_vhost.h, but I
> > doubt it's worthwhile: as said, it breaks the API.
> > 
> > The fortunate thing about this file is that the content is actually not
> > tied to virtio-net too much. I mean, all the APIs are using the "vid",
> > which is just a number. Well, except the virtio_net_device_ops structure,
> > which should also be renamed to vhost_device_ops. Besides that, the
> > three ops, "new_device", "destroy_device" and "vring_state_changed", are
> > actually not limited to the virtio-net device.
> > 
> > That is to say, we could have two options here:
> > 
> > - rename the header file and the structure properly, so they are not
> >   limited to virtio-net
> > 
> > - live with it, let it be a legacy issue, and document it somewhere,
> >   say, "due to historical reasons (virtio-net being the first one
> >   supported in DPDK), we kept the header filename as rte_virtio_net.h,
> >   but not ..."
> > 
> > I personally would prefer the latter one, which saves us from breaking
> > applications again. I don't have a strong objection to the first one though.
> > 
> > Thomas, any comments?
> 
> I don't think keeping broken names for historical reasons is good
> long-term maintenance.

Good point.

> It could be a FIXME comment that we would fix when we have other reasons
> to break the API.
> However, in this case, it is easy to keep compatibility, I think,
> by including rte_virtio.h in rte_virtio_net.h.

Nice trick!
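
For reference, a minimal sketch of such a compatibility shim (assuming the
new header were named rte_virtio.h):

	/* rte_virtio_net.h -- kept only for backward compatibility */
	#ifndef _RTE_VIRTIO_NET_H_
	#define _RTE_VIRTIO_NET_H_
	#include <rte_virtio.h>	/* everything now lives in the new header */
	#endif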

> Note: renames can also generally be managed with symlinks.
> 
> I also don't really understand why this file name is rte_virtio_net.h and
> not rte_vhost_net.h.

No idea. It has just been named that way since the beginning. Also, I missed
this file while doing the ABI/API refactoring; otherwise, there would be no
pain at this stage.

	--yliu

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] vhost: change the vhost library to a common framework which can support more VIRTIO devices
  2016-09-13 12:58  3% ` Yuanhan Liu
@ 2016-09-13 13:24  0%   ` Thomas Monjalon
  2016-09-13 13:49  3%     ` Yuanhan Liu
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-09-13 13:24 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Changpeng Liu, dev, james.r.harris

2016-09-13 20:58, Yuanhan Liu:
> rte_virtio_net.h is the header file that will be exported for applications.
> Every change there would mean either an API or ABI breakage. Thus, we
> should try to avoid touching it. That's not to mention that you added yet
> another header file, rte_virtio_dev.h.
> 
> I confess that the rte_virtio_net.h filename isn't quite right: it sticks
> to virtio-net too tightly. We could perhaps rename it to rte_vhost.h, but I
> doubt it's worthwhile: as said, it breaks the API.
> 
> The fortunate thing about this file is that the content is actually not
> tied to virtio-net too much. I mean, all the APIs are using the "vid",
> which is just a number. Well, except the virtio_net_device_ops structure,
> which should also be renamed to vhost_device_ops. Besides that, the
> three ops, "new_device", "destroy_device" and "vring_state_changed", are
> actually not limited to the virtio-net device.
> 
> That is to say, we could have two options here:
> 
> - rename the header file and the structure properly, so they are not
>   limited to virtio-net
> 
> - live with it, let it be a legacy issue, and document it somewhere,
>   say, "due to historical reasons (virtio-net being the first one
>   supported in DPDK), we kept the header filename as rte_virtio_net.h,
>   but not ..."
> 
> I personally would prefer the latter one, which saves us from breaking
> applications again. I don't have a strong objection to the first one though.
> 
> Thomas, any comments?

I don't think keeping broken names for historical reasons is good
long-term maintenance.
It could be a FIXME comment that we would fix when we have other reasons
to break the API.
However, in this case, it is easy to keep compatibility, I think,
by including rte_virtio.h in rte_virtio_net.h.
Note: renames can also generally be managed with symlinks.

I also don't really understand why this file name is rte_virtio_net.h and
not rte_vhost_net.h.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] vhost: change the vhost library to a common framework which can support more VIRTIO devices
  @ 2016-09-13 12:58  3% ` Yuanhan Liu
  2016-09-13 13:24  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Yuanhan Liu @ 2016-09-13 12:58 UTC (permalink / raw)
  To: Changpeng Liu; +Cc: dev, james.r.harris, Thomas Monjalon

On Wed, Sep 14, 2016 at 08:15:00PM +0800, Changpeng Liu wrote:
> For storage virtualization use cases, vhost-scsi is becoming a more popular
> solution to support VMs. However, a user space vhost-scsi-user solution
> does not currently exist. SPDK (Storage Performance Development Kit,
> https://github.com/spdk/spdk) will provide a user space vhost-scsi target
> to support multiple VMs through Qemu. SPDK is built on top of DPDK
> libraries from the start, so we would like to use the DPDK vhost library
> as the communication channel between Qemu and the vhost-scsi target
> application.
> 
> Currently the DPDK vhost library can only support the VIRTIO_ID_NET device
> type; we would like to extend the library to support VIRTIO_ID_SCSI and
> VIRTIO_ID_BLK. Most of the DPDK vhost library can be reused, with only
> several differences:
> 1. the VIRTIO SCSI device has different vring queues compared with the
> VIRTIO NET device; at least 3 vring queues are needed for the SCSI device
> type;
> 2. VIRTIO SCSI will need several extra message operation codes, such as
> SCSI_SET_ENDPOINT/SCSI_CLEAR_ENDPOINT;
> 
> First, we would like to extend the DPDK vhost library into a common framework

I don't see how common it becomes with this patch applied.

> that is friendly to adding other VIRTIO device types. To implement this
> feature, we add a new data structure, virtio_dev, which can deliver socket
> messages to different VIRTIO devices; each specific VIRTIO device will
> register callbacks with virtio_dev.
> 
> Secondly, we would like to upstream a patch to the Qemu community to add
> vhost-scsi specific operation commands such as SCSI_SET_ENDPOINT and
> SCSI_CLEAR_ENDPOINT, and user space feature bits.
> 
> Finally, after the Qemu patch set is merged, we will add VIRTIO_ID_SCSI
> support to the DPDK vhost library

You actually should send this part out with this patchset. You are making
changes for adding the vhost-scsi support; however, you don't show us what
the code to support vhost-scsi looks like. That makes it hard for us to
understand why you are making those changes.

What I said is that DPDK will not consider merging vhost-scsi patches unless
QEMU has merged the vhost-scsi part. This doesn't mean you can't send
out the DPDK vhost-scsi patches before that.

> and an example vhost-scsi target which can
> add a SCSI device to a VM through this example application.
> 
> This patch set changes the vhost library into a common framework to which
> other VIRTIO device types can be added in the future.
> 
> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> ---
>  lib/librte_vhost/Makefile         |   4 +-
>  lib/librte_vhost/rte_virtio_dev.h | 140 ++++++++
>  lib/librte_vhost/rte_virtio_net.h |  97 +-----

rte_virtio_net.h is the header file that will be exported for applications.
Every change there would mean either an API or ABI breakage. Thus, we
should try to avoid touching it. That's not to mention that you added yet
another header file, rte_virtio_dev.h.

I confess that the rte_virtio_net.h filename isn't quite right: it sticks
to virtio-net too tightly. We could perhaps rename it to rte_vhost.h, but I
doubt it's worthwhile: as said, it breaks the API.

The fortunate thing about this file is that the content is actually not
tied to virtio-net too much. I mean, all the APIs are using the "vid",
which is just a number. Well, except the virtio_net_device_ops structure,
which should also be renamed to vhost_device_ops. Besides that, the
three ops, "new_device", "destroy_device" and "vring_state_changed", are
actually not limited to the virtio-net device.
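
For context, the ops structure in question looks roughly like this in the
current header; only the name ties it to virtio-net, not the signatures:

	struct virtio_net_device_ops {
		int  (*new_device)(int vid);		/* device is ready */
		void (*destroy_device)(int vid);	/* device is removed */
		int  (*vring_state_changed)(int vid, uint16_t queue_id,
					    int enable);
	};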

That is to say, we could have two options here:

- rename the header file and the structure properly, so they are not
  limited to virtio-net

- live with it, let it be a legacy issue, and document it somewhere,
  say, "due to historical reasons (virtio-net being the first one supported
  in DPDK), we kept the header filename as rte_virtio_net.h, but not ..."

I personally would prefer the latter one, which saves us from breaking
applications again. I don't have a strong objection to the first one though.

Thomas, any comments?

>  lib/librte_vhost/socket.c         |   6 +-
>  lib/librte_vhost/vhost.c          | 421 ------------------------
>  lib/librte_vhost/vhost.h          | 288 -----------------
>  lib/librte_vhost/vhost_device.h   | 230 +++++++++++++
>  lib/librte_vhost/vhost_net.c      | 659 ++++++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/vhost_net.h      | 126 ++++++++
>  lib/librte_vhost/vhost_user.c     | 451 +++++++++++++-------------
>  lib/librte_vhost/vhost_user.h     |  17 +-
>  lib/librte_vhost/virtio_net.c     |  37 ++-

That basically means you are heading the wrong way. For example,

> +struct virtio_dev_table {
> +	int (*vhost_dev_ready)(struct virtio_dev *dev);
> +	struct vhost_virtqueue* (*vhost_dev_get_queues)(struct virtio_dev *dev, uint16_t queue_id);
> +	void (*vhost_dev_cleanup)(struct virtio_dev *dev, int destroy);
> +	void (*vhost_dev_free)(struct virtio_dev *dev);
> +	void (*vhost_dev_reset)(struct virtio_dev *dev);
> +	uint64_t (*vhost_dev_get_features)(struct virtio_dev *dev);
> +	int (*vhost_dev_set_features)(struct virtio_dev *dev, uint64_t features);
> +	uint64_t (*vhost_dev_get_protocol_features)(struct virtio_dev *dev);
> +	int (*vhost_dev_set_protocol_features)(struct virtio_dev *dev, uint64_t features);
> +	uint32_t (*vhost_dev_get_default_queue_num)(struct virtio_dev *dev);
> +	uint32_t (*vhost_dev_get_queue_num)(struct virtio_dev *dev);
> +	uint16_t (*vhost_dev_get_avail_entries)(struct virtio_dev *dev, uint16_t queue_id);
> +	int (*vhost_dev_get_vring_base)(struct virtio_dev *dev, struct vhost_virtqueue *vq);
> +	int (*vhost_dev_set_vring_num)(struct virtio_dev *dev, struct vhost_virtqueue *vq);
> +	int (*vhost_dev_set_vring_call)(struct virtio_dev *dev, struct vhost_vring_file *file);
> +	int (*vhost_dev_set_log_base)(struct virtio_dev *dev, int fd, uint64_t size, uint64_t off);
> +};

This looks wrong. Most of them (if not all) should be the same, regardless
of whether it's for virtio-net or virtio-scsi. I don't understand why you
should even touch this. Those are for handling *vhost-user* messages, not
virtio-net nor virtio-scsi ones. They should be the same no matter which
virtio device we are dealing with. Well, virtio-scsi may just have a few
more messages than virtio-net.

	--yliu

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC PATCH v2 0/5] add API's for VF management
  2016-08-26  9:10  3% ` [dpdk-dev] [RFC PATCH v2 " Bernard Iremonger
@ 2016-09-09  8:49  0%   ` Pattan, Reshma
  2016-09-16 11:05  3%   ` [dpdk-dev] [PATCH v3 0/3] " Bernard Iremonger
  2016-09-16 14:15  3%   ` Bernard Iremonger
  2 siblings, 0 replies; 200+ results
From: Pattan, Reshma @ 2016-09-09  8:49 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh
  Cc: Iremonger, Bernard, Shah, Rahul R, Lu, Wenzhuo, dev

Hi Thomas and Ferruh,

Can you take a look and provide comments on the ixgbe driver and ethdev changes?

Thanks,
Reshma

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bernard Iremonger
> Sent: Friday, August 26, 2016 10:10 AM
> To: Shah, Rahul R <rahul.r.shah@intel.com>; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; dev@dpdk.org
> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>
> Subject: [dpdk-dev] [RFC PATCH v2 0/5] add API's for VF management
> 
> This RFC patchset contains new DPDK APIs requested by AT&T for use with the
> Virtual Function Daemon (VFD).
> 
> The need to configure and manage VFs on a NIC has grown to the point where
> AT&T have developed a DPDK-based tool, VFD, to do this.
> 
> This RFC proposes to add the following API extensions to DPDK:
>   mailbox communication callback support
>   VF configuration
> 
> Nine new functions have been added to the eth_dev_ops structure.
> Corresponding functions have been added to the ixgbe PMD for the Niantic NIC.
> 
> Two new callback functions have been added.
> Changes have been made to the ixgbe_rcv_msg_from_vf function to use the
> callback functions.
> 
> Changes have been made to testpmd to facilitate testing of the new APIs.
> The testpmd documentation has been updated to document the testpmd
> changes.
> 
> Note:
> Adding new functions to the eth_dev_ops structure will cause an ABI breakage.
> 
> Changes in V2:
> rebase to latest master branch.
> fix compile error with clang.
> 
> Bernard Iremonger (5):
>   librte_ether: add internal callback functions
>   net/ixgbe: add callback to user app on VF to PF mbox msg
>   librte_ether: add API's for VF management
>   net/ixgbe: add functions for VF management
>   app/test_pmd: add tests for new API's
> 
>  app/test-pmd/cmdline.c                      | 700 ++++++++++++++++++++++++++++
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  68 ++-
>  drivers/net/ixgbe/ixgbe_ethdev.c            | 179 +++++++
>  drivers/net/ixgbe/ixgbe_pf.c                |  39 +-
>  lib/librte_ether/rte_ethdev.c               | 176 +++++++
>  lib/librte_ether/rte_ethdev.h               | 284 +++++++++++
>  lib/librte_ether/rte_ether_version.map      |  16 +
>  7 files changed, 1455 insertions(+), 7 deletions(-)
> 
> --
> 2.9.0

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5 05/10] lib: work around unnamed structs/unions
  @ 2016-09-08 12:25  3%   ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-09-08 12:25 UTC (permalink / raw)
  To: dev

Exported header files used by applications should allow the strictest
compiler flags. Language extensions used in many places must be explicitly
marked to avoid warnings and compilation failures.

Unnamed structs/unions have been allowed since C11; however, many compiler
versions do not use this mode by default.

This commit prevents the following errors:

 error: ISO C99 doesn't support unnamed structs/unions
 error: struct has no named members
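
For instance, this is the kind of construct being annotated throughout the
patch (a sketch):

	struct example {
		uint32_t type;
		RTE_STD_C11
		union {			/* unnamed union: a C11 feature */
			uint64_t u64;
			void *ptr;
		};
	};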

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_cryptodev/rte_crypto.h                             | 2 ++
 lib/librte_cryptodev/rte_crypto_sym.h                         | 3 +++
 lib/librte_cryptodev/rte_cryptodev.h                          | 4 ++++
 lib/librte_cryptodev/rte_cryptodev_pmd.h                      | 2 ++
 lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h        | 2 ++
 lib/librte_eal/common/include/arch/x86/rte_atomic_32.h        | 3 +++
 lib/librte_eal/common/include/arch/x86/rte_cycles.h           | 2 ++
 lib/librte_eal/common/include/rte_common.h                    | 7 +++++++
 lib/librte_eal/common/include/rte_devargs.h                   | 1 +
 lib/librte_eal/common/include/rte_interrupts.h                | 2 ++
 lib/librte_eal/common/include/rte_memory.h                    | 1 +
 lib/librte_eal/common/include/rte_memzone.h                   | 2 ++
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h | 1 +
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
 lib/librte_hash/rte_thash.h                                   | 3 +++
 lib/librte_lpm/rte_lpm.h                                      | 1 +
 lib/librte_mbuf/rte_mbuf.h                                    | 5 +++++
 lib/librte_mempool/rte_mempool.h                              | 2 ++
 lib/librte_pipeline/rte_pipeline.h                            | 2 ++
 lib/librte_timer/rte_timer.h                                  | 2 ++
 20 files changed, 51 insertions(+)

diff --git a/lib/librte_cryptodev/rte_crypto.h b/lib/librte_cryptodev/rte_crypto.h
index 5bc3eaa..9019518 100644
--- a/lib/librte_cryptodev/rte_crypto.h
+++ b/lib/librte_cryptodev/rte_crypto.h
@@ -48,6 +48,7 @@ extern "C" {
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
+#include <rte_common.h>
 
 #include "rte_crypto_sym.h"
 
@@ -111,6 +112,7 @@ struct rte_crypto_op {
 	void *opaque_data;
 	/**< Opaque pointer for user data */
 
+	RTE_STD_C11
 	union {
 		struct rte_crypto_sym_op *sym;
 		/**< Symmetric operation parameters */
diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
index d9bd821..8178e5a 100644
--- a/lib/librte_cryptodev/rte_crypto_sym.h
+++ b/lib/librte_cryptodev/rte_crypto_sym.h
@@ -51,6 +51,7 @@ extern "C" {
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
+#include <rte_common.h>
 
 
 /** Symmetric Cipher Algorithms */
@@ -333,6 +334,7 @@ struct rte_crypto_sym_xform {
 	/**< next xform in chain */
 	enum rte_crypto_sym_xform_type type
 	; /**< xform type */
+	RTE_STD_C11
 	union {
 		struct rte_crypto_auth_xform auth;
 		/**< Authentication / hash xform */
@@ -371,6 +373,7 @@ struct rte_crypto_sym_op {
 
 	enum rte_crypto_sym_op_sess_type sess_type;
 
+	RTE_STD_C11
 	union {
 		struct rte_cryptodev_sym_session *session;
 		/**< Handle for the initialised session context */
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index 957bdd7..cf28541 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -48,6 +48,7 @@ extern "C" {
 #include "rte_kvargs.h"
 #include "rte_crypto.h"
 #include "rte_dev.h"
+#include <rte_common.h>
 
 #define CRYPTODEV_NAME_NULL_PMD		cryptodev_null_pmd
 /**< Null crypto PMD device name */
@@ -104,6 +105,7 @@ extern const char **rte_cyptodev_names;
 struct rte_cryptodev_symmetric_capability {
 	enum rte_crypto_sym_xform_type xform_type;
 	/**< Transform type : Authentication / Cipher */
+	RTE_STD_C11
 	union {
 		struct {
 			enum rte_crypto_auth_algorithm algo;
@@ -177,6 +179,7 @@ struct rte_cryptodev_capabilities {
 	enum rte_crypto_op_type op;
 	/**< Operation type */
 
+	RTE_STD_C11
 	union {
 		struct rte_cryptodev_symmetric_capability sym;
 		/**< Symmetric operation capability parameters */
@@ -751,6 +754,7 @@ rte_cryptodev_enqueue_burst(uint8_t dev_id, uint16_t qp_id,
 
 /** Cryptodev symmetric crypto session */
 struct rte_cryptodev_sym_session {
+	RTE_STD_C11
 	struct {
 		uint8_t dev_id;
 		/**< Device Id */
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index 42e7b79..a929ef1 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -52,6 +52,7 @@ extern "C" {
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_log.h>
+#include <rte_common.h>
 
 #include "rte_crypto.h"
 #include "rte_cryptodev.h"
@@ -65,6 +66,7 @@ extern "C" {
 #endif
 
 struct rte_cryptodev_session {
+	RTE_STD_C11
 	struct {
 		uint8_t dev_id;
 		enum rte_cryptodev_type type;
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h b/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
index 64beddf..8fa6fc6 100644
--- a/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
@@ -40,6 +40,7 @@ extern "C" {
 #include "generic/rte_cycles.h"
 
 #include <rte_byteorder.h>
+#include <rte_common.h>
 
 /**
  * Read the time base register.
@@ -52,6 +53,7 @@ rte_rdtsc(void)
 {
 	union {
 		uint64_t tsc_64;
+		RTE_STD_C11
 		struct {
 #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
 			uint32_t hi_32;
diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_32.h b/lib/librte_eal/common/include/arch/x86/rte_atomic_32.h
index 400d8a9..5ce01b3 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic_32.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_32.h
@@ -40,6 +40,8 @@
 #ifndef _RTE_ATOMIC_I686_H_
 #define _RTE_ATOMIC_I686_H_
 
+#include <rte_common.h>
+
 /*------------------------- 64 bit atomic operations -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS
@@ -47,6 +49,7 @@ static inline int
 rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
 {
 	uint8_t res;
+	RTE_STD_C11
 	union {
 		struct {
 			uint32_t l32;
diff --git a/lib/librte_eal/common/include/arch/x86/rte_cycles.h b/lib/librte_eal/common/include/arch/x86/rte_cycles.h
index 6e3c7d8..5eb6ce9 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_cycles.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_cycles.h
@@ -75,12 +75,14 @@ extern "C" {
 extern int rte_cycles_vmware_tsc_map;
 #include <rte_branch_prediction.h>
 #endif
+#include <rte_common.h>
 
 static inline uint64_t
 rte_rdtsc(void)
 {
 	union {
 		uint64_t tsc_64;
+		RTE_STD_C11
 		struct {
 			uint32_t lo_32;
 			uint32_t hi_32;
diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h
index 477472b..98ecc1c 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -59,6 +59,13 @@ extern "C" {
 #define asm __asm__
 #endif
 
+/** C extension macro for environments lacking C11 features. */
+#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
+#define RTE_STD_C11 __extension__
+#else
+#define RTE_STD_C11
+#endif
+
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __attribute__ ((aligned(1)));
 typedef uint32_t unaligned_uint32_t __attribute__ ((aligned(1)));
diff --git a/lib/librte_eal/common/include/rte_devargs.h b/lib/librte_eal/common/include/rte_devargs.h
index 53c59f5..c66895f 100644
--- a/lib/librte_eal/common/include/rte_devargs.h
+++ b/lib/librte_eal/common/include/rte_devargs.h
@@ -76,6 +76,7 @@ struct rte_devargs {
 	TAILQ_ENTRY(rte_devargs) next;
 	/** Type of device. */
 	enum rte_devtype type;
+	RTE_STD_C11
 	union {
 		/** Used if type is RTE_DEVTYPE_*_PCI. */
 		struct {
diff --git a/lib/librte_eal/common/include/rte_interrupts.h b/lib/librte_eal/common/include/rte_interrupts.h
index ff11ef3..fd3c6ef 100644
--- a/lib/librte_eal/common/include/rte_interrupts.h
+++ b/lib/librte_eal/common/include/rte_interrupts.h
@@ -34,6 +34,8 @@
 #ifndef _RTE_INTERRUPTS_H_
 #define _RTE_INTERRUPTS_H_
 
+#include <rte_common.h>
+
 /**
  * @file
  *
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 12e0ebb..06b6596 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -104,6 +104,7 @@ typedef uint64_t phys_addr_t; /**< Physical address definition. */
  */
 struct rte_memseg {
 	phys_addr_t phys_addr;      /**< Start physical address. */
+	RTE_STD_C11
 	union {
 		void *addr;         /**< Start virtual address. */
 		uint64_t addr_64;   /**< Makes sure addr is always 64 bits */
diff --git a/lib/librte_eal/common/include/rte_memzone.h b/lib/librte_eal/common/include/rte_memzone.h
index dae98f5..8e42cae 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ -53,6 +53,7 @@
 
 #include <stdio.h>
 #include <rte_memory.h>
+#include <rte_common.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -78,6 +79,7 @@ struct rte_memzone {
 	char name[RTE_MEMZONE_NAMESIZE];  /**< Name of the memory zone. */
 
 	phys_addr_t phys_addr;            /**< Start physical address. */
+	RTE_STD_C11
 	union {
 		void *addr;                   /**< Start virtual address. */
 		uint64_t addr_64;             /**< Makes sure addr is always 64-bits */
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 3dacbff..d459bf4 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -82,6 +82,7 @@ struct rte_epoll_event {
 
 /** Handle for interrupts. */
 struct rte_intr_handle {
+	RTE_STD_C11
 	union {
 		int vfio_dev_fd;  /**< VFIO device file descriptor */
 		int uio_cfg_fd;  /**< UIO config file descriptor
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 2ef0506..164f127 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -61,6 +61,9 @@
 
 #ifdef __KERNEL__
 #include <linux/if.h>
+#define RTE_STD_C11
+#else
+#include <rte_common.h>
 #endif
 
 /**
@@ -85,6 +88,7 @@ enum rte_kni_req_id {
  */
 struct rte_kni_request {
 	uint32_t req_id;             /**< Request id */
+	RTE_STD_C11
 	union {
 		uint32_t new_mtu;    /**< New MTU */
 		uint8_t if_up;       /**< 1: interface up, 0: interface down */
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
index d98e98e..a4886a8 100644
--- a/lib/librte_hash/rte_thash.h
+++ b/lib/librte_hash/rte_thash.h
@@ -54,6 +54,7 @@ extern "C" {
 #include <stdint.h>
 #include <rte_byteorder.h>
 #include <rte_ip.h>
+#include <rte_common.h>
 
 #ifdef __SSE3__
 #include <rte_vect.h>
@@ -102,6 +103,7 @@ static const __m128i rte_thash_ipv6_bswap_mask = {
 struct rte_ipv4_tuple {
 	uint32_t	src_addr;
 	uint32_t	dst_addr;
+	RTE_STD_C11
 	union {
 		struct {
 			uint16_t dport;
@@ -119,6 +121,7 @@ struct rte_ipv4_tuple {
 struct rte_ipv6_tuple {
 	uint8_t		src_addr[16];
 	uint8_t		dst_addr[16];
+	RTE_STD_C11
 	union {
 		struct {
 			uint16_t dport;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 28668a3..3ef4533 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -100,6 +100,7 @@ struct rte_lpm_tbl_entry_v20 {
 	 * a group index pointing to a tbl8 structure (tbl24 only, when
 	 * valid_group is set)
 	 */
+	RTE_STD_C11
 	union {
 		uint8_t next_hop;
 		uint8_t group_idx;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index c6cb299..23b7bf8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -787,6 +787,7 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
 	 * config option.
 	 */
+	RTE_STD_C11
 	union {
 		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
 		uint16_t refcnt;              /**< Non-atomically accessed refcnt */
@@ -806,6 +807,7 @@ struct rte_mbuf {
 	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
 	 * vlan is stripped from the data.
 	 */
+	RTE_STD_C11
 	union {
 		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 		struct {
@@ -827,6 +829,7 @@ struct rte_mbuf {
 	union {
 		uint32_t rss;     /**< RSS hash result if RSS enabled */
 		struct {
+			RTE_STD_C11
 			union {
 				struct {
 					uint16_t hash;
@@ -854,6 +857,7 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	MARKER cacheline1 __rte_cache_min_aligned;
 
+	RTE_STD_C11
 	union {
 		void *userdata;   /**< Can be used for external metadata */
 		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
@@ -863,6 +867,7 @@ struct rte_mbuf {
 	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
 
 	/* fields to support TX offloads */
+	RTE_STD_C11
 	union {
 		uint64_t tx_offload;       /**< combined for easy fetch */
 		__extension__
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 059ad9e..0243f9e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -75,6 +75,7 @@
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_memcpy.h>
+#include <rte_common.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -216,6 +217,7 @@ struct rte_mempool {
 	 * RTE_MEMPOOL_NAMESIZE next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE]; /**< Name of mempool. */
+	RTE_STD_C11
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
 		uint64_t pool_id;        /**< External mempool identifier. */
diff --git a/lib/librte_pipeline/rte_pipeline.h b/lib/librte_pipeline/rte_pipeline.h
index b0b4615..f366348 100644
--- a/lib/librte_pipeline/rte_pipeline.h
+++ b/lib/librte_pipeline/rte_pipeline.h
@@ -87,6 +87,7 @@ extern "C" {
 
 #include <rte_port.h>
 #include <rte_table.h>
+#include <rte_common.h>
 
 struct rte_mbuf;
 
@@ -244,6 +245,7 @@ struct rte_pipeline_table_entry {
 	/** Reserved action */
 	enum rte_pipeline_action action;
 
+	RTE_STD_C11
 	union {
 		/** Output port ID (meta-data for "Send packet to output port"
 		action) */
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 77547c6..a276a73 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -66,6 +66,7 @@
 #include <stdio.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <rte_common.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -91,6 +92,7 @@ enum rte_timer_type {
  * config) and an owner (the id of the lcore that owns the timer).
  */
 union rte_timer_status {
+	RTE_STD_C11
 	struct {
 		uint16_t state;  /**< Stop, pending, running, config. */
 		int16_t owner;   /**< The lcore that owns the timer. */
-- 
2.1.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v9 02/25] eal: remove duplicate function declaration
  @ 2016-09-07 14:07  3%   ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-09-07 14:07 UTC (permalink / raw)
  To: dev; +Cc: hemant.agrawal, Shreyansh Jain, David Marchand

rte_eal_dev_init has been declared in both eal_private.h and rte_dev.h since
its introduction.
This function has been exported in the ABI, so remove it from eal_private.h.

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 lib/librte_eal/common/eal_private.h | 7 -------
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 19f7535..ca1aec6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -237,13 +237,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);
 
 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index d5b81a3..9412983 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 0/2] app/testpmd: improve multiprocess support
@ 2016-09-02  8:58  4% Marcin Kerlin
  0 siblings, 0 replies; 200+ results
From: Marcin Kerlin @ 2016-09-02  8:58 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch, thomas.monjalon, Marcin Kerlin

This patchset ensures that device data is not overwritten in multiprocess
applications.

1) Changes in the library introduce continuity in the rte_eth_dev_data[]
array shared between all processes. A secondary process adds new entries in
free slots instead of overwriting existing entries (see the sketch below).

2) Changes in the testpmd application allow a secondary process to attach to
the mempool created by the primary process rather than create a new one and,
in the case of quit or force quit, to free the device data from the shared
rte_eth_dev_data[] array.

Breaking ABI:
Changes in the librte_ether library extend the existing rte_eth_dev_data
structure with a new lock field. The reason is that this structure is shared
between all processes, so it should be protected against simultaneous writes
from two different processes.

Tomasz Kulasek sent an announcement of an ABI change in librte_ether on
21 July 2016. I would like to join this ABI break, if it is possible.

Marcin Kerlin (2):
  librte_ether: ensure not overwrite device data in mp app
  app/testpmd: improve handling of multiprocess

 app/test-pmd/testpmd.c                 | 36 +++++++++++++-
 app/test-pmd/testpmd.h                 |  1 +
 lib/librte_ether/rte_ethdev.c          | 90 +++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.h          | 24 +++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++
 5 files changed, 148 insertions(+), 10 deletions(-)

-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 1/2] net/virtio: support modern device id
@ 2016-09-02  6:36  4% Jason Wang
  0 siblings, 0 replies; 200+ results
From: Jason Wang @ 2016-09-02  6:36 UTC (permalink / raw)
  To: dev; +Cc: huawei.xie, yuanhan.liu, mst, Jason Wang

Spec said "The PCI Device ID is calculated by adding 0x1040 to the
Virtio Device ID". So this patch makes pmd can recognize modern virtio
net id.
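
For example, virtio-net has Virtio Device ID 1, so its modern PCI Device ID
is 0x1040 + 1 = 0x1041, which is the VIRTIO_PCI_MODERN_DEVICEID_NET value
added below.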

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 1 +
 drivers/net/virtio/virtio_pci.h    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 07d6449..f48e037 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -104,6 +104,7 @@ static int virtio_dev_queue_stats_mapping_set(
  */
 static const struct rte_pci_id pci_id_virtio_map[] = {
 	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, VIRTIO_PCI_DEVICEID_MIN) },
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID, VIRTIO_PCI_MODERN_DEVICEID_NET) },
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..d3bdfa0 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -46,6 +46,7 @@ struct virtnet_ctl;
 #define VIRTIO_PCI_VENDORID     0x1AF4
 #define VIRTIO_PCI_DEVICEID_MIN 0x1000
 #define VIRTIO_PCI_DEVICEID_MAX 0x103F
+#define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041
 
 /* VirtIO ABI version, this must match exactly. */
 #define VIRTIO_PCI_ABI_VERSION 0
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation
  @ 2016-08-31  9:59  3%       ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2016-08-31  9:59 UTC (permalink / raw)
  To: Vladyslav Buslov; +Cc: dev

Hi Vlad, 

> 
> Hi Konstantin,
> 
> Thanks for your feedback.
> 
> Would you accept this change as a config file compile-time parameter, with
> libc calloc as the default?
> It is only a one-line change, so it is easy to ifdef.

It is an option, but the main requirement from the community
is to minimize the number of build-time config options, and instead provide
the ability for the user to configure things at runtime.
That's why I thought about something like:

+ /* use EAL hugepages for temporary memory allocations,
+  * might improve build time, but increases hugepage
+  * demand significantly.
+  */
#define	RTE_ACL_CFG_FLAG_HMEM	1

struct rte_acl_config {
	uint32_t num_categories; /**< Number of categories to build with. */
	uint32_t num_fields;     /**< Number of field definitions. */
	struct rte_acl_field_def defs[RTE_ACL_MAX_FIELDS];
	/**< array of field definitions. */
	size_t max_size;
	/**< max memory limit for internal run-time structures. */
+	uint64_t flags;
}; 

And then change tb_pool() to use either calloc() or rte_calloc() based on the flags value.
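
A sketch of that dispatch inside tb_pool(), assuming the flag is propagated
from rte_acl_config into struct tb_mem_pool:

	if (pool->flags & RTE_ACL_CFG_FLAG_HMEM)
		block = rte_calloc("ACL_TBMEM_BLOCK", 1,
			size + sizeof(*pool->block), 0);
	else
		block = calloc(1, size + sizeof(*pool->block));
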
Another, probably even better and more flexible, way is to allow the user to specify their own alloc routine:

struct rte_acl_config {
	uint32_t num_categories; /**< Number of categories to build with. */
	uint32_t num_fields;     /**< Number of field definitions. */
	struct rte_acl_field_def defs[RTE_ACL_MAX_FIELDS];
	/**< array of field definitions. */
	size_t max_size;
	/**< max memory limit for internal run-time structures. */
+	void *(*tballoc)(size_t, size_t);
+	void (*tbfree)(void *);
};

In that case the user can provide their own alloc/free based on
rte_calloc/rte_free, or even on something else.
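
Usage would then be along these lines, with my_hugepage_calloc being a
hypothetical user-supplied wrapper around rte_calloc:

	cfg.tballoc = my_hugepage_calloc;	/* (size_t, size_t), like calloc */
	cfg.tbfree  = rte_free;
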
The only drawback I see with those approaches is that you'll need to follow
the ABI compliance rules, which probably means that your change might not
make 16.11.

Konstantin 

> 
> Regards,
> Vlad
> 
> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Wednesday, August 31, 2016 4:28 AM
> To: Vladyslav Buslov
> Cc: dev@dpdk.org
> Subject: RE: [PATCH] acl: use rte_calloc for temporary memory allocation
> 
> Hi Vladyslav,
> 
> > -----Original Message-----
> > From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com]
> > Sent: Tuesday, August 16, 2016 3:01 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org
> > Subject: [PATCH] acl: use rte_calloc for temporary memory allocation
> >
> > The ACL build process uses a significant amount of memory, which degrades
> > performance by causing page walks when memory is allocated on the regular
> > heap using libc calloc.
> >
> > This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc.
> 
> We deliberately used standard system memory allocation routines (calloc/free) here.
> With the current design, the build phase was never considered to be a
> run-time operation.
> It is pretty CPU- and memory-expensive.
> So if we use RTE memory for the build phase, it could easily consume all (or
> most) of it, and might cause DPDK process failure or degradation.
> If you really feel that you (and other users) would benefit from using
> rte_calloc here (I am personally still in doubt), then at least it should be
> a new field inside rte_acl_config that would allow the user to control that
> behavior.
> Though, as I said above, librte_acl was never designed to 'allocate tens of
> thousands of ACLs at runtime'.
> To add the ability to add/delete rules at runtime while keeping lookup time
> reasonably low, some new approach needs to be introduced.
> Konstantin
> 
> >
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> > ---
> >  lib/librte_acl/tb_mem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c
> > index 157e608..c373673 100644
> > --- a/lib/librte_acl/tb_mem.c
> > +++ b/lib/librte_acl/tb_mem.c
> > @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz)
> >  	size_t size;
> >
> >  	size = sz + pool->alignment - 1;
> > -	block = calloc(1, size + sizeof(*pool->block));
> > +	block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + sizeof(*pool->block), 0);
> >  	if (block == NULL) {
> >  		RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated "
> >  			"by pool: %zu bytes\n", __func__, sz, pool->alloc);
> > --
> > 2.8.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] scripts: disable optimization for ABI validation
@ 2016-08-26 15:06 19% Ferruh Yigit
  2016-09-15 14:23  4% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-08-26 15:06 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman

abi-dumper gives the following warning:
WARNING: incompatible build option detected: -O3

Although this patch won't fix the warning, it ensures the code is compiled
with optimization disabled.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 scripts/validate-abi.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/validate-abi.sh b/scripts/validate-abi.sh
index feda6c8..52e4e7a 100755
--- a/scripts/validate-abi.sh
+++ b/scripts/validate-abi.sh
@@ -186,7 +186,7 @@ fixup_config
 # Checking abi compliance relies on using the dwarf information in
 # The shared objects.  Thats only included in the DSO's if we build
 # with -g
-export EXTRA_CFLAGS="$EXTRA_CFLAGS -g"
+export EXTRA_CFLAGS="$EXTRA_CFLAGS -g -O0"
 export EXTRA_LDFLAGS="$EXTRA_LDFLAGS -g"
 
 # Now configure the build
-- 
2.7.4

^ permalink raw reply	[relevance 19%]

* [dpdk-dev] [PATCH v8 02/25] eal: remove duplicate function declaration
  @ 2016-08-26 13:56  3%   ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-08-26 13:56 UTC (permalink / raw)
  To: dev
  Cc: viktorin, david.marchand, thomas.monjalon, hemant.agrawal,
	Shreyansh Jain

rte_eal_dev_init has been declared in both eal_private.h and rte_dev.h since
its introduction.
This function has been exported in the ABI, so remove it from eal_private.h.

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 lib/librte_eal/common/eal_private.h | 7 -------
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 19f7535..ca1aec6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -237,13 +237,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);
 
 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index d5b81a3..9412983 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [RFC PATCH v2 0/5] add API's for VF management
  2016-08-18 13:48  3% [dpdk-dev] [RFC PATCH 0/5] add API's for VF management Bernard Iremonger
@ 2016-08-26  9:10  3% ` Bernard Iremonger
  2016-09-09  8:49  0%   ` Pattan, Reshma
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Bernard Iremonger @ 2016-08-26  9:10 UTC (permalink / raw)
  To: rahul.r.shah, wenzhuo.lu, dev; +Cc: Bernard Iremonger

This RFC patchset contains new DPDK APIs requested by AT&T for use
with the Virtual Function Daemon (VFD).

The need to configure and manage VFs on a NIC has grown to the
point where AT&T have developed a DPDK-based tool, VFD, to do this.

This RFC proposes to add the following API extensions to DPDK:
  mailbox communication callback support
  VF configuration

Nine new functions have been added to the eth_dev_ops structure.
Corresponding functions have been added to the ixgbe PMD for the
Niantic NIC.

Two new callback functions have been added.
Changes have been made to the ixgbe_rcv_msg_from_vf function to
use the callback functions.

Changes have been made to testpmd to facilitate testing of the new APIs.
The testpmd documentation has been updated to document the testpmd changes.

Note:
Adding new functions to the eth_dev_ops structure will cause an
ABI breakage.
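
For the mailbox part, the idea is that the application registers a callback
that fires when a VF sends a mailbox message to the PF; a minimal sketch
(the callback name and prototype here are illustrative assumptions):

	static void
	vf_to_pf_msg_cb(uint16_t vf, uint16_t msg_type, void *param)
	{
		/* inspect the VF request before the PF acts on it */
		printf("VF %u sent mailbox message type %u\n", vf, msg_type);
	}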

Changes in V2:
rebase to latest master branch.
fix compile error with clang.

Bernard Iremonger (5):
  librte_ether: add internal callback functions
  net/ixgbe: add callback to user app on VF to PF mbox msg
  librte_ether: add API's for VF management
  net/ixgbe: add functions for VF management
  app/test_pmd: add tests for new API's

 app/test-pmd/cmdline.c                      | 700 ++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  68 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c            | 179 +++++++
 drivers/net/ixgbe/ixgbe_pf.c                |  39 +-
 lib/librte_ether/rte_ethdev.c               | 176 +++++++
 lib/librte_ether/rte_ethdev.h               | 284 +++++++++++
 lib/librte_ether/rte_ether_version.map      |  16 +
 7 files changed, 1455 insertions(+), 7 deletions(-)

-- 
2.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
  2016-08-24  7:40  0%     ` Yuanhan Liu
@ 2016-08-24  7:36  0%       ` Xu, Qian Q
  0 siblings, 0 replies; 200+ results
From: Xu, Qian Q @ 2016-08-24  7:36 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Maxime Coquelin

OK, it's better to state that this patchset has a dependency on another one.

-----Original Message-----
From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com] 
Sent: Wednesday, August 24, 2016 3:40 PM
To: Xu, Qian Q <qian.q.xu@intel.com>
Cc: dev@dpdk.org; Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: Re: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling

Yes, it depends on the vhost-cuse removal patchset I sent last week.

	--yliu

On Wed, Aug 24, 2016 at 07:26:07AM +0000, Xu, Qian Q wrote:
> I wanted to apply the patch on the latest DPDK (see the commit ID below) but failed since there are no vhost.h and vhost-user.h files. So do you have any dependency on other patches?
> 
> commit 28d8abaf250c3fb4dcb6416518f4c54b4ae67205
> Author: Deirdre O'Connor <deirdre.o.connor@intel.com>
> Date:   Mon Aug 22 17:20:08 2016 +0100
> 
>     doc: fix patchwork link
> 
>     Fixes: 58abf6e77c6b ("doc: add contributors guide")
> 
>     Reported-by: Jon Loeliger <jdl@netgate.com>
>     Signed-off-by: Deirdre O'Connor <deirdre.o.connor@intel.com>
>     Acked-by: John McNamara <john.mcnamara@intel.com>
> 
> 
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Tuesday, August 23, 2016 4:11 PM
> To: dev@dpdk.org
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Subject: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
> 
> Due to a historical reason (vhost-cuse came before vhost-user), some fields for maintaining the vhost-user memory mappings (such as the mmapped address and size, with which we can then unmap on destroy) are kept in the "orig_region_map" struct, a structure that is defined only in the vhost-user source file.
> 
> The right way to go is to remove the structure and move all those fields into the virtio_memory_region struct. But we simply couldn't do that before, because it would break the ABI.
> 
> Now, thanks to the ABI refactoring, it is no longer a blocking issue. And here it goes: this patch removes orig_region_map and redefines virtio_memory_region to include all necessary info.
> 
> With that, we can simplify the guest/host address conversion a bit.
> 
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  lib/librte_vhost/vhost.h      |  49 ++++++------
>  lib/librte_vhost/vhost_user.c | 172 +++++++++++++++++-------------------------
>  2 files changed, 90 insertions(+), 131 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index c2dfc3c..df2107b 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -143,12 +143,14 @@ struct virtio_net {
>   * Information relating to memory regions including offsets to
>   * addresses in QEMUs memory file.
>   */
> -struct virtio_memory_regions {
> -	uint64_t guest_phys_address;
> -	uint64_t guest_phys_address_end;
> -	uint64_t memory_size;
> -	uint64_t userspace_address;
> -	uint64_t address_offset;
> +struct virtio_memory_region {
> +	uint64_t guest_phys_addr;
> +	uint64_t guest_user_addr;
> +	uint64_t host_user_addr;
> +	uint64_t size;
> +	void	 *mmap_addr;
> +	uint64_t mmap_size;
> +	int fd;
>  };
>  
>  
> @@ -156,12 +158,8 @@ struct virtio_memory_regions {
>   * Memory structure includes region and mapping information.
>   */
>  struct virtio_memory {
> -	/* Base QEMU userspace address of the memory file. */
> -	uint64_t base_address;
> -	uint64_t mapped_address;
> -	uint64_t mapped_size;
>  	uint32_t nregions;
> -	struct virtio_memory_regions regions[0];
> +	struct virtio_memory_region regions[0];
>  };
>  
>  
> @@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
>  #define MAX_VHOST_DEVICE	1024
>  extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
>  
> -/**
> - * Function to convert guest physical addresses to vhost virtual addresses.
> - * This is used to convert guest virtio buffer addresses.
> - */
> +/* Convert guest physical Address to host virtual address */
>  static inline uint64_t __attribute__((always_inline))
> -gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
> +gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
>  {
> -	struct virtio_memory_regions *region;
> -	uint32_t regionidx;
> -	uint64_t vhost_va = 0;
> -
> -	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> -		region = &dev->mem->regions[regionidx];
> -		if ((guest_pa >= region->guest_phys_address) &&
> -			(guest_pa <= region->guest_phys_address_end)) {
> -			vhost_va = region->address_offset + guest_pa;
> -			break;
> +	struct virtio_memory_region *reg;
> +	uint32_t i;
> +
> +	for (i = 0; i < dev->mem->nregions; i++) {
> +		reg = &dev->mem->regions[i];
> +		if (gpa >= reg->guest_phys_addr &&
> +		    gpa <  reg->guest_phys_addr + reg->size) {
> +			return gpa - reg->guest_phys_addr +
> +			       reg->host_user_addr;
>  		}
>  	}
> -	return vhost_va;
> +
> +	return 0;
>  }
>  
>  struct virtio_net_device_ops const *notify_ops;
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index eee99e9..d2071fd 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
>  	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
>  };
>  
> -struct orig_region_map {
> -	int fd;
> -	uint64_t mapped_address;
> -	uint64_t mapped_size;
> -	uint64_t blksz;
> -};
> -
> -#define orig_region(ptr, nregions) \
> -	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
> -		sizeof(struct virtio_memory) + \
> -		sizeof(struct virtio_memory_regions) * (nregions)))
> -
>  static uint64_t
>  get_blk_size(int fd)
>  {
> @@ -99,18 +87,17 @@ get_blk_size(int fd)
>  static void
>  free_mem_region(struct virtio_net *dev)
>  {
> -	struct orig_region_map *region;
> -	unsigned int idx;
> +	uint32_t i;
> +	struct virtio_memory_region *reg;
>  
>  	if (!dev || !dev->mem)
>  		return;
>  
> -	region = orig_region(dev->mem, dev->mem->nregions);
> -	for (idx = 0; idx < dev->mem->nregions; idx++) {
> -		if (region[idx].mapped_address) {
> -			munmap((void *)(uintptr_t)region[idx].mapped_address,
> -					region[idx].mapped_size);
> -			close(region[idx].fd);
> +	for (i = 0; i < dev->mem->nregions; i++) {
> +		reg = &dev->mem->regions[i];
> +		if (reg->host_user_addr) {
> +			munmap(reg->mmap_addr, reg->mmap_size);
> +			close(reg->fd);
>  		}
>  	}
>  }
> @@ -120,7 +107,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
>  {
>  	if (dev->mem) {
>  		free_mem_region(dev);
> -		free(dev->mem);
> +		rte_free(dev->mem);
>  		dev->mem = NULL;
>  	}
>  	if (dev->log_addr) {
> @@ -286,25 +273,23 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
>   * used to convert the ring addresses to our address space.
>   */
>  static uint64_t
> -qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
> +qva_to_vva(struct virtio_net *dev, uint64_t qva)
>  {
> -	struct virtio_memory_regions *region;
> -	uint64_t vhost_va = 0;
> -	uint32_t regionidx = 0;
> +	struct virtio_memory_region *reg;
> +	uint32_t i;
>  
>  	/* Find the region where the address lives. */
> -	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> -		region = &dev->mem->regions[regionidx];
> -		if ((qemu_va >= region->userspace_address) &&
> -			(qemu_va <= region->userspace_address +
> -			region->memory_size)) {
> -			vhost_va = qemu_va + region->guest_phys_address +
> -				region->address_offset -
> -				region->userspace_address;
> -			break;
> +	for (i = 0; i < dev->mem->nregions; i++) {
> +		reg = &dev->mem->regions[i];
> +
> +		if (qva >= reg->guest_user_addr &&
> +		    qva <  reg->guest_user_addr + reg->size) {
> +			return qva - reg->guest_user_addr +
> +			       reg->host_user_addr;
>  		}
>  	}
> -	return vhost_va;
> +
> +	return 0;
>  }
>  
>  /*
> @@ -391,11 +376,13 @@ static int
>  vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  {
>  	struct VhostUserMemory memory = pmsg->payload.memory;
> -	struct virtio_memory_regions *pregion;
> -	uint64_t mapped_address, mapped_size;
> -	unsigned int idx = 0;
> -	struct orig_region_map *pregion_orig;
> +	struct virtio_memory_region *reg;
> +	void *mmap_addr;
> +	uint64_t mmap_size;
> +	uint64_t mmap_offset;
>  	uint64_t alignment;
> +	uint32_t i;
> +	int fd;
>  
>  	/* Remove from the data plane. */
>  	if (dev->flags & VIRTIO_DEV_RUNNING) {
> @@ -405,14 +392,12 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  
>  	if (dev->mem) {
>  		free_mem_region(dev);
> -		free(dev->mem);
> +		rte_free(dev->mem);
>  		dev->mem = NULL;
>  	}
>  
> -	dev->mem = calloc(1,
> -		sizeof(struct virtio_memory) +
> -		sizeof(struct virtio_memory_regions) * memory.nregions +
> -		sizeof(struct orig_region_map) * memory.nregions);
> +	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct virtio_memory) +
> +		sizeof(struct virtio_memory_region) * memory.nregions, 0);
>  	if (dev->mem == NULL) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
>  			"(%d) failed to allocate memory for dev->mem\n", @@ -421,22 +406,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  	}
>  	dev->mem->nregions = memory.nregions;
>  
> -	pregion_orig = orig_region(dev->mem, memory.nregions);
> -	for (idx = 0; idx < memory.nregions; idx++) {
> -		pregion = &dev->mem->regions[idx];
> -		pregion->guest_phys_address =
> -			memory.regions[idx].guest_phys_addr;
> -		pregion->guest_phys_address_end =
> -			memory.regions[idx].guest_phys_addr +
> -			memory.regions[idx].memory_size;
> -		pregion->memory_size =
> -			memory.regions[idx].memory_size;
> -		pregion->userspace_address =
> -			memory.regions[idx].userspace_addr;
> -
> -		/* This is ugly */
> -		mapped_size = memory.regions[idx].memory_size +
> -			memory.regions[idx].mmap_offset;
> +	for (i = 0; i < memory.nregions; i++) {
> +		fd  = pmsg->fds[i];
> +		reg = &dev->mem->regions[i];
> +
> +		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
> +		reg->guest_user_addr = memory.regions[i].userspace_addr;
> +		reg->size            = memory.regions[i].memory_size;
> +		reg->fd              = fd;
> +
> +		mmap_offset = memory.regions[i].mmap_offset;
> +		mmap_size   = reg->size + mmap_offset;
>  
>  		/* mmap() without flag of MAP_ANONYMOUS, should be called
>  		 * with length argument aligned with hugepagesz at older
> @@ -446,67 +426,51 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  		 * to avoid failure, make sure in caller to keep length
>  		 * aligned.
>  		 */
> -		alignment = get_blk_size(pmsg->fds[idx]);
> +		alignment = get_blk_size(fd);
>  		if (alignment == (uint64_t)-1) {
>  			RTE_LOG(ERR, VHOST_CONFIG,
>  				"couldn't get hugepage size through fstat\n");
>  			goto err_mmap;
>  		}
> -		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
> +		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
>  
> -		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
> -			mapped_size,
> -			PROT_READ | PROT_WRITE, MAP_SHARED,
> -			pmsg->fds[idx],
> -			0);
> +		mmap_addr = mmap(NULL, mmap_size,
> +				 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>  
> -		RTE_LOG(INFO, VHOST_CONFIG,
> -			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
> -			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
> -			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
> -			mapped_size, memory.regions[idx].mmap_offset,
> -			alignment);
> -
> -		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
> +		if (mmap_addr == MAP_FAILED) {
>  			RTE_LOG(ERR, VHOST_CONFIG,
> -				"mmap qemu guest failed.\n");
> +				"mmap region %u failed.\n", i);
>  			goto err_mmap;
>  		}
>  
> -		pregion_orig[idx].mapped_address = mapped_address;
> -		pregion_orig[idx].mapped_size = mapped_size;
> -		pregion_orig[idx].blksz = alignment;
> -		pregion_orig[idx].fd = pmsg->fds[idx];
> -
> -		mapped_address +=  memory.regions[idx].mmap_offset;
> +		reg->mmap_addr = mmap_addr;
> +		reg->mmap_size = mmap_size;
> +		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
>  
> -		pregion->address_offset = mapped_address -
> -			pregion->guest_phys_address;
> -
> -		if (memory.regions[idx].guest_phys_addr == 0) {
> -			dev->mem->base_address =
> -				memory.regions[idx].userspace_addr;
> -			dev->mem->mapped_address =
> -				pregion->address_offset;
> -		}
> -
> -		LOG_DEBUG(VHOST_CONFIG,
> -			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
> -			idx,
> -			(void *)(uintptr_t)pregion->guest_phys_address,
> -			(void *)(uintptr_t)pregion->userspace_address,
> -			 pregion->memory_size);
> +		RTE_LOG(INFO, VHOST_CONFIG,
> +			"guest memory region %u, size: 0x%" PRIx64 "\n"
> +			"\t guest physical addr: 0x%" PRIx64 "\n"
> +			"\t guest virtual  addr: 0x%" PRIx64 "\n"
> +			"\t host  virtual  addr: 0x%" PRIx64 "\n"
> +			"\t mmap addr : 0x%" PRIx64 "\n"
> +			"\t mmap size : 0x%" PRIx64 "\n"
> +			"\t mmap align: 0x%" PRIx64 "\n"
> +			"\t mmap off  : 0x%" PRIx64 "\n",
> +			i, reg->size,
> +			reg->guest_phys_addr,
> +			reg->guest_user_addr,
> +			reg->host_user_addr,
> +			(uint64_t)(uintptr_t)mmap_addr,
> +			mmap_size,
> +			alignment,
> +			mmap_offset);
>  	}
>  
>  	return 0;
>  
>  err_mmap:
> -	while (idx--) {
> -		munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
> -				pregion_orig[idx].mapped_size);
> -		close(pregion_orig[idx].fd);
> -	}
> -	free(dev->mem);
> +	free_mem_region(dev);
> +	rte_free(dev->mem);
>  	dev->mem = NULL;
>  	return -1;
>  }
> --
> 1.9.0

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
  2016-08-24  7:26  3%   ` Xu, Qian Q
@ 2016-08-24  7:40  0%     ` Yuanhan Liu
  2016-08-24  7:36  0%       ` Xu, Qian Q
  0 siblings, 1 reply; 200+ results
From: Yuanhan Liu @ 2016-08-24  7:40 UTC (permalink / raw)
  To: Xu, Qian Q; +Cc: dev, Maxime Coquelin

Yes, it depends on the vhost-cuse removal patchset I sent last week.

	--yliu

On Wed, Aug 24, 2016 at 07:26:07AM +0000, Xu, Qian Q wrote:
> I wanted to apply the patch on the latest DPDK (see the commit ID below), but it failed since there are no vhost.h and vhost-user.h files. So do you have any dependency on other patches? 
> 
> commit 28d8abaf250c3fb4dcb6416518f4c54b4ae67205
> Author: Deirdre O'Connor <deirdre.o.connor@intel.com>
> Date:   Mon Aug 22 17:20:08 2016 +0100
> 
>     doc: fix patchwork link
> 
>     Fixes: 58abf6e77c6b ("doc: add contributors guide")
> 
>     Reported-by: Jon Loeliger <jdl@netgate.com>
>     Signed-off-by: Deirdre O'Connor <deirdre.o.connor@intel.com>
>     Acked-by: John McNamara <john.mcnamara@intel.com>
> 
> 
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Tuesday, August 23, 2016 4:11 PM
> To: dev@dpdk.org
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Subject: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
> 
> For historical reasons (vhost-cuse came before vhost-user), some fields for maintaining the vhost-user memory mappings (such as the mmapped address and size, with which we can then unmap on destroy) are kept in the "orig_region_map" struct, a structure that is defined only in the vhost-user source file.
> 
> The right way to go is to remove the structure and move all those fields into the virtio_memory_region struct. But we simply couldn't do that before, because it would have broken the ABI.
> 
> Now, thanks to the ABI refactoring, it is no longer a blocking issue. And here it goes: this patch removes orig_region_map and redefines virtio_memory_region to include all the necessary info.
> 
> With that, we can simplify the guest/host address conversion a bit.
> 
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  lib/librte_vhost/vhost.h      |  49 ++++++------
>  lib/librte_vhost/vhost_user.c | 172 +++++++++++++++++-------------------------
>  2 files changed, 90 insertions(+), 131 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index c2dfc3c..df2107b 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -143,12 +143,14 @@ struct virtio_net {
>   * Information relating to memory regions including offsets to
>   * addresses in QEMUs memory file.
>   */
> -struct virtio_memory_regions {
> -	uint64_t guest_phys_address;
> -	uint64_t guest_phys_address_end;
> -	uint64_t memory_size;
> -	uint64_t userspace_address;
> -	uint64_t address_offset;
> +struct virtio_memory_region {
> +	uint64_t guest_phys_addr;
> +	uint64_t guest_user_addr;
> +	uint64_t host_user_addr;
> +	uint64_t size;
> +	void	 *mmap_addr;
> +	uint64_t mmap_size;
> +	int fd;
>  };
>  
>  
> @@ -156,12 +158,8 @@ struct virtio_memory_regions {
>   * Memory structure includes region and mapping information.
>   */
>  struct virtio_memory {
> -	/* Base QEMU userspace address of the memory file. */
> -	uint64_t base_address;
> -	uint64_t mapped_address;
> -	uint64_t mapped_size;
>  	uint32_t nregions;
> -	struct virtio_memory_regions regions[0];
> +	struct virtio_memory_region regions[0];
>  };
>  
>  
> @@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
>  #define MAX_VHOST_DEVICE	1024
>  extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
>  
> -/**
> - * Function to convert guest physical addresses to vhost virtual addresses.
> - * This is used to convert guest virtio buffer addresses.
> - */
> +/* Convert guest physical address to host virtual address */
>  static inline uint64_t __attribute__((always_inline))
> -gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
> +gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
>  {
> -	struct virtio_memory_regions *region;
> -	uint32_t regionidx;
> -	uint64_t vhost_va = 0;
> -
> -	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> -		region = &dev->mem->regions[regionidx];
> -		if ((guest_pa >= region->guest_phys_address) &&
> -			(guest_pa <= region->guest_phys_address_end)) {
> -			vhost_va = region->address_offset + guest_pa;
> -			break;
> +	struct virtio_memory_region *reg;
> +	uint32_t i;
> +
> +	for (i = 0; i < dev->mem->nregions; i++) {
> +		reg = &dev->mem->regions[i];
> +		if (gpa >= reg->guest_phys_addr &&
> +		    gpa <  reg->guest_phys_addr + reg->size) {
> +			return gpa - reg->guest_phys_addr +
> +			       reg->host_user_addr;
>  		}
>  	}
> -	return vhost_va;
> +
> +	return 0;
>  }
>  
>  struct virtio_net_device_ops const *notify_ops;
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index eee99e9..d2071fd 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
>  	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
>  };
>  
> -struct orig_region_map {
> -	int fd;
> -	uint64_t mapped_address;
> -	uint64_t mapped_size;
> -	uint64_t blksz;
> -};
> -
> -#define orig_region(ptr, nregions) \
> -	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
> -		sizeof(struct virtio_memory) + \
> -		sizeof(struct virtio_memory_regions) * (nregions)))
> -
>  static uint64_t
>  get_blk_size(int fd)
>  {
> @@ -99,18 +87,17 @@ get_blk_size(int fd)
>  static void
>  free_mem_region(struct virtio_net *dev)
>  {
> -	struct orig_region_map *region;
> -	unsigned int idx;
> +	uint32_t i;
> +	struct virtio_memory_region *reg;
>  
>  	if (!dev || !dev->mem)
>  		return;
>  
> -	region = orig_region(dev->mem, dev->mem->nregions);
> -	for (idx = 0; idx < dev->mem->nregions; idx++) {
> -		if (region[idx].mapped_address) {
> -			munmap((void *)(uintptr_t)region[idx].mapped_address,
> -					region[idx].mapped_size);
> -			close(region[idx].fd);
> +	for (i = 0; i < dev->mem->nregions; i++) {
> +		reg = &dev->mem->regions[i];
> +		if (reg->host_user_addr) {
> +			munmap(reg->mmap_addr, reg->mmap_size);
> +			close(reg->fd);
>  		}
>  	}
>  }
> @@ -120,7 +107,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
>  {
>  	if (dev->mem) {
>  		free_mem_region(dev);
> -		free(dev->mem);
> +		rte_free(dev->mem);
>  		dev->mem = NULL;
>  	}
>  	if (dev->log_addr) {
> @@ -286,25 +273,23 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
>   * used to convert the ring addresses to our address space.
>   */
>  static uint64_t
> -qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
> +qva_to_vva(struct virtio_net *dev, uint64_t qva)
>  {
> -	struct virtio_memory_regions *region;
> -	uint64_t vhost_va = 0;
> -	uint32_t regionidx = 0;
> +	struct virtio_memory_region *reg;
> +	uint32_t i;
>  
>  	/* Find the region where the address lives. */
> -	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> -		region = &dev->mem->regions[regionidx];
> -		if ((qemu_va >= region->userspace_address) &&
> -			(qemu_va <= region->userspace_address +
> -			region->memory_size)) {
> -			vhost_va = qemu_va + region->guest_phys_address +
> -				region->address_offset -
> -				region->userspace_address;
> -			break;
> +	for (i = 0; i < dev->mem->nregions; i++) {
> +		reg = &dev->mem->regions[i];
> +
> +		if (qva >= reg->guest_user_addr &&
> +		    qva <  reg->guest_user_addr + reg->size) {
> +			return qva - reg->guest_user_addr +
> +			       reg->host_user_addr;
>  		}
>  	}
> -	return vhost_va;
> +
> +	return 0;
>  }
>  
>  /*
> @@ -391,11 +376,13 @@ static int
>  vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  {
>  	struct VhostUserMemory memory = pmsg->payload.memory;
> -	struct virtio_memory_regions *pregion;
> -	uint64_t mapped_address, mapped_size;
> -	unsigned int idx = 0;
> -	struct orig_region_map *pregion_orig;
> +	struct virtio_memory_region *reg;
> +	void *mmap_addr;
> +	uint64_t mmap_size;
> +	uint64_t mmap_offset;
>  	uint64_t alignment;
> +	uint32_t i;
> +	int fd;
>  
>  	/* Remove from the data plane. */
>  	if (dev->flags & VIRTIO_DEV_RUNNING) {
> @@ -405,14 +392,12 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  
>  	if (dev->mem) {
>  		free_mem_region(dev);
> -		free(dev->mem);
> +		rte_free(dev->mem);
>  		dev->mem = NULL;
>  	}
>  
> -	dev->mem = calloc(1,
> -		sizeof(struct virtio_memory) +
> -		sizeof(struct virtio_memory_regions) * memory.nregions +
> -		sizeof(struct orig_region_map) * memory.nregions);
> +	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct virtio_memory) +
> +		sizeof(struct virtio_memory_region) * memory.nregions, 0);
>  	if (dev->mem == NULL) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
>  			"(%d) failed to allocate memory for dev->mem\n",
> @@ -421,22 +406,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  	}
>  	dev->mem->nregions = memory.nregions;
>  
> -	pregion_orig = orig_region(dev->mem, memory.nregions);
> -	for (idx = 0; idx < memory.nregions; idx++) {
> -		pregion = &dev->mem->regions[idx];
> -		pregion->guest_phys_address =
> -			memory.regions[idx].guest_phys_addr;
> -		pregion->guest_phys_address_end =
> -			memory.regions[idx].guest_phys_addr +
> -			memory.regions[idx].memory_size;
> -		pregion->memory_size =
> -			memory.regions[idx].memory_size;
> -		pregion->userspace_address =
> -			memory.regions[idx].userspace_addr;
> -
> -		/* This is ugly */
> -		mapped_size = memory.regions[idx].memory_size +
> -			memory.regions[idx].mmap_offset;
> +	for (i = 0; i < memory.nregions; i++) {
> +		fd  = pmsg->fds[i];
> +		reg = &dev->mem->regions[i];
> +
> +		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
> +		reg->guest_user_addr = memory.regions[i].userspace_addr;
> +		reg->size            = memory.regions[i].memory_size;
> +		reg->fd              = fd;
> +
> +		mmap_offset = memory.regions[i].mmap_offset;
> +		mmap_size   = reg->size + mmap_offset;
>  
>  		/* mmap() without flag of MAP_ANONYMOUS, should be called
>  		 * with length argument aligned with hugepagesz at older
> @@ -446,67 +426,51 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
>  		 * to avoid failure, make sure in caller to keep length
>  		 * aligned.
>  		 */
> -		alignment = get_blk_size(pmsg->fds[idx]);
> +		alignment = get_blk_size(fd);
>  		if (alignment == (uint64_t)-1) {
>  			RTE_LOG(ERR, VHOST_CONFIG,
>  				"couldn't get hugepage size through fstat\n");
>  			goto err_mmap;
>  		}
> -		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
> +		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
>  
> -		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
> -			mapped_size,
> -			PROT_READ | PROT_WRITE, MAP_SHARED,
> -			pmsg->fds[idx],
> -			0);
> +		mmap_addr = mmap(NULL, mmap_size,
> +				 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>  
> -		RTE_LOG(INFO, VHOST_CONFIG,
> -			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
> -			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
> -			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
> -			mapped_size, memory.regions[idx].mmap_offset,
> -			alignment);
> -
> -		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
> +		if (mmap_addr == MAP_FAILED) {
>  			RTE_LOG(ERR, VHOST_CONFIG,
> -				"mmap qemu guest failed.\n");
> +				"mmap region %u failed.\n", i);
>  			goto err_mmap;
>  		}
>  
> -		pregion_orig[idx].mapped_address = mapped_address;
> -		pregion_orig[idx].mapped_size = mapped_size;
> -		pregion_orig[idx].blksz = alignment;
> -		pregion_orig[idx].fd = pmsg->fds[idx];
> -
> -		mapped_address +=  memory.regions[idx].mmap_offset;
> +		reg->mmap_addr = mmap_addr;
> +		reg->mmap_size = mmap_size;
> +		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
>  
> -		pregion->address_offset = mapped_address -
> -			pregion->guest_phys_address;
> -
> -		if (memory.regions[idx].guest_phys_addr == 0) {
> -			dev->mem->base_address =
> -				memory.regions[idx].userspace_addr;
> -			dev->mem->mapped_address =
> -				pregion->address_offset;
> -		}
> -
> -		LOG_DEBUG(VHOST_CONFIG,
> -			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
> -			idx,
> -			(void *)(uintptr_t)pregion->guest_phys_address,
> -			(void *)(uintptr_t)pregion->userspace_address,
> -			 pregion->memory_size);
> +		RTE_LOG(INFO, VHOST_CONFIG,
> +			"guest memory region %u, size: 0x%" PRIx64 "\n"
> +			"\t guest physical addr: 0x%" PRIx64 "\n"
> +			"\t guest virtual  addr: 0x%" PRIx64 "\n"
> +			"\t host  virtual  addr: 0x%" PRIx64 "\n"
> +			"\t mmap addr : 0x%" PRIx64 "\n"
> +			"\t mmap size : 0x%" PRIx64 "\n"
> +			"\t mmap align: 0x%" PRIx64 "\n"
> +			"\t mmap off  : 0x%" PRIx64 "\n",
> +			i, reg->size,
> +			reg->guest_phys_addr,
> +			reg->guest_user_addr,
> +			reg->host_user_addr,
> +			(uint64_t)(uintptr_t)mmap_addr,
> +			mmap_size,
> +			alignment,
> +			mmap_offset);
>  	}
>  
>  	return 0;
>  
>  err_mmap:
> -	while (idx--) {
> -		munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
> -				pregion_orig[idx].mapped_size);
> -		close(pregion_orig[idx].fd);
> -	}
> -	free(dev->mem);
> +	free_mem_region(dev);
> +	rte_free(dev->mem);
>  	dev->mem = NULL;
>  	return -1;
>  }
> --
> 1.9.0

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
  2016-08-23  8:10  3% ` [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling Yuanhan Liu
  2016-08-23  9:17  0%   ` Maxime Coquelin
@ 2016-08-24  7:26  3%   ` Xu, Qian Q
  2016-08-24  7:40  0%     ` Yuanhan Liu
  1 sibling, 1 reply; 200+ results
From: Xu, Qian Q @ 2016-08-24  7:26 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: Maxime Coquelin

I wanted to apply the patch on the latest DPDK (see the commit ID below), but it failed since there are no vhost.h and vhost-user.h files. So do you have any dependency on other patches? 

commit 28d8abaf250c3fb4dcb6416518f4c54b4ae67205
Author: Deirdre O'Connor <deirdre.o.connor@intel.com>
Date:   Mon Aug 22 17:20:08 2016 +0100

    doc: fix patchwork link

    Fixes: 58abf6e77c6b ("doc: add contributors guide")

    Reported-by: Jon Loeliger <jdl@netgate.com>
    Signed-off-by: Deirdre O'Connor <deirdre.o.connor@intel.com>
    Acked-by: John McNamara <john.mcnamara@intel.com>


-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
Sent: Tuesday, August 23, 2016 4:11 PM
To: dev@dpdk.org
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
Subject: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling

For historical reasons (vhost-cuse came before vhost-user), some fields for maintaining the vhost-user memory mappings (such as the mmapped address and size, with which we can then unmap on destroy) are kept in the "orig_region_map" struct, a structure that is defined only in the vhost-user source file.

The right way to go is to remove the structure and move all those fields into the virtio_memory_region struct. But we simply couldn't do that before, because it would have broken the ABI.

Now, thanks to the ABI refactoring, it is no longer a blocking issue. And here it goes: this patch removes orig_region_map and redefines virtio_memory_region to include all the necessary info.

With that, we can simplify the guest/host address conversion a bit.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost.h      |  49 ++++++------
 lib/librte_vhost/vhost_user.c | 172 +++++++++++++++++-------------------------
 2 files changed, 90 insertions(+), 131 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c2dfc3c..df2107b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -143,12 +143,14 @@ struct virtio_net {
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
  */
-struct virtio_memory_regions {
-	uint64_t guest_phys_address;
-	uint64_t guest_phys_address_end;
-	uint64_t memory_size;
-	uint64_t userspace_address;
-	uint64_t address_offset;
+struct virtio_memory_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
 };
 
 
@@ -156,12 +158,8 @@ struct virtio_memory_regions {
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-	/* Base QEMU userspace address of the memory file. */
-	uint64_t base_address;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
 	uint32_t nregions;
-	struct virtio_memory_regions regions[0];
+	struct virtio_memory_region regions[0];
 };
 
 
@@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
 #define MAX_VHOST_DEVICE	1024
 extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
-/**
- * Function to convert guest physical addresses to vhost virtual addresses.
- * This is used to convert guest virtio buffer addresses.
- */
+/* Convert guest physical address to host virtual address */
 static inline uint64_t __attribute__((always_inline))
-gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
+gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
 {
-	struct virtio_memory_regions *region;
-	uint32_t regionidx;
-	uint64_t vhost_va = 0;
-
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((guest_pa >= region->guest_phys_address) &&
-			(guest_pa <= region->guest_phys_address_end)) {
-			vhost_va = region->address_offset + guest_pa;
-			break;
+	struct virtio_memory_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 struct virtio_net_device_ops const *notify_ops;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index eee99e9..d2071fd 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 };
 
-struct orig_region_map {
-	int fd;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
-	uint64_t blksz;
-};
-
-#define orig_region(ptr, nregions) \
-	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
-		sizeof(struct virtio_memory) + \
-		sizeof(struct virtio_memory_regions) * (nregions)))
-
 static uint64_t
 get_blk_size(int fd)
 {
@@ -99,18 +87,17 @@ get_blk_size(int fd)
 static void
 free_mem_region(struct virtio_net *dev)
 {
-	struct orig_region_map *region;
-	unsigned int idx;
+	uint32_t i;
+	struct virtio_memory_region *reg;
 
 	if (!dev || !dev->mem)
 		return;
 
-	region = orig_region(dev->mem, dev->mem->nregions);
-	for (idx = 0; idx < dev->mem->nregions; idx++) {
-		if (region[idx].mapped_address) {
-			munmap((void *)(uintptr_t)region[idx].mapped_address,
-					region[idx].mapped_size);
-			close(region[idx].fd);
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr) {
+			munmap(reg->mmap_addr, reg->mmap_size);
+			close(reg->fd);
 		}
 	}
 }
@@ -120,7 +107,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
 {
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 	if (dev->log_addr) {
@@ -286,25 +273,23 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
  * used to convert the ring addresses to our address space.
  */
 static uint64_t
-qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
+qva_to_vva(struct virtio_net *dev, uint64_t qva)
 {
-	struct virtio_memory_regions *region;
-	uint64_t vhost_va = 0;
-	uint32_t regionidx = 0;
+	struct virtio_memory_region *reg;
+	uint32_t i;
 
 	/* Find the region where the address lives. */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((qemu_va >= region->userspace_address) &&
-			(qemu_va <= region->userspace_address +
-			region->memory_size)) {
-			vhost_va = qemu_va + region->guest_phys_address +
-				region->address_offset -
-				region->userspace_address;
-			break;
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+
+		if (qva >= reg->guest_user_addr &&
+		    qva <  reg->guest_user_addr + reg->size) {
+			return qva - reg->guest_user_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 /*
@@ -391,11 +376,13 @@ static int
 vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
 	struct VhostUserMemory memory = pmsg->payload.memory;
-	struct virtio_memory_regions *pregion;
-	uint64_t mapped_address, mapped_size;
-	unsigned int idx = 0;
-	struct orig_region_map *pregion_orig;
+	struct virtio_memory_region *reg;
+	void *mmap_addr;
+	uint64_t mmap_size;
+	uint64_t mmap_offset;
 	uint64_t alignment;
+	uint32_t i;
+	int fd;
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
@@ -405,14 +392,12 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 
-	dev->mem = calloc(1,
-		sizeof(struct virtio_memory) +
-		sizeof(struct virtio_memory_regions) * memory.nregions +
-		sizeof(struct orig_region_map) * memory.nregions);
+	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_region) * memory.nregions, 0);
 	if (dev->mem == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to allocate memory for dev->mem\n",
@@ -421,22 +406,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	}
 	dev->mem->nregions = memory.nregions;
 
-	pregion_orig = orig_region(dev->mem, memory.nregions);
-	for (idx = 0; idx < memory.nregions; idx++) {
-		pregion = &dev->mem->regions[idx];
-		pregion->guest_phys_address =
-			memory.regions[idx].guest_phys_addr;
-		pregion->guest_phys_address_end =
-			memory.regions[idx].guest_phys_addr +
-			memory.regions[idx].memory_size;
-		pregion->memory_size =
-			memory.regions[idx].memory_size;
-		pregion->userspace_address =
-			memory.regions[idx].userspace_addr;
-
-		/* This is ugly */
-		mapped_size = memory.regions[idx].memory_size +
-			memory.regions[idx].mmap_offset;
+	for (i = 0; i < memory.nregions; i++) {
+		fd  = pmsg->fds[i];
+		reg = &dev->mem->regions[i];
+
+		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
+		reg->guest_user_addr = memory.regions[i].userspace_addr;
+		reg->size            = memory.regions[i].memory_size;
+		reg->fd              = fd;
+
+		mmap_offset = memory.regions[i].mmap_offset;
+		mmap_size   = reg->size + mmap_offset;
 
 		/* mmap() without flag of MAP_ANONYMOUS, should be called
 		 * with length argument aligned with hugepagesz at older
@@ -446,67 +426,51 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 		 * to avoid failure, make sure in caller to keep length
 		 * aligned.
 		 */
-		alignment = get_blk_size(pmsg->fds[idx]);
+		alignment = get_blk_size(fd);
 		if (alignment == (uint64_t)-1) {
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"couldn't get hugepage size through fstat\n");
 			goto err_mmap;
 		}
-		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
 
-		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
-			mapped_size,
-			PROT_READ | PROT_WRITE, MAP_SHARED,
-			pmsg->fds[idx],
-			0);
+		mmap_addr = mmap(NULL, mmap_size,
+				 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
-			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
-			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-			mapped_size, memory.regions[idx].mmap_offset,
-			alignment);
-
-		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
+		if (mmap_addr == MAP_FAILED) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"mmap qemu guest failed.\n");
+				"mmap region %u failed.\n", i);
 			goto err_mmap;
 		}
 
-		pregion_orig[idx].mapped_address = mapped_address;
-		pregion_orig[idx].mapped_size = mapped_size;
-		pregion_orig[idx].blksz = alignment;
-		pregion_orig[idx].fd = pmsg->fds[idx];
-
-		mapped_address +=  memory.regions[idx].mmap_offset;
+		reg->mmap_addr = mmap_addr;
+		reg->mmap_size = mmap_size;
+		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
 
-		pregion->address_offset = mapped_address -
-			pregion->guest_phys_address;
-
-		if (memory.regions[idx].guest_phys_addr == 0) {
-			dev->mem->base_address =
-				memory.regions[idx].userspace_addr;
-			dev->mem->mapped_address =
-				pregion->address_offset;
-		}
-
-		LOG_DEBUG(VHOST_CONFIG,
-			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
-			idx,
-			(void *)(uintptr_t)pregion->guest_phys_address,
-			(void *)(uintptr_t)pregion->userspace_address,
-			 pregion->memory_size);
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"guest memory region %u, size: 0x%" PRIx64 "\n"
+			"\t guest physical addr: 0x%" PRIx64 "\n"
+			"\t guest virtual  addr: 0x%" PRIx64 "\n"
+			"\t host  virtual  addr: 0x%" PRIx64 "\n"
+			"\t mmap addr : 0x%" PRIx64 "\n"
+			"\t mmap size : 0x%" PRIx64 "\n"
+			"\t mmap align: 0x%" PRIx64 "\n"
+			"\t mmap off  : 0x%" PRIx64 "\n",
+			i, reg->size,
+			reg->guest_phys_addr,
+			reg->guest_user_addr,
+			reg->host_user_addr,
+			(uint64_t)(uintptr_t)mmap_addr,
+			mmap_size,
+			alignment,
+			mmap_offset);
 	}
 
 	return 0;
 
 err_mmap:
-	while (idx--) {
-		munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
-				pregion_orig[idx].mapped_size);
-		close(pregion_orig[idx].fd);
-	}
-	free(dev->mem);
+	free_mem_region(dev);
+	rte_free(dev->mem);
 	dev->mem = NULL;
 	return -1;
 }
--
1.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
  2016-08-23  8:10  3% ` [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling Yuanhan Liu
@ 2016-08-23  9:17  0%   ` Maxime Coquelin
  2016-08-24  7:26  3%   ` Xu, Qian Q
  1 sibling, 0 replies; 200+ results
From: Maxime Coquelin @ 2016-08-23  9:17 UTC (permalink / raw)
  To: Yuanhan Liu, dev



On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
> For historical reasons (vhost-cuse came before vhost-user), some
> fields for maintaining the vhost-user memory mappings (such as the
> mmapped address and size, with which we can then unmap on destroy)
> are kept in the "orig_region_map" struct, a structure that is defined
> only in the vhost-user source file.
>
> The right way to go is to remove the structure and move all those fields
> into the virtio_memory_region struct. But we simply couldn't do that
> before, because it would have broken the ABI.
>
> Now, thanks to the ABI refactoring, it is no longer a blocking issue.
> And here it goes: this patch removes orig_region_map and redefines
> virtio_memory_region to include all the necessary info.
>
> With that, we can simplify the guest/host address conversion a bit.
>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  lib/librte_vhost/vhost.h      |  49 ++++++------
>  lib/librte_vhost/vhost_user.c | 172 +++++++++++++++++-------------------------
>  2 files changed, 90 insertions(+), 131 deletions(-)
>

Thanks for explaining the history behind this.
FWIW, the change looks good to me:

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling
  @ 2016-08-23  8:10  3% ` Yuanhan Liu
  2016-08-23  9:17  0%   ` Maxime Coquelin
  2016-08-24  7:26  3%   ` Xu, Qian Q
    1 sibling, 2 replies; 200+ results
From: Yuanhan Liu @ 2016-08-23  8:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Yuanhan Liu

For historical reasons (vhost-cuse came before vhost-user), some
fields for maintaining the vhost-user memory mappings (such as the
mmapped address and size, with which we can then unmap on destroy)
are kept in the "orig_region_map" struct, a structure that is defined
only in the vhost-user source file.

The right way to go is to remove the structure and move all those fields
into the virtio_memory_region struct. But we simply couldn't do that
before, because it would have broken the ABI.

Now, thanks to the ABI refactoring, it is no longer a blocking issue.
And here it goes: this patch removes orig_region_map and redefines
virtio_memory_region to include all the necessary info.

With that, we can simplify the guest/host address conversion a bit.
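
As a worked example (numbers made up for illustration), the conversion
becomes a single linear remap within the matching region:

    /* region: guest_phys_addr = 0x40000000, size = 0x10000000,
     *         host_user_addr  = 0x7f0000000000
     *
     * for gpa = 0x40001000 (inside the region):
     *   vva = gpa - guest_phys_addr + host_user_addr
     *       = 0x1000 + 0x7f0000000000
     *       = 0x7f0000001000
     */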

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost.h      |  49 ++++++------
 lib/librte_vhost/vhost_user.c | 172 +++++++++++++++++-------------------------
 2 files changed, 90 insertions(+), 131 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c2dfc3c..df2107b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -143,12 +143,14 @@ struct virtio_net {
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
  */
-struct virtio_memory_regions {
-	uint64_t guest_phys_address;
-	uint64_t guest_phys_address_end;
-	uint64_t memory_size;
-	uint64_t userspace_address;
-	uint64_t address_offset;
+struct virtio_memory_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
 };
 
 
@@ -156,12 +158,8 @@ struct virtio_memory_regions {
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-	/* Base QEMU userspace address of the memory file. */
-	uint64_t base_address;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
 	uint32_t nregions;
-	struct virtio_memory_regions regions[0];
+	struct virtio_memory_region regions[0];
 };
 
 
@@ -200,26 +198,23 @@ extern uint64_t VHOST_FEATURES;
 #define MAX_VHOST_DEVICE	1024
 extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
-/**
- * Function to convert guest physical addresses to vhost virtual addresses.
- * This is used to convert guest virtio buffer addresses.
- */
+/* Convert guest physical address to host virtual address */
 static inline uint64_t __attribute__((always_inline))
-gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
+gpa_to_vva(struct virtio_net *dev, uint64_t gpa)
 {
-	struct virtio_memory_regions *region;
-	uint32_t regionidx;
-	uint64_t vhost_va = 0;
-
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((guest_pa >= region->guest_phys_address) &&
-			(guest_pa <= region->guest_phys_address_end)) {
-			vhost_va = region->address_offset + guest_pa;
-			break;
+	struct virtio_memory_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 struct virtio_net_device_ops const *notify_ops;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index eee99e9..d2071fd 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -74,18 +74,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 };
 
-struct orig_region_map {
-	int fd;
-	uint64_t mapped_address;
-	uint64_t mapped_size;
-	uint64_t blksz;
-};
-
-#define orig_region(ptr, nregions) \
-	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
-		sizeof(struct virtio_memory) + \
-		sizeof(struct virtio_memory_regions) * (nregions)))
-
 static uint64_t
 get_blk_size(int fd)
 {
@@ -99,18 +87,17 @@ get_blk_size(int fd)
 static void
 free_mem_region(struct virtio_net *dev)
 {
-	struct orig_region_map *region;
-	unsigned int idx;
+	uint32_t i;
+	struct virtio_memory_region *reg;
 
 	if (!dev || !dev->mem)
 		return;
 
-	region = orig_region(dev->mem, dev->mem->nregions);
-	for (idx = 0; idx < dev->mem->nregions; idx++) {
-		if (region[idx].mapped_address) {
-			munmap((void *)(uintptr_t)region[idx].mapped_address,
-					region[idx].mapped_size);
-			close(region[idx].fd);
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr) {
+			munmap(reg->mmap_addr, reg->mmap_size);
+			close(reg->fd);
 		}
 	}
 }
@@ -120,7 +107,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
 {
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 	if (dev->log_addr) {
@@ -286,25 +273,23 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
  * used to convert the ring addresses to our address space.
  */
 static uint64_t
-qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
+qva_to_vva(struct virtio_net *dev, uint64_t qva)
 {
-	struct virtio_memory_regions *region;
-	uint64_t vhost_va = 0;
-	uint32_t regionidx = 0;
+	struct virtio_memory_region *reg;
+	uint32_t i;
 
 	/* Find the region where the address lives. */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		region = &dev->mem->regions[regionidx];
-		if ((qemu_va >= region->userspace_address) &&
-			(qemu_va <= region->userspace_address +
-			region->memory_size)) {
-			vhost_va = qemu_va + region->guest_phys_address +
-				region->address_offset -
-				region->userspace_address;
-			break;
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+
+		if (qva >= reg->guest_user_addr &&
+		    qva <  reg->guest_user_addr + reg->size) {
+			return qva - reg->guest_user_addr +
+			       reg->host_user_addr;
 		}
 	}
-	return vhost_va;
+
+	return 0;
 }
 
 /*
@@ -391,11 +376,13 @@ static int
 vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
 	struct VhostUserMemory memory = pmsg->payload.memory;
-	struct virtio_memory_regions *pregion;
-	uint64_t mapped_address, mapped_size;
-	unsigned int idx = 0;
-	struct orig_region_map *pregion_orig;
+	struct virtio_memory_region *reg;
+	void *mmap_addr;
+	uint64_t mmap_size;
+	uint64_t mmap_offset;
 	uint64_t alignment;
+	uint32_t i;
+	int fd;
 
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
@@ -405,14 +392,12 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 
 	if (dev->mem) {
 		free_mem_region(dev);
-		free(dev->mem);
+		rte_free(dev->mem);
 		dev->mem = NULL;
 	}
 
-	dev->mem = calloc(1,
-		sizeof(struct virtio_memory) +
-		sizeof(struct virtio_memory_regions) * memory.nregions +
-		sizeof(struct orig_region_map) * memory.nregions);
+	dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_region) * memory.nregions, 0);
 	if (dev->mem == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to allocate memory for dev->mem\n",
@@ -421,22 +406,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	}
 	dev->mem->nregions = memory.nregions;
 
-	pregion_orig = orig_region(dev->mem, memory.nregions);
-	for (idx = 0; idx < memory.nregions; idx++) {
-		pregion = &dev->mem->regions[idx];
-		pregion->guest_phys_address =
-			memory.regions[idx].guest_phys_addr;
-		pregion->guest_phys_address_end =
-			memory.regions[idx].guest_phys_addr +
-			memory.regions[idx].memory_size;
-		pregion->memory_size =
-			memory.regions[idx].memory_size;
-		pregion->userspace_address =
-			memory.regions[idx].userspace_addr;
-
-		/* This is ugly */
-		mapped_size = memory.regions[idx].memory_size +
-			memory.regions[idx].mmap_offset;
+	for (i = 0; i < memory.nregions; i++) {
+		fd  = pmsg->fds[i];
+		reg = &dev->mem->regions[i];
+
+		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
+		reg->guest_user_addr = memory.regions[i].userspace_addr;
+		reg->size            = memory.regions[i].memory_size;
+		reg->fd              = fd;
+
+		mmap_offset = memory.regions[i].mmap_offset;
+		mmap_size   = reg->size + mmap_offset;
 
 		/* mmap() without flag of MAP_ANONYMOUS, should be called
 		 * with length argument aligned with hugepagesz at older
@@ -446,67 +426,51 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 		 * to avoid failure, make sure in caller to keep length
 		 * aligned.
 		 */
-		alignment = get_blk_size(pmsg->fds[idx]);
+		alignment = get_blk_size(fd);
 		if (alignment == (uint64_t)-1) {
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"couldn't get hugepage size through fstat\n");
 			goto err_mmap;
 		}
-		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
 
-		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
-			mapped_size,
-			PROT_READ | PROT_WRITE, MAP_SHARED,
-			pmsg->fds[idx],
-			0);
+		mmap_addr = mmap(NULL, mmap_size,
+				 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
-			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
-			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-			mapped_size, memory.regions[idx].mmap_offset,
-			alignment);
-
-		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
+		if (mmap_addr == MAP_FAILED) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"mmap qemu guest failed.\n");
+				"mmap region %u failed.\n", i);
 			goto err_mmap;
 		}
 
-		pregion_orig[idx].mapped_address = mapped_address;
-		pregion_orig[idx].mapped_size = mapped_size;
-		pregion_orig[idx].blksz = alignment;
-		pregion_orig[idx].fd = pmsg->fds[idx];
-
-		mapped_address +=  memory.regions[idx].mmap_offset;
+		reg->mmap_addr = mmap_addr;
+		reg->mmap_size = mmap_size;
+		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
 
-		pregion->address_offset = mapped_address -
-			pregion->guest_phys_address;
-
-		if (memory.regions[idx].guest_phys_addr == 0) {
-			dev->mem->base_address =
-				memory.regions[idx].userspace_addr;
-			dev->mem->mapped_address =
-				pregion->address_offset;
-		}
-
-		LOG_DEBUG(VHOST_CONFIG,
-			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
-			idx,
-			(void *)(uintptr_t)pregion->guest_phys_address,
-			(void *)(uintptr_t)pregion->userspace_address,
-			 pregion->memory_size);
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"guest memory region %u, size: 0x%" PRIx64 "\n"
+			"\t guest physical addr: 0x%" PRIx64 "\n"
+			"\t guest virtual  addr: 0x%" PRIx64 "\n"
+			"\t host  virtual  addr: 0x%" PRIx64 "\n"
+			"\t mmap addr : 0x%" PRIx64 "\n"
+			"\t mmap size : 0x%" PRIx64 "\n"
+			"\t mmap align: 0x%" PRIx64 "\n"
+			"\t mmap off  : 0x%" PRIx64 "\n",
+			i, reg->size,
+			reg->guest_phys_addr,
+			reg->guest_user_addr,
+			reg->host_user_addr,
+			(uint64_t)(uintptr_t)mmap_addr,
+			mmap_size,
+			alignment,
+			mmap_offset);
 	}
 
 	return 0;
 
 err_mmap:
-	while (idx--) {
-		munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
-				pregion_orig[idx].mapped_size);
-		close(pregion_orig[idx].fd);
-	}
-	free(dev->mem);
+	free_mem_region(dev);
+	rte_free(dev->mem);
 	dev->mem = NULL;
 	return -1;
 }
-- 
1.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC v2] ethdev: introduce generic flow API
  2016-08-19 19:32  1%   ` [dpdk-dev] [RFC v2] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-08-22 18:20  0%     ` John Fastabend
  0 siblings, 0 replies; 200+ results
From: John Fastabend @ 2016-08-22 18:20 UTC (permalink / raw)
  To: Adrien Mazarguil, dev

On 16-08-19 12:32 PM, Adrien Mazarguil wrote:
> This new API supersedes all the legacy filter types described in
> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> PMDs to process and validate flow rules.
> 
> It has the following benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> The existing filter types will be deprecated and removed in the near
> future.
> 
> Note that it is not complete yet. This commit only provides the header
> file. The specification is provided separately, see below.
> 
> HTML version:
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
> 
> PDF version:
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
> 
> Git tree:
>  https://github.com/6WIND/rte_flow
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---

Hi Adrien,

[...]

> +
> +/**
> + * Flow rule attributes.
> + *
> + * Priorities are set on two levels: per group and per rule within groups.
> + *
> + * Lower values denote higher priority, the highest priority for both levels
> + * is 0, so that a rule with priority 0 in group 8 is always matched after a
> + * rule with priority 8 in group 0.
> + *
> + * Although optional, applications are encouraged to group similar rules as
> + * much as possible to fully take advantage of hardware capabilities
> + * (e.g. optimized matching) and work around limitations (e.g. a single
> + * pattern type possibly allowed in a given group).
> + *
> + * Group and priority levels are arbitrary and up to the application, they
> + * do not need to be contiguous nor start from 0, however the maximum number
> + * varies between devices and may be affected by existing flow rules.
> + *

Another pattern I just want to note, which I think can be covered, is
mapping rules between groups.

The idea is that if we build a "tunnel-endpoint" group, a rule matched
in the tunnel-endpoint group might map onto a "switch" group. In this
case the "switch" group match should depend on a rule in the
"tunnel-endpoint" group, meaning the TEP selects the switch. I believe
this can be done with a metadata action.

My idea is to create a rule in the "tunnel-endpoint" group that matches
on the TEP address and then carries an action "send-to-switch group" +
"metadata set 0x1234". Then, in the "switch" group, add a match
"metadata eq 0x1234"; this allows linking groups together (a rough
sketch follows below).
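
For illustration, a rough sketch of the two linked rules (hypothetical
only: the METADATA item/action and the GROUP "jump" action below do not
exist in the RFC, and tep_ip, tep_mask, l2_spec, l2_mask, group1 and
port_id are assumed to be defined elsewhere):

    uint32_t md = 0x1234;    /* metadata tag shared by both rules */
    struct rte_flow_error err;

    /* Group 0 ("tunnel-endpoint"): match the TEP address, tag the
     * packet and defer the rest of the lookup to group 1. */
    struct rte_flow_attr tep_attr = { .group = 0, .ingress = 1 };
    struct rte_flow_item tep_pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &tep_ip, .mask = &tep_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action tep_actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_METADATA, .conf = &md },  /* hypothetical */
        { .type = RTE_FLOW_ACTION_TYPE_GROUP, .conf = &group1 }, /* hypothetical */
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    rte_flow_create(port_id, &tep_attr, tep_pattern, tep_actions, &err);

    /* Group 1 ("switch"): only packets tagged 0x1234 above can match
     * here, which is what links the two groups together. */
    struct rte_flow_attr sw_attr = { .group = 1, .ingress = 1 };
    struct rte_flow_item sw_pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_METADATA, .spec = &md },    /* hypothetical */
        { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &l2_spec, .mask = &l2_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    /* ... the usual fate actions (queue, drop, ...) complete the rule. */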

It certainly doesn't all need to be in the first iteration of this
series, but do you think this is reasonable as a TODO/future extension?
And if we standardize around group IDs, the semantics should be
consistent for at least the set of NICs that support tunnel endpoints
and multiple switches.

Any thoughts?


.John

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] mk: gcc -march support for intel processors code names
  @ 2016-08-22 14:19  7% ` Reshma Pattan
  2016-10-10 21:33  8%   ` [dpdk-dev] [PATCH v3] " Reshma Pattan
  0 siblings, 1 reply; 200+ results
From: Reshma Pattan @ 2016-08-22 14:19 UTC (permalink / raw)
  To: dev; +Cc: Reshma Pattan

The GCC 4.9 -march option supports the Intel code names for processors,
for example -march=silvermont, -march=broadwell.
The RTE_MACHINE config flag can be used to pass a code name to
the compiler as the -march flag. The compatibility code for older GCC
versions on the Intel platform is also removed from
mk/toolchain/gcc/rte.toolchain-compat.mk.
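
For example (illustrative; "broadwell" stands in for any code name GCC
4.9 understands), a build could select a code name like this:

    make config T=x86_64-native-linuxapp-gcc
    # then, in the generated .config (or in the target defconfig):
    CONFIG_RTE_MACHINE="broadwell"
    # with which the compiler ends up being invoked as:
    gcc -march=broadwell ...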

The release notes are updated.

The Linux and FreeBSD getting started guides are updated to recommend
GCC version 4.9 and above.

Some of the gmake command examples in the sample application guide and
driver guides are updated to GCC version 4.9.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
---
 doc/guides/freebsd_gsg/build_dpdk.rst        |  4 +--
 doc/guides/freebsd_gsg/build_sample_apps.rst |  6 ++--
 doc/guides/linux_gsg/sys_reqs.rst            |  6 ++--
 doc/guides/nics/bnx2x.rst                    |  4 +--
 doc/guides/nics/qede.rst                     |  2 +-
 doc/guides/rel_notes/release_16_11.rst       |  5 +++
 mk/target/generic/rte.vars.mk                |  4 +++
 mk/toolchain/gcc/rte.toolchain-compat.mk     | 47 ++--------------------------
 8 files changed, 22 insertions(+), 56 deletions(-)

v2:
Updated the Linux and FreeBSD GSG guides, the sample application guide
and other driver docs with the recommended GCC version 4.9 and above.

diff --git a/doc/guides/freebsd_gsg/build_dpdk.rst b/doc/guides/freebsd_gsg/build_dpdk.rst
index 93c4366..7d5e9dc 100644
--- a/doc/guides/freebsd_gsg/build_dpdk.rst
+++ b/doc/guides/freebsd_gsg/build_dpdk.rst
@@ -88,7 +88,7 @@ The ports required and their locations are as follows:
 For compiling and using the DPDK with gcc, the compiler must be installed
 from the ports collection:
 
-* gcc: version 4.8 is recommended ``/usr/ports/lang/gcc48``.
+* gcc: version 4.9 is recommended ``/usr/ports/lang/gcc49``.
   Ensure that ``CPU_OPTS`` is selected (default is OFF).
 
 When running the make config-recursive command, a dialog may be presented to the
@@ -168,7 +168,7 @@ For example to compile for FreeBSD use:
    If the compiler binary to be used does not correspond to that given in the
    TOOLCHAIN part of the target, the compiler command may need to be explicitly
    specified. For example, if compiling for gcc, where the gcc binary is called
-   gcc4.8, the command would need to be ``gmake install T=<target> CC=gcc4.8``.
+   gcc4.9, the command would need to be ``gmake install T=<target> CC=gcc4.9``.
 
 Browsing the Installed DPDK Environment Target
 ----------------------------------------------
diff --git a/doc/guides/freebsd_gsg/build_sample_apps.rst b/doc/guides/freebsd_gsg/build_sample_apps.rst
index 2662303..fffc4c0 100644
--- a/doc/guides/freebsd_gsg/build_sample_apps.rst
+++ b/doc/guides/freebsd_gsg/build_sample_apps.rst
@@ -54,7 +54,7 @@ the following variables must be exported:
 
 The following is an example of creating the ``helloworld`` application, which runs
 in the DPDK FreeBSD environment. While the example demonstrates compiling
-using gcc version 4.8, compiling with clang will be similar, except that the ``CC=``
+using gcc version 4.9, compiling with clang will be similar, except that the ``CC=``
 parameter can probably be omitted. The ``helloworld`` example may be found in the
 ``${RTE_SDK}/examples`` directory.
 
@@ -72,7 +72,7 @@ in the build directory.
     setenv RTE_SDK $HOME/DPDK
     setenv RTE_TARGET x86_64-native-bsdapp-gcc
 
-    gmake CC=gcc48
+    gmake CC=gcc49
       CC main.o
       LD helloworld
       INSTALL-APP helloworld
@@ -96,7 +96,7 @@ in the build directory.
     cd my_rte_app/
     setenv RTE_TARGET x86_64-native-bsdapp-gcc
 
-    gmake CC=gcc48
+    gmake CC=gcc49
       CC main.o
       LD helloworld
       INSTALL-APP helloworld
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index b321544..3d74342 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -61,8 +61,8 @@ Compilation of the DPDK
 
 *   coreutils: ``cmp``, ``sed``, ``grep``, ``arch``, etc.
 
-*   gcc: versions 4.5.x or later is recommended for ``i686/x86_64``. Versions 4.8.x or later is recommended
-    for ``ppc_64`` and ``x86_x32`` ABI. On some distributions, some specific compiler flags and linker flags are enabled by
+*   gcc: versions 4.9 or later is recommended for all platforms.
+    On some distributions, some specific compiler flags and linker flags are enabled by
     default and affect performance (``-fstack-protector``, for example). Please refer to the documentation
     of your distribution and to ``gcc -dumpspecs``.
 
@@ -82,7 +82,7 @@ Compilation of the DPDK
 .. note::
 
     x86_x32 ABI is currently supported with distribution packages only on Ubuntu
-    higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.8+.
+    higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.
 
 .. note::
 
diff --git a/doc/guides/nics/bnx2x.rst b/doc/guides/nics/bnx2x.rst
index 6453168..6d1768a 100644
--- a/doc/guides/nics/bnx2x.rst
+++ b/doc/guides/nics/bnx2x.rst
@@ -162,7 +162,7 @@ To compile BNX2X PMD for FreeBSD x86_64 gcc target, run the following "gmake"
 command::
 
    cd <DPDK-source-directory>
-   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc48 CC=gcc48
+   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc49 CC=gcc49
 
 To compile BNX2X PMD for FreeBSD x86_64 gcc target, run the following "gmake"
 command:
@@ -170,7 +170,7 @@ command:
 .. code-block:: console
 
    cd <DPDK-source-directory>
-   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc48 CC=gcc48
+   gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=/usr/local/lib/gcc49 CC=gcc49
 
 Linux
 -----
diff --git a/doc/guides/nics/qede.rst b/doc/guides/nics/qede.rst
index 53d749c..3af755e 100644
--- a/doc/guides/nics/qede.rst
+++ b/doc/guides/nics/qede.rst
@@ -150,7 +150,7 @@ command::
 
    cd <DPDK-source-directory>
    gmake config T=x86_64-native-bsdapp-gcc install -Wl,-rpath=\
-                                        /usr/local/lib/gcc48 CC=gcc48
+                                        /usr/local/lib/gcc49 CC=gcc49
 
 
 Sample Application Notes
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 0b9022d..9f58133 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -36,6 +36,11 @@ New Features
 
      This section is a comment. Make sure to start the actual text at the margin.
 
+* **Added support for new gcc -march option.**
+
+  The GCC 4.9 ``-march`` option supports the Intel processor code names.
+  The config option ``RTE_MACHINE`` can be used to pass code names to the compiler as ``-march`` flag.
+
 
 Resolved Issues
 ---------------
diff --git a/mk/target/generic/rte.vars.mk b/mk/target/generic/rte.vars.mk
index 75a616a..b31e426 100644
--- a/mk/target/generic/rte.vars.mk
+++ b/mk/target/generic/rte.vars.mk
@@ -50,7 +50,11 @@
 #   - can define CPU_ASFLAGS variable (overriden by cmdline value) that
 #     overrides the one defined in arch.
 #
+ifneq ($(wildcard $(RTE_SDK)/mk/machine/$(RTE_MACHINE)/rte.vars.mk),)
 include $(RTE_SDK)/mk/machine/$(RTE_MACHINE)/rte.vars.mk
+else
+MACHINE_CFLAGS := -march=$(RTE_MACHINE)
+endif
 
 #
 # arch:
diff --git a/mk/toolchain/gcc/rte.toolchain-compat.mk b/mk/toolchain/gcc/rte.toolchain-compat.mk
index 6eed20c..7f23721 100644
--- a/mk/toolchain/gcc/rte.toolchain-compat.mk
+++ b/mk/toolchain/gcc/rte.toolchain-compat.mk
@@ -42,51 +42,8 @@ GCC_MAJOR = $(shell echo __GNUC__ | $(CC) -E -x c - | tail -n 1)
 GCC_MINOR = $(shell echo __GNUC_MINOR__ | $(CC) -E -x c - | tail -n 1)
 GCC_VERSION = $(GCC_MAJOR)$(GCC_MINOR)
 
-# if GCC is older than 4.x
-ifeq ($(shell test $(GCC_VERSION) -lt 40 && echo 1), 1)
-	MACHINE_CFLAGS =
-$(warning You are using GCC < 4.x. This is neither supported, nor tested.)
-
-
-else
-# GCC graceful degradation
-# GCC 4.2.x - added support for generic target
-# GCC 4.3.x - added support for core2, ssse3, sse4.1, sse4.2
-# GCC 4.4.x - added support for avx, aes, pclmul
-# GCC 4.5.x - added support for atom
-# GCC 4.6.x - added support for corei7, corei7-avx
-# GCC 4.7.x - added support for fsgsbase, rdrnd, f16c, core-avx-i, core-avx2
 # GCC 4.9.x - added support for armv8-a+crc
 #
-	ifeq ($(shell test $(GCC_VERSION) -le 49 && echo 1), 1)
-		MACHINE_CFLAGS := $(patsubst -march=armv8-a+crc,-march=armv8-a+crc -D__ARM_FEATURE_CRC32=1,$(MACHINE_CFLAGS))
-	endif
-	ifeq ($(shell test $(GCC_VERSION) -le 47 && echo 1), 1)
-		MACHINE_CFLAGS := $(patsubst -march=core-avx-i,-march=corei7-avx,$(MACHINE_CFLAGS))
-		MACHINE_CFLAGS := $(patsubst -march=core-avx2,-march=core-avx2,$(MACHINE_CFLAGS))
-	endif
-	ifeq ($(shell test $(GCC_VERSION) -lt 46 && echo 1), 1)
-		MACHINE_CFLAGS := $(patsubst -march=corei7-avx,-march=core2 -maes -mpclmul -mavx,$(MACHINE_CFLAGS))
-		MACHINE_CFLAGS := $(patsubst -march=corei7,-march=core2 -maes -mpclmul,$(MACHINE_CFLAGS))
-	endif
-	ifeq ($(shell test $(GCC_VERSION) -lt 45 && echo 1), 1)
-		MACHINE_CFLAGS := $(patsubst -march=atom,-march=core2 -mssse3,$(MACHINE_CFLAGS))
-	endif
-	ifeq ($(shell test $(GCC_VERSION) -lt 44 && echo 1), 1)
-		MACHINE_CFLAGS := $(filter-out -mavx -mpclmul -maes,$(MACHINE_CFLAGS))
-		ifneq ($(findstring SSE4_2, $(CPUFLAGS)),)
-			MACHINE_CFLAGS += -msse4.2
-		endif
-		ifneq ($(findstring SSE4_1, $(CPUFLAGS)),)
-			MACHINE_CFLAGS += -msse4.1
-		endif
-	endif
-	ifeq ($(shell test $(GCC_VERSION) -lt 43 && echo 1), 1)
-		MACHINE_CFLAGS := $(filter-out -msse% -mssse%,$(MACHINE_CFLAGS))
-		MACHINE_CFLAGS := $(patsubst -march=core2,-march=generic,$(MACHINE_CFLAGS))
-		MACHINE_CFLAGS += -msse3
-	endif
-	ifeq ($(shell test $(GCC_VERSION) -lt 42 && echo 1), 1)
-		MACHINE_CFLAGS := $(filter-out -march% -mtune% -msse%,$(MACHINE_CFLAGS))
-	endif
+ifeq ($(shell test $(GCC_VERSION) -le 49 && echo 1), 1)
+MACHINE_CFLAGS := $(patsubst -march=armv8-a+crc,-march=armv8-a+crc -D__ARM_FEATURE_CRC32=1,$(MACHINE_CFLAGS))
 endif
-- 
2.7.4

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] Best Practices for PMD Verification before Upstream Requests
  2016-08-17 12:34  3% [dpdk-dev] Best Practices for PMD Verification before Upstream Requests Shepard Siegel
@ 2016-08-22 13:07  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-08-22 13:07 UTC (permalink / raw)
  To: Shepard Siegel; +Cc: dev

2016-08-17 08:34, Shepard Siegel:
> Atomic Rules is new to the DPDK community. We attended the DPDK Summit last
> week and received terrific advice and encouragement. We are developing a
> DPDK PMD for our Arkville product which is a DPDK-aware data mover, capable
> of marshaling packets between FPGA/ASIC gates with AXI interfaces on one
> side, and the DPDK API/ABI on the other. Arkville plus a MAC looks like a
> line-rate-agnostic bare-bones L2 NIC. We have testpmd and our first DPDK
> applications running using our early-alpha Arkville PMD.

Welcome :)

Any release targeted for upstream support?

> This post is to ask of the DPDK community what tests, regressions,
> check-lists or similar verification assets we might work through before
> starting the process to upstream our code? We know device-specific PMDs are
> rather cloistered and unlikely to interfere; but still, others must have
> managed to find a way to fail with even an L2 baseline NIC.  We don’t want
> to needlessly repeat those mistakes. Any DPDK-specific collateral that we
> can use to verify and validate our codes before attempting to upstream them
> would be greatly appreciated. To the DPDK PMD developers, what can you
> share so that we are more aligned with your regressions? To the DPDK
> application developers, what’s your top gripe we might try to avoid in our
> Arkville L2 baseline PMD?

Are you aware of the DPDK test suite?
	http://dpdk.org/doc/dts/gsg/
	http://dpdk.org/browse/tools/dts/

I don't know how useful it is for PMD developers, or who uses it.
I guess that DTS authors would like to get more feedback.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC v2] ethdev: introduce generic flow API
  2016-08-19 19:32  2% ` [dpdk-dev] [RFC v2] " Adrien Mazarguil
@ 2016-08-19 19:32  1%   ` Adrien Mazarguil
  2016-08-22 18:20  0%     ` John Fastabend
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-08-19 19:32 UTC (permalink / raw)
  To: dev

This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

It has the following benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

The existing filter types will be deprecated and removed in the near
future.

Note that it is not complete yet. This commit only provides the header
file. The specification is provided separately, see below.

HTML version:
 https://rawgit.com/6WIND/rte_flow/master/rte_flow.html

PDF version:
 https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf

Git tree:
 https://github.com/6WIND/rte_flow

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_ether/Makefile   |   2 +
 lib/librte_ether/rte_flow.h | 941 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 943 insertions(+)

diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 0bb5dc9..a6f7cd5 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -52,8 +52,10 @@ SYMLINK-y-include += rte_ether.h
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
+DEPDIRS-y += lib/librte_net
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
new file mode 100644
index 0000000..0aa6094
--- /dev/null
+++ b/lib/librte_ether/rte_flow.h
@@ -0,0 +1,941 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_H_
+#define RTE_FLOW_H_
+
+/**
+ * @file
+ * RTE generic flow API
+ *
+ * This interface provides the ability to program packet matching and
+ * associated actions in hardware through flow rules.
+ */
+
+#include <rte_arp.h>
+#include <rte_ether.h>
+#include <rte_icmp.h>
+#include <rte_ip.h>
+#include <rte_sctp.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Flow rule attributes.
+ *
+ * Priorities are set on two levels: per group and per rule within groups.
+ *
+ * Lower values denote higher priority, the highest priority for both levels
+ * is 0, so that a rule with priority 0 in group 8 is always matched after a
+ * rule with priority 8 in group 0.
+ *
+ * Although optional, applications are encouraged to group similar rules as
+ * much as possible to fully take advantage of hardware capabilities
+ * (e.g. optimized matching) and work around limitations (e.g. a single
+ * pattern type possibly allowed in a given group).
+ *
+ * Group and priority levels are arbitrary and up to the application, they
+ * do not need to be contiguous nor start from 0, however the maximum number
+ * varies between devices and may be affected by existing flow rules.
+ *
+ * If a packet is matched by several rules of a given group for a given
+ * priority level, the outcome is undefined. It can take any path, may be
+ * duplicated or even cause unrecoverable errors.
+ *
+ * Note that support for more than a single group and priority level is not
+ * guaranteed.
+ *
+ * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+ *
+ * Several pattern items and actions are valid and can be used in both
+ * directions. Those valid for only one direction are described as such.
+ *
+ * Specifying both directions at once is not recommended but may be valid in
+ * some cases, such as incrementing the same counter twice.
+ *
+ * Not specifying any direction is currently an error.
+ */
+struct rte_flow_attr {
+	uint32_t group; /**< Priority group. */
+	uint32_t priority; /**< Priority level within group. */
+	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
+	uint32_t egress:1; /**< Rule applies to egress traffic. */
+	uint32_t reserved:30; /**< Reserved, must be zero. */
+};
+
+/**
+ * Matching pattern item types.
+ *
+ * Items are arranged in a list to form a matching pattern for packets.
+ * They fall in two categories:
+ *
+ * - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, SCTP,
+ *   VXLAN and so on), usually associated with a specification
+ *   structure. These must be stacked in the same order as the protocol
+ *   layers to match, starting from L2.
+ *
+ * - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF, PORT
+ *   and so on), often without a specification structure. Since they are
+ *   meta data that does not match packet contents, these can be specified
+ *   anywhere within item lists without affecting the protocol matching
+ *   items.
+ *
+ * See the description of individual types for more information. Those
+ * marked with [META] fall into the second category.
+ */
+enum rte_flow_item_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for item lists. Prevents further processing of items,
+	 * thereby ending the pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_VOID,
+
+	/**
+	 * [META]
+	 *
+	 * Inverted matching, i.e. process packets that do not match the
+	 * pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_INVERT,
+
+	/**
+	 * Matches any protocol in place of the current layer, a single ANY
+	 * may also stand for several protocol layers.
+	 *
+	 * See struct rte_flow_item_any.
+	 */
+	RTE_FLOW_ITEM_TYPE_ANY,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to the physical function of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a PF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_PF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to a virtual function ID of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a VF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * See struct rte_flow_item_vf.
+	 */
+	RTE_FLOW_ITEM_TYPE_VF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets coming from the specified physical port of the
+	 * underlying device.
+	 *
+	 * The first PORT item overrides the physical port normally
+	 * associated with the specified DPDK input port (port_id). This
+	 * item can be provided several times to match additional physical
+	 * ports.
+	 *
+	 * See struct rte_flow_item_port.
+	 */
+	RTE_FLOW_ITEM_TYPE_PORT,
+
+	/**
+	 * Matches a byte string of a given length at a given offset.
+	 *
+	 * See struct rte_flow_item_raw.
+	 */
+	RTE_FLOW_ITEM_TYPE_RAW,
+
+	/**
+	 * Matches an Ethernet header.
+	 *
+	 * See struct rte_flow_item_eth.
+	 */
+	RTE_FLOW_ITEM_TYPE_ETH,
+
+	/**
+	 * Matches an IPv4 header.
+	 *
+	 * See struct rte_flow_item_ipv4.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV4,
+
+	/**
+	 * Matches an IPv6 header.
+	 *
+	 * See struct rte_flow_item_ipv6.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6,
+
+	/**
+	 * Matches an ICMP header.
+	 *
+	 * See struct rte_flow_item_icmp.
+	 */
+	RTE_FLOW_ITEM_TYPE_ICMP,
+
+	/**
+	 * Matches a UDP header.
+	 *
+	 * See struct rte_flow_item_udp.
+	 */
+	RTE_FLOW_ITEM_TYPE_UDP,
+
+	/**
+	 * Matches a TCP header.
+	 *
+	 * See struct rte_flow_item_tcp.
+	 */
+	RTE_FLOW_ITEM_TYPE_TCP,
+
+	/**
+	 * Matches a SCTP header.
+	 *
+	 * See struct rte_flow_item_sctp.
+	 */
+	RTE_FLOW_ITEM_TYPE_SCTP,
+
+	/**
+	 * Matches a VXLAN header.
+	 *
+	 * See struct rte_flow_item_vxlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VXLAN,
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ANY
+ *
+ * Matches any protocol in place of the current layer, a single ANY may also
+ * stand for several protocol layers.
+ *
+ * This is usually specified as the first pattern item when looking for a
+ * protocol anywhere in a packet.
+ *
+ * A maximum value of 0 requests matching any number of protocol layers
+ * above or equal to the minimum value, a maximum value lower than the
+ * minimum one is otherwise invalid.
+ *
+ * Layer mask is ignored.
+ */
+struct rte_flow_item_any {
+	uint16_t min; /**< Minimum number of layers covered. */
+	uint16_t max; /**< Maximum number of layers covered, 0 for infinity. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VF
+ *
+ * Matches packets addressed to a virtual function ID of the device.
+ *
+ * If the underlying device function differs from the one that would
+ * normally receive the matched traffic, specifying this item prevents it
+ * from reaching that device unless the flow rule contains a VF
+ * action. Packets are not duplicated between device instances by default.
+ *
+ * - Likely to return an error or never match any traffic if this causes a
+ *   VF device to match traffic addressed to a different VF.
+ * - Can be specified multiple times to match traffic addressed to several
+ *   specific VFs.
+ * - Can be combined with a PF item to match both PF and VF traffic.
+ *
+ * Layer mask is ignored.
+ */
+struct rte_flow_item_vf {
+	uint32_t any:1; /**< Ignore the specified VF ID. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint32_t vf; /**< Destination VF ID. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_PORT
+ *
+ * Matches packets coming from the specified physical port of the underlying
+ * device.
+ *
+ * The first PORT item overrides the physical port normally associated with
+ * the specified DPDK input port (port_id). This item can be provided
+ * several times to match additional physical ports.
+ *
+ * Layer mask is ignored.
+ *
+ * Note that physical ports are not necessarily tied to DPDK input ports
+ * (port_id) when those are not under DPDK control. Possible values are
+ * specific to each device, they are not necessarily indexed from zero and
+ * may not be contiguous.
+ *
+ * As a device property, the list of allowed values as well as the value
+ * associated with a port_id should be retrieved by other means.
+ */
+struct rte_flow_item_port {
+	uint32_t index; /**< Physical port index. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_RAW
+ *
+ * Matches a byte string of a given length at a given offset.
+ *
+ * Offset is either absolute (using the start of the packet) or relative to
+ * the end of the previous matched item in the stack, in which case negative
+ * values are allowed.
+ *
+ * If search is enabled, offset is used as the starting point. The search
+ * area can be delimited by setting limit to a nonzero value, which is the
+ * maximum number of bytes after offset where the pattern may start.
+ *
+ * Matching a zero-length pattern is allowed; doing so resets the relative
+ * offset for subsequent items.
+ *
+ * The mask only affects the pattern field.
+ */
+struct rte_flow_item_raw {
+	uint32_t relative:1; /**< Look for pattern after the previous item. */
+	uint32_t search:1; /**< Search pattern from offset (see also limit). */
+	uint32_t reserved:30; /**< Reserved, must be set to zero. */
+	int32_t offset; /**< Absolute or relative offset for pattern. */
+	uint16_t limit; /**< Search area limit for start of pattern. */
+	uint16_t length; /**< Pattern length. */
+	uint8_t pattern[]; /**< Byte string to look for. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ETH
+ *
+ * Matches an Ethernet header.
+ */
+struct rte_flow_item_eth {
+	struct ether_addr dst; /**< Destination MAC. */
+	struct ether_addr src; /**< Source MAC. */
+	unsigned int type; /**< EtherType. */
+	unsigned int tags; /**< Number of 802.1Q/ad tags defined. */
+	struct {
+		uint16_t tpid; /**< Tag protocol identifier. */
+		uint16_t tci; /**< Tag control information. */
+	} tag[]; /**< 802.1Q/ad tag definitions, outermost first. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV4
+ *
+ * Matches an IPv4 header.
+ *
+ * Note: IPv4 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv4 {
+	struct ipv4_hdr hdr; /**< IPv4 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV6.
+ *
+ * Matches an IPv6 header.
+ *
+ * Note: IPv6 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv6 {
+	struct ipv6_hdr hdr; /**< IPv6 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ICMP.
+ *
+ * Matches an ICMP header.
+ */
+struct rte_flow_item_icmp {
+	struct icmp_hdr hdr; /**< ICMP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_UDP.
+ *
+ * Matches a UDP header.
+ */
+struct rte_flow_item_udp {
+	struct udp_hdr hdr; /**< UDP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_TCP.
+ *
+ * Matches a TCP header.
+ */
+struct rte_flow_item_tcp {
+	struct tcp_hdr hdr; /**< TCP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_SCTP.
+ *
+ * Matches a SCTP header.
+ */
+struct rte_flow_item_sctp {
+	struct sctp_hdr hdr; /**< SCTP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VXLAN.
+ *
+ * Matches a VXLAN header (RFC 7348).
+ */
+struct rte_flow_item_vxlan {
+	uint32_t flags:8; /**< Normally 0x08 (I flag). */
+	uint32_t rsvd0:24; /**< Reserved, normally 0x000000. */
+	uint32_t vni:24; /**< VXLAN network identifier. */
+	uint32_t rsvd1:8; /**< Reserved, normally 0x00. */
+};
+
+/**
+ * Matching pattern item definition.
+ *
+ * Except for meta types that do not need one, spec must be a valid pointer
+ * to a structure of the related item type. A mask of the same type can be
+ * provided to tell which bits in spec are to be matched.
+ *
+ * A mask is normally only needed for spec fields matching packet data,
+ * ignored otherwise. See individual item types for more information.
+ *
+ * A NULL mask pointer is allowed and is similar to matching with a full
+ * mask (all ones) on the spec fields supported by hardware; the remaining
+ * fields are ignored (all zeroes), so there is no error checking for
+ * unsupported fields.
+ */
+struct rte_flow_item {
+	enum rte_flow_item_type type; /**< Item type. */
+	const void *spec; /**< Pointer to item specification structure. */
+	const void *mask; /**< Mask for item specification. */
+};
+
+/**
+ * Matching pattern definition.
+ *
+ * A pattern is formed by stacking items starting from the lowest protocol
+ * layer to match. This stacking restriction does not apply to meta items
+ * which can be placed anywhere in the stack with no effect on the meaning
+ * of the resulting pattern.
+ *
+ * The end of the item[] stack is detected either by reaching max or an END
+ * item, whichever comes first.
+ */
+struct rte_flow_pattern {
+	uint32_t max; /**< Maximum number of entries in item[]. */
+	struct rte_flow_item item[]; /**< Stacked items. */
+};
+
+/**
+ * Action types.
+ *
+ * Each possible action is represented by a type. Some have associated
+ * configuration structures. Several actions combined in a list can be
+ * affected to a flow rule. That list is not ordered.
+ *
+ * They fall in three categories:
+ *
+ * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+ *   processing matched packets by subsequent flow rules, unless overridden
+ *   with PASSTHRU.
+ *
+ * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
+ *   for additional processing by subsequent flow rules.
+ *
+ * - Other non terminating meta actions that do not affect the fate of
+ *   packets (END, VOID, MARK, FLAG, COUNT).
+ *
+ * When several actions are combined in a flow rule, they should all have
+ * different types (e.g. dropping a packet twice is not possible). The
+ * defined behavior is for PMDs to only take into account the last action of
+ * a given type found in the list. PMDs still perform error checking on the
+ * entire list.
+ *
+ * Note that PASSTHRU is the only action able to override a terminating
+ * rule.
+ */
+enum rte_flow_action_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for action lists. Prevents further processing of
+	 * actions, thereby ending the list.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_VOID,
+
+	/**
+	 * Leaves packets up for additional processing by subsequent flow
+	 * rules. This is the default when a rule does not contain a
+	 * terminating action, but can be specified to force a rule to
+	 * become non-terminating.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PASSTHRU,
+
+	/**
+	 * [META]
+	 *
+	 * Attaches a 32 bit value to packets.
+	 *
+	 * See struct rte_flow_action_mark.
+	 */
+	RTE_FLOW_ACTION_TYPE_MARK,
+
+	/**
+	 * [META]
+	 *
+	 * Flag packets. Similar to MARK but only affects ol_flags.
+	 *
+	 * Note: a distinctive flag must be defined for it.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_FLAG,
+
+	/**
+	 * Assigns packets to a given queue index.
+	 *
+	 * See struct rte_flow_action_queue.
+	 */
+	RTE_FLOW_ACTION_TYPE_QUEUE,
+
+	/**
+	 * Drops packets.
+	 *
+	 * PASSTHRU overrides this action if both are specified.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_DROP,
+
+	/**
+	 * [META]
+	 *
+	 * Enables counters for this rule.
+	 *
+	 * These counters can be retrieved and reset through rte_flow_query(),
+	 * see struct rte_flow_query_count.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_COUNT,
+
+	/**
+	 * Duplicates packets to a given queue index.
+	 *
+	 * This is normally combined with QUEUE, however when used alone, it
+	 * is actually similar to QUEUE + PASSTHRU.
+	 *
+	 * See struct rte_flow_action_dup.
+	 */
+	RTE_FLOW_ACTION_TYPE_DUP,
+
+	/**
+	 * Similar to QUEUE, except RSS is additionally performed on packets
+	 * to spread them among several queues according to the provided
+	 * parameters.
+	 *
+	 * See struct rte_flow_action_rss.
+	 */
+	RTE_FLOW_ACTION_TYPE_RSS,
+
+	/**
+	 * Redirects packets to the physical function (PF) of the current
+	 * device.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PF,
+
+	/**
+	 * Redirects packets to the virtual function (VF) of the current
+	 * device with the specified ID.
+	 *
+	 * See struct rte_flow_action_vf.
+	 */
+	RTE_FLOW_ACTION_TYPE_VF,
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_MARK
+ *
+ * Attaches a 32 bit value to packets.
+ *
+ * This value is arbitrary and application-defined. For compatibility with
+ * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
+ * also set in ol_flags.
+ */
+struct rte_flow_action_mark {
+	uint32_t id; /**< 32 bit value to return with packets. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_QUEUE
+ *
+ * Assign packets to a given queue index.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_queue {
+	uint16_t queue; /**< Queue index to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_COUNT (query)
+ *
+ * Query structure to retrieve and reset flow rule counters.
+ */
+struct rte_flow_query_count {
+	uint32_t reset:1; /**< Reset counters after query [in]. */
+	uint32_t hits_set:1; /**< hits field is set [out]. */
+	uint32_t bytes_set:1; /**< bytes field is set [out]. */
+	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
+	uint64_t hits; /**< Number of hits for this rule [out]. */
+	uint64_t bytes; /**< Number of bytes through this rule [out]. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_DUP
+ *
+ * Duplicates packets to a given queue index.
+ *
+ * This is normally combined with QUEUE, however when used alone, it is
+ * actually similar to QUEUE + PASSTHRU.
+ *
+ * Non-terminating by default.
+ */
+struct rte_flow_action_dup {
+	uint16_t queue; /**< Queue index to duplicate packet to. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_RSS
+ *
+ * Similar to QUEUE, except RSS is additionally performed on packets to
+ * spread them among several queues according to the provided parameters.
+ *
+ * Note: RSS hash result is normally stored in the hash.rss mbuf field,
+ * however it conflicts with the MARK action as they share the same
+ * space. When both actions are specified, the RSS hash is discarded and
+ * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
+ * structure should eventually evolve to store both.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_rss {
+	struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
+	uint16_t queues; /**< Number of entries in queue[]. */
+	uint16_t queue[]; /**< Queues indices to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_VF
+ *
+ * Redirects packets to a virtual function (VF) of the current device.
+ *
+ * Packets matched by a VF pattern item can be redirected to their original
+ * VF ID instead of the specified one. This parameter may not be available
+ * and is not guaranteed to work properly if the VF part is matched by a
+ * prior flow rule or if packets are not addressed to a VF in the first
+ * place.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_vf {
+	uint32_t original:1; /**< Use original VF ID if possible. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint16_t vf; /**< VF ID to redirect packets to. */
+};
+
+/**
+ * Definition of a single action.
+ *
+ * For simple actions without a configuration structure, conf remains NULL.
+ */
+struct rte_flow_action {
+	enum rte_flow_action_type type; /**< Action type. */
+	const void *conf; /**< Pointer to action configuration structure. */
+};
+
+/**
+ * List of actions to associate with a flow.
+ *
+ * The end of the action[] list is detected either by reaching max or an END
+ * action, whichever comes first.
+ */
+struct rte_flow_actions {
+	uint32_t max; /**< Maximum number of entries in action[]. */
+	struct rte_flow_action action[]; /**< Actions to perform. */
+};
+
+/**
+ * Opaque type returned after successfully creating a flow.
+ *
+ * This handle can be used to manage and query the related flow (e.g. to
+ * destroy it or retrieve counters).
+ */
+struct rte_flow;
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_flow_error.cause.
+ */
+enum rte_flow_error_type {
+	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+	RTE_FLOW_ERROR_TYPE_UNDEFINED, /**< Cause is undefined. */
+	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure itself. */
+	RTE_FLOW_ERROR_TYPE_PATTERN_MAX, /**< Pattern length (max field). */
+	RTE_FLOW_ERROR_TYPE_PATTERN_ITEM, /**< Specific pattern item. */
+	RTE_FLOW_ERROR_TYPE_PATTERN, /**< Pattern structure itself. */
+	RTE_FLOW_ERROR_TYPE_ACTION_MAX, /**< Number of actions (max field). */
+	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+	RTE_FLOW_ERROR_TYPE_ACTIONS, /**< Actions structure itself. */
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_flow_error {
+	enum rte_flow_error_type type; /**< Cause field and error types. */
+	void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Check whether a flow rule can be created on a given port.
+ *
+ * While this function has no effect on the target device, the flow rule is
+ * validated against its current configuration state and the returned value
+ * should be considered valid by the caller for that state only.
+ *
+ * The returned value is guaranteed to remain valid only as long as no
+ * successful calls to rte_flow_create() or rte_flow_destroy() are made in
+ * the meantime and no device parameters affecting flow rules in any way are
+ * modified, due to possible collisions or resource limitations (although in
+ * such cases EINVAL should not be returned).
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification.
+ * @param[in] actions
+ *   Actions associated with the flow definition.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 if flow rule is valid and can be created. A negative errno value
+ *   otherwise (rte_errno is also set), the following errors are defined:
+ *
+ *   -ENOSYS: underlying device does not support this functionality.
+ *
+ *   -EINVAL: unknown or invalid rule specification.
+ *
+ *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
+ *   bit-masks are unsupported).
+ *
+ *   -EEXIST: collision with an existing rule.
+ *
+ *   -ENOMEM: not enough resources.
+ *
+ *   -EBUSY: action cannot be performed due to busy device resources, may
+ *   succeed if the affected queues or even the entire port are in a stopped
+ *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
+ */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_pattern *pattern,
+		  const struct rte_flow_actions *actions,
+		  struct rte_flow_error *error);
+
+/**
+ * Create a flow rule on a given port.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification.
+ * @param[in] actions
+ *   Actions associated with the flow definition.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to the positive version of one of the error codes defined for
+ *   rte_flow_validate().
+ */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_pattern *pattern,
+		const struct rte_flow_actions *actions,
+		struct rte_flow_error *error);
+
+/**
+ * Destroy a flow rule on a given port.
+ *
+ * Failure to destroy a flow rule handle may occur when other flow rules
+ * depend on it, and destroying it would result in an inconsistent state.
+ *
+ * This function is only guaranteed to succeed if handles are destroyed in
+ * reverse order of their creation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to destroy.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error);
+
+/**
+ * Destroy all flow rules associated with a port.
+ *
+ * In the unlikely event of failure, handles are still considered destroyed
+ * and no longer valid but the port must be assumed to be in an inconsistent
+ * state.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error);
+
+/**
+ * Query an existing flow rule.
+ *
+ * This function allows retrieving flow-specific data such as counters.
+ * Data is gathered by special actions which must be present in the flow
+ * rule definition.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to query.
+ * @param action
+ *   Action type to query.
+ * @param[in, out] data
+ *   Pointer to storage for the associated query data type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_H_ */
-- 
2.1.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API
  2016-07-05 18:16  2% [dpdk-dev] [RFC] Generic flow director/filtering/classification API Adrien Mazarguil
                   ` (4 preceding siblings ...)
  @ 2016-08-19 19:32  2% ` Adrien Mazarguil
  2016-08-19 19:32  1%   ` [dpdk-dev] [RFC v2] ethdev: introduce generic flow API Adrien Mazarguil
  5 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-08-19 19:32 UTC (permalink / raw)
  To: dev

Hi All,

Thanks to many for the positive and constructive feedback I've received so
far. Here is the updated specification (v0.7) at last.

I've attempted to address as many comments as possible but could not
process them all just yet. A new section "Future evolutions" has been
added for the remaining topics.

This series adds rte_flow.h to the DPDK tree. Next time I will attempt to
convert the specification as a documentation commit part of the patchset
and actually implement API functions.

I think including the entire document here makes it easier to annotate on
the ML, apologies in advance for the resulting traffic.

Finally I'm off for the next two weeks, do not expect replies from me in
the meantime.

Updates are also available online:

HTML version:
 https://rawgit.com/6WIND/rte_flow/master/rte_flow.html

PDF version:
 https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf                          

Related draft header file (also in the next patch):
 https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h

Git tree:
 https://github.com/6WIND/rte_flow

Changes from v1:

 Specification:

 - Settled on [generic] "flow interface" / "flow API" as the name of this
   framework, matches the rte_flow prefix better.
 - Minor wording changes in several places.
 - Partially added egress (TX) support.
 - Added "unrecoverable errors" as another consequence of overlapping
   rules.
 - Described flow rules groups and their interaction with flow rule
   priorities.
 - Fully described PF and VF meta pattern items so they are not open to
   interpretation anymore.
 - Removed the SIGNATURE meta pattern item as its description was too
   vague, may be re-added later if necessary.
 - Added the PORT pattern item to apply rules to non-default physical
   ports.
 - Entirely redefined the RAW pattern item.
 - Fixed tag error in the ETH item definition.
 - Updated protocol definitions (IPV4, IPV6, ICMP, UDP).
 - Added missing protocols (SCTP, VXLAN).
 - Converted ID action to MARK and FLAG actions, described interaction
   with the RSS hash result in mbufs.
 - Updated COUNT query structure to retrieve the number of bytes.
 - Updated VF action.
 - Documented negative item and action types, those will be used for
   dynamic types generated at run-time.
 - Added blurb about IPv4 options and IPv6 extension headers matching.
 - Updated function definitions.
 - Documented a flush method to remove all rules on a given port at once.
 - Documented the verbose error reporting interface.
 - Documented how the private interface for PMD use will work.
 - Documented expected behavior between successive port initializations.
 - Documented expected behavior for ports not under DPDK control.
 - Updated API migration section.
 - Added future evolutions section.

 Header file:
 
 - Not a draft anymore and can be used as-is for preliminary
   implementations.
 - Flow rule attributes (group, priority, etc) now have their own
   structure provided separately to API functions (struct rte_flow_attr).
 - Group and priority interactions have been documented.
 - Added PORT item.
 - Removed SIGNATURE item.
 - Defined ICMP, SCTP and VXLAN items.
 - Redefined PF, VF, RAW, IPV4, IPV6, UDP and TCP items.
 - Fixed tag error in the ETH item definition.
 - Converted ID action to MARK and FLAG actions, described interaction
   with the RSS hash result in mbufs.
 - Updated COUNT query structure.
 - Updated VF action.
 - Added verbose errors interface.
 - Updated function prototypes according to the above.
 - Defined rte_flow_flush().

--------

======================
Generic flow interface
======================

.. footer::

   v0.7

.. contents::
.. sectnum::
.. raw:: pdf

   PageBreak

Overview
========

DPDK provides several competing interfaces added over time to perform packet
matching and related actions such as filtering and classification.

They must be extended to implement the features supported by newer devices
in order to expose them to applications, however the current design has
several drawbacks:

- Complicated filter combinations which have not been hard-coded cannot be
  expressed.
- Prone to API/ABI breakage when new features must be added to an existing
  filter type, which frequently happens.

From an application point of view:

- Having disparate interfaces, all optional and lacking in features does not
  make this API easy to use.
- Seemingly arbitrary built-in limitations of filter types based on the
  device they were initially designed for.
- Undefined relationship between different filter types.
- High complexity, considerable undocumented and/or undefined behavior.

Considering the growing number of devices supported by DPDK, adding a new
filter type each time a new feature must be implemented is not sustainable
in the long term. Applications not written to target a specific device
cannot really benefit from such an API.

For these reasons, this document defines an extensible unified API that
encompasses and supersedes these legacy filter types.

.. raw:: pdf

   PageBreak

Current API
===========

Rationale
---------

The reason several competing (and mostly overlapping) filtering APIs are
present in DPDK is due to its nature as a thin layer between hardware and
software.

Each subsequent interface has been added to better match the capabilities
and limitations of the latest supported device, which usually happened to
need an incompatible configuration approach. Because of this, many ended up
device-centric and not usable by applications that were not written for that
particular device.

This document is not the first attempt to address this proliferation issue,
in fact a lot of work has already been done both to create a more generic
interface while somewhat keeping compatibility with legacy ones through a
common call interface (``rte_eth_dev_filter_ctrl()`` with the
``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).

Today, these previously incompatible interfaces are known as filter types
(``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).

However while trivial to extend with new types, it only shifted the
underlying problem as applications still need to be written for one kind of
filter type, which, as described in the following sections, is not
necessarily implemented by all PMDs that support filtering.

.. raw:: pdf

   PageBreak

Filter types
------------

This section summarizes the capabilities of each filter type.

Although the following list is exhaustive, the description of individual
types may contain inaccuracies due to the lack of documentation or usage
examples.

Note: names are prefixed with ``RTE_ETH_FILTER_``.

``MACVLAN``
~~~~~~~~~~~

Matching:

- L2 source/destination addresses.
- Optional 802.1Q VLAN ID.
- Masking individual fields on a rule basis is not supported.

Action:

- Packets are redirected either to a given VF device using its ID or to the
  PF.

``ETHERTYPE``
~~~~~~~~~~~~~

Matching:

- L2 source/destination addresses (optional).
- Ethertype (no VLAN ID?).
- Masking individual fields on a rule basis is not supported.

Action:

- Receive packets on a given queue.
- Drop packets.

``FLEXIBLE``
~~~~~~~~~~~~

Matching:

- At most 128 consecutive bytes anywhere in packets.
- Masking is supported with byte granularity.
- Priorities are supported (relative to this filter type, undefined
  otherwise).

Action:

- Receive packets on a given queue.

``SYN``
~~~~~~~

Matching:

- TCP SYN packets only.
- One high priority bit can be set to give the highest possible priority to
  this type when other filters with different types are configured.

Action:

- Receive packets on a given queue.

``NTUPLE``
~~~~~~~~~~

Matching:

- Source/destination IPv4 addresses (optional in 2-tuple mode).
- Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
- L4 protocol (2 and 5-tuple modes).
- Masking individual fields is supported.
- TCP flags.
- Up to 7 levels of priority relative to this filter type, undefined
  otherwise.
- No IPv6.

Action:

- Receive packets on a given queue.

``TUNNEL``
~~~~~~~~~~

Matching:

- Outer L2 source/destination addresses.
- Inner L2 source/destination addresses.
- Inner VLAN ID.
- IPv4/IPv6 source (destination?) address.
- Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, 802.1BR
  E-Tag).
- Tenant ID for tunneling protocols that have one.
- Any combination of the above can be specified.
- Masking individual fields on a rule basis is not supported.

Action:

- Receive packets on a given queue.

.. raw:: pdf

   PageBreak

``FDIR``
~~~~~~~~

Queries:

- Device capabilities and limitations.
- Device statistics about configured filters (resource usage, collisions).
- Device configuration (matching input set and masks)

Matching:

- Device mode of operation: none (to disable filtering), signature
  (hash-based dispatching from masked fields) or perfect (either MAC VLAN or
  tunnel).
- L2 Ethertype.
- Outer L2 destination address (MAC VLAN mode).
- Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
  (tunnel mode).
- IPv4 source/destination addresses, ToS, TTL and protocol fields.
- IPv6 source/destination addresses, TC, protocol and hop limits fields.
- UDP source/destination IPv4/IPv6 and ports.
- TCP source/destination IPv4/IPv6 and ports.
- SCTP source/destination IPv4/IPv6, ports and verification tag field.
- Note, only one protocol type at once (either only L2 Ethertype, basic
  IPv6, IPv4+UDP, IPv4+TCP and so on).
- VLAN TCI (extended API).
- At most 16 bytes to match in payload (extended API). A global device
  look-up table specifies for each possible protocol layer (unknown, raw,
  L2, L3, L4) the offset to use for each byte (they do not need to be
  contiguous) and the related bit-mask.
- Whether packet is addressed to PF or VF, in that case its ID can be
  matched as well (extended API).
- Masking most of the above fields is supported, but simultaneously affects
  all filters configured on a device.
- Input set can be modified in a similar fashion for a given device to
  ignore individual fields of filters (i.e. do not match the destination
  address in an IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
  macros). Configuring this also affects RSS processing on **i40e**.
- Filters can also provide 32 bits of arbitrary data to return as part of
  matched packets.

Action:

- **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
- **RTE_ETH_FDIR_REJECT**: drop packet immediately.
- **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in the
  list, otherwise the packet is processed by subsequent filters.
- For accepted packets and if requested by filter, either 32 bits of
  arbitrary data and four bytes of matched payload (only in case of flex
  bytes matching), or eight bytes of matched payload (flex also) are added
  to meta data.

.. raw:: pdf

   PageBreak

``HASH``
~~~~~~~~

Not an actual filter type. Provides and retrieves the global device
configuration (per port or entire NIC) for hash functions and their
properties.

Hash function selection: "default" (keep current), XOR or Toeplitz.

This function can be configured per flow type (**RTE_ETH_FLOW_**
definitions), supported types are:

- Unknown.
- Raw.
- Fragmented or non-fragmented IPv4.
- Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
- Fragmented or non-fragmented IPv6.
- Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
- L2 payload.
- IPv6 with extensions.
- IPv6 with L4 (TCP, UDP) and extensions.

``L2_TUNNEL``
~~~~~~~~~~~~~

Matching:

- All packets received on a given port.

Action:

- Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
  802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
  is implemented at the moment).
- VF ID to use for tag insertion (currently unused).
- Destination pool for tag-based forwarding (pools are IDs that can be
  assigned to ports, duplication occurs if the same ID is shared by several
  ports of the same NIC).

.. raw:: pdf

   PageBreak

Driver support
--------------

======== ======= ========= ======== === ====== ====== ==== ==== =========
Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
======== ======= ========= ======== === ====== ====== ==== ==== =========
bnx2x
cxgbe
e1000            yes       yes      yes yes
ena
enic                                                  yes
fm10k
i40e     yes     yes                           yes    yes  yes
ixgbe            yes                yes yes           yes       yes
mlx4
mlx5                                                  yes
szedata2
======== ======= ========= ======== === ====== ====== ==== ==== =========

Flow director
-------------

Flow director (FDIR) is the name of the most capable filter type, which
covers most features offered by others. As such, it is the most widespread
in PMDs that support filtering (i.e. all of them besides **e1000**).

It is also the only type that allows an arbitrary 32 bits value provided by
applications to be attached to a filter and returned with matching packets
instead of relying on the destination queue to recognize flows.

Unfortunately, even FDIR requires applications to be aware of low-level
capabilities and limitations (most of which come directly from **ixgbe** and
**i40e**):

- Bit-masks are set globally per device (port?), not per filter.
- Configuration state is not expected to be saved by the driver, and
  stopping/restarting a port requires the application to perform it again
  (API documentation is also unclear about this).
- Monolithic approach with ABI issues as soon as a new kind of flow or
  combination needs to be supported.
- Cryptic global statistics/counters.
- Unclear about how priorities are managed; filters seem to be arranged as a
  linked list in hardware (possibly related to configuration order).

Packet alteration
-----------------

One interesting feature is that the L2 tunnel filter type implements the
ability to alter incoming packets through a filter (in this case to
encapsulate them), thus the **mlx5** flow encap/decap features are not a
foreign concept.

.. raw:: pdf

   PageBreak

Proposed API
============

Terminology
-----------

- **Flow API**: overall framework affecting the fate of selected packets,
  covers everything described in this document.
- **Filtering API**: an alias for *Flow API*.
- **Matching pattern**: properties to look for in packets, a combination of
  any number of items.
- **Pattern item**: part of a pattern that either matches packet data
  (protocol header, payload or derived information), or specifies properties
  of the pattern itself.
- **Actions**: what needs to be done when a packet is matched by a pattern.
- **Flow rule**: this is the result of combining a *matching pattern* with
  *actions*.
- **Filter rule**: a less generic term than *flow rule*, can otherwise be
  used interchangeably.
- **Hit**: a flow rule is said to be *hit* when processing a matching
  packet.

Requirements
------------

As described in the previous section, there is a growing need for a common
method to configure filtering and related actions in a hardware independent
fashion.

The flow API should not disallow any filter combination by design and must
remain as simple as possible to use. It can simply be defined as a method to
perform one or several actions on selected packets.

PMDs are aware of the capabilities of the device they manage and should be
responsible for preventing unsupported or conflicting combinations.

This approach is fundamentally different as it places most of the burden on
the software side of the PMD instead of having device capabilities directly
mapped to API functions, then expecting applications to work around ensuing
compatibility issues.

Requirements for a new API:

- Flexible and extensible without causing API/ABI problems for existing
  applications.
- Should be unambiguous and easy to use.
- Support existing filtering features and actions listed in `Filter types`_.
- Support packet alteration.
- In case of overlapping filters, their priority should be well documented.
- Support filter queries (for example to retrieve counters).
- Support egress (TX) matching and specific actions.

.. raw:: pdf

   PageBreak

High level design
-----------------

The chosen approach to make filtering as generic as possible is by
expressing matching patterns through lists of items instead of the flat
structures used in DPDK today, enabling combinations that are not predefined
and thus being more versatile.

Flow rules can have several distinct actions (such as counting,
encapsulating, decapsulating before redirecting packets to a particular
queue, etc.), instead of relying on several rules to achieve this and having
applications deal with hardware implementation details regarding their
order.

Support for different priority levels on a rule basis is provided, for
example in order to force a more specific rule to come before a more generic
one for packets matched by both, however hardware support for more than a
single priority level cannot be guaranteed. When supported, the number of
available priority levels is usually low, which is why they can also be
implemented in software by PMDs (e.g. missing priority levels may be
emulated by reordering rules).

In order to remain as hardware agnostic as possible, by default all rules
are considered to have the same priority, which means that the order between
overlapping rules (when a packet is matched by several filters) is
undefined, packet duplication or unrecoverable errors may even occur as a
result.

PMDs may refuse to create overlapping rules at a given priority level when
they can be detected (e.g. if a pattern matches an existing filter).

Thus predictable results for a given priority level can only be achieved
with non-overlapping rules, using perfect matching on all protocol layers.

Flow rules can also be grouped, the flow rule priority is specific to the
group they belong to. All flow rules in a given group are thus processed
either before or after another group.

Support for multiple actions per rule may be implemented internally on top
of non-default hardware priorities, as a result both features may not be
simultaneously available to applications.

Considering that allowed pattern/actions combinations cannot be known in
advance and would result in an unpractically large number of capabilities to
expose, a method is provided to validate a given rule from the current
device configuration state without actually adding it (akin to a "dry run"
mode).

This enables applications to check if the rule types they need are supported
at initialization time, before starting their data path. This method can be
used anytime, its only requirement being that the resources needed by a rule
must exist (e.g. a target RX queue must be configured first).

Each defined rule is associated with an opaque handle managed by the PMD,
applications are responsible for keeping it. These can be used for queries
and rules management, such as retrieving counters or other data and
destroying them.

To avoid resource leaks on the PMD side, handles must be explicitly
destroyed by the application before releasing associated resources such as
queues and ports.
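
The sequence below sketches this "dry run" usage (illustrative only; error
handling is kept minimal and ``attr``, ``pattern`` and ``actions`` are
assumed to have been built beforehand)::

    #include <stdio.h>
    #include <rte_flow.h>

    /* Validate first, then create; a sketch of the approach described
     * above, not a definitive implementation. */
    static struct rte_flow *
    try_rule(uint8_t port_id,
             const struct rte_flow_attr *attr,
             const struct rte_flow_pattern *pattern,
             const struct rte_flow_actions *actions)
    {
            struct rte_flow_error error = { 0 };

            if (rte_flow_validate(port_id, attr, pattern, actions, &error)) {
                    printf("rule not supported: %s\n",
                           error.message ? error.message : "(no details)");
                    return NULL;
            }
            return rte_flow_create(port_id, attr, pattern, actions, &error);
    }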

Integration
-----------

To avoid ABI breakage, this new interface will be implemented through the
existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
**RTE_ETH_FILTER_GENERIC** as a new filter type.

However a public front-end API described in `Rules management`_ will
be added as the preferred method to use it.

Once discussions with the community have converged to a definite API, legacy
filter types should be deprecated and a deadline defined to remove their
support entirely.

PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
drop filtering support entirely. Less maintained PMDs for older hardware may
lose support at this point.

The notion of filter type will then be deprecated and subsequently dropped
to avoid confusion between both frameworks.
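
For example, once **RTE_ETH_FILTER_GENERIC** is added to ``enum
rte_filter_type``, applications could probe for generic flow support
through the legacy framework; the following sketch is hypothetical until
that enum value actually exists::

    #include <rte_ethdev.h>

    /* Nonzero if port_id advertises the generic flow interface.
     * Assumes RTE_ETH_FILTER_GENERIC has been added as described above. */
    static int
    port_supports_rte_flow(uint8_t port_id)
    {
            return rte_eth_dev_filter_supported(port_id,
                                                RTE_ETH_FILTER_GENERIC) == 0;
    }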

Implementation details
======================

Flow rule
---------

A flow rule is the combination of a matching pattern with a list of actions,
and is the basis of this API.

Flow rules also have several other attributes described in the following
sections.

Groups
~~~~~~

Flow rules can be grouped by assigning them a common group number. Lower
values have higher priority. Group 0 has the highest priority.

Although optional, applications are encouraged to group similar rules as
much as possible to fully take advantage of hardware capabilities
(e.g. optimized matching) and work around limitations (e.g. a single pattern
type possibly allowed in a given group).

Note that support for more than a single group is not guaranteed.

Priorities
~~~~~~~~~~

A priority level can be assigned to a flow rule. Like groups, lower values
denote higher priority, with 0 as the maximum.

A rule with priority 0 in group 8 is always matched after a rule with
priority 8 in group 0.

Group and priority levels are arbitrary and up to the application, they do
not need to be contiguous nor start from 0, however the maximum number
varies between devices and may be affected by existing flow rules.

If a packet is matched by several rules of a given group for a given
priority level, the outcome is undefined. It can take any path, may be
duplicated or even cause unrecoverable errors.

Note that support for more than a single priority level is not guaranteed.
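
As an illustration, the following minimal sketch (arbitrary values,
assuming the draft ``struct rte_flow_attr`` from the proposed header)
encodes the example above; the rule in group 0 is always matched first
despite its larger priority number::

    #include <rte_flow.h>

    /* Matched second: group 8, even though its priority within the
     * group is 0. */
    static const struct rte_flow_attr attr_specific = {
            .group = 8,
            .priority = 0,
            .ingress = 1,
    };

    /* Matched first: group 0 takes precedence over group 8. */
    static const struct rte_flow_attr attr_generic = {
            .group = 0,
            .priority = 8,
            .ingress = 1,
    };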

Traffic direction
~~~~~~~~~~~~~~~~~

Flow rules can apply to inbound and/or outbound traffic (ingress/egress).

Several pattern items and actions are valid and can be used in both
directions. Those valid for only one direction are described as such.

Specifying both directions at once is not recommended but may be valid in
some cases, such as incrementing the same counter twice.

Not specifying any direction is currently an error.

.. raw:: pdf

   PageBreak

Matching pattern
~~~~~~~~~~~~~~~~

A matching pattern comprises any number of items of various types.

Items are arranged in a list to form a matching pattern for packets. They
fall in two categories:

- Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, SCTP, VXLAN
  and so on), usually associated with a specification structure. These must
  be stacked in the same order as the protocol layers to match, starting
  from L2.

- Affecting how the pattern is processed (END, VOID, INVERT, PF, VF, PORT
  and so on), often without a specification structure. Since they are meta
  data that does not match packet contents, these can be specified anywhere
  within item lists without affecting the protocol matching items.

Most item specifications can be optionally paired with a mask to narrow the
specific fields or bits to be matched.

- Items are defined with ``struct rte_flow_item``.
- Patterns are defined with ``struct rte_flow_pattern``.

Example of an item specification matching an Ethernet header:

+--------------------------------------------+
| Ethernet                                   |
+==========+=========+=======================+
| ``spec`` | ``src`` | ``00:01:02:03:04:05`` |
|          +---------+-----------------------+
|          | ``dst`` | ``00:2a:66:00:01:02`` |
+----------+---------+-----------------------+
| ``mask`` | ``src`` | ``00:ff:ff:ff:00:00`` |
|          +---------+-----------------------+
|          | ``dst`` | ``00:00:00:00:ff:00`` |
+----------+---------+-----------------------+

Non-masked bits stand for any value; Ethernet headers with the following
properties are thus matched:

- ``src``: ``??:01:02:03:??:??``
- ``dst``: ``??:??:??:??:01:??``

Except for meta types that do not need one, ``spec`` must be a valid pointer
to a structure of the related item type. A ``mask`` of the same type can be
provided to tell which bits in ``spec`` are to be matched.

A mask is normally only needed for ``spec`` fields matching packet data;
it is ignored otherwise. See individual item types for more information.

A ``NULL`` mask pointer is allowed. It is similar to providing a full mask
(all ones) on the ``spec`` fields supported by hardware; the remaining
fields are ignored (all zeroes), so there is no error checking for
unsupported fields.
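
For illustration, the Ethernet example above could be written as follows.
This is a sketch only: the layouts of ``struct rte_flow_item`` and
``struct rte_flow_item_eth`` and the ``RTE_FLOW_ITEM_TYPE_ETH`` constant
name are assumed from this document rather than taken from a published
header::

 #include <rte_ether.h> /* struct ether_addr */

 /* spec/mask pair matching the table above. */
 static struct rte_flow_item_eth eth_spec = {
     .src = { .addr_bytes = { 0x00, 0x00, 0x01, 0x02, 0x03, 0x04 } },
     .dst = { .addr_bytes = { 0x00, 0x00, 0x2a, 0x66, 0x00, 0x01 } },
 };
 static struct rte_flow_item_eth eth_mask = {
     .src = { .addr_bytes = { 0x00, 0x00, 0xff, 0xff, 0xff, 0x00 } },
     .dst = { .addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0x00, 0xff } },
 };
 static struct rte_flow_item eth_item = {
     .type = RTE_FLOW_ITEM_TYPE_ETH,
     .spec = &eth_spec,
     .mask = &eth_mask,
 };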

.. raw:: pdf

   PageBreak

Matching pattern items for packet data must be naturally stacked (ordered
from lowest to highest protocol layer), as in the following examples:

+--------------+
| TCPv4 as L4  |
+===+==========+
| 0 | Ethernet |
+---+----------+
| 1 | IPv4     |
+---+----------+
| 2 | TCP      |
+---+----------+

+----------------+
| TCPv6 in VXLAN |
+===+============+
| 0 | Ethernet   |
+---+------------+
| 1 | IPv4       |
+---+------------+
| 2 | UDP        |
+---+------------+
| 3 | VXLAN      |
+---+------------+
| 4 | Ethernet   |
+---+------------+
| 5 | IPv6       |
+---+------------+
| 6 | TCP        |
+---+------------+

+-----------------------------+
| TCPv4 as L4 with meta items |
+===+=========================+
| 0 | VOID                    |
+---+-------------------------+
| 1 | Ethernet                |
+---+-------------------------+
| 2 | VOID                    |
+---+-------------------------+
| 3 | IPv4                    |
+---+-------------------------+
| 4 | TCP                     |
+---+-------------------------+
| 5 | VOID                    |
+---+-------------------------+
| 6 | VOID                    |
+---+-------------------------+

The above example shows how meta items do not affect packet data matching
items, as long as those remain stacked properly. The resulting matching
pattern is identical to "TCPv4 as L4".

+----------------+
| UDPv6 anywhere |
+===+============+
| 0 | IPv6       |
+---+------------+
| 1 | UDP        |
+---+------------+

If supported by the PMD, omitting one or several protocol layers at the
bottom of the stack as in the above example (missing an Ethernet
specification) enables hardware to look anywhere in packets.

This is an alias for specifying `ANY`_ with ``min = 0`` and ``max = 0``
properties as the first item.

It is unspecified whether the payload of supported encapsulations
(e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
inner, outer or both packets.

+---------------------+
| Invalid, missing L3 |
+===+=================+
| 0 | Ethernet        |
+---+-----------------+
| 1 | UDP             |
+---+-----------------+

The above pattern is invalid due to a missing L3 specification between L2
and L4. Omitting layers is only allowed at the bottom and at the top of the
stack.

Meta item types
~~~~~~~~~~~~~~~

These do not match packet data but affect how the pattern is processed, most
of them do not need a specification structure. This particularity allows
them to be specified anywhere without affecting other item types.

``END``
^^^^^^^

End marker for item lists. Prevents further processing of items, thereby
ending the pattern.

- Its numeric value is **0** for convenience.
- PMD support is mandatory.
- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| END                |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

``VOID``
^^^^^^^^

Used as a placeholder for convenience. It is ignored and simply discarded by
PMDs.

- PMD support is mandatory.
- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| VOID               |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

One usage example for this type is generating rules that share a common
prefix quickly without reallocating memory, by updating item types only (a
C sketch follows the table):

+------------------------+
| TCP, UDP or ICMP as L4 |
+===+====================+
| 0 | Ethernet           |
+---+--------------------+
| 1 | IPv4               |
+---+------+------+------+
| 2 | UDP  | VOID | VOID |
+---+------+------+------+
| 3 | VOID | TCP  | VOID |
+---+------+------+------+
| 4 | VOID | VOID | ICMP |
+---+------+------+------+
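
A sketch of the above in C, assuming item type constants follow a
``RTE_FLOW_ITEM_TYPE_*`` naming of ``enum rte_flow_item_type``; the array
would then be referenced from a ``struct rte_flow_pattern``::

 /* Preallocated list sharing the ETH/IPv4 prefix; only the L4 slots
  * change between rules, no reallocation takes place. */
 static struct rte_flow_item items[] = {
     { .type = RTE_FLOW_ITEM_TYPE_ETH },
     { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
     { .type = RTE_FLOW_ITEM_TYPE_UDP },  /* slot 2 */
     { .type = RTE_FLOW_ITEM_TYPE_VOID }, /* slot 3 */
     { .type = RTE_FLOW_ITEM_TYPE_VOID }, /* slot 4 */
     { .type = RTE_FLOW_ITEM_TYPE_END },
 };

 static void
 make_tcp_variant(void)
 {
     /* Turn the UDP rule into the TCP variant from the table. */
     items[2].type = RTE_FLOW_ITEM_TYPE_VOID;
     items[3].type = RTE_FLOW_ITEM_TYPE_TCP;
 }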

.. raw:: pdf

   PageBreak

``INVERT``
^^^^^^^^^^

Inverted matching, i.e. process packets that do not match the pattern.

- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| INVERT             |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

Usage example in order to match non-TCPv4 packets only:

+--------------------+
| Anything but TCPv4 |
+===+================+
| 0 | INVERT         |
+---+----------------+
| 1 | Ethernet       |
+---+----------------+
| 2 | IPv4           |
+---+----------------+
| 3 | TCP            |
+---+----------------+

``PF``
^^^^^^

Matches packets addressed to the physical function of the device.

If the underlying device function differs from the one that would normally
receive the matched traffic, specifying this item prevents it from reaching
that device unless the flow rule contains a `PF (action)`_. Packets are not
duplicated between device instances by default.

- Likely to return an error or never match any traffic if applied to a VF
  device.
- Can be combined with any number of `VF`_ items to match both PF and VF
  traffic.
- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| PF                 |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

``VF``
^^^^^^

Matches packets addressed to a virtual function ID of the device.

If the underlying device function differs from the one that would normally
receive the matched traffic, specifying this item prevents it from reaching
that device unless the flow rule contains a `VF (action)`_. Packets are not
duplicated between device instances by default.

- Likely to return an error or never match any traffic if this causes a VF
  device to match traffic addressed to a different VF.
- Can be specified multiple times to match traffic addressed to several VFs.
- Can be combined with a `PF`_ item to match both PF and VF traffic.
- Only ``spec`` needs to be defined, ``mask`` is ignored.

+-------------------------------------------------+
| VF                                              |
+==========+=========+============================+
| ``spec`` | ``any`` | ignore the specified VF ID |
|          +---------+----------------------------+
|          | ``vf``  | destination VF ID          |
+----------+---------+----------------------------+
| ``mask`` | ignored                              |
+----------+--------------------------------------+

``PORT``
^^^^^^^^

Matches packets coming from the specified physical port of the underlying
device.

The first PORT item overrides the physical port normally associated with the
specified DPDK input port (port_id). This item can be provided several times
to match additional physical ports.

Note that physical ports are not necessarily tied to DPDK input ports
(port_id) when those are not under DPDK control. Possible values are
specific to each device; they are not necessarily indexed from zero and may
not be contiguous.

As a device property, the list of allowed values as well as the value
associated with a port_id should be retrieved by other means.

- Only ``spec`` needs to be defined, ``mask`` is ignored.

+--------------------------------------------+
| PORT                                       |
+==========+===========+=====================+
| ``spec`` | ``index`` | physical port index |
+----------+-----------+---------------------+
| ``mask`` | ignored                         |
+----------+---------------------------------+

.. raw:: pdf

   PageBreak

Data matching item types
~~~~~~~~~~~~~~~~~~~~~~~~

Most of these are basically protocol header definitions with associated
bit-masks. They must be specified (stacked) from lowest to highest protocol
layer.

The following list is not exhaustive as new protocols will be added in the
future.

``ANY``
^^^^^^^

Matches any protocol in place of the current layer; a single ANY may also
stand for several protocol layers.

This is usually specified as the first pattern item when looking for a
protocol anywhere in a packet.

- A maximum value of **0** requests matching any number of protocol layers
  above or equal to the minimum value; a maximum value lower than the
  minimum one is otherwise invalid.
- Only ``spec`` needs to be defined, ``mask`` is ignored.

+-----------------------------------------------------------------------+
| ANY                                                                   |
+==========+=========+==================================================+
| ``spec`` | ``min`` | minimum number of layers covered                 |
|          +---------+--------------------------------------------------+
|          | ``max`` | maximum number of layers covered, 0 for infinity |
+----------+---------+--------------------------------------------------+
| ``mask`` | ignored                                                    |
+----------+------------------------------------------------------------+

Example of VXLAN TCP payload matching, where the outer L3 (IPv4 or IPv6)
and L4 (UDP) are both matched by the first ANY specification, and the inner
L3 (IPv4 or IPv6) by the second one:

+----------------------------------+
| TCP in VXLAN with wildcards      |
+===+==============================+
| 0 | Ethernet                     |
+---+-----+----------+---------+---+
| 1 | ANY | ``spec`` | ``min`` | 2 |
|   |     |          +---------+---+
|   |     |          | ``max`` | 2 |
+---+-----+----------+---------+---+
| 2 | VXLAN                        |
+---+------------------------------+
| 3 | Ethernet                     |
+---+-----+----------+---------+---+
| 4 | ANY | ``spec`` | ``min`` | 1 |
|   |     |          +---------+---+
|   |     |          | ``max`` | 1 |
+---+-----+----------+---------+---+
| 5 | TCP                          |
+---+------------------------------+

.. raw:: pdf

   PageBreak

``RAW``
^^^^^^^

Matches a byte string of a given length at a given offset.

Offset is either absolute (using the start of the packet) or relative to the
end of the previous matched item in the stack, in which case negative values
are allowed.

If search is enabled, offset is used as the starting point. The search area
can be delimited by setting limit to a nonzero value, which is the maximum
number of bytes after offset where the pattern may start.

Matching a zero-length pattern is allowed; doing so resets the relative
offset for subsequent items.

- ``mask`` only affects the pattern field.

+---------------------------------------------------------------------------+
| RAW                                                                       |
+==========+==============+=================================================+
| ``spec`` | ``relative`` | look for pattern after the previous item        |
|          +--------------+-------------------------------------------------+
|          | ``search``   | search pattern from offset (see also ``limit``) |
|          +--------------+-------------------------------------------------+
|          | ``reserved`` | reserved, must be set to zero                   |
|          +--------------+-------------------------------------------------+
|          | ``offset``   | absolute or relative offset for ``pattern``     |
|          +--------------+-------------------------------------------------+
|          | ``limit``    | search area limit for start of ``pattern``      |
|          +--------------+-------------------------------------------------+
|          | ``length``   | ``pattern`` length                              |
|          +--------------+-------------------------------------------------+
|          | ``pattern``  | byte string to look for                         |
+----------+--------------+-------------------------------------------------+
| ``mask`` | ``relative`` | ignored                                         |
|          +--------------+-------------------------------------------------+
|          | ``search``   | ignored                                         |
|          +--------------+-------------------------------------------------+
|          | ``reserved`` | ignored                                         |
|          +--------------+-------------------------------------------------+
|          | ``offset``   | ignored                                         |
|          +--------------+-------------------------------------------------+
|          | ``limit``    | ignored                                         |
|          +--------------+-------------------------------------------------+
|          | ``length``   | ignored                                         |
|          +--------------+-------------------------------------------------+
|          | ``pattern``  | bit-mask of the same byte length as ``pattern`` |
+----------+--------------+-------------------------------------------------+

Example pattern looking for several strings at various offsets of a UDP
payload, using combined RAW items:

.. raw:: pdf

   PageBreak

+-------------------------------------------+
| UDP payload matching                      |
+===+=======================================+
| 0 | Ethernet                              |
+---+---------------------------------------+
| 1 | IPv4                                  |
+---+---------------------------------------+
| 2 | UDP                                   |
+---+-----+----------+--------------+-------+
| 3 | RAW | ``spec`` | ``relative`` | 1     |
|   |     |          +--------------+-------+
|   |     |          | ``search``   | 1     |
|   |     |          +--------------+-------+
|   |     |          | ``offset``   | 10    |
|   |     |          +--------------+-------+
|   |     |          | ``limit``    | 0     |
|   |     |          +--------------+-------+
|   |     |          | ``length``   | 3     |
|   |     |          +--------------+-------+
|   |     |          | ``pattern``  | "foo" |
+---+-----+----------+--------------+-------+
| 4 | RAW | ``spec`` | ``relative`` | 1     |
|   |     |          +--------------+-------+
|   |     |          | ``search``   | 0     |
|   |     |          +--------------+-------+
|   |     |          | ``offset``   | 20    |
|   |     |          +--------------+-------+
|   |     |          | ``limit``    | 0     |
|   |     |          +--------------+-------+
|   |     |          | ``length``   | 3     |
|   |     |          +--------------+-------+
|   |     |          | ``pattern``  | "bar" |
+---+-----+----------+--------------+-------+
| 5 | RAW | ``spec`` | ``relative`` | 1     |
|   |     |          +--------------+-------+
|   |     |          | ``search``   | 0     |
|   |     |          +--------------+-------+
|   |     |          | ``offset``   | -29   |
|   |     |          +--------------+-------+
|   |     |          | ``limit``    | 0     |
|   |     |          +--------------+-------+
|   |     |          | ``length``   | 3     |
|   |     |          +--------------+-------+
|   |     |          | ``pattern``  | "baz" |
+---+-----+----------+--------------+-------+

This translates to:

- Locate "foo" at least 10 bytes deep inside UDP payload.
- Locate "bar" after "foo" plus 20 bytes.
- Locate "baz" after "bar" minus 29 bytes.

Such a packet may be represented as follows (not to scale)::

 0                     >= 10 B           == 20 B
 |                  |<--------->|     |<--------->|
 |                  |           |     |           |
 |-----|------|-----|-----|-----|-----|-----------|-----|------|
 | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
 |-----|------|-----|-----|-----|-----|-----------|-----|------|
                          |                             |
                          |<--------------------------->|
                                      == 29 B

Note that matching subsequent pattern items would resume after "baz", not
"bar" since matching is always performed after the previous item of the
stack.
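
As an illustration, item 3 above ("foo") could be built as follows,
assuming ``struct rte_flow_item_raw`` mirrors the field list in the table
with ``pattern`` as a byte-string pointer::

 #include <stdint.h>

 /* Search for "foo" at least 10 bytes into the UDP payload. */
 static struct rte_flow_item_raw foo_spec = {
     .relative = 1, /* offset counts from the end of the previous item */
     .search = 1,   /* scan for the pattern instead of an exact offset */
     .offset = 10,  /* starting point of the search */
     .limit = 0,    /* unlimited search area */
     .length = 3,
     .pattern = (uint8_t *)"foo",
 };
 static struct rte_flow_item foo_item = {
     .type = RTE_FLOW_ITEM_TYPE_RAW,
     .spec = &foo_spec,
     .mask = NULL, /* NULL mask: match every bit of "foo" */
 };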

.. raw:: pdf

   PageBreak

``ETH``
^^^^^^^

Matches an Ethernet header.

- ``dst``: destination MAC.
- ``src``: source MAC.
- ``type``: EtherType.
- ``tags``: number of 802.1Q/ad tags defined.
- ``tag[]``: 802.1Q/ad tag definitions, outermost first. For each one:

 - ``tpid``: Tag protocol identifier.
 - ``tci``: Tag control information.

``IPV4``
^^^^^^^^

Matches an IPv4 header.

Note: IPv4 options are handled by dedicated pattern items.

- ``hdr``: IPv4 header definition (``rte_ip.h``).

``IPV6``
^^^^^^^^

Matches an IPv6 header.

Note: IPv6 options are handled by dedicated pattern items.

- ``hdr``: IPv6 header definition (``rte_ip.h``).

``ICMP``
^^^^^^^^

Matches an ICMP header.

- ``hdr``: ICMP header definition (``rte_icmp.h``).

``UDP``
^^^^^^^

Matches a UDP header.

- ``hdr``: UDP header definition (``rte_udp.h``).

``TCP``
^^^^^^^

Matches a TCP header.

- ``hdr``: TCP header definition (``rte_tcp.h``).

``SCTP``
^^^^^^^^

Matches an SCTP header.

- ``hdr``: SCTP header definition (``rte_sctp.h``).

``VXLAN``
^^^^^^^^^

Matches a VXLAN header (RFC 7348).

- ``flags``: normally 0x08 (I flag).
- ``rsvd0``: reserved, normally 0x000000.
- ``vni``: VXLAN network identifier.
- ``rsvd1``: reserved, normally 0x00.

.. raw:: pdf

   PageBreak

Actions
~~~~~~~

Each possible action is represented by a type. Some have associated
configuration structures. Several actions combined in a list can be
assigned to a flow rule. That list is not ordered.

At least one action must be defined in a flow rule in order to do
something with matched packets.

- Actions are defined with ``struct rte_flow_action``.
- A list of actions is defined with ``struct rte_flow_actions``.

They fall in three categories:

- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
  matched packets from being processed by subsequent flow rules, unless
  overridden with PASSTHRU.

- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for
  additional processing by subsequent flow rules.

- Other non-terminating meta actions that do not affect the fate of packets
  (END, VOID, MARK, FLAG, COUNT).

When several actions are combined in a flow rule, they should all have
different types (e.g. dropping a packet twice is not possible). The defined
behavior is for PMDs to only take into account the last action of a given
type found in the list. PMDs still perform error checking on the entire
list.

*Note that PASSTHRU is the only action having the ability to override a
terminating rule.*

.. raw:: pdf

   PageBreak

Example of an action that redirects packets to queue index 10:

+----------------+
| QUEUE          |
+===========+====+
| ``queue`` | 10 |
+-----------+----+

Examples of action lists follow; their order is not significant, as
applications must consider all actions to be performed simultaneously:

+----------------+
| Count and drop |
+=======+========+
| COUNT |        |
+-------+--------+
| DROP  |        |
+-------+--------+

+--------------------------+
| Tag, count and redirect  |
+=======+===========+======+
| MARK  | ``mark``  | 0x2a |
+-------+-----------+------+
| COUNT |                  |
+-------+-----------+------+
| QUEUE | ``queue`` | 10   |
+-------+-----------+------+

+-----------------------+
| Redirect to queue 5   |
+=======+===============+
| DROP  |               |
+-------+-----------+---+
| QUEUE | ``queue`` | 5 |
+-------+-----------+---+

In the above example, considering both actions are performed simultaneously,
the end result is that only QUEUE has any effect.

+-----------------------+
| Redirect to queue 3   |
+=======+===========+===+
| QUEUE | ``queue`` | 5 |
+-------+-----------+---+
| VOID  |               |
+-------+-----------+---+
| QUEUE | ``queue`` | 3 |
+-------+-----------+---+

As previously described, only the last action of a given type found in the
list is taken into account. The above example also shows that VOID is
ignored.
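
In C, the "Tag, count and redirect" list above might look as follows. This
sketch assumes ``struct rte_flow_action`` pairs a type with a configuration
pointer (called ``conf`` here) and that configuration structure and constant
names follow the conventions used elsewhere in this document::

 static struct rte_flow_action_mark mark = { .mark = 0x2a };
 static struct rte_flow_action_queue queue = { .queue = 10 };

 /* Unordered action list: tag, count and redirect simultaneously. */
 static struct rte_flow_action actions[] = {
     { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
     { .type = RTE_FLOW_ACTION_TYPE_COUNT },
     { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
     { .type = RTE_FLOW_ACTION_TYPE_END },
 };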

.. raw:: pdf

   PageBreak

Action types
~~~~~~~~~~~~

Common action types are described in this section. Like pattern item types,
this list is not exhaustive as new actions will be added in the future.

``END`` (action)
^^^^^^^^^^^^^^^^

End marker for action lists. Prevents further processing of actions, thereby
ending the list.

- Its numeric value is **0** for convenience.
- PMD support is mandatory.
- No configurable property.

+---------------+
| END           |
+===============+
| no properties |
+---------------+

``VOID`` (action)
^^^^^^^^^^^^^^^^^

Used as a placeholder for convenience. It is ignored and simply discarded by
PMDs.

- PMD support is mandatory.
- No configurable property.

+---------------+
| VOID          |
+===============+
| no properties |
+---------------+

``PASSTHRU``
^^^^^^^^^^^^

Leaves packets up for additional processing by subsequent flow rules. This
is the default when a rule does not contain a terminating action, but can be
specified to force a rule to become non-terminating.

- No configurable property.

+---------------+
| PASSTHRU      |
+===============+
| no properties |
+---------------+

Example to copy a packet to a queue and continue processing by subsequent
flow rules:

+--------------------------+
| Copy to queue 8          |
+==========+===============+
| PASSTHRU |               |
+----------+-----------+---+
| QUEUE    | ``queue`` | 8 |
+----------+-----------+---+

.. raw:: pdf

   PageBreak

``MARK``
^^^^^^^^

Attaches a 32-bit value to packets.

This value is arbitrary and application-defined. For compatibility with FDIR
it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
also set in ``ol_flags``.

+------------------------------------------------+
| MARK                                           |
+==========+=====================================+
| ``mark`` | 32 bit value to return with packets |
+----------+-------------------------------------+

``FLAG``
^^^^^^^^

Flag packets. Similar to `MARK`_ but only affects ``ol_flags``.

Note: a distinctive ``ol_flags`` flag must be defined for it.

+---------------+
| FLAG          |
+===============+
| no properties |
+---------------+

``QUEUE``
^^^^^^^^^

Assigns packets to a given queue index.

- Terminating by default.

+--------------------------------+
| QUEUE                          |
+===========+====================+
| ``queue`` | queue index to use |
+-----------+--------------------+

``DROP``
^^^^^^^^

Drop packets.

- No configurable property.
- Terminating by default.
- PASSTHRU overrides this action if both are specified.

+---------------+
| DROP          |
+===============+
| no properties |
+---------------+

.. raw:: pdf

   PageBreak

``COUNT``
^^^^^^^^^

Enables counters for this rule.

These counters can be retrieved and reset through ``rte_flow_query()``; see
``struct rte_flow_query_count``.

- Counters can be retrieved with ``rte_flow_query()``.
- No configurable property.

+---------------+
| COUNT         |
+===============+
| no properties |
+---------------+

Query structure to retrieve and reset flow rule counters:

+---------------------------------------------------------+
| COUNT query                                             |
+===============+=====+===================================+
| ``reset``     | in  | reset counter after query         |
+---------------+-----+-----------------------------------+
| ``hits_set``  | out | ``hits`` field is set             |
+---------------+-----+-----------------------------------+
| ``bytes_set`` | out | ``bytes`` field is set            |
+---------------+-----+-----------------------------------+
| ``hits``      | out | number of hits for this rule      |
+---------------+-----+-----------------------------------+
| ``bytes``     | out | number of bytes through this rule |
+---------------+-----+-----------------------------------+

``DUP``
^^^^^^^

Duplicates packets to a given queue index.

This is normally combined with QUEUE; however, when used alone, it is
actually similar to QUEUE + PASSTHRU.

- Non-terminating by default.

+------------------------------------------------+
| DUP                                            |
+===========+====================================+
| ``queue`` | queue index to duplicate packet to |
+-----------+------------------------------------+

``RSS``
^^^^^^^

Similar to QUEUE, except RSS is additionally performed on packets to spread
them among several queues according to the provided parameters.

Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
however it conflicts with the `MARK`_ action as they share the same
space. When both actions are specified, the RSS hash is discarded and
``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
structure should eventually evolve to store both.

- Terminating by default.

+---------------------------------------------+
| RSS                                         |
+==============+==============================+
| ``rss_conf`` | RSS parameters               |
+--------------+------------------------------+
| ``queues``   | number of entries in queue[] |
+--------------+------------------------------+
| ``queue[]``  | queue indices to use         |
+--------------+------------------------------+

.. raw:: pdf

   PageBreak

``PF`` (action)
^^^^^^^^^^^^^^^

Redirects packets to the physical function (PF) of the current device.

- No configurable property.
- Terminating by default.

+---------------+
| PF            |
+===============+
| no properties |
+---------------+

``VF`` (action)
^^^^^^^^^^^^^^^

Redirects packets to a virtual function (VF) of the current device.

Packets matched by a VF pattern item can be redirected to their original VF
ID instead of the specified one. This parameter may not be available and is
not guaranteed to work properly if the VF part is matched by a prior flow
rule or if packets are not addressed to a VF in the first place.

- Terminating by default.

+-----------------------------------------------+
| VF                                            |
+==============+================================+
| ``original`` | use original VF ID if possible |
+--------------+--------------------------------+
| ``vf``       | VF ID to redirect packets to   |
+--------------+--------------------------------+

Negative types
~~~~~~~~~~~~~~

All specified pattern items (``enum rte_flow_item_type``) and actions
(``enum rte_flow_action_type``) use positive identifiers.

The negative space is reserved for dynamic types generated by PMDs at
run-time. PMDs may encounter them as a result but do not have to accept the
negative types they did not generate.

The method to generate them has not been specified yet.

Planned types
~~~~~~~~~~~~~

Pattern item types will be added as new protocols are implemented.

Support for variable headers through dedicated pattern items is also
planned, for example to match specific IPv4 options and IPv6 extension
headers; these would be stacked behind IPv4/IPv6 items.

Other action types are planned but not defined yet. These actions will add
the ability to alter matched packets in several ways, such as performing
encapsulation/decapsulation of tunnel headers on specific flows.

.. raw:: pdf

   PageBreak

Rules management
----------------

A simple API with a few functions is provided to fully manage flows.

Each created flow rule is associated with an opaque, PMD-specific handle
pointer. The application is responsible for keeping it until the rule is
destroyed.

Flow rules are represented by ``struct rte_flow`` objects.

Validation
~~~~~~~~~~

Given that expressing a definite set of device capabilities with this API is
not practical, a dedicated function is provided to check if a flow rule is
supported and can be created.

::

 int
 rte_flow_validate(uint8_t port_id,
                   const struct rte_flow_attr *attr,
                   const struct rte_flow_pattern *pattern,
                   const struct rte_flow_actions *actions,
                   struct rte_flow_error *error);

While this function has no effect on the target device, the flow rule is
validated against its current configuration state and the returned value
should be considered valid by the caller for that state only.

The returned value is guaranteed to remain valid only as long as no
successful calls to rte_flow_create() or rte_flow_destroy() are made in the
meantime and no device parameters affecting flow rules in any way are
modified, due to possible collisions or resource limitations (although in
such cases ``EINVAL`` should not be returned).

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``attr``: flow rule attributes.
- ``pattern``: pattern specification.
- ``actions``: actions associated with the flow definition.
- ``error``: perform verbose error reporting if not NULL.

Return value:

- **0** if the flow rule is valid and can be created. A negative errno value
  otherwise (``rte_errno`` is also set); the following errors are defined:
- ``-ENOSYS``: underlying device does not support this functionality.
- ``-EINVAL``: unknown or invalid rule specification.
- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
  bit-masks are unsupported).
- ``-EEXIST``: collision with an existing rule.
- ``-ENOMEM``: not enough resources.
- ``-EBUSY``: action cannot be performed due to busy device resources, may
  succeed if the affected queues or even the entire port are in a stopped
  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
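
For example, support for a rule could be probed before creating it as
follows; the attribute, pattern and action objects are assumed to have been
assembled as sketched in previous sections::

 #include <errno.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>

 static int
 probe_rule(uint8_t port_id,
            const struct rte_flow_attr *attr,
            const struct rte_flow_pattern *pattern,
            const struct rte_flow_actions *actions)
 {
     struct rte_flow_error error = { .type = RTE_FLOW_ERROR_TYPE_NONE };
     int ret = rte_flow_validate(port_id, attr, pattern, actions, &error);

     if (ret == -ENOSYS)
         printf("port %u: flow API not supported\n", (unsigned int)port_id);
     else if (ret)
         printf("port %u: rule rejected (%s): %s\n", (unsigned int)port_id,
                strerror(-ret),
                error.message ? error.message : "no details");
     return ret;
 }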

.. raw:: pdf

   PageBreak

Creation
~~~~~~~~

Creating a flow rule is similar to validating one, except the rule is
actually created and a handle returned.

::

 struct rte_flow *
 rte_flow_create(uint8_t port_id,
                 const struct rte_flow_attr *attr,
                 const struct rte_flow_pattern *pattern,
                 const struct rte_flow_actions *actions,
                 struct rte_flow_error *error);

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``attr``: flow rule attributes.
- ``pattern``: pattern specification.
- ``actions``: actions associated with the flow definition.
- ``error``: perform verbose error reporting if not NULL.

Return value:

A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
to the positive version of one of the error codes defined for
``rte_flow_validate()``.
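
Continuing the previous sketch, creation could then look like this
(``rte_errno`` is declared in ``rte_errno.h``)::

 static struct rte_flow *
 create_rule(uint8_t port_id,
             const struct rte_flow_attr *attr,
             const struct rte_flow_pattern *pattern,
             const struct rte_flow_actions *actions)
 {
     struct rte_flow_error error = { .type = RTE_FLOW_ERROR_TYPE_NONE };
     struct rte_flow *flow;

     flow = rte_flow_create(port_id, attr, pattern, actions, &error);
     if (flow == NULL)
         printf("rule not created (%s): %s\n", strerror(rte_errno),
                error.message ? error.message : "no details");
     /* On success, the handle must be kept by the application until the
      * rule is released with rte_flow_destroy() or rte_flow_flush(). */
     return flow;
 }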

Destruction
~~~~~~~~~~~

Flow rule destruction is not automatic, and a queue or a port should not be
released if any rules are still attached to them. Applications must take
care of performing this step before releasing resources.

::

 int
 rte_flow_destroy(uint8_t port_id,
                  struct rte_flow *flow,
                  struct rte_flow_error *error);


Failure to destroy a flow rule handle may occur when other flow rules depend
on it, and destroying it would result in an inconsistent state.

This function is only guaranteed to succeed if handles are destroyed in
reverse order of their creation.

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``flow``: flow rule handle to destroy.
- ``error``: perform verbose error reporting if not NULL.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.
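
A sketch releasing handles in reverse order of creation, the only ordering
for which success is guaranteed::

 static void
 destroy_rules(uint8_t port_id, struct rte_flow *flows[], unsigned int n)
 {
     struct rte_flow_error error;

     while (n--) {
         error.type = RTE_FLOW_ERROR_TYPE_NONE;
         if (rte_flow_destroy(port_id, flows[n], &error))
             printf("rule %u not destroyed: %s\n", n,
                    error.message ? error.message : "no details");
     }
 }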

.. raw:: pdf

   PageBreak

Flush
~~~~~

Convenience function to destroy all flow rule handles associated with a
port. They are released as with successive calls to ``rte_flow_destroy()``.

::

 int
 rte_flow_flush(uint8_t port_id,
                struct rte_flow_error *error);

In the unlikely event of failure, handles are still considered destroyed and
no longer valid but the port must be assumed to be in an inconsistent state.

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``error``: perform verbose error reporting if not NULL.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.

Query
~~~~~

Query an existing flow rule.

This function allows retrieving flow-specific data such as counters. Data
is gathered by special actions which must be present in the flow rule
definition.

::

 int
 rte_flow_query(uint8_t port_id,
                struct rte_flow *flow,
                enum rte_flow_action_type action,
                void *data,
                struct rte_flow_error *error);

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``flow``: flow rule handle to query.
- ``action``: action type to query.
- ``data``: pointer to storage for the associated query data type.
- ``error``: perform verbose error reporting if not NULL.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.
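
A sketch reading the counters attached by a `COUNT`_ action; the
``RTE_FLOW_ACTION_TYPE_COUNT`` constant name and 64-bit counter widths are
assumptions::

 #include <inttypes.h>

 static void
 print_counters(uint8_t port_id, struct rte_flow *flow)
 {
     struct rte_flow_query_count counters = { .reset = 1 };
     struct rte_flow_error error = { .type = RTE_FLOW_ERROR_TYPE_NONE };

     if (rte_flow_query(port_id, flow, RTE_FLOW_ACTION_TYPE_COUNT,
                        &counters, &error)) {
         printf("query failed: %s\n",
                error.message ? error.message : "no details");
         return;
     }
     if (counters.hits_set)
         printf("hits: %" PRIu64 "\n", counters.hits);
     if (counters.bytes_set)
         printf("bytes: %" PRIu64 "\n", counters.bytes);
 }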

.. raw:: pdf

   PageBreak

Verbose error reporting
~~~~~~~~~~~~~~~~~~~~~~~

The defined *errno* values may not be accurate enough for users or
application developers who want to investigate issues related to flow rules
management. A dedicated error object is defined for this purpose::

 enum rte_flow_error_type {
     RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
     RTE_FLOW_ERROR_TYPE_UNDEFINED, /**< Cause is undefined. */
     RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
     RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
     RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
     RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
     RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
     RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure itself. */
     RTE_FLOW_ERROR_TYPE_PATTERN_MAX, /**< Pattern length (max field). */
     RTE_FLOW_ERROR_TYPE_PATTERN_ITEM, /**< Specific pattern item. */
     RTE_FLOW_ERROR_TYPE_PATTERN, /**< Pattern structure itself. */
     RTE_FLOW_ERROR_TYPE_ACTION_MAX, /**< Number of actions (max field). */
     RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
     RTE_FLOW_ERROR_TYPE_ACTIONS, /**< Actions structure itself. */
 };

 struct rte_flow_error {
     enum rte_flow_error_type type; /**< Cause field and error types. */
     void *cause; /**< Object responsible for the error. */
     const char *message; /**< Human-readable error message. */
 };

Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
the remaining fields can be ignored. Other error types describe the object
type pointed to by ``cause``.

If non-NULL, ``cause`` points to the object responsible for the error. For a
flow rule, this may be a pattern item or an individual action.

If non-NULL, ``message`` provides a human-readable error message.

This object is normally allocated by applications and set by PMDs. The
message points to a constant string which does not need to be freed by the
application; however, its pointer can be considered valid only as long as
its associated DPDK port remains configured. Closing the underlying device
or unloading the PMD invalidates it.
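
A small helper built only on the fields defined above can report errors
generically::

 static void
 report_flow_error(const struct rte_flow_error *error)
 {
     if (error->type == RTE_FLOW_ERROR_TYPE_NONE)
         return; /* nothing to report */
     printf("flow error type %d, cause %p: %s\n",
            (int)error->type, error->cause,
            error->message ? error->message : "(no message)");
 }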

.. raw:: pdf

   PageBreak

PMD interface
~~~~~~~~~~~~~

This specification focuses on the public-facing interface, which must be
fully defined from the start to avoid a re-design later as it is subject to
API and ABI versioning constraints.

No such issue exists with the internal interface for use by poll-mode
drivers which can evolve independently, hence this section only outlines how
requests are processed by PMDs.

Public functions are mapped more or less directly to PMD operation
callbacks, thus:

- Public API functions do not process flow rule definitions at all before
  calling PMD callbacks (no basic error checking, no validation
  whatsoever). They only make sure these callbacks are non-NULL, returning
  the ``ENOSYS`` (function not supported) error otherwise.

- DPDK does not keep track of flow rule definitions or flow rule objects
  automatically. Applications may keep track of the former and must keep
  track of the latter. PMDs may also do so for internal needs; however,
  this cannot be relied on by applications.

The private interface will provide helper functions to perform common tasks
such as parsing, validating and keeping track of flow rule specifications to
avoid redundant code in PMDs and ease implementation.

Its contents are currently largely undefined since at least one PMD
implementation is necessary first. PMD maintainers are encouraged to share
as much generic code as possible.

.. raw:: pdf

   PageBreak

Caveats
-------

- Flow rules are not maintained between successive port initializations. An
  application exiting without releasing them and restarting must re-create
  them from scratch.

- API operations are synchronous and blocking (``EAGAIN`` cannot be
  returned).

- There is no provision for reentrancy/multi-thread safety, although nothing
  should prevent different devices from being configured at the same
  time. PMDs may protect their control path functions accordingly.

- Stopping the data path (TX/RX) should not be necessary when managing flow
  rules. If this cannot be achieved naturally or with workarounds (such as
  temporarily replacing the burst function pointers), an appropriate error
  code must be returned (``EBUSY``).

- PMDs, not applications, are responsible for maintaining flow rule
  configuration when stopping and restarting a port or performing other
  actions which may affect them. Flow rules can only be destroyed
  explicitly.

For devices exposing multiple ports sharing global settings affected by flow
rules:

- All ports under DPDK control must behave consistently; PMDs are
  responsible for making sure that existing flow rules on a port are not
  affected by other ports.

- Ports not under DPDK control (unaffected or handled by other applications)
  are the user's responsibility. They may affect existing flow rules and
  cause undefined behavior. PMDs aware of this may prevent flow rule
  creation altogether in such cases.

.. raw:: pdf

   PageBreak

Compatibility
-------------

No known hardware implementation supports all the features described in this
document.

Unsupported features or combinations are not expected to be fully emulated
in software by PMDs for performance reasons. Partially supported features
may be completed in software as long as hardware performs most of the work
(such as queue redirection and packet recognition).

However PMDs are expected to do their best to satisfy application requests
by working around hardware limitations as long as doing so does not affect
the behavior of existing flow rules.

The following sections provide a few examples of such cases; they are based
on limitations built into the previous APIs.

Global bit-masks
~~~~~~~~~~~~~~~~

Each flow rule comes with its own, per-layer bit-masks, while hardware may
support only a single, device-wide bit-mask for a given layer type, so that
two IPv4 rules cannot use different bit-masks.

The expected behavior in this case is that PMDs automatically configure
global bit-masks according to the needs of the first created flow rule.

Subsequent rules are allowed only if their bit-masks match those; the
``EEXIST`` error code should be returned otherwise.

Unsupported layer types
~~~~~~~~~~~~~~~~~~~~~~~

Many protocols can be simulated by crafting patterns with the `RAW`_ type.

PMDs can rely on this capability to simulate support for protocols with
fixed headers not directly recognized by hardware.

``ANY`` pattern item
~~~~~~~~~~~~~~~~~~~~

This pattern item stands for anything, which can be difficult to translate
to something hardware would understand, particularly if followed by more
specific types.

Consider the following pattern:

+---+--------------------------------+
| 0 | ETHER                          |
+---+--------------------------------+
| 1 | ANY (``min`` = 1, ``max`` = 1) |
+---+--------------------------------+
| 2 | TCP                            |
+---+--------------------------------+

Knowing that TCP does not make sense with something other than IPv4 and IPv6
as L3, such a pattern may be translated to two flow rules instead:

+---+--------------------+
| 0 | ETHER              |
+---+--------------------+
| 1 | IPV4 (zeroed mask) |
+---+--------------------+
| 2 | TCP                |
+---+--------------------+

+---+--------------------+
| 0 | ETHER              |
+---+--------------------+
| 1 | IPV6 (zeroed mask) |
+---+--------------------+
| 2 | TCP                |
+---+--------------------+

Note that as soon as an ANY rule covers several layers, this approach may
yield a large number of hidden flow rules. It is thus suggested to only
support the most common scenarios (anything as L2 and/or L3).

.. raw:: pdf

   PageBreak

Unsupported actions
~~~~~~~~~~~~~~~~~~~

- When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
  tagging (`MARK`_ or `FLAG`_) may be implemented in software as long as the
  target queue is used by a single rule.

- A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
  rules combining `QUEUE`_ and `PASSTHRU`_.

- When a single target queue is provided, `RSS`_ can also be implemented
  through `QUEUE`_.

Flow rules priority
~~~~~~~~~~~~~~~~~~~

While it would naturally make sense, flow rules cannot be assumed to be
processed by hardware in the same order as their creation for several
reasons:

- They may be managed internally as a tree or a hash table instead of a
  list.
- Removing a flow rule before adding another one can either put the new rule
  at the end of the list or reuse a freed entry.
- Duplication may occur when packets are matched by several rules.

For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
predictable behavior is only guaranteed by using different priority levels.

Priority levels are not necessarily implemented in hardware, or may be
severely limited (e.g. a single priority bit).

For these reasons, priority levels may be implemented purely in software by
PMDs.

- For devices expecting flow rules to be added in the correct order, PMDs
  may destroy and re-create existing rules after adding a new one with
  a higher priority.

- A configurable number of dummy or empty rules can be created at
  initialization time to save high priority slots for later.

- In order to save priority levels, PMDs may evaluate whether rules are
  likely to collide and adjust their priority accordingly.

.. raw:: pdf

   PageBreak

API migration
=============

Exhaustive list of deprecated filter types and how to convert them to
generic flow rules.

``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
---------------------------------------

`MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
(action)`_ or `PF (action)`_ terminating action.

+------------------------------------+
| MACVLAN                            |
+--------------------------+---------+
| Pattern                  | Actions |
+===+=====+==========+=====+=========+
| 0 | ETH | ``spec`` | any | VF,     |
|   |     +----------+-----+ PF      |
|   |     | ``mask`` | any |         |
+---+-----+----------+-----+---------+

``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
----------------------------------------------

`ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
a terminating action.

+------------------------------------+
| ETHERTYPE                          |
+--------------------------+---------+
| Pattern                  | Actions |
+===+=====+==========+=====+=========+
| 0 | ETH | ``spec`` | any | QUEUE,  |
|   |     +----------+-----+ DROP    |
|   |     | ``mask`` | any |         |
+---+-----+----------+-----+---------+

``FLEXIBLE`` to ``RAW`` → ``QUEUE``
-----------------------------------

`FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
terminating action and a defined priority level.

+------------------------------------+
| FLEXIBLE                           |
+--------------------------+---------+
| Pattern                  | Actions |
+===+=====+==========+=====+=========+
| 0 | RAW | ``spec`` | any | QUEUE   |
|   |     +----------+-----+         |
|   |     | ``mask`` | any |         |
+---+-----+----------+-----+---------+

``SYN`` to ``TCP`` → ``QUEUE``
------------------------------

`SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
`QUEUE`_ as the terminating action.

Priority level can be set to simulate the high priority bit.

+---------------------------------------------+
| SYN                                         |
+-----------------------------------+---------+
| Pattern                           | Actions |
+===+======+==========+=============+=========+
| 0 | ETH  | ``spec`` | empty       | QUEUE   |
|   |      +----------+-------------+         |
|   |      | ``mask`` | empty       |         |
+---+------+----------+-------------+         |
| 1 | IPV4 | ``spec`` | empty       |         |
|   |      +----------+-------------+         |
|   |      | ``mask`` | empty       |         |
+---+------+----------+-------------+         |
| 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
|   |      +----------+-------------+         |
|   |      | ``mask`` | ``syn`` = 1 |         |
+---+------+----------+-------------+---------+

``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
----------------------------------------------------

`NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
`UDP`_ as L4 and `QUEUE`_ as the terminating action.

A priority level can be specified as well.

+---------------------------------------+
| NTUPLE                                |
+-----------------------------+---------+
| Pattern                     | Actions |
+===+======+==========+=======+=========+
| 0 | ETH  | ``spec`` | empty | QUEUE   |
|   |      +----------+-------+         |
|   |      | ``mask`` | empty |         |
+---+------+----------+-------+         |
| 1 | IPV4 | ``spec`` | any   |         |
|   |      +----------+-------+         |
|   |      | ``mask`` | any   |         |
+---+------+----------+-------+         |
| 2 | TCP, | ``spec`` | any   |         |
|   | UDP  +----------+-------+         |
|   |      | ``mask`` | any   |         |
+---+------+----------+-------+---------+

``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
---------------------------------------------------------------------------

`TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.

In the following table, `ANY`_ is used to cover the optional L4.

+------------------------------------------------+
| TUNNEL                                         |
+--------------------------------------+---------+
| Pattern                              | Actions |
+===+=========+==========+=============+=========+
| 0 | ETH     | ``spec`` | any         | QUEUE   |
|   |         +----------+-------------+         |
|   |         | ``mask`` | any         |         |
+---+---------+----------+-------------+         |
| 1 | IPV4,   | ``spec`` | any         |         |
|   | IPV6    +----------+-------------+         |
|   |         | ``mask`` | any         |         |
+---+---------+----------+-------------+         |
| 2 | ANY     | ``spec`` | ``min`` = 0 |         |
|   |         |          +-------------+         |
|   |         |          | ``max`` = 0 |         |
|   |         +----------+-------------+         |
|   |         | ``mask`` | N/A         |         |
+---+---------+----------+-------------+         |
| 3 | VXLAN,  | ``spec`` | any         |         |
|   | GENEVE, +----------+-------------+         |
|   | TEREDO, | ``mask`` | any         |         |
|   | NVGRE,  |          |             |         |
|   | GRE,    |          |             |         |
|   | ...     |          |             |         |
+---+---------+----------+-------------+---------+

.. raw:: pdf

   PageBreak

``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
---------------------------------------------------------------

`FDIR`_ is more complex than any other type; there are several methods to
emulate its functionality. It is summarized for the most part in the table
below.

A few features are intentionally not supported:

- The ability to configure the matching input set and masks for the entire
  device; PMDs should take care of it automatically according to the
  requested flow rules.

  For example if a device supports only one bit-mask per protocol type,
  source/destination IPv4 address bit-masks can be made immutable by the
  first created rule. Subsequent IPv4 or TCPv4 rules can only be created if
  they are compatible.

  Note that only protocol bit-masks affected by existing flow rules are
  immutable, others can be changed later. They become mutable again after
  the related flow rules are destroyed.

- Returning four or eight bytes of matched data when using flex bytes
  filtering. Although a specific action could implement it, it conflicts
  with the much more useful 32-bit tagging on devices that support it.

- Side effects on RSS processing of the entire device. Flow rules that
  conflict with the current device configuration should not be
  allowed. Similarly, device configuration should not be allowed when it
  affects existing flow rules.

- Device modes of operation. "none" is unsupported since filtering cannot be
  disabled as long as a flow rule is present.

- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
  according to the created flow rules.

- Signature mode of operation is not defined but could be handled through a
  specific item type if needed.

+----------------------------------------------+
| FDIR                                         |
+---------------------------------+------------+
| Pattern                         | Actions    |
+===+============+==========+=====+============+
| 0 | ETH,       | ``spec`` | any | QUEUE,     |
|   | RAW        +----------+-----+ DROP,      |
|   |            | ``mask`` | any | PASSTHRU   |
+---+------------+----------+-----+------------+
| 1 | IPV4,      | ``spec`` | any | MARK       |
|   | IPV6       +----------+-----+ (optional) |
|   |            | ``mask`` | any |            |
+---+------------+----------+-----+            |
| 2 | TCP,       | ``spec`` | any |            |
|   | UDP,       +----------+-----+            |
|   | SCTP       | ``mask`` | any |            |
+---+------------+----------+-----+            |
| 3 | VF,        | ``spec`` | any |            |
|   | PF         +----------+-----+            |
|   | (optional) | ``mask`` | any |            |
+---+------------+----------+-----+------------+

.. raw:: pdf

   PageBreak

``HASH``
--------

There is no counterpart to this filter type because it translates to a
global device setting instead of a pattern item. Device settings are
automatically set according to the created flow rules.

``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
-------------------------------------------------

All packets are matched. This type alters incoming packets to encapsulate
them in a chosen tunnel type, and can optionally redirect them to a VF as
well.

The destination pool for tag based forwarding can be emulated with other
flow rules using `DUP`_ as the action.

+----------------------------------------+
| L2_TUNNEL                              |
+---------------------------+------------+
| Pattern                   | Actions    |
+===+======+==========+=====+============+
| 0 | VOID | ``spec`` | N/A | VXLAN,     |
|   |      |          |     | GENEVE,    |
|   |      |          |     | ...        |
|   |      +----------+-----+------------+
|   |      | ``mask`` | N/A | VF         |
|   |      |          |     | (optional) |
+---+------+----------+-----+------------+

.. raw:: pdf

   PageBreak

Future evolutions
=================

- Describing dedicated testpmd commands to control and validate this API.

- A method to optimize generic flow rules with specific pattern items and
  action types generated on the fly by PMDs. DPDK will assign negative
  numbers to these in order to not collide with the existing types. See
  `Negative types`_.

- Adding specific egress pattern items and actions as described in `Traffic
  direction`_.

- Optional software fallback when PMDs are unable to handle requested flow
  rules so applications do not have to implement their own.

- Ranges in addition to bit-masks. Ranges are more generic in many ways as
  they interpret values. For instance only ranges make sense to cover
  several TCP or UDP ports. These will probably be defined on a pattern item
  basis.

--------

Adrien Mazarguil (1):
  ethdev: introduce generic flow API

 lib/librte_ether/Makefile   |   2 +
 lib/librte_ether/rte_flow.h | 941 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 943 insertions(+)
 create mode 100644 lib/librte_ether/rte_flow.h

-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [RFC PATCH 0/5] add API's for VF management
@ 2016-08-18 13:48  3% Bernard Iremonger
  2016-08-26  9:10  3% ` [dpdk-dev] [RFC PATCH v2 " Bernard Iremonger
  0 siblings, 1 reply; 200+ results
From: Bernard Iremonger @ 2016-08-18 13:48 UTC (permalink / raw)
  To: rahul.r.shah, wenzhuo.lu, dev; +Cc: Bernard Iremonger

This RFC patchset contains new DPDK APIs requested by AT&T for use
with the Virtual Function Daemon (VFD).

The need to configure and manage VFs on a NIC has grown to the
point where AT&T have developed a DPDK-based tool, VFD, to do this.

This RFC proposes to add the following API extensions to DPDK:
  mailbox communication callback support
  VF configuration

Nine new functions have been added to the eth_dev_ops structure.
Corresponding functions have been added to the ixgbe PMD for the
Niantic NIC.

Two new callback functions have been added.
Changes have been made to the ixgbe_rcv_msg_from_vf function to
use the callback functions.

Changes have been made to testpmd to facilitate testing of the new APIs.
The testpmd documentation has been updated to document the testpmd changes.

Note:
Adding new functions to the eth_dev_ops structure will cause an
ABI breakage.

Bernard Iremonger (5):
  librte_ether: add internal callback functions
  net/ixgbe: add callback to user app on VF to PF mbox msg
  librte_ether: add API's for VF management
  net/ixgbe: add functions for VF management
  app/test_pmd: add tests for new API's

 app/test-pmd/cmdline.c                      | 700 ++++++++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  68 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c            | 179 +++++++
 drivers/net/ixgbe/ixgbe_pf.c                |  39 +-
 lib/librte_ether/rte_ethdev.c               | 176 +++++++
 lib/librte_ether/rte_ethdev.h               | 284 +++++++++++
 lib/librte_ether/rte_ether_version.map      |  16 +
 7 files changed, 1455 insertions(+), 7 deletions(-)

-- 
2.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] DPDK Stable Releases and Long Term Support
  2016-08-17 12:29  5% ` Panu Matilainen
@ 2016-08-17 13:30  0%   ` Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2016-08-17 13:30 UTC (permalink / raw)
  To: Panu Matilainen, dev



> -----Original Message-----
> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> Sent: Wednesday, August 17, 2016 1:30 PM
> To: Mcnamara, John <john.mcnamara@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] DPDK Stable Releases and Long Term Support
> 
> ...
>
> > ABI
> > ---
> >
> > The Stable Release should not be seen as a way of breaking or
> > circumventing the DPDK ABI policy.
> 
> I find this a strange thing to say about a stable/LTS release ABI. I had
> read the originating thread before seeing this, but it still made me go
> "Huh?" for several seconds. The problem perhaps being, the rest of the
> document addresses stable/LTS releases, but this statement speaks about
> normal development work going on elsewhere.
> 
> The earlier version had a mention about ABI/API breakage related to things
> what not to backport but that's entirely gone here. Given how important
> ABI + API stability is for stable/LTS releases, I think it deserves a
> special mention here. Maybe something more to the tune of:
> 
> ---
> ABI or API breakages are not permitted in stable releases; special care
> must be taken when backporting.
> 
> The existence of stable release(s) does not lessen the need to comply to
> DPDK ABI policy in development work.
> ---

That seems reasonable. If I do an update to the doc or add it to the guides I'll update it with this.

> 
> With the exception of the ABI/API thing, this looks like a fair starting
> point to me. Time and experience will tell more.
> 

I also think that we will have to see how it goes. What is important is that we end up with something that is useful to the community and consumers.

John.
-- 


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Best Practices for PMD Verification before Upstream Requests
@ 2016-08-17 12:34  3% Shepard Siegel
  2016-08-22 13:07  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Shepard Siegel @ 2016-08-17 12:34 UTC (permalink / raw)
  To: dev

Hi,


Atomic Rules is new to the DPDK community. We attended the DPDK Summit last
week and received terrific advice and encouragement. We are developing a
DPDK PMD for our Arkville product which is a DPDK-aware data mover, capable
of marshaling packets between FPGA/ASIC gates with AXI interfaces on one
side, and the DPDK API/ABI on the other. Arkville plus a MAC looks like a
line-rate-agnostic bare-bones L2 NIC. We have testpmd and our first DPDK
applications running using our early-alpha Arkville PMD.


This post is to ask of the DPDK community what tests, regressions,
check-lists or similar verification assets we might work through before
starting the process to upstream our code? We know device-specific PMDs are
rather cloistered and unlikely to interfere; but still, others must have
managed to find a way to fail with even an L2 baseline NIC.  We don’t want
to needlessly repeat those mistakes. Any DPDK-specific collateral that we
can use to verify and validate our codes before attempting to upstream them
would be greatly appreciated. To the DPDK PMD developers, what can you
share so that we are more aligned with your regressions? To the DPDK
application developers, what’s your top gripe we might try to avoid in our
Arkville L2 baseline PMD?


Thanks in advance. We won’t have anyone at the Dublin DPDK Summit, but we
will be at FPL2016 in two weeks. Any constructive feedback is greatly
appreciated!


Shepard Siegel, CTO

atomicrules.com


<http://atomicrules.com>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] DPDK Stable Releases and Long Term Support
  2016-07-28 12:33  3% [dpdk-dev] DPDK Stable Releases and Long Term Support Mcnamara, John
@ 2016-08-17 12:29  5% ` Panu Matilainen
  2016-08-17 13:30  0%   ` Mcnamara, John
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2016-08-17 12:29 UTC (permalink / raw)
  To: Mcnamara, John, dev

[ Yes I'm late to this party, apologies for missing the first round of 
discussion ]

On 07/28/2016 03:33 PM, Mcnamara, John wrote:
>
> This document sets out the guidelines for DPDK Stable Releases and Long Term
> Support releases (LTS) based on the initial RFC and comments:
> http://dpdk.org/ml/archives/dev/2016-June/040256.html.
>
> In particular it incorporates suggestions for a Stable Release structure as
> well as a Long Term Support release.
>
>
> Introduction
> ------------
>
> The purpose of the DPDK Stable Releases will be to maintain releases of DPDK
> with backported fixes over an extended period of time. This will provide
> downstream consumers of DPDK with a stable target on which to base
> applications or packages.
>
> The Long Term Support release (LTS) will be a designation applied to a Stable
> Release to indicate longer support.
>
>
> Stable Releases
> ---------------
>
> Any major release of DPDK can be designated as a Stable Release if a
> maintainer volunteers to maintain it.
>
> A Stable Release will be used to backport fixes from a N release back to a N-1
> release, for example, from 16.11 to 16.07.
>
> The duration of a stable release should be one complete release cycle. It can
> be longer, up to 1 year, if a maintainer continues to support the stable
> branch, or if users supply backported fixes, however the explicit commitment
> should be for one release cycle.
>
> The release cadence can be determined by the maintainer based on the number of
> bugfixes and the criticality of the bugs. However, releases should be
> coordinated with the validation engineers to ensure that a tagged release has
> been tested.
>
>
> LTS Release
> -----------
>
> A stable release can be designated as an LTS release based on community
> agreement and a commitment from a maintainer. An LTS release will have a
> maintenance duration of 2 years.
>
> It is anticipated that there should be at least 4 releases per year of the LTS
> or approximately 1 every 3 months. However, the cadence can be shorter or
> longer depending on the number and criticality of the backported
> fixes. Releases should be coordinated with the validation engineers to ensure
> that a tagged release has been tested.
>
>
> Initial Stable Release
> ----------------------
>
> The initial DPDK Stable Release will be 16.07. It will be viewed as a trial of
> the Stable Release/LTS policy to determine what are the best working practices
> for DPDK.
>
> The maintainer for the initial release will be Yuanhan Liu
> <yuanhan.liu@linux.intel.com>. It is hoped that other community members will
> volunteer as maintainers for other Stable Releases.
>
> The initial targeted release for LTS is proposed to be 16.11 based on the
> results of the work carried out on the 16.07 Stable Release.
>
> A list has been set up for Stable Release/LTS specific discussions:
> <stable@dpdk.org>. This address can also be used for CCing maintainers on bug
> fix submissions.
>
>
> What changes should be backported
> ---------------------------------
>
> The backporting should be limited to bug fixes.
>
> Features should not be backported to stable releases. It may be acceptable, in
> limited cases, to back port features for the LTS release where:
>
> * There is a justifiable use case (for example a new PMD).
> * The change is non-invasive.

A new PMD also would not touch existing code, which makes it a 
low-to-no-risk thing. Ditto for, say, a new command line tool or an example.

> * The work of preparing the backport is done by the proposer.
> * There is support within the community.
>
>
> Testing
> -------
>
> Stable and LTS releases should be tested before release/tagging.
>
> Intel will provide validation engineers to test the 16.07 Stable Release and
> the initial LTS tree. Other community members should provide testing for other
> stable releases.
>
> The validation will consist of compilation testing on the range of OSes
> supported by the master release and functional/performance testing on the
> current major/LTS release of the following OSes:
>
> * Ubuntu
> * RHEL
> * SuSE
> * FreeBSD
>
>
> Releasing
> ---------
>
> A Stable Release will be released by:
>
> * Tagging the release with YY.MM.nn (year, month, number) or similar.
> * Uploading a tarball of the release to dpdk.org.
> * Sending an announcement to the <announce@dpdk.org> list.
>
>
> ABI
> ---
>
> The Stable Release should not be seen as a way of breaking or circumventing
> the DPDK ABI policy.

I find this a strange thing to say about a stable/LTS release ABI. I had 
read the originating thread before seeing this, but it still made me go 
"Huh?" for several seconds. The problem perhaps being, the rest of the 
document addresses stable/LTS releases, but this statement speaks about 
normal development work going on elsewhere.

The earlier version had a mention about ABI/API breakage related to 
things what not to backport but that's entirely gone here. Given how 
important ABI + API stability is for stable/LTS releases, I think it 
deserves a special mention here. Maybe something more to the tune of:

---
ABI or API breakages are not permitted in stable releases; special care 
must be taken when backporting.

The existence of stable release(s) does not lessen the need to comply 
with the DPDK ABI policy in development work.
---

>
>
> Review of the Stable Release/LTS guidelines
> -------------------------------------------
>
> This document serves as a set of guidelines for the planned Stable
> Releases/LTS activities. However, the actual process can be reviewed and
> amended over time, based on experiences and feedback.
>

With the exception of the ABI/API thing, this looks like a fair starting 
point to me. Time and experience will tell more.

	 - Panu -

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] Ring PMD: why are stats counters atomic?
  @ 2016-08-15 20:41  0%       ` Mauricio Vásquez
  0 siblings, 0 replies; 200+ results
From: Mauricio Vásquez @ 2016-08-15 20:41 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Finally I have some time to have a look to it.

On Mon, May 16, 2016 at 3:16 PM, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> On Mon, May 16, 2016 at 03:12:10PM +0200, Mauricio Vásquez wrote:
>> Hello Bruce,
>>
>> Although having this support does not harm anyone, I am not convinced that
>> it is useful, mainly because there exists the single-thread limitation in
>> other PMDs. Then, if an application has to use different kind of NICs (i.e,
>> different PMDs) it has to implement the locking strategies. On the other
>> hand, if an application  only uses rte_rings, it could just use the
>> rte_ring library.
>>
>> Thanks, Mauricio V
>>
> I agree.
> If you want, please submit a patch to remove this behaviour and see
> if anyone objects to it. If there are no objections, I have no problem accepting
> the patch.
>
> However, since this is a behaviour change to existing functionality, we may
> need to implement function versionning for this for ABI compatibility. Please
> take that into account when drafting any patch.
>

Do you think that versioning is required in this case?
If anyone is using a functionality that is not supposed to work in
that way, should we care about it?

I am not against versioning, I just want to know if it is worth doing.
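
For reference, the mechanism itself would be the rte_compat.h one; a
minimal sketch with a hypothetical exported function (note the ring PMD
rx/tx handlers are reached through function pointers rather than exported
symbols, so symbol versioning may not even apply to them):

    #include <rte_compat.h>

    /* old behaviour, kept for binaries linked against the old ABI */
    int rte_foo_v1607(int arg);
    VERSION_SYMBOL(rte_foo, _v1607, 16.07);

    /* new behaviour, exported as the default symbol */
    int rte_foo_v1611(int arg);
    BIND_DEFAULT_SYMBOL(rte_foo, _v1611, 16.11);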

> Regards,
> /Bruce
>
>> On Tue, May 10, 2016 at 11:36 AM, Bruce Richardson <
>> bruce.richardson@intel.com> wrote:
>>
>> > On Tue, May 10, 2016 at 11:13:08AM +0200, Mauricio Vásquez wrote:
>> > > Hello,
>> > >
>> > > Per-queue stats counters are defined as rte_atomic64_t, in the tx/rx
>> > > functions, they are atomically increased if the rings have the multiple
>> > > consumers/producer flag enabled.
>> > >
>> > > According to the design principles, the application should not invoke
>> > those
>> > > functions on the same queue on different cores, then I think that atomic
>> > > increasing is not necessary.
>> > >
>> > > Is there something wrong with my reasoning?, If not, I am willing to
>> > send a
>> > > patch.
>> > >
>> > > Thank you very much,
>> > >
>> > Since the rte_rings, on which the ring pmd is obviously based, have
>> > multi-producer
>> > and multi-consumer support built-in, I thought it might be useful in the
>> > ring
>> > PMD itself to allow multiple threads to access the ring queues at the same
>> > time,
>> > if the underlying rings are marked as MP/MC safe. When doing enqueues and
>> > dequeue
>> > from the ring, the stats are either incremented atomically, or
>> > non-atomically,
>> > depending on the underlying queue type.
>> >
>> >         const uint16_t nb_rx = (uint16_t)rte_ring_dequeue_burst(r->rng,
>> >                         ptrs, nb_bufs);
>> >         if (r->rng->flags & RING_F_SC_DEQ)
>> >                 r->rx_pkts.cnt += nb_rx;
>> >         else
>> >                 rte_atomic64_add(&(r->rx_pkts), nb_rx);
>> >
>> > If people don't think this behaviour is worthwhile keeping, I'm ok with
>> > removing
>> > it, since all other PMDs have the restriction that the queues are
>> > single-thread
>> > only.
>> >
>> > Regards,
>> > /Bruce
>> >

Regards,

Mauricio V

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-08-04 13:05  0%             ` Adrien Mazarguil
@ 2016-08-09 21:24  0%               ` John Fastabend
  0 siblings, 0 replies; 200+ results
From: John Fastabend @ 2016-08-09 21:24 UTC (permalink / raw)
  To: Jerin Jacob, dev, Thomas Monjalon, Helin Zhang, Jingjing Wu,
	Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu,
	Jan Medala, John Daley, Jing Chen, Konstantin Ananyev,
	Matej Vido, Alejandro Lucero, Sony Chacko, Pablo de Lara,
	Olga Shern

[...]

>> I'm not sure I understand 'bit granularity' here. I would say we have
>> devices now that have rather strange restrictions due to hardware
>> implementation. Going forward we should get better hardware and a lot
>> of this will go away in my view. Yes this is a long term view and
>> doesn't help the current state. The overall point you are making is
>> the sum off all these strange/odd bits in the hardware implementation
>> means capabilities queries are very difficult to guarantee. On existing
>> hardware and I think you've convinced me. Thanks ;)
> 
> Precisely. By "bit granularity" I meant that while it is fairly easy to
> report whether bit-masking is supported on protocol fields such as MAC
> addresses at all, devices may have restrictions on the possible bit-masks,
> like they may only have an effect at byte level (0xff), may not allow
> specific bits (broadcast) or there even may be a fixed set of bit-masks to
> choose from.

Yep lots of strange hardware implementation voodoo here.

> 
> [...]
>>> I understand, however I think this approach may be too low-level to express
>>> all the possible combinations. This graph would have to include possible
>>> actions for each possible pattern, all while considering that some actions
>>> are not possible with some patterns and that there are exclusive actions.
>>>
>>
>> Really? You have hardware that has dependencies between the parser and
>> the supported actions? Ugh...
> 
> Not that I know of actually, even though we cannot rule out this
> possibility.
> 
> Here are the possible cases I have in mind with existing HW:
> 
> - Too many actions specified for a single rule, even though each of them is
>   otherwise supported.

Yep most hardware will have this restriction.

> 
> - Performing several encap/decap actions. None are defined in the initial
>   specification but these are already planned.
> 

Great this is certainly needed.

> - Assuming there is a single table from the application point of view
>   (separate discussion for the other thread), some actions may only be
>   possible with the right pattern item or meta item. Asking HW to perform
>   tunnel decap may only be safe if the pattern specifically matches that
>   protocol.
> 

Yep continue in other thread.

>> If the hardware has separate tables then we shouldn't try to have the
>> PMD flatten those into a single table because we will have no way of
>> knowing how to do that. (I'll respond to the other thread on this in
>> an attempt to not get to scattered).
> 
> OK, will reply there as well.
> 
>>> Also while memory consumption is not really an issue, such a graph may be
>>> huge. It could take a while for the PMD to update it when adding a rule
>>> impacting capabilities.
>>
>> Ugh... I wouldn't suggest updating the capabilities at runtime like
>> this. But I see your point if the graph has to _guarantee_ correctness
>> how does it represent limited number of masks and other strange hw,
>> its unfortunate the hardware isn't more regular.
>>
>> You have convinced me that guaranteed correctness via capabilities
>> is going to difficult for many types of devices although not all.
> 
> I'll just add that these capabilities also depend on side effects of
> configuration performed outside the scope of this API. The way queues are
> (re)initialized or offloads configured may affect them. RSS configuration is
> the most obvious example.
> 

OK.

[...]

>>
>> My concern is this non-determinism will create performance issues in
>> the network because when a flow may or may not be offloaded this can
>> have a rather significant impact on its performance. This can make
>> debugging network wide performance miserable when at time X I get
>> performance X and then for whatever reason something degrades to
>> software and at time Y I get some performance Y << X. I suspect that
>> in general applications will bind tightly with hardware they know
>> works.
> 
> You are right, performance determinism is not taken into account at all, at
> least not yet. It should not be an issue at the beginning as long as the
> API has the ability evolve later for applications that need it.
> 
> Just an idea, could some kind of meta pattern items specifying time
> constraints for a rule address this issue? Say, how long (cycles/ms) the PMD
> may take to query/apply/delete the rule. If it cannot be guaranteed, the
> rule cannot be created. Applications could mantain statistic counters about
> failed rules to determine if performance issues are caused by the inability
> to create them.

It seems a bit heavy to me to have each PMD driver implementing
something like this. But it would be interesting to explore probably
after the basic support is implemented though.

> 
> [...]
>>> For individual points:
>>>
>>> (i) should be doable with the query API without recompiling DPDK as well,
>>> the fact API/ABI breakage must be avoided being part of the requirements. If
>>> you think there is a problem regarding this, can you provide a specific
>>> example?
>>
>> What I was after you noted yourself in the doc here,
>>
>> "PMDs can rely on this capability to simulate support for protocols with
>> fixed headers not directly recognized by hardware."
>>
>> I was trying to get variable header support with the RAW capabilities. A
>> parse graph supports this for example the proposed query API does not.
> 
> OK, I see, however the RAW capability itself may not be supported everywhere
> in patterns. What I described is that PMDs, not applications, could leverage
> the RAW abilities of underlying devices to implement otherwise unsupported
> but fixed patterns.
> 
> So basically you would like to expose the ability to describe fixed protocol
> definitions following RAW patterns, as in:

Correct for say some new tunnel metadata or something.

> 
>  ETH / RAW / IP / UDP / ...
> 
> While with such a pattern the current specification makes RAW (4.1.4.2) and
> IP start matching from the same offset as two different branches, in effect
> you cannot specify a fixed protocol following a RAW item.

What this means, though, is that for every new protocol we will need to
rebuild the drivers and DPDK. For a shared-library DPDK environment or a
Linux distribution this can be painful. It would be best to avoid this.

> 
> It is defined that way because I do not see how HW could parse higher level
> protocols after having given up due to a RAW pattern, however assuming the
> entire stack is described only using RAW patterns I guess it could be done.
> 
> Such a pattern could be generated from a separate function before feeding it
> to rte_flow_create(), or translated by the PMD afterwards assuming a
> separate meta item such as RAW_END exists to signal the end of a RAW layer.
> Of course processing this would be more expensive.
> 

Or the supported parse graph could be fetched from the hardware along with
the values for each protocol, so that the programming interface stays the
same. The well-known protocols could keep their 'enum values' in the
rte_flow_item_type enum in the header, so that users would not be required
to walk the parse graph; but for new or experimental protocols we could
query the parse graph and get the pattern matching id to program them with.

The normal flow would be unchanged but we don't get stuck upgrading
everything to add our own protocol. So the flow would be,

 rte_get_parse_graph(graph);
 flow_item_proto = is_my_proto_supported(graph);

 pattern = build_flow_match(flow_item_proto, value, mask);
 action = build_action();
 rte_flow_create(my_port, pattern, action);

The only change to the API proposed to support this would be to allow
unsupported RTE_FLOW_ values to be pushed to the hardware and define
a range of values that are reserved for use by parse graph discovery.

This would not have to be any more expensive.
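
To make that concrete, here is a hypothetical C sketch of the above flow.
rte_get_parse_graph() and is_my_proto_supported() do not exist anywhere;
only struct rte_flow_item, the RTE_FLOW_ITEM_TYPE_* values and
rte_flow_create() are from the RFC:

    /* query the HW parse graph once at init (hypothetical call) */
    struct parse_graph graph;                 /* hypothetical type */
    rte_get_parse_graph(port_id, &graph);

    /* app-side lookup of the discovered item id for our protocol */
    uint32_t my_proto = is_my_proto_supported(&graph);

    /* the discovered id is then used like any built-in item type;
     * 'spec'/'mask' hold the app's header match, declared elsewhere */
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = my_proto, .spec = &spec, .mask = &mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };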

[...]

>>>>> So you can put it after "known"
>>>>> variable length headers like IP. The limitation is it can't get past
>>>>> undefined variable length headers.
>>>
>>> RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are
>>> looking for?
>>>
>>
But FLOW_ITEM_TYPE_ANY skips "any" header type, as I understand it; if
we have a new variable length header in the future we will have to add
a new type, RTE_FLOW_ITEM_TYPE_FOO for example. The RAW type will work
for fixed headers as noted above.
> 
> I'm (slowly) starting to get it. How about the suggestion I made above for
> RAW items then?

Hmm, for performance reasons building an entire graph up using RAW items
seems to be a bit heavy. Another alternative to the above parse graph
notion would be to allow users to add RAW node definitions at init time
and have the PMD give an ID back for those. Then the new node could be
used just like any other RTE_FLOW_ITEM_TYPE in a pattern.

Something like,

	ret_flow_item_type_foo = rte_create_raw_node(foo_raw_pattern)
	ret_flow_item_type_bar = rte_create_raw_node(bar_raw_pattern)

then allow ret_flow_item_type_{foo|bar} to be used in subsequent
pattern matching items, and if the hardware cannot support this, return
an error from the initial rte_create_raw_node() API call.

Do either of those proposals sound like reasonable extensions?

> 
> [...]
>> The two open items from me are do we need to support adding new variable
>> length headers? And how do we handle multiple tables I'll take that up
>> in the other thread.
> 
> I think variable length headers may be eventually supported through pattern
> tricks or eventually a separate conversion layer.
> 

A parse graph notion would support this naturally, without pattern
tricks; hence my suggestions above.

Also, in the current scheme, how would I match an IPv6 option, a specific
NSH option, or an MPLS tag?

>>>>> I looked at the git repo but I only saw the header definition I guess
>>>>> the implementation is TBD after there is enough agreement on the
>>>>> interface?
>>>
>>> Precisely, I intend to update the tree and send a v2 soon (unfortunately did
>>> not have much time these past few days to work on this).
>>>
>>> Now what if, instead of a seemingly complex parse graph and still in
>>> addition to the query method, enum values were defined for PMDs to report
>>> an array of supported items, typical patterns and actions so applications
>>> can get a quick idea of what devices are capable of without being too
>>> specific. Something like:
>>>
>>>  enum rte_flow_capability {
>>>      RTE_FLOW_CAPABILITY_ITEM_ETH,
>>>      RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP,
>>>      RTE_FLOW_CAPABILITY_ACTION_ID,
>>>      ...
>>>  };
>>>
>>> Although I'm not convinced about the usefulness of this because it would
>>> have to be maintained separately, but that would be easier than building a
>>> dummy flow rule for simple query purposes.
>>
>> I'm not sure its necessary either at first.
> 
> Then I'll discard this idea.
> 
>>> The main question I have for you is, do you think the core of the specified
>>> API is adequate enough assuming it can be extended later with new methods?
>>>
>>
>> The above two items are my only opens at this point, I agree with your
>> summary of my capabilities proposal namely it can be added.
> 
> Thanks, see you in the other thread.
> 

Thanks,
John

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 1/2] lib/librte_port: modify source and sink port structure parameter
@ 2016-08-09 16:30  4% Jasvinder Singh
  0 siblings, 0 replies; 200+ results
From: Jasvinder Singh @ 2016-08-09 16:30 UTC (permalink / raw)
  To: dev; +Cc: cristian.dumitrescu

The ``file_name`` data type of ``struct rte_port_source_params`` and
``struct rte_port_sink_params`` is changed from ``char *`` to ``const char *``.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
 doc/guides/rel_notes/deprecation.rst   | 4 ----
 doc/guides/rel_notes/release_16_11.rst | 3 ++-
 lib/librte_port/rte_port_source_sink.h | 4 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 96db661..f302af0 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -61,7 +61,3 @@ Deprecation Notices
   renamed to something more consistent (net and crypto prefixes) in 16.11.
   Some of these driver names are used publicly, to create virtual devices,
   so a deprecation notice is necessary.
-
-* API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
-  structures. The member ``file_name`` data type will be changed from
-  ``char *`` to ``const char *``. This change targets release 16.11.
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 0b9022d..4f3d899 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -94,7 +94,8 @@ API Changes
 
    This section is a comment. Make sure to start the actual text at the margin.
 
-* The log history is removed.
+* The ``file_name`` data type of ``struct rte_port_source_params`` and
+  ``struct rte_port_sink_params`` is changed from ``char *`` to ``const char *``.
 
 
 ABI Changes
diff --git a/lib/librte_port/rte_port_source_sink.h b/lib/librte_port/rte_port_source_sink.h
index 4db8a8a..be585a7 100644
--- a/lib/librte_port/rte_port_source_sink.h
+++ b/lib/librte_port/rte_port_source_sink.h
@@ -55,7 +55,7 @@ struct rte_port_source_params {
 	struct rte_mempool *mempool;
 
 	/** The full path of the pcap file to read packets from */
-	char *file_name;
+	const char *file_name;
 	/** The number of bytes to be read from each packet in the
 	 *  pcap file. If this value is 0, the whole packet is read;
 	 *  if it is bigger than packet size, the generated packets
@@ -69,7 +69,7 @@ extern struct rte_port_in_ops rte_port_source_ops;
 /** sink port parameters */
 struct rte_port_sink_params {
 	/** The full path of the pcap file to write the packets to */
-	char *file_name;
+	const char *file_name;
 	/** The maximum number of packets write to the pcap file.
 	 *  If this value is 0, the "infinite" write will be carried
 	 *  out.
-- 
2.5.5

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
  2016-08-09  1:01  1% [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK Jerin Jacob
@ 2016-08-09  8:48  0% ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2016-08-09  8:48 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, thomas.monjalon, hemant.agrawal, shreyansh.jain

On Tue, Aug 09, 2016 at 06:31:41AM +0530, Jerin Jacob wrote:
> Hi All,
> 
> Find below an RFC API specification which attempts to
> define the standard application programming interface
> for event driven programming in DPDK and to abstract HW based event devices.
> 
> These devices can support event scheduling and flow ordering
> in HW and are typically found in NW SoCs as an integrated device or
> as a PCI EP device.
> 
> The RFC APIs are inspired from existing ethernet and crypto devices.
> Following are the requirements considered to define the RFC API.
> 
> 1) APIs similar to existing Ethernet and crypto API framework for
>     ○ Device creation, device Identification and device configuration
> 2) Enumerate libeventdev resources as numbers(0..N) to
>     ○ Avoid ABI issues with handles
>     ○ An event device may have millions of flow queues, so it's not
>     practical to have handles for each flow queue and the associated
>     name-based lookup in the multiprocess case
> 3) Avoid struct mbuf changes
> 4) APIs to
>     ○ Enumerate eventdev driver capabilities and resources
>     ○ Enqueue events from l-core
>     ○ Schedule events
>     ○ Synchronize events
>     ○ Maintain ingress order of the events
>     ○ Run to completion support
> 
> Find below the URL for the complete API specification.
> 
> https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> 
> I have created a supportive document to share the concepts of the
> event driven programming model and the proposed API details, to give
> the specification better reach.
> This presentation will cover an introduction to event driven programming model concepts,
> characteristics of hardware-based event manager devices,
> RFC API proposal, example use case, and benefits of using the event driven programming model.
> 
> Find below the URL for the supportive document.
> 
> https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf
> 
> git repo for the above documents:
> 
> https://github.com/jerinjacobk/libeventdev/
> 
> Looking forward to getting comments from both application and driver
> implementation perspective.
> 

Hi Jerin,

thanks for the RFC. Packet distribution and scheduling is something we've been
thinking about here too. This RFC gives us plenty of new ideas to take on board. :-)
While you refer to HW implementations on SOC's, have you given any thought to
how a pure-software implementation of an event API might work? I know that
while a software implemenation can obviously be done for just about any API,
I'd be concerned that the API not get in the way of a very highly
tuned implementation.

We'll look at it in some detail and get back to you with our feedback, as soon
as we can, to start getting the discussion going.

Regards,
/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK
@ 2016-08-09  1:01  1% Jerin Jacob
  2016-08-09  8:48  0% ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-08-09  1:01 UTC (permalink / raw)
  To: dev
  Cc: thomas.monjalon, bruce.richardson, hemant.agrawal,
	shreyansh.jain, jerin.jacob

Hi All,

Find below an RFC API specification which attempts to
define the standard application programming interface
for event driven programming in DPDK and to abstract HW based event devices.

These devices can support event scheduling and flow ordering
in HW and are typically found in NW SoCs as an integrated device or
as a PCI EP device.

The RFC APIs are inspired from existing ethernet and crypto devices.
Following are the requirements considered to define the RFC API.

1) APIs similar to existing Ethernet and crypto API framework for
    ○ Device creation, device Identification and device configuration
2) Enumerate libeventdev resources as numbers(0..N) to
    ○ Avoid ABI issues with handles
    ○ An event device may have millions of flow queues, so it's not
    practical to have handles for each flow queue and the associated
    name-based lookup in the multiprocess case
3) Avoid struct mbuf changes
4) APIs to
    ○ Enumerate eventdev driver capabilities and resources
    ○ Enqueue events from l-core
    ○ Schedule events
    ○ Synchronize events
    ○ Maintain ingress order of the events
    ○ Run to completion support

Find below the URL for the complete API specification.

https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

I have created a supportive document to share the concepts of the
event driven programming model and the proposed API details, to give
the specification better reach.
This presentation will cover an introduction to event driven programming model concepts,
characteristics of hardware-based event manager devices,
RFC API proposal, example use case, and benefits of using the event driven programming model.

Find below the URL for the supportive document.

https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf

git repo for the above documents:

https://github.com/jerinjacobk/libeventdev/

Looking forward to getting comments from both application and driver
implementation perspective.

What follows is the text version of the above documents, for inline comments and discussion.
I intend to update that specification accordingly.

/**
 * Get the total number of event devices that have been successfully
 * initialised.
 *
 * @return
 *   The total number of usable event devices.
 */
extern uint8_t
rte_eventdev_count(void);

/**
 * Get the device identifier for the named event device.
 *
 * @param name
 *   Event device name to select the event device identifier.
 *
 * @return
 *   Returns event device identifier on success.
 *   - <0: Failure to find named event device.
 */
extern uint8_t
rte_eventdev_get_dev_id(const char *name);

/**
 * Return the NUMA socket to which a device is connected.
 *
 * @param dev_id
 *   The identifier of the device.
 * @return
 *   The NUMA socket id to which the device is connected or
 *   a default of zero if the socket could not be determined.
 *   - -1: dev_id value is out of range.
 */
extern int
rte_eventdev_socket_id(uint8_t dev_id);

/**  Event device information */
struct rte_eventdev_info {
	const char *driver_name;	/**< Event driver name */
	struct rte_pci_device *pci_dev;	/**< PCI information */
	uint32_t min_sched_wait_ns;
	/**< Minimum supported scheduler wait delay in ns by this device */
	uint32_t max_sched_wait_ns;
	/**< Maximum supported scheduler wait delay in ns by this device */
	uint32_t sched_wait_ns;
	/**< Configured scheduler wait delay in ns of this device */
	uint32_t max_flow_queues_log2;
	/**< LOG2 of maximum flow queues supported by this device */
	uint8_t  max_sched_groups;
	/**< Maximum schedule groups supported by this device */
	uint8_t  max_sched_group_priority_levels;
	/**< Maximum schedule group priority levels supported by this device */
};

/**
 * Retrieve the contextual information of an event device.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param[out] dev_info
 *   A pointer to a structure of type *rte_eventdev_info* to be filled with the
 *   contextual information of the device.
 */
extern void
rte_eventdev_info_get(uint8_t dev_id, struct rte_eventdev_info *dev_info);

/** Event device configuration structure */
struct rte_eventdev_config {
	uint32_t sched_wait_ns;
	/**< rte_event_schedule() wait for *sched_wait_ns* ns on this device */
	uint32_t nb_flow_queues_log2;
	/**< LOG2 of the number of flow queues to configure on this device */
	uint8_t  nb_sched_groups;
	/**< The number of schedule groups to configure on this device */
};

/**
 * Configure an event device.
 *
 * This function must be invoked first before any other function in the
 * API. This function can also be re-invoked when a device is in the
 * stopped state.
 *
 * The caller may use rte_eventdev_info_get() to get the capability of each
 * resources available in this event device.
 *
 * @param dev_id
 *   The identifier of the device to configure.
 * @param config
 *   The event device configuration structure.
 *
 * @return
 *   - 0: Success, device configured.
 *   - <0: Error code returned by the driver configuration function.
 */
extern int
rte_eventdev_configure(uint8_t dev_id, struct rte_eventdev_config *config);
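
/*
 * Example (not part of the API): minimal device setup using the calls
 * above; the device name and sizing values are illustrative and error
 * handling is omitted.
 *
 *	uint8_t dev_id = rte_eventdev_get_dev_id("my_eventdev");
 *	struct rte_eventdev_info info;
 *	struct rte_eventdev_config cfg;
 *
 *	rte_eventdev_info_get(dev_id, &info);
 *	cfg.sched_wait_ns = info.min_sched_wait_ns;
 *	cfg.nb_flow_queues_log2 = RTE_MIN(10u, info.max_flow_queues_log2);
 *	cfg.nb_sched_groups = 2;
 *	rte_eventdev_configure(dev_id, &cfg);
 */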


#define RTE_EVENT_SCHED_GRP_PRI_HIGHEST	0
/**< Highest schedule group priority */
#define RTE_EVENT_SCHED_GRP_PRI_NORMAL	128
/**< Normal schedule group priority */
#define RTE_EVENT_SCHED_GRP_PRI_LOWEST	255
/**< Lowest schedule group priority */

struct rte_eventdev_sched_group_conf {
	rte_cpuset_t lcore_list;
	/**< List of l-cores has membership in this schedule group */
	uint8_t priority;
	/**< Priority for this schedule group relative to other schedule groups.
	     If the requested *priority* is outside the range supported by the
	     event device's *max_sched_group_priority_levels*, the event driver
	     can normalize it to a priority value in the range
	     [RTE_EVENT_SCHED_GRP_PRI_HIGHEST, RTE_EVENT_SCHED_GRP_PRI_LOWEST] */
	uint8_t enable_all_lcores;
	/**< Ignore *core_list* and enable all the l-cores */
};

/**
 * Allocate and set up a schedule group for a event device.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   The index of the schedule group to setup. The value must be in the range
 *   [0, nb_sched_groups - 1] previously supplied to rte_eventdev_configure().
 * @param group_conf
 *   The pointer to the configuration data to be used for the schedule group.
 *   A NULL value is allowed, in which case a default configuration is used.
 * @param socket_id
 *   The *socket_id* argument is the socket identifier in case of NUMA.
 *   The value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
 *   DMA memory allocated for the schedule group.
 *
 * @return
 *   - 0: Success, schedule group correctly set up.
 *   - <0: Schedule group configuration failed
 */
extern int
rte_eventdev_sched_group_setup(uint8_t dev_id, uint8_t group_id,
		const struct rte_eventdev_sched_group_conf *group_conf,
		int socket_id);
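
/*
 * Example (not part of the API): one high priority schedule group served
 * by all l-cores, set up after rte_eventdev_configure() and before
 * rte_eventdev_start(); values are illustrative.
 *
 *	struct rte_eventdev_sched_group_conf grp = {
 *		.priority = RTE_EVENT_SCHED_GRP_PRI_HIGHEST,
 *		.enable_all_lcores = 1,
 *	};
 *
 *	rte_eventdev_sched_group_setup(dev_id, 0, &grp, SOCKET_ID_ANY);
 */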

/**
 * Get the number of schedule groups on a specific event device
 *
 * @param dev_id
 *   Event device identifier.
 * @return
 *   - The number of configured schedule groups
 */
extern uint16_t
rte_eventdev_sched_group_count(uint8_t dev_id);

/**
 * Get the priority of the schedule group on a specific event device
 *
 * @param dev_id
 *   Event device identifier.
 * @param group_id
 *   Schedule group identifier.
 * @return
 *   - The configured priority of the schedule group in
 *     [RTE_EVENT_SCHED_GRP_PRI_HIGHEST, RTE_EVENT_SCHED_GRP_PRI_LOWEST] range
 */
extern uint8_t
rte_eventdev_sched_group_priority(uint8_t dev_id, uint8_t group_id);

/**
 * Get the configured flow queue id mask of a specific event device
 *
 * *flow_queue_id_mask* can be used to generate *flow_queue_id* value in the
 * range [0 - (2^max_flow_queues_log2 -1)] of a specific event device.
 * *flow_queue_id* value will be used in the event enqueue operation
 * and comparing scheduled event *flow_queue_id* value against enqueued value.
 *
 * @param dev_id
 *   Event device identifier.
 * @return
 *   - The configured flow queue id mask
 */
extern uint32_t
rte_eventdev_flow_queue_id_mask(uint8_t dev_id);

/**
 * Start an event device.
 *
 * The device start step is the last one and consists of setting up the schedule
 * groups and flow queues to start accepting events and scheduling them to l-cores.
 *
 * On success, all basic functions exported by the API (event enqueue,
 * event schedule and so on) can be invoked.
 *
 * @param dev_id
 *   Event device identifier
 * @return
 *   - 0: Success, device started.
 *   - <0: Error code of the driver device start function.
 */
extern int
rte_eventdev_start(uint8_t dev_id);

/**
 * Stop an event device. The device can be restarted with a call to
 * rte_eventdev_start()
 *
 * @param dev_id
 *   Event device identifier.
 */
extern void
rte_eventdev_stop(uint8_t dev_id);

/**
 * Close an event device. The device cannot be restarted!
 *
 * @param dev_id
 *   Event device identifier
 *
 * @return
 *  - 0 on successfully closing device
 *  - <0 on failure to close device
 */
extern int
rte_eventdev_close(uint8_t dev_id);


/* Scheduler synchronization method */

#define RTE_SCHED_SYNC_ORDERED		0
/**< Ordered flow queue synchronization
 *
 * Events from an ordered flow queue can be scheduled to multiple l-cores for
 * concurrent processing while maintaining the original event order. This
 * scheme enables the user to achieve high single flow throughput by avoiding
 * SW synchronization for ordering between l-cores.
 *
 * The source flow queue ordering is maintained when events are enqueued to
 * their destination queue(s) within the same ordered queue synchronization
 * context. A l-core holds the context until it requests another event from the
 * scheduler, which implicitly releases the context. User may allow the
 * scheduler to release the context earlier than that by calling
 * rte_event_schedule_release()
 *
 * Events from the source flow queue appear in their original order when
 * dequeued from a destination flow queue irrespective of its
 * synchronization method. Event ordering is based on the received event(s),
 * but also other (newly allocated or stored) events are ordered when enqueued
 * within the same ordered context. Events not enqueued (e.g. freed or stored)
 * within the context are considered missing from reordering and are skipped at
 * this time (but can be ordered again within another context).
 *
 */

#define RTE_SCHED_SYNC_ATOMIC		1
/**< Atomic flow queue synchronization
 *
 * Events from an atomic flow queue can be scheduled only to a single l-core at
 * a time. The l-core is guaranteed to have exclusive (atomic) access to the
 * associated flow queue context, which enables the user to avoid SW
 * synchronization. Atomic flow queue also helps to maintain event ordering
 * since only one l-core at a time is able to process events from a flow queue.
 *
 * The atomic queue synchronization context is dedicated to the l-core until it
 * requests another event from the scheduler, which implicitly releases the
 * context. User may allow the scheduler to release the context earlier than
 * that by calling rte_event_schedule_release()
 *
 */

#define RTE_SCHED_SYNC_PARALLEL		2
/**< Parallel flow queue
 *
 * The scheduler performs priority scheduling, load balancing etc functions
 * but does not provide additional event synchronization or ordering.
 * It is free to schedule events from a single parallel queue to multiple l-cores
 * for concurrent processing. The application is responsible for flow queue context
 * synchronization and event ordering (SW synchronization).
 *
 */

/* Event types to classify the event source */

#define RTE_EVENT_TYPE_ETHDEV		0x0
/**< The event generated from ethdev subsystem */
#define RTE_EVENT_TYPE_CRYPTODEV	0x1
/**< The event generated from cryptodev subsystem */
#define RTE_EVENT_TYPE_TIMERDEV		0x2
/**< The event generated from timerdev subsystem */
#define RTE_EVENT_TYPE_LCORE		0x3
/**< The event generated from l-core. Application may use *sub_event_type*
 * to further classify the event */
#define RTE_EVENT_TYPE_INVALID		0xf
/**< Invalid event type */
#define RTE_EVENT_TYPE_MAX		0x10

/** The generic rte_event structure to hold the event attributes */
struct rte_event {
        union {
		uint64_t u64;
		struct {
			uint32_t flow_queue_id;
			/**< Flow queue identifier to choose the flow queue in
			 * enqueue and schedule operation.
			 * The value must be in the range given by
			 * rte_eventdev_flow_queue_id_mask() */
			uint8_t  sched_group_id;
			/**< Schedule group identifier to choose the schedule
			 * group in enqueue and schedule operation.
			 * The value must be in the range
			 * [0, nb_sched_groups - 1] previously supplied to
			 * rte_eventdev_configure(). */
			uint8_t  sched_sync;
			/**< Scheduler synchronization method associated
			 * with flow queue for enqueue and schedule operation */
			uint8_t  event_type;
			/**< Event type to classify the event source  */
			uint8_t  sub_event_type;
			/**< Sub-event types based on the event source */
		};
	};
	union {
		uintptr_t event;
		/**< Opaque event pointer */
		struct rte_mbuf *mbuf;
		/**< mbuf pointer if the scheduled event is associated with mbuf */
	};
};

/**
 *
 * Enqueue the event object supplied in the *rte_event* structure on the flow
 * queue identified by *flow_queue_id*, associated with the schedule group
 * *sched_group_id*, the scheduler synchronization method and the event type,
 * on an event device designated by its *dev_id*.
 *
 * @param dev_id
 *   Event device identifier.
 * @param ev
 *   Pointer to struct rte_event
 * @return
 *  - 0 on success
 *  - <0 on failure
 */
extern int
rte_eventdev_enqueue(uint8_t dev_id, struct rte_event *ev);
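
/*
 * Example (not part of the API): enqueue an mbuf as an atomic event;
 * 'hash' and 'm' are assumed to come from the application's
 * classification stage.
 *
 *	struct rte_event ev = {
 *		.flow_queue_id = hash & rte_eventdev_flow_queue_id_mask(dev_id),
 *		.sched_group_id = 0,
 *		.sched_sync = RTE_SCHED_SYNC_ATOMIC,
 *		.event_type = RTE_EVENT_TYPE_LCORE,
 *		.sub_event_type = 0,
 *		.mbuf = m,
 *	};
 *
 *	rte_eventdev_enqueue(dev_id, &ev);
 */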

/**
 * Enqueue a burst of events objects supplied in *rte_event* structure
 * on an event device designated by its *dev_id*.
 *
 * The rte_eventdev_enqueue_burst() function is invoked to enqueue
 * multiple event objects. It is the burst variant of the
 * rte_eventdev_enqueue() function.
 *
 * The *num* parameter is the number of event objects to enqueue which are
 * supplied in the *ev* array of *rte_event* structure.
 *
 * The rte_eventdev_enqueue_burst() function returns the number of
 * event objects it actually enqueued. A return value equal to
 * *num* means that all event objects have been enqueued.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param ev
 *   The address of an array of *num* pointers to *rte_event* structure
 *   which contain the event object enqueue operations to be processed.
 * @param num
 *   The number of event objects to enqueue
 *
 * @return
 * The number of event objects actually enqueued on the event device. The return
 * value can be less than the value of the *num* parameter when the
 * event device's flow queue is full or if invalid parameters are specified in
 * a *rte_event*. If the return value is less than *num*, the remaining events at
 * the end of ev[] are not consumed, and the caller has to take care of them.
 */
extern int
rte_eventdev_enqueue_burst(uint8_t dev_id, struct rte_event *ev[], int num);

/**
 * Schedule an event to the caller l-core from the event device designated by
 * its *dev_id*.
 *
 * rte_event_schedule() does not dictate the specifics of the scheduling algorithm, as
 * each eventdev driver may have different criteria to schedule an event.
 * However, in general, from an application perspective scheduler may use
 * following scheme to dispatch an event to l-core
 *
 * 1) Selection of schedule group
 *   a) The number of schedule groups available in the event device
 *   b) The caller l-core membership in the schedule group.
 *   c) Schedule group priority relative to other schedule groups.
 * 2) Selection of flow queue and event
 *   a) The number of flow queues available in the event device
 *   b) Scheduler synchronization method associated with the flow queue
 *
 * On successful scheduler event dispatch, the caller l-core holds the scheduler
 * synchronization context associated with the dispatched event; an explicit
 * rte_event_schedule_release() or rte_event_schedule_ctxt_*() call, or the next
 * rte_event_schedule() call, shall release the context.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param[out] ev
 *   Pointer to struct rte_event. On successful event dispatch, Implementation
 *   updates the event attributes
 * @param wait
 *   When true, wait for an event until one is available or for the
 *   *sched_wait_ns* ns previously supplied to rte_eventdev_configure()
 *
 * @return
 * When true, a valid event has been dispatched by the scheduler.
 *
 */
extern bool
rte_event_schedule(uint8_t dev_id, struct rte_event *ev, bool wait);
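
/*
 * Example (not part of the API): a minimal worker loop around
 * rte_event_schedule(); process_stage() and next_queue_of() stand in for
 * application stages. The synchronization context held for 'ev' is
 * implicitly released by the next rte_event_schedule() call.
 *
 *	struct rte_event ev;
 *
 *	while (!done) {
 *		if (!rte_event_schedule(dev_id, &ev, true))
 *			continue;
 *		process_stage(&ev);
 *		ev.flow_queue_id = next_queue_of(&ev);
 *		rte_eventdev_enqueue(dev_id, &ev);
 *	}
 */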

/**
 * Schedule an event to the caller l-core from a specific schedule group
 * *group_id* of the event device designated by its *dev_id*.
 *
 * Like rte_event_schedule(), but with the schedule group provided as the
 * *group_id* argument.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   Schedule group identifier to select the schedule group for event dispatch
 * @param[out] ev
 *   Pointer to struct rte_event. On successful event dispatch, Implementation
 *   updates the event attributes
 * @param wait
 *   When true, wait for an event until one is available or for the
 *   *sched_wait_ns* ns previously supplied to rte_eventdev_configure()
 *
 * @return
 * When true, a valid event has been dispatched by the scheduler.
 *
 */
extern bool
rte_event_schedule_from_group(uint8_t dev_id, uint8_t group_id,
				struct rte_event *ev, bool wait);

/**
 * Release the current scheduler synchronization context associated with the
 * scheduler dispatched event
 *
 * If current scheduler synchronization context method is *RTE_SCHED_SYNC_ATOMIC*
 * then this function hints the scheduler that the user has completed critical
 * section processing in the current atomic context.
 * The scheduler is now allowed to schedule events from the same flow queue to
 * another l-core.
 * Early atomic context release may increase parallelism and thus system
 * performance, but user needs to design carefully the split into critical vs.
 * non-critical sections.
 *
 * If current scheduler synchronization context method is *RTE_SCHED_SYNC_ORDERED*
 * then this function hints the scheduler that the user has done all enqueues
 * that need to maintain event order in the current ordered context.
 * The scheduler is allowed to release the ordered context of this l-core and
 * avoid reordering any following enqueues.
 * Early ordered context release may increase parallelism and thus system
 * performance, since scheduler may start reordering events sooner than the next
 * schedule call.
 *
 * If current scheduler synchronization context method is *RTE_SCHED_SYNC_PARALLEL*
 * then this function is a nop
 *
 * @param dev_id
 *   The identifier of the device.
 *
 */
extern void
rte_event_schedule_release(uint8_t dev_id);
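
/*
 * Example (not part of the API): early release of an atomic context once
 * the per-SA critical section is done; update_seq_number() and
 * prepare_crypto() are hypothetical application functions. After the
 * release, the scheduler may hand events of the same atomic flow queue to
 * other l-cores while prepare_crypto() runs.
 *
 *	update_seq_number(sa, &ev);
 *	rte_event_schedule_release(dev_id);
 *	prepare_crypto(&ev);
 */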

/**
 * Update the current schedule context associated with caller l-core
 *
 * rte_event_schedule_ctxt_update() can be used to support the run-to-completion
 * model where the application requires the current *event* to stay on the same
 * l-core as it moves through the series of processing stages, provided the
 * event type is *RTE_EVENT_TYPE_LCORE*.
 *
 * In the context of the run-to-completion model, rte_eventdev_enqueue()
 * and its associated rte_event_schedule() can be replaced by
 * rte_event_schedule_ctxt_update() if the caller requires the current event to
 * stay on the caller l-core with new *flow_queue_id* and/or new *sched_sync*
 * and/or new *sub_event_type* values.
 *
 * All of the arguments should be equal to their current schedule context values
 * unless the application needs the dispatcher to modify the event attribute
 * of a dispatched event.
 *
 * rte_event_schedule_ctxt_update() is a costly operation; splitting it into two
 * functions (rte_event_schedule_ctxt_update() and rte_event_schedule_ctxt_wait())
 * allows the caller to overlap the context update latency with other profitable
 * work.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param flow_queue_id
 *   The new flow queue identifier
 * @param sched_sync
 *   The new schedule synchronization method
 * @param sub_event_type
 *   The new sub_event_type where event_type == RTE_EVENT_TYPE_LCORE
 * @param wait
 *   When true, wait until context update completes
 *   When false, request to update the attribute may optionally start an
 *   operation that may not finish when this function returns.
 *   In that case, this function returns '1' to indicate that the application
 *   must call rte_event_schedule_ctxt_wait() before proceeding with an
 *   operation that requires the completion of the requested event attribute
 *   change.
 * @return
 *  - <0 on failure
 *  - 0 if the event attribute update operation has completed.
 *  - 1 if the event attribute update operation has begun asynchronously.
 *
 */
extern int
rte_event_schedule_ctxt_update(uint8_t dev_id, uint32_t flow_queue_id,
		uint8_t  sched_sync, uint8_t sub_event_type, bool wait);

/**
 * Wait for l-core associated event update operation to complete on the
 * event device designated by its *dev_id*.
 *
 * The caller l-core waits until a previously started event attribute update
 * operation from the same l-core completes.
 *
 * This function is invoked when rte_event_schedule_ctxt_update() returns '1'
 *
 * @param dev_id
 *   The identifier of the device.
 */
extern void
rte_event_schedule_ctxt_wait(uint8_t dev_id);
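
/*
 * Example (not part of the API): run-to-completion stage change keeping
 * the event on this l-core; 'next_qid' and NEXT_STAGE are hypothetical.
 * Other profitable work can be overlapped with the asynchronous update.
 *
 *	int ret = rte_event_schedule_ctxt_update(dev_id, next_qid,
 *			RTE_SCHED_SYNC_ATOMIC, NEXT_STAGE, false);
 *	if (ret == 1) {
 *		do_other_work();
 *		rte_event_schedule_ctxt_wait(dev_id);
 *	}
 */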

/**
 * Join the caller l-core to a schedule group *group_id* of the event device
 * designated by its *dev_id*.
 *
 * l-core membership in the schedule group can be configured with
 * rte_eventdev_sched_group_setup() prior to rte_eventdev_start()
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   Schedule group identifier to select the schedule group to join
 *
 * @return
 *  - 0 on success
 *  - <0 on failure
 */
extern int
rte_event_schedule_group_join(uint8_t dev_id, uint8_t group_id);

/**
 * Leave the caller l-core from a schedule group *group_id* of the event device
 * designated by its *dev_id*.
 *
 * This function will unsubscribe the calling l-core from receiving events from
 * the specified schedule group *group_id*.
 *
 * l-core membership in the schedule group can be configured with
 * rte_eventdev_sched_group_setup() prior to rte_eventdev_start()
 *
 * @param dev_id
 *   The identifier of the device.
 * @param group_id
 *   Schedule group identifier to select the schedule group to leave
 *
 * @return
 *  - 0 on success
 *  - <0 on failure
 */
extern int
rte_event_schedule_group_leave(uint8_t dev_id, uint8_t group_id);
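
/*
 * Example (not part of the API): runtime scaling by temporarily joining
 * this l-core to schedule group 1 for the duration of a traffic burst.
 *
 *	rte_event_schedule_group_join(dev_id, 1);
 *	... drain the burst via rte_event_schedule() ...
 *	rte_event_schedule_group_leave(dev_id, 1);
 */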


*************** text version of the presentation document ************************

Agenda
Event driven programming model concepts in data plane perspective
Characteristics of HW based event manager devices
libeventdev
Example use case - Simple IPSec outbound processing
Benefits of event driven programming model
Future work


Event driven programming model - Concepts
An event is an asynchronous notification from HW/SW to a CPU core
Typical examples of events in the dataplane are
Packets from an ethernet device
Crypto work completion notification from crypto HW
Timer expiry notification from timer HW
A CPU generates an event to notify another CPU (used in pipeline mode)
Event driven programming is a programming paradigm in which the flow of the program is determined by events

[Diagram: event sources (packet events, timer expiry events, crypto work
completion events and SW events) are enqueued to queues 0..N; a scheduler
dispatches the queued events to cores 0..n; each core processes its event
and either enqueues it to a downstream queue for further processing or
sends the event/packet to the wire.]

Packet events, timer expiry events and crypto work completion events are
the typical HW generated events; a core can also produce a SW event to
notify another core of work completion.

Characteristics of HW based event device
Millions of flow queues
Events associated with a single flow queue can be scheduled on multiple CPUs for concurrent processing while maintaining the original event order
Provides synchronization of the events without SW lock schemes
Priority based scheduling to enable QoS
Event device may have 1 to N schedule groups
Each core can be a member of any subset of schedule groups
Each core decides which schedule group(s) it accepts the events from 
Schedule groups provide a means to execute different functions on different cores
Flow queues grouped into schedule groups
Core to schedule group membership can be changed at runtime to support scaling and to reduce the latency of critical work by assigning more cores
The event scheduler is implemented in HW to save CPU cycles



libeventdev components
[Diagram: flow queues 0..n are grouped into schedule groups 0..n, each
schedule group having its own priority; HW/SW event sources (packet
events, timer expiry events, crypto completion events, SW events) feed the
flow queues via enqueue(grp_id, flow_queue_id, schedule_sync, event_type,
event); the scheduler dispatches {grp, flow_queue_id, schedule_sync,
event_type, event} = schedule() to cores 0..n. Each core holds a schedule
group bitmask (e.g. core 0: 100011, core 1: 000001) to capture the list of
schedule groups it participates in via schedule(). The API interface sits
on top of a southbound eventdev driver interface.]
libeventdev - flow
The event driver registers with the libeventdev subsystem and the subsystem provides a unique device id
The application gets the device capabilities with rte_eventdev_info_get(dev_id), like
The number of schedule groups
The number of flow queues in a schedule group
The application configures the event device and each schedule group in the event device, like
The number of schedule groups and flow queues required
Priority of each schedule group and list of l-cores associated with it
Connect schedule groups with other HW event producers in the system like ethdev and crypto etc
In fastpath,
HW/SW enqueues the events to flow queues associated with schedule groups
Core gets the event through the scheduler by invoking rte_event_schedule() from the l-core
Core processes the event and enqueues to another downstream queue for further processing, or sends the event/packet to the wire if it is the last stage of the processing
rte_event_schedule() schedules the event based on
selection of the schedule group
The caller l-core membership in the schedule group
Schedule group priority relative to other schedule groups
selection of the flow queue and the event inside the schedule group
Scheduler sync method associated with the flow queue (ATOMIC vs ORDERED/PARALLEL)



Schedule sync methods (How events are Synchronized)
PARALLEL
Events from a parallel flow queue can be scheduled to multiple cores for concurrent processing
Ingress order is not maintained
ATOMIC
Events from an atomic flow queue can be scheduled only to a single core at a time
Enable critical section in packet processing like sequence number update etc
Ingress order is maintained as only one event is outstanding at a time
ORDERED
Events from the ordered flow queue can be scheduled to multiple cores for concurrent processing
Ingress order is maintained
Enable high single flow throughput



ORDERED flow queue for ingress ordering
[Diagram: events 1..6 from an ORDERED flow queue are dispatched by the
scheduler (rte_event_schedule()) to multiple cores, which process them in
parallel and possibly out of order; when enqueued
(rte_event_queue_enqueue()) to any downstream flow queue, the original
order 1..6 is restored.]
The source ORDERED flow queue's ingress order shall be maintained when
events are enqueued to any downstream flow queue.

Use case (Simple IPSec Outbound processing)

[Slide diagram] Packets flow from the RX ports through four phases to
the TX ports:
PHASE1: POLICY/SA, ROUTE lookup in parallel (ORDERED)
PHASE2: SEQ number update per SA (ATOMIC)
PHASE3: HW assisted IPSec crypto
PHASE4: Core sends encrypted pkts to Tx port queues (ATOMIC)

Packets are enqueued into one of up to 1M flow queues based on a classification criterion (e.g. 5-tuple hash)
PHASE1 generates a unique SA based on the input packet and SA tables.
Each SA flow will be processed in parallel.
Core enqueues on an ATOMIC flow queue for critical section processing per SA
Core issues the IPSec crypto request to HW
Crypto HW processes the crypto operations in the background
Crypto HW sends the crypto work completion event to notify the core

Simple IPSec Outbound processing - Cores View

[Slide diagram] Cores 0..n all run the same loop against the
scheduler, which connects the per-SA flow queues, the HW crypto assist
and the Tx port queues:

while(1) {
    event = rte_event_schedule();
    process the specific phase
    call different enqueue() to send to
         - atomic flow queue
         - crypto HW engine queue
         - TX port queue
}

RX pkts are enqueued by HW to one of millions of ORDERED flow queues
Per SA, the core enqueues on an ATOMIC flow queue for the critical section phase of the flow
The core enqueues the crypto work; on completion of the crypto work, HW generates the crypto work completion notification

API Requirements
APIs similar to existing ethernet and crypto API framework for
Device creation, device Identification and device configuration
Enumerate libeventdev resources as numbers (0..N) to
Avoid ABI issues with handles
An event device may have millions of flow queues, so it's not practical to have handles for each flow queue and the associated name based lookup in the multiprocess case
Avoid struct mbuf changes
APIs to
Enumerate eventdev driver capabilities and resources
Enqueue events from l-core
Schedule events
Synchronize events
Maintain ingress order of the events


API - Slow path
APIs similar to existing ethernet and crypto API framework for
Device creation - Physical event devices are discovered during the PCI probe/enumeration performed by the EAL at DPDK initialization, based on their PCI device identifier, each having a unique PCI BDF (bus/bridge, device, function)
Device identification - A unique device index used to designate the event device in all functions exported by the eventdev API.
Device capability discovery
rte_eventdev_info_get() - To get the global resources of the event device, like the number of schedule groups and the number of flow queues per schedule group
Device configuration
rte_eventdev_configure() - configures the number of schedule groups and the number of flow queues in each schedule group
rte_eventdev_sched_group_setup() - configures schedule group specific settings like priority and the list of l-cores that have membership in the schedule group
Device state change - rte_eventdev_start()/stop()/close(), like an ethdev device
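
A minimal setup sequence could then look as below. This is only a sketch: the RFC names the functions but not their argument layouts, so the configuration structures and their fields are assumptions.

        struct rte_eventdev_info info;
        struct rte_eventdev_config cfg;           /* hypothetical layout */
        struct rte_eventdev_sched_group_conf grp; /* hypothetical layout */

        rte_eventdev_info_get(dev_id, &info);

        cfg.nb_sched_groups = 2;                  /* assumed field names */
        cfg.nb_flow_queues_per_group = 1024;
        rte_eventdev_configure(dev_id, &cfg);

        grp.priority = 0;                         /* highest priority */
        grp.lcore_mask = 0x3;                     /* l-cores 0 and 1 */
        rte_eventdev_sched_group_setup(dev_id, 0 /* group_id */, &grp);

        rte_eventdev_start(dev_id);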



API - Fast path
bool rte_event_schedule(uint8_t dev_id, struct rte_event *ev, bool wait);
Schedule an event to the caller l-core from a specific schedule group of the event device designated by its dev_id
bool rte_event_schedule_from_group(uint8_t dev_id, uint8_t group_id, struct rte_event *ev, bool wait)
Like rte_event_schedule(), but with the schedule group provided as an argument
void rte_event_schedule_release(uint8_t dev_id);
Release the current scheduler synchronization context associated with the scheduler dispatched event
int rte_event_schedule_group_[join/leave](uint8_t dev_id, uint8_t group_id);
Joins/leaves the caller l-core to/from a schedule group
bool rte_event_schedule_ctxt_update(uint8_t dev_id, uint32_t flow_queue_id, uint8_t sched_sync, uint8_t sub_event_type, bool wait);
rte_event_schedule_ctxt_update() can be used to support the run-to-completion model, where the application requires the current *event* to stay on the same l-core as it moves through the series of processing stages, provided the event type is RTE_EVENT_TYPE_LCORE
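
Putting these calls together, a worker l-core loop might be sketched as below (illustrative only; error handling is omitted and process_event() stands in for the application's stage logic):

        rte_event_schedule_group_join(dev_id, group_id);

        while (!done) {
                struct rte_event ev;

                if (!rte_event_schedule(dev_id, &ev, false))
                        continue;       /* nothing for this l-core */

                process_event(&ev);     /* application-defined */

                /* Release the ATOMIC/ORDERED scheduler context early
                 * when the remaining work no longer needs it. */
                rte_event_schedule_release(dev_id);
        }

        rte_event_schedule_group_leave(dev_id, group_id);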




Fast path APIs - Simple IPSec outbound example
#define APP_STATE_SEQ_UPDATE 0
on each lcore
{
        struct rte_event ev;
        uint32_t flow_queue_id_mask = rte_eventdev_flow_queue_id_mask(eventdev);

        while (1) {
                ret = rte_event_schedule(eventdev, &ev, true);
                if (!ret)
                        continue;

                /* packets from HW rx ports proceed in parallel per flow (ORDERED) */
                if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
                        sa = outbound_sa_lookup(ev.mbuf);
                        /* modify the packet per SA attributes */
                        /* find the tx port and tx queue from the routing table */

                        /* move to next phase (atomic seq number update per sa) */
                        ev.flow_queue_id = sa & flow_queue_id_mask;
                        ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
                        ev.sub_event_id = APP_STATE_SEQ_UPDATE;
                        rte_event_enqueue(eventdev, ev);
                } else if (ev.event_type == RTE_EVENT_TYPE_LCORE &&
                           ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
                        sa = ev.flow_queue_id;
                        /* do critical section work per sa */
                        do_critical_section_work(sa);

                        /* Issue the crypto request and generate the following
                         * event on crypto work completion */
                        ev.flow_queue_id = tx_port;
                        ev.sub_event_id = tx_queue_id;
                        ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
                        rte_cryptodev_event_enqueue(cryptodev, ev.mbuf, eventdev, ev);
                } else if (ev.event_type == RTE_EVENT_TYPE_CRYPTODEV) {
                        tx_port = ev.flow_queue_id;
                        tx_queue_id = ev.sub_event_id;
                        /* send the packet to the tx port/queue */
                }
        }
}


Run-to-completion model support

rte_event_schedule_ctxt_update() can be used to support the run-to-completion model, where the application requires the current event to stay on the same l-core as it moves through the series of processing stages, provided the event type is RTE_EVENT_TYPE_LCORE (l-core to l-core communication).
For example, in the previous use case the ATOMIC sequence number update per SA can be achieved as below. Instead of:

                        /* move to next phase (atomic seq number update per sa) */
                        ev.flow_queue_id = sa & flow_queue_id_mask;
                        ev.sched_sync = RTE_SCHED_SYNC_ATOMIC;
                        ev.sub_event_id = APP_STATE_SEQ_UPDATE;
                        rte_event_enqueue(eventdev, ev);
                } else if (ev.event_type == RTE_EVENT_TYPE_LCORE &&
                           ev.sub_event_id == APP_STATE_SEQ_UPDATE) {
                        sa = ev.flow_queue_id;
                        /* do critical section work per sa */
                        do_critical_section_work(sa);

the event stays on the same l-core:

                        /* move to next phase (atomic seq number update per sa) */
                        rte_event_schedule_ctxt_update(eventdev,
                                        sa & flow_queue_id_mask,
                                        RTE_SCHED_SYNC_ATOMIC,
                                        APP_STATE_SEQ_UPDATE, true);

                        /* do critical section work per sa */
                        do_critical_section_work(sa);

The scheduler context update is a costly operation; splitting it into two functions (rte_event_schedule_ctxt_update() and rte_event_schedule_ctxt_wait()) allows the application to overlap the context switch latency with other profitable work.
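
A possible use of the split (rte_event_schedule_ctxt_wait() is only named in prose here, so its signature is an assumption):

                        /* start the context switch without waiting */
                        rte_event_schedule_ctxt_update(eventdev,
                                        sa & flow_queue_id_mask,
                                        RTE_SCHED_SYNC_ATOMIC,
                                        APP_STATE_SEQ_UPDATE, false);

                        do_other_profitable_work(); /* overlap the latency */

                        /* block only when the ATOMIC context is needed */
                        rte_event_schedule_ctxt_wait(eventdev);
                        do_critical_section_work(sa);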

Benefits of event driven programming model
Enables high single-flow throughput with the ORDERED schedule sync method
The processing stages are not bound to specific cores. It provides better load-balancing and scaling capabilities than traditional pipelining.
Prioritization: guarantees l-cores work on the highest priority event available
Supports asynchronous operations, which allow the cores to stay busy while hardware manages requests
Removes the static mapping between cores and ports/rx queues
Scaling from 1 to N flows is easy as flows are not bound to specific cores


Future work
Integrate the event device with ethernet, crypto and timer subsystems in DPDK
Ethdev/event device integration is possible by extending 6WIND’s new ingress classification specification, where a new action type can establish an ethdev port to eventdev schedule group connection
Cryptodev needs some changes at the configuration stage to set up the crypto work completion event delivery mechanism
Spec out a timerdev for PCI based timer event devices (timer event devices generate a timer expiry event vs. a callback in the existing SW based timer scheme)
The event driven model operates on a single event at a time. A helper API is needed to make the final enqueues to HW blocks like the ethdev tx-queue burst-oriented; one possible direction is sketched below
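
For that last item, one purely hypothetical direction (none of these names are part of this RFC) is a burst variant of the final enqueue:

        /* Hypothetical: enqueue nb_events events in one call and
         * return the number actually enqueued. */
        uint16_t rte_event_enqueue_burst(uint8_t dev_id,
                                         const struct rte_event ev[],
                                         uint16_t nb_events);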

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-08-03 18:10  0%           ` John Fastabend
@ 2016-08-04 13:05  0%             ` Adrien Mazarguil
  2016-08-09 21:24  0%               ` John Fastabend
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-08-04 13:05 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jerin Jacob, dev, Thomas Monjalon, Helin Zhang, Jingjing Wu,
	Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu,
	Jan Medala, John Daley, Jing Chen, Konstantin Ananyev,
	Matej Vido, Alejandro Lucero, Sony Chacko, Pablo de Lara,
	Olga Shern

On Wed, Aug 03, 2016 at 11:10:49AM -0700, John Fastabend wrote:
> [...]
> 
> >>>> Considering that allowed pattern/actions combinations cannot be known in
> >>>> advance and would result in an unpractically large number of capabilities to
> >>>> expose, a method is provided to validate a given rule from the current
> >>>> device configuration state without actually adding it (akin to a "dry run"
> >>>> mode).
> >>>
> >>> Rather than have a query/validate process why did we jump over having an
> >>> intermediate representation of the capabilities? Here you state it is
> >>> unpractical but we know how to represent parse graphs and the drivers
> >>> could report their supported parse graph via a single query to a middle
> >>> layer.
> >>>
> >>> This will actually reduce the msg chatter imagine many applications at
> >>> init time or in boundary cases where a large set of applications come
> >>> online at once and start banging on the interface all at once seems less
> >>> than ideal.
> > 
> > Well, I also thought about a kind of graph to represent capabilities but
> > feared the extra complexity would not be worth the trouble, thus settled on
> > the query idea. A couple more reasons:
> > 
> > - Capabilities evolve at the same time as devices are configured. For
> >   example, if a device supports a single RSS context, then a single rule
> >   with a RSS action may be created. The graph would have to be rewritten
> >   accordingly and thus queried/parsed again by the application.
> 
> The graph would not help here because this is an action
> restriction not a parsing restriction. This is yet another query to see
> what actions are supported and how many of each action are supported.
> 
>    get_parse_graph - report the parsable fields
>    get_actions - report the supported actions and possible num of each

OK, now I understand your idea, in my mind the graph was indeed supposed to
represent complete flow rules.

> > - Expressing capabilities at bit granularity (say, for a matching pattern
> >   item mask) is complex, there is no way to simplify the representation of
> >   capabilities without either losing information or making the graph more
> >   complex to parse than simply providing a flow rule from an application
> >   point of view.
> > 
> 
> I'm not sure I understand 'bit granularity' here. I would say we have
> devices now that have rather strange restrictions due to hardware
> implementation. Going forward we should get better hardware and a lot
> of this will go away in my view. Yes this is a long term view and
> doesn't help the current state. The overall point you are making is
> the sum off all these strange/odd bits in the hardware implementation
> means capabilities queries are very difficult to guarantee. On existing
> hardware and I think you've convinced me. Thanks ;)

Precisely. By "bit granularity" I meant that while it is fairly easy to
report whether bit-masking is supported on protocol fields such as MAC
addresses at all, devices may have restrictions on the possible bit-masks,
like they may only have an effect at byte level (0xff), may not allow
specific bits (broadcast) or there even may be a fixed set of bit-masks to
choose from.

[...]
> > I understand, however I think this approach may be too low-level to express
> > all the possible combinations. This graph would have to include possible
> > actions for each possible pattern, all while considering that some actions
> > are not possible with some patterns and that there are exclusive actions.
> > 
> 
> Really? You have hardware that has dependencies between the parser and
> the supported actions? Ugh...

Not that I know of actually, even though we cannot rule out this
possibility.

Here are the possible cases I have in mind with existing HW:

- Too many actions specified for a single rule, even though each of them is
  otherwise supported.

- Performing several encap/decap actions. None are defined in the initial
  specification but these are already planned.

- Assuming there is a single table from the application point of view
  (separate discussion for the other thread), some actions may only be
  possible with the right pattern item or meta item. Asking HW to perform
  tunnel decap may only be safe if the pattern specifically matches that
  protocol.

> If the hardware has separate tables then we shouldn't try to have the
> PMD flatten those into a single table because we will have no way of
> knowing how to do that. (I'll respond to the other thread on this in
> an attempt to not get to scattered).

OK, will reply there as well.

> > Also while memory consumption is not really an issue, such a graph may be
> > huge. It could take a while for the PMD to update it when adding a rule
> > impacting capabilities.
> 
> Ugh... I wouldn't suggest updating the capabilities at runtime like
> this. But I see your point if the graph has to _guarantee_ correctness
> how does it represent limited number of masks and other strange hw,
> its unfortunate the hardware isn't more regular.
> 
> You have convinced me that guaranteed correctness via capabilities
> is going to difficult for many types of devices although not all.

I'll just add that these capabilities also depend on side effects of
configuration performed outside the scope of this API. The way queues are
(re)initialized or offloads configured may affect them. RSS configuration is
the most obvious example.

> [...]
> 
> >>
> >> The cost doing all this is some additional overhead at init time. But
> >> building generic function over this and having a set of predefined
> >> uids for well-known protocols such ip, udp, tcp, etc helps. What you
> >> get for the cost is a few things that I think are worth it. (i) Now
> >> new protocols can be added/removed without recompiling DPDK (ii) a
> >> software package can use the capability query to verify the required
> >> protocols are off-loadable vs a possibly large set of test queries and
> >> (iii) when we do the programming of the device we can provide a tuple
> >> (table-uid, header-uid, field-uid, value, mask, priority) and the
> >> middle layer "knowing" the above graph can verify the command so
> >> drivers only ever see "good"  commands, (iv) finally it should be
> >> faster in terms of cmds per second because the drivers can map the
> >> tuple (table, header, field, priority) to a slot efficiently vs
> >> parsing.
> >>
> >> IMO point (iii) and (iv) will in practice make the code much simpler
> >> because we can maintain common middle layer and not require parsing
> >> by drivers. Making each driver simpler by abstracting into common
> >> layer.
> > 
> > Before answering your points, let's consider how applications are going to
> > be written. Not only devices do not support all possible pattern/actions
> > combinations, they also have memory constraints. Whichever method
> > applications use to determine if a flow rule is supported, at some point
> > they won't be able to add any more due to device limitations.
> > 
> > Sane applications designed to work regardless of the underlying device won't
> > simply call abort() at this point but provide a software fallback
> > instead. My bet is that applications will provide one every time a rule
> > cannot be added for any reason, they won't even bother to query capabilities
> > except perhaps for a very small subset, as in "does this device support the
> > ID action at all?".
> > 
> > Applications that really want/need to know at init time whether all the
> > rules they may want to possibly create are supported will spend about the
> > same time in both cases (query or graph). For queries, by iterating on a
> > list of typical rules. For a graph, by walking through it. Either way, it
> > won't be done later from the data path.
> 
> The queries and graph suffer from the same problems you noted above if
> actually instantiating the rules will impact what rules are allowed. So
> that in both cases we may run into corner cases but it seems that this
> is a result of hardware deficiencies and can't be solved easily at least
> with software.
> 
> My concern is this non-determinism will create performance issues in
> the network because when a flow may or may not be offloaded this can
> have a rather significant impact on its performance. This can make
> debugging network wide performance miserable when at time X I get
> performance X and then for whatever reason something degrades to
> software and at time Y I get some performance Y << X. I suspect that
> in general applications will bind tightly with hardware they know
> works.

You are right, performance determinism is not taken into account at all, at
least not yet. It should not be an issue at the beginning as long as the
API has the ability to evolve later for applications that need it.

Just an idea, could some kind of meta pattern items specifying time
constraints for a rule address this issue? Say, how long (cycles/ms) the PMD
may take to query/apply/delete the rule. If it cannot be guaranteed, the
rule cannot be created. Applications could maintain statistic counters about
failed rules to determine if performance issues are caused by the inability
to create them.
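
For illustration only, such a meta item might carry the constraints directly (nothing like this is defined yet, all names below are made up):

 /* Hypothetical meta pattern item expressing rule management latency
  * constraints. */
 struct rte_flow_item_latency {
         uint32_t max_create_us; /* max time to create/apply the rule */
         uint32_t max_delete_us; /* max time to delete the rule */
 };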

[...]
> > For individual points:
> > 
> > (i) should be doable with the query API without recompiling DPDK as well,
> > the fact API/ABI breakage must be avoided being part of the requirements. If
> > you think there is a problem regarding this, can you provide a specific
> > example?
> 
> What I was after you noted yourself in the doc here,
> 
> "PMDs can rely on this capability to simulate support for protocols with
> fixed headers not directly recognized by hardware."
> 
> I was trying to get variable header support with the RAW capabilities. A
> parse graph supports this for example the proposed query API does not.

OK, I see, however the RAW capability itself may not be supported everywhere
in patterns. What I described is that PMDs, not applications, could leverage
the RAW abilities of underlying devices to implement otherwise unsupported
but fixed patterns.

So basically you would like to expose the ability to describe fixed protocol
definitions following RAW patterns, as in:

 ETH / RAW / IP / UDP / ...

While with such a pattern the current specification makes RAW (4.1.4.2) and
IP start matching from the same offset as two different branches, in effect
you cannot specify a fixed protocol following a RAW item.

It is defined that way because I do not see how HW could parse higher level
protocols after having given up due to a RAW pattern, however assuming the
entire stack is described only using RAW patterns I guess it could be done.

Such a pattern could be generated from a separate function before feeding it
to rte_flow_create(), or translated by the PMD afterwards assuming a
separate meta item such as RAW_END exists to signal the end of a RAW layer.
Of course processing this would be more expensive.
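
To illustrate with the same notation as above (RAW_END being entirely hypothetical at this point):

 ETH / RAW / RAW_END / IP / UDP / ...

where the PMD (or a helper function) would translate the RAW span before the fixed IP / UDP items resume matching.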

[...]
> >>> One strategy I've used in other systems that worked relatively well
> >>> is if the query for the parse graph above returns a key for each node
> >>> in the graph then a single lookup can map the key to a node. Its
> >>> unambiguous and then these operations simply become a table lookup.
> >>> So to be a bit more concrete this changes the pattern structure in
> >>> rte_flow_create() into a  <key,value,mask> tuple where the key is known
> >>> by the initial parse graph query. If you reserve a set of well-defined
> >>> key values for well known protocols like ethernet, ip, etc. then the
> >>> query model also works but the middle layer catches errors in this case
> >>> and again the driver only gets known good flows. So something like this,
> >>>
> >>>   struct rte_flow_pattern {
> >>> 	uint32_t priority;
> >>> 	uint32_t key;
> >>> 	uint32_t value_length;
> >>> 	u8 *value;
> >>>   }
> > 
> > I agree that having an integer representing an entire pattern/actions combo
> > would be great, however how do you tell whether you want matched packets to
> > be duplicated to queue 6 and redirected to queue 3? This method can be used
> > to check if a type of rule is allowed but not whether it is actually
> > applicable. You still need to provide the entire pattern/actions description
> > to create a flow rule.
> 
> In reality its almost the same as your proposal it just took me a moment
> to see it. The only difference I can see is adding new headers via RAW
> type only supports fixed length headers.
> 
> To answer your question the flow_pattern would have to include a action
> set as well to give a list of actions to perform. I just didn't include
> it here.

OK.

> >>> Also if we have multiple tables what do you think about adding a
> >>> table_id to the signature. Probably not needed in the first generation
> >>> but is likely useful for hardware with multiple tables so that it
> >>> would be,
> >>>
> >>>    rte_flow_create(uint8_t port_id, uint8_t table_id, ...);
> > 
> > Not sure if I understand the table ID concept, do you mean in case a device
> > supports entirely different sets of features depending on something? (What?)
> > 
> 
> In many devices we support multiple tables each with their own size,
> match fields and action set. This is useful for building routers for
> example along with lots of other constructs. The basic idea is
> smashing everything into a single table creates a Cartesian product
> problem.

Right, so I understand we'd need a method to express table capabilities as
well as you described (a topic for the other thread then).

[...]
> >>> So you can put it after "known"
> >>> variable length headers like IP. The limitation is it can't get past
> >>> undefined variable length headers.
> > 
> > RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are
> > looking for?
> > 
> 
> But FLOW_ITEM_TYPE_ANY skips "any" header type is my understanding if
> we have new variable length header in the future we will have to add
> a new type RTE_FLOW_ITEM_TYPE_FOO for example. The RAW type will work
> for fixed headers as noted above.

I'm (slowly) starting to get it. How about the suggestion I made above for
RAW items then?

[...]
> The two open items from me are do we need to support adding new variable
> length headers? And how do we handle multiple tables I'll take that up
> in the other thread.

I think variable length headers may be eventually supported through pattern
tricks or eventually a separate conversion layer.

> >>> I looked at the git repo but I only saw the header definition I guess
> >>> the implementation is TBD after there is enough agreement on the
> >>> interface?
> > 
> > Precisely, I intend to update the tree and send a v2 soon (unfortunately did
> > not have much time these past few days to work on this).
> > 
> > Now what if, instead of a seemingly complex parse graph and still in
> > addition to the query method, enum values were defined for PMDs to report
> > an array of supported items, typical patterns and actions so applications
> > can get a quick idea of what devices are capable of without being too
> > specific. Something like:
> > 
> >  enum rte_flow_capability {
> >      RTE_FLOW_CAPABILITY_ITEM_ETH,
> >      RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP,
> >      RTE_FLOW_CAPABILITY_ACTION_ID,
> >      ...
> >  };
> > 
> > Although I'm not convinced about the usefulness of this because it would
> > have to be maintained separately, but that would be easier than building a
> > dummy flow rule for simple query purposes.
> 
> I'm not sure its necessary either at first.

Then I'll discard this idea.

> > The main question I have for you is, do you think the core of the specified
> > API is adequate enough assuming it can be extended later with new methods?
> > 
> 
> The above two items are my only opens at this point, I agree with your
> summary of my capabilities proposal namely it can be added.

Thanks, see you in the other thread.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-08-03 16:44  3%           ` Adrien Mazarguil
@ 2016-08-03 19:11  0%             ` John Fastabend
  0 siblings, 0 replies; 200+ results
From: John Fastabend @ 2016-08-03 19:11 UTC (permalink / raw)
  To: Rahul Lakkireddy, dev, Thomas Monjalon, Helin Zhang, Jingjing Wu,
	Rasesh Mody, Ajit Khaparde, Wenzhuo Lu, Jan Medala, John Daley,
	Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, Pablo de Lara, Olga Shern, Kumar A S,
	Nirranjan Kirubaharan, Indranil Choudhury

[...]

>>>>>> The proposal looks very good.  It satisfies most of the features
>>>>>> supported by Chelsio NICs.  We are looking for suggestions on exposing
>>>>>> more additional features supported by Chelsio NICs via this API.
>>>>>>
>>>>>> Chelsio NICs have two regions in which filters can be placed -
>>>>>> Maskfull and Maskless regions.  As their names imply, maskfull region
>>>>>> can accept masks to match a range of values; whereas, maskless region
>>>>>> don't accept any masks and hence perform a more strict exact-matches.
>>>>>> Filters without masks can also be placed in maskfull region.  By
>>>>>> default, maskless region have higher priority over the maskfull region.
>>>>>> However, the priority between the two regions is configurable.
>>>>>
>>>>> I understand this configuration affects the entire device. Just to be clear,
>>>>> assuming some filters are already configured, are they affected by a change
>>>>> of region priority later?
>>>>>
>>>>
>>>> Both the regions exist at the same time in the device.  Each filter can
>>>> either belong to maskfull or the maskless region.
>>>>
>>>> The priority is configured at time of filter creation for every
>>>> individual filter and cannot be changed while the filter is still
>>>> active. If priority needs to be changed for a particular filter then,
>>>> it needs to be deleted first and re-created.
>>>
>>> Could you model this as two tables and add a table_id to the API? This
>>> way user space could populate the table it chooses. We would have to add
>>> some capabilities attributes to "learn" if tables support masks or not
>>> though.
>>>
>>
>> This approach sounds interesting.
> 
> Now I understand the idea behind these tables, however from an application
> point of view I still think it's better if the PMD could take care of flow
> rules optimizations automatically. Think about it, PMDs have exactly a
> single kind of device they know perfectly well to manage, while applications
> want the best possible performance out of any device in the most generic
> fashion.

The problem is that keeping priorities in order and/or possibly breaking
rules apart (e.g. you have an L2 table and an L3 table) becomes very
complex to manage at the driver level. I think it's easier for the
application, which has some context, to do this. The application "knows"
its context; if it's a router, for example, it will likely be able to
pack rules better than a PMD will.

> 
>>> I don't see how the PMD can sort this out in any meaningful way and it
>>> has to be exposed to the application that has the intelligence to 'know'
>>> priorities between masks and non-masks filters. I'm sure you could come
>>> up with something but it would be less than ideal in many cases I would
>>> guess and we can't have the driver getting priorities wrong or we may
>>> not get the correct behavior.
> 
> It may be solved by having the PMD maintain a SW state to quickly know which
> rules are currently created and in what state the device is so basically the
> application doesn't have to perform this work.
> 
> This API allows applications to express basic needs such as "redirect
> packets matching this pattern to that queue". It must not deal with HW
> details and limitations in my opinion. If a request cannot be satisfied,
> then the rule cannot be created. No help from the application must be
> expected by PMDs, otherwise it opens the door to the same issues as the
> legacy filtering APIs.

This depends on the application and what/how it wants to manage the
device. If the application manages a pipeline with some set of tables,
then mapping this down to a single table, which the PMD then has to
unwind back into a multi-table topology, seems like a waste to me.

> 
> [...]
>>>> Unfortunately, our maskfull region is extremely small too compared to
>>>> maskless region.
>>>>
>>>
>>> To me this means a userspace application would want to pack it
>>> carefully to get the full benefit. So you need some mechanism to specify
>>> the "region" hence the above table proposal.
>>>
>>
>> Right. Makes sense.
> 
> I do not agree, applications should not be aware of it. Note this case can
> be handled differently, so that rules do not have to be moved back and forth
> between both tables. If the first created rule requires a maskfull entry,
> then all subsequent rules will be entered into that table. Otherwise no
> maskfull entry can be created as long as there is one maskless entry. When
> either table is full, no more rules may be added. Would that work for you?
> 

It's not about mask vs no mask. The devices with multiple tables that I
have don't have these mask limitations. It's about how to optimally pack
the rules and who implements that logic. I think it's best done in the
application, where I have the context.

Is there a way to omit the table field if the PMD is expected to do
a best effort, and add the table field if the user wants explicit
control over table mgmt? This would support both models. I at least
would like to have explicit control over rule population in my pipeline
for use cases where I'm building a pipeline on top of the hardware.

>> [...]
>>>>> Now about this "promisc" match criteria, it can be added as a new meta
>>>>> pattern item (4.1.3 Meta item types). Do you want it to be defined from the
>>>>> start or add it later with the related code in your PMD?
>>>>>
>>>>
>>>> It could be added as a meta item.  If there are other interested
>>>> parties, it can be added now.  Otherwise, we'll add it with our filtering
>>>> related code.
>>>>
>>>
>>> hmm I guess by "promisc" here you mean match packets received from the
>>> wire before they have been switched by the silicon?
>>>
>>
>> Match packets received from wire before they have been switched by
>> silicon, and which also includes packets not destined for DUT and were
>> still received due to interface being in promisc mode.
> 
> I think it's fine, but we'll have to precisely define what happens when a
> packet matched with such pattern is part of a terminating rule. For instance
> if it is duplicated by HW, then the rule cannot be terminating.
> 
> [...]
>>>> This raises another interesting question.  What should the PMD do
>>>> if it has support to only a subset of fields in the particular item?
>>>>
>>>> For example, if a rule has been sent to match IP fragmentation along
>>>> with several other IPv4 fields, and if the underlying hardware doesn't
>>>> support matching based on IP fragmentation, does the PMD reject the
>>>> complete rule although it could have done the matching for rest of the
>>>> IPv4 fields?
>>>
>>> I think it has to fail the command other wise user space will not have
>>> any way to understand that the full match criteria can not be met and
>>> we will get different behavior for the same applications on different
>>> nics depending on hardware feature set. This will most likely break
>>> applications so we need the error IMO.
>>>
>>
>> Ok. Makes sense.
> 
> Yes, I fully agree with this.
> 
>>>>>> - Match range of physical ports on the NIC in a single rule via masks.
>>>>>>   For ex: match all UDP packets coming on ports 3 and 4 out of 4
>>>>>>   ports available on the NIC.
>>>>>
>>>>> Applications create flow rules per port, I'd thus suggest that the PMD
>>>>> should detect identical rules created on different ports and aggregate them
>>>>> as a single HW rule automatically.
>>>>>
>>>>> If you think this approach is not right, the alternative is a meta pattern
>>>>> item that provides a list of ports. I'm not sure this is the right approach
>>>>> considering it would most likely not be supported by most NICs. Applications
>>>>> may not request it explicitly.
>>>>>
>>>>
>>>> Aggregating via PMD will be expensive operation since it would involve:
>>>> - Search of existing filters.
>>>> - Deleting those filters.
>>>> - Creating a single combined filter.
>>>>
>>>> And all of above 3 operations would need to be atomic so as not to
>>>> affect existing traffic which is hitting above filters.
> 
> Atomicity may not be a problem if the PMD makes sure the new combined rule
> is inserted before the others, so they do not need to be removed either.
> 
>>>> Adding a
>>>> meta item would be a simpler solution here.
> 
> Yes, clearly.
> 
>>> For this adding a meta-data item seems simplest to me. And if you want
>>> to make the default to be only a single port that would maybe make it
>>> easier for existing apps to port from flow director. Then if an
>>> application cares it can create a list of ports if needed.
>>>
>>
>> Agreed.
> 
> However although I'm not opposed to adding dedicated meta items, remember
> applications will not automatically benefit from the increased performance
> if a single PMD implements this feature, their maintainers will probably not
> bother with it.
> 

Unless, as we noted in the other thread, the application is closely
bound to its hardware for capability reasons. In this case it would
make sense to implement.

>>>>>> - Match range of Physical Functions (PFs) on the NIC in a single rule
>>>>>>   via masks. For ex: match all traffic coming on several PFs.
>>>>>
>>>>> The PF and VF pattern items assume there is a single PF associated with a
>>>>> DPDK port. VFs are identified with an ID. I basically took the same
>>>>> definitions as the existing filter types, perhaps this is not enough for
>>>>> Chelsio adapters.
>>>>>
>>>>> Do you expose more than one PF for a DPDK port?
>>>>>
>>>>> Anyway, I'd suggest the same approach as above, automatic aggregation of
>>>>> rules for performance reasons, otherwise new or updated PF/VF pattern items,
>>>>> in which case it would be great if you could provide ideal structure
>>>>> definitions for this use case.
>>>>>
>>>>
>>>> In Chelsio hardware, all the ports of a device are exposed via single
>>>> PF4. There could be many VFs attached to a PF.  Physical NIC functions
>>>> are operational on PF4, while VFs can be attached to PFs 0-3.
>>>> So, Chelsio hardware doesn't remain tied on a PF-to-Port, one-to-one
>>>> mapping assumption.
>>>>
>>>> There already seems to be a PF meta-item, but it doesn't seem to accept
>>>> any "spec" and "mask" field.  Similarly, the VF meta-item doesn't
>>>> seem to accept a "mask" field.  We could probably enable these fields
>>>> in the PF and VF meta-items to allow configuration.
>>>
>>> Maybe a range field would help here as well? So you could specify a VF
>>> range. It might be one of the things to consider adding later though if
>>> there is no clear use for it now.
>>>
>>
>> VF-value and VF-mask would help to achieve the desired filter.
>> VF-mask would also enable to specify a range of VF values.
> 
> Like John, I think a range or even a list instead of a mask would be better,
> the PMD can easily create a mask from that if necessary. Reason is that
> we've always had bad experiences with bit-fields, they're always too short
> at some point and we would like to avoid having to break the ABI to update
> existing pattern items later.

Agreed avoiding bit-fields is a good idea.

> 
> Also while I don't think this is the case yet, perhaps it will be a good
> idea for PFs/VFs to have global unique IDs, just like DPDK ports.
> 
> Thanks.
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-08-03 14:30  2%         ` Adrien Mazarguil
@ 2016-08-03 18:10  0%           ` John Fastabend
  2016-08-04 13:05  0%             ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: John Fastabend @ 2016-08-03 18:10 UTC (permalink / raw)
  To: Jerin Jacob, dev, Thomas Monjalon, Helin Zhang, Jingjing Wu,
	Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu,
	Jan Medala, John Daley, Jing Chen, Konstantin Ananyev,
	Matej Vido, Alejandro Lucero, Sony Chacko, Pablo de Lara,
	Olga Shern

[...]

>>>> Considering that allowed pattern/actions combinations cannot be known in
>>>> advance and would result in an unpractically large number of capabilities to
>>>> expose, a method is provided to validate a given rule from the current
>>>> device configuration state without actually adding it (akin to a "dry run"
>>>> mode).
>>>
>>> Rather than have a query/validate process why did we jump over having an
>>> intermediate representation of the capabilities? Here you state it is
>>> unpractical but we know how to represent parse graphs and the drivers
>>> could report their supported parse graph via a single query to a middle
>>> layer.
>>>
>>> This will actually reduce the msg chatter imagine many applications at
>>> init time or in boundary cases where a large set of applications come
>>> online at once and start banging on the interface all at once seems less
>>> than ideal.
> 
> Well, I also thought about a kind of graph to represent capabilities but
> feared the extra complexity would not be worth the trouble, thus settled on
> the query idea. A couple more reasons:
> 
> - Capabilities evolve at the same time as devices are configured. For
>   example, if a device supports a single RSS context, then a single rule
>   with a RSS action may be created. The graph would have to be rewritten
>   accordingly and thus queried/parsed again by the application.

The graph would not help here because this is an action
restriction not a parsing restriction. This is yet another query to see
what actions are supported and how many of each action are supported.

   get_parse_graph - report the parsable fields
   get_actions - report the supported actions and possible num of each
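
In rough C terms it could be two calls along these lines (everything below is invented for the sake of discussion, nothing is a defined API):

   struct rte_flow_parse_graph *
   rte_flow_get_parse_graph(uint8_t port_id);

   struct rte_flow_action_cap {
        uint32_t action;       /* action uid */
        uint32_t max_per_rule; /* how many of it a rule may carry */
   };

   int
   rte_flow_get_actions(uint8_t port_id, struct rte_flow_action_cap *caps,
                        uint32_t nb_caps);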

> 
> - Expressing capabilities at bit granularity (say, for a matching pattern
>   item mask) is complex, there is no way to simplify the representation of
>   capabilities without either losing information or making the graph more
>   complex to parse than simply providing a flow rule from an application
>   point of view.
> 

I'm not sure I understand 'bit granularity' here. I would say we have
devices now that have rather strange restrictions due to hardware
implementation. Going forward we should get better hardware and a lot
of this will go away in my view. Yes this is a long term view and
doesn't help the current state. The overall point you are making is
the sum of all these strange/odd bits in the hardware implementation
means capabilities queries are very difficult to guarantee. On existing
hardware and I think you've convinced me. Thanks ;)

> With that in mind, I am not opposed to the idea, both methods could even
> coexist, with the query function eventually evolving to become a front-end
> to a capability graph. Just remember that I am only defining the
> fundamentals for the initial implementation, i.e. how rules are expressed as
> patterns/actions and the basic functions to manage them, ideally without
> having to redefine them ever.
> 

Agreed they should be able to coexist. So I can get my capabilities
queries as a layer on top of the API here.

>> A bit more details on possible interface for capabilities query,
>>
>> One way I've used to describe these graphs from driver to software
>> stacks is to use a set of structures to build the graph. For fixed
>> graphs this could just be *.h file for programmable hardware (typically
>> coming from fw update on nics) the driver can read the parser details
>> out of firmware and render the structures.
> 
> I understand, however I think this approach may be too low-level to express
> all the possible combinations. This graph would have to include possible
> actions for each possible pattern, all while considering that some actions
> are not possible with some patterns and that there are exclusive actions.
> 

Really? You have hardware that has dependencies between the parser and
the supported actions? Ugh...

If the hardware has separate tables then we shouldn't try to have the
PMD flatten those into a single table because we will have no way of
knowing how to do that. (I'll respond to the other thread on this in
an attempt to not get to scattered).

> Also while memory consumption is not really an issue, such a graph may be
> huge. It could take a while for the PMD to update it when adding a rule
> impacting capabilities.

Ugh... I wouldn't suggest updating the capabilities at runtime like
this. But I see your point: if the graph has to _guarantee_ correctness,
how does it represent a limited number of masks and other strange hw?
It's unfortunate the hardware isn't more regular.

You have convinced me that guaranteed correctness via capabilities
is going to difficult for many types of devices although not all.

[...]

>>
>> The cost doing all this is some additional overhead at init time. But
>> building generic function over this and having a set of predefined
>> uids for well-known protocols such ip, udp, tcp, etc helps. What you
>> get for the cost is a few things that I think are worth it. (i) Now
>> new protocols can be added/removed without recompiling DPDK (ii) a
>> software package can use the capability query to verify the required
>> protocols are off-loadable vs a possibly large set of test queries and
>> (iii) when we do the programming of the device we can provide a tuple
>> (table-uid, header-uid, field-uid, value, mask, priority) and the
>> middle layer "knowing" the above graph can verify the command so
>> drivers only ever see "good"  commands, (iv) finally it should be
>> faster in terms of cmds per second because the drivers can map the
>> tuple (table, header, field, priority) to a slot efficiently vs
>> parsing.
>>
>> IMO point (iii) and (iv) will in practice make the code much simpler
>> because we can maintain common middle layer and not require parsing
>> by drivers. Making each driver simpler by abstracting into common
>> layer.
> 
> Before answering your points, let's consider how applications are going to
> be written. Not only devices do not support all possible pattern/actions
> combinations, they also have memory constraints. Whichever method
> applications use to determine if a flow rule is supported, at some point
> they won't be able to add any more due to device limitations.
> 
> Sane applications designed to work regardless of the underlying device won't
> simply call abort() at this point but provide a software fallback
> instead. My bet is that applications will provide one every time a rule
> cannot be added for any reason, they won't even bother to query capabilities
> except perhaps for a very small subset, as in "does this device support the
> ID action at all?".
> 
> Applications that really want/need to know at init time whether all the
> rules they may want to possibly create are supported will spend about the
> same time in both cases (query or graph). For queries, by iterating on a
> list of typical rules. For a graph, by walking through it. Either way, it
> won't be done later from the data path.

The queries and graph suffer from the same problems you noted above if
actually instantiating the rules will impact what rules are allowed. So
in both cases we may run into corner cases, but it seems that this
is a result of hardware deficiencies and can't be solved easily, at
least in software.

My concern is this non-determinism will create performance issues in
the network because when a flow may or may not be offloaded this can
have a rather significant impact on its performance. This can make
debugging network wide performance miserable when at time X I get
performance X and then for whatever reason something degrades to
software and at time Y I get some performance Y << X. I suspect that
in general applications will bind tightly with hardware they know
works.

> 
> I think that for an application maintainer, writing or even generating a set
> of typical rules will also be easier than walking through a graph. It should
> also be easier on the PMD side.
> 

I tend to think getting a graph and doing operations on graphs is easier
myself but I can see this is a matter of opinion/style.

> For individual points:
> 
> (i) should be doable with the query API without recompiling DPDK as well,
> the fact API/ABI breakage must be avoided being part of the requirements. If
> you think there is a problem regarding this, can you provide a specific
> example?

What I was after you noted yourself in the doc here,

"PMDs can rely on this capability to simulate support for protocols with
fixed headers not directly recognized by hardware."

I was trying to get variable header support with the RAW capabilities. A
parse graph supports this for example the proposed query API does not.

> 
> (ii) as described above, I think this use case won't be very common in the
> wild, except for applications designed for a specific device and then they
> will probably know enough about it to skip the query step entirely. If time
> must be spent anyway, it will be in the control path at initialization
> time.
> 

OK.

> (iii) misses the fact that capabilities evolve as flow rules get added,
> there is no way for PMDs to only see "valid" rules also because device
> limitations may prevent adding an otherwise valid rule.

OK I agree for devices with this evolving characteristic we are lost.

> 
> (iv) could be true if not for the same reason as (iii). The graph would have
> to be verfied again before adding another rule. Note that PMDs maintainers
> are encouraged to make their query function as fast as possible, they may
> rely on static data internally for this as well.
> 

OK I'm not going to get hung up on this because I think its an
implementation detail and not an API problem. I would prefer to be
pragmatic and see how fast the API is before I bikeshed it to death for
no good reason.

>>> Worse in my opinion it requires all drivers to write mostly duplicating
>>> validation code where a common layer could easily do this if every
>>> driver reported a common data structure representing its parse graph
>>> instead. The nice fallout of this initial effort upfront is the driver
>>> no longer needs to do error handling/checking/etc and can assume all
>>> rules are correct and valid. It makes driver code much simpler to
>>> support. And IMO at least by doing this we get some other nice benefits
>>> described below.
> 
> About duplicated code, my usual reply is that DPDK will provide internal
> helper methods to assist PMDs with rules management/parsing/etc. These are
> not discussed in the specification because I wanted everyone to agree to the
> application side of things first, and it is difficult to know how much
> assistance PMDs might need without an initial implementation.
> 
> I think this private API will be built at the same time as support is added
> to PMDs and maintainers notice generic code that can be shared.
> Documentation may be written later once things start to settle down.

OK, let's see.

> 
>>> Another related question is about performance.
>>>
>>>> Creation
>>>> ~~~~~~~~
>>>>
>>>> Creating a flow rule is similar to validating one, except the rule is
>>>> actually created.
>>>>
>>>> ::
>>>>
>>>>  struct rte_flow *
>>>>  rte_flow_create(uint8_t port_id,
>>>>                  const struct rte_flow_pattern *pattern,
>>>>                  const struct rte_flow_actions *actions);
>>>
>>> I gather this implies that each driver must parse the pattern/action
>>> block and map this onto the hardware. How many rules per second can this
>>> support? I've run into systems that expect a level of service somewhere
>>> around 50k cmds per second. So bulking will help at the message level
>>> but it seems like a lot of overhead to unpack the pattern/action section.
> 
> There is indeed no guarantee on the time taken to create a flow rule, as
> debated with Sugesh (see the full thread):
> 
>  http://dpdk.org/ml/archives/dev/2016-July/043958.html
> 
> I will update the specification accordingly.
> 
> Consider that even 50k cmds per second may not be fast enough. Applications
> always need to have some kind of fallback ready, and the ability to know
> whether a packet has been matched by a rule is a way to help with that.
> 
> In any case, flow rules must be managed from the control path, the data path
> must only handle consequences.

Same as above, let's see; I think it can probably be made fast enough.

> 
>>> One strategy I've used in other systems that worked relatively well
>>> is if the query for the parse graph above returns a key for each node
>>> in the graph then a single lookup can map the key to a node. Its
>>> unambiguous and then these operations simply become a table lookup.
>>> So to be a bit more concrete this changes the pattern structure in
>>> rte_flow_create() into a  <key,value,mask> tuple where the key is known
>>> by the initial parse graph query. If you reserve a set of well-defined
>>> key values for well known protocols like ethernet, ip, etc. then the
>>> query model also works but the middle layer catches errors in this case
>>> and again the driver only gets known good flows. So something like this,
>>>
>>>   struct rte_flow_pattern {
>>> 	uint32_t priority;
>>> 	uint32_t key;
>>> 	uint32_t value_length;
>>> 	u8 *value;
>>>   }
> 
> I agree that having an integer representing an entire pattern/actions combo
> would be great, however how do you tell whether you want matched packets to
> be duplicated to queue 6 and redirected to queue 3? This method can be used
> to check if a type of rule is allowed but not whether it is actually
> applicable. You still need to provide the entire pattern/actions description
> to create a flow rule.

In reality it's almost the same as your proposal, it just took me a moment
to see it. The only difference I can see is that adding new headers via the
RAW type only supports fixed length headers.

To answer your question, the flow_pattern would have to include an
action set as well, to give a list of actions to perform. I just didn't
include it here.
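
Roughly, just extending the earlier tuple with the action list already used by rte_flow_create() (a sketch only):

   struct rte_flow_pattern {
        uint32_t priority;
        uint32_t key;
        uint32_t value_length;
        uint8_t *value;
        const struct rte_flow_actions *actions; /* as in rte_flow_create() */
   };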

> 
>>> Also if we have multiple tables what do you think about adding a
>>> table_id to the signature. Probably not needed in the first generation
>>> but is likely useful for hardware with multiple tables so that it
>>> would be,
>>>
>>>    rte_flow_create(uint8_t port_id, uint8_t table_id, ...);
> 
> Not sure if I understand the table ID concept, do you mean in case a device
> supports entirely different sets of features depending on something? (What?)
> 

In many devices we support multiple tables each with their own size,
match fields and action set. This is useful for building routers for
example along with lots of other constructs. The basic idea is
smashing everything into a single table creates a Cartesian product
problem.

>>> Finally one other problem we've had which would be great to address
>>> if we are doing a rewrite of the API is adding new protocols to
>>> already deployed DPDK stacks. This is mostly a Linux distribution
>>> problem where you can't easily update DPDK.
>>>
>>> In the prototype header linked in this document it seems to add new
>>> headers requires adding a new enum in the rte_flow_item_type but there
>>> is at least an attempt at a catch all here,
>>>
>>>> 	/**
>>>> 	 * Matches a string of a given length at a given offset (in bytes),
>>>> 	 * or anywhere in the payload of the current protocol layer
>>>> 	 * (including L2 header if used as the first item in the stack).
>>>> 	 *
>>>> 	 * See struct rte_flow_item_raw.
>>>> 	 */
>>>> 	RTE_FLOW_ITEM_TYPE_RAW,
>>>
>>> Actually this is a nice implementation because it works after the
>>> previous item in the stack correct?
> 
> Yes, this is correct.

Great.

> 
>>> So you can put it after "known"
>>> variable length headers like IP. The limitation is it can't get past
>>> undefined variable length headers.
> 
> RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are
> looking for?
> 

But FLOW_ITEM_TYPE_ANY skips "any" header type, as I understand it; if
we have a new variable length header in the future we will have to add
a new type, RTE_FLOW_ITEM_TYPE_FOO for example. The RAW type will work
for fixed headers as noted above.

>>> However if you use the above parse
>>> graph reporting from the driver mechanism and the driver always reports
>>> its largest supported graph then we don't have this issue where a new
>>> hardware sku/ucode/etc added support for new headers but we have no
>>> way to deploy it to existing software users without recompiling and
>>> redeploying.
> 
> I really would like to understand if you see a limitation regarding this
> with the specified API, even assuming DPDK is compiled as a shared library
> and thus not part of the user application.
> 

Thanks, this thread was very helpful for me at least. So the summary
for me is: capability queries can be built on top of this API no
problem, and for many existing devices capability queries will not be
able to guarantee flow insertion success due to hardware
quirks/limitations.

The two open items from me are do we need to support adding new variable
length headers? And how do we handle multiple tables I'll take that up
in the other thread.

>>> I looked at the git repo but I only saw the header definition I guess
>>> the implementation is TBD after there is enough agreement on the
>>> interface?
> 
> Precisely, I intend to update the tree and send a v2 soon (unfortunately did
> not have much time these past few days to work on this).
> 
> Now what if, instead of a seemingly complex parse graph and still in
> addition to the query method, enum values were defined for PMDs to report
> an array of supported items, typical patterns and actions so applications
> can get a quick idea of what devices are capable of without being too
> specific. Something like:
> 
>  enum rte_flow_capability {
>      RTE_FLOW_CAPABILITY_ITEM_ETH,
>      RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP,
>      RTE_FLOW_CAPABILITY_ACTION_ID,
>      ...
>  };
> 
> Although I'm not convinced about the usefulness of this because it would
> have to be maintained separately, but that would be easier than building a
> dummy flow rule for simple query purposes.

I'm not sure it's necessary either at first.

> 
> The main question I have for you is, do you think the core of the specified
> API is adequate enough assuming it can be extended later with new methods?
> 

The above two items are my only opens at this point, I agree with your
summary of my capabilities proposal namely it can be added.

.John

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: postpone mempool ABI breakage
  2016-07-29 13:41  8% [dpdk-dev] [PATCH] doc: postpone mempool ABI breakage Thomas Monjalon
@ 2016-08-03 16:46  4% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-08-03 16:46 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev

2016-07-29 15:41, Thomas Monjalon:
> It was planned to remove some mempool functions which are deprecated
> since 16.07.
> As no other mempool ABI change is planned in 16.11, it is better
> to postpone and group all mempool ABI changes in 17.02.
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  @ 2016-08-03 16:44  3%           ` Adrien Mazarguil
  2016-08-03 19:11  0%             ` John Fastabend
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-08-03 16:44 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: John Fastabend, dev, Thomas Monjalon, Helin Zhang, Jingjing Wu,
	Rasesh Mody, Ajit Khaparde, Wenzhuo Lu, Jan Medala, John Daley,
	Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, Pablo de Lara, Olga Shern, Kumar A S,
	Nirranjan Kirubaharan, Indranil Choudhury

Replying to everything at once, please see below.

On Tue, Jul 26, 2016 at 03:37:35PM +0530, Rahul Lakkireddy wrote:
> On Monday, July 07/25/16, 2016 at 09:40:02 -0700, John Fastabend wrote:
> > On 16-07-25 04:32 AM, Rahul Lakkireddy wrote:
> > > Hi Adrien,
> > > 
> > > On Thursday, July 07/21/16, 2016 at 19:07:38 +0200, Adrien Mazarguil wrote:
> > >> Hi Rahul,
> > >>
> > >> Please see below.
> > >>
> > >> On Thu, Jul 21, 2016 at 01:43:37PM +0530, Rahul Lakkireddy wrote:
> > >>> Hi Adrien,
> > >>>
> > >>> The proposal looks very good.  It satisfies most of the features
> > >>> supported by Chelsio NICs.  We are looking for suggestions on exposing
> > >>> more additional features supported by Chelsio NICs via this API.
> > >>>
> > >>> Chelsio NICs have two regions in which filters can be placed -
> > >>> Maskfull and Maskless regions.  As their names imply, the maskfull
> > >>> region can accept masks to match a range of values, whereas the
> > >>> maskless region doesn't accept any masks and hence performs stricter
> > >>> exact matches.  Filters without masks can also be placed in the
> > >>> maskfull region.  By default, the maskless region has higher priority
> > >>> over the maskfull region.  However, the priority between the two
> > >>> regions is configurable.
> > >>
> > >> I understand this configuration affects the entire device. Just to be clear,
> > >> assuming some filters are already configured, are they affected by a change
> > >> of region priority later?
> > >>
> > > 
> > > Both the regions exist at the same time in the device.  Each filter can
> > > either belong to maskfull or the maskless region.
> > > 
> > > The priority is configured at time of filter creation for every
> > > individual filter and cannot be changed while the filter is still
> > > active. If the priority needs to be changed for a particular filter,
> > > then it needs to be deleted first and re-created.
> > 
> > Could you model this as two tables and add a table_id to the API? This
> > way user space could populate the table it chooses. We would have to add
> > some capabilities attributes to "learn" if tables support masks or not
> > though.
> > 
> 
> This approach sounds interesting.

Now I understand the idea behind these tables; however, from an application
point of view I still think it's better if the PMD could take care of flow
rule optimizations automatically. Think about it: PMDs have exactly a
single kind of device they know perfectly well how to manage, while
applications want the best possible performance out of any device in the
most generic fashion.

> > I don't see how the PMD can sort this out in any meaningful way and it
> > has to be exposed to the application that has the intelligence to 'know'
> > priorities between masks and non-masks filters. I'm sure you could come
> > up with something but it would be less than ideal in many cases I would
> > guess and we can't have the driver getting priorities wrong or we may
> > not get the correct behavior.

It may be solved by having the PMD maintain a SW state to quickly know which
rules are currently created and in what state the device is, so basically
the application doesn't have to perform this work.

This API allows applications to express basic needs such as "redirect
packets matching this pattern to that queue". It must not deal with HW
details and limitations in my opinion. If a request cannot be satisfied,
then the rule cannot be created. No help from the application must be
expected by PMDs, otherwise it opens the door to the same issues as the
legacy filtering APIs.

[...]
> > > Unfortunately, our maskfull region is extremely small too compared to
> > > maskless region.
> > > 
> > 
> > To me this means a userspace application would want to pack it
> > carefully to get the full benefit. So you need some mechanism to specify
> > the "region" hence the above table proposal.
> > 
> 
> Right. Makes sense.

I do not agree, applications should not be aware of it. Note this case can
be handled differently, so that rules do not have to be moved back and forth
between both tables. If the first created rule requires a maskfull entry,
then all subsequent rules will be entered into that table. Otherwise no
maskfull entry can be created as long as there is one maskless entry. When
either table is full, no more rules may be added. Would that work for you?
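
As a minimal sketch, with hypothetical PMD-internal names (struct pmd_priv,
region_full()), that policy could look like:

 static int
 pmd_pick_region(struct pmd_priv *priv, int rule_needs_mask)
 {
 	if (priv->nb_rules == 0)
 		/* the first rule decides which region gets used */
 		priv->use_maskfull = rule_needs_mask;
 	else if (rule_needs_mask && !priv->use_maskfull)
 		return -ENOTSUP; /* maskless region already in use */
 	if (region_full(priv, priv->use_maskfull))
 		return -ENOSPC; /* that region is full, no more rules */
 	priv->nb_rules++;
 	return 0;
 }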

> [...]
> > >> Now about this "promisc" match criteria, it can be added as a new meta
> > >> pattern item (4.1.3 Meta item types). Do you want it to be defined from the
> > >> start or add it later with the related code in your PMD?
> > >>
> > > 
> > > It could be added as a meta item.  If there are other interested
> > > parties, it can be added now.  Otherwise, we'll add it with our filtering
> > > related code.
> > > 
> > 
> > hmm I guess by "promisc" here you mean match packets received from the
> > wire before they have been switched by the silicon?
> > 
> 
> Match packets received from wire before they have been switched by
> silicon, and which also includes packets not destined for DUT and were
> still received due to interface being in promisc mode.

I think it's fine, but we'll have to precisely define what happens when a
packet matched with such pattern is part of a terminating rule. For instance
if it is duplicated by HW, then the rule cannot be terminating.

[...]
> > > This raises another interesting question.  What should the PMD do
> > > if it has support to only a subset of fields in the particular item?
> > > 
> > > For example, if a rule has been sent to match IP fragmentation along
> > > with several other IPv4 fields, and if the underlying hardware doesn't
> > > support matching based on IP fragmentation, does the PMD reject the
> > > complete rule although it could have done the matching for rest of the
> > > IPv4 fields?
> > 
> > I think it has to fail the command other wise user space will not have
> > any way to understand that the full match criteria can not be met and
> > we will get different behavior for the same applications on different
> > nics depending on hardware feature set. This will most likely break
> > applications so we need the error IMO.
> > 
> 
> Ok. Makes sense.

Yes, I fully agree with this.
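
In code, something like this on the application side, given a pattern and
actions built beforehand (enable_sw_fallback() being a hypothetical
application helper):

 struct rte_flow *flow;

 flow = rte_flow_create(port_id, &pattern, &actions);
 if (flow == NULL) {
 	/* the whole rule was rejected: handle these packets in SW */
 	enable_sw_fallback();
 }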

> > >>> - Match range of physical ports on the NIC in a single rule via masks.
> > >>>   For ex: match all UDP packets coming on ports 3 and 4 out of 4
> > >>>   ports available on the NIC.
> > >>
> > >> Applications create flow rules per port, I'd thus suggest that the PMD
> > >> should detect identical rules created on different ports and aggregate them
> > >> as a single HW rule automatically.
> > >>
> > >> If you think this approach is not right, the alternative is a meta pattern
> > >> item that provides a list of ports. I'm not sure this is the right approach
> > >> considering it would most likely not be supported by most NICs. Applications
> > >> may not request it explicitly.
> > >>
> > > 
> > > Aggregating via PMD will be expensive operation since it would involve:
> > > - Search of existing filters.
> > > - Deleting those filters.
> > > - Creating a single combined filter.
> > > 
> > > And all of above 3 operations would need to be atomic so as not to
> > > affect existing traffic which is hitting above filters.

Atomicity may not be a problem if the PMD makes sure the new combined rule
is inserted before the others, so they do not need to be removed either.

> > > Adding a
> > > meta item would be a simpler solution here.

Yes, clearly.

> > For this adding a meta-data item seems simplest to me. And if you want
> > to make the default to be only a single port that would maybe make it
> > easier for existing apps to port from flow director. Then if an
> > application cares it can create a list of ports if needed.
> > 
> 
> Agreed.

However although I'm not opposed to adding dedicated meta items, remember
applications will not automatically benefit from the increased performance
if a single PMD implements this feature, their maintainers will probably not
bother with it.
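
Should it be added anyway, a hypothetical layout for such a meta item
could be as simple as:

 struct rte_flow_item_port_list {
 	uint32_t nb_ports;  /* number of entries in port_id[] */
 	uint32_t port_id[]; /* physical ports the rule applies to */
 };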

> > >>> - Match range of Physical Functions (PFs) on the NIC in a single rule
> > >>>   via masks. For ex: match all traffic coming on several PFs.
> > >>
> > >> The PF and VF pattern items assume there is a single PF associated with a
> > >> DPDK port. VFs are identified with an ID. I basically took the same
> > >> definitions as the existing filter types, perhaps this is not enough for
> > >> Chelsio adapters.
> > >>
> > >> Do you expose more than one PF for a DPDK port?
> > >>
> > >> Anyway, I'd suggest the same approach as above, automatic aggregation of
> > >> rules for performance reasons, otherwise new or updated PF/VF pattern items,
> > >> in which case it would be great if you could provide ideal structure
> > >> definitions for this use case.
> > >>
> > > 
> > > In Chelsio hardware, all the ports of a device are exposed via single
> > > PF4. There could be many VFs attached to a PF.  Physical NIC functions
> > > are operational on PF4, while VFs can be attached to PFs 0-3.
> > > So, Chelsio hardware doesn't remain tied on a PF-to-Port, one-to-one
> > > mapping assumption.
> > > 
> > > There already seems to be a PF meta-item, but it doesn't seem to accept
> > > any "spec" and "mask" field.  Similarly, the VF meta-item doesn't
> > > seem to accept a "mask" field.  We could probably enable these fields
> > > in the PF and VF meta-items to allow configuration.
> > 
> > Maybe a range field would help here as well? So you could specify a VF
> > range. It might be one of the things to consider adding later though if
> > there is no clear use for it now.
> > 
> 
> VF-value and VF-mask would help to achieve the desired filter.
> VF-mask would also enable to specify a range of VF values.

Like John, I think a range or even a list instead of a mask would be better,
the PMD can easily create a mask from that if necessary. Reason is that
we've always had bad experiences with bit-fields, they're always too short
at some point and we would like to avoid having to break the ABI to update
existing pattern items later.
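
For instance, a hypothetical range-based definition could look like:

 struct rte_flow_item_vf_range {
 	uint32_t id_min; /* lowest VF ID to match */
 	uint32_t id_max; /* highest VF ID to match, inclusive */
 };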

Also while I don't think this is the case yet, perhaps it will be a good
idea for PFs/VFs to have global unique IDs, just like DPDK ports.

Thanks.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  @ 2016-08-03 14:30  2%         ` Adrien Mazarguil
  2016-08-03 18:10  0%           ` John Fastabend
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-08-03 14:30 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jerin Jacob, dev, Thomas Monjalon, Helin Zhang, Jingjing Wu,
	Rasesh Mody, Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu,
	Jan Medala, John Daley, Jing Chen, Konstantin Ananyev,
	Matej Vido, Alejandro Lucero, Sony Chacko, Pablo de Lara,
	Olga Shern

Hi John,

I'm replying below to both messages.

On Tue, Aug 02, 2016 at 11:19:15AM -0700, John Fastabend wrote:
> On 16-07-23 02:10 PM, John Fastabend wrote:
> > On 16-07-21 12:20 PM, Adrien Mazarguil wrote:
> >> Hi Jerin,
> >>
> >> Sorry, looks like I missed your reply. Please see below.
> >>
> > 
> > Hi Adrian,
> > 
> > Sorry for a bit delay but a few comments that may be worth considering.
> > 
> > To start with completely agree on the general problem statement and the
> > nice summary of all the current models. Also good start on this.

Thanks.

> >> Considering that allowed pattern/actions combinations cannot be known in
> >> advance and would result in an impractically large number of capabilities to
> >> expose, a method is provided to validate a given rule from the current
> >> device configuration state without actually adding it (akin to a "dry run"
> >> mode).
> > 
> > Rather than have a query/validate process, why did we jump over having
> > an intermediate representation of the capabilities? Here you state it
> > is impractical, but we know how to represent parse graphs and the
> > drivers could report their supported parse graph via a single query to
> > a middle layer.
> > 
> > This will actually reduce the msg chatter. Imagine many applications at
> > init time, or boundary cases where a large set of applications come
> > online at once and start banging on the interface all at once - that
> > seems less than ideal.

Well, I also thought about a kind of graph to represent capabilities but
feared the extra complexity would not be worth the trouble, thus settled on
the query idea. A couple more reasons:

- Capabilities evolve at the same time as devices are configured. For
  example, if a device supports a single RSS context, then a single rule
  with a RSS action may be created. The graph would have to be rewritten
  accordingly and thus queried/parsed again by the application.

- Expressing capabilities at bit granularity (say, for a matching pattern
  item mask) is complex, there is no way to simplify the representation of
  capabilities without either losing information or making the graph more
  complex to parse than simply providing a flow rule from an application
  point of view.

With that in mind, I am not opposed to the idea, both methods could even
coexist, with the query function eventually evolving to become a front-end
to a capability graph. Just remember that I am only defining the
fundamentals for the initial implementation, i.e. how rules are expressed as
patterns/actions and the basic functions to manage them, ideally without
having to redefine them ever.

> A bit more detail on a possible interface for capabilities query,
> 
> One way I've used to describe these graphs from driver to software
> stacks is to use a set of structures to build the graph. For fixed
> graphs this could just be a *.h file; for programmable hardware
> (typically coming from fw update on nics) the driver can read the
> parser details out of firmware and render the structures.

I understand; however, I think this approach may be too low-level to express
all the possible combinations. This graph would have to include possible
actions for each possible pattern, all while considering that some actions
are not possible with some patterns and that there are exclusive actions.

Also while memory consumption is not really an issue, such a graph may be
huge. It could take a while for the PMD to update it when adding a rule
impacting capabilities.

> I've done this two ways: one is to define all the fields in their
> own structures using something like,
> 
> struct field {
> 	char *name;	/* user friendly name */
> 	u32 uid;	/* unique field id */
> 	u32 bitwidth;	/* field width in bits */
> };
> 
> This gives a unique id (uid) for each field along with its
> width and a user friendly name. The fields are organized into
> headers via a header structure,
> 
> struct header_node {
> 	char *name;	/* user friendly name */
> 	u32 uid;	/* unique header id */
> 	u32 *fields;	/* uids of the fields this header contains */
> 	struct parse_graph *jump;	/* edges to other header nodes */
> };
> 
> Each node has a unique id and then a list of fields, where 'fields'
> is a list of uids of fields. It's also easy enough to embed the field
> struct in the header_node if that is simpler; it's really a style
> question.
> 
> The 'struct parse_graph' gives the list of edges from this header node
> to other header nodes. Using a parse graph structure defined
> 
> struct parse_graph {
> 	struct field_reference ref;	/* field/value pair driving the jump */
> 	__u32 jump_uid;	/* uid of the header node jumped to */
> };
> 
> Again as a matter of style you can embed the parse graph in the header
> node as I did above or do it as its own object.
> 
> The field_reference noted below gives the id of the field and the value,
> e.g. the tuple (ipv4.protocol, 6); then jump_uid would be the uid of TCP.
> 
> struct field_reference {
> 	__u32 header_uid;	/* header containing the field */
> 	__u32 field_uid;	/* field compared against value */
> 	__u32 mask_type;
> 	__u32 type;
> 	__u8  *value;	/* value the field must take */
> 	__u8  *mask;	/* optional mask applied to the field */
> };
> 
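> For example, with hypothetical uids, the (ipv4.protocol, 6) edge jumping
> to the TCP node mentioned above would be rendered as:
> 
> 	static __u8 proto_tcp = 6;
> 
> 	struct parse_graph ipv4_to_tcp = {
> 		.ref = {
> 			.header_uid = HDR_IPV4,          /* hypothetical uid */
> 			.field_uid  = FLD_IPV4_PROTOCOL, /* hypothetical uid */
> 			.value      = &proto_tcp,
> 		},
> 		.jump_uid = HDR_TCP,                     /* hypothetical uid */
> 	};
> 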
> The cost of doing all this is some additional overhead at init time. But
> building generic functions over this and having a set of predefined
> uids for well-known protocols such as ip, udp, tcp, etc. helps. What you
> get for the cost is a few things that I think are worth it. (i) Now
> new protocols can be added/removed without recompiling DPDK (ii) a
> software package can use the capability query to verify the required
> protocols are off-loadable vs a possibly large set of test queries and
> (iii) when we do the programming of the device we can provide a tuple
> (table-uid, header-uid, field-uid, value, mask, priority) and the
> middle layer "knowing" the above graph can verify the command so
> drivers only ever see "good"  commands, (iv) finally it should be
> faster in terms of cmds per second because the drivers can map the
> tuple (table, header, field, priority) to a slot efficiently vs
> parsing.
> 
> IMO points (iii) and (iv) will in practice make the code much simpler
> because we can maintain a common middle layer and not require parsing
> by drivers, making each driver simpler by abstracting into a common
> layer.

Before answering your points, let's consider how applications are going to
be written. Not only devices do not support all possible pattern/actions
combinations, they also have memory constraints. Whichever method
applications use to determine if a flow rule is supported, at some point
they won't be able to add any more due to device limitations.

Sane applications designed to work regardless of the underlying device won't
simply call abort() at this point but provide a software fallback
instead. My bet is that applications will provide one every time a rule
cannot be added for any reason, they won't even bother to query capabilities
except perhaps for a very small subset, as in "does this device support the
ID action at all?".

Applications that really want/need to know at init time whether all the
rules they may want to possibly create are supported will spend about the
same time in both cases (query or graph). For queries, by iterating on a
list of typical rules. For a graph, by walking through it. Either way, it
won't be done later from the data path.

I think that for an application maintainer, writing or even generating a set
of typical rules will also be easier than walking through a graph. It should
also be easier on the PMD side.
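
For example, assuming the "dry run" entry point from the RFC mirrors the
rte_flow_create() signature (function name and exact wrappers still
subject to change):

 /* Probe support for one typical rule without creating it. */
 static int
 rule_supported(uint8_t port_id,
                const struct rte_flow_pattern *pattern,
                const struct rte_flow_actions *actions)
 {
 	return rte_flow_validate(port_id, pattern, actions) == 0;
 }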

For individual points:

(i) should be doable with the query API without recompiling DPDK as well,
given that avoiding API/ABI breakage is part of the requirements. If
you think there is a problem regarding this, can you provide a specific
example?

(ii) as described above, I think this use case won't be very common in the
wild, except for applications designed for a specific device and then they
will probably know enough about it to skip the query step entirely. If time
must be spent anyway, it will be in the control path at initialization
time.

(iii) misses the fact that capabilities evolve as flow rules get added;
there is no way for PMDs to only see "valid" rules, also because device
limitations may prevent adding an otherwise valid rule.

(iv) could be true if not for the same reason as (iii). The graph would have
to be verified again before adding another rule. Note that PMD maintainers
are encouraged to make their query function as fast as possible; they may
rely on static data internally for this as well.

> > Worse in my opinion it requires all drivers to write mostly duplicating
> > validation code where a common layer could easily do this if every
> > driver reported a common data structure representing its parse graph
> > instead. The nice fallout of this initial effort upfront is the driver
> > no longer needs to do error handling/checking/etc and can assume all
> > rules are correct and valid. It makes driver code much simpler to
> > support. And IMO at least by doing this we get some other nice benefits
> > described below.

About duplicated code, my usual reply is that DPDK will provide internal
helper methods to assist PMDs with rules management/parsing/etc. These are
not discussed in the specification because I wanted everyone to agree to the
application side of things first, and it is difficult to know how much
assistance PMDs might need without an initial implementation.

I think this private API will be built at the same time as support is added
to PMDs and maintainers notice generic code that can be shared.
Documentation may be written later once things start to settle down.

> > Another related question is about performance.
> > 
> >> Creation
> >> ~~~~~~~~
> >>
> >> Creating a flow rule is similar to validating one, except the rule is
> >> actually created.
> >>
> >> ::
> >>
> >>  struct rte_flow *
> >>  rte_flow_create(uint8_t port_id,
> >>                  const struct rte_flow_pattern *pattern,
> >>                  const struct rte_flow_actions *actions);
> > 
> > I gather this implies that each driver must parse the pattern/action
> > block and map this onto the hardware. How many rules per second can this
> > support? I've run into systems that expect a level of service somewhere
> > around 50k cmds per second. So bulking will help at the message level
> > but it seems like a lot of overhead to unpack the pattern/action section.

There is indeed no guarantee on the time taken to create a flow rule, as
debated with Sugesh (see the full thread):

 http://dpdk.org/ml/archives/dev/2016-July/043958.html

I will update the specification accordingly.

Consider that even 50k cmds per second may not be fast enough. Applications
always need to have some kind of fallback ready, and the ability to know
whether a packet has been matched by a rule is a way to help with that.

In any case, flow rules must be managed from the control path, the data path
must only handle consequences.

> > One strategy I've used in other systems that worked relatively well
> > is if the query for the parse graph above returns a key for each node
> > in the graph then a single lookup can map the key to a node. Its
> > unambiguous and then these operations simply become a table lookup.
> > So to be a bit more concrete this changes the pattern structure in
> > rte_flow_create() into a  <key,value,mask> tuple where the key is known
> > by the initial parse graph query. If you reserve a set of well-defined
> > key values for well known protocols like ethernet, ip, etc. then the
> > query model also works but the middle layer catches errors in this case
> > and again the driver only gets known good flows. So something like this,
> > 
> >   struct rte_flow_pattern {
> > 	uint32_t priority;
> > 	uint32_t key;
> > 	uint32_t value_length;
> > 	uint8_t *value;
> >   };

I agree that having an integer representing an entire pattern/actions combo
would be great, however how do you tell whether you want matched packets to
be duplicated to queue 6 and redirected to queue 3? This method can be used
to check if a type of rule is allowed but not whether it is actually
applicable. You still need to provide the entire pattern/actions description
to create a flow rule.

> > Also if we have multiple tables what do you think about adding a
> > table_id to the signature? Probably not needed in the first generation
> > but is likely useful for hardware with multiple tables so that it
> > would be,
> > 
> >    rte_flow_create(uint8_t port_id, uint8_t table_id, ...);

Not sure I understand the table ID concept - do you mean in case a device
supports entirely different sets of features depending on something? (What?)

> > Finally one other problem we've had which would be great to address
> > if we are doing a rewrite of the API is adding new protocols to
> > already deployed DPDK stacks. This is mostly a Linux distribution
> > problem where you can't easily update DPDK.
> > 
> > In the prototype header linked in this document it seems adding new
> > headers requires adding a new enum value in rte_flow_item_type, but
> > there is at least an attempt at a catch-all here,
> > 
> >> 	/**
> >> 	 * Matches a string of a given length at a given offset (in bytes),
> >> 	 * or anywhere in the payload of the current protocol layer
> >> 	 * (including L2 header if used as the first item in the stack).
> >> 	 *
> >> 	 * See struct rte_flow_item_raw.
> >> 	 */
> >> 	RTE_FLOW_ITEM_TYPE_RAW,
> > 
> > Actually this is a nice implementation because it works after the
> > previous item in the stack, correct?

Yes, this is correct.

> > So you can put it after "known"
> > variable length headers like IP. The limitation is it can't get past
> > undefined variable length headers.

RTE_FLOW_ITEM_TYPE_ANY is made for that purpose. Is that what you are
looking for?

> > However, if you use the above parse graph reporting mechanism from
> > the driver, and the driver always reports its largest supported
> > graph, then we don't have this issue where new hardware sku/ucode/etc
> > adds support for new headers but we have no way to deploy it to
> > existing software users without recompiling and redeploying.

I really would like to understand if you see a limitation regarding this
with the specified API, even assuming DPDK is compiled as a shared library
and thus not part of the user application.

> > I looked at the git repo but I only saw the header definition; I
> > guess the implementation is TBD after there is enough agreement on
> > the interface?

Precisely, I intend to update the tree and send a v2 soon (unfortunately did
not have much time these past few days to work on this).

Now what if, instead of a seemingly complex parse graph and still in
addition to the query method, enum values were defined for PMDs to report
an array of supported items, typical patterns and actions so applications
can get a quick idea of what devices are capable of without being too
specific. Something like:

 enum rte_flow_capability {
     RTE_FLOW_CAPABILITY_ITEM_ETH,
     RTE_FLOW_CAPABILITY_PATTERN_ETH_IP_TCP,
     RTE_FLOW_CAPABILITY_ACTION_ID,
     ...
 };
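
Applications would then retrieve this array through some query function,
e.g. (rte_flow_capabilities_get() being a hypothetical name):

 enum rte_flow_capability caps[32];
 int n;

 /* hypothetical getter filling caps[] and returning the count */
 n = rte_flow_capabilities_get(port_id, caps, RTE_DIM(caps));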

I'm not convinced about the usefulness of this though, because it would
have to be maintained separately; but it would be easier than building a
dummy flow rule for simple query purposes.

The main question I have for you is, do you think the core of the specified
API is adequate enough assuming it can be extended later with new methods?

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v2] lpm: remove redundant check when adding lpm rule
  2016-08-02 21:36  3%     ` Thomas Monjalon
@ 2016-08-03  9:16  4%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2016-08-03  9:16 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Wei Dai

On Tue, Aug 02, 2016 at 11:36:41PM +0200, Thomas Monjalon wrote:
> 2016-08-02 17:04, Bruce Richardson:
> > Having to make this change twice shows up the fact that we are still carrying
> > around some version changes for older releases. Given that we are now past the
> > 16.07 release, the old code can probably be removed. Any volunteers to maybe
> > do up a patch for that.
> 
> The first step would be to announce an ABI breakage.
> Do you plan to do other breaking changes? We may try to group them.
> 
Why does an ABI breakage need to be announced? The code has been in place for
some time, and was called out as an API change in the release notes for 16.04.
Any app compiled against either 16.04 or 16.07 release will work fine once the
code is removed. Any app compiled against an earlier version:
a) is not guaranteed to work, because we only guarantee 1-version
compatibility right now
b) in practice almost certainly won't work with 16.11 anyway, due to 
ABI changes in other areas.

Therefore, I would view an ABI announcement as rather pointless.

/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] lpm: remove redundant check when adding lpm rule
  @ 2016-08-02 21:36  3%     ` Thomas Monjalon
  2016-08-03  9:16  4%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-08-02 21:36 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Wei Dai

2016-08-02 17:04, Bruce Richardson:
> Having to make this change twice shows up the fact that we are still carrying
> around some version changes for older releases. Given that we are now past the
> 16.07 release, the old code can probably be removed. Any volunteers to maybe
> do up a patch for that.

The first step would be to announce an ABI breakage.
Do you plan to do other breaking changes? We may try to group them.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count
  @ 2016-08-01 18:08  4%                   ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2016-08-01 18:08 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: Neil Horman, Thomas Monjalon, dev, Mcnamara, John

On Mon, Aug 01, 2016 at 04:16:12PM +0000, Wiles, Keith wrote:
> 
> >> Neil
> >> 
> > Sorry for the delayed response, I've been on vacation.
> > 
> >> Your modified copy of make has no bearing on the topic we are talking about, customers using dpdk in standard distros, right?
> >> 
> > !?  I really don't know what you're saying here.  My only reason for commenting
> > on my copy of make was to consider an explanation for why make -j 0 might be
> > working for me, and not for you.  I've since uninstalled and reinstalled make on
> > my system, and my copy now behaves as yours does.
> > 
> > But that's all really irrelevant.  I don't know what you mean by this not having
> > any bearing on the conversation since we're talking about customers using dpdk
> > in a distro.  We're not really talking about that at all (if we were, using make
> > would be a moot point, since distro users tend to only use pre-compiled
> > binaries).  We're talking about accelerating the build process when comparing
> > ABIs on two different dpdk versions, that's all.
> 
> Neil, (I am trying to not read your style of text as condescending and I will try to not do that as well)
> 
> Not everyone uses DPDK from prebuilt libraries and we need to support them as well, correct?
> 
Correct, which is why I didn't understand your initial comment:
"Your modified copy of make has no bearing on the topic we are taking about
customers using dpdk in standard distros right?"

I read that as you saying that the topic we are discussing here is DPDK use in
standard distros, to which I am replying "No, that is not what we are talking
about here at all"

Standard Distros, as I am involved with them, are standard because the end user
typically strives to never build software included in the distro themselves.  As
such, this patch has no bearing whatsoever on those end users, because they
expect to just use a pre-built binary that conforms to a given ABI level.  They
have no need for this code.

As you note, of course other upstream developers don't use pre-built binaries,
and that is who this change is targeting.

> > 
> >> Seems odd to me to send this out with 0 or lscpu as it may fail because of no lscpu and will fail on all Ubuntu systems.
> >> 
> > Again, I don't really understand what you're saying.  If you look at the patch,
> > neither of your assertions are true.  With this patch and no other change, the
> > validate_abi script behaves exactly as it does now.  The only thing I've done is
> > add a check for the DPDK_MAKE_JOBS environment variable, and if its not set,
> > either:
> > 
> > a) Set DPDK_MAKE_JOBS to 1 if lscpu doesn't exist on the system
> > b) Set DPDK_MAKE_JOBS to the number of online cpus if lscpu does exist
> > 
> > All of that gets overridden if you manually set DPDK_MAKE_JOBS to something
> > else.
> > 
> > seems pretty straightforward to me.
> 
> At this point do we need to add yet another environment variable to get the correct behavior with DPDK? DPDK is very simple to build today and I worry we keep adding special variables to build DPDK. Can we just use a cleaner default, rather than adding more and more special build requirements? Adding this one is fine, but it also means the customer must read the docs to find this new variable.
> 
Please read back through the thread.  The DPDK_MAKE_JOBS environment variable
was specifically used because it already exists in the build (see
test-build.sh).  Thomas specifically asked me to change the environment variable
name so that it can be reused.  We're not adding anything here that isn't
already there in other locations.

> DPDK should be buildable with the least amount of docs to read; then they can read the docs more later. I am just looking at how the developer can get started building DPDK without an RTFM problem. At some point they need to read the docs to possibly run the examples, but building DPDK should be very simple IMO.
So, this script has nothing to do with actually building the DPDK, only
analyzing differences in ABI between two arbitrary levels.  Nothing about this
change makes building the DPDK harder, easier, or in any way different.

> 
> > 
> >> If we ship with 1 then why even bother adding the code, and if I have to edit the file or use some other method to get better compile performance then why bother as well.
> >> 
> > Please stop and think for a minute.  Why would you need to edit a file to change
> > anything?  If lscpu exists, then everything happens automatically.  If it
> > doesn't, you can still just run:
> > export DPDK_MAKE_JOBS=$BIGNUMBER; validate_abi.sh <args>
> 
> Please do not add extra environment variables; we would start to get to the point of having so many pre-build requirements that just building DPDK becomes private knowledge or a huge setup/RTFM problem.
> 
See above, I'm getting the impression that you're just arguing without actually
looking at the code first.

> > 
> > and it works fine.  If ubuntu has some other utility besides lscpu to parse cpu
> > counts, fine, we can add that in as a feature, but I don't see why that should
> > stop other, non-ubuntu systems from taking advantage of this.
> > 
> >> Setting the value to some large number does not make any sense to me, and if I have to edit the file every time or maintain a patch it just seems silly.
It's no different than setting -j without an argument.  -j with no argument
allows make to just fork jobs as deeply as the dependency graph allows (above
and beyond the number of cpus that can run them).  This can lead to large run
queues on the scheduler (depending on the amount of parallelism the dependency
graph evaluates to).  While that's not necessarily a bad thing, it may not be
what a developer wants (as it shows up as a large load value).  Regardless,
setting -j to any number larger than the maximum number of independent jobs that
a given make file can find is equivalent to setting -j with no argument.  That's
all I'm getting at here: export DPDK_MAKE_JOBS=<large number> is exactly
equivalent to -j without an argument for <large number> >= <max number of
parallel jobs possible>.  So you can get the behavior you want, or something
more restrictive, with my patch.  If we just set -j in the validate abi script
without an argument, then we only get one of those possibilities.

If your argument is that no one would want to set anything other than maximal
job count, I would respond by saying that I like to do that frequently, because
it lets me limit the load of a build that I run, and it lets me serialize make
in the event that I want to debug a build problem.  Allowing reduced job counts
is a valuable and relevant feature.

> >> 
> > Goodness Keith, stop for just a minute with the "editing the file" train you're on.
> > It's an environment variable, you don't have to edit a file to change it.
> 
> Yes Neil, you also need to stop and think about what you are placing on the developer to build DPDK. This one little problem is not the real issue to me, but a symptom of a growing problem in DPDK around how DPDK is built and the amount of knowledge or setup it requires to do this one simple task.
> 
Keith, I feel like you may be missing the point of this script.  This is the 3rd
time you've asserted that this change makes DPDK harder to build.  I assert 
that:

a) The validate_abi script is in no way required to build the DPDK (i.e. it is
an ancillary tool that compares ABI at two different arbitrary points in the
project git history.  It simply does so by building the DPDK as part of its
work) 

b) The environment variable in question is already being used by other scripts
(i.e. there is nothing new being added here in terms of environment variables,
only reuse of existing ones)

c) Completely ignoring this environment variable in no way impacts behavior
(i.e. failure to set this environment variable attempts to accelerate build
parallelism within the confines of this script, if the right tools are
available, and leaves the behavior unchanged otherwise)

Can you please elaborate on how you feel this change makes DPDK harder to build?

> > 
> >> It just seems easier to set it to -j and not use lscpu at all. This way we all win, as I am not following your logic at all.
> >> 
> > This doesn't even make any sense.  If you set it to -j then you get make issuing
> > jobs at the max rate make can issue them, which is potentially fine, but may not
> > be what developers want in the event they don't want this script to monopolize
> > their system.  The way it's written now lets people get reasonable behavior in
> > most cases, and opt in to the extreme case should they desire.  That makes far
> > more sense to me than just chewing up as much cpu as possible all the time.
> 
> I only suggest -j as this would give the developer the best build performance without having to require lscpu or set up yet another build environment variable. The lscpu command does not exist on all systems today, nor on other non-Linux based systems in the future.
> 
In your environment it does, yes, but that may not be the only thing people want;
that's what I'm saying, and I think that's worth taking into account.

Also, fwiw, lscpu should be part of every standard linux distribution, as it's
part of the util-linux package, which contains a number of core utilities to
make user space usable:
http://packages.ubuntu.com/precise/amd64/util-linux/filelist

I'm not sure if you are asserting that it doesn't exist on your ubuntu system,
but that seems to suggest that you've somehow removed it.

> The amount of gain with plain -j makes it a reasonable performance option, as chewing up cpu for 20-50 seconds is not a problem IMO. Please look at the bigger picture and not just your way of building DPDK, as most will have a standalone machine as a development platform (I would assume) and utilizing that machine to the fullest is not a problem (unless you need to get the last digit of PI in another process :-).
> 
Again, this utility builds the dpdk as part of its work, but is not used for the
DPDK build process itself.  That seems to be a critical difference in my mind
here.

> I have DPDK building on Mac OS and lscpu does not exist for that system (I do not know about windows). Think about the future some and using -j just seems to have the least amount of requirements on the system, right?
> 
I'm sorry, no.  From what I can see we currently support linux and bsd on a
variety of hardware architectures.  That's what we're targeting.  I don't think
it's reasonable to say no to a change because it uses a tool that may not exist
on a future operating system that no one is targeting at the moment, or plans
to.  (Note also that this script will work just fine on those systems; it will
just use the basic behavior of -j 1 because the if [ -e /usr/bin/lscpu ] test
will fail.  That is more than I can say for the cpu_layout.py script, which will
traceback when reading /proc/cpuinfo, or dpdk_nic_bind, which will simply fail
when trying to bind or unbind a nic using vfio on those systems.)

I think that's really where we're falling apart here.  You seem to be under the
impression that this script is somehow tied to the build process, and by making
this change I am somehow negatively impacting that process (or potentially
breaking it).  That's simply not true.  This script is self contained, and is in
no way part of the build system.  The script itself just happens to build the
dpdk as part of its work internally, but a developer never has to use it to
actually build the dpdk itself.

> If the developer does not like the 'chew up all of my CPUs' problem then he can read the docs to set the environment variable, but I suspect that would not even happen in 99% of the cases.
Ah, so we agree in principle, and we're just haggling over price? :)

I would counter your argument with the fact that 99% of developers also have a
system which contains lscpu, and so they will get the right amount of
parallelism for their system, not a flood of parallel jobs, nor a serialized
single job environment.  It's the happy medium I'm after, not too much, not too
little.

Neil

> 
> Keith
> 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 04/17] eal: remove duplicate function declaration
  @ 2016-08-01 10:45  3%   ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-08-01 10:45 UTC (permalink / raw)
  To: dev; +Cc: viktorin, thomas.monjalon, David Marchand

rte_eal_dev_init has been declared in both eal_private.h and rte_dev.h since
its introduction.
This function has been exported in the ABI, so remove it from eal_private.h.

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 lib/librte_eal/common/eal_private.h | 7 -------
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 857dc3e..06a68f6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -259,13 +259,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);
 
 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 3fb2188..fe9c704 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-31  7:46  4% ` [dpdk-dev] [PATCH] " Vlad Zolotarov
@ 2016-07-31  8:10  4%   ` Vlad Zolotarov
  0 siblings, 0 replies; 200+ results
From: Vlad Zolotarov @ 2016-07-31  8:10 UTC (permalink / raw)
  To: Tomasz Kulasek, dev



On 07/31/2016 10:46 AM, Vlad Zolotarov wrote:
>
>
> On 07/20/2016 05:24 PM, Tomasz Kulasek wrote:
>> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
>> changes in rte_eth_dev and rte_eth_desc_lim structures.
>>
>> In 16.11, we plan to introduce rte_eth_tx_prep() function to do
>> necessary preparations of packet burst to be safely transmitted on
>> device for desired HW offloads (set/reset checksum field according to
>> the hardware requirements) and check HW constraints (number of segments
>> per packet, etc).
>>
>> While the limitations and requirements may differ for devices, it
>> requires extending the rte_eth_dev structure with a new function pointer,
>> "tx_pkt_prep", which can be implemented in the driver to prepare and
>> verify packets, in a device-specific way, before burst, which should
>> prevent the application from sending malformed packets.
>>
>> Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
>> nb_mtu_seg_max, providing information about the max number of segments
>> in TSO and non-TSO packets acceptable by the device.
>>
>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Acked-by: Vlad Zolotarov <vladz@scylladb.com>

One small comment however.
Although this function is a must, we need a way to clearly understand
which of the clusters are malformed, since dropping the whole bulk is
usually not an option, and sending the malformed packets anyway may cause
a HW hang, thus not an option either.
Another thing - I've pulled the current master and I couldn't find a
way for an application to query the mentioned Tx offload HW limitations,
e.g. the maximum number of segments.
Knowing this limitation would avoid extra linearization.
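
For reference, once the proposed fields land in rte_eth_desc_lim, I would
expect them to be reachable through rte_eth_dev_info_get(), e.g.:

 struct rte_eth_dev_info dev_info;
 uint16_t max_segs;

 rte_eth_dev_info_get(port_id, &dev_info);
 /* proposed fields, not in the current master yet */
 max_segs = dev_info.tx_desc_lim.nb_seg_max;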

thanks,
vlad

>
>> ---
>>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index f502f86..485aacb 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -41,3 +41,10 @@ Deprecation Notices
>>   * The mempool functions for single/multi producer/consumer are deprecated and
>>     will be removed in 16.11.
>>     It is replaced by rte_mempool_generic_get/put functions.
>> +
>> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
>> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
>> +  and processing of packet burst to meet HW specific requirements before
>> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
>> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
>> +  segments limit to be transmitted by device for TSO/non-TSO packets.
>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 22:48  4%   ` Ananyev, Konstantin
  2016-07-27  8:59  4%     ` Thomas Monjalon
@ 2016-07-31  7:50  4%     ` Vlad Zolotarov
  1 sibling, 0 replies; 200+ results
From: Vlad Zolotarov @ 2016-07-31  7:50 UTC (permalink / raw)
  To: Ananyev, Konstantin, Kulasek, TomaszX, dev



On 07/22/2016 01:48 AM, Ananyev, Konstantin wrote:
>
>> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
>> changes in rte_eth_dev and rte_eth_desc_lim structures.
>>
>> As discussed in that thread:
>>
>> http://dpdk.org/ml/archives/dev/2015-September/023603.html
>>
>> Different NIC models, depending on the HW offload requested, might impose
>> different requirements on packets to be TX-ed in terms of:
>>
>>   - Max number of fragments per packet allowed
>>   - Max number of fragments per TSO segments
>>   - The way pseudo-header checksum should be pre-calculated
>>   - L3/L4 header fields filling
>>   - etc.
>>
>>
>> MOTIVATION:
>> -----------
>>
>> 1) Some work cannot (and should not) be done in rte_eth_tx_burst.
>>     However, this work is sometimes required, and now, it's an
>>     application issue.
>>
>> 2) Different hardware may have different requirements for TX offloads;
>>     a different subset can be supported, and so on.
>>
>> 3) Some parameters (eg. number of segments in ixgbe driver) may hang
>>     the device. These parameters may vary for different devices.
>>
>>     For example i40e HW allows 8 fragments per packet, but that is after
>>     TSO segmentation. While ixgbe has a 38-fragment pre-TSO limit.
>>
>> 4) Fields in the packet may require different initialization (e.g. some
>>     will require pseudo-header checksum precalculation, sometimes in a
>>     different way depending on packet type, and so on). Now the
>>     application needs to care about it.
>>
>> 5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst
>>     lets the application prepare the packet burst in an acceptable form
>>     for the specific device.
>>
>> 6) Some additional checks may be done in debug mode keeping tx_burst
>>     implementation clean.
>>
>>
>> PROPOSAL:
>> ---------
>>
>> To help the user deal with all these varieties we propose to:
>>
>> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>>     packet burst to be safely transmitted on device for desired HW
>>     offloads (set/reset checksum field according to the hardware
>>     requirements) and check HW constraints (number of segments per
>>     packet, etc).
>>
>>     While the limitations and requirements may differ for devices, it
>>     requires extending the rte_eth_dev structure with a new function
>>     pointer "tx_pkt_prep" which can be implemented in the driver to
>>     prepare and verify packets, in a device-specific way, before burst,
>>     which should prevent the application from sending malformed packets.
>>
>> 2. Also new fields will be introduced in rte_eth_desc_lim:
>>     nb_seg_max and nb_mtu_seg_max, providing information about the max
>>     number of segments in TSO and non-TSO packets acceptable by the device.
>>
>>     This information is useful for the application, to avoid creating
>>     malformed packets.
>>
>>
>> APPLICATION (USE CASE):
>> -----------------------
>>
>> 1) The application should initialize the burst of packets to send,
>>     setting required tx offload flags and required fields, like l2_len,
>>     l3_len, l4_len, and tso_segsz.
>>
>> 2) The application passes the burst to rte_eth_tx_prep to check the
>>     conditions required to send packets through the NIC.
>>
>> 3) The result of rte_eth_tx_prep can be used to send valid packets
>>     and/or restore invalid ones if the function fails.
>>
>> eg.
>>
>> 	for (i = 0; i < nb_pkts; i++) {
>>
>> 		/* initialize or process packet */
>>
>> 		bufs[i]->tso_segsz = 800;
>> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
>> 				| PKT_TX_IP_CKSUM;
>> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
>> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
>> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
>> 	}
>>
>> 	/* Prepare burst of TX packets */
>> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
>>
>> 	if (nb_prep < nb_pkts) {
>> 		printf("tx_prep failed\n");
>>
>> 		/* drop or restore invalid packets */
>>
>> 	}
>>
>> 	/* Send burst of TX packets */
>> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
>>
>> 	/* Free any unsent packets, e.g.: */
>> 	for (i = nb_tx; i < nb_prep; i++)
>> 		rte_pktmbuf_free(bufs[i]);
>>
>>
>>
>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

Acked-by: Vlad Zolotarov <vladz@scylladb.com>

>> ---
>>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index f502f86..485aacb 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -41,3 +41,10 @@ Deprecation Notices
>>   * The mempool functions for single/multi producer/consumer are deprecated and
>>     will be removed in 16.11.
>>     It is replaced by rte_mempool_generic_get/put functions.
>> +
>> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
>> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
>> +  and processing of packet burst to meet HW specific requirements before
>> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
>> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
>> +  segments limit to be transmitted by device for TSO/non-TSO packets.
>> --
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
>> 1.7.9.5

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 14:24 13% [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
  2016-07-20 15:01  4% ` Thomas Monjalon
  2016-07-21 15:24 11% ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
@ 2016-07-31  7:46  4% ` Vlad Zolotarov
  2016-07-31  8:10  4%   ` Vlad Zolotarov
  2 siblings, 1 reply; 200+ results
From: Vlad Zolotarov @ 2016-07-31  7:46 UTC (permalink / raw)
  To: Tomasz Kulasek, dev



On 07/20/2016 05:24 PM, Tomasz Kulasek wrote:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
>
> In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> necessary preparations of packet burst to be safely transmitted on
> device for desired HW offloads (set/reset checksum field according to
> the hardware requirements) and check HW constraints (number of segments
> per packet, etc).
>
> While the limitations and requirements may differ for devices, it
> requires extending the rte_eth_dev structure with a new function pointer,
> "tx_pkt_prep", which can be implemented in the driver to prepare and
> verify packets, in a device-specific way, before burst, which should
> prevent the application from sending malformed packets.
>
> Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> nb_mtu_seg_max, providing information about the max number of segments
> in TSO and non-TSO packets acceptable by the device.
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

Acked-by: Vlad Zolotarov <vladz@scylladb.com>

> ---
>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>   * The mempool functions for single/multi producer/consumer are deprecated and
>     will be removed in 16.11.
>     It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of packet burst to meet HW specific requirements before
> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
> +  segments limit to be transmitted by device for TSO/non-TSO packets.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] log: remove history dump
  2016-07-29 13:01  9% [dpdk-dev] [PATCH] log: remove history dump Thomas Monjalon
@ 2016-07-29 13:50  9% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-29 13:50 UTC (permalink / raw)
  To: dev

The log history feature was deprecated in 16.07.
The remaining empty functions are removed in 16.11.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
v2: fix LIBABIVER and compilation of test
---
 app/test/test.h                         |  5 +++-
 doc/guides/rel_notes/deprecation.rst    |  3 ---
 doc/guides/rel_notes/release_16_11.rst  |  4 +++-
 lib/librte_eal/bsdapp/eal/Makefile      |  2 +-
 lib/librte_eal/common/eal_common_log.c  | 19 ---------------
 lib/librte_eal/common/include/rte_log.h | 41 ---------------------------------
 lib/librte_eal/linuxapp/eal/Makefile    |  2 +-
 7 files changed, 9 insertions(+), 67 deletions(-)

diff --git a/app/test/test.h b/app/test/test.h
index 467b9c0..74d6021 100644
--- a/app/test/test.h
+++ b/app/test/test.h
@@ -33,9 +33,12 @@
 
 #ifndef _TEST_H_
 #define _TEST_H_
+
 #include <stddef.h>
 #include <sys/queue.h>
-#include "rte_log.h"
+
+#include <rte_common.h>
+#include <rte_log.h>
 
 #define TEST_SUCCESS  (0)
 #define TEST_FAILED  (-1)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 8263d03..96db661 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,9 +8,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* The log history is deprecated.
-  It is voided in 16.07 and will be removed in release 16.11.
-
 * The ethdev library file will be renamed from libethdev.* to librte_ethdev.*
   in release 16.11 in order to have a more consistent namespace.
 
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index a6e3307..0b9022d 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -94,6 +94,8 @@ API Changes
 
    This section is a comment. Make sure to start the actual text at the margin.
 
+* The log history is removed.
+
 
 ABI Changes
 -----------
@@ -131,7 +133,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_cmdline.so.2
      librte_cryptodev.so.1
      librte_distributor.so.1
-     librte_eal.so.2
+   + librte_eal.so.3
      librte_hash.so.2
      librte_ip_frag.so.1
      librte_ivshmem.so.1
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 988cbbc..7a0fea5 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -48,7 +48,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := rte_eal_version.map
 
-LIBABIVER := 2
+LIBABIVER := 3
 
 # specific to linuxapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_log.c b/lib/librte_eal/common/eal_common_log.c
index 7916c78..967991a 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -64,19 +64,6 @@ static RTE_DEFINE_PER_LCORE(struct log_cur_msg, log_cur_msg);
 
 /* default logs */
 
-int
-rte_log_add_in_history(const char *buf __rte_unused, size_t size __rte_unused)
-{
-	return 0;
-}
-
-void
-rte_log_set_history(int enable)
-{
-	if (enable)
-		RTE_LOG(WARNING, EAL, "The log history is deprecated.\n");
-}
-
 /* Change the stream that will be used by logging system */
 int
 rte_openlog_stream(FILE *f)
@@ -131,12 +118,6 @@ int rte_log_cur_msg_logtype(void)
 	return RTE_PER_LCORE(log_cur_msg).logtype;
 }
 
-/* Dump log history to file */
-void
-rte_log_dump_history(FILE *out __rte_unused)
-{
-}
-
 /*
  * Generates a log message The message will be sent in the stream
  * defined by the previous call to rte_openlog_stream().
diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index b1add04..919563c 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -42,8 +42,6 @@
  * This file provides a log API to RTE applications.
  */
 
-#include "rte_common.h" /* for __rte_deprecated macro */
-
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -181,45 +179,6 @@ int rte_log_cur_msg_loglevel(void);
 int rte_log_cur_msg_logtype(void);
 
 /**
- * @deprecated
- * Enable or disable the history (enabled by default)
- *
- * @param enable
- *   true to enable, or 0 to disable history.
- */
-__rte_deprecated
-void rte_log_set_history(int enable);
-
-/**
- * @deprecated
- * Dump the log history to a file
- *
- * @param f
- *   A pointer to a file for output
- */
-__rte_deprecated
-void rte_log_dump_history(FILE *f);
-
-/**
- * @deprecated
- * Add a log message to the history.
- *
- * This function can be called from a user-defined log stream. It adds
- * the given message in the history that can be dumped using
- * rte_log_dump_history().
- *
- * @param buf
- *   A data buffer containing the message to be saved in the history.
- * @param size
- *   The length of the data buffer.
- * @return
- *   - 0: Success.
- *   - (-ENOBUFS) if there is no room to store the message.
- */
-__rte_deprecated
-int rte_log_add_in_history(const char *buf, size_t size);
-
-/**
  * Generates a log message.
  *
  * The message will be sent in the stream defined by the previous call
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 182729c..3a7631a 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -37,7 +37,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 2
+LIBABIVER := 3
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
-- 
2.7.0

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] doc: postpone mempool ABI breakage
@ 2016-07-29 13:41  8% Thomas Monjalon
  2016-08-03 16:46  4% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-29 13:41 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev

It was planned to remove some mempool functions which have been deprecated
since 16.07.
As no other mempool ABI change is planned for 16.11, it is better
to postpone and group all mempool ABI changes in 17.02.
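
As a reminder of the renaming covered by this notice, a small sketch of
the replacement accessors follows (the helper function is illustrative
only):

    #include <stdio.h>
    #include <rte_mempool.h>

    /* Renamed accessors from the deprecation notice:
     *   rte_mempool_count()      -> rte_mempool_avail_count()
     *   rte_mempool_free_count() -> rte_mempool_in_use_count()
     * The old names were confusing: "free_count" actually returned
     * the number of allocated entries. */
    static void
    print_mempool_usage(const struct rte_mempool *mp)
    {
            unsigned int avail = rte_mempool_avail_count(mp);
            unsigned int in_use = rte_mempool_in_use_count(mp);

            printf("mempool %s: %u available, %u in use\n",
                   mp->name, avail, in_use);
    }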

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1b953fe..96db661 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,15 +41,14 @@ Deprecation Notices
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
   their behavior will be kept in 16.07 and will be removed in 16.11.
 
-* The APIs rte_mempool_count and rte_mempool_free_count are being deprecated
-  on the basis that they are confusing to use - free_count actually returns
-  the number of allocated entries, not the number of free entries as expected.
-  They are being replaced by rte_mempool_avail_count and
-  rte_mempool_in_use_count respectively.
+* mempool: The functions ``rte_mempool_count`` and ``rte_mempool_free_count``
+  will be removed in 17.02.
+  They are replaced by ``rte_mempool_avail_count`` and
+  ``rte_mempool_in_use_count`` respectively.
 
-* The mempool functions for single/multi producer/consumer are deprecated and
-  will be removed in 16.11.
-  It is replaced by rte_mempool_generic_get/put functions.
+* mempool: The functions for single/multi producer/consumer are deprecated
+  and will be removed in 17.02.
+  They are replaced by the ``rte_mempool_generic_get/put`` functions.
 
 * The ``rte_ivshmem`` feature (including library and EAL code) will be removed
   in 16.11 because it has some design issues which are not planned to be fixed.
-- 
2.7.0

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCH] log: remove history dump
@ 2016-07-29 13:01  9% Thomas Monjalon
  2016-07-29 13:50  9% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-29 13:01 UTC (permalink / raw)
  To: david.marchand; +Cc: dev

The log history feature was deprecated in 16.07.
The remaining empty functions are removed in 16.11.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst    |  3 ---
 doc/guides/rel_notes/release_16_11.rst  |  4 +++-
 lib/librte_eal/common/eal_common_log.c  | 19 ---------------
 lib/librte_eal/common/include/rte_log.h | 41 ---------------------------------
 4 files changed, 3 insertions(+), 64 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d2dc4a9..1b953fe 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,9 +8,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* The log history is deprecated.
-  It is voided in 16.07 and will be removed in release 16.11.
-
 * The ethdev library file will be renamed from libethdev.* to librte_ethdev.*
   in release 16.11 in order to have a more consistent namespace.
 
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index a6e3307..0b9022d 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -94,6 +94,8 @@ API Changes
 
    This section is a comment. Make sure to start the actual text at the margin.
 
+* The log history is removed.
+
 
 ABI Changes
 -----------
@@ -131,7 +133,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_cmdline.so.2
      librte_cryptodev.so.1
      librte_distributor.so.1
-     librte_eal.so.2
+   + librte_eal.so.3
      librte_hash.so.2
      librte_ip_frag.so.1
      librte_ivshmem.so.1
diff --git a/lib/librte_eal/common/eal_common_log.c b/lib/librte_eal/common/eal_common_log.c
index 7916c78..967991a 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -64,19 +64,6 @@ static RTE_DEFINE_PER_LCORE(struct log_cur_msg, log_cur_msg);
 
 /* default logs */
 
-int
-rte_log_add_in_history(const char *buf __rte_unused, size_t size __rte_unused)
-{
-	return 0;
-}
-
-void
-rte_log_set_history(int enable)
-{
-	if (enable)
-		RTE_LOG(WARNING, EAL, "The log history is deprecated.\n");
-}
-
 /* Change the stream that will be used by logging system */
 int
 rte_openlog_stream(FILE *f)
@@ -131,12 +118,6 @@ int rte_log_cur_msg_logtype(void)
 	return RTE_PER_LCORE(log_cur_msg).logtype;
 }
 
-/* Dump log history to file */
-void
-rte_log_dump_history(FILE *out __rte_unused)
-{
-}
-
 /*
  * Generates a log message The message will be sent in the stream
  * defined by the previous call to rte_openlog_stream().
diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index b1add04..919563c 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -42,8 +42,6 @@
  * This file provides a log API to RTE applications.
  */
 
-#include "rte_common.h" /* for __rte_deprecated macro */
-
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -181,45 +179,6 @@ int rte_log_cur_msg_loglevel(void);
 int rte_log_cur_msg_logtype(void);
 
 /**
- * @deprecated
- * Enable or disable the history (enabled by default)
- *
- * @param enable
- *   true to enable, or 0 to disable history.
- */
-__rte_deprecated
-void rte_log_set_history(int enable);
-
-/**
- * @deprecated
- * Dump the log history to a file
- *
- * @param f
- *   A pointer to a file for output
- */
-__rte_deprecated
-void rte_log_dump_history(FILE *f);
-
-/**
- * @deprecated
- * Add a log message to the history.
- *
- * This function can be called from a user-defined log stream. It adds
- * the given message in the history that can be dumped using
- * rte_log_dump_history().
- *
- * @param buf
- *   A data buffer containing the message to be saved in the history.
- * @param size
- *   The length of the data buffer.
- * @return
- *   - 0: Success.
- *   - (-ENOBUFS) if there is no room to store the message.
- */
-__rte_deprecated
-int rte_log_add_in_history(const char *buf, size_t size);
-
-/**
  * Generates a log message.
  *
  * The message will be sent in the stream defined by the previous call
-- 
2.7.0

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] ivshmem: remove integration in dpdk
@ 2016-07-29 12:28  1% David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2016-07-29 12:28 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, anatoly.burakov

Following discussions on the mailing list [1] and since nobody stood up to
implement the necessary cleanups, here is the ivshmem integration removal.

There is not much to say about this patch; a lot of code is being removed.
The default configuration file for the packet_ordering example is replaced
with the "native" x86 file.
The only tricky part is in eal_memory, with the memseg index handling.

More cleanups can be done after this but will come in subsequent patchsets.

[1]: http://dpdk.org/ml/archives/dev/2016-June/040844.html
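
For context, here is a minimal sketch of the workflow being removed, with
API names taken from the deleted test code (error handling abbreviated;
this is illustrative, not a supported API after this patch):

    #include <rte_memzone.h>
    #include <rte_ivshmem.h>

    /* Removed ivshmem flow: export a memzone to a QEMU guest via an
     * ivshmem metadata file and a generated command line fragment. */
    static int
    export_memzone_to_guest(char *cmdline, unsigned int len)
    {
            const struct rte_memzone *mz;

            if (rte_ivshmem_metadata_create("metadata") < 0)
                    return -1;

            mz = rte_memzone_reserve("shared", 64, SOCKET_ID_ANY, 0);
            if (mz == NULL ||
                rte_ivshmem_metadata_add_memzone(mz, "metadata") < 0)
                    return -1;

            /* Produce the QEMU command line for the guest. */
            return rte_ivshmem_metadata_cmdline_generate(cmdline, len,
                                                         "metadata");
    }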

Signed-off-by: David Marchand <david.marchand@6wind.com>
---
 MAINTAINERS                                  |   8 -
 app/test/Makefile                            |   1 -
 app/test/autotest_data.py                    |   6 -
 app/test/test.c                              |   3 -
 app/test/test.h                              |   1 -
 app/test/test_ivshmem.c                      | 433 ------------
 config/defconfig_arm64-armv8a-linuxapp-gcc   |   1 -
 config/defconfig_x86_64-ivshmem-linuxapp-gcc |  49 --
 config/defconfig_x86_64-ivshmem-linuxapp-icc |  49 --
 doc/api/doxy-api-index.md                    |   1 -
 doc/api/doxy-api.conf                        |   1 -
 doc/api/examples.dox                         |   2 -
 doc/guides/linux_gsg/build_dpdk.rst          |   2 +-
 doc/guides/linux_gsg/quick_start.rst         |  14 +-
 doc/guides/prog_guide/img/ivshmem.png        | Bin 44920 -> 0 bytes
 doc/guides/prog_guide/index.rst              |   1 -
 doc/guides/prog_guide/ivshmem_lib.rst        | 160 -----
 doc/guides/prog_guide/source_org.rst         |   1 -
 doc/guides/rel_notes/deprecation.rst         |   3 -
 doc/guides/rel_notes/release_16_11.rst       |   3 +
 examples/Makefile                            |   1 -
 examples/l2fwd-ivshmem/Makefile              |  43 --
 examples/l2fwd-ivshmem/guest/Makefile        |  50 --
 examples/l2fwd-ivshmem/guest/guest.c         | 452 -------------
 examples/l2fwd-ivshmem/host/Makefile         |  50 --
 examples/l2fwd-ivshmem/host/host.c           | 895 -------------------------
 examples/l2fwd-ivshmem/include/common.h      | 111 ----
 examples/packet_ordering/Makefile            |   2 +-
 lib/Makefile                                 |   1 -
 lib/librte_eal/common/eal_common_memzone.c   |  12 -
 lib/librte_eal/common/eal_private.h          |  22 -
 lib/librte_eal/common/include/rte_memory.h   |   3 -
 lib/librte_eal/common/include/rte_memzone.h  |   7 +-
 lib/librte_eal/common/malloc_heap.c          |   8 -
 lib/librte_eal/linuxapp/eal/Makefile         |   9 -
 lib/librte_eal/linuxapp/eal/eal.c            |  10 -
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c    | 954 ---------------------------
 lib/librte_eal/linuxapp/eal/eal_memory.c     |  30 +-
 lib/librte_ivshmem/Makefile                  |  54 --
 lib/librte_ivshmem/rte_ivshmem.c             | 919 --------------------------
 lib/librte_ivshmem/rte_ivshmem.h             | 165 -----
 lib/librte_ivshmem/rte_ivshmem_version.map   |  12 -
 mk/rte.app.mk                                |   1 -
 43 files changed, 13 insertions(+), 4537 deletions(-)
 delete mode 100644 app/test/test_ivshmem.c
 delete mode 100644 config/defconfig_x86_64-ivshmem-linuxapp-gcc
 delete mode 100644 config/defconfig_x86_64-ivshmem-linuxapp-icc
 delete mode 100644 doc/guides/prog_guide/img/ivshmem.png
 delete mode 100644 doc/guides/prog_guide/ivshmem_lib.rst
 delete mode 100644 examples/l2fwd-ivshmem/Makefile
 delete mode 100644 examples/l2fwd-ivshmem/guest/Makefile
 delete mode 100644 examples/l2fwd-ivshmem/guest/guest.c
 delete mode 100644 examples/l2fwd-ivshmem/host/Makefile
 delete mode 100644 examples/l2fwd-ivshmem/host/host.c
 delete mode 100644 examples/l2fwd-ivshmem/include/common.h
 delete mode 100644 lib/librte_eal/linuxapp/eal/eal_ivshmem.c
 delete mode 100644 lib/librte_ivshmem/Makefile
 delete mode 100644 lib/librte_ivshmem/rte_ivshmem.c
 delete mode 100644 lib/librte_ivshmem/rte_ivshmem.h
 delete mode 100644 lib/librte_ivshmem/rte_ivshmem_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 6536c6b..5e3d825 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -546,14 +546,6 @@ F: app/test/test_cmdline*
 F: examples/cmdline/
 F: doc/guides/sample_app_ug/cmd_line.rst
 
-Qemu IVSHMEM
-M: Anatoly Burakov <anatoly.burakov@intel.com>
-F: lib/librte_ivshmem/
-F: lib/librte_eal/linuxapp/eal/eal_ivshmem.c
-F: doc/guides/prog_guide/ivshmem_lib.rst
-F: app/test/test_ivshmem.c
-F: examples/l2fwd-ivshmem/
-
 Key/Value parsing
 M: Olivier Matz <olivier.matz@6wind.com>
 F: lib/librte_kvargs/
diff --git a/app/test/Makefile b/app/test/Makefile
index 49ea195..611d77a 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -167,7 +167,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_KNI) += test_kni.c
 SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power.c test_power_acpi_cpufreq.c
 SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power_kvm_vm.c
 SRCS-y += test_common.c
-SRCS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += test_ivshmem.c
 
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += test_distributor.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += test_distributor_perf.c
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index defd46e..9e8fd94 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -174,12 +174,6 @@ parallel_test_group_list = [
 			"Report" :  None,
 		},
 		{
-		 "Name" :	"IVSHMEM autotest",
-		 "Command" : 	"ivshmem_autotest",
-		 "Func" :	default_autotest,
-		 "Report" :	None,
-		},
-		{
 		 "Name" :	"Memcpy autotest",
 		 "Command" : 	"memcpy_autotest",
 		 "Func" :	default_autotest,
diff --git a/app/test/test.c b/app/test/test.c
index ccad0e3..cd0e784 100644
--- a/app/test/test.c
+++ b/app/test/test.c
@@ -95,9 +95,6 @@ do_recursive_call(void)
 			{ "test_memory_flags", no_action },
 			{ "test_file_prefix", no_action },
 			{ "test_no_huge_flag", no_action },
-#ifdef RTE_LIBRTE_IVSHMEM
-			{ "test_ivshmem", test_ivshmem },
-#endif
 	};
 
 	if (recursive_call == NULL)
diff --git a/app/test/test.h b/app/test/test.h
index 467b9c0..b250c84 100644
--- a/app/test/test.h
+++ b/app/test/test.h
@@ -235,7 +235,6 @@ int test_pci_run;
 
 int test_mp_secondary(void);
 
-int test_ivshmem(void);
 int test_set_rxtx_conf(cmdline_fixed_string_t mode);
 int test_set_rxtx_anchor(cmdline_fixed_string_t type);
 int test_set_rxtx_sc(cmdline_fixed_string_t type);
diff --git a/app/test/test_ivshmem.c b/app/test/test_ivshmem.c
deleted file mode 100644
index ae9fd6c..0000000
--- a/app/test/test_ivshmem.c
+++ /dev/null
@@ -1,433 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <fcntl.h>
-#include <limits.h>
-#include <unistd.h>
-#include <string.h>
-#include <sys/mman.h>
-#include <sys/wait.h>
-#include <stdio.h>
-
-#include <cmdline_parse.h>
-
-#include "test.h"
-
-#include <rte_common.h>
-#include <rte_ivshmem.h>
-#include <rte_string_fns.h>
-#include "process.h"
-
-#define DUPLICATE_METADATA "duplicate"
-#define METADATA_NAME "metadata"
-#define NONEXISTENT_METADATA "nonexistent"
-#define FIRST_TEST 'a'
-
-#define launch_proc(ARGV) process_dup(ARGV, \
-		sizeof(ARGV)/(sizeof(ARGV[0])), "test_ivshmem")
-
-#define ASSERT(cond,msg) do {						\
-		if (!(cond)) {								\
-			printf("**** TEST %s() failed: %s\n",	\
-				__func__, msg);						\
-			return -1;								\
-		}											\
-} while(0)
-
-static char*
-get_current_prefix(char * prefix, int size)
-{
-	char path[PATH_MAX] = {0};
-	char buf[PATH_MAX] = {0};
-
-	/* get file for config (fd is always 3) */
-	snprintf(path, sizeof(path), "/proc/self/fd/%d", 3);
-
-	/* return NULL on error */
-	if (readlink(path, buf, sizeof(buf)) == -1)
-		return NULL;
-
-	/* get the basename */
-	snprintf(buf, sizeof(buf), "%s", basename(buf));
-
-	/* copy string all the way from second char up to start of _config */
-	snprintf(prefix, size, "%.*s",
-			(int)(strnlen(buf, sizeof(buf)) - sizeof("_config")),
-			&buf[1]);
-
-	return prefix;
-}
-
-static struct rte_ivshmem_metadata*
-mmap_metadata(const char *name)
-{
-	int fd;
-	char pathname[PATH_MAX];
-	struct rte_ivshmem_metadata *metadata;
-
-	snprintf(pathname, sizeof(pathname),
-			"/var/run/.dpdk_ivshmem_metadata_%s", name);
-
-	fd = open(pathname, O_RDWR, 0660);
-	if (fd < 0)
-		return NULL;
-
-	metadata = (struct rte_ivshmem_metadata*) mmap(NULL,
-			sizeof(struct rte_ivshmem_metadata), PROT_READ | PROT_WRITE,
-			MAP_SHARED, fd, 0);
-
-	if (metadata == MAP_FAILED)
-		return NULL;
-
-	close(fd);
-
-	return metadata;
-}
-
-static int
-create_duplicate(void)
-{
-	/* create a metadata that another process will then try to overwrite */
-	ASSERT (rte_ivshmem_metadata_create(DUPLICATE_METADATA) == 0,
-			"Creating metadata failed");
-	return 0;
-}
-
-static int
-test_ivshmem_create_lots_of_memzones(void)
-{
-	int i;
-	char name[IVSHMEM_NAME_LEN];
-	const struct rte_memzone *mz;
-
-	ASSERT(rte_ivshmem_metadata_create(METADATA_NAME) == 0,
-			"Failed to create metadata");
-
-	for (i = 0; i < RTE_LIBRTE_IVSHMEM_MAX_ENTRIES; i++) {
-		snprintf(name, sizeof(name), "mz_%i", i);
-
-		mz = rte_memzone_reserve(name, RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY, 0);
-		ASSERT(mz != NULL, "Failed to reserve memzone");
-
-		ASSERT(rte_ivshmem_metadata_add_memzone(mz, METADATA_NAME) == 0,
-				"Failed to add memzone");
-	}
-	mz = rte_memzone_reserve("one too many", RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY, 0);
-	ASSERT(mz != NULL, "Failed to reserve memzone");
-
-	ASSERT(rte_ivshmem_metadata_add_memzone(mz, METADATA_NAME) < 0,
-		"Metadata should have been full");
-
-	return 0;
-}
-
-static int
-test_ivshmem_create_duplicate_memzone(void)
-{
-	const struct rte_memzone *mz;
-
-	ASSERT(rte_ivshmem_metadata_create(METADATA_NAME) == 0,
-			"Failed to create metadata");
-
-	mz = rte_memzone_reserve("mz", RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY, 0);
-	ASSERT(mz != NULL, "Failed to reserve memzone");
-
-	ASSERT(rte_ivshmem_metadata_add_memzone(mz, METADATA_NAME) == 0,
-			"Failed to add memzone");
-
-	ASSERT(rte_ivshmem_metadata_add_memzone(mz, METADATA_NAME) < 0,
-			"Added the same memzone twice");
-
-	return 0;
-}
-
-static int
-test_ivshmem_api_test(void)
-{
-	const struct rte_memzone * mz;
-	struct rte_mempool * mp;
-	struct rte_ring * r;
-	char buf[BUFSIZ];
-
-	memset(buf, 0, sizeof(buf));
-
-	r = rte_ring_create("ring", 1, SOCKET_ID_ANY, 0);
-	mp = rte_mempool_create("mempool", 1, 1, 1, 1, NULL, NULL, NULL, NULL,
-			SOCKET_ID_ANY, 0);
-	mz = rte_memzone_reserve("memzone", 64, SOCKET_ID_ANY, 0);
-
-	ASSERT(r != NULL, "Failed to create ring");
-	ASSERT(mp != NULL, "Failed to create mempool");
-	ASSERT(mz != NULL, "Failed to reserve memzone");
-
-	/* try to create NULL metadata */
-	ASSERT(rte_ivshmem_metadata_create(NULL) < 0,
-			"Created metadata with NULL name");
-
-	/* create valid metadata to do tests on */
-	ASSERT(rte_ivshmem_metadata_create(METADATA_NAME) == 0,
-			"Failed to create metadata");
-
-	/* test adding memzone */
-	ASSERT(rte_ivshmem_metadata_add_memzone(NULL, NULL) < 0,
-			"Added NULL memzone to NULL metadata");
-	ASSERT(rte_ivshmem_metadata_add_memzone(NULL, METADATA_NAME) < 0,
-			"Added NULL memzone");
-	ASSERT(rte_ivshmem_metadata_add_memzone(mz, NULL) < 0,
-			"Added memzone to NULL metadata");
-	ASSERT(rte_ivshmem_metadata_add_memzone(mz, NONEXISTENT_METADATA) < 0,
-			"Added memzone to nonexistent metadata");
-
-	/* test adding ring */
-	ASSERT(rte_ivshmem_metadata_add_ring(NULL, NULL) < 0,
-			"Added NULL ring to NULL metadata");
-	ASSERT(rte_ivshmem_metadata_add_ring(NULL, METADATA_NAME) < 0,
-			"Added NULL ring");
-	ASSERT(rte_ivshmem_metadata_add_ring(r, NULL) < 0,
-			"Added ring to NULL metadata");
-	ASSERT(rte_ivshmem_metadata_add_ring(r, NONEXISTENT_METADATA) < 0,
-			"Added ring to nonexistent metadata");
-
-	/* test adding mempool */
-	ASSERT(rte_ivshmem_metadata_add_mempool(NULL, NULL) < 0,
-			"Added NULL mempool to NULL metadata");
-	ASSERT(rte_ivshmem_metadata_add_mempool(NULL, METADATA_NAME) < 0,
-			"Added NULL mempool");
-	ASSERT(rte_ivshmem_metadata_add_mempool(mp, NULL) < 0,
-			"Added mempool to NULL metadata");
-	ASSERT(rte_ivshmem_metadata_add_mempool(mp, NONEXISTENT_METADATA) < 0,
-			"Added mempool to nonexistent metadata");
-
-	/* test creating command line */
-	ASSERT(rte_ivshmem_metadata_cmdline_generate(NULL, sizeof(buf), METADATA_NAME) < 0,
-			"Written command line into NULL buffer");
-	ASSERT(strnlen(buf, sizeof(buf)) == 0, "Buffer is not empty");
-
-	ASSERT(rte_ivshmem_metadata_cmdline_generate(buf, 0, METADATA_NAME) < 0,
-			"Written command line into small buffer");
-	ASSERT(strnlen(buf, sizeof(buf)) == 0, "Buffer is not empty");
-
-	ASSERT(rte_ivshmem_metadata_cmdline_generate(buf, sizeof(buf), NULL) < 0,
-			"Written command line for NULL metadata");
-	ASSERT(strnlen(buf, sizeof(buf)) == 0, "Buffer is not empty");
-
-	ASSERT(rte_ivshmem_metadata_cmdline_generate(buf, sizeof(buf),
-			NONEXISTENT_METADATA) < 0,
-			"Writen command line for nonexistent metadata");
-	ASSERT(strnlen(buf, sizeof(buf)) == 0, "Buffer is not empty");
-
-	/* add stuff to config */
-	ASSERT(rte_ivshmem_metadata_add_memzone(mz, METADATA_NAME) == 0,
-			"Failed to add memzone to valid config");
-	ASSERT(rte_ivshmem_metadata_add_ring(r, METADATA_NAME) == 0,
-			"Failed to add ring to valid config");
-	ASSERT(rte_ivshmem_metadata_add_mempool(mp, METADATA_NAME) == 0,
-			"Failed to add mempool to valid config");
-
-	/* create config */
-	ASSERT(rte_ivshmem_metadata_cmdline_generate(buf, sizeof(buf),
-			METADATA_NAME) == 0, "Failed to write command-line");
-
-	/* check if something was written */
-	ASSERT(strnlen(buf, sizeof(buf)) != 0, "Buffer is empty");
-
-	/* make sure we don't segfault */
-	rte_ivshmem_metadata_dump(stdout, NULL);
-
-	/* dump our metadata */
-	rte_ivshmem_metadata_dump(stdout, METADATA_NAME);
-
-	return 0;
-}
-
-static int
-test_ivshmem_create_duplicate_metadata(void)
-{
-	ASSERT(rte_ivshmem_metadata_create(DUPLICATE_METADATA) < 0,
-			"Creating duplicate metadata should have failed");
-
-	return 0;
-}
-
-static int
-test_ivshmem_create_metadata_config(void)
-{
-	struct rte_ivshmem_metadata *metadata;
-
-	rte_ivshmem_metadata_create(METADATA_NAME);
-
-	metadata = mmap_metadata(METADATA_NAME);
-
-	ASSERT(metadata != MAP_FAILED, "Metadata mmaping failed");
-
-	ASSERT(metadata->magic_number == IVSHMEM_MAGIC,
-			"Magic number is not that magic");
-
-	ASSERT(strncmp(metadata->name, METADATA_NAME, sizeof(metadata->name)) == 0,
-			"Name has not been set up");
-
-	ASSERT(metadata->entry[0].offset == 0, "Offest is not initialized");
-	ASSERT(metadata->entry[0].mz.addr == 0, "mz.addr is not initialized");
-	ASSERT(metadata->entry[0].mz.len == 0, "mz.len is not initialized");
-
-	return 0;
-}
-
-static int
-test_ivshmem_create_multiple_metadata_configs(void)
-{
-	int i;
-	char name[IVSHMEM_NAME_LEN];
-	struct rte_ivshmem_metadata *metadata;
-
-	for (i = 0; i < RTE_LIBRTE_IVSHMEM_MAX_METADATA_FILES / 2; i++) {
-		snprintf(name, sizeof(name), "test_%d", i);
-		rte_ivshmem_metadata_create(name);
-		metadata = mmap_metadata(name);
-
-		ASSERT(metadata->magic_number == IVSHMEM_MAGIC,
-				"Magic number is not that magic");
-
-		ASSERT(strncmp(metadata->name, name, sizeof(metadata->name)) == 0,
-				"Name has not been set up");
-	}
-
-	return 0;
-}
-
-static int
-test_ivshmem_create_too_many_metadata_configs(void)
-{
-	int i;
-	char name[IVSHMEM_NAME_LEN];
-
-	for (i = 0; i < RTE_LIBRTE_IVSHMEM_MAX_METADATA_FILES; i++) {
-		snprintf(name, sizeof(name), "test_%d", i);
-		ASSERT(rte_ivshmem_metadata_create(name) == 0,
-				"Create config file failed");
-	}
-
-	ASSERT(rte_ivshmem_metadata_create(name) < 0,
-			"Create config file didn't fail");
-
-	return 0;
-}
-
-enum rte_ivshmem_tests {
-	_test_ivshmem_api_test = 0,
-	_test_ivshmem_create_metadata_config,
-	_test_ivshmem_create_multiple_metadata_configs,
-	_test_ivshmem_create_too_many_metadata_configs,
-	_test_ivshmem_create_duplicate_metadata,
-	_test_ivshmem_create_lots_of_memzones,
-	_test_ivshmem_create_duplicate_memzone,
-	_last_test,
-};
-
-#define RTE_IVSHMEM_TEST_ID "RTE_IVSHMEM_TEST_ID"
-
-static int
-launch_all_tests_on_secondary_processes(void)
-{
-	int ret = 0;
-	char id;
-	char testid;
-	char tmp[PATH_MAX] = {0};
-	char prefix[PATH_MAX] = {0};
-
-	get_current_prefix(tmp, sizeof(tmp));
-
-	snprintf(prefix, sizeof(prefix), "--file-prefix=%s", tmp);
-
-	const char *argv[] = { prgname, "-c", "1", "-n", "3",
-			"--proc-type=secondary", prefix };
-
-	for (id = 0; id < _last_test; id++) {
-		testid = (char)(FIRST_TEST + id);
-		setenv(RTE_IVSHMEM_TEST_ID, &testid, 1);
-		if (launch_proc(argv) != 0)
-			return -1;
-	}
-	return ret;
-}
-
-int
-test_ivshmem(void)
-{
-	int testid;
-
-	/* We want to have a clean execution for every test without exposing
-	 * private global data structures in rte_ivshmem so we launch each test
-	 * on a different secondary process. */
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-
-		/* first, create metadata */
-		ASSERT(create_duplicate() == 0, "Creating metadata failed");
-
-		return launch_all_tests_on_secondary_processes();
-	}
-
-	testid = *(getenv(RTE_IVSHMEM_TEST_ID)) - FIRST_TEST;
-
-	printf("Secondary process running test %d \n", testid);
-
-	switch (testid) {
-	case _test_ivshmem_api_test:
-		return test_ivshmem_api_test();
-
-	case _test_ivshmem_create_metadata_config:
-		return test_ivshmem_create_metadata_config();
-
-	case _test_ivshmem_create_multiple_metadata_configs:
-		return test_ivshmem_create_multiple_metadata_configs();
-
-	case _test_ivshmem_create_too_many_metadata_configs:
-		return test_ivshmem_create_too_many_metadata_configs();
-
-	case _test_ivshmem_create_duplicate_metadata:
-		return test_ivshmem_create_duplicate_metadata();
-
-	case _test_ivshmem_create_lots_of_memzones:
-		return test_ivshmem_create_lots_of_memzones();
-
-	case _test_ivshmem_create_duplicate_memzone:
-		return test_ivshmem_create_duplicate_memzone();
-
-	default:
-		break;
-	}
-
-	return -1;
-}
-
-REGISTER_TEST_COMMAND(ivshmem_autotest, test_ivshmem);
diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 1a17126..73f4733 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -44,7 +44,6 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
 
 CONFIG_RTE_EAL_IGB_UIO=n
 
-CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 
diff --git a/config/defconfig_x86_64-ivshmem-linuxapp-gcc b/config/defconfig_x86_64-ivshmem-linuxapp-gcc
deleted file mode 100644
index 41ac5c3..0000000
--- a/config/defconfig_x86_64-ivshmem-linuxapp-gcc
+++ /dev/null
@@ -1,49 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-#
-
-#
-# use default config
-#
-
-#include "defconfig_x86_64-native-linuxapp-gcc"
-
-#
-# Compile IVSHMEM library
-#
-CONFIG_RTE_LIBRTE_IVSHMEM=y
-CONFIG_RTE_LIBRTE_IVSHMEM_DEBUG=n
-CONFIG_RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS=4
-CONFIG_RTE_LIBRTE_IVSHMEM_MAX_ENTRIES=128
-CONFIG_RTE_LIBRTE_IVSHMEM_MAX_METADATA_FILES=32
-
-# Set EAL to single file segments
-CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS=y
\ No newline at end of file
diff --git a/config/defconfig_x86_64-ivshmem-linuxapp-icc b/config/defconfig_x86_64-ivshmem-linuxapp-icc
deleted file mode 100644
index 77fec93..0000000
--- a/config/defconfig_x86_64-ivshmem-linuxapp-icc
+++ /dev/null
@@ -1,49 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-#
-
-#
-# use default config
-#
-
-#include "defconfig_x86_64-native-linuxapp-icc"
-
-#
-# Compile IVSHMEM library
-#
-CONFIG_RTE_LIBRTE_IVSHMEM=y
-CONFIG_RTE_LIBRTE_IVSHMEM_DEBUG=n
-CONFIG_RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS=4
-CONFIG_RTE_LIBRTE_IVSHMEM_MAX_ENTRIES=128
-CONFIG_RTE_LIBRTE_IVSHMEM_MAX_METADATA_FILES=32
-
-# Set EAL to single file segments
-CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 2284a53..6675f96 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -108,7 +108,6 @@ There are many libraries, so their headers may be grouped by topics:
   [reorder]            (@ref rte_reorder.h),
   [tailq]              (@ref rte_tailq.h),
   [bitmap]             (@ref rte_bitmap.h),
-  [ivshmem]            (@ref rte_ivshmem.h)
 
 - **packet framework**:
   * [port]             (@ref rte_port.h):
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index af5d6dd..9dc7ae5 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -43,7 +43,6 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_ether \
                           lib/librte_hash \
                           lib/librte_ip_frag \
-                          lib/librte_ivshmem \
                           lib/librte_jobstats \
                           lib/librte_kni \
                           lib/librte_kvargs \
diff --git a/doc/api/examples.dox b/doc/api/examples.dox
index 200af0b..1626852 100644
--- a/doc/api/examples.dox
+++ b/doc/api/examples.dox
@@ -40,8 +40,6 @@
 @example ipv4_multicast/main.c
 @example kni/main.c
 @example l2fwd-crypto/main.c
-@example l2fwd-ivshmem/guest/guest.c
-@example l2fwd-ivshmem/host/host.c
 @example l2fwd-jobstats/main.c
 @example l2fwd-keepalive/main.c
 @example l2fwd/main.c
diff --git a/doc/guides/linux_gsg/build_dpdk.rst b/doc/guides/linux_gsg/build_dpdk.rst
index f8007b3..474598a 100644
--- a/doc/guides/linux_gsg/build_dpdk.rst
+++ b/doc/guides/linux_gsg/build_dpdk.rst
@@ -75,7 +75,7 @@ where:
 
 * ``ARCH`` can be:  ``i686``, ``x86_64``, ``ppc_64``
 
-* ``MACHINE`` can be:  ``native``, ``ivshmem``, ``power8``
+* ``MACHINE`` can be:  ``native``, ``power8``
 
 * ``EXECENV`` can be:  ``linuxapp``,  ``bsdapp``
 
diff --git a/doc/guides/linux_gsg/quick_start.rst b/doc/guides/linux_gsg/quick_start.rst
index 8789b58..6e858c2 100644
--- a/doc/guides/linux_gsg/quick_start.rst
+++ b/doc/guides/linux_gsg/quick_start.rst
@@ -126,19 +126,15 @@ Some options in the script prompt the user for further data before proceeding.
 
     [3] ppc_64-power8-linuxapp-gcc
 
-    [4] x86_64-ivshmem-linuxapp-gcc
+    [4] x86_64-native-bsdapp-clang
 
-    [5] x86_64-ivshmem-linuxapp-icc
+    [5] x86_64-native-bsdapp-gcc
 
-    [6] x86_64-native-bsdapp-clang
+    [6] x86_64-native-linuxapp-clang
 
-    [7] x86_64-native-bsdapp-gcc
+    [7] x86_64-native-linuxapp-gcc
 
-    [8] x86_64-native-linuxapp-clang
-
-    [9] x86_64-native-linuxapp-gcc
-
-    [10] x86_64-native-linuxapp-icc
+    [8] x86_64-native-linuxapp-icc
 
     ------------------------------------------------------------------------
 
diff --git a/doc/guides/prog_guide/img/ivshmem.png b/doc/guides/prog_guide/img/ivshmem.png
deleted file mode 100644
index 2b34a2cfdf3d65025c1f942f5cc1f2a1ef05892c..0000000000000000000000000000000000000000
GIT binary patch
[base85-encoded binary diff for the deleted 44920-byte PNG elided]
z#|IYS&U*?LH6xfHX;yos7Wggx*UfMp*d$A7;@&%j1$(~|Q>~Pxu)PuK!dKBwBUjPH
zI16vRxZ7qyD8J4)*_{x>uTQYL-2bBzU3@#D%%tDj%m`8P<Q5QnIFXV8k7#ro^`KFD
zi*LWBmEahDg&yDQLm7Bu+CVXw*j~+IUV!-Q+^!wK(40h<(f_r1{3eZ>MJrUxH?GcF
zN8f5_#w9pLSNSjO{p8@V<OjjWhVG*rzcDQ^onT1;?W8)zuR;&J6{?_T&TXbwXD!5_
zY3W4m#CS%}c7|8hYe`mKrGGHksO7=u2@hoihOuis_I2#0Bt|Ru`7|vZJYzo;`@y48
z@K5UCmo$DGR}picii!)!I?uJK`GbU4Ho9Ta&-=XPz_ZV@?*>&|4eDt5rT2!}I<**B
zgh1&-kv>Cv6`P5a#bZ_=#|UM1Ka}Wr6WKR-^ncwh`jh76LzJ%u@?{!-2evLUW_OV5
z-ZrOy(hAZ_Cha#tv~skCy_zeB-p=V*7^qF9QZsNHQVepP_7fHFEB5bl8)gW2pIHo$
z#bSM^nD8)>muEd=H{G>2o84K@*ukb==2I>cU4%X<Jx*RA8hkNSC}JR%)kj79f|60k
zEEQS=hH%lirQ14A>+Bdvx$E+n?%M8AthLM}$}IujUq9EU-ZZoz>C%LIHMad|e(D9>
zi@D3=mCs4I^_m|~7fd2wL?6X*jeT6;$U5XVIt2s|v=;v3pZ~U&K497#kstK*SkCv=
z#F+>!qdronKgE_HJpIYejNDW9<77OUHGB}4P7u_RPHggvj!U8h6C{X7p)xn6TK@(m
z=*cnpcNDJGA|nxYR*9R^pjv+{+N!Q-JyE|j<`tIK^X^BZM{j4~gk_!a86}?;cZUO<
zxNF}FZsGYGs|)v=y(^1j1Wbsbz_?G~>1;B7g#-P0-T1CP1e6T<JG!_3DeWhU?kq{6
z;D6l9*hMr#m_#8I1e23iB`>Fv>$!SR1Av_akV}`r+pl;G{s3f9FkPq6AMO%$I)h52
zfBVCU!?e%<uH&_2g}U6#*M#(x+cP$LHcWidIBEBi{mdrv>f>8^m9JB5brW)OQyv#r
zM9_U-YnY^rTmf0*CZT1E=+s$0jlj`r%p<{Yitc6x19pLjJlEEPeV`tuS1?2C5LS>6
zldThHTUdKFZ_NE(%r}712QbTGoNnEwmfJTW1r(I5vQ<g3Sk&K7y#Kc1nq?%?HLH{@
z0hz)Vcw=6^*LtBiGCI3$S4x;hdXBGW9X1>XHy1Mc=?@0xs>lj2n<~fl)C5=N`YTgQ
zkwV2uT{^-ezz#%}e;n@t)U7IzYw5GhRnLxF6RtY+WT&79BfN&dE|EqU>Eh*2$u+ta
zDFkvGvb9fxD#=Zp4^Mp4h@xTNeAk%${nyB-8J=JwMSU+UTLNi{55LO@(q360b=xcc
z%51MpPT00s#q19m>535=ZCp&Yf4JBC`U5pZKS3r`d`efDr))iE!a_IN1i}373(hy)
zYNIDCw)}!`{IxdT=)39fC)L{p2v@|O2tb0?A2~xIB!0WEF&Gsq<pe@S@_CDopYVbh
z{)P(mVH7Ep34VALi>JHR&`!0U&(H|csITQ?Mm6_r3wz`l>(0iKk(;_a&8}wZ@voYp
zm66xX0BE|%1??#o{k%6@^NSmEU4(PeFb4p^j@M0DphZ4=5-`;WU!9eTd_02D4vuD8
z;^n98%~o<KOzy*I&CFGb5I3^wal$__A?alUk)JUUXFSn1ck6neQtR;8eh|mc=8vkc
zX$OmUKC9&Yxi*@<KLi#TpFC{g`}2$e>4W*w?vnM_VP-hFFxg+M7XdF%9EKycPzlI1
z%{zO3?;oZXxZ|yaAPH|+j)lT*Q7Z%m@*`uk1H479m)#>o2qNlvTy2P~jjU_fo!LVp
zGcKXjbpPaWHqDOQBBJNrW8Ff`Lhj8fDf~<qOq$qGDVDIYdsXL^gv{X>(8<|VqR2Q`
z3PyFK&cMHuZdWI6#&Ny%)+M_f#Jh-vJ(5<$4|xpLp9$SKB#_57q^_&0tK4<}k9A>0
zZs485(KzdFA`cwbjn@ydyB)}1(3>N~BO+or8aPhvW`H=~cqptSdkG^kw)qo#&cPlp
zu*<&k)A1V(E(B+tj$0}MLtXRnKfOBXqfmrN9<N`sK;N+?HD{G|i+c`juLp5OlXjJ}
z*MjrvM*jfO*}>(aJ-?>%<boTf*HFOON;AHX`)&agXOR$}!KBQfq^Lrtn~?+e*3Hol
zkm0GiU3*Es*iI%k5CD4LJ(@wQh#*S?glg&*k+Be+>)_q4XRcPaYupiSyCakixo8Ns
z^JO2bti~Bp=c}G`eOLMS)oh33;na%4&Zw#9fiB>96}mPeWH@j${EWLr_WIAxmvldu
z=kT2CXUAl|jnkKC!h(0|+{pP?9GLi%t=%Nv_f@*JPD{QlHy&R~YcNJ2e=%{XXHIWL
zG{cB7M;>-QTUc6|N^Ecz-H5iWV}k8}6B-BpCEUQKfWt?QJ`s14<D<s)WT&mfESz^B
ze#U=xW#_{Zqi2=e91-Mih41sjN=WVp-FF1eI!=~J-j83Lc=~PjSZ`V^k<sFL#|vN<
z)wiFno|8hg{We9+WNlm9ITpVl)<ip=9{n2r>;7BYekyuKJR<E#6?$*Y!otYw_7g=`
z=Y%iICOWoV1dZ&3SXmG5y8wgT;I#wlDq7!mIruN>5Bojcj+=MOD;2ZE(n%7%F^*#m
zT+5G+fTCL$aY=3~P!x!0Z##vPMc^d5cB2px1NoDd<%8dnV<QPVIy&fOWPO`ju4n}L
zNMhvno@gN+C^Xu@M2a<dvRs>4+p0R~>F@Q7^WD|aXA5ImISBXG3o@@258b4*$G`Vv
z92^{QK$68BOtqYAU}JuJwjII!PKs1N$=v^JkP!}O7R3X)L?r_q=?x*fuIHnCL?6=N
z<zs$8*-7rYxASPJkuPi+liEazbZ^o_@+@MR1=?LCmp}<@=u05|=PINonKSe<Ofr{;
zeazrC@?H<8L73=d0Cg2o_ni^--`iFsKv9N^l-I#VK9moT0MeuV;p+kGaPdFt5{G>y
zzt12t=EfT4h%gFjS5MhqT|FSyU2zG;W!qL8xYMt5bkx(;3|wlrRx|%Leff<lko{oL
zRnJI>WyFT&+G?!L2fnFc?K2T~cX>$WaWKt6LraTk>wQ34w>oy-BmwC@Pm$FsG@ule
zqI|Up6fAWuuWor97pD``Qil?0EnF{_n`_$p;qdbNS3$(jc|zm#dF)AO|B&0#(KC=>
zi`~>vV@Iu%9avA7YWEUPT>G0FUx;p=Cx~&C&&9ES#FX{6Xqv3D{JdE$_Qa{f`q}n}
z)6Akcaa`iSM_6F?9b8URjMMB2K}>Yn`Vo3RlXytu;eFE)In6=YvMbTcxI8ZI;SAF`
z(wB|=Nued=vk}cWeQ6@%(<vn>sg)t@@?_^cRiY-gf5jke(sp-H@ozyKoF#)T7guwF
z9&c$7fM!*_o%4tr?N-@P^NB(D2nMw!3=rUw2?PpN<HRGuv~(I<#5BBdH#goMhR7p5
zR=Vms^5_Y5#|2SYm~4Dpv->V74G07GqYUI<am_=8wc}=k264l76n{O4&X$a#pe9<;
z_<V*GQLkY;@BDHMTl^hkg*)ZP)+}e5a?vq$aS!W1kr-oTXE!<<9i<&N8HfC9#-NHF
z%$khs`wih&28h2W@g`~K|8)g$e0ZLMK7YSH@b;@a*>S8hX%+1!T?W#%2h^X*Yo7jG
zXXLl2TV(UByGxM!4cCmp@Ox7MLeO}E^QVrCTUqKGA<EFxNMSLMn4fW$8Wg6;j~C|Q
z*>U8J0kZ07MjqXwZpaf*hAPJBSrnd@dT8NQcl_mjC{gZ4IrAJQ6DCH<&$ppL*w87O
zaar#qH~vguYVmIl<k`+z8tf>LF_l)!!T9sdN%G5dF3Cs17(1nvI+Ug3R&11Zzn8nY
z*v226=ZZ0Nf{q2W#maGJp_f*M#009BqYgY*KJqY)FFV3$Ik5qhaT!|;X07X&+afeK
z5muD^kJ}9eG#k#yD|j3q_VV?!Sri9EZ`x`ssnL3=cFDWNpSH1tWV&Y4hS;#(;MJzW
zr}ts6BJ=DKte<}Vp^7uQz(AcrMy>)qKI4&PEDBXA{XhoUk$~6n0f(mY*|9iV6jAT~
z`<&<WeX;3!7l9jKA?9T7&trF^C712%F;T2d*4BO@<=^bXdu-$XpMLld&q|BlOfJl4
z-qYI&DAW+m`aIm6Dkec?{5Bi5Ssb@#R=rmQs@e;$WbI=+T`7BW5TU!>>))=Nn}R4K
z&lCBVUgVnpR4->{y}G*E17TrqH`m9l{b>>zPeH8R_&!F=L1A_LC9lW7`S&>93Q^Z+
zIywKqDN{_4<o#7jmxo6Q6qYYX`$`d%T!C#@`V6msD6%~K35syNoQbFnXw^26n6Bag
zmzqYIF~9J!kLhEVR@FlZZ~D`5ANM=(e)P_Y_*ZbMM&7Tg*X*6^37w<09T_(~r`VhO
zrF`*+)#OeO7v}3Nr6U3G6BTTdsg2GCMP&<;1$ggLE+1S*jn^$sxMI5R^W;?gx1R~T
zSR|)9(a}DqvGMuejyZHJsv8wC%rDDl;6Ua(!{ZWC9(f=MJCOAHs6}(4VtxP7f!)z}
z7OE?0)T|R7EK(W=!-wWaKo-_;YKV+qIkhF76X=J(K&5cM2ya8zejSUcKKrTel~F-*
z_Id2d_wmM(AJ`Rz{u0o&w9oU;*`EfCy@Q^fTy*4#wzjt4PnDB#VJEDJ;<i^@cKw$W
zP6Fhr#oyVwmri={F=czO!|yfL>7(sW3}BHT(UNpNyqAu=E)YY6UOK@^9x1;6>haOB
ziRWK$6D8Hg&qS8LVBr$)hQOF}+QVpRi9dTSId)kB+Bq-qR`w==POeok!SDs%=RrDI
zY9YcVq}hDmP#i<GOi&ye=}O33yGvO}Qm9F;c;6PnbmV6=f2Zs?%{u*vQIEJ8ByJ6d
zeRtqB*7MIvKi;u5P{v0<jOL6t($-c2a!r+|geI5^BDLQ*3_^PNYQ@U5j!BP!{<Ze=
z)mg{7$Y8>8BS{$qf6K-?eO$k}^YnhnOKdR*!Z)S*IlhYNG^5H?+h#M8ca}AO>f#ze
zeuI0(jW9KD>>^eH^s$H^j@~<6OHFxJ)p&JjpJ-`gFrWJg-D$EY>HjCqhZ_Il@N-dD
zwcKOZ(!FSZlht*ewX~_*`1k$R6aNdL0is`)S=kZy0Gq@T4Vqrf2DHAzrRS<XUEzII
zpA?9O#6X&l`SmWPuz^X+9HeX6vN-peDbM&a#QKV{^>+Qlzs-HmU3l5z)mTGB@TBWF
zE~c|wP=x4A%O)1uq$Poc=`uujow+mMXTOw8WbhGBWN`bhyILA)KDNLln(kVaKk$W_
zZ4?vJ&u}N@c>$8N8Ds+4f#DZ!YIV%(P#}YWJoAYXyA3<n%RKlV;AbBmTwIae-e=pw
zX_8)SzdN>LyacxNOkvDf13y~jxqeF5J2smLrpH~;{<BE4pmg^3O#5BxH!_Uglarov
zSb_PDDX3*PqrFeymBSy#PhTA%aW#>_xMIvxmv7bMy`l=3l?ht70uo5AJY7f6U9|h&
zZ*dW|MSPY_G1PvB=+Y=6yq;WGP|A{)n}T)-WL7=chl&bXspc1K?!9Gh*RgyyRhQuC
zOXQ0}m^|^blBtb02IAi4`IcGOxtta2pk#0;30I~*O&M{v)gTE5w0Q3JR@I9SSN4uh
zCLNU$G|KW=D?B=Av_fo)xR>!6>4e}9?37~=C^fNC-zTd0H_!gsVvE}B)Yh+GObXPe
zCGN?`Amf+@<q%ok$ke=j#R>nF`jQF8J;`azQpW(Z%;I)$!2^|T#CZhKS!Jpc`Lvi3
zphYBTMeLbb<;*<g!DHm4ISu|R5Ln9@x(@^NUM?nGmyt!1V7?hDGpMBX2T${?8mM~|
zD&w$eS3$h4kq=5rhUXq#H)W%M`;JY4u--&!gX#UAO^aXvn4E1G+0Sj54dl>>=<Kq+
zbn1QvL9<2fHCV#PRvv4FijAEcIS2E-Tvq53nPDgk?PtubaaqX1;lXwVAZ}#EhFgF?
zb!lnEhR&pnQq-f=2c}p;j?qXr8WK=U)lGrhn<Y$ND@jUlJ}N(yMCnhvP-P_|iQW6v
z@jVN9x@}xoRb9X@<vtOh!G4e!s<7*4;qLttq}*g<20<!@dUa;TPVlU+@Pr^$N@+on
zfYJ*`4kQN92?~7R=&FVDEt@#)nz1j_GO%6kZSm6{7#8`;lp!Ho^y{}$f0%#y{nV~K
zNp8<AMbaZZJR;H_NT1MFSFW267Cf$6jz{K05uP0^F;!wtwrlC5ENq11(w*T!8oZ^O
zHceEj@}A^5-Q5l3fu_AV;u&kfW>`N1b~3D|SuC<BlZx3QV}u#88MNQ;R(_(6*#i2G
zeXH$$GhOajH(r@=Qp9uUTm)vgsGzl|pio;M1XeFG1ziH_3&$P}904tgukxZ&c3f+p
zEykdha9Vf|YTp0uyM+w8rtd-+4r5z?Yr4y(rkc(p<A_7y{r+I*qi+yI=Bgnb#OYgl
zRl;$4p+B(PY+B%vFl8|lil@w``gITa(W&;@<0$-O3Wa#{aejIlYa%jH|E3{Mlav)e
znDjfmKKVr6b0oQ284b8oX>Y-QyGT-6g`4Q$L+Z_CUya#H(9G~ZCq8L4c^-iF1cAW{
zVEcB=f@FZ$6QFLy?9`$$$q^U6JvK?i8kUyEXG9bZpBJ?5s%cE!CD>Yzv&%lUIa-Rn
z0X7yV<QU4k855C4ZWQKq6%c{QT#0w5$r5d067DePWG!2$emz<AX42It<{;3N*zn1-
zh}|Qh(r+WoLFybD>YAatUHr{SqdwA=oy|P239@gVUJw=P)*91d;!8h9HO$Rxz@L}J
zz=oFSp6q~Sweyqw!XCSDmLJRCU)e1=O&Pbq+*?5*2C`0%&yZfs>}Qz0P7kQ^#43Ie
zX(ocAAmiH`y4ja;T*I|#gtK$w7HT^cUtYr37KzHyD;+1mhNkqZ;JdwflGD{kHumBL
z3jPvb)A8f0xD{^$<vAt0Yy7Dhqz8TJ9wlMDKo1+a8eh=sjuQz;KI5RLF0j3A>|}vP
zazQMs23M>rwQVhc+;>|9CFG$GyM}4G05A*n4CT@{8E<2Y*%9QN=i#y+Ot0ZaE?>0Y
z*8rMT*Db%U;U{=Ya=Ubb@Pnj1;xxoygQG>%;Lb)G78mYy>FnSg+-MFgTz{`psj|Xi
zRq34Mf+N(n)vtrW_Td0nH8SRyD&0%TDG$FpCmyfLH*Bs<K>Y}QoE}B`ICH8Mi?(*?
zc-3>DP`5$^bR?CpH|UrGcO{d}%Sjmq&FjQGW!Z1(Q%E=iFsG^~er$8c0DuZj+t2cz
z%I%UG(-Z_n&#IG3*I>X{fd~=}DQ1oy!(#iO3f79mp!39|7~NFlRPgLbvlV36E;4x-
z8+?uD?gl;=O0?YSLM(mG_<GU%pHlJu9Wg0d#$1@Rz@GuLw)Q7Y0E+<JV1Yqv-g(CT
zz64d3isVJ@`_i$ghi6ZpR6rQR_x8BxI1`JKF>m%y@EiNk2O&N%8;GC~Q+i44?QWDD
zrH}OOYPt`U>o1oQ6IW~<T3Ux-pw(mfH|GvL$MLu>=Yf~9#<|CNOmC8dT*Vlw{YoiF
zdlc#vEOvA3uR0djJPOqkysuXZL|dvmQNOZ=4>&}P{CtCgT=<LRDk6UQ@K?Je1=CZ0
zvcKzI2X{?iap+*F&=?1o^b1j@NC?hv6K@|Skryv&-!0Yr=Zw9&f5UPhq&(M-f_5Tx
z_Ck2(kvfaQNFFRO&})q@W?}>T(lm)HC+816!6((KD3SaG$bx;{sqX)$SfDSy*EzQ0
z6Tn^0(6;qlsH$z_U;{f}MZ=S1h6+kfg?D%T+Nn23AN73-`4PM)fJ1>H<@Eu2ZEQ%{
zcjJId?U~roL+KtLqmi+bsMi;!cs*)jC>ol81RuA^b_+ARJGwS0)X2iZ^v}WZ3ZMu5
zjBmvP(or)^vU_uPD4sycE*&1kB+YMeVsddfr?$d}npBUKAokd#nByD5aF#1x;AeP{
zoSqovexDQjBMak|_bqkT994#ZUb>)1xgvTXT)>)!nO3uZ&h>Rhygy7_rFB5>!CgQg
zL0cEp;jqA}ML@%NZaQ8Sj|?Gyb@!@E?s4HXAS^AHh}UT`Y^ztn`y?L?8noba4=6J0
zL^JyY0n-4)Vp76LKoB^(|2EnecY;W<KKC>hd8EVk4wb0Bmxe3F8S_y0#hvmkXMWH)
zgLp`1KB$mwoCJ_CvuMWoM%|fTXTFPPEDBaRVKZmQh|Wa(mt$VdB2o7b4ua6!b;w9W
zIZLom(|ujPtFb~j(44fK^|98w&hXR?=$n5Y6&emwt(gj4<H%ya-i~j%JS-n+6ZR6N
ztVBB;uzBJ6%T`MTD4Wu)_4Wa^{;8cpIsdMv#1mv>m9T`z7=#{@o-YQqZ8%b}ny1z_
zLhJ@A)M!1@Q-s<MG&=u=096fUsNU)t599tB>AXohlHf==VJt04b_TdgDZapaOXX0G
zHa5Hls-eu13eHwE4!kWo(EhX3Sl#e*qq~M4i=*&tYXg+%B7i?1PGnSC6)ig>l~1eM
zOP{B20wCeZ{b~(mz;uL}BXk5;HJXkbI9Uwb0o7G6psWGbS;+pfpCD3<&W-ksM7hVA
zQeOZkVuoC5R1@e_;LTbXCCs}W@)1~&FdJ_dcZYD9bJ)I&ylH-Ov0qDhvC>>enI#g(
zz@m`mG5NnwO0p037zh0=dR<h|BODWIgn@Nr?kuRCT~DdSm9^JQ<8O(0`Ep8NCa0nb
zSEBO+l$EfW>t&+s%$KW$=GR&uM;;}Mx+XtdBsIU_9_Ek8puk1G`q<uqRpT=upccNZ
zh!9qhyDt*&aYeC_J=dT3F$?_Y(3s4cRPVGdIU^EOKx+fQC7^Gr_$P5M7?V>MT-O!Z
zW3l`y_I1I8^XAHRhV8x$nKgPaI{>ZvRvan`m6q|E)w9+sB78eb4FW0B+n8X}258WK
z!RFUyYV@%DrH#iC$x_!_D7^Wo`ik<~x&ZlEsd_`+U{y$W<o5ONAl*Ptt3X$%?SSh@
z=i^-k4R&TJ;>BEp^KT;$v&0hv6jc7H>0;6yeh1wzBE4EY$d7K8Yz^>1KqdqwG{ywL
zWW`7uY_nQ^wJFCJZ2~{$zDAOp1PBa$=|)t&=ykL4<mxP7miV=uD5#s<`G1dv<l~Rb
zkKmyCrAo_$7Qg43n4tSt1fKa=iT}ks$M5o>=wJkwqDm6Ljk%5q>^5=lW#;XVZxIWw
zR@q5h!*y`HGX(I0LkFvyHe$^028PIx>RCcwRh;?b{F@jRKforX7=OL0Fl5#}EJXIO
zTr5RwX6sq~aZ{&jVd}TQi@)enQYK7HOw3D#R-{IY4B|$Ha`<mrP-<jWXYqwey8|#Z
zicvGo$@hBWSpu5vEc4x)ihwGtY!anVhC%)>9#yP6sMhnfCPh3H1-P~}A~N|r)CK4p
z9$3u09-R~PI4DhgE4XWjGMlZZNBn9uEwV!VtVGec(JA{)>}2T!&>k2w8K>Ue%9C1x
z(kwzS&8>UOLSC*dly&DA=;pjd94DI);~aq!A4sSaWDsc?vfGgit!}SWSXJev3IE~5
zpqI@B5*lXSgJMq)p}dZZrPC@Uc5bdW!~p?xD!N!hyToP?GZVZ|y6|O=T?%c6`kAIP
zUz=2{Oym$QhKh*+u^eL}oxyWFvQZJML@W63rw!%Zj(OT)BLl%5LzFu*K*YH$>fW0p
zWE-<ws9d?+Qa54?c)TUFfV>+?eDNR0*HJ=jdguRy=!u(@w<Yj@u!v7(J#QG1-5AP|
zzZ=<$-HPqgzFgzkQ3|F|V_e#Ysn<?;6_a<lR(Jlr`gX?L;J7qvc@Ww`2Vu1Q3aO(J
z3>kcDlhew1kLK4X@W!$`1cjA{S^>;$4v3!nrFo{?WDx^Hf1afj5+**g4*D4IpK*tQ
zNSr0bl$<eP^fL5zyEM-xvwoA2fdLNkz~oPj4^-Y^m@aJtrW(HXkpT<+5GczH1?{{^
zw8)5>qNG=!rJ&^|TTrO69;YPq+6B49fnLT6Jd8oteiU3R)ut^%+4$o(fh|rFA}WZ5
zwpqgfQFWP_0F_gWEe%7G&Y})CeI|uVevd&ptms_v*hb}aXG3d)E{5HH(cRhbfEdAO
zG9)3UINZiZIGDP@5H~;s`#>Ib8@iI_b1ctJ&Um0W50f<Dc^E@oLB%*V^-ieZ60%Rl
zQ1P@3!zAM^J$V^n)l@DAf%q0Dm|4F<J5J*(wa8FT`H*Lj(Vrd@W3hQKedNEOZpqm!
zX^Y2Vqs8)r<TGlXdFF!uK9iSa8(}1?x}nze%klH=J)FB^19YRHogMdWcHWS!anmn?
zwbt8V46RSEGu_CY6;qNVTC38jzHDSkm81)1OQoxd{RP{r+=kdf_Hql&SD_|tRK0!%
zD`bl2&`T4xhU3yw{|Sq0s_qdY@O17!0r_q(6nW3MGT4VfU#(OUl@M#R*Q4^DQB7~?
zNUr?r8JD3Hgn*!DX`aIO-T!|!GW(H3u-FjAQ~u*9g)?jP*$ez6H~&)R-4c^O>@yVW
zb~2U;ph*ke+dLsARa|nBF79nQ65VZ2T7XG<alEob-yZBBio`7%a-SiHeC3y~(1gr#
zorxk0nG4<Fp7Lo{+y`d!J5u#I(qnUL7KyVKdJH<5SFTm%TGZwUAkXhEe1<G_Dax$H
z$9I_qMhh^wLOYbN;tp{Mp>H#R=Psru(N_SN43EVip@}aTm3>N1n-WH+ipotP5c5UP
z$UuPYWi<+-tfi$l43u&mu^H)cP%`vV#gh@eoM<Ov^CLYL`sTM~LX*g6RCbw@PxRE*
zdciwf0beovOz~dn<^$4-L)L>txabJ&ydjGMO^}mFLBScI*`g34N`2K-J2#~*3c3s5
z!`>jrpfk|zJ$`SJmBDO&yjNkqhp8#nEO?bL7mEI3@4%&KafS(4O^uN59_|@_<ruw4
z0lE>wJiWdqQLFu;%yO2byR6)hO-oy&Ry=ap0cb7+ADRC%z)Nta%1!(l{uPg#goI+e
z%XExk8#EDSwfnl)aF45!#e{3)?U#d?Aj28c9^aam(dFM_hM<Go`%+t{mlL%^&%YxT
zpGRaQnxxN0m30}Slyuer=ujQo?PmdLbR-ObOsva?*-*s;AgcuLhY7zL{ggS4<nXct
zv0wuAFx6*5BVkz1W3qqB8@WgGq0G9V?LOZoMF9$IElu)VMM9*a`_~u<U2l$_E<~GQ
z4?(xkEZAKd!O)5yvem;aOGj+A=Rru^G&1=HgKK?6ABsz7VkK@WDx%801*-8%pNnyI
zs)}o-MeZ{iQhbHw4<0v}%s)eXuG^h~TG5Vj6#H7EJ@X@D=Hi!yV)MwQHJz<n@nb<5
z&fm7+aa$hMkVh2QGy&lf@*GQK1LH2sAXB8K>V-Dms7+=BcNw#g%?=lNgxx$e`{hNo
zhGmP{>umfsR63)GH`?L%yJHjWwe-O;763GEzRibgOzg?u60doqyQp)p3egW3Hh^pV
zEYYf|gwVZ54S+*EmvEWqb2Q|8;+wd4Q>yORaz?*4IHBAAD$yX_CtWYnR<Nbq^;ekH
zN05q*uF9%A$RvS@@U$oVt>L12w=a7c2r=ApmOXYQrSzp2VXz{F90bDN3;yLuIt04C
z0Xk~2+eS{6VQY)Sr^T|6%c9&NXfz(u9zID>Rf*Q!lDKDZBz0k7Q|BJMPLJd<uVx<A
zE#3#NCui(B@q_#%!UnjZ54Ok98M<-*nlfoyLC(La6<xmIy$K22?27Z@)BgS&`G}n+
zH#b)`fD3}gPmTmc@O46wh9n614SS}rKRZ85TyeU)iO_DC<qk9jMWxx?<EIN&q~CEf
z;_m8qH`W`$ZQGbu&U3U#2#3HkMpQQs)MTIoS0;_##z0Dlt>XEP_+n6ioIht$d+I$N
zhL`CHJJ)+Ii!aa1s@u+D4xkOE$E|-hf}ZYkFTY=`ofI+V`fH#pzmwBLz&S*doSsD-
zozz!|dkj(sYH3$t^)a0OqE{z@AN$hy=iq(H6?sSZ1*m1Ke8_#v+9cjD3tZgjNsc8^
zdVDA1ib>>u&jD5!KHS4^5PNlyzxUU=zne#q3-#J1Z=_fq>eIHltjDwzq1#zjr~gb6
z4sZQ-v#PZk{#7SLdw?>z>IVg(s7wk>q`Cz=7sU~hdZ#N0RZz3&0VsGL%wNRULY5(H
z*yWla%^b908`Z^pVPGuZ-`von<I?$*+c5~N-JTao)r6X_w)vD}DX&O343pd_<SmO;
zqs$~{@5B_2-Yum-r~J2!8JtX33>TP(A#aQ{X9uC1ks<&$?iC%0>=GBBfzL>uvHk?2
zFEZ!UG_H+N9{9E@C~H}08N2U}X*VzZ!-n&`y9A(}>W?DujewJGG>~%m*a?EtPZ^gi
zNX~83HyIs@5~Nm8n|;H5E(*XLW#X0}HxtBktcVnLzo1cmWP5=}jegXPM%DMvt|mX8
zgiDs<TcOl1GIo<-8mJsqfR@MzT^U=)jU+DDaP>bM3w?~5Z4PgLP}c{|Vy2!g=T+}b
zOc_;-TOXh^I~^M(eR_9TU2(LpR@6dYcKT4Pnq59yf1Tw~q&9*jP`istEs<igaJr?K
zoG<p3jp7bRVdCm?uJoBa?Ul{zX5zCol7jb@cyr=Wy*M;nTk+1&^c%o!1;a-ilbU^V
z(Y~6es^tJAMTxhWVV$FX7>Pk*rku&G5U}HnhE3@J$OGTQ0Tis=#olBuz#?hZmpd`&
zujH0K)uRswlFVvMYCiLSt`zt2e5*Eb6hb-y9Vb0+<8_S#v;%g0Ir_B%!d>DHt2>RL
z=iZ^Hn4EZccyWMrM)yD0BS9)^aav*F<(oZNLR@Jn6AZ`-Bhq&c_#`#p%|Iq^VPQco
z;;ilm2N($sucONY00)veoQ8_oQ-fPFY84o;WD)6S2gESB1m3WlnZP~^o>kw&{~Fry
zCBgq1%bFX_x^&(-XCWFM>$kRA<id;AavmJkW8~XNuOj2<Vj@C)g;Nvm&C-jl=Q`yo
z@=_{Qy+<}!8qo_MN=lvP?*~p$A>03&<?p4R#du-K?+ToG(U+l9MizgVA(trW$Sw=L
zZ>Yu|)e0<@G(G0WdJ7u@I-%8C!tb>yp=8?2z*RYwDmBB*#!iEyE^Ss-rC;!uMbR7c
zl3@UW9S_7m^$a+7lI^30?OUG;bC=wQ|Fm&c%raMnSr1q1t<^5SG3IznS_<Rh=_rn&
zMlQI5(W+HW?G2LF8Z2D+XXv3nW&69$2O-$Zk*25BM|^H5H)|Y@_`Rb1Bjt4X!&jlF
zrbP>Ie(}4B8yEL%E@rmqkivdPL_Ar^fckVkC%phTri1_SC%=5~1?|EbuRlBb@$e(&
zp+fSutF7A<UGdvshOL{^O{iMj(l0(9v%c3+f_*@rx>`PwjR;RDNUmlsW9i&qe&Asg
zU=3Bvr`)goyb^GC($B#*#56#-WgY?zf30HHxl?j!Ly#B%H3)E{dUY)^l})Fx;YIJA
zb3yydL%d+Nl-QeY;WWQ^BhNUte8AI5ms=i8{=GY0P;`wR^Ndm8bR8U#bMkyg(yR?x
zH=KF^dx|ZO2&37lI+zRqFagjz>cLY}Mu{ZweCzPJ7l^!^nGyY?j4Sp0w?VA%{i%`)
z^CW|^V|lHIkrs8+<d0>j3mH9|zt9H8P(ap8*n{?}FR}m*%WB}g<Mur)#N4n6yPU%_
zwMLYMvH2|P_1#kw8;I5_D8CwVpUYt$p%Ed>dniXmMMx`TnZ<?;Qoh2{jX16IuT5(5
zQR4=oQZ7hHpmeR7X4*MCRJq}uHzB{%xlR<qG?=lfU^MG(L|35_Jq68n-@=w7+7p(k
zqQg{uGUJhPHHaA__fZ}@Av~-Zs`2dmAKe~w0Ya^B6S^<8+J)cwRW5uHce6x$9vxcD
zcg+R3TFvC20Q?qaZH4oUpoPyvhswXG1(Heu{zQko4brs(K*mn{)Q-^wfY^?6moz=<
zy)y63cVDB&Q}czQVaxvR2$<;*$b#%}9Lm>cRy||Ipfa;YAxy>mOt>V{SA{Ii)3B!a
z<n`E~{X<qja<!Bag`+#JsfqtmcREyC|I{S%b@oGLC=+eP4?EMFllAhcFx4PxwwZ<Z
z422CpnXZmk?1gXELS?UySH36Xn*$esa$3pMZ`4-ENl3!e#ynBKZQJ##9~`Kce8-!U
zm~NnsG4LR_O%)SXUEqidwF{0352Vul0)k_GaGlu_c_P#b#E3VId`3t%s#{)_to3Gi
zUO+r0?VyIcJSOJrP+T#nm#36hE}pb#=;^JEnY*zkd47fXZ`yN$UzLy5yM?Z1-dZt?
zna<?z`Xf`Vk}F1t1(!%t;Aj~P6e<<v7<b|dR*Y_FCyVFHZ0#2sgrEF4*EohY0$s%}
z_9w*#(#3Q;PUZ!=$7ox~ZOW%cW|_pi<6yx_*8tF=L__0ZsySwFPr8IH_78DHqLxPi
zt@nRQo7Wa-7=CU?tMe%mdMj+lk%3C9Xm`T~=yxCz{!!ni7S^0+3F@BQjeBcWbe+s{
z3NBFuqnzbRR|7tH^XpxcnA}TB2XSRgi7pH4k{<g*zBqJ-SG-HTe*i%!N(oO4W0M4;
zQB>V{Kl=G8dCTq`wT_wFxv4I{E))H`(s|l_P@)C#!$^+V6;mtTg*hi(q)sGrb#!a*
z(dZ}CHn&9KUIGIJaGc)ejXJz@Qs6jRTJB>%yG6Ld8-v@Nq8^K7%D+ujs4Aa4b!t=O
zU=M4v7IN<4#mw3Szt^f>g?iuSwSt11>w>SpJcomso7^s*{q?MTD+EGLQUkX`R-%sd
zBXOtrI=PH*waXNLspEnhyKS!O9Ol6hDP2wE*cg9InqI!N6xd1eFpm_s&DwuLL7r|3
zEak<Rl)R#1Ma6E#vz)2$DPL<~oIG8v2`#2^Yd!T>I*a63%X}+0sZ~gowl#ERD(2h9
zKrRSB?6KE??vRx!GP&28S2N;ECu{L%MRnj_h3xF!D8bGpMMt`~dHr!P*l^Q*<m(lc
zy?c$niZ6&_qQ3dvW^_WhF(NUlkNZo0ob~kk2wcjoh$`$9T#42>B{9~>NcfdT_$tgz
zA;nziS_(gf8ePNdHeZ`*&6$|8ixVw_`4csUsK%JTJ(&Oe*BL)~ME{{gB>zk?bJqk;
zrV@G6Z<ri*#gC0{drABp3Z6DLgCR}5q`lIwd+>pw5y=|0_igR|iJ57|uP2repPhmP
zB_Bd@_l^>Nv|-pfhF_j0IZ6eDBqimB8^CJ*1F9TfcS(E(SeVF%<X+vsQdqi_)DN8e
zdHiP^$gO->?Ox>vw%;8H&L>di;`=$U{~ac^OyFb`63q4KmG^c~zQ3mGTKd!Wh57hR
zXtmd?gEdZjC0-)qzI8?mnm3pnlB7{()6$L9gk%;g*fznRah{zY{h^;uPk4olTMWGw
zW9vO<6D}Ew&3%C$f%yP%hI+ESR?8@zL_dP%J0H%5o32_8*QAoo?ap(9P-WvxGO^DQ
zbb{0l{x{Sv`$Z!~j$bt~qd`6Qf24>Bh}pYX7@upHj-}_zyDQCA%`=3Z{d&t*IzyK+
zF7{-QM1Hsdnu@<P*?0-j4FJ&aT6!eE?=-$#N3c=A>#(Ur4KjXgt9ZuQF1NXI-CZ9=
zHp{)E2Dwu?IeRz!;B)4YZM6N%V8c~I``GBm>r^I&!Si=i^@TBYOW3l7^1>56YYZ|T
zV(O^f+?dvAAS{3TH1WQM6@ar;3@sG=Z42NKI_4uEi0S=~8IiH{=Aq5Mmerlvs`81_
zy1xMtCFHEx{KDtBjgMRrk>yvC{6!y@`hEdQ-{+v!JmEmTm}_C+<xE5iZ_@4PPY-`r
z7heR&+VrJGu%CY;+(@h)CN`xb$^8t-vf9V-=^WjSH2wX0Uzwaty^}fA`_pY(u>!!B
z?33`V_}AZVPf+(Z9r5<2^9(;6zdCPfKZ`LXAozsxu2`8P4vmUCu8^*xt0x2{!u9Zb
zM6G#btq7?6g!o<Yz#pW)OI07>Z4me6C7S+kgIHWz+0vgHxapz+EvL6X#^b)jjW&1M
z6G(%I9Dd9a!(rHBcFVUT^5&9bpW(Xu?%#h&Dn|wMdUFKEBv}VEp1KaPY<8qu$9vN*
zfH+RASM5CF8sNMYeSOat&N_O>Nvpp<b$R=bZ2n`-Rq`+Z?r$Y$H@2x%Lq(w>fz+(h
zzB+ghZltliWdHqsv+Kzpmn+lS9!%OK1<uMyh?hsGR>DQh`mQATBR&`5aPgshD-@Rc
zd8OzWO3on#LCZv`44v{C822eyK3h2Caieot^#Pdiv-%PmcrN!xSxwJq-m1*Ve)JK3
z60Y$VV0R*48_7I9YV10!etEt6_J>2Xvbq<_Z1WP6eeQNy$Swdl*9{qh#GbFFGWx&u
zKbgs)I@##(a<QNP2)+g>2d}Zs_9NTH@-=mo3@C&rKqwl80W0-BFV0fjPvzr3_ZQcg
zu0chpOuBQ#PCspBM@jQkklz463}yW%%Hv8;OhH%!fHP58icYeplB23VZsJ7JSnOCm
z4ri;Y&-J9JAGg;=OlajLW>Jj@phWfFpiFwBjdG|CR5fZV7A!G5QOY=Sc#v&aRisL%
zv22@LQt?a>Y!>-M<cTY)k#ZY)A=%Ln%PXF7SSN%`y@tG#-0>bn$G5r8*yg@8@<qp)
zBEV;Kn<c2vdOsPh7gdsr+UHL%jal9a*no(hB@`dFH#kfGiINPm)R%}z)CX047kkwk
z7Un>wv@lPBkGro+q1NCnKC;lmp09i5A9Y#w`BdXjPMs~U=D)-}(d8#XIA14`wKg9Y
z0_WQ=Pm|@bG*`>)SPqsmnyq(RgIwyT$1mgGycg<O=oXi0oTF)Z4Ul9MdwX@Le;~s5
zSP2VS2SGf9E4&M{L^G3kYVy&I$Rh%6JOHCwsB7$)oE_gMZG!v+f0iz7gHIt*Kr5k|
zsl_=da?fH?B%|T-r~}h@|DPBIEvZ5D@$>K)Cwn-naQ*Io7TdddPU7o*Qd;0m&$hUO
zjv4m+{(vDg>rHLy7LT&e>}8kcrGIWlu_J>3JCKf^l+izO2`8#LA8M#Q$l!mnDx;1U
zebdT0dneeWbn`=Ul&8q)CS^7asarT?qF5=Gwtst5x5UPN5Y%{2Xz1~yM^i&j$bXHX
zLZ!fdm7Lhfh7!Q9Ov?55yu<f<@U9OH$!kWaqT;PES3iTa*#J^2fGCTK%v@QGYXw%q
zN^3vwuiiiq*^9Qfe+I8En1pK&dB7O9-+fix-;ShN>l?K&wJ5~w<=CCRfB62J_$;|_
zzeOkNUl%@%{OwZDwQJ`(>@Qas@gJGS6A17M9F9UcspbZHyEF#6<HxD^^HsR<Z{$!C
zDPAhQkk0*2M|ute)=zw7`JF2OymUf*z$D`ptNmuuX;2aj3tzR1D)3vB0BglGm+u~?
zcEPQ#B^QgC%`0<oW`Gq?QO~Rt^2!X#{*Yd}*Kza0`)sfv#%V=xsu~fH^Zro{wxMpa
za-L1%caokCO`8#<;+rO^>`|7t(;MjCqhQOAFy_zEk=FIbHeRvMtitaHf8R8g2aCn*
z->HxAdAHrlXW<(0kNLRYscKHnw10{^cB|0Jlnvw*h1}{~p5?gR&+uEO3N75ee;xga
zq4@Bd4Rztp#bgosXohHu(qvxxev6^9ZpvOxs^Ys$y!+G4N<(@}37tWOkocT1Yk+Lg
zf}mov3rt=s-LvTtRGHJOf82&?<`vN1uCSD+ru=3XRfGKs<|f+14-SA+3xkH@rM&`H
zn5P9hVz76~y9;Tl@*<r1a~xd7{A1Irs6I?x7R{Q+q*u9byju(XYy3fTP64b1PWa=E
zqB{lt4!ohXq!V7w^1UFI6@u{ext$jIP&N`{i@C=>CbFOSzXMo`6}7aN0>>1Oc1Bfk
z0z_K(CSoOCciZcmh4biipc*=|a4icpLB#5x^OK+S^XSCl!>glTL&hj}Ku?uW?s~tZ
zi);MZm;-$lpm*cdi`hk=mi`>^(`~wx;582PC$A-bj@St+fXcuI5JYz0zr4YCWU)eh
z(SPUtk9mI%a&Ru*Zy~9U+1D4#gJ{1Yl;h*sVHTF4GHd|a^-_zx!nG;Ihg~pHL7ESI
zdKn1JzlR%>R$er6BUUR4>ZeRzb<E<tl9QCCd$!PdQNeF}Z&Y;axi-}E8(;qbnYEAt
zTFH{3hZCf13cU`6SAM_pd<KI}GpmqsF8`oS?KcGrhj(YgS#-eHe-Czlrli>}5bboN
zpYl-ENl9!cBHqIH^B47mkoe`-tG7aAjoidm(98j*0r95+QE$4lSr`CN*xMG-)JE$a
z{*Ea4*$W0dsmv`~Rzr`ms3`mZeorr8|5a?S;q*q<IS3u|#etb}uTYrx%hdf!_0!U4
zxfJR*?`S81y;JZHMY6C?Z)Gi=Km9v?FlKM@^`sB1m>a4Yr%(_Q8~vP=k5`B`Yn@(o
zYG#3i!@YQcMB<+z*TrNKi*p+;@s4*e*x8}zWn{3cMW6P}o6Ws{f+jj6CGz9L(j|6B
zMTUf{je}+_nd|iCALnq5ICoE8_%neL*XgYzi0<t9xY_`Zz3gn<f21VLmxrby4-tVa
zSs^_fQ>28kZ|H@%AWq910boz>=qYeK-V_gt{lvSJzKNR9WVOlW;jU3$#}NQcRZWZz
z7<iqY;QixVZLHJ(`)_=M0zeH1fG(2S(kxlk$`(C?rl$NLHsTbu<HYN4D`aD<&SSSq
z$;P$U;4@8TYVif8PKcJ@0a^A=rdNqK=FZQ5x1y=|n7#>a+T!-xyirXuv`eu3Wv_sP
zGLU8A)NKYok|S{?&AkWnf^PRpFJdCoCQ_q%h7~6p2=RHJU25G1p^oYZaTj`fN2dS?
zqB^dSb6kO5C3!HN*xjDP6qMh8{^H5?>vf(cQh1VyZ;PZjfp6vwo2hl*JcE~l-+4$v
z<E@*o&p28+*9LTUGX{qu08DkvM`*|wnc!p&Y|`e#c;J`uWOyJ89P07kDA=SCtGYDU
zx9Pbx1fmC}z%D9KKemu7Q0je$#F6BB8R6V8I}a>`;>qMBI6x+_5%35LZ{EW@fu%a9
zqT<^?v*?gxFCU8ltPwh)p#oq;mQh6~h6fmg(E}cK^Bp4Asa7;1|MW7S{o()XnKAGV
z&v%fVWb)xTG(3#@16_~WuZ~_+4nxKU($CT;k2u#t#nJnB-si3TV2`l0!lS;h>=cel
zCv~c9PQ%9bDB0(iC(^O^kgK+L#3<?H=Sdn-fftx|RTVBksQVy>f(3hr8nf>U(;jLx
zsACcV339cCk$iNR0IT%-{^;(bktT{1i0Wpn`)_nv)8RiYPdY)oEDAZ@<OImUcff{C
zMK>aifX@&S_k8gu4`ND#M<?X8gfu*_E@oIDglKbBH|98c=USnc?^|H0GrQ=@6VGh1
zqYhvvSU}4WfOFoEIQy5scO3v9ZQ|5a92%TepWtOy#Y+QI=TY7m?N@A~rh`H0=$R&9
zF%r0Mr$r2exU=m5eF_oj8G-1re{N^?+9Ps^lVpO*ziIoVu~aqKnozZBQo)M9Iox|n
z!(+UmLHjTOgSIHuHQ%eh3~uQC2k4YRd}{ytpGprcs5>|)zTovKI(+)CG0gz;N8B_G
zop)~W!qY^Ay!lrHG{ww>E0K~Rbns)<DV`Y&w&wD&yrBENKH(~=PiUKAd;Tm`6e}ET
zvhA@)`#|0}r7jtES$K4ime2h0*uuKtgx!ifo5biB>G(kAGi|5)J&n|VHh886_~(m9
z2sxWC2;9P`V*)?(vKC%vRsc{*OPf`ESlgqNRm?0zBk%HEKZC>1ldbYNs}wviGTsxO
z6u{mx>nIl)Upr;yv_S*yJMYmCz}+|GPxwv)_%n_kfA~exZ8V483LL^x8*haifG}r_
zh95?Lq#EVVVQ3uZi!H=?U5b_PI@hk8m9$C%B`N#E!YKd0b(iYODy{&)FH{tp|7e2{
zrWWk*J$N*h@DxZy20yqQ63w??;f_ZkC!4wj``o$8!c1udgz{;oJdeWy>9jQKtDA$<
z1n3(9S=JUEAr<}4>#=*60aWHr4>Zu|d6G5zh&2E28+&6e7HkX29h#RdM&BcX3E%#6
z23)8A@0ht@kKriL7*ox!Yz9{9!A)&r`faZClh*46pE5|KGNI>2tjjRi0}nW%z(lRn
z178;^&D4E;dGPY@4?5qTcPdvLXZc4KA71P~T>(4Dvb-d@2LPLKhJ^<?&PxnFWUE#f
zmUZc3Pn9gK4ZMa=#ybnv4G@0<2}Y#3ScGvwAl?0Ou-5Xb^eQYW!-8#?IXMvLk52G5
z`d9v#w&JWv6Q&y6n5{r-bq~xRr%FEoC#aU|3kAWMQK}l9AOJUYw+ORs`6Z_-FIjuD
z%4wLJXzbCbsE|QsxxjqS<WR_nf|qkdzJtRKYS~y&lK=E`|A0<#Y;g?Z1GgbS*L4BN
z$HViV>S2=iM<5CZv1}SZ+|0Xq#=P`U_pJ}gEhv5}c{%7LDZOe)Zg<2e+0Ke>eIUyI
z@Er&+ZuAD<OK-zqZwe*p<`3_Bj=gHR0;-1+&6owvbb8plY47k;OnBzB`GEr@?+^z)
zmni)g>3AmzuUdX;w^xOWwtRYElI9!b1jxegYEfGI9!4b$lKy59*tjJZ+O$wI`$eyi
zxNbvr^1A@k-XJ*OSm2*F-NH=LxyuMk@w)FvnQgM$^smBGfB|IvuW>r+V(&YSD_!M;
zd#uf{9{?Y9#2*~W-X`y@z)89ASfBx>le6|7RN^+AD%+#fre@{sh|i96#v1R%ysT#D
zs>v!~l--;LA}eB{uP3}FGrDdwOn?^%=)fjz2OLjLtL@(^GXcJhW+|kW>ezGgw+nWi
zjF`#fLO94^({jF&{h_hfe!!}l%ww@pPZIJOF4PZG+t3NB_Na8`0j_4fO7mo`f{rOr
z)25oana_*T2A;x{l^&Y7QywYV?S-uP&hIV(7Vu*-p^=+2?i6&LUT6=NVosnAt`*y?
zfBD<ge=})Mue~+1Xk?#px|A5$45x%}?gJdfdQ+_Du!Li6cFR`G%&Pm7FWWF?ZbQv}
z@Cvo12M0zHG00E2e|k3SvsrIM54Q?p$VfxOTfub-y2DPm^7z+kyegZ&%cxr2Hj%m5
zF5pSfyKXf=<`=i);Bxqu0oQ+ia!)-1w7~KHvY>Q;1rp@ku<?!tOgp_~aoD8WOOiu6
zmI2JYD8Q3e!v+j|{-f%%mG};Ju6(Qh+(NFUzFv5K0XP)D<Nzx8J`4tYI)Impaf&Q$
zYayojVFSR$i+#jCOr*h$<~O9eO84Z!St$jDC&?Sd%-6j~*yQY7h4WMFc!HmP{GzUS
z|2A#*I6;3k{CfZX=J$#I1-Gq2gyM6{&;=!6fLfNsKuAjg3|B+%ULKSK;8bvt&x8N8
zQ289Rcf{_Fh|(OM-+c41b`9JZL-+7zji3V1JlXyN$*MNT4&XLJP9pKok5rB-n?z2R
z!unGYUCXFhizutK*QI&^t!gvh*GHbaX;rZm5Dq|^?xaxBElB`CrL}&Bj{R2Ofcq0i
zSdnkOv24+ojmL}mc}29lDTfE|8!v!w{CRnDjg-gsr-;*6sV>UjwpRZzuu46&OUhIf
z(PJb=oUYdd$e}pi05BHdpL)Ns<{&)t5D5Nmg-6jn{RfDf%_$Vo5a9H834Prctd%DR
zPSfP<KvaQrqXYr)H(iT;rN-Se{KVKkn#azR7o%!IyW}c!=9X;vgNibU&VO4OphoX%
zfwPuxTK@#wg=-$#YEe$`$&IYz)ciHx9NfXq^RtoT0mo*jUCF=Img-U*G6FtF=aT@z
zKgEA3z3Dyx$S)`;v8Yx?_^0Hq9V#Lx`z65kTtAsQDG^(O*D!Ss;1X?qcOm@{ZBZ%N
z8iWaKONcaIzPxwzKFm*pU<GjGozpL(8A&XZ{Pauul;i&N!<YED9VS9zAtvC&LJ0p6
zRZj5k_F@>XEvi{JFH3L!r@VLkj=Ou~h11x!)1<MjCh5e+9ox3kXks-pv28VMY}<`(
zyD@sEzxBL));fQ{nK%3Gp6}ZG+7~`ZLlvWgE^{|-87??aPGgi9KL27Mu86(+hEtia
zWVYMOP_eWiu$W9uJ;dHYlgA6;m4m0S46*N7n~JZ7#vRC+0Pb6micDw1%e(DB{1|t2
zkugk;gooqB-yOBsNo6(Z!BhQTkV>=Wl0aS3^hW=X^m9igZ1Q&xgM6+ggJ;Nr3@fp#
zrf*{*WWimMxWCMCg;p-H@bZKJ-#Z7Ue5^3u3&5ZO8Oh-(#3Py&8bMl&?eEw{Ip-~N
zK|5c#%mRo0wCLC(rnyZ;icz%>N~f3_H@ZOzE_L^Z5kz<*p?-Nn_6{z$8e(Y8`5Kjt
zK<bjSqlf^NJ*W22#s;~A&Ww5@<*V;FOAD=}{bxq`zGeoYu5?0f3FmFsc&GIy5d8O<
zu5gEW;lHS)g{KNp1(=e@3Jter4lKu4wn?ym7~=c{%r<F(T&V;pDM(@SzXYoMxa>*L
zoqZevVJe2%J4ZJk-su%=-c}gKtlfW8=aGHzxR2Paj|<Q&KPfb`f{mII;0%VnPrxLT
z{-$8hQ&I+ztf*UR9E9QT)ZKaemDL|S&%3GZ#LgOfTw3DrTjGcq5}vT8HveO6v-YEx
zcr@pL*g3b+$x*gRQ1#R?N`x`Su{%e9Yj5fXiL^0ItQ6yxL6YZ%DMJ9izq<mQ#=5&j
z2YF&(j^=U;_+i;row(?Ti{&->_?(T66NO~1xm>osA<WMf4tH~@U_~C1gZPdCcgO(l
zSv2jM5GKVm`ovHJ3x$+bM1fv*J(lvtcNo6k6v0#;tZEi72FHTxGl@3jLsN!xYBJ-p
z?pYiYe`e8e<l3hgzs!Cq-_QAEP1#Mvc4g%&e8J$U<ZLe9U@7Sf30@5i5MqabwA2i@
zMGrgV?iVyDYGIqS_YK7+Gfn<r^rSV9m4i8&_&_BL5IjZ8QvHbo_)W7?lH5aOtZ=zP
znWq#Vj|F+HC^r5<^ou9WB>UvM1h}>CA%q1FSIiA!Rrl%h?<h-4o%(1z4fz_`XDW{7
zR*$?sqnWd7;}{H2WE^SXngCrT3WK&8qP^qOO2Z{U!8p#mM?0oLEn)S~Xqrdd#siOC
z*?JVPQ45*q-P&woQxqsvk|J9XlERcPqmyy|ukJJQw|Hnv_?L4nFgE`5(#;TvNryRY
z0ex>lPR>UUH2_Cfft$fTPFlrI1312h!yahWJqwMjs<FLWOBgq|rvBzjG1mC2GK%Ha
zUQ*_iDAQN|2)?eS?5!K#5`up2Mnmtwn!Uh2E(a;R6AyzNi0<l;UZ#GSi3fsCanI%(
zOxpKrrUe}};mzv$brLVBStdhxWB5gAj=vo&yVv)~-wNu~BQM2&^>URRA`(vZSe|3g
zH|EEYnw{9=MxM0&hsX}A_`8;a4t>|7$fqVgnnJVzkp$yx*IoiL2W5afH~h|9HvdX9
z39E=yej)0vkG2_ii<(M^en-NVfmbj8odV&Bq*=K(cq}2kiSu`eL*Y{=gYwhU)5wJ|
zFewR_RK$-^3r;^YsH-zBE9pnlFZH6L63VYaFjmB5F1{nlGIBB!06?&=shOFoP@$ld
zMmLej=+NwKzce~E9S6yyR{N*PhBFElt+QQpGw+!%{9CV8pAYnd=icIyv(eac19M+z
zo61#LuM?!SN`C&+BZO%G!TxhEhJ}Np`f&bS;(voI=;aXWyagKJLLUBgIP%JkPzL}S
zNURf}e_|3Vq3<HI?b4xrWX{4|6ANF$FB*8I+EEd4Y=2+Gjta&2&(e|>f~AOt0ML+f
z8}aH=*xv9XAoJovmqglYlbRC{nb<JMvcj`HUv+__m#$RLyv@@J9>;N734Le_QqGn^
zB_^l?f^aJ&R|sl?6kp_1n6Xd<5g_1({FgK2>!7V)HaG!)-@Vo$(v}rfKb-u_#e(-b
zpQH?lJAEJjF`R!t-UdVHOc01n#a*>tb9F{cu+PQZZ7~`0R&D3Y*t^qOJ=z<xUwlTc
zAq`YIGV=tT1lpacp)j5H6mWkcxZvw`?`Du3k*kqXS^5O0Tl`6KtUOyMZhU<F>H3P4
z=jA$sBE4**a_sH5^hAS<G?%Twznw4R;VGG!A=yF?lwTEzL%n7ZvP&2mQrUkDtVH!<
zVP}<<ij~FX`b!MPqoRo2Ef73Q;eWQ%ly0z`l=CB5tpGXQMMffqhKE;34c5D>8W0f?
z-CE*tg_+RP<nCgSNQ=!_L8valYdC`luZheM3;0jB{$!dziJ@hpJASDV+@l}em<xpj
z{+!mQbKTn-W<a>kxQ27yAHm|@hTUNe_Xm%W2&1R!w+;;&aWg9`DQV`eD2*k$J6xxI
zngiK;*ikF5>e=~3J)CJRGgYZJK?Z(IcDR(pquN1~+&C|GtY1LnvlZ|eQxAPA_!T{F
z(<~XLau7#ieU+0@5RP5V5aw*aO~KV^40;tyU}jX*n$ky8O1DE8ED6Klk?YK_SP<{z
zs{iXmjy|Y`vhE8_8p@;rYf3>+G+qx;7#T0I2F!{SXv}z>v4x?}zrRW?cI=0ovgZJ+
zFnE%%Mx>$s{N)<!d!f%+2&N<u((8EQ*Da)lmq|DKJ`mlv6$s<CY4h^tt)sl3+kOo&
zzSI{a`UxFd5&Sj%mBamgqouvqd3~|g`MgQ$Kr-2TjChwfG$z{2g6MZ|9+ao$e)Q!y
zAS<nv#06^07$rwjM{W6L2iyJ3(o$tdn4tJ1vgJWn(Om<VWNu(rMjMp+pzuw-q|7^3
z4$OE{#6hM`y<RP{#+jrPsF{N@3Px0Uxzw;&ZG)w$9%th_YBYAbyurjS@b}QboiO1O
zH7S4TU~)joSr%xO!KjuxkVuZlQ$WU~rL~n+9OHI?nZ@1<#Ye~ts~(UfNXKy&k~Y9a
zFuO@(PBNjCPf6>nC)R+@3fV2~M=6?x8`E3DFe1;0RYsrqwP>rZsg9JJ9sAW=4|VZX
zaun|;Ck%Ywz{FRWZYo4N6?Hr$+PjX|%f2Z@63?e7hf_}butJWsw7kGTu9;_ViSiva
z=dP8QrFBLp9UC7~oA}%ye28+6{}R|aLvE)?m1yUGM!+Y%*NHUDVuI-8$wft9S$+5m
z!gJ~p`_f5&7aRAu&X@EQNJ@sp2?)m)a`CTVm_0mP^7uLvu)>xoa{Z@LN4F&Z;~poy
zpBN7$3+Z-v+!7}qdxJaP<I+5!B5u{3b$;H({ODgO6E0ITACB_9)p+cm)<hCwuE?i-
zp+uPlgs#N{jkduu3?2$4dvfgeI7gjN7kpeQ>3K$A9A~bf%iS179xY|W3>UAg$Ipz&
zX`&QQI|#->Y4WHBJJ#r|#H@qy>of|>JVZz<E&x|hC<ee=Pp!E?Ni6$wLlPt*RcEND
z4xR;I=?a=IJT4N}STdNYP2)04XOwk$zZGF4vG4RSSaGw;J5Qlx6w?TUo{R+<grcB)
z9n)P7f>D4Fok65HT2_B=(%g2t)Mp{kcrm~{Nwn17908$uzD#8z)+q7>!BXxufgm-R
zxhIrfm>R=QtvSyq)-76LsZ3Qs<zNI<=RwMgALC<MWf;txMt{nx|Kagplo;1wlH#Bt
z4f0K~F;8ETj$Ru7ZwrV5YMS6OP2DgBpC|!B6nM`uoi@jW;m?N4F|iPrj2~&HkZ+T$
zpI&o@PP_o~WV)4dp?)vXsAfIgVuUyX5zaO+&PaF4<a{*hzeDEX9b3r%FGoivSt;0T
zGR4g>%Zf`+rdkHzo7Tc8n^1`0%9#&KQmY0|=TSNG@;USHJ3*vD>0mxAS2*W?y&t4P
z1eYed=GysYn%N+DJR45CBos(uXS#i=S1V#u9G87xuXZ}6?orVn9Efszt-n^fK(h*&
zoM|MSM+(n6bpb4f0n8;PrE&W8Wx{MvG4Z~oa!_Nl_!{^f8Vcwy!xWZEoy7(0{IIdc
zrk_fZ=x*w?9w)dI-LwiP*tu@3^c$wKyy<%A5EZoIGqQuBcUm@ie7Cbk9IbIxC=_AN
z@)WJ@eqAqQ%O@jgq@07o&d(toisBK-m>y|}A`(3S+7N3^5{)xVjDma28{4PO0qa-~
z%E_oJDklANGG0|ljgd(Senx+m@?N{nb#^z@0>%JTx+a1Qusf|E;Ae_{bbl5xu-X`p
zi}mTS26c2i42%x^`o+)xSQ^iBPN0fK7oJBFSR+af4#e<vdoVG5=V<x5i|p<DT1GUE
z90#6wGhJ~RL%T(MeJa$A&A4Vu;nyt0bd>glAOQRA=DVZ}#fp%*(UFzEt;OIMT!m#~
z(v!wzU;^*E{|xRLwNr`8BTj>DiYk`BY|Ef|DzPiZy`f<SJGFVS=}KP!9{YV~x%Nh8
zXGk+{+ex^4K!+@~^_phPC{CdgTTrn37QK9;%1unh)a-nV{H%VGv(G-yvUVK%89vlY
zOc>*APh<?S7<q$RrG=Bx)gw1nZJc*TgFUvLvIy4`$G$T$V}^wiEo3W-K7ak!@9$aj
zUKIiGYH{*BNg+Y+_&eclh7$d8A^e}&)>mSthK~Y;L2rNH=b|MUd^98*sa?U*V1};u
zfggCcl`Ilo-^afFS!uu)U!0Q2<W~*EHyyY(()r1O&nJY^Ac!&4)pnuKID#_hL)c=y
zD})}c^Xd7vbNEiIp8`mx1w$e=dXH<D|2cp6&HUMivx|HIMav*CrV&uczYaPVdASx}
z=MmMKXl|cu8f_@_pu;Zp=)`NeKOc7X><_bD*pi+amX<f_JM`aBFfFQSWi&#|!JQ6~
zr&SR>fD*jW16M9ih;3cO%rcvRvvOrm>*5vbDpdI<&N*>o^s-~#c}}S_MZ9uE5AvGa
z!5D^ix4M%@Kw_MrUv|J2491VPl?%@yw=^Gd7}Qp!#Em+Y1AwwpC{Lx9+0>LhF1O&k
zy|;awAF$YkT-GrXZ<b(O=O|C|=Ylm}J&plMV8p~arUcn3H1%+F=I&L+Zl7CYWH!b(
zTNy&nMTe*>Ap_7~k9Ks3eTsOA5qq#3*X;3#<1fKqaH%mbBweP0tAq9u&bsifm_X`Y
zp2<I)>jbF@po0^X8Xkhb*<!~OuGxJ-Ndc}G;_{na!URoMigzv^sffSM%3w0UaE~<Y
z2CekgjON45HE03&tmC6dLlws{n3<b|incWX91gp7-t3s`nUH0oC9?E%lf$*S=d}t%
z5vYfL$`M55-c?Hj_Pam%Pe1RWsvMT#h(4&wGF=gQ=D~Sx6W>vF_)rM!pYgIK(ZY*M
zsaSvU&Q*@~kK3s()JXgEvT2T75<F89&^rL*3@)NOHHGol&Qc`*0sN<_gh=cO0kG<>
zB&NQFQr@^3nrpF21%GsTpHMg$Sg=x;f4@0M@~+;(k4~$#-tgF(c#O|;lx`HK&F^4W
zRc;Xlz_pq_kgM-|y?iVXwt3>iJLW`8&7SRBb@AhBVh99$6quV+Eb{!ud@_eN*&5wa
zr)o#pcUYB3H24yB6i;Tt6BoMwqTThVM!5NGO~qJav25t>$crEV_i5SV$sKpUZg~DJ
zN-wC0uEen`mu>g+nXyLV<(L;=<viwqx)TuCVIOK$oTs`g;vQodR;fANv+c2!A29dH
zJSIK!fF$(C3})h!J=d5S)kDC2&B631s`eOQRTJKtCB#OOI7ikKb_pr<;1Jg2z6(WS
zmPfi!*KUS+AF7S_nTV(F!DK0ZK0clpG{)X3+1iu;aQpU#snv=8VZDYh@4FsaJn6eD
z=IxK`?Nj(1Pybs<=!Z!AN6*jk(w(Y}aaFb*N<X(-Oq@+jI4V~vu-DHG;y!K@QRExx
zPeh$T6OMSkEdSesyeDr}^L36mSsm+x{%_30CY>i%_p(OxpTuuL;^mVr^mVPWsCdsq
z)z;T<Wpkl#s1#1;Ki6<k1pf>}d@lR@7T+-q3OQ`eY3S$d5(r~I)(DIv)7tz!KY~hJ
zeB%iTgIa0@PvHvil|OgKi2{KvVh=U|Z~uMhu~Dtx&vx}p5HAWW%@A3+@|c6!RH`iq
z1mEx%(nf?5MBp*1yq?K$VwE)B+b?c|tZ8=~Cm$O()BsJ-^xOEyQ}wv<+uQ@d`~`uE
za}V*fXeeLujrGr37pC~q)Bv1bfVi_}IHnPr`4b$qs7y712FK&@;6~nRDtA^7kJz;w
z2M#gJ$<cUgv`yt`^&EsJiRM^2u_{)zA|y7ks%cFhx+J_$DaiMM*I04U3!>hLi`U$+
z*T-8(GeE2O8lF4yTml(yRVblMoH#RxoVif*avz5>cZ<PtBz<wo#WccYUExRItA0bh
zz;Ki@)`XajzF|`6CZ=mC_BDKu*nk%<1sW7w;*X$w2C&Xi3Jh;Y+ffLEAPgOd*olj_
zL=#^%%;IoX#OY4^tp3ERC(lx$L&%p%S%)a?j7AAbp5rm7%rfTsF;vvPaN2D4$~Bd@
z)k8p4%@$I5kV#o0b|pYZkZc*4e`aa2U6MO%^Y^>Dyu6g3GfPo;yR0=B&5&Tro6@i8
z`Kv`_2kCy8w(8OF-al7n$dQ2kP*OKkvn6}$JTl$?PGYV5O7RPI16-{b2mqxu3ZXv&
zO=OtdQX?MBdn2++s^hlePkHLvH2B_@FF7ZK94IyUPT{wL{qTm=_>%LP?byvXyjrZs
z*^cY2e!fj0)a84`T~Oo+axL_ay=zF;Zi)~AJh}FT=fcjzkyaKUY%U}S5yu;|Z+sN3
zZ`B_LGkX~w&GzRvJE{()*udTrajQKsQ>KP|AHlK**8^}(oVGbxm7}X**O@RZ<aLpB
zH#dXbw=C54WF)4ODqQPYXweO;pOBiT_KiOw3uVp~E7Dp}!I$*>(U{4E%r=Y6?D$Xm
z5RkYLWgNfK)qhcUDJJ>^M<ylb3H!0mgNt->&s43d1-ERg{BijFvg7RS@V3yFifr-R
zV8Mo$mUVRzLL23)aU?ZeR@~z8Jbc`0$8btV8mjClg05K!_B%YRlnXt|MhpF5W%Qtk
zTli_(ul(=YU8W;$FQM*d`*yEjj@_Q$78yJrQT-N0Vj$uHW9JVc^_&eD>Zp$GAdT|d
zUAg{QeWqhwVMj19U93=ybpMOlvGnTrqx)RI{52}d?iqoJXSu7x4JcR$j06WQrw&#(
zC#a~ydXq{aPWWMSIXQ+?EomhDSWm`$arsI#0;3mq6qDqx=uQWhfL}U-XU8VMP1^0=
ziHXA+h#zv99dmc}Zh*C(L#()yn;QZ`;4KsO#goGSi&te8w7i7`aE_pbDNp_9!HTZC
z;zCjZs<?I5kYXPnyQ$`?amZs<r+o1A)43K}8oi>PM|Iz5hJXD`Y@+wBYjM@S=OWXh
z4?@rj%V9om`&jaHFR;@g70%K0EDp9xBwIR?Jris}laWihDxDkqN;nZq@R@nBr4Tc;
zEHj~+XOx;6PMeQJX5GqBi&R16ltr1Du4bo*=h;=crRJ|3sWg?)9&Xa)6uj;u&!cZg
z8+qJAT($*X68?I5=U#$m0Q9x4>rm^<1rX|4Z`I&B?w>XGG_8>9+77?Eo-=|Vk1lAN
zcypu%IiNre%ERUk)n3~G$0Z~H&UWcPk;%ZI{4WRvGdmkG&RaWVWvoW>t3S4X?goq#
zXU#Y#GdtaPpx7*RS-EP<KBfzI_#oB(G0Zcp_-(*wgr$sd)^Eds!TX8z3CenIQQf;Z
z2y)Z=8zB4y(?8#St-1=3`<^=ec4E*3=^4b{!sIGP?^-z__f|CN0S{y40>pe`F)oLI
z<yyuMLHD(^lvU2m2^QA5uW`-`nZvX~^}Ablz$cO*I|(HC7~9DjApEy_xlve=at!x3
zwcl1&5Yud6;_QhdGMo47YkzmfuY1b*;2C)af>z7T5BH&LZSaAwPhDW2MPB99wjUZD
zwBjk0K&gsAZ?qu<t^J>L?0><uLZ2F0grJB%m!u|Y?`LHHFxZxx^-~)@98!13>l_Ir
zKKN&NL`+Qn>HpCMnynT2<C_@JZ*g3fIbaUB9E_@?1hs;Tbts)msesRRw->cTEf2Ie
zn2l*>CO&{?6F2tvMDhEx6VY<^{{@0Ju$QMF^dp(X{6gw(b-Px+5*KB2kAAU>U)<iU
zYmnD4yovv#gV?;3>&=9RWRIPnF`bg{5n!EyBFxs(Cd@3G*(uwNWUl75r^*j<22nqz
z^^K>UjaASw>EZBZ$z`g`iF(e5deXUong()RDPMvdvbuAhX)HI_QN^p<b@1%oNOBZC
z?P*4%uK(`A{pMcm=Nlk^uqoC0N7%#wa%kqL-Z*9CNc(zBSt=N=3AK}6Vq;X#mrf~K
z+_zyJ8GoQwA5uN3&<cw@znN}!;%oD&76nBAPKLt^UAHecgW2x55z8*QrL7xNhE?<|
z4!26o4z`a=XN~EQLvvWjCEZ_z8%x9|wZ$H(>(S_1G|~!YtKW-9JWFN!G5HWDe^@<&
zJl^;x)s6`FbHo?lWbCSf|Gk!Ey+dkh?T0X95h1UX7_M~-F&iUKsTN9-)+o>)i<?vM
zkAoNT6$Fn?SJOxzIcsh%i^*H^l~7|S56@YMp4niy*N&_sGb15GJ@Sah{_wl?i!;59
z@7G?#FkqW41Ufmva0AhlmE6|A0865?NlxEPoCR=q<8<S%T5#p)W$27rxH0v9h#A{k
z4vQR{W%F}tEE_cr|F3EN>8o|iM@!>zWR7XIgBXpWp*B0}+IuHeMAHmX5yW%MdFyHX
z24T4rT#k4-K})sIm|l6uf|p<mSySYanBDEG)@}n5Ak3_q=t|jB7o+P=oji<TgBgp&
zEf%F=H6DdwyTXL|zy-ow!hvqdXlqTP9qTx`z$1Sv!8!=zdxrkB_sZ1A#Z3xM&{~rF
z*S#V~BsJ;k=4l~$g*_PGVh8ilw*H=n1y`fR%*XJpR+g)cw!z!4-?|W^ua;V!bBoIh
zhZrEKi0n2@5-ZwkYHDbj>0n3P)eXQeW|Nzu4ArN_QWTxp2SJ+)=V^;<ZsspH**eN*
z!mU>jUM_UzA@uA1r)6%8G#<IdPI&uQg9io-D6on6*P^*Gg3eEVoTQP4W|uL;u~>lj
zKeXF57)46AJ<dj%fhlGk7;p3*BXOKZT@}RMm+JO+ImvF_h9gvCZ&Plfltnk2gFQn6
z0BcuK-Z$&)c73m$_Ei0b$rxa{IM5<Dt((E_8fg@d^VJmgdV}z8^X@ysSiL7c`vG!8
zj`YLG<JJDbBNO{0gYk)UPXNPTkAh$EPC%l^G0(M_ivMDf!)V*hsElw8Tz{k$eG@D5
zd&GH=ZAu!xot~Hu85+N(pQ)(9Ocn;J8~yeB<W2^U>bt`1<j?B7x#3!TAZz~}e&!KB
z&(iIV#WCb{|AtedcP2_1ut$=EF;tRUqyUmsL#aXd(`t~?KCu<L_^D!vzOAML*>D<_
ziW?gyN(5q+erX|AP`0G5D$8@+L4ouqg+M-szGC&~eupG_@o3vB%VhV#Y=%#^o8*EO
z(kR(F4O`<EAU9G`<k)6|+9?ZLwY%zmtk5R{VOV!^W{=bhJK1boUYHkARQ>0yb7{ur
z1HMLiNYfh`q7-4)t2M%)r%+m4P!{m4{h@>dU9TO6VNyC?^bO@XR2rHLezfV9okp(T
zX~QUKc&%VcVm3?~imERuf8pR0QtEWQL)zOxS4<|8x@W*=0>LOX{?Dhh9ER#3;U2%V
zRRj8NEck1zhf3C0K}rlVSr=fLB&X?(zXnh3N4~4WDhBs7i%ulXw<#gbB$Z~n!6X8;
zyX3nnh9$Qp*Ve^yz|-UC{VmLQeFlknaou|EgWEQA;Q$t@rNg6<8_C}J{q<J$%Lvl$
z3#dEE1GdKqww(KhNr(1v6gII-{|#EP+I22?*V_-f0yiS=6*3e*94>t-m)x{k(=pnM
zR4HYI3?R)&xc#e63(IKanh~RI&B=;s69^7pDW3BT$y`A`$bRAie6Mn74mG*ote&U}
zC=>b{w-@?JQ36ZNV8%}lU(Mi2(D`tZcBL((f|~IAY*}UpZbg)hpwYx*W5Ho4ec5)|
zewnM8<U0$_-8HFd^A3QrgG>(ZPAZAZ^<Wbs>D6d%gP@vO9B@|DRXRRi*+gnI0?Jlf
z+lOoZFUhDIVe<Qp$Q_d2ZLuGOwYpzCD40sXU`I*-;hKjD`#kjxD0E!AaEOPtu*MCF
z4%nB`2v(qH=yBUXgj~hKW6JTE;g&%7cMnL$7YeF2rpT#_rBa;tcpghPY8(*<pYs2f
zZos*wJ{6&KSiBqMM(Q3zWqnz@zHE(Aqcl;kWMYfv3W_ytyL&r9-$n_oh+R5V7VsJx
z%Vt;#O$|L#O4u0>vQ2+1?5IW<ZE^IatcSZE7V4pb<mav!PrHSE<G6kA@`jF9U`#mr
zL#8W%)JD@bM<Xcx+h64h8wUsKaFG07dpnb#ONZDeWkt-P8_lIHP3ccD%L{<K=V}5J
zgPxjp>KK&PF5iXl@DutoF-*`cWdh~19`^AKxl~2O1T4qV6koD<ANlh-iVQ(wii^F_
z2QT=531cbRY0~<y!b*+Mx};$)bL%D<EA&GZ6eo62`0cH+0NxF+YTNk1YQods1UIAi
zm?|7MuT1-P#U#pitIUC46L_x7c7276R(jj-XMS+Cr%XO|3j_y5%m-@9o0+1byc;AP
zf%29koeAF^628~nF-pidQk>B}g6Y`+|9Wo!XyO~tcA7OqS&&Xk9^i>Bb}Wu!K4LL1
zv7FQz1RpJ{bH%N*Dwodfo{=6=xUaa0bNr<QuNS_1kL|zMKdsW6hXC)fv4pF8$#~Vy
zT0?c(hqc@oD!9$tAWOX>dmKg#K_UKA?QA0Mli&gOJ($#FTXH@w+0!MRF-aRujrrr_
zuL&3)2W^Y-cYkpHO9`0CxD`+yTgk_vxA41((?`F;Pc*{ScxNqXRrb*6%&#I)t6)1F
zuYt`yE(a_qe+(1ogn-ltR2|$|+=&<&yTUQ+*0~nfM2r*YpS^u#pJmpN;shJaVk&qX
zOSR$WEKkQpKm<Vl`t9nc`n7sQEw3oCc95$gw_GSXqX_<!UN`|_GkrZ3)hHLN&n9lm
z6q2L#Oj3QbR<sI!@Y<a(NZq=oW5))5r9b`{%E?=B20#wm7|YBJK3LIsJ#f*X2=F3L
z-O{1>_<6eRb5q=Nb|CC!s9f;WdMD`8$!VrTxqYRhxyZ7*g5Tn_2!0%?W&+Fi?fBRC
zdQ>$S1O{8IOX~!CnCZ=b8>9;rA{0A-;kB*P*9d8h8ln{p&TXx|xn*lU>YKKId$YqK
zGf8!$?mCfmo(lWg`M@hoE_xTVYjqis+o~inF)LYRsKMsdQ#X|zC@<*sx_SMwX7NUL
z^-ZUL1^+31X+uXVz>+Nc-mKGmD>~Jt1vTURgGugt3+TS5F6(PE%fR`C4~g@l;-Tvl
zqN-=gvrE;Mt+n+V9mLIZX670ZBB_8^ww)Cd9xl}iF~AHD50@7E|66+QAaey({Ve(q
z199>Gi)|nCVF7sLtc~2;bc<5XxKJ(d>=`8gPA1P0_f&TxEx%r)G2XCuv&Kjj*8|o_
zLq~YwqYLe7>?|+rYDEkjPBdY)qeQM~Y9GP|j&mM4A$}YPz7q0+JQN(KZ+DAy{kkP&
z*H&LyK77&8FvM@9$Nj&3&ihW?`sISF+s(w&7^GnVkTFEVHg;siq=4DS!v%TKv|3nX
z9AuI7YCSaWni=odiG7bg&kW%q8yV$&OpQk`qRBAtzw!aJtg}7F5BMNgnUQ`iIgc1F
znAvw~p&85^rnJE#bxLQwSNy@UUeifK%2~?#tJ_Uf8XYQ~=+FzYkFX>2uRB(U5@sNN
z%LIp|L7HS5s`#I3L%51OXHc1`(A{0;C1|U{7?IYr2aQ1nmB8m37$JNl7xw$a^Eg@V
za>Rx?pv(%0D9VTQP(<&UsP@Lnje@Bp(|oZC$JsfeOy2JASpfSHN9>_KoLbX8--=;C
zzu97fSwdcb57E<}0T)A2AWvR=l^)CMeip8JSzUeip;h{xL#hnXU5})^LAo9|9T0%`
zW&F+AHwC6~4Qiopk|T%s@>3IE*MR$CB2Hh!=6x18v2`{j0MXA~l>Oq<_V5j@56AnI
ziK?6%ZJN*V3z$858+`jUSzrv}<+u5=3HRHMZeLUUqlw_NJ=K6uS9-$_ngSV(rag0^
zEam)nsJvKA;!*qVv`!zLp5Rehfoxy0-|Hx%KX9<I3%(kGM!A)<6vDY|f4(7Wzp^+#
zOxEXqI}Gqb;rRGqoT<l{Il|{^UL~gL?6eE&7DW&=KzApBh8=&l#{wLf^H^=tn2V^4
z7|`U?3PQeq1Q9R5kftzFsvvF(iueKQG;+1hs(h;vusHsOh-QEOl9Vb}TA~r@gml#m
zTko7#H5d@7sDwZq=~^+bD^t^o8E;SWIXa4#y4T-4=K1>}nph1=7^QUdTm_wZ^AOF9
zphK8~#g-MKEaw7_tb334U0PP&7sZK1Yo$(l+v+=&=XYbO4wIgf`uCwN$uH3Ylj{2`
z0)T||D?zGe_QYBNX@CTd+hs_tE^Y6XKJ)zr<ADN|<ASPqfB&|Qx%6K`ijT#2;9WHV
zcKpGgN`-_xTT@O2+S7dv6G8F{<_t){8_!cc+-J?Yy^^P5D|s_vW1SD@>Qt7EX5gP=
zE)4_`dHVQalOzHw)HmxvD^apd`qS>;6tGVyY*P|v`p)o6_|*c$%c<!$889u<Qs_f{
z3h7%b!#cu#)C9<%vRFRjs(RW<EM(J~2OQX$3Zmq91*Y3XyHP-jW<-qtJfRbak)n~M
zvEt+4ySaRs8%A6$YswY?^!1=F)+wZK6h`*H%Gfx#*mCI)ch~^j^)T4frFs)zaPVMc
zV?Nrb%S$j=z6B`b$yyBmXs1yj?`}(*$YOP_DU<Vr01t-S*r{D5#EmQITNMN}cd3js
z>G_qq#Nk$?)D1;BLfWHm3;Pr>d1UUr?}QGo7XKQj@Zd^%tqs$<xJ~IY>ajxh2tQ3N
zRKbspg%pB<he!5I(|ee$*l<%R1k{p5a`{*c0;5DpK%s<57~<ePjiVG?4+x{-4q4C2
zk0>atmz!FW<Km>k+JxLmkZ!{H(ZfC$pI^Cih-V@N#cPL{2XcFqFK5KOoO+^NUxf59
zr9xL)Ax3?rfz^~&SkhBWe2D#aD6xX-cH$b8j^UK?cBrW8o{P9shkgT?qcA|}V6hMY
zl>E7KFk<jX-0Nbxw?w`Hxhm0U=QM|hNA%v!)w@9xI8hZq9nRzLLx8!Y@L3!`?&9X`
zztR--T|i*=olUZsqjj|T&K2El1v4@U5^s)_?~&yL{CxCA6gmLj|5y=LjcCTskX0>D
zEK9>{YN_7sg~^YwzwEg9X0*DjS%{jZEZVXT;}$fWO-{wPkO8XIM3vwtpJmHw1lo1l
zstS!jZrrsASx3NZmPXyQ*KL=*WroiWoC`LDM+)p-l{>V?BuDa^r52)8!r=2|c5O@1
zKkMP4{a4IKR;;0&WPfkH)>Dj$2-@?pxMMCZWIX>1+vmqw81zD7kkz@_;%6~B;z05>
z3cYLb^pRb|@iaiZ+-fBYv~nul5vg5EgeW5QJ&$<9KlLQK+b`1?C!#FH)3@-CKd-Ph
z2Tpr<rr$ksn;?8e;z<-eBBwv$Q{o4otP|`ASKkp1Ha9=ov4y$`T}q4hhwS8xIhuUB
zec3SSy4QE{6lN{du`igLp=Yc8Ia_$U5R)d*!5WU*E%~pP02zH=b={#$1=Q**jPw@h
z6@O`2fOtjX!i-MI6F(Q&Jl&q;w;aMNdJYG##c$jmjB}i=ImH>pVNfWk|IFXHjHBbH
zW20!^^b`{GI5)<##f?~t|Mxyh$<EA4L*77Yi0AnKWr2ZuqsS1(0^-G;5kf(JvXV*?
J)d1t*{{tHm(mwzI

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 07a4d35..e5a50a8 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,7 +43,6 @@ Programmer's Guide
     mbuf_lib
     poll_mode_drv
     cryptodev_lib
-    ivshmem_lib
     link_bonding_poll_mode_drv_lib
     timer_lib
     hash_lib
diff --git a/doc/guides/prog_guide/ivshmem_lib.rst b/doc/guides/prog_guide/ivshmem_lib.rst
deleted file mode 100644
index b8a32e4..0000000
--- a/doc/guides/prog_guide/ivshmem_lib.rst
+++ /dev/null
@@ -1,160 +0,0 @@
-..  BSD LICENSE
-    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-    All rights reserved.
-
-    Redistribution and use in source and binary forms, with or without
-    modification, are permitted provided that the following conditions
-    are met:
-
-    * Redistributions of source code must retain the above copyright
-    notice, this list of conditions and the following disclaimer.
-    * Redistributions in binary form must reproduce the above copyright
-    notice, this list of conditions and the following disclaimer in
-    the documentation and/or other materials provided with the
-    distribution.
-    * Neither the name of Intel Corporation nor the names of its
-    contributors may be used to endorse or promote products derived
-    from this software without specific prior written permission.
-
-    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-IVSHMEM Library
-===============
-
-The DPDK IVSHMEM library facilitates fast zero-copy data sharing among virtual machines
-(host-to-guest or guest-to-guest) by means of QEMU's IVSHMEM mechanism.
-
-The library works by providing a command line for QEMU to map several hugepages into a single IVSHMEM device.
-For the guest to know what is inside any given IVSHMEM device
-(and to distinguish between DPDK and non-DPDK IVSHMEM devices),
-a metadata file is also mapped into the IVSHMEM segment.
-No work needs to be done by the guest application to map IVSHMEM devices into memory;
-they are automatically recognized by the DPDK Environment Abstraction Layer (EAL).
-
-A typical DPDK IVSHMEM use case looks like the following.
-
-
-.. figure:: img/ivshmem.*
-
-   Typical Ivshmem use case
-
-
-The same could work with several virtual machines, providing host-to-VM or VM-to-VM communication.
-The maximum number of metadata files is 32 (by default) and each metadata file can contain different (or even the same) hugepages.
-The only constraint is that each VM has to have access to the memory it is sharing with other entities (be it host or another VM).
-For example, if the user wants to share the same memzone across two VMs, each VM must have that memzone in its metadata file.
-
-IVSHMEM Library API Overview
------------------------------
-
-The following is a simple guide to using the IVSHMEM Library API:
-
-*   Call rte_ivshmem_metadata_create() to create a new metadata file.
-    The metadata name is used to distinguish between multiple metadata files.
-
-*   Populate each metadata file with DPDK data structures.
-    This can be done using the following API calls:
-
-    *   rte_ivshmem_metadata_add_memzone() to add rte_memzone to metadata file
-
-    *   rte_ivshmem_metadata_add_ring() to add rte_ring to metadata file
-
-    *   rte_ivshmem_metadata_add_mempool() to add rte_mempool to metadata file
-
-*   Finally, call rte_ivshmem_metadata_cmdline_generate() to generate the command line for QEMU.
-    Multiple metadata files (and thus multiple command lines) can be supplied to a single VM.
-
-.. note::
-
-    Only data structures fully residing in DPDK hugepage memory work correctly.
-    Data structures created by malloc(), mmap(),
-    or otherwise backed by non-DPDK memory are not supported and may cause undefined behavior or even a segmentation fault.
-    Specifically, because the memzone field in an rte_ring refers to a memzone structure residing in local memory,
-    accessing the memzone field in a shared rte_ring will cause an immediate segmentation fault.
-
-IVSHMEM Environment Configuration
----------------------------------
-
-The steps needed to successfully run IVSHMEM applications are the following:
-
-*   Compile a special version of QEMU from sources.
-
-    The source code can be found on the QEMU website (currently, version 1.4.x is supported, but version 1.5.x is also known to work);
-    however, the source code will need to be patched to support using regular files as the IVSHMEM memory backend.
-    The patch is not included in the DPDK package,
-    but is available on the `Intel® DPDK-vswitch project webpage <https://01.org/packet-processing/intel%C2%AE-ovdk>`_
-    (either separately or in a DPDK vSwitch package).
-
-*   Enable IVSHMEM library in the DPDK build configuration.
-
-    In the default configuration, the IVSHMEM library is not compiled. To compile the IVSHMEM library,
-    one has to either use one of the provided IVSHMEM targets
-    (for example, x86_64-ivshmem-linuxapp-gcc),
-    or set CONFIG_RTE_LIBRTE_IVSHMEM to "y" in the build configuration.
-
-*   Set up hugepage memory on the virtual machine.
-
-    The guest applications run as regular DPDK (primary) processes and thus need their own hugepage memory set up inside the VM.
-    The process is identical to the one described in the *DPDK Getting Started Guide*.
-
-Best Practices for Writing IVSHMEM Applications
------------------------------------------------
-
-When considering the use of IVSHMEM for sharing memory, security implications need to be carefully evaluated.
-IVSHMEM is not suitable for untrusted guests, as IVSHMEM is essentially a window into the host process's memory.
-This also has implications for multiple-VM scenarios.
-While the IVSHMEM library tries to share as little memory as possible,
-it is quite probable that data designated for one VM might also be present in an IVSHMEM device designated for another VM.
-Consequently, any shared memory corruption will affect both the host and all VMs sharing that particular memory.
-
-IVSHMEM applications essentially behave like multi-process applications,
-so it is important to serialize access to shared data and to ensure thread safety.
-DPDK ring structures are already thread-safe; however,
-any custom data structures that the user might need would also have to be thread-safe.
-
-Similar to regular DPDK multi-process applications,
-it is not recommended to use function pointers as functions might have different memory addresses in different processes.
-
-It is best to avoid freeing the rte_mbuf structure on a different machine from where it was allocated,
-that is, if the mbuf was allocated on the host, the host should free it.
-Consequently, any packet transmission and reception should also happen on the same machine (whether virtual or physical).
-Failing to do so may lead to data corruption in the mempool cache.
-
-Despite the IVSHMEM mechanism being zero-copy and having good performance,
-it is still desirable to do processing in batches and follow other procedures described in
-:ref:`Performance Optimization <Performance_Optimization>`.
-
-Best Practices for Running IVSHMEM Applications
------------------------------------------------
-
-For performance reasons,
-it is best to pin host processes and QEMU processes to different cores so that they do not interfere with each other.
-If NUMA support is enabled, it is also desirable to keep the host process's hugepage memory and the QEMU process on the same NUMA node.
-
-For the best performance across all NUMA nodes, each QEMU core should be pinned to a host CPU core on the appropriate NUMA node.
-QEMU's virtual NUMA nodes should also be set up to correspond to physical NUMA nodes.
-More on how to set up DPDK and QEMU NUMA support can be found in *DPDK Getting Started Guide* and
-`QEMU documentation <http://qemu.weilnetz.de/qemu-doc.html>`_ respectively.
-A script called cpu_layout.py is provided with the DPDK package (in the tools directory)
-that can be used to identify which CPU cores correspond to which NUMA node.
-
-The QEMU IVSHMEM command line creation should be considered the last step before starting the virtual machine.
-Currently, there is no hot plug support for QEMU IVSHMEM devices,
-so one cannot add additional memory to an IVSHMEM device once it has been created.
-Therefore, the correct sequence to run an IVSHMEM application is to run the host application first,
-obtain the command lines for each IVSHMEM device and then run all QEMU instances with guest applications afterwards.
-
-It is important to note that once QEMU is started, it holds on to the hugepages it uses for IVSHMEM devices.
-As a result, if the user wishes to shut down or restart the IVSHMEM host application,
-it is not enough to simply shut the application down.
-The virtual machine must also be shut down (if not, it will hold onto outdated host data).
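
For reference, here is a minimal sketch of how the metadata API outlined
above was typically driven from the host side. The signatures are
reconstructed from the removed rte_ivshmem.h and should be treated as
assumptions; "md1" and setup_ivshmem_share() are names invented for this
illustration:

#include <stdio.h>

#include <rte_ring.h>
#include <rte_ivshmem.h>	/* removed by this series; illustration only */

/* Share one hugepage-backed ring with a guest and print the QEMU
 * command line fragment that maps it into an IVSHMEM device. */
static int
setup_ivshmem_share(const struct rte_ring *r)
{
	char cmdline[1024];

	/* create a new metadata file, identified by its name */
	if (rte_ivshmem_metadata_create("md1") < 0)
		return -1;

	/* add the DPDK ring to that metadata file */
	if (rte_ivshmem_metadata_add_ring(r, "md1") < 0)
		return -1;

	/* generate the QEMU command line for this metadata file */
	if (rte_ivshmem_metadata_cmdline_generate(cmdline,
			sizeof(cmdline), "md1") < 0)
		return -1;

	printf("QEMU cmdline fragment: %s\n", cmdline);
	return 0;
}

On the guest side no matching call is needed: as the overview above notes,
the EAL recognizes DPDK IVSHMEM devices automatically, so the guest would
typically just look the shared ring up by name, e.g. with rte_ring_lookup().
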
diff --git a/doc/guides/prog_guide/source_org.rst b/doc/guides/prog_guide/source_org.rst
index 0c06d47..d9c140f 100644
--- a/doc/guides/prog_guide/source_org.rst
+++ b/doc/guides/prog_guide/source_org.rst
@@ -70,7 +70,6 @@ The lib directory contains::
     +-- librte_ether        # Generic interface to poll mode driver
     +-- librte_hash         # Hash library
     +-- librte_ip_frag      # IP fragmentation library
-    +-- librte_ivshmem      # QEMU IVSHMEM library
     +-- librte_kni          # Kernel NIC interface
     +-- librte_kvargs       # Argument parsing library
     +-- librte_lpm          # Longest prefix match library
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d2dc4a9..be03262 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -54,9 +54,6 @@ Deprecation Notices
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
 
-* The ``rte_ivshmem`` feature (including library and EAL code) will be removed
-  in 16.11 because it has some design issues which are not planned to be fixed.
-
 * The vhost-cuse will be removed in 16.11. Since v2.1, a large majority of
   development effort has gone to vhost-user, such as multiple-queue, live
   migration, reconnect etc. Therefore, vhost-user should be used instead.
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index a6e3307..f7a2ceb 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -94,6 +94,9 @@ API Changes
 
    This section is a comment. Make sure to start the actual text at the margin.
 
+* The ``rte_ivshmem`` feature (including library and EAL code) has been removed
+  in 16.11 because it had some design issues which were not planned to be fixed.
+
 
 ABI Changes
 -----------
diff --git a/examples/Makefile b/examples/Makefile
index 18b41b9..d49c7f2 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -61,7 +61,6 @@ ifneq ($(PQOS_INSTALL_PATH),)
 DIRS-y += l2fwd-cat
 endif
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += l2fwd-crypto
-DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += l2fwd-jobstats
 DIRS-y += l2fwd-keepalive
 DIRS-y += l2fwd-keepalive/ka-agent
diff --git a/examples/l2fwd-ivshmem/Makefile b/examples/l2fwd-ivshmem/Makefile
deleted file mode 100644
index 5f1d172..0000000
--- a/examples/l2fwd-ivshmem/Makefile
+++ /dev/null
@@ -1,43 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-ifeq ($(RTE_SDK),)
-$(error "Please define RTE_SDK environment variable")
-endif
-
-# Default target, can be overridden by command line or environment
-RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += host guest
-
-include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/l2fwd-ivshmem/guest/Makefile b/examples/l2fwd-ivshmem/guest/Makefile
deleted file mode 100644
index 3ca73b4..0000000
--- a/examples/l2fwd-ivshmem/guest/Makefile
+++ /dev/null
@@ -1,50 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-ifeq ($(RTE_SDK),)
-$(error "Please define RTE_SDK environment variable")
-endif
-
-# Default target, can be overridden by command line or environment
-RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-# binary name
-APP = guest
-
-# all source are stored in SRCS-y
-SRCS-y := guest.c
-
-CFLAGS += -O3
-CFLAGS += $(WERROR_FLAGS)
-
-include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/l2fwd-ivshmem/guest/guest.c b/examples/l2fwd-ivshmem/guest/guest.c
deleted file mode 100644
index 7c49521..0000000
--- a/examples/l2fwd-ivshmem/guest/guest.c
+++ /dev/null
@@ -1,452 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <stdint.h>
-#include <unistd.h>
-#include <getopt.h>
-#include <signal.h>
-#include <sys/mman.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/queue.h>
-#include <sys/file.h>
-#include <unistd.h>
-#include <limits.h>
-#include <errno.h>
-#include <sys/ioctl.h>
-#include <sys/time.h>
-
-#include <rte_common.h>
-#include <rte_eal_memconfig.h>
-#include <rte_log.h>
-#include <rte_memory.h>
-#include <rte_memcpy.h>
-#include <rte_memzone.h>
-#include <rte_eal.h>
-#include <rte_per_lcore.h>
-#include <rte_launch.h>
-#include <rte_atomic.h>
-#include <rte_cycles.h>
-#include <rte_prefetch.h>
-#include <rte_lcore.h>
-#include <rte_per_lcore.h>
-#include <rte_branch_prediction.h>
-#include <rte_interrupts.h>
-#include <rte_pci.h>
-#include <rte_random.h>
-#include <rte_debug.h>
-#include <rte_ether.h>
-#include <rte_ethdev.h>
-#include <rte_ring.h>
-#include <rte_mempool.h>
-#include <rte_mbuf.h>
-#include <rte_ivshmem.h>
-
-#include "../include/common.h"
-
-#define MAX_RX_QUEUE_PER_LCORE 16
-#define MAX_TX_QUEUE_PER_PORT 16
-struct lcore_queue_conf {
-	unsigned n_rx_port;
-	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table rx_mbufs[RTE_MAX_ETHPORTS];
-	struct vm_port_param * port_param[MAX_RX_QUEUE_PER_LCORE];
-} __rte_cache_aligned;
-static struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
-
-/* Print out statistics on packets dropped */
-static void
-print_stats(void)
-{
-	uint64_t total_packets_dropped, total_packets_tx, total_packets_rx;
-	unsigned portid;
-
-	total_packets_dropped = 0;
-	total_packets_tx = 0;
-	total_packets_rx = 0;
-
-	const char clr[] = { 27, '[', '2', 'J', '\0' };
-	const char topLeft[] = { 27, '[', '1', ';', '1', 'H','\0' };
-
-		/* Clear screen and move to top left */
-	printf("%s%s", clr, topLeft);
-
-	printf("\nPort statistics ====================================");
-
-	for (portid = 0; portid < ctrl->nb_ports; portid++) {
-		/* skip ports that are not enabled */
-		printf("\nStatistics for port %u ------------------------------"
-			   "\nPackets sent: %24"PRIu64
-			   "\nPackets received: %20"PRIu64
-			   "\nPackets dropped: %21"PRIu64,
-			   portid,
-			   ctrl->vm_ports[portid].stats.tx,
-			   ctrl->vm_ports[portid].stats.rx,
-			   ctrl->vm_ports[portid].stats.dropped);
-
-		total_packets_dropped += ctrl->vm_ports[portid].stats.dropped;
-		total_packets_tx += ctrl->vm_ports[portid].stats.tx;
-		total_packets_rx += ctrl->vm_ports[portid].stats.rx;
-	}
-	printf("\nAggregate statistics ==============================="
-		   "\nTotal packets sent: %18"PRIu64
-		   "\nTotal packets received: %14"PRIu64
-		   "\nTotal packets dropped: %15"PRIu64,
-		   total_packets_tx,
-		   total_packets_rx,
-		   total_packets_dropped);
-	printf("\n====================================================\n");
-}
-
-/* display usage */
-static void
-l2fwd_ivshmem_usage(const char *prgname)
-{
-	printf("%s [EAL options] -- [-q NQ -T PERIOD]\n"
-		   "  -q NQ: number of queue (=ports) per lcore (default is 1)\n"
-		   "  -T PERIOD: statistics will be refreshed each PERIOD seconds (0 to disable, 10 default, 86400 maximum)\n",
-	       prgname);
-}
-
-static unsigned int
-l2fwd_ivshmem_parse_nqueue(const char *q_arg)
-{
-	char *end = NULL;
-	unsigned long n;
-
-	/* parse hexadecimal string */
-	n = strtoul(q_arg, &end, 10);
-	if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0'))
-		return 0;
-	if (n == 0)
-		return 0;
-	if (n >= MAX_RX_QUEUE_PER_LCORE)
-		return 0;
-
-	return n;
-}
-
-static int
-l2fwd_ivshmem_parse_timer_period(const char *q_arg)
-{
-	char *end = NULL;
-	int n;
-
-	/* parse number string */
-	n = strtol(q_arg, &end, 10);
-	if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0'))
-		return -1;
-	if (n >= MAX_TIMER_PERIOD)
-		return -1;
-
-	return n;
-}
-
-/* Parse the argument given in the command line of the application */
-static int
-l2fwd_ivshmem_parse_args(int argc, char **argv)
-{
-	int opt, ret;
-	char **argvopt;
-	int option_index;
-	char *prgname = argv[0];
-	static struct option lgopts[] = {
-		{NULL, 0, 0, 0}
-	};
-
-	argvopt = argv;
-
-	while ((opt = getopt_long(argc, argvopt, "q:p:T:",
-				  lgopts, &option_index)) != EOF) {
-
-		switch (opt) {
-
-		/* nqueue */
-		case 'q':
-			l2fwd_ivshmem_rx_queue_per_lcore = l2fwd_ivshmem_parse_nqueue(optarg);
-			if (l2fwd_ivshmem_rx_queue_per_lcore == 0) {
-				printf("invalid queue number\n");
-				l2fwd_ivshmem_usage(prgname);
-				return -1;
-			}
-			break;
-
-		/* timer period */
-		case 'T':
-			timer_period = l2fwd_ivshmem_parse_timer_period(optarg) * 1000 * TIMER_MILLISECOND;
-			if (timer_period < 0) {
-				printf("invalid timer period\n");
-				l2fwd_ivshmem_usage(prgname);
-				return -1;
-			}
-			break;
-
-		/* long options */
-		case 0:
-			l2fwd_ivshmem_usage(prgname);
-			return -1;
-
-		default:
-			l2fwd_ivshmem_usage(prgname);
-			return -1;
-		}
-	}
-
-	if (optind >= 0)
-		argv[optind-1] = prgname;
-
-	ret = optind-1;
-	optind = 0; /* reset getopt lib */
-	return ret;
-}
-
-/*
- * this loop is getting packets from RX rings of each port, and puts them
- * into TX rings of destination ports.
- */
-static void
-fwd_loop(void)
-{
-
-	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
-	struct rte_mbuf **m_table;
-	struct rte_mbuf *m;
-	struct rte_ring *rx, *tx;
-	unsigned lcore_id, len;
-	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
-	unsigned i, j, portid, nb_rx;
-	struct lcore_queue_conf *qconf;
-	struct ether_hdr *eth;
-	void *tmp;
-
-	prev_tsc = 0;
-	timer_tsc = 0;
-
-	lcore_id = rte_lcore_id();
-	qconf = &lcore_queue_conf[lcore_id];
-
-	if (qconf->n_rx_port == 0) {
-		RTE_LOG(INFO, L2FWD_IVSHMEM, "lcore %u has nothing to do\n", lcore_id);
-		return;
-	}
-
-	RTE_LOG(INFO, L2FWD_IVSHMEM, "entering main loop on lcore %u\n", lcore_id);
-
-	for (i = 0; i < qconf->n_rx_port; i++) {
-		portid = qconf->rx_port_list[i];
-		RTE_LOG(INFO, L2FWD_IVSHMEM, " -- lcoreid=%u portid=%u\n", lcore_id,
-			portid);
-	}
-
-	while (ctrl->state == STATE_FWD) {
-		cur_tsc = rte_rdtsc();
-
-		diff_tsc = cur_tsc - prev_tsc;
-
-		/*
-		 * Read packet from RX queues and send it to TX queues
-		 */
-		for (i = 0; i < qconf->n_rx_port; i++) {
-
-			portid = qconf->rx_port_list[i];
-
-			len = qconf->rx_mbufs[portid].len;
-
-			rx = ctrl->vm_ports[portid].rx_ring;
-			tx = ctrl->vm_ports[portid].dst->tx_ring;
-
-			m_table = qconf->rx_mbufs[portid].m_table;
-
-			/* if we have something in the queue, try and transmit it down */
-			if (len != 0) {
-
-				/* if we succeed in sending the packets down, mark queue as free */
-				if (rte_ring_enqueue_bulk(tx, (void**) m_table, len) == 0) {
-					ctrl->vm_ports[portid].stats.tx += len;
-					qconf->rx_mbufs[portid].len = 0;
-					len = 0;
-				}
-			}
-
-			nb_rx = rte_ring_count(rx);
-
-			nb_rx = RTE_MIN(nb_rx, (unsigned) MAX_PKT_BURST);
-
-			if (nb_rx == 0)
-				continue;
-
-			/* if we can get packets into the m_table */
-			if (nb_rx < (RTE_DIM(qconf->rx_mbufs[portid].m_table) - len)) {
-
-				/* this situation cannot exist, so if we fail to dequeue, that
-				 * means something went horribly wrong, hence the failure. */
-				if (rte_ring_dequeue_bulk(rx, (void**) pkts_burst, nb_rx) < 0) {
-					ctrl->state = STATE_FAIL;
-					return;
-				}
-
-				ctrl->vm_ports[portid].stats.rx += nb_rx;
-
-				/* put packets into the queue */
-				for (j = 0; j < nb_rx; j++) {
-					m = pkts_burst[j];
-
-					rte_prefetch0(rte_pktmbuf_mtod(m, void *));
-
-					m_table[len + j] = m;
-
-					eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
-
-					/* 02:00:00:00:00:xx */
-					tmp = &eth->d_addr.addr_bytes[0];
-					*((uint64_t *)tmp) = 0x000000000002 + ((uint64_t)portid << 40);
-
-					/* src addr */
-					ether_addr_copy(&ctrl->vm_ports[portid].dst->ethaddr,
-							&eth->s_addr);
-				}
-				qconf->rx_mbufs[portid].len += nb_rx;
-
-			}
-
-		}
-
-		/* if timer is enabled */
-		if (timer_period > 0) {
-
-			/* advance the timer */
-			timer_tsc += diff_tsc;
-
-			/* if timer has reached its timeout */
-			if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-
-				/* do this only on master core */
-				if (lcore_id == rte_get_master_lcore()) {
-					print_stats();
-					/* reset the timer */
-					timer_tsc = 0;
-				}
-			}
-		}
-
-		prev_tsc = cur_tsc;
-	}
-}
-
-static int
-l2fwd_ivshmem_launch_one_lcore(__attribute__((unused)) void *dummy)
-{
-	fwd_loop();
-	return 0;
-}
-
-int
-main(int argc, char **argv)
-{
-	struct lcore_queue_conf *qconf;
-	const struct rte_memzone * mz;
-	int ret;
-	uint8_t portid;
-	unsigned rx_lcore_id, lcore_id;
-
-	/* init EAL */
-	ret = rte_eal_init(argc, argv);
-	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
-	argc -= ret;
-	argv += ret;
-
-	/* parse application arguments (after the EAL ones) */
-	ret = l2fwd_ivshmem_parse_args(argc, argv);
-	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid l2fwd-ivshmem arguments\n");
-
-	/* find control structure */
-	mz = rte_memzone_lookup(CTRL_MZ_NAME);
-	if (mz == NULL)
-		rte_exit(EXIT_FAILURE, "Cannot find control memzone\n");
-
-	ctrl = (struct ivshmem_ctrl*) mz->addr;
-
-	/* lock the ctrl so that we don't have conflicts with anything else */
-	rte_spinlock_lock(&ctrl->lock);
-
-	if (ctrl->state == STATE_FWD)
-		rte_exit(EXIT_FAILURE, "Forwarding already started!\n");
-
-	rx_lcore_id = 0;
-	qconf = NULL;
-
-	/* Initialize the port/queue configuration of each logical core */
-	for (portid = 0; portid < ctrl->nb_ports; portid++) {
-
-		/* get the lcore_id for this port */
-		while (rte_lcore_is_enabled(rx_lcore_id) == 0 ||
-			   lcore_queue_conf[rx_lcore_id].n_rx_port ==
-			   l2fwd_ivshmem_rx_queue_per_lcore) {
-			rx_lcore_id++;
-			if (rx_lcore_id >= RTE_MAX_LCORE)
-				rte_exit(EXIT_FAILURE, "Not enough cores\n");
-		}
-
-		if (qconf != &lcore_queue_conf[rx_lcore_id])
-			/* Assigned a new logical core in the loop above. */
-			qconf = &lcore_queue_conf[rx_lcore_id];
-
-		qconf->rx_port_list[qconf->n_rx_port] = portid;
-		qconf->port_param[qconf->n_rx_port] = &ctrl->vm_ports[portid];
-		qconf->n_rx_port++;
-
-		printf("Lcore %u: RX port %u\n", rx_lcore_id, (unsigned) portid);
-	}
-
-	sigsetup();
-
-	/* indicate that we are ready to forward */
-	ctrl->state = STATE_FWD;
-
-	/* unlock */
-	rte_spinlock_unlock(&ctrl->lock);
-
-	/* launch per-lcore init on every lcore */
-	rte_eal_mp_remote_launch(l2fwd_ivshmem_launch_one_lcore, NULL, CALL_MASTER);
-	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (rte_eal_wait_lcore(lcore_id) < 0)
-			return -1;
-	}
-
-	return 0;
-}
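
For context, the guest side removed above is a pure ring-to-ring forwarder:
each lcore pulls bursts from the shared RX ring of its assigned ports,
rewrites the Ethernet header, and pushes them to the destination port's TX
ring. A minimal sketch of that per-port step (illustrative only: the helper
name fwd_one_port() is made up, MAX_PKT_BURST comes from the example's
common.h, and the pre-17.05 ring API, whose bulk calls return 0 on success,
is assumed):

/* Illustrative sketch: forward one burst from a shared RX ring to the
 * peer port's TX ring, rewriting the destination MAC the way the
 * removed example did (02:00:00:00:00:xx). */
static void
fwd_one_port(struct rte_ring *rx, struct rte_ring *tx,
		const struct ether_addr *src_mac, unsigned portid)
{
	struct rte_mbuf *burst[MAX_PKT_BURST];
	struct ether_hdr *eth;
	void *tmp;
	unsigned i, n;

	n = RTE_MIN(rte_ring_count(rx), (unsigned) MAX_PKT_BURST);
	if (n == 0)
		return;

	/* all-or-nothing bulk dequeue; with a single consumer and n
	 * bounded by rte_ring_count(), this is expected to succeed */
	if (rte_ring_dequeue_bulk(rx, (void **) burst, n) < 0)
		return;

	for (i = 0; i < n; i++) {
		eth = rte_pktmbuf_mtod(burst[i], struct ether_hdr *);
		tmp = &eth->d_addr.addr_bytes[0];
		*((uint64_t *) tmp) = 0x000000000002 +
				((uint64_t) portid << 40);
		ether_addr_copy(src_mac, &eth->s_addr);
	}

	/* the removed loop kept an unsent burst queued and retried on
	 * the next iteration instead of dropping on a full ring */
	rte_ring_enqueue_bulk(tx, (void **) burst, n);
}
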
diff --git a/examples/l2fwd-ivshmem/host/Makefile b/examples/l2fwd-ivshmem/host/Makefile
deleted file mode 100644
index f91419e..0000000
--- a/examples/l2fwd-ivshmem/host/Makefile
+++ /dev/null
@@ -1,50 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-ifeq ($(RTE_SDK),)
-$(error "Please define RTE_SDK environment variable")
-endif
-
-# Default target, can be overriden by command line or environment
-RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-# binary name
-APP = host
-
-# all source are stored in SRCS-y
-SRCS-y := host.c
-
-CFLAGS += -O3
-CFLAGS += $(WERROR_FLAGS)
-
-include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/l2fwd-ivshmem/host/host.c b/examples/l2fwd-ivshmem/host/host.c
deleted file mode 100644
index da7b00d..0000000
--- a/examples/l2fwd-ivshmem/host/host.c
+++ /dev/null
@@ -1,895 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <string.h>
-#include <limits.h>
-#include <inttypes.h>
-#include <getopt.h>
-#include <signal.h>
-
-#include <rte_eal.h>
-#include <rte_cycles.h>
-#include <rte_eal_memconfig.h>
-#include <rte_debug.h>
-#include <rte_ether.h>
-#include <rte_ethdev.h>
-#include <rte_string_fns.h>
-#include <rte_ivshmem.h>
-#include <rte_ring.h>
-#include <rte_mempool.h>
-#include <rte_mbuf.h>
-
-#include "../include/common.h"
-
-/*
- * Configurable number of RX/TX ring descriptors
- */
-#define RTE_TEST_RX_DESC_DEFAULT 128
-#define RTE_TEST_TX_DESC_DEFAULT 512
-static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
-static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
-
-#define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
-
-/* mask of enabled ports */
-static uint32_t l2fwd_ivshmem_enabled_port_mask = 0;
-
-static struct ether_addr l2fwd_ivshmem_ports_eth_addr[RTE_MAX_ETHPORTS];
-
-#define NB_MBUF   8192
-
-#define MAX_RX_QUEUE_PER_LCORE 16
-#define MAX_TX_QUEUE_PER_PORT 16
-struct lcore_queue_conf {
-	unsigned n_rx_port;
-	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-	struct vm_port_param * port_param[MAX_RX_QUEUE_PER_LCORE];
-	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
-	struct mbuf_table rx_mbufs[RTE_MAX_ETHPORTS];
-} __rte_cache_aligned;
-static struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];
-
-static const struct rte_eth_conf port_conf = {
-	.rxmode = {
-		.split_hdr_size = 0,
-		.header_split   = 0, /**< Header Split disabled */
-		.hw_ip_checksum = 0, /**< IP checksum offload disabled */
-		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
-		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
-		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
-	},
-	.txmode = {
-		.mq_mode = ETH_MQ_TX_NONE,
-	},
-};
-
-#define METADATA_NAME "l2fwd_ivshmem"
-#define CMDLINE_OPT_FWD_CONF "fwd-conf"
-
-#define QEMU_CMD_FMT "/tmp/ivshmem_qemu_cmdline_%s"
-
-struct port_statistics port_statistics[RTE_MAX_ETHPORTS];
-
-struct rte_mempool * l2fwd_ivshmem_pktmbuf_pool = NULL;
-
-/* Print out statistics on packets dropped */
-static void
-print_stats(void)
-{
-	uint64_t total_packets_dropped, total_packets_tx, total_packets_rx;
-	uint64_t total_vm_packets_dropped = 0;
-	uint64_t total_vm_packets_tx, total_vm_packets_rx;
-	unsigned portid;
-
-	total_packets_dropped = 0;
-	total_packets_tx = 0;
-	total_packets_rx = 0;
-	total_vm_packets_tx = 0;
-	total_vm_packets_rx = 0;
-
-	const char clr[] = { 27, '[', '2', 'J', '\0' };
-	const char topLeft[] = { 27, '[', '1', ';', '1', 'H','\0' };
-
-		/* Clear screen and move to top left */
-	printf("%s%s", clr, topLeft);
-
-	printf("\nPort statistics ====================================");
-
-	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-		/* skip disabled ports */
-		if ((l2fwd_ivshmem_enabled_port_mask & (1 << portid)) == 0)
-			continue;
-		printf("\nStatistics for port %u ------------------------------"
-			   "\nPackets sent: %24"PRIu64
-			   "\nPackets received: %20"PRIu64
-			   "\nPackets dropped: %21"PRIu64,
-			   portid,
-			   port_statistics[portid].tx,
-			   port_statistics[portid].rx,
-			   port_statistics[portid].dropped);
-
-		total_packets_dropped += port_statistics[portid].dropped;
-		total_packets_tx += port_statistics[portid].tx;
-		total_packets_rx += port_statistics[portid].rx;
-	}
-
-	printf("\nVM statistics ======================================");
-	for (portid = 0; portid < ctrl->nb_ports; portid++) {
-		printf("\nStatistics for port %u ------------------------------"
-			   "\nPackets sent: %24"PRIu64
-			   "\nPackets received: %20"PRIu64,
-			   portid,
-			   ctrl->vm_ports[portid].stats.tx,
-			   ctrl->vm_ports[portid].stats.rx);
-
-		total_vm_packets_dropped += ctrl->vm_ports[portid].stats.dropped;
-		total_vm_packets_tx += ctrl->vm_ports[portid].stats.tx;
-		total_vm_packets_rx += ctrl->vm_ports[portid].stats.rx;
-	}
-	printf("\nAggregate statistics ==============================="
-			   "\nTotal packets sent: %18"PRIu64
-			   "\nTotal packets received: %14"PRIu64
-			   "\nTotal packets dropped: %15"PRIu64
-			   "\nTotal VM packets sent: %15"PRIu64
-			   "\nTotal VM packets received: %11"PRIu64,
-			   total_packets_tx,
-			   total_packets_rx,
-			   total_packets_dropped,
-			   total_vm_packets_tx,
-			   total_vm_packets_rx);
-	printf("\n====================================================\n");
-}
-
-static int
-print_to_file(const char *cmdline, const char *config_name)
-{
-	FILE *file;
-	char path[PATH_MAX];
-
-	snprintf(path, sizeof(path), QEMU_CMD_FMT, config_name);
-	file = fopen(path, "w");
-	if (file == NULL) {
-		RTE_LOG(ERR, L2FWD_IVSHMEM, "Could not open '%s' \n", path);
-		return -1;
-	}
-
-	RTE_LOG(DEBUG, L2FWD_IVSHMEM, "QEMU command line for config '%s': %s \n",
-			config_name, cmdline);
-
-	fprintf(file, "%s\n", cmdline);
-	fclose(file);
-	return 0;
-}
-
-static int
-generate_ivshmem_cmdline(const char *config_name)
-{
-	char cmdline[PATH_MAX];
-	if (rte_ivshmem_metadata_cmdline_generate(cmdline, sizeof(cmdline),
-			config_name) < 0)
-		return -1;
-
-	if (print_to_file(cmdline, config_name) < 0)
-		return -1;
-
-	rte_ivshmem_metadata_dump(stdout, config_name);
-	return 0;
-}
-
-/* display usage */
-static void
-l2fwd_ivshmem_usage(const char *prgname)
-{
-	printf("%s [EAL options] -- -p PORTMASK [-q NQ -T PERIOD]\n"
-		   "  -p PORTMASK: hexadecimal bitmask of ports to configure\n"
-		   "  -q NQ: number of queue (=ports) per lcore (default is 1)\n"
-		   "  -T PERIOD: statistics will be refreshed each PERIOD seconds "
-		       "(0 to disable, 10 default, 86400 maximum)\n",
-	       prgname);
-}
-
-static unsigned int
-l2fwd_ivshmem_parse_nqueue(const char *q_arg)
-{
-	char *end = NULL;
-	unsigned long n;
-
-	/* parse hexadecimal string */
-	n = strtoul(q_arg, &end, 10);
-	if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0'))
-		return 0;
-	if (n == 0)
-		return 0;
-	if (n >= MAX_RX_QUEUE_PER_LCORE)
-		return 0;
-
-	return n;
-}
-
-static int
-l2fwd_ivshmem_parse_portmask(const char *portmask)
-{
-	char *end = NULL;
-	unsigned long pm;
-
-	/* parse hexadecimal string */
-	pm = strtoul(portmask, &end, 16);
-	if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0'))
-		return -1;
-
-	if (pm == 0)
-		return -1;
-
-	return pm;
-}
-
-static int
-l2fwd_ivshmem_parse_timer_period(const char *q_arg)
-{
-	char *end = NULL;
-	int n;
-
-	/* parse number string */
-	n = strtol(q_arg, &end, 10);
-	if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0'))
-		return -1;
-	if (n >= MAX_TIMER_PERIOD)
-		return -1;
-
-	return n;
-}
-
-/* Parse the argument given in the command line of the application */
-static int
-l2fwd_ivshmem_parse_args(int argc, char **argv)
-{
-	int opt, ret;
-	char **argvopt;
-	int option_index;
-	char *prgname = argv[0];
-	static struct option lgopts[] = {
-			{CMDLINE_OPT_FWD_CONF, 1, 0, 0},
-		{NULL, 0, 0, 0}
-	};
-
-	argvopt = argv;
-
-	while ((opt = getopt_long(argc, argvopt, "q:p:T:",
-				  lgopts, &option_index)) != EOF) {
-
-		switch (opt) {
-		/* portmask */
-		case 'p':
-			l2fwd_ivshmem_enabled_port_mask = l2fwd_ivshmem_parse_portmask(optarg);
-			if (l2fwd_ivshmem_enabled_port_mask == 0) {
-				printf("invalid portmask\n");
-				l2fwd_ivshmem_usage(prgname);
-				return -1;
-			}
-			break;
-
-		/* nqueue */
-		case 'q':
-			l2fwd_ivshmem_rx_queue_per_lcore = l2fwd_ivshmem_parse_nqueue(optarg);
-			if (l2fwd_ivshmem_rx_queue_per_lcore == 0) {
-				printf("invalid queue number\n");
-				l2fwd_ivshmem_usage(prgname);
-				return -1;
-			}
-			break;
-
-		/* timer period */
-		case 'T':
-			timer_period = l2fwd_ivshmem_parse_timer_period(optarg) * 1000 * TIMER_MILLISECOND;
-			if (timer_period < 0) {
-				printf("invalid timer period\n");
-				l2fwd_ivshmem_usage(prgname);
-				return -1;
-			}
-			break;
-
-		/* long options */
-		case 0:
-			l2fwd_ivshmem_usage(prgname);
-			return -1;
-
-		default:
-			l2fwd_ivshmem_usage(prgname);
-			return -1;
-		}
-	}
-
-	if (optind >= 0)
-		argv[optind-1] = prgname;
-
-	ret = optind-1;
-	optind = 0; /* reset getopt lib */
-	return ret;
-}
-
-/* Check the link status of all ports in up to 9s, and print them finally */
-static void
-check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
-{
-#define CHECK_INTERVAL 100 /* 100ms */
-#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */
-	uint8_t portid, count, all_ports_up, print_flag = 0;
-	struct rte_eth_link link;
-
-	printf("\nChecking link status");
-	fflush(stdout);
-	for (count = 0; count <= MAX_CHECK_TIME; count++) {
-		all_ports_up = 1;
-		for (portid = 0; portid < port_num; portid++) {
-			if ((port_mask & (1 << portid)) == 0)
-				continue;
-			memset(&link, 0, sizeof(link));
-			rte_eth_link_get_nowait(portid, &link);
-			/* print link status if flag set */
-			if (print_flag == 1) {
-				if (link.link_status)
-					printf("Port %d Link Up - speed %u "
-						"Mbps - %s\n", (uint8_t)portid,
-						(unsigned)link.link_speed,
-				(link.link_duplex == ETH_LINK_FULL_DUPLEX) ?
-					("full-duplex") : ("half-duplex\n"));
-				else
-					printf("Port %d Link Down\n",
-						(uint8_t)portid);
-				continue;
-			}
-			/* clear all_ports_up flag if any link down */
-			if (link.link_status == ETH_LINK_DOWN) {
-				all_ports_up = 0;
-				break;
-			}
-		}
-		/* after finally printing all link status, get out */
-		if (print_flag == 1)
-			break;
-
-		if (all_ports_up == 0) {
-			printf(".");
-			fflush(stdout);
-			rte_delay_ms(CHECK_INTERVAL);
-		}
-
-		/* set the print_flag if all ports up or timeout */
-		if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) {
-			print_flag = 1;
-			printf("done\n");
-		}
-	}
-}
-
-/* Send the burst of packets on an output interface */
-static int
-l2fwd_ivshmem_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
-{
-	struct rte_mbuf **m_table;
-	unsigned ret;
-	unsigned queueid =0;
-
-	m_table = (struct rte_mbuf **)qconf->tx_mbufs[port].m_table;
-
-	ret = rte_eth_tx_burst(port, (uint16_t) queueid, m_table, (uint16_t) n);
-	port_statistics[port].tx += ret;
-	if (unlikely(ret < n)) {
-		port_statistics[port].dropped += (n - ret);
-		do {
-			rte_pktmbuf_free(m_table[ret]);
-		} while (++ret < n);
-	}
-
-	return 0;
-}
-
-/* Enqueue packets for TX and prepare them to be sent on the network */
-static int
-l2fwd_ivshmem_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-	unsigned lcore_id, len;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-	len = qconf->tx_mbufs[port].len;
-	qconf->tx_mbufs[port].m_table[len] = m;
-	len++;
-
-	/* enough pkts to be sent */
-	if (unlikely(len == MAX_PKT_BURST)) {
-		l2fwd_ivshmem_send_burst(qconf, MAX_PKT_BURST, port);
-		len = 0;
-	}
-
-	qconf->tx_mbufs[port].len = len;
-	return 0;
-}
-
-static int
-l2fwd_ivshmem_receive_burst(struct lcore_queue_conf *qconf, unsigned portid,
-		unsigned vm_port)
-{
-	struct rte_mbuf ** m;
-	struct rte_ring * rx;
-	unsigned len, pkt_idx;
-
-	m = qconf->rx_mbufs[portid].m_table;
-	len = qconf->rx_mbufs[portid].len;
-	rx = qconf->port_param[vm_port]->rx_ring;
-
-	/* if enqueueing failed, ring is probably full, so drop the packets */
-	if (rte_ring_enqueue_bulk(rx, (void**) m, len) < 0) {
-		port_statistics[portid].dropped += len;
-
-		pkt_idx = 0;
-		do {
-			rte_pktmbuf_free(m[pkt_idx]);
-		} while (++pkt_idx < len);
-	}
-	else
-		/* increment rx stats by however many packets we managed to receive */
-		port_statistics[portid].rx += len;
-
-	return 0;
-}
-
-/* Enqueue packets for RX and prepare them to be sent to VM */
-static int
-l2fwd_ivshmem_receive_packets(struct rte_mbuf ** m, unsigned n, unsigned portid,
-		unsigned vm_port)
-{
-	unsigned lcore_id, len, pkt_idx;
-	struct lcore_queue_conf *qconf;
-
-	lcore_id = rte_lcore_id();
-
-	qconf = &lcore_queue_conf[lcore_id];
-
-	len = qconf->rx_mbufs[portid].len;
-	pkt_idx = 0;
-
-	/* enqueue packets */
-	while (pkt_idx < n && len < MAX_PKT_BURST * 2) {
-		qconf->rx_mbufs[portid].m_table[len++] = m[pkt_idx++];
-	}
-
-	/* increment queue len by however many packets we managed to receive */
-	qconf->rx_mbufs[portid].len += pkt_idx;
-
-	/* drop the unreceived packets */
-	if (unlikely(pkt_idx < n)) {
-		port_statistics[portid].dropped += n - pkt_idx;
-		do {
-			rte_pktmbuf_free(m[pkt_idx]);
-		} while (++pkt_idx < n);
-	}
-
-	/* drain the queue halfway through the maximum capacity */
-	if (unlikely(qconf->rx_mbufs[portid].len >= MAX_PKT_BURST))
-		l2fwd_ivshmem_receive_burst(qconf, portid, vm_port);
-
-	return 0;
-}
-
-/* loop for host forwarding mode.
- * the data flow is as follows:
- *  1) get packets from TX queue and send it out from a given port
- *  2) RX packets from given port and enqueue them on RX ring
- *  3) dequeue packets from TX ring and put them on TX queue for a given port
- */
-static void
-fwd_loop(void)
-{
-	struct rte_mbuf *pkts_burst[MAX_PKT_BURST * 2];
-	struct rte_mbuf *m;
-	unsigned lcore_id;
-	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
-	unsigned i, j, portid, nb_rx;
-	struct lcore_queue_conf *qconf;
-	struct rte_ring *tx;
-	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
-
-	prev_tsc = 0;
-	timer_tsc = 0;
-
-	lcore_id = rte_lcore_id();
-	qconf = &lcore_queue_conf[lcore_id];
-
-	if (qconf->n_rx_port == 0) {
-		RTE_LOG(INFO, L2FWD_IVSHMEM, "lcore %u has nothing to do\n", lcore_id);
-		return;
-	}
-
-	RTE_LOG(INFO, L2FWD_IVSHMEM, "entering main loop on lcore %u\n", lcore_id);
-
-	for (i = 0; i < qconf->n_rx_port; i++) {
-
-		portid = qconf->rx_port_list[i];
-		RTE_LOG(INFO, L2FWD_IVSHMEM, " -- lcoreid=%u portid=%u\n", lcore_id,
-			portid);
-	}
-
-	while (ctrl->state == STATE_FWD) {
-
-		cur_tsc = rte_rdtsc();
-
-		/*
-		 * Burst queue drain
-		 */
-		diff_tsc = cur_tsc - prev_tsc;
-		if (unlikely(diff_tsc > drain_tsc)) {
-
-			/*
-			 * TX
-			 */
-			for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
-				if (qconf->tx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_ivshmem_send_burst(qconf,
-						 qconf->tx_mbufs[portid].len,
-						 (uint8_t) portid);
-				qconf->tx_mbufs[portid].len = 0;
-			}
-
-			/*
-			 * RX
-			 */
-			for (i = 0; i < qconf->n_rx_port; i++) {
-				portid = qconf->rx_port_list[i];
-				if (qconf->rx_mbufs[portid].len == 0)
-					continue;
-				l2fwd_ivshmem_receive_burst(qconf, portid, i);
-				qconf->rx_mbufs[portid].len = 0;
-			}
-
-			/* if timer is enabled */
-			if (timer_period > 0) {
-
-				/* advance the timer */
-				timer_tsc += diff_tsc;
-
-				/* if timer has reached its timeout */
-				if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-
-					/* do this only on master core */
-					if (lcore_id == rte_get_master_lcore()) {
-						print_stats();
-						/* reset the timer */
-						timer_tsc = 0;
-					}
-				}
-			}
-
-			prev_tsc = cur_tsc;
-		}
-
-		/*
-		 * packet RX and forwarding
-		 */
-		for (i = 0; i < qconf->n_rx_port; i++) {
-
-			/* RX packets from port and put them on RX ring */
-			portid = qconf->rx_port_list[i];
-			nb_rx = rte_eth_rx_burst((uint8_t) portid, 0,
-						 pkts_burst, MAX_PKT_BURST);
-
-			if (nb_rx != 0)
-				l2fwd_ivshmem_receive_packets(pkts_burst, nb_rx, portid, i);
-
-			/* dequeue packets from TX ring and send them to TX queue */
-			tx = qconf->port_param[i]->tx_ring;
-
-			nb_rx = rte_ring_count(tx);
-
-			nb_rx = RTE_MIN(nb_rx, (unsigned) MAX_PKT_BURST);
-
-			if (nb_rx == 0)
-				continue;
-
-			/* should not happen */
-			if (unlikely(rte_ring_dequeue_bulk(tx, (void**) pkts_burst, nb_rx) < 0)) {
-				ctrl->state = STATE_FAIL;
-				return;
-			}
-
-			for (j = 0; j < nb_rx; j++) {
-				m = pkts_burst[j];
-				l2fwd_ivshmem_send_packet(m, portid);
-			}
-		}
-	}
-}
-
-static int
-l2fwd_ivshmem_launch_one_lcore(__attribute__((unused)) void *dummy)
-{
-	fwd_loop();
-	return 0;
-}
-
-int main(int argc, char **argv)
-{
-	char name[RTE_RING_NAMESIZE];
-	struct rte_ring *r;
-	struct lcore_queue_conf *qconf;
-	struct rte_eth_dev_info dev_info;
-	uint8_t portid, port_nr;
-	uint8_t nb_ports, nb_ports_available;
-	uint8_t nb_ports_in_mask;
-	int ret;
-	unsigned lcore_id, rx_lcore_id;
-
-	/* init EAL */
-	ret = rte_eal_init(argc, argv);
-	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
-	argc -= ret;
-	argv += ret;
-
-	/* parse application arguments (after the EAL ones) */
-	ret = l2fwd_ivshmem_parse_args(argc, argv);
-	if (ret < 0)
-		rte_exit(EXIT_FAILURE, "Invalid l2fwd-ivshmem arguments\n");
-
-	/* create a shared mbuf pool */
-	l2fwd_ivshmem_pktmbuf_pool =
-		rte_pktmbuf_pool_create(MBUF_MP_NAME, NB_MBUF, 32,
-			0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
-	if (l2fwd_ivshmem_pktmbuf_pool == NULL)
-		rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");
-
-	nb_ports = rte_eth_dev_count();
-	if (nb_ports == 0)
-		rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");
-
-	/*
-	 * reserve memzone to communicate with VMs - we cannot use rte_malloc here
-	 * because while it is technically possible, it is a very bad idea to share
-	 * the heap between two primary processes.
-	 */
-	ctrl_mz = rte_memzone_reserve(CTRL_MZ_NAME, sizeof(struct ivshmem_ctrl),
-			SOCKET_ID_ANY, 0);
-	if (ctrl_mz == NULL)
-		rte_exit(EXIT_FAILURE, "Cannot reserve control memzone\n");
-	ctrl = (struct ivshmem_ctrl*) ctrl_mz->addr;
-
-	memset(ctrl, 0, sizeof(struct ivshmem_ctrl));
-
-	/*
-	 * Each port is assigned an output port.
-	 */
-	nb_ports_in_mask = 0;
-	for (portid = 0; portid < nb_ports; portid++) {
-		/* skip ports that are not enabled */
-		if ((l2fwd_ivshmem_enabled_port_mask & (1 << portid)) == 0)
-			continue;
-		if (portid % 2) {
-			ctrl->vm_ports[nb_ports_in_mask].dst = &ctrl->vm_ports[nb_ports_in_mask-1];
-			ctrl->vm_ports[nb_ports_in_mask-1].dst = &ctrl->vm_ports[nb_ports_in_mask];
-		}
-
-		nb_ports_in_mask++;
-
-		rte_eth_dev_info_get(portid, &dev_info);
-	}
-	if (nb_ports_in_mask % 2) {
-		printf("Notice: odd number of ports in portmask.\n");
-		ctrl->vm_ports[nb_ports_in_mask-1].dst =
-				&ctrl->vm_ports[nb_ports_in_mask-1];
-	}
-
-	rx_lcore_id = 0;
-	qconf = NULL;
-
-	printf("Initializing ports configuration...\n");
-
-	nb_ports_available = nb_ports;
-
-	/* Initialise each port */
-	for (portid = 0; portid < nb_ports; portid++) {
-
-		/* skip ports that are not enabled */
-		if ((l2fwd_ivshmem_enabled_port_mask & (1 << portid)) == 0) {
-			printf("Skipping disabled port %u\n", (unsigned) portid);
-			nb_ports_available--;
-			continue;
-		}
-
-		/* init port */
-		printf("Initializing port %u... ", (unsigned) portid);
-		fflush(stdout);
-		ret = rte_eth_dev_configure(portid, 1, 1, &port_conf);
-		if (ret < 0)
-			rte_exit(EXIT_FAILURE, "Cannot configure device: err=%d, port=%u\n",
-				  ret, (unsigned) portid);
-
-		rte_eth_macaddr_get(portid,&l2fwd_ivshmem_ports_eth_addr[portid]);
-
-		/* init one RX queue */
-		fflush(stdout);
-		ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
-						 rte_eth_dev_socket_id(portid),
-						 NULL,
-						 l2fwd_ivshmem_pktmbuf_pool);
-		if (ret < 0)
-			rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
-				  ret, (unsigned) portid);
-
-		/* init one TX queue on each port */
-		fflush(stdout);
-		ret = rte_eth_tx_queue_setup(portid, 0, nb_txd,
-				rte_eth_dev_socket_id(portid),
-				NULL);
-		if (ret < 0)
-			rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
-				ret, (unsigned) portid);
-
-		/* Start device */
-		ret = rte_eth_dev_start(portid);
-		if (ret < 0)
-			rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n",
-				  ret, (unsigned) portid);
-
-		printf("done: \n");
-
-		rte_eth_promiscuous_enable(portid);
-
-		printf("Port %u, MAC address: %02X:%02X:%02X:%02X:%02X:%02X\n\n",
-				(unsigned) portid,
-				l2fwd_ivshmem_ports_eth_addr[portid].addr_bytes[0],
-				l2fwd_ivshmem_ports_eth_addr[portid].addr_bytes[1],
-				l2fwd_ivshmem_ports_eth_addr[portid].addr_bytes[2],
-				l2fwd_ivshmem_ports_eth_addr[portid].addr_bytes[3],
-				l2fwd_ivshmem_ports_eth_addr[portid].addr_bytes[4],
-				l2fwd_ivshmem_ports_eth_addr[portid].addr_bytes[5]);
-
-		/* initialize port stats */
-		memset(&port_statistics, 0, sizeof(port_statistics));
-	}
-
-	if (!nb_ports_available) {
-		rte_exit(EXIT_FAILURE,
-			"All available ports are disabled. Please set portmask.\n");
-	}
-	port_nr = 0;
-
-	/* Initialize the port/queue configuration of each logical core */
-	for (portid = 0; portid < nb_ports; portid++) {
-		if ((l2fwd_ivshmem_enabled_port_mask & (1 << portid)) == 0)
-			continue;
-
-		/* get the lcore_id for this port */
-		while (rte_lcore_is_enabled(rx_lcore_id) == 0 ||
-			   lcore_queue_conf[rx_lcore_id].n_rx_port ==
-					   l2fwd_ivshmem_rx_queue_per_lcore) {
-			rx_lcore_id++;
-			if (rx_lcore_id >= RTE_MAX_LCORE)
-				rte_exit(EXIT_FAILURE, "Not enough cores\n");
-		}
-
-		if (qconf != &lcore_queue_conf[rx_lcore_id])
-			/* Assigned a new logical core in the loop above. */
-			qconf = &lcore_queue_conf[rx_lcore_id];
-
-
-		rte_eth_macaddr_get(portid, &ctrl->vm_ports[port_nr].ethaddr);
-
-		qconf->rx_port_list[qconf->n_rx_port] = portid;
-		qconf->port_param[qconf->n_rx_port] = &ctrl->vm_ports[port_nr];
-		qconf->n_rx_port++;
-		port_nr++;
-		printf("Lcore %u: RX port %u\n", rx_lcore_id, (unsigned) portid);
-	}
-
-	check_all_ports_link_status(nb_ports_available, l2fwd_ivshmem_enabled_port_mask);
-
-	/* create rings for each VM port (several ports can be on the same VM).
-	 * note that we store the pointers in ctrl - that way, they are the same
-	 * and valid across all VMs because ctrl is also in DPDK memory */
-	for (portid = 0; portid < nb_ports_available; portid++) {
-
-		/* RX ring. SP/SC because it's only used by host and a single VM */
-		snprintf(name, sizeof(name), "%s%i", RX_RING_PREFIX, portid);
-		r = rte_ring_create(name, NB_MBUF,
-				SOCKET_ID_ANY, RING_F_SP_ENQ | RING_F_SC_DEQ);
-		if (r == NULL)
-			rte_exit(EXIT_FAILURE, "Cannot create ring %s\n", name);
-
-		ctrl->vm_ports[portid].rx_ring = r;
-
-		/* TX ring. SP/SC because it's only used by host and a single VM */
-		snprintf(name, sizeof(name), "%s%i", TX_RING_PREFIX, portid);
-		r = rte_ring_create(name, NB_MBUF,
-				SOCKET_ID_ANY, RING_F_SP_ENQ | RING_F_SC_DEQ);
-		if (r == NULL)
-			rte_exit(EXIT_FAILURE, "Cannot create ring %s\n", name);
-
-		ctrl->vm_ports[portid].tx_ring = r;
-	}
-
-	/* create metadata, output cmdline */
-	if (rte_ivshmem_metadata_create(METADATA_NAME) < 0)
-		rte_exit(EXIT_FAILURE, "Cannot create IVSHMEM metadata\n");
-
-	if (rte_ivshmem_metadata_add_memzone(ctrl_mz, METADATA_NAME))
-		rte_exit(EXIT_FAILURE, "Cannot add memzone to IVSHMEM metadata\n");
-
-	if (rte_ivshmem_metadata_add_mempool(l2fwd_ivshmem_pktmbuf_pool, METADATA_NAME))
-		rte_exit(EXIT_FAILURE, "Cannot add mbuf mempool to IVSHMEM metadata\n");
-
-	for (portid = 0; portid < nb_ports_available; portid++) {
-		if (rte_ivshmem_metadata_add_ring(ctrl->vm_ports[portid].rx_ring,
-				METADATA_NAME) < 0)
-			rte_exit(EXIT_FAILURE, "Cannot add ring %s to IVSHMEM metadata\n",
-					ctrl->vm_ports[portid].rx_ring->name);
-		if (rte_ivshmem_metadata_add_ring(ctrl->vm_ports[portid].tx_ring,
-				METADATA_NAME) < 0)
-			rte_exit(EXIT_FAILURE, "Cannot add ring %s to IVSHMEM metadata\n",
-					ctrl->vm_ports[portid].tx_ring->name);
-	}
-	generate_ivshmem_cmdline(METADATA_NAME);
-
-	ctrl->nb_ports = nb_ports_available;
-
-	printf("Waiting for VM to initialize...\n");
-
-	/* wait for VM to initialize */
-	while (ctrl->state != STATE_FWD) {
-		if (ctrl->state == STATE_FAIL)
-			rte_exit(EXIT_FAILURE, "VM reported failure\n");
-
-		sleep(1);
-	}
-
-	printf("Done!\n");
-
-	sigsetup();
-
-	/* launch per-lcore init on every lcore */
-	rte_eal_mp_remote_launch(l2fwd_ivshmem_launch_one_lcore, NULL, CALL_MASTER);
-	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (rte_eal_wait_lcore(lcore_id) < 0)
-			return -1;
-	}
-
-	if (ctrl->state == STATE_FAIL)
-		rte_exit(EXIT_FAILURE, "VM reported failure\n");
-
-	return 0;
-}
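
The host side mirrors this with the physical NICs in the path:
rte_eth_rx_burst() feeds each VM's RX ring, and the VM's TX ring drains
through rte_eth_tx_burst(). A condensed sketch of one service pass
(illustrative only: the helper name service_port() is invented, and queue 0
is assumed, as in the removed code):

/* Illustrative sketch: bridge one NIC port to its VM ring pair. */
static void
service_port(uint8_t portid, struct vm_port_param *vm)
{
	struct rte_mbuf *burst[MAX_PKT_BURST];
	unsigned nb;
	uint16_t sent;

	/* NIC -> VM: hand received packets to the guest via the ring */
	nb = rte_eth_rx_burst(portid, 0, burst, MAX_PKT_BURST);
	if (nb > 0 && rte_ring_enqueue_bulk(vm->rx_ring,
			(void **) burst, nb) < 0) {
		/* full ring: the removed code dropped and counted these */
		while (nb > 0)
			rte_pktmbuf_free(burst[--nb]);
	}

	/* VM -> NIC: transmit whatever the guest queued */
	nb = RTE_MIN(rte_ring_count(vm->tx_ring), (unsigned) MAX_PKT_BURST);
	if (nb == 0 ||
			rte_ring_dequeue_bulk(vm->tx_ring, (void **) burst, nb) < 0)
		return;
	sent = rte_eth_tx_burst(portid, 0, burst, (uint16_t) nb);
	while (sent < nb)	/* free what the NIC did not accept */
		rte_pktmbuf_free(burst[sent++]);
}
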
diff --git a/examples/l2fwd-ivshmem/include/common.h b/examples/l2fwd-ivshmem/include/common.h
deleted file mode 100644
index 8564d32..0000000
--- a/examples/l2fwd-ivshmem/include/common.h
+++ /dev/null
@@ -1,111 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _IVSHMEM_COMMON_H_
-#define _IVSHMEM_COMMON_H_
-
-#define RTE_LOGTYPE_L2FWD_IVSHMEM RTE_LOGTYPE_USER1
-
-#define CTRL_MZ_NAME "CTRL_MEMZONE"
-#define MBUF_MP_NAME "MBUF_MEMPOOL"
-#define RX_RING_PREFIX "RX_"
-#define TX_RING_PREFIX "TX_"
-
-/* A tsc-based timer responsible for triggering statistics printout */
-#define TIMER_MILLISECOND 2000000ULL /* around 1ms at 2 Ghz */
-#define MAX_TIMER_PERIOD 86400 /* 1 day max */
-static int64_t timer_period = 10 * TIMER_MILLISECOND * 1000; /* default period is 10 seconds */
-
-#define DIM(x)\
-	(sizeof(x)/sizeof(x)[0])
-
-#define MAX_PKT_BURST 32
-
-const struct rte_memzone * ctrl_mz;
-
-enum l2fwd_state {
-	STATE_NONE = 0,
-	STATE_FWD,
-	STATE_EXIT,
-	STATE_FAIL
-};
-
-/* Per-port statistics struct */
-struct port_statistics {
-	uint64_t tx;
-	uint64_t rx;
-	uint64_t dropped;
-} __rte_cache_aligned;
-
-struct mbuf_table {
-	unsigned len;
-	struct rte_mbuf *m_table[MAX_PKT_BURST * 2]; /**< allow up to two bursts */
-};
-
-struct vm_port_param {
-	struct rte_ring * rx_ring;         /**< receiving ring for current port */
-	struct rte_ring * tx_ring;         /**< transmitting ring for current port */
-	struct vm_port_param * dst;        /**< current port's destination port */
-	volatile struct port_statistics stats;      /**< statistics for current port */
-	struct ether_addr ethaddr;         /**< Ethernet address of the port */
-};
-
-/* control structure, to synchronize host and VM */
-struct ivshmem_ctrl {
-	rte_spinlock_t lock;
-	uint8_t nb_ports;                /**< total nr of ports */
-	volatile enum l2fwd_state state; /**< report state */
-	struct vm_port_param vm_ports[RTE_MAX_ETHPORTS];
-};
-
-struct ivshmem_ctrl * ctrl;
-
-static unsigned int l2fwd_ivshmem_rx_queue_per_lcore = 1;
-
-static void sighandler(int __rte_unused s)
-{
-	ctrl->state = STATE_EXIT;
-}
-
-static void sigsetup(void)
-{
-	   struct sigaction sigIntHandler;
-
-	   sigIntHandler.sa_handler = sighandler;
-	   sigemptyset(&sigIntHandler.sa_mask);
-	   sigIntHandler.sa_flags = 0;
-
-	   sigaction(SIGINT, &sigIntHandler, NULL);
-}
-
-#endif /* _IVSHMEM_COMMON_H_ */
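
The struct ivshmem_ctrl above is the rendezvous point between the two
primary processes: the host reserves it as a memzone, exports it over
IVSHMEM, and polls 'state', while the guest maps the same memory and flips
the state under the shared spinlock. The handshake, reduced to its
essentials (illustrative only, reusing the definitions from this header):

/* Illustrative sketch: guest-side attach, as in the removed guest main() */
static void
guest_attach(struct ivshmem_ctrl *c)
{
	rte_spinlock_lock(&c->lock);
	if (c->state == STATE_FWD)	/* a guest is already forwarding */
		rte_exit(EXIT_FAILURE, "Forwarding already started!\n");
	c->state = STATE_FWD;		/* host's wait loop observes this */
	rte_spinlock_unlock(&c->lock);
}
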
diff --git a/examples/packet_ordering/Makefile b/examples/packet_ordering/Makefile
index 9e080a3..de066c4 100644
--- a/examples/packet_ordering/Makefile
+++ b/examples/packet_ordering/Makefile
@@ -34,7 +34,7 @@ $(error "Please define RTE_SDK environment variable")
 endif
 
 # Default target, can be overridden by command line or environment
-RTE_TARGET ?= x86_64-ivshmem-linuxapp-gcc
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
diff --git a/lib/Makefile b/lib/Makefile
index ca7c02f..990f23a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -61,7 +61,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
-DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += librte_ivshmem
 endif
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 1bd0a33..64f4e0a 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -337,19 +337,7 @@ rte_memzone_free(const struct rte_memzone *mz)
 	idx = ((uintptr_t)mz - (uintptr_t)mcfg->memzone);
 	idx = idx / sizeof(struct rte_memzone);
 
-#ifdef RTE_LIBRTE_IVSHMEM
-	/*
-	 * If ioremap_addr is set, it's an IVSHMEM memzone and we cannot
-	 * free it.
-	 */
-	if (mcfg->memzone[idx].ioremap_addr != 0) {
-		rte_rwlock_write_unlock(&mcfg->mlock);
-		return -EINVAL;
-	}
-#endif
-
 	addr = mcfg->memzone[idx].addr;
-
 	if (addr == NULL)
 		ret = -EINVAL;
 	else if (mcfg->memzone_cnt == 0) {
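
With the ioremap_addr special case gone, rte_memzone_free() no longer has a
class of memzones it must refuse to release; -EINVAL now indicates only a
bad argument. A hedged usage sketch (the memzone name is made up):

	const struct rte_memzone *mz;

	mz = rte_memzone_reserve("example_mz", 4096, SOCKET_ID_ANY, 0);
	if (mz == NULL)
		rte_exit(EXIT_FAILURE, "cannot reserve memzone\n");
	/* ... use mz->addr ... */
	if (rte_memzone_free(mz) != 0)
		RTE_LOG(ERR, EAL, "unexpected memzone free failure\n");
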
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 857dc3e..0bda493 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -126,28 +126,6 @@ int rte_eal_log_init(const char *id, int facility);
  */
 int rte_eal_pci_init(void);
 
-#ifdef RTE_LIBRTE_IVSHMEM
-/**
- * Init the memory from IVSHMEM devices
- *
- * This function is private to EAL.
- *
- * @return
- *  0 on success, negative on error
- */
-int rte_eal_ivshmem_init(void);
-
-/**
- * Init objects in IVSHMEM devices
- *
- * This function is private to EAL.
- *
- * @return
- *  0 on success, negative on error
- */
-int rte_eal_ivshmem_obj_init(void);
-#endif
-
 struct rte_pci_driver;
 struct rte_pci_device;
 
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 0661109..d9e8c21 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -107,9 +107,6 @@ struct rte_memseg {
 		void *addr;         /**< Start virtual address. */
 		uint64_t addr_64;   /**< Makes sure addr is always 64 bits */
 	};
-#ifdef RTE_LIBRTE_IVSHMEM
-	phys_addr_t ioremap_addr; /**< Real physical address inside the VM */
-#endif
 	size_t len;               /**< Length of the segment. */
 	uint64_t hugepage_sz;       /**< The pagesize of underlying memory */
 	int32_t socket_id;          /**< NUMA socket ID. */
diff --git a/lib/librte_eal/common/include/rte_memzone.h b/lib/librte_eal/common/include/rte_memzone.h
index f69b5a8..dae98f5 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ -82,9 +82,6 @@ struct rte_memzone {
 		void *addr;                   /**< Start virtual address. */
 		uint64_t addr_64;             /**< Makes sure addr is always 64-bits */
 	};
-#ifdef RTE_LIBRTE_IVSHMEM
-	phys_addr_t ioremap_addr;         /**< Real physical address inside the VM */
-#endif
 	size_t len;                       /**< Length of the memzone. */
 
 	uint64_t hugepage_sz;             /**< The page size of underlying memory */
@@ -256,12 +253,10 @@ const struct rte_memzone *rte_memzone_reserve_bounded(const char *name,
 /**
  * Free a memzone.
  *
- * Note: an IVSHMEM zone cannot be freed.
- *
  * @param mz
  *   A pointer to the memzone
  * @return
- *  -EINVAL - invalid parameter, IVSHMEM memzone.
+ *  -EINVAL - invalid parameter.
  *  0 - success
  */
 int rte_memzone_free(const struct rte_memzone *mz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 763fa32..267a4c6 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -221,14 +221,6 @@ rte_eal_malloc_heap_init(void)
 	for (ms = &mcfg->memseg[0], ms_cnt = 0;
 			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
 			ms_cnt++, ms++) {
-#ifdef RTE_LIBRTE_IVSHMEM
-		/*
-		 * if segment has ioremap address set, it's an IVSHMEM segment and
-		 * it is not memory to allocate from.
-		 */
-		if (ms->ioremap_addr != 0)
-			continue;
-#endif
 		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
 	}
 
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 182729c..0baa571 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -44,12 +44,6 @@ VPATH += $(RTE_SDK)/lib/librte_eal/common
 CFLAGS += -I$(SRCDIR)/include
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
-ifeq ($(CONFIG_RTE_LIBRTE_IVSHMEM),y)
-# workaround for circular dependency eal -> ivshmem -> ring/mempool -> eal
-CFLAGS += -I$(RTE_SDK)/lib/librte_ring
-CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
-CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
-endif
 CFLAGS += $(WERROR_FLAGS) -O3
 
 LDLIBS += -ldl
@@ -76,9 +70,6 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
-ifeq ($(CONFIG_RTE_LIBRTE_IVSHMEM),y)
-SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_ivshmem.c
-endif
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 3fb2188..d5b81a3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -797,11 +797,6 @@ rte_eal_init(int argc, char **argv)
 		rte_panic("Cannot init VFIO\n");
 #endif
 
-#ifdef RTE_LIBRTE_IVSHMEM
-	if (rte_eal_ivshmem_init() < 0)
-		rte_panic("Cannot init IVSHMEM\n");
-#endif
-
 	if (rte_eal_memory_init() < 0)
 		rte_panic("Cannot init memory\n");
 
@@ -814,11 +809,6 @@ rte_eal_init(int argc, char **argv)
 	if (rte_eal_tailqs_init() < 0)
 		rte_panic("Cannot init tail queues for objects\n");
 
-#ifdef RTE_LIBRTE_IVSHMEM
-	if (rte_eal_ivshmem_obj_init() < 0)
-		rte_panic("Cannot init IVSHMEM objects\n");
-#endif
-
 	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0)
 		rte_panic("Cannot init logs\n");
 
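
Both EAL hooks disappear with this hunk: rte_eal_ivshmem_init() ran before
rte_eal_memory_init(), presumably so that imported segments were in place
before the memory subsystem scanned them, and rte_eal_ivshmem_obj_init()
ran after rte_eal_tailqs_init(), since re-registering imported rings needs
the tailqs. The surviving sequence, sketched from the context lines above
(intervening unchanged init steps elided):

	if (rte_eal_memory_init() < 0)
		rte_panic("Cannot init memory\n");
	/* ... memzone and other init steps unchanged ... */
	if (rte_eal_tailqs_init() < 0)
		rte_panic("Cannot init tail queues for objects\n");
	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0)
		rte_panic("Cannot init logs\n");
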
diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
deleted file mode 100644
index 67b3caf..0000000
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ /dev/null
@@ -1,954 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifdef RTE_LIBRTE_IVSHMEM /* hide it from coverage */
-
-#include <stdint.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/mman.h>
-#include <sys/file.h>
-#include <string.h>
-#include <sys/queue.h>
-
-#include <rte_log.h>
-#include <rte_pci.h>
-#include <rte_memory.h>
-#include <rte_eal.h>
-#include <rte_eal_memconfig.h>
-#include <rte_string_fns.h>
-#include <rte_errno.h>
-#include <rte_ring.h>
-#include <rte_malloc.h>
-#include <rte_common.h>
-#include <rte_ivshmem.h>
-
-#include "eal_internal_cfg.h"
-#include "eal_private.h"
-
-#define PCI_VENDOR_ID_IVSHMEM 0x1Af4
-#define PCI_DEVICE_ID_IVSHMEM 0x1110
-
-#define IVSHMEM_MAGIC 0x0BADC0DE
-
-#define IVSHMEM_RESOURCE_PATH "/sys/bus/pci/devices/%04x:%02x:%02x.%x/resource2"
-#define IVSHMEM_CONFIG_PATH "/var/run/.%s_ivshmem_config"
-
-#define PHYS 0x1
-#define VIRT 0x2
-#define IOREMAP 0x4
-#define FULL (PHYS|VIRT|IOREMAP)
-
-#define METADATA_SIZE_ALIGNED \
-	(RTE_ALIGN_CEIL(sizeof(struct rte_ivshmem_metadata),pagesz))
-
-#define CONTAINS(x,y)\
-	(((y).addr_64 >= (x).addr_64) && ((y).addr_64 < (x).addr_64 + (x).len))
-
-#define DIM(x) (sizeof(x)/sizeof(x[0]))
-
-struct ivshmem_pci_device {
-	char path[PATH_MAX];
-	phys_addr_t ioremap_addr;
-};
-
-/* data type to store in config */
-struct ivshmem_segment {
-	struct rte_ivshmem_metadata_entry entry;
-	uint64_t align;
-	char path[PATH_MAX];
-};
-struct ivshmem_shared_config {
-	struct ivshmem_segment segment[RTE_MAX_MEMSEG];
-	uint32_t segment_idx;
-	struct ivshmem_pci_device pci_devs[RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS];
-	uint32_t pci_devs_idx;
-};
-static struct ivshmem_shared_config * ivshmem_config;
-static int memseg_idx;
-static int pagesz;
-
-/* Tailq heads to add rings to */
-TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
-
-/*
- * Utility functions
- */
-
-static int
-is_ivshmem_device(struct rte_pci_device * dev)
-{
-	return dev->id.vendor_id == PCI_VENDOR_ID_IVSHMEM
-			&& dev->id.device_id == PCI_DEVICE_ID_IVSHMEM;
-}
-
-static void *
-map_metadata(int fd, uint64_t len)
-{
-	size_t metadata_len = sizeof(struct rte_ivshmem_metadata);
-	size_t aligned_len = METADATA_SIZE_ALIGNED;
-
-	return mmap(NULL, metadata_len, PROT_READ | PROT_WRITE,
-			MAP_SHARED, fd, len - aligned_len);
-}
-
-static void
-unmap_metadata(void * ptr)
-{
-	munmap(ptr, sizeof(struct rte_ivshmem_metadata));
-}
-
-static int
-has_ivshmem_metadata(int fd, uint64_t len)
-{
-	struct rte_ivshmem_metadata metadata;
-	void * ptr;
-
-	ptr = map_metadata(fd, len);
-
-	if (ptr == MAP_FAILED)
-		return -1;
-
-	metadata = *(struct rte_ivshmem_metadata*) (ptr);
-
-	unmap_metadata(ptr);
-
-	return metadata.magic_number == IVSHMEM_MAGIC;
-}
-
-static void
-remove_segment(struct ivshmem_segment * ms, int len, int idx)
-{
-	int i;
-
-	for (i = idx; i < len - 1; i++)
-		memcpy(&ms[i], &ms[i+1], sizeof(struct ivshmem_segment));
-	memset(&ms[len-1], 0, sizeof(struct ivshmem_segment));
-}
-
-static int
-overlap(const struct rte_memzone * mz1, const struct rte_memzone * mz2)
-{
-	uint64_t start1, end1, start2, end2;
-	uint64_t p_start1, p_end1, p_start2, p_end2;
-	uint64_t i_start1, i_end1, i_start2, i_end2;
-	int result = 0;
-
-	/* gather virtual addresses */
-	start1 = mz1->addr_64;
-	end1 = mz1->addr_64 + mz1->len;
-	start2 = mz2->addr_64;
-	end2 = mz2->addr_64 + mz2->len;
-
-	/* gather physical addresses */
-	p_start1 = mz1->phys_addr;
-	p_end1 = mz1->phys_addr + mz1->len;
-	p_start2 = mz2->phys_addr;
-	p_end2 = mz2->phys_addr + mz2->len;
-
-	/* gather ioremap addresses */
-	i_start1 = mz1->ioremap_addr;
-	i_end1 = mz1->ioremap_addr + mz1->len;
-	i_start2 = mz2->ioremap_addr;
-	i_end2 = mz2->ioremap_addr + mz2->len;
-
-	/* check for overlap in virtual addresses */
-	if (start1 >= start2 && start1 < end2)
-		result |= VIRT;
-	if (start2 >= start1 && start2 < end1)
-		result |= VIRT;
-
-	/* check for overlap in physical addresses */
-	if (p_start1 >= p_start2 && p_start1 < p_end2)
-		result |= PHYS;
-	if (p_start2 >= p_start1 && p_start2 < p_end1)
-		result |= PHYS;
-
-	/* check for overlap in ioremap addresses */
-	if (i_start1 >= i_start2 && i_start1 < i_end2)
-		result |= IOREMAP;
-	if (i_start2 >= i_start1 && i_start2 < i_end1)
-		result |= IOREMAP;
-
-	return result;
-}
-
-static int
-adjacent(const struct rte_memzone * mz1, const struct rte_memzone * mz2)
-{
-	uint64_t start1, end1, start2, end2;
-	uint64_t p_start1, p_end1, p_start2, p_end2;
-	uint64_t i_start1, i_end1, i_start2, i_end2;
-	int result = 0;
-
-	/* gather virtual addresses */
-	start1 = mz1->addr_64;
-	end1 = mz1->addr_64 + mz1->len;
-	start2 = mz2->addr_64;
-	end2 = mz2->addr_64 + mz2->len;
-
-	/* gather physical addresses */
-	p_start1 = mz1->phys_addr;
-	p_end1 = mz1->phys_addr + mz1->len;
-	p_start2 = mz2->phys_addr;
-	p_end2 = mz2->phys_addr + mz2->len;
-
-	/* gather ioremap addresses */
-	i_start1 = mz1->ioremap_addr;
-	i_end1 = mz1->ioremap_addr + mz1->len;
-	i_start2 = mz2->ioremap_addr;
-	i_end2 = mz2->ioremap_addr + mz2->len;
-
-	/* check if segments are virtually adjacent */
-	if (start1 == end2)
-		result |= VIRT;
-	if (start2 == end1)
-		result |= VIRT;
-
-	/* check if segments are physically adjacent */
-	if (p_start1 == p_end2)
-		result |= PHYS;
-	if (p_start2 == p_end1)
-		result |= PHYS;
-
-	/* check if segments are ioremap-adjacent */
-	if (i_start1 == i_end2)
-		result |= IOREMAP;
-	if (i_start2 == i_end1)
-		result |= IOREMAP;
-
-	return result;
-}
-
-static int
-has_adjacent_segments(struct ivshmem_segment * ms, int len)
-{
-	int i, j;
-
-	for (i = 0; i < len; i++)
-		for (j = i + 1; j < len; j++) {
-			/* we're only interested in fully adjacent segments; partially
-			 * adjacent segments can coexist.
-			 */
-			if (adjacent(&ms[i].entry.mz, &ms[j].entry.mz) == FULL)
-				return 1;
-		}
-	return 0;
-}
-
-static int
-has_overlapping_segments(struct ivshmem_segment * ms, int len)
-{
-	int i, j;
-
-	for (i = 0; i < len; i++)
-		for (j = i + 1; j < len; j++)
-			if (overlap(&ms[i].entry.mz, &ms[j].entry.mz))
-				return 1;
-	return 0;
-}
-
-static int
-seg_compare(const void * a, const void * b)
-{
-	const struct ivshmem_segment * s1 = (const struct ivshmem_segment*) a;
-	const struct ivshmem_segment * s2 = (const struct ivshmem_segment*) b;
-
-	/* move unallocated zones to the end */
-	if (s1->entry.mz.addr == NULL && s2->entry.mz.addr == NULL)
-		return 0;
-	if (s1->entry.mz.addr == 0)
-		return 1;
-	if (s2->entry.mz.addr == 0)
-		return -1;
-
-	return s1->entry.mz.phys_addr > s2->entry.mz.phys_addr;
-}
-
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-static void
-entry_dump(struct rte_ivshmem_metadata_entry *e)
-{
-	RTE_LOG(DEBUG, EAL, "\tvirt: %p-%p\n", e->mz.addr,
-			RTE_PTR_ADD(e->mz.addr, e->mz.len));
-	RTE_LOG(DEBUG, EAL, "\tphys: 0x%" PRIx64 "-0x%" PRIx64 "\n",
-			e->mz.phys_addr,
-			e->mz.phys_addr + e->mz.len);
-	RTE_LOG(DEBUG, EAL, "\tio: 0x%" PRIx64 "-0x%" PRIx64 "\n",
-			e->mz.ioremap_addr,
-			e->mz.ioremap_addr + e->mz.len);
-	RTE_LOG(DEBUG, EAL, "\tlen: 0x%" PRIx64 "\n", e->mz.len);
-	RTE_LOG(DEBUG, EAL, "\toff: 0x%" PRIx64 "\n", e->offset);
-}
-#endif
-
-
-
-/*
- * Actual useful code
- */
-
-/* read through metadata mapped from the IVSHMEM device */
-static int
-read_metadata(char * path, int path_len, int fd, uint64_t flen)
-{
-	struct rte_ivshmem_metadata metadata;
-	struct rte_ivshmem_metadata_entry * entry;
-	int idx, i;
-	void * ptr;
-
-	ptr = map_metadata(fd, flen);
-
-	if (ptr == MAP_FAILED)
-		return -1;
-
-	metadata = *(struct rte_ivshmem_metadata*) (ptr);
-
-	unmap_metadata(ptr);
-
-	RTE_LOG(DEBUG, EAL, "Parsing metadata for \"%s\"\n", metadata.name);
-
-	idx = ivshmem_config->segment_idx;
-
-	for (i = 0; i < RTE_LIBRTE_IVSHMEM_MAX_ENTRIES &&
-		idx <= RTE_MAX_MEMSEG; i++) {
-
-		if (idx == RTE_MAX_MEMSEG) {
-			RTE_LOG(ERR, EAL, "Not enough memory segments!\n");
-			return -1;
-		}
-
-		entry = &metadata.entry[i];
-
-		/* stop on uninitialized memzone */
-		if (entry->mz.len == 0)
-			break;
-
-		/* copy metadata entry */
-		memcpy(&ivshmem_config->segment[idx].entry, entry,
-				sizeof(struct rte_ivshmem_metadata_entry));
-
-		/* copy path */
-		snprintf(ivshmem_config->segment[idx].path, path_len, "%s", path);
-
-		idx++;
-	}
-	ivshmem_config->segment_idx = idx;
-
-	return 0;
-}
-
-/* check through each segment and look for adjacent or overlapping ones. */
-static int
-cleanup_segments(struct ivshmem_segment * ms, int tbl_len)
-{
-	struct ivshmem_segment * s, * tmp;
-	int i, j, concat, seg_adjacent, seg_overlapping;
-	uint64_t start1, start2, end1, end2, p_start1, p_start2, i_start1, i_start2;
-
-	qsort(ms, tbl_len, sizeof(struct ivshmem_segment),
-				seg_compare);
-
-	while (has_overlapping_segments(ms, tbl_len) ||
-			has_adjacent_segments(ms, tbl_len)) {
-
-		for (i = 0; i < tbl_len; i++) {
-			s = &ms[i];
-
-			concat = 0;
-
-			for (j = i + 1; j < tbl_len; j++) {
-				tmp = &ms[j];
-
-				/* check if this segment is overlapping with existing segment,
-				 * or is adjacent to existing segment */
-				seg_overlapping = overlap(&s->entry.mz, &tmp->entry.mz);
-				seg_adjacent = adjacent(&s->entry.mz, &tmp->entry.mz);
-
-				/* check if segments fully overlap or are fully adjacent */
-				if ((seg_adjacent == FULL) || (seg_overlapping == FULL)) {
-
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-					RTE_LOG(DEBUG, EAL, "Concatenating segments\n");
-					RTE_LOG(DEBUG, EAL, "Segment %i:\n", i);
-					entry_dump(&s->entry);
-					RTE_LOG(DEBUG, EAL, "Segment %i:\n", j);
-					entry_dump(&tmp->entry);
-#endif
-
-					start1 = s->entry.mz.addr_64;
-					start2 = tmp->entry.mz.addr_64;
-					p_start1 = s->entry.mz.phys_addr;
-					p_start2 = tmp->entry.mz.phys_addr;
-					i_start1 = s->entry.mz.ioremap_addr;
-					i_start2 = tmp->entry.mz.ioremap_addr;
-					end1 = s->entry.mz.addr_64 + s->entry.mz.len;
-					end2 = tmp->entry.mz.addr_64 + tmp->entry.mz.len;
-
-					/* settle for minimum start address and maximum length */
-					s->entry.mz.addr_64 = RTE_MIN(start1, start2);
-					s->entry.mz.phys_addr = RTE_MIN(p_start1, p_start2);
-					s->entry.mz.ioremap_addr = RTE_MIN(i_start1, i_start2);
-					s->entry.offset = RTE_MIN(s->entry.offset, tmp->entry.offset);
-					s->entry.mz.len = RTE_MAX(end1, end2) - s->entry.mz.addr_64;
-					concat = 1;
-
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-					RTE_LOG(DEBUG, EAL, "Resulting segment:\n");
-					entry_dump(&s->entry);
-
-#endif
-				}
-				/* if segments overlap but not fully, we have an error
-				 * condition; adjacent segments can coexist.
-				 */
-				else if (seg_overlapping > 0) {
-					RTE_LOG(ERR, EAL, "Segments %i and %i overlap!\n", i, j);
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-					RTE_LOG(DEBUG, EAL, "Segment %i:\n", i);
-					entry_dump(&s->entry);
-					RTE_LOG(DEBUG, EAL, "Segment %i:\n", j);
-					entry_dump(&tmp->entry);
-#endif
-					return -1;
-				}
-				if (concat)
-					break;
-			}
-			/* if we concatenated, remove segment at j */
-			if (concat) {
-				remove_segment(ms, tbl_len, j);
-				tbl_len--;
-				break;
-			}
-		}
-	}
-
-	return tbl_len;
-}
-
-static int
-create_shared_config(void)
-{
-	char path[PATH_MAX];
-	int fd;
-
-	/* build ivshmem config file path */
-	snprintf(path, sizeof(path), IVSHMEM_CONFIG_PATH,
-			internal_config.hugefile_prefix);
-
-	fd = open(path, O_CREAT | O_RDWR, 0600);
-
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Could not open %s: %s\n", path, strerror(errno));
-		return -1;
-	}
-
-	/* try ex-locking first - if the file is locked, we have a problem */
-	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
-		RTE_LOG(ERR, EAL, "Locking %s failed: %s\n", path, strerror(errno));
-		close(fd);
-		return -1;
-	}
-
-	if (ftruncate(fd, sizeof(struct ivshmem_shared_config)) < 0) {
-		RTE_LOG(ERR, EAL, "ftruncate failed: %s\n", strerror(errno));
-		return -1;
-	}
-
-	ivshmem_config = mmap(NULL, sizeof(struct ivshmem_shared_config),
-			PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-
-	if (ivshmem_config == MAP_FAILED)
-		return -1;
-
-	memset(ivshmem_config, 0, sizeof(struct ivshmem_shared_config));
-
-	/* change the exclusive lock we got earlier to a shared lock */
-	if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
-		RTE_LOG(ERR, EAL, "Locking %s failed: %s \n", path, strerror(errno));
-		return -1;
-	}
-
-	close(fd);
-
-	return 0;
-}
-
-/* open shared config file and, if present, map the config.
- * having no config file is not an error condition, as we later check if
- * ivshmem_config is NULL (if it is, that means nothing was mapped). */
-static int
-open_shared_config(void)
-{
-	char path[PATH_MAX];
-	int fd;
-
-	/* build ivshmem config file path */
-	snprintf(path, sizeof(path), IVSHMEM_CONFIG_PATH,
-			internal_config.hugefile_prefix);
-
-	fd = open(path, O_RDONLY);
-
-	/* if the file doesn't exist, just return success */
-	if (fd < 0 && errno == ENOENT)
-		return 0;
-	/* else we have an error condition */
-	else if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Could not open %s: %s\n",
-				path, strerror(errno));
-		return -1;
-	}
-
-	/* try ex-locking first - if the lock *does* succeed, this means it's a
-	 * stray config file, so it should be deleted.
-	 */
-	if (flock(fd, LOCK_EX | LOCK_NB) != -1) {
-
-		/* if we can't remove the file, something is wrong */
-		if (unlink(path) < 0) {
-			RTE_LOG(ERR, EAL, "Could not remove %s: %s\n", path,
-					strerror(errno));
-			return -1;
-		}
-
-		/* release the lock */
-		flock(fd, LOCK_UN);
-		close(fd);
-
-		/* return success as having a stray config file is equivalent to not
-		 * having config file at all.
-		 */
-		return 0;
-	}
-
-	ivshmem_config = mmap(NULL, sizeof(struct ivshmem_shared_config),
-			PROT_READ, MAP_SHARED, fd, 0);
-
-	if (ivshmem_config == MAP_FAILED)
-		return -1;
-
-	/* place a shared lock on config file */
-	if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
-		RTE_LOG(ERR, EAL, "Locking %s failed: %s \n", path, strerror(errno));
-		return -1;
-	}
-
-	close(fd);
-
-	return 0;
-}
-
-/*
- * This function does the following:
- *
- * 1) Builds a table of ivshmem_segments with proper offset alignment
- * 2) Cleans up that table so that we don't have any overlapping or adjacent
- *    memory segments
- * 3) Creates memsegs from this table and maps them into memory.
- */
-static inline int
-map_all_segments(void)
-{
-	struct ivshmem_segment ms_tbl[RTE_MAX_MEMSEG];
-	struct ivshmem_pci_device * pci_dev;
-	struct rte_mem_config * mcfg;
-	struct ivshmem_segment * seg;
-	int fd, fd_zero;
-	unsigned i, j;
-	struct rte_memzone mz;
-	struct rte_memseg ms;
-	void * base_addr;
-	uint64_t align, len;
-	phys_addr_t ioremap_addr;
-
-	ioremap_addr = 0;
-
-	memset(ms_tbl, 0, sizeof(ms_tbl));
-	memset(&mz, 0, sizeof(struct rte_memzone));
-	memset(&ms, 0, sizeof(struct rte_memseg));
-
-	/* first, build a table of memsegs to map, to avoid failed mmaps due to
-	 * overlaps
-	 */
-	for (i = 0; i < ivshmem_config->segment_idx && i <= RTE_MAX_MEMSEG; i++) {
-		if (i == RTE_MAX_MEMSEG) {
-			RTE_LOG(ERR, EAL, "Too many segments requested!\n");
-			return -1;
-		}
-
-		seg = &ivshmem_config->segment[i];
-
-		/* copy segment to table */
-		memcpy(&ms_tbl[i], seg, sizeof(struct ivshmem_segment));
-
-		/* find ioremap addr */
-		for (j = 0; j < DIM(ivshmem_config->pci_devs); j++) {
-			pci_dev = &ivshmem_config->pci_devs[j];
-			if (!strncmp(pci_dev->path, seg->path, sizeof(pci_dev->path))) {
-				ioremap_addr = pci_dev->ioremap_addr;
-				break;
-			}
-		}
-		if (ioremap_addr == 0) {
-			RTE_LOG(ERR, EAL, "Cannot find ioremap addr!\n");
-			return -1;
-		}
-
-		/* work out alignments */
-		align = seg->entry.mz.addr_64 -
-				RTE_ALIGN_FLOOR(seg->entry.mz.addr_64, 0x1000);
-		len = RTE_ALIGN_CEIL(seg->entry.mz.len + align, 0x1000);
-
-		/* save original alignments */
-		ms_tbl[i].align = align;
-
-		/* create a memory zone */
-		mz.addr_64 = seg->entry.mz.addr_64 - align;
-		mz.len = len;
-		mz.hugepage_sz = seg->entry.mz.hugepage_sz;
-		mz.phys_addr = seg->entry.mz.phys_addr - align;
-
-		/* find true physical address */
-		mz.ioremap_addr = ioremap_addr + seg->entry.offset - align;
-
-		ms_tbl[i].entry.offset = seg->entry.offset - align;
-
-		memcpy(&ms_tbl[i].entry.mz, &mz, sizeof(struct rte_memzone));
-	}
-
-	/* clean up the segments */
-	memseg_idx = cleanup_segments(ms_tbl, ivshmem_config->segment_idx);
-
-	if (memseg_idx < 0)
-		return -1;
-
-	mcfg = rte_eal_get_configuration()->mem_config;
-
-	fd_zero = open("/dev/zero", O_RDWR);
-
-	if (fd_zero < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open /dev/zero: %s\n", strerror(errno));
-		return -1;
-	}
-
-	/* create memsegs and put them into DPDK memory */
-	for (i = 0; i < (unsigned) memseg_idx; i++) {
-
-		seg = &ms_tbl[i];
-
-		ms.addr_64 = seg->entry.mz.addr_64;
-		ms.hugepage_sz = seg->entry.mz.hugepage_sz;
-		ms.len = seg->entry.mz.len;
-		ms.nchannel = rte_memory_get_nchannel();
-		ms.nrank = rte_memory_get_nrank();
-		ms.phys_addr = seg->entry.mz.phys_addr;
-		ms.ioremap_addr = seg->entry.mz.ioremap_addr;
-		ms.socket_id = seg->entry.mz.socket_id;
-
-		base_addr = mmap(ms.addr, ms.len,
-				PROT_READ | PROT_WRITE, MAP_PRIVATE, fd_zero, 0);
-
-		if (base_addr == MAP_FAILED || base_addr != ms.addr) {
-			RTE_LOG(ERR, EAL, "Cannot map /dev/zero!\n");
-			return -1;
-		}
-
-		fd = open(seg->path, O_RDWR);
-
-		if (fd < 0) {
-			RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", seg->path,
-					strerror(errno));
-			return -1;
-		}
-
-		munmap(ms.addr, ms.len);
-
-		base_addr = mmap(ms.addr, ms.len,
-				PROT_READ | PROT_WRITE, MAP_SHARED, fd,
-				seg->entry.offset);
-
-
-		if (base_addr == MAP_FAILED || base_addr != ms.addr) {
-			RTE_LOG(ERR, EAL, "Cannot map segment into memory: "
-					"expected %p got %p (%s)\n", ms.addr, base_addr,
-					strerror(errno));
-			return -1;
-		}
-
-		RTE_LOG(DEBUG, EAL, "Memory segment mapped: %p (len %" PRIx64 ") at "
-				"offset 0x%" PRIx64 "\n",
-				ms.addr, ms.len, seg->entry.offset);
-
-		/* put the pointers back into their real positions using original
-		 * alignment */
-		ms.addr_64 += seg->align;
-		ms.phys_addr += seg->align;
-		ms.ioremap_addr += seg->align;
-		ms.len -= seg->align;
-
-		/* at this point, the rest of DPDK memory is not initialized, so we
-		 * expect memsegs to be empty */
-		memcpy(&mcfg->memseg[i], &ms,
-				sizeof(struct rte_memseg));
-
-		close(fd);
-
-		RTE_LOG(DEBUG, EAL, "IVSHMEM segment found, size: 0x%lx\n",
-				ms.len);
-	}
-
-	return 0;
-}
-
-/* this happens at a later stage, after general EAL memory initialization */
-int
-rte_eal_ivshmem_obj_init(void)
-{
-	struct rte_ring_list* ring_list = NULL;
-	struct rte_mem_config * mcfg;
-	struct ivshmem_segment * seg;
-	struct rte_memzone * mz;
-	struct rte_ring * r;
-	struct rte_tailq_entry *te;
-	unsigned i, ms, idx;
-	uint64_t offset;
-
-	/* secondary process would not need any object discovery - it'll all
-	 * already be in shared config */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY || ivshmem_config == NULL)
-		return 0;
-
-	/* check that we have an initialised ring tail queue */
-	ring_list = RTE_TAILQ_LOOKUP(RTE_TAILQ_RING_NAME, rte_ring_list);
-	if (ring_list == NULL) {
-		RTE_LOG(ERR, EAL, "No rte_ring tailq found!\n");
-		return -1;
-	}
-
-	mcfg = rte_eal_get_configuration()->mem_config;
-
-	/* create memzones */
-	for (i = 0; i < ivshmem_config->segment_idx && i <= RTE_MAX_MEMZONE; i++) {
-
-		seg = &ivshmem_config->segment[i];
-
-		/* add memzone */
-		if (mcfg->memzone_cnt == RTE_MAX_MEMZONE) {
-			RTE_LOG(ERR, EAL, "No more memory zones available!\n");
-			return -1;
-		}
-
-		idx = mcfg->memzone_cnt;
-
-		RTE_LOG(DEBUG, EAL, "Found memzone: '%s' at %p (len 0x%" PRIx64 ")\n",
-				seg->entry.mz.name, seg->entry.mz.addr, seg->entry.mz.len);
-
-		memcpy(&mcfg->memzone[idx], &seg->entry.mz,
-				sizeof(struct rte_memzone));
-
-		/* find ioremap address */
-		for (ms = 0; ms <= RTE_MAX_MEMSEG; ms++) {
-			if (ms == RTE_MAX_MEMSEG) {
-				RTE_LOG(ERR, EAL, "Physical address of segment not found!\n");
-				return -1;
-			}
-			if (CONTAINS(mcfg->memseg[ms], mcfg->memzone[idx])) {
-				offset = mcfg->memzone[idx].addr_64 -
-								mcfg->memseg[ms].addr_64;
-				mcfg->memzone[idx].ioremap_addr = mcfg->memseg[ms].ioremap_addr +
-						offset;
-				break;
-			}
-		}
-
-		mcfg->memzone_cnt++;
-	}
-
-	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
-
-	/* find rings */
-	for (i = 0; i < mcfg->memzone_cnt; i++) {
-		mz = &mcfg->memzone[i];
-
-		/* check if memzone has a ring prefix */
-		if (strncmp(mz->name, RTE_RING_MZ_PREFIX,
-				sizeof(RTE_RING_MZ_PREFIX) - 1) != 0)
-			continue;
-
-		r = (struct rte_ring*) (mz->addr_64);
-
-		te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
-		if (te == NULL) {
-			RTE_LOG(ERR, EAL, "Cannot allocate ring tailq entry!\n");
-			rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-			return -1;
-		}
-
-		te->data = (void *) r;
-
-		TAILQ_INSERT_TAIL(ring_list, te, next);
-
-		RTE_LOG(DEBUG, EAL, "Found ring: '%s' at %p\n", r->name, mz->addr);
-	}
-	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-	rte_memzone_dump(stdout);
-	rte_ring_list_dump(stdout);
-#endif
-
-	return 0;
-}
-
-/* initialize ivshmem structures */
-int rte_eal_ivshmem_init(void)
-{
-	struct rte_pci_device * dev;
-	struct rte_pci_resource * res;
-	int fd, ret;
-	char path[PATH_MAX];
-
-	/* initialize everything to 0 */
-	memset(path, 0, sizeof(path));
-	ivshmem_config = NULL;
-
-	pagesz = getpagesize();
-
-	RTE_LOG(DEBUG, EAL, "Searching for IVSHMEM devices...\n");
-
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-
-		if (open_shared_config() < 0) {
-			RTE_LOG(ERR, EAL, "Could not open IVSHMEM config!\n");
-			return -1;
-		}
-	}
-	else {
-
-		TAILQ_FOREACH(dev, &pci_device_list, next) {
-
-			if (is_ivshmem_device(dev)) {
-
-				/* IVSHMEM memory is always on BAR2 */
-				res = &dev->mem_resource[2];
-
-				/* if we don't have a BAR2 */
-				if (res->len == 0)
-					continue;
-
-				/* construct pci device path */
-				snprintf(path, sizeof(path), IVSHMEM_RESOURCE_PATH,
-						dev->addr.domain, dev->addr.bus, dev->addr.devid,
-						dev->addr.function);
-
-				/* try to find memseg */
-				fd = open(path, O_RDWR);
-				if (fd < 0) {
-					RTE_LOG(ERR, EAL, "Could not open %s\n", path);
-					return -1;
-				}
-
-				/* check if it's a DPDK IVSHMEM device */
-				ret = has_ivshmem_metadata(fd, res->len);
-
-				/* is DPDK device */
-				if (ret == 1) {
-
-					/* config file creation is deferred until the first
-					 * DPDK device is found; it needs to be created
-					 * only once. */
-					if (ivshmem_config == NULL &&
-							create_shared_config() < 0) {
-						RTE_LOG(ERR, EAL, "Could not create IVSHMEM config!\n");
-						close(fd);
-						return -1;
-					}
-
-					if (read_metadata(path, sizeof(path), fd, res->len) < 0) {
-						RTE_LOG(ERR, EAL, "Could not read metadata from"
-								" device %02x:%02x.%x!\n", dev->addr.bus,
-								dev->addr.devid, dev->addr.function);
-						close(fd);
-						return -1;
-					}
-
-					if (ivshmem_config->pci_devs_idx == RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS) {
-						RTE_LOG(WARNING, EAL,
-								"IVSHMEM PCI device limit exceeded. Increase "
-								"CONFIG_RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS  in "
-								"your config file.\n");
-						break;
-					}
-
-					RTE_LOG(INFO, EAL, "Found IVSHMEM device %02x:%02x.%x\n",
-							dev->addr.bus, dev->addr.devid, dev->addr.function);
-
-					ivshmem_config->pci_devs[ivshmem_config->pci_devs_idx].ioremap_addr = res->phys_addr;
-					snprintf(ivshmem_config->pci_devs[ivshmem_config->pci_devs_idx].path,
-							sizeof(ivshmem_config->pci_devs[ivshmem_config->pci_devs_idx].path),
-							"%s", path);
-
-					ivshmem_config->pci_devs_idx++;
-				}
-				/* failed to read */
-				else if (ret < 0) {
-					RTE_LOG(ERR, EAL, "Could not read IVSHMEM device: %s\n",
-							strerror(errno));
-					close(fd);
-					return -1;
-				}
-				/* not a DPDK device */
-				else
-					RTE_LOG(DEBUG, EAL, "Skipping non-DPDK IVSHMEM device\n");
-
-				/* close the BAR fd */
-				close(fd);
-			}
-		}
-	}
-
-	/* ivshmem_config is not NULL only if config was created and/or mapped */
-	if (ivshmem_config) {
-		if (map_all_segments() < 0) {
-			RTE_LOG(ERR, EAL, "Mapping IVSHMEM segments failed!\n");
-			return -1;
-		}
-	}
-	else {
-		RTE_LOG(DEBUG, EAL, "No IVSHMEM configuration found! \n");
-	}
-
-	return 0;
-}
-
-#endif
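
The create_shared_config()/open_shared_config() pair removed above is built on a
lock-downgrade idiom: take an exclusive flock() so only one process can create and
initialize the file, then convert it to a shared lock so readers may attach. A
minimal standalone sketch of that idiom follows (illustrative only; the function
name and path argument are made up, and error handling is trimmed, so this is not
DPDK API):

#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustrative sketch, not part of DPDK: create a shared, zeroed file of
 * 'len' bytes and downgrade the lock once it is safe for readers. */
static int
create_locked_shared_file(const char *path, size_t len, void **out)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return -1;
	/* exclusive lock first: failure means another process owns the file */
	if (flock(fd, LOCK_EX | LOCK_NB) == -1)
		goto error;
	if (ftruncate(fd, (off_t)len) < 0)
		goto error;
	*out = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*out == MAP_FAILED)
		goto error;
	memset(*out, 0, len);
	/* downgrade to a shared lock; note flock() conversion is not atomic */
	if (flock(fd, LOCK_SH | LOCK_NB) == -1)
		goto error;
	return fd; /* caller keeps fd open to hold the shared lock */
error:
	close(fd);
	return -1;
}
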
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 41e0a92..992a1b1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1436,15 +1436,8 @@ rte_eal_hugepage_init(void)
 	free(tmp_hp);
 	tmp_hp = NULL;
 
-	/* find earliest free memseg - this is needed because in case of IVSHMEM,
-	 * segments might have already been initialized */
-	for (j = 0; j < RTE_MAX_MEMSEG; j++)
-		if (mcfg->memseg[j].addr == NULL) {
-			/* move to previous segment and exit loop */
-			j--;
-			break;
-		}
-
+	/* first memseg index shall be 0 after incrementing it below */
+	j = -1;
 	for (i = 0; i < nr_hugefiles; i++) {
 		new_memseg = 0;
 
@@ -1597,15 +1590,6 @@ rte_eal_hugepage_attach(void)
 		if (mcfg->memseg[s].len == 0)
 			break;
 
-#ifdef RTE_LIBRTE_IVSHMEM
-		/*
-		 * if segment has ioremap address set, it's an IVSHMEM segment and
-		 * doesn't need mapping as it was already mapped earlier
-		 */
-		if (mcfg->memseg[s].ioremap_addr != 0)
-			continue;
-#endif
-
 		/*
 		 * fdzero is mmapped to get a contiguous block of virtual
 		 * addresses of the appropriate memseg size.
@@ -1644,16 +1628,6 @@ rte_eal_hugepage_attach(void)
 		void *addr, *base_addr;
 		uintptr_t offset = 0;
 		size_t mapping_size;
-#ifdef RTE_LIBRTE_IVSHMEM
-		/*
-		 * if segment has ioremap address set, it's an IVSHMEM segment and
-		 * doesn't need mapping as it was already mapped earlier
-		 */
-		if (mcfg->memseg[s].ioremap_addr != 0) {
-			s++;
-			continue;
-		}
-#endif
 		/*
 		 * free previously mapped memory so we can map the
 		 * hugepages into the space
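
The first eal_memory.c hunk above works because 'j' is incremented before use each
time the loop starts a new memseg: with IVSHMEM gone, no memseg slots can be
pre-populated, so the first increment must land on index 0 and the old "find the
earliest free memseg" scan becomes dead code. A condensed sketch of the loop shape
(simplified; starts_new_memseg(), init_memseg() and extend_memseg() are made-up
helpers standing in for the inline logic of the real function):

/* Condensed sketch of the rte_eal_hugepage_init() indexing logic,
 * not the verbatim DPDK code. */
int j = -1; /* pre-incremented below, so the first memseg gets index 0 */
for (int i = 0; i < nr_hugefiles; i++) {
	if (starts_new_memseg(i)) {
		j++; /* move to the next free memseg slot */
		if (j == RTE_MAX_MEMSEG)
			break;
		init_memseg(j, i);
	} else {
		extend_memseg(j, i);
	}
}
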
diff --git a/lib/librte_ivshmem/Makefile b/lib/librte_ivshmem/Makefile
deleted file mode 100644
index c099438..0000000
--- a/lib/librte_ivshmem/Makefile
+++ /dev/null
@@ -1,54 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#     * Neither the name of Intel Corporation nor the names of its
-#       contributors may be used to endorse or promote products derived
-#       from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-# library name
-LIB = librte_ivshmem.a
-
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
-
-EXPORT_MAP := rte_ivshmem_version.map
-
-LIBABIVER := 1
-
-# all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_IVSHMEM) := rte_ivshmem.c
-
-# install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_IVSHMEM)-include := rte_ivshmem.h
-
-# this lib needs EAL, ring and mempool
-DEPDIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += lib/librte_eal
-DEPDIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += lib/librte_ring
-DEPDIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += lib/librte_mempool
-
-include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ivshmem/rte_ivshmem.c b/lib/librte_ivshmem/rte_ivshmem.c
deleted file mode 100644
index c26edb6..0000000
--- a/lib/librte_ivshmem/rte_ivshmem.c
+++ /dev/null
@@ -1,919 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-#include <fcntl.h>
-#include <limits.h>
-#include <unistd.h>
-#include <sys/mman.h>
-#include <string.h>
-#include <stdio.h>
-
-#include <rte_eal_memconfig.h>
-#include <rte_memory.h>
-#include <rte_ivshmem.h>
-#include <rte_string_fns.h>
-#include <rte_common.h>
-#include <rte_log.h>
-#include <rte_debug.h>
-#include <rte_spinlock.h>
-#include <rte_common.h>
-#include <rte_malloc.h>
-
-#include "rte_ivshmem.h"
-
-#define IVSHMEM_CONFIG_FILE_FMT "/var/run/.dpdk_ivshmem_metadata_%s"
-#define IVSHMEM_QEMU_CMD_LINE_HEADER_FMT "-device ivshmem,size=%" PRIu64 "M,shm=fd%s"
-#define IVSHMEM_QEMU_CMD_FD_FMT ":%s:0x%" PRIx64 ":0x%" PRIx64
-#define IVSHMEM_QEMU_CMDLINE_BUFSIZE 1024
-#define IVSHMEM_MAX_PAGES (1 << 12)
-#define adjacent(x,y) (((x).phys_addr+(x).len)==(y).phys_addr)
-#define METADATA_SIZE_ALIGNED \
-	(RTE_ALIGN_CEIL(sizeof(struct rte_ivshmem_metadata),pagesz))
-
-#define GET_PAGEMAP_ADDR(in,addr,dlm,err)    \
-{                                      \
-	char *end;                         \
-	errno = 0;                         \
-	addr = strtoull((in), &end, 16);   \
-	if (errno != 0 || *end != (dlm)) { \
-		RTE_LOG(ERR, EAL, err);        \
-		goto error;                    \
-	}                                  \
-	(in) = end + 1;                    \
-}
-
-static int pagesz;
-
-struct memseg_cache_entry {
-	char filepath[PATH_MAX];
-	uint64_t offset;
-	uint64_t len;
-};
-
-struct ivshmem_config {
-	struct rte_ivshmem_metadata * metadata;
-	struct memseg_cache_entry memseg_cache[IVSHMEM_MAX_PAGES];
-		/**< account for multiple files per segment case */
-	struct flock lock;
-	rte_spinlock_t sl;
-};
-
-static struct ivshmem_config
-ivshmem_global_config[RTE_LIBRTE_IVSHMEM_MAX_METADATA_FILES];
-
-static rte_spinlock_t global_cfg_sl;
-
-static struct ivshmem_config *
-get_config_by_name(const char * name)
-{
-	struct rte_ivshmem_metadata * config;
-	unsigned i;
-
-	for (i = 0; i < RTE_DIM(ivshmem_global_config); i++) {
-		config = ivshmem_global_config[i].metadata;
-		if (config == NULL)
-			return NULL;
-		if (strncmp(name, config->name, IVSHMEM_NAME_LEN) == 0)
-			return &ivshmem_global_config[i];
-	}
-
-	return NULL;
-}
-
-static int
-overlap(const struct rte_memzone * s1, const struct rte_memzone * s2)
-{
-	uint64_t start1, end1, start2, end2;
-
-	start1 = s1->addr_64;
-	end1 = s1->addr_64 + s1->len;
-	start2 = s2->addr_64;
-	end2 = s2->addr_64 + s2->len;
-
-	if (start1 >= start2 && start1 < end2)
-		return 1;
-	if (start2 >= start1 && start2 < end1)
-		return 1;
-
-	return 0;
-}
-
-static struct rte_memzone *
-get_memzone_by_addr(const void * addr)
-{
-	struct rte_memzone * tmp, * mz;
-	struct rte_mem_config * mcfg;
-	int i;
-
-	mcfg = rte_eal_get_configuration()->mem_config;
-	mz = NULL;
-
-	/* find memzone for the ring */
-	for (i = 0; i < RTE_MAX_MEMZONE; i++) {
-		tmp = &mcfg->memzone[i];
-
-		if (tmp->addr_64 == (uint64_t) addr) {
-			mz = tmp;
-			break;
-		}
-	}
-
-	return mz;
-}
-
-static int
-entry_compare(const void * a, const void * b)
-{
-	const struct rte_ivshmem_metadata_entry * e1 =
-			(const struct rte_ivshmem_metadata_entry*) a;
-	const struct rte_ivshmem_metadata_entry * e2 =
-			(const struct rte_ivshmem_metadata_entry*) b;
-
-	/* move unallocated zones to the end */
-	if (e1->mz.addr == NULL && e2->mz.addr == NULL)
-		return 0;
-	if (e1->mz.addr == 0)
-		return 1;
-	if (e2->mz.addr == 0)
-		return -1;
-
-	return e1->mz.phys_addr > e2->mz.phys_addr;
-}
-
-/* fills hugepage cache entry for a given start virt_addr */
-static int
-get_hugefile_by_virt_addr(uint64_t virt_addr, struct memseg_cache_entry * e)
-{
-	uint64_t start_addr, end_addr;
-	char *start,*path_end;
-	char buf[PATH_MAX*2];
-	FILE *f;
-
-	start = NULL;
-	path_end = NULL;
-	start_addr = 0;
-
-	memset(e->filepath, 0, sizeof(e->filepath));
-
-	/* open /proc/self/maps */
-	f = fopen("/proc/self/maps", "r");
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "cannot open /proc/self/maps!\n");
-		return -1;
-	}
-
-	/* parse maps */
-	while (fgets(buf, sizeof(buf), f) != NULL) {
-
-		/* get endptr to end of start addr */
-		start = buf;
-
-		GET_PAGEMAP_ADDR(start,start_addr,'-',
-				"Cannot find start address in maps!\n");
-
-		/* if start address is bigger than our address, skip */
-		if (start_addr > virt_addr)
-			continue;
-
-		GET_PAGEMAP_ADDR(start,end_addr,' ',
-				"Cannot find end address in maps!\n");
-
-		/* if end address is less than our address, skip */
-		if (end_addr <= virt_addr)
-			continue;
-
-		/* find where the path starts */
-		start = strstr(start, "/");
-
-		if (start == NULL)
-			continue;
-
-		/* at this point, we know that this is our map.
-		 * now let's find the file */
-		path_end = strstr(start, "\n");
-		break;
-	}
-
-	if (path_end == NULL) {
-		RTE_LOG(ERR, EAL, "Hugefile path not found!\n");
-		goto error;
-	}
-
-	/* calculate offset and copy the file path */
-	snprintf(e->filepath, RTE_PTR_DIFF(path_end, start) + 1, "%s", start);
-
-	e->offset = virt_addr - start_addr;
-
-	fclose(f);
-
-	return 0;
-error:
-	fclose(f);
-	return -1;
-}
-
-/*
- * This is a complex function. What it does is the following:
- *  1. Goes through metadata and gets list of hugepages involved
- *  2. Sorts the hugepages by size (1G first)
- *  3. Goes through metadata again and writes correct offsets
- *  4. Goes through pages and finds out their filenames, offsets etc.
- */
-static int
-build_config(struct rte_ivshmem_metadata * metadata)
-{
-	struct rte_ivshmem_metadata_entry * e_local;
-	struct memseg_cache_entry * ms_local;
-	struct rte_memseg pages[IVSHMEM_MAX_PAGES];
-	struct rte_ivshmem_metadata_entry *entry;
-	struct memseg_cache_entry * c_entry, * prev_entry;
-	struct ivshmem_config * config;
-	unsigned i, j, mz_iter, ms_iter;
-	uint64_t biggest_len;
-	int biggest_idx;
-
-	/* return error if we try to use an unknown config file */
-	config = get_config_by_name(metadata->name);
-	if (config == NULL) {
-		RTE_LOG(ERR, EAL, "Cannot find IVSHMEM config %s!\n", metadata->name);
-		goto fail_e;
-	}
-
-	memset(pages, 0, sizeof(pages));
-
-	e_local = malloc(sizeof(config->metadata->entry));
-	if (e_local == NULL)
-		goto fail_e;
-	ms_local = malloc(sizeof(config->memseg_cache));
-	if (ms_local == NULL)
-		goto fail_ms;
-
-
-	/* make local copies before doing anything */
-	memcpy(e_local, config->metadata->entry, sizeof(config->metadata->entry));
-	memcpy(ms_local, config->memseg_cache, sizeof(config->memseg_cache));
-
-	qsort(e_local, RTE_DIM(config->metadata->entry), sizeof(struct rte_ivshmem_metadata_entry),
-			entry_compare);
-
-	/* first pass - collect all huge pages */
-	for (mz_iter = 0; mz_iter < RTE_DIM(config->metadata->entry); mz_iter++) {
-
-		entry = &e_local[mz_iter];
-
-		uint64_t start_addr = RTE_ALIGN_FLOOR(entry->mz.addr_64,
-				entry->mz.hugepage_sz);
-		uint64_t offset = entry->mz.addr_64 - start_addr;
-		uint64_t len = RTE_ALIGN_CEIL(entry->mz.len + offset,
-				entry->mz.hugepage_sz);
-
-		if (entry->mz.addr_64 == 0 || start_addr == 0 || len == 0)
-			continue;
-
-		int start_page;
-
-		/* find first unused page - mz are phys_addr sorted so we don't have to
-		 * look out for holes */
-		for (i = 0; i < RTE_DIM(pages); i++) {
-
-			/* skip if we already have this page */
-			if (pages[i].addr_64 == start_addr) {
-				start_addr += entry->mz.hugepage_sz;
-				len -= entry->mz.hugepage_sz;
-				continue;
-			}
-			/* we found a new page */
-			else if (pages[i].addr_64 == 0) {
-				start_page = i;
-				break;
-			}
-		}
-		if (i == RTE_DIM(pages)) {
-			RTE_LOG(ERR, EAL, "Cannot find unused page!\n");
-			goto fail;
-		}
-
-		/* populate however many pages the memzone has */
-		for (i = start_page; i < RTE_DIM(pages) && len != 0; i++) {
-
-			pages[i].addr_64 = start_addr;
-			pages[i].len = entry->mz.hugepage_sz;
-			start_addr += entry->mz.hugepage_sz;
-			len -= entry->mz.hugepage_sz;
-		}
-		/* if there's still length left */
-		if (len != 0) {
-			RTE_LOG(ERR, EAL, "Not enough space for pages!\n");
-			goto fail;
-		}
-	}
-
-	/* second pass - sort pages by size */
-	for (i = 0; i < RTE_DIM(pages); i++) {
-
-		if (pages[i].addr == NULL)
-			break;
-
-		biggest_len = 0;
-		biggest_idx = -1;
-
-		/*
-		 * browse all entries starting at 'i', and find the
-		 * entry with the biggest length
-		 */
-		for (j=i; j< RTE_DIM(pages); j++) {
-			if (pages[j].addr == NULL)
-					break;
-			if (biggest_len == 0 ||
-				pages[j].len > biggest_len) {
-				biggest_len = pages[j].len;
-				biggest_idx = j;
-			}
-		}
-
-		/* should not happen */
-		if (biggest_idx == -1) {
-			RTE_LOG(ERR, EAL, "Error sorting by size!\n");
-			goto fail;
-		}
-		if (i != (unsigned) biggest_idx) {
-			struct rte_memseg tmp;
-
-			memcpy(&tmp, &pages[biggest_idx], sizeof(struct rte_memseg));
-
-			/* we don't want to break contiguousness, so instead of just
-			 * swapping segments, we move all the preceding segments to the
-			 * right and then put the old segment @ biggest_idx in place of
-			 * segment @ i */
-			for (j = biggest_idx - 1; j >= i; j--) {
-				memcpy(&pages[j+1], &pages[j], sizeof(struct rte_memseg));
-				memset(&pages[j], 0, sizeof(struct rte_memseg));
-				if (j == 0)
-					break;
-			}
-
-			/* put old biggest segment to its new place */
-			memcpy(&pages[i], &tmp, sizeof(struct rte_memseg));
-		}
-	}
-
-	/* third pass - write correct offsets */
-	for (mz_iter = 0; mz_iter < RTE_DIM(config->metadata->entry); mz_iter++) {
-
-		uint64_t offset = 0;
-
-		entry = &e_local[mz_iter];
-
-		if (entry->mz.addr_64 == 0)
-			break;
-
-		/* find page for current memzone */
-		for (i = 0; i < RTE_DIM(pages); i++) {
-			/* we found our page */
-			if (entry->mz.addr_64 >= pages[i].addr_64 &&
-					entry->mz.addr_64 < pages[i].addr_64 + pages[i].len) {
-				entry->offset = (entry->mz.addr_64 - pages[i].addr_64) +
-						offset;
-				break;
-			}
-			offset += pages[i].len;
-		}
-		if (i == RTE_DIM(pages)) {
-			RTE_LOG(ERR, EAL, "Page not found!\n");
-			goto fail;
-		}
-	}
-
-	ms_iter = 0;
-	prev_entry = NULL;
-
-	/* fourth pass - create proper memseg cache */
-	for (i = 0; i < RTE_DIM(pages) &&
-			ms_iter <= RTE_DIM(config->memseg_cache); i++) {
-		if (pages[i].addr_64 == 0)
-			break;
-
-
-		if (ms_iter == RTE_DIM(pages)) {
-			RTE_LOG(ERR, EAL, "The universe has collapsed!\n");
-			goto fail;
-		}
-
-		c_entry = &ms_local[ms_iter];
-		c_entry->len = pages[i].len;
-
-		if (get_hugefile_by_virt_addr(pages[i].addr_64, c_entry) < 0)
-			goto fail;
-
-		/* if previous entry has the same filename and is contiguous,
-		 * clear current entry and increase previous entry's length
-		 */
-		if (prev_entry != NULL &&
-				strncmp(c_entry->filepath, prev_entry->filepath,
-				sizeof(c_entry->filepath)) == 0 &&
-				prev_entry->offset + prev_entry->len == c_entry->offset) {
-			prev_entry->len += pages[i].len;
-			memset(c_entry, 0, sizeof(struct memseg_cache_entry));
-		}
-		else {
-			prev_entry = c_entry;
-			ms_iter++;
-		}
-	}
-
-	/* update current configuration with new valid data */
-	memcpy(config->metadata->entry, e_local, sizeof(config->metadata->entry));
-	memcpy(config->memseg_cache, ms_local, sizeof(config->memseg_cache));
-
-	free(ms_local);
-	free(e_local);
-
-	return 0;
-fail:
-	free(ms_local);
-fail_ms:
-	free(e_local);
-fail_e:
-	return -1;
-}
-
-static int
-add_memzone_to_metadata(const struct rte_memzone * mz,
-		struct ivshmem_config * config)
-{
-	struct rte_ivshmem_metadata_entry * entry;
-	unsigned i, idx;
-	struct rte_mem_config *mcfg;
-
-	if (mz->len == 0) {
-		RTE_LOG(ERR, EAL, "Trying to add an empty memzone\n");
-		return -1;
-	}
-
-	rte_spinlock_lock(&config->sl);
-
-	mcfg = rte_eal_get_configuration()->mem_config;
-
-	/* prevents the memzone from being freed while we add it to the metadata */
-	rte_rwlock_write_lock(&mcfg->mlock);
-
-	/* find free slot in this config */
-	for (i = 0; i < RTE_DIM(config->metadata->entry); i++) {
-		entry = &config->metadata->entry[i];
-
-		if (entry->mz.addr_64 != 0 && overlap(mz, &entry->mz)) {
-			RTE_LOG(ERR, EAL, "Overlapping memzones!\n");
-			goto fail;
-		}
-
-		/* if addr is zero, the memzone is probably free */
-		if (entry->mz.addr_64 == 0) {
-			RTE_LOG(DEBUG, EAL, "Adding memzone '%s' at %p to metadata %s\n",
-					mz->name, mz->addr, config->metadata->name);
-			memcpy(&entry->mz, mz, sizeof(struct rte_memzone));
-
-			/* run config file parser */
-			if (build_config(config->metadata) < 0)
-				goto fail;
-
-			break;
-		}
-	}
-
-	/* if we reached the maximum, that means we have no place in config */
-	if (i == RTE_DIM(config->metadata->entry)) {
-		RTE_LOG(ERR, EAL, "No space left in IVSHMEM metadata %s!\n",
-				config->metadata->name);
-		goto fail;
-	}
-
-	idx = ((uintptr_t)mz - (uintptr_t)mcfg->memzone);
-	idx = idx / sizeof(struct rte_memzone);
-
-	/* mark the memzone not freeable */
-	mcfg->memzone[idx].ioremap_addr = mz->phys_addr;
-
-	rte_rwlock_write_unlock(&mcfg->mlock);
-	rte_spinlock_unlock(&config->sl);
-	return 0;
-fail:
-	rte_rwlock_write_unlock(&mcfg->mlock);
-	rte_spinlock_unlock(&config->sl);
-	return -1;
-}
-
-static int
-add_ring_to_metadata(const struct rte_ring * r,
-		struct ivshmem_config * config)
-{
-	struct rte_memzone * mz;
-
-	mz = get_memzone_by_addr(r);
-
-	if (!mz) {
-		RTE_LOG(ERR, EAL, "Cannot find memzone for ring!\n");
-		return -1;
-	}
-
-	return add_memzone_to_metadata(mz, config);
-}
-
-static int
-add_mempool_memzone_to_metadata(const void *addr,
-		struct ivshmem_config *config)
-{
-	struct rte_memzone *mz;
-
-	mz = get_memzone_by_addr(addr);
-
-	if (!mz) {
-		RTE_LOG(ERR, EAL, "Cannot find memzone for mempool!\n");
-		return -1;
-	}
-
-	return add_memzone_to_metadata(mz, config);
-}
-
-static int
-add_mempool_to_metadata(const struct rte_mempool *mp,
-		struct ivshmem_config *config)
-{
-	struct rte_mempool_memhdr *memhdr;
-	int ret;
-
-	ret = add_mempool_memzone_to_metadata(mp, config);
-	if (ret < 0)
-		return -1;
-
-	STAILQ_FOREACH(memhdr, &mp->mem_list, next) {
-		ret = add_mempool_memzone_to_metadata(memhdr->addr, config);
-		if (ret < 0)
-			return -1;
-	}
-
-	/* mempool consists of memzone and ring */
-	return add_ring_to_metadata(mp->pool_data, config);
-}
-
-int
-rte_ivshmem_metadata_add_ring(const struct rte_ring * r, const char * name)
-{
-	struct ivshmem_config * config;
-
-	if (name == NULL || r == NULL)
-		return -1;
-
-	config = get_config_by_name(name);
-
-	if (config == NULL) {
-		RTE_LOG(ERR, EAL, "Cannot find IVSHMEM config %s!\n", name);
-		return -1;
-	}
-
-	return add_ring_to_metadata(r, config);
-}
-
-int
-rte_ivshmem_metadata_add_memzone(const struct rte_memzone * mz, const char * name)
-{
-	struct ivshmem_config * config;
-
-	if (name == NULL || mz == NULL)
-		return -1;
-
-	config = get_config_by_name(name);
-
-	if (config == NULL) {
-		RTE_LOG(ERR, EAL, "Cannot find IVSHMEM config %s!\n", name);
-		return -1;
-	}
-
-	return add_memzone_to_metadata(mz, config);
-}
-
-int
-rte_ivshmem_metadata_add_mempool(const struct rte_mempool * mp, const char * name)
-{
-	struct ivshmem_config * config;
-
-	if (name == NULL || mp == NULL)
-		return -1;
-
-	config = get_config_by_name(name);
-
-	if (config == NULL) {
-		RTE_LOG(ERR, EAL, "Cannot find IVSHMEM config %s!\n", name);
-		return -1;
-	}
-
-	return add_mempool_to_metadata(mp, config);
-}
-
-static inline void
-ivshmem_config_path(char *buffer, size_t bufflen, const char *name)
-{
-	snprintf(buffer, bufflen, IVSHMEM_CONFIG_FILE_FMT, name);
-}
-
-
-
-static inline
-void *ivshmem_metadata_create(const char *name, size_t size,
-		struct flock *lock)
-{
-	int retval, fd;
-	void *metadata_addr;
-	char pathname[PATH_MAX];
-
-	ivshmem_config_path(pathname, sizeof(pathname), name);
-
-	fd = open(pathname, O_RDWR | O_CREAT, 0660);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open '%s'\n", pathname);
-		return NULL;
-	}
-
-	size = METADATA_SIZE_ALIGNED;
-
-	retval = fcntl(fd, F_SETLK, lock);
-	if (retval < 0){
-		close(fd);
-		RTE_LOG(ERR, EAL, "Cannot create lock on '%s'. Is another "
-				"process using it?\n", pathname);
-		return NULL;
-	}
-
-	retval = ftruncate(fd, size);
-	if (retval < 0){
-		close(fd);
-		RTE_LOG(ERR, EAL, "Cannot resize '%s'\n", pathname);
-		return NULL;
-	}
-
-	metadata_addr = mmap(NULL, size,
-				PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-
-	if (metadata_addr == MAP_FAILED){
-		RTE_LOG(ERR, EAL, "Cannot mmap memory for '%s'\n", pathname);
-
-		/* we don't care if we can't unlock */
-		fcntl(fd, F_UNLCK, lock);
-		close(fd);
-
-		return NULL;
-	}
-
-	return metadata_addr;
-}
-
-int rte_ivshmem_metadata_create(const char *name)
-{
-	struct ivshmem_config * ivshmem_config;
-	unsigned index;
-
-	if (pagesz == 0)
-		pagesz = getpagesize();
-
-	if (name == NULL)
-		return -1;
-
-	rte_spinlock_lock(&global_cfg_sl);
-
-	for (index = 0; index < RTE_DIM(ivshmem_global_config); index++) {
-		if (ivshmem_global_config[index].metadata == NULL) {
-			ivshmem_config = &ivshmem_global_config[index];
-			break;
-		}
-	}
-
-	if (index == RTE_DIM(ivshmem_global_config)) {
-		RTE_LOG(ERR, EAL, "Cannot create more ivshmem config files. "
-		"Maximum has been reached\n");
-		rte_spinlock_unlock(&global_cfg_sl);
-		return -1;
-	}
-
-	ivshmem_config->lock.l_type = F_WRLCK;
-	ivshmem_config->lock.l_whence = SEEK_SET;
-
-	ivshmem_config->lock.l_start = 0;
-	ivshmem_config->lock.l_len = METADATA_SIZE_ALIGNED;
-
-	ivshmem_global_config[index].metadata = ((struct rte_ivshmem_metadata *)
-			ivshmem_metadata_create(
-					name,
-					sizeof(struct rte_ivshmem_metadata),
-					&ivshmem_config->lock));
-
-	if (ivshmem_global_config[index].metadata == NULL) {
-		rte_spinlock_unlock(&global_cfg_sl);
-		return -1;
-	}
-
-	/* Metadata setup */
-	memset(ivshmem_config->metadata, 0, sizeof(struct rte_ivshmem_metadata));
-	ivshmem_config->metadata->magic_number = IVSHMEM_MAGIC;
-	snprintf(ivshmem_config->metadata->name,
-			sizeof(ivshmem_config->metadata->name), "%s", name);
-
-	rte_spinlock_unlock(&global_cfg_sl);
-
-	return 0;
-}
-
-int
-rte_ivshmem_metadata_cmdline_generate(char *buffer, unsigned size, const char *name)
-{
-	const struct memseg_cache_entry * ms_cache, *entry;
-	struct ivshmem_config * config;
-	char cmdline[IVSHMEM_QEMU_CMDLINE_BUFSIZE], *cmdline_ptr;
-	char cfg_file_path[PATH_MAX];
-	unsigned remaining_len, tmplen, iter;
-	uint64_t shared_mem_size, zero_size, total_size;
-
-	if (buffer == NULL || name == NULL)
-		return -1;
-
-	config = get_config_by_name(name);
-
-	if (config == NULL) {
-		RTE_LOG(ERR, EAL, "Config %s not found!\n", name);
-		return -1;
-	}
-
-	rte_spinlock_lock(&config->sl);
-
-	/* prepare metadata file path */
-	snprintf(cfg_file_path, sizeof(cfg_file_path), IVSHMEM_CONFIG_FILE_FMT,
-			config->metadata->name);
-
-	ms_cache = config->memseg_cache;
-
-	cmdline_ptr = cmdline;
-	remaining_len = sizeof(cmdline);
-
-	shared_mem_size = 0;
-	iter = 0;
-
-	while ((ms_cache[iter].len != 0) && (iter < RTE_DIM(config->metadata->entry))) {
-
-		entry = &ms_cache[iter];
-
-		/* Offset and sizes within the current pathname */
-		tmplen = snprintf(cmdline_ptr, remaining_len, IVSHMEM_QEMU_CMD_FD_FMT,
-				entry->filepath, entry->offset, entry->len);
-
-		shared_mem_size += entry->len;
-
-		cmdline_ptr = RTE_PTR_ADD(cmdline_ptr, tmplen);
-		remaining_len -= tmplen;
-
-		if (remaining_len == 0) {
-			RTE_LOG(ERR, EAL, "Command line too long!\n");
-			rte_spinlock_unlock(&config->sl);
-			return -1;
-		}
-
-		iter++;
-	}
-
-	total_size = rte_align64pow2(shared_mem_size + METADATA_SIZE_ALIGNED);
-	zero_size = total_size - shared_mem_size - METADATA_SIZE_ALIGNED;
-
-	/* add /dev/zero to command-line to fill the space */
-	tmplen = snprintf(cmdline_ptr, remaining_len, IVSHMEM_QEMU_CMD_FD_FMT,
-			"/dev/zero",
-			(uint64_t)0x0,
-			zero_size);
-
-	cmdline_ptr = RTE_PTR_ADD(cmdline_ptr, tmplen);
-	remaining_len -= tmplen;
-
-	if (remaining_len == 0) {
-		RTE_LOG(ERR, EAL, "Command line too long!\n");
-		rte_spinlock_unlock(&config->sl);
-		return -1;
-	}
-
-	/* add metadata file to the end of command-line */
-	tmplen = snprintf(cmdline_ptr, remaining_len, IVSHMEM_QEMU_CMD_FD_FMT,
-			cfg_file_path,
-			(uint64_t)0x0,
-			METADATA_SIZE_ALIGNED);
-
-	cmdline_ptr = RTE_PTR_ADD(cmdline_ptr, tmplen);
-	remaining_len -= tmplen;
-
-	if (remaining_len == 0) {
-		RTE_LOG(ERR, EAL, "Command line too long!\n");
-		rte_spinlock_unlock(&config->sl);
-		return -1;
-	}
-
-	/* if current length of the command line is bigger than the buffer supplied
-	 * by the user, or if command-line is bigger than what IVSHMEM accepts */
-	if ((sizeof(cmdline) - remaining_len) > size) {
-		RTE_LOG(ERR, EAL, "Buffer is too short!\n");
-		rte_spinlock_unlock(&config->sl);
-		return -1;
-	}
-	/* complete the command-line */
-	snprintf(buffer, size,
-			IVSHMEM_QEMU_CMD_LINE_HEADER_FMT,
-			total_size >> 20,
-			cmdline);
-
-	rte_spinlock_unlock(&config->sl);
-
-	return 0;
-}
-
-void
-rte_ivshmem_metadata_dump(FILE *f, const char *name)
-{
-	unsigned i = 0;
-	struct ivshmem_config * config;
-	struct rte_ivshmem_metadata_entry *entry;
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-	uint64_t addr;
-	uint64_t end, hugepage_sz;
-	struct memseg_cache_entry e;
-#endif
-
-	if (name == NULL)
-		return;
-
-	/* return error if we try to use an unknown config file */
-	config = get_config_by_name(name);
-	if (config == NULL) {
-		RTE_LOG(ERR, EAL, "Cannot find IVSHMEM config %s!\n", name);
-		return;
-	}
-
-	rte_spinlock_lock(&config->sl);
-
-	entry = &config->metadata->entry[0];
-
-	while (entry->mz.addr != NULL && i < RTE_DIM(config->metadata->entry)) {
-
-		fprintf(f, "Entry %u: name:<%-20s>, phys:0x%-15lx, len:0x%-15lx, "
-			"virt:%-15p, off:0x%-15lx\n",
-			i,
-			entry->mz.name,
-			entry->mz.phys_addr,
-			entry->mz.len,
-			entry->mz.addr,
-			entry->offset);
-		i++;
-
-#ifdef RTE_LIBRTE_IVSHMEM_DEBUG
-		fprintf(f, "\tHugepage files:\n");
-
-		hugepage_sz = entry->mz.hugepage_sz;
-		addr = RTE_ALIGN_FLOOR(entry->mz.addr_64, hugepage_sz);
-		end = addr + RTE_ALIGN_CEIL(entry->mz.len + (entry->mz.addr_64 - addr),
-				hugepage_sz);
-
-		for (; addr < end; addr += hugepage_sz) {
-			memset(&e, 0, sizeof(e));
-
-			get_hugefile_by_virt_addr(addr, &e);
-
-			fprintf(f, "\t0x%"PRIx64 "-0x%" PRIx64 " offset: 0x%" PRIx64 " %s\n",
-					addr, addr + hugepage_sz, e.offset, e.filepath);
-		}
-#endif
-		entry++;
-	}
-
-	rte_spinlock_unlock(&config->sl);
-}
diff --git a/lib/librte_ivshmem/rte_ivshmem.h b/lib/librte_ivshmem/rte_ivshmem.h
deleted file mode 100644
index a5d36d6..0000000
--- a/lib/librte_ivshmem/rte_ivshmem.h
+++ /dev/null
@@ -1,165 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef RTE_IVSHMEM_H_
-#define RTE_IVSHMEM_H_
-
-#include <rte_memzone.h>
-#include <rte_mempool.h>
-
-/**
- * @file
- *
- * The RTE IVSHMEM interface provides functions to create metadata files
- * describing memory segments to be shared via QEMU IVSHMEM.
- */
-
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define IVSHMEM_MAGIC 0x0BADC0DE
-#define IVSHMEM_NAME_LEN 32
-
-/**
- * Structure that holds IVSHMEM shared metadata entry.
- */
-struct rte_ivshmem_metadata_entry {
-	struct rte_memzone mz;	/**< shared memzone */
-	uint64_t offset;	/**< offset of memzone within IVSHMEM device */
-};
-
-/**
- * Structure that holds IVSHMEM metadata.
- */
-struct rte_ivshmem_metadata {
-	int magic_number;				/**< magic number */
-	char name[IVSHMEM_NAME_LEN];	/**< name of the metadata file */
-	struct rte_ivshmem_metadata_entry entry[RTE_LIBRTE_IVSHMEM_MAX_ENTRIES];
-			/**< metadata entries */
-};
-
-/**
- * Creates metadata file with a given name
- *
- * @param name
- *  Name of metadata file to be created
- *
- * @return
- *  - On success, zero
- *  - On failure, a negative value
- */
-int rte_ivshmem_metadata_create(const char * name);
-
-/**
- * Adds memzone to a specific metadata file
- *
- * @param mz
- *  Memzone to be added
- * @param md_name
- *  Name of metadata file for the memzone to be added to
- *
- * @return
- *  - On success, zero
- *  - On failure, a negative value
- */
-int rte_ivshmem_metadata_add_memzone(const struct rte_memzone * mz,
-		const char * md_name);
-
-/**
- * Adds a ring descriptor to a specific metadata file
- *
- * @param r
- *  Ring descriptor to be added
- * @param md_name
- *  Name of metadata file for the ring to be added to
- *
- * @return
- *  - On success, zero
- *  - On failure, a negative value
- */
-int rte_ivshmem_metadata_add_ring(const struct rte_ring * r,
-		const char * md_name);
-
-/**
- * Adds a mempool to a specific metadata file
- *
- * @param mp
- *  Mempool to be added
- * @param md_name
- *  Name of metadata file for the mempool to be added to
- *
- * @return
- *  - On success, zero
- *  - On failure, a negative value
- */
-int rte_ivshmem_metadata_add_mempool(const struct rte_mempool * mp,
-		const char * md_name);
-
-
-/**
- * Generates the QEMU command-line for IVSHMEM device for a given metadata file.
- * This function is to be called after all the objects have been added.
- *
- * @param buffer
- *  Buffer to be filled with the command line arguments.
- * @param size
- *  Size of the buffer.
- * @param name
- *  Name of metadata file to generate QEMU command-line parameters for
- *
- * @return
- *  - On success, zero
- *  - On failure, a negative value
- */
-int rte_ivshmem_metadata_cmdline_generate(char *buffer, unsigned size,
-		const char *name);
-
-
-/**
- * Dump all metadata entries from a given metadata file to the console.
- *
- * @param f
- *   A pointer to a file for output
- * @param name
- *  Name of the metadata file to be dumped to console.
- */
-void rte_ivshmem_metadata_dump(FILE *f, const char *name);
-
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* RTE_IVSHMEM_H_ */
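
For context on what this header removal retires: a host application typically drove
the API declared above along these lines (a sketch only; "demo_md" and the wrapper
function are made-up names, and error handling is reduced to early returns):

#include <stdio.h>

#include <rte_ring.h>
#include <rte_ivshmem.h>

/* Illustrative usage of the removed IVSHMEM metadata API: register a ring
 * in a metadata file and emit the QEMU command line that exposes it. */
static int
export_ring_to_guest(const struct rte_ring *r)
{
	char cmdline[1024];

	if (rte_ivshmem_metadata_create("demo_md") < 0)
		return -1;
	if (rte_ivshmem_metadata_add_ring(r, "demo_md") < 0)
		return -1;
	/* produces the "-device ivshmem,..." arguments to pass to QEMU */
	if (rte_ivshmem_metadata_cmdline_generate(cmdline,
			sizeof(cmdline), "demo_md") < 0)
		return -1;
	printf("%s\n", cmdline);
	return 0;
}
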
diff --git a/lib/librte_ivshmem/rte_ivshmem_version.map b/lib/librte_ivshmem/rte_ivshmem_version.map
deleted file mode 100644
index 5a393dd..0000000
--- a/lib/librte_ivshmem/rte_ivshmem_version.map
+++ /dev/null
@@ -1,12 +0,0 @@
-DPDK_2.0 {
-	global:
-
-	rte_ivshmem_metadata_add_mempool;
-	rte_ivshmem_metadata_add_memzone;
-	rte_ivshmem_metadata_add_ring;
-	rte_ivshmem_metadata_cmdline_generate;
-	rte_ivshmem_metadata_create;
-	rte_ivshmem_metadata_dump;
-
-	local: *;
-};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index eb28e11..1a0095b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -62,7 +62,6 @@ _LDLIBS-y += -L$(RTE_SDK_BIN)/lib
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
-_LDLIBS-$(CONFIG_RTE_LIBRTE_IVSHMEM)        += -lrte_ivshmem
 endif
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PIPELINE)       += -lrte_pipeline
-- 
1.9.1

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v1] doc: add template release notes for 16.11
@ 2016-07-29 11:23  6% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2016-07-29 11:23 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Add template release notes for DPDK 16.11 with inline
comments and explanations of the various sections.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_16_11.rst | 205 +++++++++++++++++++++++++++++++++
 2 files changed, 206 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_16_11.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 52c63b4..7e51b2c 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -36,6 +36,7 @@ Release Notes
     :numbered:
 
     rel_description
+    release_16_11
     release_16_07
     release_16_04
     release_2_2
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
new file mode 100644
index 0000000..a6e3307
--- /dev/null
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -0,0 +1,205 @@
+DPDK Release 16.11
+==================
+
+.. **Read this first.**
+
+   The text below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text: ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      firefox build/doc/html/guides/rel_notes/release_16_11.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to understand
+     the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like this.
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     This section is a comment. Make sure to start the actual text at the margin.
+
+
+Resolved Issues
+---------------
+
+.. This section should contain bug fixes added to the relevant sections. Sample format:
+
+   * **code/section Fixed issue in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description of the resolved issue in the past tense.
+     The title should contain the code/lib section like a commit message.
+     Add the entries in alphabetic order in the relevant sections below.
+
+   This section is a comment. Make sure to start the actual text at the margin.
+
+
+EAL
+~~~
+
+
+Drivers
+~~~~~~~
+
+
+Libraries
+~~~~~~~~~
+
+
+Examples
+~~~~~~~~
+
+
+Other
+~~~~~
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+   This section is a comment. Make sure to start the actual text at the margin.
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. Make sure to start the actual text at the margin.
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change that was announced in
+     the previous releases and made in this release. Use fixed width quotes for
+     ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. Make sure to start the actual text at the margin.
+
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+``
+   sign, like this:
+
+     libethdev.so.4
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     libethdev.so.4
+     librte_acl.so.2
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_cryptodev.so.1
+     librte_distributor.so.1
+     librte_eal.so.2
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_ivshmem.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.2
+     librte_mempool.so.2
+     librte_meter.so.1
+     librte_pdump.so.1
+     librte_pipeline.so.3
+     librte_pmd_bond.so.1
+     librte_pmd_ring.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_table.so.2
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this release.
+
+   The format is:
+
+   #. Platform name.
+
+      * Platform details.
+      * Platform details.
+
+   This section is a comment. Make sure to start the actual text at the margin.
+
+
+Tested NICs
+-----------
+
+.. This section should contain a list of NICs that were tested with this release.
+
+   The format is:
+
+   #. NIC name.
+
+      * NIC details.
+      * NIC details.
+
+   This section is a comment. Make sure to start the actual text at the margin.
+
+
+Tested OSes
+-----------
+
+.. This section should contain a list of OSes that were tested with this release.
+   The format is as follows, in alphabetical order:
+
+   * CentOS 7.0
+   * Fedora 23
+   * Fedora 24
+   * FreeBSD 10.3
+   * Red Hat Enterprise Linux 7.2
+   * SUSE Enterprise Linux 12
+   * Ubuntu 15.10
+   * Ubuntu 16.04 LTS
+   * Wind River Linux 8
+
+   This section is a comment. Make sure to start the actual text at the margin.
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change of struct rte_port_source_params and rte_port_sink_params
  2016-07-27 10:42  7%     ` Thomas Monjalon
@ 2016-07-28 18:28  4%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-28 18:28 UTC (permalink / raw)
  To: Zhang, Roy Fan
  Cc: Dumitrescu, Cristian, dev, Panu Matilainen, Singh, Jasvinder

2016-07-27 12:42, Thomas Monjalon:
> 2016-07-27 10:08, Dumitrescu, Cristian:
> > As Thomas mentioned, today is probably the last day to discuss ABI changes. This one is pretty small and straightforward, any issues with it?
> > 
> > Panu had a concern that the change from "char *" to "const char *" is too small to be regarded as ABI breakage and we should simply go ahead and do it. My conservative proposal was to put a notice anyway.
> > 
> > Nonetheless, what I would like to get from Thomas and Panu is a path forward for this now:
> > a) If we agree to consider this an ABI change, please merge the notice for 16.7;
> 
> Panu was noticing 3 things (and I agree with them):
> - it is an API change
> - they can be grouped in only one list item
> - it is better to wait having more changes to break an API
> 
> About the third point, in this specific case, I think it is acceptable because:
> - it should not break the ABI
> - the impact of the API change is really small
> - I'm not sure the packet framework should be considered as a DPDK API.
> 
> > b) If we agree this is too small for an ABI change, please let us agree now
> > to accept our quick patch for 16.11 for this change.
> 
> For an API deprecation notice (reworded),
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-27  8:33  4%   ` Thomas Monjalon
@ 2016-07-28 18:04  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-28 18:04 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jerin.jacob, bruce.richardson

> > For 16.11, the mbuf structure will be modified implying ABI breakage.
> > Some discussions already took place here:
> > http://www.dpdk.org/dev/patchwork/patch/12878/
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: John Daley <johndale@cisco.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 16:25  7%                           ` Jerin Jacob
@ 2016-07-28 17:07  4%                             ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-28 17:07 UTC (permalink / raw)
  To: Jerin Jacob, Ananyev, Konstantin, Tomasz Kulasek; +Cc: dev

2016-07-28 21:55, Jerin Jacob:
> On Thu, Jul 28, 2016 at 04:52:45PM +0200, Thomas Monjalon wrote:
> > 2016-07-28 19:29, Jerin Jacob:
> > > Above things worries me, I wouldn't have cared if the changes are not comes
> > > in fastpath and I don't think this sort of issues will never get fixed any time
> > > soon in this community.
> > > 
> > > So I given up.
> > 
> > I feel something goes wrong here but I cannot understand your sentence.
> > Please could you reword/explain Jerin?
> 
> I guess you have removed the context from the email. Never mind.
> 
> 1) IMHO, introducing a new fast path API which has a "performance impact"
> on other existing PMDs should get consensus from the other PMD maintainers.
> At the bare minimum, send a patch much in advance with the
> implementation of the ethdev API as well as a PMD
> driver implementation, to get feedback from other developers _before_ the ABI
> change announcement, rather than just debating hypothetical points.

I totally agree with you and it was my first comment in this thread:
	http://dpdk.org/ml/archives/dev/2016-July/044366.html
Unfortunately it is difficult to have a formal process so it is not
so strict currently. You are welcome to suggest how to improve the
process for the next releases.

> 2) What I can understand from the discussion is that it is a
> workaround for an HW limitation.
> At this point, I am not sure tx_prep is the only way to address it, and
> do other PMDs have a similar
> restriction? If yes, can we abstract it in a proper way so that the usage
> is very clear from the PMD and application perspective?

I feel the tx_prep can be interesting to solve a current problem.
However, as you say, there are maybe other solutions to consider.
That's why I think we can keep this deprecation notice and follow up
with a patch-based discussion. We will be able to discard this change
if something better is found.
As an example, we have just removed a deprecation notice which has
never been implemented:
	http://dpdk.org/browse/dpdk/commit/?id=16695af340
I know this process is not perfect and the ethdev API is far from perfect,
so we must continue improving our process to define a good API.

Konstantin, Tomasz,
I generally prefer waiting for a consensus. For this case, I'll make an
exception and apply the deprecation notice.
Please make an effort to better explain your next patches and meet
a clear consensus. We'll review your patches very carefully and keep
the right to reject them.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 14:52  4%                         ` Thomas Monjalon
@ 2016-07-28 16:25  7%                           ` Jerin Jacob
  2016-07-28 17:07  4%                             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-07-28 16:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Ananyev, Konstantin, dev

On Thu, Jul 28, 2016 at 04:52:45PM +0200, Thomas Monjalon wrote:
> 2016-07-28 19:29, Jerin Jacob:
> > Above things worries me, I wouldn't have cared if the changes are not comes
> > in fastpath and I don't think this sort of issues will never get fixed any time
> > soon in this community.
> > 
> > So I given up.
> 
> I feel something goes wrong here but I cannot understand your sentence.
> Please could you reword/explain Jerin?

I guess you have removed the context from the email. Never mind.

1) IMHO, introducing a new fast path API which has a "performance impact"
on other existing PMDs should get consensus from the other PMD maintainers.
At the bare minimum, send a patch much in advance with the
implementation of the ethdev API as well as a PMD
driver implementation, to get feedback from other developers _before_ the ABI
change announcement, rather than just debating hypothetical points.

2) What I can understand from the discussion is that it is a
workaround for an HW limitation.
At this point, I am not sure tx_prep is the only way to address it, and
do other PMDs have a similar
restriction? If yes, can we abstract it in a proper way so that the usage
is very clear from the PMD and application perspective?

Jerin

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] doc: announce ivshmem support removal
  2016-07-28  9:20  3%   ` Christian Ehrhardt
@ 2016-07-28 15:23  0%     ` Mauricio Vasquez
  0 siblings, 0 replies; 200+ results
From: Mauricio Vasquez @ 2016-07-28 15:23 UTC (permalink / raw)
  To: Christian Ehrhardt, Jan Viktorin; +Cc: Thomas Monjalon, anatoly.burakov, dev

Hello All,

Here in Politecnico di Torino we use the ivshmem technology from a 
research point of view.

Our research efforts focus on optimizing the inter-Virtual Network
Function communication, currently we have implemented two versions of 
our prototype, they are described in [1] and [2].

Unfortunately, we do not have the human resources to implement the 
improvements that are necessary for ivshmem in DPDK; however, we could
provide some feedback and testing for possible patches.

Best Regards,

Mauricio Vasquez.

[1] 
https://www.researchgate.net/publication/305699120_Transparent_Optimization_of_Inter-Virtual_Network_Function_Communication_in_Open_vSwitch

[2] 
https://www.researchgate.net/publication/305699122_A_Transparent_Highway_for_inter-Virtual_Network_Function_Communication_with_Open_vSwitch


On 07/28/2016 11:20 AM, Christian Ehrhardt wrote:
> Hi Thomas,
> just my two cents as Ubuntu DPDK maintainer (and part of the Debian Team
> that does the same).
> (It seems I can reuse that line for all posts about the deprecation notices
> :-) )
>
> While IVSHMEM was enabled (as it was the default) I never heard of any
> users of what we provided so far.
> But that is expected considering that not all qemu bits are landed either.
> Since it will follow the process of "deprecation notice -> grace period ->
> ABI bump" on removal, I think packaging and consuming applications should
> be fine.
>
> I'd agree with Hiroshi that, if really needed, a clean re-implementation more
> properly separated from EAL is likely the best way to go.
>
> I think it is a good change to drop rather complex code in favor of
> stabilizing the main paths:
> Acked-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
>
>
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
>
> On Wed, Jul 27, 2016 at 9:08 PM, Jan Viktorin <viktorin@rehivetech.com>
> wrote:
>
>> On Wed, 20 Jul 2016 18:35:46 +0200
>> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>>
>>> There was a prior call with an explanation of what needs to be done:
>>>        http://dpdk.org/ml/archives/dev/2016-June/040844.html
>>> - Qemu patch upstreamed
>>> - IVSHMEM PCI device managed by a PCI driver
>>> - No DPDK objects (ring/mempool) allocated by EAL
>>>
>>> As nobody seems interested, it is time to remove this code which
>>> makes EAL improvements harder.
>>>
>>> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
>>> Acked-by: David Marchand <david.marchand@6wind.com>
>>> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Acked-by: Jan Viktorin <viktorin@rehivetech.com>
>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption
  @ 2016-07-28 14:54  3%     ` Alex Williamson
  0 siblings, 0 replies; 200+ results
From: Alex Williamson @ 2016-07-28 14:54 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: Thomas Monjalon, dev

On Thu, 28 Jul 2016 09:42:13 +0000
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

> > Hi,
> > 
> > 2016-07-27 16:14, Alex Williamson:  
> > > I took a quick look at the dpdk vfio code and spotted an invalid
> > > assumption that should probably be corrected ASAP.  
> > 
> > It can theoretically be a bug but the value may never change in the kernel,
> > right?
> > So when you say ASAP, I feel it can wait for the next DPDK release (we plan to
> > release today).
> > Do you agree?  
> 
> Unless there are imminent plans to change this in the kernel, I think it can wait for next release.

I don't have any plans to change it, but this relationship is not a
guaranteed part of the ABI.  I reserve the right to make such changes
in the future.  Thanks,

Alex

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:59  4%                       ` Jerin Jacob
@ 2016-07-28 14:52  4%                         ` Thomas Monjalon
  2016-07-28 16:25  7%                           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-28 14:52 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Ananyev, Konstantin, dev

2016-07-28 19:29, Jerin Jacob:
> Above things worries me, I wouldn't have cared if the changes are not comes
> in fastpath and I don't think this sort of issues will never get fixed any time
> soon in this community.
> 
> So I given up.

I feel something goes wrong here but I cannot understand your sentence.
Please could you reword/explain Jerin?

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:58  4%                       ` Olivier MATZ
@ 2016-07-28 14:21  4%                         ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2016-07-28 14:21 UTC (permalink / raw)
  To: Olivier MATZ, Jerin Jacob; +Cc: Thomas Monjalon, dev

Hi Olivier,

> 
> Hi,
> 
> Jumping into this thread, it looks like it's the last pending patch remaining for the release.
> 
> For reference, the idea of tx_prep() was mentioned some time ago in http://dpdk.org/ml/archives/dev/2014-May/002504.html
> 
> Few comments below.
> 
> On 07/28/2016 03:01 PM, Ananyev, Konstantin wrote:
> > Right now, to make HW TX offloads work, the user is required to take particular actions:
> > 1. set mbuf.ol_flags properly.
> > 2. setup mbuf.tx_offload fields properly.
> > 3. update L3/L4 header fields in a particular way.
> >
> > We move #3 into tx_prep(), to hide that complexity from the user and simplify things for him.
> > Though if he still prefers to do #3  by himself - that's ok too.
> 
> I think moving #3 out of the application is a good idea. Today, for TSO, the offload DPDK API requires setting a specific pseudo header
> checksum (which does not include the ip len, as expected by Intel drivers), and set the IP checksum to 0.
> 
> In our app, the network stack sets the TCP checksum to the standard pseudo header checksum, and before sending the mbuf:
> - packets are split in sw if the driver does not support tso
> - the tcp csum is patched to match dpdk api if the driver supports tso
> 
> In the patchs I've recently sent adding tso support for virtio-pmd, it conforms to the dpdk API (phdr csum without ip len), so the tx function
> need to patch the mbuf data inside the driver, which is something what we want to avoid, for some good reasons explained by Konstantin.

Yep, that would be another good use-case for tx_prep() I suppose.

> 
> So, I think having a tx_prep would also be the occasion to standardize a bit the dpdk offload api, and leave the driver-specific stuff in tx_prep().
> 
> Konstantin, any opinion about this?

Yes, that sounds like a good thing to me.

> 
> >>> Another main purpose of tx_prep(): for multi-segment packets is to
> >>> check that number of segments doesn't exceed  HW limit.
> >>> Again right now users have to do that on their own.
> 
> If calling tx_prep() is optional, does it mean that this check may be done twice? (once in tx_prep() and once in tx_burst())

I meant 'optional' in the sense that if the user doesn't want to use tx_prep() and
does step #3 above on his own (what happens now), that is still ok.
But I think step #3 (modify packet's data) still needs to be done before tx_burst()  is called for the packets.

> 
> What would be the advantage of doing this check in tx_prep() instead of keeping it in tx_burst(), as it does not touch the mbuf data?
> 
> >>> 3.  Having it as a separate function would allow user control when/where
> >>>        to call it, let say only for some packets, or probably call tx_prep()
> >>>        on one core, and do actual tx_burst() for these packets on the other.
> 
> Yes, from what I remember, the pipeline model was the main reason why we do not just modify the packet in tx_burst(). Right?

Yes.

> 
> >>>>> If you refer as lost cycles here something like:
> >>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> >>>>> yes.
> >>>>> Though comparing to actual work need to be done for most HW TX
> >>>>> offloads, I think it is neglectable.
> >>>>
> >>>> Not sure.
> 
> I think doing this test on a per-bulk basis should not impact performance.
> 
> > To be honest, I don't understand what is your concern here.
> > That proposed change doesn't break any existing functionality, doesn't
> > introduce any new requirements to the existing API, and wouldn't
> > introduce any performance regression for existing apps.
> It is an extension, and the user is free not to use it if it doesn't fit his needs.
> >  From other side there are users who are interested in that
> > functionality, and they do have use-cases for  it.
> 
> In my opinion, using tx_prep() will implicitly become mandatory as soon as the application want to do offload. An application that won't use
> it will have to prepare the mbuf, and this preparation will depend on the device, which is not acceptable inside an application.

Yes, I also hope that most apps that do use TX offloads will start to use it,
as I think it will be a much more convenient way than what we have right now.
I just wanted to emphasize that the user wouldn't be forced to.
Konstantin

> 
> 
> So, to conclude, the api change notification looks good to me, even if there is still some room to discuss the implementation details.
> 
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:01  4%                     ` Ananyev, Konstantin
  2016-07-28 13:58  4%                       ` Olivier MATZ
@ 2016-07-28 13:59  4%                       ` Jerin Jacob
  2016-07-28 14:52  4%                         ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Jerin Jacob @ 2016-07-28 13:59 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

On Thu, Jul 28, 2016 at 01:01:16PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > > >
> > > > Not according to the proposal. It can't be, either, as the application has no
> > > > idea what the PMD driver does in "prep", or what the implication on the
> > > > HW is if the application does not call it.
> > >
> > > Why application writer wouldn't have an idea?
> > > We would document what tx_prep() supposed to do, and in what cases user don't need it.
> > 
> > But how does he/she detect that at run-time?
> 
> By the application logic for example.
> If, let's say, it is doing l2fwd for that group of packets, it would know
> that it doesn't need to do tx_prep().
> 
> To be honest, I don't understand what is your concern here.
> That proposed change doesn't break any existing functionality,
> doesn't introduce any new requirements to the existing API, 
> and wouldn't introduce any performance regression for existing apps.

Yes for existing applications, but no for ANY application that uses tx_prep() in the future
and runs on a PMD where the callback is NULL (one/two PMDs vs N PMDs).

> It is an extension, and the user is free not to use it if it doesn't fit his needs.

If it is a workaround for specific HW, then why change the normative
"fastpath" ethdev specification? You could provide your fixup as an internal
PMD driver routine and be done with it. It is as simple as that.

> From other side there are users who are interested in that functionality,
> and they do have use-cases for  it.
> So what worries you?
Above things worries me, I wouldn't have cared if the changes are not comes
in fastpath and I don't think this sort of issues will never get fixed any time
soon in this community.

So I given up.

Jerin

> Konstantin
> 
> > 
> > > Then it would be up to the user:
> > > - not to use it at all (one segment per packet, no HW TX offloads)
> > 
> > We already have TX flags for that
> > 
> > > - not to use tx_prep(), and make necessary preparations himself,
> > >   that what people have to do now.
> > > - use tx_prep()
> > >
> > > Konstantin
> > >

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 13:01  4%                     ` Ananyev, Konstantin
@ 2016-07-28 13:58  4%                       ` Olivier MATZ
  2016-07-28 14:21  4%                         ` Ananyev, Konstantin
  2016-07-28 13:59  4%                       ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Olivier MATZ @ 2016-07-28 13:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, Jerin Jacob; +Cc: Thomas Monjalon, dev

Hi,

Jumping into this thread, it looks like it's the last pending patch remaining
for the release.

For reference, the idea of tx_prep() was mentioned some time ago in
http://dpdk.org/ml/archives/dev/2014-May/002504.html

Few comments below.

On 07/28/2016 03:01 PM, Ananyev, Konstantin wrote:
> Right now, to make HW TX offloads work, the user is required to take particular actions:
> 1. set mbuf.ol_flags properly.
> 2. setup mbuf.tx_offload fields properly.
> 3. update L3/L4 header fields in a particular way.
>
> We move #3 into tx_prep(), to hide that complexity from the user and simplify things for him.
> Though if he still prefers to do #3  by himself - that's ok too.

I think moving #3 out of the application is a good idea. Today, for TSO, 
the offload DPDK API requires setting a specific pseudo header checksum
(which does not include the ip len, as expected by Intel drivers), and 
set the IP checksum to 0.

In our app, the network stack sets the TCP checksum to the standard 
pseudo header checksum, and before sending the mbuf:
- packets are split in sw if the driver does not support tso
- the tcp csum is patched to match dpdk api if the driver supports tso

In the patches I've recently sent adding tso support for virtio-pmd, it
conforms to the dpdk API (phdr csum without ip len), so the tx function
needs to patch the mbuf data inside the driver, which is something
we want to avoid, for some good reasons explained by Konstantin.
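To make that concrete, here is roughly what the application has to do
today for TSO over IPv4 on Intel HW (a minimal sketch using rte_ip.h /
rte_mbuf.h helpers; m is the mbuf, mss the TCP MSS, header offsets are
assumed already known, error handling omitted):

	struct ipv4_hdr *ip;
	struct tcp_hdr *tcp;

	/* flags and offload sizes */
	m->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IP_CKSUM | PKT_TX_IPV4;
	m->l2_len = sizeof(struct ether_hdr);
	m->l3_len = sizeof(struct ipv4_hdr);
	m->l4_len = sizeof(struct tcp_hdr);
	m->tso_segsz = mss;

	ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
	tcp = rte_pktmbuf_mtod_offset(m, struct tcp_hdr *,
			m->l2_len + m->l3_len);

	/* the driver-specific part tx_prep() is meant to hide: Intel HW
	 * wants the IP csum zeroed and a pseudo header csum that does
	 * not include the ip len */
	ip->hdr_checksum = 0;
	tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);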

So, I think having a tx_prep would also be the occasion to standardize a 
bit the dpdk offload api, and leave the driver-specific stuff in tx_prep().

Konstantin, any opinion about this?

>>> Another main purpose of tx_prep(): for multi-segment packets is to
>>> check that number of segments doesn't exceed  HW limit.
>>> Again right now users have to do that on their own.

If calling tx_prep() is optional, does it mean that this check may be 
done twice? (once in tx_prep() and once in tx_burst())

What would be the advantage of doing this check in tx_prep() instead of 
keeping it in tx_burst(), as it does not touch the mbuf data?
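For reference, such an application-side check could look something like
the sketch below (nb_seg_max and nb_mtu_seg_max are the fields proposed
in this thread, not yet part of the API; a real application would cache
dev_info at init time instead of querying it per packet):

	/* returns 0 if the packet fits the device segment limits */
	static int
	check_seg_limits(uint8_t port, const struct rte_mbuf *m)
	{
		struct rte_eth_dev_info info;
		uint16_t lim;

		rte_eth_dev_info_get(port, &info);
		lim = (m->ol_flags & PKT_TX_TCP_SEG) ?
			info.tx_desc_lim.nb_seg_max :    /* TSO */
			info.tx_desc_lim.nb_mtu_seg_max; /* non-TSO */
		return m->nb_segs <= lim ? 0 : -1;
	}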

>>> 3.  Having it as a separate function would allow user control when/where
>>>        to call it, let say only for some packets, or probably call tx_prep()
>>>        on one core, and do actual tx_burst() for these packets on the other.

Yes, from what I remember, the pipeline model was the main reason why we 
do not just modify the packet in tx_burst(). Right?

>>>>> If you refer as lost cycles here something like:
>>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
>>>>> yes.
>>>>> Though comparing to actual work need to be done for most HW TX
>>>>> offloads, I think it is neglectable.
>>>>
>>>> Not sure.

I think doing this test on a per-bulk basis should not impact performance.

> To be honest, I don't understand what is your concern here.
> That proposed change doesn't break any existing functionality,
> doesn't introduce any new requirements to the existing API,
> and wouldn't introduce any performance regression for existing apps.
> It is an extension, and the user is free not to use it if it doesn't fit his needs.
>  From other side there are users who are interested in that functionality,
> and they do have use-cases for  it.

In my opinion, using tx_prep() will implicitly become mandatory as soon 
as the application want to do offload. An application that won't use it 
will have to prepare the mbuf, and this preparation will depend on the 
device, which is not acceptable inside an application.


So, to conclude, the api change notification looks good to me, even if 
there is still some room to discuss the implementation details.


Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 11:38  4%                   ` Jerin Jacob
  2016-07-28 12:07  4%                     ` Avi Kivity
@ 2016-07-28 13:01  4%                     ` Ananyev, Konstantin
  2016-07-28 13:58  4%                       ` Olivier MATZ
  2016-07-28 13:59  4%                       ` Jerin Jacob
  1 sibling, 2 replies; 200+ results
From: Ananyev, Konstantin @ 2016-07-28 13:01 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Thomas Monjalon, dev



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, July 28, 2016 12:39 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> 
> On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
> > > If it does not cope up then it can skip tx'ing in the actual tx
> > > burst itself and move the "skipped" tx packets to end of the list in
> > > the tx burst so that application can take the action on "skipped"
> > > packet after the tx burst
> >
> > Sorry, that's too cryptic for me.
> > Can you reword it somehow?
> 
> OK.
> 1) let's say the application requests 32 packets to send using tx_burst.
> 2) packets are from p0 to p31
> 3) in the driver, due to some reason, it is not able to send some packets due to constraints in the driver (say, everything except p2 and p16
> is sent successfully by the driver)
> 4) the driver can move p2 and p16 to pkt[0] and pkt[1] in tx_burst and return 30
> 5) the application can take action on p2 and p16 based on the return value of 30 (i.e. 32-30 = 2 packets need to be handled, at pkt[0] and pkt[1])

That would introduce packet reordering and unnecessarily complicate the PMD TX functions.
Again it would require changes in *all* existing PMD tx functions.
So we don't plan to do things that way.

> 
> 
> >
> > >
> > >
> > > > Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> > > > Please read the original Tomasz's patch, I think he explained
> > > > possible use-cases with lot of details.
> > >
> > > Sorry, it is not very clear in terms of use cases.
> >
> > Ok, what I meant to say:
> > Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> > - setup ipv4 header cksum field.
> > - calculate the pseudo header checksum
> > - setup tcp/udp cksum field.
> >
> > Rules how these calculations need to be done and which fields need to
> > be updated, may vary depending on HW underneath and requested offloads.
> > tx_prep() - supposed to hide all these nuances from user and allow him
> > to use TX HW offloads in a transparent way.
> 
> Not sure I understand it completely. Bit contradicting with below statement
> |We would document what tx_prep() supposed to do, and in what cases user
> |don't need it.

How does that contradict?
Right now, to make HW TX offloads work, the user is required to take particular actions:
1. set mbuf.ol_flags properly.
2. setup mbuf.tx_offload fields properly.
3. update L3/L4 header fields in a particular way.

We move #3 into tx_prep(), to hide that complexity from the user and simplify things for him.
Though if he still prefers to do #3 by himself - that's ok too.
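As a rough sketch of that division of labor (assuming the proposed
rte_eth_tx_prep() mirrors the tx_burst() signature, as in Tomasz's RFC;
port, queue, pkts, nb_pkts and mss assumed defined):

	/* #1 and #2 stay with the application */
	m->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IP_CKSUM | PKT_TX_IPV4;
	m->l2_len = sizeof(struct ether_hdr);
	m->l3_len = sizeof(struct ipv4_hdr);
	m->l4_len = sizeof(struct tcp_hdr);
	m->tso_segsz = mss;

	/* #3 (the HW-specific header/cksum fixups) moves into the PMD */
	nb_prep = rte_eth_tx_prep(port, queue, pkts, nb_pkts);
	nb_sent = rte_eth_tx_burst(port, queue, pkts, nb_prep);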
 
> 
> How about introducing a new ethdev generic eal command-line mode OR new ethdev_configure hint that PMD driver is in "tx_prep-
> >tx_burst" mode instead of just tx_burst? That way no fast-path performance degradation for the PMD that does not need it

User might want to send different packets over different devices,
or even over different queues over the same device.
Or even he might want to call tx_prep() for one group of packets,
but skip it for a different group of packets on the same TX queue.
So I think we should allow the user to decide when/where to call it.

> 
> 
> >
> > Another main purpose of tx_prep(): for multi-segment packets is to
> > check that number of segments doesn't exceed  HW limit.
> > Again right now users have to do that on their own.
> >
> > >
> > > In HW perspective, It it tries to avoid the illegal state. But not
> > > sure calling "back to back" tx prepare and then tx burst how does it
> > > improve the situation as the check illegal state check introduce in
> > > actual tx burst it self.
> > >
> > > In SW perspective, its try to avoid sending malformed packets. In my
> > > view the same can achieved with existing tx burst it self as PMD is
> > > the one finally send the packets on the wire.
> >
> > Ok, so your question is: why not to put that functionality into
> > tx_burst() itself, right?
> > For few reasons:
> > 1. putting that functionality into tx_burst() would introduce unnecessary
> >     slowdown for cases when that functionality is not needed
> >     (one segment per packet, no HW offloads).
> 
> These parameters can be configured on init time

Not always, see above.

> 
> > 2. User might not want to use tx_prep() - he/she might have his/her
> >     own analog, which he/she believes is faster/smarter, etc.
> 
> That's the current mode. Right?

Yes.

> > 3.  Having it as a separate function would allow user control when/where
> >       to call it, let say only for some packets, or probably call tx_prep()
> >       on one core, and do actual tx_burst() for these packets on the other.


> Why to process it under tx_prep() as application can always process the packet in one core

Because not every application wants to do it over the same core.
Some apps would like to do it on the same core, some apps would like to do it on a different core.
With the proposed API both models are possible.
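For example, a split pipeline could look roughly like this (sketch only,
assuming the proposed rte_eth_tx_prep(); ring and burst sizes are
illustrative):

	/* core A: prepare packets and hand them over through a ring */
	nb = rte_eth_tx_prep(port, queue, pkts, nb_pkts);
	rte_ring_enqueue_burst(ring, (void **)pkts, nb);

	/* core B: transmit what core A prepared */
	nb = rte_ring_dequeue_burst(ring, (void **)pkts, BURST_SZ);
	rte_eth_tx_burst(port, queue, pkts, nb);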

> 
> > >
> > > proposal quote:
> > >
> > > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> > >    packet burst to be safely transmitted on device for desired HW
> > >    offloads (set/reset checksum field according to the hardware
> > >    requirements) and check HW constraints (number of segments per
> > >    packet, etc).
> > >
> > >    While the limitations and requirements may differ for devices, it
> > >    requires to extend rte_eth_dev structure with new function pointer
> > >    "tx_pkt_prep" which can be implemented in the driver to prepare and
> >    verify packets, in a device-specific way, before burst, which should
> >    prevent the application from sending malformed packets.
> > >
> > >
> > > >
> > > > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> > > >
> > > > If you refer as lost cycles here something like:
> > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> > > > yes.
> > > > Though comparing to actual work need to be done for most HW TX
> > > > offloads, I think it is neglectable.
> > >
> > > Not sure.
> > >
> > > > Again, as I said before, it is totally voluntary for the application.
> > >
> > > Not according to the proposal. It can't be, either, as the application has no
> > > idea what the PMD driver does in "prep", or what the implication on the
> > > HW is if the application does not call it.
> >
> > Why application writer wouldn't have an idea?
> > We would document what tx_prep() supposed to do, and in what cases user don't need it.
> 
> But how does he/she detect that at run-time?

By the application logic for example.
If, let's say, it is doing l2fwd for that group of packets, it would know
that it doesn't need to do tx_prep().

To be honest, I don't understand what is your concern here.
That proposed change doesn't break any existing functionality,
doesn't introduce any new requirements to the existing API, 
and wouldn't introduce any performance regression for existing apps.
It is an extension, and the user is free not to use it if it doesn't fit his needs.
From the other side, there are users who are interested in that functionality,
and they do have use-cases for it.
So what worries you?
Konstantin

> 
> > Then it would be up to the user:
> > - not to use it at all (one segment per packet, no HW TX offloads)
> 
> We already have TX flags for that
> 
> > - not to use tx_prep(), and make necessary preparations himself,
> >   that what people have to do now.
> > - use tx_prep()
> >
> > Konstantin
> >

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] DPDK Stable Releases and Long Term Support
@ 2016-07-28 12:33  3% Mcnamara, John
  2016-08-17 12:29  5% ` Panu Matilainen
  0 siblings, 1 reply; 200+ results
From: Mcnamara, John @ 2016-07-28 12:33 UTC (permalink / raw)
  To: dev


This document sets out the guidelines for DPDK Stable Releases and Long Term
Support releases (LTS) based on the initial RFC and comments:
http://dpdk.org/ml/archives/dev/2016-June/040256.html.

In particular it incorporates suggestions for a Stable Release structure as
well as a Long Term Support release.


Introduction
------------

The purpose of the DPDK Stable Releases will be to maintain releases of DPDK
with backported fixes over an extended period of time. This will provide
downstream consumers of DPDK with a stable target on which to base
applications or packages.

The Long Term Support release (LTS) will be a designation applied to a Stable
Release to indicate longer support.


Stable Releases
---------------

Any major release of DPDK can be designated as a Stable Release if a
maintainer volunteers to maintain it.

A Stable Release will be used to backport fixes from an N release back to an N-1
release, for example, from 16.11 to 16.07.

The duration of a stable release should be one complete release cycle. It can
be longer, up to 1 year, if a maintainer continues to support the stable
branch, or if users supply backported fixes; however, the explicit commitment
should be for one release cycle.

The release cadence can be determined by the maintainer based on the number of
bugfixes and the criticality of the bugs. However, releases should be
coordinated with the validation engineers to ensure that a tagged release has
been tested.


LTS Release
-----------

A stable release can be designated as an LTS release based on community
agreement and a commitment from a maintainer. An LTS release will have a
maintenance duration of 2 years.

It is anticipated that there should be at least 4 releases per year of the LTS
or approximately 1 every 3 months. However, the cadence can be shorter or
longer depending on the number and criticality of the backported
fixes. Releases should be coordinated with the validation engineers to ensure
that a tagged release has been tested.


Initial Stable Release
----------------------

The initial DPDK Stable Release will be 16.07. It will be viewed as a trial of
the Stable Release/LTS policy to determine the best working practices
for DPDK.

The maintainer for the initial release will be Yuanhan Liu
<yuanhan.liu@linux.intel.com>. It is hoped that other community members will
volunteer as maintainers for other Stable Releases.

The initial targeted release for LTS is proposed to be 16.11 based on the
results of the work carried out on the 16.07 Stable Release.

A list has been set up for Stable Release/LTS specific discussions:
<stable@dpdk.org>. This address can also be used for CCing maintainers on bug
fix submissions.


What changes should be backported
---------------------------------

The backporting should be limited to bug fixes.

Features should not be backported to stable releases. It may be acceptable, in
limited cases, to backport features for the LTS release where:

* There is a justifiable use case (for example a new PMD).
* The change is non-invasive.
* The work of preparing the backport is done by the proposer.
* There is support within the community.


Testing
-------

Stable and LTS releases should be tested before release/tagging.

Intel will provide validation engineers to test the 16.07 Stable Release and
the initial LTS tree. Other community members should provide testing for other
stable releases.

The validation will consist of compilation testing on the range of OSes
supported by the master release and functional/performance testing on the
current major/LTS release of the following OSes:

* Ubuntu
* RHEL
* SuSE
* FreeBSD


Releasing
---------

A Stable Release will be released by:

* Tagging the release with YY.MM.nn (year, month, number) or similar.
* Uploading a tarball of the release to dpdk.org.
* Sending an announcement to the <announce@dpdk.org> list.


ABI
---

The Stable Release should not be seen as a way of breaking or circumventing
the DPDK ABI policy.


Review of the Stable Release/LTS guidelines
-------------------------------------------

This document serves as a set of guidelines for the planned Stable
Releases/LTS activities. However, the actual process can be reviewed and
amended over time, based on experiences and feedback.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 11:38  4%                   ` Jerin Jacob
@ 2016-07-28 12:07  4%                     ` Avi Kivity
  2016-07-28 13:01  4%                     ` Ananyev, Konstantin
  1 sibling, 0 replies; 200+ results
From: Avi Kivity @ 2016-07-28 12:07 UTC (permalink / raw)
  To: Jerin Jacob, Ananyev, Konstantin; +Cc: Thomas Monjalon, dev



On 07/28/2016 02:38 PM, Jerin Jacob wrote:
> On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
>>> If it does not cope up then it can skip tx'ing in the actual tx burst
>>> itself and move the "skipped" tx packets to end of the list in the tx
>>> burst so that application can take the action on "skipped" packet after
>>> the tx burst
>> Sorry, that's too cryptic for me.
>> Can you reword it somehow?
> OK.
> 1) let's say the application requests 32 packets to send using tx_burst.
> 2) packets are from p0 to p31
> 3) in the driver, due to some reason, it is not able to send some packets due to
> constraints in the driver (say, everything except p2 and p16 is sent
> successfully by the driver)
> 4) the driver can move p2 and p16 to pkt[0] and pkt[1] in tx_burst and
> return 30
> 5) the application can take action on p2 and p16 based on the return value of
> 30 (i.e. 32-30 = 2 packets need to be handled, at pkt[0] and pkt[1])

That can cause reordering; while it is legal, it reduces TCP performance.

Better to preserve the application-provided order.


>
>>>
>>>> Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
>>>> Please read the original Tomasz's patch, I think he explained possible use-cases
>>>> with lot of details.
>>> Sorry, it is not very clear in terms of use cases.
>> Ok, what I meant to say:
>> Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
>> - setup ipv4 header cksum field.
>> - calculate the pseudo header checksum
>> - setup tcp/udp cksum field.
>>
>> Rules how these calculations need to be done and which fields need to be updated,
>> may vary depending on HW underneath and requested offloads.
>> tx_prep() - supposed to hide all these nuances from user and allow him to use TX HW offloads
>> in a transparent way.
> Not sure I understand it completely. Bit contradicting with below
> statement
> |We would document what tx_prep() supposed to do, and in what cases user
> |don't need it.
>
> How about introducing a new ethdev generic eal command-line mode OR
> new ethdev_configure hint that PMD driver is in "tx_prep->tx_burst" mode
> instead of just tx_burst? That way no fast-path performance degradation
> for the PMD that does not need it
>
>
>> Another main purpose of tx_prep(): for multi-segment packets is to check
>> that number of segments doesn't exceed  HW limit.
>> Again right now users have to do that on their own.
>>
>>> In HW perspective, It it tries to avoid the illegal state. But not sure
>>> calling "back to back" tx prepare and then tx burst how does it improve the
>>> situation as the check illegal state check introduce in actual tx burst
>>> it self.
>>>
>>> In SW perspective, its try to avoid sending malformed packets. In my
>>> view the same can achieved with existing tx burst it self as PMD is the
>>> one finally send the packets on the wire.
>> Ok, so your question is: why not to put that functionality into
>> tx_burst() itself, right?
>> For few reasons:
>> 1. putting that functionality into tx_burst() would introduce unnecessary
>>      slowdown for cases when that functionality is not needed
>>      (one segment per packet, no HW offloads).
> These parameters can be configured on init time
>
>> 2. User might not want to use tx_prep() - he/she might have his/her
>>      own analog, which he/she believes is faster/smarter, etc.
> That's the current mode. Right?
>> 3.  Having it as a separate function would allow user control when/where
>>        to call it, let say only for some packets, or probably call tx_prep()
>>        on one core, and do actual tx_burst() for these packets on the other.
> Why to process it under tx_prep() as application can always process the
> packet in one core
>
>>> proposal quote:
>>>
>>> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>>>     packet burst to be safely transmitted on device for desired HW
>>>     offloads (set/reset checksum field according to the hardware
>>>     requirements) and check HW constraints (number of segments per
>>>     packet, etc).
>>>
>>>     While the limitations and requirements may differ for devices, it
>>>     requires to extend rte_eth_dev structure with new function pointer
>>>     "tx_pkt_prep" which can be implemented in the driver to prepare and
>>>     verify packets, in a device-specific way, before burst, which should
>>>     prevent the application from sending malformed packets.
>>>
>>>
>>>>> and what if the PMD does not implement that callback then it is of waste cycles. Right?
>>>> If you refer as lost cycles here something like:
>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
>>>> then yes.
>>>> Though comparing to actual work need to be done for most HW TX offloads,
>>>> I think it is neglectable.
>>> Not sure.
>>>
>>>> Again, as I said before, it is totally voluntary for the application.
>>> Not according to the proposal. It can't be, either, as the application has no idea
>>> what the PMD driver does in "prep", or what the implication on the HW is if
>>> the application does not call it.
>> Why application writer wouldn't have an idea?
>> We would document what tx_prep() supposed to do, and in what cases user don't need it.
> But how does he/she detect that at run-time?
>
>> Then it would be up to the user:
>> - not to use it at all (one segment per packet, no HW TX offloads)
> We already have TX flags for that
>
>> - not to use tx_prep(), and make necessary preparations himself,
>>    that what people have to do now.
>> - use tx_prep()
>>
>> Konstantin
>>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 15:24 11% ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
  2016-07-21 22:48  4%   ` Ananyev, Konstantin
@ 2016-07-28 12:04  4%   ` Avi Kivity
  1 sibling, 0 replies; 200+ results
From: Avi Kivity @ 2016-07-28 12:04 UTC (permalink / raw)
  To: Tomasz Kulasek, dev; +Cc: Vladislav Zolotarov, Takuya ASADA

On 07/21/2016 06:24 PM, Tomasz Kulasek wrote:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
>
> As discussed in that thread:
>
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
>
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
>
>   - Max number of fragments per packet allowed
>   - Max number of fragments per TSO segments
>   - The way pseudo-header checksum should be pre-calculated
>   - L3/L4 header fields filling
>   - etc.
>
>
> MOTIVATION:
> -----------
>
> 1) Some work cannot (and should not) be done in rte_eth_tx_burst.
>     However, this work is sometimes required, and now, it's an
>     application issue.
>
> 2) Different hardware may have different requirements for TX offloads,
>     a different subset may be supported, and so on.
>
> 3) Some parameters (eg. number of segments in the ixgbe driver) may hang
>     the device. These parameters may vary for different devices.
>
>     For example, i40e HW allows 8 fragments per packet, but that is after
>     TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.
>
> 4) Fields in the packet may require different initialization (e.g. some
>     devices require pseudo-header checksum precalculation, sometimes in a
>     different way depending on packet type, and so on). Now the application
>     needs to take care of it.
>
> 5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
>     the application prepare the packet burst in a form acceptable to the device.
>
> 6) Some additional checks may be done in debug mode keeping tx_burst
>     implementation clean.

Thanks a lot for this.  Seastar suffered from this issue and had to 
apply NIC-specific workarounds.

The proposal will work well for seastar.

>
> PROPOSAL:
> ---------
>
> To help the user deal with all these varieties, we propose to:
>
> 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
>     packet burst to be safely transmitted on device for desired HW
>     offloads (set/reset checksum field according to the hardware
>     requirements) and check HW constraints (number of segments per
>     packet, etc).
>
>     While the limitations and requirements may differ for devices, it
>     requires to extend rte_eth_dev structure with new function pointer
>     "tx_pkt_prep" which can be implemented in the driver to prepare and
>     verify packets, in a device-specific way, before burst, which should
>     prevent the application from sending malformed packets.
>
> 2. Also new fields will be introduced in rte_eth_desc_lim:
>     nb_seg_max and nb_mtu_seg_max, providing information about the max
>     number of segments in TSO and non-TSO packets acceptable to the device.
>
>     This information is useful for the application to avoid creating
>     malformed packets.
>
>
> APPLICATION (CASE OF USE):
> --------------------------
>
> 1) Application should initialize the burst of packets to send, set
>     required tx offload flags and required fields, like l2_len, l3_len,
>     l4_len, and tso_segsz
>
> 2) Application passes burst to the rte_eth_tx_prep to check conditions
>     required to send packets through the NIC.
>
> 3) The result of rte_eth_tx_prep can be used to send valid packets
>     and/or restore invalid ones if the function fails.
>
> eg.
>
> 	for (i = 0; i < nb_pkts; i++) {
>
> 		/* initialize or process packet */
>
> 		bufs[i]->tso_segsz = 800;
> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
> 				| PKT_TX_IP_CKSUM;
> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
> 	}
>
> 	/* Prepare burst of TX packets */
> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
>
> 	if (nb_prep < nb_pkts) {
> 		printf("tx_prep failed\n");
>
> 		/* drop or restore invalid packets */
>
> 	}
>
> 	/* Send burst of TX packets */
> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
>
> 	/* Free any unsent packets. */
>
>
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>   doc/guides/rel_notes/deprecation.rst |    7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>   * The mempool functions for single/multi producer/consumer are deprecated and
>     will be removed in 16.11.
>     It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of packet burst to meet HW specific requirements before
> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about the number of
> +  segments the device can transmit for TSO/non-TSO packets.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28 10:36  4%                 ` Ananyev, Konstantin
@ 2016-07-28 11:38  4%                   ` Jerin Jacob
  2016-07-28 12:07  4%                     ` Avi Kivity
  2016-07-28 13:01  4%                     ` Ananyev, Konstantin
  0 siblings, 2 replies; 200+ results
From: Jerin Jacob @ 2016-07-28 11:38 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

On Thu, Jul 28, 2016 at 10:36:07AM +0000, Ananyev, Konstantin wrote:
> > If it does not cope up then it can skip tx'ing in the actual tx burst
> > itself and move the "skipped" tx packets to end of the list in the tx
> > burst so that application can take the action on "skipped" packet after
> > the tx burst
> 
> Sorry, that's too cryptic for me.
> Can you reword it somehow?

OK. 
1) let's say the application requests 32 packets to send using tx_burst.
2) packets are from p0 to p31
3) in the driver, due to some reason, it is not able to send some packets due to
constraints in the driver (say, everything except p2 and p16 is sent
successfully by the driver)
4) the driver can move p2 and p16 to pkt[0] and pkt[1] in tx_burst and
return 30
5) the application can take action on p2 and p16 based on the return value of
30 (i.e. 32-30 = 2 packets need to be handled, at pkt[0] and pkt[1])
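For clarity, the compaction in steps 4-5 could be implemented inside a
driver roughly as below (sketch only; xxx_tx_burst() and xxx_xmit_one()
are hypothetical names for a driver's burst and per-packet send paths):

	static uint16_t
	xxx_tx_burst(void *txq, struct rte_mbuf **pkts, uint16_t nb_pkts)
	{
		uint16_t i, unsent = 0;

		for (i = 0; i < nb_pkts; i++) {
			if (xxx_xmit_one(txq, pkts[i]) != 0)
				/* skipped: compact towards the front;
				 * unsent <= i, so only slots of already
				 * sent packets are overwritten */
				pkts[unsent++] = pkts[i];
		}
		/* pkts[0 .. unsent-1] now hold the skipped packets */
		return (uint16_t)(nb_pkts - unsent);
	}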


> 
> > 
> > 
> > > Instead it just setups the ol_flags, fills tx_offload fields and calls tx_prep().
> > > Please read the original Tomasz's patch, I think he explained possible use-cases
> > > with lot of details.
> > 
> > Sorry, it is not very clear in terms of use cases.
> 
> Ok, what I meant to say:
> Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> - setup ipv4 header cksum field.
> - calculate the pseudo header checksum
> - setup tcp/udp cksum field.
> 
> Rules how these calculations need to be done and which fields need to be updated,
> may vary depending on HW underneath and requested offloads.
> tx_prep() - supposed to hide all these nuances from user and allow him to use TX HW offloads
> in a transparent way.

Not sure I understand it completely. Bit contradicting with below
statement
|We would document what tx_prep() supposed to do, and in what cases user
|don't need it.

How about introducing a new ethdev generic eal command-line mode OR
new ethdev_configure hint that PMD driver is in "tx_prep->tx_burst" mode
instead of just tx_burst? That way no fast-path performance degradation
for the PMD that does not need it


> 
> Another main purpose of tx_prep(): for multi-segment packets is to check
> that number of segments doesn't exceed  HW limit.
> Again right now users have to do that on their own.
> 
> > 
> > In HW perspective, It it tries to avoid the illegal state. But not sure
> > calling "back to back" tx prepare and then tx burst how does it improve the
> > situation as the check illegal state check introduce in actual tx burst
> > it self.
> > 
> > In SW perspective, its try to avoid sending malformed packets. In my
> > view the same can achieved with existing tx burst it self as PMD is the
> > one finally send the packets on the wire.
> 
> Ok, so your question is: why not to put that functionality into
> tx_burst() itself, right?
> For few reasons:
> 1. putting that functionality into tx_burst() would introduce unnecessary
>     slowdown for cases when that functionality is not needed
>     (one segment per packet, no HW offloads).

These parameters can be configured on init time

> 2. User might not want to use tx_prep() - he/she might have his/her
>     own analog, which he/she believes is faster/smarter, etc.

That's the current mode. Right?
> 3.  Having it as a separate function would allow user control when/where
>       to call it, let say only for some packets, or probably call tx_prep()
>       on one core, and do actual tx_burst() for these packets on the other. 
Why to process it under tx_prep() as application can always process the
packet in one core

> > 
> > proposal quote:
> > 
> > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> >    packet burst to be safely transmitted on device for desired HW
> >    offloads (set/reset checksum field according to the hardware
> >    requirements) and check HW constraints (number of segments per
> >    packet, etc).
> > 
> >    While the limitations and requirements may differ for devices, it
> >    requires to extend rte_eth_dev structure with new function pointer
> >    "tx_pkt_prep" which can be implemented in the driver to prepare and
> >    verify packets, in a device-specific way, before burst, which should
> >    prevent the application from sending malformed packets.
> > 
> > 
> > >
> > > > and what if the PMD does not implement that callback then it is of waste cycles. Right?
> > >
> > > If you refer as lost cycles here something like:
> > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> > > then yes.
> > > Though comparing to actual work need to be done for most HW TX offloads,
> > > I think it is neglectable.
> > 
> > Not sure.
> > 
> > > Again, as I said before, it is totally voluntary for the application.
> > 
> > Not according to the proposal. It can't be, either, as the application has no idea
> > what the PMD driver does in "prep", or what the implication on the HW is if
> > the application does not call it.
> 
> Why application writer wouldn't have an idea? 
> We would document what tx_prep() supposed to do, and in what cases user don't need it.

But how does he/she detect that at run-time?

> Then it would be up to the user:
> - not to use it at all (one segment per packet, no HW TX offloads)

We already have TX flags for that

> - not to use tx_prep(), and make necessary preparations himself,
>   that what people have to do now.
> - use tx_prep()
> 
> Konstantin
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-28  2:13  4%               ` Jerin Jacob
@ 2016-07-28 10:36  4%                 ` Ananyev, Konstantin
  2016-07-28 11:38  4%                   ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-07-28 10:36 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Thomas Monjalon, dev



> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org;
> > > > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > > > rte_eth_dev structure
> > > > >
> > > > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > > > > ---
> > > > > > > > +* In 16.11 ABI changes are planned: the ``rte_eth_dev``
> > > > > > > > +structure will be
> > > > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > > > +verification
> > > > > > > > +  and processing of packet burst to meet HW specific
> > > > > > > > +requirements before
> > > > > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing
> > > > > > > > +information about number of
> > > > > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > > > > >
> > > > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > > >
> > > > > > I think I understand you want to split the TX processing:
> > > > > > 	1/ modify/write in mbufs
> > > > > > 	2/ write in HW
> > > > > > and let application decide:
> > > > > > 	- where the TX prep is done (which core)
> > > > >
> > > > > On what basis does an application know when and where to call tx_pkt_prep in the fast path?
> > > > > If it always needs to be called before tx_burst, then PMDs that don't
> > > > > have/don't need this callback waste cycles in the fast path. Is this the expected behavior?
> > > > > Has any thought been given to making it a compile-time option, so that other PMDs won't suffer because of this change?
> > > >
> > > > Not sure what suffering you are talking about...
> > > > The current model - i.e. when the application does preparations (or doesn't,
> > > > if none are required) on its own and just calls tx_burst() - would still be there.
> > > > If the app doesn't want to use tx_prep() for some reason - that's still
> > > > OK, and the decision is up to the particular app.
> > >
> > > So my question is: on what basis does the application decide to call the preparation?
> > > Can you tell me the use case from the application's perspective?
> >
> > I suppose the most common use-case is when the application uses HW TX offloads,
> > and doesn't want to cope on its own with which L3/L4 header fields need to be
> > filled for that particular dev_type/hw_offload combination.
> 
> If it cannot cope, then it can skip tx'ing in the actual tx burst
> itself and move the "skipped" tx packets to the end of the list in the tx
> burst, so that the application can take action on the "skipped" packets after
> the tx burst

Sorry, that's too cryptic for me.
Can you reword it somehow?

> 
> 
> > Instead it just sets up the ol_flags, fills the tx_offload fields and calls tx_prep().
> > Please read Tomasz's original patch, I think he explained the possible use-cases
> > in a lot of detail.
> 
> Sorry, it is not very clear in terms of use cases.

Ok, what I meant to say:
Right now, if the user wants to use HW TX cksum/TSO offloads he might have to:
- set up the ipv4 header cksum field.
- calculate the pseudo-header checksum.
- set up the tcp/udp cksum field.

The rules for how these calculations need to be done, and which fields need to
be updated, may vary depending on the HW underneath and the requested offloads.
tx_prep() is supposed to hide all these nuances from the user and allow him to
use TX HW offloads in a transparent way.

Another main purpose of tx_prep(), for multi-segment packets, is to check that
the number of segments doesn't exceed the HW limit.
Again, right now users have to do that on their own (the checksum part of that
manual work is sketched below).
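
A minimal sketch of that manual preparation for an IPv4/TCP packet, assuming
ol_flags and the l2/l3 lengths are already set (an illustration of the status
quo with the 16.x helpers, not code from Tomasz's patch):

	#include <rte_mbuf.h>
	#include <rte_ip.h>
	#include <rte_tcp.h>

	static void
	manual_tx_prep(struct rte_mbuf *m)
	{
		struct ipv4_hdr *ip;
		struct tcp_hdr *tcp;

		ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *, m->l2_len);
		tcp = (struct tcp_hdr *)((char *)ip + m->l3_len);

		/* HW fills the IP cksum when PKT_TX_IP_CKSUM is set */
		ip->hdr_checksum = 0;
		/* HW TCP cksum offload expects the pseudo-header cksum here */
		tcp->cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags);
	}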

> 
> From a HW perspective, it tries to avoid an illegal state. But I am not sure,
> calling tx prepare and then tx burst "back to back", how it improves the
> situation if the illegal-state check is introduced in the actual tx burst
> itself.
> 
> From a SW perspective, it tries to avoid sending malformed packets. In my
> view the same can be achieved within the existing tx burst itself, as the
> PMD is the one that finally sends the packets on the wire.

Ok, so your question is: why not put that functionality into
tx_burst() itself, right?
For a few reasons:
1. Putting that functionality into tx_burst() would introduce an unnecessary
    slowdown for cases when that functionality is not needed
    (one segment per packet, no HW offloads).
2. The user might not want to use tx_prep() - he/she might have their
    own analog, which they believe is faster/smarter, etc.
3.  Having it as a separate function would allow the user to control when/where
      to call it, say only for some packets, or perhaps call tx_prep()
      on one core, and do the actual tx_burst() for these packets on another
      (see the sketch below).
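
A rough sketch of reason 3, assuming a hypothetical rte_ring 'prep_ring'
hands prepared mbufs from one core to the other (rte_eth_tx_prep() itself
is only proposed at this point; nb, pkts, port, queue and BURST_SZ come
from the surrounding code):

	/* core A: validate/fix up packets, then hand them over */
	nb = rte_eth_tx_prep(port, queue, pkts, nb);
	nb = rte_ring_enqueue_burst(prep_ring, (void **)pkts, nb);

	/* core B: transmit what core A prepared */
	nb = rte_ring_dequeue_burst(prep_ring, (void **)pkts, BURST_SZ);
	nb_tx = rte_eth_tx_burst(port, queue, pkts, nb);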
       
> 
> proposal quote:
> 
> 1. Introduce rte_eth_tx_prep() function to do the necessary preparations of
>    a packet burst to be safely transmitted on a device for the desired HW
>    offloads (set/reset checksum field according to the hardware
>    requirements) and check HW constraints (number of segments per
>    packet, etc).
> 
>    While the limitations and requirements may differ for devices, it
>    requires extending the rte_eth_dev structure with a new function pointer
>    "tx_pkt_prep" which can be implemented in the driver to prepare and
>    verify packets, in a device-specific way, before the burst, which should
>    prevent the application from sending malformed packets.
> 
> 
> >
> > > and what if the PMD does not implement that callback - then it is a waste of cycles, right?
> >
> > If by lost cycles here you mean something like:
> > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> > then yes.
> > Though compared to the actual work that needs to be done for most HW TX
> > offloads, I think it is negligible.
> 
> Not sure.
> 
> > Again, as I said before, it is totally voluntary for the application.
> 
> Not according to the proposal. It can't be, either, as the application has
> no idea what the PMD driver does with "prep", or what the implication on the
> HW is if the application does not

Why wouldn't the application writer have an idea?
We would document what tx_prep() is supposed to do, and in what cases the user doesn't need it.
Then it would be up to the user:
- not to use it at all (one segment per packet, no HW TX offloads)
- not to use tx_prep(), and make the necessary preparations himself -
  that's what people have to do now.
- use tx_prep()

Konstantin

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: remove deprecation notice related to new flow types
@ 2016-07-28 10:15  7% Rahul Lakkireddy
  0 siblings, 0 replies; 200+ results
From: Rahul Lakkireddy @ 2016-07-28 10:15 UTC (permalink / raw)
  To: dev; +Cc: Kumar Sanghvi, Nirranjan Kirubaharan

Remove deprecation notice pertaining to introduction of new flow
types in favor of a more generic filtering infrastructure proposal.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..7fc1185 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -23,10 +23,6 @@ Deprecation Notices
   do not need to care about the kind of devices that are being used, making it
   easier to add new buses later.
 
-* ABI changes are planned for adding four new flow types. This impacts
-  RTE_ETH_FLOW_MAX. The release 2.2 does not contain these ABI changes,
-  but release 2.3 will. [postponed]
-
 * The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
   are respectively replaced by PKT_RX_VLAN_STRIPPED and
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
-- 
2.5.3

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] removal of old deprecation notice for Chelsio filtering
  2016-07-28  8:29  4% [dpdk-dev] removal of old deprecation notice for Chelsio filtering Thomas Monjalon
@ 2016-07-28 10:12  0% ` Rahul Lakkireddy
  0 siblings, 0 replies; 200+ results
From: Rahul Lakkireddy @ 2016-07-28 10:12 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Kumar Sanghvi, Nirranjan Kirubaharan

Hi Thomas,

On Thursday, July 07/28/16, 2016 at 01:29:20 -0700, Thomas Monjalon wrote:
> Hi Rahul,
> 
> We still have this deprecation notice:
> 
> * ABI changes are planned for adding four new flow types. This impacts
>   RTE_ETH_FLOW_MAX. The release 2.2 does not contain these ABI changes,
>   but release 2.3 will. [postponed]
> 
> Do you agree that we can remove it now we have a better generic filtering
> API approach?
> 

Yes.  We can remove this deprecation notice in favor of the generic
filtering proposal under discussion.  I'll send a patch now.

Thanks,
Rahul

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] doc: announce renaming of ethdev library
  2016-07-28  9:29  3%   ` Christian Ehrhardt
@ 2016-07-28  9:52  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-28  9:52 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: Jan Viktorin, dev

2016-07-28 11:29, Christian Ehrhardt:
> Just curious, do we already know by looking ahead if ethdev will get an ABI
> bump anyway?
> So will the transition be:
> a) libethdev4 -> librte_ethdev5
> b) libethdev4 -> librte_ethdev4
> If it is b), would/should one provide a compat symlink then, in your opinion?

Good point.
We'll make a symlink if the version stays the same.
Maybe it will be bumped because of a rework of the hotplug API.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] doc: announce renaming of ethdev library
  @ 2016-07-28  9:29  3%   ` Christian Ehrhardt
  2016-07-28  9:52  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2016-07-28  9:29 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, Thomas Monjalon

Hi Thomas,
just my two cents as Ubuntu DPDK maintainer (and part of the Debian Team
that does the same).
(Yeah I really could reuse it three times :-) )

It will be a bit of effort to adapt, but should be no rocket-science.
I like that eventually the namespace will be cleaner.

Just curious, do we already know by looking ahead if ethdev will get an ABI
bump anyway?
So will the transition be:
a) libethdev4 -> librte_ethdev5
b) libethdev4 -> librte_ethdev4
If it is b), would/should one provide a compat symlink then, in your opinion?

Anyway, for now I think it is fair to say:
Acked-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 27, 2016 at 6:33 PM, Jan Viktorin <viktorin@rehivetech.com>
wrote:

> On Tue, 26 Jul 2016 18:22:21 +0200
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
> > The right name of ethdev should be dpdk_netdev. However:
> > 1/ We are using rte_ prefix in the code and library names.
> > 2/ The API uses rte_ethdev
> > That's why 16.11 will just have the rte_ prefix prepended to
> > the library filename as every other libraries.
> >
> > Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> >
> Acked-by: Jan Viktorin <viktorin@rehivetech.com>
>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] doc: announce ivshmem support removal
  @ 2016-07-28  9:20  3%   ` Christian Ehrhardt
  2016-07-28 15:23  0%     ` Mauricio Vasquez
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2016-07-28  9:20 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: Thomas Monjalon, anatoly.burakov, dev

Hi Thomas,
just my two cents as Ubuntu DPDK maintainer (and part of the Debian Team
that does the same).
(It seems I can reuse that line for all posts about the deprecation notices
:-) )

While IVSHMEM was enabled (as it was the default) I never heard of any
users of what we provided so far.
But that is expected considering that not all qemu bits have landed either.
Since it will follow the process of "deprecation notice -> grace period ->
ABI bump" on removal, I think packaging and consuming applications should
be fine.

I'd agree with Hiroshi that, if really needed, a clean re-implementation more
properly separated from EAL is likely the best way to go.

I think it is a good change to drop rather complex code in favor of
stabilizing the main paths:
Acked-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 27, 2016 at 9:08 PM, Jan Viktorin <viktorin@rehivetech.com>
wrote:

> On Wed, 20 Jul 2016 18:35:46 +0200
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
> > There was a prior call with an explanation of what needs to be done:
> >       http://dpdk.org/ml/archives/dev/2016-June/040844.html
> > - Qemu patch upstreamed
> > - IVSHMEM PCI device managed by a PCI driver
> > - No DPDK objects (ring/mempool) allocated by EAL
> >
> > As nobody seems interested, it is time to remove this code which
> > makes EAL improvements harder.
> >
> > Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Acked-by: David Marchand <david.marchand@6wind.com>
> > Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>
> Acked-by: Jan Viktorin <viktorin@rehivetech.com>
>

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] removal of old deprecation notice for Chelsio filtering
@ 2016-07-28  8:29  4% Thomas Monjalon
  2016-07-28 10:12  0% ` Rahul Lakkireddy
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-28  8:29 UTC (permalink / raw)
  To: Rahul Lakkireddy; +Cc: Kumar Sanghvi, dev

Hi Rahul,

We still have this deprecation notice:

* ABI changes are planned for adding four new flow types. This impacts
  RTE_ETH_FLOW_MAX. The release 2.2 does not contain these ABI changes,
  but release 2.3 will. [postponed]

Do you agree that we can remove it now we have a better generic filtering
API approach?

Thank you

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption
  2016-07-27 22:14  3% [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption Alex Williamson
  2016-07-28  6:54  0% ` Thomas Monjalon
@ 2016-07-28  8:06  0% ` Santosh Shukla
  1 sibling, 0 replies; 200+ results
From: Santosh Shukla @ 2016-07-28  8:06 UTC (permalink / raw)
  To: Alex Williamson; +Cc: anatoly.burakov, dev

On Thu, Jul 28, 2016 at 03:44:57AM +0530, Alex Williamson wrote:
> Hi,
> 
> I took a quick look at the dpdk vfio code and spotted an invalid
> assumption that should probably be corrected ASAP.  That is:
> 
> lib/librte_eal/linuxapp/eal/eal_vfio.h:
> #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
> #define VFIO_GET_REGION_IDX(x) (x >> 40)

Yes. I agree. We need some way to carry the essential vfio region info in
pci_dev, needed for pread/pwrite. Currently, rte_intr_handle only has
vfio_dev_fd, but that's not sufficient information. I stumbled on this while
adding ioport support in vfio and took a short path by defining region_idx
that way. To get rid of this, possible approaches could be (see the sketch
after this list):
- add the essential vfio region-specific info (i.e. offset, idx, flags) to
  rte_intr_handle.
- or pull dev_fd into rte_pci_device{} and define the region-specific details
  there.
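
Either way, the offset itself should come from the kernel. A minimal sketch
of that lookup using only the documented VFIO ioctl (illustrative;
'vfio_dev_fd' is the device fd we already hold):

	#include <linux/vfio.h>
	#include <sys/ioctl.h>
	#include <sys/types.h>
	#include <unistd.h>

	static int
	vfio_bar_read(int vfio_dev_fd, int bar, void *buf, size_t len, off_t off)
	{
		struct vfio_region_info reg = { .argsz = sizeof(reg) };

		reg.index = VFIO_PCI_BAR0_REGION_INDEX + bar;
		if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0)
			return -1;

		/* reg.offset comes from the kernel - no index<<40 assumption */
		if (pread(vfio_dev_fd, buf, len, reg.offset + off) != (ssize_t)len)
			return -1;
		return 0;
	}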

Thanks.

> Region offset to index is an implementation detail of the kernel, the
> vfio API defines that the offset of a given region (BAR) is found via
> the offset field of struct vfio_region_info returned via the
> VFIO_DEVICE_GET_REGION_INFO ioctl.  You're free to cache the offset
> into any sort of local variable you like, but the kernel may change the
> implementation of region index to offset at any point in time.  This is
> explicitly not part of the ABI.  Is there a place to file a bug, or is
> this sufficient?  Thanks,
> 
> Alex

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption
  2016-07-27 22:14  3% [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption Alex Williamson
@ 2016-07-28  6:54  0% ` Thomas Monjalon
    2016-07-28  8:06  0% ` Santosh Shukla
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-28  6:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dev, anatoly.burakov

Hi,

2016-07-27 16:14, Alex Williamson:
> I took a quick look at the dpdk vfio code and spotted an invalid
> assumption that should probably be corrected ASAP.

It can theoretically be a bug but the value may never change in the kernel,
right?
So when you say ASAP, I feel it can wait for the next DPDK release
(we plan to release today).
Do you agree?

> That is:
> 
> lib/librte_eal/linuxapp/eal/eal_vfio.h:
> #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
> #define VFIO_GET_REGION_IDX(x) (x >> 40)
> 
> Region offset to index is an implementation detail of the kernel, the
> vfio API defines that the offset of a given region (BAR) is found via
> the offset field of struct vfio_region_info returned via the
> VFIO_DEVICE_GET_REGION_INFO ioctl.  You're free to cache the offset
> into any sort of local variable you like, but the kernel may change the
> implementation of region index to offset at any point in time.  This is
> explicitly not part of the ABI.  Is there a place to file a bug, or is
> this sufficient?  Thanks,

Thanks for the report. This email is sufficient :)

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
                     ` (3 preceding siblings ...)
  2016-07-28  2:35  4%   ` John Daley (johndale)
@ 2016-07-28  2:39  4%   ` Jerin Jacob
  4 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2016-07-28  2:39 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, thomas.monjalon, bruce.richardson

On Wed, Jul 20, 2016 at 09:16:14AM +0200, Olivier Matz wrote:
> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
                     ` (2 preceding siblings ...)
  2016-07-27  9:34  4%   ` Ananyev, Konstantin
@ 2016-07-28  2:35  4%   ` John Daley (johndale)
  2016-07-28  2:39  4%   ` Jerin Jacob
  4 siblings, 0 replies; 200+ results
From: John Daley (johndale) @ 2016-07-28  2:35 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jerin.jacob, thomas.monjalon, bruce.richardson

> 
> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---

Acked-by: John Daley <johndale@cisco.com>

Also, definitely +1 on trying to get m->next into the first cache line.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 20:51  4%             ` Ananyev, Konstantin
@ 2016-07-28  2:13  4%               ` Jerin Jacob
  2016-07-28 10:36  4%                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-07-28  2:13 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, dev

On Wed, Jul 27, 2016 at 08:51:09PM +0000, Ananyev, Konstantin wrote:
> 
> > 
> > On Wed, Jul 27, 2016 at 05:33:01PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org;
> > > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > > rte_eth_dev structure
> > > >
> > > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > > > ---
> > > > > > > +* In 16.11 ABI changes are planned: the ``rte_eth_dev``
> > > > > > > +structure will be
> > > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > > +verification
> > > > > > > +  and processing of packet burst to meet HW specific
> > > > > > > +requirements before
> > > > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing
> > > > > > > +information about number of
> > > > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > > > >
> > > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > >
> > > > > I think I understand you want to split the TX processing:
> > > > > 	1/ modify/write in mbufs
> > > > > 	2/ write in HW
> > > > > and let application decide:
> > > > > 	- where the TX prep is done (which core)
> > > >
> > > > On what basis does an application know when and where to call tx_pkt_prep in the fast path?
> > > > If it always needs to be called before tx_burst, then PMDs that don't
> > > > have/don't need this callback waste cycles in the fast path. Is this the expected behavior?
> > > > Has any thought been given to making it a compile-time option, so that other PMDs won't suffer because of this change?
> > >
> > > Not sure what suffering you are talking about...
> > > The current model - i.e. when the application does preparations (or doesn't,
> > > if none are required) on its own and just calls tx_burst() - would still be there.
> > > If the app doesn't want to use tx_prep() for some reason - that's still
> > > OK, and the decision is up to the particular app.
> > 
> > So my question is: on what basis does the application decide to call the preparation?
> > Can you tell me the use case from the application's perspective?
> 
> I suppose the most common use-case is when the application uses HW TX offloads,
> and doesn't want to cope on its own with which L3/L4 header fields need to be
> filled for that particular dev_type/hw_offload combination.

If it cannot cope, then it can skip tx'ing in the actual tx burst
itself and move the "skipped" tx packets to the end of the list in the tx
burst, so that the application can take action on the "skipped" packets after
the tx burst


> Instead it just sets up the ol_flags, fills the tx_offload fields and calls tx_prep().
> Please read Tomasz's original patch, I think he explained the possible use-cases
> in a lot of detail.

Sorry, it is not very clear in terms of use cases.

From a HW perspective, it tries to avoid an illegal state. But I am not sure,
calling tx prepare and then tx burst "back to back", how it improves the
situation if the illegal-state check is introduced in the actual tx burst
itself.

From a SW perspective, it tries to avoid sending malformed packets. In my
view the same can be achieved within the existing tx burst itself, as the
PMD is the one that finally sends the packets on the wire.

proposal quote:

1. Introduce rte_eth_tx_prep() function to do the necessary preparations of
   a packet burst to be safely transmitted on a device for the desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   While the limitations and requirements may differ for devices, it
   requires extending the rte_eth_dev structure with a new function pointer
   "tx_pkt_prep" which can be implemented in the driver to prepare and
   verify packets, in a device-specific way, before the burst, which should
   prevent the application from sending malformed packets.


> 
> > and what if the PMD does not implement that callback - then it is a waste of cycles, right?
> 
> If by lost cycles here you mean something like:
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> then yes.
> Though compared to the actual work that needs to be done for most HW TX
> offloads, I think it is negligible.

Not sure.

> Again, as I said before, it is totally voluntary for the application.

Not according to the proposal. It can't be, either, as the application has
no idea what the PMD driver does with "prep", or what the implication on the
HW is if the application does not

Jerin

> Konstantin 
> 
> > 
> > Jerin
> > 
> > 
> > > Konstantin
> > >
> > > >
> > > >
> > > > > 	- what to do if the TX prep fails
> > > > > So adding some processing in this first part becomes "not too
> > > > > expensive" or "manageable" from the application point of view.
> > > > >
> > > > > If I understand the intent correctly,
> > > > >
> > > > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except
> > > > > typos ;)

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption
@ 2016-07-27 22:14  3% Alex Williamson
  2016-07-28  6:54  0% ` Thomas Monjalon
  2016-07-28  8:06  0% ` Santosh Shukla
  0 siblings, 2 replies; 200+ results
From: Alex Williamson @ 2016-07-27 22:14 UTC (permalink / raw)
  To: anatoly.burakov; +Cc: dev

Hi,

I took a quick look at the dpdk vfio code and spotted an invalid
assumption that should probably be corrected ASAP.  That is:

lib/librte_eal/linuxapp/eal/eal_vfio.h:
#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
#define VFIO_GET_REGION_IDX(x) (x >> 40)

Region offset to index is an implementation detail of the kernel, the
vfio API defines that the offset of a given region (BAR) is found via
the offset field of struct vfio_region_info returned via the
VFIO_DEVICE_GET_REGION_INFO ioctl.  You're free to cache the offset
into any sort of local variable you like, but the kernel may change the
implementation of region index to offset at any point in time.  This is
explicitly not part of the ABI.  Is there a place to file a bug, or is
this sufficient?  Thanks,

Alex

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 17:41  4%           ` Jerin Jacob
@ 2016-07-27 20:51  4%             ` Ananyev, Konstantin
  2016-07-28  2:13  4%               ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-07-27 20:51 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Thomas Monjalon, dev


> 
> On Wed, Jul 27, 2016 at 05:33:01PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org;
> > > Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > rte_eth_dev structure
> > >
> > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > > ---
> > > > > > +* In 16.11 ABI changes are planned: the ``rte_eth_dev``
> > > > > > +structure will be
> > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > +verification
> > > > > > +  and processing of packet burst to meet HW specific
> > > > > > +requirements before
> > > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing
> > > > > > +information about number of
> > > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > > >
> > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > >
> > > > I think I understand you want to split the TX processing:
> > > > 	1/ modify/write in mbufs
> > > > 	2/ write in HW
> > > > and let application decide:
> > > > 	- where the TX prep is done (which core)
> > >
> > > On what basis does an application know when and where to call tx_pkt_prep in the fast path?
> > > If it always needs to be called before tx_burst, then PMDs that don't
> > > have/don't need this callback waste cycles in the fast path. Is this the expected behavior?
> > > Has any thought been given to making it a compile-time option, so that other PMDs won't suffer because of this change?
> >
> > Not sure what suffering you are talking about...
> > The current model - i.e. when the application does preparations (or doesn't,
> > if none are required) on its own and just calls tx_burst() - would still be there.
> > If the app doesn't want to use tx_prep() for some reason - that's still
> > OK, and the decision is up to the particular app.
> 
> So my question is: on what basis does the application decide to call the preparation?
> Can you tell me the use case from the application's perspective?

I suppose the most common use-case is when the application uses HW TX offloads,
and doesn't want to cope on its own with which L3/L4 header fields need to be
filled for that particular dev_type/hw_offload combination.
Instead it just sets up the ol_flags, fills the tx_offload fields and calls
tx_prep(), along the lines of the sketch below.
Please read Tomasz's original patch, I think he explained the possible
use-cases in a lot of detail.
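
A minimal sketch of that flow for IPv4/TCP checksum offload (assuming the
proposed rte_eth_tx_prep() API; 'm', 'port' and 'queue' come from the
surrounding code):

	m->ol_flags = PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
	m->l2_len = sizeof(struct ether_hdr);
	m->l3_len = sizeof(struct ipv4_hdr);
	if (rte_eth_tx_prep(port, queue, &m, 1) < 1) {
		/* packet could not be prepared - the app decides what to do */
	}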

> and what if the PMD does not implement that callback - then it is a waste of cycles, right?

If by lost cycles here you mean something like:
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
then yes.
Though compared to the actual work that needs to be done for most HW TX
offloads, I think it is negligible.
Again, as I said before, it is totally voluntary for the application.
Konstantin 

> 
> Jerin
> 
> 
> > Konstantin
> >
> > >
> > >
> > > > 	- what to do if the TX prep fails
> > > > So adding some processing in this first part becomes "not too
> > > > expensive" or "manageable" from the application point of view.
> > > >
> > > > If I understand the intent correctly,
> > > >
> > > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except
> > > > typos ;)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 17:33  4%         ` Ananyev, Konstantin
@ 2016-07-27 17:41  4%           ` Jerin Jacob
  2016-07-27 20:51  4%             ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-07-27 17:41 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Thomas Monjalon, Kulasek, TomaszX, dev

On Wed, Jul 27, 2016 at 05:33:01PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, July 27, 2016 6:11 PM
> > To: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> > 
> > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > > ---
> > > > > +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure
> > > > > +will be
> > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > +verification
> > > > > +  and processing of packet burst to meet HW specific requirements
> > > > > +before
> > > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information
> > > > > +about number of
> > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > >
> > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > >
> > > I think I understand you want to split the TX processing:
> > > 	1/ modify/write in mbufs
> > > 	2/ write in HW
> > > and let application decide:
> > > 	- where the TX prep is done (which core)
> > 
> > On what basis does an application know when and where to call tx_pkt_prep in the fast path?
> > If it always needs to be called before tx_burst, then PMDs that don't have/don't need this callback waste cycles in the fast path. Is this the
> > expected behavior?
> > Has any thought been given to making it a compile-time option, so that other PMDs won't suffer because of this change?
> 
> Not sure what suffering you are talking about...
> The current model - i.e. when the application does preparations (or doesn't, if none are required)
> on its own and just calls tx_burst() - would still be there.
> If the app doesn't want to use tx_prep() for some reason - that's still OK,
> and the decision is up to the particular app. 

So my question is: on what basis does the application decide to call the preparation?
Can you tell me the use case from the application's perspective?
And what if the PMD does not implement that callback - then it is a waste
of cycles, right?

Jerin


> Konstantin
> 
> > 
> > 
> > > 	- what to do if the TX prep fails
> > > So adding some processing in this first part becomes "not too
> > > expensive" or "manageable" from the application point of view.
> > >
> > > If I understand the intent correctly,
> > >
> > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except typos ;)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27 17:10  4%       ` Jerin Jacob
@ 2016-07-27 17:33  4%         ` Ananyev, Konstantin
  2016-07-27 17:41  4%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-07-27 17:33 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon; +Cc: Kulasek, TomaszX, dev



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, July 27, 2016 6:11 PM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
> 
> On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > ---
> > > > +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure
> > > > +will be
> > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > +verification
> > > > +  and processing of packet burst to meet HW specific requirements
> > > > +before
> > > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information
> > > > +about number of
> > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > >
> > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > I think I understand you want to split the TX processing:
> > 	1/ modify/write in mbufs
> > 	2/ write in HW
> > and let application decide:
> > 	- where the TX prep is done (which core)
> 
> On what basis does an application know when and where to call tx_pkt_prep in the fast path?
> If it always needs to be called before tx_burst, then PMDs that don't have/don't need this callback waste cycles in the fast path. Is this the
> expected behavior?
> Has any thought been given to making it a compile-time option, so that other PMDs won't suffer because of this change?

Not sure what suffering you are talking about...
The current model - i.e. when the application does preparations (or doesn't, if none are required)
on its own and just calls tx_burst() - would still be there.
If the app doesn't want to use tx_prep() for some reason - that's still OK,
and the decision is up to the particular app. 
Konstantin

> 
> 
> > 	- what to do if the TX prep fails
> > So adding some processing in this first part becomes "not too
> > expensive" or "manageable" from the application point of view.
> >
> > If I understand the intent correctly,
> >
> > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com> (except typos ;)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-27  8:59  4%     ` Thomas Monjalon
@ 2016-07-27 17:10  4%       ` Jerin Jacob
  2016-07-27 17:33  4%         ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-07-27 17:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Kulasek, TomaszX, dev, Ananyev,  Konstantin

On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > ---
> > > +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> > > +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> > > +  and processing of packet burst to meet HW specific requirements before
> > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
> > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > 
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> I think I understand you want to split the TX processing:
> 	1/ modify/write in mbufs
> 	2/ write in HW
> and let application decide:
> 	- where the TX prep is done (which core)

On what basis does an application know when and where to call tx_pkt_prep in the fast path?
If it always needs to be called before tx_burst, then PMDs that don't have/don't need this
callback waste cycles in the fast path. Is this the expected behavior?
Has any thought been given to making it a compile-time option, so that other
PMDs won't suffer because of this change?


> 	- what to do if the TX prep fails
> So adding some processing in this first part becomes "not too expensive" or
> "manageable" from the application point of view.
> 
> If I understand the intent correctly,
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> (except typos ;)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change of struct rte_port_source_params and rte_port_sink_params
  2016-07-27 10:08  9%   ` Dumitrescu, Cristian
@ 2016-07-27 10:42  7%     ` Thomas Monjalon
  2016-07-28 18:28  4%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-27 10:42 UTC (permalink / raw)
  To: Dumitrescu, Cristian, Zhang, Roy Fan
  Cc: dev, Panu Matilainen, Singh, Jasvinder

2016-07-27 10:08, Dumitrescu, Cristian:
> As Thomas mentioned, today is probably the last day to discuss ABI changes. This one is pretty small and straightforward, any issues with it?
> 
> Panu had a concern that the change from "char *" to "const char *" is too small to be regarded as ABI breakage and we should simply go ahead and do it. My conservative proposal was to put a notice anyway.
> 
> Nonetheless, what I would like to get from Thomas and Panu is a path forward for this now:
> a) If we agree to consider this an ABI change, please merge the notice for 16.07;

Panu noticed 3 things (and I agree with them):
- it is an API change
- they can be grouped in only one list item
- it is better to wait until there are more changes before breaking an API

About the third point, in this specific case, I think it is acceptable because:
- it should not break the ABI
- the impact of the API change is really small
- I'm not sure the packet framework should be considered as a DPDK API.

> b) If we agree this is too small for an ABI change, please let us agree now
> to accept our quick patch for 16.11 for this change.

For an API deprecation notice (reworded),
Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>


> > -----Original Message-----
> > The ABI changes are planned for rte_port_source_params and
> > rte_port_sink_params, which will be supported from release 16.11. Here
> > announces that ABI changes in detail.
> > 
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> > ---
> > +* ABI will change for rte_port_source_params struct. The member
> > file_name
> > +  data type will be changed from char * to const char *. This change targets
> > +  release 16.11
> > +
> > +* ABI will change for rte_port_sink_params struct. The member file_name
> > +  data type will be changed from char * to const char *. This change targets
> > +  release 16.11

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change of struct rte_port_source_params and rte_port_sink_params
  @ 2016-07-27 10:08  9%   ` Dumitrescu, Cristian
  2016-07-27 10:42  7%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2016-07-27 10:08 UTC (permalink / raw)
  To: Thomas Monjalon, Panu Matilainen; +Cc: Zhang, Roy Fan, Singh, Jasvinder, dev

As Thomas mentioned, today is probably the last day to discuss ABI changes. This one is pretty small and straightforward, any issues with it?

Panu had a concern that the change from "char *" to "const char *" is too small to be regarded as ABI breakage and we should simply go ahead and do it. My conservative proposal was to put a notice anyway.

Nonetheless, what I would like to get from Thomas and Panu is a path forward for this now:
a) If we agree to consider this an ABI change, please merge the notice for 16.07;
b) If we agree this is too small for an ABI change, please let us agree now to accept our quick patch for 16.11 for this change.

Thanks,
Cristian


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> Sent: Thursday, May 19, 2016 3:19 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] doc: announce ABI change of struct
> rte_port_source_params and rte_port_sink_params
> 
> The ABI changes are planned for rte_port_source_params and
> rte_port_sink_params, which will be supported from release 16.11. Here
> announces that ABI changes in detail.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index fffe9c7..4f3fefe 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -74,3 +74,11 @@ Deprecation Notices
>    a handle, like the way kernel exposes an fd to user for locating a
>    specific file, and to keep all major structures internally, so that
>    we are likely to be free from ABI violations in future.
> +
> +* ABI will change for rte_port_source_params struct. The member
> file_name
> +  data type will be changed from char * to const char *. This change targets
> +  release 16.11
> +
> +* ABI will change for rte_port_sink_params struct. The member file_name
> +  data type will be changed from char * to const char *. This change targets
> +  release 16.11
> --
> 2.5.5

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
  2016-07-20  8:54  4%   ` Ferruh Yigit
  2016-07-27  8:33  4%   ` Thomas Monjalon
@ 2016-07-27  9:34  4%   ` Ananyev, Konstantin
  2016-07-28  2:35  4%   ` John Daley (johndale)
  2016-07-28  2:39  4%   ` Jerin Jacob
  4 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2016-07-27  9:34 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jerin.jacob, thomas.monjalon, Richardson, Bruce



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Wednesday, July 20, 2016 8:16 AM
> To: dev@dpdk.org
> Cc: jerin.jacob@caviumnetworks.com; thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
> 
> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
> 
> v1->v2:
> - reword the sentences to keep things more open, as suggested by Bruce
> 
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..b9f5a93 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,9 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are deprecated and
>    will be removed in 16.11.
>    It is replaced by rte_mempool_generic_get/put functions.
> +
> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
> +  may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
> +  ``nb_segs`` in one operation, because some platforms have an overhead if the
> +  store address is not naturally aligned. Other mbuf fields, such as the
> +  ``port`` field, may be moved or removed as part of this mbuf work.
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 2.8.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] last days for deprecation notices
@ 2016-07-27  9:15  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-27  9:15 UTC (permalink / raw)
  To: dev

Hi everybody,

There are some announcements pending for changes in 16.11 which
will break the API or ABI:
	http://dpdk.org/dev/patchwork/project/dpdk/list/?q=announce
Some of them are really good but will probably not happen because there
is no visible consensus (or often no discussion at all).

Such changes need 3 "meaningful" acks to be accepted:
	http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst#n56
Note: 3 acks from the same company are not really "meaningful" ;)

The release is planned on Thursday 28, so it is almost the last day
to discuss these changes.
Thanks

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 22:48  4%   ` Ananyev, Konstantin
@ 2016-07-27  8:59  4%     ` Thomas Monjalon
  2016-07-27 17:10  4%       ` Jerin Jacob
  2016-07-31  7:50  4%     ` Vlad Zolotarov
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-27  8:59 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev, Ananyev, Konstantin

> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> > +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> > +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> > +  and processing of packet burst to meet HW specific requirements before
> > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
> > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

I think I understand you want to split the TX processing:
	1/ modify/write in mbufs
	2/ write in HW
and let application decide:
	- where the TX prep is done (which core)
	- what to do if the TX prep fails
So adding some processing in this first part becomes "not too expensive" or
"manageable" from the application point of view.

If I understand the intent correctly,

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
(except typos ;)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
  2016-07-20  8:54  4%   ` Ferruh Yigit
@ 2016-07-27  8:33  4%   ` Thomas Monjalon
  2016-07-28 18:04  4%     ` Thomas Monjalon
  2016-07-27  9:34  4%   ` Ananyev, Konstantin
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-27  8:33 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jerin.jacob, bruce.richardson

> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-21 15:24 11% ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
@ 2016-07-21 22:48  4%   ` Ananyev, Konstantin
  2016-07-27  8:59  4%     ` Thomas Monjalon
  2016-07-31  7:50  4%     ` Vlad Zolotarov
  2016-07-28 12:04  4%   ` Avi Kivity
  1 sibling, 2 replies; 200+ results
From: Ananyev, Konstantin @ 2016-07-21 22:48 UTC (permalink / raw)
  To: Kulasek, TomaszX, dev



> 
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
> 
> As discussed in that thread:
> 
> http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Different NIC models depending on HW offload requested might impose
> different requirements on packets to be TX-ed in terms of:
> 
>  - Max number of fragments per packet allowed
>  - Max number of fragments per TSO segment
>  - The way pseudo-header checksum should be pre-calculated
>  - L3/L4 header fields filling
>  - etc.
> 
> 
> MOTIVATION:
> -----------
> 
> 1) Some work cannot (and should not) be done in rte_eth_tx_burst.
>    However, this work is sometimes required, and now, it's an
>    application issue.
> 
> 2) Different hardware may have different requirements for TX offloads,
>    a different subset can be supported, and so on.
> 
> 3) Some parameters (eg. number of segments in ixgbe driver) may hang the
>    device. These parameters may vary for different devices.
> 
>    For example i40e HW allows 8 fragments per packet, but that is after
>    TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.
> 
> 4) Fields in the packet may require different initialization (e.g. some will
>    require pseudo-header checksum precalculation, sometimes in a
>    different way depending on packet type, and so on). Now the application
>    needs to care about it.
> 
> 5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
>    the application prepare the packet burst in an acceptable form for the
>    specific device.
> 
> 6) Some additional checks may be done in debug mode keeping tx_burst
>    implementation clean.
> 
> 
> PROPOSAL:
> ---------
> 
> To help user to deal with all these varieties we propose to:
> 
> 1. Introduce rte_eth_tx_prep() function to do the necessary preparations of
>    a packet burst to be safely transmitted on a device for the desired HW
>    offloads (set/reset checksum field according to the hardware
>    requirements) and check HW constraints (number of segments per
>    packet, etc).
> 
>    While the limitations and requirements may differ for devices, it
>    requires extending the rte_eth_dev structure with a new function pointer
>    "tx_pkt_prep" which can be implemented in the driver to prepare and
>    verify packets, in a device-specific way, before the burst, which should
>    prevent the application from sending malformed packets.
> 
> 2. Also new fields will be introduced in rte_eth_desc_lim:
>    nb_seg_max and nb_mtu_seg_max, providing information about the max
>    number of segments in TSO and non-TSO packets acceptable by the device.
> 
>    This information is useful for the application to avoid creating
>    malformed packets.
> 
> 
> APPLICATION (CASE OF USE):
> --------------------------
> 
> 1) The application should initialize the burst of packets to send, set
>    required tx offload flags and required fields, like l2_len, l3_len,
>    l4_len, and tso_segsz
> 
> 2) The application passes the burst to rte_eth_tx_prep to check the
>    conditions required to send packets through the NIC.
> 
> 3) The result of rte_eth_tx_prep can be used to send the valid packets
>    and/or restore the invalid ones if the function fails.
> 
> eg.
> 
> 	for (i = 0; i < nb_pkts; i++) {
> 
> 		/* initialize or process packet */
> 
> 		bufs[i]->tso_segsz = 800;
> 		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
> 				| PKT_TX_IP_CKSUM;
> 		bufs[i]->l2_len = sizeof(struct ether_hdr);
> 		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
> 		bufs[i]->l4_len = sizeof(struct tcp_hdr);
> 	}
> 
> 	/* Prepare burst of TX packets */
> 	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);
> 
> 	if (nb_prep < nb_pkts) {
> 		printf("tx_prep failed\n");
> 
> 		/* drop or restore invalid packets */
> 
> 	}
> 
> 	/* Send burst of TX packets */
> 	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);
> 
> 	/* Free any unsent packets. */
> 
> 
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..485aacb 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,10 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are deprecated and
>    will be removed in 16.11.
>    It is replaced by rte_mempool_generic_get/put functions.
> +
> +* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +  and processing of packet burst to meet HW specific requirements before
> +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
> +  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
> +  segments limit to be transmitted by device for TSO/non-TSO packets.
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.7.9.5

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] mempool: adjust name string size in related data types
  2016-07-21 14:25  0%         ` Olivier Matz
@ 2016-07-21 21:16  0%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-21 21:16 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: dev, Olivier Matz, Zoltan Kiss, Bruce Richardson

2016-07-21 16:25, Olivier Matz:
> On 07/21/2016 03:47 PM, Zoltan Kiss wrote:
> > On 21/07/16 14:40, Olivier Matz wrote:
> >> On 07/20/2016 07:16 PM, Zoltan Kiss wrote:
> >>> A recent patch brought up an issue about the size of the 'name' fields:
> >>>
> >>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
> >>>
> >>> These relations should be observed:
> >>>
> >>> 1. Each ring creates a memzone with a prefixed name:
> >>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
> >>>
> >>> 2. There are some mempool handlers which create a ring with a prefixed
> >>> name:
> >>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE -
> >>> strlen(RTE_MEMPOOL_MZ_PREFIX)
> >>>
> >>> 3. A mempool can create up to RTE_MAX_MEMZONE pre and postfixed
> >>> memzones:
> >>> sprintf(postfix, "_%d", RTE_MAX_MEMZONE)
> >>> RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
> >>>     strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen(postfix)
> >>>
> >>> Setting all of them to 32 hides this restriction from the application.
> >>> This patch decreases the mempool and ring string size to accommodate for
> >>> these prefixes, but it doesn't apply the 3rd constraint. Applications
> >>> relying on these constants need to be recompiled, otherwise they'll run
> >>> into ENAMETOOLONG issues.
> >>> The size of the arrays are kept 32 for ABI compatibility, it can be
> >>> decreased next time the ABI changes.
> >>>
> >>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
> >>
> >> Looks to be a good compromise for the 16.07 release. One question
> >> however: why not take the 3rd constraint into account? Because it may
> >> not completely fix the issue if the mempool is fragmented.
> >>
> >> We could define RTE_MEMPOOL_NAMESIZE to 24
> >>  (= 32 - len('mp_') - len('_0123'))
> > 
> > I was trying to figure out a compile time macro for strlen(postfix), but
> > I could not. Your suggestion would work only until someone increases
> > RTE_MAX_MEMZONE above 9999. As the likelihood of fragmenting a pool over
> > 99 memzones seems small, I did not bother to fix this with an ugly hack,
> > but if you think we should include it, let me know!
> 
> Ok, looks fair, thanks for the clarification.
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Applied, thanks

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 14:24 13% [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
  2016-07-20 15:01  4% ` Thomas Monjalon
@ 2016-07-21 15:24 11% ` Tomasz Kulasek
  2016-07-21 22:48  4%   ` Ananyev, Konstantin
  2016-07-28 12:04  4%   ` Avi Kivity
  2016-07-31  7:46  4% ` [dpdk-dev] [PATCH] " Vlad Zolotarov
  2 siblings, 2 replies; 200+ results
From: Tomasz Kulasek @ 2016-07-21 15:24 UTC (permalink / raw)
  To: dev

This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
changes in rte_eth_dev and rte_eth_desc_lim structures.

As discussed in that thread:

http://dpdk.org/ml/archives/dev/2015-September/023603.html

Depending on the HW offloads requested, different NIC models might impose
different requirements on packets to be transmitted, in terms of:

 - Max number of fragments per packet allowed
 - Max number of fragments per TSO segments
 - The way pseudo-header checksum should be pre-calculated
 - L3/L4 header fields filling
 - etc.


MOTIVATION:
-----------

1) Some work cannot (and should not) be done in rte_eth_tx_burst.
   However, this work is sometimes required, and today it is left to the
   application.

2) Different hardware may have different requirements for TX offloads;
   each device can support a different subset, and so on.

3) Some parameters (e.g. the number of segments in the ixgbe driver) may
   hang the device. These parameters may vary between devices.

   For example, i40e HW allows 8 fragments per packet, but that is after
   TSO segmentation, while ixgbe has a 38-fragment pre-TSO limit.

4) Fields in the packet may require different initialization (e.g. some
   devices require pseudo-header checksum precalculation, sometimes in a
   different way depending on packet type, and so on). Today the
   application needs to take care of it.

5) Using an additional API (rte_eth_tx_prep) before rte_eth_tx_burst lets
   the application prepare the packet burst in a form acceptable to the
   specific device.

6) Some additional checks may be done in debug mode, keeping the tx_burst
   implementation clean.


PROPOSAL:
---------

To help user to deal with all these varieties we propose to:

1. Introduce a rte_eth_tx_prep() function to do the necessary preparations
   of a packet burst to be safely transmitted on the device for the
   desired HW offloads (set/reset checksum fields according to the
   hardware requirements) and to check HW constraints (number of segments
   per packet, etc).

   While the limitations and requirements may differ between devices, this
   requires extending the rte_eth_dev structure with a new function pointer
   "tx_pkt_prep", which can be implemented in the driver to prepare and
   verify packets in a device-specific way before the burst, which should
   prevent the application from sending malformed packets.

2. Also new fields will be introduced in rte_eth_desc_lim:
   nb_seg_max and nb_mtu_seg_max, providing information about the maximum
   number of segments in TSO and non-TSO packets accepted by the device.

   This information helps the application avoid creating malformed packets.


APPLICATION (CASE OF USE):
--------------------------

1) The application initializes the burst of packets to send, setting the
   required tx offload flags and required fields, like l2_len, l3_len,
   l4_len, and tso_segsz

2) The application passes the burst to rte_eth_tx_prep to check the
   conditions required to send packets through the NIC.

3) The result of rte_eth_tx_prep can be used to send the valid packets
   and/or restore the invalid ones if the function fails.

e.g.:

	for (i = 0; i < nb_pkts; i++) {

		/* initialize or process packet */

		bufs[i]->tso_segsz = 800;
		bufs[i]->ol_flags = PKT_TX_TCP_SEG | PKT_TX_IPV4
				| PKT_TX_IP_CKSUM;
		bufs[i]->l2_len = sizeof(struct ether_hdr);
		bufs[i]->l3_len = sizeof(struct ipv4_hdr);
		bufs[i]->l4_len = sizeof(struct tcp_hdr);
	}

	/* Prepare burst of TX packets */
	nb_prep = rte_eth_tx_prep(port, 0, bufs, nb_pkts);

	if (nb_prep < nb_pkts) {
		printf("tx_prep failed\n");

		/* drop or restore invalid packets */

	}

	/* Send burst of TX packets */
	nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_prep);

	/* Free any unsent packets. */



Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 doc/guides/rel_notes/deprecation.rst |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..485aacb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,3 +41,10 @@ Deprecation Notices
 * The mempool functions for single/multi producer/consumer are deprecated and
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
+
+* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
+  extended with a new function pointer ``tx_pkt_prep`` allowing verification
+  and processing of packet bursts to meet HW specific requirements before
+  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
+  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about the maximum
+  number of segments the device can transmit for TSO/non-TSO packets.
-- 
1.7.9.5
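
As a rough illustration of how the proposed limits could be consumed by an
application, here is a minimal, hypothetical sketch; rte_eth_dev_info_get()
and tx_desc_lim already exist in ethdev, while nb_mtu_seg_max is the field
proposed above:

	/* Sketch only: check a packet against the proposed segment limit. */
	struct rte_eth_dev_info dev_info;

	rte_eth_dev_info_get(port, &dev_info);

	/* nb_mtu_seg_max is the new rte_eth_desc_lim field proposed above. */
	if (bufs[i]->nb_segs > dev_info.tx_desc_lim.nb_mtu_seg_max)
		printf("packet %d has too many segments for this device\n", i);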

^ permalink raw reply	[relevance 11%]

* Re: [dpdk-dev] [PATCH v2] mempool: adjust name string size in related data types
  2016-07-21 13:47  0%       ` Zoltan Kiss
@ 2016-07-21 14:25  0%         ` Olivier Matz
  2016-07-21 21:16  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-07-21 14:25 UTC (permalink / raw)
  To: Zoltan Kiss, Zoltan Kiss, dev; +Cc: Bruce Richardson



On 07/21/2016 03:47 PM, Zoltan Kiss wrote:
> 
> 
> On 21/07/16 14:40, Olivier Matz wrote:
>> Hi Zoltan,
>>
>>
>> On 07/20/2016 07:16 PM, Zoltan Kiss wrote:
>>> A recent patch brought up an issue about the size of the 'name' fields:
>>>
>>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>>
>>> These relations should be observed:
>>>
>>> 1. Each ring creates a memzone with a prefixed name:
>>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>>>
>>> 2. There are some mempool handlers which create a ring with a prefixed
>>> name:
>>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE -
>>> strlen(RTE_MEMPOOL_MZ_PREFIX)
>>>
>>> 3. A mempool can create up to RTE_MAX_MEMZONE pre and postfixed
>>> memzones:
>>> sprintf(postfix, "_%d", RTE_MAX_MEMZONE)
>>> RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
>>>     strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen(postfix)
>>>
>>> Setting all of them to 32 hides this restriction from the application.
>>> This patch decreases the mempool and ring string size to accommodate for
>>> these prefixes, but it doesn't apply the 3rd constraint. Applications
>>> relying on these constants need to be recompiled, otherwise they'll run
>>> into ENAMETOOLONG issues.
>>> The size of the arrays are kept 32 for ABI compatibility, it can be
>>> decreased next time the ABI changes.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>>
>> Looks like a good compromise for the 16.07 release. One question
>> however: why not take the 3rd constraint into account? Because it may
>> not completely fix the issue if the mempool is fragmented.
>>
>> We could define RTE_MEMPOOL_NAMESIZE to 24
>>  = 32 - len('mp_') - len('_0123'))
> 
> I was trying to figure out a compile time macro for strlen(postfix), but
> I could not. Your suggestion would work only until someone increases
> RTE_MAX_MEMZONE above 9999. As the likelihood of fragmenting a pool over
> 99 memzones seems small, I did not bother to fix this with an ugly hack,
> but if you think we should include it, let me know!

Ok, looks fair, thanks for the clarification.

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] mempool: adjust name string size in related data types
  2016-07-21 13:40  0%     ` Olivier Matz
@ 2016-07-21 13:47  0%       ` Zoltan Kiss
  2016-07-21 14:25  0%         ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Zoltan Kiss @ 2016-07-21 13:47 UTC (permalink / raw)
  To: Olivier Matz, Zoltan Kiss, dev; +Cc: Bruce Richardson



On 21/07/16 14:40, Olivier Matz wrote:
> Hi Zoltan,
>
>
> On 07/20/2016 07:16 PM, Zoltan Kiss wrote:
>> A recent patch brought up an issue about the size of the 'name' fields:
>>
>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>
>> These relations should be observed:
>>
>> 1. Each ring creates a memzone with a prefixed name:
>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>>
>> 2. There are some mempool handlers which create a ring with a prefixed
>> name:
>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE - strlen(RTE_MEMPOOL_MZ_PREFIX)
>>
>> 3. A mempool can create up to RTE_MAX_MEMZONE pre and postfixed memzones:
>> sprintf(postfix, "_%d", RTE_MAX_MEMZONE)
>> RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
>> 	strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen(postfix)
>>
>> Setting all of them to 32 hides this restriction from the application.
>> This patch decreases the mempool and ring string size to accommodate for
>> these prefixes, but it doesn't apply the 3rd constraint. Applications
>> relying on these constants need to be recompiled, otherwise they'll run
>> into ENAMETOOLONG issues.
>> The size of the arrays are kept 32 for ABI compatibility, it can be
>> decreased next time the ABI changes.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>
> Looks like a good compromise for the 16.07 release. One question
> however: why not take the 3rd constraint into account? Because it may
> not completely fix the issue if the mempool is fragmented.
>
> We could define RTE_MEMPOOL_NAMESIZE to 24
>  = 32 - len('mp_') - len('_0123'))

I was trying to figure out a compile time macro for strlen(postfix), but 
I could not. Your suggestion would work only until someone increases 
RTE_MAX_MEMZONE above 9999. As the likelihood of fragmenting a pool over 
99 memzones seems small, I did not bother to fix this with an ugly hack, 
but if you think we should include it, let me know!

>
> Thanks,
> Olivier
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] mempool: adjust name string size in related data types
  2016-07-20 17:16 12%   ` [dpdk-dev] [PATCH v2] " Zoltan Kiss
@ 2016-07-21 13:40  0%     ` Olivier Matz
  2016-07-21 13:47  0%       ` Zoltan Kiss
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-07-21 13:40 UTC (permalink / raw)
  To: Zoltan Kiss, dev; +Cc: Bruce Richardson

Hi Zoltan,


On 07/20/2016 07:16 PM, Zoltan Kiss wrote:
> A recent patch brought up an issue about the size of the 'name' fields:
> 
> 85cf0079 mem: avoid memzone/mempool/ring name truncation
> 
> These relations should be observed:
> 
> 1. Each ring creates a memzone with a prefixed name:
> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
> 
> 2. There are some mempool handlers which create a ring with a prefixed
> name:
> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE - strlen(RTE_MEMPOOL_MZ_PREFIX)
> 
> 3. A mempool can create up to RTE_MAX_MEMZONE pre and postfixed memzones:
> sprintf(postfix, "_%d", RTE_MAX_MEMZONE)
> RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
> 	strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen(postfix)
> 
> Setting all of them to 32 hides this restriction from the application.
> This patch decreases the mempool and ring string size to accommodate for
> these prefixes, but it doesn't apply the 3rd constraint. Applications
> relying on these constants need to be recompiled, otherwise they'll run
> into ENAMETOOLONG issues.
> The size of the arrays are kept 32 for ABI compatibility, it can be
> decreased next time the ABI changes.
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>

Looks like a good compromise for the 16.07 release. One question
however: why not take the 3rd constraint into account? Because it may
not completely fix the issue if the mempool is fragmented.

We could define RTE_MEMPOOL_NAMESIZE to 24
 = 32 - len('mp_') - len('_0123'))

Thanks,
Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-20 10:41  2%           ` Adrien Mazarguil
@ 2016-07-21  3:18  0%             ` Lu, Wenzhuo
  0 siblings, 0 replies; 200+ results
From: Lu, Wenzhuo @ 2016-07-21  3:18 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

Hi Adrien,

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Wednesday, July 20, 2016 6:41 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org; Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody;
> Ajit Khaparde; Rahul Lakkireddy; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> Guarch, Pablo; Olga Shern
> Subject: Re: [RFC] Generic flow director/filtering/classification API
> 
> Hi Wenzhuo,
> 
> On Wed, Jul 20, 2016 at 02:16:51AM +0000, Lu, Wenzhuo wrote:
> [...]
> > > So, today an application cannot combine N-tuple and FDIR flow rules
> > > and get a reliable outcome, unless it is designed for specific
> > > devices with a known behavior.
> > >
> > > > What's the right behavior of PMD if APP want to create a flow
> > > > director rule
> > > which has a higher or even equal priority than an existing n-tuple
> > > rule? Should PMD return fail?
> > >
> > > First remember applications only deal with the generic API, PMDs are
> > > responsible for choosing the most appropriate HW implementation to
> > > use according to the requested flow rules (FDIR, N-tuple or anything else).
> > >
> > > For the specific case of FDIR vs N-tuple, if the underlying HW
> > > supports both I do not see why the PMD would create a N-tuple rule.
> > > Doesn't FDIR support everything N-tuple can do and much more?
> > Talking about the filters, fdir can cover n-tuple. I think that's why i40e
> > only supports fdir but not n-tuple. But n-tuple has its own strength: as we
> > know, at least on Intel NICs, fdir only supports a per-device mask, while
> > n-tuple can support a per-rule mask.
> > As every pattern has both a spec and a mask, we cannot guarantee the masks
> > are the same. I think ixgbe will try to use n-tuple first if it can, because
> > even if the masks are different, we can support them all.
> 
> OK, makes sense. In that case existing rules may indeed prevent subsequent
> ones from getting created if their priority is wrong. I do not think there is a way
> around that if the application needs this exact ordering.
Agreed. I don't see any workaround either. The PMD has to return failure sometimes.

> 
> > > Assuming such a thing happened anyway, that the PMD had to create a
> > > rule using a high priority filter type and that the application
> > > requests the creation of a rule that can only be done using a lower
> > > priority filter type, but also requested a higher priority for that rule, then yes,
> it should obviously fail.
> > >
> > > That is, unless the PMD can perform some kind of workaround to have both.
> > >
> > > > If so, do we need more fail reasons? According to this RFC, I
> > > > think we need
> > > return " EEXIST: collision with an existing rule. ", but it's not
> > > very clear, APP doesn't know the problem is priority, maybe more detailed
> reason is helpful.
> > >
> > > Possibly, I've defined a basic set of errors, there are quite a
> > > number of errno values to choose from. However I think we should not
> define too many values.
> > > In my opinion the basic set covers every possible failure:
> > >
> > > - EINVAL: invalid format, rule is broken or cannot be understood by the PMD
> > >   anyhow.
> > >
> > > - ENOTSUP: pattern/actions look fine but something in the requested rule is
> > >   not supported and thus cannot be applied.
> > >
> > > - EEXIST: pattern/actions are fine and could have been applied if only some
> > >   other rule did not prevent the PMD to do it (I see it as the closest thing
> > >   to "ETOOBAD" which unfortunately does not exist).
> > >
> > > - ENOMEM: like EEXIST, except it is due to the lack of resources not because
> > >   of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
> > >   settled on ENOMEM as it is well known. Still open to debate.
> > >
> > > Errno values are only useful to get a rough idea of the reason, and
> > > another mechanism is needed to pinpoint the exact problem for
> > > debugging/reporting purposes, something like:
> > >
> > >  enum rte_flow_error_type {
> > >      RTE_FLOW_ERROR_TYPE_NONE,
> > >      RTE_FLOW_ERROR_TYPE_UNKNOWN,
> > >      RTE_FLOW_ERROR_TYPE_PRIORITY,
> > >      RTE_FLOW_ERROR_TYPE_PATTERN,
> > >      RTE_FLOW_ERROR_TYPE_ACTION,
> > >  };
> > >
> > >  struct rte_flow_error {
> > >      enum rte_flow_error_type type;
> > >      void *offset; /* Points to the exact pattern item or action. */
> > >      const char *message;
> > >  };
> > When we are using a CLI and it fails, normally it will let us know
> > which parameter is not appropriate. So, I think it’s a good idea to
> > have this error structure :)
> 
> Agreed.
> 
> > > Then either provide an optional struct rte_flow_error pointer to
> > > rte_flow_validate(), or a separate function (rte_flow_analyze()?),
> > > since processing this may be quite expensive and applications may
> > > not care about the exact reason.
> > Agree the processing may be too expensive. Maybe we can say it's optional to
> return error details. And that's a good question that what APP should do if
> creating the rule fails. I believe normally it will choose handle the rule by itself.
> But I think it's not bad to feedback more. Or even the APP want to adjust the
> rules, it cannot be an option for lack of info.
> 
> All right then, I'll add it to the specification.
> 
>  int
>  rte_flow_validate(uint8_t port_id,
>                    const struct rte_flow_pattern *pattern,
>                    const struct rte_flow_actions *actions,
>                    struct rte_flow_error *error);
> 
> With error possibly NULL if the application does not care. Is it fine for you?
Yes, it looks good to me. Thanks for that :)
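
To make the intended usage concrete, a minimal sketch under the names
proposed in this thread (the contents of rte_flow_pattern and
rte_flow_actions are still being defined in this RFC, so their setup is
elided):

	struct rte_flow_error error;
	int ret;

	/* pattern and actions are filled from the RFC's item/action lists. */
	ret = rte_flow_validate(port_id, &pattern, &actions, &error);
	if (ret != 0 && error.message != NULL)
		printf("rule cannot be created: %s\n", error.message);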

> 
> [...]
> > > > > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > > > > >   configuration when stopping and restarting a port or performing
> other
> > > > > > >   actions which may affect them. They can only be destroyed explicitly.
> > > > > > Don’t understand " They can only be destroyed explicitly."
> > > > >
> > > > > This part says that as long as an application has not called
> > > > > rte_flow_destroy() on a flow rule, it never disappears, whatever
> > > > > happens to the port (stopped, restarted). The application is not
> > > > > responsible for re-creating rules after that.
> > > > >
> > > > > Note that according to the specification, this may translate to
> > > > > not being able to stop a port as long as a flow rule is present,
> > > > > depending on how nice the PMD intends to be with applications.
> > > > > Implementation can be done in small steps with minimal amount of
> > > > > code on
> > > the PMD side.
> > > > Does it mean PMD should store and maintain all the rules? Why not
> > > > let rte do
> > > that? I think if PMD maintain all the rules, it means every kind of
> > > NIC should have a copy of code for the rules. But if rte do that,
> > > only one copy of code need to be maintained, right?
> > >
> > > I've considered having rules stored in a common format understood at
> > > the RTE level and not specific to each PMD and decided that the
> > > opaque rte_flow pointer was a better choice for the following reasons:
> > >
> > > - Even though flow rules management is done in the control path, processing
> > >   must be as fast as possible. Letting PMDs store flow rules using their own
> > >   internal representation gives them the chance to achieve better
> > >   performance.
> > I don't quite understand. I think we're talking about maintaining the rules
> > in SW. I don't think there's anything that needs to be optimized according
> > to specific NICs. If we need to optimize the code, I think we need to
> > consider the CPU, OS ... and some common means. Am I wrong?
> 
> Perhaps we were talking about different things, here I was only explaining why
> rte_flow (the result of creating a flow rule) should be opaque and fully managed
> by the PMD. More on the SW side of things below.
> 
> > > - An opaque context managed by PMDs would probably have to be stored
> > >   somewhere as well anyway.
> > >
> > > - PMDs may not need to allocate/store anything at all if they exclusively
> > >   rely on HW state for everything. In my opinion, the generic API has enough
> > >   constraints for this to work and maintain consistency between flow
> > >   rules. Note this is currently how most PMDs implement FDIR and other
> > >   filter types.
> > Yes, the rules are stored by HW. But considering stopping/starting the
> > device, the rules in HW will be lost. We have to store the rules in SW and
> > re-program them when restarting the device.
> 
> Assume a HW capable of keeping flow rules programmed even during a
> stop/start cycle (e.g. mlx4/mlx5 may be able to do it from DPDK point of view),
> don't you think it is more efficient to standardize on this behavior and let PMDs
> restore flow rules for HW that do not support it regardless of whether it would
> be done by RTE or the application (SW)?
I didn't know that. As some NICs already have the ability to keep the rules during a stop/start cycle, maybe it could become a trend :)

> 
> > And in existing code, we store the filters by SW at least on Intel NICs. But I
> think we cannot reuse them, because considering the priority and which
> category of filter should be chosen, I think we need a whole new table for
> generic API. I think it’s what's designed now, right?
> 
> So I understand you'd want RTE to help your PMD keep track of the flow rules it
> created?
Yes. But as you said before, it’s not a good idea for mlx4/mlx5, because their HW doesn't need SW to re-program the rules after stopping/starting. If we make it a common mechanism, it just wastes time for mlx4/mlx5.

> 
> Nothing wrong with that, all I'm saying is that it should be entirely optional. RTE
> should not automatically maintain a list. PMDs have to call RTE helpers if they
> need help to maintain a context. These helpers are not defined in this API yet
> because it is difficult to know what will be useful in advance.
> 
> > > - RTE can (and will) provide helpers to avoid most of the code redundancy,
> > >   PMDs are free to use them or manage everything by themselves.
> > >
> > > - Given that the opaque rte_flow pointer associated with a flow rule is to
> > >   be stored by the application, PMDs do not even have to keep references to
> > >   them.
> > Don’t understand. More details?
> 
> In an application:
> 
>  rte_flow *foo = rte_flow_create(...);
> 
> In the above example, foo cannot be dereferenced by the application nor RTE,
> only the PMD is aware of its contents. This object can only be used with
> rte_flow*() functions.
> 
> PMDs are thus free to make this object grow as needed when adding internal
> features without breaking any kind of public API/ABI.
> 
> What I meant is, given that the application is supposed to store foo somewhere
> in order to destroy it later, the PMD does not have to keep track of that pointer
> assuming it does not need to access it later on its own for some reason.
> 
> > > - The flow rules format described in this specification (pattern / actions)
> > >   will be used by applications directly, and will be free to arrange them in
> > >   lists, trees or in any other way if they need to keep flow specifications
> > >   around for further processing.
> > Who will create the lists, trees or something else? According to previous
> discussion, I think the APP will program the rules one by one. So if APP organize
> the rules to lists, trees..., PMD doesn’t know that.
> > And you said " Given that the opaque rte_flow pointer associated with a flow
> rule is to be stored by the application ". I'm lost here.
> 
> I guess that's because we're discussing two different things, flow rule
> specifications and flow rule objects. Let me sum it up:
> 
> - Flow rule specifications are the patterns/actions combinations provided by
>   applications to rte_flow_create(). Applications can store those as needed
>   and organize them as they wish (hash, tree, list). Neither PMDs nor RTE
>   will do it for them.
> 
> - Flow rule objects (struct rte_flow *) are generated when a flow rule is
>   created. Applications must keep these around if they want to manipulate
>   them later (i.e. destroy or query existing rules).
Thanks for this clarification. So the specifications can differ from the objects, right? The specifications are what the APP wants, the objects are what the APP really gets, since rte_flow_create can fail. Right?
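
Put into code, the distinction reads roughly like this; the create/destroy
signatures are sketched from this RFC and may still change:

	/* Specification: built and stored by the application. */
	struct rte_flow_pattern pattern = { /* elided, not finalized */ };
	struct rte_flow_actions actions = { /* elided, not finalized */ };

	/* Object: opaque handle, what the APP really gets (may be NULL). */
	struct rte_flow *flow = rte_flow_create(port_id, &pattern, &actions);

	if (flow != NULL)
		rte_flow_destroy(port_id, flow); /* APP keeps the handle until here */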

> 
> Then PMDs *may* need to keep and arrange flow rule objects internally for
> management purposes. Could be because HW requires it, detecting conflicting
> rules, managing priorities and so on. Possible reasons are not described in this
> API because these are thought as PMD-specific needs.
Got it.

> 
> > > > When the port is stopped and restarted, rte can reconfigure the
> > > > rules. Is the
> > > concern that PMD may adjust the sequence of the rules according to
> > > the priority, so every NIC has a different list of rules? But PMD
> > > can adjust them again when rte reconfiguring the rules.
> > >
> > > What about PMDs able to stop and restart ports without destroying
> > > their own flow rules? If we assume flow rules must be destroyed when
> > > stopping a port, these PMDs are needlessly penalized with slower
> > > stop/start cycles. Think about it assuming thousands of flow rules.
> > I believe the rules maintained by SW should not be destroyed, because they
> > are needed to re-program the device when it starts again.
> 
> Do we agree that applications should not care? Flow rules configured before
> stopping a port must still be there after restarting it.
Yes, agree.

> 
> What we seem to not agree about is that you think RTE should be responsible
> for restoring flow rules of devices that lose them when stopped. I think doing so
> is unfair to devices for which it is not the case and not really nice to applications,
> so my opinion is that the PMD is responsible for restoring flow rules however it
> wants. It is free to use RTE helpers to keep their track, as long as it's all managed
> internally.
What I think is that RTE could store the flow rules and recreate them after restarting, in a function like rte_dev_start, so the APP knows nothing about it. But according to the discussion above, I think the design doesn't support it, right?
RTE doesn't store the flow rule objects, and even if it stored them, there's no way designed to re-program the objects. Also, considering that some HW doesn't need to be re-programmed, I think it's OK to let the PMD maintain the rules, as re-programming is a NIC-specific requirement.

> 
> > > Thus from an application point of view, whatever happens when
> > > stopping and restarting a port should not matter. If a flow rule was
> > > present before, it must still be present afterwards. If the PMD had
> > > to destroy flow rules and re-create them, it does not actually matter if they
> differ slightly at the HW level, as long as:
> > >
> > > - Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
> > >   and refer to the same rules.
> > >
> > > - The overall behavior of all rules is the same.
> > >
> > > The list of rules you think of (patterns / actions) is maintained by
> > > applications (not RTE), and only if they need them. RTE would needlessly
> duplicate this.
> > As said before, need more details to understand this. Maybe an example
> > is better :)
> 
> The generic format both RTE and applications might understand is the one
> described in this API (struct rte_flow_pattern and struct rte_flow_actions).
> 
> If we wanted RTE to maintain some sort of per-port state for flow rule
> specifications, it would have to be a copy of these structures arranged somehow
> (list or something else).
> 
> If we consider that PMDs need to keep a context object associated to a flow
> rule (the opaque struct rte_flow *), then RTE would most likely have to store it
> along with the flow specification.
> 
> Such a list may not be useful to applications (list lookups take time), so they
> would implement their own redundant method. They might also require extra
> room to attach some application context to flow rules. A generic list cannot plan
> for it.
> 
> Applications know what they want to do with flow rules and are responsible for
> managing them efficiently with RTE out of the way.
> 
> I'm not sure if this answered your question, if not, please describe a scenario
> where a RTE-managed list of flow rules would be mandatory.
Got your point and agree :)

> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] validate_abi: build faster by augmenting make with job count
  @ 2016-07-20 19:02  9% ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2016-07-20 19:02 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman, Neil Horman, Thomas Monjalon, Mcnamara, John

John Mcnamara and I were discussing enhancing the validate_abi script to build
the dpdk tree faster with multiple jobs.  There's no reason not to do it, so this
implements that requirement.  It uses a MAKE_JOBS variable that can be set by
the user to limit the job count.  By default the job count is set to the number
of online cpus.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Thomas Monjalon <thomas.monjalon@6wind.com>
CC: "Mcnamara, John" <john.mcnamara@intel.com>

---
Change notes

v2) switch variable to DPDK_MAKE_JOBS
    make provision for non-existence of lscpu
---
 scripts/validate-abi.sh | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/scripts/validate-abi.sh b/scripts/validate-abi.sh
index c36ad61..feda6c8 100755
--- a/scripts/validate-abi.sh
+++ b/scripts/validate-abi.sh
@@ -97,6 +97,17 @@ fixup_config() {
 #trap on ctrl-c to clean up
 trap cleanup_and_exit SIGINT
 
+if [ -z "$DPDK_MAKE_JOBS" ]
+then
+	# This counts the number of cpus on the system
+	if [ -e /usr/bin/lscpu ]
+	then
+		DPDK_MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
+	else
+		DPDK_MAKE_JOBS=1
+	fi
+fi
+
 #Save the current branch
 CURRENT_BRANCH=`git branch | grep \* | cut -d' ' -f2`
 
@@ -183,7 +194,7 @@ log "INFO" "Configuring DPDK $TAG1"
 make config T=$TARGET O=$TARGET > $VERBOSE 2>&1
 
 log "INFO" "Building DPDK $TAG1. This might take a moment"
-make O=$TARGET > $VERBOSE 2>&1
+make -j$DPDK_MAKE_JOBS O=$TARGET > $VERBOSE 2>&1
 
 if [ $? -ne 0 ]
 then
@@ -214,7 +225,7 @@ log "INFO" "Configuring DPDK $TAG2"
 make config T=$TARGET O=$TARGET > $VERBOSE 2>&1
 
 log "INFO" "Building DPDK $TAG2. This might take a moment"
-make O=$TARGET > $VERBOSE 2>&1
+make -j$DPDK_MAKE_JOBS O=$TARGET > $VERBOSE 2>&1
 
 if [ $? -ne 0 ]
 then
-- 
2.5.5

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-20 13:37  4%           ` Olivier Matz
  2016-07-20 14:01  0%             ` Richardson, Bruce
@ 2016-07-20 17:20  0%             ` Zoltan Kiss
  1 sibling, 0 replies; 200+ results
From: Zoltan Kiss @ 2016-07-20 17:20 UTC (permalink / raw)
  To: Olivier Matz, Zoltan Kiss, dev



On 20/07/16 14:37, Olivier Matz wrote:
> Hi,
>
> On 07/20/2016 02:41 PM, Zoltan Kiss wrote:
>>
>>
>> On 19/07/16 17:17, Olivier Matz wrote:
>>> Hi Zoltan,
>>>
>>> On 07/19/2016 05:59 PM, Zoltan Kiss wrote:
>>>>
>>>>
>>>> On 19/07/16 16:37, Olivier Matz wrote:
>>>>> Hi Zoltan,
>>>>>
>>>>> On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
>>>>>> A recent fix brought up an issue about the size of the 'name' fields:
>>>>>>
>>>>>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>>>>>
>>>>>> These relations should be observed:
>>>>>>
>>>>>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>>>>>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE -
>>>>>> strlen(RTE_MEMPOOL_MZ_PREFIX)
>>>>>>
>>>>>> Setting all of them to 32 hides this restriction from the application.
>>>>>> This patch increases the memzone string size to accomodate for these
>>>>>> prefixes, and the same happens with the ring name string. The ABI
>>>>>> needs to
>>>>>> be broken to fix this API issue, this way doesn't break applications
>>>>>> previously not failing due to the truncating bug now fixed.
>>>>>>
>>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>>>>>
>>>>> I agree it is a problem for an application because it cannot know what
>>>>> is the maximum name length. On the other hand, breaking the ABI for
>>>>> this
>>>>> looks a bit overkill. Maybe we could reduce RTE_MEMPOOL_NAMESIZE and
>>>>> RTE_RING_NAMESIZE instead of increasing RTE_MEMZONE_NAMESIZE? That way,
>>>>> we could keep the ABI as is.
>>>>
>>>> But that would break the ABI too, wouldn't it? Unless you keep the array
>>>> the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.
>>>
>>> Yes, that was the idea.
>>>
>>>> And even then, the API breaks anyway. There are applications - I have at
>>>> least some - which use all 32 bytes to store the name. Decrease that
>>>> would cause headache to change the naming scheme, because it's a 30
>>>> character long id, and chopping the last few chars would cause name
>>>> collisions and annoying bugs.
>>>
>>> Before my patch (85cf0079), long names were silently truncated when
>>> mempool created its ring and/or memzones. Now, it returns an error.
>>
> >> With 16.04 an application could operate as expected if the first 26
> >> characters were unique. Your patch revealed the problem that characters
> >> after these were left out of the name. Now applications fail where there
> >> has never been a bug, because their naming scheme guarantees uniqueness
> >> in the first 26 chars (or makes collisions very unlikely).
>> Where the first 26 is not unique, it failed earlier too, because at
>> memzone creation it checks for duplicate names.
>
> Yes, I understand that there is a behavior change for applications using
> names larger than 26 between 16.04 and 16.07. I also understand that
> there is no way for an application to know what is the maximum usable
> size for a mempool or a ring.
>
>
>>> I'm not getting why changing the struct to something like below would
>>> break the API, since it would already return an error today.
>>>
>>>    #define RTE_MEMPOOL_NAMESIZE \
>>
>> Wait, this would mean applications need to recompile to use the smaller
>> value. AFAIK that's an ABI break too, right? At the moment I don't see a
>> way to fix this without breaking the ABI
>
> With this modification, if you don't recompile the application, its
> behavior will still be the same as today -> it will return ENAMETOOLONG.
> If you recompile it, the application will be aware of the maximum
> length. To me, it seems to be a acceptable compromise for this release.
>
> The patch you're proposing also changes the ABI of librte_ring and
> librte_eal, which cannot be accepted for the release.

Ok, I've sent a new version with this approach.

>
>
>>
>>>        (RTE_MEMZONE_NAMESIZE - sizeof(pool_prefix) - sizeof(ring prefix))
>>>    struct rte_mempool {
>>>        union {
>>>              char name[RTE_MEMPOOL_NAMESIZE];
>>>              char pad[32];
>>>        };
>>>        ...
>>>    }
>>>
>>> Anyway, it may not be the proper solution since it supposes that a
>>> mempool includes a ring based on a memzone, which is not always true now
>>> with mempool handlers.
>>
>> Oh, as we dug deeper it gets better!
>> Indeed, we don't necessarily have this ring + memzone pair underneath,
>> but the user is not aware of that, and I think we should keep it that
>> way. It should only care that the string passed shouldn't be bigger than
>> a certain amount.
>
> Yes. What I'm just saying here is that it's not a good solution to write
> in the #define that "a mempool is based on a ring + a memzone", because
> if some someone adds a new mempool handler replacing the ring, and also
> creating a memzone prefixed by something larger than "rg_", we will have
> to break the ABI again.

If someone adds a new handler, (s)he needs to keep in mind what's the 
max size for pool name, and any derived object using that name as a base 
should check if it fits.

>
>
>> Also, even though we don't necessarily have the ring, we still reserve
>> memzone's in rte_mempool_populate_default(). And their name has a 3
>> letter prefix, and a "_%d" postfix, where the %d could be as much as
>> RTE_MAX_MEMZONE in worst case (2560 by default) So actually:
>>
>> RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
>> strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen("_2560")
>>
>>
>> As a side note, there is another bug around here: rte_ring_create()
>> doesn't check for name duplications. However the user of the library can
>> lookup based on the name with rte_ring_lookup(), and it will return the
>> first ring with that name
>
> The name uniqueness is checked by rte_memzone_reserve().
>
>
>>>>> It would even be better to get rid of this static char[] for the
>>>>> structure names and replace it by an allocated const char *. I didn't
>>>>> check it's feasible for memzones. What do you think?
>>>>
>>>> It would work too, but I don't think it would help a lot. We would still
>>>> need max sizes for the names. Storing them somewhere else won't help us
>>>> in this problem.
>>>
>>> Why should we have a maximum length for the names?
>>
>> What happens if an application loads DPDK and creates a mempool with a
>> name string 2 million characters long? Maybe nothing we should worry
>> about, but in general I think unlimited length function parameters are
>> problematic at the very least. The length should be passed at least
>> (which also creates a max due to the size of the param). But I think it
>> would be just easier to have these maximums set, observing the above
>> constrains.
>
> I think having a maximum name length brings more problems than not having
> it, especially ABI problems.
>
>
> Regards,
> Olivier
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] mempool: adjust name string size in related data types
  2016-07-19 14:37  3% ` [dpdk-dev] [PATCH] mempool: adjust name string size in related data types Zoltan Kiss
  2016-07-19 15:37  4%   ` Olivier Matz
@ 2016-07-20 17:16 12%   ` Zoltan Kiss
  2016-07-21 13:40  0%     ` Olivier Matz
  1 sibling, 1 reply; 200+ results
From: Zoltan Kiss @ 2016-07-20 17:16 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Bruce Richardson

A recent patch brought up an issue about the size of the 'name' fields:

85cf0079 mem: avoid memzone/mempool/ring name truncation

These relations should be observed:

1. Each ring creates a memzone with a prefixed name:
RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)

2. There are some mempool handlers which create a ring with a prefixed
name:
RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE - strlen(RTE_MEMPOOL_MZ_PREFIX)

3. A mempool can create up to RTE_MAX_MEMZONE pre and postfixed memzones:
sprintf(postfix, "_%d", RTE_MAX_MEMZONE)
RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
	strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen(postfix)

Setting all of them to 32 hides this restriction from the application.
This patch decreases the mempool and ring string size to accommodate for
these prefixes, but it doesn't apply the 3rd constraint. Applications
relying on these constants need to be recompiled, otherwise they'll run
into ENAMETOOLONG issues.
The size of the arrays are kept 32 for ABI compatibility, it can be
decreased next time the ABI changes.

Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
---

Notes:
    v2: keep arrays 32 bytes and decrease the max sizes to maintain ABI
    compatibility

 lib/librte_mempool/rte_mempool.h | 11 +++++++++--
 lib/librte_ring/rte_ring.h       | 12 ++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 4a8fbb1..059ad9e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -123,7 +123,9 @@ struct rte_mempool_objsz {
 	/**< Total size of an object (header + elt + trailer). */
 };
 
-#define RTE_MEMPOOL_NAMESIZE 32 /**< Maximum length of a memory pool. */
+/**< Maximum length of a memory pool's name. */
+#define RTE_MEMPOOL_NAMESIZE (RTE_RING_NAMESIZE - \
+			      sizeof(RTE_MEMPOOL_MZ_PREFIX) + 1)
 #define RTE_MEMPOOL_MZ_PREFIX "MP_"
 
 /* "MP_<name>" */
@@ -208,7 +210,12 @@ struct rte_mempool_memhdr {
  * The RTE mempool structure.
  */
 struct rte_mempool {
-	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to
+	 * RTE_MEMPOOL_NAMESIZE next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE]; /**< Name of mempool. */
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
 		uint64_t pool_id;        /**< External mempool identifier. */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index eb45e41..0e22e69 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -100,6 +100,7 @@ extern "C" {
 #include <rte_lcore.h>
 #include <rte_atomic.h>
 #include <rte_branch_prediction.h>
+#include <rte_memzone.h>
 
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
@@ -126,8 +127,10 @@ struct rte_ring_debug_stats {
 } __rte_cache_aligned;
 #endif
 
-#define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
 #define RTE_RING_MZ_PREFIX "RG_"
+/**< The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
 #ifndef RTE_RING_PAUSE_REP_COUNT
 #define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
@@ -147,7 +150,12 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
  * a problem.
  */
 struct rte_ring {
-	char name[RTE_RING_NAMESIZE];    /**< Name of the ring. */
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
 	int flags;                       /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
-- 
1.9.1
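
As a side note, the first two relations from the commit message can be
expressed as build-time checks; a minimal sketch (not part of the patch,
assuming C11 _Static_assert; sizeof on a string literal counts the
terminating NUL, hence the -1):

	#include <rte_memzone.h>
	#include <rte_ring.h>
	#include <rte_mempool.h>

	/* Relation 1: a prefixed ring name must fit in a memzone name. */
	_Static_assert(RTE_RING_NAMESIZE <=
		RTE_MEMZONE_NAMESIZE - (sizeof(RTE_RING_MZ_PREFIX) - 1),
		"ring name too long for a memzone");

	/* Relation 2: a prefixed mempool name must fit in a ring name. */
	_Static_assert(RTE_MEMPOOL_NAMESIZE <=
		RTE_RING_NAMESIZE - (sizeof(RTE_MEMPOOL_MZ_PREFIX) - 1),
		"mempool name too long for a ring");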

^ permalink raw reply	[relevance 12%]

* [dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count
@ 2016-07-20 17:09  9% Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2016-07-20 17:09 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman, Neil Horman, Thomas Monjalon, Mcnamara, John

From: Neil Horman <nhorman@redhat.com>

John Mcnamara and I were discussing enhancing the validate_abi script to build
the dpdk tree faster with multiple jobs.  There's no reason not to do it, so this
implements that requirement.  It uses a MAKE_JOBS variable that can be set by
the user to limit the job count.  By default the job count is set to the number
of online cpus.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Thomas Monjalon <thomas.monjalon@6wind.com>
CC: "Mcnamara, John" <john.mcnamara@intel.com>
---
 scripts/validate-abi.sh | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/scripts/validate-abi.sh b/scripts/validate-abi.sh
index c36ad61..1c9627b 100755
--- a/scripts/validate-abi.sh
+++ b/scripts/validate-abi.sh
@@ -97,6 +97,12 @@ fixup_config() {
 #trap on ctrl-c to clean up
 trap cleanup_and_exit SIGINT
 
+if [ -z "$MAKE_JOBS" ]
+then
+	# This counts the number of cpus on the system
+	MAKE_JOBS=`lscpu -p=cpu | grep -v "#" | wc -l`
+fi
+
 #Save the current branch
 CURRENT_BRANCH=`git branch | grep \* | cut -d' ' -f2`
 
@@ -183,7 +189,7 @@ log "INFO" "Configuring DPDK $TAG1"
 make config T=$TARGET O=$TARGET > $VERBOSE 2>&1
 
 log "INFO" "Building DPDK $TAG1. This might take a moment"
-make O=$TARGET > $VERBOSE 2>&1
+make -j$MAKE_JOBS O=$TARGET > $VERBOSE 2>&1
 
 if [ $? -ne 0 ]
 then
@@ -214,7 +220,7 @@ log "INFO" "Configuring DPDK $TAG2"
 make config T=$TARGET O=$TARGET > $VERBOSE 2>&1
 
 log "INFO" "Building DPDK $TAG2. This might take a moment"
-make O=$TARGET > $VERBOSE 2>&1
+make -j$MAKE_JOBS O=$TARGET > $VERBOSE 2>&1
 
 if [ $? -ne 0 ]
 then
-- 
2.5.5

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 15:22  7%     ` Thomas Monjalon
@ 2016-07-20 15:42  4%       ` Kulasek, TomaszX
  0 siblings, 0 replies; 200+ results
From: Kulasek, TomaszX @ 2016-07-20 15:42 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, July 20, 2016 17:22
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev
> structure
> 
> 2016-07-20 15:13, Ananyev, Konstantin:
> > Hi Thomas,
> >
> > > Hi,
> > >
> > > This patch announces an interesting change in the DPDK design.
> > >
> > > 2016-07-20 16:24, Tomasz Kulasek:
> > > > This is an ABI deprecation notice for DPDK 16.11 in librte_ether
> > > > about changes in rte_eth_dev and rte_eth_desc_lim structures.
> > > >
> > > > In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> > > > necessary preparations of packet burst to be safely transmitted on
> > > > device for desired HW offloads (set/reset checksum field according
> > > > to the hardware requirements) and check HW constraints (number of
> > > > segments per packet, etc).
> > > >
> > > > While the limitations and requirements may differ between devices,
> > > > this requires extending the rte_eth_dev structure with a new function
> > > > pointer "tx_pkt_prep", which can be implemented in the driver to
> > > > prepare and verify packets in a device-specific way before the burst,
> > > > which should prevent the application from sending malformed packets.
> > > >
> > > > Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max
> > > > and nb_mtu_seg_max, providing information about the maximum number of
> > > > segments in TSO and non-TSO packets accepted by the device.
> > >
> > > We cannot acknowledge such notice without a prior design discussion.
> > > Please explain why you plan to work on this change and give a draft of
> the new structures (a RFC patch would be ideal).
> >
> > I think it is not really a deprecation note, but announce ABI change for
> rte_ethdev.h structures.
> 
> An ABI break requires a deprecation notice. So it is :)
> 
> > The plan is to implement what was proposed & discussed the following
> thread:
> > http://dpdk.org/ml/archives/dev/2015-September/023603.html
> 
> Please could you summarize it here?

Hi Thomas,

The implementation of rte_eth_tx_prep() will be similar to rte_eth_tx_burst(), passing the same arguments to the driver, so packets can be checked and modified before the real burst is done.

The API for the new function will be implemented in the following way:

+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_id*.
+ * The *nb_pkts* parameter is the number of packets to be prepared which are
+ * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
+ * allocated from a pool created with rte_pktmbuf_pool_create().
+ * For each packet to send, the rte_eth_tx_prep() function performs
+ * the following operations:
+ *
+ * - Check if packet meets devices requirements for tx offloads.
+ *
+ * - Check limitations about number of segments.
+ *
+ * - Check additional requirements when debug is enabled.
+ *
+ * - Update and/or reset required checksums when tx offload is set for packet.
+ *
+ * The rte_eth_tx_prep() function returns the number of packets that were
+ * successfully prepared. A return value equal to *nb_pkts* means that all
+ * packets are valid and ready to be sent.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param tx_pkts
+ *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *   which contain the output packets.
+ * @param nb_pkts
+ *   The maximum number of packets to process.
+ * @return
+ *   The number of packets correct and ready to be sent. The return value can be
+ *   less than the value of the *nb_pkts* parameter when some packet doesn't
+ *   meet the device's requirements, with rte_errno set appropriately.
+ */
+static inline uint16_t
+rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf **tx_pkts,
+		uint16_t nb_pkts)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (!dev->tx_pkt_prep) {
+		rte_errno = -ENOTSUP;
+		return 0;
+	}
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	if (queue_id >= dev->data->nb_tx_queues) {
+		RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		rte_errno = -EINVAL;
+		return 0;
+	}
+#endif
+
+	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
+}
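
A typical call site would then pair the two calls, mirroring the case of use
from the deprecation patch (error handling elided):

	uint16_t nb_prep, nb_tx;

	nb_prep = rte_eth_tx_prep(port_id, queue_id, pkts, nb_pkts);
	/* packets beyond nb_prep failed preparation, see rte_errno */
	nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);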

Tomasz

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 15:13  7%   ` Ananyev, Konstantin
@ 2016-07-20 15:22  7%     ` Thomas Monjalon
  2016-07-20 15:42  4%       ` Kulasek, TomaszX
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-20 15:22 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Kulasek, TomaszX, dev

2016-07-20 15:13, Ananyev, Konstantin:
> Hi Thomas,
> 
> > Hi,
> > 
> > This patch announces an interesting change in the DPDK design.
> > 
> > 2016-07-20 16:24, Tomasz Kulasek:
> > > This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> > > changes in rte_eth_dev and rte_eth_desc_lim structures.
> > >
> > > In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> > > necessary preparations of packet burst to be safely transmitted on
> > > device for desired HW offloads (set/reset checksum field according to
> > > the hardware requirements) and check HW constraints (number of
> > > segments per packet, etc).
> > >
> > > While the limitations and requirements may differ between devices, this
> > > requires extending the rte_eth_dev structure with a new function pointer
> > > "tx_pkt_prep", which can be implemented in the driver to prepare and
> > > verify packets in a device-specific way before the burst, which should
> > > prevent the application from sending malformed packets.
> > >
> > > Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> > > nb_mtu_seg_max, providing information about the maximum number of
> > > segments in TSO and non-TSO packets accepted by the device.
> > 
> > We cannot acknowledge such notice without a prior design discussion.
> > Please explain why you plan to work on this change and give a draft of the new structures (a RFC patch would be ideal).
> 
> I think it is not really a deprecation note, but announce ABI change for rte_ethdev.h structures.

An ABI break requires a deprecation notice. So it is :)

> The plan is to implement what was proposed & discussed the following thread:
> http://dpdk.org/ml/archives/dev/2015-September/023603.html

Please could you summarize it here?

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 15:01  4% ` Thomas Monjalon
@ 2016-07-20 15:13  7%   ` Ananyev, Konstantin
  2016-07-20 15:22  7%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-07-20 15:13 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev

Hi Thomas,

> Hi,
> 
> This patch announces an interesting change in the DPDK design.
> 
> 2016-07-20 16:24, Tomasz Kulasek:
> > This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> > changes in rte_eth_dev and rte_eth_desc_lim structures.
> >
> > In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> > necessary preparations of packet burst to be safely transmitted on
> > device for desired HW offloads (set/reset checksum field according to
> > the hardware requirements) and check HW constraints (number of
> > segments per packet, etc).
> >
> > While the limitations and requirements may differ between devices, this
> > requires extending the rte_eth_dev structure with a new function pointer
> > "tx_pkt_prep", which can be implemented in the driver to prepare and
> > verify packets in a device-specific way before the burst, which should
> > prevent the application from sending malformed packets.
> >
> > Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> > nb_mtu_seg_max, providing information about the maximum number of
> > segments in TSO and non-TSO packets accepted by the device.
> 
> We cannot acknowledge such notice without a prior design discussion.
> Please explain why you plan to work on this change and give a draft of the new structures (a RFC patch would be ideal).

I think it is not really a deprecation note, but announce ABI change for rte_ethdev.h structures.
The plan is to implement what was proposed & discussed the following thread:
http://dpdk.org/ml/archives/dev/2015-September/023603.html

Konstantin

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
  2016-07-20 14:24 13% [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
@ 2016-07-20 15:01  4% ` Thomas Monjalon
  2016-07-20 15:13  7%   ` Ananyev, Konstantin
  2016-07-21 15:24 11% ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
  2016-07-31  7:46  4% ` [dpdk-dev] [PATCH] " Vlad Zolotarov
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-20 15:01 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

Hi,

This patch announces an interesting change in the DPDK design.

2016-07-20 16:24, Tomasz Kulasek:
> This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
> changes in rte_eth_dev and rte_eth_desc_lim structures.
> 
> In 16.11, we plan to introduce rte_eth_tx_prep() function to do
> necessary preparations of packet burst to be safely transmitted on
> device for desired HW offloads (set/reset checksum field according to
> the hardware requirements) and check HW constraints (number of segments
> per packet, etc).
> 
> While the limitations and requirements may differ between devices, this
> requires extending the rte_eth_dev structure with a new function pointer
> "tx_pkt_prep", which can be implemented in the driver to prepare and
> verify packets in a device-specific way before the burst, which should
> prevent the application from sending malformed packets.
> 
> Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
> nb_mtu_seg_max, providing information about the maximum number of
> segments in TSO and non-TSO packets accepted by the device.

We cannot acknowledge such notice without a prior design discussion.
Please explain why you plan to work on this change and give a draft of
the new structures (a RFC patch would be ideal).

Thanks

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure
@ 2016-07-20 14:24 13% Tomasz Kulasek
  2016-07-20 15:01  4% ` Thomas Monjalon
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Tomasz Kulasek @ 2016-07-20 14:24 UTC (permalink / raw)
  To: dev

This is an ABI deprecation notice for DPDK 16.11 in librte_ether about
changes in rte_eth_dev and rte_eth_desc_lim structures.

In 16.11, we plan to introduce rte_eth_tx_prep() function to do
necessary preparations of packet burst to be safely transmitted on
device for desired HW offloads (set/reset checksum field according to
the hardware requirements) and check HW constraints (number of segments
per packet, etc).

While the limitations and requirements may differ between devices, this
requires extending the rte_eth_dev structure with a new function pointer
"tx_pkt_prep", which can be implemented in the driver to prepare and
verify packets in a device-specific way before the burst, which should
prevent the application from sending malformed packets.

Also new fields will be introduced in rte_eth_desc_lim: nb_seg_max and
nb_mtu_seg_max, providing information about the maximum number of
segments in TSO and non-TSO packets accepted by the device.

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 doc/guides/rel_notes/deprecation.rst |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..485aacb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,3 +41,10 @@ Deprecation Notices
 * The mempool functions for single/multi producer/consumer are deprecated and
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
+
+* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
+  extended with a new function pointer ``tx_pkt_prep`` allowing verification
+  and processing of a packet burst to meet HW-specific requirements before
+  transmit. Also, new fields will be added to the ``rte_eth_desc_lim`` structure:
+  ``nb_seg_max`` and ``nb_mtu_seg_max``, providing information about the
+  number of segments that can be transmitted by the device for TSO/non-TSO packets.
-- 
1.7.9.5

^ permalink raw reply	[relevance 13%]

* [dpdk-dev] [PATCH] unify tools naming
@ 2016-07-20 14:24  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-20 14:24 UTC (permalink / raw)
  To: dev

The following tools may be installed system-wide.
It may be cleaner and more convenient to find them with the same
dpdk- prefix (especially for autocompletion).
Moreover, the script dpdk_nic_bind.py deserves a new name because it is
not restricted to NICs and can be used for e.g. crypto.

These files are renamed:
pmdinfogen       -> dpdk-pmdinfogen
pmdinfo.py       -> dpdk-pmdinfo.py
dpdk_pdump       -> dpdk-pdump
dpdk_proc_info   -> dpdk-procinfo
dpdk_nic_bind.py -> dpdk-devbind.py
setup.sh         -> dpdk-setup.sh

The tools pmdinfogen, pmdinfo.py and dpdk_pdump are new in 16.07.

The scripts dpdk_nic_bind.py and setup.sh may have been used with
previous releases by end users. That's why symbolic links still
provide the old names in the installed tools directory.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---

It would be good to have this rename in 16.07, before pmdinfo.py
and dpdk_pdump become widely used.

A possible addition to this patch could be renaming the test apps:
test         -> dpdk-test
testacl      -> dpdk-testacl
testpipeline -> dpdk-testpipeline
testpmd      -> dpdk-testpmd

---
 MAINTAINERS                                      |  2 +-
 app/pdump/Makefile                               |  2 +-
 app/proc_info/Makefile                           |  2 +-
 buildtools/pmdinfogen/Makefile                   |  2 +-
 doc/guides/faq/faq.rst                           |  2 +-
 doc/guides/linux_gsg/build_dpdk.rst              | 14 +++++++-------
 doc/guides/linux_gsg/nic_perf_intel_platform.rst |  6 +++---
 doc/guides/linux_gsg/quick_start.rst             | 12 ++++++------
 doc/guides/nics/bnx2x.rst                        |  4 ++--
 doc/guides/nics/cxgbe.rst                        |  4 ++--
 doc/guides/nics/ena.rst                          |  2 +-
 doc/guides/nics/enic.rst                         |  6 +++---
 doc/guides/nics/i40e.rst                         |  4 ++--
 doc/guides/nics/intel_vf.rst                     |  8 ++++----
 doc/guides/nics/nfp.rst                          |  8 ++++----
 doc/guides/nics/qede.rst                         |  2 +-
 doc/guides/nics/thunderx.rst                     | 16 ++++++++--------
 doc/guides/nics/virtio.rst                       |  2 +-
 doc/guides/prog_guide/dev_kit_build_system.rst   | 14 +++++++-------
 doc/guides/rel_notes/release_16_07.rst           |  3 +++
 doc/guides/sample_app_ug/pdump.rst               | 14 +++++++-------
 doc/guides/sample_app_ug/proc_info.rst           |  8 ++++----
 doc/guides/testpmd_app_ug/testpmd_funcs.rst      | 10 +++++-----
 doc/guides/xen/pkt_switch.rst                    |  2 +-
 lib/librte_eal/common/eal_common_options.c       |  4 ++--
 mk/internal/rte.compile-pre.mk                   |  2 +-
 mk/rte.sdkinstall.mk                             | 16 ++++++++++------
 mk/rte.sdktest.mk                                |  4 ++--
 tools/{dpdk_nic_bind.py => dpdk-devbind.py}      |  0
 tools/{pmdinfo.py => dpdk-pmdinfo.py}            |  4 +---
 tools/{setup.sh => dpdk-setup.sh}                | 24 ++++++++++++------------
 31 files changed, 104 insertions(+), 99 deletions(-)
 rename tools/{dpdk_nic_bind.py => dpdk-devbind.py} (100%)
 rename tools/{pmdinfo.py => dpdk-pmdinfo.py} (99%)
 rename tools/{setup.sh => dpdk-setup.sh} (95%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2996b09..3bfcc9f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -70,7 +70,7 @@ F: scripts/validate-abi.sh
 
 Driver information
 F: buildtools/pmdinfogen/
-F: tools/pmdinfo.py
+F: tools/dpdk-pmdinfo.py
 
 
 Environment Abstraction Layer
diff --git a/app/pdump/Makefile b/app/pdump/Makefile
index d85bb08..536198f 100644
--- a/app/pdump/Makefile
+++ b/app/pdump/Makefile
@@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 ifeq ($(CONFIG_RTE_LIBRTE_PDUMP),y)
 
-APP = dpdk_pdump
+APP = dpdk-pdump
 
 CFLAGS += $(WERROR_FLAGS)
 
diff --git a/app/proc_info/Makefile b/app/proc_info/Makefile
index 33e058e..e051e03 100644
--- a/app/proc_info/Makefile
+++ b/app/proc_info/Makefile
@@ -31,7 +31,7 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
-APP = dpdk_proc_info
+APP = dpdk-procinfo
 
 CFLAGS += $(WERROR_FLAGS)
 
diff --git a/buildtools/pmdinfogen/Makefile b/buildtools/pmdinfogen/Makefile
index 3885d3b..bd8f900 100644
--- a/buildtools/pmdinfogen/Makefile
+++ b/buildtools/pmdinfogen/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 # library name
 #
-HOSTAPP = pmdinfogen
+HOSTAPP = dpdk-pmdinfogen
 
 #
 # all sources are stored in SRCS-y
diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index 3228b92..8d1ea6c 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -50,7 +50,7 @@ When you stop and restart the test application, it looks to see if the pages are
 If you look in the directory, you will see ``n`` number of 2M pages files. If you specified 1024, you will see 1024 page files.
 These are then placed in memory segments to get contiguous memory.
 
-If you need to change the number of pages, it is easier to first remove the pages. The tools/setup.sh script provides an option to do this.
+If you need to change the number of pages, it is easier to first remove the pages. The tools/dpdk-setup.sh script provides an option to do this.
 See the "Quick Start Setup Script" section in the :ref:`DPDK Getting Started Guide <linux_gsg>` for more information.
 
 
diff --git a/doc/guides/linux_gsg/build_dpdk.rst b/doc/guides/linux_gsg/build_dpdk.rst
index fb2c481..f8007b3 100644
--- a/doc/guides/linux_gsg/build_dpdk.rst
+++ b/doc/guides/linux_gsg/build_dpdk.rst
@@ -198,7 +198,7 @@ however please consult your distributions documentation to make sure that is the
 Also, to use VFIO, both kernel and BIOS must support and be configured to use IO virtualization (such as Intel® VT-d).
 
 For proper operation of VFIO when running DPDK applications as a non-privileged user, correct permissions should also be set up.
-This can be done by using the DPDK setup script (called setup.sh and located in the tools directory).
+This can be done by using the DPDK setup script (called dpdk-setup.sh and located in the tools directory).
 
 .. _linux_gsg_binding_kernel:
 
@@ -224,7 +224,7 @@ and to bind and unbind those ports from the different kernel modules, including
 The following are some examples of how the script can be used.
 A full description of the script and its parameters can be obtained by calling the script with the ``--help`` or ``--usage`` options.
 Note that the uio or vfio kernel modules to be used, should be loaded into the kernel before
-running the ``dpdk_nic_bind.py`` script.
+running the ``dpdk-devbind.py`` script.
 
 .. warning::
 
@@ -238,14 +238,14 @@ running the ``dpdk_nic_bind.py`` script.
 
 .. warning::
 
-    While any user can run the dpdk_nic_bind.py script to view the status of the network ports,
+    While any user can run the dpdk-devbind.py script to view the status of the network ports,
     binding or unbinding network ports requires root privileges.
 
 To see the status of all network ports on the system:
 
 .. code-block:: console
 
-    ./tools/dpdk_nic_bind.py --status
+    ./tools/dpdk-devbind.py --status
 
     Network devices using DPDK-compatible driver
     ============================================
@@ -267,16 +267,16 @@ To bind device ``eth1``,``04:00.1``, to the ``uio_pci_generic`` driver:
 
 .. code-block:: console
 
-    ./tools/dpdk_nic_bind.py --bind=uio_pci_generic 04:00.1
+    ./tools/dpdk-devbind.py --bind=uio_pci_generic 04:00.1
 
 or, alternatively,
 
 .. code-block:: console
 
-    ./tools/dpdk_nic_bind.py --bind=uio_pci_generic eth1
+    ./tools/dpdk-devbind.py --bind=uio_pci_generic eth1
 
 To restore device ``82:00.0`` to its original kernel binding:
 
 .. code-block:: console
 
-    ./tools/dpdk_nic_bind.py --bind=ixgbe 82:00.0
+    ./tools/dpdk-devbind.py --bind=ixgbe 82:00.0
diff --git a/doc/guides/linux_gsg/nic_perf_intel_platform.rst b/doc/guides/linux_gsg/nic_perf_intel_platform.rst
index b433732..d4a8362 100644
--- a/doc/guides/linux_gsg/nic_perf_intel_platform.rst
+++ b/doc/guides/linux_gsg/nic_perf_intel_platform.rst
@@ -192,12 +192,12 @@ Configurations before running DPDK
 
 
       # Bind ports 82:00.0 and 85:00.0 to dpdk driver
-      ./dpdk_folder/tools/dpdk_nic_bind.py -b igb_uio 82:00.0 85:00.0
+      ./dpdk_folder/tools/dpdk-devbind.py -b igb_uio 82:00.0 85:00.0
 
       # Check the port driver status
-      ./dpdk_folder/tools/dpdk_nic_bind.py --status
+      ./dpdk_folder/tools/dpdk-devbind.py --status
 
-   See ``dpdk_nic_bind.py --help`` for more details.
+   See ``dpdk-devbind.py --help`` for more details.
 
 
 More details about DPDK setup and Linux kernel requirements see :ref:`linux_gsg_compiling_dpdk`.
diff --git a/doc/guides/linux_gsg/quick_start.rst b/doc/guides/linux_gsg/quick_start.rst
index 1e0f8ff..8789b58 100644
--- a/doc/guides/linux_gsg/quick_start.rst
+++ b/doc/guides/linux_gsg/quick_start.rst
@@ -33,7 +33,7 @@
 Quick Start Setup Script
 ========================
 
-The setup.sh script, found in the tools subdirectory, allows the user to perform the following tasks:
+The dpdk-setup.sh script, found in the tools subdirectory, allows the user to perform the following tasks:
 
 *   Build the DPDK libraries
 
@@ -63,7 +63,7 @@ the user may compile their own application that links in the EAL libraries to cr
 Script Organization
 -------------------
 
-The setup.sh script is logically organized into a series of steps that a user performs in sequence.
+The dpdk-setup.sh script is logically organized into a series of steps that a user performs in sequence.
 Each step provides a number of options that guide the user to completing the desired task.
 The following is a brief synopsis of each step.
 
@@ -98,17 +98,17 @@ The final step has options for restoring the system to its original state.
 Use Cases
 ---------
 
-The following are some example of how to use the setup.sh script.
+The following are some example of how to use the dpdk-setup.sh script.
 The script should be run using the source command.
 Some options in the script prompt the user for further data before proceeding.
 
 .. warning::
 
-    The setup.sh script should be run with root privileges.
+    The dpdk-setup.sh script should be run with root privileges.
 
 .. code-block:: console
 
-    source tools/setup.sh
+    source tools/dpdk-setup.sh
 
     ------------------------------------------------------------------------
 
@@ -269,7 +269,7 @@ The following selection demonstrates the launch of the test application to run o
 Applications
 ------------
 
-Once the user has run the setup.sh script, built one of the EAL targets and set up hugepages (if using one of the Linux EAL targets),
+Once the user has run the dpdk-setup.sh script, built one of the EAL targets and set up hugepages (if using one of the Linux EAL targets),
 the user can then move on to building and running their application or one of the examples provided.
 
 The examples in the /examples directory provide a good starting point to gain an understanding of the operation of the DPDK.
diff --git a/doc/guides/nics/bnx2x.rst b/doc/guides/nics/bnx2x.rst
index df8fb47..6453168 100644
--- a/doc/guides/nics/bnx2x.rst
+++ b/doc/guides/nics/bnx2x.rst
@@ -207,7 +207,7 @@ devices managed by ``librte_pmd_bnx2x`` in Linux operating system.
 #. Bind the QLogic adapters to ``igb_uio`` or ``vfio-pci`` loaded in the
    previous step::
 
-      ./tools/dpdk_nic_bind.py --bind igb_uio 0000:84:00.0 0000:84:00.1
+      ./tools/dpdk-devbind.py --bind igb_uio 0000:84:00.0 0000:84:00.1
 
    or
 
@@ -219,7 +219,7 @@ devices managed by ``librte_pmd_bnx2x`` in Linux operating system.
 
       sudo chmod 0666 /dev/vfio/*
 
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:84:00.0 0000:84:00.1
+      ./tools/dpdk-devbind.py --bind vfio-pci 0000:84:00.0 0000:84:00.1
 
 #. Start ``testpmd`` with basic parameters:
 
diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst
index d718f19..d8236b0 100644
--- a/doc/guides/nics/cxgbe.rst
+++ b/doc/guides/nics/cxgbe.rst
@@ -285,7 +285,7 @@ devices managed by librte_pmd_cxgbe in Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind igb_uio 0000:02:00.4
+      ./tools/dpdk-devbind.py --bind igb_uio 0000:02:00.4
 
    or
 
@@ -297,7 +297,7 @@ devices managed by librte_pmd_cxgbe in Linux operating system.
 
       sudo chmod 0666 /dev/vfio/*
 
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:02:00.4
+      ./tools/dpdk-devbind.py --bind vfio-pci 0000:02:00.4
 
    .. note::
 
diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 9f93848..073b35a 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -225,7 +225,7 @@ devices managed by librte_pmd_ena.
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind=igb_uio 0000:02:00.1
+      ./tools/dpdk-devbind.py --bind=igb_uio 0000:02:00.1
 
 #. Start testpmd with basic parameters:
 
diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst
index e67c3db..2e7d7a4 100644
--- a/doc/guides/nics/enic.rst
+++ b/doc/guides/nics/enic.rst
@@ -177,13 +177,13 @@ Prerequisites
 - DPDK suite should be configured based on the user's decision to use VFIO or
   UIO framework
 - If the vNIC device(s) to be used is bound to the kernel mode Ethernet driver
-  (enic), use 'ifconfig' to bring the interface down. The dpdk_nic_bind.py tool
+  (enic), use 'ifconfig' to bring the interface down. The dpdk-devbind.py tool
   can then be used to unbind the device's bus id from the enic kernel mode
   driver.
 - Bind the intended vNIC to vfio-pci in case the user wants ENIC PMD to use
-  VFIO framework using dpdk_nic_bind.py.
+  VFIO framework using dpdk-devbind.py.
 - Bind the intended vNIC to igb_uio in case the user wants ENIC PMD to use
-  UIO framework using dpdk_nic_bind.py.
+  UIO framework using dpdk-devbind.py.
 
 At this point the system should be ready to run DPDK applications. Once the
 application runs to completion, the vNIC can be detached from vfio-pci or
diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index da695af..4d12b10 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -164,13 +164,13 @@ devices managed by ``librte_pmd_i40e`` in the Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind igb_uio 0000:83:00.0
+      ./tools/dpdk-devbind.py --bind igb_uio 0000:83:00.0
 
    Or setup VFIO permissions for regular users and then bind to ``vfio-pci``:
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0000:83:00.0
+      ./tools/dpdk-devbind.py --bind vfio-pci 0000:83:00.0
 
 #. Start ``testpmd`` with basic parameters:
 
diff --git a/doc/guides/nics/intel_vf.rst b/doc/guides/nics/intel_vf.rst
index a68198f..95a79b5 100644
--- a/doc/guides/nics/intel_vf.rst
+++ b/doc/guides/nics/intel_vf.rst
@@ -151,7 +151,7 @@ For example,
 
         modprobe uio
         insmod igb_uio
-        ./dpdk_nic_bind.py -b igb_uio bb:ss.f
+        ./dpdk-devbind.py -b igb_uio bb:ss.f
         echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
 
     Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
@@ -236,7 +236,7 @@ For example,
 
         modprobe uio
         insmod igb_uio
-        ./dpdk_nic_bind.py -b igb_uio bb:ss.f
+        ./dpdk-devbind.py -b igb_uio bb:ss.f
         echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific PCI device)
 
     Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
@@ -285,7 +285,7 @@ For example,
     .. code-block:: console
 
         insmod igb_uio
-        ./dpdk_nic_bind.py -b igb_uio bb:ss.f
+        ./dpdk-devbind.py -b igb_uio bb:ss.f
         echo 2 > /sys/bus/pci/devices/0000\:bb\:ss.f/max_vfs (To enable two VFs on a specific pci device)
 
     Launch DPDK testpmd/example or your own host daemon application using the DPDK PMD library.
@@ -406,7 +406,7 @@ The setup procedure is as follows:
 
         modprobe uio
         insmod igb_uio
-        ./dpdk_nic_bind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1
+        ./dpdk-devbind.py -b igb_uio 02:00.0 02:00.1 0e:00.0 0e:00.1
         echo 2 > /sys/bus/pci/devices/0000\:02\:00.0/max_vfs
         echo 2 > /sys/bus/pci/devices/0000\:02\:00.1/max_vfs
         echo 2 > /sys/bus/pci/devices/0000\:0e\:00.0/max_vfs
diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index e4ebc71..4ef6e02 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -242,9 +242,9 @@ Using the NFP PMD is not different to using other PMDs. Usual steps are:
    useful for installing the UIO modules and for binding the right device to those
    modules avoiding doing so manually:
 
-   * **setup.sh**
-   * **dpdk_nic_bind.py**
+   * **dpdk-setup.sh**
+   * **dpdk-devbind.py**
 
-   Configuration may be performed by running setup.sh which invokes
-   dpdk_nic_bind.py as needed. Executing setup.sh will display a menu of
+   Configuration may be performed by running dpdk-setup.sh which invokes
+   dpdk-devbind.py as needed. Executing dpdk-setup.sh will display a menu of
    configuration options.
diff --git a/doc/guides/nics/qede.rst b/doc/guides/nics/qede.rst
index f7ca8eb..53d749c 100644
--- a/doc/guides/nics/qede.rst
+++ b/doc/guides/nics/qede.rst
@@ -177,7 +177,7 @@ devices managed by ``librte_pmd_qede`` in Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind igb_uio 0000:84:00.0 0000:84:00.1 \
+      ./tools/dpdk-devbind.py --bind igb_uio 0000:84:00.0 0000:84:00.1 \
                                               0000:84:00.2 0000:84:00.3
 
 #. Start ``testpmd`` with basic parameters:
diff --git a/doc/guides/nics/thunderx.rst b/doc/guides/nics/thunderx.rst
index e38f260..248b1af 100644
--- a/doc/guides/nics/thunderx.rst
+++ b/doc/guides/nics/thunderx.rst
@@ -146,7 +146,7 @@ managed by ``librte_pmd_thunderx_nicvf`` in the Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0002:01:00.2
+      ./tools/dpdk-devbind.py --bind vfio-pci 0002:01:00.2
 
 #. Start ``testpmd`` with basic parameters:
 
@@ -246,11 +246,11 @@ This section provides instructions to configure SR-IOV with Linux OS.
 
       Unless ``thunder-nicvf`` driver is in use make sure your kernel config includes ``CONFIG_THUNDER_NIC_VF`` setting.
 
-#. Verify PF/VF bind using ``dpdk_nic_bind.py``:
+#. Verify PF/VF bind using ``dpdk-devbind.py``:
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --status
+      ./tools/dpdk-devbind.py --status
 
    Example output:
 
@@ -268,18 +268,18 @@ This section provides instructions to configure SR-IOV with Linux OS.
 
       modprobe vfio-pci
 
-#. Bind VF devices to ``vfio-pci`` using ``dpdk_nic_bind.py``:
+#. Bind VF devices to ``vfio-pci`` using ``dpdk-devbind.py``:
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0002:01:00.1
-      ./tools/dpdk_nic_bind.py --bind vfio-pci 0002:01:00.2
+      ./tools/dpdk-devbind.py --bind vfio-pci 0002:01:00.1
+      ./tools/dpdk-devbind.py --bind vfio-pci 0002:01:00.2
 
-#. Verify VF bind using ``dpdk_nic_bind.py``:
+#. Verify VF bind using ``dpdk-devbind.py``:
 
    .. code-block:: console
 
-      ./tools/dpdk_nic_bind.py --status
+      ./tools/dpdk-devbind.py --status
 
    Example output:
 
diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
index c6335d4..5431015 100644
--- a/doc/guides/nics/virtio.rst
+++ b/doc/guides/nics/virtio.rst
@@ -172,7 +172,7 @@ Host2VM communication example
         modprobe uio
         echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
         modprobe uio_pci_generic
-        python tools/dpdk_nic_bind.py -b uio_pci_generic 00:03.0
+        python tools/dpdk-devbind.py -b uio_pci_generic 00:03.0
 
     We use testpmd as the forwarding application in this example.
 
diff --git a/doc/guides/prog_guide/dev_kit_build_system.rst b/doc/guides/prog_guide/dev_kit_build_system.rst
index 18a3010..fa2411f 100644
--- a/doc/guides/prog_guide/dev_kit_build_system.rst
+++ b/doc/guides/prog_guide/dev_kit_build_system.rst
@@ -309,11 +309,11 @@ Misc
 Internally Generated Build Tools
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-``app/pmdinfogen``
+``app/dpdk-pmdinfogen``
 
 
-``pmdinfogen`` scans an object (.o) file for various well known symbol names.  These
-well known symbol names are defined by various macros and used to export
+``dpdk-pmdinfogen`` scans an object (.o) file for various well known symbol names.
+These well known symbol names are defined by various macros and used to export
 important information about hardware support and usage for pmd files.  For
 instance the macro:
 
@@ -328,10 +328,10 @@ Creates the following symbol:
    static char this_pmd_name0[] __attribute__((used)) = "<name>";
 
 
-Which pmdinfogen scans for.  Using this information other relevant bits of data
-can be exported from the object file and used to produce a hardware support
-description, that pmdinfogen then encodes into a json formatted string in the
-following format:
+Which ``dpdk-pmdinfogen`` scans for.  Using this information other relevant
+bits of data can be exported from the object file and used to produce a
+hardware support description, that ``dpdk-pmdinfogen`` then encodes into a
+json formatted string in the following format:
 
 .. code-block:: c
 
diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d3a144f..b79e710 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -289,6 +289,9 @@ API Changes
 * The function ``rte_eth_dev_set_mtu`` adds a new return value ``-EBUSY``, which
   indicates the operation is forbidden because the port is running.
 
+* The script ``dpdk_nic_bind.py`` is renamed to ``dpdk-devbind.py``.
+  And the script ``setup.sh`` is renamed to ``dpdk-setup.sh``.
+
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/pdump.rst b/doc/guides/sample_app_ug/pdump.rst
index ceb038e..ac0e7c9 100644
--- a/doc/guides/sample_app_ug/pdump.rst
+++ b/doc/guides/sample_app_ug/pdump.rst
@@ -30,15 +30,15 @@
     OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 
-dpdk_pdump Application
+dpdk-pdump Application
 ======================
 
-The ``dpdk_pdump`` tool is a Data Plane Development Kit (DPDK) tool that runs as
+The ``dpdk-pdump`` tool is a Data Plane Development Kit (DPDK) tool that runs as
 a DPDK secondary process and is capable of enabling packet capture on dpdk ports.
 
    .. Note::
 
-      * The ``dpdk_pdump`` tool depends on libpcap based PMD which is disabled
+      * The ``dpdk-pdump`` tool depends on libpcap based PMD which is disabled
         by default in the build configuration files,
         owing to an external dependency on the libpcap development files
         which must be installed on the board.
@@ -53,7 +53,7 @@ The tool has a number of command line options:
 
 .. code-block:: console
 
-   ./build/app/dpdk_pdump --
+   ./build/app/dpdk-pdump --
                           --pdump '(port=<port id> | device_id=<pci id or vdev name>),
                                    (queue=<queue_id>),
                                    (rx-dev=<iface or pcap file> |
@@ -95,10 +95,10 @@ PCI address (or) name of the eth device on which packets should be captured.
 
    .. Note::
 
-      * As of now the ``dpdk_pdump`` tool cannot capture the packets of virtual devices
+      * As of now the ``dpdk-pdump`` tool cannot capture the packets of virtual devices
         in the primary process due to a bug in the ethdev library. Due to this bug, in a multi process context,
         when the primary and secondary have different ports set, then the secondary process
-        (here the ``dpdk_pdump`` tool) overwrites the ``rte_eth_devices[]`` entries of the primary process.
+        (here the ``dpdk-pdump`` tool) overwrites the ``rte_eth_devices[]`` entries of the primary process.
 
 ``queue``:
 Queue id of the eth device on which packets should be captured. The user can pass a queue value of ``*`` to enable
@@ -141,4 +141,4 @@ Example
 
 .. code-block:: console
 
-   $ sudo ./build/app/dpdk_pdump -- --pdump 'port=0,queue=*,rx-dev=/tmp/rx.pcap'
+   $ sudo ./build/app/dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=/tmp/rx.pcap'
diff --git a/doc/guides/sample_app_ug/proc_info.rst b/doc/guides/sample_app_ug/proc_info.rst
index 542950b..73f2195 100644
--- a/doc/guides/sample_app_ug/proc_info.rst
+++ b/doc/guides/sample_app_ug/proc_info.rst
@@ -30,10 +30,10 @@
     OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 
-dpdk_proc_info Application
-==========================
+dpdk-procinfo Application
+=========================
 
-The dpdk_proc_info application is a Data Plane Development Kit (DPDK) application
+The dpdk-procinfo application is a Data Plane Development Kit (DPDK) application
 that runs as a DPDK secondary process and is capable of retrieving port
 statistics, resetting port statistics and printing DPDK memory information.
 This application extends the original functionality that was supported by
@@ -45,7 +45,7 @@ The application has a number of command line options:
 
 .. code-block:: console
 
-   ./$(RTE_TARGET)/app/dpdk_proc_info -- -m | [-p PORTMASK] [--stats | --xstats |
+   ./$(RTE_TARGET)/app/dpdk-procinfo -- -m | [-p PORTMASK] [--stats | --xstats |
    --stats-reset | --xstats-reset]
 
 Parameters
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 30e410d..f87e0c2 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -999,7 +999,7 @@ For example, to move a pci device using ixgbe under DPDK management:
 .. code-block:: console
 
    # Check the status of the available devices.
-   ./tools/dpdk_nic_bind.py --status
+   ./tools/dpdk-devbind.py --status
 
    Network devices using DPDK-compatible driver
    ============================================
@@ -1011,11 +1011,11 @@ For example, to move a pci device using ixgbe under DPDK management:
 
 
    # Bind the device to igb_uio.
-   sudo ./tools/dpdk_nic_bind.py -b igb_uio 0000:0a:00.0
+   sudo ./tools/dpdk-devbind.py -b igb_uio 0000:0a:00.0
 
 
    # Recheck the status of the devices.
-   ./tools/dpdk_nic_bind.py --status
+   ./tools/dpdk-devbind.py --status
    Network devices using DPDK-compatible driver
    ============================================
    0000:0a:00.0 '82599ES 10-Gigabit' drv=igb_uio unused=
@@ -1118,9 +1118,9 @@ For example, to move a pci device under kernel management:
 
 .. code-block:: console
 
-   sudo ./tools/dpdk_nic_bind.py -b ixgbe 0000:0a:00.0
+   sudo ./tools/dpdk-devbind.py -b ixgbe 0000:0a:00.0
 
-   ./tools/dpdk_nic_bind.py --status
+   ./tools/dpdk-devbind.py --status
 
    Network devices using DPDK-compatible driver
    ============================================
diff --git a/doc/guides/xen/pkt_switch.rst b/doc/guides/xen/pkt_switch.rst
index 3a6fc47..00a8f0c 100644
--- a/doc/guides/xen/pkt_switch.rst
+++ b/doc/guides/xen/pkt_switch.rst
@@ -323,7 +323,7 @@ Building and Running the Switching Backend
     .. code-block:: console
 
         modprobe uio_pci_generic
-        python tools/dpdk_nic_bind.py -b uio_pci_generic 0000:09:00:00.0
+        python tools/dpdk-devbind.py -b uio_pci_generic 0000:09:00:00.0
 
     In this case, 0000:09:00.0 is the PCI address for the NIC controller.
 
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 0a594d7..481c732 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -116,9 +116,9 @@ TAILQ_HEAD_INITIALIZER(solib_list);
 static const char *default_solib_dir = RTE_EAL_PMD_PATH;
 
 /*
- * Stringified version of solib path used by pmdinfo.py
+ * Stringified version of solib path used by dpdk-pmdinfo.py
  * Note: PLEASE DO NOT ALTER THIS without making a corresponding
- * change to tools/pmdinfo.py
+ * change to tools/dpdk-pmdinfo.py
  */
 static const char dpdk_solib_path[] __attribute__((used)) =
 "DPDK_PLUGIN_PATH=" RTE_EAL_PMD_PATH;
diff --git a/mk/internal/rte.compile-pre.mk b/mk/internal/rte.compile-pre.mk
index 9c25ff6..f740179 100644
--- a/mk/internal/rte.compile-pre.mk
+++ b/mk/internal/rte.compile-pre.mk
@@ -84,7 +84,7 @@ C_TO_O = $(CC) -Wp,-MD,$(call obj2dep,$(@)).tmp $(CFLAGS) \
 C_TO_O_STR = $(subst ','\'',$(C_TO_O)) #'# fix syntax highlight
 C_TO_O_DISP = $(if $(V),"$(C_TO_O_STR)","  CC $(@)")
 endif
-PMDINFO_GEN = $(RTE_SDK_BIN)/app/pmdinfogen $@ $@.pmd.c
+PMDINFO_GEN = $(RTE_SDK_BIN)/app/dpdk-pmdinfogen $@ $@.pmd.c
 PMDINFO_CC = $(CC) $(CFLAGS) -c -o $@.pmd.o $@.pmd.c
 PMDINFO_LD = $(CROSS)ld $(LDFLAGS) -r -o $@.o $@.pmd.o $@
 PMDINFO_TO_O = if grep -q 'PMD_REGISTER_DRIVER(.*)' $<; then \
diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 7cd352c..5217063 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -117,18 +117,22 @@ install-runtime:
 	$(Q)cp -a    $O/lib/* $(DESTDIR)$(libdir)
 	$(Q)$(call rte_mkdir, $(DESTDIR)$(bindir))
 	$(Q)tar -cf -      -C $O --exclude 'app/*.map' \
-		--exclude app/pmdinfogen \
+		--exclude app/dpdk-pmdinfogen \
 		--exclude 'app/cmdline*' --exclude app/test \
 		--exclude app/testacl --exclude app/testpipeline app | \
 	    tar -xf -      -C $(DESTDIR)$(bindir) --strip-components=1 \
 		--keep-newer-files --warning=no-ignore-newer
 	$(Q)$(call rte_mkdir,      $(DESTDIR)$(datadir))
 	$(Q)cp -a $(RTE_SDK)/tools $(DESTDIR)$(datadir)
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-setup.sh, \
+	                           $(DESTDIR)$(datadir)/tools/setup.sh)
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-devbind.py, \
+	                           $(DESTDIR)$(datadir)/tools/dpdk_nic_bind.py)
 	$(Q)$(call rte_mkdir,      $(DESTDIR)$(sbindir))
-	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk_nic_bind.py, \
-	                           $(DESTDIR)$(sbindir)/dpdk_nic_bind)
-	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/pmdinfo.py, \
-	                           $(DESTDIR)$(bindir)/dpdk_pmdinfo)
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-devbind.py, \
+	                           $(DESTDIR)$(sbindir)/dpdk-devbind)
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-pmdinfo.py, \
+	                           $(DESTDIR)$(bindir)/dpdk-pmdinfo)
 
 install-kmod:
 ifneq ($(wildcard $O/kmod/*),)
@@ -146,7 +150,7 @@ install-sdk:
 	$(Q)cp -a               $(RTE_SDK)/scripts       $(DESTDIR)$(sdkdir)
 	$(Q)$(call rte_mkdir,                            $(DESTDIR)$(targetdir)/app)
 	$(Q)cp -a               $O/.config               $(DESTDIR)$(targetdir)
-	$(Q)cp -a               $O/app/pmdinfogen        $(DESTDIR)$(targetdir)/app
+	$(Q)cp -a               $O/app/dpdk-pmdinfogen   $(DESTDIR)$(targetdir)/app
 	$(Q)$(call rte_symlink, $(DESTDIR)$(includedir), $(DESTDIR)$(targetdir)/include)
 	$(Q)$(call rte_symlink, $(DESTDIR)$(libdir),     $(DESTDIR)$(targetdir)/lib)
 
diff --git a/mk/rte.sdktest.mk b/mk/rte.sdktest.mk
index ff57181..ddbbbf6 100644
--- a/mk/rte.sdktest.mk
+++ b/mk/rte.sdktest.mk
@@ -66,7 +66,7 @@ test fast_test perf_test:
 	fi
 
 # this is a special target to ease the pain of running coverage tests
-# this runs all the autotests, cmdline_test script and dpdk_proc_info
+# this runs all the autotests, cmdline_test script and dpdk-procinfo
 coverage:
 	@mkdir -p $(AUTOTEST_DIR) ; \
 	cd $(AUTOTEST_DIR) ; \
@@ -78,7 +78,7 @@ coverage:
 			$(RTE_OUTPUT)/app/test \
 			$(RTE_TARGET) \
 			$(BLACKLIST) $(WHITELIST) ; \
-		$(RTE_OUTPUT)/app/dpdk_proc_info --file-prefix=ring_perf -- -m; \
+		$(RTE_OUTPUT)/app/dpdk-procinfo --file-prefix=ring_perf -- -m; \
 	else \
 		echo "No test found, please do a 'make build' first, or specify O=" ;\
 	fi
diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk-devbind.py
similarity index 100%
rename from tools/dpdk_nic_bind.py
rename to tools/dpdk-devbind.py
diff --git a/tools/pmdinfo.py b/tools/dpdk-pmdinfo.py
similarity index 99%
rename from tools/pmdinfo.py
rename to tools/dpdk-pmdinfo.py
index 662034a..dcc8db8 100755
--- a/tools/pmdinfo.py
+++ b/tools/dpdk-pmdinfo.py
@@ -1,6 +1,5 @@
 #!/usr/bin/env python
 # -------------------------------------------------------------------------
-# scripts/pmdinfo.py
 #
 # Utility to dump PMD_INFO_STRING support from an object file
 #
@@ -569,8 +568,7 @@ def main(stream=None):
     optparser = OptionParser(
         usage='usage: %prog [-hrtp] [-d <pci id file] <elf-file>',
         description="Dump pmd hardware support info",
-        add_help_option=True,
-        prog='pmdinfo.py')
+        add_help_option=True)
     optparser.add_option('-r', '--raw',
                          action='store_true', dest='raw_output',
                          help='Dump raw json strings')
diff --git a/tools/setup.sh b/tools/dpdk-setup.sh
similarity index 95%
rename from tools/setup.sh
rename to tools/dpdk-setup.sh
index 6097ab7..ac81b2e 100755
--- a/tools/setup.sh
+++ b/tools/dpdk-setup.sh
@@ -32,7 +32,7 @@
 #   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 #
-# Run with "source /path/to/setup.sh"
+# Run with "source /path/to/dpdk-setup.sh"
 #
 
 #
@@ -422,13 +422,13 @@ grep_meminfo()
 }
 
 #
-# Calls dpdk_nic_bind.py --status to show the NIC and what they
+# Calls dpdk-devbind.py --status to show the NIC and what they
 # are all bound to, in terms of drivers.
 #
 show_nics()
 {
 	if [ -d /sys/module/vfio_pci -o -d /sys/module/igb_uio ]; then
-		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		${RTE_SDK}/tools/dpdk-devbind.py --status
 	else
 		echo "# Please load the 'igb_uio' or 'vfio-pci' kernel module before "
 		echo "# querying or adjusting NIC device bindings"
@@ -436,16 +436,16 @@ show_nics()
 }
 
 #
-# Uses dpdk_nic_bind.py to move devices to work with vfio-pci
+# Uses dpdk-devbind.py to move devices to work with vfio-pci
 #
 bind_nics_to_vfio()
 {
 	if [ -d /sys/module/vfio_pci ]; then
-		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		${RTE_SDK}/tools/dpdk-devbind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to VFIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH &&
+		sudo ${RTE_SDK}/tools/dpdk-devbind.py -b vfio-pci $PCI_PATH &&
 			echo "OK"
 	else
 		echo "# Please load the 'vfio-pci' kernel module before querying or "
@@ -454,16 +454,16 @@ bind_nics_to_vfio()
 }
 
 #
-# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+# Uses dpdk-devbind.py to move devices to work with igb_uio
 #
 bind_nics_to_igb_uio()
 {
 	if [ -d /sys/module/igb_uio ]; then
-		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		${RTE_SDK}/tools/dpdk-devbind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/tools/dpdk-devbind.py -b igb_uio $PCI_PATH && echo "OK"
 	else
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -471,18 +471,18 @@ bind_nics_to_igb_uio()
 }
 
 #
-# Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
+# Uses dpdk-devbind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
 {
-	${RTE_SDK}/tools/dpdk_nic_bind.py --status
+	${RTE_SDK}/tools/dpdk-devbind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to unbind: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/tools/dpdk-devbind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
-- 
2.7.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-20 13:37  4%           ` Olivier Matz
@ 2016-07-20 14:01  0%             ` Richardson, Bruce
  2016-07-20 17:20  0%             ` Zoltan Kiss
  1 sibling, 0 replies; 200+ results
From: Richardson, Bruce @ 2016-07-20 14:01 UTC (permalink / raw)
  To: Olivier Matz, Zoltan Kiss, Zoltan Kiss, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Wednesday, July 20, 2016 2:37 PM
> To: Zoltan Kiss <zoltan.kiss@linaro.org>; Zoltan Kiss
> <zoltan.kiss@schaman.hu>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mempool: adjust name string size in
> related data types
> 
> Hi,
> 
> On 07/20/2016 02:41 PM, Zoltan Kiss wrote:
> >
> >
> > On 19/07/16 17:17, Olivier Matz wrote:
> >> Hi Zoltan,
> >>
> >> On 07/19/2016 05:59 PM, Zoltan Kiss wrote:
> >>>
> >>>
> >>> On 19/07/16 16:37, Olivier Matz wrote:
> >>>> Hi Zoltan,
> >>>>
> >>>> On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
> >>>>> A recent fix brought up an issue about the size of the 'name'
> fields:
> >>>>>
> >>>>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
> >>>>>
> >>>>> These relations should be observed:
> >>>>>
> >>>>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
> >>>>> strlen(RTE_RING_MZ_PREFIX) RTE_MEMPOOL_NAMESIZE <=
> >>>>> RTE_RING_NAMESIZE -
> >>>>> strlen(RTE_MEMPOOL_MZ_PREFIX)
> >>>>>
> >>>>> Setting all of them to 32 hides this restriction from the
> application.
> >>>>> This patch increases the memzone string size to accomodate for
> >>>>> these prefixes, and the same happens with the ring name string.
> >>>>> The ABI needs to be broken to fix this API issue, this way doesn't
> >>>>> break applications previously not failing due to the truncating
> >>>>> bug now fixed.
> >>>>>
> >>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
> >>>>
> >>>> I agree it is a problem for an application because it cannot know
> >>>> what is the maximum name length. On the other hand, breaking the
> >>>> ABI for this looks a bit overkill. Maybe we could reduce
> >>>> RTE_MEMPOOL_NAMESIZE and RTE_RING_NAMESIZE instead of increasing
> >>>> RTE_MEMZONE_NAMESIZE? That way, we could keep the ABI as is.
> >>>
> >>> But that would break the ABI too, wouldn't it? Unless you keep the
> >>> array the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.
> >>
> >> Yes, that was the idea.
> >>
> >>> And even then, the API breaks anyway. There are applications - I
> >>> have at least some - which use all 32 bytes to store the name.
> >>> Decrease that would cause headache to change the naming scheme,
> >>> because it's a 30 character long id, and chopping the last few chars
> >>> would cause name collisions and annoying bugs.
> >>
> >> Before my patch (85cf0079), long names were silently truncated when
> >> mempool created its ring and/or memzones. Now, it returns an error.
> >
> > With 16.04 an application could operate as expected if the first 26
> > characters were unique. Your patch revealed the problem that characters
> > after these were left out of the name. Now applications fail where
> > this has never been a bug, because their naming scheme guarantees
> > uniqueness on the first 26 chars (or makes collisions very unlikely).
> > Where the first 26 are not unique, it failed earlier too, because
> > memzone creation checks for duplicate names.
> 
> Yes, I understand that there is a behavior change for applications using
> names larger than 26 between 16.04 and 16.07. I also understand that there
> is no way for an application to know what is the maximum usable size for a
> mempool or a ring.
> 
> 
> >> I'm not getting why changing the struct to something like below would
> >> break the API, since it would already return an error today.
> >>
> >>    #define RTE_MEMPOOL_NAMESIZE \
> >
> > Wait, this would mean applications need to recompile to use the
> > smaller value. AFAIK that's an ABI break too, right? At the moment I
> > don't see a way to fix this without breaking the ABI
> 
> With this modification, if you don't recompile the application, its
> behavior will still be the same as today -> it will return ENAMETOOLONG.
> If you recompile it, the application will be aware of the maximum length.
> To me, it seems to be an acceptable compromise for this release.
> 
> The patch you're proposing also changes the ABI of librte_ring and
> librte_eal, which cannot be accepted for the release.
> 
> 
> >
> >>        (RTE_MEMZONE_NAMESIZE - sizeof(pool_prefix) - sizeof(ring
> prefix))
> >>    struct rte_mempool {
> >>        union {
> >>              char name[RTE_MEMPOOL_NAMESIZE];
> >>              char pad[32];
> >>        };
> >>        ...
> >>    }
> >>
> >> Anyway, it may not be the proper solution since it supposes that a
> >> mempool includes a ring based on a memzone, which is not always true
> >> now with mempool handlers.
> >
> > Oh, as we dug deeper it gets better!
> > Indeed, we don't necessarily have this ring + memzone pair underneath,
> > but the user is not aware of that, and I think we should keep it that
> > way. It should only care that the string passed shouldn't be bigger
> > than a certain amount.
> 
> Yes. What I'm just saying here is that it's not a good solution to write
> in the #define that "a mempool is based on a ring + a memzone", because if
> someone adds a new mempool handler replacing the ring, and also
> creating a memzone prefixed by something larger than "rg_", we will have
> to break the ABI again.
> 
> 
> > Also, even though we don't necessarily have the ring, we still reserve
> > memzones in rte_mempool_populate_default(). And their name has a
> > 3-letter prefix and a "_%d" postfix, where the %d could be as much as
> > RTE_MAX_MEMZONE in the worst case (2560 by default). So actually:
> >
> > RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
> > strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen("_2560")
> >
> >
> > As a side note, there is another bug around here: rte_ring_create()
> > doesn't check for name duplications. However, the user of the library can
> > look up a ring by name with rte_ring_lookup(), and it will return the
> > first ring with that name.
> 
> The name uniqueness is checked by rte_memzone_reserve().
> 
> 
> >>>> It would even be better to get rid of this static char[] for the
> >>>> structure names and replace it by an allocated const char *. I
> >>>> didn't check it's feasible for memzones. What do you think?
> >>>
> >>> It would work too, but I don't think it would help a lot. We would
> >>> still need max sizes for the names. Storing them somewhere else
> >>> won't help us in this problem.
> >>
> >> Why should we have a maximum length for the names?
> >
> > What happens if an application loads DPDK and creates a mempool with a
> > name string 2 million characters long? Maybe nothing we should worry
> > about, but in general I think unlimited-length function parameters are
> > problematic at the very least. At minimum the length should be passed
> > explicitly (which also creates a maximum due to the size of the
> > parameter). But I think it would be just easier to have these maximums
> > set, observing the above constraints.
> 
> I think having a maximum name length brings more problems than not having
> it, especially ABI problems.
> 

I disagree. I think we should have reasonable maximum name lengths, and allow functions to return an error in case a name is too long. However, what I think we also need to do is guarantee a minimum maximum name length, to allow apps to ensure they never hit that name-too-long error. We can guarantee for ring/mempool etc. that the maximum allowed name will always be at least 20 characters, for example. That gives plenty of scope for adjusting the maximum as we need to, while giving applications reasonable guarantees too.
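
For illustration, how an application could rely on such a floor (a minimal
sketch; the 20-character minimum and the pool parameters are assumptions
for the example, not an existing API guarantee):

    #include <stdio.h>
    #include <rte_mempool.h>

    #define APP_NAME_MAX 20 /* assumed guaranteed minimum name length */

    static struct rte_mempool *
    create_pool(unsigned int port, unsigned int queue)
    {
            char name[APP_NAME_MAX + 1];

            /* snprintf() truncates to the assumed floor, so the create
             * call can never fail with ENAMETOOLONG. */
            snprintf(name, sizeof(name), "mp_%u_%u", port, queue);
            return rte_mempool_create(name, 8192, 2048, 256, 0,
                                      NULL, NULL, NULL, NULL,
                                      SOCKET_ID_ANY, 0);
    }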

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-20 12:41  4%         ` Zoltan Kiss
@ 2016-07-20 13:37  4%           ` Olivier Matz
  2016-07-20 14:01  0%             ` Richardson, Bruce
  2016-07-20 17:20  0%             ` Zoltan Kiss
  0 siblings, 2 replies; 200+ results
From: Olivier Matz @ 2016-07-20 13:37 UTC (permalink / raw)
  To: Zoltan Kiss, Zoltan Kiss, dev

Hi,

On 07/20/2016 02:41 PM, Zoltan Kiss wrote:
> 
> 
> On 19/07/16 17:17, Olivier Matz wrote:
>> Hi Zoltan,
>>
>> On 07/19/2016 05:59 PM, Zoltan Kiss wrote:
>>>
>>>
>>> On 19/07/16 16:37, Olivier Matz wrote:
>>>> Hi Zoltan,
>>>>
>>>> On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
>>>>> A recent fix brought up an issue about the size of the 'name' fields:
>>>>>
>>>>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>>>>
>>>>> These relations should be observed:
>>>>>
>>>>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>>>>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE -
>>>>> strlen(RTE_MEMPOOL_MZ_PREFIX)
>>>>>
>>>>> Setting all of them to 32 hides this restriction from the application.
>>>>> This patch increases the memzone string size to accommodate these
>>>>> prefixes, and the same happens with the ring name string. The ABI
>>>>> needs to
>>>>> be broken to fix this API issue, this way doesn't break applications
>>>>> previously not failing due to the truncating bug now fixed.
>>>>>
>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>>>>
>>>> I agree it is a problem for an application because it cannot know what
>>>> is the maximum name length. On the other hand, breaking the ABI for
>>>> this
>>>> looks a bit overkill. Maybe we could reduce RTE_MEMPOOL_NAMESIZE and
>>>> RTE_RING_NAMESIZE instead of increasing RTE_MEMZONE_NAMESIZE? That way,
>>>> we could keep the ABI as is.
>>>
>>> But that would break the ABI too, wouldn't it? Unless you keep the array
>>> the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.
>>
>> Yes, that was the idea.
>>
>>> And even then, the API breaks anyway. There are applications - I have at
>>> least some - which use all 32 bytes to store the name. Decrease that
>>> would cause headache to change the naming scheme, because it's a 30
>>> character long id, and chopping the last few chars would cause name
>>> collisions and annoying bugs.
>>
>> Before my patch (85cf0079), long names were silently truncated when
>> mempool created its ring and/or memzones. Now, it returns an error.
> 
> With 16.04 an application could operate as expected if the first 26
> characters were unique. Your patch revealed the problem that characters
> after these were left out of the name. Now applications fail where this
> has never been a bug, because their naming scheme guarantees uniqueness
> on the first 26 chars (or makes collisions very unlikely).
> Where the first 26 are not unique, it failed earlier too, because
> memzone creation checks for duplicate names.

Yes, I understand that there is a behavior change for applications using
names larger than 26 between 16.04 and 16.07. I also understand that
there is no way for an application to know what is the maximum usable
size for a mempool or a ring.


>> I'm not getting why changing the struct to something like below would
>> break the API, since it would already return an error today.
>>
>>    #define RTE_MEMPOOL_NAMESIZE \
> 
> Wait, this would mean applications need to recompile to use the smaller
> value. AFAIK that's an ABI break too, right? At the moment I don't see a
> way to fix this without breaking the ABI

With this modification, if you don't recompile the application, its
behavior will still be the same as today -> it will return ENAMETOOLONG.
If you recompile it, the application will be aware of the maximum
length. To me, it seems to be an acceptable compromise for this release.
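
Spelled out as a sketch (the prefix arithmetic is illustrative, and as
discussed below it still assumes the ring + memzone layering):

    /* Keeps the 16.04 structure layout: old binaries see the same size,
     * while newly compiled code sees the smaller, safe maximum. */
    #define RTE_MEMPOOL_NAMESIZE \
            (RTE_MEMZONE_NAMESIZE - (sizeof(RTE_MEMPOOL_MZ_PREFIX) - 1) \
                                  - (sizeof(RTE_RING_MZ_PREFIX) - 1))

    struct rte_mempool {
            union {
                    char name[RTE_MEMPOOL_NAMESIZE];
                    char pad[32]; /* former field size */
            };
            /* ... rest of the structure unchanged ... */
    };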

The patch you're proposing also changes the ABI of librte_ring and
librte_eal, which cannot be accepted for the release.


> 
>>        (RTE_MEMZONE_NAMESIZE - sizeof(pool_prefix) - sizeof(ring prefix))
>>    struct rte_mempool {
>>        union {
>>              char name[RTE_MEMPOOL_NAMESIZE];
>>              char pad[32];
>>        };
>>        ...
>>    }
>>
>> Anyway, it may not be the proper solution since it supposes that a
>> mempool includes a ring based on a memzone, which is not always true now
>> with mempool handlers.
> 
> Oh, as we dug deeper it gets better!
> Indeed, we don't necessarily have this ring + memzone pair underneath,
> but the user is not aware of that, and I think we should keep it that
> way. It should only care that the string passed shouldn't be bigger than
> a certain amount.

Yes. What I'm just saying here is that it's not a good solution to write
in the #define that "a mempool is based on a ring + a memzone", because
if someone adds a new mempool handler replacing the ring, and also
creating a memzone prefixed by something larger than "rg_", we will have
to break the ABI again.


> Also, even though we don't necessarily have the ring, we still reserve
> memzones in rte_mempool_populate_default(). And their name has a 3-letter
> prefix and a "_%d" postfix, where the %d could be as much as
> RTE_MAX_MEMZONE in the worst case (2560 by default). So actually:
> 
> RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE -
> strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen("_2560")
> 
> 
> As a side note, there is another bug around here: rte_ring_create()
> doesn't check for name duplications. However, the user of the library can
> look up a ring by name with rte_ring_lookup(), and it will return the
> first ring with that name.

The name uniqueness is checked by rte_memzone_reserve().


>>>> It would even be better to get rid of this static char[] for the
>>>> structure names and replace it by an allocated const char *. I didn't
>>>> check it's feasible for memzones. What do you think?
>>>
>>> It would work too, but I don't think it would help a lot. We would still
>>> need max sizes for the names. Storing them somewhere else won't help us
>>> in this problem.
>>
>> Why should we have a maximum length for the names?
> 
> What happens if an application loads DPDK and creates a mempool with a
> name string 2 million characters long? Maybe nothing we should worry
> about, but in general I think unlimited-length function parameters are
> problematic at the very least. At minimum the length should be passed
> explicitly (which also creates a maximum due to the size of the
> parameter). But I think it would be just easier to have these maximums
> set, observing the above constraints.

I think having a maximum name length brings more problems than not having
it, especially ABI problems.


Regards,
Olivier

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-19 16:17  0%       ` Olivier Matz
@ 2016-07-20 12:41  4%         ` Zoltan Kiss
  2016-07-20 13:37  4%           ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Zoltan Kiss @ 2016-07-20 12:41 UTC (permalink / raw)
  To: Olivier Matz, Zoltan Kiss, dev



On 19/07/16 17:17, Olivier Matz wrote:
> Hi Zoltan,
>
> On 07/19/2016 05:59 PM, Zoltan Kiss wrote:
>>
>>
>> On 19/07/16 16:37, Olivier Matz wrote:
>>> Hi Zoltan,
>>>
>>> On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
>>>> A recent fix brought up an issue about the size of the 'name' fields:
>>>>
>>>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>>>
>>>> These relations should be observed:
>>>>
>>>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>>>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE -
>>>> strlen(RTE_MEMPOOL_MZ_PREFIX)
>>>>
>>>> Setting all of them to 32 hides this restriction from the application.
>>>> This patch increases the memzone string size to accommodate these
>>>> prefixes, and the same happens with the ring name string. The ABI
>>>> needs to
>>>> be broken to fix this API issue, this way doesn't break applications
>>>> previously not failing due to the truncating bug now fixed.
>>>>
>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>>>
>>> I agree it is a problem for an application because it cannot know what
>>> is the maximum name length. On the other hand, breaking the ABI for this
>>> looks a bit overkill. Maybe we could reduce RTE_MEMPOOL_NAMESIZE and
>>> RTE_RING_NAMESIZE instead of increasing RTE_MEMZONE_NAMESIZE? That way,
>>> we could keep the ABI as is.
>>
>> But that would break the ABI too, wouldn't it? Unless you keep the array
>> the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.
>
> Yes, that was the idea.
>
>> And even then, the API breaks anyway. There are applications - I have at
>> least some - which use all 32 bytes to store the name. Decrease that
>> would cause headache to change the naming scheme, because it's a 30
>> character long id, and chopping the last few chars would cause name
>> collisions and annoying bugs.
>
> Before my patch (85cf0079), long names were silently truncated when
> mempool created its ring and/or memzones. Now, it returns an error.

With 16.04 an application could operate as expected if the first 26
characters were unique. Your patch revealed the problem that characters
after these were left out of the name. Now applications fail where this
has never been a bug, because their naming scheme guarantees uniqueness
on the first 26 chars (or makes collisions very unlikely).
Where the first 26 are not unique, it failed earlier too, because
memzone creation checks for duplicate names.

>
> I'm not getting why changing the struct to something like below would
> break the API, since it would already return an error today.
>
>    #define RTE_MEMPOOL_NAMESIZE \

Wait, this would mean applications need to recompile to use the smaller 
value. AFAIK that's an ABI break too, right? At the moment I don't see a 
way to fix this without breaking the ABI.

>        (RTE_MEMZONE_NAMESIZE - sizeof(pool_prefix) - sizeof(ring prefix))
>    struct rte_mempool {
>        union {
>              char name[RTE_MEMPOOL_NAMESIZE];
>              char pad[32];
>        };
>        ...
>    }
>
> Anyway, it may not be the proper solution since it supposes that a
> mempool includes a ring based on a memzone, which is not always true now
> with mempool handlers.

Oh, as we dug deeper it gets better!
Indeed, we don't necessarily have this ring + memzone pair underneath, 
but the user is not aware of that, and I think we should keep it that 
way. The user should only care that the string passed isn't longer than
a certain amount.
Also, even though we don't necessarily have the ring, we still reserve 
memzones in rte_mempool_populate_default(). And their name has a 3-letter
prefix and a "_%d" postfix, where the %d could be as much as
RTE_MAX_MEMZONE in the worst case (2560 by default). So actually:

RTE_MEMPOOL_NAMESIZE <= RTE_MEMZONE_NAMESIZE - 
strlen(RTE_MEMPOOL_MZ_PREFIX) - strlen("_2560")
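
To make the arithmetic concrete, a small standalone sketch (the constants
are copied from the 16.07-era headers and the sizes include the trailing
'\0', so treat the exact numbers as illustrative):

    #include <stdio.h>
    #include <string.h>

    #define RTE_MEMZONE_NAMESIZE 32
    #define RTE_RING_MZ_PREFIX "RG_"
    #define RTE_MEMPOOL_MZ_PREFIX "MP_"

    int main(void)
    {
            /* Each nesting level loses strlen(prefix) characters. */
            size_t ring_max = RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX);
            size_t pool_max = ring_max - strlen(RTE_MEMPOOL_MZ_PREFIX);
            /* The "_%d" memzone postfix costs up to strlen("_2560") more. */
            size_t pool_max_mz = pool_max - strlen("_2560");

            printf("ring: %zu, mempool: %zu, with postfix: %zu\n",
                   ring_max, pool_max, pool_max_mz);
            return 0;
    }

This prints 29, 26 and 21, which lines up with the ~26-character limit
mentioned above.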


As a side note, there is another bug around here: rte_ring_create() 
doesn't check for name duplications. However, the user of the library can
look up a ring by name with rte_ring_lookup(), and it will return the
first ring with that name.

>
>>> It would even be better to get rid of this static char[] for the
>>> structure names and replace it by an allocated const char *. I didn't
>>> check it's feasible for memzones. What do you think?
>>
>> It would work too, but I don't think it would help a lot. We would still
>> need max sizes for the names. Storing them somewhere else won't help us
>> in this problem.
>
> Why should we have a maximum length for the names?

What happens if an application loads DPDK and creates a mempool with a
name string 2 million characters long? Maybe nothing we should worry
about, but in general I think unlimited-length function parameters are
problematic at the very least. At minimum the length should be passed
explicitly (which also creates a maximum due to the size of the
parameter). But I think it would be just easier to have these maximums
set, observing the above constraints.

>
>
> Thanks,
> Olivier
>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-20  2:16  0%         ` Lu, Wenzhuo
@ 2016-07-20 10:41  2%           ` Adrien Mazarguil
  2016-07-21  3:18  0%             ` Lu, Wenzhuo
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-07-20 10:41 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

Hi Wenzhuo,

On Wed, Jul 20, 2016 at 02:16:51AM +0000, Lu, Wenzhuo wrote:
[...]
> > So, today an application cannot combine N-tuple and FDIR flow rules and get a
> > reliable outcome, unless it is designed for specific devices with a known
> > behavior.
> > 
> > > What's the right behavior of PMD if APP want to create a flow director rule
> > which has a higher or even equal priority than an existing n-tuple rule? Should
> > PMD return fail?
> > 
> > First remember applications only deal with the generic API, PMDs are
> > responsible for choosing the most appropriate HW implementation to use
> > according to the requested flow rules (FDIR, N-tuple or anything else).
> > 
> > For the specific case of FDIR vs N-tuple, if the underlying HW supports both I do
> > not see why the PMD would create a N-tuple rule. Doesn't FDIR support
> > everything N-tuple can do and much more?
> Talking about the filters, fdir can cover n-tuple. I think that's why i40e only supports fdir but not n-tuple. But n-tuple has its own highlight. As we know, at least on Intel NICs, fdir only supports a per-device mask, but n-tuple can support a per-rule mask.
> As every pattern has both a spec and a mask, we cannot guarantee the masks are the same. I think ixgbe will try to use n-tuple first if it can, because even if the masks are different, we can support them all.

OK, makes sense. In that case existing rules may indeed prevent subsequent
ones from getting created if their priority is wrong. I do not think there
is a way around that if the application needs this exact ordering.

> > Assuming such a thing happened anyway, that the PMD had to create a rule
> > using a high priority filter type and that the application requests the creation of a
> > rule that can only be done using a lower priority filter type, but also requested a
> > higher priority for that rule, then yes, it should obviously fail.
> > 
> > That is, unless the PMD can perform some kind of workaround to have both.
> > 
> > > If so, do we need more fail reasons? According to this RFC, I think we need
> > return " EEXIST: collision with an existing rule. ", but it's not very clear, APP
> > doesn't know the problem is priority, maybe more detailed reason is helpful.
> > 
> > Possibly, I've defined a basic set of errors, there are quite a number of errno
> > values to choose from. However I think we should not define too many values.
> > In my opinion the basic set covers every possible failure:
> > 
> > - EINVAL: invalid format, rule is broken or cannot be understood by the PMD
> >   anyhow.
> > 
> > - ENOTSUP: pattern/actions look fine but something in the requested rule is
> >   not supported and thus cannot be applied.
> > 
> > - EEXIST: pattern/actions are fine and could have been applied if only some
> >   other rule did not prevent the PMD to do it (I see it as the closest thing
> >   to "ETOOBAD" which unfortunately does not exist).
> > 
> > - ENOMEM: like EEXIST, except it is due to the lack of resources not because
> >   of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
> >   settled on ENOMEM as it is well known. Still open to debate.
> > 
> > Errno values are only useful to get a rough idea of the reason, and another
> > mechanism is needed to pinpoint the exact problem for debugging/reporting
> > purposes, something like:
> > 
> >  enum rte_flow_error_type {
> >      RTE_FLOW_ERROR_TYPE_NONE,
> >      RTE_FLOW_ERROR_TYPE_UNKNOWN,
> >      RTE_FLOW_ERROR_TYPE_PRIORITY,
> >      RTE_FLOW_ERROR_TYPE_PATTERN,
> >      RTE_FLOW_ERROR_TYPE_ACTION,
> >  };
> > 
> >  struct rte_flow_error {
> >      enum rte_flow_error_type type;
> >      void *offset; /* Points to the exact pattern item or action. */
> >      const char *message;
> >  };
> When we are using a CLI and it fails, normally it will let us know which parameter is not appropriate. So, I think it’s a good idea to have this error structure :)

Agreed.

> > Then either provide an optional struct rte_flow_error pointer to
> > rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
> > processing this may be quite expensive and applications may not care about the
> > exact reason.
> Agree the processing may be too expensive. Maybe we can say returning error details is optional. And that's a good question, what the APP should do if creating the rule fails. I believe normally it will choose to handle the rule by itself. But I think it's not bad to feed back more. Even if the APP wants to adjust the rules, it cannot do that for lack of info.

All right then, I'll add it to the specification.

 int
 rte_flow_validate(uint8_t port_id,
                   const struct rte_flow_pattern *pattern,
                   const struct rte_flow_actions *actions,
                   struct rte_flow_error *error);

With error possibly NULL if the application does not care. Is it fine for
you?
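
For illustration, the way an application might consume it -- a sketch 
against the RFC-stage names above, nothing final:

 struct rte_flow_error err = { .type = RTE_FLOW_ERROR_TYPE_NONE };

 if (rte_flow_validate(port_id, &pattern, &actions, &err) < 0) {
     printf("rule rejected (error type %d): %s\n",
            err.type, err.message ? err.message : "no details");
     /* err.offset points at the offending pattern item or action */
 }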

[...]
> > > > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > > > >   configuration when stopping and restarting a port or performing other
> > > > > >   actions which may affect them. They can only be destroyed explicitly.
> > > > > Don’t understand " They can only be destroyed explicitly."
> > > >
> > > > This part says that as long as an application has not called
> > > > rte_flow_destroy() on a flow rule, it never disappears, whatever
> > > > happens to the port (stopped, restarted). The application is not
> > > > responsible for re-creating rules after that.
> > > >
> > > > Note that according to the specification, this may translate to not
> > > > being able to stop a port as long as a flow rule is present,
> > > > depending on how nice the PMD intends to be with applications.
> > > > Implementation can be done in small steps with minimal amount of code on
> > the PMD side.
> > > Does it mean PMD should store and maintain all the rules? Why not let rte do
> > that? I think if PMD maintain all the rules, it means every kind of NIC should have
> > a copy of code for the rules. But if rte do that, only one copy of code need to be
> > maintained, right?
> > 
> > I've considered having rules stored in a common format understood at the RTE
> > level and not specific to each PMD and decided that the opaque rte_flow pointer
> > was a better choice for the following reasons:
> > 
> > - Even though flow rules management is done in the control path, processing
> >   must be as fast as possible. Letting PMDs store flow rules using their own
> >   internal representation gives them the chance to achieve better
> >   performance.
> Not quite understand. I think we're talking about maintaining the rules in SW. I don't think there's anything that needs to be optimized according to specific NICs. If we need to optimize the code, I think we need to consider the CPU, OS ... and some common means. Am I wrong?

Perhaps we were talking about different things, here I was only explaining
why rte_flow (the result of creating a flow rule) should be opaque and fully
managed by the PMD. More on the SW side of things below.

> > - An opaque context managed by PMDs would probably have to be stored
> >   somewhere as well anyway.
> > 
> > - PMDs may not need to allocate/store anything at all if they exclusively
> >   rely on HW state for everything. In my opinion, the generic API has enough
> >   constraints for this to work and maintain consistency between flow
> >   rules. Note this is currently how most PMDs implement FDIR and other
> >   filter types.
> Yes, the rules are stored by HW. But considering stopping/starting the device, the rules in HW will be lost. We have to store the rules in SW and re-program them when restarting the device.

Assume a HW capable of keeping flow rules programmed even during a
stop/start cycle (e.g. mlx4/mlx5 may be able to do it from DPDK point of
view), don't you think it is more efficient to standardize on this behavior
and let PMDs restore flow rules for HW that do not support it regardless of
whether it would be done by RTE or the application (SW)?

> And in existing code, we store the filters in SW, at least on Intel NICs. But I think we cannot reuse them, because considering the priority and which category of filter should be chosen, I think we need a whole new table for the generic API. I think that's what's designed now, right?

So I understand you'd want RTE to help your PMD keep track of the flow rules
it created?

Nothing wrong with that, all I'm saying is that it should be entirely
optional. RTE should not automatically maintain a list. PMDs have to call
RTE helpers if they need help to maintain a context. These helpers are not
defined in this API yet because it is difficult to know what will be useful
in advance.

> > - RTE can (and will) provide helpers to avoid most of the code redundancy,
> >   PMDs are free to use them or manage everything by themselves.
> > 
> > - Given that the opaque rte_flow pointer associated with a flow rule is to
> >   be stored by the application, PMDs do not even have to keep references to
> >   them.
> Don’t understand. More details?

In an application:

 rte_flow *foo = rte_flow_create(...);

In the above example, foo cannot be dereferenced by the application nor RTE,
only the PMD is aware of its contents. This object can only be used with
rte_flow*() functions.

PMDs are thus free to make this object grow as needed when adding internal
features without breaking any kind of public API/ABI.

What I meant is, given that the application is supposed to store foo
somewhere in order to destroy it later, the PMD does not have to keep track
of that pointer assuming it does not need to access it later on its own for
some reason.
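
Continuing the sketch, destruction is the only point where the 
application hands foo back (assuming an RFC-stage 
rte_flow_destroy(port_id, flow) signature, which is not final):

 /* foo is the application's only reference; after this it is gone */
 rte_flow_destroy(port_id, foo);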

> > - The flow rules format described in this specification (pattern / actions)
> >   will be used by applications directly, and will be free to arrange them in
> >   lists, trees or in any other way if they need to keep flow specifications
> >   around for further processing.
> Who will create the lists, trees or something else? According to previous discussion, I think the APP will program the rules one by one. So if the APP organizes the rules into lists, trees..., the PMD doesn't know that.
> And you said "Given that the opaque rte_flow pointer associated with a flow rule is to be stored by the application". I'm lost here.

I guess that's because we're discussing two different things, flow rule
specifications and flow rule objects. Let me sum it up:

- Flow rule specifications are the patterns/actions combinations provided by
  applications to rte_flow_create(). Applications can store those as needed
  and organize them as they wish (hash, tree, list). Neither PMDs nor RTE
  will do it for them.

- Flow rule objects (struct rte_flow *) are generated when a flow rule is
  created. Applications must keep these around if they want to manipulate
  them later (i.e. destroy or query existing rules).

Then PMDs *may* need to keep and arrange flow rule objects internally for
management purposes. Could be because HW requires it, for detecting
conflicting rules, managing priorities and so on. Possible reasons are not
described in this API because these are thought of as PMD-specific needs.
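
A sketch of that split from the application side (the container struct 
is purely illustrative, only the RFC-stage type names are from this 
thread):

 struct app_rule {
     struct rte_flow_pattern pattern; /* spec: built and owned by the app */
     struct rte_flow_actions actions; /* spec: built and owned by the app */
     struct rte_flow *flow;           /* object: opaque, returned by create */
 };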

> > > When the port is stopped and restarted, rte can reconfigure the rules. Is the
> > concern that PMD may adjust the sequence of the rules according to the priority,
> > so every NIC has a different list of rules? But PMD can adjust them again when
> > rte reconfiguring the rules.
> > 
> > What about PMDs able to stop and restart ports without destroying their own
> > flow rules? If we assume flow rules must be destroyed when stopping a port,
> > these PMDs are needlessly penalized with slower stop/start cycles. Think about
> > it assuming thousands of flow rules.
> I believe the rules maintained by SW should not be destroyed, because they are used to re-program the HW when the device starts again.

Do we agree that applications should not care? Flow rules configured before
stopping a port must still be there after restarting it.

What we seem to not agree about is that you think RTE should be responsible
for restoring flow rules of devices that lose them when stopped. I think
doing so is unfair to devices for which it is not the case and not really
nice to applications, so my opinion is that the PMD is responsible for
restoring flow rules however it wants. It is free to use RTE helpers to keep
their track, as long as it's all managed internally.

> > Thus from an application point of view, whatever happens when stopping and
> > restarting a port should not matter. If a flow rule was present before, it must
> > still be present afterwards. If the PMD had to destroy flow rules and re-create
> > them, it does not actually matter if they differ slightly at the HW level, as long as:
> > 
> > - Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
> >   and refer to the same rules.
> > 
> > - The overall behavior of all rules is the same.
> > 
> > The list of rules you think of (patterns / actions) is maintained by applications
> > (not RTE), and only if they need them. RTE would needlessly duplicate this.
> As said before, need more details to understand this. Maybe an example is better :)

The generic format both RTE and applications might understand is the one
described in this API (struct rte_flow_pattern and struct
rte_flow_actions).

If we wanted RTE to maintain some sort of per-port state for flow rule
specifications, it would have to be a copy of these structures arranged
somehow (list or something else).

If we consider that PMDs need to keep a context object associated to a flow
rule (the opaque struct rte_flow *), then RTE would most likely have to
store it along with the flow specification.

Such a list may not be useful to applications (list lookups take time), so
they would implement their own redundant method. They might also require
extra room to attach some application context to flow rules. A generic list
cannot plan for it.

Applications know what they want to do with flow rules and are responsible
for managing them efficiently with RTE out of the way.

I'm not sure if this answered your question, if not, please describe a
scenario where a RTE-managed list of flow rules would be mandatory.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
@ 2016-07-20  8:54  4%   ` Ferruh Yigit
  2016-07-27  8:33  4%   ` Thomas Monjalon
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2016-07-20  8:54 UTC (permalink / raw)
  To: Olivier Matz, dev; +Cc: jerin.jacob, thomas.monjalon, bruce.richardson

On 7/20/2016 8:16 AM, Olivier Matz wrote:
> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
> 
> v1->v2:
> - reword the sentences to keep things more open, as suggested by Bruce
> 
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..b9f5a93 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,9 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are deprecated and
>    will be removed in 16.11.
>    It is replaced by rte_mempool_generic_get/put functions.
> +
> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
> +  may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
> +  ``nb_segs`` in one operation, because some platforms have an overhead if the
> +  store address is not naturally aligned. Other mbuf fields, such as the
> +  ``port`` field, may be moved or removed as part of this mbuf work.
> 

Not directly related to this patch, but generally for deprecation
notices, does it make sense to tag explicitly which library is affected,
like:

* librte_mbuf [perhaps with version here]:
  Explanation about deprecation ...

For this case it is clear which library is affected, but sometimes
that is not obvious from the deprecation notice.

Also, when checking whether a specific library is affected, it is harder
to find with the current notes.

Thanks,
ferruh

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure
  2016-07-19 14:01 13% [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure Olivier Matz
  2016-07-19 14:40  4% ` Bruce Richardson
@ 2016-07-20  7:16 13% ` Olivier Matz
  2016-07-20  8:54  4%   ` Ferruh Yigit
                     ` (4 more replies)
  1 sibling, 5 replies; 200+ results
From: Olivier Matz @ 2016-07-20  7:16 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob, thomas.monjalon, bruce.richardson

For 16.11, the mbuf structure will be modified implying ABI breakage.
Some discussions already took place here:
http://www.dpdk.org/dev/patchwork/patch/12878/

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---

v1->v2:
- reword the sentences to keep things more open, as suggested by Bruce

 doc/guides/rel_notes/deprecation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..b9f5a93 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,3 +41,9 @@ Deprecation Notices
 * The mempool functions for single/multi producer/consumer are deprecated and
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
+
+* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
+  may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
+  ``nb_segs`` in one operation, because some platforms have an overhead if the
+  store address is not naturally aligned. Other mbuf fields, such as the
+  ``port`` field, may be moved or removed as part of this mbuf work.
-- 
2.8.1

^ permalink raw reply	[relevance 13%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-19 13:12  0%       ` Adrien Mazarguil
@ 2016-07-20  2:16  0%         ` Lu, Wenzhuo
  2016-07-20 10:41  2%           ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Lu, Wenzhuo @ 2016-07-20  2:16 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

Hi Adrien,


> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, July 19, 2016 9:12 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org; Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody;
> Ajit Khaparde; Rahul Lakkireddy; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> Guarch, Pablo; Olga Shern
> Subject: Re: [RFC] Generic flow director/filtering/classification API
> 
> On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote:
> > Hi Adrien,
> > Thanks for your clarification.  Most of my questions are clear, but still
> something may need to be discussed, comment below.
> 
> Hi Wenzhuo,
> 
> Please see below.
> 
> [...]
> > > > > Requirements for a new API:
> > > > >
> > > > > - Flexible and extensible without causing API/ABI problems for existing
> > > > >   applications.
> > > > > - Should be unambiguous and easy to use.
> > > > > - Support existing filtering features and actions listed in `Filter types`_.
> > > > > - Support packet alteration.
> > > > > - In case of overlapping filters, their priority should be well documented.
> > > > Does that mean we don't guarantee the consistent of priority? The
> > > > priority can
> > > be different on different NICs. So the behavior of the actions  can be
> different.
> > > Right?
> > >
> > > No, the intent is precisely to define what happens in order to get a
> > > consistent result across different devices, and document cases with
> undefined behavior.
> > > There must be no room left for interpretation.
> > >
> > > For example, the API must describe what happens when two overlapping
> > > filters (e.g. one matching an Ethernet header, another one matching
> > > an IP header) match a given packet at a given priority level.
> > >
> > > It is documented in section 4.1.1 (priorities) as "undefined behavior".
> > > Applications remain free to do it and deal with consequences, at
> > > least they know they cannot expect a consistent outcome, unless they
> > > use different priority levels for both rules, see also 4.4.5 (flow rules priority).
> > >
> > > > Seems the users still need to aware the some details of the HW? Do
> > > > we need
> > > to add the negotiation for the priority?
> > >
> > > Priorities as defined in this document may not be directly mappable
> > > to HW capabilities (e.g. HW does not support enough priorities, or
> > > that some corner case make them not work as described), in which
> > > case the PMD may choose to simulate priorities (again 4.4.5), as
> > > long as the end result follows the specification.
> > >
> > > So users must not be aware of some HW details, the PMD does and must
> > > perform the needed workarounds to suit their expectations. Users may
> > > only be impacted by errors while attempting to create rules that are
> > > either unsupported or would cause them (or existing rules) to diverge from
> the spec.
> > The problem is sometime the priority of the filters is fixed according
> > to
> > > HW's implementation. For example, on ixgbe, n-tuple has a higher
> > > priority than flow director.
> 
> As a side note I did not know that N-tuple had a higher priority than flow
> director on ixgbe, priorities among filter types do not seem to be documented at
> all in DPDK. This is one of the reasons I think we need a generic API to handle
> flow configuration.
Totally agree with you. We haven't documented the info well enough. And even if we do that, users would have to study the details of every NIC, which can still make the filters very hard to use. I believe a generic API is very helpful here :)

> 
> 
> So, today an application cannot combine N-tuple and FDIR flow rules and get a
> reliable outcome, unless it is designed for specific devices with a known
> behavior.
> 
> > What's the right behavior of PMD if APP want to create a flow director rule
> which has a higher or even equal priority than an existing n-tuple rule? Should
> PMD return fail?
> 
> First remember applications only deal with the generic API, PMDs are
> responsible for choosing the most appropriate HW implementation to use
> according to the requested flow rules (FDIR, N-tuple or anything else).
> 
> For the specific case of FDIR vs N-tuple, if the underlying HW supports both I do
> not see why the PMD would create a N-tuple rule. Doesn't FDIR support
> everything N-tuple can do and much more?
Talking about the filters, fdir can cover n-tuple. I think that's why i40e only supports fdir but not n-tuple. But n-tuple has its own highlight. As we know, at least on Intel NICs, fdir only supports a per-device mask, but n-tuple can support a per-rule mask.
As every pattern has both a spec and a mask, we cannot guarantee the masks are the same. I think ixgbe will try to use n-tuple first if it can, because even if the masks are different, we can support them all.

> 
> Assuming such a thing happened anyway, that the PMD had to create a rule
> using a high priority filter type and that the application requests the creation of a
> rule that can only be done using a lower priority filter type, but also requested a
> higher priority for that rule, then yes, it should obviously fail.
> 
> That is, unless the PMD can perform some kind of workaround to have both.
> 
> > If so, do we need more fail reasons? According to this RFC, I think we need
> return " EEXIST: collision with an existing rule. ", but it's not very clear, APP
> doesn't know the problem is priority, maybe more detailed reason is helpful.
> 
> Possibly, I've defined a basic set of errors, there are quite a number of errno
> values to choose from. However I think we should not define too many values.
> In my opinion the basic set covers every possible failure:
> 
> - EINVAL: invalid format, rule is broken or cannot be understood by the PMD
>   anyhow.
> 
> - ENOTSUP: pattern/actions look fine but something in the requested rule is
>   not supported and thus cannot be applied.
> 
> - EEXIST: pattern/actions are fine and could have been applied if only some
>   other rule did not prevent the PMD to do it (I see it as the closest thing
>   to "ETOOBAD" which unfortunately does not exist).
> 
> - ENOMEM: like EEXIST, except it is due to the lack of resources not because
>   of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
>   settled on ENOMEM as it is well known. Still open to debate.
> 
> Errno values are only useful to get a rough idea of the reason, and another
> mechanism is needed to pinpoint the exact problem for debugging/reporting
> purposes, something like:
> 
>  enum rte_flow_error_type {
>      RTE_FLOW_ERROR_TYPE_NONE,
>      RTE_FLOW_ERROR_TYPE_UNKNOWN,
>      RTE_FLOW_ERROR_TYPE_PRIORITY,
>      RTE_FLOW_ERROR_TYPE_PATTERN,
>      RTE_FLOW_ERROR_TYPE_ACTION,
>  };
> 
>  struct rte_flow_error {
>      enum rte_flow_error_type type;
>      void *offset; /* Points to the exact pattern item or action. */
>      const char *message;
>  };
When we are using a CLI and it fails, normally it will let us know which parameter is not appropriate. So, I think it’s a good idea to have this error structure :)

> 
> Then either provide an optional struct rte_flow_error pointer to
> rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
> processing this may be quite expensive and applications may not care about the
> exact reason.
Agree the processing may be too expensive. Maybe we can say returning error details is optional. And that's a good question, what the APP should do if creating the rule fails. I believe normally it will choose to handle the rule by itself. But I think it's not bad to feed back more. Even if the APP wants to adjust the rules, it cannot do that for lack of info.

> 
> What do you suggest?
> 
> > > > > Behavior
> > > > > --------
> > > > >
> > > > > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> > > > >   returned).
> > > > >
> > > > > - There is no provision for reentrancy/multi-thread safety, although
> nothing
> > > > >   should prevent different devices from being configured at the same
> > > > >   time. PMDs may protect their control path functions accordingly.
> > > > >
> > > > > - Stopping the data path (TX/RX) should not be necessary when
> > > > > managing
> > > flow
> > > > >   rules. If this cannot be achieved naturally or with workarounds (such as
> > > > >   temporarily replacing the burst function pointers), an appropriate error
> > > > >   code must be returned (``EBUSY``).
> > > > PMD cannot stop the data path without adding lock. So I think if
> > > > some rules
> > > cannot be applied without stopping rx/tx, PMD has to return fail.
> > > > Or let the APP to stop the data path.
> > >
> > > Agreed, that is the intent. If the PMD cannot touch flow rules for
> > > some reason even after trying really hard, then it just returns EBUSY.
> > >
> > > Perhaps we should write down that applications may get a different
> > > outcome after stopping the data path if they get EBUSY?
> > Agree, it's better to describe more about the APP. BTW, I checked the
> > behavior of ixgbe/igb, I think we can add/delete filters during
> > runtime. Hopefully we'll not hit too many EBUSY problems on other NICs
> > :)
> 
> OK, I will add it.
> 
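
To illustrate the EBUSY convention being agreed on here, a sketch of 
what an application might do; whether errors surface through rte_errno 
is an assumption on my part, the RFC does not say:

 struct rte_flow *flow;

 flow = rte_flow_create(port_id, &pattern, &actions);
 if (flow == NULL && rte_errno == EBUSY) {
     /* PMD cannot touch rules while the data path runs: stop, retry */
     rte_eth_dev_stop(port_id);
     flow = rte_flow_create(port_id, &pattern, &actions);
     rte_eth_dev_start(port_id);
 }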
> > > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > > >   configuration when stopping and restarting a port or performing other
> > > > >   actions which may affect them. They can only be destroyed explicitly.
> > > > Don’t understand " They can only be destroyed explicitly."
> > >
> > > This part says that as long as an application has not called
> > > rte_flow_destroy() on a flow rule, it never disappears, whatever
> > > happens to the port (stopped, restarted). The application is not
> > > responsible for re-creating rules after that.
> > >
> > > Note that according to the specification, this may translate to not
> > > being able to stop a port as long as a flow rule is present,
> > > depending on how nice the PMD intends to be with applications.
> > > Implementation can be done in small steps with minimal amount of code on
> the PMD side.
> > Does it mean PMD should store and maintain all the rules? Why not let rte do
> that? I think if PMD maintain all the rules, it means every kind of NIC should have
> a copy of code for the rules. But if rte do that, only one copy of code need to be
> maintained, right?
> 
> I've considered having rules stored in a common format understood at the RTE
> level and not specific to each PMD and decided that the opaque rte_flow pointer
> was a better choice for the following reasons:
> 
> - Even though flow rules management is done in the control path, processing
>   must be as fast as possible. Letting PMDs store flow rules using their own
>   internal representation gives them the chance to achieve better
>   performance.
Not quite understand. I think we're talking about maintaining the rules in SW. I don't think there's anything that needs to be optimized according to specific NICs. If we need to optimize the code, I think we need to consider the CPU, OS ... and some common means. Am I wrong?

> 
> - An opaque context managed by PMDs would probably have to be stored
>   somewhere as well anyway.
> 
> - PMDs may not need to allocate/store anything at all if they exclusively
>   rely on HW state for everything. In my opinion, the generic API has enough
>   constraints for this to work and maintain consistency between flow
>   rules. Note this is currently how most PMDs implement FDIR and other
>   filter types.
Yes, the rules are stored by HW. But considering stopping/starting the device, the rules in HW will be lost. We have to store the rules in SW and re-program them when restarting the device.
And in existing code, we store the filters in SW, at least on Intel NICs. But I think we cannot reuse them, because considering the priority and which category of filter should be chosen, I think we need a whole new table for the generic API. I think that's what's designed now, right?

> 
> - RTE can (and will) provide helpers to avoid most of the code redundancy,
>   PMDs are free to use them or manage everything by themselves.
> 
> - Given that the opaque rte_flow pointer associated with a flow rule is to
>   be stored by the application, PMDs do not even have to keep references to
>   them.
Don’t understand. More details?

> 
> - The flow rules format described in this specification (pattern / actions)
>   will be used by applications directly, and will be free to arrange them in
>   lists, trees or in any other way if they need to keep flow specifications
>   around for further processing.
Who will create the lists, trees or something else? According to previous discussion, I think the APP will program the rules one by one. So if the APP organizes the rules into lists, trees..., the PMD doesn't know that.
And you said "Given that the opaque rte_flow pointer associated with a flow rule is to be stored by the application". I'm lost here.

> 
> > When the port is stopped and restarted, rte can reconfigure the rules. Is the
> concern that PMD may adjust the sequence of the rules according to the priority,
> so every NIC has a different list of rules? But PMD can adjust them again when
> rte reconfiguring the rules.
> 
> What about PMDs able to stop and restart ports without destroying their own
> flow rules? If we assume flow rules must be destroyed when stopping a port,
> these PMDs are needlessly penalized with slower stop/start cycles. Think about
> it assuming thousands of flow rules.
I believe the rules maintained by SW should not be destroyed, because they are used to re-program the HW when the device starts again.

> 
> Thus from an application point of view, whatever happens when stopping and
> restarting a port should not matter. If a flow rule was present before, it must
> still be present afterwards. If the PMD had to destroy flow rules and re-create
> them, it does not actually matter if they differ slightly at the HW level, as long as:
> 
> - Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
>   and refer to the same rules.
> 
> - The overall behavior of all rules is the same.
> 
> The list of rules you think of (patterns / actions) is maintained by applications
> (not RTE), and only if they need them. RTE would needlessly duplicate this.
As said before, need more details to understand this. Maybe an example is better :)

> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-19 15:59  3%     ` Zoltan Kiss
@ 2016-07-19 16:17  0%       ` Olivier Matz
  2016-07-20 12:41  4%         ` Zoltan Kiss
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-07-19 16:17 UTC (permalink / raw)
  To: Zoltan Kiss, Zoltan Kiss, dev

Hi Zoltan,

On 07/19/2016 05:59 PM, Zoltan Kiss wrote:
> 
> 
> On 19/07/16 16:37, Olivier Matz wrote:
>> Hi Zoltan,
>>
>> On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
>>> A recent fix brought up an issue about the size of the 'name' fields:
>>>
>>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>>
>>> These relations should be observed:
>>>
>>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE -
>>> strlen(RTE_MEMPOOL_MZ_PREFIX)
>>>
>>> Setting all of them to 32 hides this restriction from the application.
>>> This patch increases the memzone string size to accommodate these
>>> prefixes, and the same happens with the ring name string. The ABI
>>> needs to
>>> be broken to fix this API issue, this way doesn't break applications
>>> previously not failing due to the truncating bug now fixed.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>>
>> I agree it is a problem for an application because it cannot know what
>> is the maximum name length. On the other hand, breaking the ABI for this
>> looks a bit overkill. Maybe we could reduce RTE_MEMPOOL_NAMESIZE and
>> RTE_RING_NAMESIZE instead of increasing RTE_MEMZONE_NAMESIZE? That way,
>> we could keep the ABI as is.
> 
> But that would break the ABI too, wouldn't it? Unless you keep the array
> the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.

Yes, that was the idea.

> And even then, the API breaks anyway. There are applications - I have at
> least some - which use all 32 bytes to store the name. Decreasing that
> would cause headaches in changing the naming scheme, because it's a
> 30-character-long id, and chopping the last few chars would cause name
> collisions and annoying bugs.

Before my patch (85cf0079), long names were silently truncated when
mempool created its ring and/or memzones. Now, it returns an error.

I'm not getting why changing the struct to something like below would
break the API, since it would already return an error today.

  #define RTE_MEMPOOL_NAMESIZE \
      (RTE_MEMZONE_NAMESIZE - sizeof(pool_prefix) - sizeof(ring prefix))
  struct rte_mempool {
      union {
            char name[RTE_MEMPOOL_NAMESIZE];
            char pad[32];
      };
      ...
  }

Anyway, it may not be the proper solution since it supposes that a
mempool includes a ring based on a memzone, which is not always true now
with mempool handlers.

>> It would even be better to get rid of this static char[] for the
>> structure names and replace it by an allocated const char *. I didn't
>> check it's feasible for memzones. What do you think?
> 
> It would work too, but I don't think it would help a lot. We would still
> need max sizes for the names. Storing them somewhere else won't help us
> in this problem.

Why should we have a maximum length for the names?


Thanks,
Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-19 15:37  4%   ` Olivier Matz
@ 2016-07-19 15:59  3%     ` Zoltan Kiss
  2016-07-19 16:17  0%       ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Zoltan Kiss @ 2016-07-19 15:59 UTC (permalink / raw)
  To: Olivier Matz, Zoltan Kiss, dev



On 19/07/16 16:37, Olivier Matz wrote:
> Hi Zoltan,
>
> On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
>> A recent fix brought up an issue about the size of the 'name' fields:
>>
>> 85cf0079 mem: avoid memzone/mempool/ring name truncation
>>
>> These relations should be observed:
>>
>> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
>> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE - strlen(RTE_MEMPOOL_MZ_PREFIX)
>>
>> Setting all of them to 32 hides this restriction from the application.
>> This patch increases the memzone string size to accommodate these
>> prefixes, and the same happens with the ring name string. The ABI needs to
>> be broken to fix this API issue, this way doesn't break applications
>> previously not failing due to the truncating bug now fixed.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
>
> I agree it is a problem for an application because it cannot know what
> is the maximum name length. On the other hand, breaking the ABI for this
> looks a bit overkill. Maybe we could reduce RTE_MEMPOOL_NAMESIZE and
> RTE_RING_NAMESIZE instead of increasing RTE_MEMZONE_NAMESIZE? That way,
> we could keep the ABI as is.

But that would break the ABI too, wouldn't it? Unless you keep the array 
the same size (32 bytes) by using RTE_MEMZONE_NAMESIZE.
And even then, the API breaks anyway. There are applications - I have at 
least some - which use all 32 bytes to store the name. Decreasing that 
would cause headaches in changing the naming scheme, because it's a 
30-character-long id, and chopping the last few chars would cause name 
collisions and annoying bugs.

>
> It would even be better to get rid of this static char[] for the
> structure names and replace it by an allocated const char *. I didn't
> check it's feasible for memzones. What do you think?

It would work too, but I don't think it would help a lot. We would still 
need max sizes for the names. Storing them somewhere else won't help us 
in this problem.

>
> In any case, I think it's a bit late for 16.07 for this kind of fix.
>
> Regards,
> Olivier
>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  2016-07-19 14:37  3% ` [dpdk-dev] [PATCH] mempool: adjust name string size in related data types Zoltan Kiss
@ 2016-07-19 15:37  4%   ` Olivier Matz
  2016-07-19 15:59  3%     ` Zoltan Kiss
  2016-07-20 17:16 12%   ` [dpdk-dev] [PATCH v2] " Zoltan Kiss
  1 sibling, 1 reply; 200+ results
From: Olivier Matz @ 2016-07-19 15:37 UTC (permalink / raw)
  To: Zoltan Kiss, dev

Hi Zoltan,

On 07/19/2016 04:37 PM, Zoltan Kiss wrote:
> A recent fix brought up an issue about the size of the 'name' fields:
> 
> 85cf0079 mem: avoid memzone/mempool/ring name truncation
> 
> These relations should be observed:
> 
> RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
> RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE - strlen(RTE_MEMPOOL_MZ_PREFIX)
> 
> Setting all of them to 32 hides this restriction from the application.
> This patch increases the memzone string size to accommodate these
> prefixes, and the same happens with the ring name string. The ABI needs to
> be broken to fix this API issue, this way doesn't break applications
> previously not failing due to the truncating bug now fixed.
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>

I agree it is a problem for an application because it cannot know what
is the maximum name length. On the other hand, breaking the ABI for this
looks a bit overkill. Maybe we could reduce RTE_MEMPOOL_NAMESIZE and
RTE_RING_NAMESIZE instead of increasing RTE_MEMZONE_NAMESIZE? That way,
we could keep the ABI as is.

It would even be better to get rid of this static char[] for the
structure names and replace it by an allocated const char *. I didn't
check it's feasible for memzones. What do you think?

In any case, I think it's a bit late for 16.07 for this kind of fix.

Regards,
Olivier

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure
  2016-07-19 15:07  4%     ` Richardson, Bruce
@ 2016-07-19 15:28  4%       ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2016-07-19 15:28 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev, jerin.jacob, thomas.monjalon



On 07/19/2016 05:07 PM, Richardson, Bruce wrote:
> 
> 
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Tuesday, July 19, 2016 4:04 PM
>> To: Richardson, Bruce <bruce.richardson@intel.com>
>> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
>> thomas.monjalon@6wind.com
>> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf
>> structure
>>
>> Hi Bruce,
>>
>> On 07/19/2016 04:40 PM, Bruce Richardson wrote:
>>> On Tue, Jul 19, 2016 at 04:01:15PM +0200, Olivier Matz wrote:
>>>> For 16.11, the mbuf structure will be modified implying ABI breakage.
>>>> Some discussions already took place here:
>>>> http://www.dpdk.org/dev/patchwork/patch/12878/
>>>>
>>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>>> ---
>>>>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>>> b/doc/guides/rel_notes/deprecation.rst
>>>> index f502f86..2245bc2 100644
>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>> @@ -41,3 +41,9 @@ Deprecation Notices
>>>>  * The mempool functions for single/multi producer/consumer are
>> deprecated and
>>>>    will be removed in 16.11.
>>>>    It is replaced by rte_mempool_generic_get/put functions.
>>>> +
>>>> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure:
>>>> +some
>>>> +  fields will be reordered to facilitate the writing of
>>>> +``data_off``,
>>>> +  ``refcnt``, and ``nb_segs`` in one operation. Indeed, some
>>>> +platforms
>>>> +  have an overhead if the store address is not naturally aligned.
>>>> +The
>>>> +  useless ``port`` field will also be removed at the same occasion.
>>>> --
>>>
>>> Have we fully bottomed out on the mbuf changes. I'm not sure that once
>>> patches start getting considered for merge, new opinions may come
>>> forward. For instance, is the "port" field really "useless"?
>>>
>>> Would it not be better to put in a less specific deprecation notice?
>>> What happens if this notice goes in and the final changes are
>>> different from those called out here?
>>
>> Yes, you are right. What about the following text?
>>
>> ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
>> fields may be reordered to facilitate the writing of ``data_off``,
>> ``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms have
>> an overhead if the store address is not naturally aligned. The ``port``
>> field may also be removed at the same occasion.
>>
> Better. Two suggestions:
> 1. change "Indeed" to "because" and join the sentences.
> 2. change the last sentence to be even more general: "Other mbuf fields, such as the port field, may be moved or removed as part of this mbuf work".

It's much better indeed ;)
Thanks Bruce, I'll submit a v2.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure
  2016-07-19 15:04  7%   ` Olivier Matz
@ 2016-07-19 15:07  4%     ` Richardson, Bruce
  2016-07-19 15:28  4%       ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Richardson, Bruce @ 2016-07-19 15:07 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jerin.jacob, thomas.monjalon



> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, July 19, 2016 4:04 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf
> structure
> 
> Hi Bruce,
> 
> On 07/19/2016 04:40 PM, Bruce Richardson wrote:
> > On Tue, Jul 19, 2016 at 04:01:15PM +0200, Olivier Matz wrote:
> >> For 16.11, the mbuf structure will be modified implying ABI breakage.
> >> Some discussions already took place here:
> >> http://www.dpdk.org/dev/patchwork/patch/12878/
> >>
> >> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index f502f86..2245bc2 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -41,3 +41,9 @@ Deprecation Notices
> >>  * The mempool functions for single/multi producer/consumer are
> deprecated and
> >>    will be removed in 16.11.
> >>    It is replaced by rte_mempool_generic_get/put functions.
> >> +
> >> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure:
> >> +some
> >> +  fields will be reordered to facilitate the writing of
> >> +``data_off``,
> >> +  ``refcnt``, and ``nb_segs`` in one operation. Indeed, some
> >> +platforms
> >> +  have an overhead if the store address is not naturally aligned.
> >> +The
> >> +  useless ``port`` field will also be removed at the same occasion.
> >> --
> >
> > Have we fully bottomed out on the mbuf changes. I'm not sure that once
> > patches start getting considered for merge, new opinions may come
> > forward. For instance, is the "port" field really "useless"?
> >
> > Would it not be better to put in a less specific deprecation notice?
> > What happens if this notice goes in and the final changes are
> > different from those called out here?
> 
> Yes, you are right. What about the following text?
> 
> ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
> fields may be reordered to facilitate the writing of ``data_off``,
> ``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms have
> an overhead if the store address is not naturally aligned. The ``port``
> field may also be removed at the same occasion.
> 
Better. Two suggestions:
1. change "Indeed" to "because" and join the sentences.
2. change the last sentence to be even more general: "Other mbuf fields, such as the port field, may be moved or removed as part of this mbuf work".

/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure
  2016-07-19 14:40  4% ` Bruce Richardson
@ 2016-07-19 15:04  7%   ` Olivier Matz
  2016-07-19 15:07  4%     ` Richardson, Bruce
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-07-19 15:04 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, jerin.jacob, thomas.monjalon

Hi Bruce,

On 07/19/2016 04:40 PM, Bruce Richardson wrote:
> On Tue, Jul 19, 2016 at 04:01:15PM +0200, Olivier Matz wrote:
>> For 16.11, the mbuf structure will be modified implying ABI breakage.
>> Some discussions already took place here:
>> http://www.dpdk.org/dev/patchwork/patch/12878/
>>
>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index f502f86..2245bc2 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -41,3 +41,9 @@ Deprecation Notices
>>  * The mempool functions for single/multi producer/consumer are deprecated and
>>    will be removed in 16.11.
>>    It is replaced by rte_mempool_generic_get/put functions.
>> +
>> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
>> +  fields will be reordered to facilitate the writing of ``data_off``,
>> +  ``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms
>> +  have an overhead if the store address is not naturally aligned. The
>> +  useless ``port`` field will also be removed at the same occasion.
>> -- 
> 
> Have we fully bottomed out on the mbuf changes. I'm not sure that once patches
> start getting considered for merge, new opinions may come forward. For instance,
> is the "port" field really "useless"?
> 
> Would it not be better to put in a less specific deprecation notice? What happens
> if this notice goes in and the final changes are different from those called out
> here?

Yes, you are right. What about the following text?

ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
fields may be reordered to facilitate the writing of ``data_off``,
``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms
have an overhead if the store address is not naturally aligned. The
``port`` field may also be removed at the same occasion.


Thanks,
Olivier

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure
  2016-07-19 14:01 13% [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure Olivier Matz
@ 2016-07-19 14:40  4% ` Bruce Richardson
  2016-07-19 15:04  7%   ` Olivier Matz
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2016-07-19 14:40 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, jerin.jacob, thomas.monjalon

On Tue, Jul 19, 2016 at 04:01:15PM +0200, Olivier Matz wrote:
> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index f502f86..2245bc2 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,9 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are deprecated and
>    will be removed in 16.11.
>    It is replaced by rte_mempool_generic_get/put functions.
> +
> +* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
> +  fields will be reordered to facilitate the writing of ``data_off``,
> +  ``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms
> +  have an overhead if the store address is not naturally aligned. The
> +  useless ``port`` field will also be removed at the same occasion.
> -- 

Have we fully bottomed out on the mbuf changes. I'm not sure that once patches
start getting considered for merge, new opinions may come forward. For instance,
is the "port" field really "useless"?

Would it not be better to put in a less specific deprecation notice? What happens
if this notice goes in and the final changes are different from those called out
here?

/Bruce

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] mempool: adjust name string size in related data types
  @ 2016-07-19 14:37  3% ` Zoltan Kiss
  2016-07-19 15:37  4%   ` Olivier Matz
  2016-07-20 17:16 12%   ` [dpdk-dev] [PATCH v2] " Zoltan Kiss
  0 siblings, 2 replies; 200+ results
From: Zoltan Kiss @ 2016-07-19 14:37 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz


A recent fix brought up an issue about the size of the 'name' fields:

85cf0079 mem: avoid memzone/mempool/ring name truncation

These relations should be observed:

RTE_RING_NAMESIZE <= RTE_MEMZONE_NAMESIZE - strlen(RTE_RING_MZ_PREFIX)
RTE_MEMPOOL_NAMESIZE <= RTE_RING_NAMESIZE - strlen(RTE_MEMPOOL_MZ_PREFIX)

Setting all of them to 32 hides this restriction from the application.
This patch increases the memzone string size to accommodate these
prefixes, and the same happens with the ring name string. The ABI needs to
be broken to fix this API issue, this way doesn't break applications
previously not failing due to the truncating bug now fixed.

Signed-off-by: Zoltan Kiss <zoltan.kiss@schaman.hu>
---
 lib/librte_eal/common/include/rte_memzone.h | 2 +-
 lib/librte_mempool/rte_mempool.h            | 4 +++-
 lib/librte_ring/rte_ring.h                  | 5 ++++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_memzone.h b/lib/librte_eal/common/include/rte_memzone.h
index f69b5a8..ba3a1f0 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ -74,7 +74,7 @@ extern "C" {
  */
 struct rte_memzone {
 
-#define RTE_MEMZONE_NAMESIZE 32       /**< Maximum length of memory zone name.*/
+#define RTE_MEMZONE_NAMESIZE (32 + 6)     /**< Maximum length of memory zone name.*/
 	char name[RTE_MEMZONE_NAMESIZE];  /**< Name of the memory zone. */
 
 	phys_addr_t phys_addr;            /**< Start physical address. */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 4a8fbb1..61e8d19 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -123,7 +123,9 @@ struct rte_mempool_objsz {
 	/**< Total size of an object (header + elt + trailer). */
 };
 
-#define RTE_MEMPOOL_NAMESIZE 32 /**< Maximum length of a memory pool. */
+/**< Maximum length of a memory pool's name. */
+#define RTE_MEMPOOL_NAMESIZE (RTE_RING_NAMESIZE - \
+			      sizeof(RTE_MEMPOOL_MZ_PREFIX) + 1)
 #define RTE_MEMPOOL_MZ_PREFIX "MP_"
 
 /* "MP_<name>" */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index eb45e41..d6185de 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -100,6 +100,7 @@ extern "C" {
 #include <rte_lcore.h>
 #include <rte_atomic.h>
 #include <rte_branch_prediction.h>
+#include <rte_memzone.h>
 
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
@@ -126,8 +127,10 @@ struct rte_ring_debug_stats {
 } __rte_cache_aligned;
 #endif
 
-#define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
 #define RTE_RING_MZ_PREFIX "RG_"
+/**< The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
 #ifndef RTE_RING_PAUSE_REP_COUNT
 #define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-- 
1.9.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure
@ 2016-07-19 14:01 13% Olivier Matz
  2016-07-19 14:40  4% ` Bruce Richardson
  2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
  0 siblings, 2 replies; 200+ results
From: Olivier Matz @ 2016-07-19 14:01 UTC (permalink / raw)
  To: dev; +Cc: jerin.jacob, thomas.monjalon

For 16.11, the mbuf structure will be modified implying ABI breakage.
Some discussions already took place here:
http://www.dpdk.org/dev/patchwork/patch/12878/

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f502f86..2245bc2 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -41,3 +41,9 @@ Deprecation Notices
 * The mempool functions for single/multi producer/consumer are deprecated and
   will be removed in 16.11.
   It is replaced by rte_mempool_generic_get/put functions.
+
+* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some
+  fields will be reordered to facilitate the writing of ``data_off``,
+  ``refcnt``, and ``nb_segs`` in one operation. Indeed, some platforms
+  have an overhead if the store address is not naturally aligned. The
+  useless ``port`` field will also be removed on the same occasion.
-- 
2.8.1
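
To illustrate the rationale (a sketch only: the final field widths and
layout are up to the mbuf rework, and little-endian packing is assumed
here), if data_off, refcnt and nb_segs end up as adjacent 16-bit fields
starting on a naturally aligned 8-byte boundary, resetting all three
becomes one wide store instead of three narrow ones:

 #include <stdint.h>
 #include <string.h>

 /* Hypothetical layout, not the real rte_mbuf. */
 struct hdr {
         uint16_t data_off;
         uint16_t refcnt;
         uint16_t nb_segs;
         uint16_t pad;
 } __attribute__((aligned(8)));

 static inline void
 hdr_reset(struct hdr *h)
 {
         /* data_off = 128, refcnt = 1, nb_segs = 1, written in a
          * single naturally aligned 8-byte store (little-endian). */
         const uint64_t v = 128ULL | (1ULL << 16) | (1ULL << 32);

         memcpy(h, &v, sizeof(v));
 }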

^ permalink raw reply	[relevance 13%]

* Re: [dpdk-dev] [PATCH] rte_delay_us can be replaced with user function
  @ 2016-07-19 13:17  3% ` Wiles, Keith
  0 siblings, 0 replies; 200+ results
From: Wiles, Keith @ 2016-07-19 13:17 UTC (permalink / raw)
  To: jozmarti; +Cc: dev

Hi Jozef,

I have two quick comments inline.
> On Jul 19, 2016, at 7:42 AM, jozmarti@cisco.com wrote:
> 
> From: Jozef Martiniak <jozmarti@cisco.com>
> 
> when running single-core, some drivers tend to call rte_delay_us for a
> long time, and that is causing packet drops.
> Attached patch introduces 2 new functions:
> 
> void rte_delay_us_callback_register(void(*userfunc)(unsigned));
> void rte_delay_us_callback_unregister(void);
> 
> First one replaces rte_delay_us with userfunc and second one restores
> original rte_delay_us.
> Test user_delay_us is included.
> 
> Signed-off-by: Jozef Martiniak <jozmarti@cisco.com>
> ---
> app/test/test_cycles.c                             | 39 ++++++++++++++++++++++
> lib/librte_eal/common/eal_common_timer.c           | 19 +++++++++++
> lib/librte_eal/common/include/generic/rte_cycles.h | 13 ++++++++
> 3 files changed, 71 insertions(+)
> 
> diff --git a/app/test/test_cycles.c b/app/test/test_cycles.c
> index f6c043a..2b44a53 100644
> --- a/app/test/test_cycles.c
> +++ b/app/test/test_cycles.c
> @@ -90,3 +90,42 @@ test_cycles(void)
> }
> 
> REGISTER_TEST_COMMAND(cycles_autotest, test_cycles);
> +
> +/*
> + * rte_delay_us_callback test
> + *
> + * - check if callback is correctly registered/unregistered
> + *
> + */
> +
> +static int pattern;
> +static void my_rte_delay_us(unsigned us)
> +{
> +    pattern += us;
> +}
> +
> +static int
> +test_user_delay_us(void)
> +{
> +    pattern = 0;
> +
> +    rte_delay_us_callback_register(my_rte_delay_us);
> +
> +    rte_delay_us(2);
> +    if (pattern != 2)
> +        return -1;
> +
> +    rte_delay_us(3);
> +    if (pattern != 5)
> +        return -1;
> +
> +    rte_delay_us_callback_unregister();
> +
> +    rte_delay_us(3);
> +    if (pattern != 5)
> +        return -1;
> +
> +    return 0;
> +}
> +
> +REGISTER_TEST_COMMAND(user_delay_us, test_user_delay_us);
> diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
> index c4227cd..a982562 100644
> --- a/lib/librte_eal/common/eal_common_timer.c
> +++ b/lib/librte_eal/common/eal_common_timer.c
> @@ -47,9 +47,18 @@
> /* The frequency of the RDTSC timer resolution */
> static uint64_t eal_tsc_resolution_hz;
> 
> +/* User function which replaces rte_delay_us function */
> +static void (*rte_delay_us_override)(unsigned) = NULL;
> +
> void
> rte_delay_us(unsigned us)
> {
> +	if (unlikely(rte_delay_us_override != NULL))
> +	{
> +	    rte_delay_us_override(us);
> +	    return;
> +	}
> +
> 	const uint64_t start = rte_get_timer_cycles();
> 	const uint64_t ticks = (uint64_t)us * rte_get_timer_hz() / 1E6;
> 	while ((rte_get_timer_cycles() - start) < ticks)
> @@ -84,3 +93,13 @@ set_tsc_freq(void)
> 	RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
> 	eal_tsc_resolution_hz = freq;
> }
> +
> +void rte_delay_us_callback_register(void (*userfunc)(unsigned))
> +{
> +    rte_delay_us_override = userfunc;
> +}
> +
> +void rte_delay_us_callback_unregister(void)
> +{
> +    rte_delay_us_override = NULL;
> +}

I guess I would have used the rte_delay_us_callback_register(NULL) to unregister, but this is fine.

> diff --git a/lib/librte_eal/common/include/generic/rte_cycles.h b/lib/librte_eal/common/include/generic/rte_cycles.h
> index 8cc21f2..274f798 100644
> --- a/lib/librte_eal/common/include/generic/rte_cycles.h
> +++ b/lib/librte_eal/common/include/generic/rte_cycles.h
> @@ -202,4 +202,17 @@ rte_delay_ms(unsigned ms)
> 	rte_delay_us(ms * 1000);
> }
> 
> +/**
> + * Replace rte_delay_us with user defined function.
> + *
> + * @param userfunc
> + *   User function which replaces rte_delay_us.
> + */
> +void rte_delay_us_callback_register(void(*userfunc)(unsigned));
> +
> +/**
> + * Unregister user callback function. Restores original rte_delay_us.
> + */
> +void rte_delay_us_callback_unregister(void);

Just a note we need to add these two new APIs to the map file for ABI checking.

Other than these two comments I would give this one a +1 unless someone else has some comments.

> +
> #endif /* _RTE_CYCLES_H_ */
> -- 
> 2.1.4
> 
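
As a usage sketch, assuming this patch is applied (the yield call just
stands in for whatever cooperative scheduling the application uses), a
single-core application could avoid the busy-wait like this; per the
note above, the two new symbols would also need an entry in the EAL
version map:

 #include <sched.h>
 #include <rte_cycles.h>

 /* Yield the core instead of spinning, so that RX/TX processing on
  * the same lcore is not starved during long driver delays. */
 static void
 yielding_delay_us(unsigned us)
 {
         const uint64_t start = rte_get_timer_cycles();
         const uint64_t ticks = (uint64_t)us * rte_get_timer_hz() / 1E6;

         while ((rte_get_timer_cycles() - start) < ticks)
                 sched_yield();
 }

 /* At initialization time:
  *     rte_delay_us_callback_register(yielding_delay_us);
  * and rte_delay_us_callback_unregister() restores the default. */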

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] doc: fix release notes for 16.07
@ 2016-07-19 13:16 13% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2016-07-19 13:16 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Fix grammar, spelling and formatting of DPDK 16.07 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst | 134 +++++++++++++++++----------------
 1 file changed, 71 insertions(+), 63 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d3a144f..38887a5 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -40,16 +40,16 @@ New Features
 
 * **Added mempool external cache for non-EAL thread.**
 
-   Added new functions to create, free or flush a user-owned mempool
-   cache for non-EAL threads. Previously, cache was always disabled
-   on these threads.
+  Added new functions to create, free or flush a user-owned mempool
+  cache for non-EAL threads. Previously the cache was always disabled
+  on these threads.
 
 * **Changed the memory allocation in mempool library.**
 
   * Added ability to allocate a large mempool in virtually fragmented memory.
   * Added new APIs to populate a mempool with memory.
   * Added an API to free a mempool.
-  * Modified the API of rte_mempool_obj_iter() function.
+  * Modified the API of the ``rte_mempool_obj_iter()`` function.
   * Dropped specific Xen Dom0 code.
   * Dropped specific anonymous mempool code in testpmd.
 
@@ -63,10 +63,10 @@ New Features
 
 * **Added mailbox interrupt support for ixgbe and igb VFs.**
 
-  When the physical NIC link comes down or up, the PF driver will send a
+  When the physical NIC link comes up or down, the PF driver will send a
   mailbox message to notify each VF. To handle this link up/down event,
-  add mailbox interrupts support to receive the message and allow the app to
-  register a callback for it.
+  support has been added for a mailbox interrupt to receive the message and
+  allow the application to register a callback for it.
 
 * **Updated the ixgbe base driver.**
 
@@ -74,51 +74,49 @@ New Features
   following:
 
   * Added sgmii link for X550.
-  * Added mac link setup for X550a SFP and SFP+.
+  * Added MAC link setup for X550a SFP and SFP+.
   * Added KR support for X550em_a.
-  * Added new phy definitions for M88E1500.
+  * Added new PHY definitions for M88E1500.
   * Added support for the VLVF to be bypassed when adding/removing a VFTA entry.
   * Added X550a flow control auto negotiation support.
 
 * **Updated the i40e base driver.**
 
-  Updated the i40e base driver, which includes support for new devices IDs.
+  Updated the i40e base driver, including support for new device IDs.
 
-* **Supported virtio on IBM POWER8.**
+* **Added support for virtio on IBM POWER8.**
 
   The ioports are mapped in memory when using Linux UIO.
 
-* **Virtio support for containers.**
+* **Added support for Virtio in containers.**
 
   Add a new virtual device, named virtio-user, to support virtio for containers.
 
   Known limitations:
 
   * Control queue and multi-queue are not supported yet.
-  * Cannot work with --huge-unlink.
-  * Cannot work with --no-huge.
-  * Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages.
-  * Root privilege is a must for sorting hugepages by physical address.
-  * Can only be used with vhost user backend.
+  * Doesn't work with ``--huge-unlink``.
+  * Doesn't work with ``--no-huge``.
+  * Doesn't work when there are more than ``VHOST_MEMORY_MAX_NREGIONS(8)`` hugepages.
+  * Root privilege is required for sorting hugepages by physical address.
+  * Can only be used with the vhost user backend.
 
 * **Added vhost-user client mode.**
 
-  DPDK vhost-user could be the server as well as the client. It supports
-  server mode only before, now it also supports client mode. Client mode
-  is enabled when ``RTE_VHOST_USER_CLIENT`` flag is set while calling
+  DPDK vhost-user now supports client mode as well as server mode. Client mode
+  is enabled when the ``RTE_VHOST_USER_CLIENT`` flag is set while calling
   ``rte_vhost_driver_register``.
 
-  When DPDK vhost-user restarts from normal or abnormal quit (say crash),
-  the client mode would allow DPDK to establish the connect again.  Note
-  that a brand new QEMU version (v2.7 or above) is needed, otherwise, the
-  reconnect won't work.
+  When DPDK vhost-user restarts from a normal or abnormal exit (such as a
+  crash), the client mode allows DPDK to establish the connection again. Note
+  that QEMU version v2.7 or above is required for this feature.
 
-  DPDK vhost-user will also try to reconnect by default when
+  DPDK vhost-user will also try to reconnect by default when:
 
-  * the first connect fails (when QEMU is not started yet)
-  * the connection is broken (when QEMU restarts)
+  * The first connect fails (when QEMU is not started yet).
+  * The connection is broken (when QEMU restarts).
 
-  It can be turned off if flag ``RTE_VHOST_USER_NO_RECONNECT`` is set.
+  It can be turned off by setting the ``RTE_VHOST_USER_NO_RECONNECT`` flag.
 
 * **Added NSH packet recognition in i40e.**
 
@@ -127,7 +125,7 @@ New Features
   Now AESNI MB PMD supports 128/192/256-bit counter mode AES encryption and
   decryption.
 
-* **Added support of AES counter mode for Intel QuickAssist devices.**
+* **Added support for AES counter mode with Intel QuickAssist devices.**
 
   Enabled support for the AES CTR algorithm for Intel QuickAssist devices.
   Provided support for algorithm-chaining operations.
@@ -141,33 +139,33 @@ New Features
 
   The following features/modifications have been added to rte_hash library:
 
-  * Enabled application developers to use an extra flag for rte_hash creation
-    to specify default behavior (multi-thread safe/unsafe) with rte_hash_add_key
-    function.
-  * Changed Cuckoo search algorithm to breadth first search for multi-writer
-    routine and split Cuckoo Search and Move operations in order to reduce
-    transactional code region and improve TSX performance.
-  * Added a hash multi-writer test case for test app.
+  * Enabled application developers to use an extra flag for ``rte_hash``
+    creation to specify default behavior (multi-thread safe/unsafe) with the
+    ``rte_hash_add_key`` function.
+  * Changed the Cuckoo Hash Search algorithm to breadth first search for
+    multi-writer routines and split Cuckoo Hash Search and Move operations in
+    order to reduce transactional code region and improve TSX performance.
+  * Added a hash multi-writer test case to the test app.
 
 * **Improved IP Pipeline Application.**
 
-  The following features have been added to ip_pipeline application:
+  The following features have been added to the ip_pipeline application:
 
-  * Configure the MAC address in the routing pipeline and automatic routes
+  * Configure the MAC address in the routing pipeline and automatic route
     updates with change in link state.
   * Enable RSS per network interface through the configuration file.
   * Streamline the CLI code.
 
 * **Added keepalive enhancements.**
 
-  Adds support for reporting of core states other than dead to
+  Added support for reporting of core states other than dead to
   monitoring applications, enabling the support of broader liveness
   reporting to external processes.
 
 * **Added packet capture framework.**
 
-  * A new library ``librte_pdump`` is added to provide packet capture API.
-  * A new ``app/pdump`` tool is added to capture packets in DPDK.
+  * A new library ``librte_pdump`` is added to provide a packet capture API.
+  * A new ``app/pdump`` tool is added to demonstrate packet capture in DPDK.
 
 
 * **Added floating VEB support for i40e PF driver.**
@@ -197,18 +195,20 @@ EAL
 
 * **igb_uio: Fixed possible mmap failure for Linux >= 4.5.**
 
-  mmaping the iomem range of the PCI device fails for kernels that
-  enabled CONFIG_IO_STRICT_DEVMEM option:
+  The mmapping of the iomem range of the PCI device fails for kernels that
+  enabled the ``CONFIG_IO_STRICT_DEVMEM`` option. The error seen by the
+  user is similar to the following::
 
-  EAL: pci_map_resource():
-           cannot mmap(39, 0x7f1c51800000, 0x100000, 0x0):
-           Invalid argument (0xffffffffffffffff)
+      EAL: pci_map_resource():
 
-  CONFIG_IO_STRICT_DEVMEM is introduced in Linux v4.5
+          cannot mmap(39, 0x7f1c51800000, 0x100000, 0x0):
+          Invalid argument (0xffffffffffffffff)
 
-  Updated igb_uio to stop reserving PCI memory resources, from
-  kernel point of view iomem region looks like idle and mmap worked
-  again. This matches uio_pci_generic usage.
+  The ``CONFIG_IO_STRICT_DEVMEM`` kernel option was introduced in Linux v4.5.
+
+  The issue was resolved by updating ``igb_uio`` to stop reserving PCI memory
+  resources. From the kernel point of view the iomem region looks idle
+  and mmap works again. This matches the ``uio_pci_generic`` usage.
 
 
 Drivers
@@ -234,9 +234,9 @@ Libraries
 
 * **mbuf: Fixed refcnt update when detaching.**
 
-  Fix the ``rte_pktmbuf_detach()`` function to decrement the direct
-  mbuf's reference counter. The previous behavior was not to affect
-  the reference counter. It lead a memory leak of the direct mbuf.
+  Fix the ``rte_pktmbuf_detach()`` function to decrement the direct mbuf's
+  reference counter. The previous behavior was not to affect the reference
+  counter. This led to a memory leak of the direct mbuf.
 
 
 Examples
@@ -266,9 +266,17 @@ API Changes
    * Add a short 1-2 sentence description of the API change. Use fixed width
      quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
 
-* The following counters are removed from ``rte_eth_stats`` structure:
-  ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
-  tx_pause_xon, rx_pause_xon, tx_pause_xoff, rx_pause_xoff.
+* The following counters are removed from the ``rte_eth_stats`` structure:
+
+  * ``ibadcrc``
+  * ``ibadlen``
+  * ``imcasts``
+  * ``fdirmatch``
+  * ``fdirmiss``
+  * ``tx_pause_xon``
+  * ``rx_pause_xon``
+  * ``tx_pause_xoff``
+  * ``rx_pause_xoff``
 
 * The extended statistics are fetched by ids with ``rte_eth_xstats_get``
   after a lookup by name ``rte_eth_xstats_get_names``.
@@ -280,8 +288,8 @@ API Changes
   ``rte_vhost_avail_entries``.
 
 * All existing vhost APIs and callbacks with ``virtio_net`` struct pointer
-  as the parameter have been changed due to the ABI refactoring mentioned
-  below: it's replaced by ``int vid``.
+  as the parameter have been changed due to the ABI refactoring described
+  below. It is replaced by ``int vid``.
 
 * The function ``rte_vhost_enqueue_burst`` no longer supports concurrent enqueuing
   packets to the same queue.
@@ -297,15 +305,15 @@ ABI Changes
      the previous releases and made in this release. Use fixed width quotes for
      ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
 
-* The ``rte_port_source_params`` structure has new fields to support PCAP file.
+* The ``rte_port_source_params`` structure has new fields to support PCAP files.
   It was already in release 16.04 with ``RTE_NEXT_ABI`` flag.
 
 * The ``rte_eth_dev_info`` structure has new fields ``nb_rx_queues`` and ``nb_tx_queues``
-  to support number of queues configured by software.
+  to support the number of queues configured by software.
 
-* vhost ABI refactoring has been made: ``virtio_net`` structure is never
-  exported to application any more. Instead, a handle, ``vid``, has been
-  used to represent this structure internally.
+* A Vhost ABI refactoring has been made: the ``virtio_net`` structure is no
+  longer exported directly to the application. Instead, a handle, ``vid``, has
+  been used to represent this structure internally.
 
 
 Shared Library Versions
-- 
2.7.4

^ permalink raw reply	[relevance 13%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-19  8:11  0%     ` Lu, Wenzhuo
@ 2016-07-19 13:12  0%       ` Adrien Mazarguil
  2016-07-20  2:16  0%         ` Lu, Wenzhuo
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-07-19 13:12 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> Thanks for your clarification. Most of my questions are clear, but some points still need to be discussed; comments below.

Hi Wenzhuo,

Please see below.

[...]
> > > > Requirements for a new API:
> > > >
> > > > - Flexible and extensible without causing API/ABI problems for existing
> > > >   applications.
> > > > - Should be unambiguous and easy to use.
> > > > - Support existing filtering features and actions listed in `Filter types`_.
> > > > - Support packet alteration.
> > > > - In case of overlapping filters, their priority should be well documented.
> > > Does that mean we don't guarantee the consistent of priority? The priority can
> > be different on different NICs. So the behavior of the actions  can be different.
> > Right?
> > 
> > No, the intent is precisely to define what happens in order to get a consistent
> > result across different devices, and document cases with undefined behavior.
> > There must be no room left for interpretation.
> > 
> > For example, the API must describe what happens when two overlapping filters
> > (e.g. one matching an Ethernet header, another one matching an IP header)
> > match a given packet at a given priority level.
> > 
> > It is documented in section 4.1.1 (priorities) as "undefined behavior".
> > Applications remain free to do it and deal with consequences, at least they know
> > they cannot expect a consistent outcome, unless they use different priority
> > levels for both rules, see also 4.4.5 (flow rules priority).
> > 
> > > Seems the users still need to aware the some details of the HW? Do we need
> > to add the negotiation for the priority?
> > 
> > Priorities as defined in this document may not be directly mappable to HW
> > capabilities (e.g. HW does not support enough priorities, or that some corner
> > case make them not work as described), in which case the PMD may choose to
> > simulate priorities (again 4.4.5), as long as the end result follows the
> > specification.
> > 
> > So users must not be aware of some HW details, the PMD does and must
> > perform the needed workarounds to suit their expectations. Users may only be
> > impacted by errors while attempting to create rules that are either unsupported
> > or would cause them (or existing rules) to diverge from the spec.
> The problem is sometimes the priority of the filters is fixed according to
> HW's implementation. For example, on ixgbe, n-tuple has a higher
> priority than flow director.

As a side note I did not know that N-tuple had a higher priority than flow
director on ixgbe, priorities among filter types do not seem to be
documented at all in DPDK. This is one of the reasons I think we need a
generic API to handle flow configuration.

So, today an application cannot combine N-tuple and FDIR flow rules and get
a reliable outcome, unless it is designed for specific devices with a known
behavior.

> What's the right behavior of the PMD if the APP wants to create a flow director rule with a priority higher than or equal to that of an existing n-tuple rule? Should the PMD return fail? 

First remember applications only deal with the generic API, PMDs are
responsible for choosing the most appropriate HW implementation to use
according to the requested flow rules (FDIR, N-tuple or anything else).

For the specific case of FDIR vs N-tuple, if the underlying HW supports both
I do not see why the PMD would create a N-tuple rule. Doesn't FDIR support
everything N-tuple can do and much more?

Assuming such a thing happened anyway, that the PMD had to create a rule
using a high priority filter type and that the application requests the
creation of a rule that can only be done using a lower priority filter type,
but also requested a higher priority for that rule, then yes, it should
obviously fail.

That is, unless the PMD can perform some kind of workaround to have both.

> If so, do we need more failure reasons? According to this RFC, I think we need to return "EEXIST: collision with an existing rule.", but it's not very clear; the APP doesn't know the problem is the priority, so maybe a more detailed reason would be helpful.

Possibly, I've defined a basic set of errors, there are quite a number of
errno values to choose from. However I think we should not define too many
values. In my opinion the basic set covers every possible failure:

- EINVAL: invalid format, rule is broken or cannot be understood by the PMD
  anyhow.

- ENOTSUP: pattern/actions look fine but something in the requested rule is
  not supported and thus cannot be applied.

- EEXIST: pattern/actions are fine and could have been applied if only some
  other rule did not prevent the PMD to do it (I see it as the closest thing
  to "ETOOBAD" which unfortunately does not exist).

- ENOMEM: like EEXIST, except it is due to the lack of resources not because
  of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
  settled on ENOMEM as it is well known. Still open to debate.

Errno values are only useful to get a rough idea of the reason, and another
mechanism is needed to pinpoint the exact problem for debugging/reporting
purposes, something like:

 enum rte_flow_error_type {
     RTE_FLOW_ERROR_TYPE_NONE,
     RTE_FLOW_ERROR_TYPE_UNKNOWN,
     RTE_FLOW_ERROR_TYPE_PRIORITY,
     RTE_FLOW_ERROR_TYPE_PATTERN,
     RTE_FLOW_ERROR_TYPE_ACTION,
 };

 struct rte_flow_error {
     enum rte_flow_error_type type;
     void *offset; /* Points to the exact pattern item or action. */
     const char *message;
 };

Then either provide an optional struct rte_flow_error pointer to
rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
processing this may be quite expensive and applications may not care about
the exact reason.

What do you suggest?
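
To make this concrete, here is a hypothetical consumer, assuming
rte_flow_validate() gained the optional error pointer (none of this
exists yet, names taken from the snippet above):

 struct rte_flow_error err = { RTE_FLOW_ERROR_TYPE_NONE, NULL, NULL };

 if (rte_flow_validate(port_id, &pattern, &actions, &err) < 0) {
         switch (err.type) {
         case RTE_FLOW_ERROR_TYPE_PRIORITY:
                 /* e.g. the FDIR-vs-N-tuple conflict described above */
                 report("bad priority: %s", err.message);
                 break;
         case RTE_FLOW_ERROR_TYPE_PATTERN:
         case RTE_FLOW_ERROR_TYPE_ACTION:
                 report("bad item/action at %p: %s", err.offset,
                        err.message);
                 break;
         default:
                 report("unspecified error: %s", err.message);
         }
 }

(report() being whatever logging facility the application uses.)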

> > > > Behavior
> > > > --------
> > > >
> > > > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> > > >   returned).
> > > >
> > > > - There is no provision for reentrancy/multi-thread safety, although nothing
> > > >   should prevent different devices from being configured at the same
> > > >   time. PMDs may protect their control path functions accordingly.
> > > >
> > > > - Stopping the data path (TX/RX) should not be necessary when managing
> > flow
> > > >   rules. If this cannot be achieved naturally or with workarounds (such as
> > > >   temporarily replacing the burst function pointers), an appropriate error
> > > >   code must be returned (``EBUSY``).
> > > PMD cannot stop the data path without adding lock. So I think if some rules
> > cannot be applied without stopping rx/tx, PMD has to return fail.
> > > Or let the APP to stop the data path.
> > 
> > Agreed, that is the intent. If the PMD cannot touch flow rules for some reason
> > even after trying really hard, then it just returns EBUSY.
> > 
> > Perhaps we should write down that applications may get a different outcome
> > after stopping the data path if they get EBUSY?
> Agreed, it's better to describe the application side in more detail. BTW, I checked the behavior of ixgbe/igb; I think we can add/delete filters at runtime. Hopefully we won't hit too many EBUSY problems on other NICs :)

OK, I will add it.

> > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > >   configuration when stopping and restarting a port or performing other
> > > >   actions which may affect them. They can only be destroyed explicitly.
> > > Don’t understand " They can only be destroyed explicitly."
> > 
> > This part says that as long as an application has not called
> > rte_flow_destroy() on a flow rule, it never disappears, whatever happens to the
> > port (stopped, restarted). The application is not responsible for re-creating rules
> > after that.
> > 
> > Note that according to the specification, this may translate to not being able to
> > stop a port as long as a flow rule is present, depending on how nice the PMD
> > intends to be with applications. Implementation can be done in small steps with
> > minimal amount of code on the PMD side.
> Does it mean the PMD should store and maintain all the rules? Why not let rte do that? I think if PMDs maintain all the rules, it means every kind of NIC has to have its own copy of the rule-handling code. But if rte does that, only one copy of the code needs to be maintained, right?

I've considered having rules stored in a common format understood at the RTE
level and not specific to each PMD and decided that the opaque rte_flow
pointer was a better choice for the following reasons: 

- Even though flow rules management is done in the control path, processing
  must be as fast as possible. Letting PMDs store flow rules using their own
  internal representation gives them the chance to achieve better
  performance.

- An opaque context managed by PMDs would probably have to be stored
  somewhere as well anyway.

- PMDs may not need to allocate/store anything at all if they exclusively
  rely on HW state for everything. In my opinion, the generic API has enough
  constraints for this to work and maintain consistency between flow
  rules. Note this is currently how most PMDs implement FDIR and other
  filter types.

- RTE can (and will) provide helpers to avoid most of the code redundancy,
  PMDs are free to use them or manage everything by themselves.

- Given that the opaque rte_flow pointer associated with a flow rule is to
  be stored by the application, PMDs do not even have to keep references to
  them.

- The flow rules format described in this specification (pattern / actions)
  will be used by applications directly, and will be free to arrange them in
  lists, trees or in any other way if they need to keep flow specifications
  around for further processing.

> When the port is stopped and restarted, rte can reconfigure the rules. Is the concern that the PMD may adjust the sequence of the rules according to the priority, so every NIC has a different list of rules? But the PMD can adjust them again when rte reconfigures the rules.

What about PMDs able to stop and restart ports without destroying their own
flow rules? If we assume flow rules must be destroyed when stopping a port,
these PMDs are needlessly penalized with slower stop/start cycles. Think
about it assuming thousands of flow rules.

Thus from an application point of view, whatever happens when stopping and
restarting a port should not matter. If a flow rule was present before, it
must still be present afterwards. If the PMD had to destroy flow rules and
re-create them, it does not actually matter if they differ slightly at the
HW level, as long as:

- Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
  and refer to the same rules.

- The overall behavior of all rules is the same.

The list of rules you think of (patterns / actions) is maintained by
applications (not RTE), and only if they need them. RTE would needlessly
duplicate this.
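
So on the application side the bookkeeping stays trivial.
Hypothetically, with signatures following the draft header (still
subject to change):

 /* The application only keeps the opaque handle; whatever the PMD
  * stored internally (or pushed straight to HW) is its own business. */
 struct rte_flow *flow;

 flow = rte_flow_create(port_id, &pattern, &actions);
 if (flow == NULL)
         handle_error(); /* hypothetical helper, see the error
                          * reporting discussion above */
 /* ... any number of port stop/start cycles later: */
 rte_flow_destroy(port_id, flow);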

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-07 10:26  2%   ` Adrien Mazarguil
@ 2016-07-19  8:11  0%     ` Lu, Wenzhuo
  2016-07-19 13:12  0%       ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Lu, Wenzhuo @ 2016-07-19  8:11 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

Hi Adrien,
Thanks for your clarification. Most of my questions are clear, but some points still need to be discussed; comments below.


> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Thursday, July 7, 2016 6:27 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org; Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody;
> Ajit Khaparde; Rahul Lakkireddy; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> Guarch, Pablo; Olga Shern
> Subject: Re: [RFC] Generic flow director/filtering/classification API
> 
> Hi Lu Wenzhuo,
> 
> Thanks for your feedback, I'm replying below as well.
> 
> On Thu, Jul 07, 2016 at 07:14:18AM +0000, Lu, Wenzhuo wrote:
> > Hi Adrien,
> > I have some questions, please see inline, thanks.
> >
> > > -----Original Message-----
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > > Sent: Wednesday, July 6, 2016 2:17 AM
> > > To: dev@dpdk.org
> > > Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody; Ajit
> > > Khaparde; Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley;
> > > Chen, Jing D; Ananyev, Konstantin; Matej Vido; Alejandro Lucero;
> > > Sony Chacko; Jerin Jacob; De Lara Guarch, Pablo; Olga Shern
> > > Subject: [RFC] Generic flow director/filtering/classification API
> > >
> > >
> > > Requirements for a new API:
> > >
> > > - Flexible and extensible without causing API/ABI problems for existing
> > >   applications.
> > > - Should be unambiguous and easy to use.
> > > - Support existing filtering features and actions listed in `Filter types`_.
> > > - Support packet alteration.
> > > - In case of overlapping filters, their priority should be well documented.
> > Does that mean we don't guarantee the consistent of priority? The priority can
> be different on different NICs. So the behavior of the actions  can be different.
> Right?
> 
> No, the intent is precisely to define what happens in order to get a consistent
> result across different devices, and document cases with undefined behavior.
> There must be no room left for interpretation.
> 
> For example, the API must describe what happens when two overlapping filters
> (e.g. one matching an Ethernet header, another one matching an IP header)
> match a given packet at a given priority level.
> 
> It is documented in section 4.1.1 (priorities) as "undefined behavior".
> Applications remain free to do it and deal with consequences, at least they know
> they cannot expect a consistent outcome, unless they use different priority
> levels for both rules, see also 4.4.5 (flow rules priority).
> 
> > Seems the users still need to aware the some details of the HW? Do we need
> to add the negotiation for the priority?
> 
> Priorities as defined in this document may not be directly mappable to HW
> capabilities (e.g. HW does not support enough priorities, or that some corner
> case make them not work as described), in which case the PMD may choose to
> simulate priorities (again 4.4.5), as long as the end result follows the
> specification.
> 
> So users must not be aware of some HW details, the PMD does and must
> perform the needed workarounds to suit their expectations. Users may only be
> impacted by errors while attempting to create rules that are either unsupported
> or would cause them (or existing rules) to diverge from the spec.
The problem is sometimes the priority of the filters is fixed according to HW's implementation. For example, on ixgbe, n-tuple has a higher priority than flow director. What's the right behavior of the PMD if the APP wants to create a flow director rule with a priority higher than or equal to that of an existing n-tuple rule? Should the PMD return fail? 
If so, do we need more failure reasons? According to this RFC, I think we need to return "EEXIST: collision with an existing rule.", but it's not very clear; the APP doesn't know the problem is the priority, so maybe a more detailed reason would be helpful.


> > > Behavior
> > > --------
> > >
> > > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> > >   returned).
> > >
> > > - There is no provision for reentrancy/multi-thread safety, although nothing
> > >   should prevent different devices from being configured at the same
> > >   time. PMDs may protect their control path functions accordingly.
> > >
> > > - Stopping the data path (TX/RX) should not be necessary when managing
> flow
> > >   rules. If this cannot be achieved naturally or with workarounds (such as
> > >   temporarily replacing the burst function pointers), an appropriate error
> > >   code must be returned (``EBUSY``).
> > PMD cannot stop the data path without adding lock. So I think if some rules
> cannot be applied without stopping rx/tx, PMD has to return fail.
> > Or let the APP to stop the data path.
> 
> Agreed, that is the intent. If the PMD cannot touch flow rules for some reason
> even after trying really hard, then it just returns EBUSY.
> 
> Perhaps we should write down that applications may get a different outcome
> after stopping the data path if they get EBUSY?
Agreed, it's better to describe the application side in more detail. BTW, I checked the behavior of ixgbe/igb; I think we can add/delete filters at runtime. Hopefully we won't hit too many EBUSY problems on other NICs :)

> 
> > > - PMDs, not applications, are responsible for maintaining flow rules
> > >   configuration when stopping and restarting a port or performing other
> > >   actions which may affect them. They can only be destroyed explicitly.
> > Don’t understand " They can only be destroyed explicitly."
> 
> This part says that as long as an application has not called
> rte_flow_destroy() on a flow rule, it never disappears, whatever happens to the
> port (stopped, restarted). The application is not responsible for re-creating rules
> after that.
> 
> Note that according to the specification, this may translate to not being able to
> stop a port as long as a flow rule is present, depending on how nice the PMD
> intends to be with applications. Implementation can be done in small steps with
> minimal amount of code on the PMD side.
Does it mean the PMD should store and maintain all the rules? Why not let rte do that? I think if PMDs maintain all the rules, it means every kind of NIC has to have its own copy of the rule-handling code. But if rte does that, only one copy of the code needs to be maintained, right?
When the port is stopped and restarted, rte can reconfigure the rules. Is the concern that the PMD may adjust the sequence of the rules according to the priority, so every NIC has a different list of rules? But the PMD can adjust them again when rte reconfigures the rules.

> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 04/17] eal: remove duplicate function declaration
  2016-07-12  6:01  3%   ` [dpdk-dev] [PATCH v6 04/17] eal: remove duplicate function declaration Shreyansh Jain
@ 2016-07-14 17:13  0%     ` viktorin
  0 siblings, 0 replies; 200+ results
From: viktorin @ 2016-07-14 17:13 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, thomas.monjalon, david.marchand

On Tue, 12 Jul 2016 11:31:09 +0530
Shreyansh Jain <shreyansh.jain@nxp.com> wrote:

> rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
> introduction.
> This function has been exported in ABI, so remove it from eal_private.h
> 
> Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
Reviewed-by: Jan Viktorin <viktorin@rehivetech.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ether: Driver-specific stats getting overwritten
  @ 2016-07-14 15:50  3%     ` Remy Horton
  0 siblings, 0 replies; 200+ results
From: Remy Horton @ 2016-07-14 15:50 UTC (permalink / raw)
  To: Igor Ryzhov, Thomas Monjalon; +Cc: dev


On 14/07/2016 14:51, Igor Ryzhov wrote:
[..]
> How about deleting rx_nombuf from rte_eth_stats? Do you think this
> counter is necessary? It just shows enormous numbers in case of a
> lack of processing speed. But we already have imissed counter which
> shows real number of packets, dropped for the same reason.

Deleting it has API/ABI breakage issues. There is also lack of 
consistency between drivers as to what imissed includes, as some don't 
implement it at all whereas others include filtered packets as well.


>> On 14 July 2016, at 16:37, Thomas Monjalon
>> <thomas.monjalon@6wind.com> wrote:
>>
[..]
>> Yes it is strange and has always been like that. Why not moving the
>> assignment before calling the driver callback?

Think I'll do that. Easier than updating all the drivers that don't fill 
it in..
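
A sketch of that reordering in rte_eth_stats_get(), simplified (port
validation omitted, and the internal field names are assumed from the
16.07 code base):

 int
 rte_eth_stats_get(uint8_t port_id, struct rte_eth_stats *stats)
 {
         struct rte_eth_dev *dev = &rte_eth_devices[port_id];

         memset(stats, 0, sizeof(*stats));
         /* Fill the generic counter before the callback, so a driver
          * that also writes rx_nombuf overrides it instead of being
          * overwritten. */
         stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
         (*dev->dev_ops->stats_get)(dev, stats);
         return 0;
 }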

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 04/17] eal: remove duplicate function declaration
  @ 2016-07-12  6:01  3%   ` Shreyansh Jain
  2016-07-14 17:13  0%     ` viktorin
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-07-12  6:01 UTC (permalink / raw)
  To: dev; +Cc: viktorin, thomas.monjalon, david.marchand

rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
introduction.
This function has been exported in ABI, so remove it from eal_private.h

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 lib/librte_eal/common/eal_private.h | 7 -------
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 857dc3e..06a68f6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -259,13 +259,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);
 
 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 3fb2188..fe9c704 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include <rte_cpuflags.h>
 #include <rte_interrupts.h>
 #include <rte_pci.h>
+#include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_common.h>
 #include <rte_version.h>
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] librte_pmd_bond: fix exported symbol versioning
  2016-07-11 11:27  3% ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
@ 2016-07-11 12:58  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-11 12:58 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: Eric Kinzie, dev

2016-07-11 13:27, Christian Ehrhardt:
> *update in v2*
> - add missing changes in rte_eth_bond_8023ad.h
> 
> The old versions of rte_eth_bond_8023ad_conf_get and
> rte_eth_bond_8023ad_setup have been part of the ABI since release 2.0 - at
> least according to the map file.
> 
> But versioning in the code was set to 16.04.
> That breaks compatibility checks for 2.0 on that library.
> 
> For example with the dpdk abi checker:
> http://people.canonical.com/~paelzer/compat_report.html
> 
> To fix, version the old symbols on the 2.0 version as they were
> initially added to the map file.
> 
> See http://people.canonical.com/~paelzer/compat_report.html
> 
> Fixes: dc40f17a ("net/bonding: allow external state machine in mode 4")
> 
> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>

Applied, thanks

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] librte_pmd_bond: fix exported symbol versioning
  2016-07-06 11:39  3% [dpdk-dev] [PATCH] librte_pmd_bond: fix exported symbol versioning Christian Ehrhardt
@ 2016-07-11 11:27  3% ` Christian Ehrhardt
  2016-07-11 12:58  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2016-07-11 11:27 UTC (permalink / raw)
  To: Eric Kinzie, christian.ehrhardt, thomas.monjalon, dev

*update in v2*
- add missing changes in rte_eth_bond_8023ad.h

The old versions of rte_eth_bond_8023ad_conf_get and
rte_eth_bond_8023ad_setup have been part of the ABI since release 2.0 - at
least according to the map file.

But versioning in the code was set to 16.04.
That breaks compatibility checks for 2.0 on that library.

For example with the dpdk abi checker:
http://people.canonical.com/~paelzer/compat_report.html

To fix, version the old symbols on the 2.0 version as they were
initially added to the map file.

See http://people.canonical.com/~paelzer/compat_report.html

Fixes: dc40f17a ("net/bonding: allow external state machine in mode 4")

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 drivers/net/bonding/rte_eth_bond_8023ad.c | 12 ++++++------
 drivers/net/bonding/rte_eth_bond_8023ad.h |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
index 48a50e4..2f7ae70 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -1068,7 +1068,7 @@ bond_mode_8023ad_conf_assign(struct mode8023ad_private *mode4,
 }
 
 static void
-bond_mode_8023ad_setup_v1604(struct rte_eth_dev *dev,
+bond_mode_8023ad_setup_v20(struct rte_eth_dev *dev,
 		struct rte_eth_bond_8023ad_conf *conf)
 {
 	struct rte_eth_bond_8023ad_conf def_conf;
@@ -1214,7 +1214,7 @@ free_out:
 }
 
 int
-rte_eth_bond_8023ad_conf_get_v1604(uint8_t port_id,
+rte_eth_bond_8023ad_conf_get_v20(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf)
 {
 	struct rte_eth_dev *bond_dev;
@@ -1229,7 +1229,7 @@ rte_eth_bond_8023ad_conf_get_v1604(uint8_t port_id,
 	bond_mode_8023ad_conf_get(bond_dev, conf);
 	return 0;
 }
-VERSION_SYMBOL(rte_eth_bond_8023ad_conf_get, _v1604, 16.04);
+VERSION_SYMBOL(rte_eth_bond_8023ad_conf_get, _v20, 2.0);
 
 int
 rte_eth_bond_8023ad_conf_get_v1607(uint8_t port_id,
@@ -1278,7 +1278,7 @@ bond_8023ad_setup_validate(uint8_t port_id,
 }
 
 int
-rte_eth_bond_8023ad_setup_v1604(uint8_t port_id,
+rte_eth_bond_8023ad_setup_v20(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf)
 {
 	struct rte_eth_dev *bond_dev;
@@ -1289,11 +1289,11 @@ rte_eth_bond_8023ad_setup_v1604(uint8_t port_id,
 		return err;
 
 	bond_dev = &rte_eth_devices[port_id];
-	bond_mode_8023ad_setup_v1604(bond_dev, conf);
+	bond_mode_8023ad_setup_v20(bond_dev, conf);
 
 	return 0;
 }
-VERSION_SYMBOL(rte_eth_bond_8023ad_setup, _v1604, 16.04);
+VERSION_SYMBOL(rte_eth_bond_8023ad_setup, _v20, 2.0);
 
 int
 rte_eth_bond_8023ad_setup_v1607(uint8_t port_id,
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h
index 1de34bc..6b8ff57 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.h
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.h
@@ -188,7 +188,7 @@ int
 rte_eth_bond_8023ad_conf_get(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf);
 int
-rte_eth_bond_8023ad_conf_get_v1604(uint8_t port_id,
+rte_eth_bond_8023ad_conf_get_v20(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf);
 int
 rte_eth_bond_8023ad_conf_get_v1607(uint8_t port_id,
@@ -209,7 +209,7 @@ int
 rte_eth_bond_8023ad_setup(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf);
 int
-rte_eth_bond_8023ad_setup_v1604(uint8_t port_id,
+rte_eth_bond_8023ad_setup_v20(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf);
 int
 rte_eth_bond_8023ad_setup_v1607(uint8_t port_id,
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] cryptodev: move new cryptodev type to bottom of enum
  2016-07-06 14:05  3% [dpdk-dev] [PATCH] cryptodev: move new cryptodev type to bottom of enum Pablo de Lara
@ 2016-07-08 17:52  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-08 17:52 UTC (permalink / raw)
  To: Pablo de Lara; +Cc: dev, declan.doherty

2016-07-06 15:05, Pablo de Lara:
> New cryptodev type for the new KASUMI PMD was added
> in the cryptodev type enum, but not at the end of it,
> causing an ABI breakage.
> 
> Fixes: 2773c86d061a ("crypto/kasumi: add driver for KASUMI library")
> 
> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied, thanks
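
The general rule the fix follows: inserting a new enumerator anywhere
but at the end of a public enum renumbers every value after it, which
silently changes constants already compiled into applications. A tiny
illustration (not the actual cryptodev enum):

 /* Before: existing binaries were built with QAT == 2. */
 enum dev_type_old  { DEV_NULL, DEV_AESNI, DEV_QAT };

 /* Broken insert: old binaries still pass 2, which now means KASUMI. */
 enum dev_type_bad  { B_NULL, B_AESNI, B_KASUMI, B_QAT };

 /* ABI-safe append: every pre-existing value keeps its meaning. */
 enum dev_type_good { G_NULL, G_AESNI, G_QAT, G_KASUMI };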

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 10/10] maintainers: add section for pmdinfo
  @ 2016-07-08 14:42  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-08 14:42 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

The author of this feature is Neil Horman.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
---
 MAINTAINERS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a59191e..f996c2e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -68,6 +68,10 @@ F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: scripts/validate-abi.sh
 
+Driver information
+F: buildtools/pmdinfogen/
+F: tools/pmdinfo.py
+
 
 Environment Abstraction Layer
 -----------------------------
-- 
2.7.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-07 23:15  0% ` Chandran, Sugesh
@ 2016-07-08 13:03  0%   ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-07-08 13:03 UTC (permalink / raw)
  To: Chandran, Sugesh
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Lu, Wenzhuo, Jan Medala,
	John Daley, Chen, Jing D, Ananyev, Konstantin, Matej Vido,
	Alejandro Lucero, Sony Chacko, Jerin Jacob, De Lara Guarch,
	Pablo, Olga Shern

Hi Sugesh,

On Thu, Jul 07, 2016 at 11:15:07PM +0000, Chandran, Sugesh wrote:
> Hi Adrien,
> 
> Thank you for proposing this. It would be really useful for applications such as OVS-DPDK.
> Please find my comments and questions inline below, prefixed with [Sugesh]. Most of them are from the perspective of enabling these APIs in applications such as OVS-DPDK.

Thanks, I'm replying below.

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Tuesday, July 5, 2016 7:17 PM
> > To: dev@dpdk.org
> > Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Zhang, Helin
> > <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Rasesh
> > Mody <rasesh.mody@qlogic.com>; Ajit Khaparde
> > <ajit.khaparde@broadcom.com>; Rahul Lakkireddy
> > <rahul.lakkireddy@chelsio.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>;
> > Jan Medala <jan@semihalf.com>; John Daley <johndale@cisco.com>; Chen,
> > Jing D <jing.d.chen@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Matej Vido <matejvido@gmail.com>;
> > Alejandro Lucero <alejandro.lucero@netronome.com>; Sony Chacko
> > <sony.chacko@qlogic.com>; Jerin Jacob
> > <jerin.jacob@caviumnetworks.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>; Olga Shern <olgas@mellanox.com>
> > Subject: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
> > 
> > Hi All,
> > 
> > First, forgive me for this large message, I know our mailboxes already
> > suffer quite a bit from the amount of traffic on this ML.
> > 
> > This is not exactly yet another thread about how flow director should be
> > extended, rather about a brand new API to handle filtering and
> > classification for incoming packets in the most PMD-generic and
> > application-friendly fashion we can come up with. Reasons described below.
> > 
> > I think this topic is important enough to include both the users of this API
> > as well as PMD maintainers. So far I have CC'ed librte_ether (especially
> > rte_eth_ctrl.h contributors), testpmd and PMD maintainers (with and
> > without
> > a .filter_ctrl implementation), but if you know application maintainers
> > other than testpmd who use FDIR or might be interested in this discussion,
> > feel free to add them.
> > 
> > The issues we found with the current approach are already summarized in
> > the
> > following document, but here is a quick summary for TL;DR folks:
> > 
> > - PMDs do not expose a common set of filter types and even when they do,
> >   their behavior more or less differs.
> > 
> > - Applications need to determine and adapt to device-specific limitations
> >   and quirks on their own, without help from PMDs.
> > 
> > - Writing an application that creates flow rules targeting all devices
> >   supported by DPDK is thus difficult, if not impossible.
> > 
> > - The current API has too many unspecified areas (particularly regarding
> >   side effects of flow rules) that make PMD implementation tricky.
> > 
> > This RFC API handles everything currently supported by .filter_ctrl, the
> > idea being to reimplement all of these to make them fully usable by
> > applications in a more generic and well defined fashion. It has a very small
> > set of mandatory features and an easy method to let applications probe for
> > supported capabilities.
> > 
> > The only downside is more work for the software control side of PMDs
> > because
> > they have to adapt to the API instead of the reverse. I think helpers can be
> > added to EAL to assist with this.
> > 
> > HTML version:
> > 
> >  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
> > 
> > PDF version:
> > 
> >  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
> > 
> > Related draft header file (for reference while reading the specification):
> > 
> >  https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
> > 
> > Git tree for completeness (latest .rst version can be retrieved from here):
> > 
> >  https://github.com/6WIND/rte_flow
> > 
> > What follows is the ReST source of the above, for inline comments and
> > discussion. I intend to update that specification accordingly.
> > 
> > ========================
> > Generic filter interface
> > ========================
> > 
> > .. footer::
> > 
> >    v0.6
> > 
> > .. contents::
> > .. sectnum::
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Overview
> > ========
> > 
> > DPDK provides several competing interfaces added over time to perform
> > packet
> > matching and related actions such as filtering and classification.
> > 
> > They must be extended to implement the features supported by newer
> > devices
> > in order to expose them to applications, however the current design has
> > several drawbacks:
> > 
> > - Complicated filter combinations which have not been hard-coded cannot be
> >   expressed.
> > - Prone to API/ABI breakage when new features must be added to an
> > existing
> >   filter type, which frequently happens.
> > 
> > From an application point of view:
> > 
> > - Having disparate interfaces, all optional and lacking in features does not
> >   make this API easy to use.
> > - Seemingly arbitrary built-in limitations of filter types based on the
> >   device they were initially designed for.
> > - Undefined relationship between different filter types.
> > - High complexity, considerable undocumented and/or undefined behavior.
> > 
> > Considering the growing number of devices supported by DPDK, adding a
> > new
> > filter type each time a new feature must be implemented is not sustainable
> > in the long term. Applications not written to target a specific device
> > cannot really benefit from such an API.
> > 
> > For these reasons, this document defines an extensible unified API that
> > encompasses and supersedes these legacy filter types.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Current API
> > ===========
> > 
> > Rationale
> > ---------
> > 
> > The reason several competing (and mostly overlapping) filtering APIs are
> > present in DPDK is due to its nature as a thin layer between hardware and
> > software.
> > 
> > Each subsequent interface has been added to better match the capabilities
> > and limitations of the latest supported device, which usually happened to
> > need an incompatible configuration approach. Because of this, many ended
> > up
> > device-centric and not usable by applications that were not written for that
> > particular device.
> > 
> > This document is not the first attempt to address this proliferation issue,
> > in fact a lot of work has already been done both to create a more generic
> > interface while somewhat keeping compatibility with legacy ones through a
> > common call interface (``rte_eth_dev_filter_ctrl()`` with the
> > ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).
> > 
> > Today, these previously incompatible interfaces are known as filter types
> > (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).
> > 
> > However while trivial to extend with new types, it only shifted the
> > underlying problem as applications still need to be written for one kind of
> > filter type, which, as described in the following sections, is not
> > necessarily implemented by all PMDs that support filtering.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Filter types
> > ------------
> > 
> > This section summarizes the capabilities of each filter type.
> > 
> > Although the following list is exhaustive, the description of individual
> > types may contain inaccuracies due to the lack of documentation or usage
> > examples.
> > 
> > Note: names are prefixed with ``RTE_ETH_FILTER_``.
> > 
> > ``MACVLAN``
> > ~~~~~~~~~~~
> > 
> > Matching:
> > 
> > - L2 source/destination addresses.
> > - Optional 802.1Q VLAN ID.
> > - Masking individual fields on a rule basis is not supported.
> > 
> > Action:
> > 
> > - Packets are redirected either to a given VF device using its ID or to the
> >   PF.
> > 
> > ``ETHERTYPE``
> > ~~~~~~~~~~~~~
> > 
> > Matching:
> > 
> > - L2 source/destination addresses (optional).
> > - Ethertype (no VLAN ID?).
> > - Masking individual fields on a rule basis is not supported.
> > 
> > Action:
> > 
> > - Receive packets on a given queue.
> > - Drop packets.
> > 
> > ``FLEXIBLE``
> > ~~~~~~~~~~~~
> > 
> > Matching:
> > 
> > - At most 128 consecutive bytes anywhere in packets.
> > - Masking is supported with byte granularity.
> > - Priorities are supported (relative to this filter type, undefined
> >   otherwise).
> > 
> > Action:
> > 
> > - Receive packets on a given queue.
> > 
> > ``SYN``
> > ~~~~~~~
> > 
> > Matching:
> > 
> > - TCP SYN packets only.
> > - One high priority bit can be set to give the highest possible priority to
> >   this type when other filters with different types are configured.
> > 
> > Action:
> > 
> > - Receive packets on a given queue.
> > 
> > ``NTUPLE``
> > ~~~~~~~~~~
> > 
> > Matching:
> > 
> > - Source/destination IPv4 addresses (optional in 2-tuple mode).
> > - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
> > - L4 protocol (2 and 5-tuple modes).
> > - Masking individual fields is supported.
> > - TCP flags.
> > - Up to 7 levels of priority relative to this filter type, undefined
> >   otherwise.
> > - No IPv6.
> > 
> > Action:
> > 
> > - Receive packets on a given queue.
> > 
> > ``TUNNEL``
> > ~~~~~~~~~~
> > 
> > Matching:
> > 
> > - Outer L2 source/destination addresses.
> > - Inner L2 source/destination addresses.
> > - Inner VLAN ID.
> > - IPv4/IPv6 source (destination?) address.
> > - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
> > 802.1BR
> >   E-Tag).
> > - Tenant ID for tunneling protocols that have one.
> > - Any combination of the above can be specified.
> > - Masking individual fields on a rule basis is not supported.
> > 
> > Action:
> > 
> > - Receive packets on a given queue.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``FDIR``
> > ~~~~~~~~
> > 
> > Queries:
> > 
> > - Device capabilities and limitations.
> > - Device statistics about configured filters (resource usage, collisions).
> > - Device configuration (matching input set and masks)
> > 
> > Matching:
> > 
> > - Device mode of operation: none (to disable filtering), signature
> >   (hash-based dispatching from masked fields) or perfect (either MAC VLAN
> >   or tunnel).
> > - L2 Ethertype.
> > - Outer L2 destination address (MAC VLAN mode).
> > - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
> >   (tunnel mode).
> > - IPv4 source/destination addresses, ToS, TTL and protocol fields.
> > - IPv6 source/destination addresses, TC, protocol and hop limits fields.
> > - UDP source/destination IPv4/IPv6 and ports.
> > - TCP source/destination IPv4/IPv6 and ports.
> > - SCTP source/destination IPv4/IPv6, ports and verification tag field.
> > - Note that only one protocol type can be matched at once (either only L2
> >   Ethertype, basic IPv6, IPv4+UDP, IPv4+TCP and so on).
> > - VLAN TCI (extended API).
> > - At most 16 bytes to match in payload (extended API). A global device
> >   look-up table specifies for each possible protocol layer (unknown, raw,
> >   L2, L3, L4) the offset to use for each byte (they do not need to be
> >   contiguous) and the related bitmask.
> > - Whether packet is addressed to PF or VF, in that case its ID can be
> >   matched as well (extended API).
> > - Masking most of the above fields is supported, but simultaneously affects
> >   all filters configured on a device.
> > - Input set can be modified in a similar fashion for a given device to
> >   ignore individual fields of filters (i.e. do not match the destination
> >   address in an IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
> >   macros). Configuring this also affects RSS processing on **i40e**.
> > - Filters can also provide 32 bits of arbitrary data to return as part of
> >   matched packets.
> > 
> > Action:
> > 
> > - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
> > - **RTE_ETH_FDIR_REJECT**: drop packet immediately.
> > - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
> >   otherwise process it with subsequent filters.
> > - For accepted packets and if requested by filter, either 32 bits of
> >   arbitrary data and four bytes of matched payload (only in case of flex
> >   bytes matching), or eight bytes of matched payload (flex also) are added
> >   to meta data.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``HASH``
> > ~~~~~~~~
> > 
> > Not an actual filter type. Provides and retrieves the global device
> > configuration (per port or entire NIC) for hash functions and their
> > properties.
> > 
> > Hash function selection: "default" (keep current), XOR or Toeplitz.
> > 
> > This function can be configured per flow type (**RTE_ETH_FLOW_**
> > definitions), supported types are:
> > 
> > - Unknown.
> > - Raw.
> > - Fragmented or non-fragmented IPv4.
> > - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
> > - Fragmented or non-fragmented IPv6.
> > - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
> > - L2 payload.
> > - IPv6 with extensions.
> > - IPv6 with L4 (TCP, UDP) and extensions.
> > 
> > ``L2_TUNNEL``
> > ~~~~~~~~~~~~~
> > 
> > Matching:
> > 
> > - All packets received on a given port.
> > 
> > Action:
> > 
> > - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
> >   802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
> >   is implemented at the moment).
> > - VF ID to use for tag insertion (currently unused).
> > - Destination pool for tag based forwarding (pools are IDs that can be
> >   affected to ports, duplication occurs if the same ID is shared by several
> >   ports of the same NIC).
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Driver support
> > --------------
> > 
> > ======== ======= ========= ======== === ====== ====== ==== ==== =========
> > Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
> > ======== ======= ========= ======== === ====== ====== ==== ==== =========
> > bnx2x
> > cxgbe
> > e1000            yes       yes      yes yes
> > ena
> > enic                                                  yes
> > fm10k
> > i40e     yes     yes                           yes    yes  yes
> > ixgbe            yes                yes yes           yes       yes
> > mlx4
> > mlx5                                                  yes
> > szedata2
> > ======== ======= ========= ======== === ====== ====== ==== ==== =========
> > 
> > Flow director
> > -------------
> > 
> > Flow director (FDIR) is the name of the most capable filter type, which
> > covers most features offered by others. As such, it is the most widespread
> > in PMDs that support filtering (i.e. all of them besides **e1000**).
> > 
> > It is also the only type that allows an arbitrary 32 bits value provided by
> > applications to be attached to a filter and returned with matching packets
> > instead of relying on the destination queue to recognize flows.
> > 
> > Unfortunately, even FDIR requires applications to be aware of low-level
> > capabilities and limitations (most of which come directly from **ixgbe** and
> > **i40e**):
> > 
> > - Bitmasks are set globally per device (port?), not per filter.
> [Sugesh] This means the application cannot define filters that match on arbitrarily different offsets?
> If that’s the case, I assume the application has to program the bitmask in advance. Otherwise how does
> the API framework deduce this bitmask information from the rules? It's not very clear to me
> how the application passes down the bitmask information for multiple filters on the same port.

This is my understanding of how flow director currently works; perhaps
someone more familiar with it can answer this question better than I could.

Let me take an example: if a particular device can only handle a single IPv4
mask common to all flow rules (say, only to match destination addresses),
updating that mask to also match the source address affects all defined and
future flow rules simultaneously.

That is how FDIR currently works and I think it is wrong, as it penalizes
devices that do support individual bit-masks per rule, and is a little
awkward from an application point of view.

What I suggest for the new API instead is the ability to specify one
bit-mask per rule, and let the PMD deal with HW limitations by automatically
configuring global bitmasks from the first added rule, then refusing to add
subsequent rules if they specify a conflicting bit-mask. Existing rules
remain unaffected that way, and applications do not have to be extra
cautious.
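
To make this concrete, here is a minimal sketch of how a PMD could
reconcile per-rule masks with a single device-wide mask internally; the
pmd_state and ipv4_mask types below are purely illustrative, not part of
the proposal:

#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical PMD-internal state: the device supports one global
 * IPv4 bitmask shared by every flow rule. */
struct ipv4_mask {
        uint32_t src;
        uint32_t dst;
};

struct pmd_state {
        struct ipv4_mask dev_mask; /* currently programmed global mask */
        unsigned int rules;        /* number of active IPv4 rules */
};

static int
pmd_check_ipv4_mask(struct pmd_state *st, const struct ipv4_mask *mask)
{
        if (st->rules == 0) {
                /* First rule: derive the global mask from it. */
                st->dev_mask = *mask;
                return 0;
        }
        /* Subsequent rules must use the exact same mask. */
        if (memcmp(&st->dev_mask, mask, sizeof(*mask)) != 0)
                return -EEXIST; /* conflicting bit-mask, rule refused */
        return 0;
}

Existing rules keep working because the global mask is never modified
once rules depend on it.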

> > - Configuration state is not expected to be saved by the driver, and
> >   stopping/restarting a port requires the application to perform it again
> >   (API documentation is also unclear about this).
> > - Monolithic approach with ABI issues as soon as a new kind of flow or
> >   combination needs to be supported.
> > - Cryptic global statistics/counters.
> > - Unclear about how priorities are managed; filters seem to be arranged as a
> >   linked list in hardware (possibly related to configuration order).
> > 
> > Packet alteration
> > -----------------
> > 
> > One interesting feature is that the L2 tunnel filter type implements the
> > ability to alter incoming packets through a filter (in this case to
> > encapsulate them), thus the **mlx5** flow encap/decap features are not a
> > foreign concept.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Proposed API
> > ============
> > 
> > Terminology
> > -----------
> > 
> > - **Filtering API**: overall framework affecting the fate of selected
> >   packets, covers everything described in this document.
> > - **Matching pattern**: properties to look for in received packets, a
> >   combination of any number of items.
> > - **Pattern item**: part of a pattern that either matches packet data
> >   (protocol header, payload or derived information), or specifies properties
> >   of the pattern itself.
> > - **Actions**: what needs to be done when a packet matches a pattern.
> > - **Flow rule**: this is the result of combining a *matching pattern* with
> >   *actions*.
> > - **Filter rule**: a less generic term than *flow rule*, can otherwise be
> >   used interchangeably.
> > - **Hit**: a flow rule is said to be *hit* when processing a matching
> >   packet.
> > 
> > Requirements
> > ------------
> > 
> > As described in the previous section, there is a growing need for a common
> > method to configure filtering and related actions in a hardware independent
> > fashion.
> > 
> > The filtering API should not disallow any filter combination by design and
> > must remain as simple as possible to use. It can simply be defined as a
> > method to perform one or several actions on selected packets.
> > 
> > PMDs are aware of the capabilities of the device they manage and should be
> > responsible for preventing unsupported or conflicting combinations.
> > 
> > This approach is fundamentally different as it places most of the burden on
> > the software side of the PMD instead of having device capabilities directly
> > mapped to API functions, then expecting applications to work around
> > ensuing compatibility issues.
> > 
> > Requirements for a new API:
> > 
> > - Flexible and extensible without causing API/ABI problems for existing
> >   applications.
> > - Should be unambiguous and easy to use.
> > - Support existing filtering features and actions listed in `Filter types`_.
> > - Support packet alteration.
> > - In case of overlapping filters, their priority should be well documented.
> > - Support filter queries (for example to retrieve counters).
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > High level design
> > -----------------
> > 
> > The chosen approach to make filtering as generic as possible is by
> > expressing matching patterns through lists of items instead of the flat
> > structures used in DPDK today, enabling combinations that are not
> > predefined and thus being more versatile.
> > 
> > Flow rules can have several distinct actions (such as counting,
> > encapsulating, decapsulating before redirecting packets to a particular
> > queue, etc.), instead of relying on several rules to achieve this and having
> > applications deal with hardware implementation details regarding their
> > order.
> > 
> > Support for different priority levels on a rule basis is provided, for
> > example in order to force a more specific rule to come before a more
> > generic one for packets matched by both; however, hardware support for
> > more than a single priority level cannot be guaranteed. When supported,
> > the number of available priority levels is usually low, which is why they
> > can also be implemented in software by PMDs (e.g. to simulate missing
> > priority levels by reordering rules).
> > 
> > In order to remain as hardware agnostic as possible, by default all rules
> > are considered to have the same priority, which means that the order
> > between overlapping rules (when a packet is matched by several filters) is
> > undefined; packet duplication may even occur as a result.
> > 
> > PMDs may refuse to create overlapping rules at a given priority level when
> > they can be detected (e.g. if a pattern matches an existing filter).
> > 
> > Thus predictable results for a given priority level can only be achieved
> > with non-overlapping rules, using perfect matching on all protocol layers.
> > 
> > Support for multiple actions per rule may be implemented internally on top
> > of non-default hardware priorities, as a result both features may not be
> > simultaneously available to applications.
> > 
> > Considering that allowed pattern/actions combinations cannot be known in
> > advance and would result in an impractically large number of capabilities
> > to expose, a method is provided to validate a given rule from the current
> > device configuration state without actually adding it (akin to a "dry run"
> > mode).
> > 
> > This enables applications to check if the rule types they need are
> > supported at initialization time, before starting their data path. This
> > method can be used anytime, its only requirement being that the resources
> > needed by a rule must exist (e.g. a target RX queue must be configured
> > first).
> > 
> > Each defined rule is associated with an opaque handle managed by the PMD;
> > applications are responsible for keeping it. These can be used for queries
> > and rule management, such as retrieving counters or other data and
> > destroying them.
> > 
> > Handles must be destroyed before releasing associated resources such as
> > queues.
> > 
> > Integration
> > -----------
> > 
> > To avoid ABI breakage, this new interface will be implemented through the
> > existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
> > **RTE_ETH_FILTER_GENERIC** as a new filter type.
> > 
> > However a public front-end API described in `Rules management`_ will
> > be added as the preferred method to use it.
> > 
> > Once discussions with the community have converged to a definite API,
> > legacy filter types should be deprecated and a deadline defined to remove
> > their support entirely.
> > 
> > PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC**
> > or drop filtering support entirely. Less maintained PMDs for older
> > hardware may lose support at this point.
> > 
> > The notion of filter type will then be deprecated and subsequently dropped
> > to avoid confusion between both frameworks.
> > 
> > Implementation details
> > ======================
> > 
> > Flow rule
> > ---------
> > 
> > A flow rule is the combination of a matching pattern with a list of actions,
> > and is the basis of this API.
> > 
> > Priorities
> > ~~~~~~~~~~
> > 
> > A priority can be assigned to a matching pattern.
> > 
> > The default priority level is 0 and is also the highest. Support for more
> > than a single priority level in hardware is not guaranteed.
> > 
> > If a packet is matched by several filters at a given priority level, the
> > outcome is undefined. It can take any path and can even be duplicated.
> > 
> > Matching pattern
> > ~~~~~~~~~~~~~~~~
> > 
> > A matching pattern comprises any number of items of various types.
> > 
> > Items are arranged in a list to form a matching pattern for packets. They
> > fall into two categories:
> > 
> > - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, VXLAN
> >   and so on), usually associated with a specification structure. These
> >   must be stacked in the same order as the protocol layers to match,
> >   starting from L2.
> > 
> > - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF,
> >   SIGNATURE and so on), often without a specification structure. Since they
> >   are meta data that does not match packet contents, these can be specified
> >   anywhere within item lists without affecting the protocol matching items.
> > 
> > Most item specifications can be optionally paired with a mask to narrow the
> > specific fields or bits to be matched.
> > 
> > - Items are defined with ``struct rte_flow_item``.
> > - Patterns are defined with ``struct rte_flow_pattern``.
> > 
> > Example of an item specification matching an Ethernet header:
> > 
> > +-----------------------------------------+
> > | Ethernet                                |
> > +==========+=========+====================+
> > | ``spec`` | ``src`` | ``00:01:02:03:04`` |
> > |          +---------+--------------------+
> > |          | ``dst`` | ``00:2a:66:00:01`` |
> > +----------+---------+--------------------+
> > | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
> > |          +---------+--------------------+
> > |          | ``dst`` | ``00:00:00:00:ff`` |
> > +----------+---------+--------------------+
> > 
> > Non-masked bits stand for any value, Ethernet headers with the following
> > properties are thus matched:
> > 
> > - ``src``: ``??:01:02:03:??``
> > - ``dst``: ``??:??:??:??:01``
> > 
> > Except for meta types that do not need one, ``spec`` must be a valid pointer
> > to a structure of the related item type. A ``mask`` of the same type can be
> > provided to tell which bits in ``spec`` are to be matched.
> > 
> > A mask is normally only needed for ``spec`` fields matching packet data,
> > ignored otherwise. See individual item types for more information.
> > 
> > A ``NULL`` mask pointer is allowed and is similar to matching with a full
> > mask (all ones) on ``spec`` fields supported by hardware; the remaining
> > fields are ignored (all zeroes), so there is no error checking for
> > unsupported fields.
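
As a rough C illustration of the above spec/mask pairing (a sketch only:
the exact contents of ``struct rte_flow_item`` and of the Ethernet
matching structure are not fixed by this document, so the layouts below
are assumptions):

#include <stdint.h>

/* Assumed item layout: a type plus optional spec/mask pointers. */
struct rte_flow_item {
        int type;         /* hypothetical RTE_FLOW_ITEM_TYPE_* value */
        const void *spec; /* values to match, NULL for meta items */
        const void *mask; /* bits of spec to match, NULL = full mask */
};

/* Illustrative Ethernet matching structure. */
struct flow_item_eth {
        uint8_t dst[6];
        uint8_t src[6];
        uint16_t type;
};

/* Match sources whose middle bytes are 01:02:03; all unmasked bits
 * (including the entire destination address) match anything. */
static const struct flow_item_eth eth_spec = {
        .src = { 0x00, 0x01, 0x02, 0x03, 0x00, 0x00 },
};
static const struct flow_item_eth eth_mask = {
        .src = { 0x00, 0xff, 0xff, 0xff, 0x00, 0x00 },
};

static const struct rte_flow_item eth_item = {
        .type = 1, /* hypothetical ETH item type */
        .spec = &eth_spec,
        .mask = &eth_mask,
};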
> > 
> > Matching pattern items for packet data must be naturally stacked (ordered
> > from lowest to highest protocol layer), as in the following examples:
> > 
> > +--------------+
> > | TCPv4 as L4  |
> > +===+==========+
> > | 0 | Ethernet |
> > +---+----------+
> > | 1 | IPv4     |
> > +---+----------+
> > | 2 | TCP      |
> > +---+----------+
> > 
> > +----------------+
> > | TCPv6 in VXLAN |
> > +===+============+
> > | 0 | Ethernet   |
> > +---+------------+
> > | 1 | IPv4       |
> > +---+------------+
> > | 2 | UDP        |
> > +---+------------+
> > | 3 | VXLAN      |
> > +---+------------+
> > | 4 | Ethernet   |
> > +---+------------+
> > | 5 | IPv6       |
> > +---+------------+
> > | 6 | TCP        |
> > +---+------------+
> > 
> > +-----------------------------+
> > | TCPv4 as L4 with meta items |
> > +===+=========================+
> > | 0 | VOID                    |
> > +---+-------------------------+
> > | 1 | Ethernet                |
> > +---+-------------------------+
> > | 2 | VOID                    |
> > +---+-------------------------+
> > | 3 | IPv4                    |
> > +---+-------------------------+
> > | 4 | TCP                     |
> > +---+-------------------------+
> > | 5 | VOID                    |
> > +---+-------------------------+
> > | 6 | VOID                    |
> > +---+-------------------------+
> > 
> > The above example shows how meta items do not affect packet data
> > matching items, as long as those remain stacked properly. The resulting
> > matching pattern is identical to "TCPv4 as L4".
> > 
> > +----------------+
> > | UDPv6 anywhere |
> > +===+============+
> > | 0 | IPv6       |
> > +---+------------+
> > | 1 | UDP        |
> > +---+------------+
> > 
> > If supported by the PMD, omitting one or several protocol layers at the
> > bottom of the stack as in the above example (missing an Ethernet
> > specification) enables hardware to look anywhere in packets.
> > 
> > It is unspecified whether the payload of supported encapsulations
> > (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
> > inner, outer or both packets.
> > 
> > +---------------------+
> > | Invalid, missing L3 |
> > +===+=================+
> > | 0 | Ethernet        |
> > +---+-----------------+
> > | 1 | UDP             |
> > +---+-----------------+
> > 
> > The above pattern is invalid due to a missing L3 specification between L2
> > and L4. Omitting protocol layers is only allowed at the bottom and at the
> > top of the stack.
> > 
> > Meta item types
> > ~~~~~~~~~~~~~~~
> > 
> > These do not match packet data but affect how the pattern is processed;
> > most of them do not need a specification structure. This particularity
> > allows them to be specified anywhere without affecting other item types.
> > 
> > ``END``
> > ^^^^^^^
> > 
> > End marker for item lists. Prevents further processing of items, thereby
> > ending the pattern.
> > 
> > - Its numeric value is **0** for convenience.
> > - PMD support is mandatory.
> > - Both ``spec`` and ``mask`` are ignored.
> > 
> > +--------------------+
> > | END                |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> > 
> > ``VOID``
> > ^^^^^^^^
> > 
> > Used as a placeholder for convenience. It is ignored and simply discarded by
> > PMDs.
> > 
> > - PMD support is mandatory.
> > - Both ``spec`` and ``mask`` are ignored.
> > 
> > +--------------------+
> > | VOID               |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> > 
> > One usage example for this type is generating rules that share a common
> > prefix quickly without reallocating memory, only by updating item types:
> > 
> > +------------------------+
> > | TCP, UDP or ICMP as L4 |
> > +===+====================+
> > | 0 | Ethernet           |
> > +---+--------------------+
> > | 1 | IPv4               |
> > +---+------+------+------+
> > | 2 | UDP  | VOID | VOID |
> > +---+------+------+------+
> > | 3 | VOID | TCP  | VOID |
> > +---+------+------+------+
> > | 4 | VOID | VOID | ICMP |
> > +---+------+------+------+
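
In C, this prefix sharing could look like the sketch below (same assumed
``struct rte_flow_item`` layout as in the earlier sketch, with
placeholder type values):

/* Assumed minimal item layout. */
struct rte_flow_item {
        int type;
        const void *spec;
        const void *mask;
};

/* Placeholder type values; real enumerators are defined by the API. */
enum { T_END = 0, T_VOID, T_ETH, T_IPV4, T_UDP, T_TCP, T_ICMP };

/* Shared prefix (ETH, IPV4), one switchable slot, end marker. */
static struct rte_flow_item items[] = {
        { .type = T_ETH },
        { .type = T_IPV4 },
        { .type = T_VOID }, /* becomes UDP, TCP or ICMP on demand */
        { .type = T_END },
};

/* Select the L4 protocol without reallocating the pattern. */
static void
select_l4(int l4_type)
{
        items[2].type = l4_type; /* back to T_VOID to disable */
}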
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``INVERT``
> > ^^^^^^^^^^
> > 
> > Inverted matching, i.e. process packets that do not match the pattern.
> > 
> > - Both ``spec`` and ``mask`` are ignored.
> > 
> > +--------------------+
> > | INVERT             |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> > 
> > Usage example in order to match non-TCPv4 packets only:
> > 
> > +--------------------+
> > | Anything but TCPv4 |
> > +===+================+
> > | 0 | INVERT         |
> > +---+----------------+
> > | 1 | Ethernet       |
> > +---+----------------+
> > | 2 | IPv4           |
> > +---+----------------+
> > | 3 | TCP            |
> > +---+----------------+
> > 
> > ``PF``
> > ^^^^^^
> > 
> > Matches packets addressed to the physical function of the device.
> > 
> > - Both ``spec`` and ``mask`` are ignored.
> > 
> > +--------------------+
> > | PF                 |
> > +==========+=========+
> > | ``spec`` | ignored |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> > 
> > ``VF``
> > ^^^^^^
> > 
> > Matches packets addressed to the given virtual function ID of the device.
> > 
> > - Only ``spec`` needs to be defined, ``mask`` is ignored.
> > 
> > +----------------------------------------+
> > | VF                                     |
> > +==========+=========+===================+
> > | ``spec`` | ``vf``  | destination VF ID |
> > +----------+---------+-------------------+
> > | ``mask`` | ignored                     |
> > +----------+-----------------------------+
> > 
> > ``SIGNATURE``
> > ^^^^^^^^^^^^^
> > 
> > Requests hash-based signature dispatching for this rule.
> > 
> > Considering this is a global setting on devices that support it, all
> > subsequent filter rules may have to be created with it as well.
> > 
> > - Only ``spec`` needs to be defined, ``mask`` is ignored.
> > 
> > +--------------------+
> > | SIGNATURE          |
> > +==========+=========+
> > | ``spec`` | TBD     |
> > +----------+---------+
> > | ``mask`` | ignored |
> > +----------+---------+
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Data matching item types
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > Most of these are basically protocol header definitions with associated
> > bitmasks. They must be specified (stacked) from lowest to highest protocol
> > layer.
> > 
> > The following list is not exhaustive as new protocols will be added in the
> > future.
> > 
> > ``ANY``
> > ^^^^^^^
> > 
> > Matches any protocol in place of the current layer; a single ANY may also
> > stand for several protocol layers.
> > 
> > This is usually specified as the first pattern item when looking for a
> > protocol anywhere in a packet.
> > 
> > - A maximum value of **0** requests matching any number of protocol
> >   layers above or equal to the minimum value; a maximum value lower than
> >   the minimum one is otherwise invalid.
> > - Only ``spec`` needs to be defined, ``mask`` is ignored.
> > 
> > +-----------------------------------------------------------------------+
> > | ANY                                                                   |
> > +==========+=========+==================================================+
> > | ``spec`` | ``min`` | minimum number of layers covered                 |
> > |          +---------+--------------------------------------------------+
> > |          | ``max`` | maximum number of layers covered, 0 for infinity |
> > +----------+---------+--------------------------------------------------+
> > | ``mask`` | ignored                                                    |
> > +----------+------------------------------------------------------------+
> > 
> > Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or
> > IPv6) and L4 (UDP) both matched by the first ANY specification, and inner
> > L3 (IPv4 or IPv6) matched by the second ANY specification:
> > 
> > +----------------------------------+
> > | TCP in VXLAN with wildcards      |
> > +===+==============================+
> > | 0 | Ethernet                     |
> > +---+-----+----------+---------+---+
> > | 1 | ANY | ``spec`` | ``min`` | 2 |
> > |   |     |          +---------+---+
> > |   |     |          | ``max`` | 2 |
> > +---+-----+----------+---------+---+
> > | 2 | VXLAN                        |
> > +---+------------------------------+
> > | 3 | Ethernet                     |
> > +---+-----+----------+---------+---+
> > | 4 | ANY | ``spec`` | ``min`` | 1 |
> > |   |     |          +---------+---+
> > |   |     |          | ``max`` | 1 |
> > +---+-----+----------+---------+---+
> > | 5 | TCP                          |
> > +---+------------------------------+
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``RAW``
> > ^^^^^^^
> > 
> > Matches a string of a given length at a given offset (in bytes), or anywhere
> > in the payload of the current protocol layer (including L2 header if used as
> > the first item in the stack).
> > 
> > This does not increment the protocol layer count as it is not a protocol
> > definition. Subsequent RAW items modulate the first absolute one with
> > relative offsets.
> > 
> > - Using **-1** as the ``offset`` of the first RAW item makes its absolute
> >   offset not fixed, i.e. the pattern is searched everywhere.
> > - ``mask`` only affects the pattern.
> > 
> > +--------------------------------------------------------------+
> > | RAW                                                          |
> > +==========+=============+=====================================+
> > | ``spec`` | ``offset``  | absolute or relative pattern offset |
> > |          +-------------+-------------------------------------+
> > |          | ``length``  | pattern length                      |
> > |          +-------------+-------------------------------------+
> > |          | ``pattern`` | byte string of the above length     |
> > +----------+-------------+-------------------------------------+
> > | ``mask`` | ``offset``  | ignored                             |
> > |          +-------------+-------------------------------------+
> > |          | ``length``  | ignored                             |
> > |          +-------------+-------------------------------------+
> > |          | ``pattern`` | bitmask with the same byte length   |
> > +----------+-------------+-------------------------------------+
> > 
> > Example pattern looking for several strings at various offsets of a UDP
> > payload, using combined RAW items:
> > 
> > +------------------------------------------+
> > | UDP payload matching                     |
> > +===+======================================+
> > | 0 | Ethernet                             |
> > +---+--------------------------------------+
> > | 1 | IPv4                                 |
> > +---+--------------------------------------+
> > | 2 | UDP                                  |
> > +---+-----+----------+-------------+-------+
> > | 3 | RAW | ``spec`` | ``offset``  | -1    |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``length``  | 3     |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``pattern`` | "foo" |
> > +---+-----+----------+-------------+-------+
> > | 4 | RAW | ``spec`` | ``offset``  | 20    |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``length``  | 3     |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``pattern`` | "bar" |
> > +---+-----+----------+-------------+-------+
> > | 5 | RAW | ``spec`` | ``offset``  | -30   |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``length``  | 3     |
> > |   |     |          +-------------+-------+
> > |   |     |          | ``pattern`` | "baz" |
> > +---+-----+----------+-------------+-------+
> > 
> > This translates to:
> > 
> > - Locate "foo" in UDP payload, remember its offset.
> > - Check "bar" at "foo"'s offset plus 20 bytes.
> > - Check "baz" at "foo"'s offset minus 30 bytes.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``ETH``
> > ^^^^^^^
> > 
> > Matches an Ethernet header.
> > 
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> > 
> >  - ``tpid``: Tag protocol identifier.
> >  - ``tci``: Tag control information.
> > 
> > ``IPV4``
> > ^^^^^^^^
> > 
> > Matches an IPv4 header.
> > 
> > - ``src``: source IP address.
> > - ``dst``: destination IP address.
> > - ``tos``: ToS/DSCP field.
> > - ``ttl``: TTL field.
> > - ``proto``: protocol number for the next layer.
> > 
> > ``IPV6``
> > ^^^^^^^^
> > 
> > Matches an IPv6 header.
> > 
> > - ``src``: source IP address.
> > - ``dst``: destination IP address.
> > - ``tc``: traffic class field.
> > - ``nh``: Next header field (protocol).
> > - ``hop_limit``: hop limit field (TTL).
> > 
> > ``ICMP``
> > ^^^^^^^^
> > 
> > Matches an ICMP header.
> > 
> > - TBD.
> > 
> > ``UDP``
> > ^^^^^^^
> > 
> > Matches a UDP header.
> > 
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``TCP``
> > ^^^^^^^
> > 
> > Matches a TCP header.
> > 
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - All other TCP fields and bits.
> > 
> > ``VXLAN``
> > ^^^^^^^^^
> > 
> > Matches a VXLAN header.
> > 
> > - TBD.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Actions
> > ~~~~~~~
> > 
> > Each possible action is represented by a type. Some have associated
> > configuration structures. Several actions combined in a list can be
> > assigned to a flow rule. That list is not ordered.
> > 
> > At least one action must be defined in a filter rule in order to do
> > something with matched packets.
> > 
> > - Actions are defined with ``struct rte_flow_action``.
> > - A list of actions is defined with ``struct rte_flow_actions``.
> > 
> > They fall into three categories:
> > 
> > - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
> >   processing matched packets by subsequent flow rules, unless overridden
> >   with PASSTHRU.
> > 
> > - Non-terminating actions (PASSTHRU, DUP) that leave matched packets up
> >   for additional processing by subsequent flow rules.
> > 
> > - Other non-terminating meta actions that do not affect the fate of
> >   packets (END, VOID, ID, COUNT).
> > 
> > When several actions are combined in a flow rule, they should all have
> > different types (e.g. dropping a packet twice is not possible). However,
> > considering the VOID type is an exception to this rule, the defined
> > behavior is for PMDs to only take into account the last action of a given
> > type found in the list. PMDs still perform error checking on the entire
> > list.
> > 
> > *Note that PASSTHRU is the only action able to override a terminating rule.*
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Example of an action that redirects packets to queue index 10:
> > 
> > +----------------+
> > | QUEUE          |
> > +===========+====+
> > | ``queue`` | 10 |
> > +-----------+----+
> > 
> > Action list examples; their order is not significant, applications must
> > consider all actions to be performed simultaneously:
> > 
> > +----------------+
> > | Count and drop |
> > +=======+========+
> > | COUNT |        |
> > +-------+--------+
> > | DROP  |        |
> > +-------+--------+
> > 
> > +--------------------------+
> > | Tag, count and redirect  |
> > +=======+===========+======+
> > | ID    | ``id``    | 0x2a |
> > +-------+-----------+------+
> > | COUNT |                  |
> > +-------+-----------+------+
> > | QUEUE | ``queue`` | 10   |
> > +-------+-----------+------+
> > 
> > +-----------------------+
> > | Redirect to queue 5   |
> > +=======+===============+
> > | DROP  |               |
> > +-------+-----------+---+
> > | QUEUE | ``queue`` | 5 |
> > +-------+-----------+---+
> > 
> > In the above example, considering both actions are performed
> > simultaneously, the end result is that only QUEUE has any effect.
> > 
> > +-----------------------+
> > | Redirect to queue 3   |
> > +=======+===========+===+
> > | QUEUE | ``queue`` | 5 |
> > +-------+-----------+---+
> > | VOID  |               |
> > +-------+-----------+---+
> > | QUEUE | ``queue`` | 3 |
> > +-------+-----------+---+
> > 
> > As previously described, only the last action of a given type found in the
> > list is taken into account. The above example also shows that VOID is
> > ignored.
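
The "Tag, count and redirect" list above could be spelled out as in the
following sketch; the action structure layout and type values are
assumptions, only the END terminator semantics come from this document:

#include <stddef.h>
#include <stdint.h>

/* Assumed action layout: a type plus an optional configuration
 * pointer. */
struct rte_flow_action {
        int type;
        const void *conf;
};

enum { A_END = 0, A_VOID, A_ID, A_COUNT, A_QUEUE }; /* placeholders */

struct action_id { uint32_t id; };       /* illustrative */
struct action_queue { uint16_t queue; }; /* illustrative */

static const struct action_id id_conf = { .id = 0x2a };
static const struct action_queue queue_conf = { .queue = 10 };

/* Order within the list does not matter, all actions apply at once. */
static const struct rte_flow_action actions[] = {
        { .type = A_ID,    .conf = &id_conf },
        { .type = A_COUNT, .conf = NULL },
        { .type = A_QUEUE, .conf = &queue_conf },
        { .type = A_END,   .conf = NULL },
};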
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Action types
> > ~~~~~~~~~~~~
> > 
> > Common action types are described in this section. Like pattern item types,
> > this list is not exhaustive as new actions will be added in the future.
> > 
> > ``END`` (action)
> > ^^^^^^^^^^^^^^^^
> > 
> > End marker for action lists. Prevents further processing of actions, thereby
> > ending the list.
> > 
> > - Its numeric value is **0** for convenience.
> > - PMD support is mandatory.
> > - No configurable property.
> > 
> > +---------------+
> > | END           |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > ``VOID`` (action)
> > ^^^^^^^^^^^^^^^^^
> > 
> > Used as a placeholder for convenience. It is ignored and simply discarded by
> > PMDs.
> > 
> > - PMD support is mandatory.
> > - No configurable property.
> > 
> > +---------------+
> > | VOID          |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > ``PASSTHRU``
> > ^^^^^^^^^^^^
> > 
> > Leaves packets up for additional processing by subsequent flow rules. This
> > is the default when a rule does not contain a terminating action, but can be
> > specified to force a rule to become non-terminating.
> > 
> > - No configurable property.
> > 
> > +---------------+
> > | PASSTHRU      |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > Example to copy a packet to a queue and continue processing by subsequent
> > flow rules:
> [Sugesh] If a packet gets copied to a queue, it’s a termination action.
> How can it be possible to do a subsequent action after the packet has already
> moved to the queue? How does it differ from the DUP action?
> Am I missing anything here?

Devices may not support the combination of QUEUE + PASSTHRU (i.e. making
QUEUE non-terminating). However these same devices may expose the ability to
copy a packet to another (sniffer) queue all while keeping the rule
terminating (QUEUE + DUP but no PASSTHRU).

DUP with two rules, assuming priorities and PASSTHRU are supported:

- pattern X, priority 0; actions: QUEUE 5, PASSTHRU (non-terminating)

- pattern X, priority 1; actions: QUEUE 6 (terminating)

DUP with two actions on a single rule and a single priority:

- pattern X, priority 0; actions: DUP 5, QUEUE 6 (terminating)

If supported, from an application point of view the end result is similar in
both cases (note the second case may be implemented by the PMD using two HW
rules internally).

However the second case does not waste a priority level and clearly states
the intent to the PMD which is more likely to be supported. If HW supports
DUP directly it is even faster since there is a single rule. That is why I
thought having DUP as an action would be useful.
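
In other words, the single-rule variant boils down to an action list
like this sketch (same assumed layouts and placeholder values as in the
other action sketches in this thread):

#include <stddef.h>
#include <stdint.h>

struct rte_flow_action { int type; const void *conf; }; /* assumed */
struct action_queue { uint16_t queue; };                /* assumed */
enum { A_END = 0, A_QUEUE = 4, A_DUP = 5 };             /* placeholders */

static const struct action_queue dup_conf = { .queue = 5 };
static const struct action_queue q_conf = { .queue = 6 };

/* Duplicate to queue 5, deliver to queue 6, rule terminates. */
static const struct rte_flow_action actions[] = {
        { .type = A_DUP,   .conf = &dup_conf },
        { .type = A_QUEUE, .conf = &q_conf },
        { .type = A_END,   .conf = NULL },
};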

> > +--------------------------+
> > | Copy to queue 8          |
> > +==========+===============+
> > | PASSTHRU |               |
> > +----------+-----------+---+
> > | QUEUE    | ``queue`` | 8 |
> > +----------+-----------+---+
> > 
> > ``ID``
> > ^^^^^^
> > 
> > Attaches a 32 bit value to packets.
> > 
> > +----------------------------------------------+
> > | ID                                           |
> > +========+=====================================+
> > | ``id`` | 32 bit value to return with packets |
> > +--------+-------------------------------------+
> > 
> [Sugesh] I assume the application has to program the flow 
> with a unique ID and matching packets are stamped with this ID
> when reporting to the software. The uniqueness of ID is NOT 
> guaranteed by the API framework. Correct me if I am wrong here.

You are right; if the way I wrote it is not clear enough, I'm open to
suggestions to improve it.

> [Sugesh] Is it a limitation to use only a 32 bit ID? Is it possible to have a
> 64 bit ID? So that the application can use the control plane flow pointer
> itself as an ID. Does it make sense?

I've specified a 32 bit ID for now because this is what FDIR supports and
also what existing devices can report today AFAIK (i40e and mlx5).

We could use 64 bits for future-proofing in a separate action like "ID64"
when at least one device supports it.

To PMD maintainers: please comment if you know devices that support tagging
matching packets with more than 32 bits of user-provided data!

> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``QUEUE``
> > ^^^^^^^^^
> > 
> > Assigns packets to a given queue index.
> > 
> > - Terminating by default.
> > 
> > +--------------------------------+
> > | QUEUE                          |
> > +===========+====================+
> > | ``queue`` | queue index to use |
> > +-----------+--------------------+
> > 
> > ``DROP``
> > ^^^^^^^^
> > 
> > Drop packets.
> > 
> > - No configurable property.
> > - Terminating by default.
> > - PASSTHRU overrides this action if both are specified.
> > 
> > +---------------+
> > | DROP          |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > ``COUNT``
> > ^^^^^^^^^
> > 
> [Sugesh] Do we really have to set the count action explicitly for every rule?
> IMHO it would be great for it to be an implicit action. Most applications would
> be interested in the stats of almost all the filters/flows.

I can see why, but no, it must be explicitly requested because you may want
to know in advance when it is not supported. Also considering it is
something else to be done by HW (a separate action), we can assume enabling
this may slow things down a bit.

HW limitations may also prevent you from having as many flow counters as you
want, in which case you probably want to carefully pick which rules have
them.

I think this action is most useful combined with DROP, VF and PF actions since
those are currently the only ones where SW may not see the related packets.

> > Enables hits counter for this rule.
> > 
> > This counter can be retrieved and reset through ``rte_flow_query()``, see
> > ``struct rte_flow_query_count``.
> > 
> > - Counters can be retrieved with ``rte_flow_query()``.
> > - No configurable property.
> > 
> > +---------------+
> > | COUNT         |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > Query structure to retrieve and reset the flow rule hits counter:
> > 
> > +------------------------------------------------+
> > | COUNT query                                    |
> > +===========+=====+==============================+
> > | ``reset`` | in  | reset counter after query    |
> > +-----------+-----+------------------------------+
> > | ``hits``  | out | number of hits for this flow |
> > +-----------+-----+------------------------------+
> > 
> > ``DUP``
> > ^^^^^^^
> > 
> > Duplicates packets to a given queue index.
> > 
> > This is normally combined with QUEUE; however, when used alone, it is
> > actually similar to QUEUE + PASSTHRU.
> > 
> > - Non-terminating by default.
> > 
> > +------------------------------------------------+
> > | DUP                                            |
> > +===========+====================================+
> > | ``queue`` | queue index to duplicate packet to |
> > +-----------+------------------------------------+
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``RSS``
> > ^^^^^^^
> > 
> > Similar to QUEUE, except RSS is additionally performed on packets to spread
> > them among several queues according to the provided parameters.
> > 
> > - Terminating by default.
> > 
> > +---------------------------------------------+
> > | RSS                                         |
> > +==============+==============================+
> > | ``rss_conf`` | RSS parameters               |
> > +--------------+------------------------------+
> > | ``queues``   | number of entries in queue[] |
> > +--------------+------------------------------+
> > | ``queue[]``  | queue indices to use         |
> > +--------------+------------------------------+
> > 
> > ``PF`` (action)
> > ^^^^^^^^^^^^^^^
> > 
> > Redirects packets to the physical function (PF) of the current device.
> > 
> > - No configurable property.
> > - Terminating by default.
> > 
> > +---------------+
> > | PF            |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > ``VF`` (action)
> > ^^^^^^^^^^^^^^^
> > 
> > Redirects packets to the virtual function (VF) of the current device with
> > the specified ID.
> > 
> > - Terminating by default.
> > 
> > +---------------------------------------+
> > | VF                                    |
> > +========+==============================+
> > | ``id`` | VF ID to redirect packets to |
> > +--------+------------------------------+
> > 
> > Planned types
> > ~~~~~~~~~~~~~
> > 
> > Other action types are planned but not defined yet. These actions will add
> > the ability to alter matching packets in several ways, such as performing
> > encapsulation/decapsulation of tunnel headers on specific flows.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Rules management
> > ----------------
> > 
> > A simple API with only four functions is provided to fully manage flows.
> > 
> > Each created flow rule is associated with an opaque, PMD-specific handle
> > pointer. The application is responsible for keeping it until the rule is
> > destroyed.
> > 
> > Flow rules are defined with ``struct rte_flow``.
> > 
> > Validation
> > ~~~~~~~~~~
> > 
> > Given that expressing a definite set of device capabilities with this API is
> > not practical, a dedicated function is provided to check if a flow rule is
> > supported and can be created.
> > 
> > ::
> > 
> >  int
> >  rte_flow_validate(uint8_t port_id,
> >                    const struct rte_flow_pattern *pattern,
> >                    const struct rte_flow_actions *actions);
> > 
> > While this function has no effect on the target device, the flow rule is
> > validated against its current configuration state and the returned value
> > should be considered valid by the caller for that state only.
> > 
> > The returned value is guaranteed to remain valid only as long as no
> > successful calls to rte_flow_create() or rte_flow_destroy() are made in
> > the meantime and no device parameters affecting flow rules in any way are
> > modified, due to possible collisions or resource limitations (although in
> > such cases ``EINVAL`` should not be returned).
> > 
> > Arguments:
> > 
> > - ``port_id``: port identifier of Ethernet device.
> > - ``pattern``: pattern specification to check.
> > - ``actions``: actions associated with the flow definition.
> > 
> > Return value:
> > 
> > - **0** if flow rule is valid and can be created. A negative errno value
> >   otherwise (``rte_errno`` is also set), the following errors are defined.
> > - ``-EINVAL``: unknown or invalid rule specification.
> > - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial masks
> >   are unsupported).
> > - ``-EEXIST``: collision with an existing rule.
> > - ``-ENOMEM``: not enough resources.
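
For instance, an application could probe support at initialization time
before starting its data path; a minimal sketch against the signature
above (pattern/actions construction omitted):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int
probe_rule(uint8_t port_id,
           const struct rte_flow_pattern *pattern,
           const struct rte_flow_actions *actions)
{
        int ret = rte_flow_validate(port_id, pattern, actions);

        if (ret == 0)
                return 0; /* rule can be created in the current state */
        if (ret == -ENOTSUP)
                fprintf(stderr, "rule is valid but unsupported\n");
        else
                fprintf(stderr, "rule rejected: %s\n", strerror(-ret));
        return ret;
}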
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Creation
> > ~~~~~~~~
> > 
> > Creating a flow rule is similar to validating one, except the rule is
> > actually created.
> > 
> > ::
> > 
> >  struct rte_flow *
> >  rte_flow_create(uint8_t port_id,
> >                  const struct rte_flow_pattern *pattern,
> >                  const struct rte_flow_actions *actions);
> > 
> > Arguments:
> > 
> > - ``port_id``: port identifier of Ethernet device.
> > - ``pattern``: pattern specification to add.
> > - ``actions``: actions associated with the flow definition.
> > 
> > Return value:
> > 
> > A valid flow pointer in case of success, NULL otherwise and ``rte_errno`` is
> > set to the positive version of one of the error codes defined for
> > ``rte_flow_validate()``.
> [Sugesh] : Kind of an implementation specific query. What if the application
> tries to add duplicate rules? Does the API create a new flow entry for every
> API call?

If an application adds duplicate rules at a given priority level, the second
one may return an error depending on the PMD. Collisions are sometimes
trivial to detect (such as the same pattern twice), others not so much (one
matching an Ethernet header only, the other one matching an IP header only).

Either way if a packet is matched by two rules at a given priority level,
what happens is described in 3.3 (High level design) and 4.4.1 (Priorities).

Applications are responsible for not relying on the PMD to detect these, or
should use a single priority level for each rule to make things clear.

However since the number of HW priority levels is finite and possibly small,
they must also make sure not to waste them. My advice is to only use
priority levels when it cannot be proven that rules do not collide.

If all you have is perfect matching rules without wildcards and all of them
match the same number of layers, a single priority level is fine.
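
For instance, an application might attempt creation and only fall back
to a distinct priority level on collision; a sketch (where exactly the
priority lives in ``struct rte_flow_pattern`` is an assumption here):

#include <errno.h>
#include <stdint.h>
#include <rte_errno.h>

static struct rte_flow *
create_with_fallback(uint8_t port_id,
                     struct rte_flow_pattern *pattern,
                     const struct rte_flow_actions *actions)
{
        struct rte_flow *flow = rte_flow_create(port_id, pattern, actions);

        /* rte_errno holds the positive error code on failure. */
        if (flow == NULL && rte_errno == EEXIST) {
                pattern->priority++; /* hypothetical priority field */
                flow = rte_flow_create(port_id, pattern, actions);
        }
        return flow;
}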

> [Sugesh] Another concern is the cost and time of installing these rules
> in the hardware. Can we make these APIs time bound (or at least provide an
> option to set a time limit to execute these APIs), so that the
> application doesn’t have to wait so long when installing and deleting flows
> with slow hardware/NICs. What do you think? Most of the datapath flow
> installations are dynamic and triggered only when there is
> ingress traffic. Delays in flow insertion/deletion have unpredictable consequences.

This API is (currently) aimed at the control path only, and must indeed be
assumed to be slow. Creating millions of rules may take quite long as it may
involve syscalls and other time-consuming synchronization things on the PMD
side.

So currently there is no plan to have rules added from the data path with
time constraints. I think it would be implemented through a different set of
functions anyway.

I do not think adding time limits is practical, even specifying in the API
that creating a single flow rule must take less than a maximum number of
seconds in order to be effective is too much of a constraint (applications
that create all flows during init may not care after all).

You should consider in any case that modifying flow rules will always be
slower than receiving packets, there is no way around that. Applications
have to live with it and provide a software fallback for incoming packets
while managing flow rules.

Moreover, think about what happens when you hit the maximum number of flow
rules and cannot create any more. Applications need to implement some kind
of fallback in their data path.

Offloading flows in HW is also only useful if they live much longer than the
time taken to create and delete them. Perhaps applications may choose to do
so after detecting long lived flows such as TCP sessions.

You may have one separate control thread dedicated to managing flows and
keep your normal control thread unaffected by delays. Several threads can
even be dedicated, one per device.

> [Sugesh] Another query is on the synchronization part. What if the same rules
> are handled from different threads? Is the application responsible for handling
> the concurrent hardware programming?

Like most (if not all) DPDK APIs, applications are responsible for managing
locking issues as described in 4.3 (Behavior). Since this is a control path
API and applications usually have a single control thread, locking should
not be necessary in most cases.

Regarding my above comment about using several control threads to manage
different devices, section 4.3 says:
 
 "There is no provision for reentrancy/multi-thread safety, although nothing
 should prevent different devices from being configured at the same
 time. PMDs may protect their control path functions accordingly."

I'd like to emphasize it is not "per port" but "per device", since in a few
cases a configurable resource is shared by several ports. It may be
difficult for applications to determine which ports are shared by a given
device but this falls outside the scope of this API.

Do you think adding the guarantee that it is always safe to configure two
different ports simultaneously without locking from the application side is
necessary? In which case the PMD would be responsible for locking shared
resources.

> > Destruction
> > ~~~~~~~~~~~
> > 
> > Flow rule destruction is not automatic, and a queue should not be released
> > if any are still attached to it. Applications must take care of performing
> > this step before releasing resources.
> > 
> > ::
> > 
> >  int
> >  rte_flow_destroy(uint8_t port_id,
> >                   struct rte_flow *flow);
> > 
> > 
> [Sugesh] I would suggest that having a clean-up API is really useful, as releasing a
> queue (is it applicable to releasing a port too?) does not guarantee automatic flow
> destruction.

Would something like rte_flow_flush(port_id) do the trick? I wanted to
emphasize in this first draft that applications should really keep the flow
pointers around in order to manage/destroy them. It is their responsibility,
not PMD's.
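
Something along these lines, mirroring rte_flow_destroy() semantics for
all rules on a port at once (hypothetical addition, not yet part of the
proposal):

/* Destroy all flow rules associated with a port. Returns 0 on
 * success, a negative errno value otherwise (rte_errno is also set). */
int
rte_flow_flush(uint8_t port_id);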

> This way application can initialize the port,
> clean-up all the existing rules and create new rules  on a clean slate.

No resource can be released as long as a flow rule is using it (bad things
may happen otherwise); all flow rules must be destroyed first, thus none can
possibly remain after initializing a port. It is assumed that PMDs do
automatic clean-up during init if necessary to ensure this.

> > Failure to destroy a flow rule may occur when other flow rules depend on it,
> > and destroying it would result in an inconsistent state.
> > 
> > This function is only guaranteed to succeed if flow rules are destroyed in
> > reverse order of their creation.
> > 
> > Arguments:
> > 
> > - ``port_id``: port identifier of Ethernet device.
> > - ``flow``: flow rule to destroy.
> > 
> > Return value:
> > 
> > - **0** on success, a negative errno value otherwise and ``rte_errno`` is
> >   set.
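
For example, an application that stored its rules in creation order can
simply tear them down back to front, which is the order guaranteed to
succeed (sketch):

#include <stddef.h>
#include <stdint.h>

static int
destroy_all(uint8_t port_id, struct rte_flow **flows, unsigned int n)
{
        /* Destroy newest rules first. */
        while (n--) {
                int ret = rte_flow_destroy(port_id, flows[n]);

                if (ret)
                        return ret; /* rte_errno is set accordingly */
                flows[n] = NULL;
        }
        return 0;
}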
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Query
> > ~~~~~
> > 
> > Query an existing flow rule.
> > 
> > This function allows retrieving flow-specific data such as counters. Data
> > is gathered by special actions which must be present in the flow rule
> > definition.
> > 
> > ::
> > 
> >  int
> >  rte_flow_query(uint8_t port_id,
> >                 struct rte_flow *flow,
> >                 enum rte_flow_action_type action,
> >                 void *data);
> > 
> > Arguments:
> > 
> > - ``port_id``: port identifier of Ethernet device.
> > - ``flow``: flow rule to query.
> > - ``action``: action type to query.
> > - ``data``: pointer to storage for the associated query data type.
> > 
> > Return value:
> > 
> > - **0** on success, a negative errno value otherwise and ``rte_errno`` is
> >   set.
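
Usage sketch, retrieving and resetting the hit counter of a rule
created with the COUNT action; field names follow the COUNT query table
earlier in this document, while the exact COUNT enumerator name and the
type of ``hits`` are assumptions:

#include <stdint.h>

static int
get_hits(uint8_t port_id, struct rte_flow *flow, uint64_t *hits)
{
        struct rte_flow_query_count count = { .reset = 1 };
        int ret = rte_flow_query(port_id, flow,
                                 RTE_FLOW_ACTION_TYPE_COUNT, &count);

        if (ret == 0)
                *hits = count.hits;
        return ret;
}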
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Behavior
> > --------
> > 
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> > 
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> > 
> > - Stopping the data path (TX/RX) should not be necessary when managing
> >   flow rules. If this cannot be achieved naturally or with workarounds
> >   (such as temporarily replacing the burst function pointers), an
> >   appropriate error code must be returned (``EBUSY``).
> > 
> > - PMDs, not applications, are responsible for maintaining flow rules
> >   configuration when stopping and restarting a port or performing other
> >   actions which may affect them. They can only be destroyed explicitly.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> [Sugesh] Query all the rules for a specific port/queue? Useful when adding and
> deleting ports and queues dynamically according to the need. I am not sure
> what the other different use cases for these APIs are. But I feel it would make
> it much easier to manage flows from the application. What do you think?

Not sure; that seems to fall outside the scope of this API. As described,
applications already store the related rte_flow pointers. Accordingly, they
know how many rules are associated with a given port. They need both a port ID
and a flow rule pointer to destroy them after all.

Now perhaps something to convert back an existing rte_flow to a pattern and
a list of actions, however I cannot see an immediate use case for it.

What you describe seems to be doable through a front-end API, I think
keeping this one as low-level as possible with only basic actions is better
right now. I'll keep your suggestion in mind.

> > Compatibility
> > -------------
> > 
> > No known hardware implementation supports all the features described in
> > this document.
> > 
> > Unsupported features or combinations are not expected to be fully
> > emulated in software by PMDs for performance reasons. Partially supported
> > features may be completed in software as long as hardware performs most
> > of the work (such as queue redirection and packet recognition).
> > 
> > However PMDs are expected to do their best to satisfy application requests
> > by working around hardware limitations as long as doing so does not affect
> > the behavior of existing flow rules.
> > 
> > The following sections provide a few examples of such cases, they are based
> > on limitations built into the previous APIs.
> > 
> > Global bitmasks
> > ~~~~~~~~~~~~~~~
> > 
> > Each flow rule comes with its own, per-layer bitmasks, while hardware may
> > support only a single, device-wide bitmask for a given layer type, so that
> > two IPv4 rules cannot use different bitmasks.
> > 
> > The expected behavior in this case is that PMDs automatically configure
> > global bitmasks according to the needs of the first created flow rule.
> > 
> > Subsequent rules are allowed only if their bitmasks match those, the
> > ``EEXIST`` error code should be returned otherwise.
> > 
> > Unsupported layer types
> > ~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > Many protocols can be simulated by crafting patterns with the `RAW`_ type.
> > 
> > PMDs can rely on this capability to simulate support for protocols with
> > fixed headers not directly recognized by hardware.
> > 
> > ``ANY`` pattern item
> > ~~~~~~~~~~~~~~~~~~~~
> > 
> > This pattern item stands for anything, which can be difficult to translate
> > to something hardware would understand, particularly if followed by more
> > specific types.
> > 
> > Consider the following pattern:
> > 
> > +---+--------------------------------+
> > | 0 | ETHER                          |
> > +---+--------------------------------+
> > | 1 | ANY (``min`` = 1, ``max`` = 1) |
> > +---+--------------------------------+
> > | 2 | TCP                            |
> > +---+--------------------------------+
> > 
> > Knowing that TCP does not make sense with something other than IPv4 and
> > IPv6 as L3, such a pattern may be translated to two flow rules instead:
> > 
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV4 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> > 
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV6 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> > 
> > Note that as soon as an ANY rule covers several layers, this approach may
> > yield a large number of hidden flow rules. It is thus suggested to only
> > support the most common scenarios (anything as L2 and/or L3).
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > Unsupported actions
> > ~~~~~~~~~~~~~~~~~~~
> > 
> > - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
> >   tagging (`ID`_) may be implemented in software as long as the target queue
> >   is used by a single rule.
> > 
> > - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
> >   rules combining `QUEUE`_ and `PASSTHRU`_.
> > 
> > - When a single target queue is provided, `RSS`_ can also be implemented
> >   through `QUEUE`_.
> > 
> > Flow rules priority
> > ~~~~~~~~~~~~~~~~~~~
> > 
> > While it would naturally make sense, flow rules cannot be assumed to be
> > processed by hardware in the same order as their creation for several
> > reasons:
> > 
> > - They may be managed internally as a tree or a hash table instead of a
> >   list.
> > - Removing a flow rule before adding another one can either put the new
> > rule
> >   at the end of the list or reuse a freed entry.
> > - Duplication may occur when packets are matched by several rules.
> > 
> > For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> > predictable behavior is only guaranteed by using different priority levels.
> > 
> > Priority levels are not necessarily implemented in hardware, or may be
> > severely limited (e.g. a single priority bit).
> > 
> > For these reasons, priority levels may be implemented purely in software by
> > PMDs.
> > 
> > - For devices expecting flow rules to be added in the correct order, PMDs
> >   may destroy and re-create existing rules after adding a new one with
> >   a higher priority.
> > 
> > - A configurable number of dummy or empty rules can be created at
> >   initialization time to save high priority slots for later.
> > 
> > - In order to save priority levels, PMDs may evaluate whether rules are
> >   likely to collide and adjust their priority accordingly.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > API migration
> > =============
> > 
> > Exhaustive list of deprecated filter types and how to convert them to
> > generic flow rules.
> > 
> > ``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
> > ---------------------------------------
> > 
> > `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
> > (action)`_ or `PF (action)`_ terminating action.
> > 
> > +------------------------------------+
> > | MACVLAN                            |
> > +--------------------------+---------+
> > | Pattern                  | Actions |
> > +===+=====+==========+=====+=========+
> > | 0 | ETH | ``spec`` | any | VF,     |
> > |   |     +----------+-----+ PF      |
> > |   |     | ``mask`` | any |         |
> > +---+-----+----------+-----+---------+
> > 
> > ``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
> > ----------------------------------------------
> > 
> > `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
> > a terminating action.
> > 
> > +------------------------------------+
> > | ETHERTYPE                          |
> > +--------------------------+---------+
> > | Pattern                  | Actions |
> > +===+=====+==========+=====+=========+
> > | 0 | ETH | ``spec`` | any | QUEUE,  |
> > |   |     +----------+-----+ DROP    |
> > |   |     | ``mask`` | any |         |
> > +---+-----+----------+-----+---------+
> > 
> > ``FLEXIBLE`` to ``RAW`` → ``QUEUE``
> > -----------------------------------
> > 
> > `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
> > terminating action and a defined priority level.
> > 
> > +------------------------------------+
> > | FLEXIBLE                           |
> > +--------------------------+---------+
> > | Pattern                  | Actions |
> > +===+=====+==========+=====+=========+
> > | 0 | RAW | ``spec`` | any | QUEUE   |
> > |   |     +----------+-----+         |
> > |   |     | ``mask`` | any |         |
> > +---+-----+----------+-----+---------+
> > 
> > ``SYN`` to ``TCP`` → ``QUEUE``
> > ------------------------------
> > 
> > `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
> > `QUEUE`_ as the terminating action.
> > 
> > Priority level can be set to simulate the high priority bit.
> > 
> > +---------------------------------------------+
> > | SYN                                         |
> > +-----------------------------------+---------+
> > | Pattern                           | Actions |
> > +===+======+==========+=============+=========+
> > | 0 | ETH  | ``spec`` | N/A         | QUEUE   |
> > |   |      +----------+-------------+         |
> > |   |      | ``mask`` | empty       |         |
> > +---+------+----------+-------------+         |
> > | 1 | IPV4 | ``spec`` | N/A         |         |
> > |   |      +----------+-------------+         |
> > |   |      | ``mask`` | empty       |         |
> > +---+------+----------+-------------+         |
> > | 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
> > |   |      +----------+-------------+         |
> > |   |      | ``mask`` | ``syn`` = 1 |         |
> > +---+------+----------+-------------+---------+
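
A hedged sketch of that rule with the proposed API (assuming struct
rte_flow_item carries { type, spec, mask } members, that the TCP item
exposes the ``syn`` bit shown in the table, and the draft type spellings;
spec pointers for the empty-mask items are elided for brevity):

  struct rte_flow_item_eth eth_mask;   /* zero-filled: match any L2 */
  struct rte_flow_item_ipv4 ip_mask;   /* zero-filled: match any IPv4 */
  struct rte_flow_item_tcp tcp_syn = { .syn = 1 }; /* spec and mask alike */

  memset(&eth_mask, 0, sizeof(eth_mask));
  memset(&ip_mask, 0, sizeof(ip_mask));

  struct rte_flow_item pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH, .mask = &eth_mask },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4, .mask = &ip_mask },
          { .type = RTE_FLOW_ITEM_TYPE_TCP,
            .spec = &tcp_syn, .mask = &tcp_syn },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };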
> > 
> > ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
> > ----------------------------------------------------
> > 
> > `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
> > `UDP`_ as L4 and `QUEUE`_ as the terminating action.
> > 
> > A priority level can be specified as well.
> > 
> > +---------------------------------------+
> > | NTUPLE                                |
> > +-----------------------------+---------+
> > | Pattern                     | Actions |
> > +===+======+==========+=======+=========+
> > | 0 | ETH  | ``spec`` | N/A   | QUEUE   |
> > |   |      +----------+-------+         |
> > |   |      | ``mask`` | empty |         |
> > +---+------+----------+-------+         |
> > | 1 | IPV4 | ``spec`` | any   |         |
> > |   |      +----------+-------+         |
> > |   |      | ``mask`` | any   |         |
> > +---+------+----------+-------+         |
> > | 2 | TCP, | ``spec`` | any   |         |
> > |   | UDP  +----------+-------+         |
> > |   |      | ``mask`` | any   |         |
> > +---+------+----------+-------+---------+
> > 
> > ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
> > ---------------------------------------------------------------------------
> > 
> > `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.
> > 
> > In the following table, `ANY`_ is used to cover the optional L4.
> > 
> > +------------------------------------------------+
> > | TUNNEL                                         |
> > +--------------------------------------+---------+
> > | Pattern                              | Actions |
> > +===+=========+==========+=============+=========+
> > | 0 | ETH     | ``spec`` | any         | QUEUE   |
> > |   |         +----------+-------------+         |
> > |   |         | ``mask`` | any         |         |
> > +---+---------+----------+-------------+         |
> > | 1 | IPV4,   | ``spec`` | any         |         |
> > |   | IPV6    +----------+-------------+         |
> > |   |         | ``mask`` | any         |         |
> > +---+---------+----------+-------------+         |
> > | 2 | ANY     | ``spec`` | ``min`` = 0 |         |
> > |   |         |          +-------------+         |
> > |   |         |          | ``max`` = 0 |         |
> > |   |         +----------+-------------+         |
> > |   |         | ``mask`` | N/A         |         |
> > +---+---------+----------+-------------+         |
> > | 3 | VXLAN,  | ``spec`` | any         |         |
> > |   | GENEVE, +----------+-------------+         |
> > |   | TEREDO, | ``mask`` | any         |         |
> > |   | NVGRE,  |          |             |         |
> > |   | GRE,    |          |             |         |
> > |   | ...     |          |             |         |
> > +---+---------+----------+-------------+---------+
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
> > ---------------------------------------------------------------
> > 
> > `FDIR`_ is more complex than any other type, there are several methods to
> > emulate its functionality. It is summarized for the most part in the table
> > below.
> > 
> > A few features are intentionally not supported:
> > 
> > - The ability to configure the matching input set and masks for the entire
> >   device, PMDs should take care of it automatically according to flow rules.
> > 
> > - Returning four or eight bytes of matched data when using flex bytes
> >   filtering. Although a specific action could implement it, it conflicts
> >   with the much more useful 32 bits tagging on devices that support it.
> > 
> > - Side effects on RSS processing of the entire device. Flow rules that
> >   conflict with the current device configuration should not be
> >   allowed. Similarly, device configuration should not be allowed when it
> >   affects existing flow rules.
> > 
> > - Device modes of operation. "none" is unsupported since filtering cannot be
> >   disabled as long as a flow rule is present.
> > 
> > - "MAC VLAN" or "tunnel" perfect matching modes should be automatically
> > set
> >   according to the created flow rules.
> > 
> > +----------------------------------------------+
> > | FDIR                                         |
> > +---------------------------------+------------+
> > | Pattern                         | Actions    |
> > +===+============+==========+=====+============+
> > | 0 | ETH,       | ``spec`` | any | QUEUE,     |
> > |   | RAW        +----------+-----+ DROP,      |
> > |   |            | ``mask`` | any | PASSTHRU   |
> > +---+------------+----------+-----+------------+
> > | 1 | IPV4,      | ``spec`` | any | ID         |
> > |   | IPV6       +----------+-----+ (optional) |
> > |   |            | ``mask`` | any |            |
> > +---+------------+----------+-----+            |
> > | 2 | TCP,       | ``spec`` | any |            |
> > |   | UDP,       +----------+-----+            |
> > |   | SCTP       | ``mask`` | any |            |
> > +---+------------+----------+-----+            |
> > | 3 | VF,        | ``spec`` | any |            |
> > |   | PF,        +----------+-----+            |
> > |   | SIGNATURE  | ``mask`` | any |            |
> > |   | (optional) |          |     |            |
> > +---+------------+----------+-----+------------+
> > 
> > ``HASH``
> > ~~~~~~~~
> > 
> > Hashing configuration is set per rule through the `SIGNATURE`_ item.
> > 
> > Since it is usually a global device setting, all flow rules created with
> > this item may have to share the same specification.
> > 
> > ``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > All packets are matched. This type alters incoming packets to encapsulate
> > them in a chosen tunnel type, optionally redirecting them to a VF as well.
> > 
> > The destination pool for tag based forwarding can be emulated with other
> > flow rules using `DUP`_ as the action.
> > 
> > +----------------------------------------+
> > | L2_TUNNEL                              |
> > +---------------------------+------------+
> > | Pattern                   | Actions    |
> > +===+======+==========+=====+============+
> > | 0 | VOID | ``spec`` | N/A | VXLAN,     |
> > |   |      |          |     | GENEVE,    |
> > |   |      |          |     | ...        |
> > |   |      +----------+-----+------------+
> > |   |      | ``mask`` | N/A | VF         |
> > |   |      |          |     | (optional) |
> > +---+------+----------+-----+------------+
> > 
> > --
> > Adrien Mazarguil
> > 6WIND

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-05 18:16  2% [dpdk-dev] [RFC] Generic flow director/filtering/classification API Adrien Mazarguil
  2016-07-07  7:14  0% ` Lu, Wenzhuo
  2016-07-07 23:15  0% ` Chandran, Sugesh
@ 2016-07-08 11:11  0% ` Liang, Cunming
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Liang, Cunming @ 2016-07-08 11:11 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu, Jan Medala,
	John Daley, Jing Chen, Konstantin Ananyev, Matej Vido,
	Alejandro Lucero, Sony Chacko, Jerin Jacob, Pablo de Lara,
	Olga Shern

Hi Adrien,

On 7/6/2016 2:16 AM, Adrien Mazarguil wrote:
> Hi All,
>
> First, forgive me for this large message, I know our mailboxes already
> suffer quite a bit from the amount of traffic on this ML.
>
> This is not exactly yet another thread about how flow director should be
> extended, rather about a brand new API to handle filtering and
> classification for incoming packets in the most PMD-generic and
> application-friendly fashion we can come up with. Reasons described below.
>
> I think this topic is important enough to include both the users of this API
> as well as PMD maintainers. So far I have CC'ed librte_ether (especially
> rte_eth_ctrl.h contributors), testpmd and PMD maintainers (with and without
> a .filter_ctrl implementation), but if you know application maintainers
> other than testpmd who use FDIR or might be interested in this discussion,
> feel free to add them.
>
> The issues we found with the current approach are already summarized in the
> following document, but here is a quick summary for TL;DR folks:
>
> - PMDs do not expose a common set of filter types and even when they do,
>    their behavior more or less differs.
>
> - Applications need to determine and adapt to device-specific limitations
>    and quirks on their own, without help from PMDs.
>
> - Writing an application that creates flow rules targeting all devices
>    supported by DPDK is thus difficult, if not impossible.
>
> - The current API has too many unspecified areas (particularly regarding
>    side effects of flow rules) that make PMD implementation tricky.
>
> This RFC API handles everything currently supported by .filter_ctrl, the
> idea being to reimplement all of these to make them fully usable by
> applications in a more generic and well defined fashion. It has a very small
> set of mandatory features and an easy method to let applications probe for
> supported capabilities.
>
> The only downside is more work for the software control side of PMDs because
> they have to adapt to the API instead of the reverse. I think helpers can be
> added to EAL to assist with this.
>
> HTML version:
>
>   https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
>
> PDF version:
>
>   https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
>
> Related draft header file (for reference while reading the specification):
>
>   https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
>
> Git tree for completeness (latest .rst version can be retrieved from here):
>
>   https://github.com/6WIND/rte_flow
>
> What follows is the ReST source of the above, for inline comments and
> discussion. I intend to update that specification accordingly.
>
> ========================
> Generic filter interface
> ========================
>
> .. footer::
>
>     v0.6
>
> .. contents::
> .. sectnum::
> .. raw:: pdf
>
>     PageBreak
>
> Overview
> ========
>
> DPDK provides several competing interfaces added over time to perform packet
> matching and related actions such as filtering and classification.
>
> They must be extended to implement the features supported by newer devices
> in order to expose them to applications, however the current design has
> several drawbacks:
>
> - Complicated filter combinations which have not been hard-coded cannot be
>    expressed.
> - Prone to API/ABI breakage when new features must be added to an existing
>    filter type, which frequently happens.
>
>  From an application point of view:
>
> - Having disparate interfaces, all optional and lacking in features does not
>    make this API easy to use.
> - Seemingly arbitrary built-in limitations of filter types based on the
>    device they were initially designed for.
> - Undefined relationship between different filter types.
> - High complexity, considerable undocumented and/or undefined behavior.
>
> Considering the growing number of devices supported by DPDK, adding a new
> filter type each time a new feature must be implemented is not sustainable
> in the long term. Applications not written to target a specific device
> cannot really benefit from such an API.
>
> For these reasons, this document defines an extensible unified API that
> encompasses and supersedes these legacy filter types.
>
> .. raw:: pdf
>
>     PageBreak
>
> Current API
> ===========
>
> Rationale
> ---------
>
> The reason several competing (and mostly overlapping) filtering APIs are
> present in DPDK is due to its nature as a thin layer between hardware and
> software.
>
> Each subsequent interface has been added to better match the capabilities
> and limitations of the latest supported device, which usually happened to
> need an incompatible configuration approach. Because of this, many ended up
> device-centric and not usable by applications that were not written for that
> particular device.
>
> This document is not the first attempt to address this proliferation issue,
> in fact a lot of work has already been done both to create a more generic
> interface while somewhat keeping compatibility with legacy ones through a
> common call interface (``rte_eth_dev_filter_ctrl()`` with the
> ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).
>
> Today, these previously incompatible interfaces are known as filter types
> (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).
>
> However while trivial to extend with new types, it only shifted the
> underlying problem as applications still need to be written for one kind of
> filter type, which, as described in the following sections, is not
> necessarily implemented by all PMDs that support filtering.
>
> .. raw:: pdf
>
>     PageBreak
>
> Filter types
> ------------
>
> This section summarizes the capabilities of each filter type.
>
> Although the following list is exhaustive, the description of individual
> types may contain inaccuracies due to the lack of documentation or usage
> examples.
>
> Note: names are prefixed with ``RTE_ETH_FILTER_``.
>
> ``MACVLAN``
> ~~~~~~~~~~~
>
> Matching:
>
> - L2 source/destination addresses.
> - Optional 802.1Q VLAN ID.
> - Masking individual fields on a rule basis is not supported.
>
> Action:
>
> - Packets are redirected either to a given VF device using its ID or to the
>    PF.
>
> ``ETHERTYPE``
> ~~~~~~~~~~~~~
>
> Matching:
>
> - L2 source/destination addresses (optional).
> - Ethertype (no VLAN ID?).
> - Masking individual fields on a rule basis is not supported.
>
> Action:
>
> - Receive packets on a given queue.
> - Drop packets.
>
> ``FLEXIBLE``
> ~~~~~~~~~~~~
>
> Matching:
>
> - At most 128 consecutive bytes anywhere in packets.
> - Masking is supported with byte granularity.
> - Priorities are supported (relative to this filter type, undefined
>    otherwise).
>
> Action:
>
> - Receive packets on a given queue.
>
> ``SYN``
> ~~~~~~~
>
> Matching:
>
> - TCP SYN packets only.
> - One high priority bit can be set to give the highest possible priority to
>    this type when other filters with different types are configured.
>
> Action:
>
> - Receive packets on a given queue.
>
> ``NTUPLE``
> ~~~~~~~~~~
>
> Matching:
>
> - Source/destination IPv4 addresses (optional in 2-tuple mode).
> - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
> - L4 protocol (2 and 5-tuple modes).
> - Masking individual fields is supported.
> - TCP flags.
> - Up to 7 levels of priority relative to this filter type, undefined
>    otherwise.
> - No IPv6.
>
> Action:
>
> - Receive packets on a given queue.
>
> ``TUNNEL``
> ~~~~~~~~~~
>
> Matching:
>
> - Outer L2 source/destination addresses.
> - Inner L2 source/destination addresses.
> - Inner VLAN ID.
> - IPv4/IPv6 source (destination?) address.
> - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, 802.1BR
>    E-Tag).
> - Tenant ID for tunneling protocols that have one.
> - Any combination of the above can be specified.
> - Masking individual fields on a rule basis is not supported.
>
> Action:
>
> - Receive packets on a given queue.
>
> .. raw:: pdf
>
>     PageBreak
>
> ``FDIR``
> ~~~~~~~~
>
> Queries:
>
> - Device capabilities and limitations.
> - Device statistics about configured filters (resource usage, collisions).
> - Device configuration (matching input set and masks)
>
> Matching:
>
> - Device mode of operation: none (to disable filtering), signature
>    (hash-based dispatching from masked fields) or perfect (either MAC VLAN or
>    tunnel).
> - L2 Ethertype.
> - Outer L2 destination address (MAC VLAN mode).
> - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
>    (tunnel mode).
> - IPv4 source/destination addresses, ToS, TTL and protocol fields.
> - IPv6 source/destination addresses, TC, protocol and hop limits fields.
> - UDP source/destination IPv4/IPv6 and ports.
> - TCP source/destination IPv4/IPv6 and ports.
> - SCTP source/destination IPv4/IPv6, ports and verification tag field.
> - Note, only one protocol type at once (either only L2 Ethertype, basic
>    IPv6, IPv4+UDP, IPv4+TCP and so on).
> - VLAN TCI (extended API).
> - At most 16 bytes to match in payload (extended API). A global device
>    look-up table specifies for each possible protocol layer (unknown, raw,
>    L2, L3, L4) the offset to use for each byte (they do not need to be
>    contiguous) and the related bitmask.
> - Whether packet is addressed to PF or VF, in that case its ID can be
>    matched as well (extended API).
> - Masking most of the above fields is supported, but simultaneously affects
>    all filters configured on a device.
> - Input set can be modified in a similar fashion for a given device to
>    ignore individual fields of filters (i.e. do not match the destination
>    address in a IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
>    macros). Configuring this also affects RSS processing on **i40e**.
> - Filters can also provide 32 bits of arbitrary data to return as part of
>    matched packets.
>
> Action:
>
> - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
> - **RTE_ETH_FDIR_REJECT**: drop packet immediately.
> - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
>    otherwise process it with subsequent filters.
> - For accepted packets and if requested by filter, either 32 bits of
>    arbitrary data and four bytes of matched payload (only in case of flex
>    bytes matching), or eight bytes of matched payload (flex also) are added
>    to meta data.
>
> .. raw:: pdf
>
>     PageBreak
>
> ``HASH``
> ~~~~~~~~
>
> Not an actual filter type. Provides and retrieves the global device
> configuration (per port or entire NIC) for hash functions and their
> properties.
>
> Hash function selection: "default" (keep current), XOR or Toeplitz.
>
> This function can be configured per flow type (**RTE_ETH_FLOW_**
> definitions), supported types are:
>
> - Unknown.
> - Raw.
> - Fragmented or non-fragmented IPv4.
> - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
> - Fragmented or non-fragmented IPv6.
> - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
> - L2 payload.
> - IPv6 with extensions.
> - IPv6 with L4 (TCP, UDP) and extensions.
>
> ``L2_TUNNEL``
> ~~~~~~~~~~~~~
>
> Matching:
>
> - All packets received on a given port.
>
> Action:
>
> - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
>    802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
>    is implemented at the moment).
> - VF ID to use for tag insertion (currently unused).
> - Destination pool for tag based forwarding (pools are IDs that can be
>    assigned to ports; duplication occurs if the same ID is shared by several
>    ports of the same NIC).
>
> .. raw:: pdf
>
>     PageBreak
>
> Driver support
> --------------
>
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> bnx2x
> cxgbe
> e1000            yes       yes      yes yes
> ena
> enic                                                  yes
> fm10k
> i40e     yes     yes                           yes    yes  yes
> ixgbe            yes                yes yes           yes       yes
> mlx4
> mlx5                                                  yes
> szedata2
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
>
> Flow director
> -------------
>
> Flow director (FDIR) is the name of the most capable filter type, which
> covers most features offered by others. As such, it is the most widespread
> in PMDs that support filtering (i.e. all of them besides **e1000**).
>
> It is also the only type that allows an arbitrary 32 bits value provided by
> applications to be attached to a filter and returned with matching packets
> instead of relying on the destination queue to recognize flows.
>
> Unfortunately, even FDIR requires applications to be aware of low-level
> capabilities and limitations (most of which come directly from **ixgbe** and
> **i40e**):
>
> - Bitmasks are set globally per device (port?), not per filter.
> - Configuration state is not expected to be saved by the driver, and
>    stopping/restarting a port requires the application to perform it again
>    (API documentation is also unclear about this).
> - Monolithic approach with ABI issues as soon as a new kind of flow or
>    combination needs to be supported.
> - Cryptic global statistics/counters.
> - Unclear about how priorities are managed; filters seem to be arranged as a
>    linked list in hardware (possibly related to configuration order).
>
> Packet alteration
> -----------------
>
> One interesting feature is that the L2 tunnel filter type implements the
> ability to alter incoming packets through a filter (in this case to
> encapsulate them), thus the **mlx5** flow encap/decap features are not a
> foreign concept.
>
> .. raw:: pdf
>
>     PageBreak
>
> Proposed API
> ============
>
> Terminology
> -----------
>
> - **Filtering API**: overall framework affecting the fate of selected
>    packets, covers everything described in this document.
> - **Matching pattern**: properties to look for in received packets, a
>    combination of any number of items.
> - **Pattern item**: part of a pattern that either matches packet data
>    (protocol header, payload or derived information), or specifies properties
>    of the pattern itself.
> - **Actions**: what needs to be done when a packet matches a pattern.
> - **Flow rule**: this is the result of combining a *matching pattern* with
>    *actions*.
> - **Filter rule**: a less generic term than *flow rule*, can otherwise be
>    used interchangeably.
> - **Hit**: a flow rule is said to be *hit* when processing a matching
>    packet.
>
> Requirements
> ------------
>
> As described in the previous section, there is a growing need for a common
> method to configure filtering and related actions in a hardware independent
> fashion.
>
> The filtering API should not disallow any filter combination by design and
> must remain as simple as possible to use. It can simply be defined as a
> method to perform one or several actions on selected packets.
>
> PMDs are aware of the capabilities of the device they manage and should be
> responsible for preventing unsupported or conflicting combinations.
>
> This approach is fundamentally different as it places most of the burden on
> the software side of the PMD instead of having device capabilities directly
> mapped to API functions, then expecting applications to work around ensuing
> compatibility issues.
>
> Requirements for a new API:
>
> - Flexible and extensible without causing API/ABI problems for existing
>    applications.
> - Should be unambiguous and easy to use.
> - Support existing filtering features and actions listed in `Filter types`_.
> - Support packet alteration.
> - In case of overlapping filters, their priority should be well documented.
> - Support filter queries (for example to retrieve counters).
>
> .. raw:: pdf
>
>     PageBreak
>
> High level design
> -----------------
>
> The chosen approach to make filtering as generic as possible is by
> expressing matching patterns through lists of items instead of the flat
> structures used in DPDK today, enabling combinations that are not predefined
> and thus being more versatile.
>
> Flow rules can have several distinct actions (such as counting,
> encapsulating, decapsulating before redirecting packets to a particular
> queue, etc.), instead of relying on several rules to achieve this and having
> applications deal with hardware implementation details regarding their
> order.
>
> Support for different priority levels on a rule basis is provided, for
> example in order to force a more specific rule to come before a more generic
> one for packets matched by both, however hardware support for more than a
> single priority level cannot be guaranteed. When supported, the number of
> available priority levels is usually low, which is why they can also be
> implemented in software by PMDs (e.g. to simulate missing priority levels by
> reordering rules).
>
> In order to remain as hardware agnostic as possible, by default all rules
> are considered to have the same priority, which means that the order between
> overlapping rules (when a packet is matched by several filters) is
> undefined, packet duplication may even occur as a result.
>
> PMDs may refuse to create overlapping rules at a given priority level when
> they can be detected (e.g. if a pattern matches an existing filter).
>
> Thus predictable results for a given priority level can only be achieved
> with non-overlapping rules, using perfect matching on all protocol layers.
>
> Support for multiple actions per rule may be implemented internally on top
> of non-default hardware priorities, as a result both features may not be
> simultaneously available to applications.
>
> Considering that allowed pattern/actions combinations cannot be known in
> advance and would result in an impractically large number of capabilities to
> expose, a method is provided to validate a given rule from the current
> device configuration state without actually adding it (akin to a "dry run"
> mode).
>
> This enables applications to check if the rule types they need are supported
> at initialization time, before starting their data path. This method can be
> used anytime, its only requirement being that the resources needed by a rule
> must exist (e.g. a target RX queue must be configured first).
>
> Each defined rule is associated with an opaque handle managed by the PMD,
> applications are responsible for keeping it. These can be used for queries
> and rules management, such as retrieving counters or other data and
> destroying them.
>
> Handles must be destroyed before releasing associated resources such as
> queues.
>
> Integration
> -----------
>
> To avoid ABI breakage, this new interface will be implemented through the
> existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
> **RTE_ETH_FILTER_GENERIC** as a new filter type.
>
> However a public front-end API described in `Rules management`_ will
> be added as the preferred method to use it.
>
> Once discussions with the community have converged to a definite API, legacy
> filter types should be deprecated and a deadline defined to remove their
> support entirely.
>
> PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
> drop filtering support entirely. Less maintained PMDs for older hardware may
> lose support at this point.
>
> The notion of filter type will then be deprecated and subsequently dropped
> to avoid confusion between both frameworks.
>
> Implementation details
> ======================
>
> Flow rule
> ---------
>
> A flow rule is the combination of a matching pattern with a list of actions,
> and is the basis of this API.
>
> Priorities
> ~~~~~~~~~~
>
> A priority can be assigned to a matching pattern.
>
> The default priority level is 0 and is also the highest. Support for more
> than a single priority level in hardware is not guaranteed.
>
> If a packet is matched by several filters at a given priority level, the
> outcome is undefined. It can take any path and can even be duplicated.
>
> Matching pattern
> ~~~~~~~~~~~~~~~~
>
> A matching pattern comprises any number of items of various types.
>
> Items are arranged in a list to form a matching pattern for packets. They
> fall in two categories:
>
> - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, VXLAN and so
>    on), usually associated with a specification structure. These must be
>    stacked in the same order as the protocol layers to match, starting from
>    L2.
>
> - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF,
>    SIGNATURE and so on), often without a specification structure. Since they
>    are meta data that does not match packet contents, these can be specified
>    anywhere within item lists without affecting the protocol matching items.
>
> Most item specifications can be optionally paired with a mask to narrow the
> specific fields or bits to be matched.
>
> - Items are defined with ``struct rte_flow_item``.
> - Patterns are defined with ``struct rte_flow_pattern``.
>
> Example of an item specification matching an Ethernet header:
>
> +-----------------------------------------+
> | Ethernet                                |
> +==========+=========+====================+
> | ``spec`` | ``src`` | ``00:01:02:03:04`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:2a:66:00:01`` |
> +----------+---------+--------------------+
> | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:00:00:00:ff`` |
> +----------+---------+--------------------+
>
> Non-masked bits stand for any value, Ethernet headers with the following
> properties are thus matched:
>
> - ``src``: ``??:01:02:03:??``
> - ``dst``: ``??:??:??:??:01``
>
> Except for meta types that do not need one, ``spec`` must be a valid pointer
> to a structure of the related item type. A ``mask`` of the same type can be
> provided to tell which bits in ``spec`` are to be matched.
>
> A mask is normally only needed for ``spec`` fields matching packet data,
> ignored otherwise. See individual item types for more information.
>
> A ``NULL`` mask pointer is allowed and is similar to providing a full mask
> (all ones) for the ``spec`` fields supported by hardware; the remaining
> fields are ignored (all zeroes), and there is thus no error checking for
> unsupported fields.
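
For instance, the Ethernet example above might be written as follows (a
sketch only: the field names come from the `ETH`_ section below, while the
{ type, spec, mask } item members and array types are assumptions from the
draft header; the five bytes shown in the example are padded to the six of
a full MAC address):

  struct rte_flow_item_eth spec = {
          .src = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x00 },
          .dst = { 0x00, 0x2a, 0x66, 0x00, 0x01, 0x00 },
  };
  struct rte_flow_item_eth mask = {
          .src = { 0x00, 0xff, 0xff, 0xff, 0x00, 0x00 }, /* bits compared */
          .dst = { 0x00, 0x00, 0x00, 0x00, 0xff, 0x00 },
  };
  struct rte_flow_item item = {
          .type = RTE_FLOW_ITEM_TYPE_ETH,
          .spec = &spec,
          .mask = &mask,
  };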
>
> Matching pattern items for packet data must be naturally stacked (ordered
> from lowest to highest protocol layer), as in the following examples:
>
> +--------------+
> | TCPv4 as L4  |
> +===+==========+
> | 0 | Ethernet |
> +---+----------+
> | 1 | IPv4     |
> +---+----------+
> | 2 | TCP      |
> +---+----------+
>
> +----------------+
> | TCPv6 in VXLAN |
> +===+============+
> | 0 | Ethernet   |
> +---+------------+
> | 1 | IPv4       |
> +---+------------+
> | 2 | UDP        |
> +---+------------+
> | 3 | VXLAN      |
> +---+------------+
> | 4 | Ethernet   |
> +---+------------+
> | 5 | IPv6       |
> +---+------------+
> | 6 | TCP        |
> +---+------------+
>
> +-----------------------------+
> | TCPv4 as L4 with meta items |
> +===+=========================+
> | 0 | VOID                    |
> +---+-------------------------+
> | 1 | Ethernet                |
> +---+-------------------------+
> | 2 | VOID                    |
> +---+-------------------------+
> | 3 | IPv4                    |
> +---+-------------------------+
> | 4 | TCP                     |
> +---+-------------------------+
> | 5 | VOID                    |
> +---+-------------------------+
> | 6 | VOID                    |
> +---+-------------------------+
>
> The above example shows how meta items do not affect packet data matching
> items, as long as those remain stacked properly. The resulting matching
> pattern is identical to "TCPv4 as L4".
>
> +----------------+
> | UDPv6 anywhere |
> +===+============+
> | 0 | IPv6       |
> +---+------------+
> | 1 | UDP        |
> +---+------------+
>
> If supported by the PMD, omitting one or several protocol layers at the
> bottom of the stack as in the above example (missing an Ethernet
> specification) enables hardware to look anywhere in packets.
>
> It is unspecified whether the payload of supported encapsulations
> (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
> inner, outer or both packets.
>
> +---------------------+
> | Invalid, missing L3 |
> +===+=================+
> | 0 | Ethernet        |
> +---+-----------------+
> | 1 | UDP             |
> +---+-----------------+
>
> The above pattern is invalid due to a missing L3 specification between L2
> and L4. It is only allowed at the bottom and at the top of the stack.
>
> Meta item types
> ~~~~~~~~~~~~~~~
>
> These do not match packet data but affect how the pattern is processed, most
> of them do not need a specification structure. This particularity allows
> them to be specified anywhere without affecting other item types.
[LC] For the meta items (END, VOID, INVERT) and some data matching types
like ANY and RAW, is the PMD entirely responsible for understanding the key
characteristics and for parsing the header graph?
>
> ``END``
> ^^^^^^^
>
> End marker for item lists. Prevents further processing of items, thereby
> ending the pattern.
>
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | END                |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> ``VOID``
> ^^^^^^^^
>
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
>
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | VOID               |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> One usage example for this type is generating rules that share a common
> prefix quickly without reallocating memory, only by updating item types:
>
> +------------------------+
> | TCP, UDP or ICMP as L4 |
> +===+====================+
> | 0 | Ethernet           |
> +---+--------------------+
> | 1 | IPv4               |
> +---+------+------+------+
> | 2 | UDP  | VOID | VOID |
> +---+------+------+------+
> | 3 | VOID | TCP  | VOID |
> +---+------+------+------+
> | 4 | VOID | VOID | ICMP |
> +---+------+------+------+
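
A sketch of this technique (the { type, spec, mask } members and type
spellings are assumed from the draft header):

  /* Common prefix allocated once; only the L4 slots change per rule. */
  struct rte_flow_item pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
          { .type = RTE_FLOW_ITEM_TYPE_UDP },   /* slot 2 */
          { .type = RTE_FLOW_ITEM_TYPE_VOID },  /* slot 3 */
          { .type = RTE_FLOW_ITEM_TYPE_VOID },  /* slot 4 */
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };

  /* Switch the rule from UDP to TCP without reallocating anything: */
  pattern[2].type = RTE_FLOW_ITEM_TYPE_VOID;
  pattern[3].type = RTE_FLOW_ITEM_TYPE_TCP;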
>
> .. raw:: pdf
>
>     PageBreak
>
> ``INVERT``
> ^^^^^^^^^^
>
> Inverted matching, i.e. process packets that do not match the pattern.
>
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | INVERT             |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> Usage example in order to match non-TCPv4 packets only:
>
> +--------------------+
> | Anything but TCPv4 |
> +===+================+
> | 0 | INVERT         |
> +---+----------------+
> | 1 | Ethernet       |
> +---+----------------+
> | 2 | IPv4           |
> +---+----------------+
> | 3 | TCP            |
> +---+----------------+
>
> ``PF``
> ^^^^^^
>
> Matches packets addressed to the physical function of the device.
>
> - Both ``spec`` and ``mask`` are ignored.
>
> +--------------------+
> | PF                 |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> ``VF``
> ^^^^^^
>
> Matches packets addressed to the given virtual function ID of the device.
>
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
>
> +----------------------------------------+
> | VF                                     |
> +==========+=========+===================+
> | ``spec`` | ``vf``  | destination VF ID |
> +----------+---------+-------------------+
> | ``mask`` | ignored                     |
> +----------+-----------------------------+
>
> ``SIGNATURE``
> ^^^^^^^^^^^^^
>
> Requests hash-based signature dispatching for this rule.
>
> Considering this is a global setting on devices that support it, all
> subsequent filter rules may have to be created with it as well.
>
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
>
> +--------------------+
> | SIGNATURE          |
> +==========+=========+
> | ``spec`` | TBD     |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
>
> .. raw:: pdf
>
>     PageBreak
>
> Data matching item types
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> Most of these are basically protocol header definitions with associated
> bitmasks. They must be specified (stacked) from lowest to highest protocol
> layer.
>
> The following list is not exhaustive as new protocols will be added in the
> future.
>
> ``ANY``
> ^^^^^^^
>
> Matches any protocol in place of the current layer; a single ANY may also
> stand for several protocol layers.
>
> This is usually specified as the first pattern item when looking for a
> protocol anywhere in a packet.
>
> - A maximum value of **0** requests matching any number of protocol layers
>    above or equal to the minimum value; a maximum value lower than the
>    minimum one is otherwise invalid.
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
>
> +-----------------------------------------------------------------------+
> | ANY                                                                   |
> +==========+=========+==================================================+
> | ``spec`` | ``min`` | minimum number of layers covered                 |
> |          +---------+--------------------------------------------------+
> |          | ``max`` | maximum number of layers covered, 0 for infinity |
> +----------+---------+--------------------------------------------------+
> | ``mask`` | ignored                                                    |
> +----------+------------------------------------------------------------+
>
> Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
> and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
> or IPv6) matched by the second ANY specification:
>
> +----------------------------------+
> | TCP in VXLAN with wildcards      |
> +===+==============================+
> | 0 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 1 | ANY | ``spec`` | ``min`` | 2 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 2 |
> +---+-----+----------+---------+---+
> | 2 | VXLAN                        |
> +---+------------------------------+
> | 3 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 4 | ANY | ``spec`` | ``min`` | 1 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 1 |
> +---+-----+----------+---------+---+
> | 5 | TCP                          |
> +---+------------------------------+
>
> .. raw:: pdf
>
>     PageBreak
>
> ``RAW``
> ^^^^^^^
>
> Matches a string of a given length at a given offset (in bytes), or anywhere
> in the payload of the current protocol layer (including L2 header if used as
> the first item in the stack).
>
> This does not increment the protocol layer count as it is not a protocol
> definition. Subsequent RAW items modulate the first absolute one with
> relative offsets.
>
> - Using **-1** as the ``offset`` of the first RAW item makes its absolute
>    offset not fixed, i.e. the pattern is searched everywhere.
> - ``mask`` only affects the pattern.
The RAW matching type allows an offset & length, which supports anchored
string matching. It is not defined for a user-defined packet layout.
Sometimes, comparing raw payload data after a header requires an
{offset, length} pair. One typical case is 5-tuple matching: the 'PORT' of
the transport layer is at an offset from the IP header. This cannot be
addressed by IP/ANY, as it requires extracting a key from a field inside
ANY.

>
> +--------------------------------------------------------------+
> | RAW                                                          |
> +==========+=============+=====================================+
> | ``spec`` | ``offset``  | absolute or relative pattern offset |
> |          +-------------+-------------------------------------+
> |          | ``length``  | pattern length                      |
> |          +-------------+-------------------------------------+
> |          | ``pattern`` | byte string of the above length     |
> +----------+-------------+-------------------------------------+
> | ``mask`` | ``offset``  | ignored                             |
> |          +-------------+-------------------------------------+
> |          | ``length``  | ignored                             |
> |          +-------------+-------------------------------------+
> |          | ``pattern`` | bitmask with the same byte length   |
> +----------+-------------+-------------------------------------+
>
> Example pattern looking for several strings at various offsets of a UDP
> payload, using combined RAW items:
>
> +------------------------------------------+
> | UDP payload matching                     |
> +===+======================================+
> | 0 | Ethernet                             |
> +---+--------------------------------------+
> | 1 | IPv4                                 |
> +---+--------------------------------------+
> | 2 | UDP                                  |
> +---+-----+----------+-------------+-------+
> | 3 | RAW | ``spec`` | ``offset``  | -1    |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "foo" |
> +---+-----+----------+-------------+-------+
> | 4 | RAW | ``spec`` | ``offset``  | 20    |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "bar" |
> +---+-----+----------+-------------+-------+
> | 5 | RAW | ``spec`` | ``offset``  | -30   |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "baz" |
> +---+-----+----------+-------------+-------+
>
> This translates to:
>
> - Locate "foo" in UDP payload, remember its offset.
> - Check "bar" at "foo"'s offset plus 20 bytes.
> - Check "baz" at "foo"'s offset minus 30 bytes.
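
A sketch of the three RAW items above (the ``offset``, ``length`` and
``pattern`` names follow the RAW table; whether ``pattern`` is a pointer or
an inline array in the draft structure is an assumption):

  struct rte_flow_item_raw foo = {
          .offset = -1,        /* anchor: search the whole payload */
          .length = 3,
          .pattern = "foo",
  };
  struct rte_flow_item_raw bar = {
          .offset = 20,        /* relative to where "foo" matched */
          .length = 3,
          .pattern = "bar",
  };
  struct rte_flow_item_raw baz = {
          .offset = -30,       /* 30 bytes before "foo" */
          .length = 3,
          .pattern = "baz",
  };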
>
> .. raw:: pdf
>
>     PageBreak
>
> ``ETH``
> ^^^^^^^
>
> Matches an Ethernet header.
>
> - ``dst``: destination MAC.
> - ``src``: source MAC.
> - ``type``: EtherType.
> - ``tags``: number of 802.1Q/ad tags defined.
> - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
>
>   - ``tpid``: Tag protocol identifier.
>   - ``tci``: Tag control information.
>
> ``IPV4``
> ^^^^^^^^
>
> Matches an IPv4 header.
>
> - ``src``: source IP address.
> - ``dst``: destination IP address.
> - ``tos``: ToS/DSCP field.
> - ``ttl``: TTL field.
> - ``proto``: protocol number for the next layer.
>
> ``IPV6``
> ^^^^^^^^
>
> Matches an IPv6 header.
>
> - ``src``: source IP address.
> - ``dst``: destination IP address.
> - ``tc``: traffic class field.
> - ``nh``: Next header field (protocol).
> - ``hop_limit``: hop limit field (TTL).
>
> ``ICMP``
> ^^^^^^^^
>
> Matches an ICMP header.
>
> - TBD.
>
> ``UDP``
> ^^^^^^^
>
> Matches a UDP header.
>
> - ``sport``: source port.
> - ``dport``: destination port.
> - ``length``: UDP length.
> - ``checksum``: UDP checksum.
>
> .. raw:: pdf
>
>     PageBreak
>
> ``TCP``
> ^^^^^^^
>
> Matches a TCP header.
>
> - ``sport``: source port.
> - ``dport``: destination port.
> - All other TCP fields and bits.
>
> ``VXLAN``
> ^^^^^^^^^
>
> Matches a VXLAN header.
>
> - TBD.
>
> .. raw:: pdf
>
>     PageBreak
>
> Actions
> ~~~~~~~
>
> Each possible action is represented by a type. Some have associated
> configuration structures. Several actions combined in a list can be attached
> to a flow rule. That list is not ordered.
>
> At least one action must be defined in a filter rule in order to do
> something with matched packets.
>
> - Actions are defined with ``struct rte_flow_action``.
> - A list of actions is defined with ``struct rte_flow_actions``.
>
> They fall in three categories:
>
> - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
>    processing matched packets by subsequent flow rules, unless overridden
>    with PASSTHRU.
>
> - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for
>    additional processing by subsequent flow rules.
>
> - Other non terminating meta actions that do not affect the fate of packets
>    (END, VOID, ID, COUNT).
>
> When several actions are combined in a flow rule, they should all have
> different types (e.g. dropping a packet twice is not possible). However
> considering the VOID type is an exception to this rule, the defined behavior
> is for PMDs to only take into account the last action of a given type found
> in the list. PMDs still perform error checking on the entire list.
>
> *Note that PASSTHRU is the only action able to override a terminating rule.*
[LC] I'm wondering how to address the metadata carried by the mbuf; it is
not mentioned here. For packets that hit one specific flow, there is usually
something for the CPU to identify the flow with. FDIR and RSS, for example,
put an id or key in the mbuf. In addition, some metadata may be pointed to
by the mbuf's userdata field. Any view on it?

>
> .. raw:: pdf
>
>     PageBreak
>
> Example of an action that redirects packets to queue index 10:
>
> +----------------+
> | QUEUE          |
> +===========+====+
> | ``queue`` | 10 |
> +-----------+----+
>
> Action lists examples, their order is not significant, applications must
> consider all actions to be performed simultaneously:
>
> +----------------+
> | Count and drop |
> +=======+========+
> | COUNT |        |
> +-------+--------+
> | DROP  |        |
> +-------+--------+
>
> +--------------------------+
> | Tag, count and redirect  |
> +=======+===========+======+
> | ID    | ``id``    | 0x2a |
> +-------+-----------+------+
> | COUNT |                  |
> +-------+-----------+------+
> | QUEUE | ``queue`` | 10   |
> +-------+-----------+------+
>
> +-----------------------+
> | Redirect to queue 5   |
> +=======+===============+
> | DROP  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
>
> In the above example, considering both actions are performed simultaneously,
> its end result is that only QUEUE has any effect.
>
> +-----------------------+
> | Redirect to queue 3   |
> +=======+===========+===+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
> | VOID  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 3 |
> +-------+-----------+---+
>
> As previously described, only the last action of a given type found in the
> list is taken into account. The above example also shows that VOID is
> ignored.
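
The last list above could be written like this (the action type spellings
and a ``conf``-style configuration pointer are assumptions from the draft
header):

  struct rte_flow_action_queue q5 = { .queue = 5 };
  struct rte_flow_action_queue q3 = { .queue = 3 };
  struct rte_flow_action actions[] = {
          { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &q5 },
          { .type = RTE_FLOW_ACTION_TYPE_VOID },               /* discarded */
          { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &q3 }, /* wins */
          { .type = RTE_FLOW_ACTION_TYPE_END },
  };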
>
> .. raw:: pdf
>
>     PageBreak
>
> Action types
> ~~~~~~~~~~~~
>
> Common action types are described in this section. Like pattern item types,
> this list is not exhaustive as new actions will be added in the future.
>
> ``END`` (action)
> ^^^^^^^^^^^^^^^^
>
> End marker for action lists. Prevents further processing of actions, thereby
> ending the list.
>
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - No configurable property.
>
> +---------------+
> | END           |
> +===============+
> | no properties |
> +---------------+
>
> ``VOID`` (action)
> ^^^^^^^^^^^^^^^^^
>
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
>
> - PMD support is mandatory.
> - No configurable property.
>
> +---------------+
> | VOID          |
> +===============+
> | no properties |
> +---------------+
>
> ``PASSTHRU``
> ^^^^^^^^^^^^
>
> Leaves packets up for additional processing by subsequent flow rules. This
> is the default when a rule does not contain a terminating action, but can be
> specified to force a rule to become non-terminating.
>
> - No configurable property.
>
> +---------------+
> | PASSTHRU      |
> +===============+
> | no properties |
> +---------------+
>
> Example to copy a packet to a queue and continue processing by subsequent
> flow rules:
>
> +--------------------------+
> | Copy to queue 8          |
> +==========+===============+
> | PASSTHRU |               |
> +----------+-----------+---+
> | QUEUE    | ``queue`` | 8 |
> +----------+-----------+---+
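
A sketch of that combination (same assumed spellings as in the other
examples in this thread):

  struct rte_flow_action_queue q8 = { .queue = 8 };
  struct rte_flow_action actions[] = {
          { .type = RTE_FLOW_ACTION_TYPE_PASSTHRU },           /* keep going */
          { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &q8 }, /* copy to 8 */
          { .type = RTE_FLOW_ACTION_TYPE_END },
  };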
>
> ``ID``
> ^^^^^^
>
> Attaches a 32 bit value to packets.
>
> +----------------------------------------------+
> | ID                                           |
> +========+=====================================+
> | ``id`` | 32 bit value to return with packets |
> +--------+-------------------------------------+
>
> .. raw:: pdf
>
>     PageBreak
>
> ``QUEUE``
> ^^^^^^^^^
>
> Assigns packets to a given queue index.
>
> - Terminating by default.
>
> +--------------------------------+
> | QUEUE                          |
> +===========+====================+
> | ``queue`` | queue index to use |
> +-----------+--------------------+
>
> ``DROP``
> ^^^^^^^^
>
> Drop packets.
>
> - No configurable property.
> - Terminating by default.
> - PASSTHRU overrides this action if both are specified.
>
> +---------------+
> | DROP          |
> +===============+
> | no properties |
> +---------------+
>
> ``COUNT``
> ^^^^^^^^^
>
> Enables hits counter for this rule.
>
> This counter can be retrieved and reset through ``rte_flow_query()``, see
> ``struct rte_flow_query_count``.
>
> - Counters can be retrieved with ``rte_flow_query()``.
> - No configurable property.
>
> +---------------+
> | COUNT         |
> +===============+
> | no properties |
> +---------------+
>
> Query structure to retrieve and reset the flow rule hits counter:
>
> +------------------------------------------------+
> | COUNT query                                    |
> +===========+=====+==============================+
> | ``reset`` | in  | reset counter after query    |
> +-----------+-----+------------------------------+
> | ``hits``  | out | number of hits for this flow |
> +-----------+-----+------------------------------+
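
``rte_flow_query()`` is mentioned here but its prototype is not quoted in
this message; assuming it takes the port ID, the flow handle, the action
type to query and a result pointer, usage could look like this sketch:

  struct rte_flow_query_count count = { .reset = 1 }; /* reset after read */

  /* Hypothetical prototype, see the draft rte_flow.h for the real one. */
  if (rte_flow_query(port_id, flow, RTE_FLOW_ACTION_TYPE_COUNT, &count) == 0)
          printf("flow hits: %" PRIu64 "\n", (uint64_t)count.hits);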
>
> ``DUP``
> ^^^^^^^
>
> Duplicates packets to a given queue index.
>
> This is normally combined with QUEUE, however when used alone, it is
> actually similar to QUEUE + PASSTHRU.
>
> - Non-terminating by default.
>
> +------------------------------------------------+
> | DUP                                            |
> +===========+====================================+
> | ``queue`` | queue index to duplicate packet to |
> +-----------+------------------------------------+
>
> .. raw:: pdf
>
>     PageBreak
>
> ``RSS``
> ^^^^^^^
>
> Similar to QUEUE, except RSS is additionally performed on packets to spread
> them among several queues according to the provided parameters.
>
> - Terminating by default.
>
> +---------------------------------------------+
> | RSS                                         |
> +==============+==============================+
> | ``rss_conf`` | RSS parameters               |
> +--------------+------------------------------+
> | ``queues``   | number of entries in queue[] |
> +--------------+------------------------------+
> | ``queue[]``  | queue indices to use         |
> +--------------+------------------------------+
>
> ``PF`` (action)
> ^^^^^^^^^^^^^^^
>
> Redirects packets to the physical function (PF) of the current device.
>
> - No configurable property.
> - Terminating by default.
>
> +---------------+
> | PF            |
> +===============+
> | no properties |
> +---------------+
>
> ``VF`` (action)
> ^^^^^^^^^^^^^^^
>
> Redirects packets to the virtual function (VF) of the current device with
> the specified ID.
>
> - Terminating by default.
>
> +---------------------------------------+
> | VF                                    |
> +========+==============================+
> | ``id`` | VF ID to redirect packets to |
> +--------+------------------------------+
>
> Planned types
> ~~~~~~~~~~~~~
>
> Other action types are planned but not defined yet. These actions will add
> the ability to alter matching packets in several ways, such as performing
> encapsulation/decapsulation of tunnel headers on specific flows.
>
> .. raw:: pdf
>
>     PageBreak
>
> Rules management
> ----------------
>
> A simple API with only four functions is provided to fully manage flows.
>
> Each created flow rule is associated with an opaque, PMD-specific handle
> pointer. The application is responsible for keeping it until the rule is
> destroyed.
>
> Flow rules are defined with ``struct rte_flow``.
>
> Validation
> ~~~~~~~~~~
>
> Given that expressing a definite set of device capabilities with this API is
> not practical, a dedicated function is provided to check if a flow rule is
> supported and can be created.
>
> ::
>
>   int
>   rte_flow_validate(uint8_t port_id,
>                     const struct rte_flow_pattern *pattern,
>                     const struct rte_flow_actions *actions);
>
> While this function has no effect on the target device, the flow rule is
> validated against its current configuration state and the returned value
> should be considered valid by the caller for that state only.
>
> The returned value is guaranteed to remain valid only as long as no
> successful calls to rte_flow_create() or rte_flow_destroy() are made in the
> meantime and no device parameters affecting flow rules in any way are
> modified, due to possible collisions or resource limitations (although in
> such cases ``EINVAL`` should not be returned).
>
> Arguments:
>
> - ``port_id``: port identifier of Ethernet device.
> - ``pattern``: pattern specification to check.
> - ``actions``: actions associated with the flow definition.
>
> Return value:
>
> - **0** if flow rule is valid and can be created. A negative errno value
>    otherwise (``rte_errno`` is also set); the following errors are defined:
> - ``-EINVAL``: unknown or invalid rule specification.
> - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial masks
>    are unsupported).
> - ``-EEXIST``: collision with an existing rule.
> - ``-ENOMEM``: not enough resources.
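> 
> As an illustration only, a minimal calling sketch (assuming the application
> has already built ``pattern`` and ``actions`` objects; ``use_sw_fallback``
> is a hypothetical application flag, not part of this API)::
> 
>   /* Probe rule support at initialization time, before starting the
>    * data path. */
>   int ret = rte_flow_validate(port_id, &pattern, &actions);
> 
>   if (ret == -ENOTSUP) {
>       /* Valid rule the device cannot handle: fall back to a software
>        * classification path. */
>       use_sw_fallback = 1;
>   } else if (ret < 0) {
>       rte_exit(EXIT_FAILURE, "bad flow rule: %s\n", strerror(-ret));
>   }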
>
> .. raw:: pdf
>
>     PageBreak
>
> Creation
> ~~~~~~~~
>
> Creating a flow rule is similar to validating one, except the rule is
> actually created.
>
> ::
>
>   struct rte_flow *
>   rte_flow_create(uint8_t port_id,
>                   const struct rte_flow_pattern *pattern,
>                   const struct rte_flow_actions *actions);
>
> Arguments:
>
> - ``port_id``: port identifier of Ethernet device.
> - ``pattern``: pattern specification to add.
> - ``actions``: actions associated with the flow definition.
>
> Return value:
>
> A valid flow pointer in case of success, NULL otherwise and ``rte_errno`` is
> set to the positive version of one of the error codes defined for
> ``rte_flow_validate()``.
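> 
> A short creation sketch following the validation example (``flows`` and
> ``nb_flows`` are hypothetical application-side storage for the returned
> handles)::
> 
>   struct rte_flow *flow;
> 
>   flow = rte_flow_create(port_id, &pattern, &actions);
>   if (flow == NULL)
>       /* rte_errno holds the positive error code. */
>       printf("flow creation failed: %s\n", strerror(rte_errno));
>   else
>       flows[nb_flows++] = flow; /* keep the handle until destruction */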
>
> Destruction
> ~~~~~~~~~~~
>
> Flow rule destruction is not automatic, and a queue should not be released
> if any are still attached to it. Applications must take care of performing
> this step before releasing resources.
>
> ::
>
>   int
>   rte_flow_destroy(uint8_t port_id,
>                    struct rte_flow *flow);
>
>
> Failure to destroy a flow rule may occur when other flow rules depend on it,
> and destroying it would result in an inconsistent state.
>
> This function is only guaranteed to succeed if flow rules are destroyed in
> reverse order of their creation.
>
> Arguments:
>
> - ``port_id``: port identifier of Ethernet device.
> - ``flow``: flow rule to destroy.
>
> Return value:
>
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>    set.
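> 
> A teardown sketch honoring the reverse-order guarantee above (``flows`` and
> ``nb_flows`` are the hypothetical storage from the creation example)::
> 
>   /* Destroy rules in reverse order of creation, before releasing the
>    * queues and other resources they are attached to. */
>   while (nb_flows > 0)
>       if (rte_flow_destroy(port_id, flows[--nb_flows]) < 0)
>           printf("cannot destroy flow: %s\n", strerror(rte_errno));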
>
> .. raw:: pdf
>
>     PageBreak
>
> Query
> ~~~~~
>
> Query an existing flow rule.
>
> This function allows retrieving flow-specific data such as counters. Data
> is gathered by special actions which must be present in the flow rule
> definition.
>
> ::
>
>   int
>   rte_flow_query(uint8_t port_id,
>                  struct rte_flow *flow,
>                  enum rte_flow_action_type action,
>                  void *data);
>
> Arguments:
>
> - ``port_id``: port identifier of Ethernet device.
> - ``flow``: flow rule to query.
> - ``action``: action type to query.
> - ``data``: pointer to storage for the associated query data type.
>
> Return value:
>
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>    set.
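> 
> For example, reading the hits counter gathered by a `COUNT`_ action (the
> ``RTE_FLOW_ACTION_TYPE_COUNT`` enumerator name and the width of ``hits``
> are assumptions about the draft header)::
> 
>   struct rte_flow_query_count count = { .reset = 1 };
> 
>   /* Retrieve and reset the counter of the COUNT action in this rule. */
>   if (rte_flow_query(port_id, flow, RTE_FLOW_ACTION_TYPE_COUNT, &count) == 0)
>       printf("hits: %llu\n", (unsigned long long)count.hits);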
>
> .. raw:: pdf
>
>     PageBreak
>
> Behavior
> --------
>
> - API operations are synchronous and blocking (``EAGAIN`` cannot be
>    returned).
>
> - There is no provision for reentrancy/multi-thread safety, although nothing
>    should prevent different devices from being configured at the same
>    time. PMDs may protect their control path functions accordingly.
>
> - Stopping the data path (TX/RX) should not be necessary when managing flow
>    rules. If this cannot be achieved naturally or with workarounds (such as
>    temporarily replacing the burst function pointers), an appropriate error
>    code must be returned (``EBUSY``).
>
> - PMDs, not applications, are responsible for maintaining flow rules
>    configuration when stopping and restarting a port or performing other
>    actions which may affect them. They can only be destroyed explicitly.
>
> .. raw:: pdf
>
>     PageBreak
>
> Compatibility
> -------------
>
> No known hardware implementation supports all the features described in this
> document.
>
> Unsupported features or combinations are not expected to be fully emulated
> in software by PMDs for performance reasons. Partially supported features
> may be completed in software as long as hardware performs most of the work
> (such as queue redirection and packet recognition).
>
> However, PMDs are expected to do their best to satisfy application requests
> by working around hardware limitations as long as doing so does not affect
> the behavior of existing flow rules.
>
> The following sections provide a few examples of such cases, they are based
> on limitations built into the previous APIs.
>
> Global bitmasks
> ~~~~~~~~~~~~~~~
>
> Each flow rule comes with its own, per-layer bitmasks, while hardware may
> support only a single, device-wide bitmask for a given layer type, so that
> two IPv4 rules cannot use different bitmasks.
>
> The expected behavior in this case is that PMDs automatically configure
> global bitmasks according to the needs of the first created flow rule.
>
> Subsequent rules are allowed only if their bitmasks match those; the
> ``EEXIST`` error code should be returned otherwise.
>
> Unsupported layer types
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> Many protocols can be simulated by crafting patterns with the `RAW`_ type.
>
> PMDs can rely on this capability to simulate support for protocols with
> fixed headers not directly recognized by hardware.
>
> ``ANY`` pattern item
> ~~~~~~~~~~~~~~~~~~~~
>
> This pattern item stands for anything, which can be difficult to translate
> to something hardware would understand, particularly if followed by more
> specific types.
>
> Consider the following pattern:
>
> +---+--------------------------------+
> | 0 | ETHER                          |
> +---+--------------------------------+
> | 1 | ANY (``min`` = 1, ``max`` = 1) |
> +---+--------------------------------+
> | 2 | TCP                            |
> +---+--------------------------------+
>
> Knowing that TCP does not make sense with anything other than IPv4 or IPv6
> as L3, such a pattern may be translated to two flow rules instead:
>
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV4 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
>
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV6 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
>
> Note that as soon as an ANY rule covers several layers, this approach may
> yield a large number of hidden flow rules. It is thus suggested to only
> support the most common scenarios (anything as L2 and/or L3).
>
> .. raw:: pdf
>
>     PageBreak
>
> Unsupported actions
> ~~~~~~~~~~~~~~~~~~~
>
> - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
>    tagging (`ID`_) may be implemented in software as long as the target queue
>    is used by a single rule.
>
> - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
>    rules combining `QUEUE`_ and `PASSTHRU`_.
>
> - When a single target queue is provided, `RSS`_ can also be implemented
>    through `QUEUE`_.
>
> Flow rules priority
> ~~~~~~~~~~~~~~~~~~~
>
> While it would naturally make sense, flow rules cannot be assumed to be
> processed by hardware in the same order as their creation for several
> reasons:
>
> - They may be managed internally as a tree or a hash table instead of a
>    list.
> - Removing a flow rule before adding another one can either put the new rule
>    at the end of the list or reuse a freed entry.
> - Duplication may occur when packets are matched by several rules.
>
> For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> predictable behavior is only guaranteed by using different priority levels.
>
> Priority levels are not necessarily implemented in hardware, or may be
> severely limited (e.g. a single priority bit).
>
> For these reasons, priority levels may be implemented purely in software by
> PMDs.
>
> - For devices expecting flow rules to be added in the correct order, PMDs
>    may destroy and re-create existing rules after adding a new one with
>    a higher priority.
>
> - A configurable number of dummy or empty rules can be created at
>    initialization time to save high priority slots for later.
>
> - In order to save priority levels, PMDs may evaluate whether rules are
>    likely to collide and adjust their priority accordingly.
>
> .. raw:: pdf
>
>     PageBreak
>
> API migration
> =============
>
> Exhaustive list of deprecated filter types and how to convert them to
> generic flow rules.
>
> ``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
> ---------------------------------------
>
> `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
> (action)`_ or `PF (action)`_ terminating action.
>
> +------------------------------------+
> | MACVLAN                            |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | VF,     |
> |   |     +----------+-----+ PF      |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
>
> ``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
> ----------------------------------------------
>
> `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
> a terminating action.
>
> +------------------------------------+
> | ETHERTYPE                          |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | QUEUE,  |
> |   |     +----------+-----+ DROP    |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
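> 
> A rough conversion sketch; except for the documented ``ETH`` fields and the
> ``spec``/``mask`` pointers, structure and enumerator names below are
> assumptions about the draft header::
> 
>   /* Legacy ETHERTYPE filter matching IPv4 (0x0800) frames, expressed
>    * as an ETH pattern item; a QUEUE or DROP action completes the rule. */
>   struct rte_flow_item_eth eth_spec = {
>       .type = rte_cpu_to_be_16(0x0800),
>   };
>   struct rte_flow_item_eth eth_mask = {
>       .type = 0xffff, /* match the EtherType field only */
>   };
>   struct rte_flow_item item = {
>       .type = RTE_FLOW_ITEM_TYPE_ETH,
>       .spec = &eth_spec,
>       .mask = &eth_mask,
>   };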
>
> ``FLEXIBLE`` to ``RAW`` → ``QUEUE``
> -----------------------------------
>
> `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
> terminating action and a defined priority level.
>
> +------------------------------------+
> | FLEXIBLE                           |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | RAW | ``spec`` | any | QUEUE   |
> |   |     +----------+-----+         |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
>
> ``SYN`` to ``TCP`` → ``QUEUE``
> ------------------------------
>
> `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
> `QUEUE`_ as the terminating action.
>
> Priority level can be set to simulate the high priority bit.
>
> +---------------------------------------------+
> | SYN                                         |
> +-----------------------------------+---------+
> | Pattern                           | Actions |
> +===+======+==========+=============+=========+
> | 0 | ETH  | ``spec`` | N/A         | QUEUE   |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 1 | IPV4 | ``spec`` | N/A         |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | ``syn`` = 1 |         |
> +---+------+----------+-------------+---------+
>
> ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
> ----------------------------------------------------
>
> `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
> `UDP`_ as L4 and `QUEUE`_ as the terminating action.
>
> A priority level can be specified as well.
>
> +---------------------------------------+
> | NTUPLE                                |
> +-----------------------------+---------+
> | Pattern                     | Actions |
> +===+======+==========+=======+=========+
> | 0 | ETH  | ``spec`` | N/A   | QUEUE   |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | empty |         |
> +---+------+----------+-------+         |
> | 1 | IPV4 | ``spec`` | any   |         |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+         |
> | 2 | TCP, | ``spec`` | any   |         |
> |   | UDP  +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+---------+
>
> ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
> ---------------------------------------------------------------------------
>
> `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.
>
> In the following table, `ANY`_ is used to cover the optional L4.
>
> +------------------------------------------------+
> | TUNNEL                                         |
> +--------------------------------------+---------+
> | Pattern                              | Actions |
> +===+=========+==========+=============+=========+
> | 0 | ETH     | ``spec`` | any         | QUEUE   |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 1 | IPV4,   | ``spec`` | any         |         |
> |   | IPV6    +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 2 | ANY     | ``spec`` | ``min`` = 0 |         |
> |   |         |          +-------------+         |
> |   |         |          | ``max`` = 0 |         |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | N/A         |         |
> +---+---------+----------+-------------+         |
> | 3 | VXLAN,  | ``spec`` | any         |         |
> |   | GENEVE, +----------+-------------+         |
> |   | TEREDO, | ``mask`` | any         |         |
> |   | NVGRE,  |          |             |         |
> |   | GRE,    |          |             |         |
> |   | ...     |          |             |         |
> +---+---------+----------+-------------+---------+
>
> .. raw:: pdf
>
>     PageBreak
>
> ``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
> ---------------------------------------------------------------
>
> `FDIR`_ is more complex than any other type; there are several methods to
> emulate its functionality. It is summarized for the most part in the table
> below.
>
> A few features are intentionally not supported:
>
> - The ability to configure the matching input set and masks for the entire
>    device; PMDs should take care of this automatically according to flow rules.
>
> - Returning four or eight bytes of matched data when using flex bytes
>    filtering. Although a specific action could implement it, it conflicts
>    with the much more useful 32-bit tagging on devices that support it.
>
> - Side effects on RSS processing of the entire device. Flow rules that
>    conflict with the current device configuration should not be
>    allowed. Similarly, device configuration should not be allowed when it
>    affects existing flow rules.
>
> - Device modes of operation. "none" is unsupported since filtering cannot be
>    disabled as long as a flow rule is present.
>
> - "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
>    according to the created flow rules.
>
> +----------------------------------------------+
> | FDIR                                         |
> +---------------------------------+------------+
> | Pattern                         | Actions    |
> +===+============+==========+=====+============+
> | 0 | ETH,       | ``spec`` | any | QUEUE,     |
> |   | RAW        +----------+-----+ DROP,      |
> |   |            | ``mask`` | any | PASSTHRU   |
> +---+------------+----------+-----+------------+
> | 1 | IPV4,      | ``spec`` | any | ID         |
> |   | IPV6       +----------+-----+ (optional) |
> |   |            | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 2 | TCP,       | ``spec`` | any |            |
> |   | UDP,       +----------+-----+            |
> |   | SCTP       | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 3 | VF,        | ``spec`` | any |            |
> |   | PF,        +----------+-----+            |
> |   | SIGNATURE  | ``mask`` | any |            |
> |   | (optional) |          |     |            |
> +---+------------+----------+-----+------------+
>
> ``HASH``
> ~~~~~~~~
>
> Hashing configuration is set per rule through the `SIGNATURE`_ item.
>
> Since it is usually a global device setting, all flow rules created with
> this item may have to share the same specification.
>
> ``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> All packets are matched. This type alters incoming packets to encapsulate
> them in a chosen tunnel type, optionally redirecting them to a VF as well.
>
> The destination pool for tag based forwarding can be emulated with other
> flow rules using `DUP`_ as the action.
>
> +----------------------------------------+
> | L2_TUNNEL                              |
> +---------------------------+------------+
> | Pattern                   | Actions    |
> +===+======+==========+=====+============+
> | 0 | VOID | ``spec`` | N/A | VXLAN,     |
> |   |      |          |     | GENEVE,    |
> |   |      |          |     | ...        |
> |   |      +----------+-----+------------+
> |   |      | ``mask`` | N/A | VF         |
> |   |      |          |     | (optional) |
> +---+------+----------+-----+------------+
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 10/10] maintainers: add section for pmdinfo
  @ 2016-07-08 10:14  4%   ` Thomas Monjalon
    1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-08 10:14 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

The author of this feature is Neil Horman.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
---
 MAINTAINERS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a59191e..f996c2e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -68,6 +68,10 @@ F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: scripts/validate-abi.sh
 
+Driver information
+F: buildtools/pmdinfogen/
+F: tools/pmdinfo.py
+
 
 Environment Abstraction Layer
 -----------------------------
-- 
2.7.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-05 18:16  2% [dpdk-dev] [RFC] Generic flow director/filtering/classification API Adrien Mazarguil
  2016-07-07  7:14  0% ` Lu, Wenzhuo
@ 2016-07-07 23:15  0% ` Chandran, Sugesh
  2016-07-08 13:03  0%   ` Adrien Mazarguil
  2016-07-08 11:11  0% ` Liang, Cunming
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Chandran, Sugesh @ 2016-07-07 23:15 UTC (permalink / raw)
  To: Adrien Mazarguil, dev
  Cc: Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Lu, Wenzhuo, Jan Medala,
	John Daley, Chen, Jing D, Ananyev, Konstantin, Matej Vido,
	Alejandro Lucero, Sony Chacko, Jerin Jacob, De Lara Guarch,
	Pablo, Olga Shern

Hi Adrien,

Thank you for proposing this. It would be really useful for applications such as OVS-DPDK.
Please find my comments and questions inline below prefixed with [Sugesh]. Most of them are from the perspective of enabling these APIs in applications such as OVS-DPDK.

Regards
_Sugesh


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Tuesday, July 5, 2016 7:17 PM
> To: dev@dpdk.org
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Zhang, Helin
> <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Rasesh
> Mody <rasesh.mody@qlogic.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Rahul Lakkireddy
> <rahul.lakkireddy@chelsio.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>;
> Jan Medala <jan@semihalf.com>; John Daley <johndale@cisco.com>; Chen,
> Jing D <jing.d.chen@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Matej Vido <matejvido@gmail.com>;
> Alejandro Lucero <alejandro.lucero@netronome.com>; Sony Chacko
> <sony.chacko@qlogic.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Olga Shern <olgas@mellanox.com>
> Subject: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
> 
> Hi All,
> 
> First, forgive me for this large message, I know our mailboxes already
> suffer quite a bit from the amount of traffic on this ML.
> 
> This is not exactly yet another thread about how flow director should be
> extended, rather about a brand new API to handle filtering and
> classification for incoming packets in the most PMD-generic and
> application-friendly fashion we can come up with. Reasons described below.
> 
> I think this topic is important enough to include both the users of this API
> as well as PMD maintainers. So far I have CC'ed librte_ether (especially
> rte_eth_ctrl.h contributors), testpmd and PMD maintainers (with and without
> a .filter_ctrl implementation), but if you know application maintainers
> other than testpmd who use FDIR or might be interested in this discussion,
> feel free to add them.
> 
> The issues we found with the current approach are already summarized in the
> following document, but here is a quick summary for TL;DR folks:
> 
> - PMDs do not expose a common set of filter types and even when they do,
>   their behavior more or less differs.
> 
> - Applications need to determine and adapt to device-specific limitations
>   and quirks on their own, without help from PMDs.
> 
> - Writing an application that creates flow rules targeting all devices
>   supported by DPDK is thus difficult, if not impossible.
> 
> - The current API has too many unspecified areas (particularly regarding
>   side effects of flow rules) that make PMD implementation tricky.
> 
> This RFC API handles everything currently supported by .filter_ctrl, the
> idea being to reimplement all of these to make them fully usable by
> applications in a more generic and well defined fashion. It has a very small
> set of mandatory features and an easy method to let applications probe for
> supported capabilities.
> 
> The only downside is more work for the software control side of PMDs because
> they have to adapt to the API instead of the reverse. I think helpers can be
> added to EAL to assist with this.
> 
> HTML version:
> 
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
> 
> PDF version:
> 
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
> 
> Related draft header file (for reference while reading the specification):
> 
>  https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
> 
> Git tree for completeness (latest .rst version can be retrieved from here):
> 
>  https://github.com/6WIND/rte_flow
> 
> What follows is the ReST source of the above, for inline comments and
> discussion. I intend to update that specification accordingly.
> 
> ========================
> Generic filter interface
> ========================
> 
> .. footer::
> 
>    v0.6
> 
> .. contents::
> .. sectnum::
> .. raw:: pdf
> 
>    PageBreak
> 
> Overview
> ========
> 
> DPDK provides several competing interfaces added over time to perform packet
> matching and related actions such as filtering and classification.
> 
> They must be extended to implement the features supported by newer devices
> in order to expose them to applications; however, the current design has
> several drawbacks:
> 
> - Complicated filter combinations which have not been hard-coded cannot be
>   expressed.
> - Prone to API/ABI breakage when new features must be added to an existing
>   filter type, which frequently happens.
> 
> From an application point of view:
> 
> - Having disparate interfaces, all optional and lacking in features does not
>   make this API easy to use.
> - Seemingly arbitrary built-in limitations of filter types based on the
>   device they were initially designed for.
> - Undefined relationship between different filter types.
> - High complexity, considerable undocumented and/or undefined behavior.
> 
> Considering the growing number of devices supported by DPDK, adding a new
> filter type each time a new feature must be implemented is not sustainable
> in the long term. Applications not written to target a specific device
> cannot really benefit from such an API.
> 
> For these reasons, this document defines an extensible unified API that
> encompasses and supersedes these legacy filter types.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Current API
> ===========
> 
> Rationale
> ---------
> 
> The reason several competing (and mostly overlapping) filtering APIs are
> present in DPDK is due to its nature as a thin layer between hardware and
> software.
> 
> Each subsequent interface has been added to better match the capabilities
> and limitations of the latest supported device, which usually happened to
> need an incompatible configuration approach. Because of this, many ended up
> device-centric and not usable by applications that were not written for that
> particular device.
> 
> This document is not the first attempt to address this proliferation issue;
> in fact, a lot of work has already been done both to create a more generic
> interface while somewhat keeping compatibility with legacy ones through a
> common call interface (``rte_eth_dev_filter_ctrl()`` with the
> ``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).
> 
> Today, these previously incompatible interfaces are known as filter types
> (``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).
> 
> However while trivial to extend with new types, it only shifted the
> underlying problem as applications still need to be written for one kind of
> filter type, which, as described in the following sections, is not
> necessarily implemented by all PMDs that support filtering.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Filter types
> ------------
> 
> This section summarizes the capabilities of each filter type.
> 
> Although the following list is exhaustive, the description of individual
> types may contain inaccuracies due to the lack of documentation or usage
> examples.
> 
> Note: names are prefixed with ``RTE_ETH_FILTER_``.
> 
> ``MACVLAN``
> ~~~~~~~~~~~
> 
> Matching:
> 
> - L2 source/destination addresses.
> - Optional 802.1Q VLAN ID.
> - Masking individual fields on a rule basis is not supported.
> 
> Action:
> 
> - Packets are redirected either to a given VF device using its ID or to the
>   PF.
> 
> ``ETHERTYPE``
> ~~~~~~~~~~~~~
> 
> Matching:
> 
> - L2 source/destination addresses (optional).
> - Ethertype (no VLAN ID?).
> - Masking individual fields on a rule basis is not supported.
> 
> Action:
> 
> - Receive packets on a given queue.
> - Drop packets.
> 
> ``FLEXIBLE``
> ~~~~~~~~~~~~
> 
> Matching:
> 
> - At most 128 consecutive bytes anywhere in packets.
> - Masking is supported with byte granularity.
> - Priorities are supported (relative to this filter type, undefined
>   otherwise).
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> ``SYN``
> ~~~~~~~
> 
> Matching:
> 
> - TCP SYN packets only.
> - One high priority bit can be set to give the highest possible priority to
>   this type when other filters with different types are configured.
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> ``NTUPLE``
> ~~~~~~~~~~
> 
> Matching:
> 
> - Source/destination IPv4 addresses (optional in 2-tuple mode).
> - Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
> - L4 protocol (2 and 5-tuple modes).
> - Masking individual fields is supported.
> - TCP flags.
> - Up to 7 levels of priority relative to this filter type, undefined
>   otherwise.
> - No IPv6.
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> ``TUNNEL``
> ~~~~~~~~~~
> 
> Matching:
> 
> - Outer L2 source/destination addresses.
> - Inner L2 source/destination addresses.
> - Inner VLAN ID.
> - IPv4/IPv6 source (destination?) address.
> - Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
>   802.1BR E-Tag).
> - Tenant ID for tunneling protocols that have one.
> - Any combination of the above can be specified.
> - Masking individual fields on a rule basis is not supported.
> 
> Action:
> 
> - Receive packets on a given queue.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``FDIR``
> ~~~~~~~~
> 
> Queries:
> 
> - Device capabilities and limitations.
> - Device statistics about configured filters (resource usage, collisions).
> - Device configuration (matching input set and masks)
> 
> Matching:
> 
> - Device mode of operation: none (to disable filtering), signature
>   (hash-based dispatching from masked fields) or perfect (either MAC VLAN
>   or tunnel).
> - L2 Ethertype.
> - Outer L2 destination address (MAC VLAN mode).
> - Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
>   (tunnel mode).
> - IPv4 source/destination addresses, ToS, TTL and protocol fields.
> - IPv6 source/destination addresses, TC, protocol and hop limits fields.
> - UDP source/destination IPv4/IPv6 and ports.
> - TCP source/destination IPv4/IPv6 and ports.
> - SCTP source/destination IPv4/IPv6, ports and verification tag field.
> - Note, only one protocol type at once (either only L2 Ethertype, basic
>   IPv6, IPv4+UDP, IPv4+TCP and so on).
> - VLAN TCI (extended API).
> - At most 16 bytes to match in payload (extended API). A global device
>   look-up table specifies for each possible protocol layer (unknown, raw,
>   L2, L3, L4) the offset to use for each byte (they do not need to be
>   contiguous) and the related bitmask.
> - Whether packet is addressed to PF or VF, in that case its ID can be
>   matched as well (extended API).
> - Masking most of the above fields is supported, but simultaneously affects
>   all filters configured on a device.
> - Input set can be modified in a similar fashion for a given device to
>   ignore individual fields of filters (i.e. do not match the destination
>   address in a IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
>   macros). Configuring this also affects RSS processing on **i40e**.
> - Filters can also provide 32 bits of arbitrary data to return as part of
>   matched packets.
> 
> Action:
> 
> - **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
> - **RTE_ETH_FDIR_REJECT**: drop packet immediately.
> - **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
>   otherwise process it with subsequent filters.
> - For accepted packets and if requested by filter, either 32 bits of
>   arbitrary data and four bytes of matched payload (only in case of flex
>   bytes matching), or eight bytes of matched payload (flex also) are added
>   to meta data.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``HASH``
> ~~~~~~~~
> 
> Not an actual filter type. Provides and retrieves the global device
> configuration (per port or entire NIC) for hash functions and their
> properties.
> 
> Hash function selection: "default" (keep current), XOR or Toeplitz.
> 
> This function can be configured per flow type (**RTE_ETH_FLOW_**
> definitions), supported types are:
> 
> - Unknown.
> - Raw.
> - Fragmented or non-fragmented IPv4.
> - Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
> - Fragmented or non-fragmented IPv6.
> - Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
> - L2 payload.
> - IPv6 with extensions.
> - IPv6 with L4 (TCP, UDP) and extensions.
> 
> ``L2_TUNNEL``
> ~~~~~~~~~~~~~
> 
> Matching:
> 
> - All packets received on a given port.
> 
> Action:
> 
> - Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
>   802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
>   is implemented at the moment).
> - VF ID to use for tag insertion (currently unused).
> - Destination pool for tag based forwarding (pools are IDs that can be
>   assigned to ports; duplication occurs if the same ID is shared by several
>   ports of the same NIC).
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Driver support
> --------------
> 
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> bnx2x
> cxgbe
> e1000            yes       yes      yes yes
> ena
> enic                                                  yes
> fm10k
> i40e     yes     yes                           yes    yes  yes
> ixgbe            yes                yes yes           yes       yes
> mlx4
> mlx5                                                  yes
> szedata2
> ======== ======= ========= ======== === ====== ====== ==== ==== =========
> 
> Flow director
> -------------
> 
> Flow director (FDIR) is the name of the most capable filter type, which
> covers most features offered by others. As such, it is the most widespread
> in PMDs that support filtering (i.e. all of them besides **e1000**).
> 
> It is also the only type that allows an arbitrary 32 bits value provided by
> applications to be attached to a filter and returned with matching packets
> instead of relying on the destination queue to recognize flows.
> 
> Unfortunately, even FDIR requires applications to be aware of low-level
> capabilities and limitations (most of which come directly from **ixgbe** and
> **i40e**):
> 
> - Bitmasks are set globally per device (port?), not per filter.
[Sugesh] Does this mean an application cannot define filters that match on arbitrarily different offsets?
If that's the case, I assume the application has to program the bitmask in advance. Otherwise, how does
the API framework deduce this bitmask information from the rules? It's not very clear to me
how an application passes down the bitmask information for multiple filters on the same port.
> - Configuration state is not expected to be saved by the driver, and
>   stopping/restarting a port requires the application to perform it again
>   (API documentation is also unclear about this).
> - Monolithic approach with ABI issues as soon as a new kind of flow or
>   combination needs to be supported.
> - Cryptic global statistics/counters.
> - Unclear about how priorities are managed; filters seem to be arranged as a
>   linked list in hardware (possibly related to configuration order).
> 
> Packet alteration
> -----------------
> 
> One interesting feature is that the L2 tunnel filter type implements the
> ability to alter incoming packets through a filter (in this case to
> encapsulate them), thus the **mlx5** flow encap/decap features are not a
> foreign concept.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Proposed API
> ============
> 
> Terminology
> -----------
> 
> - **Filtering API**: overall framework affecting the fate of selected
>   packets, covers everything described in this document.
> - **Matching pattern**: properties to look for in received packets, a
>   combination of any number of items.
> - **Pattern item**: part of a pattern that either matches packet data
>   (protocol header, payload or derived information), or specifies properties
>   of the pattern itself.
> - **Actions**: what needs to be done when a packet matches a pattern.
> - **Flow rule**: this is the result of combining a *matching pattern* with
>   *actions*.
> - **Filter rule**: a less generic term than *flow rule*, can otherwise be
>   used interchangeably.
> - **Hit**: a flow rule is said to be *hit* when processing a matching
>   packet.
> 
> Requirements
> ------------
> 
> As described in the previous section, there is a growing need for a common
> method to configure filtering and related actions in a hardware independent
> fashion.
> 
> The filtering API should not disallow any filter combination by design and
> must remain as simple as possible to use. It can simply be defined as a
> method to perform one or several actions on selected packets.
> 
> PMDs are aware of the capabilities of the device they manage and should be
> responsible for preventing unsupported or conflicting combinations.
> 
> This approach is fundamentally different as it places most of the burden on
> the software side of the PMD instead of having device capabilities directly
> mapped to API functions, then expecting applications to work around
> ensuing
> compatibility issues.
> 
> Requirements for a new API:
> 
> - Flexible and extensible without causing API/ABI problems for existing
>   applications.
> - Should be unambiguous and easy to use.
> - Support existing filtering features and actions listed in `Filter types`_.
> - Support packet alteration.
> - In case of overlapping filters, their priority should be well documented.
> - Support filter queries (for example to retrieve counters).
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> High level design
> -----------------
> 
> The chosen approach to make filtering as generic as possible is by
> expressing matching patterns through lists of items instead of the flat
> structures used in DPDK today, enabling combinations that are not predefined
> and thus being more versatile.
> 
> Flow rules can have several distinct actions (such as counting,
> encapsulating, decapsulating before redirecting packets to a particular
> queue, etc.), instead of relying on several rules to achieve this and having
> applications deal with hardware implementation details regarding their
> order.
> 
> Support for different priority levels on a rule basis is provided, for
> example in order to force a more specific rule to come before a more generic
> one for packets matched by both; however, hardware support for more than a
> single priority level cannot be guaranteed. When supported, the number of
> available priority levels is usually low, which is why they can also be
> implemented in software by PMDs (e.g. to simulate missing priority levels by
> reordering rules).
> 
> In order to remain as hardware agnostic as possible, by default all rules
> are considered to have the same priority, which means that the order
> between
> overlapping rules (when a packet is matched by several filters) is
> undefined, packet duplication may even occur as a result.
> 
> PMDs may refuse to create overlapping rules at a given priority level when
> they can be detected (e.g. if a pattern matches an existing filter).
> 
> Thus predictable results for a given priority level can only be achieved
> with non-overlapping rules, using perfect matching on all protocol layers.
> 
> Support for multiple actions per rule may be implemented internally on top
> of non-default hardware priorities; as a result, both features may not be
> simultaneously available to applications.
> 
> Considering that allowed pattern/actions combinations cannot be known in
>   advance and would result in an impractically large number of capabilities to
> expose, a method is provided to validate a given rule from the current
> device configuration state without actually adding it (akin to a "dry run"
> mode).
> 
> This enables applications to check if the rule types they need are supported
> at initialization time, before starting their data path. This method can be
> used anytime, its only requirement being that the resources needed by a rule
> must exist (e.g. a target RX queue must be configured first).
> 
> Each defined rule is associated with an opaque handle managed by the PMD;
> applications are responsible for keeping it. These can be used for queries
> and rules management, such as retrieving counters or other data and
> destroying them.
> 
> Handles must be destroyed before releasing associated resources such as
> queues.
> 
> Integration
> -----------
> 
> To avoid ABI breakage, this new interface will be implemented through the
> existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
> **RTE_ETH_FILTER_GENERIC** as a new filter type.
> 
> However a public front-end API described in `Rules management`_ will
> be added as the preferred method to use it.
> 
> Once discussions with the community have converged to a definite API, legacy
> filter types should be deprecated and a deadline defined to remove their
> support entirely.
> 
> PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
> drop filtering support entirely. Less maintained PMDs for older hardware may
> lose support at this point.
> 
> The notion of filter type will then be deprecated and subsequently dropped
> to avoid confusion between both frameworks.
> 
> Implementation details
> ======================
> 
> Flow rule
> ---------
> 
> A flow rule is the combination of a matching pattern with a list of actions,
> and is the basis of this API.
> 
> Priorities
> ~~~~~~~~~~
> 
> A priority can be assigned to a matching pattern.
> 
> The default priority level is 0 and is also the highest. Support for more
> than a single priority level in hardware is not guaranteed.
> 
> If a packet is matched by several filters at a given priority level, the
> outcome is undefined. It can take any path and can even be duplicated.
> 
> Matching pattern
> ~~~~~~~~~~~~~~~~
> 
> A matching pattern comprises any number of items of various types.
> 
> Items are arranged in a list to form a matching pattern for packets. They
> fall in two categories:
> 
> - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, VXLAN and
>   so on), usually associated with a specification structure. These must be
>   stacked in the same order as the protocol layers to match, starting from
>   L2.
> 
> - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF,
>   SIGNATURE and so on), often without a specification structure. Since they
>   are meta data that does not match packet contents, these can be specified
>   anywhere within item lists without affecting the protocol matching items.
> 
> Most item specifications can be optionally paired with a mask to narrow the
> specific fields or bits to be matched.
> 
> - Items are defined with ``struct rte_flow_item``.
> - Patterns are defined with ``struct rte_flow_pattern``.
> 
> Example of an item specification matching an Ethernet header:
> 
> +-----------------------------------------+
> | Ethernet                                |
> +==========+=========+====================+
> | ``spec`` | ``src`` | ``00:01:02:03:04`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:2a:66:00:01`` |
> +----------+---------+--------------------+
> | ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
> |          +---------+--------------------+
> |          | ``dst`` | ``00:00:00:00:ff`` |
> +----------+---------+--------------------+
> 
> Non-masked bits stand for any value, Ethernet headers with the following
> properties are thus matched:
> 
> - ``src``: ``??:01:02:03:??``
> - ``dst``: ``??:??:??:??:01``
> 
> Except for meta types that do not need one, ``spec`` must be a valid pointer
> to a structure of the related item type. A ``mask`` of the same type can be
> provided to tell which bits in ``spec`` are to be matched.
> 
> A mask is normally only needed for ``spec`` fields matching packet data,
> ignored otherwise. See individual item types for more information.
> 
> A ``NULL`` mask pointer is allowed and is similar to matching with a full
> mask (all ones) ``spec`` fields supported by hardware, the remaining fields
> are ignored (all zeroes), there is thus no error checking for unsupported
> fields.
> 
> Matching pattern items for packet data must be naturally stacked (ordered
> from lowest to highest protocol layer), as in the following examples:
> 
> +--------------+
> | TCPv4 as L4  |
> +===+==========+
> | 0 | Ethernet |
> +---+----------+
> | 1 | IPv4     |
> +---+----------+
> | 2 | TCP      |
> +---+----------+
> 
> +----------------+
> | TCPv6 in VXLAN |
> +===+============+
> | 0 | Ethernet   |
> +---+------------+
> | 1 | IPv4       |
> +---+------------+
> | 2 | UDP        |
> +---+------------+
> | 3 | VXLAN      |
> +---+------------+
> | 4 | Ethernet   |
> +---+------------+
> | 5 | IPv6       |
> +---+------------+
> | 6 | TCP        |
> +---+------------+
> 
> +-----------------------------+
> | TCPv4 as L4 with meta items |
> +===+=========================+
> | 0 | VOID                    |
> +---+-------------------------+
> | 1 | Ethernet                |
> +---+-------------------------+
> | 2 | VOID                    |
> +---+-------------------------+
> | 3 | IPv4                    |
> +---+-------------------------+
> | 4 | TCP                     |
> +---+-------------------------+
> | 5 | VOID                    |
> +---+-------------------------+
> | 6 | VOID                    |
> +---+-------------------------+
> 
> The above example shows how meta items do not affect packet data matching
> items, as long as those remain stacked properly. The resulting matching
> pattern is identical to "TCPv4 as L4".
> 
> +----------------+
> | UDPv6 anywhere |
> +===+============+
> | 0 | IPv6       |
> +---+------------+
> | 1 | UDP        |
> +---+------------+
> 
> If supported by the PMD, omitting one or several protocol layers at the
> bottom of the stack as in the above example (missing an Ethernet
> specification) enables hardware to look anywhere in packets.
> 
> It is unspecified whether the payload of supported encapsulations
> (e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
> inner, outer or both packets.
> 
> +---------------------+
> | Invalid, missing L3 |
> +===+=================+
> | 0 | Ethernet        |
> +---+-----------------+
> | 1 | UDP             |
> +---+-----------------+
> 
> The above pattern is invalid due to a missing L3 specification between L2
> and L4. It is only allowed at the bottom and at the top of the stack.
> 
> Meta item types
> ~~~~~~~~~~~~~~~
> 
> These do not match packet data but affect how the pattern is processed;
> most of them do not need a specification structure. This particularity allows
> them to be specified anywhere without affecting other item types.
> 
> ``END``
> ^^^^^^^
> 
> End marker for item lists. Prevents further processing of items, thereby
> ending the pattern.
> 
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | END                |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> ``VOID``
> ^^^^^^^^
> 
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
> 
> - PMD support is mandatory.
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | VOID               |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> One usage example for this type is generating rules that share a common
> prefix quickly without reallocating memory, only by updating item types:
> 
> +------------------------+
> | TCP, UDP or ICMP as L4 |
> +===+====================+
> | 0 | Ethernet           |
> +---+--------------------+
> | 1 | IPv4               |
> +---+------+------+------+
> | 2 | UDP  | VOID | VOID |
> +---+------+------+------+
> | 3 | VOID | TCP  | VOID |
> +---+------+------+------+
> | 4 | VOID | VOID | ICMP |
> +---+------+------+------+
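> 
> In C terms, switching such a shared-prefix rule from UDP to TCP could look
> like this (item type enumerator names are assumptions about the draft
> header)::
> 
>   /* Items 0 and 1 (Ethernet, IPv4) stay untouched; only the type
>    * fields of items 2-4 are rewritten, without reallocation. */
>   items[2].type = RTE_FLOW_ITEM_TYPE_VOID;
>   items[3].type = RTE_FLOW_ITEM_TYPE_TCP;
>   items[4].type = RTE_FLOW_ITEM_TYPE_VOID;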
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``INVERT``
> ^^^^^^^^^^
> 
> Inverted matching, i.e. process packets that do not match the pattern.
> 
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | INVERT             |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> Usage example in order to match non-TCPv4 packets only:
> 
> +--------------------+
> | Anything but TCPv4 |
> +===+================+
> | 0 | INVERT         |
> +---+----------------+
> | 1 | Ethernet       |
> +---+----------------+
> | 2 | IPv4           |
> +---+----------------+
> | 3 | TCP            |
> +---+----------------+
> 
> ``PF``
> ^^^^^^
> 
> Matches packets addressed to the physical function of the device.
> 
> - Both ``spec`` and ``mask`` are ignored.
> 
> +--------------------+
> | PF                 |
> +==========+=========+
> | ``spec`` | ignored |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> ``VF``
> ^^^^^^
> 
> Matches packets addressed to the given virtual function ID of the device.
> 
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
> 
> +----------------------------------------+
> | VF                                     |
> +==========+=========+===================+
> | ``spec`` | ``vf``  | destination VF ID |
> +----------+---------+-------------------+
> | ``mask`` | ignored                     |
> +----------+-----------------------------+
> 
> ``SIGNATURE``
> ^^^^^^^^^^^^^
> 
> Requests hash-based signature dispatching for this rule.
> 
> Considering this is a global setting on devices that support it, all
> subsequent filter rules may have to be created with it as well.
> 
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
> 
> +--------------------+
> | SIGNATURE          |
> +==========+=========+
> | ``spec`` | TBD     |
> +----------+---------+
> | ``mask`` | ignored |
> +----------+---------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Data matching item types
> ~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Most of these are basically protocol header definitions with associated
> bitmasks. They must be specified (stacked) from lowest to highest protocol
> layer.
> 
> The following list is not exhaustive as new protocols will be added in the
> future.
> 
> ``ANY``
> ^^^^^^^
> 
> Matches any protocol in place of the current layer; a single ANY may also
> stand for several protocol layers.
> 
> This is usually specified as the first pattern item when looking for a
> protocol anywhere in a packet.
> 
> - A maximum value of **0** requests matching any number of protocol layers
>   above or equal to the minimum value; a maximum value lower than the
>   minimum one is otherwise invalid.
> - Only ``spec`` needs to be defined, ``mask`` is ignored.
> 
> +-----------------------------------------------------------------------+
> | ANY                                                                   |
> +==========+=========+==================================================+
> | ``spec`` | ``min`` | minimum number of layers covered                 |
> |          +---------+--------------------------------------------------+
> |          | ``max`` | maximum number of layers covered, 0 for infinity |
> +----------+---------+--------------------------------------------------+
> | ``mask`` | ignored                                                    |
> +----------+------------------------------------------------------------+
> 
> Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
> and L4 (UDP), both matched by the first ANY specification, and inner L3 (IPv4
> or IPv6) matched by the second ANY specification:
> 
> +----------------------------------+
> | TCP in VXLAN with wildcards      |
> +===+==============================+
> | 0 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 1 | ANY | ``spec`` | ``min`` | 2 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 2 |
> +---+-----+----------+---------+---+
> | 2 | VXLAN                        |
> +---+------------------------------+
> | 3 | Ethernet                     |
> +---+-----+----------+---------+---+
> | 4 | ANY | ``spec`` | ``min`` | 1 |
> |   |     |          +---------+---+
> |   |     |          | ``max`` | 1 |
> +---+-----+----------+---------+---+
> | 5 | TCP                          |
> +---+------------------------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``RAW``
> ^^^^^^^
> 
> Matches a string of a given length at a given offset (in bytes), or anywhere
> in the payload of the current protocol layer (including L2 header if used as
> the first item in the stack).
> 
> This does not increment the protocol layer count as it is not a protocol
> definition. Subsequent RAW items modulate the first absolute one with
> relative offsets.
> 
> - Using **-1** as the ``offset`` of the first RAW item makes its absolute
>   offset not fixed, i.e. the pattern is searched everywhere.
> - ``mask`` only affects the pattern.
> 
> +--------------------------------------------------------------+
> | RAW                                                          |
> +==========+=============+=====================================+
> | ``spec`` | ``offset``  | absolute or relative pattern offset |
> |          +-------------+-------------------------------------+
> |          | ``length``  | pattern length                      |
> |          +-------------+-------------------------------------+
> |          | ``pattern`` | byte string of the above length     |
> +----------+-------------+-------------------------------------+
> | ``mask`` | ``offset``  | ignored                             |
> |          +-------------+-------------------------------------+
> |          | ``length``  | ignored                             |
> |          +-------------+-------------------------------------+
> |          | ``pattern`` | bitmask with the same byte length   |
> +----------+-------------+-------------------------------------+
> 
> Example pattern looking for several strings at various offsets of a UDP
> payload, using combined RAW items:
> 
> +------------------------------------------+
> | UDP payload matching                     |
> +===+======================================+
> | 0 | Ethernet                             |
> +---+--------------------------------------+
> | 1 | IPv4                                 |
> +---+--------------------------------------+
> | 2 | UDP                                  |
> +---+-----+----------+-------------+-------+
> | 3 | RAW | ``spec`` | ``offset``  | -1    |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "foo" |
> +---+-----+----------+-------------+-------+
> | 4 | RAW | ``spec`` | ``offset``  | 20    |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "bar" |
> +---+-----+----------+-------------+-------+
> | 5 | RAW | ``spec`` | ``offset``  | -30   |
> |   |     |          +-------------+-------+
> |   |     |          | ``length``  | 3     |
> |   |     |          +-------------+-------+
> |   |     |          | ``pattern`` | "baz" |
> +---+-----+----------+-------------+-------+
> 
> This translates to:
> 
> - Locate "foo" in UDP payload, remember its offset.
> - Check "bar" at "foo"'s offset plus 20 bytes.
> - Check "baz" at "foo"'s offset minus 30 bytes.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``ETH``
> ^^^^^^^
> 
> Matches an Ethernet header.
> 
> - ``dst``: destination MAC.
> - ``src``: source MAC.
> - ``type``: EtherType.
> - ``tags``: number of 802.1Q/ad tags defined.
> - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> 
>  - ``tpid``: Tag protocol identifier.
>  - ``tci``: Tag control information.
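
For illustration, a spec/mask pair matching one destination MAC while
ignoring every other field might look like this (a sketch; the exact field
types are assumptions, e.g. ``dst`` as a byte-array wrapper):

 struct rte_flow_item_eth eth_spec = {
         .dst = { .addr_bytes = { 0x00, 0x16, 0x3e, 0x00, 0x00, 0x01 } },
 };
 struct rte_flow_item_eth eth_mask = {
         /* all-ones mask on dst, zeroed (ignored) everywhere else */
         .dst = { .addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff } },
 };
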
> 
> ``IPV4``
> ^^^^^^^^
> 
> Matches an IPv4 header.
> 
> - ``src``: source IP address.
> - ``dst``: destination IP address.
> - ``tos``: ToS/DSCP field.
> - ``ttl``: TTL field.
> - ``proto``: protocol number for the next layer.
> 
> ``IPV6``
> ^^^^^^^^
> 
> Matches an IPv6 header.
> 
> - ``src``: source IP address.
> - ``dst``: destination IP address.
> - ``tc``: traffic class field.
> - ``nh``: Next header field (protocol).
> - ``hop_limit``: hop limit field (TTL).
> 
> ``ICMP``
> ^^^^^^^^
> 
> Matches an ICMP header.
> 
> - TBD.
> 
> ``UDP``
> ^^^^^^^
> 
> Matches a UDP header.
> 
> - ``sport``: source port.
> - ``dport``: destination port.
> - ``length``: UDP length.
> - ``checksum``: UDP checksum.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``TCP``
> ^^^^^^^
> 
> Matches a TCP header.
> 
> - ``sport``: source port.
> - ``dport``: destination port.
> - All other TCP fields and bits.
> 
> ``VXLAN``
> ^^^^^^^^^
> 
> Matches a VXLAN header.
> 
> - TBD.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Actions
> ~~~~~~~
> 
> Each possible action is represented by a type. Some have associated
> configuration structures. Several actions combined in a list can be assigned
> to a flow rule. That list is not ordered.
> 
> At least one action must be defined in a filter rule in order to do
> something with matched packets.
> 
> - Actions are defined with ``struct rte_flow_action``.
> - A list of actions is defined with ``struct rte_flow_actions``.
> 
> They fall into three categories:
> 
> - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
>   processing matched packets by subsequent flow rules, unless overridden
>   with PASSTHRU.
> 
> - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for
>   additional processing by subsequent flow rules.
> 
> - Other non terminating meta actions that do not affect the fate of packets
>   (END, VOID, ID, COUNT).
> 
> When several actions are combined in a flow rule, they should all have
> different types (e.g. dropping a packet twice is not possible). However
> considering the VOID type is an exception to this rule, the defined behavior
> is for PMDs to only take into account the last action of a given type found
> in the list. PMDs still perform error checking on the entire list.
> 
> *Note that PASSTHRU is the only action able to override a terminating rule.*
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Example of an action that redirects packets to queue index 10:
> 
> +----------------+
> | QUEUE          |
> +===========+====+
> | ``queue`` | 10 |
> +-----------+----+
> 
> Action list examples; their order is not significant, and applications must
> consider all actions to be performed simultaneously:
> 
> +----------------+
> | Count and drop |
> +=======+========+
> | COUNT |        |
> +-------+--------+
> | DROP  |        |
> +-------+--------+
> 
> +--------------------------+
> | Tag, count and redirect  |
> +=======+===========+======+
> | ID    | ``id``    | 0x2a |
> +-------+-----------+------+
> | COUNT |                  |
> +-------+-----------+------+
> | QUEUE | ``queue`` | 10   |
> +-------+-----------+------+
> 
> +-----------------------+
> | Redirect to queue 5   |
> +=======+===============+
> | DROP  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
> 
> In the above example, considering both actions are performed simultaneously,
> the end result is that only QUEUE has any effect.
> 
> +-----------------------+
> | Redirect to queue 3   |
> +=======+===========+===+
> | QUEUE | ``queue`` | 5 |
> +-------+-----------+---+
> | VOID  |               |
> +-------+-----------+---+
> | QUEUE | ``queue`` | 3 |
> +-------+-----------+---+
> 
> As previously described, only the last action of a given type found in the
> list is taken into account. The above example also shows that VOID is
> ignored.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Action types
> ~~~~~~~~~~~~
> 
> Common action types are described in this section. Like pattern item types,
> this list is not exhaustive as new actions will be added in the future.
> 
> ``END`` (action)
> ^^^^^^^^^^^^^^^^
> 
> End marker for action lists. Prevents further processing of actions, thereby
> ending the list.
> 
> - Its numeric value is **0** for convenience.
> - PMD support is mandatory.
> - No configurable property.
> 
> +---------------+
> | END           |
> +===============+
> | no properties |
> +---------------+
> 
> ``VOID`` (action)
> ^^^^^^^^^^^^^^^^^
> 
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
> 
> - PMD support is mandatory.
> - No configurable property.
> 
> +---------------+
> | VOID          |
> +===============+
> | no properties |
> +---------------+
> 
> ``PASSTHRU``
> ^^^^^^^^^^^^
> 
> Leaves packets up for additional processing by subsequent flow rules. This
> is the default when a rule does not contain a terminating action, but can be
> specified to force a rule to become non-terminating.
> 
> - No configurable property.
> 
> +---------------+
> | PASSTHRU      |
> +===============+
> | no properties |
> +---------------+
> 
> Example to copy a packet to a queue and continue processing by subsequent
> flow rules:
[Sugesh] If a packet gets copied to a queue, that is a terminating action.
How is it possible to perform subsequent actions after the packet has already
been moved to the queue? How does this differ from the DUP action?
Am I missing anything here?
> 
> +--------------------------+
> | Copy to queue 8          |
> +==========+===============+
> | PASSTHRU |               |
> +----------+-----------+---+
> | QUEUE    | ``queue`` | 8 |
> +----------+-----------+---+
> 
> ``ID``
> ^^^^^^
> 
> Attaches a 32 bit value to packets.
> 
> +----------------------------------------------+
> | ID                                           |
> +========+=====================================+
> | ``id`` | 32 bit value to return with packets |
> +--------+-------------------------------------+
> 
[Sugesh] I assume the application has to program the flow
with a unique ID, and matching packets are stamped with this ID
when reported to the software. The uniqueness of the ID is NOT
guaranteed by the API framework. Correct me if I am wrong here.

[Sugesh] Is it a limitation to use only a 32 bit ID? Is it possible to have a
64 bit ID, so that the application can use the control plane flow pointer
itself as an ID? Does that make sense?


> .. raw:: pdf
> 
>    PageBreak
> 
> ``QUEUE``
> ^^^^^^^^^
> 
> Assigns packets to a given queue index.
> 
> - Terminating by default.
> 
> +--------------------------------+
> | QUEUE                          |
> +===========+====================+
> | ``queue`` | queue index to use |
> +-----------+--------------------+
> 
> ``DROP``
> ^^^^^^^^
> 
> Drop packets.
> 
> - No configurable property.
> - Terminating by default.
> - PASSTHRU overrides this action if both are specified.
> 
> +---------------+
> | DROP          |
> +===============+
> | no properties |
> +---------------+
> 
> ``COUNT``
> ^^^^^^^^^
> 
[Sugesh] Do we really have to set the COUNT action explicitly for every rule?
IMHO it would be great if it were an implicit action. Most applications would be
interested in the stats of almost all the filters/flows.
> Enables hits counter for this rule.
> 
> This counter can be retrieved and reset through ``rte_flow_query()``, see
> ``struct rte_flow_query_count``.
> 
> - Counters can be retrieved with ``rte_flow_query()``.
> - No configurable property.
> 
> +---------------+
> | COUNT         |
> +===============+
> | no properties |
> +---------------+
> 
> Query structure to retrieve and reset the flow rule hits counter:
> 
> +------------------------------------------------+
> | COUNT query                                    |
> +===========+=====+==============================+
> | ``reset`` | in  | reset counter after query    |
> +-----------+-----+------------------------------+
> | ``hits``  | out | number of hits for this flow |
> +-----------+-----+------------------------------+
> 
> ``DUP``
> ^^^^^^^
> 
> Duplicates packets to a given queue index.
> 
> This is normally combined with QUEUE; however, when used alone, it is
> actually similar to QUEUE + PASSTHRU.
> 
> - Non-terminating by default.
> 
> +------------------------------------------------+
> | DUP                                            |
> +===========+====================================+
> | ``queue`` | queue index to duplicate packet to |
> +-----------+------------------------------------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``RSS``
> ^^^^^^^
> 
> Similar to QUEUE, except RSS is additionally performed on packets to spread
> them among several queues according to the provided parameters.
> 
> - Terminating by default.
> 
> +---------------------------------------------+
> | RSS                                         |
> +==============+==============================+
> | ``rss_conf`` | RSS parameters               |
> +--------------+------------------------------+
> | ``queues``   | number of entries in queue[] |
> +--------------+------------------------------+
> | ``queue[]``  | queue indices to use         |
> +--------------+------------------------------+
> 
> ``PF`` (action)
> ^^^^^^^^^^^^^^^
> 
> Redirects packets to the physical function (PF) of the current device.
> 
> - No configurable property.
> - Terminating by default.
> 
> +---------------+
> | PF            |
> +===============+
> | no properties |
> +---------------+
> 
> ``VF`` (action)
> ^^^^^^^^^^^^^^^
> 
> Redirects packets to the virtual function (VF) of the current device with
> the specified ID.
> 
> - Terminating by default.
> 
> +---------------------------------------+
> | VF                                    |
> +========+==============================+
> | ``id`` | VF ID to redirect packets to |
> +--------+------------------------------+
> 
> Planned types
> ~~~~~~~~~~~~~
> 
> Other action types are planned but not defined yet. These actions will add
> the ability to alter matching packets in several ways, such as performing
> encapsulation/decapsulation of tunnel headers on specific flows.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Rules management
> ----------------
> 
> A simple API with only four functions is provided to fully manage flows.
> 
> Each created flow rule is associated with an opaque, PMD-specific handle
> pointer. The application is responsible for keeping it until the rule is
> destroyed.
> 
> Flow rules are defined with ``struct rte_flow``.
> 
> Validation
> ~~~~~~~~~~
> 
> Given that expressing a definite set of device capabilities with this API is
> not practical, a dedicated function is provided to check if a flow rule is
> supported and can be created.
> 
> ::
> 
>  int
>  rte_flow_validate(uint8_t port_id,
>                    const struct rte_flow_pattern *pattern,
>                    const struct rte_flow_actions *actions);
> 
> While this function has no effect on the target device, the flow rule is
> validated against its current configuration state and the returned value
> should be considered valid by the caller for that state only.
> 
> The returned value is guaranteed to remain valid only as long as no
> successful calls to rte_flow_create() or rte_flow_destroy() are made in the
> meantime and no device parameters affecting flow rules in any way are
> modified, due to possible collisions or resource limitations (although in
> such cases ``EINVAL`` should not be returned).
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``pattern``: pattern specification to check.
> - ``actions``: actions associated with the flow definition.
> 
> Return value:
> 
> - **0** if flow rule is valid and can be created. A negative errno value
>   otherwise (``rte_errno`` is also set), the following errors are defined.
> - ``-EINVAL``: unknown or invalid rule specification.
> - ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial masks
>   are unsupported).
> - ``-EEXIST``: collision with an existing rule.
> - ``-ENOMEM``: not enough resources.
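
A minimal usage sketch, assuming ``pattern`` and ``actions`` have been
filled in beforehand as described in the pattern and action sections:

 int ret = rte_flow_validate(port_id, &pattern, &actions);

 if (ret == 0)
         /* rule can be created in the current device state */;
 else if (ret == -ENOTSUP)
         fprintf(stderr, "rule is valid but unsupported\n");
 else
         fprintf(stderr, "rule is invalid: %s\n", strerror(-ret));
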
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Creation
> ~~~~~~~~
> 
> Creating a flow rule is similar to validating one, except the rule is
> actually created.
> 
> ::
> 
>  struct rte_flow *
>  rte_flow_create(uint8_t port_id,
>                  const struct rte_flow_pattern *pattern,
>                  const struct rte_flow_actions *actions);
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``pattern``: pattern specification to add.
> - ``actions``: actions associated with the flow definition.
> 
> Return value:
> 
> A valid flow pointer in case of success, NULL otherwise and ``rte_errno`` is
> set to the positive version of one of the error codes defined for
> ``rte_flow_validate()``.
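
Putting the pieces together, creating a rule that redirects TCP port 80
traffic to queue 10 could look like the sketch below. The item/action
container fields (``type``, ``spec``, ``mask``, ``conf``) and the enum names
are guesses based on this document, not the final header:

 struct rte_flow_item_tcp tcp_spec = { .dport = 80 };
 struct rte_flow_item_tcp tcp_mask = { .dport = 0xffff };
 struct rte_flow_item items[] = {
         { .type = RTE_FLOW_ITEM_TYPE_ETH },
         { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
         { .type = RTE_FLOW_ITEM_TYPE_TCP,
           .spec = &tcp_spec, .mask = &tcp_mask },
         { .type = RTE_FLOW_ITEM_TYPE_END },
 };
 struct rte_flow_pattern pattern = { .items = items };
 struct rte_flow_action_queue queue = { .queue = 10 };
 struct rte_flow_action action_list[] = {
         { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
         { .type = RTE_FLOW_ACTION_TYPE_END },
 };
 struct rte_flow_actions actions = { .actions = action_list };
 struct rte_flow *flow = rte_flow_create(port_id, &pattern, &actions);

 if (flow == NULL)
         fprintf(stderr, "flow creation failed: %s\n", strerror(rte_errno));
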
[Sugesh] A kind of implementation-specific query: what if the application
tries to add duplicate rules? Does the API create a new flow entry for every
API call?
[Sugesh] Another concern is the cost and time of installing these rules
in the hardware. Can we make these APIs time-bound (or at least provide an
option to set a time limit for executing them), so that the application
doesn't have to wait so long when installing and deleting flows on slow
hardware/NICs? What do you think? Most datapath flow installations are
dynamic and triggered only when there is ingress traffic. Delays in flow
insertion/deletion have unpredictable consequences.

[Sugesh] Another query is on the synchronization part. What if the same rules
are handled from different threads? Is the application responsible for
handling concurrent hardware programming?

> 
> Destruction
> ~~~~~~~~~~~
> 
> Flow rule destruction is not automatic, and a queue should not be released
> if any are still attached to it. Applications must take care of performing
> this step before releasing resources.
> 
> ::
> 
>  int
>  rte_flow_destroy(uint8_t port_id,
>                   struct rte_flow *flow);
> 
> 
[Sugesh] I would suggest that a clean-up API would be really useful, as
releasing a queue (does this apply to releasing a port too?) does not
guarantee automatic flow destruction. This way the application can initialize
the port, clean up all the existing rules and create new rules on a clean slate.

> Failure to destroy a flow rule may occur when other flow rules depend on it,
> and destroying it would result in an inconsistent state.
> 
> This function is only guaranteed to succeed if flow rules are destroyed in
> reverse order of their creation.
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``flow``: flow rule to destroy.
> 
> Return value:
> 
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>   set.
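
A teardown sketch following the reverse-order guarantee below (``flows[]``
and ``nb_flows`` are hypothetical application bookkeeping):

 unsigned int i = nb_flows;

 while (i--) {
         if (rte_flow_destroy(port_id, flows[i]) != 0)
                 fprintf(stderr, "cannot destroy flow %u: %s\n",
                         i, strerror(rte_errno));
 }
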
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Query
> ~~~~~
> 
> Query an existing flow rule.
> 
> This function allows retrieving flow-specific data such as counters. Data
> is gathered by special actions which must be present in the flow rule
> definition.
> 
> ::
> 
>  int
>  rte_flow_query(uint8_t port_id,
>                 struct rte_flow *flow,
>                 enum rte_flow_action_type action,
>                 void *data);
> 
> Arguments:
> 
> - ``port_id``: port identifier of Ethernet device.
> - ``flow``: flow rule to query.
> - ``action``: action type to query.
> - ``data``: pointer to storage for the associated query data type.
> 
> Return value:
> 
> - **0** on success, a negative errno value otherwise and ``rte_errno`` is
>   set.
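
A sketch of reading and resetting the hits counter of a rule created with a
COUNT action (the action enum name is a guess; the query structure comes from
the COUNT section above):

 struct rte_flow_query_count count = { .reset = 1 };

 if (rte_flow_query(port_id, flow, RTE_FLOW_ACTION_TYPE_COUNT, &count) == 0)
         printf("hits since last reset: %llu\n",
                (unsigned long long)count.hits);
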
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Behavior
> --------
> 
> - API operations are synchronous and blocking (``EAGAIN`` cannot be
>   returned).
> 
> - There is no provision for reentrancy/multi-thread safety, although nothing
>   should prevent different devices from being configured at the same
>   time. PMDs may protect their control path functions accordingly.
> 
> - Stopping the data path (TX/RX) should not be necessary when managing flow
>   rules. If this cannot be achieved naturally or with workarounds (such as
>   temporarily replacing the burst function pointers), an appropriate error
>   code must be returned (``EBUSY``).
> 
> - PMDs, not applications, are responsible for maintaining flow rules
>   configuration when stopping and restarting a port or performing other
>   actions which may affect them. They can only be destroyed explicitly.
> 
> .. raw:: pdf
> 
>    PageBreak
> 
[Sugesh] What about querying all the rules for a specific port/queue? That
would be useful when adding and deleting ports and queues dynamically
according to need. I am not sure what other use cases there are for such
APIs, but I feel it would make it much easier to manage flows from the
application. What do you think?
> Compatibility
> -------------
> 
> No known hardware implementation supports all the features described in this
> document.
> 
> Unsupported features or combinations are not expected to be fully emulated
> in software by PMDs for performance reasons. Partially supported features
> may be completed in software as long as hardware performs most of the work
> (such as queue redirection and packet recognition).
> 
> However PMDs are expected to do their best to satisfy application requests
> by working around hardware limitations as long as doing so does not affect
> the behavior of existing flow rules.
> 
> The following sections provide a few examples of such cases, they are based
> on limitations built into the previous APIs.
> 
> Global bitmasks
> ~~~~~~~~~~~~~~~
> 
> Each flow rule comes with its own, per-layer bitmasks, while hardware may
> support only a single, device-wide bitmask for a given layer type, so that
> two IPv4 rules cannot use different bitmasks.
> 
> The expected behavior in this case is that PMDs automatically configure
> global bitmasks according to the needs of the first created flow rule.
> 
> Subsequent rules are allowed only if their bitmasks match those, the
> ``EEXIST`` error code should be returned otherwise.
> 
> Unsupported layer types
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> Many protocols can be simulated by crafting patterns with the `RAW`_ type.
> 
> PMDs can rely on this capability to simulate support for protocols with
> fixed headers not directly recognized by hardware.
> 
> ``ANY`` pattern item
> ~~~~~~~~~~~~~~~~~~~~
> 
> This pattern item stands for anything, which can be difficult to translate
> to something hardware would understand, particularly if followed by more
> specific types.
> 
> Consider the following pattern:
> 
> +---+--------------------------------+
> | 0 | ETHER                          |
> +---+--------------------------------+
> | 1 | ANY (``min`` = 1, ``max`` = 1) |
> +---+--------------------------------+
> | 2 | TCP                            |
> +---+--------------------------------+
> 
> Knowing that TCP does not make sense with something other than IPv4 and IPv6
> as L3, such a pattern may be translated to two flow rules instead:
> 
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV4 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
> 
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV6 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
> 
> Note that as soon as an ANY rule covers several layers, this approach may
> yield a large number of hidden flow rules. It is thus suggested to only
> support the most common scenarios (anything as L2 and/or L3).
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> Unsupported actions
> ~~~~~~~~~~~~~~~~~~~
> 
> - When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
>   tagging (`ID`_) may be implemented in software as long as the target queue
>   is used by a single rule.
> 
> - A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
>   rules combining `QUEUE`_ and `PASSTHRU`_.
> 
> - When a single target queue is provided, `RSS`_ can also be implemented
>   through `QUEUE`_.
> 
> Flow rules priority
> ~~~~~~~~~~~~~~~~~~~
> 
> While it would naturally make sense, flow rules cannot be assumed to be
> processed by hardware in the same order as their creation for several
> reasons:
> 
> - They may be managed internally as a tree or a hash table instead of a
>   list.
> - Removing a flow rule before adding another one can either put the new rule
>   at the end of the list or reuse a freed entry.
> - Duplication may occur when packets are matched by several rules.
> 
> For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> predictable behavior is only guaranteed by using different priority levels.
> 
> Priority levels are not necessarily implemented in hardware, or may be
> severely limited (e.g. a single priority bit).
> 
> For these reasons, priority levels may be implemented purely in software by
> PMDs.
> 
> - For devices expecting flow rules to be added in the correct order, PMDs
>   may destroy and re-create existing rules after adding a new one with
>   a higher priority.
> 
> - A configurable number of dummy or empty rules can be created at
>   initialization time to save high priority slots for later.
> 
> - In order to save priority levels, PMDs may evaluate whether rules are
>   likely to collide and adjust their priority accordingly.
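
To make the overlap case concrete, here is a sketch of two rules with the
same pattern kept predictable through distinct priority levels (assuming
``priority`` is a field of ``struct rte_flow_pattern`` and that the action
lists were built beforehand):

 /* Priority 0: count matching packets, then let them pass through. */
 pattern.priority = 0;
 flow_count = rte_flow_create(port_id, &pattern, &count_passthru_actions);
 /* Priority 1: same match, terminating into queue 6. */
 pattern.priority = 1;
 flow_queue = rte_flow_create(port_id, &pattern, &queue_actions);
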
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> API migration
> =============
> 
> Exhaustive list of deprecated filter types and how to convert them to
> generic flow rules.
> 
> ``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
> ---------------------------------------
> 
> `MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
> (action)`_ or `PF (action)`_ terminating action.
> 
> +------------------------------------+
> | MACVLAN                            |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | VF,     |
> |   |     +----------+-----+ PF      |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
> 
> ``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
> ----------------------------------------------
> 
> `ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
> a terminating action.
> 
> +------------------------------------+
> | ETHERTYPE                          |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | ETH | ``spec`` | any | QUEUE,  |
> |   |     +----------+-----+ DROP    |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
> 
> ``FLEXIBLE`` to ``RAW`` → ``QUEUE``
> -----------------------------------
> 
> `FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
> terminating action and a defined priority level.
> 
> +------------------------------------+
> | FLEXIBLE                           |
> +--------------------------+---------+
> | Pattern                  | Actions |
> +===+=====+==========+=====+=========+
> | 0 | RAW | ``spec`` | any | QUEUE   |
> |   |     +----------+-----+         |
> |   |     | ``mask`` | any |         |
> +---+-----+----------+-----+---------+
> 
> ``SYN`` to ``TCP`` → ``QUEUE``
> ------------------------------
> 
> `SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
> `QUEUE`_ as the terminating action.
> 
> Priority level can be set to simulate the high priority bit.
> 
> +---------------------------------------------+
> | SYN                                         |
> +-----------------------------------+---------+
> | Pattern                           | Actions |
> +===+======+==========+=============+=========+
> | 0 | ETH  | ``spec`` | N/A         | QUEUE   |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 1 | IPV4 | ``spec`` | N/A         |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | empty       |         |
> +---+------+----------+-------------+         |
> | 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
> |   |      +----------+-------------+         |
> |   |      | ``mask`` | ``syn`` = 1 |         |
> +---+------+----------+-------------+---------+
> 
> ``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
> ----------------------------------------------------
> 
> `NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
> `UDP`_ as L4 and `QUEUE`_ as the terminating action.
> 
> A priority level can be specified as well.
> 
> +---------------------------------------+
> | NTUPLE                                |
> +-----------------------------+---------+
> | Pattern                     | Actions |
> +===+======+==========+=======+=========+
> | 0 | ETH  | ``spec`` | N/A   | QUEUE   |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | empty |         |
> +---+------+----------+-------+         |
> | 1 | IPV4 | ``spec`` | any   |         |
> |   |      +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+         |
> | 2 | TCP, | ``spec`` | any   |         |
> |   | UDP  +----------+-------+         |
> |   |      | ``mask`` | any   |         |
> +---+------+----------+-------+---------+
> 
> ``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
> ---------------------------------------------------------------------------
> 
> `TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.
> 
> In the following table, `ANY`_ is used to cover the optional L4.
> 
> +------------------------------------------------+
> | TUNNEL                                         |
> +--------------------------------------+---------+
> | Pattern                              | Actions |
> +===+=========+==========+=============+=========+
> | 0 | ETH     | ``spec`` | any         | QUEUE   |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 1 | IPV4,   | ``spec`` | any         |         |
> |   | IPV6    +----------+-------------+         |
> |   |         | ``mask`` | any         |         |
> +---+---------+----------+-------------+         |
> | 2 | ANY     | ``spec`` | ``min`` = 0 |         |
> |   |         |          +-------------+         |
> |   |         |          | ``max`` = 0 |         |
> |   |         +----------+-------------+         |
> |   |         | ``mask`` | N/A         |         |
> +---+---------+----------+-------------+         |
> | 3 | VXLAN,  | ``spec`` | any         |         |
> |   | GENEVE, +----------+-------------+         |
> |   | TEREDO, | ``mask`` | any         |         |
> |   | NVGRE,  |          |             |         |
> |   | GRE,    |          |             |         |
> |   | ...     |          |             |         |
> +---+---------+----------+-------------+---------+
> 
> .. raw:: pdf
> 
>    PageBreak
> 
> ``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
> ---------------------------------------------------------------
> 
> `FDIR`_ is more complex than any other type; there are several methods to
> emulate its functionality. It is summarized for the most part in the table
> below.
> 
> A few features are intentionally not supported:
> 
> - The ability to configure the matching input set and masks for the entire
>   device, PMDs should take care of it automatically according to flow rules.
> 
> - Returning four or eight bytes of matched data when using flex bytes
>   filtering. Although a specific action could implement it, it conflicts
>   with the much more useful 32 bits tagging on devices that support it.
> 
> - Side effects on RSS processing of the entire device. Flow rules that
>   conflict with the current device configuration should not be
>   allowed. Similarly, device configuration should not be allowed when it
>   affects existing flow rules.
> 
> - Device modes of operation. "none" is unsupported since filtering cannot be
>   disabled as long as a flow rule is present.
> 
> - "MAC VLAN" or "tunnel" perfect matching modes should be automatically
> set
>   according to the created flow rules.
> 
> +----------------------------------------------+
> | FDIR                                         |
> +---------------------------------+------------+
> | Pattern                         | Actions    |
> +===+============+==========+=====+============+
> | 0 | ETH,       | ``spec`` | any | QUEUE,     |
> |   | RAW        +----------+-----+ DROP,      |
> |   |            | ``mask`` | any | PASSTHRU   |
> +---+------------+----------+-----+------------+
> | 1 | IPV4,      | ``spec`` | any | ID         |
> |   | IPV6       +----------+-----+ (optional) |
> |   |            | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 2 | TCP,       | ``spec`` | any |            |
> |   | UDP,       +----------+-----+            |
> |   | SCTP       | ``mask`` | any |            |
> +---+------------+----------+-----+            |
> | 3 | VF,        | ``spec`` | any |            |
> |   | PF,        +----------+-----+            |
> |   | SIGNATURE  | ``mask`` | any |            |
> |   | (optional) |          |     |            |
> +---+------------+----------+-----+------------+
> 
> ``HASH``
> ~~~~~~~~
> 
> Hashing configuration is set per rule through the `SIGNATURE`_ item.
> 
> Since it is usually a global device setting, all flow rules created with
> this item may have to share the same specification.
> 
> ``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> All packets are matched. This type alters incoming packets to encapsulate
> them in a chosen tunnel type, and can optionally redirect them to a VF as
> well.
> 
> The destination pool for tag based forwarding can be emulated with other
> flow rules using `DUP`_ as the action.
> 
> +----------------------------------------+
> | L2_TUNNEL                              |
> +---------------------------+------------+
> | Pattern                   | Actions    |
> +===+======+==========+=====+============+
> | 0 | VOID | ``spec`` | N/A | VXLAN,     |
> |   |      |          |     | GENEVE,    |
> |   |      |          |     | ...        |
> |   |      +----------+-----+------------+
> |   |      | ``mask`` | N/A | VF         |
> |   |      |          |     | (optional) |
> +---+------+----------+-----+------------+
> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 11/11] maintainers: add section for pmdinfo
  2016-07-07 15:36  4% ` [dpdk-dev] [PATCH 11/11] maintainers: add section for pmdinfo Thomas Monjalon
@ 2016-07-07 16:14  0%   ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2016-07-07 16:14 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Thu, Jul 07, 2016 at 05:36:30PM +0200, Thomas Monjalon wrote:
> The author of this feature is Neil Horman.
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> ---
>  MAINTAINERS | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a59191e..f996c2e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -68,6 +68,10 @@ F: lib/librte_compat/
>  F: doc/guides/rel_notes/deprecation.rst
>  F: scripts/validate-abi.sh
>  
> +Driver information
> +F: buildtools/pmdinfogen/
> +F: tools/pmdinfo.py
> +
>  
>  Environment Abstraction Layer
>  -----------------------------
> -- 
> 2.7.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 11/11] maintainers: add section for pmdinfo
  @ 2016-07-07 15:36  4% ` Thomas Monjalon
  2016-07-07 16:14  0%   ` Neil Horman
    1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-07-07 15:36 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

The author of this feature is Neil Horman.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 MAINTAINERS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a59191e..f996c2e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -68,6 +68,10 @@ F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: scripts/validate-abi.sh
 
+Driver information
+F: buildtools/pmdinfogen/
+F: tools/pmdinfo.py
+
 
 Environment Abstraction Layer
 -----------------------------
-- 
2.7.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-07  7:14  0% ` Lu, Wenzhuo
@ 2016-07-07 10:26  2%   ` Adrien Mazarguil
  2016-07-19  8:11  0%     ` Lu, Wenzhuo
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-07-07 10:26 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: dev, Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

Hi Lu Wenzhuo,

Thanks for your feedback, I'm replying below as well.

On Thu, Jul 07, 2016 at 07:14:18AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> I have some questions, please see inline, thanks.
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Wednesday, July 6, 2016 2:17 AM
> > To: dev@dpdk.org
> > Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody; Ajit Khaparde;
> > Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> > Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> > Guarch, Pablo; Olga Shern
> > Subject: [RFC] Generic flow director/filtering/classification API
> > 
> > 
> > Requirements for a new API:
> > 
> > - Flexible and extensible without causing API/ABI problems for existing
> >   applications.
> > - Should be unambiguous and easy to use.
> > - Support existing filtering features and actions listed in `Filter types`_.
> > - Support packet alteration.
> > - In case of overlapping filters, their priority should be well documented.
> Does that mean we don't guarantee the consistency of priorities? The priority can be different on different NICs, so the behavior of the actions can be different. Right?

No, the intent is precisely to define what happens in order to get a
consistent result across different devices, and document cases with
undefined behavior. There must be no room left for interpretation.

For example, the API must describe what happens when two overlapping filters
(e.g. one matching an Ethernet header, another one matching an IP header)
match a given packet at a given priority level.

It is documented in section 4.1.1 (priorities) as "undefined behavior".
Applications remain free to do it and deal with consequences, at least they
know they cannot expect a consistent outcome, unless they use different
priority levels for both rules, see also 4.4.5 (flow rules priority).

> It seems users still need to be aware of some details of the HW? Do we need to add negotiation for the priority?

Priorities as defined in this document may not be directly mappable to HW
capabilities (e.g. HW does not support enough priorities, or that some
corner case make them not work as described), in which case the PMD may
choose to simulate priorities (again 4.4.5), as long as the end result
follows the specification.

So users do not need to be aware of HW details; the PMD does, and it must
perform the needed workarounds to suit their expectations. Users may only be
impacted by errors while attempting to create rules that are either
unsupported or would cause them (or existing rules) to diverge from the
spec.

> > Flow rules can have several distinct actions (such as counting,
> > encapsulating, decapsulating before redirecting packets to a particular
> > queue, etc.), instead of relying on several rules to achieve this and having
> > applications deal with hardware implementation details regarding their
> > order.
> I think normally HW doesn't support several actions in one rule. If a rule has several actions, it seems HW has to split it into several rules. The order can still be a problem.

Note that, except for a very small subset of pattern items and actions,
supporting multiple actions for a given rule is not mandatory, and can be
emulated as you said by having to split them into several rules each with
its own priority if possible (see 3.3 "high level design").

Also, a rule "action" as defined in this API can be just about anything, for
example combining a queue redirection with 32-bit tagging. FDIR supports
many cases of what can be described as several actions, see 5.7 "FDIR to
most item types → QUEUE, DROP, PASSTHRU".

If you were thinking about having two queue targets for a given rule, then
I agree with you - that is why a rule cannot have more than a single action
of a given type (see 4.1.5 actions), to avoid such abuse from applications.

Applications must use several pass-through rules with different priority
levels if they want to perform a given action several times on a given
packet. Again, PMDs support is not mandatory as pass-through is optional.

> > ``ETH``
> > ^^^^^^^
> > 
> > Matches an Ethernet header.
> > 
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> > 
> >  - ``tpid``: Tag protocol identifier.
> >  - ``tci``: Tag control information.
> "ETH" means all the parameters, dst, src, type... need to be matched? The same question for IPv4, IPv6 ...

Yes, it's basically the description of all Ethernet header fields including
VLAN tags (same for other protocols). Please see the linked draft header
file which should make all of this easier to understand:

 https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h

> > ``UDP``
> > ^^^^^^^
> > 
> > Matches a UDP header.
> > 
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> Why checksum? Do we need to filter the packets by checksum value?

Well, I've decided to include all protocol header fields for completeness
> (so the ABI does not need to be broken later when they become necessary, or
require another pattern item), not that all of them make sense in a pattern.

In this specific case, all PMDs I know of must reject a pattern
specification with a nonzero mask for the checksum field, because none of
them support it.

> > ``VOID`` (action)
> > ^^^^^^^^^^^^^^^^^
> > 
> > Used as a placeholder for convenience. It is ignored and simply discarded by
> > PMDs.
> I don't understand why we need VOID. If it's about the format, why not guarantee it in the rte layer?

I'm not sure to understand your question about rte layer, but this type is
fully managed by the PMD and is not supposed to be translated to a hardware
action.

I think it may come handy in some cases (like the VOID pattern item), so it
is defined just in case. Should be relatively trivial to support.

Applications may find a use for it when they want to statically define
templates for flow rules, when they need room for some reason.

> > Behavior
> > --------
> > 
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> > 
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> > 
> > - Stopping the data path (TX/RX) should not be necessary when managing flow
> >   rules. If this cannot be achieved naturally or with workarounds (such as
> >   temporarily replacing the burst function pointers), an appropriate error
> >   code must be returned (``EBUSY``).
> The PMD cannot stop the data path without adding a lock. So I think if some rules cannot be applied without stopping rx/tx, the PMD has to return failure.
> Or let the app stop the data path.

Agreed, that is the intent. If the PMD cannot touch flow rules for some
reason even after trying really hard, then it just returns EBUSY.

Perhaps we should write down that applications may get a different outcome
after stopping the data path if they get EBUSY?

> > - PMDs, not applications, are responsible for maintaining flow rules
> >   configuration when stopping and restarting a port or performing other
> >   actions which may affect them. They can only be destroyed explicitly.
> I don't understand "They can only be destroyed explicitly."

This part says that as long as an application has not called
rte_flow_destroy() on a flow rule, it never disappears, whatever happens to
the port (stopped, restarted). The application is not responsible for
re-creating rules after that.

Note that according to the specification, this may translate to not being
able to stop a port as long as a flow rule is present, depending on how nice
the PMD intends to be with applications. Implementation can be done in small
steps with minimal amount of code on the PMD side.

> If a new rule conflicts with an old one, what should we do? Return failure?

That should not happen. If say 100 rules have been created with various
priorities and the port is happily running with them, stopping the port may
require the PMD to destroy them, it then has to re-create all 100 of them
exactly as they were automatically when restarting the port.

If re-creating them is not possible for some reason, the port cannot be
restarted as long as rules that cannot be added back haven't been destroyed
by the application. Frankly, this should not happen.

To manage this case, I suggest preventing applications from doing things
that conflict with existing flow rules while the port is stopped (just like
when it is not stopped, as described in 5.7 "FDIR to most item types").

> > ``ANY`` pattern item
> > ~~~~~~~~~~~~~~~~~~~~
> > 
> > This pattern item stands for anything, which can be difficult to translate
> > to something hardware would understand, particularly if followed by more
> > specific types.
> > 
> > Consider the following pattern:
> > 
> > +---+--------------------------------+
> > | 0 | ETHER                          |
> > +---+--------------------------------+
> > | 1 | ANY (``min`` = 1, ``max`` = 1) |
> > +---+--------------------------------+
> > | 2 | TCP                            |
> > +---+--------------------------------+
> > 
> > Knowing that TCP does not make sense with something other than IPv4 and IPv6
> > as L3, such a pattern may be translated to two flow rules instead:
> > 
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV4 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> > 
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV6 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> > 
> > Note that as soon as an ANY rule covers several layers, this approach may
> > yield a large number of hidden flow rules. It is thus suggested to only
> > support the most common scenarios (anything as L2 and/or L3).
> I think "any" may make things confusing.  How about if the NIC doesn't support IPv6? Should we return fail for this rule?

In a sense you are right, ANY relies on HW capabilities so you cannot know
that it won't match unsupported protocols. The above example would be
somewhat useless for a conscious application which should really have
created two separate flow rules (and gotten an error on the IPv6 one).

So an ANY flow rule only able to match v4 packets won't return an error.

ANY can be useful to match outer packets when only a tunnel header and the
inner packet are meaningful to the application. HW that does not recognize
the outer packet is not able to recognize the inner one anyway.

This section only says that PMDs should do their best to make HW match what
they can when faced with ANY.

Also once again, ANY support is not mandatory.

> > Flow rules priority
> > ~~~~~~~~~~~~~~~~~~~
> > 
> > While it would naturally make sense, flow rules cannot be assumed to be
> > processed by hardware in the same order as their creation for several
> > reasons:
> > 
> > - They may be managed internally as a tree or a hash table instead of a
> >   list.
> > - Removing a flow rule before adding another one can either put the new rule
> >   at the end of the list or reuse a freed entry.
> > - Duplication may occur when packets are matched by several rules.
> > 
> > For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> > predictable behavior is only guaranteed by using different priority levels.
> > 
> > Priority levels are not necessarily implemented in hardware, or may be
> > severely limited (e.g. a single priority bit).
> > 
> > For these reasons, priority levels may be implemented purely in software by
> > PMDs.
> > 
> > - For devices expecting flow rules to be added in the correct order, PMDs
> >   may destroy and re-create existing rules after adding a new one with
> >   a higher priority.
> > 
> > - A configurable number of dummy or empty rules can be created at
> >   initialization time to save high priority slots for later.
> > 
> > - In order to save priority levels, PMDs may evaluate whether rules are
> >   likely to collide and adjust their priority accordingly.
> If there are 3 rules, r1, r2, r3, and the rules say the priority is r1 > r2 > r3, but the PMD can only support r1 > r3 > r2, or doesn't support r2, should the PMD apply r1 and r3 or not support them all?

Remember that the API lets applications create only one rule at a time. If
all 3 rules are not supported together but individually are, the answer
depends on what the application does:

1. r1 OK, r2 FAIL => application chooses to stop here, thus only r1 works as
  expected (may roll back and remove r1 as a result).

2. r1 OK, r2 FAIL, r3 OK => application chooses to ignore the fact r2 failed
  and added r3 anyway, so it should end up with r1 > r3.

Applications should do as described in 1, they need to check for errors if
they want consistency.

This document describes only the basic functions, but may be extended later
with methods to add several flow rules at once, so rules that depend on
others can be added together and a single failure is returned without the
need for a rollback at the application level.
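
For instance, a consistency-minded application could implement case 1 with a
rollback along these lines (sketch only):

 struct rte_flow *flows[3];
 unsigned int i;

 for (i = 0; i != 3; ++i) {
         flows[i] = rte_flow_create(port_id, &pattern[i], &actions[i]);
         if (flows[i] == NULL) {
                 /* Undo the rules created so far to keep a
                  * consistent state. */
                 while (i--)
                         rte_flow_destroy(port_id, flows[i]);
                 break;
         }
 }
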

> A generic question: is the parsing supposed to be done by the rte layer or the PMD?

Actually, a bit of both. EAL will certainly at least provide helpers to
assist PMDs. This specification defines only the public-facing API for now,
but our goal is really to have something that is not too difficult to
implement both for applications and PMDs.

These helpers can be defined later with the first implementation.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v4] Pci: Add the class_id support
  2016-07-06 11:08  3%     ` Ferruh Yigit
@ 2016-07-07  7:46  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-07  7:46 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Ziye Yang, dev

2016-07-06 12:08, Ferruh Yigit:
> On 6/14/2016 3:52 PM, Thomas Monjalon wrote:
> > 2016-05-24 20:50, Ziye Yang:
> >> This patch is used to add the class_id (class_code,
> >> subclass_code, programming_interface) support for
> >> pci_device probe. With this patch, it will be
> >> flexible for users to probe a class of devices
> >> by class_id.
> >>
> >>
> >> Signed-off-by: Ziye Yang <ziye.yang@intel.com>
> > 
> > Applied, thanks
> > 
> Hi Thomas, Ziye,
> 
> Is the modification in the public "struct rte_pci_id" an ABI break?
> If so, it requires an eal LIBABIVER increase and a release notes update.

Not really sure. I was thinking that it is used only by drivers
but not by applications.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
  2016-07-05 18:16  2% [dpdk-dev] [RFC] Generic flow director/filtering/classification API Adrien Mazarguil
@ 2016-07-07  7:14  0% ` Lu, Wenzhuo
  2016-07-07 10:26  2%   ` Adrien Mazarguil
  2016-07-07 23:15  0% ` Chandran, Sugesh
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Lu, Wenzhuo @ 2016-07-07  7:14 UTC (permalink / raw)
  To: Adrien Mazarguil, dev
  Cc: Thomas Monjalon, Zhang, Helin, Wu, Jingjing, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Jan Medala, John Daley, Chen,
	Jing D, Ananyev, Konstantin, Matej Vido, Alejandro Lucero,
	Sony Chacko, Jerin Jacob, De Lara Guarch, Pablo, Olga Shern

Hi Adrien,
I have some questions, please see inline, thanks.

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Wednesday, July 6, 2016 2:17 AM
> To: dev@dpdk.org
> Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody; Ajit Khaparde;
> Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> Guarch, Pablo; Olga Shern
> Subject: [RFC] Generic flow director/filtering/classification API
> 
> 
> Requirements for a new API:
> 
> - Flexible and extensible without causing API/ABI problems for existing
>   applications.
> - Should be unambiguous and easy to use.
> - Support existing filtering features and actions listed in `Filter types`_.
> - Support packet alteration.
> - In case of overlapping filters, their priority should be well documented.
Does that mean we don't guarantee the consistency of priorities? The priority can be different on different NICs, so the behavior of the actions can be different. Right?
It seems users still need to be aware of some details of the HW? Do we need to add negotiation for the priority?

> 
> Flow rules can have several distinct actions (such as counting,
> encapsulating, decapsulating before redirecting packets to a particular
> queue, etc.), instead of relying on several rules to achieve this and having
> applications deal with hardware implementation details regarding their
> order.
I think normally HW doesn't support several actions in one rule. If a rule has several actions, it seems HW has to split it into several rules. The order can still be a problem.

> 
> ``ETH``
> ^^^^^^^
> 
> Matches an Ethernet header.
> 
> - ``dst``: destination MAC.
> - ``src``: source MAC.
> - ``type``: EtherType.
> - ``tags``: number of 802.1Q/ad tags defined.
> - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> 
>  - ``tpid``: Tag protocol identifier.
>  - ``tci``: Tag control information.
"ETH" means all the parameters, dst, src, type... need to be matched? The same question for IPv4, IPv6 ...

> 
> ``UDP``
> ^^^^^^^
> 
> Matches a UDP header.
> 
> - ``sport``: source port.
> - ``dport``: destination port.
> - ``length``: UDP length.
> - ``checksum``: UDP checksum.
Why checksum? Do we need to filter the packets by checksum value?

> 
> ``VOID`` (action)
> ^^^^^^^^^^^^^^^^^
> 
> Used as a placeholder for convenience. It is ignored and simply discarded by
> PMDs.
I don't understand why we need VOID. If it's about the format, why not guarantee it in the rte layer?

> 
> Behavior
> --------
> 
> - API operations are synchronous and blocking (``EAGAIN`` cannot be
>   returned).
> 
> - There is no provision for reentrancy/multi-thread safety, although nothing
>   should prevent different devices from being configured at the same
>   time. PMDs may protect their control path functions accordingly.
> 
> - Stopping the data path (TX/RX) should not be necessary when managing flow
>   rules. If this cannot be achieved naturally or with workarounds (such as
>   temporarily replacing the burst function pointers), an appropriate error
>   code must be returned (``EBUSY``).
The PMD cannot stop the data path without adding a lock. So I think if some rules cannot be applied without stopping rx/tx, the PMD has to return failure.
Or let the app stop the data path.

> 
> - PMDs, not applications, are responsible for maintaining flow rules
>   configuration when stopping and restarting a port or performing other
>   actions which may affect them. They can only be destroyed explicitly.
I don't understand "They can only be destroyed explicitly." If a new rule conflicts with an old one, what should we do? Return failure?

> 
> ``ANY`` pattern item
> ~~~~~~~~~~~~~~~~~~~~
> 
> This pattern item stands for anything, which can be difficult to translate
> to something hardware would understand, particularly if followed by more
> specific types.
> 
> Consider the following pattern:
> 
> +---+--------------------------------+
> | 0 | ETHER                          |
> +---+--------------------------------+
> | 1 | ANY (``min`` = 1, ``max`` = 1) |
> +---+--------------------------------+
> | 2 | TCP                            |
> +---+--------------------------------+
> 
> Knowing that TCP does not make sense with something other than IPv4 and IPv6
> as L3, such a pattern may be translated to two flow rules instead:
> 
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV4 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
> 
> +---+--------------------+
> | 0 | ETHER              |
> +---+--------------------+
> | 1 | IPV6 (zeroed mask) |
> +---+--------------------+
> | 2 | TCP                |
> +---+--------------------+
> 
> Note that as soon as an ANY rule covers several layers, this approach may
> yield a large number of hidden flow rules. It is thus suggested to only
> support the most common scenarios (anything as L2 and/or L3).
I think "any" may make things confusing.  How about if the NIC doesn't support IPv6? Should we return fail for this rule?

> 
> Flow rules priority
> ~~~~~~~~~~~~~~~~~~~
> 
> While it would naturally make sense, flow rules cannot be assumed to be
> processed by hardware in the same order as their creation for several
> reasons:
> 
> - They may be managed internally as a tree or a hash table instead of a
>   list.
> - Removing a flow rule before adding another one can either put the new rule
>   at the end of the list or reuse a freed entry.
> - Duplication may occur when packets are matched by several rules.
> 
> For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> predictable behavior is only guaranteed by using different priority levels.
> 
> Priority levels are not necessarily implemented in hardware, or may be
> severely limited (e.g. a single priority bit).
> 
> For these reasons, priority levels may be implemented purely in software by
> PMDs.
> 
> - For devices expecting flow rules to be added in the correct order, PMDs
>   may destroy and re-create existing rules after adding a new one with
>   a higher priority.
> 
> - A configurable number of dummy or empty rules can be created at
>   initialization time to save high priority slots for later.
> 
> - In order to save priority levels, PMDs may evaluate whether rules are
>   likely to collide and adjust their priority accordingly.
If there are 3 rules, r1, r2, r3, and the rules say the priority is r1 > r2 > r3, but the PMD can only support r1 > r3 > r2, or doesn't support r2, should the PMD apply r1 and r3 or not support them all?

A generic question: is the parsing supposed to be done by the rte layer or the PMD?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] cryptodev: move new cryptodev type to bottom of enum
@ 2016-07-06 14:05  3% Pablo de Lara
  2016-07-08 17:52  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Pablo de Lara @ 2016-07-06 14:05 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty, Pablo de Lara

A new cryptodev type for the new KASUMI PMD was added
to the cryptodev type enum, but not at the end of it,
causing an ABI breakage.

Fixes: 2773c86d061a ("crypto/kasumi: add driver for KASUMI library")

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Reported-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_cryptodev/rte_cryptodev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index 7768f0a..508c1f7 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -67,9 +67,9 @@ enum rte_cryptodev_type {
 	RTE_CRYPTODEV_NULL_PMD = 1,	/**< Null crypto PMD */
 	RTE_CRYPTODEV_AESNI_GCM_PMD,	/**< AES-NI GCM PMD */
 	RTE_CRYPTODEV_AESNI_MB_PMD,	/**< AES-NI multi buffer PMD */
-	RTE_CRYPTODEV_KASUMI_PMD,	/**< KASUMI PMD */
 	RTE_CRYPTODEV_QAT_SYM_PMD,	/**< QAT PMD Symmetric Crypto */
 	RTE_CRYPTODEV_SNOW3G_PMD,	/**< SNOW 3G PMD */
+	RTE_CRYPTODEV_KASUMI_PMD,	/**< KASUMI PMD */
 };
 
 extern const char **rte_cyptodev_names;
-- 
2.5.5

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/3] kasumi: add new KASUMI PMD
  2016-07-06 11:26  3%     ` Ferruh Yigit
  2016-07-06 13:07  0%       ` Thomas Monjalon
@ 2016-07-06 13:22  0%       ` De Lara Guarch, Pablo
  1 sibling, 0 replies; 200+ results
From: De Lara Guarch, Pablo @ 2016-07-06 13:22 UTC (permalink / raw)
  To: Yigit, Ferruh, dev; +Cc: Doherty, Declan, Jain, Deepak K



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, July 06, 2016 12:26 PM
> To: De Lara Guarch, Pablo; dev@dpdk.org
> Cc: Doherty, Declan; Jain, Deepak K
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] kasumi: add new KASUMI PMD
> 
> On 6/20/2016 3:40 PM, Pablo de Lara wrote:
> > Added new SW PMD which makes use of the libsso_kasumi SW library,
> > which provides wireless algorithms KASUMI F8 and F9
> > in software.
> >
> > This PMD supports cipher-only, hash-only and chained operations
> > ("cipher then hash" and "hash then cipher") of the following
> > algorithms:
> > - RTE_CRYPTO_SYM_CIPHER_KASUMI_F8
> > - RTE_CRYPTO_SYM_AUTH_KASUMI_F9
> >
> > Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> > Acked-by: Jain, Deepak K <deepak.k.jain@intel.com>
> 
> ...
> 
> > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > @@ -59,12 +59,15 @@ extern "C" {
> >  /**< Intel QAT Symmetric Crypto PMD device name */
> >  #define CRYPTODEV_NAME_SNOW3G_PMD	("cryptodev_snow3g_pmd")
> >  /**< SNOW 3G PMD device name */
> > +#define CRYPTODEV_NAME_KASUMI_PMD	("cryptodev_kasumi_pmd")
> > +/**< KASUMI PMD device name */
> >
> >  /** Crypto device type */
> >  enum rte_cryptodev_type {
> >  	RTE_CRYPTODEV_NULL_PMD = 1,	/**< Null crypto PMD */
> >  	RTE_CRYPTODEV_AESNI_GCM_PMD,	/**< AES-NI GCM PMD */
> >  	RTE_CRYPTODEV_AESNI_MB_PMD,	/**< AES-NI multi buffer PMD
> */
> > +	RTE_CRYPTODEV_KASUMI_PMD,	/**< KASUMI PMD */
> Does adding a new field into the middle cause an ABI breakage?
> The values of the fields below it have now changed.

Right! Thanks for the catch, will send a patch to fix that.
> 
> Btw, librte_cryptodev is not listed in the release notes "shared library
> versions" section; not sure if this is intentional.
> 
> >  	RTE_CRYPTODEV_QAT_SYM_PMD,	/**< QAT PMD Symmetric
> Crypto */
> >  	RTE_CRYPTODEV_SNOW3G_PMD,	/**< SNOW 3G PMD */
> >  };
> 
> ...

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 1/3] kasumi: add new KASUMI PMD
  2016-07-06 11:26  3%     ` Ferruh Yigit
@ 2016-07-06 13:07  0%       ` Thomas Monjalon
  2016-07-06 13:22  0%       ` De Lara Guarch, Pablo
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-06 13:07 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, Pablo de Lara, declan.doherty, deepak.k.jain, reshma.pattan

2016-07-06 12:26, Ferruh Yigit:
> On 6/20/2016 3:40 PM, Pablo de Lara wrote:
> >  enum rte_cryptodev_type {
> >  	RTE_CRYPTODEV_NULL_PMD = 1,	/**< Null crypto PMD */
> >  	RTE_CRYPTODEV_AESNI_GCM_PMD,	/**< AES-NI GCM PMD */
> >  	RTE_CRYPTODEV_AESNI_MB_PMD,	/**< AES-NI multi buffer PMD */
> > +	RTE_CRYPTODEV_KASUMI_PMD,	/**< KASUMI PMD */
> Does adding a new field into the middle cause an ABI breakage?
> The values of the fields below it have now changed.
> 
> Btw, librte_cryptodev is not listed in the release notes "shared library
> versions" section; not sure if this is intentional.

Good catch!
Now that crypto is not experimental anymore, we must add cryptodev in
release notes. librte_pdump is also missing in this list.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] librte_pmd_bond: fix exported symbol versioning
@ 2016-07-06 11:39  3% Christian Ehrhardt
  2016-07-11 11:27  3% ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2016-07-06 11:39 UTC (permalink / raw)
  To: Eric Kinzie, christian.ehrhardt, thomas.monjalon, dev

The older versions of rte_eth_bond_8023ad_conf_get and
rte_eth_bond_8023ad_setup have been available since 2.0 - at
least according to the map file.

But versioning in the code was set to 16.04.
That breaks compatibility checks for 2.0 on that library.

For example with the dpdk abi checker:
http://people.canonical.com/~paelzer/compat_report.html

To fix this, version the old symbols at 2.0, matching the version at which
they were initially added to the map file.

See http://people.canonical.com/~paelzer/compat_report.html

Fixes: dc40f17a ("net/bonding: allow external state machine in mode 4")

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 drivers/net/bonding/rte_eth_bond_8023ad.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
index 48a50e4..2f7ae70 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -1068,7 +1068,7 @@ bond_mode_8023ad_conf_assign(struct mode8023ad_private *mode4,
 }
 
 static void
-bond_mode_8023ad_setup_v1604(struct rte_eth_dev *dev,
+bond_mode_8023ad_setup_v20(struct rte_eth_dev *dev,
 		struct rte_eth_bond_8023ad_conf *conf)
 {
 	struct rte_eth_bond_8023ad_conf def_conf;
@@ -1214,7 +1214,7 @@ free_out:
 }
 
 int
-rte_eth_bond_8023ad_conf_get_v1604(uint8_t port_id,
+rte_eth_bond_8023ad_conf_get_v20(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf)
 {
 	struct rte_eth_dev *bond_dev;
@@ -1229,7 +1229,7 @@ rte_eth_bond_8023ad_conf_get_v1604(uint8_t port_id,
 	bond_mode_8023ad_conf_get(bond_dev, conf);
 	return 0;
 }
-VERSION_SYMBOL(rte_eth_bond_8023ad_conf_get, _v1604, 16.04);
+VERSION_SYMBOL(rte_eth_bond_8023ad_conf_get, _v20, 2.0);
 
 int
 rte_eth_bond_8023ad_conf_get_v1607(uint8_t port_id,
@@ -1278,7 +1278,7 @@ bond_8023ad_setup_validate(uint8_t port_id,
 }
 
 int
-rte_eth_bond_8023ad_setup_v1604(uint8_t port_id,
+rte_eth_bond_8023ad_setup_v20(uint8_t port_id,
 		struct rte_eth_bond_8023ad_conf *conf)
 {
 	struct rte_eth_dev *bond_dev;
@@ -1289,11 +1289,11 @@ rte_eth_bond_8023ad_setup_v1604(uint8_t port_id,
 		return err;
 
 	bond_dev = &rte_eth_devices[port_id];
-	bond_mode_8023ad_setup_v1604(bond_dev, conf);
+	bond_mode_8023ad_setup_v20(bond_dev, conf);
 
 	return 0;
 }
-VERSION_SYMBOL(rte_eth_bond_8023ad_setup, _v1604, 16.04);
+VERSION_SYMBOL(rte_eth_bond_8023ad_setup, _v20, 2.0);
 
 int
 rte_eth_bond_8023ad_setup_v1607(uint8_t port_id,
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/3] kasumi: add new KASUMI PMD
  @ 2016-07-06 11:26  3%     ` Ferruh Yigit
  2016-07-06 13:07  0%       ` Thomas Monjalon
  2016-07-06 13:22  0%       ` De Lara Guarch, Pablo
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2016-07-06 11:26 UTC (permalink / raw)
  To: Pablo de Lara, dev; +Cc: declan.doherty, deepak.k.jain

On 6/20/2016 3:40 PM, Pablo de Lara wrote:
> Added new SW PMD which makes use of the libsso_kasumi SW library,
> which provides wireless algorithms KASUMI F8 and F9
> in software.
> 
> This PMD supports cipher-only, hash-only and chained operations
> ("cipher then hash" and "hash then cipher") of the following
> algorithms:
> - RTE_CRYPTO_SYM_CIPHER_KASUMI_F8
> - RTE_CRYPTO_SYM_AUTH_KASUMI_F9
> 
> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> Acked-by: Jain, Deepak K <deepak.k.jain@intel.com>

...

> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -59,12 +59,15 @@ extern "C" {
>  /**< Intel QAT Symmetric Crypto PMD device name */
>  #define CRYPTODEV_NAME_SNOW3G_PMD	("cryptodev_snow3g_pmd")
>  /**< SNOW 3G PMD device name */
> +#define CRYPTODEV_NAME_KASUMI_PMD	("cryptodev_kasumi_pmd")
> +/**< KASUMI PMD device name */
>  
>  /** Crypto device type */
>  enum rte_cryptodev_type {
>  	RTE_CRYPTODEV_NULL_PMD = 1,	/**< Null crypto PMD */
>  	RTE_CRYPTODEV_AESNI_GCM_PMD,	/**< AES-NI GCM PMD */
>  	RTE_CRYPTODEV_AESNI_MB_PMD,	/**< AES-NI multi buffer PMD */
> +	RTE_CRYPTODEV_KASUMI_PMD,	/**< KASUMI PMD */
Does adding a new field into the middle cause an ABI breakage?
The values of the fields below it have now changed.

Btw, librte_cryptodev is not listed in the release notes "shared library
versions" section; not sure if this is intentional.

>  	RTE_CRYPTODEV_QAT_SYM_PMD,	/**< QAT PMD Symmetric Crypto */
>  	RTE_CRYPTODEV_SNOW3G_PMD,	/**< SNOW 3G PMD */
>  };

...

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4] Pci: Add the class_id support
  @ 2016-07-06 11:08  3%     ` Ferruh Yigit
  2016-07-07  7:46  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-07-06 11:08 UTC (permalink / raw)
  To: Thomas Monjalon, Ziye Yang; +Cc: dev

On 6/14/2016 3:52 PM, Thomas Monjalon wrote:
> 2016-05-24 20:50, Ziye Yang:
>> This patch is used to add the class_id (class_code,
>> subclass_code, programming_interface) support for
>> pci_device probe. With this patch, it will be
>> flexible for users to probe a class of devices
>> by class_id.
>>
>>
>> Signed-off-by: Ziye Yang <ziye.yang@intel.com>
> 
> Applied, thanks
> 
Hi Thomas, Ziye,

Is the modification of the public "struct rte_pci_id" an ABI break?
If so, it requires an eal LIBABIVER increase and a release notes update.

Regards,
ferruh

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] mk: filter duplicate configuration entries
  @ 2016-07-06  5:37  3%       ` Christian Ehrhardt
  0 siblings, 0 replies; 200+ results
From: Christian Ehrhardt @ 2016-07-06  5:37 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Ferruh Yigit, dev

Hi,
I came up with something very similar when looking for tac replacements
yesterday, but had no time to finish things.
But your suggestion is even shorter - I had found "sed -n '1{h;T;};G;h;$p;'
file" or "sed -n '1!G;h;$p'".
That removes the tac dependency, which I agree is a good thing.

To chain things up without a temp file one would need the "in-place"
features of sed & awk, which I'm not sure are available everywhere (awk
>= 4.1, and only GNU awk).
sed -i is only used in validate-abi.sh, which might not be run on all
platforms, so it does not count as "-i is there already so I can use it".
And I really don't want to break anyone with this change; the goal is just
to naively clean up the resulting config a bit.
Also we already have a temp file .config_tmp in the same scope and remove
it on our own.
So it is not that much different to create and remove a second one for that
section.

Thanks for both of your feedback, submitting v3 now ...


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Tue, Jul 5, 2016 at 9:47 PM, Thomas Monjalon <thomas.monjalon@6wind.com>
wrote:

> 2016-07-05 17:47, Ferruh Yigit:
> > On 6/30/2016 1:00 PM, Christian Ehrhardt wrote:
> > > +           tac $(RTE_OUTPUT)/.config_tmp >
> $(RTE_OUTPUT)/.config_tmp_reverse ; \
> > Now we are adding new binary dependency (tac) to build system
>
> tac can be replaced by sed '1!G;h;$!d'
>
>

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [RFC] Generic flow director/filtering/classification API
@ 2016-07-05 18:16  2% Adrien Mazarguil
  2016-07-07  7:14  0% ` Lu, Wenzhuo
                   ` (5 more replies)
  0 siblings, 6 replies; 200+ results
From: Adrien Mazarguil @ 2016-07-05 18:16 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody,
	Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu, Jan Medala,
	John Daley, Jing Chen, Konstantin Ananyev, Matej Vido,
	Alejandro Lucero, Sony Chacko, Jerin Jacob, Pablo de Lara,
	Olga Shern

Hi All,

First, forgive me for this large message, I know our mailboxes already
suffer quite a bit from the amount of traffic on this ML.

This is not exactly yet another thread about how flow director should be
extended, rather about a brand new API to handle filtering and
classification for incoming packets in the most PMD-generic and
application-friendly fashion we can come up with. Reasons described below.

I think this topic is important enough to include both the users of this API
as well as PMD maintainers. So far I have CC'ed librte_ether (especially
rte_eth_ctrl.h contributors), testpmd and PMD maintainers (with and without
a .filter_ctrl implementation), but if you know application maintainers
other than testpmd who use FDIR or might be interested in this discussion,
feel free to add them.

The issues we found with the current approach are already summarized in the
following document, but here is a quick summary for TL;DR folks:

- PMDs do not expose a common set of filter types and even when they do,
  their behavior more or less differs.

- Applications need to determine and adapt to device-specific limitations
  and quirks on their own, without help from PMDs.

- Writing an application that creates flow rules targeting all devices
  supported by DPDK is thus difficult, if not impossible.

- The current API has too many unspecified areas (particularly regarding
  side effects of flow rules) that make PMD implementation tricky.

This RFC API handles everything currently supported by .filter_ctrl, the
idea being to reimplement all of these to make them fully usable by
applications in a more generic and well defined fashion. It has a very small
set of mandatory features and an easy method to let applications probe for
supported capabilities.

The only downside is more work for the software control side of PMDs because
they have to adapt to the API instead of the reverse. I think helpers can be
added to EAL to assist with this.

HTML version:

 https://rawgit.com/6WIND/rte_flow/master/rte_flow.html

PDF version:

 https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf

Related draft header file (for reference while reading the specification):

 https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h

Git tree for completeness (latest .rst version can be retrieved from here):

 https://github.com/6WIND/rte_flow

What follows is the ReST source of the above, for inline comments and
discussion. I intend to update that specification accordingly.

========================
Generic filter interface
========================

.. footer::

   v0.6

.. contents::
.. sectnum::
.. raw:: pdf

   PageBreak

Overview
========

DPDK provides several competing interfaces added over time to perform packet
matching and related actions such as filtering and classification.

They must be extended to implement the features supported by newer devices
in order to expose them to applications, however the current design has
several drawbacks:

- Complicated filter combinations which have not been hard-coded cannot be
  expressed.
- Prone to API/ABI breakage when new features must be added to an existing
  filter type, which frequently happens.

From an application point of view:

- Having disparate interfaces, all optional and lacking in features does not
  make this API easy to use.
- Seemingly arbitrary built-in limitations of filter types based on the
  device they were initially designed for.
- Undefined relationship between different filter types.
- High complexity, considerable undocumented and/or undefined behavior.

Considering the growing number of devices supported by DPDK, adding a new
filter type each time a new feature must be implemented is not sustainable
in the long term. Applications not written to target a specific device
cannot really benefit from such an API.

For these reasons, this document defines an extensible unified API that
encompasses and supersedes these legacy filter types.

.. raw:: pdf

   PageBreak

Current API
===========

Rationale
---------

The reason several competing (and mostly overlapping) filtering APIs are
present in DPDK is due to its nature as a thin layer between hardware and
software.

Each subsequent interface has been added to better match the capabilities
and limitations of the latest supported device, which usually happened to
need an incompatible configuration approach. Because of this, many ended up
device-centric and not usable by applications that were not written for that
particular device.

This document is not the first attempt to address this proliferation issue,
in fact a lot of work has already been done both to create a more generic
interface while somewhat keeping compatibility with legacy ones through a
common call interface (``rte_eth_dev_filter_ctrl()`` with the
``.filter_ctrl`` PMD callback in ``rte_ethdev.h``).

Today, these previously incompatible interfaces are known as filter types
(``RTE_ETH_FILTER_*`` from ``enum rte_filter_type`` in ``rte_eth_ctrl.h``).

However, while trivial to extend with new types, it only shifted the
underlying problem as applications still need to be written for one kind of
filter type, which, as described in the following sections, is not
necessarily implemented by all PMDs that support filtering.

.. raw:: pdf

   PageBreak

Filter types
------------

This section summarizes the capabilities of each filter type.

Although the following list is exhaustive, the description of individual
types may contain inaccuracies due to the lack of documentation or usage
examples.

Note: names are prefixed with ``RTE_ETH_FILTER_``.

``MACVLAN``
~~~~~~~~~~~

Matching:

- L2 source/destination addresses.
- Optional 802.1Q VLAN ID.
- Masking individual fields on a rule basis is not supported.

Action:

- Packets are redirected either to a given VF device using its ID or to the
  PF.

``ETHERTYPE``
~~~~~~~~~~~~~

Matching:

- L2 source/destination addresses (optional).
- Ethertype (no VLAN ID?).
- Masking individual fields on a rule basis is not supported.

Action:

- Receive packets on a given queue.
- Drop packets.

``FLEXIBLE``
~~~~~~~~~~~~

Matching:

- At most 128 consecutive bytes anywhere in packets.
- Masking is supported with byte granularity.
- Priorities are supported (relative to this filter type, undefined
  otherwise).

Action:

- Receive packets on a given queue.

``SYN``
~~~~~~~

Matching:

- TCP SYN packets only.
- One high priority bit can be set to give the highest possible priority to
  this type when other filters with different types are configured.

Action:

- Receive packets on a given queue.

``NTUPLE``
~~~~~~~~~~

Matching:

- Source/destination IPv4 addresses (optional in 2-tuple mode).
- Source/destination TCP/UDP port (mandatory in 2 and 5-tuple modes).
- L4 protocol (2 and 5-tuple modes).
- Masking individual fields is supported.
- TCP flags.
- Up to 7 levels of priority relative to this filter type, undefined
  otherwise.
- No IPv6.

Action:

- Receive packets on a given queue.

``TUNNEL``
~~~~~~~~~~

Matching:

- Outer L2 source/destination addresses.
- Inner L2 source/destination addresses.
- Inner VLAN ID.
- IPv4/IPv6 source (destination?) address.
- Tunnel type to match (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE, 802.1BR
  E-Tag).
- Tenant ID for tunneling protocols that have one.
- Any combination of the above can be specified.
- Masking individual fields on a rule basis is not supported.

Action:

- Receive packets on a given queue.

.. raw:: pdf

   PageBreak

``FDIR``
~~~~~~~~

Queries:

- Device capabilities and limitations.
- Device statistics about configured filters (resource usage, collisions).
- Device configuration (matching input set and masks)

Matching:

- Device mode of operation: none (to disable filtering), signature
  (hash-based dispatching from masked fields) or perfect (either MAC VLAN or
  tunnel).
- L2 Ethertype.
- Outer L2 destination address (MAC VLAN mode).
- Inner L2 destination address, tunnel type (NVGRE, VXLAN) and tunnel ID
  (tunnel mode).
- IPv4 source/destination addresses, ToS, TTL and protocol fields.
- IPv6 source/destination addresses, TC, protocol and hop limits fields.
- UDP source/destination IPv4/IPv6 and ports.
- TCP source/destination IPv4/IPv6 and ports.
- SCTP source/destination IPv4/IPv6, ports and verification tag field.
- Note that only one protocol type can be matched at once (either only L2
  Ethertype, basic IPv6, IPv4+UDP, IPv4+TCP and so on).
- VLAN TCI (extended API).
- At most 16 bytes to match in payload (extended API). A global device
  look-up table specifies for each possible protocol layer (unknown, raw,
  L2, L3, L4) the offset to use for each byte (they do not need to be
  contiguous) and the related bitmask.
- Whether packet is addressed to PF or VF, in that case its ID can be
  matched as well (extended API).
- Masking most of the above fields is supported, but simultaneously affects
  all filters configured on a device.
- Input set can be modified in a similar fashion for a given device to
  ignore individual fields of filters (i.e. do not match the destination
  address in a IPv4 filter, refer to **RTE_ETH_INPUT_SET_**
  macros). Configuring this also affects RSS processing on **i40e**.
- Filters can also provide 32 bits of arbitrary data to return as part of
  matched packets.

Action:

- **RTE_ETH_FDIR_ACCEPT**: receive (accept) packet on a given queue.
- **RTE_ETH_FDIR_REJECT**: drop packet immediately.
- **RTE_ETH_FDIR_PASSTHRU**: similar to accept for the last filter in list,
  otherwise process it with subsequent filters.
- For accepted packets and if requested by filter, either 32 bits of
  arbitrary data and four bytes of matched payload (only in case of flex
  bytes matching), or eight bytes of matched payload (flex also) are added
  to meta data.

.. raw:: pdf

   PageBreak

``HASH``
~~~~~~~~

Not an actual filter type. Provides and retrieves the global device
configuration (per port or entire NIC) for hash functions and their
properties.

Hash function selection: "default" (keep current), XOR or Toeplitz.

This function can be configured per flow type (**RTE_ETH_FLOW_**
definitions), supported types are:

- Unknown.
- Raw.
- Fragmented or non-fragmented IPv4.
- Non-fragmented IPv4 with L4 (TCP, UDP, SCTP or other).
- Fragmented or non-fragmented IPv6.
- Non-fragmented IPv6 with L4 (TCP, UDP, SCTP or other).
- L2 payload.
- IPv6 with extensions.
- IPv6 with L4 (TCP, UDP) and extensions.

``L2_TUNNEL``
~~~~~~~~~~~~~

Matching:

- All packets received on a given port.

Action:

- Add tunnel encapsulation (VXLAN, GENEVE, TEREDO, NVGRE, IP over GRE,
  802.1BR E-Tag) using the provided Ethertype and tunnel ID (only E-Tag
  is implemented at the moment).
- VF ID to use for tag insertion (currently unused).
- Destination pool for tag based forwarding (pools are IDs that can be
  assigned to ports; duplication occurs if the same ID is shared by several
  ports of the same NIC).

.. raw:: pdf

   PageBreak

Driver support
--------------

======== ======= ========= ======== === ====== ====== ==== ==== =========
Driver   MACVLAN ETHERTYPE FLEXIBLE SYN NTUPLE TUNNEL FDIR HASH L2_TUNNEL
======== ======= ========= ======== === ====== ====== ==== ==== =========
bnx2x
cxgbe
e1000            yes       yes      yes yes
ena
enic                                                  yes
fm10k
i40e     yes     yes                           yes    yes  yes
ixgbe            yes                yes yes           yes       yes
mlx4
mlx5                                                  yes
szedata2
======== ======= ========= ======== === ====== ====== ==== ==== =========

Flow director
-------------

Flow director (FDIR) is the name of the most capable filter type, which
covers most features offered by others. As such, it is the most widespread
in PMDs that support filtering (i.e. all of them besides **e1000**).

It is also the only type that allows an arbitrary 32 bits value provided by
applications to be attached to a filter and returned with matching packets
instead of relying on the destination queue to recognize flows.

Unfortunately, even FDIR requires applications to be aware of low-level
capabilities and limitations (most of which come directly from **ixgbe** and
**i40e**):

- Bitmasks are set globally per device (port?), not per filter.
- Configuration state is not expected to be saved by the driver, and
  stopping/restarting a port requires the application to perform it again
  (API documentation is also unclear about this).
- Monolithic approach with ABI issues as soon as a new kind of flow or
  combination needs to be supported.
- Cryptic global statistics/counters.
- Unclear about how priorities are managed; filters seem to be arranged as a
  linked list in hardware (possibly related to configuration order).

Packet alteration
-----------------

One interesting feature is that the L2 tunnel filter type implements the
ability to alter incoming packets through a filter (in this case to
encapsulate them); thus the **mlx5** flow encap/decap features are not a
foreign concept.

.. raw:: pdf

   PageBreak

Proposed API
============

Terminology
-----------

- **Filtering API**: overall framework affecting the fate of selected
  packets, covers everything described in this document.
- **Matching pattern**: properties to look for in received packets, a
  combination of any number of items.
- **Pattern item**: part of a pattern that either matches packet data
  (protocol header, payload or derived information), or specifies properties
  of the pattern itself.
- **Actions**: what needs to be done when a packet matches a pattern.
- **Flow rule**: this is the result of combining a *matching pattern* with
  *actions*.
- **Filter rule**: a less generic term than *flow rule*, can otherwise be
  used interchangeably.
- **Hit**: a flow rule is said to be *hit* when processing a matching
  packet.

Requirements
------------

As described in the previous section, there is a growing need for a common
method to configure filtering and related actions in a hardware independent
fashion.

The filtering API should not disallow any filter combination by design and
must remain as simple as possible to use. It can simply be defined as a
method to perform one or several actions on selected packets.

PMDs are aware of the capabilities of the device they manage and should be
responsible for preventing unsupported or conflicting combinations.

This approach is fundamentally different as it places most of the burden on
the software side of the PMD instead of having device capabilities directly
mapped to API functions, then expecting applications to work around ensuing
compatibility issues.

Requirements for a new API:

- Flexible and extensible without causing API/ABI problems for existing
  applications.
- Should be unambiguous and easy to use.
- Support existing filtering features and actions listed in `Filter types`_.
- Support packet alteration.
- In case of overlapping filters, their priority should be well documented.
- Support filter queries (for example to retrieve counters).

.. raw:: pdf

   PageBreak

High level design
-----------------

The chosen approach to make filtering as generic as possible is by
expressing matching patterns through lists of items instead of the flat
structures used in DPDK today, enabling combinations that are not predefined
and thus being more versatile.

Flow rules can have several distinct actions (such as counting,
encapsulating, decapsulating before redirecting packets to a particular
queue, etc.), instead of relying on several rules to achieve this and having
applications deal with hardware implementation details regarding their
order.

Support for different priority levels on a rule basis is provided, for
example in order to force a more specific rule to come before a more generic
one for packets matched by both, however hardware support for more than a
single priority level cannot be guaranteed. When supported, the number of
available priority levels is usually low, which is why they can also be
implemented in software by PMDs (e.g. to simulate missing priority levels by
reordering rules).

In order to remain as hardware agnostic as possible, by default all rules
are considered to have the same priority, which means that the order between
overlapping rules (when a packet is matched by several filters) is
undefined; packet duplication may even occur as a result.

PMDs may refuse to create overlapping rules at a given priority level when
they can be detected (e.g. if a pattern matches an existing filter).

Thus predictable results for a given priority level can only be achieved
with non-overlapping rules, using perfect matching on all protocol layers.

Support for multiple actions per rule may be implemented internally on top
of non-default hardware priorities, as a result both features may not be
simultaneously available to applications.

Considering that allowed pattern/actions combinations cannot be known in
advance and would result in an impractically large number of capabilities to
expose, a method is provided to validate a given rule from the current
device configuration state without actually adding it (akin to a "dry run"
mode).

This enables applications to check if the rule types they need are supported
at initialization time, before starting their data path. This method can be
used anytime, its only requirement being that the resources needed by a rule
must exist (e.g. a target RX queue must be configured first).

Each defined rule is associated with an opaque handle managed by the PMD,
applications are responsible for keeping it. These can be used for queries
and rules management, such as retrieving counters or other data and
destroying them.

Handles must be destroyed before releasing associated resources such as
queues.

Integration
-----------

To avoid ABI breakage, this new interface will be implemented through the
existing filtering control framework (``rte_eth_dev_filter_ctrl()``) using
**RTE_ETH_FILTER_GENERIC** as a new filter type.

However a public front-end API described in `Rules management`_ will
be added as the preferred method to use it.

Once discussions with the community have converged to a definite API, legacy
filter types should be deprecated and a deadline defined to remove their
support entirely.

PMDs will have to be gradually converted to **RTE_ETH_FILTER_GENERIC** or
drop filtering support entirely. Less maintained PMDs for older hardware may
lose support at this point.

The notion of filter type will then be deprecated and subsequently dropped
to avoid confusion between both frameworks.

Implementation details
======================

Flow rule
---------

A flow rule is the combination of a matching pattern with a list of actions,
and is the basis of this API.

Priorities
~~~~~~~~~~

A priority can be assigned to a matching pattern.

The default priority level is 0 and is also the highest. Support for more
than a single priority level in hardware is not guaranteed.

If a packet is matched by several filters at a given priority level, the
outcome is undefined. It can take any path and can even be duplicated.

Matching pattern
~~~~~~~~~~~~~~~~

A matching pattern comprises any number of items of various types.

Items are arranged in a list to form a matching pattern for packets. They
fall in two categories:

- Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, VXLAN and so
  on), usually associated with a specification structure. These must be
  stacked in the same order as the protocol layers to match, starting from
  L2.

- Affecting how the pattern is processed (END, VOID, INVERT, PF, VF,
  SIGNATURE and so on), often without a specification structure. Since they
  are meta data that does not match packet contents, these can be specified
  anywhere within item lists without affecting the protocol matching items.

Most item specifications can be optionally paired with a mask to narrow the
specific fields or bits to be matched.

- Items are defined with ``struct rte_flow_item``.
- Patterns are defined with ``struct rte_flow_pattern``.

Example of an item specification matching an Ethernet header:

+-----------------------------------------+
| Ethernet                                |
+==========+=========+====================+
| ``spec`` | ``src`` | ``00:01:02:03:04`` |
|          +---------+--------------------+
|          | ``dst`` | ``00:2a:66:00:01`` |
+----------+---------+--------------------+
| ``mask`` | ``src`` | ``00:ff:ff:ff:00`` |
|          +---------+--------------------+
|          | ``dst`` | ``00:00:00:00:ff`` |
+----------+---------+--------------------+

Non-masked bits stand for any value, Ethernet headers with the following
properties are thus matched:

- ``src``: ``??:01:02:03:??``
- ``dst``: ``??:??:??:??:01``

Except for meta types that do not need one, ``spec`` must be a valid pointer
to a structure of the related item type. A ``mask`` of the same type can be
provided to tell which bits in ``spec`` are to be matched.

A mask is normally only needed for ``spec`` fields matching packet data,
ignored otherwise. See individual item types for more information.

A ``NULL`` mask pointer is allowed and is similar to matching with a full
mask (all ones) on the ``spec`` fields supported by hardware; the remaining
fields are ignored (all zeroes), and there is thus no error checking for
unsupported fields.
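
As an illustration only, the Ethernet example above could be expressed in C
as follows. This is a minimal sketch assuming items carry a type plus
``spec``/``mask`` pointers; the exact structure and enum names belong to the
draft ``rte_flow.h`` and may differ:

::

 /* Hypothetical sketch, not a definitive implementation. */
 struct rte_flow_item_eth eth_spec = { 0 }; /* fill src/dst as in the table */
 struct rte_flow_item_eth eth_mask = { 0 }; /* fill the per-field bitmasks */

 struct rte_flow_item eth_item = {
         .type = RTE_FLOW_ITEM_TYPE_ETH,
         .spec = &eth_spec,
         .mask = &eth_mask,
 };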

Matching pattern items for packet data must be naturally stacked (ordered
from lowest to highest protocol layer), as in the following examples:

+--------------+
| TCPv4 as L4  |
+===+==========+
| 0 | Ethernet |
+---+----------+
| 1 | IPv4     |
+---+----------+
| 2 | TCP      |
+---+----------+

+----------------+
| TCPv6 in VXLAN |
+===+============+
| 0 | Ethernet   |
+---+------------+
| 1 | IPv4       |
+---+------------+
| 2 | UDP        |
+---+------------+
| 3 | VXLAN      |
+---+------------+
| 4 | Ethernet   |
+---+------------+
| 5 | IPv6       |
+---+------------+
| 6 | TCP        |
+---+------------+

+-----------------------------+
| TCPv4 as L4 with meta items |
+===+=========================+
| 0 | VOID                    |
+---+-------------------------+
| 1 | Ethernet                |
+---+-------------------------+
| 2 | VOID                    |
+---+-------------------------+
| 3 | IPv4                    |
+---+-------------------------+
| 4 | TCP                     |
+---+-------------------------+
| 5 | VOID                    |
+---+-------------------------+
| 6 | VOID                    |
+---+-------------------------+

The above example shows how meta items do not affect packet data matching
items, as long as those remain stacked properly. The resulting matching
pattern is identical to "TCPv4 as L4".

+----------------+
| UDPv6 anywhere |
+===+============+
| 0 | IPv6       |
+---+------------+
| 1 | UDP        |
+---+------------+

If supported by the PMD, omitting one or several protocol layers at the
bottom of the stack as in the above example (missing an Ethernet
specification) enables hardware to look anywhere in packets.

It is unspecified whether the payload of supported encapsulations
(e.g. VXLAN inner packet) is matched by such a pattern, which may apply to
inner, outer or both packets.

+---------------------+
| Invalid, missing L3 |
+===+=================+
| 0 | Ethernet        |
+---+-----------------+
| 1 | UDP             |
+---+-----------------+

The above pattern is invalid due to a missing L3 specification between L2
and L4. It is only allowed at the bottom and at the top of the stack.
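
Assuming the item layout sketched earlier, a valid "TCPv4 as L4" pattern
could be laid out as the following array (the END terminator is the meta
item described in the next section; the ``*_spec`` structures are assumed
to be declared and filled elsewhere):

::

 /* Sketch only; spec/mask setup is omitted for brevity. */
 struct rte_flow_item tcpv4_items[] = {
         { .type = RTE_FLOW_ITEM_TYPE_ETH,  .spec = &eth_spec },
         { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_spec },
         { .type = RTE_FLOW_ITEM_TYPE_TCP,  .spec = &tcp_spec },
         { .type = RTE_FLOW_ITEM_TYPE_END },
 };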

Meta item types
~~~~~~~~~~~~~~~

These do not match packet data but affect how the pattern is processed; most
of them do not need a specification structure. This particularity allows
them to be specified anywhere without affecting other item types.

``END``
^^^^^^^

End marker for item lists. Prevents further processing of items, thereby
ending the pattern.

- Its numeric value is **0** for convenience.
- PMD support is mandatory.
- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| END                |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

``VOID``
^^^^^^^^

Used as a placeholder for convenience. It is ignored and simply discarded by
PMDs.

- PMD support is mandatory.
- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| VOID               |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

One usage example for this type is generating rules that share a common
prefix quickly without reallocating memory, only by updating item types:

+------------------------+
| TCP, UDP or ICMP as L4 |
+===+====================+
| 0 | Ethernet           |
+---+--------------------+
| 1 | IPv4               |
+---+------+------+------+
| 2 | UDP  | VOID | VOID |
+---+------+------+------+
| 3 | VOID | TCP  | VOID |
+---+------+------+------+
| 4 | VOID | VOID | ICMP |
+---+------+------+------+
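
In C terms, assuming a five-entry ``items[]`` array matching the table
above (plus an END terminator), switching the rule from one L4 protocol to
another only requires rewriting the ``type`` fields in place:

::

 /* Select the TCP variant (middle column); the shared Ethernet/IPv4
  * prefix in items[0..1] is left untouched. */
 items[2].type = RTE_FLOW_ITEM_TYPE_VOID; /* UDP slot disabled */
 items[3].type = RTE_FLOW_ITEM_TYPE_TCP;  /* TCP slot enabled */
 items[4].type = RTE_FLOW_ITEM_TYPE_VOID; /* ICMP slot disabled */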

.. raw:: pdf

   PageBreak

``INVERT``
^^^^^^^^^^

Inverted matching, i.e. process packets that do not match the pattern.

- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| INVERT             |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

Usage example in order to match non-TCPv4 packets only:

+--------------------+
| Anything but TCPv4 |
+===+================+
| 0 | INVERT         |
+---+----------------+
| 1 | Ethernet       |
+---+----------------+
| 2 | IPv4           |
+---+----------------+
| 3 | TCP            |
+---+----------------+

``PF``
^^^^^^

Matches packets addressed to the physical function of the device.

- Both ``spec`` and ``mask`` are ignored.

+--------------------+
| PF                 |
+==========+=========+
| ``spec`` | ignored |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

``VF``
^^^^^^

Matches packets addressed to the given virtual function ID of the device.

- Only ``spec`` needs to be defined, ``mask`` is ignored.

+----------------------------------------+
| VF                                     |
+==========+=========+===================+
| ``spec`` | ``vf``  | destination VF ID |
+----------+---------+-------------------+
| ``mask`` | ignored                     |
+----------+-----------------------------+

``SIGNATURE``
^^^^^^^^^^^^^

Requests hash-based signature dispatching for this rule.

Considering this is a global setting on devices that support it, all
subsequent filter rules may have to be created with it as well.

- Only ``spec`` needs to be defined, ``mask`` is ignored.

+--------------------+
| SIGNATURE          |
+==========+=========+
| ``spec`` | TBD     |
+----------+---------+
| ``mask`` | ignored |
+----------+---------+

.. raw:: pdf

   PageBreak

Data matching item types
~~~~~~~~~~~~~~~~~~~~~~~~

Most of these are basically protocol header definitions with associated
bitmasks. They must be specified (stacked) from lowest to highest protocol
layer.

The following list is not exhaustive as new protocols will be added in the
future.

``ANY``
^^^^^^^

Matches any protocol in place of the current layer; a single ANY may also
stand for several protocol layers.

This is usually specified as the first pattern item when looking for a
protocol anywhere in a packet.

- A maximum value of **0** requests matching any number of protocol layers
  above or equal to the minimum value; a maximum value lower than the
  minimum one is otherwise invalid.
- Only ``spec`` needs to be defined, ``mask`` is ignored.

+-----------------------------------------------------------------------+
| ANY                                                                   |
+==========+=========+==================================================+
| ``spec`` | ``min`` | minimum number of layers covered                 |
|          +---------+--------------------------------------------------+
|          | ``max`` | maximum number of layers covered, 0 for infinity |
+----------+---------+--------------------------------------------------+
| ``mask`` | ignored                                                    |
+----------+------------------------------------------------------------+

Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
or IPv6) matched by the second ANY specification:

+----------------------------------+
| TCP in VXLAN with wildcards      |
+===+==============================+
| 0 | Ethernet                     |
+---+-----+----------+---------+---+
| 1 | ANY | ``spec`` | ``min`` | 2 |
|   |     |          +---------+---+
|   |     |          | ``max`` | 2 |
+---+-----+----------+---------+---+
| 2 | VXLAN                        |
+---+------------------------------+
| 3 | Ethernet                     |
+---+-----+----------+---------+---+
| 4 | ANY | ``spec`` | ``min`` | 1 |
|   |     |          +---------+---+
|   |     |          | ``max`` | 1 |
+---+-----+----------+---------+---+
| 5 | TCP                          |
+---+------------------------------+

.. raw:: pdf

   PageBreak

``RAW``
^^^^^^^

Matches a string of a given length at a given offset (in bytes), or anywhere
in the payload of the current protocol layer (including L2 header if used as
the first item in the stack).

This does not increment the protocol layer count as it is not a protocol
definition. Subsequent RAW items modulate the first absolute one with
relative offsets.

- Using **-1** as the ``offset`` of the first RAW item makes its absolute
  offset not fixed, i.e. the pattern is searched everywhere.
- ``mask`` only affects the pattern.

+--------------------------------------------------------------+
| RAW                                                          |
+==========+=============+=====================================+
| ``spec`` | ``offset``  | absolute or relative pattern offset |
|          +-------------+-------------------------------------+
|          | ``length``  | pattern length                      |
|          +-------------+-------------------------------------+
|          | ``pattern`` | byte string of the above length     |
+----------+-------------+-------------------------------------+
| ``mask`` | ``offset``  | ignored                             |
|          +-------------+-------------------------------------+
|          | ``length``  | ignored                             |
|          +-------------+-------------------------------------+
|          | ``pattern`` | bitmask with the same byte length   |
+----------+-------------+-------------------------------------+

Example pattern looking for several strings at various offsets of a UDP
payload, using combined RAW items:

+------------------------------------------+
| UDP payload matching                     |
+===+======================================+
| 0 | Ethernet                             |
+---+--------------------------------------+
| 1 | IPv4                                 |
+---+--------------------------------------+
| 2 | UDP                                  |
+---+-----+----------+-------------+-------+
| 3 | RAW | ``spec`` | ``offset``  | -1    |
|   |     |          +-------------+-------+
|   |     |          | ``length``  | 3     |
|   |     |          +-------------+-------+
|   |     |          | ``pattern`` | "foo" |
+---+-----+----------+-------------+-------+
| 4 | RAW | ``spec`` | ``offset``  | 20    |
|   |     |          +-------------+-------+
|   |     |          | ``length``  | 3     |
|   |     |          +-------------+-------+
|   |     |          | ``pattern`` | "bar" |
+---+-----+----------+-------------+-------+
| 5 | RAW | ``spec`` | ``offset``  | -30   |
|   |     |          +-------------+-------+
|   |     |          | ``length``  | 3     |
|   |     |          +-------------+-------+
|   |     |          | ``pattern`` | "baz" |
+---+-----+----------+-------------+-------+

This translates to:

- Locate "foo" in UDP payload, remember its offset.
- Check "bar" at "foo"'s offset plus 20 bytes.
- Check "baz" at "foo"'s offset minus 30 bytes.

.. raw:: pdf

   PageBreak

``ETH``
^^^^^^^

Matches an Ethernet header.

- ``dst``: destination MAC.
- ``src``: source MAC.
- ``type``: EtherType.
- ``tags``: number of 802.1Q/ad tags defined.
- ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:

 - ``tpid``: Tag protocol identifier.
 - ``tci``: Tag control information.

``IPV4``
^^^^^^^^

Matches an IPv4 header.

- ``src``: source IP address.
- ``dst``: destination IP address.
- ``tos``: ToS/DSCP field.
- ``ttl``: TTL field.
- ``proto``: protocol number for the next layer.

``IPV6``
^^^^^^^^

Matches an IPv6 header.

- ``src``: source IP address.
- ``dst``: destination IP address.
- ``tc``: traffic class field.
- ``nh``: Next header field (protocol).
- ``hop_limit``: hop limit field (TTL).

``ICMP``
^^^^^^^^

Matches an ICMP header.

- TBD.

``UDP``
^^^^^^^

Matches a UDP header.

- ``sport``: source port.
- ``dport``: destination port.
- ``length``: UDP length.
- ``checksum``: UDP checksum.

.. raw:: pdf

   PageBreak

``TCP``
^^^^^^^

Matches a TCP header.

- ``sport``: source port.
- ``dport``: destination port.
- All other TCP fields and bits.

``VXLAN``
^^^^^^^^^

Matches a VXLAN header.

- TBD.

.. raw:: pdf

   PageBreak

Actions
~~~~~~~

Each possible action is represented by a type. Some have associated
configuration structures. Several actions combined in a list can be assigned
to a flow rule. That list is not ordered.

At least one action must be defined in a filter rule in order to do
something with matched packets.

- Actions are defined with ``struct rte_flow_action``.
- A list of actions is defined with ``struct rte_flow_actions``.

They fall in three categories:

- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
  processing matched packets by subsequent flow rules, unless overridden
  with PASSTHRU.

- Non terminating actions (PASSTHRU, DUP) that leave matched packets up for
  additional processing by subsequent flow rules.

- Other non terminating meta actions that do not affect the fate of packets
  (END, VOID, ID, COUNT).

When several actions are combined in a flow rule, they should all have
different types (e.g. dropping a packet twice is not possible). The VOID
type is an exception to this rule; for any other duplicated type, the
defined behavior is for PMDs to take into account only the last action of
that type found in the list. PMDs still perform error checking on the
entire list.

*Note that PASSTHRU is the only action able to override a terminating rule.*

.. raw:: pdf

   PageBreak

Example of an action that redirects packets to queue index 10:

+----------------+
| QUEUE          |
+===========+====+
| ``queue`` | 10 |
+-----------+----+

Action list examples; their order is not significant, and applications must
consider all actions to be performed simultaneously:

+----------------+
| Count and drop |
+=======+========+
| COUNT |        |
+-------+--------+
| DROP  |        |
+-------+--------+

+--------------------------+
| Tag, count and redirect  |
+=======+===========+======+
| ID    | ``id``    | 0x2a |
+-------+-----------+------+
| COUNT |                  |
+-------+-----------+------+
| QUEUE | ``queue`` | 10   |
+-------+-----------+------+

+-----------------------+
| Redirect to queue 5   |
+=======+===============+
| DROP  |               |
+-------+-----------+---+
| QUEUE | ``queue`` | 5 |
+-------+-----------+---+

In the above example, considering both actions are performed simultaneously,
the end result is that only QUEUE has any effect.

+-----------------------+
| Redirect to queue 3   |
+=======+===========+===+
| QUEUE | ``queue`` | 5 |
+-------+-----------+---+
| VOID  |               |
+-------+-----------+---+
| QUEUE | ``queue`` | 3 |
+-------+-----------+---+

As previously described, only the last action of a given type found in the
list is taken into account. The above example also shows that VOID is
ignored.
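
For reference, the "Tag, count and redirect" list above could be built as
follows. This is a sketch; the per-action configuration structures and the
``conf`` field name are assumptions based on the action tables in the next
section:

::

 struct rte_flow_action_id id = { .id = 0x2a };
 struct rte_flow_action_queue queue = { .queue = 10 };

 struct rte_flow_action action_list[] = {
         { .type = RTE_FLOW_ACTION_TYPE_ID,    .conf = &id },
         { .type = RTE_FLOW_ACTION_TYPE_COUNT },
         { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
         { .type = RTE_FLOW_ACTION_TYPE_END },
 };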

.. raw:: pdf

   PageBreak

Action types
~~~~~~~~~~~~

Common action types are described in this section. Like pattern item types,
this list is not exhaustive as new actions will be added in the future.

``END`` (action)
^^^^^^^^^^^^^^^^

End marker for action lists. Prevents further processing of actions, thereby
ending the list.

- Its numeric value is **0** for convenience.
- PMD support is mandatory.
- No configurable property.

+---------------+
| END           |
+===============+
| no properties |
+---------------+

``VOID`` (action)
^^^^^^^^^^^^^^^^^

Used as a placeholder for convenience. It is ignored and simply discarded by
PMDs.

- PMD support is mandatory.
- No configurable property.

+---------------+
| VOID          |
+===============+
| no properties |
+---------------+

``PASSTHRU``
^^^^^^^^^^^^

Leaves packets up for additional processing by subsequent flow rules. This
is the default when a rule does not contain a terminating action, but can be
specified to force a rule to become non-terminating.

- No configurable property.

+---------------+
| PASSTHRU      |
+===============+
| no properties |
+---------------+

Example to copy a packet to a queue and continue processing by subsequent
flow rules:

+--------------------------+
| Copy to queue 8          |
+==========+===============+
| PASSTHRU |               |
+----------+-----------+---+
| QUEUE    | ``queue`` | 8 |
+----------+-----------+---+

``ID``
^^^^^^

Attaches a 32 bit value to packets.

+----------------------------------------------+
| ID                                           |
+========+=====================================+
| ``id`` | 32 bit value to return with packets |
+--------+-------------------------------------+

.. raw:: pdf

   PageBreak

``QUEUE``
^^^^^^^^^

Assigns packets to a given queue index.

- Terminating by default.

+--------------------------------+
| QUEUE                          |
+===========+====================+
| ``queue`` | queue index to use |
+-----------+--------------------+

``DROP``
^^^^^^^^

Drops packets.

- No configurable property.
- Terminating by default.
- PASSTHRU overrides this action if both are specified.

+---------------+
| DROP          |
+===============+
| no properties |
+---------------+

``COUNT``
^^^^^^^^^

Enables hits counter for this rule.

This counter can be retrieved and reset through ``rte_flow_query()``, see
``struct rte_flow_query_count``.

- Counters can be retrieved with ``rte_flow_query()``.
- No configurable property.

+---------------+
| COUNT         |
+===============+
| no properties |
+---------------+

Query structure to retrieve and reset the flow rule hits counter:

+------------------------------------------------+
| COUNT query                                    |
+===========+=====+==============================+
| ``reset`` | in  | reset counter after query    |
+-----------+-----+------------------------------+
| ``hits``  | out | number of hits for this flow |
+-----------+-----+------------------------------+

``DUP``
^^^^^^^

Duplicates packets to a given queue index.

This is normally combined with QUEUE; however, when used alone, it is
actually similar to QUEUE + PASSTHRU.

- Non-terminating by default.

+------------------------------------------------+
| DUP                                            |
+===========+====================================+
| ``queue`` | queue index to duplicate packet to |
+-----------+------------------------------------+

.. raw:: pdf

   PageBreak

``RSS``
^^^^^^^

Similar to QUEUE, except RSS is additionally performed on packets to spread
them among several queues according to the provided parameters.

- Terminating by default.

+---------------------------------------------+
| RSS                                         |
+==============+==============================+
| ``rss_conf`` | RSS parameters               |
+--------------+------------------------------+
| ``queues``   | number of entries in queue[] |
+--------------+------------------------------+
| ``queue[]``  | queue indices to use         |
+--------------+------------------------------+

``PF`` (action)
^^^^^^^^^^^^^^^

Redirects packets to the physical function (PF) of the current device.

- No configurable property.
- Terminating by default.

+---------------+
| PF            |
+===============+
| no properties |
+---------------+

``VF`` (action)
^^^^^^^^^^^^^^^

Redirects packets to the virtual function (VF) of the current device with
the specified ID.

- Terminating by default.

+---------------------------------------+
| VF                                    |
+========+==============================+
| ``id`` | VF ID to redirect packets to |
+--------+------------------------------+

Planned types
~~~~~~~~~~~~~

Other action types are planned but not defined yet. These actions will add
the ability to alter matching packets in several ways, such as performing
encapsulation/decapsulation of tunnel headers on specific flows.

.. raw:: pdf

   PageBreak

Rules management
----------------

A simple API with only four functions is provided to fully manage flows.

Each created flow rule is associated with an opaque, PMD-specific handle
pointer. The application is responsible for keeping it until the rule is
destroyed.

Flow rules are defined with ``struct rte_flow``.

Validation
~~~~~~~~~~

Given that expressing a definite set of device capabilities with this API is
not practical, a dedicated function is provided to check if a flow rule is
supported and can be created.

::

 int
 rte_flow_validate(uint8_t port_id,
                   const struct rte_flow_pattern *pattern,
                   const struct rte_flow_actions *actions);

While this function has no effect on the target device, the flow rule is
validated against its current configuration state and the returned value
should be considered valid by the caller for that state only.

The returned value is guaranteed to remain valid only as long as no
successful calls to rte_flow_create() or rte_flow_destroy() are made in the
meantime and no device parameter affecting flow rules in any way is
modified, due to possible collisions or resource limitations (although in
such cases ``EINVAL`` should not be returned).

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``pattern``: pattern specification to check.
- ``actions``: actions associated with the flow definition.

Return value:

- **0** if flow rule is valid and can be created. A negative errno value
  otherwise (``rte_errno`` is also set), the following errors are defined.
- ``-EINVAL``: unknown or invalid rule specification.
- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial masks
  are unsupported).
- ``-EEXIST``: collision with an existing rule.
- ``-ENOMEM``: not enough resources.
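
For example, an application may check for support at initialization time,
before starting its data path. A minimal sketch, assuming ``pattern`` and
``actions`` are built as in the earlier examples:

::

 /* Dry run against the current device configuration; the target RX
  * queues must already be set up at this point. */
 int ret = rte_flow_validate(port_id, &pattern, &actions);

 if (ret == -ENOTSUP)
         ; /* rule is valid but this device cannot handle it */
 else if (ret < 0)
         ; /* invalid rule, collision or lack of resources */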

.. raw:: pdf

   PageBreak

Creation
~~~~~~~~

Creating a flow rule is similar to validating one, except the rule is
actually created.

::

 struct rte_flow *
 rte_flow_create(uint8_t port_id,
                 const struct rte_flow_pattern *pattern,
                 const struct rte_flow_actions *actions);

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``pattern``: pattern specification to add.
- ``actions``: actions associated with the flow definition.

Return value:

A valid flow pointer in case of success, NULL otherwise and ``rte_errno`` is
set to the positive version of one of the error codes defined for
``rte_flow_validate()``.
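
Usage sketch, keeping the opaque handle for later queries and destruction
(``rte_exit()`` is the usual EAL helper for fatal initialization errors):

::

 struct rte_flow *flow = rte_flow_create(port_id, &pattern, &actions);

 if (flow == NULL)
         /* rte_errno holds the positive version of the error code */
         rte_exit(EXIT_FAILURE, "flow creation failed: %s\n",
                  strerror(rte_errno));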

Destruction
~~~~~~~~~~~

Flow rule destruction is not automatic, and a queue should not be released
if any are still attached to it. Applications must take care of performing
this step before releasing resources.

::

 int
 rte_flow_destroy(uint8_t port_id,
                  struct rte_flow *flow);


Failure to destroy a flow rule may occur when other flow rules depend on it,
and destroying it would result in an inconsistent state.

This function is only guaranteed to succeed if flow rules are destroyed in
reverse order of their creation.

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``flow``: flow rule to destroy.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.
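
A teardown sketch honoring the reverse-order recommendation, assuming the
application stored its handles in a ``flows[]`` array of ``nb_flows``
entries:

::

 /* Destroy rules in reverse order of creation before releasing the
  * queues they reference. */
 while (nb_flows != 0) {
         if (rte_flow_destroy(port_id, flows[nb_flows - 1]) != 0)
                 break; /* dependent rules remain, handle the error */
         nb_flows--;
 }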

.. raw:: pdf

   PageBreak

Query
~~~~~

Query an existing flow rule.

This function allows retrieving flow-specific data such as counters. Data
is gathered by special actions which must be present in the flow rule
definition.

::

 int
 rte_flow_query(uint8_t port_id,
                struct rte_flow *flow,
                enum rte_flow_action_type action,
                void *data);

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``flow``: flow rule to query.
- ``action``: action type to query.
- ``data``: pointer to storage for the associated query data type.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.
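
Hypothetical sketch retrieving the hit counter of a rule created with a
`COUNT`_ action; the enum value and result type below are assumptions and
may differ from the final definitions:

::

 uint64_t hits = 0; /* assumed result type for a COUNT query */

 if (rte_flow_query(port_id, flow, RTE_FLOW_ACTION_TYPE_COUNT, &hits) < 0) {
     /* Query failed, rte_errno is set. */
 } else {
     /* "hits" now holds the number of packets matched by the rule. */
 }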

.. raw:: pdf

   PageBreak

Behavior
--------

- API operations are synchronous and blocking (``EAGAIN`` cannot be
  returned).

- There is no provision for reentrancy/multi-thread safety, although nothing
  should prevent different devices from being configured at the same
  time. PMDs may protect their control path functions accordingly.

- Stopping the data path (TX/RX) should not be necessary when managing flow
  rules. If this cannot be achieved naturally or with workarounds (such as
  temporarily replacing the burst function pointers), an appropriate error
  code must be returned (``EBUSY``).

- PMDs, not applications, are responsible for maintaining flow rules
  configuration when stopping and restarting a port or performing other
  actions which may affect them. Flow rules can only be destroyed
  explicitly.

.. raw:: pdf

   PageBreak

Compatibility
-------------

No known hardware implementation supports all the features described in this
document.

Unsupported features or combinations are not expected to be fully emulated
in software by PMDs for performance reasons. Partially supported features
may be completed in software as long as hardware performs most of the work
(such as queue redirection and packet recognition).

However, PMDs are expected to do their best to satisfy application requests
by working around hardware limitations as long as doing so does not affect
the behavior of existing flow rules.

The following sections provide a few examples of such cases; they are based
on limitations built into the previous APIs.

Global bitmasks
~~~~~~~~~~~~~~~

Each flow rule comes with its own, per-layer bitmasks, while hardware may
support only a single, device-wide bitmask for a given layer type, so that
two IPv4 rules cannot use different bitmasks.

The expected behavior in this case is that PMDs automatically configure
global bitmasks according to the needs of the first created flow rule.

Subsequent rules are allowed only if their bitmasks match those; the
``EEXIST`` error code should be returned otherwise.
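
As a PMD-side sketch (names and types are illustrative, not part of the
API), this behavior boils down to:

::

 /* Device-wide IPv4 mask programmed by the first rule. */
 static struct ipv4_mask dev_mask; /* hypothetical mask type */
 static int dev_mask_set;

 static int
 check_ipv4_mask(const struct ipv4_mask *mask)
 {
     if (!dev_mask_set) {
         dev_mask = *mask; /* first rule configures the global mask */
         dev_mask_set = 1;
         return 0;
     }
     if (memcmp(&dev_mask, mask, sizeof(*mask)) != 0)
         return -EEXIST; /* conflicting bitmask, reject the rule */
     return 0;
 }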

Unsupported layer types
~~~~~~~~~~~~~~~~~~~~~~~

Many protocols can be simulated by crafting patterns with the `RAW`_ type.

PMDs can rely on this capability to simulate support for protocols with
fixed headers not directly recognized by hardware.

``ANY`` pattern item
~~~~~~~~~~~~~~~~~~~~

This pattern item stands for anything, which can be difficult to translate
to something hardware would understand, particularly if followed by more
specific types.

Consider the following pattern:

+---+--------------------------------+
| 0 | ETHER                          |
+---+--------------------------------+
| 1 | ANY (``min`` = 1, ``max`` = 1) |
+---+--------------------------------+
| 2 | TCP                            |
+---+--------------------------------+

Knowing that TCP does not make sense with anything other than IPv4 or IPv6
as L3, such a pattern may be translated to two flow rules instead:

+---+--------------------+
| 0 | ETHER              |
+---+--------------------+
| 1 | IPV4 (zeroed mask) |
+---+--------------------+
| 2 | TCP                |
+---+--------------------+

+---+--------------------+
| 0 | ETHER              |
+---+--------------------+
| 1 | IPV6 (zeroed mask) |
+---+--------------------+
| 2 | TCP                |
+---+--------------------+

Note that as soon as an ANY rule covers several layers, this approach may
yield a large number of hidden flow rules. It is thus suggested to only
support the most common scenarios (anything as L2 and/or L3).

.. raw:: pdf

   PageBreak

Unsupported actions
~~~~~~~~~~~~~~~~~~~

- When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
  tagging (`ID`_) may be implemented in software as long as the target queue
  is used by a single rule.

- A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
  rules combining `QUEUE`_ and `PASSTHRU`_.

- When a single target queue is provided, `RSS`_ can also be implemented
  through `QUEUE`_.

Flow rules priority
~~~~~~~~~~~~~~~~~~~

While it would naturally make sense, flow rules cannot be assumed to be
processed by hardware in the same order as their creation for several
reasons:

- They may be managed internally as a tree or a hash table instead of a
  list.
- Removing a flow rule before adding another one can either put the new rule
  at the end of the list or reuse a freed entry.
- Duplication may occur when packets are matched by several rules.

For overlapping rules (particularly in order to use the `PASSTHRU`_ action),
predictable behavior is only guaranteed by using different priority levels.

Priority levels are not necessarily implemented in hardware, or may be
severely limited (e.g. a single priority bit).

For these reasons, priority levels may be implemented purely in software by
PMDs.

- For devices expecting flow rules to be added in the correct order, PMDs
  may destroy and re-create existing rules after adding a new one with
  a higher priority.

- A configurable number of dummy or empty rules can be created at
  initialization time to save high priority slots for later.

- In order to save priority levels, PMDs may evaluate whether rules are
  likely to collide and adjust their priority accordingly.

.. raw:: pdf

   PageBreak

API migration
=============

Exhaustive list of deprecated filter types and how to convert them to
generic flow rules.

``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
---------------------------------------

`MACVLAN`_ can be translated to a basic `ETH`_ flow rule with a `VF
(action)`_ or `PF (action)`_ terminating action.

+------------------------------------+
| MACVLAN                            |
+--------------------------+---------+
| Pattern                  | Actions |
+===+=====+==========+=====+=========+
| 0 | ETH | ``spec`` | any | VF,     |
|   |     +----------+-----+ PF      |
|   |     | ``mask`` | any |         |
+---+-----+----------+-----+---------+

``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
----------------------------------------------

`ETHERTYPE`_ is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as
a terminating action.

+------------------------------------+
| ETHERTYPE                          |
+--------------------------+---------+
| Pattern                  | Actions |
+===+=====+==========+=====+=========+
| 0 | ETH | ``spec`` | any | QUEUE,  |
|   |     +----------+-----+ DROP    |
|   |     | ``mask`` | any |         |
+---+-----+----------+-----+---------+

``FLEXIBLE`` to ``RAW`` → ``QUEUE``
-----------------------------------

`FLEXIBLE`_ can be translated to one `RAW`_ pattern with `QUEUE`_ as the
terminating action and a defined priority level.

+------------------------------------+
| FLEXIBLE                           |
+--------------------------+---------+
| Pattern                  | Actions |
+===+=====+==========+=====+=========+
| 0 | RAW | ``spec`` | any | QUEUE   |
|   |     +----------+-----+         |
|   |     | ``mask`` | any |         |
+---+-----+----------+-----+---------+

``SYN`` to ``TCP`` → ``QUEUE``
------------------------------

`SYN`_ is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
`QUEUE`_ as the terminating action.

Priority level can be set to simulate the high priority bit.

+---------------------------------------------+
| SYN                                         |
+-----------------------------------+---------+
| Pattern                           | Actions |
+===+======+==========+=============+=========+
| 0 | ETH  | ``spec`` | N/A         | QUEUE   |
|   |      +----------+-------------+         |
|   |      | ``mask`` | empty       |         |
+---+------+----------+-------------+         |
| 1 | IPV4 | ``spec`` | N/A         |         |
|   |      +----------+-------------+         |
|   |      | ``mask`` | empty       |         |
+---+------+----------+-------------+         |
| 2 | TCP  | ``spec`` | ``syn`` = 1 |         |
|   |      +----------+-------------+         |
|   |      | ``mask`` | ``syn`` = 1 |         |
+---+------+----------+-------------+---------+

``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
----------------------------------------------------

`NTUPLE`_ is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
`UDP`_ as L4 and `QUEUE`_ as the terminating action.

A priority level can be specified as well.

+---------------------------------------+
| NTUPLE                                |
+-----------------------------+---------+
| Pattern                     | Actions |
+===+======+==========+=======+=========+
| 0 | ETH  | ``spec`` | N/A   | QUEUE   |
|   |      +----------+-------+         |
|   |      | ``mask`` | empty |         |
+---+------+----------+-------+         |
| 1 | IPV4 | ``spec`` | any   |         |
|   |      +----------+-------+         |
|   |      | ``mask`` | any   |         |
+---+------+----------+-------+         |
| 2 | TCP, | ``spec`` | any   |         |
|   | UDP  +----------+-------+         |
|   |      | ``mask`` | any   |         |
+---+------+----------+-------+---------+

``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
---------------------------------------------------------------------------

`TUNNEL`_ matches common IPv4 and IPv6 L3/L4-based tunnel types.

In the following table, `ANY`_ is used to cover the optional L4.

+------------------------------------------------+
| TUNNEL                                         |
+--------------------------------------+---------+
| Pattern                              | Actions |
+===+=========+==========+=============+=========+
| 0 | ETH     | ``spec`` | any         | QUEUE   |
|   |         +----------+-------------+         |
|   |         | ``mask`` | any         |         |
+---+---------+----------+-------------+         |
| 1 | IPV4,   | ``spec`` | any         |         |
|   | IPV6    +----------+-------------+         |
|   |         | ``mask`` | any         |         |
+---+---------+----------+-------------+         |
| 2 | ANY     | ``spec`` | ``min`` = 0 |         |
|   |         |          +-------------+         |
|   |         |          | ``max`` = 0 |         |
|   |         +----------+-------------+         |
|   |         | ``mask`` | N/A         |         |
+---+---------+----------+-------------+         |
| 3 | VXLAN,  | ``spec`` | any         |         |
|   | GENEVE, +----------+-------------+         |
|   | TEREDO, | ``mask`` | any         |         |
|   | NVGRE,  |          |             |         |
|   | GRE,    |          |             |         |
|   | ...     |          |             |         |
+---+---------+----------+-------------+---------+

.. raw:: pdf

   PageBreak

``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
---------------------------------------------------------------

`FDIR`_ is more complex than any other type; there are several methods to
emulate its functionality. It is summarized for the most part in the table
below.

A few features are intentionally not supported:

- The ability to configure the matching input set and masks for the entire
  device; PMDs should take care of it automatically according to flow rules.

- Returning four or eight bytes of matched data when using flex bytes
  filtering. Although a specific action could implement it, it conflicts
  with the much more useful 32-bit tagging on devices that support it.

- Side effects on RSS processing of the entire device. Flow rules that
  conflict with the current device configuration should not be
  allowed. Similarly, device configuration should not be allowed when it
  affects existing flow rules.

- Device modes of operation. "none" is unsupported since filtering cannot be
  disabled as long as a flow rule is present.

- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
  according to the created flow rules.

+----------------------------------------------+
| FDIR                                         |
+---------------------------------+------------+
| Pattern                         | Actions    |
+===+============+==========+=====+============+
| 0 | ETH,       | ``spec`` | any | QUEUE,     |
|   | RAW        +----------+-----+ DROP,      |
|   |            | ``mask`` | any | PASSTHRU   |
+---+------------+----------+-----+------------+
| 1 | IPV4,      | ``spec`` | any | ID         |
|   | IPV6       +----------+-----+ (optional) |
|   |            | ``mask`` | any |            |
+---+------------+----------+-----+            |
| 2 | TCP,       | ``spec`` | any |            |
|   | UDP,       +----------+-----+            |
|   | SCTP       | ``mask`` | any |            |
+---+------------+----------+-----+            |
| 3 | VF,        | ``spec`` | any |            |
|   | PF,        +----------+-----+            |
|   | SIGNATURE  | ``mask`` | any |            |
|   | (optional) |          |     |            |
+---+------------+----------+-----+------------+

``HASH``
~~~~~~~~

Hashing configuration is set per rule through the `SIGNATURE`_ item.

Since it is usually a global device setting, all flow rules created with
this item may have to share the same specification.

``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
--------------------------------------------------

All packets are matched. This type alters incoming packets to encapsulate
them in a chosen tunnel type, optionally redirecting them to a VF as well.

The destination pool for tag based forwarding can be emulated with other
flow rules using `DUP`_ as the action.

+----------------------------------------+
| L2_TUNNEL                              |
+---------------------------+------------+
| Pattern                   | Actions    |
+===+======+==========+=====+============+
| 0 | VOID | ``spec`` | N/A | VXLAN,     |
|   |      |          |     | GENEVE,    |
|   |      |          |     | ...        |
|   |      +----------+-----+------------+
|   |      | ``mask`` | N/A | VF         |
|   |      |          |     | (optional) |
+---+------+----------+-----+------------+

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 01/18] doc: add template for release notes 16.11
  @ 2016-07-05 15:41  6% ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2016-07-05 15:41 UTC (permalink / raw)
  To: dev

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_16_11.rst | 160 +++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_16_11.rst

diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
new file mode 100644
index 0000000..0106bc9
--- /dev/null
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -0,0 +1,160 @@
+DPDK Release 16.11
+==================
+
+.. **Read this first.**
+
+   The text below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text: ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      firefox build/doc/html/guides/rel_notes/release_16_11.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to understand
+     the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like this.
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+
+Resolved Issues
+---------------
+
+.. This section should contain bug fixes added to the relevant sections. Sample format:
+
+   * **code/section Fixed issue in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description of the resolved issue in the past tense.
+     The title should contain the code/lib section like a commit message.
+     Add the entries in alphabetic order in the relevant sections below.
+
+
+EAL
+~~~
+
+
+Drivers
+~~~~~~~
+
+
+Libraries
+~~~~~~~~~
+
+
+Examples
+~~~~~~~~
+
+
+Other
+~~~~~
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+
+ABI Changes
+-----------
+
+.. * Add a short 1-2 sentence description of the ABI change that was announced in
+     the previous releases and made in this release. Use fixed width quotes for
+     ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+`` sign.
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     libethdev.so.3
+     librte_acl.so.2
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_distributor.so.1
+     librte_eal.so.2
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_ivshmem.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.2
+     librte_mempool.so.2
+     librte_meter.so.1
+     librte_pipeline.so.3
+     librte_pmd_bond.so.1
+     librte_pmd_ring.so.2
+     librte_port.so.2
+     librte_power.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_table.so.2
+     librte_timer.so.1
+     librte_vhost.so.2
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this
+   release.
+
+   The format is:
+
+   #. Platform name.
+
+      - Platform details.
+      - Platform details.
+
+
+Tested NICs
+-----------
+
+.. This section should contain a list of NICs that were tested with this release.
+
+   The format is:
+
+   #. NIC name.
+
+      - NIC details.
+      - NIC details.
-- 
2.8.1

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v9 7/7] tools: query binaries for support information
    2016-07-04  1:14  2%   ` [dpdk-dev] [PATCH v9 4/7] pmdinfogen: parse driver to generate code to export Thomas Monjalon
@ 2016-07-04  1:14  2%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-04  1:14 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev, Panu Matilainen

From: Neil Horman <nhorman@tuxdriver.com>

This tool searches for the marker string PMD_INFO_STRING= in any ELF binary
and, if found, parses the remainder of the string as a JSON-encoded string,
outputting the results in either a human-readable or a raw, script-parseable
format.

Note that, in the case of dynamically linked applications, pmdinfo.py will
scan for implicitly linked PMDs by searching the specified binary's
.dynamic section for DT_NEEDED entries that contain the substring
librte_pmd.  The DT_RUNPATH, LD_LIBRARY_PATH, /usr/lib and /lib are
searched for these libraries, in that order.

If a file is specified with no path, it is assumed to be a PMD DSO, and the
LD_LIBRARY_PATH, /usr/lib[64]/ and /lib[64] are searched for it.

Currently the tool can output data in 3 formats:
a) raw, suitable for scripting, where the raw JSON strings are dumped out
b) table format (default) where hex pci ids are dumped in a table format
c) pretty, where a user supplied pci.ids file is used to print out vendor
and device strings

There is a dependency on pyelftools.
The script is not yet compatible with Python 3.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   1 +
 lib/librte_eal/common/eal_common_options.c |   2 +-
 mk/rte.sdkinstall.mk                       |   2 +
 tools/dpdk-pmdinfo.py                      | 628 +++++++++++++++++++++++++++++
 4 files changed, 632 insertions(+), 1 deletion(-)
 create mode 100755 tools/dpdk-pmdinfo.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 1a8a3b7..1e972f0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -71,6 +71,7 @@ F: scripts/validate-abi.sh
 Driver information
 M: Neil Horman <nhorman@tuxdriver.com>
 F: buildtools/pmdinfogen/
+F: tools/dpdk-pmdinfo.py
 
 
 Environment Abstraction Layer
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 7e9f7b8..b562c8a 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -115,7 +115,7 @@ TAILQ_HEAD_INITIALIZER(solib_list);
 /* Default path of external loadable drivers */
 static const char *default_solib_dir = RTE_EAL_PMD_PATH;
 
-/* Stringified version of default solib path */
+/* Stringified version of default solib path used by dpdk-pmdinfo.py */
 static const char dpdk_solib_path[] __attribute__((used)) =
 "DPDK_PLUGIN_PATH=" RTE_EAL_PMD_PATH;
 
diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 2b92157..76be308 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -126,6 +126,8 @@ install-runtime:
 	$(Q)$(call rte_mkdir,      $(DESTDIR)$(sbindir))
 	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk_nic_bind.py, \
 	                           $(DESTDIR)$(sbindir)/dpdk_nic_bind)
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-pmdinfo.py, \
+	                           $(DESTDIR)$(bindir)/dpdk-pmdinfo)
 
 install-kmod:
 ifneq ($(wildcard $O/kmod/*),)
diff --git a/tools/dpdk-pmdinfo.py b/tools/dpdk-pmdinfo.py
new file mode 100755
index 0000000..b8a9be2
--- /dev/null
+++ b/tools/dpdk-pmdinfo.py
@@ -0,0 +1,628 @@
+#!/usr/bin/python
+# -------------------------------------------------------------------------
+#
+# Utility to dump PMD_INFO_STRING support from an object file
+#
+# -------------------------------------------------------------------------
+import os
+import sys
+from optparse import OptionParser
+import string
+import json
+
+# For running from development directory. It should take precedence over the
+# installed pyelftools.
+sys.path.insert(0, '.')
+
+
+from elftools import __version__
+from elftools.common.exceptions import ELFError
+from elftools.common.py3compat import (
+    ifilter, byte2int, bytes2str, itervalues, str2bytes)
+from elftools.elf.elffile import ELFFile
+from elftools.elf.dynamic import DynamicSection, DynamicSegment
+from elftools.elf.enums import ENUM_D_TAG
+from elftools.elf.segments import InterpSegment
+from elftools.elf.sections import SymbolTableSection
+from elftools.elf.gnuversions import (
+    GNUVerSymSection, GNUVerDefSection,
+    GNUVerNeedSection,
+)
+from elftools.elf.relocation import RelocationSection
+from elftools.elf.descriptions import (
+    describe_ei_class, describe_ei_data, describe_ei_version,
+    describe_ei_osabi, describe_e_type, describe_e_machine,
+    describe_e_version_numeric, describe_p_type, describe_p_flags,
+    describe_sh_type, describe_sh_flags,
+    describe_symbol_type, describe_symbol_bind, describe_symbol_visibility,
+    describe_symbol_shndx, describe_reloc_type, describe_dyn_tag,
+    describe_ver_flags,
+)
+from elftools.elf.constants import E_FLAGS
+from elftools.dwarf.dwarfinfo import DWARFInfo
+from elftools.dwarf.descriptions import (
+    describe_reg_name, describe_attr_value, set_global_machine_arch,
+    describe_CFI_instructions, describe_CFI_register_rule,
+    describe_CFI_CFA_rule,
+)
+from elftools.dwarf.constants import (
+    DW_LNS_copy, DW_LNS_set_file, DW_LNE_define_file)
+from elftools.dwarf.callframe import CIE, FDE
+
+raw_output = False
+pcidb = None
+
+# ===========================================
+
+
+class Vendor:
+    """
+    Class for vendors. This is the top level class
+    for the devices belong to a specific vendor.
+    self.devices is the device dictionary
+    subdevices are in each device.
+    """
+
+    def __init__(self, vendorStr):
+        """
+        Class initializes with the raw line from pci.ids
+        Parsing takes place inside __init__
+        """
+        self.ID = vendorStr.split()[0]
+        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
+        self.devices = {}
+
+    def add_device(self, deviceStr):
+        """
+        Adds a device to self.devices
+        takes the raw line from pci.ids
+        """
+        s = deviceStr.strip()
+        devID = s.split()[0]
+        if devID in self.devices:
+            pass
+        else:
+            self.devices[devID] = Device(deviceStr)
+
+    def report(self):
+        print self.ID, self.name
+        for id, dev in self.devices.items():
+            dev.report()
+
+    def find_device(self, devid):
+        # convert to a hex string and remove 0x
+        devid = hex(devid)[2:]
+        try:
+            return self.devices[devid]
+        except:
+            return Device("%s  Unknown Device" % devid)
+
+
+class Device:
+
+    def __init__(self, deviceStr):
+        """
+        Class for each device.
+        Each vendor has its own devices dictionary.
+        """
+        s = deviceStr.strip()
+        self.ID = s.split()[0]
+        self.name = s.replace("%s  " % self.ID, "")
+        self.subdevices = {}
+
+    def report(self):
+        print "\t%s\t%s" % (self.ID, self.name)
+        for subID, subdev in self.subdevices.items():
+            subdev.report()
+
+    def add_sub_device(self, subDeviceStr):
+        """
+        Adds a subvendor, subdevice to device.
+        Uses raw line from pci.ids
+        """
+        s = subDeviceStr.strip()
+        spl = s.split()
+        subVendorID = spl[0]
+        subDeviceID = spl[1]
+        subDeviceName = s.split("  ")[-1]
+        devID = "%s:%s" % (subVendorID, subDeviceID)
+        self.subdevices[devID] = SubDevice(
+            subVendorID, subDeviceID, subDeviceName)
+
+    def find_subid(self, subven, subdev):
+        subven = hex(subven)[2:]
+        subdev = hex(subdev)[2:]
+        devid = "%s:%s" % (subven, subdev)
+
+        try:
+            return self.subdevices[devid]
+        except:
+            if (subven == "ffff" and subdev == "ffff"):
+                return SubDevice("ffff", "ffff", "(All Subdevices)")
+            else:
+                return SubDevice(subven, subdev, "(Unknown Subdevice)")
+
+
+class SubDevice:
+    """
+    Class for subdevices.
+    """
+
+    def __init__(self, vendor, device, name):
+        """
+        Class initializes with vendorid, deviceid and name
+        """
+        self.vendorID = vendor
+        self.deviceID = device
+        self.name = name
+
+    def report(self):
+        print "\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name)
+
+
+class PCIIds:
+    """
+    Top class for all pci.ids entries.
+    All queries will be asked to this class.
+    PCIIds.vendors["0e11"].devices["0046"].\
+    subdevices["0e11:4091"].name  =  "Smart Array 6i"
+    """
+
+    def __init__(self, filename):
+        """
+        Prepares the directories.
+        Checks local data file.
+        Tries to load from local, if not found, downloads from web
+        """
+        self.version = ""
+        self.date = ""
+        self.vendors = {}
+        self.contents = None
+        self.read_local(filename)
+        self.parse()
+
+    def report_vendors(self):
+        """Reports the vendors
+        """
+        for vid, v in self.vendors.items():
+            print v.ID, v.name
+
+    def report(self, vendor=None):
+        """
+        Reports everything for all vendors or a specific vendor
+        PCIIds.report()  reports everything
+        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
+        """
+        if vendor is not None:
+            self.vendors[vendor].report()
+        else:
+            for vID, v in self.vendors.items():
+                v.report()
+
+    def find_vendor(self, vid):
+        # convert vid to a hex string and remove the 0x
+        vid = hex(vid)[2:]
+
+        try:
+            return self.vendors[vid]
+        except:
+            return Vendor("%s Unknown Vendor" % (vid))
+
+    def find_date(self, content):
+        for l in content:
+            if l.find("Date:") > -1:
+                return l.split()[-2].replace("-", "")
+        return None
+
+    def parse(self):
+        if len(self.contents) < 1:
+            print "data/%s-pci.ids not found" % self.date
+        else:
+            vendorID = ""
+            deviceID = ""
+            for l in self.contents:
+                if l[0] == "#":
+                    continue
+                elif len(l.strip()) == 0:
+                    continue
+                else:
+                    if l.find("\t\t") == 0:
+                        self.vendors[vendorID].devices[
+                            deviceID].add_sub_device(l)
+                    elif l.find("\t") == 0:
+                        deviceID = l.strip().split()[0]
+                        self.vendors[vendorID].add_device(l)
+                    else:
+                        vendorID = l.split()[0]
+                        self.vendors[vendorID] = Vendor(l)
+
+    def read_local(self, filename):
+        """
+        Reads the local file
+        """
+        self.contents = open(filename).readlines()
+        self.date = self.find_date(self.contents)
+
+    def load_local(self):
+        """
+        Loads database from local. If there is no file,
+        it creates a new one from web
+        """
+        self.date = idsfile[0].split("/")[1].split("-")[0]
+        self.read_local()
+
+
+# =======================================
+
+def search_file(filename, search_path):
+    """ Given a search path, find file with requested name """
+    for path in string.split(search_path, ":"):
+        candidate = os.path.join(path, filename)
+        if os.path.exists(candidate):
+            return os.path.abspath(candidate)
+    return None
+
+
+class ReadElf(object):
+    """ display_* methods are used to emit output into the output stream
+    """
+
+    def __init__(self, file, output):
+        """ file:
+                stream object with the ELF file to read
+
+            output:
+                output stream to write to
+        """
+        self.elffile = ELFFile(file)
+        self.output = output
+
+        # Lazily initialized if a debug dump is requested
+        self._dwarfinfo = None
+
+        self._versioninfo = None
+
+    def _section_from_spec(self, spec):
+        """ Retrieve a section given a "spec" (either number or name).
+            Return None if no such section exists in the file.
+        """
+        try:
+            num = int(spec)
+            if num < self.elffile.num_sections():
+                return self.elffile.get_section(num)
+            else:
+                return None
+        except ValueError:
+            # Not a number. Must be a name then
+            return self.elffile.get_section_by_name(str2bytes(spec))
+
+    def pretty_print_pmdinfo(self, pmdinfo):
+        global pcidb
+
+        for i in pmdinfo["pci_ids"]:
+            vendor = pcidb.find_vendor(i[0])
+            device = vendor.find_device(i[1])
+            subdev = device.find_subid(i[2], i[3])
+            print("%s (%s) : %s (%s) %s" %
+                  (vendor.name, vendor.ID, device.name,
+                   device.ID, subdev.name))
+
+    def parse_pmd_info_string(self, mystring):
+        global raw_output
+        global pcidb
+
+        optional_pmd_info = [{'id': 'params', 'tag': 'PMD PARAMETERS'}]
+
+        i = mystring.index("=")
+        mystring = mystring[i + 2:]
+        pmdinfo = json.loads(mystring)
+
+        if raw_output:
+            print(pmdinfo)
+            return
+
+        print("PMD NAME: " + pmdinfo["name"])
+        for i in optional_pmd_info:
+            try:
+                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
+            except KeyError as e:
+                continue
+
+        if (len(pmdinfo["pci_ids"]) != 0):
+            print("PMD HW SUPPORT:")
+            if pcidb is not None:
+                self.pretty_print_pmdinfo(pmdinfo)
+            else:
+                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
+                for i in pmdinfo["pci_ids"]:
+                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
+                          (i[0], i[1], i[2], i[3]))
+
+        print("")
+
+    def display_pmd_info_strings(self, section_spec):
+        """ Display a strings dump of a section. section_spec is either a
+            section number or a name.
+        """
+        section = self._section_from_spec(section_spec)
+        if section is None:
+            return
+
+        data = section.data()
+        dataptr = 0
+
+        while dataptr < len(data):
+            while (dataptr < len(data) and
+                    not (32 <= byte2int(data[dataptr]) <= 127)):
+                dataptr += 1
+
+            if dataptr >= len(data):
+                break
+
+            endptr = dataptr
+            while endptr < len(data) and byte2int(data[endptr]) != 0:
+                endptr += 1
+
+            mystring = bytes2str(data[dataptr:endptr])
+            rc = mystring.find("PMD_INFO_STRING")
+            if (rc != -1):
+                self.parse_pmd_info_string(mystring)
+
+            dataptr = endptr
+
+    def find_librte_eal(self, section):
+        for tag in section.iter_tags():
+            if tag.entry.d_tag == 'DT_NEEDED':
+                if "librte_eal" in tag.needed:
+                    return tag.needed
+        return None
+
+    def search_for_autoload_path(self):
+        scanelf = self
+        scanfile = None
+        library = None
+
+        section = self._section_from_spec(".dynamic")
+        try:
+            eallib = self.find_librte_eal(section)
+            if eallib is not None:
+                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
+                if ldlibpath is None:
+                    ldlibpath = ""
+                dtr = self.get_dt_runpath(section)
+                library = search_file(eallib,
+                                      dtr + ":" + ldlibpath +
+                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
+                if library is None:
+                    return (None, None)
+                if raw_output is False:
+                    print("Scanning for autoload path in %s" % library)
+                scanfile = open(library, 'rb')
+                scanelf = ReadElf(scanfile, sys.stdout)
+        except AttributeError:
+            # Not a dynamic binary
+            pass
+        except ELFError:
+            scanfile.close()
+            return (None, None)
+
+        section = scanelf._section_from_spec(".rodata")
+        if section is None:
+            if scanfile is not None:
+                scanfile.close()
+            return (None, None)
+
+        data = section.data()
+        dataptr = 0
+
+        while dataptr < len(data):
+            while (dataptr < len(data) and
+                    not (32 <= byte2int(data[dataptr]) <= 127)):
+                dataptr += 1
+
+            if dataptr >= len(data):
+                break
+
+            endptr = dataptr
+            while endptr < len(data) and byte2int(data[endptr]) != 0:
+                endptr += 1
+
+            mystring = bytes2str(data[dataptr:endptr])
+            rc = mystring.find("DPDK_PLUGIN_PATH")
+            if (rc != -1):
+                rc = mystring.find("=")
+                return (mystring[rc + 1:], library)
+
+            dataptr = endptr
+        if scanfile is not None:
+            scanfile.close()
+        return (None, None)
+
+    def get_dt_runpath(self, dynsec):
+        for tag in dynsec.iter_tags():
+            if tag.entry.d_tag == 'DT_RUNPATH':
+                return tag.runpath
+        return ""
+
+    def process_dt_needed_entries(self):
+        """ Look to see if there are any DT_NEEDED entries in the binary
+            And process those if there are
+        """
+        global raw_output
+        runpath = ""
+        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
+        if ldlibpath is None:
+            ldlibpath = ""
+
+        dynsec = self._section_from_spec(".dynamic")
+        try:
+            runpath = self.get_dt_runpath(dynsec)
+        except AttributeError:
+            # dynsec is None, just return
+            return
+
+        for tag in dynsec.iter_tags():
+            if tag.entry.d_tag == 'DT_NEEDED':
+                rc = tag.needed.find("librte_pmd")
+                if (rc != -1):
+                    library = search_file(tag.needed,
+                                          runpath + ":" + ldlibpath +
+                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
+                    if library is not None:
+                        if raw_output is False:
+                            print("Scanning %s for pmd information" % library)
+                        with open(library, 'rb') as file:
+                            try:
+                                libelf = ReadElf(file, sys.stdout)
+                            except ELFError as e:
+                                print("%s is not an ELF file" % library)
+                                continue
+                            libelf.process_dt_needed_entries()
+                            libelf.display_pmd_info_strings(".rodata")
+                            file.close()
+
+
+def scan_autoload_path(autoload_path):
+    global raw_output
+
+    if os.path.exists(autoload_path) is False:
+        return
+
+    try:
+        dirs = os.listdir(autoload_path)
+    except OSError as e:
+        # Couldn't read the directory, give up
+        return
+
+    for d in dirs:
+        dpath = os.path.join(autoload_path, d)
+        if os.path.isdir(dpath):
+            scan_autoload_path(dpath)
+        if os.path.isfile(dpath):
+            try:
+                file = open(dpath, 'rb')
+                readelf = ReadElf(file, sys.stdout)
+            except ELFError as e:
+                # this is likely not an elf file, skip it
+                continue
+            except IOError as e:
+                # No permission to read the file, skip it
+                continue
+
+            if raw_output is False:
+                print("Hw Support for library %s" % d)
+            readelf.display_pmd_info_strings(".rodata")
+            file.close()
+
+
+def scan_for_autoload_pmds(dpdk_path):
+    """
+    search the specified application or path for a pmd autoload path
+    then scan said path for pmds and report hw support
+    """
+    global raw_output
+
+    if (os.path.isfile(dpdk_path) is False):
+        if raw_output is False:
+            print("Must specify a file name")
+        return
+
+    file = open(dpdk_path, 'rb')
+    try:
+        readelf = ReadElf(file, sys.stdout)
+    except ELFError as e:
+        if raw_output is False:
+            print("Unable to parse %s" % dpdk_path)
+        return
+
+    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
+    if (autoload_path is None or autoload_path == ""):
+        if (raw_output is False):
+            print("No autoload path configured in %s" % dpdk_path)
+        return
+    if (raw_output is False):
+        if (scannedfile is None):
+            scannedfile = dpdk_path
+        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
+
+    file.close()
+    if (raw_output is False):
+        print("Discovered Autoload HW Support:")
+    scan_autoload_path(autoload_path)
+    return
+
+
+def main(stream=None):
+    global raw_output
+    global pcidb
+
+    optparser = OptionParser(
+        usage='usage: %prog [-hrtp] [-d <pci id file>] <elf-file>',
+        description="Dump pmd hardware support info",
+        add_help_option=True,
+        prog='pmdinfo.py')
+    optparser.add_option('-r', '--raw',
+                         action='store_true', dest='raw_output',
+                         help='Dump raw json strings')
+    optparser.add_option("-d", "--pcidb", dest="pcifile",
+                         help="specify a pci database "
+                              "to get vendor names from",
+                         default="/usr/share/hwdata/pci.ids", metavar="FILE")
+    optparser.add_option("-t", "--table", dest="tblout",
+                         help="output information on hw support as a hex table",
+                         action='store_true')
+    optparser.add_option("-p", "--plugindir", dest="pdir",
+                         help="scan dpdk for autoload plugins",
+                         action='store_true')
+
+    options, args = optparser.parse_args()
+
+    if options.raw_output:
+        raw_output = True
+
+    if options.pcifile:
+        pcidb = PCIIds(options.pcifile)
+        if pcidb is None:
+            print("Pci DB file not found")
+            exit(1)
+
+    if options.tblout:
+        options.pcifile = None
+        pcidb = None
+
+    if (len(args) == 0):
+        optparser.print_usage()
+        exit(1)
+
+    if options.pdir is True:
+        exit(scan_for_autoload_pmds(args[0]))
+
+    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
+    if (ldlibpath is None):
+        ldlibpath = ""
+
+    if (os.path.exists(args[0]) is True):
+        myelffile = args[0]
+    else:
+        myelffile = search_file(
+            args[0], ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
+
+    if (myelffile is None):
+        print("File not found")
+        sys.exit(1)
+
+    with open(myelffile, 'rb') as file:
+        try:
+            readelf = ReadElf(file, sys.stdout)
+            readelf.process_dt_needed_entries()
+            readelf.display_pmd_info_strings(".rodata")
+            sys.exit(0)
+
+        except ELFError as ex:
+            sys.stderr.write('ELF error: %s\n' % ex)
+            sys.exit(1)
+
+
+# -------------------------------------------------------------------------
+if __name__ == '__main__':
+    main()
-- 
2.7.0

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v9 4/7] pmdinfogen: parse driver to generate code to export
  @ 2016-07-04  1:14  2%   ` Thomas Monjalon
  2016-07-04  1:14  2%   ` [dpdk-dev] [PATCH v9 7/7] tools: query binaries for support information Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-07-04  1:14 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev, Panu Matilainen

From: Neil Horman <nhorman@tuxdriver.com>

dpdk-pmdinfogen is a tool used to parse object files and build JSON
strings for use in later determining hardware support in a DSO or
application binary.
dpdk-pmdinfogen looks for the non-exported symbol names rte_pmd_name<n>
(where n is an integer counter) and <name>_pci_table_export.
It records the name of each of these tuples, using the latter to find
the symbolic name of the PCI table for physical devices that the object
supports.  With this information, it outputs a C file with a single line
of the form:

const char <name>_pmd_info[] __attribute__((used)) = " \
	PMD_INFO_STRING=<json_string>";

Where <name> is the arbitrary name of the PMD, and <json_string> is the
JSON encoded string that hold relevant PMD information, including the PMD
name, type and optional array of PCI device/vendor IDs that the driver
supports.

This C file is suitable for compiling to object code, then relocatably
linking into the parent file from which the C was generated.  This creates
an entry in the string table of the object that can inform a later tool
about hardware support.

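For illustration (driver name and PCI IDs made up), the generated line for
a PMD named "foo" supporting a single device could look like:

const char foo_pmd_info[] __attribute__((used)) = " \
	PMD_INFO_STRING= {\"name\" : \"foo\", \
	\"pci_ids\" : [[4660, 22136, 65535, 65535] ]}";
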
Note 1: When installed as part of a SDK package, dpdk-pmdinfogen should
        be built for the SDK target. It is not handled currently.
Note 2: Some generated files are not cleaned by "make clean".

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Remy Horton <remy.horton@intel.com>
Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 GNUmakefile                                    |   2 +-
 MAINTAINERS                                    |   4 +
 GNUmakefile => buildtools/Makefile             |  17 +-
 GNUmakefile => buildtools/pmdinfogen/Makefile  |  21 +-
 buildtools/pmdinfogen/pmdinfogen.c             | 451 +++++++++++++++++++++++++
 buildtools/pmdinfogen/pmdinfogen.h             |  99 ++++++
 doc/guides/prog_guide/dev_kit_build_system.rst |  15 +-
 mk/rte.sdkbuild.mk                             |   2 +-
 mk/rte.sdkinstall.mk                           |   3 +
 9 files changed, 587 insertions(+), 27 deletions(-)
 copy GNUmakefile => buildtools/Makefile (87%)
 copy GNUmakefile => buildtools/pmdinfogen/Makefile (84%)
 create mode 100644 buildtools/pmdinfogen/pmdinfogen.c
 create mode 100644 buildtools/pmdinfogen/pmdinfogen.h

diff --git a/GNUmakefile b/GNUmakefile
index b59e4b6..00fe0db 100644
--- a/GNUmakefile
+++ b/GNUmakefile
@@ -40,6 +40,6 @@ export RTE_SDK
 # directory list
 #
 
-ROOTDIRS-y := lib drivers app
+ROOTDIRS-y := buildtools lib drivers app
 
 include $(RTE_SDK)/mk/rte.sdkroot.mk
diff --git a/MAINTAINERS b/MAINTAINERS
index a59191e..1a8a3b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -68,6 +68,10 @@ F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: scripts/validate-abi.sh
 
+Driver information
+M: Neil Horman <nhorman@tuxdriver.com>
+F: buildtools/pmdinfogen/
+
 
 Environment Abstraction Layer
 -----------------------------
diff --git a/GNUmakefile b/buildtools/Makefile
similarity index 87%
copy from GNUmakefile
copy to buildtools/Makefile
index b59e4b6..35a42ff 100644
--- a/GNUmakefile
+++ b/buildtools/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2016 Neil Horman. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -29,17 +29,8 @@
 #   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 #   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-#
-# Head Makefile for compiling rte SDK
-#
-
-RTE_SDK := $(CURDIR)
-export RTE_SDK
-
-#
-# directory list
-#
+include $(RTE_SDK)/mk/rte.vars.mk
 
-ROOTDIRS-y := lib drivers app
+DIRS-y += pmdinfogen
 
-include $(RTE_SDK)/mk/rte.sdkroot.mk
+include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/GNUmakefile b/buildtools/pmdinfogen/Makefile
similarity index 84%
copy from GNUmakefile
copy to buildtools/pmdinfogen/Makefile
index b59e4b6..327927e 100644
--- a/GNUmakefile
+++ b/buildtools/pmdinfogen/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2016 Neil Horman. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -29,17 +29,16 @@
 #   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 #   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-#
-# Head Makefile for compiling rte SDK
-#
+include $(RTE_SDK)/mk/rte.vars.mk
 
-RTE_SDK := $(CURDIR)
-export RTE_SDK
+HOSTAPP_DIR = buildtools
+HOSTAPP = dpdk-pmdinfogen
 
-#
-# directory list
-#
+SRCS-y += pmdinfogen.c
+
+HOST_CFLAGS += $(WERROR_FLAGS) -g
+HOST_CFLAGS += -I$(RTE_OUTPUT)/include
 
-ROOTDIRS-y := lib drivers app
+DEPDIRS-y += lib/librte_eal
 
-include $(RTE_SDK)/mk/rte.sdkroot.mk
+include $(RTE_SDK)/mk/rte.hostapp.mk
diff --git a/buildtools/pmdinfogen/pmdinfogen.c b/buildtools/pmdinfogen/pmdinfogen.c
new file mode 100644
index 0000000..101bce1
--- /dev/null
+++ b/buildtools/pmdinfogen/pmdinfogen.c
@@ -0,0 +1,451 @@
+/* Postprocess pmd object files to export hw support
+ *
+ * Copyright 2016 Neil Horman <nhorman@tuxdriver.com>
+ * Based in part on modpost.c from the linux kernel
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU General Public License V2, incorporated herein by reference.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <rte_common.h>
+#include "pmdinfogen.h"
+
+#ifdef RTE_ARCH_64
+#define ADDR_SIZE 64
+#else
+#define ADDR_SIZE 32
+#endif
+
+
+static void *
+grab_file(const char *filename, unsigned long *size)
+{
+	struct stat st;
+	void *map = MAP_FAILED;
+	int fd;
+
+	fd = open(filename, O_RDONLY);
+	if (fd < 0)
+		return NULL;
+	if (fstat(fd, &st))
+		goto failed;
+
+	*size = st.st_size;
+	map = mmap(NULL, *size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+
+failed:
+	close(fd);
+	if (map == MAP_FAILED)
+		return NULL;
+	return map;
+}
+
+/*
+ * Unmap and release a file previously mapped with grab_file().
+ */
+static void
+release_file(void *file, unsigned long size)
+{
+	munmap(file, size);
+}
+
+/*
+ * Note, it seems odd that we have both a CONVERT_NATIVE and a TO_NATIVE macro
+ * below.  We do this because the values passed to TO_NATIVE may themselves be
+ * macros and need both macros here to get expanded.  Specifically its the width
+ * variable we are concerned with, because it needs to get expanded prior to
+ * string concatenation
+ */
+#define CONVERT_NATIVE(fend, width, x) ({ \
+typeof(x) ___x; \
+if ((fend) == ELFDATA2LSB) \
+	___x = rte_le_to_cpu_##width(x); \
+else \
+	___x = rte_be_to_cpu_##width(x); \
+	___x; \
+})
+
+#define TO_NATIVE(fend, width, x) CONVERT_NATIVE(fend, width, x)
+
+static int
+parse_elf(struct elf_info *info, const char *filename)
+{
+	unsigned int i;
+	Elf_Ehdr *hdr;
+	Elf_Shdr *sechdrs;
+	Elf_Sym  *sym;
+	int endian;
+	unsigned int symtab_idx = ~0U, symtab_shndx_idx = ~0U;
+
+	hdr = grab_file(filename, &info->size);
+	if (!hdr) {
+		perror(filename);
+		return -ENOENT;
+	}
+	info->hdr = hdr;
+	if (info->size < sizeof(*hdr)) {
+		/* file too small, assume this is an empty .o file */
+		return 0;
+	}
+	/* Is this a valid ELF file? */
+	if ((hdr->e_ident[EI_MAG0] != ELFMAG0) ||
+	    (hdr->e_ident[EI_MAG1] != ELFMAG1) ||
+	    (hdr->e_ident[EI_MAG2] != ELFMAG2) ||
+	    (hdr->e_ident[EI_MAG3] != ELFMAG3)) {
+		/* Not an ELF file - silently ignore it */
+		return 0;
+	}
+
+	if (!hdr->e_ident[EI_DATA]) {
+		/* Unknown endian */
+		return 0;
+	}
+
+	endian = hdr->e_ident[EI_DATA];
+
+	/* Fix endianness in ELF header */
+	hdr->e_type      = TO_NATIVE(endian, 16, hdr->e_type);
+	hdr->e_machine   = TO_NATIVE(endian, 16, hdr->e_machine);
+	hdr->e_version   = TO_NATIVE(endian, 32, hdr->e_version);
+	hdr->e_entry     = TO_NATIVE(endian, ADDR_SIZE, hdr->e_entry);
+	hdr->e_phoff     = TO_NATIVE(endian, ADDR_SIZE, hdr->e_phoff);
+	hdr->e_shoff     = TO_NATIVE(endian, ADDR_SIZE, hdr->e_shoff);
+	hdr->e_flags     = TO_NATIVE(endian, 32, hdr->e_flags);
+	hdr->e_ehsize    = TO_NATIVE(endian, 16, hdr->e_ehsize);
+	hdr->e_phentsize = TO_NATIVE(endian, 16, hdr->e_phentsize);
+	hdr->e_phnum     = TO_NATIVE(endian, 16, hdr->e_phnum);
+	hdr->e_shentsize = TO_NATIVE(endian, 16, hdr->e_shentsize);
+	hdr->e_shnum     = TO_NATIVE(endian, 16, hdr->e_shnum);
+	hdr->e_shstrndx  = TO_NATIVE(endian, 16, hdr->e_shstrndx);
+
+	sechdrs = RTE_PTR_ADD(hdr, hdr->e_shoff);
+	info->sechdrs = sechdrs;
+
+	/* Check if file offset is correct */
+	if (hdr->e_shoff > info->size) {
+		fprintf(stderr, "section header offset=%lu in file '%s' "
+			"is bigger than filesize=%lu\n",
+			(unsigned long)hdr->e_shoff,
+			filename, info->size);
+		return 0;
+	}
+
+	if (hdr->e_shnum == SHN_UNDEF) {
+		/*
+		 * There are more than 64k sections,
+		 * read count from .sh_size.
+		 */
+		info->num_sections = TO_NATIVE(endian, 32, sechdrs[0].sh_size);
+	} else {
+		info->num_sections = hdr->e_shnum;
+	}
+	if (hdr->e_shstrndx == SHN_XINDEX)
+		info->secindex_strings =
+			TO_NATIVE(endian, 32, sechdrs[0].sh_link);
+	else
+		info->secindex_strings = hdr->e_shstrndx;
+
+	/* Fix endianness in section headers */
+	for (i = 0; i < info->num_sections; i++) {
+		sechdrs[i].sh_name      =
+			TO_NATIVE(endian, 32, sechdrs[i].sh_name);
+		sechdrs[i].sh_type      =
+			TO_NATIVE(endian, 32, sechdrs[i].sh_type);
+		sechdrs[i].sh_flags     =
+			TO_NATIVE(endian, 32, sechdrs[i].sh_flags);
+		sechdrs[i].sh_addr      =
+			TO_NATIVE(endian, ADDR_SIZE, sechdrs[i].sh_addr);
+		sechdrs[i].sh_offset    =
+			TO_NATIVE(endian, ADDR_SIZE, sechdrs[i].sh_offset);
+		sechdrs[i].sh_size      =
+			TO_NATIVE(endian, 32, sechdrs[i].sh_size);
+		sechdrs[i].sh_link      =
+			TO_NATIVE(endian, 32, sechdrs[i].sh_link);
+		sechdrs[i].sh_info      =
+			TO_NATIVE(endian, 32, sechdrs[i].sh_info);
+		sechdrs[i].sh_addralign =
+			TO_NATIVE(endian, ADDR_SIZE, sechdrs[i].sh_addralign);
+		sechdrs[i].sh_entsize   =
+			TO_NATIVE(endian, ADDR_SIZE, sechdrs[i].sh_entsize);
+	}
+	/* Find symbol table. */
+	for (i = 1; i < info->num_sections; i++) {
+		int nobits = sechdrs[i].sh_type == SHT_NOBITS;
+
+		if (!nobits && sechdrs[i].sh_offset > info->size) {
+			fprintf(stderr, "%s is truncated. "
+				"sechdrs[i].sh_offset=%lu > sizeof(*hdr)=%zu\n",
+				filename, (unsigned long)sechdrs[i].sh_offset,
+				sizeof(*hdr));
+			return 0;
+		}
+
+		if (sechdrs[i].sh_type == SHT_SYMTAB) {
+			unsigned int sh_link_idx;
+			symtab_idx = i;
+			info->symtab_start = RTE_PTR_ADD(hdr,
+				sechdrs[i].sh_offset);
+			info->symtab_stop  = RTE_PTR_ADD(hdr,
+				sechdrs[i].sh_offset + sechdrs[i].sh_size);
+			sh_link_idx = sechdrs[i].sh_link;
+			info->strtab       = RTE_PTR_ADD(hdr,
+				sechdrs[sh_link_idx].sh_offset);
+		}
+
+		/* 32bit section no. table? ("more than 64k sections") */
+		if (sechdrs[i].sh_type == SHT_SYMTAB_SHNDX) {
+			symtab_shndx_idx = i;
+			info->symtab_shndx_start = RTE_PTR_ADD(hdr,
+				sechdrs[i].sh_offset);
+			info->symtab_shndx_stop  = RTE_PTR_ADD(hdr,
+				sechdrs[i].sh_offset + sechdrs[i].sh_size);
+		}
+	}
+	if (!info->symtab_start)
+		fprintf(stderr, "%s has no symtab?\n", filename);
+
+	/* Fix endianness in symbols */
+	for (sym = info->symtab_start; sym < info->symtab_stop; sym++) {
+		sym->st_shndx = TO_NATIVE(endian, 16, sym->st_shndx);
+		sym->st_name  = TO_NATIVE(endian, 32, sym->st_name);
+		sym->st_value = TO_NATIVE(endian, ADDR_SIZE, sym->st_value);
+		sym->st_size  = TO_NATIVE(endian, ADDR_SIZE, sym->st_size);
+	}
+
+	if (symtab_shndx_idx != ~0U) {
+		Elf32_Word *p;
+		if (symtab_idx != sechdrs[symtab_shndx_idx].sh_link)
+			fprintf(stderr,
+				"%s: SYMTAB_SHNDX has bad sh_link: %u!=%u\n",
+				filename, sechdrs[symtab_shndx_idx].sh_link,
+				symtab_idx);
+		/* Fix endianness */
+		for (p = info->symtab_shndx_start; p < info->symtab_shndx_stop; p++)
+			*p = TO_NATIVE(endian, 32, *p);
+	}
+
+	return 1;
+}
+
+static void
+parse_elf_finish(struct elf_info *info)
+{
+	struct pmd_driver *tmp, *idx = info->drivers;
+	release_file(info->hdr, info->size);
+	while (idx) {
+		tmp = idx->next;
+		free(idx);
+		idx = tmp;
+	}
+}
+
+static const char *
+get_sym_name(struct elf_info *elf, Elf_Sym *sym)
+{
+	if (sym)
+		return elf->strtab + sym->st_name;
+	else
+		return "(unknown)";
+}
+
+static void *
+get_sym_value(struct elf_info *info, const Elf_Sym *sym)
+{
+	return RTE_PTR_ADD(info->hdr,
+		info->sechdrs[sym->st_shndx].sh_offset + sym->st_value);
+}
+
+static Elf_Sym *
+find_sym_in_symtab(struct elf_info *info, const char *name, Elf_Sym *last)
+{
+	Elf_Sym *idx;
+	if (last)
+		idx = last+1;
+	else
+		idx = info->symtab_start;
+
+	for (; idx < info->symtab_stop; idx++) {
+		const char *n = get_sym_name(info, idx);
+		if (!strncmp(n, name, strlen(name)))
+			return idx;
+	}
+	return NULL;
+}
+
+struct opt_tag {
+	const char *suffix;
+	const char *json_id;
+};
+
+static const struct opt_tag opt_tags[] = {
+	{"_param_string_export", "params"},
+};
+
+static int
+complete_pmd_entry(struct elf_info *info, struct pmd_driver *drv)
+{
+	const char *tname;
+	int i;
+	char tmpsymname[128];
+	Elf_Sym *tmpsym;
+
+	drv->name = get_sym_value(info, drv->name_sym);
+
+	for (i = 0; i < PMD_OPT_MAX; i++) {
+		memset(tmpsymname, 0, 128);
+		sprintf(tmpsymname, "__%s%s", drv->name, opt_tags[i].suffix);
+		tmpsym = find_sym_in_symtab(info, tmpsymname, NULL);
+		if (!tmpsym)
+			continue;
+		drv->opt_vals[i] = get_sym_value(info, tmpsym);
+	}
+
+	memset(tmpsymname, 0, sizeof(tmpsymname));
+	snprintf(tmpsymname, sizeof(tmpsymname), "__%s_pci_table_export",
+		drv->name);
+
+	tmpsym = find_sym_in_symtab(info, tmpsymname, NULL);
+
+	/*
+	 * If this returns NULL, then this is a PMD_VDEV, because
+	 * it has no pci table reference
+	 */
+	if (!tmpsym) {
+		drv->pci_tbl = NULL;
+		return 0;
+	}
+
+	tname = get_sym_value(info, tmpsym);
+	tmpsym = find_sym_in_symtab(info, tname, NULL);
+	if (!tmpsym) {
+		fprintf(stderr, "No symbol %s\n", tname);
+		return -ENOENT;
+	}
+
+	drv->pci_tbl = (struct rte_pci_id *)get_sym_value(info, tmpsym);
+	if (!drv->pci_tbl) {
+		fprintf(stderr, "Failed to get PCI table %s\n", tname);
+		return -ENOENT;
+	}
+
+	return 0;
+}
+
+static int
+locate_pmd_entries(struct elf_info *info)
+{
+	Elf_Sym *last = NULL;
+	struct pmd_driver *new;
+
+	info->drivers = NULL;
+
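+	/*
+	 * Walk the symbol table, collecting every symbol whose name starts
+	 * with "rte_pmd_name"; each match describes one registered driver.
+	 */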
+	do {
+		new = calloc(1, sizeof(struct pmd_driver));
+		if (!new)
+			return -ENOMEM;
+		new->name_sym = find_sym_in_symtab(info, "rte_pmd_name", last);
+		last = new->name_sym;
+		if (!new->name_sym)
+			free(new);
+		else {
+			if (complete_pmd_entry(info, new)) {
+				fprintf(stderr, "Failed to complete pmd entry\n");
+				free(new);
+				return -ENOENT;
+			} else {
+				new->next = info->drivers;
+				info->drivers = new;
+			}
+		}
+	} while (last);
+
+	return 0;
+}
+
+static void
+output_pmd_info_string(struct elf_info *info, char *outfile)
+{
+	FILE *ofd;
+	struct pmd_driver *drv;
+	struct rte_pci_id *pci_ids;
+	int idx = 0;
+
+	ofd = fopen(outfile, "w+");
+	if (!ofd) {
+		fprintf(stderr, "Unable to open output file\n");
+		return;
+	}
+
+	drv = info->drivers;
+
+	while (drv) {
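+		/*
+		 * Each \\\" in the format strings below emits an escaped
+		 * quote into the generated C string literal, keeping the
+		 * embedded JSON intact.
+		 */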
+		fprintf(ofd, "const char %s_pmd_info[] __attribute__((used)) = "
+			"\"PMD_INFO_STRING= {",
+			drv->name);
+		fprintf(ofd, "\\\"name\\\" : \\\"%s\\\", ", drv->name);
+
+		for (idx = 0; idx < PMD_OPT_MAX; idx++) {
+			if (drv->opt_vals[idx])
+				fprintf(ofd, "\\\"%s\\\" : \\\"%s\\\", ",
+					opt_tags[idx].json_id,
+					drv->opt_vals[idx]);
+		}
+
+		pci_ids = drv->pci_tbl;
+		fprintf(ofd, "\\\"pci_ids\\\" : [");
+
+		while (pci_ids && pci_ids->device_id) {
+			fprintf(ofd, "[%d, %d, %d, %d]",
+				pci_ids->vendor_id, pci_ids->device_id,
+				pci_ids->subsystem_vendor_id,
+				pci_ids->subsystem_device_id);
+			pci_ids++;
+			if (pci_ids->device_id)
+				fprintf(ofd, ",");
+			else
+				fprintf(ofd, " ");
+		}
+		fprintf(ofd, "]}\";");
+		drv = drv->next;
+	}
+
+	fclose(ofd);
+}
+
+int main(int argc, char **argv)
+{
+	struct elf_info info;
+	int rc;
+
+	if (argc < 3) {
+		fprintf(stderr,
+			"usage: dpdk-pmdinfogen <object file> <c output file>\n");
+		exit(127);
+	}
+
+	rc = parse_elf(&info, argv[1]);
+	if (rc < 0)
+		exit(-rc);
+
+	rc = locate_pmd_entries(&info);
+	if (rc < 0)
+		goto error;
+
+	if (info.drivers) {
+		output_pmd_info_string(&info, argv[2]);
+		rc = 0;
+	} else {
+		rc = -1;
+		fprintf(stderr, "No drivers registered\n");
+	}
+
+error:
+	parse_elf_finish(&info);
+	exit(-rc);
+}
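
For reference, the output file generated above for a hypothetical driver
named "crb" with a single PCI ID would contain one statement along these
lines (all identifiers and IDs illustrative, wrapped here for readability):

	const char crb_pmd_info[] __attribute__((used)) =
		"PMD_INFO_STRING= {\"name\" : \"crb\", "
		"\"pci_ids\" : [[32902, 1572, 65535, 65535] ]}";
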
diff --git a/buildtools/pmdinfogen/pmdinfogen.h b/buildtools/pmdinfogen/pmdinfogen.h
new file mode 100644
index 0000000..7e57702
--- /dev/null
+++ b/buildtools/pmdinfogen/pmdinfogen.h
@@ -0,0 +1,99 @@
+/* Postprocess pmd object files to export hw support
+ *
+ * Copyright 2016 Neil Horman <nhorman@tuxdriver.com>
+ * Based in part on modpost.c from the linux kernel
+ *
+ * This software may be used and distributed according to the terms
+ * of the GNU General Public License V2, incorporated herein by reference.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <elf.h>
+#include <rte_config.h>
+#include <rte_pci.h>
+#include <rte_byteorder.h>
+
+/* On BSD-like OSes, elf.h defines these according to the host's word size */
+#undef ELF_ST_BIND
+#undef ELF_ST_TYPE
+#undef ELF_R_SYM
+#undef ELF_R_TYPE
+
+/*
+ * Define the generic Elf_* names to the Elf64_*/Elf32_* flavors defined in
+ * elf.h.  This makes our code a bit more generic between arches and allows
+ * us to support 32 bit code in the future should we ever want to.
+ */
+#ifdef RTE_ARCH_64
+#define Elf_Ehdr    Elf64_Ehdr
+#define Elf_Shdr    Elf64_Shdr
+#define Elf_Sym     Elf64_Sym
+#define Elf_Addr    Elf64_Addr
+#define Elf_Sword   Elf64_Sxword
+#define Elf_Section Elf64_Half
+#define ELF_ST_BIND ELF64_ST_BIND
+#define ELF_ST_TYPE ELF64_ST_TYPE
+
+#define Elf_Rel     Elf64_Rel
+#define Elf_Rela    Elf64_Rela
+#define ELF_R_SYM   ELF64_R_SYM
+#define ELF_R_TYPE  ELF64_R_TYPE
+#else
+#define Elf_Ehdr    Elf32_Ehdr
+#define Elf_Shdr    Elf32_Shdr
+#define Elf_Sym     Elf32_Sym
+#define Elf_Addr    Elf32_Addr
+#define Elf_Sword   Elf32_Sxword
+#define Elf_Section Elf32_Half
+#define ELF_ST_BIND ELF32_ST_BIND
+#define ELF_ST_TYPE ELF32_ST_TYPE
+
+#define Elf_Rel     Elf32_Rel
+#define Elf_Rela    Elf32_Rela
+#define ELF_R_SYM   ELF32_R_SYM
+#define ELF_R_TYPE  ELF32_R_TYPE
+#endif
+
+
+enum opt_params {
+	PMD_PARAM_STRING = 0,
+	PMD_OPT_MAX
+};
+
+struct pmd_driver {
+	Elf_Sym *name_sym;
+	const char *name;
+	struct rte_pci_id *pci_tbl;
+	struct pmd_driver *next;
+
+	const char *opt_vals[PMD_OPT_MAX];
+};
+
+struct elf_info {
+	unsigned long size;
+	Elf_Ehdr     *hdr;
+	Elf_Shdr     *sechdrs;
+	Elf_Sym      *symtab_start;
+	Elf_Sym      *symtab_stop;
+	char         *strtab;
+
+	/* support for 32bit section numbers */
+
+	unsigned int num_sections; /* max_secindex + 1 */
+	unsigned int secindex_strings;
+	/* if Nth symbol table entry has .st_shndx = SHN_XINDEX,
+	 * take shndx from symtab_shndx_start[N] instead
+	 */
+	Elf32_Word   *symtab_shndx_start;
+	Elf32_Word   *symtab_shndx_stop;
+
+	struct pmd_driver *drivers;
+};
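
The SHN_XINDEX fallback described in the comment above follows the standard
ELF convention; a minimal sketch of such a lookup (helper name illustrative,
not part of this patch) would be:

	static unsigned int
	get_secindex(const struct elf_info *info, const Elf_Sym *sym)
	{
		/* Ordinary symbols carry their section index directly. */
		if (sym->st_shndx != SHN_XINDEX)
			return sym->st_shndx;
		/* The Nth SHN_XINDEX symbol takes the Nth entry of the
		 * extended section-index table instead.
		 */
		return info->symtab_shndx_start[sym - info->symtab_start];
	}
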
diff --git a/doc/guides/prog_guide/dev_kit_build_system.rst b/doc/guides/prog_guide/dev_kit_build_system.rst
index dedd18a..fa34fe0 100644
--- a/doc/guides/prog_guide/dev_kit_build_system.rst
+++ b/doc/guides/prog_guide/dev_kit_build_system.rst
@@ -70,7 +70,7 @@ Each build directory contains include files, libraries, and applications:
     ...
     ~/DEV/DPDK$ ls i686-native-linuxapp-gcc
 
-    app build hostapp include kmod lib Makefile
+    app build buildtools include kmod lib Makefile
 
 
     ~/DEV/DPDK$ ls i686-native-linuxapp-gcc/app/
@@ -307,6 +307,7 @@ Misc
 Internally Generated Build Tools
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+``dpdk-pmdinfogen`` scans an object (.o) file for various well known symbol names.
 These well known symbol names are defined by various macros and used to export
 important information about hardware support and usage for PMD files.  For
 instance the macro:
@@ -321,6 +322,18 @@ Creates the following symbol:
 
    static char rte_pmd_name0[] __attribute__((used)) = "<name>";
 
+``dpdk-pmdinfogen`` scans for this symbol.  Using this information, other
+relevant bits of data can be extracted from the object file and used to
+produce a hardware support description, which ``dpdk-pmdinfogen`` then
+encodes into a JSON formatted string of the following form:
+
+.. code-block:: C
+
+   static char <name_pmd_string>="PMD_INFO_STRING=\"{'name' : '<name>', ...}\"";
+
+These strings can then be searched for by external tools to determine the
+hardware support of a given library or application.
+
 .. _Useful_Variables_Provided_by_the_Build_System:
 
 Useful Variables Provided by the Build System
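
As a rough illustration of the kind of external tool referred to above, a
stand-alone scanner (entirely illustrative, not part of DPDK) could locate
the marker and print each embedded JSON payload:

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(int argc, char **argv)
	{
		static const char marker[] = "PMD_INFO_STRING= ";
		const size_t mlen = sizeof(marker) - 1;
		size_t size, i;
		char *buf;
		FILE *f;

		if (argc < 2)
			return 1;
		f = fopen(argv[1], "rb");
		if (!f)
			return 1;
		fseek(f, 0, SEEK_END);
		size = (size_t)ftell(f);
		rewind(f);
		buf = malloc(size + 1);
		if (!buf || fread(buf, 1, size, f) != size)
			return 1;
		buf[size] = '\0'; /* guard so the %s below always terminates */
		fclose(f);

		for (i = 0; i + mlen < size; i++)
			if (!memcmp(buf + i, marker, mlen))
				printf("%s\n", buf + i + mlen);
		free(buf);
		return 0;
	}
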
diff --git a/mk/rte.sdkbuild.mk b/mk/rte.sdkbuild.mk
index f1a163a..5edbf50 100644
--- a/mk/rte.sdkbuild.mk
+++ b/mk/rte.sdkbuild.mk
@@ -63,7 +63,7 @@ build: $(ROOTDIRS-y)
 .PHONY: clean
 clean: $(CLEANDIRS)
 	@rm -rf $(RTE_OUTPUT)/include $(RTE_OUTPUT)/app \
-		$(RTE_OUTPUT)/lib $(RTE_OUTPUT)/kmod
+		$(RTE_OUTPUT)/lib $(RTE_OUTPUT)/kmod $(RTE_OUTPUT)/buildtools
 	@[ -d $(RTE_OUTPUT)/include ] || mkdir -p $(RTE_OUTPUT)/include
 	@$(RTE_SDK)/scripts/gen-config-h.sh $(RTE_OUTPUT)/.config \
 		> $(RTE_OUTPUT)/include/rte_config.h
diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index abdab0f..2b92157 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -141,10 +141,13 @@ install-sdk:
 	$(Q)$(call rte_mkdir,                            $(DESTDIR)$(sdkdir))
 	$(Q)cp -a               $(RTE_SDK)/mk            $(DESTDIR)$(sdkdir)
 	$(Q)cp -a               $(RTE_SDK)/scripts       $(DESTDIR)$(sdkdir)
+	$(Q)cp -a               $O/buildtools            $(DESTDIR)$(sdkdir)
 	$(Q)$(call rte_mkdir,                            $(DESTDIR)$(targetdir))
 	$(Q)cp -a               $O/.config               $(DESTDIR)$(targetdir)
 	$(Q)$(call rte_symlink, $(DESTDIR)$(includedir), $(DESTDIR)$(targetdir)/include)
 	$(Q)$(call rte_symlink, $(DESTDIR)$(libdir),     $(DESTDIR)$(targetdir)/lib)
+	$(Q)$(call rte_symlink, $(DESTDIR)$(sdkdir)/buildtools, \
+	                        $(DESTDIR)$(targetdir)/buildtools)
 
 install-doc:
 ifneq ($(wildcard $O/doc),)
-- 
2.7.0

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3 00/20] vhost ABI/API refactoring
  2016-06-30 11:15  7%         ` Mcnamara, John
@ 2016-06-30 11:40  4%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-06-30 11:40 UTC (permalink / raw)
  To: Mcnamara, John
  Cc: Panu Matilainen, Yuanhan Liu, dev, Xie, Huawei, Rich Lane,
	Tetsuya Mukawa

2016-06-30 11:15, Mcnamara, John:
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Panu Matilainen
> > On 06/30/2016 10:57 AM, Yuanhan Liu wrote:
> > > On Thu, Jun 30, 2016 at 10:39:45AM +0300, Panu Matilainen wrote:
> > >> On 06/07/2016 06:51 AM, Yuanhan Liu wrote:
> > >>> v3: - adapted the new vhost ABI/API changes to tep_term example, to make
> > >>>      sure not break build at least.
> > >>>    - bumped the ABI version to 3
> > >>>
> > >>> NOTE: I created a branch at dpdk.org [0] for more convenient testing:
> > >>>
> > >>>    [0]: git://dpdk.org/next/dpdk-next-virtio for-testing
> > >>>
> > >>>
> > >>> Every time we introduce a new feature to vhost, we are likely to
> > >>> break ABI. Moreover, some cleanups (such as the one from Ilya to
> > >>> remove vec_buf
> > >>> from vhost_virtqueue struct) also break ABI.
> > >>>
> > >>> This patch set is meant to resolve above issue ultimately, by hiding
> > >>> virtio_net structure (as well as few others) internally, and export
> > >>> the virtio_net dev struct to applications by a number, vid, like the
> > >>> way kernel exposes an fd to user space.
> > >>>
> > >>> Back to the patch set, the first part of this set makes some changes
> > >>> to vhost example, vhost-pmd and vhost, bit by bit, to remove the
> > >>> dependence to "virtio_net" struct. And then do the final change to
> > >>> make the current APIs to adapt to using "vid".
> > >>>
> > >>> After that, "vrtio_net_device_ops" is the only left open struct that
> > >>> an application can acces, therefore, it's the only place that might
> > >>> introduce potential ABI breakage in future for extension. Hence, I
> > >>> made few more
> > >>> (5) space reservation, to make sure we will not break ABI for a long
> > >>> time, and hopefully, forever.
> > >>
> > >> Been intending to say this for a while but seems I never actually got
> > >> around to do so:
> > >>
> > >> This is a really fine example of how to refactor an API against
> > >> constant ABI breakages, thank you Yuanhan!
> > >
> > > Panu, thanks!
> > >
> > >> Exported structs are one of the biggest obstacles in keeping a stable
> > >> ABI while adding new features, and while it's not always possible to
> > >> hide everything to this extent, the damage (erm,
> > >> exposure) can usually be considerably limited by careful API design.
> > >
> > > Agreed.
> > >
> > >> Since the first and the foremost objection against doing this in the
> > >> DPDK context is always "but performance!", I'm curious as to what
> > >> sort of numbers you're getting with the new API vs the old one? I'm
> > >> really hoping other libraries would follow suit after seeing that it's
> > >> possible to provide a future-proof API/ABI without sacrificing
> > >> performance :)
> > >
> > > From my (limited) test, nope, I see no performance drop at all, not
> > > even a little.
> > 
> > Awesome!
> > 
> > With that, hopefully others will see the light and follow its example.
> > If nothing else, they ought to get a bit envious when you can add features
> > left and right without ever having to wait for API/ABI break periods etc
> > ;)
> 
> Agreed. We should be doing more of this type of refactoring work to make the API/ABI harder to break.

+1
But we must check the possible performance degradation with care :)

^ permalink raw reply	[relevance 4%]
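
The opaque-handle pattern discussed in this thread can be sketched
generically as follows (names illustrative, not the actual vhost API): the
public header exposes only an integer handle, so the internal structure is
free to grow without breaking the ABI:

	#include <stdint.h>

	/* Public API: callers never see the struct layout. */
	int dev_open(const char *path);          /* returns a "vid", or <0 */
	int dev_get_mtu(int vid, uint16_t *mtu); /* look up by handle */

	/* Library internals: layout may change at any time. */
	struct virtio_net {
		uint64_t features;
		uint16_t mtu;
		/* New fields can be appended without an ABI bump. */
	};
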

* Re: [dpdk-dev] [PATCH v3 00/20] vhost ABI/API refactoring
  2016-06-30  9:05  7%       ` Panu Matilainen
@ 2016-06-30 11:15  7%         ` Mcnamara, John
  2016-06-30 11:40  4%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Mcnamara, John @ 2016-06-30 11:15 UTC (permalink / raw)
  To: Panu Matilainen, Yuanhan Liu
  Cc: dev, Xie, Huawei, Thomas Monjalon, Rich Lane, Tetsuya Mukawa



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Panu Matilainen
> Sent: Thursday, June 30, 2016 10:05 AM
> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Cc: dev@dpdk.org; Xie, Huawei <huawei.xie@intel.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>; Rich Lane <rich.lane@bigswitch.com>; Tetsuya
> Mukawa <mukawa@igel.co.jp>
> Subject: Re: [dpdk-dev] [PATCH v3 00/20] vhost ABI/API refactoring
> 
> On 06/30/2016 10:57 AM, Yuanhan Liu wrote:
> > On Thu, Jun 30, 2016 at 10:39:45AM +0300, Panu Matilainen wrote:
> >> On 06/07/2016 06:51 AM, Yuanhan Liu wrote:
> >>> v3: - adapted the new vhost ABI/API changes to tep_term example, to make
> >>>      sure not break build at least.
> >>>    - bumped the ABI version to 3
> >>>
> >>> NOTE: I created a branch at dpdk.org [0] for more convenient testing:
> >>>
> >>>    [0]: git://dpdk.org/next/dpdk-next-virtio for-testing
> >>>
> >>>
> >>> Every time we introduce a new feature to vhost, we are likely to
> >>> break ABI. Moreover, some cleanups (such as the one from Ilya to
> >>> remove vec_buf
> >>> from vhost_virtqueue struct) also break ABI.
> >>>
> >>> This patch set is meant to resolve above issue ultimately, by hiding
> >>> virtio_net structure (as well as few others) internally, and export
> >>> the virtio_net dev struct to applications by a number, vid, like the
> >>> way kernel exposes an fd to user space.
> >>>
> >>> Back to the patch set, the first part of this set makes some changes
> >>> to vhost example, vhost-pmd and vhost, bit by bit, to remove the
> >>> dependence to "virtio_net" struct. And then do the final change to
> >>> make the current APIs to adapt to using "vid".
> >>>
> >>> After that, "vrtio_net_device_ops" is the only left open struct that
> >>> an application can acces, therefore, it's the only place that might
> >>> introduce potential ABI breakage in future for extension. Hence, I
> >>> made few more
> >>> (5) space reservation, to make sure we will not break ABI for a long
> >>> time, and hopefully, forever.
> >>
> >> Been intending to say this for a while but seems I never actually got
> >> around to do so:
> >>
> >> This is a really fine example of how to refactor an API against
> >> constant ABI breakages, thank you Yuanhan!
> >
> > Panu, thanks!
> >
> >> Exported structs are one of the biggest obstacles in keeping a stable
> >> ABI while adding new features, and while it's not always possible to
> >> hide everything to this extent, the damage (erm,
> >> exposure) can usually be considerably limited by careful API design.
> >
> > Agreed.
> >
> >> Since the first and the foremost objection against doing this in the
> >> DPDK context is always "but performance!", I'm curious as to what
> >> sort of numbers you're getting with the new API vs the old one? I'm
> >> really hoping other libraries would follow suit after seeing that it's
> >> possible to provide a future-proof API/ABI without sacrificing
> >> performance :)
> >
> > From my (limited) test, nope, I see no performance drop at all, not
> > even a little.
> 
> Awesome!
> 
> With that, hopefully others will see the light and follow its example.
> If nothing else, they ought to get a bit envious when you can add features
> left and right without ever having to wait for API/ABI break periods etc
> ;)

Agreed. We should be doing more of this type of refactoring work to make the API/ABI harder to break.

John

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v3 00/20] vhost ABI/API refactoring
  2016-06-30  7:57  4%     ` Yuanhan Liu
@ 2016-06-30  9:05  7%       ` Panu Matilainen
  2016-06-30 11:15  7%         ` Mcnamara, John
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2016-06-30  9:05 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, huawei.xie, Thomas Monjalon, Rich Lane, Tetsuya Mukawa

On 06/30/2016 10:57 AM, Yuanhan Liu wrote:
> On Thu, Jun 30, 2016 at 10:39:45AM +0300, Panu Matilainen wrote:
>> On 06/07/2016 06:51 AM, Yuanhan Liu wrote:
>>> v3: - adapted the new vhost ABI/API changes to tep_term example, to make
>>>      sure not break build at least.
>>>    - bumped the ABI version to 3
>>>
>>> NOTE: I created a branch at dpdk.org [0] for more convenient testing:
>>>
>>>    [0]: git://dpdk.org/next/dpdk-next-virtio for-testing
>>>
>>>
>>> Every time we introduce a new feature to vhost, we are likely to break
>>> ABI. Moreover, some cleanups (such as the one from Ilya to remove vec_buf
>>> from vhost_virtqueue struct) also break ABI.
>>>
>>> This patch set is meant to resolve above issue ultimately, by hiding
>>> virtio_net structure (as well as few others) internally, and export the
>>> virtio_net dev struct to applications by a number, vid, like the way
>>> kernel exposes an fd to user space.
>>>
>>> Back to the patch set, the first part of this set makes some changes to
>>> vhost example, vhost-pmd and vhost, bit by bit, to remove the dependence
>>> to "virtio_net" struct. And then do the final change to make the current
>>> APIs to adapt to using "vid".
>>>
>>> After that, "vrtio_net_device_ops" is the only left open struct that an
>>> application can acces, therefore, it's the only place that might introduce
>>> potential ABI breakage in future for extension. Hence, I made few more
>>> (5) space reservation, to make sure we will not break ABI for a long time,
>>> and hopefully, forever.
>>
>> Been intending to say this for a while but seems I never actually got around
>> to do so:
>>
>> This is a really fine example of how to refactor an API against constant ABI
>> breakages, thank you Yuanhan!
>
> Panu, thanks!
>
>> Exported structs are one of the biggest
> >> obstacles in keeping a stable ABI while adding new features, and while it's
>> not always possible to hide everything to this extent, the damage (erm,
>> exposure) can usually be considerably limited by careful API design.
>
> Agreed.
>
>> Since the first and the foremost objection against doing this in the DPDK
>> context is always "but performance!", I'm curious as to what sort of numbers
>> you're getting with the new API vs the old one? I'm really hoping other
> >> libraries would follow suit after seeing that it's possible to provide a
>> future-proof API/ABI without sacrificing performance :)
>
> From my (limited) test, nope, I see no performance drop at all, not even a
> little.

Awesome!

With that, hopefully others will see the light and follow its example. 
If nothing else, they ought to get a bit envious when you can add 
features left and right without ever having to wait for API/ABI break 
periods etc ;)

	- Panu -

>
> 	--yliu
>

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v3 00/20] vhost ABI/API refactoring
  @ 2016-06-30  7:57  4%     ` Yuanhan Liu
  2016-06-30  9:05  7%       ` Panu Matilainen
  0 siblings, 1 reply; 200+ results
From: Yuanhan Liu @ 2016-06-30  7:57 UTC (permalink / raw)
  To: Panu Matilainen
  Cc: dev, huawei.xie, Thomas Monjalon, Rich Lane, Tetsuya Mukawa

On Thu, Jun 30, 2016 at 10:39:45AM +0300, Panu Matilainen wrote:
> On 06/07/2016 06:51 AM, Yuanhan Liu wrote:
> >v3: - adapted the new vhost ABI/API changes to tep_term example, to make
> >      sure not break build at least.
> >    - bumped the ABI version to 3
> >
> >NOTE: I created a branch at dpdk.org [0] for more convenient testing:
> >
> >    [0]: git://dpdk.org/next/dpdk-next-virtio for-testing
> >
> >
> >Every time we introduce a new feature to vhost, we are likely to break
> >ABI. Moreover, some cleanups (such as the one from Ilya to remove vec_buf
> >from vhost_virtqueue struct) also break ABI.
> >
> >This patch set is meant to resolve above issue ultimately, by hiding
> >virtio_net structure (as well as few others) internally, and export the
> >virtio_net dev struct to applications by a number, vid, like the way
> >kernel exposes an fd to user space.
> >
> >Back to the patch set, the first part of this set makes some changes to
> >vhost example, vhost-pmd and vhost, bit by bit, to remove the dependence
> >to "virtio_net" struct. And then do the final change to make the current
> >APIs to adapt to using "vid".
> >
> >After that, "vrtio_net_device_ops" is the only left open struct that an
> >application can acces, therefore, it's the only place that might introduce
> >potential ABI breakage in future for extension. Hence, I made few more
> >(5) space reservation, to make sure we will not break ABI for a long time,
> >and hopefully, forever.
> 
> Been intending to say this for a while but seems I never actually got around
> to do so:
> 
> This is a really fine example of how to refactor an API against constant ABI
> breakages, thank you Yuanhan!

Panu, thanks!

> Exported structs are one of the biggest
> obstacles in keeping a stable ABI while adding new features, and while it's
> not always possible to hide everything to this extent, the damage (erm,
> exposure) can usually be considerably limited by careful API design.

Agreed.

> Since the first and the foremost objection against doing this in the DPDK
> context is always "but performance!", I'm curious as to what sort of numbers
> you're getting with the new API vs the old one? I'm really hoping other
> libraries would follow suit after seeing that it's possible to provide a
> future-proof API/ABI without sacrificing performance :)

From my (limited) test, nope, I see no performance drop at all, not even a
little.

	--yliu

^ permalink raw reply	[relevance 4%]

2016-05-10  9:13     [dpdk-dev] Ring PMD: why are stats counters atomic? Mauricio Vásquez
2016-05-10  9:36     ` Bruce Richardson
2016-05-16 13:12       ` Mauricio Vásquez
2016-05-16 13:16         ` Bruce Richardson
2016-08-15 20:41  0%       ` Mauricio Vásquez
2016-05-13  5:24     [dpdk-dev] [PATCH v2 00/19] vhost ABI/API refactoring Yuanhan Liu
2016-06-07  3:51     ` [dpdk-dev] [PATCH v3 00/20] " Yuanhan Liu
2016-06-30  7:39       ` Panu Matilainen
2016-06-30  7:57  4%     ` Yuanhan Liu
2016-06-30  9:05  7%       ` Panu Matilainen
2016-06-30 11:15  7%         ` Mcnamara, John
2016-06-30 11:40  4%           ` Thomas Monjalon
2016-05-16 13:18     [dpdk-dev] [PATCH 0/2] doc: announce ABI change of struct rte_port_source_params Fan Zhang
2016-05-19 14:18     ` [dpdk-dev] [PATCH v2] doc: announce ABI change of struct rte_port_source_params and rte_port_sink_params Fan Zhang
2016-07-27 10:08  9%   ` Dumitrescu, Cristian
2016-07-27 10:42  7%     ` Thomas Monjalon
2016-07-28 18:28  4%       ` Thomas Monjalon
2016-05-19 13:17     [dpdk-dev] [PATCH v3] ci: Add the class_id support in pci probe Ziye Yang
2016-05-24 12:50     ` [dpdk-dev] [PATCH v4] Pci: Add the class_id support Ziye Yang
2016-06-14 14:52       ` Thomas Monjalon
2016-07-06 11:08  3%     ` Ferruh Yigit
2016-07-07  7:46  0%       ` Thomas Monjalon
2016-06-17 10:32     [dpdk-dev] [PATCH v2 0/3] Add new KASUMI SW PMD Pablo de Lara
2016-06-20 14:40     ` [dpdk-dev] [PATCH v3 " Pablo de Lara
2016-06-20 14:40       ` [dpdk-dev] [PATCH v3 1/3] kasumi: add new KASUMI PMD Pablo de Lara
2016-07-06 11:26  3%     ` Ferruh Yigit
2016-07-06 13:07  0%       ` Thomas Monjalon
2016-07-06 13:22  0%       ` De Lara Guarch, Pablo
2016-06-17 18:46     [dpdk-dev] [PATCHv8 0/6] Implement pmd hardware support exports Neil Horman
2016-07-04  1:13     ` [dpdk-dev] [PATCH v9 0/7] export PMD infos Thomas Monjalon
2016-07-04  1:14  2%   ` [dpdk-dev] [PATCH v9 4/7] pmdinfogen: parse driver to generate code to export Thomas Monjalon
2016-07-04  1:14  2%   ` [dpdk-dev] [PATCH v9 7/7] tools: query binaries for support information Thomas Monjalon
2016-06-21 12:02     [dpdk-dev] [PATCH v4 00/17] prepare for rte_device / rte_driver Shreyansh Jain
2016-07-12  6:01     ` [dpdk-dev] [PATCH v6 00/17] Prepare " Shreyansh Jain
2016-07-12  6:01  3%   ` [dpdk-dev] [PATCH v6 04/17] eal: remove duplicate function declaration Shreyansh Jain
2016-07-14 17:13  0%     ` viktorin
2016-08-01 10:45     ` [dpdk-dev] [PATCH v7 00/17] Prepare for rte_device / rte_driver Shreyansh Jain
2016-08-01 10:45  3%   ` [dpdk-dev] [PATCH v7 04/17] eal: remove duplicate function declaration Shreyansh Jain
2016-08-26 13:56     ` [dpdk-dev] [PATCH v8 00/25] Introducing rte_driver/rte_device generalization Shreyansh Jain
2016-08-26 13:56  3%   ` [dpdk-dev] [PATCH v8 02/25] eal: remove duplicate function declaration Shreyansh Jain
2016-09-07 14:07     ` [dpdk-dev] [PATCH v9 00/25] Introducing rte_driver/rte_device generalization Shreyansh Jain
2016-09-07 14:07  3%   ` [dpdk-dev] [PATCH v9 02/25] eal: remove duplicate function declaration Shreyansh Jain
2016-09-16  4:29     ` [dpdk-dev] [PATCH v10 00/25] Introducing rte_driver/rte_device generalization Shreyansh Jain
2016-09-16  4:29  3%   ` [dpdk-dev] [PATCH v10 02/25] eal: remove duplicate function declaration Shreyansh Jain
2016-09-16 11:42  0%     ` Jan Viktorin
2016-09-20 12:41     ` [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization Shreyansh Jain
2016-09-20 12:41  3%   ` [dpdk-dev] [PATCH v11 01/24] eal: remove duplicate function declaration Shreyansh Jain
2016-10-03 14:28     ` [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization Thomas Monjalon
2016-10-04  6:51       ` Shreyansh Jain
2016-10-04  7:42         ` Thomas Monjalon
2016-10-05 11:57           ` Shreyansh Jain
2016-10-17 13:43  3%         ` Ferruh Yigit
2016-10-17 17:29  0%           ` Shreyansh Jain
2016-10-18  9:23  0%             ` Ferruh Yigit
2016-06-30 11:57     [dpdk-dev] [RFC] mk: filter duplicate configuration entries Christian Ehrhardt
2016-06-30 12:00     ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
2016-07-05 16:47       ` Ferruh Yigit
2016-07-05 19:47         ` Thomas Monjalon
2016-07-06  5:37  3%       ` Christian Ehrhardt
2016-07-05 15:41     [dpdk-dev] [PATCH 00/18] software parser for packet type Olivier Matz
2016-07-05 15:41  6% ` [dpdk-dev] [PATCH 01/18] doc: add template for release notes 16.11 Olivier Matz
2016-07-05 18:16  2% [dpdk-dev] [RFC] Generic flow director/filtering/classification API Adrien Mazarguil
2016-07-07  7:14  0% ` Lu, Wenzhuo
2016-07-07 10:26  2%   ` Adrien Mazarguil
2016-07-19  8:11  0%     ` Lu, Wenzhuo
2016-07-19 13:12  0%       ` Adrien Mazarguil
2016-07-20  2:16  0%         ` Lu, Wenzhuo
2016-07-20 10:41  2%           ` Adrien Mazarguil
2016-07-21  3:18  0%             ` Lu, Wenzhuo
2016-07-07 23:15  0% ` Chandran, Sugesh
2016-07-08 13:03  0%   ` Adrien Mazarguil
2016-07-08 11:11  0% ` Liang, Cunming
2016-07-11 10:41     ` Jerin Jacob
2016-07-21 19:20       ` Adrien Mazarguil
2016-07-23 21:10         ` John Fastabend
2016-08-02 18:19           ` John Fastabend
2016-08-03 14:30  2%         ` Adrien Mazarguil
2016-08-03 18:10  0%           ` John Fastabend
2016-08-04 13:05  0%             ` Adrien Mazarguil
2016-08-09 21:24  0%               ` John Fastabend
2016-07-21  8:13     ` Rahul Lakkireddy
2016-07-21 17:07       ` Adrien Mazarguil
2016-07-25 11:32         ` Rahul Lakkireddy
2016-07-25 16:40           ` John Fastabend
2016-07-26 10:07             ` Rahul Lakkireddy
2016-08-03 16:44  3%           ` Adrien Mazarguil
2016-08-03 19:11  0%             ` John Fastabend
2016-08-19 19:32  2% ` [dpdk-dev] [RFC v2] " Adrien Mazarguil
2016-08-19 19:32  1%   ` [dpdk-dev] [RFC v2] ethdev: introduce generic flow API Adrien Mazarguil
2016-08-22 18:20  0%     ` John Fastabend
2016-07-06 11:39  3% [dpdk-dev] [PATCH] librte_pmd_bond: fix exported symbol versioning Christian Ehrhardt
2016-07-11 11:27  3% ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
2016-07-11 12:58  0%   ` Thomas Monjalon
2016-07-06 14:05  3% [dpdk-dev] [PATCH] cryptodev: move new cryptodev type to bottom of enum Pablo de Lara
2016-07-08 17:52  0% ` Thomas Monjalon
2016-07-07 15:36     [dpdk-dev] [PATCH 00/11] additions to pmdinfogen Thomas Monjalon
2016-07-07 15:36  4% ` [dpdk-dev] [PATCH 11/11] maintainers: add section for pmdinfo Thomas Monjalon
2016-07-07 16:14  0%   ` Neil Horman
2016-07-08 10:14     ` [dpdk-dev] [PATCH v2 00/10] additions to pmdinfogen Thomas Monjalon
2016-07-08 10:14  4%   ` [dpdk-dev] [PATCH v2 10/10] maintainers: add section for pmdinfo Thomas Monjalon
2016-07-08 14:42       ` [dpdk-dev] [PATCH v3 00/10] additions to pmdinfogen Thomas Monjalon
2016-07-08 14:42  4%     ` [dpdk-dev] [PATCH v3 10/10] maintainers: add section for pmdinfo Thomas Monjalon
2016-07-13 13:02     [dpdk-dev] [PATCH v4 00/10] Fix build errors related to exported headers Adrien Mazarguil
2016-09-08 12:25     ` [dpdk-dev] [PATCH v5 " Adrien Mazarguil
2016-09-08 12:25  3%   ` [dpdk-dev] [PATCH v5 05/10] lib: work around unnamed structs/unions Adrien Mazarguil
2016-07-14 13:29     [dpdk-dev] rte_ether: Driver-specific stats getting overwritten Remy Horton
2016-07-14 13:37     ` Thomas Monjalon
2016-07-14 13:51       ` Igor Ryzhov
2016-07-14 15:50  3%     ` Remy Horton
2016-07-19 12:42     [dpdk-dev] [PATCH] rte_delay_us can be replaced with user function jozmarti
2016-07-19 13:17  3% ` Wiles, Keith
2016-07-19 13:16 13% [dpdk-dev] [PATCH v1] doc: fix release notes for 16.07 John McNamara
2016-07-19 14:01 13% [dpdk-dev] [PATCH] doc: announce ABI change for mbuf structure Olivier Matz
2016-07-19 14:40  4% ` Bruce Richardson
2016-07-19 15:04  7%   ` Olivier Matz
2016-07-19 15:07  4%     ` Richardson, Bruce
2016-07-19 15:28  4%       ` Olivier Matz
2016-07-20  7:16 13% ` [dpdk-dev] [PATCH v2] " Olivier Matz
2016-07-20  8:54  4%   ` Ferruh Yigit
2016-07-27  8:33  4%   ` Thomas Monjalon
2016-07-28 18:04  4%     ` Thomas Monjalon
2016-07-27  9:34  4%   ` Ananyev, Konstantin
2016-07-28  2:35  4%   ` John Daley (johndale)
2016-07-28  2:39  4%   ` Jerin Jacob
2016-07-19 14:37     [dpdk-dev] [PATCH] mempool: fix lack of free() registration Zoltan Kiss
2016-07-19 14:37  3% ` [dpdk-dev] [PATCH] mempool: adjust name string size in related data types Zoltan Kiss
2016-07-19 15:37  4%   ` Olivier Matz
2016-07-19 15:59  3%     ` Zoltan Kiss
2016-07-19 16:17  0%       ` Olivier Matz
2016-07-20 12:41  4%         ` Zoltan Kiss
2016-07-20 13:37  4%           ` Olivier Matz
2016-07-20 14:01  0%             ` Richardson, Bruce
2016-07-20 17:20  0%             ` Zoltan Kiss
2016-07-20 17:16 12%   ` [dpdk-dev] [PATCH v2] " Zoltan Kiss
2016-07-21 13:40  0%     ` Olivier Matz
2016-07-21 13:47  0%       ` Zoltan Kiss
2016-07-21 14:25  0%         ` Olivier Matz
2016-07-21 21:16  0%           ` Thomas Monjalon
2016-07-20 14:24  3% [dpdk-dev] [PATCH] unify tools naming Thomas Monjalon
2016-07-20 14:24 13% [dpdk-dev] [PATCH] doc: announce ABI change for rte_eth_dev structure Tomasz Kulasek
2016-07-20 15:01  4% ` Thomas Monjalon
2016-07-20 15:13  7%   ` Ananyev, Konstantin
2016-07-20 15:22  7%     ` Thomas Monjalon
2016-07-20 15:42  4%       ` Kulasek, TomaszX
2016-07-21 15:24 11% ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
2016-07-21 22:48  4%   ` Ananyev, Konstantin
2016-07-27  8:59  4%     ` Thomas Monjalon
2016-07-27 17:10  4%       ` Jerin Jacob
2016-07-27 17:33  4%         ` Ananyev, Konstantin
2016-07-27 17:41  4%           ` Jerin Jacob
2016-07-27 20:51  4%             ` Ananyev, Konstantin
2016-07-28  2:13  4%               ` Jerin Jacob
2016-07-28 10:36  4%                 ` Ananyev, Konstantin
2016-07-28 11:38  4%                   ` Jerin Jacob
2016-07-28 12:07  4%                     ` Avi Kivity
2016-07-28 13:01  4%                     ` Ananyev, Konstantin
2016-07-28 13:58  4%                       ` Olivier MATZ
2016-07-28 14:21  4%                         ` Ananyev, Konstantin
2016-07-28 13:59  4%                       ` Jerin Jacob
2016-07-28 14:52  4%                         ` Thomas Monjalon
2016-07-28 16:25  7%                           ` Jerin Jacob
2016-07-28 17:07  4%                             ` Thomas Monjalon
2016-07-31  7:50  4%     ` Vlad Zolotarov
2016-07-28 12:04  4%   ` Avi Kivity
2016-07-31  7:46  4% ` [dpdk-dev] [PATCH] " Vlad Zolotarov
2016-07-31  8:10  4%   ` Vlad Zolotarov
2016-07-20 16:35     [dpdk-dev] [PATCH] doc: announce ivshmem support removal Thomas Monjalon
2016-07-27 19:08     ` [dpdk-dev] " Jan Viktorin
2016-07-28  9:20  3%   ` Christian Ehrhardt
2016-07-28 15:23  0%     ` Mauricio Vasquez
2016-07-20 17:09  9% [dpdk-dev] [PATCH] validate_abi: build faster by augmenting make with job count Neil Horman
2016-07-20 17:40     Thomas Monjalon
2016-07-20 19:02  9% ` [dpdk-dev] [PATCH v2] " Neil Horman
2016-07-20 20:16     [dpdk-dev] [PATCH] " Neil Horman
2016-07-20 22:32     ` Wiles, Keith
2016-07-21 13:54       ` Neil Horman
2016-07-21 14:09         ` Wiles, Keith
2016-07-21 15:06           ` Neil Horman
2016-07-21 15:22             ` Wiles, Keith
2016-07-21 18:34               ` Neil Horman
2016-07-24 18:08                 ` Wiles, Keith
2016-08-01 11:49                   ` Neil Horman
2016-08-01 16:16                     ` Wiles, Keith
2016-08-01 18:08  4%                   ` Neil Horman
2016-07-26 16:22     [dpdk-dev] [PATCH] doc: announce renaming of ethdev library Thomas Monjalon
2016-07-27 16:33     ` [dpdk-dev] " Jan Viktorin
2016-07-28  9:29  3%   ` Christian Ehrhardt
2016-07-28  9:52  0%     ` Thomas Monjalon
2016-07-27  9:15  3% [dpdk-dev] last days for deprecation notices Thomas Monjalon
2016-07-27 22:14  3% [dpdk-dev] [bug] dpdk-vfio: Invalid region/index assumption Alex Williamson
2016-07-28  6:54  0% ` Thomas Monjalon
2016-07-28  9:42       ` Burakov, Anatoly
2016-07-28 14:54  3%     ` Alex Williamson
2016-07-28  8:06  0% ` Santosh Shukla
2016-07-28  8:29  4% [dpdk-dev] removal of old deprecation notice for Chelsio filtering Thomas Monjalon
2016-07-28 10:12  0% ` Rahul Lakkireddy
2016-07-28 10:15  7% [dpdk-dev] [PATCH] doc: remove deprecation notice related to new flow types Rahul Lakkireddy
2016-07-28 12:33  3% [dpdk-dev] DPDK Stable Releases and Long Term Support Mcnamara, John
2016-08-17 12:29  5% ` Panu Matilainen
2016-08-17 13:30  0%   ` Mcnamara, John
2016-07-29 11:23  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 16.11 John McNamara
2016-07-29 12:28  1% [dpdk-dev] [PATCH] ivshmem: remove integration in dpdk David Marchand
2016-07-29 13:01  9% [dpdk-dev] [PATCH] log: remove history dump Thomas Monjalon
2016-07-29 13:50  9% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
2016-07-29 13:41  8% [dpdk-dev] [PATCH] doc: postpone mempool ABI breakage Thomas Monjalon
2016-08-03 16:46  4% ` Thomas Monjalon
2016-08-01 13:47     [dpdk-dev] [PATCH] lpm: remove redundant check when adding lpm rule Wei Dai
2016-08-02  2:09     ` [dpdk-dev] [PATCH v2] " Wei Dai
2016-08-02 16:04       ` Bruce Richardson
2016-08-02 21:36  3%     ` Thomas Monjalon
2016-08-03  9:16  4%       ` Bruce Richardson
2016-08-09  1:01  1% [dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK Jerin Jacob
2016-08-09  8:48  0% ` Bruce Richardson
2016-08-09 16:30  4% [dpdk-dev] [PATCH 1/2] lib/librte_port: modify source and sink port structure parameter Jasvinder Singh
2016-08-15 15:02     [dpdk-dev] [PATCH] mk: gcc -march support for intel processors code names Reshma Pattan
2016-08-22 14:19  7% ` [dpdk-dev] [PATCH v2] " Reshma Pattan
2016-10-10 21:33  8%   ` [dpdk-dev] [PATCH v3] " Reshma Pattan
2016-08-16 14:01     [dpdk-dev] [PATCH] Performance optimization of ACL build process Vladyslav Buslov
2016-08-16 14:01     ` [dpdk-dev] [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov
2016-08-31  1:27       ` Ananyev, Konstantin
2016-08-31  8:38         ` Vladyslav Buslov
2016-08-31  9:59  3%       ` Ananyev, Konstantin
2016-08-17 12:34  3% [dpdk-dev] Best Practices for PMD Verification before Upstream Requests Shepard Siegel
2016-08-22 13:07  0% ` Thomas Monjalon
2016-08-18 13:48  3% [dpdk-dev] [RFC PATCH 0/5] add API's for VF management Bernard Iremonger
2016-08-26  9:10  3% ` [dpdk-dev] [RFC PATCH v2 " Bernard Iremonger
2016-09-09  8:49  0%   ` Pattan, Reshma
2016-09-16 11:05  3%   ` [dpdk-dev] [PATCH v3 0/3] " Bernard Iremonger
2016-09-16 14:15  3%   ` Bernard Iremonger
2016-09-21 10:20  3%     ` [dpdk-dev] [PATCH v4 " Bernard Iremonger
2016-08-23  8:10     [dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support Yuanhan Liu
2016-08-23  8:10  3% ` [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling Yuanhan Liu
2016-08-23  9:17  0%   ` Maxime Coquelin
2016-08-24  7:26  3%   ` Xu, Qian Q
2016-08-24  7:40  0%     ` Yuanhan Liu
2016-08-24  7:36  0%       ` Xu, Qian Q
2016-09-23  4:13     ` [dpdk-dev] [PATCH v2 0/7] vhost: add dequeue zero copy support Yuanhan Liu
2016-09-23  4:13  3%   ` [dpdk-dev] [PATCH v2 1/7] vhost: simplify memory regions handling Yuanhan Liu
2016-10-09  7:27       ` [dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support Yuanhan Liu
2016-10-09  7:27  3%     ` [dpdk-dev] [PATCH v3 1/7] vhost: simplify memory regions handling Yuanhan Liu
2016-08-26 15:06 19% [dpdk-dev] [PATCH] scripts: disable optimization for ABI validation Ferruh Yigit
2016-09-15 14:23  4% ` Thomas Monjalon
2016-09-02  6:36  4% [dpdk-dev] [PATCH 1/2] net/virtio: support modern device id Jason Wang
2016-09-02  8:58  4% [dpdk-dev] [PATCH 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
2016-09-02  8:58     [dpdk-dev] [PATCH 1/2] librte_ether: ensure not overwrite device data in mp app Marcin Kerlin
2016-09-20 14:06  4% ` [dpdk-dev] [PATCH v2 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
2016-09-20 14:31  4% ` Marcin Kerlin
2016-09-14 12:15     [dpdk-dev] [PATCH] vhost: change the vhost library to a common framework which can support more VIRTIO devices Changpeng Liu
2016-09-13 12:58  3% ` Yuanhan Liu
2016-09-13 13:24  0%   ` Thomas Monjalon
2016-09-13 13:49  3%     ` Yuanhan Liu
2016-09-19 13:42  2% [dpdk-dev] [RFC 0/7] changing mbuf pool handler Olivier Matz
2016-09-22 11:52  0% ` Hemant Agrawal
2016-10-03 15:49  0%   ` Olivier Matz
2016-10-05  9:41  0%     ` Hunt, David
2016-10-05 11:49  0%       ` Hemant Agrawal
2016-10-05 13:15  0%         ` Hunt, David
2016-09-20  7:30     [dpdk-dev] [RFC] examples/ethtool: enhance ethtool app in i40e Qiming Yang
2016-09-21 10:20  3% ` Remy Horton
2016-09-20 14:31     [dpdk-dev] [PATCH v2 1/2] librte_ether: ensure not overwrite device data in mp app Marcin Kerlin
2016-09-26 14:53  4% ` [dpdk-dev] [PATCH v3 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
2016-09-23 11:22 13% [dpdk-dev] [PATCH] doc: announce ABI changes in filtering support Your Name
2016-09-23 14:47     [dpdk-dev] [PATCH] eal: check cpu flags at init Flavio Leitner
2016-09-26 15:43  3% ` Aaron Conole
2016-09-27 18:32  0%   ` Flavio Leitner
2016-09-29 20:42  0%     ` Aaron Conole
2016-10-03 14:13  0%       ` Thomas Monjalon
2016-09-26 14:53     [dpdk-dev] [PATCH v3 1/2] librte_ether: ensure not overwrite device data in mp app Marcin Kerlin
2016-09-27 10:29  4% ` [dpdk-dev] [PATCH v4 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
2016-09-27 11:13  4% ` Marcin Kerlin
2016-09-27 11:13     [dpdk-dev] [PATCH v4 1/2] librte_ether: add protection against overwrite device data Marcin Kerlin
2016-09-30 14:00  4% ` [dpdk-dev] [PATCH v5 0/2] app/testpmd: improve multiprocess support Marcin Kerlin
2016-09-30 15:03  0%   ` Pattan, Reshma
2016-10-18  7:57  0%   ` Sergio Gonzalez Monroy
2016-09-28  8:25  4% [dpdk-dev] [PATCH V2 1/2] net/virtio: support modern device id Jason Wang
2016-09-30 15:45     [dpdk-dev] [PATCH v3 0/2] add callbacks for VF management Bernard Iremonger
2016-10-04 14:52     ` [dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions Bernard Iremonger
2016-10-05 16:10  3%   ` Thomas Monjalon
2016-10-05 17:04  4%     ` Iremonger, Bernard
2016-10-05 17:19  3%       ` Thomas Monjalon
2016-10-04 14:52     [dpdk-dev] [PATCH v4 0/2] add callbacks for VF management Bernard Iremonger
2016-10-06 11:26     ` [dpdk-dev] [PATCH v5 01/13] librte_ether: modify internal callback function Bernard Iremonger
2016-10-06 12:56       ` Thomas Monjalon
2016-10-06 14:33  3%     ` Iremonger, Bernard
2016-10-06 14:56  0%       ` Thomas Monjalon
2016-10-06 15:32  0%         ` Iremonger, Bernard
2016-10-07 16:10     [dpdk-dev] [RFC v2] latencystats: added new library for latency stats Reshma Pattan
2016-10-17 13:39     ` [dpdk-dev] [RFC v3] " Reshma Pattan
2016-10-18 10:44  3%   ` Pattan, Reshma
2016-10-09  3:16 13% [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance Qiming Yang
2016-10-10  9:42  0% [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API Zhao1, Wei
2016-10-10 13:19     ` Adrien Mazarguil
2016-10-11  1:47       ` Zhao1, Wei
2016-10-11  8:21  3%     ` Adrien Mazarguil
2016-10-12  2:38  0%       ` Zhao1, Wei
