DPDK patches and discussions
 help / color / mirror / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download: 
* [dpdk-dev] [PATCH v1] doc: add guidelines on stable and lts releases
@ 2017-01-13 13:06  6% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2017-01-13 13:06 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, thomas.monjalon, John McNamara

Add document explaining the current Stable and LTS process.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---

V1: For background see previous discussions on Stable and LTS releses:

   http://dpdk.org/ml/archives/dev/2016-July/044848.html
   http://dpdk.org/ml/archives/dev/2016-June/040256.html

doc/guides/contributing/index.rst  |  1 +
 doc/guides/contributing/stable.rst | 99 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)
 create mode 100644 doc/guides/contributing/stable.rst

diff --git a/doc/guides/contributing/index.rst b/doc/guides/contributing/index.rst
index f6af317..329b678 100644
--- a/doc/guides/contributing/index.rst
+++ b/doc/guides/contributing/index.rst
@@ -10,4 +10,5 @@ Contributor's Guidelines
     versioning
     documentation
     patches
+    stable
     cheatsheet
diff --git a/doc/guides/contributing/stable.rst b/doc/guides/contributing/stable.rst
new file mode 100644
index 0000000..735e116
--- /dev/null
+++ b/doc/guides/contributing/stable.rst
@@ -0,0 +1,99 @@
+.. stable_lts_releases:
+
+DPDK Stable Releases and Long Term Support
+==========================================
+
+This section sets out the guidelines for the DPDK Stable Releases and the DPDK
+Long Term Support releases (LTS).
+
+
+Introduction
+------------
+
+The purpose of the DPDK Stable Releases is to maintain releases of DPDK with
+backported fixes over an extended period of time. This provides downstream
+consumers of DPDK with a stable target on which to base applications or
+packages.
+
+The Long Term Support release (LTS) is a designation applied to a Stable
+Release to indicate longer term support.
+
+
+Stable Releases
+---------------
+
+Any major release of DPDK can be designated as a Stable Release if a
+maintainer volunteers to maintain it.
+
+A Stable Release is used to backport fixes from an ``N`` release back to an
+``N-1`` release, for example, from 16.11 to 16.07.
+
+The duration of a stable is one complete release cycle (3 months). It can be
+longer, up to 1 year, if a maintainer continues to support the stable branch,
+or if users supply backported fixes, however the explicit commitment should be
+for one release cycle.
+
+The release cadence is determined by the maintainer based on the number of
+bugfixes and the criticality of the bugs. Releases should be coordinated with
+the validation engineers to ensure that a tagged release has been tested.
+
+
+LTS Release
+-----------
+
+A stable release can be designated as an LTS release based on community
+agreement and a commitment from a maintainer. An LTS release will have a
+maintenance duration of 2 years.
+
+The current DPDK LTS release is 16.11.
+
+It is anticipated that there will be at least 4 releases per year of the LTS
+or approximately 1 every 3 months. However, the cadence can be shorter or
+longer depending on the number and criticality of the backported
+fixes. Releases should be coordinated with the validation engineers to ensure
+that a tagged release has been tested.
+
+
+What changes should be backported
+---------------------------------
+
+Backporting should be limited to bug fixes.
+
+Features should not be backported to stable releases. It may be acceptable, in
+limited cases, to back port features for the LTS release where:
+
+* There is a justifiable use case (for example a new PMD).
+* The change is non-invasive.
+* The work of preparing the backport is done by the proposer.
+* There is support within the community.
+
+
+The Stable Mailing List
+-----------------------
+
+The Stable and LTS release are coordinated on the stable@dpdk.org mailing
+list.
+
+All fix patches to the master branch that are candidates for backporting
+should also be CCed to the `stable@dpdk.org <http://dpdk.org/ml/listinfo/stable>`_
+mailing list.
+
+
+Releasing
+---------
+
+A Stable Release will be released by:
+
+* Tagging the release with YY.MM.n (year, month, number).
+* Uploading a tarball of the release to dpdk.org.
+* Sending an announcement to the `announce@dpdk.org <http://dpdk.org/ml/listinfo/announce>`_
+  list.
+
+Stable release are available on the `dpdk.org download page <http://dpdk.org/download>`_.
+
+
+ABI
+---
+
+The Stable Release should not be seen as a way of breaking or circumventing
+the DPDK ABI policy.
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes
  2017-01-11 15:05  3% [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Bruce Richardson
                   ` (3 preceding siblings ...)
  2017-01-11 15:05  6% ` [dpdk-dev] [RFC PATCH 09/11] ring: make existing rings reuse the typed ring definitions Bruce Richardson
@ 2017-01-13 14:23  3% ` Olivier Matz
  4 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2017-01-13 14:23 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Hi Bruce,

On Wed, 11 Jan 2017 15:05:14 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> The rte_ring library in DPDK provides an excellent high-performance
> mechanism which can be used for passing pointers between cores and
> for other tasks such as buffering. However, it does have a number
> of limitations:
> 
> * type information of pointers is lost, as it works with void pointers
> * typecasting is needed when using enqueue/dequeue burst functions,
>   since arrays of other types cannot be automatically cast to void **
> * the data to be passed through the ring itself must be no bigger than
>   a pointer
> 
> While the first two limitations are an inconvenience, the final one is
> one that can prevent use of rte_rings in cases where their
> functionality is needed. The use-case which has inspired the patchset
> is that of eventdev. When working with rte_events, each event is a
> 16-byte structure consisting of a pointer and some metadata e.g.
> priority and type. For these events, what is passed around between
> cores is not pointers to events, but the events themselves. This
> makes existing rings unsuitable for use by applications working with
> rte_events, and also for use internally inside any software
> implementation of an eventdev.
> 
> For rings to handle events or other similarly sized structures, e.g.
> NIC descriptors, etc., we then have two options - duplicate rte_ring
> code to create new ring implementations for each of those types, or
> generalise the existing code using macros so that the data type
> handled by each rings is a compile time paramter. This patchset takes
> the latter approach, and once applied would allow us to add an
> rte_event_ring type to DPDK using a header file containing:
> 
> #define RING_TYPE struct rte_event
> #define RING_TYPE_NAME rte_event
> #include <rte_typed_ring.h>
> #undef RING_TYPE_NAME
> #undef RING_TYPE
> 
> [NOTE: the event_ring is not defined in this set, since it depends on
> the eventdev implementation not present in the main tree]
> 
> If we want to elimiate some of the typecasting on our code when
> enqueuing and dequeuing mbuf pointers, an rte_mbuf_ring type can be
> similarly created using the same number of lines of code.
> 
> The downside of this generalisation is that the code for the rings now
> has far more use of macros in it. However, I do not feel that overall
> readability suffers much from this change, the since the changes are
> pretty much just search-replace onces. There should also be no ABI
> compatibility issues with this change, since the existing rte_ring
> structures remain the same.

I didn't dive deeply in the patches, just had a quick look. I
understand the need, and even if I really don't like the "#define +
#include" way to create a new specific ring (for readability,
grepability), that may be a solution to your problem.

I think using a similar approach than in sys/queue.h would be even
worse in terms of readability.


What do you think about the following approach?

- add a new elt_size in rte_ring structure

- update create/enqueue/dequeue/... functions to manage the elt size

- change:
    rte_ring_enqueue_bulk(struct rte_ring *r,
      void * const *obj_table, unsigned n)
  to:
    rte_ring_enqueue_bulk(struct rte_ring *r, void *obj_table,
      unsigned n) 

  This relaxes the type for the API in the function. In the caller,
  the type of obj_table would be:
  - (void **) in case of a ring of pointers
  - (uint8_t *) in case of a ring of uint8_t
  - (struct rte_event *) in case of a ring of rte_event
  ...

  I think (I have not tested it) it won't break compilation since
  any type can be implicitly casted into a void *. Also, I'd say it
  is possible to avoid breaking the ABI.

- deprecate or forbid calls to:
    rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
    (and similar)

  Because with a ring of pointers, obj is the pointer, passed by value.
  For other types, we would need
    rte_ring_mp_enqueue(struct rte_ring *r, <TYPE> obj)

  Maybe we could consider using a macro here.


The drawbacks I see are:
- a dynamic elt_size may slightly decrease performance
- it still uses casts to (void *), so there is no type checking


Regards,
Olivier

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 1/5] ethdev: add firmware version get
  @ 2017-01-12  6:31  5%                 ` Qiming Yang
  0 siblings, 0 replies; 200+ results
From: Qiming Yang @ 2017-01-12  6:31 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: remy.horton, Qiming Yang

This patch adds a new API 'rte_eth_dev_fw_version_get' for
fetching firmware version by a given device.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
 doc/guides/nics/features/default.ini   |  1 +
 doc/guides/rel_notes/deprecation.rst   |  4 ----
 doc/guides/rel_notes/release_17_02.rst |  5 +++++
 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 25 +++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  1 +
 6 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index f1bf9bf..ae40d57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -50,6 +50,7 @@ Timesync             =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 054e2e7..755dc65 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -30,10 +30,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
-  will be extended with a new member ``fw_version`` in order to store
-  the NIC firmware version.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 5762d3f..f9134bb 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -66,6 +66,11 @@ New Features
   Support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps adapters
   has been added to the existing mlx5 PMD.
 
+* **Added firmware version get API.**
+
+  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
+  version by a given device.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 917557a..89cffcf 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1588,6 +1588,18 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
 			STAT_QMAP_RX);
 }
 
+int
+rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, size_t fw_size)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->fw_version_get, -ENOTSUP);
+	return (*dev->dev_ops->fw_version_get)(dev, fw_version, fw_size);
+}
+
 void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ded43d7..a9b3686 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1177,6 +1177,10 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
+				     char *fw_version, size_t fw_size);
+/**< @internal Get firmware information of an Ethernet device. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1459,6 +1463,7 @@ struct eth_dev_ops {
 	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
 	eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
 	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
+	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
 	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
 	/**< Get packet types supported and identified by device. */
 
@@ -2396,6 +2401,26 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
 void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
 
 /**
+ * Retrieve the firmware version of a device.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param fw_version
+ *   A array pointer to store the firmware version of a device,
+ *   allocated by caller.
+ * @param fw_size
+ *   The size of the array pointed by fw_version, which should be
+ *   large enough to store firmware version of the device.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if operation is not supported.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if *fw_size* is not enough to store firmware version.
+ */
+int rte_eth_dev_fw_version_get(uint8_t port_id,
+			       char *fw_version, size_t fw_size);
+
+/**
  * Retrieve the supported packet types of an Ethernet device.
  *
  * When a packet type is announced as supported, it *must* be recognized by
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 0c2859e..c6c9d0d 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -146,6 +146,7 @@ DPDK_17.02 {
 	global:
 
 	_rte_eth_dev_reset;
+	rte_eth_dev_fw_version_get;
 	rte_flow_create;
 	rte_flow_destroy;
 	rte_flow_flush;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v6 2/4] lib: add bitrate statistics library
    2017-01-11 16:03  2% ` [dpdk-dev] [PATCH v6 1/4] lib: add information metrics library Remy Horton
@ 2017-01-11 16:03  3% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-01-11 16:03 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a library that calculates peak and average data-rate
statistics. For ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_17_02.rst             |   6 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 131 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 11 files changed, 292 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 4a19497..6cd9896 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -601,6 +601,10 @@ M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 F: doc/guides/sample_app_ug/keep_alive.rst
 
+Bit-rate statistica
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 0eb3866..decebe5 100644
--- a/config/common_base
+++ b/config/common_base
@@ -598,3 +598,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the device metrics library
 #
 CONFIG_RTE_LIBRTE_METRICS=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 94f0f69..5e194b0 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -151,4 +151,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [Device Metrics]     (@ref rte_metrics.h),
+  [Bitrate Statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 13e0faf..ff15f5b 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -59,6 +59,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_ring \
                           lib/librte_sched \
                           lib/librte_metrics \
+                          lib/librte_bitratestats \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 82c5616..70f93e1 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,6 +40,11 @@ New Features
      intended to provide a reporting mechanism that is independent of the
      ethdev library.
 
+   * **Added bit-rate calculation library.**
+
+     A library that can be used to calculate device bit-rates. Calculated
+     bitrates are reported using the metrics library.
+
      This section is a comment. do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =========================================================
@@ -162,6 +167,7 @@ The libraries prepended with a plus sign were incremented in this version.
 .. code-block:: diff
 
      librte_acl.so.2
+   + librte_bitratestats.so.1
      librte_cfgfile.so.2
      librte_cmdline.so.2
      librte_cryptodev.so.2
diff --git a/lib/Makefile b/lib/Makefile
index 5d85dcf..e211bc0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..b725d4e
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..cb7aae4
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,131 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate_s {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates_s {
+	struct rte_stats_bitrate_s port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates_s *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates_s), 0);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data)
+{
+	const char *names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_metrics(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate_s *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Expomentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent.
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +-50 fixes integer rounding during divison */
+	if (delta > 0)
+		delta = (delta * alpha_percent + 50) / 100;
+	else
+		delta = (delta * alpha_percent - 50) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_metrics(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..bc87c5e
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates_s;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates_s *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..66f232f
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.02 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 40fcf33..6aac5ac 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 1/4] lib: add information metrics library
  @ 2017-01-11 16:03  2% ` Remy Horton
  2017-01-11 16:03  3% ` [dpdk-dev] [PATCH v6 2/4] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-01-11 16:03 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a new information metric library that allows other
modules to register named metrics and update their values. It is
intended to be independent of ethdev, rather than mixing ethdev
and non-ethdev information in xstats.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   5 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/rel_notes/release_17_02.rst     |   7 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 308 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 190 ++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 11 files changed, 584 insertions(+)
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 9645c9b..4a19497 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -596,6 +596,11 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+F: doc/guides/sample_app_ug/keep_alive.rst
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 8e9dcfa..0eb3866 100644
--- a/config/common_base
+++ b/config/common_base
@@ -593,3 +593,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 72d59b2..94f0f69 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -150,4 +150,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [Device Metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index b340fcf..13e0faf 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -58,6 +58,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_reorder \
                           lib/librte_ring \
                           lib/librte_sched \
+                          lib/librte_metrics \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 180af82..82c5616 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -34,6 +34,12 @@ New Features
 
      Refer to the previous release notes for examples.
 
+   * **Added information metric library.**
+
+     A library that allows information metrics to be added and update. It is
+     intended to provide a reporting mechanism that is independent of the
+     ethdev library.
+
      This section is a comment. do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =========================================================
@@ -171,6 +177,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_mbuf.so.2
      librte_mempool.so.2
      librte_meter.so.1
+   + librte_metrics.so.1
      librte_net.so.1
      librte_pdump.so.1
      librte_pipeline.so.3
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..5d85dcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..5edacc6
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,308 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ * @param name
+ *   Name of metric
+ * @param value
+ *   Current value for metric
+ * @param idx_next_set
+ *   Index of next root element (zero for none)
+ * @param idx_next_metric
+ *   Index of next metric in set (zero for none)
+ *
+ * Only the root of each set needs idx_next_set but since it has to be
+ * assumed that number of sets could equal total number of metrics,
+ * having a separate set metadata table doesn't save any memory.
+ */
+struct rte_metrics_meta_s {
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	uint64_t value[RTE_MAX_ETHPORTS];
+	uint64_t nonport_value;
+	uint16_t idx_next_set;
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * @param idx_last_set
+ *   Index of last metadata entry with valid data. This value is
+ *   not valid if cnt_stats is zero.
+ * @param cnt_stats
+ *   Number of metrics.
+ * @param metadata
+ *   Stat data memory block.
+ *
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	uint16_t idx_last_set;
+	uint16_t cnt_stats;
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(void)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), rte_socket_id(), 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_metric(const char *name)
+{
+	const char *list_names[] = {name};
+
+	return rte_metrics_reg_metrics(list_names, 1);
+}
+
+int
+rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_metric(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_metrics(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_metrics(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_NONPORT)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].nonport_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		if (port_id == RTE_METRICS_NONPORT)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->nonport_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..c58b366
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,190 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * RTE Metrics module
+ *
+ * Metric information is populated using a push model, where the
+ * information provider calls an update function on the relevant
+ * metrics. Currently only bulk querying of metrics is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/** Used to indicate port-independent information */
+#define RTE_METRICS_NONPORT -1
+
+
+/**
+ * Metric name
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric name.
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This only has to be explicitly called if you
+ * intend to use rte_metrics_reg_metric() or rte_metrics_reg_metrics() from a
+ * secondary process. This function must be called from a primary process.
+ */
+void rte_metrics_init(void);
+
+
+/**
+ * Register a metric
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metric(const char *name);
+
+/**
+ * Register a set of metrics
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metrics(const char **names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   Array of names to receive key names
+ *
+ * @param capacity
+ *   Space available in names
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Fetch metrics.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   Array to receive values and their keys
+ *
+ * @param capacity
+ *   Space available in values
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metric(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if upable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metrics(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..f904814
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.02 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_metric;
+	rte_metrics_reg_metrics;
+	rte_metrics_update_metric;
+	rte_metrics_update_metrics;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index f75f0e2..40fcf33 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [RFC PATCH 09/11] ring: make existing rings reuse the typed ring definitions
  2017-01-11 15:05  3% [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Bruce Richardson
                   ` (2 preceding siblings ...)
  2017-01-11 15:05 12% ` [dpdk-dev] [RFC PATCH 07/11] ring: allow multiple typed rings in the same unit Bruce Richardson
@ 2017-01-11 15:05  6% ` Bruce Richardson
  2017-01-13 14:23  3% ` [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Olivier Matz
  4 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-11 15:05 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

now that the typed rings are functional, start removing the old code from
the existing rings, so that we don't have code duplication. The first
to be removed are the structure and macro definitions which are duplicated.
This allows the typed rings and regular ring headers to co-exist, so we
can change the typed rings to use their own guard macro.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_ring/rte_ring.c       |   2 -
 lib/librte_ring/rte_ring.h       | 183 ++-------------------------------------
 lib/librte_ring/rte_typed_ring.h |   6 +-
 3 files changed, 9 insertions(+), 182 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 8ead295..b6215f6 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,8 +89,6 @@
 
 #include "rte_ring.h"
 
-TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
-
 struct rte_tailq_elem rte_ring_tailq = {
 	.name = RTE_TAILQ_RING_NAME,
 };
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e359aff..4e74efd 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -91,132 +91,13 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-
-#define RTE_TAILQ_RING_NAME "RTE_RING"
-
-enum rte_ring_queue_behavior {
-	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
-	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
-};
-
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
-#define RTE_RING_MZ_PREFIX "RG_"
-/**< The maximum length of a ring name. */
-#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
-			   sizeof(RTE_RING_MZ_PREFIX) + 1)
-
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
-struct rte_memzone; /* forward declaration, so as not to require memzone.h */
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct rte_ring {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
-
-	/** Ring producer status. */
-	struct prod {
-		uint32_t watermark;      /**< Maximum items before EDQUOT. */
-		uint32_t sp_enqueue;     /**< True, if single producer. */
-		uint32_t size;           /**< Size of ring. */
-		uint32_t mask;           /**< Mask (size-1) of ring. */
-		volatile uint32_t head;  /**< Producer head. */
-		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
-
-	/** Ring consumer status. */
-	struct cons {
-		uint32_t sc_dequeue;     /**< True, if single consumer. */
-		uint32_t size;           /**< Size of the ring. */
-		uint32_t mask;           /**< Mask (size-1) of ring. */
-		volatile uint32_t head;  /**< Consumer head. */
-		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
-
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
+#define RING_TYPE void *
+#define RING_TYPE_NAME rte_void
+#include "rte_typed_ring.h"
+#undef RING_TYPE
+#undef RING_TYPE_NAME
 
-	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
-	                                     * not volatile so need to be careful
-	                                     * about compiler re-ordering */
-};
-
-#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
-#define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
-
-/**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
+#define rte_ring rte_void_ring
 
 /**
  * Calculate the memory size needed for a ring
@@ -350,58 +231,6 @@ int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
  */
 void rte_ring_dump(FILE *f, const struct rte_ring *r);
 
-/* the actual enqueue of pointers on the ring.
- * Placed here since identical code needed in both
- * single and multi producer enqueue functions */
-#define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
-	uint32_t idx = prod_head & mask; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
-			r->ring[idx] = obj_table[i]; \
-			r->ring[idx+1] = obj_table[i+1]; \
-			r->ring[idx+2] = obj_table[i+2]; \
-			r->ring[idx+3] = obj_table[i+3]; \
-		} \
-		switch (n & 0x3) { \
-			case 3: r->ring[idx++] = obj_table[i++]; \
-			case 2: r->ring[idx++] = obj_table[i++]; \
-			case 1: r->ring[idx++] = obj_table[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			r->ring[idx] = obj_table[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			r->ring[idx] = obj_table[i]; \
-	} \
-} while(0)
-
-/* the actual copy of pointers on the ring to obj_table.
- * Placed here since identical code needed in both
- * single and multi consumer dequeue functions */
-#define DEQUEUE_PTRS() do { \
-	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
-			obj_table[i] = r->ring[idx]; \
-			obj_table[i+1] = r->ring[idx+1]; \
-			obj_table[i+2] = r->ring[idx+2]; \
-			obj_table[i+3] = r->ring[idx+3]; \
-		} \
-		switch (n & 0x3) { \
-			case 3: obj_table[i++] = r->ring[idx++]; \
-			case 2: obj_table[i++] = r->ring[idx++]; \
-			case 1: obj_table[i++] = r->ring[idx++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj_table[i] = r->ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj_table[i] = r->ring[idx]; \
-	} \
-} while (0)
-
 /**
  * @internal Enqueue several objects on the ring (multi-producers safe).
  *
diff --git a/lib/librte_ring/rte_typed_ring.h b/lib/librte_ring/rte_typed_ring.h
index 79edc65..89f6983 100644
--- a/lib/librte_ring/rte_typed_ring.h
+++ b/lib/librte_ring/rte_typed_ring.h
@@ -63,8 +63,8 @@
  *
  ***************************************************************************/
 
-#ifndef _RTE_RING_H_
-#define _RTE_RING_H_
+#ifndef _RTE_TYPED_RING_H_
+#define _RTE_TYPED_RING_H_
 
 /**
  * @file
@@ -242,7 +242,7 @@ TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
 extern struct rte_tailq_elem rte_ring_tailq;
 
-#endif /* _RTE_RING_H_ */
+#endif /* _RTE_TYPED_RING_H_ */
 
 #ifndef RING_TYPE_NAME
 #error "Need RING_TYPE_NAME defined before including"
-- 
2.9.3

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [RFC PATCH 07/11] ring: allow multiple typed rings in the same unit
  2017-01-11 15:05  3% [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Bruce Richardson
  2017-01-11 15:05  2% ` [dpdk-dev] [RFC PATCH 01/11] ring: add new typed ring header file Bruce Richardson
  2017-01-11 15:05  1% ` [dpdk-dev] [RFC PATCH 05/11] ring: add user-specified typing to typed rings Bruce Richardson
@ 2017-01-11 15:05 12% ` Bruce Richardson
  2017-01-11 15:05  6% ` [dpdk-dev] [RFC PATCH 09/11] ring: make existing rings reuse the typed ring definitions Bruce Richardson
  2017-01-13 14:23  3% ` [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Olivier Matz
  4 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-11 15:05 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

allow the typed ring header file to be included multiple times inside a
C file so that we can have multiple different ring types in use. This
is tested by having a second ring type in the unit tests, which works
with small (16 byte) structures, rather than just pointers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_typed_ring.c       |  60 +++++++++++++++-
 lib/librte_ring/rte_typed_ring.h | 144 ++++++++++++++++++++-------------------
 2 files changed, 132 insertions(+), 72 deletions(-)

diff --git a/app/test/test_typed_ring.c b/app/test/test_typed_ring.c
index aaef023..b403af5 100644
--- a/app/test/test_typed_ring.c
+++ b/app/test/test_typed_ring.c
@@ -37,6 +37,19 @@
 #define RING_TYPE struct rte_mbuf *
 #define RING_TYPE_NAME rte_mbuf
 #include <rte_typed_ring.h>
+#undef RING_TYPE_NAME
+#undef RING_TYPE
+
+struct xyval {
+	uint64_t x;
+	void *y;
+};
+
+#define RING_TYPE struct xyval /* structure not pointer */
+#define RING_TYPE_NAME rte_xyval
+#include <rte_typed_ring.h>
+#undef RING_TYPE_NAME
+#undef RING_TYPE
 
 #define RING_SIZE 256
 #define BURST_SZ 32
@@ -73,6 +86,38 @@ test_mbuf_enqueue_dequeue(struct rte_mbuf_ring *r)
 	return 0;
 }
 
+static int
+test_xyval_enqueue_dequeue(struct rte_xyval_ring *r)
+{
+	struct xyval inbufs[BURST_SZ];
+	struct xyval outbufs[BURST_SZ];
+	unsigned int i, j;
+
+	for (i = 0; i < BURST_SZ; i++)
+		inbufs[i].x = rte_rand();
+
+	for (i = 0; i < ITERATIONS; i++) {
+		uint16_t in = rte_xyval_ring_enqueue_burst(r, inbufs, BURST_SZ);
+		if (in != BURST_SZ) {
+			printf("Error enqueuing xyvals\n");
+			return -1;
+		}
+		uint16_t out = rte_xyval_ring_dequeue_burst(r, outbufs, BURST_SZ);
+		if (out != BURST_SZ) {
+			printf("Error dequeuing xyvals\n");
+			return -1;
+		}
+
+		for (j = 0; j < BURST_SZ; j++)
+			if (outbufs[j].x != inbufs[j].x ||
+					outbufs[j].y != inbufs[j].y) {
+				printf("Error: dequeued val != enqueued val\n");
+				return -1;
+			}
+	}
+	return 0;
+}
+
 /**
  * test entry point
  */
@@ -87,13 +132,24 @@ test_typed_ring(void)
 		return -1;
 	}
 	rte_mbuf_ring_list_dump(stdout);
-
+	printf("mbuf ring has memory size %u\n",
+			(unsigned int)rte_mbuf_ring_get_memsize(RING_SIZE));
 	if (test_mbuf_enqueue_dequeue(r) != 0) {
 		rte_mbuf_ring_free(r);
 		return -1;
 	}
-
 	rte_mbuf_ring_free(r);
+
+	struct rte_xyval_ring *r2;
+	r2 = rte_xyval_ring_create("xyval_ring", RING_SIZE, rte_socket_id(),
+			RING_F_SP_ENQ | RING_F_SC_DEQ);
+	if (test_xyval_enqueue_dequeue(r2) != 0) {
+		rte_xyval_ring_free(r2);
+		return -1;
+	}
+	printf("xyval ring has memory size %u\n",
+			(unsigned int)rte_xyval_ring_get_memsize(RING_SIZE));
+	rte_xyval_ring_free(r2);
 	return 0;
 }
 
diff --git a/lib/librte_ring/rte_typed_ring.h b/lib/librte_ring/rte_typed_ring.h
index 3f7514f..79edc65 100644
--- a/lib/librte_ring/rte_typed_ring.h
+++ b/lib/librte_ring/rte_typed_ring.h
@@ -114,14 +114,6 @@ extern "C" {
 #define _CAT(a, b) a ## _ ## b
 #define CAT(a, b) _CAT(a, b)
 
-#ifndef RING_TYPE_NAME
-#error "Need RING_TYPE_NAME defined before including"
-#endif
-#ifndef RING_TYPE
-#error "Need RING_TYPE defined before including"
-#endif
-#define TYPE(x) CAT(RING_TYPE_NAME, x)
-
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
 enum rte_ring_queue_behavior {
@@ -160,63 +152,7 @@ struct rte_ring_debug_stats {
 #define RTE_RING_PAUSE_REP_COUNT 0
 #endif
 
-struct rte_memzone; /* forward declaration, so as not to require memzone.h */
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct TYPE(ring) {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the ring */
-
-	/** Ring producer status. */
-	struct prod {
-		uint32_t watermark;      /**< Maximum items before EDQUOT. */
-		uint32_t sp_enqueue;     /**< True, if single producer. */
-		uint32_t size;           /**< Size of ring. */
-		uint32_t mask;           /**< Mask (size-1) of ring. */
-		volatile uint32_t head;  /**< Producer head. */
-		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
-
-	/** Ring consumer status. */
-	struct cons {
-		uint32_t sc_dequeue;     /**< True, if single consumer. */
-		uint32_t size;           /**< Size of the ring. */
-		uint32_t mask;           /**< Mask (size-1) of ring. */
-		volatile uint32_t head;  /**< Consumer head. */
-		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
-
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
-	/**
-	 * Memory space of ring starts here.
-	 * not volatile so need to be careful
-	 * about compiler re-ordering
-	 */
-	RING_TYPE ring[] __rte_cache_aligned;
-};
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
 #define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
@@ -304,6 +240,77 @@ struct TYPE(ring) {
 	} \
 } while (0)
 
+extern struct rte_tailq_elem rte_ring_tailq;
+
+#endif /* _RTE_RING_H_ */
+
+#ifndef RING_TYPE_NAME
+#error "Need RING_TYPE_NAME defined before including"
+#endif
+#ifndef RING_TYPE
+#error "Need RING_TYPE defined before including"
+#endif
+#define TYPE(x) CAT(RING_TYPE_NAME, x)
+
+struct rte_memzone; /* forward declaration, so as not to require memzone.h */
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these index is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between 2 index
+ * values in a modulo-32bit base: that's why the overflow of the indexes is not
+ * a problem.
+ */
+struct TYPE(ring) {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
+	int flags;                       /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the ring */
+
+	/** Ring producer status. */
+	struct {
+		uint32_t watermark;      /**< Maximum items before EDQUOT. */
+		uint32_t sp_enqueue;     /**< True, if single producer. */
+		uint32_t size;           /**< Size of ring. */
+		uint32_t mask;           /**< Mask (size-1) of ring. */
+		volatile uint32_t head;  /**< Producer head. */
+		volatile uint32_t tail;  /**< Producer tail. */
+	} prod __rte_cache_aligned;
+
+	/** Ring consumer status. */
+	struct {
+		uint32_t sc_dequeue;     /**< True, if single consumer. */
+		uint32_t size;           /**< Size of the ring. */
+		uint32_t mask;           /**< Mask (size-1) of ring. */
+		volatile uint32_t head;  /**< Consumer head. */
+		volatile uint32_t tail;  /**< Consumer tail. */
+#ifdef RTE_RING_SPLIT_PROD_CONS
+	} cons __rte_cache_aligned;
+#else
+	} cons;
+#endif
+
+#ifdef RTE_LIBRTE_RING_DEBUG
+	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
+#endif
+
+	/**
+	 * Memory space of ring starts here.
+	 * not volatile so need to be careful
+	 * about compiler re-ordering
+	 */
+	RING_TYPE ring[] __rte_cache_aligned;
+};
+
+
 /**
  * @internal Enqueue several objects on the ring (multi-producers safe).
  *
@@ -1146,10 +1153,6 @@ TYPE(ring_dequeue_burst)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned in
 		return TYPE(ring_mc_dequeue_burst)(r, obj_table, n);
 }
 
-TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
-
-extern struct rte_tailq_elem rte_ring_tailq;
-
 /**
  * Calculate the memory size needed for a ring
  *
@@ -1559,8 +1562,9 @@ TYPE(ring_lookup)(const char *name)
 	return r;
 }
 
+#undef TYPE
+
 #ifdef __cplusplus
 }
 #endif
 
-#endif /* _RTE_RING_H_ */
-- 
2.9.3

^ permalink raw reply	[relevance 12%]

* [dpdk-dev] [RFC PATCH 05/11] ring: add user-specified typing to typed rings
  2017-01-11 15:05  3% [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Bruce Richardson
  2017-01-11 15:05  2% ` [dpdk-dev] [RFC PATCH 01/11] ring: add new typed ring header file Bruce Richardson
@ 2017-01-11 15:05  1% ` Bruce Richardson
  2017-01-11 15:05 12% ` [dpdk-dev] [RFC PATCH 07/11] ring: allow multiple typed rings in the same unit Bruce Richardson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-11 15:05 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Make the typed ring header as the name suggests, with rings created based
on a user-defined type. This is done by using macros to create all
functions in the file. For now, the file can still only be included once
so only one type of ring can be used by a single C file, but that will be
fixed in a later commit. Test this flexibility by defining a ring which
works with mbuf pointers instead of void pointers. The only difference here
is in usability, in that no casts are necessary to pass an array of mbuf
pointers to ring enqueue/dequeue. [While any other pointer type can be
passed without cast to "void *", the same does not hold true for "void **"]

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_typed_ring.c       |  56 +++++++++-
 lib/librte_ring/rte_typed_ring.h | 227 ++++++++++++++++++++-------------------
 2 files changed, 173 insertions(+), 110 deletions(-)

diff --git a/app/test/test_typed_ring.c b/app/test/test_typed_ring.c
index fbfb820..aaef023 100644
--- a/app/test/test_typed_ring.c
+++ b/app/test/test_typed_ring.c
@@ -31,15 +31,69 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <rte_typed_ring.h>
+#include <rte_random.h>
 #include "test.h"
 
+#define RING_TYPE struct rte_mbuf *
+#define RING_TYPE_NAME rte_mbuf
+#include <rte_typed_ring.h>
+
+#define RING_SIZE 256
+#define BURST_SZ 32
+#define ITERATIONS (RING_SIZE * 2)
+
+static int
+test_mbuf_enqueue_dequeue(struct rte_mbuf_ring *r)
+{
+	struct rte_mbuf *inbufs[BURST_SZ];
+	struct rte_mbuf *outbufs[BURST_SZ];
+	unsigned int i, j;
+
+	for (i = 0; i < BURST_SZ; i++)
+		inbufs[i] = (void *)((uintptr_t)rte_rand());
+
+	for (i = 0; i < ITERATIONS; i++) {
+		uint16_t in = rte_mbuf_ring_enqueue_burst(r, inbufs, BURST_SZ);
+		if (in != BURST_SZ) {
+			printf("Error enqueuing mbuf ptrs\n");
+			return -1;
+		}
+		uint16_t out = rte_mbuf_ring_dequeue_burst(r, outbufs, BURST_SZ);
+		if (out != BURST_SZ) {
+			printf("Error dequeuing mbuf ptrs\n");
+			return -1;
+		}
+
+		for (j = 0; j < BURST_SZ; j++)
+			if (outbufs[j] != inbufs[j]) {
+				printf("Error: dequeued ptr != enqueued ptr\n");
+				return -1;
+			}
+	}
+	return 0;
+}
+
 /**
  * test entry point
  */
 static int
 test_typed_ring(void)
 {
+	struct rte_mbuf_ring *r;
+	r = rte_mbuf_ring_create("Test_mbuf_ring", RING_SIZE, rte_socket_id(),
+			RING_F_SP_ENQ | RING_F_SC_DEQ);
+	if (r == NULL) {
+		fprintf(stderr, "ln %d: Error creating mbuf ring\n", __LINE__);
+		return -1;
+	}
+	rte_mbuf_ring_list_dump(stdout);
+
+	if (test_mbuf_enqueue_dequeue(r) != 0) {
+		rte_mbuf_ring_free(r);
+		return -1;
+	}
+
+	rte_mbuf_ring_free(r);
 	return 0;
 }
 
diff --git a/lib/librte_ring/rte_typed_ring.h b/lib/librte_ring/rte_typed_ring.h
index 5a14403..03a9bd7 100644
--- a/lib/librte_ring/rte_typed_ring.h
+++ b/lib/librte_ring/rte_typed_ring.h
@@ -111,6 +111,17 @@ extern "C" {
 #include <rte_rwlock.h>
 #include <rte_eal_memconfig.h>
 
+#define _CAT(a, b) a ## _ ## b
+#define CAT(a, b) _CAT(a, b)
+
+#ifndef RING_TYPE_NAME
+#error "Need RING_TYPE_NAME defined before including"
+#endif
+#ifndef RING_TYPE
+#error "Need RING_TYPE defined before including"
+#endif
+#define TYPE(x) CAT(RING_TYPE_NAME, x)
+
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
 enum rte_ring_queue_behavior {
@@ -161,7 +172,7 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
  * values in a modulo-32bit base: that's why the overflow of the indexes is not
  * a problem.
  */
-struct rte_ring {
+struct TYPE(ring) {
 	/*
 	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
 	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
@@ -170,7 +181,7 @@ struct rte_ring {
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
 	int flags;                       /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
+			/**< Memzone, if any, containing the ring */
 
 	/** Ring producer status. */
 	struct prod {
@@ -204,7 +215,7 @@ struct rte_ring {
 	 * not volatile so need to be careful
 	 * about compiler re-ordering
 	 */
-	void *ring[] __rte_cache_aligned;
+	RING_TYPE ring[] __rte_cache_aligned;
 };
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
@@ -302,7 +313,7 @@ struct rte_ring {
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @param behavior
@@ -319,7 +330,7 @@ struct rte_ring {
  *   - n: Actual number of objects enqueued.
  */
 static inline int __attribute__((always_inline))
-__rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+TYPE(__ring_mp_do_enqueue)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 			 unsigned int n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t prod_head, prod_next;
@@ -412,7 +423,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @param behavior
@@ -429,7 +440,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects enqueued.
  */
 static inline int __attribute__((always_inline))
-__rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+TYPE(__ring_sp_do_enqueue)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 			 unsigned int n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t prod_head, cons_tail;
@@ -495,7 +506,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @param behavior
@@ -512,7 +523,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  */
 
 static inline int __attribute__((always_inline))
-__rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
+TYPE(__ring_mc_do_dequeue)(struct TYPE(ring) *r, RING_TYPE *obj_table,
 		 unsigned int n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t cons_head, prod_tail;
@@ -597,7 +608,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @param behavior
@@ -613,7 +624,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   - n: Actual number of objects dequeued.
  */
 static inline int __attribute__((always_inline))
-__rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
+TYPE(__ring_sc_do_dequeue)(struct TYPE(ring) *r, RING_TYPE *obj_table,
 		 unsigned int n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t cons_head, prod_tail;
@@ -665,7 +676,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
@@ -675,10 +686,10 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+TYPE(ring_mp_enqueue_bulk)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 			 unsigned int n)
 {
-	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return TYPE(__ring_mp_do_enqueue)(r, obj_table, n, RTE_RING_QUEUE_FIXED);
 }
 
 /**
@@ -687,7 +698,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
@@ -697,10 +708,10 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+TYPE(ring_sp_enqueue_bulk)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 			 unsigned int n)
 {
-	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return TYPE(__ring_sp_do_enqueue)(r, obj_table, n, RTE_RING_QUEUE_FIXED);
 }
 
 /**
@@ -713,7 +724,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
@@ -723,13 +734,13 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+TYPE(ring_enqueue_bulk)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 		      unsigned int n)
 {
 	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue_bulk(r, obj_table, n);
+		return TYPE(ring_sp_enqueue_bulk)(r, obj_table, n);
 	else
-		return rte_ring_mp_enqueue_bulk(r, obj_table, n);
+		return TYPE(ring_mp_enqueue_bulk)(r, obj_table, n);
 }
 
 /**
@@ -749,9 +760,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
+TYPE(ring_mp_enqueue)(struct TYPE(ring) *r, RING_TYPE obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+	return TYPE(ring_mp_enqueue_bulk)(r, &obj, 1);
 }
 
 /**
@@ -768,9 +779,9 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
+TYPE(ring_sp_enqueue)(struct TYPE(ring) *r, RING_TYPE obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+	return TYPE(ring_sp_enqueue_bulk)(r, &obj, 1);
 }
 
 /**
@@ -791,12 +802,12 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_enqueue(struct rte_ring *r, void *obj)
+TYPE(ring_enqueue)(struct TYPE(ring) *r, RING_TYPE obj)
 {
 	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
+		return TYPE(ring_sp_enqueue(r, obj));
 	else
-		return rte_ring_mp_enqueue(r, obj);
+		return TYPE(ring_mp_enqueue(r, obj));
 }
 
 /**
@@ -808,7 +819,7 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
@@ -817,9 +828,9 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  *     dequeued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
+TYPE(ring_mc_dequeue_bulk)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned int n)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return TYPE(__ring_mc_do_dequeue)(r, obj_table, n, RTE_RING_QUEUE_FIXED);
 }
 
 /**
@@ -828,7 +839,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table,
  *   must be strictly positive.
@@ -838,9 +849,9 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
  *     dequeued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
+TYPE(ring_sc_dequeue_bulk)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned int n)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return TYPE(__ring_sc_do_dequeue)(r, obj_table, n, RTE_RING_QUEUE_FIXED);
 }
 
 /**
@@ -853,7 +864,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
@@ -862,12 +873,12 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
  *     dequeued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
+TYPE(ring_dequeue_bulk)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned int n)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+		return TYPE(ring_sc_dequeue_bulk)(r, obj_table, n);
 	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+		return TYPE(ring_mc_dequeue_bulk)(r, obj_table, n);
 }
 
 /**
@@ -879,16 +890,16 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
+ *   A pointer to a RING_TYPE pointer (object) that will be filled.
  * @return
  *   - 0: Success; objects dequeued.
  *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
  *     dequeued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
+TYPE(ring_mc_dequeue)(struct TYPE(ring) *r, RING_TYPE *obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+	return TYPE(ring_mc_dequeue_bulk)(r, obj_p, 1);
 }
 
 /**
@@ -897,16 +908,16 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
+ *   A pointer to a RING_TYPE pointer (object) that will be filled.
  * @return
  *   - 0: Success; objects dequeued.
  *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
  *     dequeued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
+TYPE(ring_sc_dequeue)(struct TYPE(ring) *r, RING_TYPE *obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+	return TYPE(ring_sc_dequeue_bulk)(r, obj_p, 1);
 }
 
 /**
@@ -919,19 +930,19 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
+ *   A pointer to a RING_TYPE pointer (object) that will be filled.
  * @return
  *   - 0: Success, objects dequeued.
  *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
  *     dequeued.
  */
 static inline int __attribute__((always_inline))
-rte_ring_dequeue(struct rte_ring *r, void **obj_p)
+TYPE(ring_dequeue)(struct TYPE(ring) *r, RING_TYPE *obj_p)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
+		return TYPE(ring_sc_dequeue)(r, obj_p);
 	else
-		return rte_ring_mc_dequeue(r, obj_p);
+		return TYPE(ring_mc_dequeue)(r, obj_p);
 }
 
 /**
@@ -944,7 +955,7 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
  *   - 0: The ring is not full.
  */
 static inline int
-rte_ring_full(const struct rte_ring *r)
+TYPE(ring_full)(const struct TYPE(ring) *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -961,7 +972,7 @@ rte_ring_full(const struct rte_ring *r)
  *   - 0: The ring is not empty.
  */
 static inline int
-rte_ring_empty(const struct rte_ring *r)
+TYPE(ring_empty)(const struct TYPE(ring) *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -977,7 +988,7 @@ rte_ring_empty(const struct rte_ring *r)
  *   The number of entries in the ring.
  */
 static inline unsigned
-rte_ring_count(const struct rte_ring *r)
+TYPE(ring_count)(const struct TYPE(ring) *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -993,7 +1004,7 @@ rte_ring_count(const struct rte_ring *r)
  *   The number of free entries in the ring.
  */
 static inline unsigned
-rte_ring_free_count(const struct rte_ring *r)
+TYPE(ring_free_count)(const struct TYPE(ring) *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -1009,17 +1020,17 @@ rte_ring_free_count(const struct rte_ring *r)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - n: Actual number of objects enqueued.
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+TYPE(ring_mp_enqueue_burst)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 			 unsigned int n)
 {
-	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return TYPE(__ring_mp_do_enqueue)(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
 }
 
 /**
@@ -1028,17 +1039,17 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - n: Actual number of objects enqueued.
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+TYPE(ring_sp_enqueue_burst)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 			 unsigned int n)
 {
-	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return TYPE(__ring_sp_do_enqueue)(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
 }
 
 /**
@@ -1051,20 +1062,20 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects).
+ *   A pointer to a table of RING_TYPE pointers (objects).
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - n: Actual number of objects enqueued.
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+TYPE(ring_enqueue_burst)(struct TYPE(ring) *r, RING_TYPE const *obj_table,
 		      unsigned int n)
 {
 	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue_burst(r, obj_table, n);
+		return TYPE(ring_sp_enqueue_burst)(r, obj_table, n);
 	else
-		return rte_ring_mp_enqueue_burst(r, obj_table, n);
+		return TYPE(ring_mp_enqueue_burst)(r, obj_table, n);
 }
 
 /**
@@ -1078,16 +1089,16 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
+TYPE(ring_mc_dequeue_burst)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned int n)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return TYPE(__ring_mc_do_dequeue)(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
 }
 
 /**
@@ -1098,16 +1109,16 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
+TYPE(ring_sc_dequeue_burst)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned int n)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return TYPE(__ring_sc_do_dequeue)(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
 }
 
 /**
@@ -1120,27 +1131,24 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
  * @param r
  *   A pointer to the ring structure.
  * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
+ *   A pointer to a table of RING_TYPE pointers (objects) that will be filled.
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
  *   - Number of objects dequeued
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
+TYPE(ring_dequeue_burst)(struct TYPE(ring) *r, RING_TYPE *obj_table, unsigned int n)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+		return TYPE(ring_sc_dequeue_burst)(r, obj_table, n);
 	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+		return TYPE(ring_mc_dequeue_burst)(r, obj_table, n);
 }
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
-static struct rte_tailq_elem rte_ring_tailq = {
-	.name = RTE_TAILQ_RING_NAME,
-};
-EAL_REGISTER_TAILQ(rte_ring_tailq)
+extern struct rte_tailq_elem rte_ring_tailq;
 
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
@@ -1150,7 +1158,7 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
  *
  * This function returns the number of bytes needed for a ring, given
  * the number of elements in it. This value is the sum of the size of
- * the structure rte_ring and the size of the memory needed by the
+ * the ring structure and the size of the memory needed by the
  * objects pointers. The value is aligned to a cache line size.
  *
  * @param count
@@ -1160,7 +1168,7 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
  *   - -EINVAL if count is not a power of 2.
  */
 static inline ssize_t
-rte_ring_get_memsize(unsigned int count)
+TYPE(ring_get_memsize)(unsigned int count)
 {
 	ssize_t sz;
 
@@ -1172,7 +1180,7 @@ rte_ring_get_memsize(unsigned int count)
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct TYPE(ring)) + count * sizeof(RING_TYPE);
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
@@ -1182,7 +1190,7 @@ rte_ring_get_memsize(unsigned int count)
  *
  * Initialize a ring structure in memory pointed by "r". The size of the
  * memory area must be large enough to store the ring structure and the
- * object table. It is advised to use rte_ring_get_memsize() to get the
+ * object table. It is advised to use ring_get_memsize() to get the
  * appropriate size.
  *
  * The ring size is set to *count*, which must be a power of two. Water
@@ -1203,33 +1211,33 @@ rte_ring_get_memsize(unsigned int count)
  * @param flags
  *   An OR of the following:
  *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      using ``enqueue()`` or ``enqueue_bulk()``
  *      is "single-producer". Otherwise, it is "multi-producers".
  *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      using ``dequeue()`` or ``dequeue_bulk()``
  *      is "single-consumer". Otherwise, it is "multi-consumers".
  * @return
  *   0 on success, or a negative value on error.
  */
 static inline int
-rte_ring_init(struct rte_ring *r, const char *name, unsigned int count,
+TYPE(ring_init)(struct TYPE(ring) *r, const char *name, unsigned int count,
 	unsigned int flags)
 {
 	int ret;
 
 	/* compilation-time checks */
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
+	RTE_BUILD_BUG_ON((sizeof(struct TYPE(ring)) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_RING_SPLIT_PROD_CONS
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
+	RTE_BUILD_BUG_ON((offsetof(struct TYPE(ring), cons) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #endif
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
+	RTE_BUILD_BUG_ON((offsetof(struct TYPE(ring), prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
+	RTE_BUILD_BUG_ON((sizeof(struct TYPE(ring_debug_stats) &
 			  RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
+	RTE_BUILD_BUG_ON((offsetof(struct TYPE(ring), stats) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #endif
 
@@ -1254,7 +1262,7 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned int count,
  * Create a new ring named *name* in memory.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Then it
- * calls rte_ring_init() to initialize an empty ring.
+ * calls ring_init() to initialize an empty ring.
  *
  * The new ring size is set to *count*, which must be a power of
  * two. Water marking is disabled by default. The real usable ring size
@@ -1274,10 +1282,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned int count,
  * @param flags
  *   An OR of the following:
  *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      using ``enqueue()`` or ``enqueue_bulk()``
  *      is "single-producer". Otherwise, it is "multi-producers".
  *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      using ``dequeue()`` or ``dequeue_bulk()``
  *      is "single-consumer". Otherwise, it is "multi-consumers".
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
@@ -1289,12 +1297,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned int count,
  *    - EEXIST - a memzone with the same name already exists
  *    - ENOMEM - no appropriate memory area found in which to create memzone
  */
-static inline struct rte_ring *
-rte_ring_create(const char *name, unsigned int count, int socket_id,
+static inline struct TYPE(ring) *
+TYPE(ring_create)(const char *name, unsigned int count, int socket_id,
 		unsigned int flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_ring *r;
+	struct TYPE(ring) *r;
 	struct rte_tailq_entry *te;
 	const struct rte_memzone *mz;
 	ssize_t ring_size;
@@ -1304,7 +1312,7 @@ rte_ring_create(const char *name, unsigned int count, int socket_id,
 
 	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = TYPE(ring_get_memsize)(count);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -1334,7 +1342,7 @@ rte_ring_create(const char *name, unsigned int count, int socket_id,
 	if (mz != NULL) {
 		r = mz->addr;
 		/* no need to check return value here, checked the args above */
-		rte_ring_init(r, name, count, flags);
+		TYPE(ring_init)(r, name, count, flags);
 
 		te->data = (void *) r;
 		r->memzone = mz;
@@ -1357,7 +1365,7 @@ rte_ring_create(const char *name, unsigned int count, int socket_id,
  *   Ring to free
  */
 static inline void
-rte_ring_free(struct rte_ring *r)
+TYPE(ring_free)(struct TYPE(ring) *r)
 {
 	struct rte_ring_list *ring_list = NULL;
 	struct rte_tailq_entry *te;
@@ -1366,11 +1374,12 @@ rte_ring_free(struct rte_ring *r)
 		return;
 
 	/*
-	 * Ring was not created with rte_ring_create,
+	 * Ring was not created with create,
 	 * therefore, there is no memzone to free.
 	 */
 	if (r->memzone == NULL) {
-		RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create()");
+		RTE_LOG(ERR, RING,
+			"Cannot free ring (not created with create())\n");
 		return;
 	}
 
@@ -1419,7 +1428,7 @@ rte_ring_free(struct rte_ring *r)
  *   - -EINVAL: Invalid water mark value.
  */
 static inline int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned int count)
+TYPE(ring_set_water_mark)(struct TYPE(ring) *r, unsigned int count)
 {
 	if (count >= r->prod.size)
 		return -EINVAL;
@@ -1441,7 +1450,7 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned int count)
  *   A pointer to the ring structure.
  */
 static inline void
-rte_ring_dump(FILE *f, const struct rte_ring *r)
+TYPE(ring_dump)(FILE *f, const struct TYPE(ring) *r)
 {
 #ifdef RTE_LIBRTE_RING_DEBUG
 	struct rte_ring_debug_stats sum;
@@ -1455,8 +1464,8 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
-	fprintf(f, "  used=%u\n", rte_ring_count(r));
-	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
+	fprintf(f, "  used=%u\n", TYPE(ring_count)(r));
+	fprintf(f, "  avail=%u\n", TYPE(ring_free_count)(r));
 	if (r->prod.watermark == r->prod.size)
 		fprintf(f, "  watermark=0\n");
 	else
@@ -1500,7 +1509,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
  *   A pointer to a file for output
  */
 static inline void
-rte_ring_list_dump(FILE *f)
+TYPE(ring_list_dump)(FILE *f)
 {
 	const struct rte_tailq_entry *te;
 	struct rte_ring_list *ring_list;
@@ -1510,7 +1519,7 @@ rte_ring_list_dump(FILE *f)
 	rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
 
 	TAILQ_FOREACH(te, ring_list, next) {
-		rte_ring_dump(f, (struct rte_ring *) te->data);
+		TYPE(ring_dump)(f, (struct TYPE(ring) *) te->data);
 	}
 
 	rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -1526,11 +1535,11 @@ rte_ring_list_dump(FILE *f)
  *   with rte_errno set appropriately. Possible rte_errno values include:
  *    - ENOENT - required entry not available to return.
  */
-static inline struct rte_ring *
-rte_ring_lookup(const char *name)
+static inline struct TYPE(ring) *
+TYPE(ring_lookup)(const char *name)
 {
 	struct rte_tailq_entry *te;
-	struct rte_ring *r = NULL;
+	struct TYPE(ring) *r = NULL;
 	struct rte_ring_list *ring_list;
 
 	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
@@ -1538,7 +1547,7 @@ rte_ring_lookup(const char *name)
 	rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
 
 	TAILQ_FOREACH(te, ring_list, next) {
-		r = (struct rte_ring *) te->data;
+		r = (struct TYPE(ring) *) te->data;
 		if (strncmp(name, r->name, RTE_RING_NAMESIZE) == 0)
 			break;
 	}
-- 
2.9.3

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [RFC PATCH 01/11] ring: add new typed ring header file
  2017-01-11 15:05  3% [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Bruce Richardson
@ 2017-01-11 15:05  2% ` Bruce Richardson
  2017-01-11 15:05  1% ` [dpdk-dev] [RFC PATCH 05/11] ring: add user-specified typing to typed rings Bruce Richardson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-11 15:05 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

initially this is a clone of rte_ring.h with checkpatch errors/warnings
fixed, but will be modified by later commits to be a generic ring
implementation.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_ring/Makefile         |    1 +
 lib/librte_ring/rte_typed_ring.h | 1285 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 1286 insertions(+)
 create mode 100644 lib/librte_ring/rte_typed_ring.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 4b1112e..3aa494c 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -45,6 +45,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include += rte_typed_ring.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal
 
diff --git a/lib/librte_ring/rte_typed_ring.h b/lib/librte_ring/rte_typed_ring.h
new file mode 100644
index 0000000..18cc6fe
--- /dev/null
+++ b/lib/librte_ring/rte_typed_ring.h
@@ -0,0 +1,1285 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Derived from FreeBSD's bufring.h
+ *
+ **************************************************************************
+ *
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *    this list of conditions and the following disclaimer.
+ *
+ * 2. The name of Kip Macy nor the names of other
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ ***************************************************************************/
+
+#ifndef _RTE_RING_H_
+#define _RTE_RING_H_
+
+/**
+ * @file
+ * RTE Ring
+ *
+ * The Ring Manager is a fixed-size queue, implemented as a table of
+ * pointers. Head and tail pointers are modified atomically, allowing
+ * concurrent access to it. It has the following features:
+ *
+ * - FIFO (First In First Out)
+ * - Maximum size is fixed; the pointers are stored in a table.
+ * - Lockless implementation.
+ * - Multi- or single-consumer dequeue.
+ * - Multi- or single-producer enqueue.
+ * - Bulk dequeue.
+ * - Bulk enqueue.
+ *
+ * Note: the ring implementation is not preemptible. A lcore must not
+ * be interrupted by another task that uses the same ring.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+
+#define RTE_TAILQ_RING_NAME "RTE_RING"
+
+enum rte_ring_queue_behavior {
+	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
+	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
+};
+
+#ifdef RTE_LIBRTE_RING_DEBUG
+/**
+ * A structure that stores the ring statistics (per-lcore).
+ */
+struct rte_ring_debug_stats {
+	uint64_t enq_success_bulk; /**< Successful enqueues number. */
+	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
+	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
+	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
+	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
+	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
+	uint64_t deq_success_bulk; /**< Successful dequeues number. */
+	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
+	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
+	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
+} __rte_cache_aligned;
+#endif
+
+#define RTE_RING_MZ_PREFIX "RG_"
+/**< The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
+
+#ifndef RTE_RING_PAUSE_REP_COUNT
+/**
+ * Yield after pause num of times, no yield
+ * if RTE_RING_PAUSE_REP not defined.
+ */
+#define RTE_RING_PAUSE_REP_COUNT 0
+#endif
+
+struct rte_memzone; /* forward declaration, so as not to require memzone.h */
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these index is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between 2 index
+ * values in a modulo-32bit base: that's why the overflow of the indexes is not
+ * a problem.
+ */
+struct rte_ring {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
+	int flags;                       /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the rte_ring */
+
+	/** Ring producer status. */
+	struct prod {
+		uint32_t watermark;      /**< Maximum items before EDQUOT. */
+		uint32_t sp_enqueue;     /**< True, if single producer. */
+		uint32_t size;           /**< Size of ring. */
+		uint32_t mask;           /**< Mask (size-1) of ring. */
+		volatile uint32_t head;  /**< Producer head. */
+		volatile uint32_t tail;  /**< Producer tail. */
+	} prod __rte_cache_aligned;
+
+	/** Ring consumer status. */
+	struct cons {
+		uint32_t sc_dequeue;     /**< True, if single consumer. */
+		uint32_t size;           /**< Size of the ring. */
+		uint32_t mask;           /**< Mask (size-1) of ring. */
+		volatile uint32_t head;  /**< Consumer head. */
+		volatile uint32_t tail;  /**< Consumer tail. */
+#ifdef RTE_RING_SPLIT_PROD_CONS
+	} cons __rte_cache_aligned;
+#else
+	} cons;
+#endif
+
+#ifdef RTE_LIBRTE_RING_DEBUG
+	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
+#endif
+
+	/**
+	 * Memory space of ring starts here.
+	 * not volatile so need to be careful
+	 * about compiler re-ordering
+	 */
+	void *ring[] __rte_cache_aligned;
+};
+
+#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
+#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
+#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
+#define RTE_RING_SZ_MASK  (unsigned int)(0x0fffffff) /**< Ring size mask */
+
+/**
+ * @internal When debug is enabled, store ring statistics.
+ * @param r
+ *   A pointer to the ring.
+ * @param name
+ *   The name of the statistics field to increment in the ring.
+ * @param n
+ *   The number to add to the object-oriented statistics.
+ */
+#ifdef RTE_LIBRTE_RING_DEBUG
+#define __RING_STAT_ADD(r, name, n) do {                        \
+		unsigned int __lcore_id = rte_lcore_id();       \
+		if (__lcore_id < RTE_MAX_LCORE) {               \
+			r->stats[__lcore_id].name##_objs += n;  \
+			r->stats[__lcore_id].name##_bulk += 1;  \
+		}                                               \
+	} while (0)
+#else
+#define __RING_STAT_ADD(r, name, n) do {} while (0)
+#endif
+
+/**
+ * Calculate the memory size needed for a ring
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_ring and the size of the memory needed by the
+ * objects pointers. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_ring_get_memsize(unsigned int count);
+
+/**
+ * Initialize a ring structure.
+ *
+ * Initialize a ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ring size is set to *count*, which must be a power of two. Water
+ * marking is disabled by default. The real usable ring size is
+ * *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the objects table.
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+int rte_ring_init(struct rte_ring *r, const char *name, unsigned int count,
+	unsigned int flags);
+
+/**
+ * Create a new ring named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The size of the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_ring *rte_ring_create(const char *name, unsigned int count,
+				 int socket_id, unsigned int flags);
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);
+
+/**
+ * Change the high water mark.
+ *
+ * If *count* is 0, water marking is disabled. Otherwise, it is set to the
+ * *count* value. The *count* value must be greater than 0 and less
+ * than the ring size.
+ *
+ * This function can be called at any time (not necessarily at
+ * initialization).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param count
+ *   The new water mark value.
+ * @return
+ *   - 0: Success; water mark changed.
+ *   - -EINVAL: Invalid water mark value.
+ */
+int rte_ring_set_water_mark(struct rte_ring *r, unsigned int count);
+
+/**
+ * Dump the status of the ring to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param r
+ *   A pointer to the ring structure.
+ */
+void rte_ring_dump(FILE *f, const struct rte_ring *r);
+
+/* the actual enqueue of pointers on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions
+ */
+#define ENQUEUE_PTRS() do { \
+	const uint32_t size = r->prod.size; \
+	uint32_t idx = prod_head & mask; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~0x3U)); i += 4, idx += 4) { \
+			r->ring[idx] = obj_table[i]; \
+			r->ring[idx+1] = obj_table[i+1]; \
+			r->ring[idx+2] = obj_table[i+2]; \
+			r->ring[idx+3] = obj_table[i+3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			r->ring[idx++] = obj_table[i++]; /* fallthrough */ \
+		case 2: \
+			r->ring[idx++] = obj_table[i++]; /* fallthrough */ \
+		case 1: \
+			r->ring[idx++] = obj_table[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			r->ring[idx] = obj_table[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			r->ring[idx] = obj_table[i]; \
+	} \
+} while (0)
+
+/* the actual copy of pointers on the ring to obj_table.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions
+ */
+#define DEQUEUE_PTRS() do { \
+	uint32_t idx = cons_head & mask; \
+	const uint32_t size = r->cons.size; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~0x3U)); i += 4, idx += 4) { \
+			obj_table[i] = r->ring[idx]; \
+			obj_table[i+1] = r->ring[idx+1]; \
+			obj_table[i+2] = r->ring[idx+2]; \
+			obj_table[i+3] = r->ring[idx+3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			obj_table[i++] = r->ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj_table[i++] = r->ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj_table[i++] = r->ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj_table[i] = r->ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj_table[i] = r->ring[idx]; \
+	} \
+} while (0)
+
+/**
+ * @internal Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
+ * @return
+ *   Depend on the behavior value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects enqueue.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects enqueued.
+ */
+static inline int __attribute__((always_inline))
+__rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t cons_tail, free_entries;
+	const unsigned int max = n;
+	int success;
+	unsigned int i, rep = 0;
+	uint32_t mask = r->prod.mask;
+	int ret;
+
+	/* Avoid the unnecessary cmpset operation below, which is also
+	 * potentially harmful when n equals 0.
+	 */
+	if (n == 0)
+		return 0;
+
+	/* move prod.head atomically */
+	do {
+		/* Reset n to the initial burst count */
+		n = max;
+
+		prod_head = r->prod.head;
+		cons_tail = r->cons.tail;
+		/* The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * prod_head > cons_tail). So 'free_entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		free_entries = (mask + cons_tail - prod_head);
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > free_entries)) {
+			if (behavior == RTE_RING_QUEUE_FIXED) {
+				__RING_STAT_ADD(r, enq_fail, n);
+				return -ENOBUFS;
+			}
+
+			/* Check for space for at least 1 entry */
+			if (unlikely(free_entries == 0)) {
+				__RING_STAT_ADD(r, enq_fail, n);
+				return 0;
+			}
+
+			n = free_entries;
+		}
+
+		prod_next = prod_head + n;
+		success = rte_atomic32_cmpset(&r->prod.head, prod_head,
+					      prod_next);
+	} while (unlikely(success == 0));
+
+	/* write entries in ring */
+	ENQUEUE_PTRS();
+	rte_smp_wmb();
+
+	/* if we exceed the watermark */
+	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
+				(int)(n | RTE_RING_QUOT_EXCEED);
+		__RING_STAT_ADD(r, enq_quota, n);
+	} else {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+		__RING_STAT_ADD(r, enq_success, n);
+	}
+
+	/*
+	 * If there are other enqueues in progress that preceded us,
+	 * we need to wait for them to complete
+	 */
+	while (unlikely(r->prod.tail != prod_head)) {
+		rte_pause();
+
+		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
+		 * for other thread finish. It gives pre-empted thread a chance
+		 * to proceed and finish with ring dequeue operation.
+		 */
+		if (RTE_RING_PAUSE_REP_COUNT &&
+		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
+			rep = 0;
+			sched_yield();
+		}
+	}
+	r->prod.tail = prod_next;
+	return ret;
+}
+
+/**
+ * @internal Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
+ * @return
+ *   Depend on the behavior value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects enqueue.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects enqueued.
+ */
+static inline int __attribute__((always_inline))
+__rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t prod_head, cons_tail;
+	uint32_t prod_next, free_entries;
+	unsigned int i;
+	uint32_t mask = r->prod.mask;
+	int ret;
+
+	prod_head = r->prod.head;
+	cons_tail = r->cons.tail;
+	/* The subtraction is done between two unsigned 32bits value
+	 * (the result is always modulo 32 bits even if we have
+	 * prod_head > cons_tail). So 'free_entries' is always between 0
+	 * and size(ring)-1.
+	 */
+	free_entries = mask + cons_tail - prod_head;
+
+	/* check that we have enough room in ring */
+	if (unlikely(n > free_entries)) {
+		if (behavior == RTE_RING_QUEUE_FIXED) {
+			__RING_STAT_ADD(r, enq_fail, n);
+			return -ENOBUFS;
+		}
+
+		/* Check for space for at least 1 entry */
+		if (unlikely(free_entries == 0)) {
+			__RING_STAT_ADD(r, enq_fail, n);
+			return 0;
+		}
+
+		n = free_entries;
+	}
+
+	prod_next = prod_head + n;
+	r->prod.head = prod_next;
+
+	/* write entries in ring */
+	ENQUEUE_PTRS();
+	rte_smp_wmb();
+
+	/* if we exceed the watermark */
+	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
+			(int)(n | RTE_RING_QUOT_EXCEED);
+		__RING_STAT_ADD(r, enq_quota, n);
+	} else {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+		__RING_STAT_ADD(r, enq_success, n);
+	}
+
+	r->prod.tail = prod_next;
+	return ret;
+}
+
+/**
+ * @internal Dequeue several objects from a ring (multi-consumers safe). When
+ * the request objects are more than the available objects, only dequeue the
+ * actual number of objects
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
+ * @return
+ *   Depend on the behavior value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects dequeued.
+ */
+
+static inline int __attribute__((always_inline))
+__rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
+		 unsigned int n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t cons_head, prod_tail;
+	uint32_t cons_next, entries;
+	const unsigned int max = n;
+	int success;
+	unsigned int i, rep = 0;
+	uint32_t mask = r->prod.mask;
+
+	/* Avoid the unnecessary cmpset operation below, which is also
+	 * potentially harmful when n equals 0.
+	 */
+	if (n == 0)
+		return 0;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = max;
+
+		cons_head = r->cons.head;
+		prod_tail = r->prod.tail;
+		/* The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		entries = (prod_tail - cons_head);
+
+		/* Set the actual entries for dequeue */
+		if (n > entries) {
+			if (behavior == RTE_RING_QUEUE_FIXED) {
+				__RING_STAT_ADD(r, deq_fail, n);
+				return -ENOENT;
+			}
+
+			if (unlikely(entries == 0)) {
+				__RING_STAT_ADD(r, deq_fail, n);
+				return 0;
+			}
+
+			n = entries;
+		}
+
+		cons_next = cons_head + n;
+		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
+					      cons_next);
+	} while (unlikely(success == 0));
+
+	/* copy in table */
+	DEQUEUE_PTRS();
+	rte_smp_rmb();
+
+	/*
+	 * If there are other dequeues in progress that preceded us,
+	 * we need to wait for them to complete
+	 */
+	while (unlikely(r->cons.tail != cons_head)) {
+		rte_pause();
+
+		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
+		 * for other thread finish. It gives pre-empted thread a chance
+		 * to proceed and finish with ring dequeue operation.
+		 */
+		if (RTE_RING_PAUSE_REP_COUNT &&
+		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
+			rep = 0;
+			sched_yield();
+		}
+	}
+	__RING_STAT_ADD(r, deq_success, n);
+	r->cons.tail = cons_next;
+
+	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+}
+
+/**
+ * @internal Dequeue several objects from a ring (NOT multi-consumers safe).
+ * When the request objects are more than the available objects, only dequeue
+ * the actual number of objects
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
+ * @return
+ *   Depend on the behavior value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects dequeued.
+ */
+static inline int __attribute__((always_inline))
+__rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
+		 unsigned int n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t cons_head, prod_tail;
+	uint32_t cons_next, entries;
+	unsigned int i;
+	uint32_t mask = r->prod.mask;
+
+	cons_head = r->cons.head;
+	prod_tail = r->prod.tail;
+	/* The subtraction is done between two unsigned 32bits value
+	 * (the result is always modulo 32 bits even if we have
+	 * cons_head > prod_tail). So 'entries' is always between 0
+	 * and size(ring)-1.
+	 */
+	entries = prod_tail - cons_head;
+
+	if (n > entries) {
+		if (behavior == RTE_RING_QUEUE_FIXED) {
+			__RING_STAT_ADD(r, deq_fail, n);
+			return -ENOENT;
+		}
+
+		if (unlikely(entries == 0)) {
+			__RING_STAT_ADD(r, deq_fail, n);
+			return 0;
+		}
+
+		n = entries;
+	}
+
+	cons_next = cons_head + n;
+	r->cons.head = cons_next;
+
+	/* copy in table */
+	DEQUEUE_PTRS();
+	rte_smp_rmb();
+
+	__RING_STAT_ADD(r, deq_success, n);
+	r->cons.tail = cons_next;
+	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueue.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n)
+{
+	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n)
+{
+	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+		      unsigned int n)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n);
+	else
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
+{
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
+{
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_enqueue(struct rte_ring *r, void *obj)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue(r, obj);
+	else
+		return rte_ring_mp_enqueue(r, obj);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
+{
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
+{
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+	else
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
+{
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_dequeue(struct rte_ring *r, void **obj_p)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue(r, obj_p);
+	else
+		return rte_ring_mc_dequeue(r, obj_p);
+}
+
+/**
+ * Test if a ring is full.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is full.
+ *   - 0: The ring is not full.
+ */
+static inline int
+rte_ring_full(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+}
+
+/**
+ * Test if a ring is empty.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is empty.
+ *   - 0: The ring is not empty.
+ */
+static inline int
+rte_ring_empty(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return !!(cons_tail == prod_tail);
+}
+
+/**
+ * Return the number of entries in a ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of entries in the ring.
+ */
+static inline unsigned
+rte_ring_count(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return (prod_tail - cons_tail) & r->prod.mask;
+}
+
+/**
+ * Return the number of free entries in a ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of free entries in the ring.
+ */
+static inline unsigned
+rte_ring_free_count(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return (cons_tail - prod_tail - 1) & r->prod.mask;
+}
+
+/**
+ * Dump the status of all rings on the console
+ *
+ * @param f
+ *   A pointer to a file for output
+ */
+void rte_ring_list_dump(FILE *f);
+
+/**
+ * Search a ring from its name
+ *
+ * @param name
+ *   The name of the ring.
+ * @return
+ *   The pointer to the ring matching the name, or NULL if not found,
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - ENOENT - required entry not available to return.
+ */
+struct rte_ring *rte_ring_lookup(const char *name);
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned int __attribute__((always_inline))
+rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n)
+{
+	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned int __attribute__((always_inline))
+rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n)
+{
+	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned int __attribute__((always_inline))
+rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+		      unsigned int n)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue_burst(r, obj_table, n);
+	else
+		return rte_ring_mp_enqueue_burst(r, obj_table, n);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When the request
+ * objects are more than the available objects, only dequeue the actual number
+ * of objects
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static inline unsigned int __attribute__((always_inline))
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
+{
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).When the
+ * request objects are more than the available objects, only dequeue the
+ * actual number of objects
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static inline unsigned int __attribute__((always_inline))
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
+{
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - Number of objects dequeued
+ */
+static inline unsigned int __attribute__((always_inline))
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned int n)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+	else
+		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_H_ */
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes
@ 2017-01-11 15:05  3% Bruce Richardson
  2017-01-11 15:05  2% ` [dpdk-dev] [RFC PATCH 01/11] ring: add new typed ring header file Bruce Richardson
                   ` (4 more replies)
  0 siblings, 5 replies; 200+ results
From: Bruce Richardson @ 2017-01-11 15:05 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The rte_ring library in DPDK provides an excellent high-performance
mechanism which can be used for passing pointers between cores and
for other tasks such as buffering. However, it does have a number
of limitations:

* type information of pointers is lost, as it works with void pointers
* typecasting is needed when using enqueue/dequeue burst functions,
  since arrays of other types cannot be automatically cast to void **
* the data to be passed through the ring itself must be no bigger than
  a pointer

While the first two limitations are an inconvenience, the final one is
one that can prevent use of rte_rings in cases where their functionality
is needed. The use-case which has inspired the patchset is that of
eventdev. When working with rte_events, each event is a 16-byte structure
consisting of a pointer and some metadata e.g. priority and type. For
these events, what is passed around between cores is not pointers to
events, but the events themselves. This makes existing rings unsuitable
for use by applications working with rte_events, and also for use
internally inside any software implementation of an eventdev.

For rings to handle events or other similarly sized structures, e.g.
NIC descriptors, etc., we then have two options - duplicate rte_ring
code to create new ring implementations for each of those types, or
generalise the existing code using macros so that the data type handled
by each rings is a compile time paramter. This patchset takes the latter
approach, and once applied would allow us to add an rte_event_ring type
to DPDK using a header file containing:

#define RING_TYPE struct rte_event
#define RING_TYPE_NAME rte_event
#include <rte_typed_ring.h>
#undef RING_TYPE_NAME
#undef RING_TYPE

[NOTE: the event_ring is not defined in this set, since it depends on
the eventdev implementation not present in the main tree]

If we want to elimiate some of the typecasting on our code when enqueuing
and dequeuing mbuf pointers, an rte_mbuf_ring type can be similarly
created using the same number of lines of code.

The downside of this generalisation is that the code for the rings now
has far more use of macros in it. However, I do not feel that overall
readability suffers much from this change, the since the changes are
pretty much just search-replace onces. There should also be no ABI
compatibility issues with this change, since the existing rte_ring
structures remain the same.

Bruce Richardson (11):
  ring: add new typed ring header file
  test: add new test file for typed rings
  ring: add ring management functions to typed ring header
  ring: make ring tailq variable public
  ring: add user-specified typing to typed rings
  ring: use existing power-of-2 function
  ring: allow multiple typed rings in the same unit
  app/pdump: remove duplicate macro definition
  ring: make existing rings reuse the typed ring definitions
  ring: reuse typed rings management functions
  ring: reuse typed ring enqueue and dequeue functions

 app/pdump/main.c                     |    1 -
 app/test/Makefile                    |    1 +
 app/test/test_typed_ring.c           |  156 ++++
 lib/librte_ring/Makefile             |    1 +
 lib/librte_ring/rte_ring.c           |  246 +-----
 lib/librte_ring/rte_ring.h           |  563 +-----------
 lib/librte_ring/rte_ring_version.map |    7 +
 lib/librte_ring/rte_typed_ring.h     | 1570 ++++++++++++++++++++++++++++++++++
 8 files changed, 1758 insertions(+), 787 deletions(-)
 create mode 100644 app/test/test_typed_ring.c
 create mode 100644 lib/librte_ring/rte_typed_ring.h

-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 1/5] ethdev: add firmware version get
  @ 2017-01-11  6:41  5%               ` Qiming Yang
    1 sibling, 0 replies; 200+ results
From: Qiming Yang @ 2017-01-11  6:41 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, remy.horton, Qiming Yang

This patch adds a new API 'rte_eth_dev_fw_version_get' for
fetching firmware version by a given device.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
 doc/guides/nics/features/default.ini   |  1 +
 doc/guides/rel_notes/deprecation.rst   |  4 ----
 doc/guides/rel_notes/release_17_02.rst |  5 +++++
 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 25 +++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  1 +
 6 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index f1bf9bf..ae40d57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -50,6 +50,7 @@ Timesync             =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 054e2e7..755dc65 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -30,10 +30,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
-  will be extended with a new member ``fw_version`` in order to store
-  the NIC firmware version.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 5762d3f..f9134bb 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -66,6 +66,11 @@ New Features
   Support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps adapters
   has been added to the existing mlx5 PMD.
 
+* **Added firmware version get API.**
+
+  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
+  version by a given device.
+
 Resolved Issues
 ---------------
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 917557a..89cffcf 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1588,6 +1588,18 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
 			STAT_QMAP_RX);
 }
 
+int
+rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, size_t fw_size)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->fw_version_get, -ENOTSUP);
+	return (*dev->dev_ops->fw_version_get)(dev, fw_version, fw_size);
+}
+
 void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ded43d7..a9b3686 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1177,6 +1177,10 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
+				     char *fw_version, size_t fw_size);
+/**< @internal Get firmware information of an Ethernet device. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1459,6 +1463,7 @@ struct eth_dev_ops {
 	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
 	eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
 	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
+	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
 	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
 	/**< Get packet types supported and identified by device. */
 
@@ -2396,6 +2401,26 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
 void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
 
 /**
+ * Retrieve the firmware version of a device.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param fw_version
+ *   A array pointer to store the firmware version of a device,
+ *   allocated by caller.
+ * @param fw_size
+ *   The size of the array pointed by fw_version, which should be
+ *   large enough to store firmware version of the device.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if operation is not supported.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if *fw_size* is not enough to store firmware version.
+ */
+int rte_eth_dev_fw_version_get(uint8_t port_id,
+			       char *fw_version, size_t fw_size);
+
+/**
  * Retrieve the supported packet types of an Ethernet device.
  *
  * When a packet type is announced as supported, it *must* be recognized by
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 0c2859e..c6c9d0d 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -146,6 +146,7 @@ DPDK_17.02 {
 	global:
 
 	_rte_eth_dev_reset;
+	rte_eth_dev_fw_version_get;
 	rte_flow_create;
 	rte_flow_destroy;
 	rte_flow_flush;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v6 1/5] ethdev: add firmware version get
  @ 2017-01-10  9:08  5%           ` Qiming Yang
    0 siblings, 1 reply; 200+ results
From: Qiming Yang @ 2017-01-10  9:08 UTC (permalink / raw)
  To: dev; +Cc: remy.horton, ferruh.yigit, helin.zhang, Qiming Yang

This patch adds a new API 'rte_eth_dev_fw_version_get' for
fetching firmware version by a given device.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
 doc/guides/nics/features/default.ini   |  1 +
 doc/guides/rel_notes/deprecation.rst   |  4 ----
 doc/guides/rel_notes/release_17_02.rst |  3 +++
 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 25 +++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  1 +
 6 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index f1bf9bf..ae40d57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -50,6 +50,7 @@ Timesync             =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1438c77..291e03d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -30,10 +30,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
-  will be extended with a new member ``fw_version`` in order to store
-  the NIC firmware version.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 180af82..260033d 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -52,6 +52,9 @@ New Features
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
+* **Added firmware version get API.**
+ Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
+ version by a given device.
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9dea1f1..49ca42d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1588,6 +1588,18 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
 			STAT_QMAP_RX);
 }
 
+int
+rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, size_t fw_size)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->fw_version_get, -ENOTSUP);
+	return (*dev->dev_ops->fw_version_get)(dev, fw_version, fw_size);
+}
+
 void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1c356c1..357612d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1177,6 +1177,10 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
+				     char *fw_version, size_t fw_size);
+/**< @internal Get firmware information of an Ethernet device. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1487,6 +1491,7 @@ struct eth_dev_ops {
 	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
 	eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
 	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
+	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
 	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
 	/**< Get packet types supported and identified by device. */
 
@@ -2430,6 +2435,26 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
 void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
 
 /**
+ * Retrieve the firmware version of a device.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param fw_version
+ *   A array pointer to store the firmware version of a device,
+ *   allocated by caller.
+ * @param fw_size
+ *   The size of the array pointed by fw_version, which should be
+ *   large enough to store firmware version of the device.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if operation is not supported.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if *fw_size* is not enough to store firmware version.
+ */
+int rte_eth_dev_fw_version_get(uint8_t port_id,
+			       char *fw_version, size_t fw_size);
+
+/**
  * Retrieve the supported packet types of an Ethernet device.
  *
  * When a packet type is announced as supported, it *must* be recognized by
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a021781..0cf94ed 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -151,6 +151,7 @@ DPDK_17.02 {
 	global:
 
 	_rte_eth_dev_reset;
+	rte_eth_dev_fw_version_get;
 	rte_flow_create;
 	rte_flow_destroy;
 	rte_flow_flush;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [DPDK 1/5] ethdev: add firmware version get
    2017-01-08  4:11  5%         ` [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get Qiming Yang
@ 2017-01-10  9:00  5%         ` Qiming Yang
    2 siblings, 0 replies; 200+ results
From: Qiming Yang @ 2017-01-10  9:00 UTC (permalink / raw)
  To: dev; +Cc: remy.horton, ferruh.yigit, helin.zhang, Qiming Yang

This patch adds a new API 'rte_eth_dev_fw_version_get' for
fetching firmware version by a given device.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
 doc/guides/nics/features/default.ini   |  1 +
 doc/guides/rel_notes/deprecation.rst   |  4 ----
 doc/guides/rel_notes/release_17_02.rst |  3 +++
 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 25 +++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  1 +
 6 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index f1bf9bf..ae40d57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -50,6 +50,7 @@ Timesync             =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1438c77..291e03d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -30,10 +30,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
-  will be extended with a new member ``fw_version`` in order to store
-  the NIC firmware version.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 180af82..260033d 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -52,6 +52,9 @@ New Features
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
+* **Added firmware version get API.**
+ Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
+ version by a given device.
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9dea1f1..49ca42d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1588,6 +1588,18 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
 			STAT_QMAP_RX);
 }
 
+int
+rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, size_t fw_size)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->fw_version_get, -ENOTSUP);
+	return (*dev->dev_ops->fw_version_get)(dev, fw_version, fw_size);
+}
+
 void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1c356c1..357612d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1177,6 +1177,10 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
+				     char *fw_version, size_t fw_size);
+/**< @internal Get firmware information of an Ethernet device. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1487,6 +1491,7 @@ struct eth_dev_ops {
 	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
 	eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
 	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
+	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
 	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
 	/**< Get packet types supported and identified by device. */
 
@@ -2430,6 +2435,26 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
 void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
 
 /**
+ * Retrieve the firmware version of a device.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param fw_version
+ *   A array pointer to store the firmware version of a device,
+ *   allocated by caller.
+ * @param fw_size
+ *   The size of the array pointed by fw_version, which should be
+ *   large enough to store firmware version of the device.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if operation is not supported.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if *fw_size* is not enough to store firmware version.
+ */
+int rte_eth_dev_fw_version_get(uint8_t port_id,
+			       char *fw_version, size_t fw_size);
+
+/**
  * Retrieve the supported packet types of an Ethernet device.
  *
  * When a packet type is announced as supported, it *must* be recognized by
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a021781..0cf94ed 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -151,6 +151,7 @@ DPDK_17.02 {
 	global:
 
 	_rte_eth_dev_reset;
+	rte_eth_dev_fw_version_get;
 	rte_flow_create;
 	rte_flow_destroy;
 	rte_flow_flush;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get
  2017-01-08  4:11  5%         ` [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get Qiming Yang
@ 2017-01-08  6:38  0%           ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2017-01-08  6:38 UTC (permalink / raw)
  To: Qiming Yang, dev; +Cc: ferruh.yigit, helin.zhang, remy.horton

On 01/08/2017 07:11 AM, Qiming Yang wrote:
> This patch adds a new API 'rte_eth_dev_fw_version_get' for
> fetching firmware version by a given device.
>
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>
> Acked-by: Remy Horton <remy.horton@intel.com>
> ---
> v2 changes:
> * modified some comment statements.
> v3 changes:
> * change API, use rte_eth_dev_fw_info_get(uint8_t port_id,
>    uint32_t *fw_major, uint32_t *fw_minor, uint32_t *fw_patch,
>    uint32_t *etrack_id) instead of rte_eth_dev_fwver_get(uint8_t port_id,
>    char *fw_version, int fw_length).
>    Add statusment in /doc/guides/nics/features/default.ini and
>    release_17_02.rst.
> v4 changes:
> * remove deprecation notice, rename API as rte_eth_dev_fw_version_get.
> v5 changes:
> * change API, use rte_eth_dev_fw_version_get(uint8_t port_id,
>    char *fw_version, int fw_length).
> ---
> ---
>   doc/guides/nics/features/default.ini   |  1 +
>   doc/guides/rel_notes/deprecation.rst   |  4 ----
>   doc/guides/rel_notes/release_17_02.rst |  3 +++
>   lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
>   lib/librte_ether/rte_ethdev.h          | 20 ++++++++++++++++++++
>   lib/librte_ether/rte_ether_version.map |  1 +
>   6 files changed, 37 insertions(+), 4 deletions(-)
>
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index f1bf9bf..ae40d57 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -50,6 +50,7 @@ Timesync             =
>   Basic stats          =
>   Extended stats       =
>   Stats per queue      =
> +FW version           =
>   EEPROM dump          =
>   Registers dump       =
>   Multiprocess aware   =
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 1438c77..291e03d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -30,10 +30,6 @@ Deprecation Notices
>     ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
>     segments limit to be transmitted by device for TSO/non-TSO packets.
>   
> -* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
> -  will be extended with a new member ``fw_version`` in order to store
> -  the NIC firmware version.
> -
>   * ethdev: an API change is planned for 17.02 for the function
>     ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
>     instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
> diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
> index 180af82..260033d 100644
> --- a/doc/guides/rel_notes/release_17_02.rst
> +++ b/doc/guides/rel_notes/release_17_02.rst
> @@ -52,6 +52,9 @@ New Features
>     See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
>     information.
>   
> +* **Added firmware version get API.**
> + Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
> + version by a given device.
>   
>   Resolved Issues
>   ---------------
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 280f0db..cb80476 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1586,6 +1586,18 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
>   }
>   
>   void
> +rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, int fw_length)

May be size_t should be used for fw_length? Corresponding argument of 
the snprintf()
has size_t type, sizeof(drvinfo.fw_version) is used as value of the 
parameter.

Also the prototype does not provide a way to communicate that fw_length 
is insufficient
to store firmware version. I'd suggest snprintf()-like return value. It 
is pretty easy for PMD
to provide and convenient for the API function caller to handle.

> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	dev = &rte_eth_devices[port_id];
> +
> +	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->fw_version_get);
> +	(*dev->dev_ops->fw_version_get)(dev, fw_version, fw_length);
> +}
> +
> +void
>   rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
>   {
>   	struct rte_eth_dev *dev;
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index fb51754..2be31d2 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1150,6 +1150,10 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
>   typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
>   /**< @internal Check DD bit of specific RX descriptor */
>   
> +typedef void (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
> +				     char *fw_version, int fw_length);
> +/**< @internal Get firmware information of an Ethernet device. */
> +
>   typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
>   	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
>   
> @@ -1455,6 +1459,7 @@ struct eth_dev_ops {
>   	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
>   	eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
>   	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
> +	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
>   	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
>   	/**< Get packet types supported and identified by device. */
>   
> @@ -2395,6 +2400,21 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
>   void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
>   
>   /**
> + * Retrieve the firmware version of a device.
> + *
> + * @param port_id
> + *   The port identifier of the device.
> + * @param fw_version
> + *   A array pointer to store the firmware version of a device,
> + *   allocated by caller.
> + * @param fw_length
> + *   The size of the array pointed by fw_version, which should be
> + *   large enough to store firmware version of the device.
> + */
> +void rte_eth_dev_fw_version_get(uint8_t port_id,
> +				char *fw_version, int fw_length);
> +
> +/**
>    * Retrieve the supported packet types of an Ethernet device.
>    *
>    * When a packet type is announced as supported, it *must* be recognized by
> diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
> index a021781..0cf94ed 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -151,6 +151,7 @@ DPDK_17.02 {
>   	global:
>   
>   	_rte_eth_dev_reset;
> +	rte_eth_dev_fw_version_get;
>   	rte_flow_create;
>   	rte_flow_destroy;
>   	rte_flow_flush;

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get
  @ 2017-01-08  4:11  5%         ` Qiming Yang
  2017-01-08  6:38  0%           ` Andrew Rybchenko
  2017-01-10  9:00  5%         ` [dpdk-dev] [DPDK " Qiming Yang
    2 siblings, 1 reply; 200+ results
From: Qiming Yang @ 2017-01-08  4:11 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, helin.zhang, remy.horton, Qiming Yang

This patch adds a new API 'rte_eth_dev_fw_version_get' for
fetching firmware version by a given device.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
v2 changes:
* modified some comment statements.
v3 changes:
* change API, use rte_eth_dev_fw_info_get(uint8_t port_id,
  uint32_t *fw_major, uint32_t *fw_minor, uint32_t *fw_patch,
  uint32_t *etrack_id) instead of rte_eth_dev_fwver_get(uint8_t port_id,
  char *fw_version, int fw_length).
  Add statusment in /doc/guides/nics/features/default.ini and
  release_17_02.rst.
v4 changes:
* remove deprecation notice, rename API as rte_eth_dev_fw_version_get.
v5 changes:
* change API, use rte_eth_dev_fw_version_get(uint8_t port_id,
  char *fw_version, int fw_length).
---
---
 doc/guides/nics/features/default.ini   |  1 +
 doc/guides/rel_notes/deprecation.rst   |  4 ----
 doc/guides/rel_notes/release_17_02.rst |  3 +++
 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 20 ++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  1 +
 6 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index f1bf9bf..ae40d57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -50,6 +50,7 @@ Timesync             =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1438c77..291e03d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -30,10 +30,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
-  will be extended with a new member ``fw_version`` in order to store
-  the NIC firmware version.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 180af82..260033d 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -52,6 +52,9 @@ New Features
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
+* **Added firmware version get API.**
+ Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
+ version by a given device.
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 280f0db..cb80476 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1586,6 +1586,18 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
 }
 
 void
+rte_eth_dev_fw_version_get(uint8_t port_id, char *fw_version, int fw_length)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->fw_version_get);
+	(*dev->dev_ops->fw_version_get)(dev, fw_version, fw_length);
+}
+
+void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fb51754..2be31d2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1150,6 +1150,10 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef void (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
+				     char *fw_version, int fw_length);
+/**< @internal Get firmware information of an Ethernet device. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1455,6 +1459,7 @@ struct eth_dev_ops {
 	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
 	eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
 	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
+	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
 	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
 	/**< Get packet types supported and identified by device. */
 
@@ -2395,6 +2400,21 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
 void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
 
 /**
+ * Retrieve the firmware version of a device.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param fw_version
+ *   A array pointer to store the firmware version of a device,
+ *   allocated by caller.
+ * @param fw_length
+ *   The size of the array pointed by fw_version, which should be
+ *   large enough to store firmware version of the device.
+ */
+void rte_eth_dev_fw_version_get(uint8_t port_id,
+				char *fw_version, int fw_length);
+
+/**
  * Retrieve the supported packet types of an Ethernet device.
  *
  * When a packet type is announced as supported, it *must* be recognized by
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a021781..0cf94ed 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -151,6 +151,7 @@ DPDK_17.02 {
 	global:
 
 	_rte_eth_dev_reset;
+	rte_eth_dev_fw_version_get;
 	rte_flow_create;
 	rte_flow_destroy;
 	rte_flow_flush;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  2017-01-05 10:44  4% [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev Bernard Iremonger
  2017-01-05 13:31  4% ` Thomas Monjalon
@ 2017-01-05 15:25  4% ` Bernard Iremonger
  1 sibling, 0 replies; 200+ results
From: Bernard Iremonger @ 2017-01-05 15:25 UTC (permalink / raw)
  To: dev, john.mcnamara; +Cc: Bernard Iremonger

In 17.05 nine rte_eth_dev_* functions will be removed from
librte_ether, renamed and moved to the ixgbe PMD.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
v2:
Used comma's to shorten lists.

 doc/guides/rel_notes/deprecation.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1438c77..985cda8 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -79,3 +79,22 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* ethdev: for 17.05 it is planned to deprecate the following nine rte_eth_dev_* functions
+  and move them into the ixgbe PMD:
+
+  ``rte_eth_dev_bypass_init``, ``rte_eth_dev_bypass_state_set``, ``rte_eth_dev_bypass_state_show``,
+  ``rte_eth_dev_bypass_event_store``, ``rte_eth_dev_bypass_event_show``, ``rte_eth_dev_wd_timeout_store``,
+  ``rte_eth_dev_bypass_wd_timeout_show``, ``rte_eth_dev_bypass_ver_show``, ``rte_eth_dev_bypass_wd_reset``.
+
+  The following fields will be removed from ``struct eth_dev_ops``:
+
+  ``bypass_init_t``, ``bypass_state_set_t``, ``bypass_state_show_t``, ``bypass_event_set_t``,
+  ``bypass_event_show_t``, ``bypass_wd_timeout_set_t``, ``bypass_wd_timeout_show_t``,
+  ``bypass_ver_show_t``, ``bypass_wd_reset_t``.
+
+  The functions will be renamed to the following, and moved to the ``ixgbe`` PMD:
+
+  ``rte_pmd_ixgbe_bypass_init``, ``rte_pmd_ixgbe_bypass_state_set``, ``rte_pmd_ixgbe_bypass_state_show``,
+  ``rte_pmd_ixgbe_bypass_event_set``, ``rte_pmd_ixgbe_bypass_event_show``, ``rte_pmd_ixgbe_bypass_wd_timeout_set``,
+  ``rte_pmd_ixgbe_bypass_wd_timeout_show``, ``rte_pmd_ixgbe_bypass_ver_show``, ``rte_pmd_ixgbe_bypass_wd_reset``.
-- 
2.10.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev
  2017-01-05 13:31  4% ` Thomas Monjalon
@ 2017-01-05 14:40  4%   ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2017-01-05 14:40 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Mcnamara, John

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, January 5, 2017 1:31 PM
> To: Iremonger, Bernard <bernard.iremonger@intel.com>
> Cc: dev@dpdk.org; Mcnamara, John <john.mcnamara@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for
> ethdev
> 
> 2017-01-05 10:44, Bernard Iremonger:
> > In 17.05 nine rte_eth_dev_* functions will be removed from
> > librte_ether, renamed and moved to the ixgbe PMD.
> 
> I agree it is a good move to clean up ethdev API.
> 
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > +* ethdev: for 17.05 it is planned to deprecate the following nine
> > +rte_eth_dev_* functions
> > +  and move them into the ixgbe PMD:
> > +
> > +  ``rte_eth_dev_bypass_init``
> > +
> > +  ``rte_eth_dev_bypass_state_set``
> > +
> > +  ``rte_eth_dev_bypass_state_show``
> > +
> > +  ``rte_eth_dev_bypass_event_store``
> > +
> > +  ``rte_eth_dev_bypass_event_show``
> > +
> > +  ``rte_eth_dev_wd_timeout_store``
> > +
> > +  ``rte_eth_dev_bypass_wd_timeout_show``
> > +
> > +  ``rte_eth_dev_bypass_ver_show``
> > +
> > +  ``rte_eth_dev_bypass_wd_reset``
> > +
> > +  The following fields will be removed from ``struct eth_dev_ops``:
> > +
> > +  ``bypass_init_t``
> > +
> > +  ``bypass_state_set_t``
> > +
> > +  ``bypass_state_show_t``
> > +
> > +  ``bypass_event_set_t``
> > +
> > +  ``bypass_event_show_t``
> > +
> > +  ``bypass_wd_timeout_set_t``
> > +
> > +  ``bypass_wd_timeout_show_t``
> > +
> > +  ``bypass_ver_show_t``
> > +
> > +  ``bypass_wd_reset_t``
> > +
> > +  The functions will be renamed to the following, and moved to the
> ``ixgbe`` PMD:
> > +
> > +  ``rte_pmd_ixgbe_bypass_init``
> > +
> > +  ``rte_pmd_ixgbe_bypass_state_set``
> > +
> > +  ``rte_pmd_ixgbe_bypass_state_show``
> > +
> > +  ``rte_pmd_ixgbe_bypass_event_set``
> > +
> > +  ``rte_pmd_ixgbe_bypass_event_show``
> > +
> > +  ``rte_pmd_ixgbe_bypass_wd_timeout_set``
> > +
> > +  ``rte_pmd_ixgbe_bypass_wd_timeout_show``
> > +
> > +  ``rte_pmd_ixgbe_bypass_ver_show``
> > +
> > +  ``rte_pmd_ixgbe_bypass_wd_reset``
> >
> 
> Please could you make it shorter by using commas for listing?

I will use commas for the listing in v2.

Regards,

Bernard.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev
  2017-01-05 10:44  4% [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev Bernard Iremonger
@ 2017-01-05 13:31  4% ` Thomas Monjalon
  2017-01-05 14:40  4%   ` Iremonger, Bernard
  2017-01-05 15:25  4% ` [dpdk-dev] [PATCH v2] " Bernard Iremonger
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-01-05 13:31 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev, john.mcnamara

2017-01-05 10:44, Bernard Iremonger:
> In 17.05 nine rte_eth_dev_* functions will be removed from
> librte_ether, renamed and moved to the ixgbe PMD.

I agree it is a good move to clean up ethdev API.

> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> +* ethdev: for 17.05 it is planned to deprecate the following nine rte_eth_dev_* functions
> +  and move them into the ixgbe PMD:
> +
> +  ``rte_eth_dev_bypass_init``
> +
> +  ``rte_eth_dev_bypass_state_set``
> +
> +  ``rte_eth_dev_bypass_state_show``
> +
> +  ``rte_eth_dev_bypass_event_store``
> +
> +  ``rte_eth_dev_bypass_event_show``
> +
> +  ``rte_eth_dev_wd_timeout_store``
> +
> +  ``rte_eth_dev_bypass_wd_timeout_show``
> +
> +  ``rte_eth_dev_bypass_ver_show``
> +
> +  ``rte_eth_dev_bypass_wd_reset``
> +
> +  The following fields will be removed from ``struct eth_dev_ops``:
> +
> +  ``bypass_init_t``
> +
> +  ``bypass_state_set_t``
> +
> +  ``bypass_state_show_t``
> +
> +  ``bypass_event_set_t``
> +
> +  ``bypass_event_show_t``
> +
> +  ``bypass_wd_timeout_set_t``
> +
> +  ``bypass_wd_timeout_show_t``
> +
> +  ``bypass_ver_show_t``
> +
> +  ``bypass_wd_reset_t``
> +
> +  The functions will be renamed to the following, and moved to the ``ixgbe`` PMD:
> +
> +  ``rte_pmd_ixgbe_bypass_init``
> +
> +  ``rte_pmd_ixgbe_bypass_state_set``
> +
> +  ``rte_pmd_ixgbe_bypass_state_show``
> +
> +  ``rte_pmd_ixgbe_bypass_event_set``
> +
> +  ``rte_pmd_ixgbe_bypass_event_show``
> +
> +  ``rte_pmd_ixgbe_bypass_wd_timeout_set``
> +
> +  ``rte_pmd_ixgbe_bypass_wd_timeout_show``
> +
> +  ``rte_pmd_ixgbe_bypass_ver_show``
> +
> +  ``rte_pmd_ixgbe_bypass_wd_reset``
> 

Please could you make it shorter by using commas for listing?

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev
@ 2017-01-05 10:44  4% Bernard Iremonger
  2017-01-05 13:31  4% ` Thomas Monjalon
  2017-01-05 15:25  4% ` [dpdk-dev] [PATCH v2] " Bernard Iremonger
  0 siblings, 2 replies; 200+ results
From: Bernard Iremonger @ 2017-01-05 10:44 UTC (permalink / raw)
  To: dev, john.mcnamara; +Cc: Bernard Iremonger

In 17.05 nine rte_eth_dev_* functions will be removed from
librte_ether, renamed and moved to the ixgbe PMD.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 61 ++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1438c77..f3d79d8 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -79,3 +79,64 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* ethdev: for 17.05 it is planned to deprecate the following nine rte_eth_dev_* functions
+  and move them into the ixgbe PMD:
+
+  ``rte_eth_dev_bypass_init``
+
+  ``rte_eth_dev_bypass_state_set``
+
+  ``rte_eth_dev_bypass_state_show``
+
+  ``rte_eth_dev_bypass_event_store``
+
+  ``rte_eth_dev_bypass_event_show``
+
+  ``rte_eth_dev_wd_timeout_store``
+
+  ``rte_eth_dev_bypass_wd_timeout_show``
+
+  ``rte_eth_dev_bypass_ver_show``
+
+  ``rte_eth_dev_bypass_wd_reset``
+
+  The following fields will be removed from ``struct eth_dev_ops``:
+
+  ``bypass_init_t``
+
+  ``bypass_state_set_t``
+
+  ``bypass_state_show_t``
+
+  ``bypass_event_set_t``
+
+  ``bypass_event_show_t``
+
+  ``bypass_wd_timeout_set_t``
+
+  ``bypass_wd_timeout_show_t``
+
+  ``bypass_ver_show_t``
+
+  ``bypass_wd_reset_t``
+
+  The functions will be renamed to the following, and moved to the ``ixgbe`` PMD:
+
+  ``rte_pmd_ixgbe_bypass_init``
+
+  ``rte_pmd_ixgbe_bypass_state_set``
+
+  ``rte_pmd_ixgbe_bypass_state_show``
+
+  ``rte_pmd_ixgbe_bypass_event_set``
+
+  ``rte_pmd_ixgbe_bypass_event_show``
+
+  ``rte_pmd_ixgbe_bypass_wd_timeout_set``
+
+  ``rte_pmd_ixgbe_bypass_wd_timeout_show``
+
+  ``rte_pmd_ixgbe_bypass_ver_show``
+
+  ``rte_pmd_ixgbe_bypass_wd_reset``
-- 
2.10.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw version get
  2016-12-22 15:31  0%             ` Thomas Monjalon
  2016-12-23 12:48  0%               ` Ferruh Yigit
@ 2017-01-05  3:04  3%               ` Zhang, Helin
  1 sibling, 0 replies; 200+ results
From: Zhang, Helin @ 2017-01-05  3:04 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh; +Cc: Yang, Qiming, dev, Horton, Remy



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Thursday, December 22, 2016 11:31 PM
> To: Yigit, Ferruh
> Cc: Yang, Qiming; dev@dpdk.org; Horton, Remy
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw
> version get
> 
> 2016-12-22 15:05, Ferruh Yigit:
> > On 12/22/2016 2:47 PM, Thomas Monjalon wrote:
> > > 2016-12-22 14:36, Ferruh Yigit:
> > >> On 12/22/2016 11:07 AM, Thomas Monjalon wrote:
> > >>> I think it is OK to add a new dev_ops and a new API function for
> > >>> firmware query. Generally speaking, it is a good thing to avoid
> > >>> putting all informations in the same structure (e.g. rte_eth_dev_info).
> > >>
> > >> OK.
> > >>
> > >>> However, there
> > >>> is a balance to find. Could we plan to add more info to this new query?
> > >>> Instead of
> > >>> 	rte_eth_dev_fwver_get(uint8_t port_id, char *fw_version, int
> > >>> fw_length)
> > > [...]
> > >>> could it fill a struct?
> > >>> 	rte_eth_dev_fw_info_get(uint8_t port_id, struct
> > >>> rte_eth_dev_fw_info *fw_info)
> > >>
> > >> I believe this is better. But the problem we are having with this
> > >> usage
> > >> is: ABI breakage.
> > >>
> > >> Since this struct will be a public structure, in the future if we
> > >> want to add a new field to the struct, it will break the ABI, and
> > >> just this change will cause a new version for whole ethdev library!
> > >>
> > >> When all required fields received via arguments, one by one,
> > >> instead of struct, at least ABI versioning can be done on the API
> > >> when new field added, and can be possible to escape from ABI
> > >> breakage. But this will be ugly when number of arguments increased.
> > >>
> > >> Or any other opinion on how to define API to reduce ABI breakage?
> > >
> > > You're right.
> > > But I don't think we should have a function per data. Just because
> > > it would be ugly :)
> >
> > I am no suggesting function per data, instead something like:
> >
> > rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);
> >
> > And in the future if we need etrack_id too, we can have both in
> > versioned manner:
> >
> > rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);
> >
> > rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min,
> > 	uint32_t etrack_id);
> 
> Oh I see. So it can be versioned with compat macros.
> 
> > So my concern was if the number of the arguments becomes too many by
> time.
> 
> It looks to be a good proposal. We should not have a dozen of arguments.

I'd suggest to do that the similar way of kernel driver/ethtool (Linux or FreeBSD) does,
which should be well discussed.
In addition, for future extention, and avoid breaking any ABI in a strcuture,
we can just pre-define a lot of bytes as reserved, e.g. 64 bytes. Inside DPDK,
there are several strucutres defined like this, e.g. mbuf.

Thanks,
Helin

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 1/5] ethdev: add firmware version get
  @ 2017-01-04 12:03  5%       ` Qiming Yang
    1 sibling, 0 replies; 200+ results
From: Qiming Yang @ 2017-01-04 12:03 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, helin.zhang, remy.horton, Qiming Yang

This patch adds a new API 'rte_eth_dev_fw_version_get' for
fetching firmware version related information by a given device.

Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Remy Horton <remy.horton@intel.com>
---
v2 changes:
* modified some comment statements.
v3 changes:
* change API, use rte_eth_dev_fw_info_get(uint8_t port_id,
  uint32_t *fw_major, uint32_t *fw_minor, uint32_t *fw_patch,
  uint32_t *etrack_id) instead of rte_eth_dev_fwver_get(uint8_t port_id,
  char *fw_version, int fw_length).
  Add statusment in /doc/guides/nics/features/default.ini and
  release_17_02.rst.
v4 changes:
* remove deprecation notice, rename API as rte_eth_dev_fw_version_get
---
---
 doc/guides/nics/features/default.ini   |  1 +
 doc/guides/rel_notes/deprecation.rst   |  4 ----
 doc/guides/rel_notes/release_17_02.rst |  3 +++
 lib/librte_ether/rte_ethdev.c          | 14 ++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 23 +++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  1 +
 6 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index f1bf9bf..ae40d57 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -50,6 +50,7 @@ Timesync             =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+FW version           =
 EEPROM dump          =
 Registers dump       =
 Multiprocess aware   =
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1438c77..291e03d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -30,10 +30,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
-  will be extended with a new member ``fw_version`` in order to store
-  the NIC firmware version.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 180af82..d6958d4 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -52,6 +52,9 @@ New Features
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
+* **Added firmware version get API.**
+ Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware version
+ related information by a given device.
 
 Resolved Issues
 ---------------
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 280f0db..a4b20b5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1586,6 +1586,20 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, uint16_t rx_queue_id,
 }
 
 void
+rte_eth_dev_fw_version_get(uint8_t port_id, uint32_t *fw_major,
+		uint32_t *fw_minor, uint32_t *fw_patch, uint32_t *etrack_id)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->fw_version_get);
+	(*dev->dev_ops->fw_version_get)(dev, fw_major, fw_minor,
+					fw_patch, etrack_id);
+}
+
+void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fb51754..9c7efa1 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1150,6 +1150,11 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef void (*eth_fw_version_get_t)(struct rte_eth_dev *dev,
+		uint32_t *fw_major, uint32_t *fw_minor,
+		uint32_t *fw_patch, uint32_t *etrack_id);
+/**< @internal Get firmware version information of an Ethernet device. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1457,6 +1462,7 @@ struct eth_dev_ops {
 	eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
 	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
 	/**< Get packet types supported and identified by device. */
+	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */
 
 	vlan_filter_set_t          vlan_filter_set; /**< Filter VLAN Setup. */
 	vlan_tpid_set_t            vlan_tpid_set; /**< Outer/Inner VLAN TPID Setup. */
@@ -2395,6 +2401,23 @@ void rte_eth_macaddr_get(uint8_t port_id, struct ether_addr *mac_addr);
 void rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info);
 
 /**
+ * Retrieve the firmware version of a device.
+ *
+ * @param port_id
+ *   The port identifier of the device.
+ * @param fw_major
+ *   A pointer to store the major firmware version of a device.
+ * @param fw_minor
+ *   A pointer to store the minor firmware version of a device.
+ * @param fw_patch
+ *   A pointer to store the firmware patch number of a device.
+ * @param etrack_id
+ *   A pointer to store the nvm version of a device.
+ */
+void rte_eth_dev_fw_version_get(uint8_t port_id, uint32_t *fw_major,
+	uint32_t *fw_minor, uint32_t *fw_patch, uint32_t *etrack_id);
+
+/**
  * Retrieve the supported packet types of an Ethernet device.
  *
  * When a packet type is announced as supported, it *must* be recognized by
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a021781..0cf94ed 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -151,6 +151,7 @@ DPDK_17.02 {
 	global:
 
 	_rte_eth_dev_reset;
+	rte_eth_dev_fw_version_get;
 	rte_flow_create;
 	rte_flow_destroy;
 	rte_flow_flush;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v3 2/4] net/e1000: add firmware version get
  @ 2017-01-04  8:47  4%           ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-01-04  8:47 UTC (permalink / raw)
  To: Yang, Qiming, dev, thomas.monjalon; +Cc: Horton, Remy

On 1/4/2017 3:14 AM, Yang, Qiming wrote:
> See the reply below.
> 
> -----Original Message-----
> From: Yigit, Ferruh 
> Sent: Tuesday, January 3, 2017 11:03 PM
> To: Yang, Qiming <qiming.yang@intel.com>; dev@dpdk.org; thomas.monjalon@6wind.com
> Cc: Horton, Remy <remy.horton@intel.com>
> Subject: Re: [PATCH v3 2/4] net/e1000: add firmware version get
> 
> On 12/27/2016 12:30 PM, Qiming Yang wrote:
>> This patch adds a new function eth_igb_fw_version_get.
>>
>> Signed-off-by: Qiming Yang <qiming.yang@intel.com>
>> ---
>> v3 changes:
>>  * use eth_igb_fw_version_get(struct rte_eth_dev *dev, u32 *fw_major,
>>    u32 *fw_minor, u32 *fw_minor, u32 *fw_patch, u32 *etrack_id) instead
>>    of eth_igb_fw_version_get(struct rte_eth_dev *dev, char *fw_version,
>>    int fw_length). Add statusment in /doc/guides/nics/features/igb.ini.
>> ---
>> ---
>>  doc/guides/nics/features/igb.ini |  1 +
>>  drivers/net/e1000/igb_ethdev.c   | 43 ++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 44 insertions(+)
>>
>> diff --git a/doc/guides/nics/features/igb.ini 
>> b/doc/guides/nics/features/igb.ini
>> index 9fafe72..ffd87ba 100644
>> --- a/doc/guides/nics/features/igb.ini
>> +++ b/doc/guides/nics/features/igb.ini
>> @@ -39,6 +39,7 @@ EEPROM dump          = Y
>>  Registers dump       = Y
>>  BSD nic_uio          = Y
>>  Linux UIO            = Y
>> +FW version           = Y
> 
> Please keep same location with default.ini file. Why you are putting this just into middle of the uio and vfio?
> Qiming: It's a clerical error, I want to add this line at the end of this file.
> 
>>  Linux VFIO           = Y
>>  x86-32               = Y
>>  x86-64               = Y
>> diff --git a/drivers/net/e1000/igb_ethdev.c 
>> b/drivers/net/e1000/igb_ethdev.c index 4a15447..25344b7 100644
>> --- a/drivers/net/e1000/igb_ethdev.c
>> +++ b/drivers/net/e1000/igb_ethdev.c
>> @@ -120,6 +120,8 @@ static int eth_igb_xstats_get_names(struct rte_eth_dev *dev,
>>  				    unsigned limit);
>>  static void eth_igb_stats_reset(struct rte_eth_dev *dev);  static 
>> void eth_igb_xstats_reset(struct rte_eth_dev *dev);
>> +static void eth_igb_fw_version_get(struct rte_eth_dev *dev, u32 *fw_major,
>> +		u32 *fw_minor, u32 *fw_patch, u32 *etrack_id);
> 
> I think you can use a struct as parameter here. But beware, that struct should NOT be a public struct.
> Qiming: I think only add a private struct for igb is unnecessary. Keep the arguments consistent with rte_eth_dev_fw_info_get is better.
> What do you think?

Both are OK.
Normally, I believe using struct is better. But we are not using struct
in public API because of the ABI compatibility issues. Here it is
internal usage, there is no ABI breakage concern, so it may be possible
to use a struct.
But if you prefer to keep the arguments same here with public API, that
is fine.

> 
>>  static void eth_igb_infos_get(struct rte_eth_dev *dev,
>>  			      struct rte_eth_dev_info *dev_info);  static const uint32_t 
>> *eth_igb_supported_ptypes_get(struct rte_eth_dev *dev); @@ -389,6 
>> +391,7 @@ static const struct eth_dev_ops eth_igb_ops = {
>>  	.xstats_get_names     = eth_igb_xstats_get_names,
>>  	.stats_reset          = eth_igb_stats_reset,
>>  	.xstats_reset         = eth_igb_xstats_reset,
>> +	.fw_version_get       = eth_igb_fw_version_get,
>>  	.dev_infos_get        = eth_igb_infos_get,
>>  	.dev_supported_ptypes_get = eth_igb_supported_ptypes_get,
>>  	.mtu_set              = eth_igb_mtu_set,
>> @@ -1981,6 +1984,46 @@ eth_igbvf_stats_reset(struct rte_eth_dev *dev)  
>> }
>>  
> 
> <...>
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 10/24] ethdev: parse ethertype filter
  2016-12-23  8:43  3%       ` Adrien Mazarguil
@ 2016-12-27  6:36  0%         ` Xing, Beilei
  0 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2016-12-27  6:36 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Yigit, Ferruh, Wu, Jingjing, Zhang, Helin, dev, Lu, Wenzhuo



> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Friday, December 23, 2016 4:43 PM
> To: Xing, Beilei <beilei.xing@intel.com>
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 10/24] ethdev: parse ethertype filter
> 
> Hi all,
> 
> On Wed, Dec 21, 2016 at 03:54:50AM +0000, Xing, Beilei wrote:
> > Hi Ferruh,
> >
> > > -----Original Message-----
> > > From: Yigit, Ferruh
> > > Sent: Wednesday, December 21, 2016 2:12 AM
> > > To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Adrien
> > > Mazarguil <adrien.mazarguil@6wind.com>
> > > Subject: Re: [dpdk-dev] [PATCH 10/24] ethdev: parse ethertype filter
> > >
> > > On 12/2/2016 11:53 AM, Beilei Xing wrote:
> > > > Check if the rule is a ethertype rule, and get the ethertype info BTW.
> > > >
> > > > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> > > > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> > > > ---
> > >
> > > CC: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> 
> Thanks again for CC'ing me.
> 
> > > >  lib/librte_ether/rte_flow.c        | 136
> > > +++++++++++++++++++++++++++++++++++++
> > > >  lib/librte_ether/rte_flow_driver.h |  34 ++++++++++
> > >
> > > <...>
> > >
> > > > diff --git a/lib/librte_ether/rte_flow_driver.h
> > > > b/lib/librte_ether/rte_flow_driver.h
> > > > index a88c621..2760c74 100644
> > > > --- a/lib/librte_ether/rte_flow_driver.h
> > > > +++ b/lib/librte_ether/rte_flow_driver.h
> > > > @@ -170,6 +170,40 @@ rte_flow_error_set(struct rte_flow_error
> > > > *error, const struct rte_flow_ops *  rte_flow_ops_get(uint8_t
> > > > port_id, struct rte_flow_error *error);
> > > >
> > > > +int cons_parse_ethertype_filter(const struct rte_flow_attr *attr,
> > > > +			    const struct rte_flow_item *pattern,
> > > > +			    const struct rte_flow_action *actions,
> > > > +			    struct rte_eth_ethertype_filter *filter,
> > > > +			    struct rte_flow_error *error);
> > >
> > > Although this is helper function, it may be good if it follows the
> > > rte_follow namespace.
> >
> > OK, I will rename it in the next version, thanks very much.
> 
> Agreed, all public symbols exposed by headers must be prefixed with
> rte_flow.
> 
> Now I'm not so sure about the need to convert a rte_flow rule to a
> rte_eth_ethertype_filter. This definition basically makes rte_flow depend on
> rte_eth_ctrl.h (related #include is missing by the way).
> 

Since the whole implementation of parse function is modified, there'll be no common rte_eth_ethertype_filter here temporarily.

> I understand that both ixgbe and i40e would benefit from it, and considering
> rte_flow_driver.h is free from ABI versioning I guess it's acceptable, but
> remember we'll gradually remove existing filter types so we should avoid
> new dependencies on them. Just keep in mind this will be temporary.
> 

 i40e and ixgbe all use existing filter types in rte_flow_driver.h. if all existing filter types will be removed, we need to change the fiter info after applied.

> Please add full documentation as well in Doxygen style like for existing
> symbols. We have to maintain this API properly documented.
> 
> > > > +
> > > > +#define PATTERN_SKIP_VOID(filter, filter_struct, error_type)
> > > 	\
> > > > +	do {								\
> > > > +		if (!pattern) {						\
> > > > +			memset(filter, 0, sizeof(filter_struct));	\
> > > > +			error->type = error_type;                       \
> > > > +			return -EINVAL;
> > > 	\
> > > > +		}							\
> > > > +		item = pattern + i;					\
> > >
> > > I believe macros that relies on variables that not passed as
> > > argument is not good idea.
> >
> > Yes, I'm reworking the macros, and it will be changed in v2.
> >
> > >
> > > > +		while (item->type == RTE_FLOW_ITEM_TYPE_VOID) {
> > > 	\
> > > > +			i++;						\
> > > > +			item = pattern + i;				\
> > > > +		}							\
> > > > +	} while (0)
> > > > +
> > > > +#define ACTION_SKIP_VOID(filter, filter_struct, error_type)
> > > 	\
> > > > +	do {								\
> > > > +		if (!actions) {						\
> > > > +			memset(filter, 0, sizeof(filter_struct));	\
> > > > +			error->type = error_type;			\
> > > > +			return -EINVAL;
> > > 	\
> > > > +		}							\
> > > > +		act = actions + i;					\
> > > > +		while (act->type == RTE_FLOW_ACTION_TYPE_VOID) {	\
> > > > +			i++;						\
> > > > +			act = actions + i;				\
> > > > +		}							\
> > > > +	} while (0)
> > >
> > > Are these macros generic enough for all rte_flow consumers?
> > >
> > > What do you think separate this patch, and use these after applied,
> > > meanwhile keeping function and MACROS PMD internal?
> >
> > The main purpose of the macros is to reduce the code in PMD, otherwise
> > there'll be many such codes to get the next non-void item in all parse
> > functions, including the parse_ethertype_filter function in
> > rte_flow.c. But actually I'm not very sure if it's generic enough for
> > all consumers, although I think it's general at present:)
> 
> I'll concede skipping VOIDs can be tedious depending on the parser
> implementation, but I do not think these macros need to be exposed either.
> PMDs can duplicate some code such as this.
> 
> I think ixgbe and i40e share a fair amount of code already, and factoring it
> should be part of larger task to create a common Intel-specific library instead.


Good point. Thanks. We'll consider related implementation for the common code.
In V2 patch set, there'll be no common code temporarily since the implementation of parsing functions is different between ixgbe and i40e.

> 
> > Thanks for your advice, I'll move the macros to PMD currently, then there'll
> be no macros used in parse_ethertype_filter function, and optimize it after
> applied.
> >
> > BTW, I plan to send out V2 patch set in this week.
> >
> > Best Regards,
> > Beilei
> >
> > >
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > >
> >
> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw version get
  2016-12-22 15:31  0%             ` Thomas Monjalon
@ 2016-12-23 12:48  0%               ` Ferruh Yigit
  2017-01-05  3:04  3%               ` Zhang, Helin
  1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2016-12-23 12:48 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Qiming Yang, dev, Remy Horton

On 12/22/2016 3:31 PM, Thomas Monjalon wrote:
> 2016-12-22 15:05, Ferruh Yigit:
>> On 12/22/2016 2:47 PM, Thomas Monjalon wrote:
>>> 2016-12-22 14:36, Ferruh Yigit:
>>>> On 12/22/2016 11:07 AM, Thomas Monjalon wrote:
>>>>> I think it is OK to add a new dev_ops and a new API function for firmware
>>>>> query. Generally speaking, it is a good thing to avoid putting all
>>>>> informations in the same structure (e.g. rte_eth_dev_info). 
>>>>
>>>> OK.
>>>>
>>>>> However, there
>>>>> is a balance to find. Could we plan to add more info to this new query?
>>>>> Instead of
>>>>> 	rte_eth_dev_fwver_get(uint8_t port_id, char *fw_version, int fw_length)
>>> [...]
>>>>> could it fill a struct?
>>>>> 	rte_eth_dev_fw_info_get(uint8_t port_id, struct rte_eth_dev_fw_info *fw_info)
>>>>
>>>> I believe this is better. But the problem we are having with this usage
>>>> is: ABI breakage.
>>>>
>>>> Since this struct will be a public structure, in the future if we want
>>>> to add a new field to the struct, it will break the ABI, and just this
>>>> change will cause a new version for whole ethdev library!
>>>>
>>>> When all required fields received via arguments, one by one, instead of
>>>> struct, at least ABI versioning can be done on the API when new field
>>>> added, and can be possible to escape from ABI breakage. But this will be
>>>> ugly when number of arguments increased.
>>>>
>>>> Or any other opinion on how to define API to reduce ABI breakage?
>>>
>>> You're right.
>>> But I don't think we should have a function per data. Just because it would
>>> be ugly :)
>>
>> I am no suggesting function per data, instead something like:
>>
>> rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);
>>
>> And in the future if we need etrack_id too, we can have both in
>> versioned manner:
>>
>> rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);
>>
>> rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min,
>> 	uint32_t etrack_id);
> 
> Oh I see. So it can be versioned with compat macros.
> 
>> So my concern was if the number of the arguments becomes too many by time.
> 
> It looks to be a good proposal. We should not have a dozen of arguments.
> 

So, I suggest trying this approach in this API.


Overall, change request for the patch becomes:
1- Change API, is following arguments good enough to start with?:
- FW_major_number
- FW_minor_number
- FW_patch_number
- Etrack_id

If so, API becomes:
rte_eth_dev_fw_version_get(uint8_t port_id, uint32_t *fw_major,
	uint32_t *fw_minor, uint32_t *fw_patch, uint32_t *etrack_id);

! Note, I have renamed API to rte_eth_dev_fw_version_get() from
rte_eth_dev_fw_info_get() mentioned above, to narrow the scope of API.

and dev_ops name keeps same: fw_version_get

2- Add new feature in feature table (doc/guides/nics/features/), first
patch can add to the default one, and each driver patch implements this
feature should update its feature table.
Feature name can be "FW version"

3- Remove deprecation notice in the first patch.


Thanks,
ferruh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
  @ 2016-12-23  9:45  4%           ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-23  9:45 UTC (permalink / raw)
  To: Billy McFall
  Cc: Ananyev, Konstantin, thomas.monjalon, Lu, Wenzhuo, dev,
	Stephen Hemminger

Hi Billy,

On Tue, Dec 20, 2016 at 09:15:50AM -0500, Billy McFall wrote:
> Thank you for your responses, see inline.
> 
> On Tue, Dec 20, 2016 at 7:58 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Tue, Dec 20, 2016 at 12:17:10PM +0000, Ananyev, Konstantin wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> >> > Sent: Tuesday, December 20, 2016 11:28 AM
> >> > To: Billy McFall <bmcfall@redhat.com>
> >> > Cc: thomas.monjalon@6wind.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; dev@dpdk.org; Stephen Hemminger
> >> > <stephen@networkplumber.org>
> >> > Subject: Re: [dpdk-dev] [PATCH 1/3] ethdev: New API to free consumed buffers in TX ring
> >> >
> >> > Hi Billy,
> >> >
> >> > On Fri, Dec 16, 2016 at 07:48:49AM -0500, Billy McFall wrote:
> >> > > Add a new API to force free consumed buffers on TX ring. API will return
> >> > > the number of packets freed (0-n) or error code if feature not supported
> >> > > (-ENOTSUP) or input invalid (-ENODEV).
> >> > >
> >> > > Because rte_eth_tx_buffer() may be used, and mbufs may still be held
> >> > > in local buffer, the API also accepts *buffer and *sent. Before
> >> > > attempting to free, rte_eth_tx_buffer_flush() is called to make sure
> >> > > all mbufs are sent to Tx ring. rte_eth_tx_buffer_flush() is called even
> >> > > if threshold is not met.
> >> > >
> >> > > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> >> > > ---
> >> > >  lib/librte_ether/rte_ethdev.h | 56 +++++++++++++++++++++++++++++++++++++++++++
> >> > >  1 file changed, 56 insertions(+)
> >> > >
> >> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> >> > > index 9678179..e3f2be4 100644
> >> > > --- a/lib/librte_ether/rte_ethdev.h
> >> > > +++ b/lib/librte_ether/rte_ethdev.h
> >> > > @@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
> >> > >  typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
> >> > >  /**< @internal Check DD bit of specific RX descriptor */
> >> > >
> >> > > +typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
> >> > > +/**< @internal Force mbufs to be from TX ring. */
> >> > > +
> >> > >  typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
> >> > >   uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
> >> > >
> >> > > @@ -1467,6 +1470,7 @@ struct eth_dev_ops {
> >> > >   eth_rx_disable_intr_t      rx_queue_intr_disable;
> >> > >   eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
> >> > >   eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> >> > > + eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
> >> > >   eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
> >> > >   eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
> >> > >   flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
> >> > > @@ -2943,6 +2947,58 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
> >> > >  }
> >> > >
> >> > >  /**
> >> > > + * Request the driver to free mbufs currently cached by the driver. The
> >> > > + * driver will only free the mbuf if it is no longer in use.
> >> > > + *
> >> > > + * @param port_id
> >> > > + *   The port identifier of the Ethernet device.
> >> > > + * @param queue_id
> >> > > + *   The index of the transmit queue through which output packets must be
> >> > > + *   sent.
> >> > > + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> >> > > + *   to rte_eth_dev_configure().
> >> > > + * @param free_cnt
> >> > > + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> >> > > + *   should be freed. Note that a packet may be using multiple mbufs.
> >> > > + * @param buffer
> >> > > + *   Buffer used to collect packets to be sent. If provided, the buffer will
> >> > > + *   be flushed, even if the current length is less than buffer->size. Pass NULL
> >> > > + *   if buffer has already been flushed.
> >> > > + * @param sent
> >> > > + *   Pointer to return number of packets sent if buffer has packets to be sent.
> >> > > + *   If *buffer is supplied, *sent must also be supplied.
> >> > > + * @return
> >> > > + *   Failure: < 0
> >> > > + *     -ENODEV: Invalid interface
> >> > > + *     -ENOTSUP: Driver does not support function
> >> > > + *   Success: >= 0
> >> > > + *     0-n: Number of packets freed. More packets may still remain in ring that
> >> > > + *     are in use.
> >> > > + */
> >> > > +
> >> > > +static inline int
> >> > > +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt,
> >> > > +         struct rte_eth_dev_tx_buffer *buffer, uint16_t *sent)
> >> > > +{
> >> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >> > > +
> >> > > + /* Validate Input Data. Bail if not valid or not supported. */
> >> > > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >> > > + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> >> > > +
> >> > > + /*
> >> > > +  * If transmit buffer is provided and there are still packets to be
> >> > > +  * sent, then send them before attempting to free pending mbufs.
> >> > > +  */
> >> > > + if (buffer && sent)
> >> > > +         *sent = rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
> >> > > +
> >> > > + /* Call driver to free pending mbufs. */
> >> > > + return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> >> > > +                 free_cnt);
> >> > > +}
> >> > > +
> >> > > +/**
> >> > >   * Configure a callback for buffered packets which cannot be sent
> >> > >   *
> >> > >   * Register a specific callback to be called when an attempt is made to send
> >> >
> 
> I will remove the buffer/sent parameters. It will be the applications
> responsibility
> to make sure rte_eth_tx_buffer_flush() is called.
> 
> I don't feel strongly about the free_cnt parameter. It was in the
> original request
> so that if there was a large ring buffer, the API could bail early
> without having
> to go through all the entire ring. It might be a little unrealistic
> for the application
> to truly know how many mbufs it wants freed. Also, as an example, the I40e
> driver already has a i40e_tx_free_bufs(...) function, so by dropping
> the free_cnt
> parameter, this function could be reused without having to account for
> the free_cnt.
> 
> >> > Just a thought to follow-up on Stephen's comment to further simplify this
> >> > API, how about not adding any new eth_dev_ops but instead defining what
> >> > should happen during an empty TX burst call (tx_burst() with 0 packets).
> >> >
> 
> In the original API request thread, see dpdk-dev mailing list from 11/21/2016
> with subject "Adding API to force freeing consumed buffers in TX ring",
> overloading the existing API with nb_pkts == 0 was suggested and consensus
> was to go with new API. I lean towards a new API since this is a special case
> most applications won't use, but I will go with the community on whether to
> enhance the existing burst functionality or add a new API.

OK, I've just read the original thread.

> >> > Several PMDs already have a check for this scenario and start by cleaning up
> >> > completed packets anyway, they effectively partially implement this
> >> > definition for free already.
> >>
> >> Many PMDs  start by cleaning up only when number of free entries
> >> drop below some point.
> 
> True, but the original request for this API was for the scenario where packets
> are being flooded and the application wanted to reuse mbuf to avoid a packet
> copy. So the API was to request the driver to free "done" mbufs outside of any
> threshold.

Understood, so it's more than just a polite suggestion to PMDs that
implement this call. In my opinion it's still better to avoid adding a new
callback for that purpose since applications cannot rely on a specific
outcome, it cannot guarantee any mbuf would be freed, not unlike calling
tx_burst() with 0 packets.

That's a separate discussion, however perhaps making struct eth_dev_ops part
of the public API was not such a good idea after all. We're unable to
maintain ABI compatibility across releases because of it.

New callbacks would be met with less resistance (at least on my side) if
this whole ABI compat thing was not an issue.

> >> Also in that case the author would have to modify (and test) all existing TX routinies.
> >> So I think a separate API call seems more plausible.
> >
> > Not necessarily, as I understand this API in its current form only suggests
> > that a PMD should release a few mbufs from a queue if possible, without any
> > guarantee, PMDs are not forced to comply.
> >
> > I think the threshold you mention is a valid reason not to release them, and
> > it wouldn't change a thing to existing tx_burst() implementations in the
> > meantime (only documentation).
> >
> > This threshold could also be bypassed rather painlessly in the
> > "if (unlikely(nb_pkts == 0))" case that all PMDs already check for in a
> > way or another.
> >
> >> Though I am agree with previous comment from Stephen that last two parameters
> >> are redundant and would just overcomplicate things.
> >> tin
> >>
> >> >
> >> > The main difference with this API would be that you wouldn't know how many
> >> > mbufs were freed and wouldn't collect them into an array. However most
> >> > applications have one mbuf pool and/or know where they come from, so they
> >> > can just query the pool or attempt to re-allocate from it after doing empty
> >> > bursts in case of starvation.
> >> >
> >> > [1] http://dpdk.org/ml/archives/dev/2016-December/052469.html
> >
> > --
> > Adrien Mazarguil
> > 6WIND

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 10/24] ethdev: parse ethertype filter
  @ 2016-12-23  8:43  3%       ` Adrien Mazarguil
  2016-12-27  6:36  0%         ` Xing, Beilei
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-12-23  8:43 UTC (permalink / raw)
  To: Xing, Beilei; +Cc: Yigit, Ferruh, Wu, Jingjing, Zhang, Helin, dev, Lu, Wenzhuo

Hi all,

On Wed, Dec 21, 2016 at 03:54:50AM +0000, Xing, Beilei wrote:
> Hi Ferruh,
> 
> > -----Original Message-----
> > From: Yigit, Ferruh
> > Sent: Wednesday, December 21, 2016 2:12 AM
> > To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>
> > Subject: Re: [dpdk-dev] [PATCH 10/24] ethdev: parse ethertype filter
> > 
> > On 12/2/2016 11:53 AM, Beilei Xing wrote:
> > > Check if the rule is a ethertype rule, and get the ethertype info BTW.
> > >
> > > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> > > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> > > ---
> > 
> > CC: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Thanks again for CC'ing me.

> > >  lib/librte_ether/rte_flow.c        | 136
> > +++++++++++++++++++++++++++++++++++++
> > >  lib/librte_ether/rte_flow_driver.h |  34 ++++++++++
> > 
> > <...>
> > 
> > > diff --git a/lib/librte_ether/rte_flow_driver.h
> > > b/lib/librte_ether/rte_flow_driver.h
> > > index a88c621..2760c74 100644
> > > --- a/lib/librte_ether/rte_flow_driver.h
> > > +++ b/lib/librte_ether/rte_flow_driver.h
> > > @@ -170,6 +170,40 @@ rte_flow_error_set(struct rte_flow_error *error,
> > > const struct rte_flow_ops *  rte_flow_ops_get(uint8_t port_id, struct
> > > rte_flow_error *error);
> > >
> > > +int cons_parse_ethertype_filter(const struct rte_flow_attr *attr,
> > > +			    const struct rte_flow_item *pattern,
> > > +			    const struct rte_flow_action *actions,
> > > +			    struct rte_eth_ethertype_filter *filter,
> > > +			    struct rte_flow_error *error);
> > 
> > Although this is helper function, it may be good if it follows the rte_follow
> > namespace.
> 
> OK, I will rename it in the next version, thanks very much.

Agreed, all public symbols exposed by headers must be prefixed with
rte_flow.

Now I'm not so sure about the need to convert a rte_flow rule to a
rte_eth_ethertype_filter. This definition basically makes rte_flow depend on
rte_eth_ctrl.h (related #include is missing by the way).

I understand that both ixgbe and i40e would benefit from it, and considering
rte_flow_driver.h is free from ABI versioning I guess it's acceptable, but
remember we'll gradually remove existing filter types so we should avoid new
dependencies on them. Just keep in mind this will be temporary.

Please add full documentation as well in Doxygen style like for existing
symbols. We have to maintain this API properly documented.

> > > +
> > > +#define PATTERN_SKIP_VOID(filter, filter_struct, error_type)
> > 	\
> > > +	do {								\
> > > +		if (!pattern) {						\
> > > +			memset(filter, 0, sizeof(filter_struct));	\
> > > +			error->type = error_type;                       \
> > > +			return -EINVAL;
> > 	\
> > > +		}							\
> > > +		item = pattern + i;					\
> > 
> > I believe macros that relies on variables that not passed as argument is not
> > good idea.
> 
> Yes, I'm reworking the macros, and it will be changed in v2.
> 
> > 
> > > +		while (item->type == RTE_FLOW_ITEM_TYPE_VOID) {
> > 	\
> > > +			i++;						\
> > > +			item = pattern + i;				\
> > > +		}							\
> > > +	} while (0)
> > > +
> > > +#define ACTION_SKIP_VOID(filter, filter_struct, error_type)
> > 	\
> > > +	do {								\
> > > +		if (!actions) {						\
> > > +			memset(filter, 0, sizeof(filter_struct));	\
> > > +			error->type = error_type;			\
> > > +			return -EINVAL;
> > 	\
> > > +		}							\
> > > +		act = actions + i;					\
> > > +		while (act->type == RTE_FLOW_ACTION_TYPE_VOID) {	\
> > > +			i++;						\
> > > +			act = actions + i;				\
> > > +		}							\
> > > +	} while (0)
> > 
> > Are these macros generic enough for all rte_flow consumers?
> > 
> > What do you think separate this patch, and use these after applied,
> > meanwhile keeping function and MACROS PMD internal?
> 
> The main purpose of the macros is to reduce the code in PMD, otherwise there'll be many such codes to get the next non-void item in all parse functions, including the parse_ethertype_filter function in rte_flow.c. But actually I'm not very sure if it's generic enough for all consumers, although I think it's general at present:) 

I'll concede skipping VOIDs can be tedious depending on the parser
implementation, but I do not think these macros need to be exposed
either. PMDs can duplicate some code such as this.

I think ixgbe and i40e share a fair amount of code already, and factoring it
should be part of larger task to create a common Intel-specific library
instead.

> Thanks for your advice, I'll move the macros to PMD currently, then there'll be no macros used in parse_ethertype_filter function, and optimize it after applied.
> 
> BTW, I plan to send out V2 patch set in this week.
> 
> Best Regards,
> Beilei
> 
> > 
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > >
> 

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats from PF
  2016-12-21  0:56  3%         ` Lu, Wenzhuo
@ 2016-12-22 16:38  0%           ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2016-12-22 16:38 UTC (permalink / raw)
  To: Lu, Wenzhuo, Yigit, Ferruh, dev
  Cc: Wu, Jingjing, Zhang, Helin, Zhang, Qi Z, Chen, Jing D, Iremonger,
	Bernard



> -----Original Message-----
> From: Lu, Wenzhuo
> Sent: Wednesday, December 21, 2016 12:56 AM
> To: Iremonger, Bernard <bernard.iremonger@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; dev@dpdk.org
> Cc: Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
> <helin.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Chen, Jing D
> <jing.d.chen@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats from PF
> 
> Hi all,
> 
> 
> > -----Original Message-----
> > From: Iremonger, Bernard
> > Sent: Tuesday, December 20, 2016 9:40 PM
> > To: Yigit, Ferruh; dev@dpdk.org
> > Cc: Wu, Jingjing; Zhang, Helin; Zhang, Qi Z; Lu, Wenzhuo; Chen, Jing D
> > Subject: RE: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats
> > from PF
> >
> > Hi Ferruh,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > > Sent: Tuesday, December 20, 2016 1:25 PM
> > > To: dev@dpdk.org
> > > Cc: Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
> > > <helin.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Lu,
> > > Wenzhuo <wenzhuo.lu@intel.com>; Chen, Jing D
> <jing.d.chen@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF
> > > stats from PF
> > >
> > > On 12/16/2016 7:02 PM, Ferruh Yigit wrote:
> > > > From: Qi Zhang <qi.z.zhang@intel.com>
> > > >
> > > > This patch add support to get/clear VF statistics from PF side.
> > > > Two APIs are added:
> > > > rte_pmd_i40e_get_vf_stats.
> > > > rte_pmd_i40e_reset_vf_stats.
> > > >
> > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > ---
> > >
> > > <...>
> > >
> > > > diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > index 8ac1bc8..7a5d211 100644
> > > > --- a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > +++ b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > @@ -6,7 +6,9 @@ DPDK_2.0 {
> > > >  DPDK_17.02 {
> > > >  	global:
> > > >
> > > > +	rte_pmd_i40e_get_vf_stats;
> > > >  	rte_pmd_i40e_ping_vfs;
> > > > +	rte_pmd_i40e_reset_vf_stats;
> > > >  	rte_pmd_i40e_set_tx_loopback;
> > > >  	rte_pmd_i40e_set_vf_broadcast;
> > > >  	rte_pmd_i40e_set_vf_mac_addr;
> > >
> > > Hi Wenzhuo, Mark,
> > >
> > > I think this is the list of all APIs added with this patchset.
> > >
> > > Just a question, what do you think following a logic in API naming as:
> > > <name_space>_<object>_<action> ?
> > >
> > > So API names become:
> > > rte_pmd_i40e_tx_loopback_set;
> > > rte_pmd_i40e_vf_broadcast_set;
> > > rte_pmd_i40e_vf_mac_addr_set;
> > > rte_pmd_i40e_vfs_ping;
> > > rte_pmd_i40e_vf_stats_get;
> > > rte_pmd_i40e_vf_stats_reset;
> > >
> > >
> > > After above rename, rte_pmd_i40e_tx_loopback_set() is not giving a
> > > hint that this is something related to the PF controlling VF,
> > > perhaps we can rename the API ?
> > >
> > > Also rte_pmd_i40e_vfs_ping() can become rte_pmd_i40e_vf_ping_all()
> > > to be more consistent about _vf_ usage.
> > >
> > > Overall, they can be something like:
> > > rte_pmd_i40e_vf_broadcast_set;
> > > rte_pmd_i40e_vf_mac_addr_set;
> > > rte_pmd_i40e_vf_ping_all;
> > > rte_pmd_i40e_vf_stats_get;
> > > rte_pmd_i40e_vf_stats_reset;
> > > rte_pmd_i40e_vf_tx_loopback_set;
> > >
> > > What do you think?
> > >
> >
> > I think the naming should be consistent with what has already been
> > implemented for the ixgbe PMD.
> > 	rte_pmd_ixgbe_set_all_queues_drop_en;
> > 	rte_pmd_ixgbe_set_tx_loopback;
> > 	rte_pmd_ixgbe_set_vf_mac_addr;
> > 	rte_pmd_ixgbe_set_vf_mac_anti_spoof;
> > 	rte_pmd_ixgbe_set_vf_split_drop_en;
> > 	rte_pmd_ixgbe_set_vf_vlan_anti_spoof;
> > 	rte_pmd_ixgbe_set_vf_vlan_insert;
> > 	rte_pmd_ixgbe_set_vf_vlan_stripq;
> >
> > 	rte_pmd_ixgbe_set_vf_rate_limit;
> > 	rte_pmd_ixgbe_set_vf_rx;
> > 	rte_pmd_ixgbe_set_vf_rxmode;
> > 	rte_pmd_ixgbe_set_vf_tx;
> > 	rte_pmd_ixgbe_set_vf_vlan_filter;
> So, seems better to use the current names. Rework both ixgbe and i40e's
> later. Not sure if it'll be counted as the ABI change if we change the ixgbe's
> name.
> 

A similar naming convention was used originally in the ethdev:
rte_eth_dev_set_vf_rxmode
rte_eth_dev_set_vf_rx
rte_eth_dev_set_vf_tx
rte_eth_dev_set_vf_vlan_filter
rte_eth_dev_set_vf_rate_limit

rte_eth_dev has just been replaced with rte_pmd_<ixgbe|i40e>

Regards,

Bernard.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw version get
  2016-12-22 15:05  0%           ` Ferruh Yigit
@ 2016-12-22 15:31  0%             ` Thomas Monjalon
  2016-12-23 12:48  0%               ` Ferruh Yigit
  2017-01-05  3:04  3%               ` Zhang, Helin
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2016-12-22 15:31 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Qiming Yang, dev, Remy Horton

2016-12-22 15:05, Ferruh Yigit:
> On 12/22/2016 2:47 PM, Thomas Monjalon wrote:
> > 2016-12-22 14:36, Ferruh Yigit:
> >> On 12/22/2016 11:07 AM, Thomas Monjalon wrote:
> >>> I think it is OK to add a new dev_ops and a new API function for firmware
> >>> query. Generally speaking, it is a good thing to avoid putting all
> >>> informations in the same structure (e.g. rte_eth_dev_info). 
> >>
> >> OK.
> >>
> >>> However, there
> >>> is a balance to find. Could we plan to add more info to this new query?
> >>> Instead of
> >>> 	rte_eth_dev_fwver_get(uint8_t port_id, char *fw_version, int fw_length)
> > [...]
> >>> could it fill a struct?
> >>> 	rte_eth_dev_fw_info_get(uint8_t port_id, struct rte_eth_dev_fw_info *fw_info)
> >>
> >> I believe this is better. But the problem we are having with this usage
> >> is: ABI breakage.
> >>
> >> Since this struct will be a public structure, in the future if we want
> >> to add a new field to the struct, it will break the ABI, and just this
> >> change will cause a new version for whole ethdev library!
> >>
> >> When all required fields received via arguments, one by one, instead of
> >> struct, at least ABI versioning can be done on the API when new field
> >> added, and can be possible to escape from ABI breakage. But this will be
> >> ugly when number of arguments increased.
> >>
> >> Or any other opinion on how to define API to reduce ABI breakage?
> > 
> > You're right.
> > But I don't think we should have a function per data. Just because it would
> > be ugly :)
> 
> I am no suggesting function per data, instead something like:
> 
> rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);
> 
> And in the future if we need etrack_id too, we can have both in
> versioned manner:
> 
> rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);
> 
> rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min,
> 	uint32_t etrack_id);

Oh I see. So it can be versioned with compat macros.

> So my concern was if the number of the arguments becomes too many by time.

It looks to be a good proposal. We should not have a dozen of arguments.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] ethdev: cleanup device ops struct whitespace
  2016-12-22 15:16  3%     ` Ferruh Yigit
@ 2016-12-22 15:28  3%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-12-22 15:28 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Jan Blunck, dev

2016-12-22 15:16, Ferruh Yigit:
> On 12/22/2016 3:10 PM, Jan Blunck wrote:
> > On Thu, Dec 22, 2016 at 2:10 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >> - Grouped related items using empty lines
> > 
> > Reordering fields of a struct is breaking ABI. We should bump the
> > library version now.
> 
> You are right, sorry missed that.
> Intention was not to break the ABI, but cleanup.
> 
> Thomas,
> If it is too late, do you want me prepare a revert patch?

No
Please check doc/guides/rel_notes/deprecation.rst
We are going to break the ethdev ABI anyway.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] ethdev: cleanup device ops struct whitespace
  2016-12-22 15:10  3%   ` Jan Blunck
@ 2016-12-22 15:16  3%     ` Ferruh Yigit
  2016-12-22 15:28  3%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-12-22 15:16 UTC (permalink / raw)
  To: Jan Blunck; +Cc: dev, Thomas Monjalon

On 12/22/2016 3:10 PM, Jan Blunck wrote:
> On Thu, Dec 22, 2016 at 2:10 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> - Grouped related items using empty lines
> 
> Reordering fields of a struct is breaking ABI. We should bump the
> library version now.

You are right, sorry missed that.
Intention was not to break the ABI, but cleanup.

Thomas,
If it is too late, do you want me prepare a revert patch?

> 
> 
>> - Aligned arguments to same column
>> - All item comments that doesn't fit same line are placed blow the item
>>   itself
>> - Moved some comments to same line if overall line < 100 chars
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>
<...>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] ethdev: cleanup device ops struct whitespace
  @ 2016-12-22 15:10  3%   ` Jan Blunck
  2016-12-22 15:16  3%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2016-12-22 15:10 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Thomas Monjalon

On Thu, Dec 22, 2016 at 2:10 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> - Grouped related items using empty lines

Reordering fields of a struct is breaking ABI. We should bump the
library version now.


> - Aligned arguments to same column
> - All item comments that doesn't fit same line are placed blow the item
>   itself
> - Moved some comments to same line if overall line < 100 chars
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>
> ---
>
> - ! This patch has the problem of trashing the git history for the struct,
>   which is indeed valid argument.
> - Some re-ordering also may be required which I hesitate to do
> - Some item comments doesn't give extra information and can be removed
>
> v3:
> - group MAC, MTU, promisc and allmuti functions together
> - group rxq/txq_info_get with dev_infos_get
> - group l2_tunnel_* and udp_tunnel_* functions together
>
> v2:
> - extract mtu_set into new group
> - move rss_hash_* to reta_* group
> - move set_mc_addr_list to mac_addr_* group
> - move set_vf_rate_limit to set_vf_* group
> - move get_dcb_info out of timesync_* group
>
> To make it easy to comment to latest struct, copy-paste here:
>
> struct eth_dev_ops {
>         eth_dev_configure_t        dev_configure; /**< Configure device. */
>         eth_dev_start_t            dev_start;     /**< Start device. */
>         eth_dev_stop_t             dev_stop;      /**< Stop device. */
>         eth_dev_set_link_up_t      dev_set_link_up;   /**< Device link up. */
>         eth_dev_set_link_down_t    dev_set_link_down; /**< Device link down. */
>         eth_dev_close_t            dev_close;     /**< Close device. */
>         eth_link_update_t          link_update;   /**< Get device link state. */
>
>         eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
>         eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
>         eth_allmulticast_enable_t  allmulticast_enable;/**< RX multicast ON. */
>         eth_allmulticast_disable_t allmulticast_disable;/**< RX multicast OF. */
>         eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address. */
>         eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address. */
>         eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address. */
>         eth_set_mc_addr_list_t     set_mc_addr_list; /**< set list of mcast addrs. */
>         mtu_set_t                  mtu_set;       /**< Set MTU. */
>
>         eth_stats_get_t            stats_get;     /**< Get generic device statistics. */
>         eth_stats_reset_t          stats_reset;   /**< Reset generic device statistics. */
>         eth_xstats_get_t           xstats_get;    /**< Get extended device statistics. */
>         eth_xstats_reset_t         xstats_reset;  /**< Reset extended device statistics. */
>         eth_xstats_get_names_t     xstats_get_names;
>         /**< Get names of extended statistics. */
>         eth_queue_stats_mapping_set_t queue_stats_mapping_set;
>         /**< Configure per queue stat counter mapping. */
>
>         eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
>         eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
>         eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
>         eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
>         /**< Get packet types supported and identified by device. */
>
>         vlan_filter_set_t          vlan_filter_set; /**< Filter VLAN Setup. */
>         vlan_tpid_set_t            vlan_tpid_set; /**< Outer/Inner VLAN TPID Setup. */
>         vlan_strip_queue_set_t     vlan_strip_queue_set; /**< VLAN Stripping on queue. */
>         vlan_offload_set_t         vlan_offload_set; /**< Set VLAN Offload. */
>         vlan_pvid_set_t            vlan_pvid_set; /**< Set port based TX VLAN insertion. */
>
>         eth_queue_start_t          rx_queue_start;/**< Start RX for a queue. */
>         eth_queue_stop_t           rx_queue_stop; /**< Stop RX for a queue. */
>         eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
>         eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
>         eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
>         eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
>         eth_rx_queue_count_t       rx_queue_count;/**< Get Rx queue count. */
>         eth_rx_descriptor_done_t   rx_descriptor_done; /**< Check rxd DD bit. */
>         eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
>         eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
>         eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
>         eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
>
>         eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>         eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
>
>         flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
>         flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control. */
>         priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority flow control. */
>
>         eth_uc_hash_table_set_t    uc_hash_table_set; /**< Set Unicast Table Array. */
>         eth_uc_all_hash_table_set_t uc_all_hash_table_set; /**< Set Unicast hash bitmap. */
>
>         eth_mirror_rule_set_t      mirror_rule_set; /**< Add a traffic mirror rule. */
>         eth_mirror_rule_reset_t    mirror_rule_reset; /**< reset a traffic mirror rule. */
>
>         eth_set_vf_rx_mode_t       set_vf_rx_mode;/**< Set VF RX mode. */
>         eth_set_vf_rx_t            set_vf_rx;     /**< enable/disable a VF receive. */
>         eth_set_vf_tx_t            set_vf_tx;     /**< enable/disable a VF transmit. */
>         eth_set_vf_vlan_filter_t   set_vf_vlan_filter; /**< Set VF VLAN filter. */
>         eth_set_vf_rate_limit_t    set_vf_rate_limit; /**< Set VF rate limit. */
>
>         eth_udp_tunnel_port_add_t  udp_tunnel_port_add; /** Add UDP tunnel port. */
>         eth_udp_tunnel_port_del_t  udp_tunnel_port_del; /** Del UDP tunnel port. */
>         eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
>         /** Config ether type of l2 tunnel. */
>         eth_l2_tunnel_offload_set_t   l2_tunnel_offload_set;
>         /** Enable/disable l2 tunnel offload functions. */
>
>         eth_set_queue_rate_limit_t set_queue_rate_limit; /**< Set queue rate limit. */
>
>         rss_hash_update_t          rss_hash_update; /** Configure RSS hash protocols. */
>         rss_hash_conf_get_t        rss_hash_conf_get; /** Get current RSS hash configuration. */
>         reta_update_t              reta_update;   /** Update redirection table. */
>         reta_query_t               reta_query;    /** Query redirection table. */
>
>         eth_get_reg_t              get_reg;           /**< Get registers. */
>         eth_get_eeprom_length_t    get_eeprom_length; /**< Get eeprom length. */
>         eth_get_eeprom_t           get_eeprom;        /**< Get eeprom data. */
>         eth_set_eeprom_t           set_eeprom;        /**< Set eeprom. */
>
>         /* bypass control */
> \#ifdef RTE_NIC_BYPASS
>         bypass_init_t              bypass_init;
>         bypass_state_set_t         bypass_state_set;
>         bypass_state_show_t        bypass_state_show;
>         bypass_event_set_t         bypass_event_set;
>         bypass_event_show_t        bypass_event_show;
>         bypass_wd_timeout_set_t    bypass_wd_timeout_set;
>         bypass_wd_timeout_show_t   bypass_wd_timeout_show;
>         bypass_ver_show_t          bypass_ver_show;
>         bypass_wd_reset_t          bypass_wd_reset;
> \#endif
>
>         eth_filter_ctrl_t          filter_ctrl; /**< common filter control. */
>
>         eth_get_dcb_info           get_dcb_info; /** Get DCB information. */
>
>         eth_timesync_enable_t      timesync_enable;
>         /** Turn IEEE1588/802.1AS timestamping on. */
>         eth_timesync_disable_t     timesync_disable;
>         /** Turn IEEE1588/802.1AS timestamping off. */
>         eth_timesync_read_rx_timestamp_t timesync_read_rx_timestamp;
>         /** Read the IEEE1588/802.1AS RX timestamp. */
>         eth_timesync_read_tx_timestamp_t timesync_read_tx_timestamp;
>         /** Read the IEEE1588/802.1AS TX timestamp. */
>         eth_timesync_adjust_time   timesync_adjust_time; /** Adjust the device clock. */
>         eth_timesync_read_time     timesync_read_time; /** Get the device clock time. */
>         eth_timesync_write_time    timesync_write_time; /** Set the device clock time. */
> };
> ---
>  lib/librte_ether/rte_ethdev.h | 174 +++++++++++++++++++++---------------------
>  1 file changed, 85 insertions(+), 89 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 52119af..272fd41 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1431,11 +1431,18 @@ struct eth_dev_ops {
>         eth_dev_set_link_up_t      dev_set_link_up;   /**< Device link up. */
>         eth_dev_set_link_down_t    dev_set_link_down; /**< Device link down. */
>         eth_dev_close_t            dev_close;     /**< Close device. */
> +       eth_link_update_t          link_update;   /**< Get device link state. */
> +
>         eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
>         eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
>         eth_allmulticast_enable_t  allmulticast_enable;/**< RX multicast ON. */
>         eth_allmulticast_disable_t allmulticast_disable;/**< RX multicast OF. */
> -       eth_link_update_t          link_update;   /**< Get device link state. */
> +       eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address. */
> +       eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address. */
> +       eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address. */
> +       eth_set_mc_addr_list_t     set_mc_addr_list; /**< set list of mcast addrs. */
> +       mtu_set_t                  mtu_set;       /**< Set MTU. */
> +
>         eth_stats_get_t            stats_get;     /**< Get generic device statistics. */
>         eth_stats_reset_t          stats_reset;   /**< Reset generic device statistics. */
>         eth_xstats_get_t           xstats_get;    /**< Get extended device statistics. */
> @@ -1444,109 +1451,98 @@ struct eth_dev_ops {
>         /**< Get names of extended statistics. */
>         eth_queue_stats_mapping_set_t queue_stats_mapping_set;
>         /**< Configure per queue stat counter mapping. */
> +
>         eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
> +       eth_rxq_info_get_t         rxq_info_get; /**< retrieve RX queue information. */
> +       eth_txq_info_get_t         txq_info_get; /**< retrieve TX queue information. */
>         eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
> -       /**< Get packet types supported and identified by device*/
> -       mtu_set_t                  mtu_set; /**< Set MTU. */
> -       vlan_filter_set_t          vlan_filter_set;  /**< Filter VLAN Setup. */
> -       vlan_tpid_set_t            vlan_tpid_set;      /**< Outer/Inner VLAN TPID Setup. */
> +       /**< Get packet types supported and identified by device. */
> +
> +       vlan_filter_set_t          vlan_filter_set; /**< Filter VLAN Setup. */
> +       vlan_tpid_set_t            vlan_tpid_set; /**< Outer/Inner VLAN TPID Setup. */
>         vlan_strip_queue_set_t     vlan_strip_queue_set; /**< VLAN Stripping on queue. */
>         vlan_offload_set_t         vlan_offload_set; /**< Set VLAN Offload. */
> -       vlan_pvid_set_t            vlan_pvid_set; /**< Set port based TX VLAN insertion */
> -       eth_queue_start_t          rx_queue_start;/**< Start RX for a queue.*/
> -       eth_queue_stop_t           rx_queue_stop;/**< Stop RX for a queue.*/
> -       eth_queue_start_t          tx_queue_start;/**< Start TX for a queue.*/
> -       eth_queue_stop_t           tx_queue_stop;/**< Stop TX for a queue.*/
> -       eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue.*/
> -       eth_queue_release_t        rx_queue_release;/**< Release RX queue.*/
> -       eth_rx_queue_count_t       rx_queue_count; /**< Get Rx queue count. */
> -       eth_rx_descriptor_done_t   rx_descriptor_done;  /**< Check rxd DD bit */
> -       /**< Enable Rx queue interrupt. */
> -       eth_rx_enable_intr_t       rx_queue_intr_enable;
> -       /**< Disable Rx queue interrupt.*/
> -       eth_rx_disable_intr_t      rx_queue_intr_disable;
> -       eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
> -       eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
> +       vlan_pvid_set_t            vlan_pvid_set; /**< Set port based TX VLAN insertion. */
> +
> +       eth_queue_start_t          rx_queue_start;/**< Start RX for a queue. */
> +       eth_queue_stop_t           rx_queue_stop; /**< Stop RX for a queue. */
> +       eth_queue_start_t          tx_queue_start;/**< Start TX for a queue. */
> +       eth_queue_stop_t           tx_queue_stop; /**< Stop TX for a queue. */
> +       eth_rx_queue_setup_t       rx_queue_setup;/**< Set up device RX queue. */
> +       eth_queue_release_t        rx_queue_release; /**< Release RX queue. */
> +       eth_rx_queue_count_t       rx_queue_count;/**< Get Rx queue count. */
> +       eth_rx_descriptor_done_t   rx_descriptor_done; /**< Check rxd DD bit. */
> +       eth_rx_enable_intr_t       rx_queue_intr_enable;  /**< Enable Rx queue interrupt. */
> +       eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt. */
> +       eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue. */
> +       eth_queue_release_t        tx_queue_release; /**< Release TX queue. */
> +
>         eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
>         eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
> +
>         flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
>         flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control. */
> -       priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority flow control.*/
> -       eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
> -       eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
> -       eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
> -       eth_uc_hash_table_set_t    uc_hash_table_set;  /**< Set Unicast Table Array */
> -       eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast hash bitmap */
> -       eth_mirror_rule_set_t      mirror_rule_set;  /**< Add a traffic mirror rule.*/
> -       eth_mirror_rule_reset_t    mirror_rule_reset;  /**< reset a traffic mirror rule.*/
> -       eth_set_vf_rx_mode_t       set_vf_rx_mode;   /**< Set VF RX mode */
> -       eth_set_vf_rx_t            set_vf_rx;  /**< enable/disable a VF receive */
> -       eth_set_vf_tx_t            set_vf_tx;  /**< enable/disable a VF transmit */
> -       eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter */
> -       /** Add UDP tunnel port. */
> -       eth_udp_tunnel_port_add_t udp_tunnel_port_add;
> -       /** Del UDP tunnel port. */
> -       eth_udp_tunnel_port_del_t udp_tunnel_port_del;
> -       eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate limit */
> -       eth_set_vf_rate_limit_t    set_vf_rate_limit;   /**< Set VF rate limit */
> -       /** Update redirection table. */
> -       reta_update_t reta_update;
> -       /** Query redirection table. */
> -       reta_query_t reta_query;
> -
> -       eth_get_reg_t get_reg;
> -       /**< Get registers */
> -       eth_get_eeprom_length_t get_eeprom_length;
> -       /**< Get eeprom length */
> -       eth_get_eeprom_t get_eeprom;
> -       /**< Get eeprom data */
> -       eth_set_eeprom_t set_eeprom;
> -       /**< Set eeprom */
> -  /* bypass control */
> +       priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority flow control. */
> +
> +       eth_uc_hash_table_set_t    uc_hash_table_set; /**< Set Unicast Table Array. */
> +       eth_uc_all_hash_table_set_t uc_all_hash_table_set; /**< Set Unicast hash bitmap. */
> +
> +       eth_mirror_rule_set_t      mirror_rule_set; /**< Add a traffic mirror rule. */
> +       eth_mirror_rule_reset_t    mirror_rule_reset; /**< reset a traffic mirror rule. */
> +
> +       eth_set_vf_rx_mode_t       set_vf_rx_mode;/**< Set VF RX mode. */
> +       eth_set_vf_rx_t            set_vf_rx;     /**< enable/disable a VF receive. */
> +       eth_set_vf_tx_t            set_vf_tx;     /**< enable/disable a VF transmit. */
> +       eth_set_vf_vlan_filter_t   set_vf_vlan_filter; /**< Set VF VLAN filter. */
> +       eth_set_vf_rate_limit_t    set_vf_rate_limit; /**< Set VF rate limit. */
> +
> +       eth_udp_tunnel_port_add_t  udp_tunnel_port_add; /** Add UDP tunnel port. */
> +       eth_udp_tunnel_port_del_t  udp_tunnel_port_del; /** Del UDP tunnel port. */
> +       eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
> +       /** Config ether type of l2 tunnel. */
> +       eth_l2_tunnel_offload_set_t   l2_tunnel_offload_set;
> +       /** Enable/disable l2 tunnel offload functions. */
> +
> +       eth_set_queue_rate_limit_t set_queue_rate_limit; /**< Set queue rate limit. */
> +
> +       rss_hash_update_t          rss_hash_update; /** Configure RSS hash protocols. */
> +       rss_hash_conf_get_t        rss_hash_conf_get; /** Get current RSS hash configuration. */
> +       reta_update_t              reta_update;   /** Update redirection table. */
> +       reta_query_t               reta_query;    /** Query redirection table. */
> +
> +       eth_get_reg_t              get_reg;           /**< Get registers. */
> +       eth_get_eeprom_length_t    get_eeprom_length; /**< Get eeprom length. */
> +       eth_get_eeprom_t           get_eeprom;        /**< Get eeprom data. */
> +       eth_set_eeprom_t           set_eeprom;        /**< Set eeprom. */
> +
> +       /* bypass control */
>  #ifdef RTE_NIC_BYPASS
> -  bypass_init_t bypass_init;
> -  bypass_state_set_t bypass_state_set;
> -  bypass_state_show_t bypass_state_show;
> -  bypass_event_set_t bypass_event_set;
> -  bypass_event_show_t bypass_event_show;
> -  bypass_wd_timeout_set_t bypass_wd_timeout_set;
> -  bypass_wd_timeout_show_t bypass_wd_timeout_show;
> -  bypass_ver_show_t bypass_ver_show;
> -  bypass_wd_reset_t bypass_wd_reset;
> +       bypass_init_t              bypass_init;
> +       bypass_state_set_t         bypass_state_set;
> +       bypass_state_show_t        bypass_state_show;
> +       bypass_event_set_t         bypass_event_set;
> +       bypass_event_show_t        bypass_event_show;
> +       bypass_wd_timeout_set_t    bypass_wd_timeout_set;
> +       bypass_wd_timeout_show_t   bypass_wd_timeout_show;
> +       bypass_ver_show_t          bypass_ver_show;
> +       bypass_wd_reset_t          bypass_wd_reset;
>  #endif
>
> -       /** Configure RSS hash protocols. */
> -       rss_hash_update_t rss_hash_update;
> -       /** Get current RSS hash configuration. */
> -       rss_hash_conf_get_t rss_hash_conf_get;
> -       eth_filter_ctrl_t              filter_ctrl;
> -       /**< common filter control. */
> -       eth_set_mc_addr_list_t set_mc_addr_list; /**< set list of mcast addrs */
> -       eth_rxq_info_get_t rxq_info_get;
> -       /**< retrieve RX queue information. */
> -       eth_txq_info_get_t txq_info_get;
> -       /**< retrieve TX queue information. */
> +       eth_filter_ctrl_t          filter_ctrl; /**< common filter control. */
> +
> +       eth_get_dcb_info           get_dcb_info; /** Get DCB information. */
> +
> +       eth_timesync_enable_t      timesync_enable;
>         /** Turn IEEE1588/802.1AS timestamping on. */
> -       eth_timesync_enable_t timesync_enable;
> +       eth_timesync_disable_t     timesync_disable;
>         /** Turn IEEE1588/802.1AS timestamping off. */
> -       eth_timesync_disable_t timesync_disable;
> -       /** Read the IEEE1588/802.1AS RX timestamp. */
>         eth_timesync_read_rx_timestamp_t timesync_read_rx_timestamp;
> -       /** Read the IEEE1588/802.1AS TX timestamp. */
> +       /** Read the IEEE1588/802.1AS RX timestamp. */
>         eth_timesync_read_tx_timestamp_t timesync_read_tx_timestamp;
> -
> -       /** Get DCB information */
> -       eth_get_dcb_info get_dcb_info;
> -       /** Adjust the device clock.*/
> -       eth_timesync_adjust_time timesync_adjust_time;
> -       /** Get the device clock time. */
> -       eth_timesync_read_time timesync_read_time;
> -       /** Set the device clock time. */
> -       eth_timesync_write_time timesync_write_time;
> -       /** Config ether type of l2 tunnel */
> -       eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
> -       /** Enable/disable l2 tunnel offload functions */
> -       eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
> +       /** Read the IEEE1588/802.1AS TX timestamp. */
> +       eth_timesync_adjust_time   timesync_adjust_time; /** Adjust the device clock. */
> +       eth_timesync_read_time     timesync_read_time; /** Get the device clock time. */
> +       eth_timesync_write_time    timesync_write_time; /** Set the device clock time. */
>  };
>
>  /**
> --
> 2.9.3
>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw version get
  2016-12-22 14:47  3%         ` Thomas Monjalon
@ 2016-12-22 15:05  0%           ` Ferruh Yigit
  2016-12-22 15:31  0%             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-12-22 15:05 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Qiming Yang, dev, Remy Horton

On 12/22/2016 2:47 PM, Thomas Monjalon wrote:
> 2016-12-22 14:36, Ferruh Yigit:
>> On 12/22/2016 11:07 AM, Thomas Monjalon wrote:
>>> I think it is OK to add a new dev_ops and a new API function for firmware
>>> query. Generally speaking, it is a good thing to avoid putting all
>>> informations in the same structure (e.g. rte_eth_dev_info). 
>>
>> OK.
>>
>>> However, there
>>> is a balance to find. Could we plan to add more info to this new query?
>>> Instead of
>>> 	rte_eth_dev_fwver_get(uint8_t port_id, char *fw_version, int fw_length)
> [...]
>>> could it fill a struct?
>>> 	rte_eth_dev_fw_info_get(uint8_t port_id, struct rte_eth_dev_fw_info *fw_info)
>>
>> I believe this is better. But the problem we are having with this usage
>> is: ABI breakage.
>>
>> Since this struct will be a public structure, in the future if we want
>> to add a new field to the struct, it will break the ABI, and just this
>> change will cause a new version for whole ethdev library!
>>
>> When all required fields received via arguments, one by one, instead of
>> struct, at least ABI versioning can be done on the API when new field
>> added, and can be possible to escape from ABI breakage. But this will be
>> ugly when number of arguments increased.
>>
>> Or any other opinion on how to define API to reduce ABI breakage?
> 
> You're right.
> But I don't think we should have a function per data. Just because it would
> be ugly :)

I am no suggesting function per data, instead something like:

rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);

And in the future if we need etrack_id too, we can have both in
versioned manner:

rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min);

rte_eth_dev_fw_info_get(uint8_t port_id, uint32_t maj, uint32_t min,
	uint32_t etrack_id);

So my concern was if the number of the arguments becomes too many by time.


> I hope the ABI could become stable with time.
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw version get
  2016-12-22 14:36  5%       ` Ferruh Yigit
@ 2016-12-22 14:47  3%         ` Thomas Monjalon
  2016-12-22 15:05  0%           ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-12-22 14:47 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Qiming Yang, dev, Remy Horton

2016-12-22 14:36, Ferruh Yigit:
> On 12/22/2016 11:07 AM, Thomas Monjalon wrote:
> > I think it is OK to add a new dev_ops and a new API function for firmware
> > query. Generally speaking, it is a good thing to avoid putting all
> > informations in the same structure (e.g. rte_eth_dev_info). 
> 
> OK.
> 
> > However, there
> > is a balance to find. Could we plan to add more info to this new query?
> > Instead of
> > 	rte_eth_dev_fwver_get(uint8_t port_id, char *fw_version, int fw_length)
[...]
> > could it fill a struct?
> > 	rte_eth_dev_fw_info_get(uint8_t port_id, struct rte_eth_dev_fw_info *fw_info)
> 
> I believe this is better. But the problem we are having with this usage
> is: ABI breakage.
> 
> Since this struct will be a public structure, in the future if we want
> to add a new field to the struct, it will break the ABI, and just this
> change will cause a new version for whole ethdev library!
> 
> When all required fields received via arguments, one by one, instead of
> struct, at least ABI versioning can be done on the API when new field
> added, and can be possible to escape from ABI breakage. But this will be
> ugly when number of arguments increased.
> 
> Or any other opinion on how to define API to reduce ABI breakage?

You're right.
But I don't think we should have a function per data. Just because it would
be ugly :)
I hope the ABI could become stable with time.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw version get
  @ 2016-12-22 14:36  5%       ` Ferruh Yigit
  2016-12-22 14:47  3%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-12-22 14:36 UTC (permalink / raw)
  To: Thomas Monjalon, Qiming Yang; +Cc: dev, Remy Horton

On 12/22/2016 11:07 AM, Thomas Monjalon wrote:
> 2016-12-08 16:34, Remy Horton:
>>
>> On 06/12/2016 15:16, Qiming Yang wrote:
>> [..]
>>> Qiming Yang (5):
>>>   ethdev: add firmware version get
>>>   net/e1000: add firmware version get
>>>   net/ixgbe: add firmware version get
>>>   net/i40e: add firmware version get
>>>   ethtool: dispaly bus info and firmware version
>>
>> s/dispaly/display
>>
>> doc/guides/rel_notes/release_17_02.rst ought to be updated as well. Code 
>> itself looks ok though..
>>
>> Acked-by: Remy Horton <remy.horton@intel.com>
> 
> It must be a feature in the table (doc/guides/nics/features/).
> The deprecation notice must be removed also.
> 
> I think it is OK to add a new dev_ops and a new API function for firmware
> query. Generally speaking, it is a good thing to avoid putting all
> informations in the same structure (e.g. rte_eth_dev_info). 

OK.

> However, there
> is a balance to find. Could we plan to add more info to this new query?
> Instead of
> 	rte_eth_dev_fwver_get(uint8_t port_id, char *fw_version, int fw_length)

Here there is another problem, the content and the format of the string
is not defined. In this patchset it is not same for different PMDs.
This is OK for just printing the data, but not good for an API. How can
the application know what to expect.

> could it fill a struct?
> 	rte_eth_dev_fw_info_get(uint8_t port_id, struct rte_eth_dev_fw_info *fw_info)

I believe this is better. But the problem we are having with this usage
is: ABI breakage.

Since this struct will be a public structure, in the future if we want
to add a new field to the struct, it will break the ABI, and just this
change will cause a new version for whole ethdev library!

When all required fields received via arguments, one by one, instead of
struct, at least ABI versioning can be done on the API when new field
added, and can be possible to escape from ABI breakage. But this will be
ugly when number of arguments increased.

Or any other opinion on how to define API to reduce ABI breakage?

> 
> We already have
> 	rte_eth_dev_get_reg_info(uint8_t port_id, struct rte_dev_reg_info *info)
> with
> 	uint32_t version; /**< Device version */
> 
> There are also these functions (a bit related):
> 	rte_eth_dev_get_eeprom_length(uint8_t port_id)
> 	rte_eth_dev_get_eeprom(uint8_t port_id, struct rte_dev_eeprom_info *info)
> 

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [RFC] pci: remove unused UNBIND support
  2016-12-08 10:53  3% ` David Marchand
@ 2016-12-21 15:15  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-12-21 15:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Marchand, dev

2016-12-08 11:53, David Marchand:
> On Wed, Dec 7, 2016 at 7:04 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > No device driver sets the unbind flag in current public code base.
> > Therefore it is good time to remove the unused dead code.
> 
> Yes, this has been unused for some time now.
> 
> I would say this is not subject to abi enforcement as this only
> matters to driver api not application api.
> So this can go into 17.02.
> 
> The patch looks good to me.

Applied, thanks

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 3/3] doc: add required python versions to docs
  @ 2016-12-21 15:03  4% ` John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2016-12-21 15:03 UTC (permalink / raw)
  To: dev; +Cc: mkletzan, nhorman, John McNamara

Add a requirement to support both Python 2 and 3 to the
DPDK Python Coding Standards and Getting started Guide.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/contributing/coding_style.rst | 3 ++-
 doc/guides/linux_gsg/sys_reqs.rst        | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
index 1eb67f3..4163960 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -690,6 +690,7 @@ Control Statements
 Python Code
 -----------
 
-All python code should be compliant with `PEP8 (Style Guide for Python Code) <https://www.python.org/dev/peps/pep-0008/>`_.
+All Python code should work with Python 2.7+ and 3.2+ and be compliant with
+`PEP8 (Style Guide for Python Code) <https://www.python.org/dev/peps/pep-0008/>`_.
 
 The ``pep8`` tool can be used for testing compliance with the guidelines.
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 76d82e6..61222c6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -84,7 +84,7 @@ Compilation of the DPDK
        x86_x32 ABI is currently supported with distribution packages only on Ubuntu
        higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.
 
-*   Python, version 2.6 or 2.7, to use various helper scripts included in the DPDK package.
+*   Python, version 2.7+ or 3.2+, to use various helper scripts included in the DPDK package.
 
 
 **Optional Tools:**
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v5 04/26] cmdline: add support for dynamic tokens
    2016-12-21 14:51  2%           ` [dpdk-dev] [PATCH v5 01/26] ethdev: introduce generic flow API Adrien Mazarguil
  2016-12-21 14:51  1%           ` [dpdk-dev] [PATCH v5 02/26] doc: add rte_flow prog guide Adrien Mazarguil
@ 2016-12-21 14:51  2%           ` Adrien Mazarguil
  2 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-21 14:51 UTC (permalink / raw)
  To: dev

Considering tokens must be hard-coded in a list part of the instruction
structure, context-dependent tokens cannot be expressed.

This commit adds support for building dynamic token lists through a
user-provided function, which is called when the static token list is empty
(a single NULL entry).

Because no structures are modified (existing fields are reused), this
commit has no impact on the current ABI.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 lib/librte_cmdline/cmdline_parse.c | 60 +++++++++++++++++++++++++++++----
 lib/librte_cmdline/cmdline_parse.h | 21 ++++++++++++
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index b496067..14f5553 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -146,7 +146,9 @@ nb_common_chars(const char * s1, const char * s2)
  */
 static int
 match_inst(cmdline_parse_inst_t *inst, const char *buf,
-	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size)
+	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size,
+	   cmdline_parse_token_hdr_t
+		*(*dyn_tokens)[CMDLINE_PARSE_DYNAMIC_TOKENS])
 {
 	unsigned int token_num=0;
 	cmdline_parse_token_hdr_t * token_p;
@@ -155,6 +157,11 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 	struct cmdline_token_hdr token_hdr;
 
 	token_p = inst->tokens[token_num];
+	if (!token_p && dyn_tokens && inst->f) {
+		if (!(*dyn_tokens)[0])
+			inst->f(&(*dyn_tokens)[0], NULL, dyn_tokens);
+		token_p = (*dyn_tokens)[0];
+	}
 	if (token_p)
 		memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -196,7 +203,17 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 		buf += n;
 
 		token_num ++;
-		token_p = inst->tokens[token_num];
+		if (!inst->tokens[0]) {
+			if (token_num < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!(*dyn_tokens)[token_num])
+					inst->f(&(*dyn_tokens)[token_num],
+						NULL,
+						dyn_tokens);
+				token_p = (*dyn_tokens)[token_num];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[token_num];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 	}
@@ -239,6 +256,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	cmdline_parse_inst_t *inst;
 	const char *curbuf;
 	char result_buf[CMDLINE_PARSE_RESULT_BUFSIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	void (*f)(void *, struct cmdline *, void *) = NULL;
 	void *data = NULL;
 	int comment = 0;
@@ -255,6 +273,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		return CMDLINE_PARSE_BAD_ARGS;
 
 	ctx = cl->ctx;
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/*
 	 * - look if the buffer contains at least one line
@@ -299,7 +318,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		debug_printf("INST %d\n", inst_num);
 
 		/* fully parsed */
-		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf));
+		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf),
+				 &dyn_tokens);
 
 		if (tok > 0) /* we matched at least one token */
 			err = CMDLINE_PARSE_BAD_ARGS;
@@ -355,6 +375,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 	cmdline_parse_token_hdr_t *token_p;
 	struct cmdline_token_hdr token_hdr;
 	char tmpbuf[CMDLINE_BUFFER_SIZE], comp_buf[CMDLINE_BUFFER_SIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	unsigned int partial_tok_len;
 	int comp_len = -1;
 	int tmp_len = -1;
@@ -374,6 +395,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 
 	debug_printf("%s called\n", __func__);
 	memset(&token_hdr, 0, sizeof(token_hdr));
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/* count the number of complete token to parse */
 	for (i=0 ; buf[i] ; i++) {
@@ -396,11 +418,24 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		inst = ctx[inst_num];
 		while (inst) {
 			/* parse the first tokens of the inst */
-			if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+			if (nb_token &&
+			    match_inst(inst, buf, nb_token, NULL, 0,
+				       &dyn_tokens))
 				goto next;
 
 			debug_printf("instruction match\n");
-			token_p = inst->tokens[nb_token];
+			if (!inst->tokens[0]) {
+				if (nb_token <
+				    (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+					if (!dyn_tokens[nb_token])
+						inst->f(&dyn_tokens[nb_token],
+							NULL,
+							&dyn_tokens);
+					token_p = dyn_tokens[nb_token];
+				} else
+					token_p = NULL;
+			} else
+				token_p = inst->tokens[nb_token];
 			if (token_p)
 				memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -490,10 +525,21 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		/* we need to redo it */
 		inst = ctx[inst_num];
 
-		if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+		if (nb_token &&
+		    match_inst(inst, buf, nb_token, NULL, 0, &dyn_tokens))
 			goto next2;
 
-		token_p = inst->tokens[nb_token];
+		if (!inst->tokens[0]) {
+			if (nb_token < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!dyn_tokens[nb_token])
+					inst->f(&dyn_tokens[nb_token],
+						NULL,
+						&dyn_tokens);
+				token_p = dyn_tokens[nb_token];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[nb_token];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
diff --git a/lib/librte_cmdline/cmdline_parse.h b/lib/librte_cmdline/cmdline_parse.h
index 4ac05d6..65b18d4 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/cmdline_parse.h
@@ -83,6 +83,9 @@ extern "C" {
 /* maximum buffer size for parsed result */
 #define CMDLINE_PARSE_RESULT_BUFSIZE 8192
 
+/* maximum number of dynamic tokens */
+#define CMDLINE_PARSE_DYNAMIC_TOKENS 128
+
 /**
  * Stores a pointer to the ops struct, and the offset: the place to
  * write the parsed result in the destination structure.
@@ -130,6 +133,24 @@ struct cmdline;
  * Store a instruction, which is a pointer to a callback function and
  * its parameter that is called when the instruction is parsed, a help
  * string, and a list of token composing this instruction.
+ *
+ * When no tokens are defined (tokens[0] == NULL), they are retrieved
+ * dynamically by calling f() as follows:
+ *
+ *  f((struct cmdline_token_hdr **)&token_hdr,
+ *    NULL,
+ *    (struct cmdline_token_hdr *[])tokens));
+ *
+ * The address of the resulting token is expected at the location pointed by
+ * the first argument. Can be set to NULL to end the list.
+ *
+ * The cmdline argument (struct cmdline *) is always NULL.
+ *
+ * The last argument points to the NULL-terminated list of dynamic tokens
+ * defined so far. Since token_hdr points to an index of that list, the
+ * current index can be derived as follows:
+ *
+ *  int index = token_hdr - &(*tokens)[0];
  */
 struct cmdline_inst {
 	/* f(parsed_struct, data) */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v5 02/26] doc: add rte_flow prog guide
    2016-12-21 14:51  2%           ` [dpdk-dev] [PATCH v5 01/26] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-12-21 14:51  1%           ` Adrien Mazarguil
  2016-12-21 14:51  2%           ` [dpdk-dev] [PATCH v5 04/26] cmdline: add support for dynamic tokens Adrien Mazarguil
  2 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-21 14:51 UTC (permalink / raw)
  To: dev

This documentation is based on the latest RFC submission, subsequently
updated according to feedback from the community.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/prog_guide/index.rst    |    1 +
 doc/guides/prog_guide/rte_flow.rst | 2041 +++++++++++++++++++++++++++++++
 2 files changed, 2042 insertions(+)

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index e5a50a8..ed7f770 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -42,6 +42,7 @@ Programmer's Guide
     mempool_lib
     mbuf_lib
     poll_mode_drv
+    rte_flow
     cryptodev_lib
     link_bonding_poll_mode_drv_lib
     timer_lib
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
new file mode 100644
index 0000000..f415a73
--- /dev/null
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -0,0 +1,2041 @@
+..  BSD LICENSE
+    Copyright 2016 6WIND S.A.
+    Copyright 2016 Mellanox.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Generic_flow_API:
+
+Generic flow API (rte_flow)
+===========================
+
+Overview
+--------
+
+This API provides a generic means to configure hardware to match specific
+ingress or egress traffic, alter its fate and query related counters
+according to any number of user-defined rules.
+
+It is named *rte_flow* after the prefix used for all its symbols, and is
+defined in ``rte_flow.h``.
+
+- Matching can be performed on packet data (protocol headers, payload) and
+  properties (e.g. associated physical port, virtual device function ID).
+
+- Possible operations include dropping traffic, diverting it to specific
+  queues, to virtual/physical device functions or ports, performing tunnel
+  offloads, adding marks and so on.
+
+It is slightly higher-level than the legacy filtering framework which it
+encompasses and supersedes (including all functions and filter types) in
+order to expose a single interface with an unambiguous behavior that is
+common to all poll-mode drivers (PMDs).
+
+Several methods to migrate existing applications are described in `API
+migration`_.
+
+Flow rule
+---------
+
+Description
+~~~~~~~~~~~
+
+A flow rule is the combination of attributes with a matching pattern and a
+list of actions. Flow rules form the basis of this API.
+
+Flow rules can have several distinct actions (such as counting,
+encapsulating, decapsulating before redirecting packets to a particular
+queue, etc.), instead of relying on several rules to achieve this and having
+applications deal with hardware implementation details regarding their
+order.
+
+Support for different priority levels on a rule basis is provided, for
+example in order to force a more specific rule to come before a more generic
+one for packets matched by both. However hardware support for more than a
+single priority level cannot be guaranteed. When supported, the number of
+available priority levels is usually low, which is why they can also be
+implemented in software by PMDs (e.g. missing priority levels may be
+emulated by reordering rules).
+
+In order to remain as hardware-agnostic as possible, by default all rules
+are considered to have the same priority, which means that the order between
+overlapping rules (when a packet is matched by several filters) is
+undefined.
+
+PMDs may refuse to create overlapping rules at a given priority level when
+they can be detected (e.g. if a pattern matches an existing filter).
+
+Thus predictable results for a given priority level can only be achieved
+with non-overlapping rules, using perfect matching on all protocol layers.
+
+Flow rules can also be grouped, the flow rule priority is specific to the
+group they belong to. All flow rules in a given group are thus processed
+either before or after another group.
+
+Support for multiple actions per rule may be implemented internally on top
+of non-default hardware priorities, as a result both features may not be
+simultaneously available to applications.
+
+Considering that allowed pattern/actions combinations cannot be known in
+advance and would result in an impractically large number of capabilities to
+expose, a method is provided to validate a given rule from the current
+device configuration state.
+
+This enables applications to check if the rule types they need is supported
+at initialization time, before starting their data path. This method can be
+used anytime, its only requirement being that the resources needed by a rule
+should exist (e.g. a target RX queue should be configured first).
+
+Each defined rule is associated with an opaque handle managed by the PMD,
+applications are responsible for keeping it. These can be used for queries
+and rules management, such as retrieving counters or other data and
+destroying them.
+
+To avoid resource leaks on the PMD side, handles must be explicitly
+destroyed by the application before releasing associated resources such as
+queues and ports.
+
+The following sections cover:
+
+- **Attributes** (represented by ``struct rte_flow_attr``): properties of a
+  flow rule such as its direction (ingress or egress) and priority.
+
+- **Pattern item** (represented by ``struct rte_flow_item``): part of a
+  matching pattern that either matches specific packet data or traffic
+  properties. It can also describe properties of the pattern itself, such as
+  inverted matching.
+
+- **Matching pattern**: traffic properties to look for, a combination of any
+  number of items.
+
+- **Actions** (represented by ``struct rte_flow_action``): operations to
+  perform whenever a packet is matched by a pattern.
+
+Attributes
+~~~~~~~~~~
+
+Attribute: Group
+^^^^^^^^^^^^^^^^
+
+Flow rules can be grouped by assigning them a common group number. Lower
+values have higher priority. Group 0 has the highest priority.
+
+Although optional, applications are encouraged to group similar rules as
+much as possible to fully take advantage of hardware capabilities
+(e.g. optimized matching) and work around limitations (e.g. a single pattern
+type possibly allowed in a given group).
+
+Note that support for more than a single group is not guaranteed.
+
+Attribute: Priority
+^^^^^^^^^^^^^^^^^^^
+
+A priority level can be assigned to a flow rule. Like groups, lower values
+denote higher priority, with 0 as the maximum.
+
+A rule with priority 0 in group 8 is always matched after a rule with
+priority 8 in group 0.
+
+Group and priority levels are arbitrary and up to the application, they do
+not need to be contiguous nor start from 0, however the maximum number
+varies between devices and may be affected by existing flow rules.
+
+If a packet is matched by several rules of a given group for a given
+priority level, the outcome is undefined. It can take any path, may be
+duplicated or even cause unrecoverable errors.
+
+Note that support for more than a single priority level is not guaranteed.
+
+Attribute: Traffic direction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+
+Several pattern items and actions are valid and can be used in both
+directions. At least one direction must be specified.
+
+Specifying both directions at once for a given rule is not recommended but
+may be valid in a few cases (e.g. shared counters).
+
+Pattern item
+~~~~~~~~~~~~
+
+Pattern items fall in two categories:
+
+- Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+  IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+  specification structure.
+
+- Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF,
+  VF, PORT and so on), often without a specification structure.
+
+Item specification structures are used to match specific values among
+protocol fields (or item properties). Documentation describes for each item
+whether they are associated with one and their type name if so.
+
+Up to three structures of the same type can be set for a given item:
+
+- ``spec``: values to match (e.g. a given IPv4 address).
+
+- ``last``: upper bound for an inclusive range with corresponding fields in
+  ``spec``.
+
+- ``mask``: bit-mask applied to both ``spec`` and ``last`` whose purpose is
+  to distinguish the values to take into account and/or partially mask them
+  out (e.g. in order to match an IPv4 address prefix).
+
+Usage restrictions and expected behavior:
+
+- Setting either ``mask`` or ``last`` without ``spec`` is an error.
+
+- Field values in ``last`` which are either 0 or equal to the corresponding
+  values in ``spec`` are ignored; they do not generate a range. Nonzero
+  values lower than those in ``spec`` are not supported.
+
+- Setting ``spec`` and optionally ``last`` without ``mask`` causes the PMD
+  to only take the fields it can recognize into account. There is no error
+  checking for unsupported fields.
+
+- Not setting any of them (assuming item type allows it) uses default
+  parameters that depend on the item type. Most of the time, particularly
+  for protocol header items, it is equivalent to providing an empty (zeroed)
+  ``mask``.
+
+- ``mask`` is a simple bit-mask applied before interpreting the contents of
+  ``spec`` and ``last``, which may yield unexpected results if not used
+  carefully. For example, if for an IPv4 address field, ``spec`` provides
+  *10.1.2.3*, ``last`` provides *10.3.4.5* and ``mask`` provides
+  *255.255.0.0*, the effective range becomes *10.1.0.0* to *10.3.255.255*.
+
+Example of an item specification matching an Ethernet header:
+
+.. _table_rte_flow_pattern_item_example:
+
+.. table:: Ethernet item
+
+   +----------+----------+--------------------+
+   | Field    | Subfield | Value              |
+   +==========+==========+====================+
+   | ``spec`` | ``src``  | ``00:01:02:03:04`` |
+   |          +----------+--------------------+
+   |          | ``dst``  | ``00:2a:66:00:01`` |
+   |          +----------+--------------------+
+   |          | ``type`` | ``0x22aa``         |
+   +----------+----------+--------------------+
+   | ``last`` | unspecified                   |
+   +----------+----------+--------------------+
+   | ``mask`` | ``src``  | ``00:ff:ff:ff:00`` |
+   |          +----------+--------------------+
+   |          | ``dst``  | ``00:00:00:00:ff`` |
+   |          +----------+--------------------+
+   |          | ``type`` | ``0x0000``         |
+   +----------+----------+--------------------+
+
+Non-masked bits stand for any value (shown as ``?`` below), Ethernet headers
+with the following properties are thus matched:
+
+- ``src``: ``??:01:02:03:??``
+- ``dst``: ``??:??:??:??:01``
+- ``type``: ``0x????``
+
+Matching pattern
+~~~~~~~~~~~~~~~~
+
+A pattern is formed by stacking items starting from the lowest protocol
+layer to match. This stacking restriction does not apply to meta items which
+can be placed anywhere in the stack without affecting the meaning of the
+resulting pattern.
+
+Patterns are terminated by END items.
+
+Examples:
+
+.. _table_rte_flow_tcpv4_as_l4:
+
+.. table:: TCPv4 as L4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | IPv4     |
+   +-------+----------+
+   | 2     | TCP      |
+   +-------+----------+
+   | 3     | END      |
+   +-------+----------+
+
+|
+
+.. _table_rte_flow_tcpv6_in_vxlan:
+
+.. table:: TCPv6 in VXLAN
+
+   +-------+------------+
+   | Index | Item       |
+   +=======+============+
+   | 0     | Ethernet   |
+   +-------+------------+
+   | 1     | IPv4       |
+   +-------+------------+
+   | 2     | UDP        |
+   +-------+------------+
+   | 3     | VXLAN      |
+   +-------+------------+
+   | 4     | Ethernet   |
+   +-------+------------+
+   | 5     | IPv6       |
+   +-------+------------+
+   | 6     | TCP        |
+   +-------+------------+
+   | 7     | END        |
+   +-------+------------+
+
+|
+
+.. _table_rte_flow_tcpv4_as_l4_meta:
+
+.. table:: TCPv4 as L4 with meta items
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | VOID     |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | VOID     |
+   +-------+----------+
+   | 3     | IPv4     |
+   +-------+----------+
+   | 4     | TCP      |
+   +-------+----------+
+   | 5     | VOID     |
+   +-------+----------+
+   | 6     | VOID     |
+   +-------+----------+
+   | 7     | END      |
+   +-------+----------+
+
+The above example shows how meta items do not affect packet data matching
+items, as long as those remain stacked properly. The resulting matching
+pattern is identical to "TCPv4 as L4".
+
+.. _table_rte_flow_udpv6_anywhere:
+
+.. table:: UDPv6 anywhere
+
+   +-------+------+
+   | Index | Item |
+   +=======+======+
+   | 0     | IPv6 |
+   +-------+------+
+   | 1     | UDP  |
+   +-------+------+
+   | 2     | END  |
+   +-------+------+
+
+If supported by the PMD, omitting one or several protocol layers at the
+bottom of the stack as in the above example (missing an Ethernet
+specification) enables looking up anywhere in packets.
+
+It is unspecified whether the payload of supported encapsulations
+(e.g. VXLAN payload) is matched by such a pattern, which may apply to inner,
+outer or both packets.
+
+.. _table_rte_flow_invalid_l3:
+
+.. table:: Invalid, missing L3
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | UDP      |
+   +-------+----------+
+   | 2     | END      |
+   +-------+----------+
+
+The above pattern is invalid due to a missing L3 specification between L2
+(Ethernet) and L4 (UDP). Doing so is only allowed at the bottom and at the
+top of the stack.
+
+Meta item types
+~~~~~~~~~~~~~~~
+
+They match meta-data or affect pattern processing instead of matching packet
+data directly, most of them do not need a specification structure. This
+particularity allows them to be specified anywhere in the stack without
+causing any side effect.
+
+Item: ``END``
+^^^^^^^^^^^^^
+
+End marker for item lists. Prevents further processing of items, thereby
+ending the pattern.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_end:
+
+.. table:: END
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Item: ``VOID``
+^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_void:
+
+.. table:: VOID
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+One usage example for this type is generating rules that share a common
+prefix quickly without reallocating memory, only by updating item types:
+
+.. _table_rte_flow_item_void_example:
+
+.. table:: TCP, UDP or ICMP as L4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | Ethernet           |
+   +-------+--------------------+
+   | 1     | IPv4               |
+   +-------+------+------+------+
+   | 2     | UDP  | VOID | VOID |
+   +-------+------+------+------+
+   | 3     | VOID | TCP  | VOID |
+   +-------+------+------+------+
+   | 4     | VOID | VOID | ICMP |
+   +-------+------+------+------+
+   | 5     | END                |
+   +-------+--------------------+
+
+Item: ``INVERT``
+^^^^^^^^^^^^^^^^
+
+Inverted matching, i.e. process packets that do not match the pattern.
+
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_invert:
+
+.. table:: INVERT
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Usage example, matching non-TCPv4 packets only:
+
+.. _table_rte_flow_item_invert_example:
+
+.. table:: Anything but TCPv4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | INVERT   |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | IPv4     |
+   +-------+----------+
+   | 3     | TCP      |
+   +-------+----------+
+   | 4     | END      |
+   +-------+----------+
+
+Item: ``PF``
+^^^^^^^^^^^^
+
+Matches packets addressed to the physical function of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `Action: PF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if applied to a VF
+  device.
+- Can be combined with any number of `Item: VF`_ to match both PF and VF
+  traffic.
+- ``spec``, ``last`` and ``mask`` must not be set.
+
+.. _table_rte_flow_item_pf:
+
+.. table:: PF
+
+   +----------+-------+
+   | Field    | Value |
+   +==========+=======+
+   | ``spec`` | unset |
+   +----------+-------+
+   | ``last`` | unset |
+   +----------+-------+
+   | ``mask`` | unset |
+   +----------+-------+
+
+Item: ``VF``
+^^^^^^^^^^^^
+
+Matches packets addressed to a virtual function ID of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `Action: VF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if this causes a VF
+  device to match traffic addressed to a different VF.
+- Can be specified multiple times to match traffic addressed to several VF
+  IDs.
+- Can be combined with a PF item to match both PF and VF traffic.
+
+.. _table_rte_flow_item_vf:
+
+.. table:: VF
+
+   +----------+----------+---------------------------+
+   | Field    | Subfield | Value                     |
+   +==========+==========+===========================+
+   | ``spec`` | ``id``   | destination VF ID         |
+   +----------+----------+---------------------------+
+   | ``last`` | ``id``   | upper range value         |
+   +----------+----------+---------------------------+
+   | ``mask`` | ``id``   | zeroed to match any VF ID |
+   +----------+----------+---------------------------+
+
+Item: ``PORT``
+^^^^^^^^^^^^^^
+
+Matches packets coming from the specified physical port of the underlying
+device.
+
+The first PORT item overrides the physical port normally associated with the
+specified DPDK input port (port_id). This item can be provided several times
+to match additional physical ports.
+
+Note that physical ports are not necessarily tied to DPDK input ports
+(port_id) when those are not under DPDK control. Possible values are
+specific to each device, they are not necessarily indexed from zero and may
+not be contiguous.
+
+As a device property, the list of allowed values as well as the value
+associated with a port_id should be retrieved by other means.
+
+.. _table_rte_flow_item_port:
+
+.. table:: PORT
+
+   +----------+-----------+--------------------------------+
+   | Field    | Subfield  | Value                          |
+   +==========+===========+================================+
+   | ``spec`` | ``index`` | physical port index            |
+   +----------+-----------+--------------------------------+
+   | ``last`` | ``index`` | upper range value              |
+   +----------+-----------+--------------------------------+
+   | ``mask`` | ``index`` | zeroed to match any port index |
+   +----------+-----------+--------------------------------+
+
+Data matching item types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most of these are basically protocol header definitions with associated
+bit-masks. They must be specified (stacked) from lowest to highest protocol
+layer to form a matching pattern.
+
+The following list is not exhaustive, new protocols will be added in the
+future.
+
+Item: ``ANY``
+^^^^^^^^^^^^^
+
+Matches any protocol in place of the current layer, a single ANY may also
+stand for several protocol layers.
+
+This is usually specified as the first pattern item when looking for a
+protocol anywhere in a packet.
+
+.. _table_rte_flow_item_any:
+
+.. table:: ANY
+
+   +----------+----------+--------------------------------------+
+   | Field    | Subfield | Value                                |
+   +==========+==========+======================================+
+   | ``spec`` | ``num``  | number of layers covered             |
+   +----------+----------+--------------------------------------+
+   | ``last`` | ``num``  | upper range value                    |
+   +----------+----------+--------------------------------------+
+   | ``mask`` | ``num``  | zeroed to cover any number of layers |
+   +----------+----------+--------------------------------------+
+
+Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
+and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
+or IPv6) matched by the second ANY specification:
+
+.. _table_rte_flow_item_any_example:
+
+.. table:: TCP in VXLAN with wildcards
+
+   +-------+------+----------+----------+-------+
+   | Index | Item | Field    | Subfield | Value |
+   +=======+======+==========+==========+=======+
+   | 0     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 1     | ANY  | ``spec`` | ``num``  | 2     |
+   +-------+------+----------+----------+-------+
+   | 2     | VXLAN                              |
+   +-------+------------------------------------+
+   | 3     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 4     | ANY  | ``spec`` | ``num``  | 1     |
+   +-------+------+----------+----------+-------+
+   | 5     | TCP                                |
+   +-------+------------------------------------+
+   | 6     | END                                |
+   +-------+------------------------------------+
+
+Item: ``RAW``
+^^^^^^^^^^^^^
+
+Matches a byte string of a given length at a given offset.
+
+Offset is either absolute (using the start of the packet) or relative to the
+end of the previous matched item in the stack, in which case negative values
+are allowed.
+
+If search is enabled, offset is used as the starting point. The search area
+can be delimited by setting limit to a nonzero value, which is the maximum
+number of bytes after offset where the pattern may start.
+
+Matching a zero-length pattern is allowed, doing so resets the relative
+offset for subsequent items.
+
+- This type does not support ranges (``last`` field).
+
+.. _table_rte_flow_item_raw:
+
+.. table:: RAW
+
+   +----------+--------------+-------------------------------------------------+
+   | Field    | Subfield     | Value                                           |
+   +==========+==============+=================================================+
+   | ``spec`` | ``relative`` | look for pattern after the previous item        |
+   |          +--------------+-------------------------------------------------+
+   |          | ``search``   | search pattern from offset (see also ``limit``) |
+   |          +--------------+-------------------------------------------------+
+   |          | ``reserved`` | reserved, must be set to zero                   |
+   |          +--------------+-------------------------------------------------+
+   |          | ``offset``   | absolute or relative offset for ``pattern``     |
+   |          +--------------+-------------------------------------------------+
+   |          | ``limit``    | search area limit for start of ``pattern``      |
+   |          +--------------+-------------------------------------------------+
+   |          | ``length``   | ``pattern`` length                              |
+   |          +--------------+-------------------------------------------------+
+   |          | ``pattern``  | byte string to look for                         |
+   +----------+--------------+-------------------------------------------------+
+   | ``last`` | if specified, either all 0 or with the same values as ``spec`` |
+   +----------+----------------------------------------------------------------+
+   | ``mask`` | bit-mask applied to ``spec`` values with usual behavior        |
+   +----------+----------------------------------------------------------------+
+
+Example pattern looking for several strings at various offsets of a UDP
+payload, using combined RAW items:
+
+.. _table_rte_flow_item_raw_example:
+
+.. table:: UDP payload matching
+
+   +-------+------+----------+--------------+-------+
+   | Index | Item | Field    | Subfield     | Value |
+   +=======+======+==========+==============+=======+
+   | 0     | Ethernet                               |
+   +-------+----------------------------------------+
+   | 1     | IPv4                                   |
+   +-------+----------------------------------------+
+   | 2     | UDP                                    |
+   +-------+------+----------+--------------+-------+
+   | 3     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 10    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "foo" |
+   +-------+------+----------+--------------+-------+
+   | 4     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 20    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "bar" |
+   +-------+------+----------+--------------+-------+
+   | 5     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | -29   |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "baz" |
+   +-------+------+----------+--------------+-------+
+   | 6     | END                                    |
+   +-------+----------------------------------------+
+
+This translates to:
+
+- Locate "foo" at least 10 bytes deep inside UDP payload.
+- Locate "bar" after "foo" plus 20 bytes.
+- Locate "baz" after "bar" minus 29 bytes.
+
+Such a packet may be represented as follows (not to scale)::
+
+ 0                     >= 10 B           == 20 B
+ |                  |<--------->|     |<--------->|
+ |                  |           |     |           |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+ | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+                          |                             |
+                          |<--------------------------->|
+                                      == 29 B
+
+Note that matching subsequent pattern items would resume after "baz", not
+"bar" since matching is always performed after the previous item of the
+stack.
+
+Item: ``ETH``
+^^^^^^^^^^^^^
+
+Matches an Ethernet header.
+
+- ``dst``: destination MAC.
+- ``src``: source MAC.
+- ``type``: EtherType.
+
+Item: ``VLAN``
+^^^^^^^^^^^^^^
+
+Matches an 802.1Q/ad VLAN tag.
+
+- ``tpid``: tag protocol identifier.
+- ``tci``: tag control information.
+
+Item: ``IPV4``
+^^^^^^^^^^^^^^
+
+Matches an IPv4 header.
+
+Note: IPv4 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv4 header definition (``rte_ip.h``).
+
+Item: ``IPV6``
+^^^^^^^^^^^^^^
+
+Matches an IPv6 header.
+
+Note: IPv6 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv6 header definition (``rte_ip.h``).
+
+Item: ``ICMP``
+^^^^^^^^^^^^^^
+
+Matches an ICMP header.
+
+- ``hdr``: ICMP header definition (``rte_icmp.h``).
+
+Item: ``UDP``
+^^^^^^^^^^^^^
+
+Matches a UDP header.
+
+- ``hdr``: UDP header definition (``rte_udp.h``).
+
+Item: ``TCP``
+^^^^^^^^^^^^^
+
+Matches a TCP header.
+
+- ``hdr``: TCP header definition (``rte_tcp.h``).
+
+Item: ``SCTP``
+^^^^^^^^^^^^^^
+
+Matches a SCTP header.
+
+- ``hdr``: SCTP header definition (``rte_sctp.h``).
+
+Item: ``VXLAN``
+^^^^^^^^^^^^^^^
+
+Matches a VXLAN header (RFC 7348).
+
+- ``flags``: normally 0x08 (I flag).
+- ``rsvd0``: reserved, normally 0x000000.
+- ``vni``: VXLAN network identifier.
+- ``rsvd1``: reserved, normally 0x00.
+
+Actions
+~~~~~~~
+
+Each possible action is represented by a type. Some have associated
+configuration structures. Several actions combined in a list can be affected
+to a flow rule. That list is not ordered.
+
+They fall in three categories:
+
+- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+  processing matched packets by subsequent flow rules, unless overridden
+  with PASSTHRU.
+
+- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for
+  additional processing by subsequent flow rules.
+
+- Other non-terminating meta actions that do not affect the fate of packets
+  (END, VOID, MARK, FLAG, COUNT).
+
+When several actions are combined in a flow rule, they should all have
+different types (e.g. dropping a packet twice is not possible).
+
+Only the last action of a given type is taken into account. PMDs still
+perform error checking on the entire list.
+
+Like matching patterns, action lists are terminated by END items.
+
+*Note that PASSTHRU is the only action able to override a terminating rule.*
+
+Example of action that redirects packets to queue index 10:
+
+.. _table_rte_flow_action_example:
+
+.. table:: Queue action
+
+   +-----------+-------+
+   | Field     | Value |
+   +===========+=======+
+   | ``index`` | 10    |
+   +-----------+-------+
+
+Action lists examples, their order is not significant, applications must
+consider all actions to be performed simultaneously:
+
+.. _table_rte_flow_count_and_drop:
+
+.. table:: Count and drop
+
+   +-------+--------+
+   | Index | Action |
+   +=======+========+
+   | 0     | COUNT  |
+   +-------+--------+
+   | 1     | DROP   |
+   +-------+--------+
+   | 2     | END    |
+   +-------+--------+
+
+|
+
+.. _table_rte_flow_mark_count_redirect:
+
+.. table:: Mark, count and redirect
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | MARK   | ``mark``  | 0x2a  |
+   +-------+--------+-----------+-------+
+   | 1     | COUNT                      |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``queue`` | 10    |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+|
+
+.. _table_rte_flow_redirect_queue_5:
+
+.. table:: Redirect to queue 5
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | DROP                       |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``queue`` | 5     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+In the above example, considering both actions are performed simultaneously,
+the end result is that only QUEUE has any effect.
+
+.. _table_rte_flow_redirect_queue_3:
+
+.. table:: Redirect to queue 3
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | QUEUE  | ``queue`` | 5     |
+   +-------+--------+-----------+-------+
+   | 1     | VOID                       |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``queue`` | 3     |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+As previously described, only the last action of a given type found in the
+list is taken into account. The above example also shows that VOID is
+ignored.
+
+Action types
+~~~~~~~~~~~~
+
+Common action types are described in this section. Like pattern item types,
+this list is not exhaustive as new actions will be added in the future.
+
+Action: ``END``
+^^^^^^^^^^^^^^^
+
+End marker for action lists. Prevents further processing of actions, thereby
+ending the list.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_end:
+
+.. table:: END
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VOID``
+^^^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_void:
+
+.. table:: VOID
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``PASSTHRU``
+^^^^^^^^^^^^^^^^^^^^
+
+Leaves packets up for additional processing by subsequent flow rules. This
+is the default when a rule does not contain a terminating action, but can be
+specified to force a rule to become non-terminating.
+
+- No configurable properties.
+
+.. _table_rte_flow_action_passthru:
+
+.. table:: PASSTHRU
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Example to copy a packet to a queue and continue processing by subsequent
+flow rules:
+
+.. _table_rte_flow_action_passthru_example:
+
+.. table:: Copy to queue 8
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | PASSTHRU                   |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``queue`` | 8     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+Action: ``MARK``
+^^^^^^^^^^^^^^^^
+
+Attaches a 32 bit value to packets.
+
+This value is arbitrary and application-defined. For compatibility with FDIR
+it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
+also set in ``ol_flags``.
+
+.. _table_rte_flow_action_mark:
+
+.. table:: MARK
+
+   +--------+-------------------------------------+
+   | Field  | Value                               |
+   +========+=====================================+
+   | ``id`` | 32 bit value to return with packets |
+   +--------+-------------------------------------+
+
+Action: ``FLAG``
+^^^^^^^^^^^^^^^^
+
+Flag packets. Similar to `Action: MARK`_ but only affects ``ol_flags``.
+
+- No configurable properties.
+
+Note: a distinctive flag must be defined for it.
+
+.. _table_rte_flow_action_flag:
+
+.. table:: FLAG
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``QUEUE``
+^^^^^^^^^^^^^^^^^
+
+Assigns packets to a given queue index.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_queue:
+
+.. table:: QUEUE
+
+   +-----------+--------------------+
+   | Field     | Value              |
+   +===========+====================+
+   | ``index`` | queue index to use |
+   +-----------+--------------------+
+
+Action: ``DROP``
+^^^^^^^^^^^^^^^^
+
+Drop packets.
+
+- No configurable properties.
+- Terminating by default.
+- PASSTHRU overrides this action if both are specified.
+
+.. _table_rte_flow_action_drop:
+
+.. table:: DROP
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``COUNT``
+^^^^^^^^^^^^^^^^^
+
+Enables counters for this rule.
+
+These counters can be retrieved and reset through ``rte_flow_query()``, see
+``struct rte_flow_query_count``.
+
+- Counters can be retrieved with ``rte_flow_query()``.
+- No configurable properties.
+
+.. _table_rte_flow_action_count:
+
+.. table:: COUNT
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Query structure to retrieve and reset flow rule counters:
+
+.. _table_rte_flow_query_count:
+
+.. table:: COUNT query
+
+   +---------------+-----+-----------------------------------+
+   | Field         | I/O | Value                             |
+   +===============+=====+===================================+
+   | ``reset``     | in  | reset counter after query         |
+   +---------------+-----+-----------------------------------+
+   | ``hits_set``  | out | ``hits`` field is set             |
+   +---------------+-----+-----------------------------------+
+   | ``bytes_set`` | out | ``bytes`` field is set            |
+   +---------------+-----+-----------------------------------+
+   | ``hits``      | out | number of hits for this rule      |
+   +---------------+-----+-----------------------------------+
+   | ``bytes``     | out | number of bytes through this rule |
+   +---------------+-----+-----------------------------------+
+
+Action: ``DUP``
+^^^^^^^^^^^^^^^
+
+Duplicates packets to a given queue index.
+
+This is normally combined with QUEUE, however when used alone, it is
+actually similar to QUEUE + PASSTHRU.
+
+- Non-terminating by default.
+
+.. _table_rte_flow_action_dup:
+
+.. table:: DUP
+
+   +-----------+------------------------------------+
+   | Field     | Value                              |
+   +===========+====================================+
+   | ``index`` | queue index to duplicate packet to |
+   +-----------+------------------------------------+
+
+Action: ``RSS``
+^^^^^^^^^^^^^^^
+
+Similar to QUEUE, except RSS is additionally performed on packets to spread
+them among several queues according to the provided parameters.
+
+Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
+however it conflicts with `Action: MARK`_ as they share the same space. When
+both actions are specified, the RSS hash is discarded and
+``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
+structure should eventually evolve to store both.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_rss:
+
+.. table:: RSS
+
+   +--------------+------------------------------+
+   | Field        | Value                        |
+   +==============+==============================+
+   | ``rss_conf`` | RSS parameters               |
+   +--------------+------------------------------+
+   | ``num``      | number of entries in queue[] |
+   +--------------+------------------------------+
+   | ``queue[]``  | queue indices to use         |
+   +--------------+------------------------------+
+
+Action: ``PF``
+^^^^^^^^^^^^^^
+
+Redirects packets to the physical function (PF) of the current device.
+
+- No configurable properties.
+- Terminating by default.
+
+.. _table_rte_flow_action_pf:
+
+.. table:: PF
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VF``
+^^^^^^^^^^^^^^
+
+Redirects packets to a virtual function (VF) of the current device.
+
+Packets matched by a VF pattern item can be redirected to their original VF
+ID instead of the specified one. This parameter may not be available and is
+not guaranteed to work properly if the VF part is matched by a prior flow
+rule or if packets are not addressed to a VF in the first place.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_vf:
+
+.. table:: VF
+
+   +--------------+--------------------------------+
+   | Field        | Value                          |
+   +==============+================================+
+   | ``original`` | use original VF ID if possible |
+   +--------------+--------------------------------+
+   | ``vf``       | VF ID to redirect packets to   |
+   +--------------+--------------------------------+
+
+Negative types
+~~~~~~~~~~~~~~
+
+All specified pattern items (``enum rte_flow_item_type``) and actions
+(``enum rte_flow_action_type``) use positive identifiers.
+
+The negative space is reserved for dynamic types generated by PMDs during
+run-time. PMDs may encounter them as a result but must not accept negative
+identifiers they are not aware of.
+
+A method to generate them remains to be defined.
+
+Planned types
+~~~~~~~~~~~~~
+
+Pattern item types will be added as new protocols are implemented.
+
+Variable headers support through dedicated pattern items, for example in
+order to match specific IPv4 options and IPv6 extension headers would be
+stacked after IPv4/IPv6 items.
+
+Other action types are planned but are not defined yet. These include the
+ability to alter packet data in several ways, such as performing
+encapsulation/decapsulation of tunnel headers.
+
+Rules management
+----------------
+
+A rather simple API with few functions is provided to fully manage flow
+rules.
+
+Each created flow rule is associated with an opaque, PMD-specific handle
+pointer. The application is responsible for keeping it until the rule is
+destroyed.
+
+Flows rules are represented by ``struct rte_flow`` objects.
+
+Validation
+~~~~~~~~~~
+
+Given that expressing a definite set of device capabilities is not
+practical, a dedicated function is provided to check if a flow rule is
+supported and can be created.
+
+.. code-block:: c
+
+   int
+   rte_flow_validate(uint8_t port_id,
+                     const struct rte_flow_attr *attr,
+                     const struct rte_flow_item pattern[],
+                     const struct rte_flow_action actions[],
+                     struct rte_flow_error *error);
+
+While this function has no effect on the target device, the flow rule is
+validated against its current configuration state and the returned value
+should be considered valid by the caller for that state only.
+
+The returned value is guaranteed to remain valid only as long as no
+successful calls to ``rte_flow_create()`` or ``rte_flow_destroy()`` are made
+in the meantime and no device parameter affecting flow rules in any way are
+modified, due to possible collisions or resource limitations (although in
+such cases ``EINVAL`` should not be returned).
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 if flow rule is valid and can be created. A negative errno value
+  otherwise (``rte_errno`` is also set), the following errors are defined.
+- ``-ENOSYS``: underlying device does not support this functionality.
+- ``-EINVAL``: unknown or invalid rule specification.
+- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
+  bit-masks are unsupported).
+- ``-EEXIST``: collision with an existing rule.
+- ``-ENOMEM``: not enough resources.
+- ``-EBUSY``: action cannot be performed due to busy device resources, may
+  succeed if the affected queues or even the entire port are in a stopped
+  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
+
+Creation
+~~~~~~~~
+
+Creating a flow rule is similar to validating one, except the rule is
+actually created and a handle returned.
+
+.. code-block:: c
+
+   struct rte_flow *
+   rte_flow_create(uint8_t port_id,
+                   const struct rte_flow_attr *attr,
+                   const struct rte_flow_item pattern[],
+                   const struct rte_flow_action *actions[],
+                   struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
+to the positive version of one of the error codes defined for
+``rte_flow_validate()``.
+
+Destruction
+~~~~~~~~~~~
+
+Flow rules destruction is not automatic, and a queue or a port should not be
+released if any are still attached to them. Applications must take care of
+performing this step before releasing resources.
+
+.. code-block:: c
+
+   int
+   rte_flow_destroy(uint8_t port_id,
+                    struct rte_flow *flow,
+                    struct rte_flow_error *error);
+
+
+Failure to destroy a flow rule handle may occur when other flow rules depend
+on it, and destroying it would result in an inconsistent state.
+
+This function is only guaranteed to succeed if handles are destroyed in
+reverse order of their creation.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to destroy.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Flush
+~~~~~
+
+Convenience function to destroy all flow rule handles associated with a
+port. They are released as with successive calls to ``rte_flow_destroy()``.
+
+.. code-block:: c
+
+   int
+   rte_flow_flush(uint8_t port_id,
+                  struct rte_flow_error *error);
+
+In the unlikely event of failure, handles are still considered destroyed and
+no longer valid but the port must be assumed to be in an inconsistent state.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Query
+~~~~~
+
+Query an existing flow rule.
+
+This function allows retrieving flow-specific data such as counters. Data
+is gathered by special actions which must be present in the flow rule
+definition.
+
+.. code-block:: c
+
+   int
+   rte_flow_query(uint8_t port_id,
+                  struct rte_flow *flow,
+                  enum rte_flow_action_type action,
+                  void *data,
+                  struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to query.
+- ``action``: action type to query.
+- ``data``: pointer to storage for the associated query data type.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Verbose error reporting
+-----------------------
+
+The defined *errno* values may not be accurate enough for users or
+application developers who want to investigate issues related to flow rules
+management. A dedicated error object is defined for this purpose:
+
+.. code-block:: c
+
+   enum rte_flow_error_type {
+       RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+       RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+       RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+       RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+       RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+       RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+       RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+       RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+   };
+
+   struct rte_flow_error {
+       enum rte_flow_error_type type; /**< Cause field and error types. */
+       const void *cause; /**< Object responsible for the error. */
+       const char *message; /**< Human-readable error message. */
+   };
+
+Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
+remaining fields can be ignored. Other error types describe the type of the
+object pointed by ``cause``.
+
+If non-NULL, ``cause`` points to the object responsible for the error. For a
+flow rule, this may be a pattern item or an individual action.
+
+If non-NULL, ``message`` provides a human-readable error message.
+
+This object is normally allocated by applications and set by PMDs in case of
+error, the message points to a constant string which does not need to be
+freed by the application, however its pointer can be considered valid only
+as long as its associated DPDK port remains configured. Closing the
+underlying device or unloading the PMD invalidates it.
+
+Caveats
+-------
+
+- DPDK does not keep track of flow rules definitions or flow rule objects
+  automatically. Applications may keep track of the former and must keep
+  track of the latter. PMDs may also do it for internal needs, however this
+  must not be relied on by applications.
+
+- Flow rules are not maintained between successive port initializations. An
+  application exiting without releasing them and restarting must re-create
+  them from scratch.
+
+- API operations are synchronous and blocking (``EAGAIN`` cannot be
+  returned).
+
+- There is no provision for reentrancy/multi-thread safety, although nothing
+  should prevent different devices from being configured at the same
+  time. PMDs may protect their control path functions accordingly.
+
+- Stopping the data path (TX/RX) should not be necessary when managing flow
+  rules. If this cannot be achieved naturally or with workarounds (such as
+  temporarily replacing the burst function pointers), an appropriate error
+  code must be returned (``EBUSY``).
+
+- PMDs, not applications, are responsible for maintaining flow rules
+  configuration when stopping and restarting a port or performing other
+  actions which may affect them. They can only be destroyed explicitly by
+  applications.
+
+For devices exposing multiple ports sharing global settings affected by flow
+rules:
+
+- All ports under DPDK control must behave consistently, PMDs are
+  responsible for making sure that existing flow rules on a port are not
+  affected by other ports.
+
+- Ports not under DPDK control (unaffected or handled by other applications)
+  are user's responsibility. They may affect existing flow rules and cause
+  undefined behavior. PMDs aware of this may prevent flow rules creation
+  altogether in such cases.
+
+PMD interface
+-------------
+
+The PMD interface is defined in ``rte_flow_driver.h``. It is not subject to
+API/ABI versioning constraints as it is not exposed to applications and may
+evolve independently.
+
+It is currently implemented on top of the legacy filtering framework through
+filter type *RTE_ETH_FILTER_GENERIC* that accepts the single operation
+*RTE_ETH_FILTER_GET* to return PMD-specific *rte_flow* callbacks wrapped
+inside ``struct rte_flow_ops``.
+
+This overhead is temporarily necessary in order to keep compatibility with
+the legacy filtering framework, which should eventually disappear.
+
+- PMD callbacks implement exactly the interface described in `Rules
+  management`_, except for the port ID argument which has already been
+  converted to a pointer to the underlying ``struct rte_eth_dev``.
+
+- Public API functions do not process flow rules definitions at all before
+  calling PMD functions (no basic error checking, no validation
+  whatsoever). They only make sure these callbacks are non-NULL or return
+  the ``ENOSYS`` (function not supported) error.
+
+This interface additionally defines the following helper functions:
+
+- ``rte_flow_ops_get()``: get generic flow operations structure from a
+  port.
+
+- ``rte_flow_error_set()``: initialize generic flow error structure.
+
+More will be added over time.
+
+Device compatibility
+--------------------
+
+No known implementation supports all the described features.
+
+Unsupported features or combinations are not expected to be fully emulated
+in software by PMDs for performance reasons. Partially supported features
+may be completed in software as long as hardware performs most of the work
+(such as queue redirection and packet recognition).
+
+However PMDs are expected to do their best to satisfy application requests
+by working around hardware limitations as long as doing so does not affect
+the behavior of existing flow rules.
+
+The following sections provide a few examples of such cases and describe how
+PMDs should handle them, they are based on limitations built into the
+previous APIs.
+
+Global bit-masks
+~~~~~~~~~~~~~~~~
+
+Each flow rule comes with its own, per-layer bit-masks, while hardware may
+support only a single, device-wide bit-mask for a given layer type, so that
+two IPv4 rules cannot use different bit-masks.
+
+The expected behavior in this case is that PMDs automatically configure
+global bit-masks according to the needs of the first flow rule created.
+
+Subsequent rules are allowed only if their bit-masks match those, the
+``EEXIST`` error code should be returned otherwise.
+
+Unsupported layer types
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Many protocols can be simulated by crafting patterns with the `Item: RAW`_
+type.
+
+PMDs can rely on this capability to simulate support for protocols with
+headers not directly recognized by hardware.
+
+``ANY`` pattern item
+~~~~~~~~~~~~~~~~~~~~
+
+This pattern item stands for anything, which can be difficult to translate
+to something hardware would understand, particularly if followed by more
+specific types.
+
+Consider the following pattern:
+
+.. _table_rte_flow_unsupported_any:
+
+.. table:: Pattern with ANY as L3
+
+   +-------+-----------------------+
+   | Index | Item                  |
+   +=======+=======================+
+   | 0     | ETHER                 |
+   +-------+-----+---------+-------+
+   | 1     | ANY | ``num`` | ``1`` |
+   +-------+-----+---------+-------+
+   | 2     | TCP                   |
+   +-------+-----------------------+
+   | 3     | END                   |
+   +-------+-----------------------+
+
+Knowing that TCP does not make sense with something other than IPv4 and IPv6
+as L3, such a pattern may be translated to two flow rules instead:
+
+.. _table_rte_flow_unsupported_any_ipv4:
+
+.. table:: ANY replaced with IPV4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETHER              |
+   +-------+--------------------+
+   | 1     | IPV4 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+|
+
+.. _table_rte_flow_unsupported_any_ipv6:
+
+.. table:: ANY replaced with IPV6
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETHER              |
+   +-------+--------------------+
+   | 1     | IPV6 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+Note that as soon as a ANY rule covers several layers, this approach may
+yield a large number of hidden flow rules. It is thus suggested to only
+support the most common scenarios (anything as L2 and/or L3).
+
+Unsupported actions
+~~~~~~~~~~~~~~~~~~~
+
+- When combined with `Action: QUEUE`_, packet counting (`Action: COUNT`_)
+  and tagging (`Action: MARK`_ or `Action: FLAG`_) may be implemented in
+  software as long as the target queue is used by a single rule.
+
+- A rule specifying both `Action: DUP`_ + `Action: QUEUE`_ may be translated
+  to two hidden rules combining `Action: QUEUE`_ and `Action: PASSTHRU`_.
+
+- When a single target queue is provided, `Action: RSS`_ can also be
+  implemented through `Action: QUEUE`_.
+
+Flow rules priority
+~~~~~~~~~~~~~~~~~~~
+
+While it would naturally make sense, flow rules cannot be assumed to be
+processed by hardware in the same order as their creation for several
+reasons:
+
+- They may be managed internally as a tree or a hash table instead of a
+  list.
+- Removing a flow rule before adding another one can either put the new rule
+  at the end of the list or reuse a freed entry.
+- Duplication may occur when packets are matched by several rules.
+
+For overlapping rules (particularly in order to use `Action: PASSTHRU`_)
+predictable behavior is only guaranteed by using different priority levels.
+
+Priority levels are not necessarily implemented in hardware, or may be
+severely limited (e.g. a single priority bit).
+
+For these reasons, priority levels may be implemented purely in software by
+PMDs.
+
+- For devices expecting flow rules to be added in the correct order, PMDs
+  may destroy and re-create existing rules after adding a new one with
+  a higher priority.
+
+- A configurable number of dummy or empty rules can be created at
+  initialization time to save high priority slots for later.
+
+- In order to save priority levels, PMDs may evaluate whether rules are
+  likely to collide and adjust their priority accordingly.
+
+Future evolutions
+-----------------
+
+- A device profile selection function which could be used to force a
+  permanent profile instead of relying on its automatic configuration based
+  on existing flow rules.
+
+- A method to optimize *rte_flow* rules with specific pattern items and
+  action types generated on the fly by PMDs. DPDK should assign negative
+  numbers to these in order to not collide with the existing types. See
+  `Negative types`_.
+
+- Adding specific egress pattern items and actions as described in
+  `Attribute: Traffic direction`_.
+
+- Optional software fallback when PMDs are unable to handle requested flow
+  rules so applications do not have to implement their own.
+
+API migration
+-------------
+
+Exhaustive list of deprecated filter types (normally prefixed with
+*RTE_ETH_FILTER_*) found in ``rte_eth_ctrl.h`` and methods to convert them
+to *rte_flow* rules.
+
+``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*MACVLAN* can be translated to a basic `Item: ETH`_ flow rule with a
+terminating `Action: VF`_ or `Action: PF`_.
+
+.. _table_rte_flow_migration_macvlan:
+
+.. table:: MACVLAN conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | VF,     |
+   |   |     +----------+-----+ PF      |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*ETHERTYPE* is basically an `Item: ETH`_ flow rule with a terminating
+`Action: QUEUE`_ or `Action: DROP`_.
+
+.. _table_rte_flow_migration_ethertype:
+
+.. table:: ETHERTYPE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | QUEUE,  |
+   |   |     +----------+-----+ DROP    |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``FLEXIBLE`` to ``RAW`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FLEXIBLE* can be translated to one `Item: RAW`_ pattern with a terminating
+`Action: QUEUE`_ and a defined priority level.
+
+.. _table_rte_flow_migration_flexible:
+
+.. table:: FLEXIBLE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | RAW | ``spec`` | any | QUEUE   |
+   |   |     +----------+-----+         |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``SYN`` to ``TCP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*SYN* is a `Item: TCP`_ rule with only the ``syn`` bit enabled and masked,
+and a terminating `Action: QUEUE`_.
+
+Priority level can be set to simulate the high priority bit.
+
+.. _table_rte_flow_migration_syn:
+
+.. table:: SYN conversion
+
+   +-----------------------------------+---------+
+   | Pattern                           | Actions |
+   +===+======+==========+=============+=========+
+   | 0 | ETH  | ``spec`` | unset       | QUEUE   |
+   |   |      +----------+-------------+         |
+   |   |      | ``last`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+-------------+---------+
+   | 1 | IPV4 | ``spec`` | unset       | END     |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+---------+---+         |
+   | 2 | TCP  | ``spec`` | ``syn`` | 1 |         |
+   |   |      +----------+---------+---+         |
+   |   |      | ``mask`` | ``syn`` | 1 |         |
+   +---+------+----------+---------+---+         |
+   | 3 | END                           |         |
+   +---+-------------------------------+---------+
+
+``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*NTUPLE* is similar to specifying an empty L2, `Item: IPV4`_ as L3 with
+`Item: TCP`_ or `Item: UDP`_ as L4 and a terminating `Action: QUEUE`_.
+
+A priority level can be specified as well.
+
+.. _table_rte_flow_migration_ntuple:
+
+.. table:: NTUPLE conversion
+
+   +-----------------------------+---------+
+   | Pattern                     | Actions |
+   +===+======+==========+=======+=========+
+   | 0 | ETH  | ``spec`` | unset | QUEUE   |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | unset |         |
+   +---+------+----------+-------+---------+
+   | 1 | IPV4 | ``spec`` | any   | END     |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 2 | TCP, | ``spec`` | any   |         |
+   |   | UDP  +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 3 | END                     |         |
+   +---+-------------------------+---------+
+
+``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*TUNNEL* matches common IPv4 and IPv6 L3/L4-based tunnel types.
+
+In the following table, `Item: ANY`_ is used to cover the optional L4.
+
+.. _table_rte_flow_migration_tunnel:
+
+.. table:: TUNNEL conversion
+
+   +-------------------------------------------------------+---------+
+   | Pattern                                               | Actions |
+   +===+==========================+==========+=============+=========+
+   | 0 | ETH                      | ``spec`` | any         | QUEUE   |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+---------+
+   | 1 | IPV4, IPV6               | ``spec`` | any         | END     |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+         |
+   | 2 | ANY                      | ``spec`` | any         |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+---------+---+         |
+   |   |                          | ``mask`` | ``num`` | 0 |         |
+   +---+--------------------------+----------+---------+---+         |
+   | 3 | VXLAN, GENEVE, TEREDO,   | ``spec`` | any         |         |
+   |   | NVGRE, GRE, ...          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+         |
+   | 4 | END                                               |         |
+   +---+---------------------------------------------------+---------+
+
+``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FDIR* is more complex than any other type, there are several methods to
+emulate its functionality. It is summarized for the most part in the table
+below.
+
+A few features are intentionally not supported:
+
+- The ability to configure the matching input set and masks for the entire
+  device, PMDs should take care of it automatically according to the
+  requested flow rules.
+
+  For example if a device supports only one bit-mask per protocol type,
+  source/address IPv4 bit-masks can be made immutable by the first created
+  rule. Subsequent IPv4 or TCPv4 rules can only be created if they are
+  compatible.
+
+  Note that only protocol bit-masks affected by existing flow rules are
+  immutable, others can be changed later. They become mutable again after
+  the related flow rules are destroyed.
+
+- Returning four or eight bytes of matched data when using flex bytes
+  filtering. Although a specific action could implement it, it conflicts
+  with the much more useful 32 bits tagging on devices that support it.
+
+- Side effects on RSS processing of the entire device. Flow rules that
+  conflict with the current device configuration should not be
+  allowed. Similarly, device configuration should not be allowed when it
+  affects existing flow rules.
+
+- Device modes of operation. "none" is unsupported since filtering cannot be
+  disabled as long as a flow rule is present.
+
+- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
+  according to the created flow rules.
+
+- Signature mode of operation is not defined but could be handled through a
+  specific item type if needed.
+
+.. _table_rte_flow_migration_fdir:
+
+.. table:: FDIR conversion
+
+   +----------------------------------------+-----------------------+
+   | Pattern                                | Actions               |
+   +===+===================+==========+=====+=======================+
+   | 0 | ETH, RAW          | ``spec`` | any | QUEUE, DROP, PASSTHRU |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``last`` | N/A |                       |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``mask`` | any |                       |
+   +---+-------------------+----------+-----+-----------------------+
+   | 1 | IPV4, IPv6        | ``spec`` | any | MARK                  |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``last`` | N/A |                       |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``mask`` | any |                       |
+   +---+-------------------+----------+-----+-----------------------+
+   | 2 | TCP, UDP, SCTP    | ``spec`` | any | END                   |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``last`` | N/A |                       |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``mask`` | any |                       |
+   +---+-------------------+----------+-----+                       |
+   | 3 | VF, PF (optional) | ``spec`` | any |                       |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``last`` | N/A |                       |
+   |   |                   +----------+-----+                       |
+   |   |                   | ``mask`` | any |                       |
+   +---+-------------------+----------+-----+                       |
+   | 4 | END                                |                       |
+   +---+------------------------------------+-----------------------+
+
+``HASH``
+~~~~~~~~
+
+There is no counterpart to this filter type because it translates to a
+global device setting instead of a pattern item. Device settings are
+automatically set according to the created flow rules.
+
+``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All packets are matched. This type alters incoming packets to encapsulate
+them in a chosen tunnel type, optionally redirect them to a VF as well.
+
+The destination pool for tag based forwarding can be emulated with other
+flow rules using `Action: DUP`_.
+
+.. _table_rte_flow_migration_l2tunnel:
+
+.. table:: L2_TUNNEL conversion
+
+   +---------------------------+--------------------+
+   | Pattern                   | Actions            |
+   +===+======+==========+=====+====================+
+   | 0 | VOID | ``spec`` | N/A | VXLAN, GENEVE, ... |
+   |   |      |          |     |                    |
+   |   |      |          |     |                    |
+   |   |      +----------+-----+                    |
+   |   |      | ``last`` | N/A |                    |
+   |   |      +----------+-----+                    |
+   |   |      | ``mask`` | N/A |                    |
+   |   |      |          |     |                    |
+   +---+------+----------+-----+--------------------+
+   | 1 | END                   | VF (optional)      |
+   +---+                       +--------------------+
+   | 2 |                       | END                |
+   +---+-----------------------+--------------------+
-- 
2.1.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v5 01/26] ethdev: introduce generic flow API
  @ 2016-12-21 14:51  2%           ` Adrien Mazarguil
  2016-12-21 14:51  1%           ` [dpdk-dev] [PATCH v5 02/26] doc: add rte_flow prog guide Adrien Mazarguil
  2016-12-21 14:51  2%           ` [dpdk-dev] [PATCH v5 04/26] cmdline: add support for dynamic tokens Adrien Mazarguil
  2 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-21 14:51 UTC (permalink / raw)
  To: dev

This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

Benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

Existing filter types will be deprecated and removed in the near future.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 MAINTAINERS                            |   4 +
 doc/api/doxy-api-index.md              |   2 +
 lib/librte_ether/Makefile              |   3 +
 lib/librte_ether/rte_eth_ctrl.h        |   1 +
 lib/librte_ether/rte_ether_version.map |  11 +
 lib/librte_ether/rte_flow.c            | 159 +++++
 lib/librte_ether/rte_flow.h            | 947 ++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow_driver.h     | 182 ++++++
 8 files changed, 1309 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bb0b99..775b058 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -243,6 +243,10 @@ M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
 F: scripts/test-null.sh
 
+Generic flow API
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: lib/librte_ether/rte_flow*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index de65b4c..4951552 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -39,6 +39,8 @@ There are many libraries, so their headers may be grouped by topics:
   [dev]                (@ref rte_dev.h),
   [ethdev]             (@ref rte_ethdev.h),
   [ethctrl]            (@ref rte_eth_ctrl.h),
+  [rte_flow]           (@ref rte_flow.h),
+  [rte_flow_driver]    (@ref rte_flow_driver.h),
   [cryptodev]          (@ref rte_cryptodev.h),
   [devargs]            (@ref rte_devargs.h),
   [bond]               (@ref rte_eth_bond.h),
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..9335361 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
 LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
+SRCS-y += rte_flow.c
 
 #
 # Export include files
@@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
+SYMLINK-y-include += rte_flow_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index fe80eb0..8386904 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -99,6 +99,7 @@ enum rte_filter_type {
 	RTE_ETH_FILTER_FDIR,
 	RTE_ETH_FILTER_HASH,
 	RTE_ETH_FILTER_L2_TUNNEL,
+	RTE_ETH_FILTER_GENERIC,
 	RTE_ETH_FILTER_MAX
 };
 
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..384cdee 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -147,3 +147,14 @@ DPDK_16.11 {
 	rte_eth_dev_pci_remove;
 
 } DPDK_16.07;
+
+DPDK_17.02 {
+	global:
+
+	rte_flow_validate;
+	rte_flow_create;
+	rte_flow_destroy;
+	rte_flow_flush;
+	rte_flow_query;
+
+} DPDK_16.11;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
new file mode 100644
index 0000000..d98fb1b
--- /dev/null
+++ b/lib/librte_ether/rte_flow.c
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_flow_driver.h"
+#include "rte_flow.h"
+
+/* Get generic flow operations structure from a port. */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops;
+	int code;
+
+	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
+		code = ENODEV;
+	else if (unlikely(!dev->dev_ops->filter_ctrl ||
+			  dev->dev_ops->filter_ctrl(dev,
+						    RTE_ETH_FILTER_GENERIC,
+						    RTE_ETH_FILTER_GET,
+						    &ops) ||
+			  !ops))
+		code = ENOSYS;
+	else
+		return ops;
+	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(code));
+	return NULL;
+}
+
+/* Check whether a flow rule can be created on a given port. */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error)
+{
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->validate))
+		return ops->validate(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Create a flow rule on a given port. */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return NULL;
+	if (likely(!!ops->create))
+		return ops->create(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return NULL;
+}
+
+/* Destroy a flow rule on a given port. */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->destroy))
+		return ops->destroy(dev, flow, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Destroy all flow rules associated with a port. */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->flush))
+		return ops->flush(dev, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Query an existing flow rule. */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (!ops)
+		return -rte_errno;
+	if (likely(!!ops->query))
+		return ops->query(dev, flow, action, data, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
new file mode 100644
index 0000000..98084ac
--- /dev/null
+++ b/lib/librte_ether/rte_flow.h
@@ -0,0 +1,947 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_H_
+#define RTE_FLOW_H_
+
+/**
+ * @file
+ * RTE generic flow API
+ *
+ * This interface provides the ability to program packet matching and
+ * associated actions in hardware through flow rules.
+ */
+
+#include <rte_arp.h>
+#include <rte_ether.h>
+#include <rte_icmp.h>
+#include <rte_ip.h>
+#include <rte_sctp.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Flow rule attributes.
+ *
+ * Priorities are set on two levels: per group and per rule within groups.
+ *
+ * Lower values denote higher priority, the highest priority for both levels
+ * is 0, so that a rule with priority 0 in group 8 is always matched after a
+ * rule with priority 8 in group 0.
+ *
+ * Although optional, applications are encouraged to group similar rules as
+ * much as possible to fully take advantage of hardware capabilities
+ * (e.g. optimized matching) and work around limitations (e.g. a single
+ * pattern type possibly allowed in a given group).
+ *
+ * Group and priority levels are arbitrary and up to the application, they
+ * do not need to be contiguous nor start from 0, however the maximum number
+ * varies between devices and may be affected by existing flow rules.
+ *
+ * If a packet is matched by several rules of a given group for a given
+ * priority level, the outcome is undefined. It can take any path, may be
+ * duplicated or even cause unrecoverable errors.
+ *
+ * Note that support for more than a single group and priority level is not
+ * guaranteed.
+ *
+ * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+ *
+ * Several pattern items and actions are valid and can be used in both
+ * directions. Those valid for only one direction are described as such.
+ *
+ * At least one direction must be specified.
+ *
+ * Specifying both directions at once for a given rule is not recommended
+ * but may be valid in a few cases (e.g. shared counter).
+ */
+struct rte_flow_attr {
+	uint32_t group; /**< Priority group. */
+	uint32_t priority; /**< Priority level within group. */
+	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
+	uint32_t egress:1; /**< Rule applies to egress traffic. */
+	uint32_t reserved:30; /**< Reserved, must be zero. */
+};
+
+/**
+ * Matching pattern item types.
+ *
+ * Pattern items fall in two categories:
+ *
+ * - Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+ *   IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+ *   specification structure. These must be stacked in the same order as the
+ *   protocol layers to match, starting from the lowest.
+ *
+ * - Matching meta-data or affecting pattern processing (END, VOID, INVERT,
+ *   PF, VF, PORT and so on), often without a specification structure. Since
+ *   they do not match packet contents, these can be specified anywhere
+ *   within item lists without affecting others.
+ *
+ * See the description of individual types for more information. Those
+ * marked with [META] fall into the second category.
+ */
+enum rte_flow_item_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for item lists. Prevents further processing of items,
+	 * thereby ending the pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_VOID,
+
+	/**
+	 * [META]
+	 *
+	 * Inverted matching, i.e. process packets that do not match the
+	 * pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_INVERT,
+
+	/**
+	 * Matches any protocol in place of the current layer, a single ANY
+	 * may also stand for several protocol layers.
+	 *
+	 * See struct rte_flow_item_any.
+	 */
+	RTE_FLOW_ITEM_TYPE_ANY,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to the physical function of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a PF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_PF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to a virtual function ID of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a VF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * See struct rte_flow_item_vf.
+	 */
+	RTE_FLOW_ITEM_TYPE_VF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets coming from the specified physical port of the
+	 * underlying device.
+	 *
+	 * The first PORT item overrides the physical port normally
+	 * associated with the specified DPDK input port (port_id). This
+	 * item can be provided several times to match additional physical
+	 * ports.
+	 *
+	 * See struct rte_flow_item_port.
+	 */
+	RTE_FLOW_ITEM_TYPE_PORT,
+
+	/**
+	 * Matches a byte string of a given length at a given offset.
+	 *
+	 * See struct rte_flow_item_raw.
+	 */
+	RTE_FLOW_ITEM_TYPE_RAW,
+
+	/**
+	 * Matches an Ethernet header.
+	 *
+	 * See struct rte_flow_item_eth.
+	 */
+	RTE_FLOW_ITEM_TYPE_ETH,
+
+	/**
+	 * Matches an 802.1Q/ad VLAN tag.
+	 *
+	 * See struct rte_flow_item_vlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VLAN,
+
+	/**
+	 * Matches an IPv4 header.
+	 *
+	 * See struct rte_flow_item_ipv4.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV4,
+
+	/**
+	 * Matches an IPv6 header.
+	 *
+	 * See struct rte_flow_item_ipv6.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6,
+
+	/**
+	 * Matches an ICMP header.
+	 *
+	 * See struct rte_flow_item_icmp.
+	 */
+	RTE_FLOW_ITEM_TYPE_ICMP,
+
+	/**
+	 * Matches a UDP header.
+	 *
+	 * See struct rte_flow_item_udp.
+	 */
+	RTE_FLOW_ITEM_TYPE_UDP,
+
+	/**
+	 * Matches a TCP header.
+	 *
+	 * See struct rte_flow_item_tcp.
+	 */
+	RTE_FLOW_ITEM_TYPE_TCP,
+
+	/**
+	 * Matches a SCTP header.
+	 *
+	 * See struct rte_flow_item_sctp.
+	 */
+	RTE_FLOW_ITEM_TYPE_SCTP,
+
+	/**
+	 * Matches a VXLAN header.
+	 *
+	 * See struct rte_flow_item_vxlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VXLAN,
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ANY
+ *
+ * Matches any protocol in place of the current layer, a single ANY may also
+ * stand for several protocol layers.
+ *
+ * This is usually specified as the first pattern item when looking for a
+ * protocol anywhere in a packet.
+ *
+ * A zeroed mask stands for any number of layers.
+ */
+struct rte_flow_item_any {
+	uint32_t num; /* Number of layers covered. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VF
+ *
+ * Matches packets addressed to a virtual function ID of the device.
+ *
+ * If the underlying device function differs from the one that would
+ * normally receive the matched traffic, specifying this item prevents it
+ * from reaching that device unless the flow rule contains a VF
+ * action. Packets are not duplicated between device instances by default.
+ *
+ * - Likely to return an error or never match any traffic if this causes a
+ *   VF device to match traffic addressed to a different VF.
+ * - Can be specified multiple times to match traffic addressed to several
+ *   VF IDs.
+ * - Can be combined with a PF item to match both PF and VF traffic.
+ *
+ * A zeroed mask can be used to match any VF ID.
+ */
+struct rte_flow_item_vf {
+	uint32_t id; /**< Destination VF ID. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_PORT
+ *
+ * Matches packets coming from the specified physical port of the underlying
+ * device.
+ *
+ * The first PORT item overrides the physical port normally associated with
+ * the specified DPDK input port (port_id). This item can be provided
+ * several times to match additional physical ports.
+ *
+ * Note that physical ports are not necessarily tied to DPDK input ports
+ * (port_id) when those are not under DPDK control. Possible values are
+ * specific to each device, they are not necessarily indexed from zero and
+ * may not be contiguous.
+ *
+ * As a device property, the list of allowed values as well as the value
+ * associated with a port_id should be retrieved by other means.
+ *
+ * A zeroed mask can be used to match any port index.
+ */
+struct rte_flow_item_port {
+	uint32_t index; /**< Physical port index. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_RAW
+ *
+ * Matches a byte string of a given length at a given offset.
+ *
+ * Offset is either absolute (using the start of the packet) or relative to
+ * the end of the previous matched item in the stack, in which case negative
+ * values are allowed.
+ *
+ * If search is enabled, offset is used as the starting point. The search
+ * area can be delimited by setting limit to a nonzero value, which is the
+ * maximum number of bytes after offset where the pattern may start.
+ *
+ * Matching a zero-length pattern is allowed, doing so resets the relative
+ * offset for subsequent items.
+ *
+ * This type does not support ranges (struct rte_flow_item.last).
+ */
+struct rte_flow_item_raw {
+	uint32_t relative:1; /**< Look for pattern after the previous item. */
+	uint32_t search:1; /**< Search pattern from offset (see also limit). */
+	uint32_t reserved:30; /**< Reserved, must be set to zero. */
+	int32_t offset; /**< Absolute or relative offset for pattern. */
+	uint16_t limit; /**< Search area limit for start of pattern. */
+	uint16_t length; /**< Pattern length. */
+	uint8_t pattern[]; /**< Byte string to look for. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ETH
+ *
+ * Matches an Ethernet header.
+ */
+struct rte_flow_item_eth {
+	struct ether_addr dst; /**< Destination MAC. */
+	struct ether_addr src; /**< Source MAC. */
+	uint16_t type; /**< EtherType. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VLAN
+ *
+ * Matches an 802.1Q/ad VLAN tag.
+ *
+ * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
+ * RTE_FLOW_ITEM_TYPE_VLAN.
+ */
+struct rte_flow_item_vlan {
+	uint16_t tpid; /**< Tag protocol identifier. */
+	uint16_t tci; /**< Tag control information. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV4
+ *
+ * Matches an IPv4 header.
+ *
+ * Note: IPv4 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv4 {
+	struct ipv4_hdr hdr; /**< IPv4 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV6.
+ *
+ * Matches an IPv6 header.
+ *
+ * Note: IPv6 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv6 {
+	struct ipv6_hdr hdr; /**< IPv6 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ICMP.
+ *
+ * Matches an ICMP header.
+ */
+struct rte_flow_item_icmp {
+	struct icmp_hdr hdr; /**< ICMP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_UDP.
+ *
+ * Matches a UDP header.
+ */
+struct rte_flow_item_udp {
+	struct udp_hdr hdr; /**< UDP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_TCP.
+ *
+ * Matches a TCP header.
+ */
+struct rte_flow_item_tcp {
+	struct tcp_hdr hdr; /**< TCP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_SCTP.
+ *
+ * Matches a SCTP header.
+ */
+struct rte_flow_item_sctp {
+	struct sctp_hdr hdr; /**< SCTP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VXLAN.
+ *
+ * Matches a VXLAN header (RFC 7348).
+ */
+struct rte_flow_item_vxlan {
+	uint8_t flags; /**< Normally 0x08 (I flag). */
+	uint8_t rsvd0[3]; /**< Reserved, normally 0x000000. */
+	uint8_t vni[3]; /**< VXLAN identifier. */
+	uint8_t rsvd1; /**< Reserved, normally 0x00. */
+};
+
+/**
+ * Matching pattern item definition.
+ *
+ * A pattern is formed by stacking items starting from the lowest protocol
+ * layer to match. This stacking restriction does not apply to meta items
+ * which can be placed anywhere in the stack without affecting the meaning
+ * of the resulting pattern.
+ *
+ * Patterns are terminated by END items.
+ *
+ * The spec field should be a valid pointer to a structure of the related
+ * item type. It may be set to NULL in many cases to use default values.
+ *
+ * Optionally, last can point to a structure of the same type to define an
+ * inclusive range. This is mostly supported by integer and address fields,
+ * may cause errors otherwise. Fields that do not support ranges must be set
+ * to 0 or to the same value as the corresponding fields in spec.
+ *
+ * By default all fields present in spec are considered relevant (see note
+ * below). This behavior can be altered by providing a mask structure of the
+ * same type with applicable bits set to one. It can also be used to
+ * partially filter out specific fields (e.g. as an alternate mean to match
+ * ranges of IP addresses).
+ *
+ * Mask is a simple bit-mask applied before interpreting the contents of
+ * spec and last, which may yield unexpected results if not used
+ * carefully. For example, if for an IPv4 address field, spec provides
+ * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
+ * effective range becomes 10.1.0.0 to 10.3.255.255.
+ *
+ * Note: the defaults for data-matching items such as IPv4 when mask is not
+ * specified actually depend on the underlying implementation since only
+ * recognized fields can be taken into account.
+ */
+struct rte_flow_item {
+	enum rte_flow_item_type type; /**< Item type. */
+	const void *spec; /**< Pointer to item specification structure. */
+	const void *last; /**< Defines an inclusive range (spec to last). */
+	const void *mask; /**< Bit-mask applied to spec and last. */
+};
+
+/**
+ * Action types.
+ *
+ * Each possible action is represented by a type. Some have associated
+ * configuration structures. Several actions combined in a list can be
+ * affected to a flow rule. That list is not ordered.
+ *
+ * They fall in three categories:
+ *
+ * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+ *   processing matched packets by subsequent flow rules, unless overridden
+ *   with PASSTHRU.
+ *
+ * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
+ *   for additional processing by subsequent flow rules.
+ *
+ * - Other non terminating meta actions that do not affect the fate of
+ *   packets (END, VOID, MARK, FLAG, COUNT).
+ *
+ * When several actions are combined in a flow rule, they should all have
+ * different types (e.g. dropping a packet twice is not possible).
+ *
+ * Only the last action of a given type is taken into account. PMDs still
+ * perform error checking on the entire list.
+ *
+ * Note that PASSTHRU is the only action able to override a terminating
+ * rule.
+ */
+enum rte_flow_action_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for action lists. Prevents further processing of
+	 * actions, thereby ending the list.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_VOID,
+
+	/**
+	 * Leaves packets up for additional processing by subsequent flow
+	 * rules. This is the default when a rule does not contain a
+	 * terminating action, but can be specified to force a rule to
+	 * become non-terminating.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PASSTHRU,
+
+	/**
+	 * [META]
+	 *
+	 * Attaches a 32 bit value to packets.
+	 *
+	 * See struct rte_flow_action_mark.
+	 */
+	RTE_FLOW_ACTION_TYPE_MARK,
+
+	/**
+	 * [META]
+	 *
+	 * Flag packets. Similar to MARK but only affects ol_flags.
+	 *
+	 * Note: a distinctive flag must be defined for it.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_FLAG,
+
+	/**
+	 * Assigns packets to a given queue index.
+	 *
+	 * See struct rte_flow_action_queue.
+	 */
+	RTE_FLOW_ACTION_TYPE_QUEUE,
+
+	/**
+	 * Drops packets.
+	 *
+	 * PASSTHRU overrides this action if both are specified.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_DROP,
+
+	/**
+	 * [META]
+	 *
+	 * Enables counters for this rule.
+	 *
+	 * These counters can be retrieved and reset through rte_flow_query(),
+	 * see struct rte_flow_query_count.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_COUNT,
+
+	/**
+	 * Duplicates packets to a given queue index.
+	 *
+	 * This is normally combined with QUEUE, however when used alone, it
+	 * is actually similar to QUEUE + PASSTHRU.
+	 *
+	 * See struct rte_flow_action_dup.
+	 */
+	RTE_FLOW_ACTION_TYPE_DUP,
+
+	/**
+	 * Similar to QUEUE, except RSS is additionally performed on packets
+	 * to spread them among several queues according to the provided
+	 * parameters.
+	 *
+	 * See struct rte_flow_action_rss.
+	 */
+	RTE_FLOW_ACTION_TYPE_RSS,
+
+	/**
+	 * Redirects packets to the physical function (PF) of the current
+	 * device.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PF,
+
+	/**
+	 * Redirects packets to the virtual function (VF) of the current
+	 * device with the specified ID.
+	 *
+	 * See struct rte_flow_action_vf.
+	 */
+	RTE_FLOW_ACTION_TYPE_VF,
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_MARK
+ *
+ * Attaches a 32 bit value to packets.
+ *
+ * This value is arbitrary and application-defined. For compatibility with
+ * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
+ * also set in ol_flags.
+ */
+struct rte_flow_action_mark {
+	uint32_t id; /**< 32 bit value to return with packets. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_QUEUE
+ *
+ * Assign packets to a given queue index.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_queue {
+	uint16_t index; /**< Queue index to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_COUNT (query)
+ *
+ * Query structure to retrieve and reset flow rule counters.
+ */
+struct rte_flow_query_count {
+	uint32_t reset:1; /**< Reset counters after query [in]. */
+	uint32_t hits_set:1; /**< hits field is set [out]. */
+	uint32_t bytes_set:1; /**< bytes field is set [out]. */
+	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
+	uint64_t hits; /**< Number of hits for this rule [out]. */
+	uint64_t bytes; /**< Number of bytes through this rule [out]. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_DUP
+ *
+ * Duplicates packets to a given queue index.
+ *
+ * This is normally combined with QUEUE, however when used alone, it is
+ * actually similar to QUEUE + PASSTHRU.
+ *
+ * Non-terminating by default.
+ */
+struct rte_flow_action_dup {
+	uint16_t index; /**< Queue index to duplicate packets to. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_RSS
+ *
+ * Similar to QUEUE, except RSS is additionally performed on packets to
+ * spread them among several queues according to the provided parameters.
+ *
+ * Note: RSS hash result is normally stored in the hash.rss mbuf field,
+ * however it conflicts with the MARK action as they share the same
+ * space. When both actions are specified, the RSS hash is discarded and
+ * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
+ * structure should eventually evolve to store both.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_rss {
+	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
+	uint16_t num; /**< Number of entries in queue[]. */
+	uint16_t queue[]; /**< Queues indices to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_VF
+ *
+ * Redirects packets to a virtual function (VF) of the current device.
+ *
+ * Packets matched by a VF pattern item can be redirected to their original
+ * VF ID instead of the specified one. This parameter may not be available
+ * and is not guaranteed to work properly if the VF part is matched by a
+ * prior flow rule or if packets are not addressed to a VF in the first
+ * place.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_vf {
+	uint32_t original:1; /**< Use original VF ID if possible. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint32_t id; /**< VF ID to redirect packets to. */
+};
+
+/**
+ * Definition of a single action.
+ *
+ * A list of actions is terminated by a END action.
+ *
+ * For simple actions without a configuration structure, conf remains NULL.
+ */
+struct rte_flow_action {
+	enum rte_flow_action_type type; /**< Action type. */
+	const void *conf; /**< Pointer to action configuration structure. */
+};
+
+/**
+ * Opaque type returned after successfully creating a flow.
+ *
+ * This handle can be used to manage and query the related flow (e.g. to
+ * destroy it or retrieve counters).
+ */
+struct rte_flow;
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_flow_error.cause.
+ */
+enum rte_flow_error_type {
+	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+	RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+	RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+	RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_flow_error {
+	enum rte_flow_error_type type; /**< Cause field and error types. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Check whether a flow rule can be created on a given port.
+ *
+ * While this function has no effect on the target device, the flow rule is
+ * validated against its current configuration state and the returned value
+ * should be considered valid by the caller for that state only.
+ *
+ * The returned value is guaranteed to remain valid only as long as no
+ * successful calls to rte_flow_create() or rte_flow_destroy() are made in
+ * the meantime and no device parameter affecting flow rules in any way are
+ * modified, due to possible collisions or resource limitations (although in
+ * such cases EINVAL should not be returned).
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 if flow rule is valid and can be created. A negative errno value
+ *   otherwise (rte_errno is also set), the following errors are defined:
+ *
+ *   -ENOSYS: underlying device does not support this functionality.
+ *
+ *   -EINVAL: unknown or invalid rule specification.
+ *
+ *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
+ *   bit-masks are unsupported).
+ *
+ *   -EEXIST: collision with an existing rule.
+ *
+ *   -ENOMEM: not enough resources.
+ *
+ *   -EBUSY: action cannot be performed due to busy device resources, may
+ *   succeed if the affected queues or even the entire port are in a stopped
+ *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
+ */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error);
+
+/**
+ * Create a flow rule on a given port.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to the positive version of one of the error codes defined for
+ *   rte_flow_validate().
+ */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error);
+
+/**
+ * Destroy a flow rule on a given port.
+ *
+ * Failure to destroy a flow rule handle may occur when other flow rules
+ * depend on it, and destroying it would result in an inconsistent state.
+ *
+ * This function is only guaranteed to succeed if handles are destroyed in
+ * reverse order of their creation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to destroy.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error);
+
+/**
+ * Destroy all flow rules associated with a port.
+ *
+ * In the unlikely event of failure, handles are still considered destroyed
+ * and no longer valid but the port must be assumed to be in an inconsistent
+ * state.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error);
+
+/**
+ * Query an existing flow rule.
+ *
+ * This function allows retrieving flow-specific data such as counters.
+ * Data is gathered by special actions which must be present in the flow
+ * rule definition.
+ *
+ * \see RTE_FLOW_ACTION_TYPE_COUNT
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to query.
+ * @param action
+ *   Action type to query.
+ * @param[in, out] data
+ *   Pointer to storage for the associated query data type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_H_ */
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
new file mode 100644
index 0000000..cc97785
--- /dev/null
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -0,0 +1,182 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_DRIVER_H_
+#define RTE_FLOW_DRIVER_H_
+
+/**
+ * @file
+ * RTE generic flow API (driver side)
+ *
+ * This file provides implementation helpers for internal use by PMDs, they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_flow.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Generic flow operations structure implemented and returned by PMDs.
+ *
+ * To implement this API, PMDs must handle the RTE_ETH_FILTER_GENERIC filter
+ * type in their .filter_ctrl callback function (struct eth_dev_ops) as well
+ * as the RTE_ETH_FILTER_GET filter operation.
+ *
+ * If successful, this operation must result in a pointer to a PMD-specific
+ * struct rte_flow_ops written to the argument address as described below:
+ *
+ * \code
+ *
+ * // PMD filter_ctrl callback
+ *
+ * static const struct rte_flow_ops pmd_flow_ops = { ... };
+ *
+ * switch (filter_type) {
+ * case RTE_ETH_FILTER_GENERIC:
+ *     if (filter_op != RTE_ETH_FILTER_GET)
+ *         return -EINVAL;
+ *     *(const void **)arg = &pmd_flow_ops;
+ *     return 0;
+ * }
+ *
+ * \endcode
+ *
+ * See also rte_flow_ops_get().
+ *
+ * These callback functions are not supposed to be used by applications
+ * directly, which must rely on the API defined in rte_flow.h.
+ *
+ * Public-facing wrapper functions perform a few consistency checks so that
+ * unimplemented (i.e. NULL) callbacks simply return -ENOTSUP. These
+ * callbacks otherwise only differ by their first argument (with port ID
+ * already resolved to a pointer to struct rte_eth_dev).
+ */
+struct rte_flow_ops {
+	/** See rte_flow_validate(). */
+	int (*validate)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_create(). */
+	struct rte_flow *(*create)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_destroy(). */
+	int (*destroy)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 struct rte_flow_error *);
+	/** See rte_flow_flush(). */
+	int (*flush)
+		(struct rte_eth_dev *,
+		 struct rte_flow_error *);
+	/** See rte_flow_query(). */
+	int (*query)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 enum rte_flow_action_type,
+		 void *,
+		 struct rte_flow_error *);
+};
+
+/**
+ * Initialize generic flow error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Pointer to flow error structure.
+ */
+static inline struct rte_flow_error *
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return error;
+}
+
+/**
+ * Get generic flow operations structure from a port.
+ *
+ * @param port_id
+ *   Port identifier to query.
+ * @param[out] error
+ *   Pointer to flow error structure.
+ *
+ * @return
+ *   The flow operations structure associated with port_id, NULL in case of
+ *   error, in which case rte_errno is set and the error structure contains
+ *   additional details.
+ */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_DRIVER_H_ */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats from PF
  @ 2016-12-21  0:56  3%         ` Lu, Wenzhuo
  2016-12-22 16:38  0%           ` Iremonger, Bernard
  0 siblings, 1 reply; 200+ results
From: Lu, Wenzhuo @ 2016-12-21  0:56 UTC (permalink / raw)
  To: Iremonger, Bernard, Yigit, Ferruh, dev
  Cc: Wu, Jingjing, Zhang, Helin, Zhang, Qi Z, Chen, Jing D

Hi all,


> -----Original Message-----
> From: Iremonger, Bernard
> Sent: Tuesday, December 20, 2016 9:40 PM
> To: Yigit, Ferruh; dev@dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Zhang, Qi Z; Lu, Wenzhuo; Chen, Jing D
> Subject: RE: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats from PF
> 
> Hi Ferruh,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > Sent: Tuesday, December 20, 2016 1:25 PM
> > To: dev@dpdk.org
> > Cc: Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin
> > <helin.zhang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Lu,
> > Wenzhuo <wenzhuo.lu@intel.com>; Chen, Jing D <jing.d.chen@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats
> > from PF
> >
> > On 12/16/2016 7:02 PM, Ferruh Yigit wrote:
> > > From: Qi Zhang <qi.z.zhang@intel.com>
> > >
> > > This patch add support to get/clear VF statistics from PF side.
> > > Two APIs are added:
> > > rte_pmd_i40e_get_vf_stats.
> > > rte_pmd_i40e_reset_vf_stats.
> > >
> > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > ---
> >
> > <...>
> >
> > > diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > index 8ac1bc8..7a5d211 100644
> > > --- a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > +++ b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > @@ -6,7 +6,9 @@ DPDK_2.0 {
> > >  DPDK_17.02 {
> > >  	global:
> > >
> > > +	rte_pmd_i40e_get_vf_stats;
> > >  	rte_pmd_i40e_ping_vfs;
> > > +	rte_pmd_i40e_reset_vf_stats;
> > >  	rte_pmd_i40e_set_tx_loopback;
> > >  	rte_pmd_i40e_set_vf_broadcast;
> > >  	rte_pmd_i40e_set_vf_mac_addr;
> >
> > Hi Wenzhuo, Mark,
> >
> > I think this is the list of all APIs added with this patchset.
> >
> > Just a question, what do you think following a logic in API naming as:
> > <name_space>_<object>_<action> ?
> >
> > So API names become:
> > rte_pmd_i40e_tx_loopback_set;
> > rte_pmd_i40e_vf_broadcast_set;
> > rte_pmd_i40e_vf_mac_addr_set;
> > rte_pmd_i40e_vfs_ping;
> > rte_pmd_i40e_vf_stats_get;
> > rte_pmd_i40e_vf_stats_reset;
> >
> >
> > After above rename, rte_pmd_i40e_tx_loopback_set() is not giving a
> > hint that this is something related to the PF controlling VF, perhaps
> > we can rename the API ?
> >
> > Also rte_pmd_i40e_vfs_ping() can become rte_pmd_i40e_vf_ping_all() to
> > be more consistent about _vf_ usage.
> >
> > Overall, they can be something like:
> > rte_pmd_i40e_vf_broadcast_set;
> > rte_pmd_i40e_vf_mac_addr_set;
> > rte_pmd_i40e_vf_ping_all;
> > rte_pmd_i40e_vf_stats_get;
> > rte_pmd_i40e_vf_stats_reset;
> > rte_pmd_i40e_vf_tx_loopback_set;
> >
> > What do you think?
> >
> 
> I think the naming should be consistent with what has already been implemented
> for the ixgbe PMD.
> 	rte_pmd_ixgbe_set_all_queues_drop_en;
> 	rte_pmd_ixgbe_set_tx_loopback;
> 	rte_pmd_ixgbe_set_vf_mac_addr;
> 	rte_pmd_ixgbe_set_vf_mac_anti_spoof;
> 	rte_pmd_ixgbe_set_vf_split_drop_en;
> 	rte_pmd_ixgbe_set_vf_vlan_anti_spoof;
> 	rte_pmd_ixgbe_set_vf_vlan_insert;
> 	rte_pmd_ixgbe_set_vf_vlan_stripq;
> 
> 	rte_pmd_ixgbe_set_vf_rate_limit;
> 	rte_pmd_ixgbe_set_vf_rx;
> 	rte_pmd_ixgbe_set_vf_rxmode;
> 	rte_pmd_ixgbe_set_vf_tx;
> 	rte_pmd_ixgbe_set_vf_vlan_filter;
So, seems better to use the current names. Rework both ixgbe and i40e's later. Not sure if it'll be counted as the ABI change if we change the ixgbe's name.

> 
> Regards,
> 
> Bernard.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 04/25] cmdline: add support for dynamic tokens
    2016-12-20 18:42  2%         ` [dpdk-dev] [PATCH v4 01/25] ethdev: introduce generic flow API Adrien Mazarguil
  2016-12-20 18:42  1%         ` [dpdk-dev] [PATCH v4 02/25] doc: add rte_flow prog guide Adrien Mazarguil
@ 2016-12-20 18:42  2%         ` Adrien Mazarguil
    3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-20 18:42 UTC (permalink / raw)
  To: dev

Considering tokens must be hard-coded in a list part of the instruction
structure, context-dependent tokens cannot be expressed.

This commit adds support for building dynamic token lists through a
user-provided function, which is called when the static token list is empty
(a single NULL entry).

Because no structures are modified (existing fields are reused), this
commit has no impact on the current ABI.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 lib/librte_cmdline/cmdline_parse.c | 60 +++++++++++++++++++++++++++++----
 lib/librte_cmdline/cmdline_parse.h | 21 ++++++++++++
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index b496067..14f5553 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -146,7 +146,9 @@ nb_common_chars(const char * s1, const char * s2)
  */
 static int
 match_inst(cmdline_parse_inst_t *inst, const char *buf,
-	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size)
+	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size,
+	   cmdline_parse_token_hdr_t
+		*(*dyn_tokens)[CMDLINE_PARSE_DYNAMIC_TOKENS])
 {
 	unsigned int token_num=0;
 	cmdline_parse_token_hdr_t * token_p;
@@ -155,6 +157,11 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 	struct cmdline_token_hdr token_hdr;
 
 	token_p = inst->tokens[token_num];
+	if (!token_p && dyn_tokens && inst->f) {
+		if (!(*dyn_tokens)[0])
+			inst->f(&(*dyn_tokens)[0], NULL, dyn_tokens);
+		token_p = (*dyn_tokens)[0];
+	}
 	if (token_p)
 		memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -196,7 +203,17 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 		buf += n;
 
 		token_num ++;
-		token_p = inst->tokens[token_num];
+		if (!inst->tokens[0]) {
+			if (token_num < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!(*dyn_tokens)[token_num])
+					inst->f(&(*dyn_tokens)[token_num],
+						NULL,
+						dyn_tokens);
+				token_p = (*dyn_tokens)[token_num];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[token_num];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 	}
@@ -239,6 +256,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	cmdline_parse_inst_t *inst;
 	const char *curbuf;
 	char result_buf[CMDLINE_PARSE_RESULT_BUFSIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	void (*f)(void *, struct cmdline *, void *) = NULL;
 	void *data = NULL;
 	int comment = 0;
@@ -255,6 +273,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		return CMDLINE_PARSE_BAD_ARGS;
 
 	ctx = cl->ctx;
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/*
 	 * - look if the buffer contains at least one line
@@ -299,7 +318,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		debug_printf("INST %d\n", inst_num);
 
 		/* fully parsed */
-		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf));
+		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf),
+				 &dyn_tokens);
 
 		if (tok > 0) /* we matched at least one token */
 			err = CMDLINE_PARSE_BAD_ARGS;
@@ -355,6 +375,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 	cmdline_parse_token_hdr_t *token_p;
 	struct cmdline_token_hdr token_hdr;
 	char tmpbuf[CMDLINE_BUFFER_SIZE], comp_buf[CMDLINE_BUFFER_SIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	unsigned int partial_tok_len;
 	int comp_len = -1;
 	int tmp_len = -1;
@@ -374,6 +395,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 
 	debug_printf("%s called\n", __func__);
 	memset(&token_hdr, 0, sizeof(token_hdr));
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/* count the number of complete token to parse */
 	for (i=0 ; buf[i] ; i++) {
@@ -396,11 +418,24 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		inst = ctx[inst_num];
 		while (inst) {
 			/* parse the first tokens of the inst */
-			if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+			if (nb_token &&
+			    match_inst(inst, buf, nb_token, NULL, 0,
+				       &dyn_tokens))
 				goto next;
 
 			debug_printf("instruction match\n");
-			token_p = inst->tokens[nb_token];
+			if (!inst->tokens[0]) {
+				if (nb_token <
+				    (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+					if (!dyn_tokens[nb_token])
+						inst->f(&dyn_tokens[nb_token],
+							NULL,
+							&dyn_tokens);
+					token_p = dyn_tokens[nb_token];
+				} else
+					token_p = NULL;
+			} else
+				token_p = inst->tokens[nb_token];
 			if (token_p)
 				memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -490,10 +525,21 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		/* we need to redo it */
 		inst = ctx[inst_num];
 
-		if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+		if (nb_token &&
+		    match_inst(inst, buf, nb_token, NULL, 0, &dyn_tokens))
 			goto next2;
 
-		token_p = inst->tokens[nb_token];
+		if (!inst->tokens[0]) {
+			if (nb_token < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!dyn_tokens[nb_token])
+					inst->f(&dyn_tokens[nb_token],
+						NULL,
+						&dyn_tokens);
+				token_p = dyn_tokens[nb_token];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[nb_token];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
diff --git a/lib/librte_cmdline/cmdline_parse.h b/lib/librte_cmdline/cmdline_parse.h
index 4ac05d6..65b18d4 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/cmdline_parse.h
@@ -83,6 +83,9 @@ extern "C" {
 /* maximum buffer size for parsed result */
 #define CMDLINE_PARSE_RESULT_BUFSIZE 8192
 
+/* maximum number of dynamic tokens */
+#define CMDLINE_PARSE_DYNAMIC_TOKENS 128
+
 /**
  * Stores a pointer to the ops struct, and the offset: the place to
  * write the parsed result in the destination structure.
@@ -130,6 +133,24 @@ struct cmdline;
  * Store a instruction, which is a pointer to a callback function and
  * its parameter that is called when the instruction is parsed, a help
  * string, and a list of token composing this instruction.
+ *
+ * When no tokens are defined (tokens[0] == NULL), they are retrieved
+ * dynamically by calling f() as follows:
+ *
+ *  f((struct cmdline_token_hdr **)&token_hdr,
+ *    NULL,
+ *    (struct cmdline_token_hdr *[])tokens));
+ *
+ * The address of the resulting token is expected at the location pointed by
+ * the first argument. Can be set to NULL to end the list.
+ *
+ * The cmdline argument (struct cmdline *) is always NULL.
+ *
+ * The last argument points to the NULL-terminated list of dynamic tokens
+ * defined so far. Since token_hdr points to an index of that list, the
+ * current index can be derived as follows:
+ *
+ *  int index = token_hdr - &(*tokens)[0];
  */
 struct cmdline_inst {
 	/* f(parsed_struct, data) */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v4 02/25] doc: add rte_flow prog guide
    2016-12-20 18:42  2%         ` [dpdk-dev] [PATCH v4 01/25] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-12-20 18:42  1%         ` Adrien Mazarguil
  2016-12-20 18:42  2%         ` [dpdk-dev] [PATCH v4 04/25] cmdline: add support for dynamic tokens Adrien Mazarguil
    3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-20 18:42 UTC (permalink / raw)
  To: dev

This documentation is based on the latest RFC submission, subsequently
updated according to feedback from the community.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/prog_guide/index.rst    |    1 +
 doc/guides/prog_guide/rte_flow.rst | 2042 +++++++++++++++++++++++++++++++
 2 files changed, 2043 insertions(+)

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index e5a50a8..ed7f770 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -42,6 +42,7 @@ Programmer's Guide
     mempool_lib
     mbuf_lib
     poll_mode_drv
+    rte_flow
     cryptodev_lib
     link_bonding_poll_mode_drv_lib
     timer_lib
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
new file mode 100644
index 0000000..98c672e
--- /dev/null
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -0,0 +1,2042 @@
+..  BSD LICENSE
+    Copyright 2016 6WIND S.A.
+    Copyright 2016 Mellanox.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Generic_flow_API:
+
+Generic flow API (rte_flow)
+===========================
+
+Overview
+--------
+
+This API provides a generic means to configure hardware to match specific
+ingress or egress traffic, alter its fate and query related counters
+according to any number of user-defined rules.
+
+It is named *rte_flow* after the prefix used for all its symbols, and is
+defined in ``rte_flow.h``.
+
+- Matching can be performed on packet data (protocol headers, payload) and
+  properties (e.g. associated physical port, virtual device function ID).
+
+- Possible operations include dropping traffic, diverting it to specific
+  queues, to virtual/physical device functions or ports, performing tunnel
+  offloads, adding marks and so on.
+
+It is slightly higher-level than the legacy filtering framework which it
+encompasses and supersedes (including all functions and filter types) in
+order to expose a single interface with an unambiguous behavior that is
+common to all poll-mode drivers (PMDs).
+
+Several methods to migrate existing applications are described in `API
+migration`_.
+
+Flow rule
+---------
+
+Description
+~~~~~~~~~~~
+
+A flow rule is the combination of attributes with a matching pattern and a
+list of actions. Flow rules form the basis of this API.
+
+Flow rules can have several distinct actions (such as counting,
+encapsulating, decapsulating before redirecting packets to a particular
+queue, etc.), instead of relying on several rules to achieve this and having
+applications deal with hardware implementation details regarding their
+order.
+
+Support for different priority levels on a rule basis is provided, for
+example in order to force a more specific rule to come before a more generic
+one for packets matched by both. However hardware support for more than a
+single priority level cannot be guaranteed. When supported, the number of
+available priority levels is usually low, which is why they can also be
+implemented in software by PMDs (e.g. missing priority levels may be
+emulated by reordering rules).
+
+In order to remain as hardware-agnostic as possible, by default all rules
+are considered to have the same priority, which means that the order between
+overlapping rules (when a packet is matched by several filters) is
+undefined.
+
+PMDs may refuse to create overlapping rules at a given priority level when
+they can be detected (e.g. if a pattern matches an existing filter).
+
+Thus predictable results for a given priority level can only be achieved
+with non-overlapping rules, using perfect matching on all protocol layers.
+
+Flow rules can also be grouped, the flow rule priority is specific to the
+group they belong to. All flow rules in a given group are thus processed
+either before or after another group.
+
+Support for multiple actions per rule may be implemented internally on top
+of non-default hardware priorities, as a result both features may not be
+simultaneously available to applications.
+
+Considering that allowed pattern/actions combinations cannot be known in
+advance and would result in an impractically large number of capabilities to
+expose, a method is provided to validate a given rule from the current
+device configuration state.
+
+This enables applications to check if the rule types they need is supported
+at initialization time, before starting their data path. This method can be
+used anytime, its only requirement being that the resources needed by a rule
+should exist (e.g. a target RX queue should be configured first).
+
+Each defined rule is associated with an opaque handle managed by the PMD,
+applications are responsible for keeping it. These can be used for queries
+and rules management, such as retrieving counters or other data and
+destroying them.
+
+To avoid resource leaks on the PMD side, handles must be explicitly
+destroyed by the application before releasing associated resources such as
+queues and ports.
+
+The following sections cover:
+
+- **Attributes** (represented by ``struct rte_flow_attr``): properties of a
+  flow rule such as its direction (ingress or egress) and priority.
+
+- **Pattern item** (represented by ``struct rte_flow_item``): part of a
+  matching pattern that either matches specific packet data or traffic
+  properties. It can also describe properties of the pattern itself, such as
+  inverted matching.
+
+- **Matching pattern**: traffic properties to look for, a combination of any
+  number of items.
+
+- **Actions** (represented by ``struct rte_flow_action``): operations to
+  perform whenever a packet is matched by a pattern.
+
+Attributes
+~~~~~~~~~~
+
+Attribute: Group
+^^^^^^^^^^^^^^^^
+
+Flow rules can be grouped by assigning them a common group number. Lower
+values have higher priority. Group 0 has the highest priority.
+
+Although optional, applications are encouraged to group similar rules as
+much as possible to fully take advantage of hardware capabilities
+(e.g. optimized matching) and work around limitations (e.g. a single pattern
+type possibly allowed in a given group).
+
+Note that support for more than a single group is not guaranteed.
+
+Attribute: Priority
+^^^^^^^^^^^^^^^^^^^
+
+A priority level can be assigned to a flow rule. Like groups, lower values
+denote higher priority, with 0 as the maximum.
+
+A rule with priority 0 in group 8 is always matched after a rule with
+priority 8 in group 0.
+
+Group and priority levels are arbitrary and up to the application, they do
+not need to be contiguous nor start from 0, however the maximum number
+varies between devices and may be affected by existing flow rules.
+
+If a packet is matched by several rules of a given group for a given
+priority level, the outcome is undefined. It can take any path, may be
+duplicated or even cause unrecoverable errors.
+
+Note that support for more than a single priority level is not guaranteed.
+
+Attribute: Traffic direction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+
+Several pattern items and actions are valid and can be used in both
+directions. At least one direction must be specified.
+
+Specifying both directions at once for a given rule is not recommended but
+may be valid in a few cases (e.g. shared counters).
+
+Pattern item
+~~~~~~~~~~~~
+
+Pattern items fall in two categories:
+
+- Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+  IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+  specification structure.
+
+- Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF,
+  VF, PORT and so on), often without a specification structure.
+
+Item specification structures are used to match specific values among
+protocol fields (or item properties). Documentation describes for each item
+whether they are associated with one and their type name if so.
+
+Up to three structures of the same type can be set for a given item:
+
+- ``spec``: values to match (e.g. a given IPv4 address).
+
+- ``last``: upper bound for an inclusive range with corresponding fields in
+  ``spec``.
+
+- ``mask``: bit-mask applied to both ``spec`` and ``last`` whose purpose is
+  to distinguish the values to take into account and/or partially mask them
+  out (e.g. in order to match an IPv4 address prefix).
+
+Usage restrictions and expected behavior:
+
+- Setting either ``mask`` or ``last`` without ``spec`` is an error.
+
+- Field values in ``last`` which are either 0 or equal to the corresponding
+  values in ``spec`` are ignored; they do not generate a range. Nonzero
+  values lower than those in ``spec`` are not supported.
+
+- Setting ``spec`` and optionally ``last`` without ``mask`` causes the PMD
+  to only take the fields it can recognize into account. There is no error
+  checking for unsupported fields.
+
+- Not setting any of them (assuming item type allows it) uses default
+  parameters that depend on the item type. Most of the time, particularly
+  for protocol header items, it is equivalent to providing an empty (zeroed)
+  ``mask``.
+
+- ``mask`` is a simple bit-mask applied before interpreting the contents of
+  ``spec`` and ``last``, which may yield unexpected results if not used
+  carefully. For example, if for an IPv4 address field, ``spec`` provides
+  *10.1.2.3*, ``last`` provides *10.3.4.5* and ``mask`` provides
+  *255.255.0.0*, the effective range becomes *10.1.0.0* to *10.3.255.255*.
+
+Example of an item specification matching an Ethernet header:
+
+.. _table_rte_flow_pattern_item_example:
+
+.. table:: Ethernet item
+
+   +----------+----------+--------------------+
+   | Field    | Subfield | Value              |
+   +==========+==========+====================+
+   | ``spec`` | ``src``  | ``00:01:02:03:04`` |
+   |          +----------+--------------------+
+   |          | ``dst``  | ``00:2a:66:00:01`` |
+   |          +----------+--------------------+
+   |          | ``type`` | ``0x22aa``         |
+   +----------+----------+--------------------+
+   | ``last`` | unspecified                   |
+   +----------+----------+--------------------+
+   | ``mask`` | ``src``  | ``00:ff:ff:ff:00`` |
+   |          +----------+--------------------+
+   |          | ``dst``  | ``00:00:00:00:ff`` |
+   |          +----------+--------------------+
+   |          | ``type`` | ``0x0000``         |
+   +----------+----------+--------------------+
+
+Non-masked bits stand for any value (shown as ``?`` below), Ethernet headers
+with the following properties are thus matched:
+
+- ``src``: ``??:01:02:03:??``
+- ``dst``: ``??:??:??:??:01``
+- ``type``: ``0x????``
+
+Matching pattern
+~~~~~~~~~~~~~~~~
+
+A pattern is formed by stacking items starting from the lowest protocol
+layer to match. This stacking restriction does not apply to meta items which
+can be placed anywhere in the stack without affecting the meaning of the
+resulting pattern.
+
+Patterns are terminated by END items.
+
+Examples:
+
+.. _table_rte_flow_tcpv4_as_l4:
+
+.. table:: TCPv4 as L4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | IPv4     |
+   +-------+----------+
+   | 2     | TCP      |
+   +-------+----------+
+   | 3     | END      |
+   +-------+----------+
+
+|
+
+.. _table_rte_flow_tcpv6_in_vxlan:
+
+.. table:: TCPv6 in VXLAN
+
+   +-------+------------+
+   | Index | Item       |
+   +=======+============+
+   | 0     | Ethernet   |
+   +-------+------------+
+   | 1     | IPv4       |
+   +-------+------------+
+   | 2     | UDP        |
+   +-------+------------+
+   | 3     | VXLAN      |
+   +-------+------------+
+   | 4     | Ethernet   |
+   +-------+------------+
+   | 5     | IPv6       |
+   +-------+------------+
+   | 6     | TCP        |
+   +-------+------------+
+   | 7     | END        |
+   +-------+------------+
+
+|
+
+.. _table_rte_flow_tcpv4_as_l4_meta:
+
+.. table:: TCPv4 as L4 with meta items
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | VOID     |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | VOID     |
+   +-------+----------+
+   | 3     | IPv4     |
+   +-------+----------+
+   | 4     | TCP      |
+   +-------+----------+
+   | 5     | VOID     |
+   +-------+----------+
+   | 6     | VOID     |
+   +-------+----------+
+   | 7     | END      |
+   +-------+----------+
+
+The above example shows how meta items do not affect packet data matching
+items, as long as those remain stacked properly. The resulting matching
+pattern is identical to "TCPv4 as L4".
+
+.. _table_rte_flow_udpv6_anywhere:
+
+.. table:: UDPv6 anywhere
+
+   +-------+------+
+   | Index | Item |
+   +=======+======+
+   | 0     | IPv6 |
+   +-------+------+
+   | 1     | UDP  |
+   +-------+------+
+   | 2     | END  |
+   +-------+------+
+
+If supported by the PMD, omitting one or several protocol layers at the
+bottom of the stack as in the above example (missing an Ethernet
+specification) enables looking up anywhere in packets.
+
+It is unspecified whether the payload of supported encapsulations
+(e.g. VXLAN payload) is matched by such a pattern, which may apply to inner,
+outer or both packets.
+
+.. _table_rte_flow_invalid_l3:
+
+.. table:: Invalid, missing L3
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | UDP      |
+   +-------+----------+
+   | 2     | END      |
+   +-------+----------+
+
+The above pattern is invalid due to a missing L3 specification between L2
+(Ethernet) and L4 (UDP). Doing so is only allowed at the bottom and at the
+top of the stack.
+
+Meta item types
+~~~~~~~~~~~~~~~
+
+They match meta-data or affect pattern processing instead of matching packet
+data directly, most of them do not need a specification structure. This
+particularity allows them to be specified anywhere in the stack without
+causing any side effect.
+
+Item: ``END``
+^^^^^^^^^^^^^
+
+End marker for item lists. Prevents further processing of items, thereby
+ending the pattern.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_end:
+
+.. table:: END
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Item: ``VOID``
+^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_void:
+
+.. table:: VOID
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+One usage example for this type is generating rules that share a common
+prefix quickly without reallocating memory, only by updating item types:
+
+.. _table_rte_flow_item_void_example:
+
+.. table:: TCP, UDP or ICMP as L4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | Ethernet           |
+   +-------+--------------------+
+   | 1     | IPv4               |
+   +-------+------+------+------+
+   | 2     | UDP  | VOID | VOID |
+   +-------+------+------+------+
+   | 3     | VOID | TCP  | VOID |
+   +-------+------+------+------+
+   | 4     | VOID | VOID | ICMP |
+   +-------+------+------+------+
+   | 5     | END                |
+   +-------+--------------------+
+
+Item: ``INVERT``
+^^^^^^^^^^^^^^^^
+
+Inverted matching, i.e. process packets that do not match the pattern.
+
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_invert:
+
+.. table:: INVERT
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Usage example, matching non-TCPv4 packets only:
+
+.. _table_rte_flow_item_invert_example:
+
+.. table:: Anything but TCPv4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | INVERT   |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | IPv4     |
+   +-------+----------+
+   | 3     | TCP      |
+   +-------+----------+
+   | 4     | END      |
+   +-------+----------+
+
+Item: ``PF``
+^^^^^^^^^^^^
+
+Matches packets addressed to the physical function of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `Action: PF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if applied to a VF
+  device.
+- Can be combined with any number of `Item: VF`_ to match both PF and VF
+  traffic.
+- ``spec``, ``last`` and ``mask`` must not be set.
+
+.. _table_rte_flow_item_pf:
+
+.. table:: PF
+
+   +----------+-------+
+   | Field    | Value |
+   +==========+=======+
+   | ``spec`` | unset |
+   +----------+-------+
+   | ``last`` | unset |
+   +----------+-------+
+   | ``mask`` | unset |
+   +----------+-------+
+
+Item: ``VF``
+^^^^^^^^^^^^
+
+Matches packets addressed to a virtual function ID of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `Action: VF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if this causes a VF
+  device to match traffic addressed to a different VF.
+- Can be specified multiple times to match traffic addressed to several VF
+  IDs.
+- Can be combined with a PF item to match both PF and VF traffic.
+
+.. _table_rte_flow_item_vf:
+
+.. table:: VF
+
+   +----------+----------+---------------------------+
+   | Field    | Subfield | Value                     |
+   +==========+==========+===========================+
+   | ``spec`` | ``id``   | destination VF ID         |
+   +----------+----------+---------------------------+
+   | ``last`` | ``id``   | upper range value         |
+   +----------+----------+---------------------------+
+   | ``mask`` | ``id``   | zeroed to match any VF ID |
+   +----------+----------+---------------------------+
+
+Item: ``PORT``
+^^^^^^^^^^^^^^
+
+Matches packets coming from the specified physical port of the underlying
+device.
+
+The first PORT item overrides the physical port normally associated with the
+specified DPDK input port (port_id). This item can be provided several times
+to match additional physical ports.
+
+Note that physical ports are not necessarily tied to DPDK input ports
+(port_id) when those are not under DPDK control. Possible values are
+specific to each device, they are not necessarily indexed from zero and may
+not be contiguous.
+
+As a device property, the list of allowed values as well as the value
+associated with a port_id should be retrieved by other means.
+
+.. _table_rte_flow_item_port:
+
+.. table:: PORT
+
+   +----------+-----------+--------------------------------+
+   | Field    | Subfield  | Value                          |
+   +==========+===========+================================+
+   | ``spec`` | ``index`` | physical port index            |
+   +----------+-----------+--------------------------------+
+   | ``last`` | ``index`` | upper range value              |
+   +----------+-----------+--------------------------------+
+   | ``mask`` | ``index`` | zeroed to match any port index |
+   +----------+-----------+--------------------------------+
+
+Data matching item types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most of these are basically protocol header definitions with associated
+bit-masks. They must be specified (stacked) from lowest to highest protocol
+layer to form a matching pattern.
+
+The following list is not exhaustive, new protocols will be added in the
+future.
+
+Item: ``ANY``
+^^^^^^^^^^^^^
+
+Matches any protocol in place of the current layer, a single ANY may also
+stand for several protocol layers.
+
+This is usually specified as the first pattern item when looking for a
+protocol anywhere in a packet.
+
+.. _table_rte_flow_item_any:
+
+.. table:: ANY
+
+   +----------+----------+--------------------------------------+
+   | Field    | Subfield | Value                                |
+   +==========+==========+======================================+
+   | ``spec`` | ``num``  | number of layers covered             |
+   +----------+----------+--------------------------------------+
+   | ``last`` | ``num``  | upper range value                    |
+   +----------+----------+--------------------------------------+
+   | ``mask`` | ``num``  | zeroed to cover any number of layers |
+   +----------+----------+--------------------------------------+
+
+Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
+and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
+or IPv6) matched by the second ANY specification:
+
+.. _table_rte_flow_item_any_example:
+
+.. table:: TCP in VXLAN with wildcards
+
+   +-------+------+----------+----------+-------+
+   | Index | Item | Field    | Subfield | Value |
+   +=======+======+==========+==========+=======+
+   | 0     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 1     | ANY  | ``spec`` | ``num``  | 2     |
+   +-------+------+----------+----------+-------+
+   | 2     | VXLAN                              |
+   +-------+------------------------------------+
+   | 3     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 4     | ANY  | ``spec`` | ``num``  | 1     |
+   +-------+------+----------+----------+-------+
+   | 5     | TCP                                |
+   +-------+------------------------------------+
+   | 6     | END                                |
+   +-------+------------------------------------+
+
+Item: ``RAW``
+^^^^^^^^^^^^^
+
+Matches a byte string of a given length at a given offset.
+
+Offset is either absolute (using the start of the packet) or relative to the
+end of the previous matched item in the stack, in which case negative values
+are allowed.
+
+If search is enabled, offset is used as the starting point. The search area
+can be delimited by setting limit to a nonzero value, which is the maximum
+number of bytes after offset where the pattern may start.
+
+Matching a zero-length pattern is allowed, doing so resets the relative
+offset for subsequent items.
+
+- This type does not support ranges (``last`` field).
+
+.. _table_rte_flow_item_raw:
+
+.. table:: RAW
+
+   +----------+--------------+-------------------------------------------------+
+   | Field    | Subfield     | Value                                           |
+   +==========+==============+=================================================+
+   | ``spec`` | ``relative`` | look for pattern after the previous item        |
+   |          +--------------+-------------------------------------------------+
+   |          | ``search``   | search pattern from offset (see also ``limit``) |
+   |          +--------------+-------------------------------------------------+
+   |          | ``reserved`` | reserved, must be set to zero                   |
+   |          +--------------+-------------------------------------------------+
+   |          | ``offset``   | absolute or relative offset for ``pattern``     |
+   |          +--------------+-------------------------------------------------+
+   |          | ``limit``    | search area limit for start of ``pattern``      |
+   |          +--------------+-------------------------------------------------+
+   |          | ``length``   | ``pattern`` length                              |
+   |          +--------------+-------------------------------------------------+
+   |          | ``pattern``  | byte string to look for                         |
+   +----------+--------------+-------------------------------------------------+
+   | ``last`` | if specified, either all 0 or with the same values as ``spec`` |
+   +----------+----------------------------------------------------------------+
+   | ``mask`` | bit-mask applied to ``spec`` values with usual behavior        |
+   +----------+----------------------------------------------------------------+
+
+Example pattern looking for several strings at various offsets of a UDP
+payload, using combined RAW items:
+
+.. _table_rte_flow_item_raw_example:
+
+.. table:: UDP payload matching
+
+   +-------+------+----------+--------------+-------+
+   | Index | Item | Field    | Subfield     | Value |
+   +=======+======+==========+==============+=======+
+   | 0     | Ethernet                               |
+   +-------+----------------------------------------+
+   | 1     | IPv4                                   |
+   +-------+----------------------------------------+
+   | 2     | UDP                                    |
+   +-------+------+----------+--------------+-------+
+   | 3     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 10    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "foo" |
+   +-------+------+----------+--------------+-------+
+   | 4     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 20    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "bar" |
+   +-------+------+----------+--------------+-------+
+   | 5     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | -29   |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "baz" |
+   +-------+------+----------+--------------+-------+
+   | 6     | END                                    |
+   +-------+----------------------------------------+
+
+This translates to:
+
+- Locate "foo" at least 10 bytes deep inside UDP payload.
+- Locate "bar" after "foo" plus 20 bytes.
+- Locate "baz" after "bar" minus 29 bytes.
+
+Such a packet may be represented as follows (not to scale)::
+
+ 0                     >= 10 B           == 20 B
+ |                  |<--------->|     |<--------->|
+ |                  |           |     |           |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+ | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+                          |                             |
+                          |<--------------------------->|
+                                      == 29 B
+
+Note that matching subsequent pattern items would resume after "baz", not
+"bar" since matching is always performed after the previous item of the
+stack.
+
+Item: ``ETH``
+^^^^^^^^^^^^^
+
+Matches an Ethernet header.
+
+- ``dst``: destination MAC.
+- ``src``: source MAC.
+- ``type``: EtherType.
+
+Item: ``VLAN``
+^^^^^^^^^^^^^^
+
+Matches an 802.1Q/ad VLAN tag.
+
+- ``tpid``: tag protocol identifier.
+- ``tci``: tag control information.
+
+Item: ``IPV4``
+^^^^^^^^^^^^^^
+
+Matches an IPv4 header.
+
+Note: IPv4 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv4 header definition (``rte_ip.h``).
+
+Item: ``IPV6``
+^^^^^^^^^^^^^^
+
+Matches an IPv6 header.
+
+Note: IPv6 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv6 header definition (``rte_ip.h``).
+
+Item: ``ICMP``
+^^^^^^^^^^^^^^
+
+Matches an ICMP header.
+
+- ``hdr``: ICMP header definition (``rte_icmp.h``).
+
+Item: ``UDP``
+^^^^^^^^^^^^^
+
+Matches a UDP header.
+
+- ``hdr``: UDP header definition (``rte_udp.h``).
+
+Item: ``TCP``
+^^^^^^^^^^^^^
+
+Matches a TCP header.
+
+- ``hdr``: TCP header definition (``rte_tcp.h``).
+
+Item: ``SCTP``
+^^^^^^^^^^^^^^
+
+Matches a SCTP header.
+
+- ``hdr``: SCTP header definition (``rte_sctp.h``).
+
+Item: ``VXLAN``
+^^^^^^^^^^^^^^^
+
+Matches a VXLAN header (RFC 7348).
+
+- ``flags``: normally 0x08 (I flag).
+- ``rsvd0``: reserved, normally 0x000000.
+- ``vni``: VXLAN network identifier.
+- ``rsvd1``: reserved, normally 0x00.
+
+Actions
+~~~~~~~
+
+Each possible action is represented by a type. Some have associated
+configuration structures. Several actions combined in a list can be affected
+to a flow rule. That list is not ordered.
+
+They fall in three categories:
+
+- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+  processing matched packets by subsequent flow rules, unless overridden
+  with PASSTHRU.
+
+- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for
+  additional processing by subsequent flow rules.
+
+- Other non-terminating meta actions that do not affect the fate of packets
+  (END, VOID, MARK, FLAG, COUNT).
+
+When several actions are combined in a flow rule, they should all have
+different types (e.g. dropping a packet twice is not possible).
+
+Only the last action of a given type is taken into account. PMDs still
+perform error checking on the entire list.
+
+Like matching patterns, action lists are terminated by END items.
+
+*Note that PASSTHRU is the only action able to override a terminating rule.*
+
+Example of action that redirects packets to queue index 10:
+
+.. _table_rte_flow_action_example:
+
+.. table:: Queue action
+
+   +-----------+-------+
+   | Field     | Value |
+   +===========+=======+
+   | ``index`` | 10    |
+   +-----------+-------+
+
+Action lists examples, their order is not significant, applications must
+consider all actions to be performed simultaneously:
+
+.. _table_rte_flow_count_and_drop:
+
+.. table:: Count and drop
+
+   +-------+--------+
+   | Index | Action |
+   +=======+========+
+   | 0     | COUNT  |
+   +-------+--------+
+   | 1     | DROP   |
+   +-------+--------+
+   | 2     | END    |
+   +-------+--------+
+
+|
+
+.. _table_rte_flow_mark_count_redirect:
+
+.. table:: Mark, count and redirect
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | MARK   | ``mark``  | 0x2a  |
+   +-------+--------+-----------+-------+
+   | 1     | COUNT                      |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``queue`` | 10    |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+|
+
+.. _table_rte_flow_redirect_queue_5:
+
+.. table:: Redirect to queue 5
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | DROP                       |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``queue`` | 5     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+In the above example, considering both actions are performed simultaneously,
+the end result is that only QUEUE has any effect.
+
+.. _table_rte_flow_redirect_queue_3:
+
+.. table:: Redirect to queue 3
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | QUEUE  | ``queue`` | 5     |
+   +-------+--------+-----------+-------+
+   | 1     | VOID                       |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``queue`` | 3     |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+As previously described, only the last action of a given type found in the
+list is taken into account. The above example also shows that VOID is
+ignored.
+
+Action types
+~~~~~~~~~~~~
+
+Common action types are described in this section. Like pattern item types,
+this list is not exhaustive as new actions will be added in the future.
+
+Action: ``END``
+^^^^^^^^^^^^^^^
+
+End marker for action lists. Prevents further processing of actions, thereby
+ending the list.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_end:
+
+.. table:: END
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VOID``
+^^^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_void:
+
+.. table:: VOID
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``PASSTHRU``
+^^^^^^^^^^^^^^^^^^^^
+
+Leaves packets up for additional processing by subsequent flow rules. This
+is the default when a rule does not contain a terminating action, but can be
+specified to force a rule to become non-terminating.
+
+- No configurable properties.
+
+.. _table_rte_flow_action_passthru:
+
+.. table:: PASSTHRU
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Example to copy a packet to a queue and continue processing by subsequent
+flow rules:
+
+.. _table_rte_flow_action_passthru_example:
+
+.. table:: Copy to queue 8
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | PASSTHRU                   |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``queue`` | 8     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+Action: ``MARK``
+^^^^^^^^^^^^^^^^
+
+Attaches a 32 bit value to packets.
+
+This value is arbitrary and application-defined. For compatibility with FDIR
+it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
+also set in ``ol_flags``.
+
+.. _table_rte_flow_action_mark:
+
+.. table:: MARK
+
+   +--------+-------------------------------------+
+   | Field  | Value                               |
+   +========+=====================================+
+   | ``id`` | 32 bit value to return with packets |
+   +--------+-------------------------------------+
+
+Action: ``FLAG``
+^^^^^^^^^^^^^^^^
+
+Flag packets. Similar to `Action: MARK`_ but only affects ``ol_flags``.
+
+- No configurable properties.
+
+Note: a distinctive flag must be defined for it.
+
+.. _table_rte_flow_action_flag:
+
+.. table:: FLAG
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``QUEUE``
+^^^^^^^^^^^^^^^^^
+
+Assigns packets to a given queue index.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_queue:
+
+.. table:: QUEUE
+
+   +-----------+--------------------+
+   | Field     | Value              |
+   +===========+====================+
+   | ``index`` | queue index to use |
+   +-----------+--------------------+
+
+Action: ``DROP``
+^^^^^^^^^^^^^^^^
+
+Drop packets.
+
+- No configurable properties.
+- Terminating by default.
+- PASSTHRU overrides this action if both are specified.
+
+.. _table_rte_flow_action_drop:
+
+.. table:: DROP
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``COUNT``
+^^^^^^^^^^^^^^^^^
+
+Enables counters for this rule.
+
+These counters can be retrieved and reset through ``rte_flow_query()``, see
+``struct rte_flow_query_count``.
+
+- Counters can be retrieved with ``rte_flow_query()``.
+- No configurable properties.
+
+.. _table_rte_flow_action_count:
+
+.. table:: COUNT
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Query structure to retrieve and reset flow rule counters:
+
+.. _table_rte_flow_query_count:
+
+.. table:: COUNT query
+
+   +---------------+-----+-----------------------------------+
+   | Field         | I/O | Value                             |
+   +===============+=====+===================================+
+   | ``reset``     | in  | reset counter after query         |
+   +---------------+-----+-----------------------------------+
+   | ``hits_set``  | out | ``hits`` field is set             |
+   +---------------+-----+-----------------------------------+
+   | ``bytes_set`` | out | ``bytes`` field is set            |
+   +---------------+-----+-----------------------------------+
+   | ``hits``      | out | number of hits for this rule      |
+   +---------------+-----+-----------------------------------+
+   | ``bytes``     | out | number of bytes through this rule |
+   +---------------+-----+-----------------------------------+
+
+Action: ``DUP``
+^^^^^^^^^^^^^^^
+
+Duplicates packets to a given queue index.
+
+This is normally combined with QUEUE, however when used alone, it is
+actually similar to QUEUE + PASSTHRU.
+
+- Non-terminating by default.
+
+.. _table_rte_flow_action_dup:
+
+.. table:: DUP
+
+   +-----------+------------------------------------+
+   | Field     | Value                              |
+   +===========+====================================+
+   | ``index`` | queue index to duplicate packet to |
+   +-----------+------------------------------------+
+
+Action: ``RSS``
+^^^^^^^^^^^^^^^
+
+Similar to QUEUE, except RSS is additionally performed on packets to spread
+them among several queues according to the provided parameters.
+
+Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
+however it conflicts with `Action: MARK`_ as they share the same space. When
+both actions are specified, the RSS hash is discarded and
+``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
+structure should eventually evolve to store both.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_rss:
+
+.. table:: RSS
+
+   +--------------+------------------------------+
+   | Field        | Value                        |
+   +==============+==============================+
+   | ``rss_conf`` | RSS parameters               |
+   +--------------+------------------------------+
+   | ``num``      | number of entries in queue[] |
+   +--------------+------------------------------+
+   | ``queue[]``  | queue indices to use         |
+   +--------------+------------------------------+
+
+Action: ``PF``
+^^^^^^^^^^^^^^
+
+Redirects packets to the physical function (PF) of the current device.
+
+- No configurable properties.
+- Terminating by default.
+
+.. _table_rte_flow_action_pf:
+
+.. table:: PF
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VF``
+^^^^^^^^^^^^^^
+
+Redirects packets to a virtual function (VF) of the current device.
+
+Packets matched by a VF pattern item can be redirected to their original VF
+ID instead of the specified one. This parameter may not be available and is
+not guaranteed to work properly if the VF part is matched by a prior flow
+rule or if packets are not addressed to a VF in the first place.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_vf:
+
+.. table:: VF
+
+   +--------------+--------------------------------+
+   | Field        | Value                          |
+   +==============+================================+
+   | ``original`` | use original VF ID if possible |
+   +--------------+--------------------------------+
+   | ``vf``       | VF ID to redirect packets to   |
+   +--------------+--------------------------------+
+
+Negative types
+~~~~~~~~~~~~~~
+
+All specified pattern items (``enum rte_flow_item_type``) and actions
+(``enum rte_flow_action_type``) use positive identifiers.
+
+The negative space is reserved for dynamic types generated by PMDs during
+run-time. PMDs may encounter them as a result but must not accept negative
+identifiers they are not aware of.
+
+A method to generate them remains to be defined.
+
+Planned types
+~~~~~~~~~~~~~
+
+Pattern item types will be added as new protocols are implemented.
+
+Variable headers support through dedicated pattern items, for example in
+order to match specific IPv4 options and IPv6 extension headers would be
+stacked after IPv4/IPv6 items.
+
+Other action types are planned but are not defined yet. These include the
+ability to alter packet data in several ways, such as performing
+encapsulation/decapsulation of tunnel headers.
+
+Rules management
+----------------
+
+A rather simple API with few functions is provided to fully manage flow
+rules.
+
+Each created flow rule is associated with an opaque, PMD-specific handle
+pointer. The application is responsible for keeping it until the rule is
+destroyed.
+
+Flows rules are represented by ``struct rte_flow`` objects.
+
+Validation
+~~~~~~~~~~
+
+Given that expressing a definite set of device capabilities is not
+practical, a dedicated function is provided to check if a flow rule is
+supported and can be created.
+
+.. code-block:: c
+
+   int
+   rte_flow_validate(uint8_t port_id,
+                     const struct rte_flow_attr *attr,
+                     const struct rte_flow_item pattern[],
+                     const struct rte_flow_action actions[],
+                     struct rte_flow_error *error);
+
+While this function has no effect on the target device, the flow rule is
+validated against its current configuration state and the returned value
+should be considered valid by the caller for that state only.
+
+The returned value is guaranteed to remain valid only as long as no
+successful calls to ``rte_flow_create()`` or ``rte_flow_destroy()`` are made
+in the meantime and no device parameter affecting flow rules in any way are
+modified, due to possible collisions or resource limitations (although in
+such cases ``EINVAL`` should not be returned).
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 if flow rule is valid and can be created. A negative errno value
+  otherwise (``rte_errno`` is also set), the following errors are defined.
+- ``-ENOSYS``: underlying device does not support this functionality.
+- ``-EINVAL``: unknown or invalid rule specification.
+- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
+  bit-masks are unsupported).
+- ``-EEXIST``: collision with an existing rule.
+- ``-ENOMEM``: not enough resources.
+- ``-EBUSY``: action cannot be performed due to busy device resources, may
+  succeed if the affected queues or even the entire port are in a stopped
+  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
+
+Creation
+~~~~~~~~
+
+Creating a flow rule is similar to validating one, except the rule is
+actually created and a handle returned.
+
+.. code-block:: c
+
+   struct rte_flow *
+   rte_flow_create(uint8_t port_id,
+                   const struct rte_flow_attr *attr,
+                   const struct rte_flow_item pattern[],
+                   const struct rte_flow_action *actions[],
+                   struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
+to the positive version of one of the error codes defined for
+``rte_flow_validate()``.
+
+Destruction
+~~~~~~~~~~~
+
+Flow rules destruction is not automatic, and a queue or a port should not be
+released if any are still attached to them. Applications must take care of
+performing this step before releasing resources.
+
+.. code-block:: c
+
+   int
+   rte_flow_destroy(uint8_t port_id,
+                    struct rte_flow *flow,
+                    struct rte_flow_error *error);
+
+
+Failure to destroy a flow rule handle may occur when other flow rules depend
+on it, and destroying it would result in an inconsistent state.
+
+This function is only guaranteed to succeed if handles are destroyed in
+reverse order of their creation.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to destroy.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Flush
+~~~~~
+
+Convenience function to destroy all flow rule handles associated with a
+port. They are released as with successive calls to ``rte_flow_destroy()``.
+
+.. code-block:: c
+
+   int
+   rte_flow_flush(uint8_t port_id,
+                  struct rte_flow_error *error);
+
+In the unlikely event of failure, handles are still considered destroyed and
+no longer valid but the port must be assumed to be in an inconsistent state.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Query
+~~~~~
+
+Query an existing flow rule.
+
+This function allows retrieving flow-specific data such as counters. Data
+is gathered by special actions which must be present in the flow rule
+definition.
+
+.. code-block:: c
+
+   int
+   rte_flow_query(uint8_t port_id,
+                  struct rte_flow *flow,
+                  enum rte_flow_action_type action,
+                  void *data,
+                  struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to query.
+- ``action``: action type to query.
+- ``data``: pointer to storage for the associated query data type.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Verbose error reporting
+-----------------------
+
+The defined *errno* values may not be accurate enough for users or
+application developers who want to investigate issues related to flow rules
+management. A dedicated error object is defined for this purpose:
+
+.. code-block:: c
+
+   enum rte_flow_error_type {
+       RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+       RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+       RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+       RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+       RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+       RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+       RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+       RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+   };
+
+   struct rte_flow_error {
+       enum rte_flow_error_type type; /**< Cause field and error types. */
+       const void *cause; /**< Object responsible for the error. */
+       const char *message; /**< Human-readable error message. */
+   };
+
+Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
+remaining fields can be ignored. Other error types describe the type of the
+object pointed by ``cause``.
+
+If non-NULL, ``cause`` points to the object responsible for the error. For a
+flow rule, this may be a pattern item or an individual action.
+
+If non-NULL, ``message`` provides a human-readable error message.
+
+This object is normally allocated by applications and set by PMDs in case of
+error, the message points to a constant string which does not need to be
+freed by the application, however its pointer can be considered valid only
+as long as its associated DPDK port remains configured. Closing the
+underlying device or unloading the PMD invalidates it.
+
+Caveats
+-------
+
+- DPDK does not keep track of flow rules definitions or flow rule objects
+  automatically. Applications may keep track of the former and must keep
+  track of the latter. PMDs may also do it for internal needs, however this
+  must not be relied on by applications.
+
+- Flow rules are not maintained between successive port initializations. An
+  application exiting without releasing them and restarting must re-create
+  them from scratch.
+
+- API operations are synchronous and blocking (``EAGAIN`` cannot be
+  returned).
+
+- There is no provision for reentrancy/multi-thread safety, although nothing
+  should prevent different devices from being configured at the same
+  time. PMDs may protect their control path functions accordingly.
+
+- Stopping the data path (TX/RX) should not be necessary when managing flow
+  rules. If this cannot be achieved naturally or with workarounds (such as
+  temporarily replacing the burst function pointers), an appropriate error
+  code must be returned (``EBUSY``).
+
+- PMDs, not applications, are responsible for maintaining flow rules
+  configuration when stopping and restarting a port or performing other
+  actions which may affect them. They can only be destroyed explicitly by
+  applications.
+
+For devices exposing multiple ports sharing global settings affected by flow
+rules:
+
+- All ports under DPDK control must behave consistently, PMDs are
+  responsible for making sure that existing flow rules on a port are not
+  affected by other ports.
+
+- Ports not under DPDK control (unaffected or handled by other applications)
+  are user's responsibility. They may affect existing flow rules and cause
+  undefined behavior. PMDs aware of this may prevent flow rules creation
+  altogether in such cases.
+
+PMD interface
+-------------
+
+The PMD interface is defined in ``rte_flow_driver.h``. It is not subject to
+API/ABI versioning constraints as it is not exposed to applications and may
+evolve independently.
+
+It is currently implemented on top of the legacy filtering framework through
+filter type *RTE_ETH_FILTER_GENERIC* that accepts the single operation
+*RTE_ETH_FILTER_GET* to return PMD-specific *rte_flow* callbacks wrapped
+inside ``struct rte_flow_ops``.
+
+This overhead is temporarily necessary in order to keep compatibility with
+the legacy filtering framework, which should eventually disappear.
+
+- PMD callbacks implement exactly the interface described in `Rules
+  management`_, except for the port ID argument which has already been
+  converted to a pointer to the underlying ``struct rte_eth_dev``.
+
+- Public API functions do not process flow rules definitions at all before
+  calling PMD functions (no basic error checking, no validation
+  whatsoever). They only make sure these callbacks are non-NULL or return
+  the ``ENOSYS`` (function not supported) error.
+
+This interface additionally defines the following helper functions:
+
+- ``rte_flow_ops_get()``: get generic flow operations structure from a
+  port.
+
+- ``rte_flow_error_set()``: initialize generic flow error structure.
+
+More will be added over time.
+
+Device compatibility
+--------------------
+
+No known implementation supports all the described features.
+
+Unsupported features or combinations are not expected to be fully emulated
+in software by PMDs for performance reasons. Partially supported features
+may be completed in software as long as hardware performs most of the work
+(such as queue redirection and packet recognition).
+
+However PMDs are expected to do their best to satisfy application requests
+by working around hardware limitations as long as doing so does not affect
+the behavior of existing flow rules.
+
+The following sections provide a few examples of such cases and describe how
+PMDs should handle them, they are based on limitations built into the
+previous APIs.
+
+Global bit-masks
+~~~~~~~~~~~~~~~~
+
+Each flow rule comes with its own, per-layer bit-masks, while hardware may
+support only a single, device-wide bit-mask for a given layer type, so that
+two IPv4 rules cannot use different bit-masks.
+
+The expected behavior in this case is that PMDs automatically configure
+global bit-masks according to the needs of the first flow rule created.
+
+Subsequent rules are allowed only if their bit-masks match those, the
+``EEXIST`` error code should be returned otherwise.
+
+Unsupported layer types
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Many protocols can be simulated by crafting patterns with the `Item: RAW`_
+type.
+
+PMDs can rely on this capability to simulate support for protocols with
+headers not directly recognized by hardware.
+
+``ANY`` pattern item
+~~~~~~~~~~~~~~~~~~~~
+
+This pattern item stands for anything, which can be difficult to translate
+to something hardware would understand, particularly if followed by more
+specific types.
+
+Consider the following pattern:
+
+.. _table_rte_flow_unsupported_any:
+
+.. table:: Pattern with ANY as L3
+
+   +-------+-----------------------+
+   | Index | Item                  |
+   +=======+=======================+
+   | 0     | ETHER                 |
+   +-------+-----+---------+-------+
+   | 1     | ANY | ``num`` | ``1`` |
+   +-------+-----+---------+-------+
+   | 2     | TCP                   |
+   +-------+-----------------------+
+   | 3     | END                   |
+   +-------+-----------------------+
+
+Knowing that TCP does not make sense with something other than IPv4 and IPv6
+as L3, such a pattern may be translated to two flow rules instead:
+
+.. _table_rte_flow_unsupported_any_ipv4:
+
+.. table:: ANY replaced with IPV4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETHER              |
+   +-------+--------------------+
+   | 1     | IPV4 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+|
+
+.. _table_rte_flow_unsupported_any_ipv6:
+
+.. table:: ANY replaced with IPV6
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETHER              |
+   +-------+--------------------+
+   | 1     | IPV6 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+Note that as soon as a ANY rule covers several layers, this approach may
+yield a large number of hidden flow rules. It is thus suggested to only
+support the most common scenarios (anything as L2 and/or L3).
+
+Unsupported actions
+~~~~~~~~~~~~~~~~~~~
+
+- When combined with `Action: QUEUE`_, packet counting (`Action: COUNT`_)
+  and tagging (`Action: MARK`_ or `Action: FLAG`_) may be implemented in
+  software as long as the target queue is used by a single rule.
+
+- A rule specifying both `Action: DUP`_ + `Action: QUEUE`_ may be translated
+  to two hidden rules combining `Action: QUEUE`_ and `Action: PASSTHRU`_.
+
+- When a single target queue is provided, `Action: RSS`_ can also be
+  implemented through `Action: QUEUE`_.
+
+Flow rules priority
+~~~~~~~~~~~~~~~~~~~
+
+While it would naturally make sense, flow rules cannot be assumed to be
+processed by hardware in the same order as their creation for several
+reasons:
+
+- They may be managed internally as a tree or a hash table instead of a
+  list.
+- Removing a flow rule before adding another one can either put the new rule
+  at the end of the list or reuse a freed entry.
+- Duplication may occur when packets are matched by several rules.
+
+For overlapping rules (particularly in order to use `Action: PASSTHRU`_)
+predictable behavior is only guaranteed by using different priority levels.
+
+Priority levels are not necessarily implemented in hardware, or may be
+severely limited (e.g. a single priority bit).
+
+For these reasons, priority levels may be implemented purely in software by
+PMDs.
+
+- For devices expecting flow rules to be added in the correct order, PMDs
+  may destroy and re-create existing rules after adding a new one with
+  a higher priority.
+
+- A configurable number of dummy or empty rules can be created at
+  initialization time to save high priority slots for later.
+
+- In order to save priority levels, PMDs may evaluate whether rules are
+  likely to collide and adjust their priority accordingly.
+
+Future evolutions
+-----------------
+
+- A device profile selection function which could be used to force a
+  permanent profile instead of relying on its automatic configuration based
+  on existing flow rules.
+
+- A method to optimize *rte_flow* rules with specific pattern items and
+  action types generated on the fly by PMDs. DPDK should assign negative
+  numbers to these in order to not collide with the existing types. See
+  `Negative types`_.
+
+- Adding specific egress pattern items and actions as described in
+  `Attribute: Traffic direction`_.
+
+- Optional software fallback when PMDs are unable to handle requested flow
+  rules so applications do not have to implement their own.
+
+API migration
+-------------
+
+Exhaustive list of deprecated filter types (normally prefixed with
+*RTE_ETH_FILTER_*) found in ``rte_eth_ctrl.h`` and methods to convert them
+to *rte_flow* rules.
+
+``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*MACVLAN* can be translated to a basic `Item: ETH`_ flow rule with a
+terminating `Action: VF`_ or `Action: PF`_.
+
+.. _table_rte_flow_migration_macvlan:
+
+.. table:: MACVLAN conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | VF,     |
+   |   |     +----------+-----+ PF      |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*ETHERTYPE* is basically an `Item: ETH`_ flow rule with a terminating
+`Action: QUEUE`_ or `Action: DROP`_.
+
+.. _table_rte_flow_migration_ethertype:
+
+.. table:: ETHERTYPE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | QUEUE,  |
+   |   |     +----------+-----+ DROP    |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``FLEXIBLE`` to ``RAW`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FLEXIBLE* can be translated to one `Item: RAW`_ pattern with a terminating
+`Action: QUEUE`_ and a defined priority level.
+
+.. _table_rte_flow_migration_flexible:
+
+.. table:: FLEXIBLE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | RAW | ``spec`` | any | QUEUE   |
+   |   |     +----------+-----+         |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``SYN`` to ``TCP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*SYN* is a `Item: TCP`_ rule with only the ``syn`` bit enabled and masked,
+and a terminating `Action: QUEUE`_.
+
+Priority level can be set to simulate the high priority bit.
+
+.. _table_rte_flow_migration_syn:
+
+.. table:: SYN conversion
+
+   +-----------------------------------+---------+
+   | Pattern                           | Actions |
+   +===+======+==========+=============+=========+
+   | 0 | ETH  | ``spec`` | unset       | QUEUE   |
+   |   |      +----------+-------------+         |
+   |   |      | ``last`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+-------------+---------+
+   | 1 | IPV4 | ``spec`` | unset       | END     |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+---------+---+         |
+   | 2 | TCP  | ``spec`` | ``syn`` | 1 |         |
+   |   |      +----------+---------+---+         |
+   |   |      | ``mask`` | ``syn`` | 1 |         |
+   +---+------+----------+---------+---+         |
+   | 3 | END                           |         |
+   +---+-------------------------------+---------+
+
+``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*NTUPLE* is similar to specifying an empty L2, `Item: IPV4`_ as L3 with
+`Item: TCP`_ or `Item: UDP`_ as L4 and a terminating `Action: QUEUE`_.
+
+A priority level can be specified as well.
+
+.. _table_rte_flow_migration_ntuple:
+
+.. table:: NTUPLE conversion
+
+   +-----------------------------+---------+
+   | Pattern                     | Actions |
+   +===+======+==========+=======+=========+
+   | 0 | ETH  | ``spec`` | unset | QUEUE   |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | unset |         |
+   +---+------+----------+-------+---------+
+   | 1 | IPV4 | ``spec`` | any   | END     |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 2 | TCP, | ``spec`` | any   |         |
+   |   | UDP  +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 3 | END                     |         |
+   +---+-------------------------+---------+
+
+``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*TUNNEL* matches common IPv4 and IPv6 L3/L4-based tunnel types.
+
+In the following table, `Item: ANY`_ is used to cover the optional L4.
+
+.. _table_rte_flow_migration_tunnel:
+
+.. table:: TUNNEL conversion
+
+   +-------------------------------------------------------+---------+
+   | Pattern                                               | Actions |
+   +===+==========================+==========+=============+=========+
+   | 0 | ETH                      | ``spec`` | any         | QUEUE   |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+---------+
+   | 1 | IPV4, IPV6               | ``spec`` | any         | END     |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+         |
+   | 2 | ANY                      | ``spec`` | any         |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+---------+---+         |
+   |   |                          | ``mask`` | ``num`` | 0 |         |
+   +---+--------------------------+----------+---------+---+         |
+   | 3 | VXLAN, GENEVE, TEREDO,   | ``spec`` | any         |         |
+   |   | NVGRE, GRE, ...          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+         |
+   | 4 | END                                               |         |
+   +---+---------------------------------------------------+---------+
+
+``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FDIR* is more complex than any other type, there are several methods to
+emulate its functionality. It is summarized for the most part in the table
+below.
+
+A few features are intentionally not supported:
+
+- The ability to configure the matching input set and masks for the entire
+  device, PMDs should take care of it automatically according to the
+  requested flow rules.
+
+  For example if a device supports only one bit-mask per protocol type,
+  source/address IPv4 bit-masks can be made immutable by the first created
+  rule. Subsequent IPv4 or TCPv4 rules can only be created if they are
+  compatible.
+
+  Note that only protocol bit-masks affected by existing flow rules are
+  immutable, others can be changed later. They become mutable again after
+  the related flow rules are destroyed.
+
+- Returning four or eight bytes of matched data when using flex bytes
+  filtering. Although a specific action could implement it, it conflicts
+  with the much more useful 32 bits tagging on devices that support it.
+
+- Side effects on RSS processing of the entire device. Flow rules that
+  conflict with the current device configuration should not be
+  allowed. Similarly, device configuration should not be allowed when it
+  affects existing flow rules.
+
+- Device modes of operation. "none" is unsupported since filtering cannot be
+  disabled as long as a flow rule is present.
+
+- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
+  according to the created flow rules.
+
+- Signature mode of operation is not defined but could be handled through a
+  specific item type if needed.
+
+.. _table_rte_flow_migration_fdir:
+
+.. table:: FDIR conversion
+
+   +---------------------------------+------------+
+   | Pattern                         | Actions    |
+   +===+============+==========+=====+============+
+   | 0 | ETH,       | ``spec`` | any | QUEUE,     |
+   |   | RAW        +----------+-----+ DROP,      |
+   |   |            | ``last`` | N/A | PASSTHRU   |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+------------+
+   | 1 | IPV4,      | ``spec`` | any | MARK       |
+   |   | IPV6       +----------+-----+            |
+   |   |            | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+------------+
+   | 2 | TCP,       | ``spec`` | any | END        |
+   |   | UDP,       +----------+-----+            |
+   |   | SCTP       | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+            |
+   | 3 | VF,        | ``spec`` | any |            |
+   |   | PF         +----------+-----+            |
+   |   | (optional) | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+            |
+   | 4 | END                         |            |
+   +---+-----------------------------+------------+
+
+``HASH``
+~~~~~~~~
+
+There is no counterpart to this filter type because it translates to a
+global device setting instead of a pattern item. Device settings are
+automatically set according to the created flow rules.
+
+``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All packets are matched. This type alters incoming packets to encapsulate
+them in a chosen tunnel type, optionally redirect them to a VF as well.
+
+The destination pool for tag based forwarding can be emulated with other
+flow rules using `Action: DUP`_.
+
+.. _table_rte_flow_migration_l2tunnel:
+
+.. table:: L2_TUNNEL conversion
+
+   +---------------------------+------------+
+   | Pattern                   | Actions    |
+   +===+======+==========+=====+============+
+   | 0 | VOID | ``spec`` | N/A | VXLAN,     |
+   |   |      |          |     | GENEVE,    |
+   |   |      |          |     | ...        |
+   |   |      +----------+-----+            |
+   |   |      | ``last`` | N/A |            |
+   |   |      +----------+-----+            |
+   |   |      | ``mask`` | N/A |            |
+   |   |      |          |     |            |
+   +---+------+----------+-----+------------+
+   | 1 | END                   | VF         |
+   |   |                       | (optional) |
+   +---+                       +------------+
+   | 2 |                       | END        |
+   +---+-----------------------+------------+
-- 
2.1.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v4 01/25] ethdev: introduce generic flow API
  @ 2016-12-20 18:42  2%         ` Adrien Mazarguil
  2016-12-20 18:42  1%         ` [dpdk-dev] [PATCH v4 02/25] doc: add rte_flow prog guide Adrien Mazarguil
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-20 18:42 UTC (permalink / raw)
  To: dev

This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

Benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

Existing filter types will be deprecated and removed in the near future.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 MAINTAINERS                            |   4 +
 doc/api/doxy-api-index.md              |   2 +
 lib/librte_ether/Makefile              |   3 +
 lib/librte_ether/rte_eth_ctrl.h        |   1 +
 lib/librte_ether/rte_ether_version.map |  11 +
 lib/librte_ether/rte_flow.c            | 159 +++++
 lib/librte_ether/rte_flow.h            | 947 ++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow_driver.h     | 182 ++++++
 8 files changed, 1309 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bb0b99..775b058 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -243,6 +243,10 @@ M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
 F: scripts/test-null.sh
 
+Generic flow API
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: lib/librte_ether/rte_flow*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index de65b4c..4951552 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -39,6 +39,8 @@ There are many libraries, so their headers may be grouped by topics:
   [dev]                (@ref rte_dev.h),
   [ethdev]             (@ref rte_ethdev.h),
   [ethctrl]            (@ref rte_eth_ctrl.h),
+  [rte_flow]           (@ref rte_flow.h),
+  [rte_flow_driver]    (@ref rte_flow_driver.h),
   [cryptodev]          (@ref rte_cryptodev.h),
   [devargs]            (@ref rte_devargs.h),
   [bond]               (@ref rte_eth_bond.h),
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..9335361 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
 LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
+SRCS-y += rte_flow.c
 
 #
 # Export include files
@@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
+SYMLINK-y-include += rte_flow_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index fe80eb0..8386904 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -99,6 +99,7 @@ enum rte_filter_type {
 	RTE_ETH_FILTER_FDIR,
 	RTE_ETH_FILTER_HASH,
 	RTE_ETH_FILTER_L2_TUNNEL,
+	RTE_ETH_FILTER_GENERIC,
 	RTE_ETH_FILTER_MAX
 };
 
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..384cdee 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -147,3 +147,14 @@ DPDK_16.11 {
 	rte_eth_dev_pci_remove;
 
 } DPDK_16.07;
+
+DPDK_17.02 {
+	global:
+
+	rte_flow_validate;
+	rte_flow_create;
+	rte_flow_destroy;
+	rte_flow_flush;
+	rte_flow_query;
+
+} DPDK_16.11;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
new file mode 100644
index 0000000..d98fb1b
--- /dev/null
+++ b/lib/librte_ether/rte_flow.c
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_flow_driver.h"
+#include "rte_flow.h"
+
+/* Get generic flow operations structure from a port. */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops;
+	int code;
+
+	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
+		code = ENODEV;
+	else if (unlikely(!dev->dev_ops->filter_ctrl ||
+			  dev->dev_ops->filter_ctrl(dev,
+						    RTE_ETH_FILTER_GENERIC,
+						    RTE_ETH_FILTER_GET,
+						    &ops) ||
+			  !ops))
+		code = ENOSYS;
+	else
+		return ops;
+	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(code));
+	return NULL;
+}
+
+/* Check whether a flow rule can be created on a given port. */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error)
+{
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->validate))
+		return ops->validate(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Create a flow rule on a given port. */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return NULL;
+	if (likely(!!ops->create))
+		return ops->create(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return NULL;
+}
+
+/* Destroy a flow rule on a given port. */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->destroy))
+		return ops->destroy(dev, flow, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Destroy all flow rules associated with a port. */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->flush))
+		return ops->flush(dev, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Query an existing flow rule. */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (!ops)
+		return -rte_errno;
+	if (likely(!!ops->query))
+		return ops->query(dev, flow, action, data, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
new file mode 100644
index 0000000..98084ac
--- /dev/null
+++ b/lib/librte_ether/rte_flow.h
@@ -0,0 +1,947 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_H_
+#define RTE_FLOW_H_
+
+/**
+ * @file
+ * RTE generic flow API
+ *
+ * This interface provides the ability to program packet matching and
+ * associated actions in hardware through flow rules.
+ */
+
+#include <rte_arp.h>
+#include <rte_ether.h>
+#include <rte_icmp.h>
+#include <rte_ip.h>
+#include <rte_sctp.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Flow rule attributes.
+ *
+ * Priorities are set on two levels: per group and per rule within groups.
+ *
+ * Lower values denote higher priority, the highest priority for both levels
+ * is 0, so that a rule with priority 0 in group 8 is always matched after a
+ * rule with priority 8 in group 0.
+ *
+ * Although optional, applications are encouraged to group similar rules as
+ * much as possible to fully take advantage of hardware capabilities
+ * (e.g. optimized matching) and work around limitations (e.g. a single
+ * pattern type possibly allowed in a given group).
+ *
+ * Group and priority levels are arbitrary and up to the application, they
+ * do not need to be contiguous nor start from 0, however the maximum number
+ * varies between devices and may be affected by existing flow rules.
+ *
+ * If a packet is matched by several rules of a given group for a given
+ * priority level, the outcome is undefined. It can take any path, may be
+ * duplicated or even cause unrecoverable errors.
+ *
+ * Note that support for more than a single group and priority level is not
+ * guaranteed.
+ *
+ * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+ *
+ * Several pattern items and actions are valid and can be used in both
+ * directions. Those valid for only one direction are described as such.
+ *
+ * At least one direction must be specified.
+ *
+ * Specifying both directions at once for a given rule is not recommended
+ * but may be valid in a few cases (e.g. shared counter).
+ */
+struct rte_flow_attr {
+	uint32_t group; /**< Priority group. */
+	uint32_t priority; /**< Priority level within group. */
+	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
+	uint32_t egress:1; /**< Rule applies to egress traffic. */
+	uint32_t reserved:30; /**< Reserved, must be zero. */
+};
+
+/**
+ * Matching pattern item types.
+ *
+ * Pattern items fall in two categories:
+ *
+ * - Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+ *   IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+ *   specification structure. These must be stacked in the same order as the
+ *   protocol layers to match, starting from the lowest.
+ *
+ * - Matching meta-data or affecting pattern processing (END, VOID, INVERT,
+ *   PF, VF, PORT and so on), often without a specification structure. Since
+ *   they do not match packet contents, these can be specified anywhere
+ *   within item lists without affecting others.
+ *
+ * See the description of individual types for more information. Those
+ * marked with [META] fall into the second category.
+ */
+enum rte_flow_item_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for item lists. Prevents further processing of items,
+	 * thereby ending the pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_VOID,
+
+	/**
+	 * [META]
+	 *
+	 * Inverted matching, i.e. process packets that do not match the
+	 * pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_INVERT,
+
+	/**
+	 * Matches any protocol in place of the current layer, a single ANY
+	 * may also stand for several protocol layers.
+	 *
+	 * See struct rte_flow_item_any.
+	 */
+	RTE_FLOW_ITEM_TYPE_ANY,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to the physical function of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a PF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_PF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to a virtual function ID of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a VF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * See struct rte_flow_item_vf.
+	 */
+	RTE_FLOW_ITEM_TYPE_VF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets coming from the specified physical port of the
+	 * underlying device.
+	 *
+	 * The first PORT item overrides the physical port normally
+	 * associated with the specified DPDK input port (port_id). This
+	 * item can be provided several times to match additional physical
+	 * ports.
+	 *
+	 * See struct rte_flow_item_port.
+	 */
+	RTE_FLOW_ITEM_TYPE_PORT,
+
+	/**
+	 * Matches a byte string of a given length at a given offset.
+	 *
+	 * See struct rte_flow_item_raw.
+	 */
+	RTE_FLOW_ITEM_TYPE_RAW,
+
+	/**
+	 * Matches an Ethernet header.
+	 *
+	 * See struct rte_flow_item_eth.
+	 */
+	RTE_FLOW_ITEM_TYPE_ETH,
+
+	/**
+	 * Matches an 802.1Q/ad VLAN tag.
+	 *
+	 * See struct rte_flow_item_vlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VLAN,
+
+	/**
+	 * Matches an IPv4 header.
+	 *
+	 * See struct rte_flow_item_ipv4.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV4,
+
+	/**
+	 * Matches an IPv6 header.
+	 *
+	 * See struct rte_flow_item_ipv6.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6,
+
+	/**
+	 * Matches an ICMP header.
+	 *
+	 * See struct rte_flow_item_icmp.
+	 */
+	RTE_FLOW_ITEM_TYPE_ICMP,
+
+	/**
+	 * Matches a UDP header.
+	 *
+	 * See struct rte_flow_item_udp.
+	 */
+	RTE_FLOW_ITEM_TYPE_UDP,
+
+	/**
+	 * Matches a TCP header.
+	 *
+	 * See struct rte_flow_item_tcp.
+	 */
+	RTE_FLOW_ITEM_TYPE_TCP,
+
+	/**
+	 * Matches a SCTP header.
+	 *
+	 * See struct rte_flow_item_sctp.
+	 */
+	RTE_FLOW_ITEM_TYPE_SCTP,
+
+	/**
+	 * Matches a VXLAN header.
+	 *
+	 * See struct rte_flow_item_vxlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VXLAN,
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ANY
+ *
+ * Matches any protocol in place of the current layer, a single ANY may also
+ * stand for several protocol layers.
+ *
+ * This is usually specified as the first pattern item when looking for a
+ * protocol anywhere in a packet.
+ *
+ * A zeroed mask stands for any number of layers.
+ */
+struct rte_flow_item_any {
+	uint32_t num; /* Number of layers covered. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VF
+ *
+ * Matches packets addressed to a virtual function ID of the device.
+ *
+ * If the underlying device function differs from the one that would
+ * normally receive the matched traffic, specifying this item prevents it
+ * from reaching that device unless the flow rule contains a VF
+ * action. Packets are not duplicated between device instances by default.
+ *
+ * - Likely to return an error or never match any traffic if this causes a
+ *   VF device to match traffic addressed to a different VF.
+ * - Can be specified multiple times to match traffic addressed to several
+ *   VF IDs.
+ * - Can be combined with a PF item to match both PF and VF traffic.
+ *
+ * A zeroed mask can be used to match any VF ID.
+ */
+struct rte_flow_item_vf {
+	uint32_t id; /**< Destination VF ID. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_PORT
+ *
+ * Matches packets coming from the specified physical port of the underlying
+ * device.
+ *
+ * The first PORT item overrides the physical port normally associated with
+ * the specified DPDK input port (port_id). This item can be provided
+ * several times to match additional physical ports.
+ *
+ * Note that physical ports are not necessarily tied to DPDK input ports
+ * (port_id) when those are not under DPDK control. Possible values are
+ * specific to each device, they are not necessarily indexed from zero and
+ * may not be contiguous.
+ *
+ * As a device property, the list of allowed values as well as the value
+ * associated with a port_id should be retrieved by other means.
+ *
+ * A zeroed mask can be used to match any port index.
+ */
+struct rte_flow_item_port {
+	uint32_t index; /**< Physical port index. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_RAW
+ *
+ * Matches a byte string of a given length at a given offset.
+ *
+ * Offset is either absolute (using the start of the packet) or relative to
+ * the end of the previous matched item in the stack, in which case negative
+ * values are allowed.
+ *
+ * If search is enabled, offset is used as the starting point. The search
+ * area can be delimited by setting limit to a nonzero value, which is the
+ * maximum number of bytes after offset where the pattern may start.
+ *
+ * Matching a zero-length pattern is allowed, doing so resets the relative
+ * offset for subsequent items.
+ *
+ * This type does not support ranges (struct rte_flow_item.last).
+ */
+struct rte_flow_item_raw {
+	uint32_t relative:1; /**< Look for pattern after the previous item. */
+	uint32_t search:1; /**< Search pattern from offset (see also limit). */
+	uint32_t reserved:30; /**< Reserved, must be set to zero. */
+	int32_t offset; /**< Absolute or relative offset for pattern. */
+	uint16_t limit; /**< Search area limit for start of pattern. */
+	uint16_t length; /**< Pattern length. */
+	uint8_t pattern[]; /**< Byte string to look for. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ETH
+ *
+ * Matches an Ethernet header.
+ */
+struct rte_flow_item_eth {
+	struct ether_addr dst; /**< Destination MAC. */
+	struct ether_addr src; /**< Source MAC. */
+	uint16_t type; /**< EtherType. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VLAN
+ *
+ * Matches an 802.1Q/ad VLAN tag.
+ *
+ * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
+ * RTE_FLOW_ITEM_TYPE_VLAN.
+ */
+struct rte_flow_item_vlan {
+	uint16_t tpid; /**< Tag protocol identifier. */
+	uint16_t tci; /**< Tag control information. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV4
+ *
+ * Matches an IPv4 header.
+ *
+ * Note: IPv4 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv4 {
+	struct ipv4_hdr hdr; /**< IPv4 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV6.
+ *
+ * Matches an IPv6 header.
+ *
+ * Note: IPv6 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv6 {
+	struct ipv6_hdr hdr; /**< IPv6 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ICMP.
+ *
+ * Matches an ICMP header.
+ */
+struct rte_flow_item_icmp {
+	struct icmp_hdr hdr; /**< ICMP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_UDP.
+ *
+ * Matches a UDP header.
+ */
+struct rte_flow_item_udp {
+	struct udp_hdr hdr; /**< UDP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_TCP.
+ *
+ * Matches a TCP header.
+ */
+struct rte_flow_item_tcp {
+	struct tcp_hdr hdr; /**< TCP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_SCTP.
+ *
+ * Matches a SCTP header.
+ */
+struct rte_flow_item_sctp {
+	struct sctp_hdr hdr; /**< SCTP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VXLAN.
+ *
+ * Matches a VXLAN header (RFC 7348).
+ */
+struct rte_flow_item_vxlan {
+	uint8_t flags; /**< Normally 0x08 (I flag). */
+	uint8_t rsvd0[3]; /**< Reserved, normally 0x000000. */
+	uint8_t vni[3]; /**< VXLAN identifier. */
+	uint8_t rsvd1; /**< Reserved, normally 0x00. */
+};
+
+/**
+ * Matching pattern item definition.
+ *
+ * A pattern is formed by stacking items starting from the lowest protocol
+ * layer to match. This stacking restriction does not apply to meta items
+ * which can be placed anywhere in the stack without affecting the meaning
+ * of the resulting pattern.
+ *
+ * Patterns are terminated by END items.
+ *
+ * The spec field should be a valid pointer to a structure of the related
+ * item type. It may be set to NULL in many cases to use default values.
+ *
+ * Optionally, last can point to a structure of the same type to define an
+ * inclusive range. This is mostly supported by integer and address fields,
+ * may cause errors otherwise. Fields that do not support ranges must be set
+ * to 0 or to the same value as the corresponding fields in spec.
+ *
+ * By default all fields present in spec are considered relevant (see note
+ * below). This behavior can be altered by providing a mask structure of the
+ * same type with applicable bits set to one. It can also be used to
+ * partially filter out specific fields (e.g. as an alternate mean to match
+ * ranges of IP addresses).
+ *
+ * Mask is a simple bit-mask applied before interpreting the contents of
+ * spec and last, which may yield unexpected results if not used
+ * carefully. For example, if for an IPv4 address field, spec provides
+ * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
+ * effective range becomes 10.1.0.0 to 10.3.255.255.
+ *
+ * Note: the defaults for data-matching items such as IPv4 when mask is not
+ * specified actually depend on the underlying implementation since only
+ * recognized fields can be taken into account.
+ */
+struct rte_flow_item {
+	enum rte_flow_item_type type; /**< Item type. */
+	const void *spec; /**< Pointer to item specification structure. */
+	const void *last; /**< Defines an inclusive range (spec to last). */
+	const void *mask; /**< Bit-mask applied to spec and last. */
+};
+
+/**
+ * Action types.
+ *
+ * Each possible action is represented by a type. Some have associated
+ * configuration structures. Several actions combined in a list can be
+ * affected to a flow rule. That list is not ordered.
+ *
+ * They fall in three categories:
+ *
+ * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+ *   processing matched packets by subsequent flow rules, unless overridden
+ *   with PASSTHRU.
+ *
+ * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
+ *   for additional processing by subsequent flow rules.
+ *
+ * - Other non terminating meta actions that do not affect the fate of
+ *   packets (END, VOID, MARK, FLAG, COUNT).
+ *
+ * When several actions are combined in a flow rule, they should all have
+ * different types (e.g. dropping a packet twice is not possible).
+ *
+ * Only the last action of a given type is taken into account. PMDs still
+ * perform error checking on the entire list.
+ *
+ * Note that PASSTHRU is the only action able to override a terminating
+ * rule.
+ */
+enum rte_flow_action_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for action lists. Prevents further processing of
+	 * actions, thereby ending the list.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_VOID,
+
+	/**
+	 * Leaves packets up for additional processing by subsequent flow
+	 * rules. This is the default when a rule does not contain a
+	 * terminating action, but can be specified to force a rule to
+	 * become non-terminating.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PASSTHRU,
+
+	/**
+	 * [META]
+	 *
+	 * Attaches a 32 bit value to packets.
+	 *
+	 * See struct rte_flow_action_mark.
+	 */
+	RTE_FLOW_ACTION_TYPE_MARK,
+
+	/**
+	 * [META]
+	 *
+	 * Flag packets. Similar to MARK but only affects ol_flags.
+	 *
+	 * Note: a distinctive flag must be defined for it.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_FLAG,
+
+	/**
+	 * Assigns packets to a given queue index.
+	 *
+	 * See struct rte_flow_action_queue.
+	 */
+	RTE_FLOW_ACTION_TYPE_QUEUE,
+
+	/**
+	 * Drops packets.
+	 *
+	 * PASSTHRU overrides this action if both are specified.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_DROP,
+
+	/**
+	 * [META]
+	 *
+	 * Enables counters for this rule.
+	 *
+	 * These counters can be retrieved and reset through rte_flow_query(),
+	 * see struct rte_flow_query_count.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_COUNT,
+
+	/**
+	 * Duplicates packets to a given queue index.
+	 *
+	 * This is normally combined with QUEUE, however when used alone, it
+	 * is actually similar to QUEUE + PASSTHRU.
+	 *
+	 * See struct rte_flow_action_dup.
+	 */
+	RTE_FLOW_ACTION_TYPE_DUP,
+
+	/**
+	 * Similar to QUEUE, except RSS is additionally performed on packets
+	 * to spread them among several queues according to the provided
+	 * parameters.
+	 *
+	 * See struct rte_flow_action_rss.
+	 */
+	RTE_FLOW_ACTION_TYPE_RSS,
+
+	/**
+	 * Redirects packets to the physical function (PF) of the current
+	 * device.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PF,
+
+	/**
+	 * Redirects packets to the virtual function (VF) of the current
+	 * device with the specified ID.
+	 *
+	 * See struct rte_flow_action_vf.
+	 */
+	RTE_FLOW_ACTION_TYPE_VF,
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_MARK
+ *
+ * Attaches a 32 bit value to packets.
+ *
+ * This value is arbitrary and application-defined. For compatibility with
+ * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
+ * also set in ol_flags.
+ */
+struct rte_flow_action_mark {
+	uint32_t id; /**< 32 bit value to return with packets. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_QUEUE
+ *
+ * Assign packets to a given queue index.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_queue {
+	uint16_t index; /**< Queue index to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_COUNT (query)
+ *
+ * Query structure to retrieve and reset flow rule counters.
+ */
+struct rte_flow_query_count {
+	uint32_t reset:1; /**< Reset counters after query [in]. */
+	uint32_t hits_set:1; /**< hits field is set [out]. */
+	uint32_t bytes_set:1; /**< bytes field is set [out]. */
+	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
+	uint64_t hits; /**< Number of hits for this rule [out]. */
+	uint64_t bytes; /**< Number of bytes through this rule [out]. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_DUP
+ *
+ * Duplicates packets to a given queue index.
+ *
+ * This is normally combined with QUEUE, however when used alone, it is
+ * actually similar to QUEUE + PASSTHRU.
+ *
+ * Non-terminating by default.
+ */
+struct rte_flow_action_dup {
+	uint16_t index; /**< Queue index to duplicate packets to. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_RSS
+ *
+ * Similar to QUEUE, except RSS is additionally performed on packets to
+ * spread them among several queues according to the provided parameters.
+ *
+ * Note: RSS hash result is normally stored in the hash.rss mbuf field,
+ * however it conflicts with the MARK action as they share the same
+ * space. When both actions are specified, the RSS hash is discarded and
+ * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
+ * structure should eventually evolve to store both.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_rss {
+	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
+	uint16_t num; /**< Number of entries in queue[]. */
+	uint16_t queue[]; /**< Queues indices to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_VF
+ *
+ * Redirects packets to a virtual function (VF) of the current device.
+ *
+ * Packets matched by a VF pattern item can be redirected to their original
+ * VF ID instead of the specified one. This parameter may not be available
+ * and is not guaranteed to work properly if the VF part is matched by a
+ * prior flow rule or if packets are not addressed to a VF in the first
+ * place.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_vf {
+	uint32_t original:1; /**< Use original VF ID if possible. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint32_t id; /**< VF ID to redirect packets to. */
+};
+
+/**
+ * Definition of a single action.
+ *
+ * A list of actions is terminated by a END action.
+ *
+ * For simple actions without a configuration structure, conf remains NULL.
+ */
+struct rte_flow_action {
+	enum rte_flow_action_type type; /**< Action type. */
+	const void *conf; /**< Pointer to action configuration structure. */
+};
+
+/**
+ * Opaque type returned after successfully creating a flow.
+ *
+ * This handle can be used to manage and query the related flow (e.g. to
+ * destroy it or retrieve counters).
+ */
+struct rte_flow;
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_flow_error.cause.
+ */
+enum rte_flow_error_type {
+	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+	RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+	RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+	RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_flow_error {
+	enum rte_flow_error_type type; /**< Cause field and error types. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Check whether a flow rule can be created on a given port.
+ *
+ * While this function has no effect on the target device, the flow rule is
+ * validated against its current configuration state and the returned value
+ * should be considered valid by the caller for that state only.
+ *
+ * The returned value is guaranteed to remain valid only as long as no
+ * successful calls to rte_flow_create() or rte_flow_destroy() are made in
+ * the meantime and no device parameter affecting flow rules in any way are
+ * modified, due to possible collisions or resource limitations (although in
+ * such cases EINVAL should not be returned).
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 if flow rule is valid and can be created. A negative errno value
+ *   otherwise (rte_errno is also set), the following errors are defined:
+ *
+ *   -ENOSYS: underlying device does not support this functionality.
+ *
+ *   -EINVAL: unknown or invalid rule specification.
+ *
+ *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
+ *   bit-masks are unsupported).
+ *
+ *   -EEXIST: collision with an existing rule.
+ *
+ *   -ENOMEM: not enough resources.
+ *
+ *   -EBUSY: action cannot be performed due to busy device resources, may
+ *   succeed if the affected queues or even the entire port are in a stopped
+ *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
+ */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error);
+
+/**
+ * Create a flow rule on a given port.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to the positive version of one of the error codes defined for
+ *   rte_flow_validate().
+ */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error);
+
+/**
+ * Destroy a flow rule on a given port.
+ *
+ * Failure to destroy a flow rule handle may occur when other flow rules
+ * depend on it, and destroying it would result in an inconsistent state.
+ *
+ * This function is only guaranteed to succeed if handles are destroyed in
+ * reverse order of their creation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to destroy.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error);
+
+/**
+ * Destroy all flow rules associated with a port.
+ *
+ * In the unlikely event of failure, handles are still considered destroyed
+ * and no longer valid but the port must be assumed to be in an inconsistent
+ * state.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error);
+
+/**
+ * Query an existing flow rule.
+ *
+ * This function allows retrieving flow-specific data such as counters.
+ * Data is gathered by special actions which must be present in the flow
+ * rule definition.
+ *
+ * \see RTE_FLOW_ACTION_TYPE_COUNT
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to query.
+ * @param action
+ *   Action type to query.
+ * @param[in, out] data
+ *   Pointer to storage for the associated query data type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_H_ */
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
new file mode 100644
index 0000000..274562c
--- /dev/null
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -0,0 +1,182 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_DRIVER_H_
+#define RTE_FLOW_DRIVER_H_
+
+/**
+ * @file
+ * RTE generic flow API (driver side)
+ *
+ * This file provides implementation helpers for internal use by PMDs, they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include "rte_flow.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Generic flow operations structure implemented and returned by PMDs.
+ *
+ * To implement this API, PMDs must handle the RTE_ETH_FILTER_GENERIC filter
+ * type in their .filter_ctrl callback function (struct eth_dev_ops) as well
+ * as the RTE_ETH_FILTER_GET filter operation.
+ *
+ * If successful, this operation must result in a pointer to a PMD-specific
+ * struct rte_flow_ops written to the argument address as described below:
+ *
+ * \code
+ *
+ * // PMD filter_ctrl callback
+ *
+ * static const struct rte_flow_ops pmd_flow_ops = { ... };
+ *
+ * switch (filter_type) {
+ * case RTE_ETH_FILTER_GENERIC:
+ *     if (filter_op != RTE_ETH_FILTER_GET)
+ *         return -EINVAL;
+ *     *(const void **)arg = &pmd_flow_ops;
+ *     return 0;
+ * }
+ *
+ * \endcode
+ *
+ * See also rte_flow_ops_get().
+ *
+ * These callback functions are not supposed to be used by applications
+ * directly, which must rely on the API defined in rte_flow.h.
+ *
+ * Public-facing wrapper functions perform a few consistency checks so that
+ * unimplemented (i.e. NULL) callbacks simply return -ENOTSUP. These
+ * callbacks otherwise only differ by their first argument (with port ID
+ * already resolved to a pointer to struct rte_eth_dev).
+ */
+struct rte_flow_ops {
+	/** See rte_flow_validate(). */
+	int (*validate)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_create(). */
+	struct rte_flow *(*create)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_destroy(). */
+	int (*destroy)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 struct rte_flow_error *);
+	/** See rte_flow_flush(). */
+	int (*flush)
+		(struct rte_eth_dev *,
+		 struct rte_flow_error *);
+	/** See rte_flow_query(). */
+	int (*query)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 enum rte_flow_action_type,
+		 void *,
+		 struct rte_flow_error *);
+};
+
+/**
+ * Initialize generic flow error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Pointer to flow error structure.
+ */
+static inline struct rte_flow_error *
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return error;
+}
+
+/**
+ * Get generic flow operations structure from a port.
+ *
+ * @param port_id
+ *   Port identifier to query.
+ * @param[out] error
+ *   Pointer to flow error structure.
+ *
+ * @return
+ *   The flow operations structure associated with port_id, NULL in case of
+ *   error, in which case rte_errno is set and the error structure contains
+ *   additional details.
+ */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_DRIVER_H_ */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 06/13] ethdev: make dev_info generic (not just PCI)
  2016-12-19 21:59 16% ` [dpdk-dev] [PATCH 06/13] ethdev: make dev_info generic (not just PCI) Stephen Hemminger
@ 2016-12-20 11:20  0%   ` Jan Blunck
  0 siblings, 0 replies; 200+ results
From: Jan Blunck @ 2016-12-20 11:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

On Mon, Dec 19, 2016 at 10:59 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> The result from rte_eth_dev_info_get should have pointer to
> device not PCI device.  This breaks ABI but is necessary.
>
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  app/test-pmd/config.c                  | 32 ++++++++++++++++++++++++++--
>  app/test-pmd/testpmd.c                 | 11 ++++++++--
>  app/test-pmd/testpmd.h                 | 32 ++++++++++++++++------------
>  app/test/test_kni.c                    | 39 ++++++++++++++++++++++++++++------
>  doc/guides/rel_notes/release_17_02.rst | 10 +++------
>  lib/librte_ether/rte_ethdev.c          |  3 ++-
>  lib/librte_ether/rte_ethdev.h          |  2 +-
>  7 files changed, 96 insertions(+), 33 deletions(-)
>
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 8cf537d5..1d0974ad 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -553,6 +553,16 @@ port_id_is_invalid(portid_t port_id, enum print_warning warning)
>         return 1;
>  }
>
> +int
> +port_is_not_pci(portid_t port_id)
> +{
> +       if (ports[port_id].pci_dev)
> +               return 0;
> +
> +       printf("Port %u is not a PCI device\n", port_id);
> +       return 1;
> +}
> +
>  static int
>  vlan_id_is_invalid(uint16_t vlan_id)
>  {
> @@ -565,15 +575,22 @@ vlan_id_is_invalid(uint16_t vlan_id)
>  static int
>  port_reg_off_is_invalid(portid_t port_id, uint32_t reg_off)
>  {
> +       struct rte_pci_device *pci_dev = ports[port_id].pci_dev;
>         uint64_t pci_len;
>
> +       if (pci_dev == NULL) {
> +               printf("Port %u is not a PCI device\n", port_id);
> +               return 1;
> +       }
> +
>         if (reg_off & 0x3) {
>                 printf("Port register offset 0x%X not aligned on a 4-byte "
>                        "boundary\n",
>                        (unsigned)reg_off);
>                 return 1;
>         }
> -       pci_len = ports[port_id].dev_info.pci_dev->mem_resource[0].len;
> +
> +       pci_len = pci_dev->mem_resource[0].len;
>         if (reg_off >= pci_len) {
>                 printf("Port %d: register offset %u (0x%X) out of port PCI "
>                        "resource (length=%"PRIu64")\n",
> @@ -607,9 +624,10 @@ port_reg_bit_display(portid_t port_id, uint32_t reg_off, uint8_t bit_x)
>  {
>         uint32_t reg_v;
>
> -
>         if (port_id_is_invalid(port_id, ENABLED_WARN))
>                 return;
> +       if (port_is_not_pci(port_id))
> +               return;
>         if (port_reg_off_is_invalid(port_id, reg_off))
>                 return;
>         if (reg_bit_pos_is_invalid(bit_x))
> @@ -629,6 +647,8 @@ port_reg_bit_field_display(portid_t port_id, uint32_t reg_off,
>
>         if (port_id_is_invalid(port_id, ENABLED_WARN))
>                 return;
> +       if (port_is_not_pci(port_id))
> +               return;
>         if (port_reg_off_is_invalid(port_id, reg_off))
>                 return;
>         if (reg_bit_pos_is_invalid(bit1_pos))
> @@ -658,6 +678,8 @@ port_reg_display(portid_t port_id, uint32_t reg_off)
>                 return;
>         if (port_reg_off_is_invalid(port_id, reg_off))
>                 return;
> +       if (port_is_not_pci(port_id))
> +               return;
>         reg_v = port_id_pci_reg_read(port_id, reg_off);
>         display_port_reg_value(port_id, reg_off, reg_v);
>  }
> @@ -670,6 +692,8 @@ port_reg_bit_set(portid_t port_id, uint32_t reg_off, uint8_t bit_pos,
>
>         if (port_id_is_invalid(port_id, ENABLED_WARN))
>                 return;
> +       if (port_is_not_pci(port_id))
> +               return;
>         if (port_reg_off_is_invalid(port_id, reg_off))
>                 return;
>         if (reg_bit_pos_is_invalid(bit_pos))
> @@ -698,6 +722,8 @@ port_reg_bit_field_set(portid_t port_id, uint32_t reg_off,
>
>         if (port_id_is_invalid(port_id, ENABLED_WARN))
>                 return;
> +       if (port_is_not_pci(port_id))
> +               return;
>         if (port_reg_off_is_invalid(port_id, reg_off))
>                 return;
>         if (reg_bit_pos_is_invalid(bit1_pos))
> @@ -732,6 +758,8 @@ port_reg_set(portid_t port_id, uint32_t reg_off, uint32_t reg_v)
>  {
>         if (port_id_is_invalid(port_id, ENABLED_WARN))
>                 return;
> +       if (port_is_not_pci(port_id))
> +               return;
>         if (port_reg_off_is_invalid(port_id, reg_off))
>                 return;
>         port_id_pci_reg_write(port_id, reg_off, reg_v);
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index a0332c26..faf1e16d 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -492,7 +492,6 @@ static void
>  init_config(void)
>  {
>         portid_t pid;
> -       struct rte_port *port;
>         struct rte_mempool *mbp;
>         unsigned int nb_mbuf_per_pool;
>         lcoreid_t  lc_id;
> @@ -547,9 +546,17 @@ init_config(void)
>         }
>
>         FOREACH_PORT(pid, ports) {
> -               port = &ports[pid];
> +               struct rte_port *port = &ports[pid];
> +               struct rte_device *dev;
> +
>                 rte_eth_dev_info_get(pid, &port->dev_info);
>
> +               dev = port->dev_info.device;
> +               if (dev->driver->type == PMD_PCI)
> +                       port->pci_dev = container_of(dev, struct rte_pci_device, device);
> +               else
> +                       port->pci_dev = NULL;
> +
>                 if (numa_support) {
>                         if (port_numa[pid] != NUMA_NO_CONFIG)
>                                 port_per_socket[port_numa[pid]]++;
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 9c1e7039..e8aca32a 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -149,6 +149,7 @@ struct fwd_stream {
>   */
>  struct rte_port {
>         uint8_t                 enabled;    /**< Port enabled or not */
> +       struct rte_pci_device   *pci_dev;
>         struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
>         struct rte_eth_conf     dev_conf;   /**< Port configuration. */
>         struct ether_addr       eth_addr;   /**< Port ethernet address */
> @@ -442,34 +443,36 @@ mbuf_pool_find(unsigned int sock_id)
>   * Read/Write operations on a PCI register of a port.
>   */
>  static inline uint32_t
> -port_pci_reg_read(struct rte_port *port, uint32_t reg_off)
> +pci_reg_read(struct rte_pci_device *pci_dev, uint32_t reg_off)
>  {
> -       void *reg_addr;
> +       void *reg_addr
> +               = (char *)pci_dev->mem_resource[0].addr + reg_off;
>         uint32_t reg_v;
>
> -       reg_addr = (void *)
> -               ((char *)port->dev_info.pci_dev->mem_resource[0].addr +
> -                       reg_off);
>         reg_v = *((volatile uint32_t *)reg_addr);
>         return rte_le_to_cpu_32(reg_v);
>  }
>
> -#define port_id_pci_reg_read(pt_id, reg_off) \
> -       port_pci_reg_read(&ports[(pt_id)], (reg_off))
> +static inline uint32_t
> +port_id_pci_reg_read(portid_t pt_id, uint32_t reg_off)
> +{
> +       return pci_reg_read(ports[pt_id].pci_dev, reg_off);
> +}
>
>  static inline void
> -port_pci_reg_write(struct rte_port *port, uint32_t reg_off, uint32_t reg_v)
> +pci_reg_write(struct rte_pci_device *pci_dev, uint32_t reg_off, uint32_t reg_v)
>  {
> -       void *reg_addr;
> +       void *reg_addr
> +               = (char *)pci_dev->mem_resource[0].addr + reg_off;
>
> -       reg_addr = (void *)
> -               ((char *)port->dev_info.pci_dev->mem_resource[0].addr +
> -                       reg_off);
>         *((volatile uint32_t *)reg_addr) = rte_cpu_to_le_32(reg_v);
>  }
>
> -#define port_id_pci_reg_write(pt_id, reg_off, reg_value) \
> -       port_pci_reg_write(&ports[(pt_id)], (reg_off), (reg_value))
> +static inline void
> +port_id_pci_reg_write(portid_t pt_id, uint32_t reg_off, uint32_t reg_v)
> +{
> +       return pci_reg_write(ports[pt_id].pci_dev, reg_off, reg_v);
> +}
>
>  /* Prototypes */
>  unsigned int parse_item_list(char* str, const char* item_name,
> @@ -598,6 +601,7 @@ enum print_warning {
>         ENABLED_WARN = 0,
>         DISABLED_WARN
>  };
> +int port_is_not_pci(portid_t port_id);
>  int port_id_is_invalid(portid_t port_id, enum print_warning warning);
>
>  /*
> diff --git a/app/test/test_kni.c b/app/test/test_kni.c
> index 309741cb..6b2ebbed 100644
> --- a/app/test/test_kni.c
> +++ b/app/test/test_kni.c
> @@ -370,6 +370,8 @@ test_kni_processing(uint8_t port_id, struct rte_mempool *mp)
>         struct rte_kni_conf conf;
>         struct rte_eth_dev_info info;
>         struct rte_kni_ops ops;
> +       struct rte_device *dev;
> +       struct rte_pci_device *pci_dev;
>
>         if (!mp)
>                 return -1;
> @@ -379,8 +381,16 @@ test_kni_processing(uint8_t port_id, struct rte_mempool *mp)
>         memset(&ops, 0, sizeof(ops));
>
>         rte_eth_dev_info_get(port_id, &info);
> -       conf.addr = info.pci_dev->addr;
> -       conf.id = info.pci_dev->id;
> +
> +       dev = info.device;
> +       if (dev->driver->type != PMD_PCI) {
> +               printf("device is not PCI\n");
> +               return -1;
> +       }
> +
> +       pci_dev = container_of(dev, struct rte_pci_device, device);
> +       conf.addr = pci_dev->addr;
> +       conf.id = pci_dev->id;
>         snprintf(conf.name, sizeof(conf.name), TEST_KNI_PORT);
>
>         /* core id 1 configured for kernel thread */
> @@ -478,6 +488,8 @@ test_kni(void)
>         struct rte_kni_conf conf;
>         struct rte_eth_dev_info info;
>         struct rte_kni_ops ops;
> +       struct rte_device *dev;
> +       struct rte_pci_device *pci_dev;
>
>         /* Initialize KNI subsytem */
>         rte_kni_init(KNI_TEST_MAX_PORTS);
> @@ -536,8 +548,16 @@ test_kni(void)
>         memset(&conf, 0, sizeof(conf));
>         memset(&ops, 0, sizeof(ops));
>         rte_eth_dev_info_get(port_id, &info);
> -       conf.addr = info.pci_dev->addr;
> -       conf.id = info.pci_dev->id;
> +
> +       dev = info.device;
> +       if (dev->driver->type != PMD_PCI) {
> +               printf("device is not PCI\n");
> +               return -1;
> +       }
> +
> +       pci_dev = container_of(dev, struct rte_pci_device, device);
> +       conf.addr = pci_dev->addr;
> +       conf.id = pci_dev->id;
>         conf.group_id = (uint16_t)port_id;
>         conf.mbuf_size = MAX_PACKET_SZ;
>
> @@ -565,8 +585,15 @@ test_kni(void)
>         memset(&info, 0, sizeof(info));
>         memset(&ops, 0, sizeof(ops));
>         rte_eth_dev_info_get(port_id, &info);
> -       conf.addr = info.pci_dev->addr;
> -       conf.id = info.pci_dev->id;
> +       dev = info.device;
> +       if (dev->driver->type != PMD_PCI) {
> +               printf("device is not PCI\n");
> +               return -1;
> +       }
> +
> +       pci_dev = container_of(dev, struct rte_pci_device, device);
> +       conf.addr = pci_dev->addr;
> +       conf.id = pci_dev->id;
>         conf.group_id = (uint16_t)port_id;
>         conf.mbuf_size = MAX_PACKET_SZ;
>
> diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
> index 3b650388..30b23703 100644
> --- a/doc/guides/rel_notes/release_17_02.rst
> +++ b/doc/guides/rel_notes/release_17_02.rst
> @@ -106,16 +106,12 @@ API Changes
>  ABI Changes
>  -----------
>
> -.. This section should contain ABI changes. Sample format:
> -
> -   * Add a short 1-2 sentence description of the ABI change that was announced in
> +.. * Add a short 1-2 sentence description of the ABI change that was announced in
>       the previous releases and made in this release. Use fixed width quotes for
>       ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
>
> -   This section is a comment. do not overwrite or remove it.
> -   Also, make sure to start the actual text at the margin.
> -   =========================================================
> -
> +* The ``rte_eth_dev_info`` structure no longer has pointer to PCI device, but
> +  instead has new_field ``device`` which is a pointer to a generic ethernet device.
>
>
>  Shared Library Versions
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 1e0f2061..71a8e9b9 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1568,7 +1568,8 @@ rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
>
>         RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>         (*dev->dev_ops->dev_infos_get)(dev, dev_info);
> -       dev_info->pci_dev = dev->pci_dev;
> +
> +       dev_info->device = &dev->pci_dev->device;
>         dev_info->driver_name = dev->data->drv_name;

I don't think that exposing the device through dev_info makes this a
future proof. If we want to model some kind of extensions to dev_info
we should instead model this explicitly.

So from my point of view the pci_dev should get removed instead.


>         dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>         dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 3c85e331..2b3b4014 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -879,7 +879,7 @@ struct rte_eth_conf {
>   * Ethernet device information
>   */
>  struct rte_eth_dev_info {
> -       struct rte_pci_device *pci_dev; /**< Device PCI information. */
> +       struct rte_device *device; /**< Device information. */
>         const char *driver_name; /**< Device Driver name. */
>         unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>                 Use if_indextoname() to translate into an interface name. */
> --
> 2.11.0
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 06/13] ethdev: make dev_info generic (not just PCI)
  @ 2016-12-19 21:59 16% ` Stephen Hemminger
  2016-12-20 11:20  0%   ` Jan Blunck
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2016-12-19 21:59 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The result from rte_eth_dev_info_get should have pointer to
device not PCI device.  This breaks ABI but is necessary.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 app/test-pmd/config.c                  | 32 ++++++++++++++++++++++++++--
 app/test-pmd/testpmd.c                 | 11 ++++++++--
 app/test-pmd/testpmd.h                 | 32 ++++++++++++++++------------
 app/test/test_kni.c                    | 39 ++++++++++++++++++++++++++++------
 doc/guides/rel_notes/release_17_02.rst | 10 +++------
 lib/librte_ether/rte_ethdev.c          |  3 ++-
 lib/librte_ether/rte_ethdev.h          |  2 +-
 7 files changed, 96 insertions(+), 33 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 8cf537d5..1d0974ad 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -553,6 +553,16 @@ port_id_is_invalid(portid_t port_id, enum print_warning warning)
 	return 1;
 }
 
+int
+port_is_not_pci(portid_t port_id)
+{
+	if (ports[port_id].pci_dev)
+		return 0;
+
+	printf("Port %u is not a PCI device\n", port_id);
+	return 1;
+}
+
 static int
 vlan_id_is_invalid(uint16_t vlan_id)
 {
@@ -565,15 +575,22 @@ vlan_id_is_invalid(uint16_t vlan_id)
 static int
 port_reg_off_is_invalid(portid_t port_id, uint32_t reg_off)
 {
+	struct rte_pci_device *pci_dev = ports[port_id].pci_dev;
 	uint64_t pci_len;
 
+	if (pci_dev == NULL) {
+		printf("Port %u is not a PCI device\n", port_id);
+		return 1;
+	}
+
 	if (reg_off & 0x3) {
 		printf("Port register offset 0x%X not aligned on a 4-byte "
 		       "boundary\n",
 		       (unsigned)reg_off);
 		return 1;
 	}
-	pci_len = ports[port_id].dev_info.pci_dev->mem_resource[0].len;
+
+	pci_len = pci_dev->mem_resource[0].len;
 	if (reg_off >= pci_len) {
 		printf("Port %d: register offset %u (0x%X) out of port PCI "
 		       "resource (length=%"PRIu64")\n",
@@ -607,9 +624,10 @@ port_reg_bit_display(portid_t port_id, uint32_t reg_off, uint8_t bit_x)
 {
 	uint32_t reg_v;
 
-
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+	if (port_is_not_pci(port_id))
+		return;
 	if (port_reg_off_is_invalid(port_id, reg_off))
 		return;
 	if (reg_bit_pos_is_invalid(bit_x))
@@ -629,6 +647,8 @@ port_reg_bit_field_display(portid_t port_id, uint32_t reg_off,
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+	if (port_is_not_pci(port_id))
+		return;
 	if (port_reg_off_is_invalid(port_id, reg_off))
 		return;
 	if (reg_bit_pos_is_invalid(bit1_pos))
@@ -658,6 +678,8 @@ port_reg_display(portid_t port_id, uint32_t reg_off)
 		return;
 	if (port_reg_off_is_invalid(port_id, reg_off))
 		return;
+	if (port_is_not_pci(port_id))
+		return;
 	reg_v = port_id_pci_reg_read(port_id, reg_off);
 	display_port_reg_value(port_id, reg_off, reg_v);
 }
@@ -670,6 +692,8 @@ port_reg_bit_set(portid_t port_id, uint32_t reg_off, uint8_t bit_pos,
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+	if (port_is_not_pci(port_id))
+		return;
 	if (port_reg_off_is_invalid(port_id, reg_off))
 		return;
 	if (reg_bit_pos_is_invalid(bit_pos))
@@ -698,6 +722,8 @@ port_reg_bit_field_set(portid_t port_id, uint32_t reg_off,
 
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+	if (port_is_not_pci(port_id))
+		return;
 	if (port_reg_off_is_invalid(port_id, reg_off))
 		return;
 	if (reg_bit_pos_is_invalid(bit1_pos))
@@ -732,6 +758,8 @@ port_reg_set(portid_t port_id, uint32_t reg_off, uint32_t reg_v)
 {
 	if (port_id_is_invalid(port_id, ENABLED_WARN))
 		return;
+	if (port_is_not_pci(port_id))
+		return;
 	if (port_reg_off_is_invalid(port_id, reg_off))
 		return;
 	port_id_pci_reg_write(port_id, reg_off, reg_v);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c26..faf1e16d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -492,7 +492,6 @@ static void
 init_config(void)
 {
 	portid_t pid;
-	struct rte_port *port;
 	struct rte_mempool *mbp;
 	unsigned int nb_mbuf_per_pool;
 	lcoreid_t  lc_id;
@@ -547,9 +546,17 @@ init_config(void)
 	}
 
 	FOREACH_PORT(pid, ports) {
-		port = &ports[pid];
+		struct rte_port *port = &ports[pid];
+		struct rte_device *dev;
+
 		rte_eth_dev_info_get(pid, &port->dev_info);
 
+		dev = port->dev_info.device;
+		if (dev->driver->type == PMD_PCI)
+			port->pci_dev = container_of(dev, struct rte_pci_device, device);
+		else
+			port->pci_dev = NULL;
+
 		if (numa_support) {
 			if (port_numa[pid] != NUMA_NO_CONFIG)
 				port_per_socket[port_numa[pid]]++;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9c1e7039..e8aca32a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -149,6 +149,7 @@ struct fwd_stream {
  */
 struct rte_port {
 	uint8_t                 enabled;    /**< Port enabled or not */
+	struct rte_pci_device	*pci_dev;
 	struct rte_eth_dev_info dev_info;   /**< PCI info + driver name */
 	struct rte_eth_conf     dev_conf;   /**< Port configuration. */
 	struct ether_addr       eth_addr;   /**< Port ethernet address */
@@ -442,34 +443,36 @@ mbuf_pool_find(unsigned int sock_id)
  * Read/Write operations on a PCI register of a port.
  */
 static inline uint32_t
-port_pci_reg_read(struct rte_port *port, uint32_t reg_off)
+pci_reg_read(struct rte_pci_device *pci_dev, uint32_t reg_off)
 {
-	void *reg_addr;
+	void *reg_addr
+		= (char *)pci_dev->mem_resource[0].addr + reg_off;
 	uint32_t reg_v;
 
-	reg_addr = (void *)
-		((char *)port->dev_info.pci_dev->mem_resource[0].addr +
-			reg_off);
 	reg_v = *((volatile uint32_t *)reg_addr);
 	return rte_le_to_cpu_32(reg_v);
 }
 
-#define port_id_pci_reg_read(pt_id, reg_off) \
-	port_pci_reg_read(&ports[(pt_id)], (reg_off))
+static inline uint32_t
+port_id_pci_reg_read(portid_t pt_id, uint32_t reg_off)
+{
+	return pci_reg_read(ports[pt_id].pci_dev, reg_off);
+}
 
 static inline void
-port_pci_reg_write(struct rte_port *port, uint32_t reg_off, uint32_t reg_v)
+pci_reg_write(struct rte_pci_device *pci_dev, uint32_t reg_off, uint32_t reg_v)
 {
-	void *reg_addr;
+	void *reg_addr
+		= (char *)pci_dev->mem_resource[0].addr + reg_off;
 
-	reg_addr = (void *)
-		((char *)port->dev_info.pci_dev->mem_resource[0].addr +
-			reg_off);
 	*((volatile uint32_t *)reg_addr) = rte_cpu_to_le_32(reg_v);
 }
 
-#define port_id_pci_reg_write(pt_id, reg_off, reg_value) \
-	port_pci_reg_write(&ports[(pt_id)], (reg_off), (reg_value))
+static inline void
+port_id_pci_reg_write(portid_t pt_id, uint32_t reg_off, uint32_t reg_v)
+{
+	return pci_reg_write(ports[pt_id].pci_dev, reg_off, reg_v);
+}
 
 /* Prototypes */
 unsigned int parse_item_list(char* str, const char* item_name,
@@ -598,6 +601,7 @@ enum print_warning {
 	ENABLED_WARN = 0,
 	DISABLED_WARN
 };
+int port_is_not_pci(portid_t port_id);
 int port_id_is_invalid(portid_t port_id, enum print_warning warning);
 
 /*
diff --git a/app/test/test_kni.c b/app/test/test_kni.c
index 309741cb..6b2ebbed 100644
--- a/app/test/test_kni.c
+++ b/app/test/test_kni.c
@@ -370,6 +370,8 @@ test_kni_processing(uint8_t port_id, struct rte_mempool *mp)
 	struct rte_kni_conf conf;
 	struct rte_eth_dev_info info;
 	struct rte_kni_ops ops;
+	struct rte_device *dev;
+	struct rte_pci_device *pci_dev;
 
 	if (!mp)
 		return -1;
@@ -379,8 +381,16 @@ test_kni_processing(uint8_t port_id, struct rte_mempool *mp)
 	memset(&ops, 0, sizeof(ops));
 
 	rte_eth_dev_info_get(port_id, &info);
-	conf.addr = info.pci_dev->addr;
-	conf.id = info.pci_dev->id;
+
+	dev = info.device;
+	if (dev->driver->type != PMD_PCI) {
+		printf("device is not PCI\n");
+		return -1;
+	}
+
+	pci_dev = container_of(dev, struct rte_pci_device, device);
+	conf.addr = pci_dev->addr;
+	conf.id = pci_dev->id;
 	snprintf(conf.name, sizeof(conf.name), TEST_KNI_PORT);
 
 	/* core id 1 configured for kernel thread */
@@ -478,6 +488,8 @@ test_kni(void)
 	struct rte_kni_conf conf;
 	struct rte_eth_dev_info info;
 	struct rte_kni_ops ops;
+	struct rte_device *dev;
+	struct rte_pci_device *pci_dev;
 
 	/* Initialize KNI subsytem */
 	rte_kni_init(KNI_TEST_MAX_PORTS);
@@ -536,8 +548,16 @@ test_kni(void)
 	memset(&conf, 0, sizeof(conf));
 	memset(&ops, 0, sizeof(ops));
 	rte_eth_dev_info_get(port_id, &info);
-	conf.addr = info.pci_dev->addr;
-	conf.id = info.pci_dev->id;
+
+	dev = info.device;
+	if (dev->driver->type != PMD_PCI) {
+		printf("device is not PCI\n");
+		return -1;
+	}
+
+	pci_dev = container_of(dev, struct rte_pci_device, device);
+	conf.addr = pci_dev->addr;
+	conf.id = pci_dev->id;
 	conf.group_id = (uint16_t)port_id;
 	conf.mbuf_size = MAX_PACKET_SZ;
 
@@ -565,8 +585,15 @@ test_kni(void)
 	memset(&info, 0, sizeof(info));
 	memset(&ops, 0, sizeof(ops));
 	rte_eth_dev_info_get(port_id, &info);
-	conf.addr = info.pci_dev->addr;
-	conf.id = info.pci_dev->id;
+	dev = info.device;
+	if (dev->driver->type != PMD_PCI) {
+		printf("device is not PCI\n");
+		return -1;
+	}
+
+	pci_dev = container_of(dev, struct rte_pci_device, device);
+	conf.addr = pci_dev->addr;
+	conf.id = pci_dev->id;
 	conf.group_id = (uint16_t)port_id;
 	conf.mbuf_size = MAX_PACKET_SZ;
 
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 3b650388..30b23703 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -106,16 +106,12 @@ API Changes
 ABI Changes
 -----------
 
-.. This section should contain ABI changes. Sample format:
-
-   * Add a short 1-2 sentence description of the ABI change that was announced in
+.. * Add a short 1-2 sentence description of the ABI change that was announced in
      the previous releases and made in this release. Use fixed width quotes for
      ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
 
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
+* The ``rte_eth_dev_info`` structure no longer has pointer to PCI device, but
+  instead has new_field ``device`` which is a pointer to a generic ethernet device.
 
 
 Shared Library Versions
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1e0f2061..71a8e9b9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1568,7 +1568,8 @@ rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 
 	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
 	(*dev->dev_ops->dev_infos_get)(dev, dev_info);
-	dev_info->pci_dev = dev->pci_dev;
+
+	dev_info->device = &dev->pci_dev->device;
 	dev_info->driver_name = dev->data->drv_name;
 	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
 	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 3c85e331..2b3b4014 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -879,7 +879,7 @@ struct rte_eth_conf {
  * Ethernet device information
  */
 struct rte_eth_dev_info {
-	struct rte_pci_device *pci_dev; /**< Device PCI information. */
+	struct rte_device *device; /**< Device information. */
 	const char *driver_name; /**< Device Driver name. */
 	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
 		Use if_indextoname() to translate into an interface name. */
-- 
2.11.0

^ permalink raw reply	[relevance 16%]

* [dpdk-dev] [PATCH v2] doc: fix required tools list layout
@ 2016-12-19 19:28 11% Baruch Siach
  0 siblings, 0 replies; 200+ results
From: Baruch Siach @ 2016-12-19 19:28 UTC (permalink / raw)
  To: dev; +Cc: John McNamara, Baruch Siach

The Python requirement should appear in the bullet list.

Also, indent the x32 note, since it is related to the previous bullet.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
v2: Indent also the note paragraph (John)
---
 doc/guides/linux_gsg/sys_reqs.rst | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 3d743421595a..76d82e6eef75 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -79,14 +79,12 @@ Compilation of the DPDK
 
     * glibc.ppc64, libgcc.ppc64, libstdc++.ppc64 and glibc-devel.ppc64 for IBM ppc_64;
 
-.. note::
-
-    x86_x32 ABI is currently supported with distribution packages only on Ubuntu
-    higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.
+    .. note::
 
-.. note::
+       x86_x32 ABI is currently supported with distribution packages only on Ubuntu
+       higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.
 
-    Python, version 2.6 or 2.7, to use various helper scripts included in the DPDK package.
+*   Python, version 2.6 or 2.7, to use various helper scripts included in the DPDK package.
 
 
 **Optional Tools:**
-- 
2.11.0

^ permalink raw reply	[relevance 11%]

* [dpdk-dev] [PATCH v3 04/25] cmdline: add support for dynamic tokens
    2016-12-19 17:48  2%       ` [dpdk-dev] [PATCH v3 01/25] ethdev: introduce generic flow API Adrien Mazarguil
  2016-12-19 17:48  1%       ` [dpdk-dev] [PATCH v3 02/25] doc: add rte_flow prog guide Adrien Mazarguil
@ 2016-12-19 17:48  2%       ` Adrien Mazarguil
    3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-19 17:48 UTC (permalink / raw)
  To: dev

Considering tokens must be hard-coded in a list part of the instruction
structure, context-dependent tokens cannot be expressed.

This commit adds support for building dynamic token lists through a
user-provided function, which is called when the static token list is empty
(a single NULL entry).

Because no structures are modified (existing fields are reused), this
commit has no impact on the current ABI.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 lib/librte_cmdline/cmdline_parse.c | 60 +++++++++++++++++++++++++++++----
 lib/librte_cmdline/cmdline_parse.h | 21 ++++++++++++
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index b496067..14f5553 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -146,7 +146,9 @@ nb_common_chars(const char * s1, const char * s2)
  */
 static int
 match_inst(cmdline_parse_inst_t *inst, const char *buf,
-	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size)
+	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size,
+	   cmdline_parse_token_hdr_t
+		*(*dyn_tokens)[CMDLINE_PARSE_DYNAMIC_TOKENS])
 {
 	unsigned int token_num=0;
 	cmdline_parse_token_hdr_t * token_p;
@@ -155,6 +157,11 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 	struct cmdline_token_hdr token_hdr;
 
 	token_p = inst->tokens[token_num];
+	if (!token_p && dyn_tokens && inst->f) {
+		if (!(*dyn_tokens)[0])
+			inst->f(&(*dyn_tokens)[0], NULL, dyn_tokens);
+		token_p = (*dyn_tokens)[0];
+	}
 	if (token_p)
 		memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -196,7 +203,17 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 		buf += n;
 
 		token_num ++;
-		token_p = inst->tokens[token_num];
+		if (!inst->tokens[0]) {
+			if (token_num < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!(*dyn_tokens)[token_num])
+					inst->f(&(*dyn_tokens)[token_num],
+						NULL,
+						dyn_tokens);
+				token_p = (*dyn_tokens)[token_num];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[token_num];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 	}
@@ -239,6 +256,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	cmdline_parse_inst_t *inst;
 	const char *curbuf;
 	char result_buf[CMDLINE_PARSE_RESULT_BUFSIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	void (*f)(void *, struct cmdline *, void *) = NULL;
 	void *data = NULL;
 	int comment = 0;
@@ -255,6 +273,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		return CMDLINE_PARSE_BAD_ARGS;
 
 	ctx = cl->ctx;
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/*
 	 * - look if the buffer contains at least one line
@@ -299,7 +318,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		debug_printf("INST %d\n", inst_num);
 
 		/* fully parsed */
-		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf));
+		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf),
+				 &dyn_tokens);
 
 		if (tok > 0) /* we matched at least one token */
 			err = CMDLINE_PARSE_BAD_ARGS;
@@ -355,6 +375,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 	cmdline_parse_token_hdr_t *token_p;
 	struct cmdline_token_hdr token_hdr;
 	char tmpbuf[CMDLINE_BUFFER_SIZE], comp_buf[CMDLINE_BUFFER_SIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	unsigned int partial_tok_len;
 	int comp_len = -1;
 	int tmp_len = -1;
@@ -374,6 +395,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 
 	debug_printf("%s called\n", __func__);
 	memset(&token_hdr, 0, sizeof(token_hdr));
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/* count the number of complete token to parse */
 	for (i=0 ; buf[i] ; i++) {
@@ -396,11 +418,24 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		inst = ctx[inst_num];
 		while (inst) {
 			/* parse the first tokens of the inst */
-			if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+			if (nb_token &&
+			    match_inst(inst, buf, nb_token, NULL, 0,
+				       &dyn_tokens))
 				goto next;
 
 			debug_printf("instruction match\n");
-			token_p = inst->tokens[nb_token];
+			if (!inst->tokens[0]) {
+				if (nb_token <
+				    (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+					if (!dyn_tokens[nb_token])
+						inst->f(&dyn_tokens[nb_token],
+							NULL,
+							&dyn_tokens);
+					token_p = dyn_tokens[nb_token];
+				} else
+					token_p = NULL;
+			} else
+				token_p = inst->tokens[nb_token];
 			if (token_p)
 				memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -490,10 +525,21 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		/* we need to redo it */
 		inst = ctx[inst_num];
 
-		if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+		if (nb_token &&
+		    match_inst(inst, buf, nb_token, NULL, 0, &dyn_tokens))
 			goto next2;
 
-		token_p = inst->tokens[nb_token];
+		if (!inst->tokens[0]) {
+			if (nb_token < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!dyn_tokens[nb_token])
+					inst->f(&dyn_tokens[nb_token],
+						NULL,
+						&dyn_tokens);
+				token_p = dyn_tokens[nb_token];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[nb_token];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
diff --git a/lib/librte_cmdline/cmdline_parse.h b/lib/librte_cmdline/cmdline_parse.h
index 4ac05d6..65b18d4 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/cmdline_parse.h
@@ -83,6 +83,9 @@ extern "C" {
 /* maximum buffer size for parsed result */
 #define CMDLINE_PARSE_RESULT_BUFSIZE 8192
 
+/* maximum number of dynamic tokens */
+#define CMDLINE_PARSE_DYNAMIC_TOKENS 128
+
 /**
  * Stores a pointer to the ops struct, and the offset: the place to
  * write the parsed result in the destination structure.
@@ -130,6 +133,24 @@ struct cmdline;
  * Store a instruction, which is a pointer to a callback function and
  * its parameter that is called when the instruction is parsed, a help
  * string, and a list of token composing this instruction.
+ *
+ * When no tokens are defined (tokens[0] == NULL), they are retrieved
+ * dynamically by calling f() as follows:
+ *
+ *  f((struct cmdline_token_hdr **)&token_hdr,
+ *    NULL,
+ *    (struct cmdline_token_hdr *[])tokens));
+ *
+ * The address of the resulting token is expected at the location pointed by
+ * the first argument. Can be set to NULL to end the list.
+ *
+ * The cmdline argument (struct cmdline *) is always NULL.
+ *
+ * The last argument points to the NULL-terminated list of dynamic tokens
+ * defined so far. Since token_hdr points to an index of that list, the
+ * current index can be derived as follows:
+ *
+ *  int index = token_hdr - &(*tokens)[0];
  */
 struct cmdline_inst {
 	/* f(parsed_struct, data) */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 02/25] doc: add rte_flow prog guide
    2016-12-19 17:48  2%       ` [dpdk-dev] [PATCH v3 01/25] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-12-19 17:48  1%       ` Adrien Mazarguil
  2016-12-19 17:48  2%       ` [dpdk-dev] [PATCH v3 04/25] cmdline: add support for dynamic tokens Adrien Mazarguil
    3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-19 17:48 UTC (permalink / raw)
  To: dev

This documentation is based on the latest RFC submission, subsequently
updated according to feedback from the community.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/prog_guide/index.rst    |    1 +
 doc/guides/prog_guide/rte_flow.rst | 2042 +++++++++++++++++++++++++++++++
 2 files changed, 2043 insertions(+)

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index e5a50a8..ed7f770 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -42,6 +42,7 @@ Programmer's Guide
     mempool_lib
     mbuf_lib
     poll_mode_drv
+    rte_flow
     cryptodev_lib
     link_bonding_poll_mode_drv_lib
     timer_lib
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
new file mode 100644
index 0000000..73fe809
--- /dev/null
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -0,0 +1,2042 @@
+..  BSD LICENSE
+    Copyright 2016 6WIND S.A.
+    Copyright 2016 Mellanox.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Generic_flow_API:
+
+Generic flow API (rte_flow)
+===========================
+
+Overview
+--------
+
+This API provides a generic means to configure hardware to match specific
+ingress or egress traffic, alter its fate and query related counters
+according to any number of user-defined rules.
+
+It is named *rte_flow* after the prefix used for all its symbols, and is
+defined in ``rte_flow.h``.
+
+- Matching can be performed on packet data (protocol headers, payload) and
+  properties (e.g. associated physical port, virtual device function ID).
+
+- Possible operations include dropping traffic, diverting it to specific
+  queues, to virtual/physical device functions or ports, performing tunnel
+  offloads, adding marks and so on.
+
+It is slightly higher-level than the legacy filtering framework which it
+encompasses and supersedes (including all functions and filter types) in
+order to expose a single interface with an unambiguous behavior that is
+common to all poll-mode drivers (PMDs).
+
+Several methods to migrate existing applications are described in `API
+migration`_.
+
+Flow rule
+---------
+
+Description
+~~~~~~~~~~~
+
+A flow rule is the combination of attributes with a matching pattern and a
+list of actions. Flow rules form the basis of this API.
+
+Flow rules can have several distinct actions (such as counting,
+encapsulating, decapsulating before redirecting packets to a particular
+queue, etc.), instead of relying on several rules to achieve this and having
+applications deal with hardware implementation details regarding their
+order.
+
+Support for different priority levels on a rule basis is provided, for
+example in order to force a more specific rule to come before a more generic
+one for packets matched by both. However hardware support for more than a
+single priority level cannot be guaranteed. When supported, the number of
+available priority levels is usually low, which is why they can also be
+implemented in software by PMDs (e.g. missing priority levels may be
+emulated by reordering rules).
+
+In order to remain as hardware-agnostic as possible, by default all rules
+are considered to have the same priority, which means that the order between
+overlapping rules (when a packet is matched by several filters) is
+undefined.
+
+PMDs may refuse to create overlapping rules at a given priority level when
+they can be detected (e.g. if a pattern matches an existing filter).
+
+Thus predictable results for a given priority level can only be achieved
+with non-overlapping rules, using perfect matching on all protocol layers.
+
+Flow rules can also be grouped, the flow rule priority is specific to the
+group they belong to. All flow rules in a given group are thus processed
+either before or after another group.
+
+Support for multiple actions per rule may be implemented internally on top
+of non-default hardware priorities, as a result both features may not be
+simultaneously available to applications.
+
+Considering that allowed pattern/actions combinations cannot be known in
+advance and would result in an impractically large number of capabilities to
+expose, a method is provided to validate a given rule from the current
+device configuration state.
+
+This enables applications to check if the rule types they need is supported
+at initialization time, before starting their data path. This method can be
+used anytime, its only requirement being that the resources needed by a rule
+should exist (e.g. a target RX queue should be configured first).
+
+Each defined rule is associated with an opaque handle managed by the PMD,
+applications are responsible for keeping it. These can be used for queries
+and rules management, such as retrieving counters or other data and
+destroying them.
+
+To avoid resource leaks on the PMD side, handles must be explicitly
+destroyed by the application before releasing associated resources such as
+queues and ports.
+
+The following sections cover:
+
+- **Attributes** (represented by ``struct rte_flow_attr``): properties of a
+  flow rule such as its direction (ingress or egress) and priority.
+
+- **Pattern item** (represented by ``struct rte_flow_item``): part of a
+  matching pattern that either matches specific packet data or traffic
+  properties. It can also describe properties of the pattern itself, such as
+  inverted matching.
+
+- **Matching pattern**: traffic properties to look for, a combination of any
+  number of items.
+
+- **Actions** (represented by ``struct rte_flow_action``): operations to
+  perform whenever a packet is matched by a pattern.
+
+Attributes
+~~~~~~~~~~
+
+Attribute: Group
+^^^^^^^^^^^^^^^^
+
+Flow rules can be grouped by assigning them a common group number. Lower
+values have higher priority. Group 0 has the highest priority.
+
+Although optional, applications are encouraged to group similar rules as
+much as possible to fully take advantage of hardware capabilities
+(e.g. optimized matching) and work around limitations (e.g. a single pattern
+type possibly allowed in a given group).
+
+Note that support for more than a single group is not guaranteed.
+
+Attribute: Priority
+^^^^^^^^^^^^^^^^^^^
+
+A priority level can be assigned to a flow rule. Like groups, lower values
+denote higher priority, with 0 as the maximum.
+
+A rule with priority 0 in group 8 is always matched after a rule with
+priority 8 in group 0.
+
+Group and priority levels are arbitrary and up to the application, they do
+not need to be contiguous nor start from 0, however the maximum number
+varies between devices and may be affected by existing flow rules.
+
+If a packet is matched by several rules of a given group for a given
+priority level, the outcome is undefined. It can take any path, may be
+duplicated or even cause unrecoverable errors.
+
+Note that support for more than a single priority level is not guaranteed.
+
+Attribute: Traffic direction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+
+Several pattern items and actions are valid and can be used in both
+directions. At least one direction must be specified.
+
+Specifying both directions at once for a given rule is not recommended but
+may be valid in a few cases (e.g. shared counters).
+
+Pattern item
+~~~~~~~~~~~~
+
+Pattern items fall in two categories:
+
+- Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+  IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+  specification structure.
+
+- Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF,
+  VF, PORT and so on), often without a specification structure.
+
+Item specification structures are used to match specific values among
+protocol fields (or item properties). Documentation describes for each item
+whether they are associated with one and their type name if so.
+
+Up to three structures of the same type can be set for a given item:
+
+- ``spec``: values to match (e.g. a given IPv4 address).
+
+- ``last``: upper bound for an inclusive range with corresponding fields in
+  ``spec``.
+
+- ``mask``: bit-mask applied to both ``spec`` and ``last`` whose purpose is
+  to distinguish the values to take into account and/or partially mask them
+  out (e.g. in order to match an IPv4 address prefix).
+
+Usage restrictions and expected behavior:
+
+- Setting either ``mask`` or ``last`` without ``spec`` is an error.
+
+- Field values in ``last`` which are either 0 or equal to the corresponding
+  values in ``spec`` are ignored; they do not generate a range. Nonzero
+  values lower than those in ``spec`` are not supported.
+
+- Setting ``spec`` and optionally ``last`` without ``mask`` causes the PMD
+  to only take the fields it can recognize into account. There is no error
+  checking for unsupported fields.
+
+- Not setting any of them (assuming item type allows it) uses default
+  parameters that depend on the item type. Most of the time, particularly
+  for protocol header items, it is equivalent to providing an empty (zeroed)
+  ``mask``.
+
+- ``mask`` is a simple bit-mask applied before interpreting the contents of
+  ``spec`` and ``last``, which may yield unexpected results if not used
+  carefully. For example, if for an IPv4 address field, ``spec`` provides
+  *10.1.2.3*, ``last`` provides *10.3.4.5* and ``mask`` provides
+  *255.255.0.0*, the effective range becomes *10.1.0.0* to *10.3.255.255*.
+
+Example of an item specification matching an Ethernet header:
+
+.. _table_rte_flow_pattern_item_example:
+
+.. table:: Ethernet item
+
+   +----------+----------+--------------------+
+   | Field    | Subfield | Value              |
+   +==========+==========+====================+
+   | ``spec`` | ``src``  | ``00:01:02:03:04`` |
+   |          +----------+--------------------+
+   |          | ``dst``  | ``00:2a:66:00:01`` |
+   |          +----------+--------------------+
+   |          | ``type`` | ``0x22aa``         |
+   +----------+----------+--------------------+
+   | ``last`` | unspecified                   |
+   +----------+----------+--------------------+
+   | ``mask`` | ``src``  | ``00:ff:ff:ff:00`` |
+   |          +----------+--------------------+
+   |          | ``dst``  | ``00:00:00:00:ff`` |
+   |          +----------+--------------------+
+   |          | ``type`` | ``0x0000``         |
+   +----------+----------+--------------------+
+
+Non-masked bits stand for any value (shown as ``?`` below), Ethernet headers
+with the following properties are thus matched:
+
+- ``src``: ``??:01:02:03:??``
+- ``dst``: ``??:??:??:??:01``
+- ``type``: ``0x????``
+
+Matching pattern
+~~~~~~~~~~~~~~~~
+
+A pattern is formed by stacking items starting from the lowest protocol
+layer to match. This stacking restriction does not apply to meta items which
+can be placed anywhere in the stack without affecting the meaning of the
+resulting pattern.
+
+Patterns are terminated by END items.
+
+Examples:
+
+.. _table_rte_flow_tcpv4_as_l4:
+
+.. table:: TCPv4 as L4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | IPv4     |
+   +-------+----------+
+   | 2     | TCP      |
+   +-------+----------+
+   | 3     | END      |
+   +-------+----------+
+
+|
+
+.. _table_rte_flow_tcpv6_in_vxlan:
+
+.. table:: TCPv6 in VXLAN
+
+   +-------+------------+
+   | Index | Item       |
+   +=======+============+
+   | 0     | Ethernet   |
+   +-------+------------+
+   | 1     | IPv4       |
+   +-------+------------+
+   | 2     | UDP        |
+   +-------+------------+
+   | 3     | VXLAN      |
+   +-------+------------+
+   | 4     | Ethernet   |
+   +-------+------------+
+   | 5     | IPv6       |
+   +-------+------------+
+   | 6     | TCP        |
+   +-------+------------+
+   | 7     | END        |
+   +-------+------------+
+
+|
+
+.. _table_rte_flow_tcpv4_as_l4_meta:
+
+.. table:: TCPv4 as L4 with meta items
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | VOID     |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | VOID     |
+   +-------+----------+
+   | 3     | IPv4     |
+   +-------+----------+
+   | 4     | TCP      |
+   +-------+----------+
+   | 5     | VOID     |
+   +-------+----------+
+   | 6     | VOID     |
+   +-------+----------+
+   | 7     | END      |
+   +-------+----------+
+
+The above example shows how meta items do not affect packet data matching
+items, as long as those remain stacked properly. The resulting matching
+pattern is identical to "TCPv4 as L4".
+
+.. _table_rte_flow_udpv6_anywhere:
+
+.. table:: UDPv6 anywhere
+
+   +-------+------+
+   | Index | Item |
+   +=======+======+
+   | 0     | IPv6 |
+   +-------+------+
+   | 1     | UDP  |
+   +-------+------+
+   | 2     | END  |
+   +-------+------+
+
+If supported by the PMD, omitting one or several protocol layers at the
+bottom of the stack as in the above example (missing an Ethernet
+specification) enables looking up anywhere in packets.
+
+It is unspecified whether the payload of supported encapsulations
+(e.g. VXLAN payload) is matched by such a pattern, which may apply to inner,
+outer or both packets.
+
+.. _table_rte_flow_invalid_l3:
+
+.. table:: Invalid, missing L3
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | UDP      |
+   +-------+----------+
+   | 2     | END      |
+   +-------+----------+
+
+The above pattern is invalid due to a missing L3 specification between L2
+(Ethernet) and L4 (UDP). Doing so is only allowed at the bottom and at the
+top of the stack.
+
+Meta item types
+~~~~~~~~~~~~~~~
+
+They match meta-data or affect pattern processing instead of matching packet
+data directly, most of them do not need a specification structure. This
+particularity allows them to be specified anywhere in the stack without
+causing any side effect.
+
+Item: ``END``
+^^^^^^^^^^^^^
+
+End marker for item lists. Prevents further processing of items, thereby
+ending the pattern.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_end:
+
+.. table:: END
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Item: ``VOID``
+^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_void:
+
+.. table:: VOID
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+One usage example for this type is generating rules that share a common
+prefix quickly without reallocating memory, only by updating item types:
+
+.. _table_rte_flow_item_void_example:
+
+.. table:: TCP, UDP or ICMP as L4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | Ethernet           |
+   +-------+--------------------+
+   | 1     | IPv4               |
+   +-------+------+------+------+
+   | 2     | UDP  | VOID | VOID |
+   +-------+------+------+------+
+   | 3     | VOID | TCP  | VOID |
+   +-------+------+------+------+
+   | 4     | VOID | VOID | ICMP |
+   +-------+------+------+------+
+   | 5     | END                |
+   +-------+--------------------+
+
+Item: ``INVERT``
+^^^^^^^^^^^^^^^^
+
+Inverted matching, i.e. process packets that do not match the pattern.
+
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_invert:
+
+.. table:: INVERT
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Usage example, matching non-TCPv4 packets only:
+
+.. _table_rte_flow_item_invert_example:
+
+.. table:: Anything but TCPv4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | INVERT   |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | IPv4     |
+   +-------+----------+
+   | 3     | TCP      |
+   +-------+----------+
+   | 4     | END      |
+   +-------+----------+
+
+Item: ``PF``
+^^^^^^^^^^^^
+
+Matches packets addressed to the physical function of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `Action: PF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if applied to a VF
+  device.
+- Can be combined with any number of `Item: VF`_ to match both PF and VF
+  traffic.
+- ``spec``, ``last`` and ``mask`` must not be set.
+
+.. _table_rte_flow_item_pf:
+
+.. table:: PF
+
+   +----------+-------+
+   | Field    | Value |
+   +==========+=======+
+   | ``spec`` | unset |
+   +----------+-------+
+   | ``last`` | unset |
+   +----------+-------+
+   | ``mask`` | unset |
+   +----------+-------+
+
+Item: ``VF``
+^^^^^^^^^^^^
+
+Matches packets addressed to a virtual function ID of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `Action: VF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if this causes a VF
+  device to match traffic addressed to a different VF.
+- Can be specified multiple times to match traffic addressed to several VF
+  IDs.
+- Can be combined with a PF item to match both PF and VF traffic.
+
+.. _table_rte_flow_item_vf:
+
+.. table:: VF
+
+   +----------+----------+---------------------------+
+   | Field    | Subfield | Value                     |
+   +==========+==========+===========================+
+   | ``spec`` | ``id``   | destination VF ID         |
+   +----------+----------+---------------------------+
+   | ``last`` | ``id``   | upper range value         |
+   +----------+----------+---------------------------+
+   | ``mask`` | ``id``   | zeroed to match any VF ID |
+   +----------+----------+---------------------------+
+
+Item: ``PORT``
+^^^^^^^^^^^^^^
+
+Matches packets coming from the specified physical port of the underlying
+device.
+
+The first PORT item overrides the physical port normally associated with the
+specified DPDK input port (port_id). This item can be provided several times
+to match additional physical ports.
+
+Note that physical ports are not necessarily tied to DPDK input ports
+(port_id) when those are not under DPDK control. Possible values are
+specific to each device, they are not necessarily indexed from zero and may
+not be contiguous.
+
+As a device property, the list of allowed values as well as the value
+associated with a port_id should be retrieved by other means.
+
+.. _table_rte_flow_item_port:
+
+.. table:: PORT
+
+   +----------+-----------+--------------------------------+
+   | Field    | Subfield  | Value                          |
+   +==========+===========+================================+
+   | ``spec`` | ``index`` | physical port index            |
+   +----------+-----------+--------------------------------+
+   | ``last`` | ``index`` | upper range value              |
+   +----------+-----------+--------------------------------+
+   | ``mask`` | ``index`` | zeroed to match any port index |
+   +----------+-----------+--------------------------------+
+
+Data matching item types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most of these are basically protocol header definitions with associated
+bit-masks. They must be specified (stacked) from lowest to highest protocol
+layer to form a matching pattern.
+
+The following list is not exhaustive, new protocols will be added in the
+future.
+
+Item: ``ANY``
+^^^^^^^^^^^^^
+
+Matches any protocol in place of the current layer, a single ANY may also
+stand for several protocol layers.
+
+This is usually specified as the first pattern item when looking for a
+protocol anywhere in a packet.
+
+.. _table_rte_flow_item_any:
+
+.. table:: ANY
+
+   +----------+----------+--------------------------------------+
+   | Field    | Subfield | Value                                |
+   +==========+==========+======================================+
+   | ``spec`` | ``num``  | number of layers covered             |
+   +----------+----------+--------------------------------------+
+   | ``last`` | ``num``  | upper range value                    |
+   +----------+----------+--------------------------------------+
+   | ``mask`` | ``num``  | zeroed to cover any number of layers |
+   +----------+----------+--------------------------------------+
+
+Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
+and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
+or IPv6) matched by the second ANY specification:
+
+.. _table_rte_flow_item_any_example:
+
+.. table:: TCP in VXLAN with wildcards
+
+   +-------+------+----------+----------+-------+
+   | Index | Item | Field    | Subfield | Value |
+   +=======+======+==========+==========+=======+
+   | 0     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 1     | ANY  | ``spec`` | ``num``  | 2     |
+   +-------+------+----------+----------+-------+
+   | 2     | VXLAN                              |
+   +-------+------------------------------------+
+   | 3     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 4     | ANY  | ``spec`` | ``num``  | 1     |
+   +-------+------+----------+----------+-------+
+   | 5     | TCP                                |
+   +-------+------------------------------------+
+   | 6     | END                                |
+   +-------+------------------------------------+
+
+Item: ``RAW``
+^^^^^^^^^^^^^
+
+Matches a byte string of a given length at a given offset.
+
+Offset is either absolute (using the start of the packet) or relative to the
+end of the previous matched item in the stack, in which case negative values
+are allowed.
+
+If search is enabled, offset is used as the starting point. The search area
+can be delimited by setting limit to a nonzero value, which is the maximum
+number of bytes after offset where the pattern may start.
+
+Matching a zero-length pattern is allowed, doing so resets the relative
+offset for subsequent items.
+
+- This type does not support ranges (``last`` field).
+
+.. _table_rte_flow_item_raw:
+
+.. table:: RAW
+
+   +----------+--------------+-------------------------------------------------+
+   | Field    | Subfield     | Value                                           |
+   +==========+==============+=================================================+
+   | ``spec`` | ``relative`` | look for pattern after the previous item        |
+   |          +--------------+-------------------------------------------------+
+   |          | ``search``   | search pattern from offset (see also ``limit``) |
+   |          +--------------+-------------------------------------------------+
+   |          | ``reserved`` | reserved, must be set to zero                   |
+   |          +--------------+-------------------------------------------------+
+   |          | ``offset``   | absolute or relative offset for ``pattern``     |
+   |          +--------------+-------------------------------------------------+
+   |          | ``limit``    | search area limit for start of ``pattern``      |
+   |          +--------------+-------------------------------------------------+
+   |          | ``length``   | ``pattern`` length                              |
+   |          +--------------+-------------------------------------------------+
+   |          | ``pattern``  | byte string to look for                         |
+   +----------+--------------+-------------------------------------------------+
+   | ``last`` | if specified, either all 0 or with the same values as ``spec`` |
+   +----------+----------------------------------------------------------------+
+   | ``mask`` | bit-mask applied to ``spec`` values with usual behavior        |
+   +----------+----------------------------------------------------------------+
+
+Example pattern looking for several strings at various offsets of a UDP
+payload, using combined RAW items:
+
+.. _table_rte_flow_item_raw_example:
+
+.. table:: UDP payload matching
+
+   +-------+------+----------+--------------+-------+
+   | Index | Item | Field    | Subfield     | Value |
+   +=======+======+==========+==============+=======+
+   | 0     | Ethernet                               |
+   +-------+----------------------------------------+
+   | 1     | IPv4                                   |
+   +-------+----------------------------------------+
+   | 2     | UDP                                    |
+   +-------+------+----------+--------------+-------+
+   | 3     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 10    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "foo" |
+   +-------+------+----------+--------------+-------+
+   | 4     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 20    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "bar" |
+   +-------+------+----------+--------------+-------+
+   | 5     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | -29   |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "baz" |
+   +-------+------+----------+--------------+-------+
+   | 6     | END                                    |
+   +-------+----------------------------------------+
+
+This translates to:
+
+- Locate "foo" at least 10 bytes deep inside UDP payload.
+- Locate "bar" after "foo" plus 20 bytes.
+- Locate "baz" after "bar" minus 29 bytes.
+
+Such a packet may be represented as follows (not to scale)::
+
+ 0                     >= 10 B           == 20 B
+ |                  |<--------->|     |<--------->|
+ |                  |           |     |           |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+ | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+                          |                             |
+                          |<--------------------------->|
+                                      == 29 B
+
+Note that matching subsequent pattern items would resume after "baz", not
+"bar" since matching is always performed after the previous item of the
+stack.
+
+Item: ``ETH``
+^^^^^^^^^^^^^
+
+Matches an Ethernet header.
+
+- ``dst``: destination MAC.
+- ``src``: source MAC.
+- ``type``: EtherType.
+
+Item: ``VLAN``
+^^^^^^^^^^^^^^
+
+Matches an 802.1Q/ad VLAN tag.
+
+- ``tpid``: tag protocol identifier.
+- ``tci``: tag control information.
+
+Item: ``IPV4``
+^^^^^^^^^^^^^^
+
+Matches an IPv4 header.
+
+Note: IPv4 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv4 header definition (``rte_ip.h``).
+
+Item: ``IPV6``
+^^^^^^^^^^^^^^
+
+Matches an IPv6 header.
+
+Note: IPv6 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv6 header definition (``rte_ip.h``).
+
+Item: ``ICMP``
+^^^^^^^^^^^^^^
+
+Matches an ICMP header.
+
+- ``hdr``: ICMP header definition (``rte_icmp.h``).
+
+Item: ``UDP``
+^^^^^^^^^^^^^
+
+Matches a UDP header.
+
+- ``hdr``: UDP header definition (``rte_udp.h``).
+
+Item: ``TCP``
+^^^^^^^^^^^^^
+
+Matches a TCP header.
+
+- ``hdr``: TCP header definition (``rte_tcp.h``).
+
+Item: ``SCTP``
+^^^^^^^^^^^^^^
+
+Matches a SCTP header.
+
+- ``hdr``: SCTP header definition (``rte_sctp.h``).
+
+Item: ``VXLAN``
+^^^^^^^^^^^^^^^
+
+Matches a VXLAN header (RFC 7348).
+
+- ``flags``: normally 0x08 (I flag).
+- ``rsvd0``: reserved, normally 0x000000.
+- ``vni``: VXLAN network identifier.
+- ``rsvd1``: reserved, normally 0x00.
+
+Actions
+~~~~~~~
+
+Each possible action is represented by a type. Some have associated
+configuration structures. Several actions combined in a list can be affected
+to a flow rule. That list is not ordered.
+
+They fall in three categories:
+
+- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+  processing matched packets by subsequent flow rules, unless overridden
+  with PASSTHRU.
+
+- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for
+  additional processing by subsequent flow rules.
+
+- Other non-terminating meta actions that do not affect the fate of packets
+  (END, VOID, MARK, FLAG, COUNT).
+
+When several actions are combined in a flow rule, they should all have
+different types (e.g. dropping a packet twice is not possible).
+
+Only the last action of a given type is taken into account. PMDs still
+perform error checking on the entire list.
+
+Like matching patterns, action lists are terminated by END items.
+
+*Note that PASSTHRU is the only action able to override a terminating rule.*
+
+Example of action that redirects packets to queue index 10:
+
+.. _table_rte_flow_action_example:
+
+.. table:: Queue action
+
+   +-----------+-------+
+   | Field     | Value |
+   +===========+=======+
+   | ``index`` | 10    |
+   +-----------+-------+
+
+Action lists examples, their order is not significant, applications must
+consider all actions to be performed simultaneously:
+
+.. _table_rte_flow_count_and_drop:
+
+.. table:: Count and drop
+
+   +-------+--------+
+   | Index | Action |
+   +=======+========+
+   | 0     | COUNT  |
+   +-------+--------+
+   | 1     | DROP   |
+   +-------+--------+
+   | 2     | END    |
+   +-------+--------+
+
+|
+
+.. _table_rte_flow_mark_count_redirect:
+
+.. table:: Mark, count and redirect
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | MARK   | ``mark``  | 0x2a  |
+   +-------+--------+-----------+-------+
+   | 1     | COUNT                      |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``queue`` | 10    |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+|
+
+.. _table_rte_flow_redirect_queue_5:
+
+.. table:: Redirect to queue 5
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | DROP                       |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``queue`` | 5     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+In the above example, considering both actions are performed simultaneously,
+the end result is that only QUEUE has any effect.
+
+.. _table_rte_flow_redirect_queue_3:
+
+.. table:: Redirect to queue 3
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | QUEUE  | ``queue`` | 5     |
+   +-------+--------+-----------+-------+
+   | 1     | VOID                       |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``queue`` | 3     |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+As previously described, only the last action of a given type found in the
+list is taken into account. The above example also shows that VOID is
+ignored.
+
+Action types
+~~~~~~~~~~~~
+
+Common action types are described in this section. Like pattern item types,
+this list is not exhaustive as new actions will be added in the future.
+
+Action: ``END``
+^^^^^^^^^^^^^^^
+
+End marker for action lists. Prevents further processing of actions, thereby
+ending the list.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_end:
+
+.. table:: END
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VOID``
+^^^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_void:
+
+.. table:: VOID
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``PASSTHRU``
+^^^^^^^^^^^^^^^^^^^^
+
+Leaves packets up for additional processing by subsequent flow rules. This
+is the default when a rule does not contain a terminating action, but can be
+specified to force a rule to become non-terminating.
+
+- No configurable properties.
+
+.. _table_rte_flow_action_passthru:
+
+.. table:: PASSTHRU
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Example to copy a packet to a queue and continue processing by subsequent
+flow rules:
+
+.. _table_rte_flow_action_passthru_example:
+
+.. table:: Copy to queue 8
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | PASSTHRU                   |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``queue`` | 8     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+Action: ``MARK``
+^^^^^^^^^^^^^^^^
+
+Attaches a 32 bit value to packets.
+
+This value is arbitrary and application-defined. For compatibility with FDIR
+it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
+also set in ``ol_flags``.
+
+.. _table_rte_flow_action_mark:
+
+.. table:: MARK
+
+   +--------+-------------------------------------+
+   | Field  | Value                               |
+   +========+=====================================+
+   | ``id`` | 32 bit value to return with packets |
+   +--------+-------------------------------------+
+
+Action: ``FLAG``
+^^^^^^^^^^^^^^^^
+
+Flag packets. Similar to `Action: MARK`_ but only affects ``ol_flags``.
+
+- No configurable properties.
+
+Note: a distinctive flag must be defined for it.
+
+.. _table_rte_flow_action_flag:
+
+.. table:: FLAG
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``QUEUE``
+^^^^^^^^^^^^^^^^^
+
+Assigns packets to a given queue index.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_queue:
+
+.. table:: QUEUE
+
+   +-----------+--------------------+
+   | Field     | Value              |
+   +===========+====================+
+   | ``index`` | queue index to use |
+   +-----------+--------------------+
+
+Action: ``DROP``
+^^^^^^^^^^^^^^^^
+
+Drop packets.
+
+- No configurable properties.
+- Terminating by default.
+- PASSTHRU overrides this action if both are specified.
+
+.. _table_rte_flow_action_drop:
+
+.. table:: DROP
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``COUNT``
+^^^^^^^^^^^^^^^^^
+
+Enables counters for this rule.
+
+These counters can be retrieved and reset through ``rte_flow_query()``, see
+``struct rte_flow_query_count``.
+
+- Counters can be retrieved with ``rte_flow_query()``.
+- No configurable properties.
+
+.. _table_rte_flow_action_count:
+
+.. table:: COUNT
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Query structure to retrieve and reset flow rule counters:
+
+.. _table_rte_flow_query_count:
+
+.. table:: COUNT query
+
+   +---------------+-----+-----------------------------------+
+   | Field         | I/O | Value                             |
+   +===============+=====+===================================+
+   | ``reset``     | in  | reset counter after query         |
+   +---------------+-----+-----------------------------------+
+   | ``hits_set``  | out | ``hits`` field is set             |
+   +---------------+-----+-----------------------------------+
+   | ``bytes_set`` | out | ``bytes`` field is set            |
+   +---------------+-----+-----------------------------------+
+   | ``hits``      | out | number of hits for this rule      |
+   +---------------+-----+-----------------------------------+
+   | ``bytes``     | out | number of bytes through this rule |
+   +---------------+-----+-----------------------------------+
+
+Action: ``DUP``
+^^^^^^^^^^^^^^^
+
+Duplicates packets to a given queue index.
+
+This is normally combined with QUEUE, however when used alone, it is
+actually similar to QUEUE + PASSTHRU.
+
+- Non-terminating by default.
+
+.. _table_rte_flow_action_dup:
+
+.. table:: DUP
+
+   +-----------+------------------------------------+
+   | Field     | Value                              |
+   +===========+====================================+
+   | ``index`` | queue index to duplicate packet to |
+   +-----------+------------------------------------+
+
+Action: ``RSS``
+^^^^^^^^^^^^^^^
+
+Similar to QUEUE, except RSS is additionally performed on packets to spread
+them among several queues according to the provided parameters.
+
+Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
+however it conflicts with `Action: MARK`_ as they share the same space. When
+both actions are specified, the RSS hash is discarded and
+``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
+structure should eventually evolve to store both.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_rss:
+
+.. table:: RSS
+
+   +--------------+------------------------------+
+   | Field        | Value                        |
+   +==============+==============================+
+   | ``rss_conf`` | RSS parameters               |
+   +--------------+------------------------------+
+   | ``num``      | number of entries in queue[] |
+   +--------------+------------------------------+
+   | ``queue[]``  | queue indices to use         |
+   +--------------+------------------------------+
+
+Action: ``PF``
+^^^^^^^^^^^^^^
+
+Redirects packets to the physical function (PF) of the current device.
+
+- No configurable properties.
+- Terminating by default.
+
+.. _table_rte_flow_action_pf:
+
+.. table:: PF
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VF``
+^^^^^^^^^^^^^^
+
+Redirects packets to a virtual function (VF) of the current device.
+
+Packets matched by a VF pattern item can be redirected to their original VF
+ID instead of the specified one. This parameter may not be available and is
+not guaranteed to work properly if the VF part is matched by a prior flow
+rule or if packets are not addressed to a VF in the first place.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_vf:
+
+.. table:: VF
+
+   +--------------+--------------------------------+
+   | Field        | Value                          |
+   +==============+================================+
+   | ``original`` | use original VF ID if possible |
+   +--------------+--------------------------------+
+   | ``vf``       | VF ID to redirect packets to   |
+   +--------------+--------------------------------+
+
+Negative types
+~~~~~~~~~~~~~~
+
+All specified pattern items (``enum rte_flow_item_type``) and actions
+(``enum rte_flow_action_type``) use positive identifiers.
+
+The negative space is reserved for dynamic types generated by PMDs during
+run-time. PMDs may encounter them as a result but must not accept negative
+identifiers they are not aware of.
+
+A method to generate them remains to be defined.
+
+Planned types
+~~~~~~~~~~~~~
+
+Pattern item types will be added as new protocols are implemented.
+
+Variable headers support through dedicated pattern items, for example in
+order to match specific IPv4 options and IPv6 extension headers would be
+stacked after IPv4/IPv6 items.
+
+Other action types are planned but are not defined yet. These include the
+ability to alter packet data in several ways, such as performing
+encapsulation/decapsulation of tunnel headers.
+
+Rules management
+----------------
+
+A rather simple API with few functions is provided to fully manage flow
+rules.
+
+Each created flow rule is associated with an opaque, PMD-specific handle
+pointer. The application is responsible for keeping it until the rule is
+destroyed.
+
+Flows rules are represented by ``struct rte_flow`` objects.
+
+Validation
+~~~~~~~~~~
+
+Given that expressing a definite set of device capabilities is not
+practical, a dedicated function is provided to check if a flow rule is
+supported and can be created.
+
+.. code-block:: c
+
+   int
+   rte_flow_validate(uint8_t port_id,
+                     const struct rte_flow_attr *attr,
+                     const struct rte_flow_item pattern[],
+                     const struct rte_flow_action actions[],
+                     struct rte_flow_error *error);
+
+While this function has no effect on the target device, the flow rule is
+validated against its current configuration state and the returned value
+should be considered valid by the caller for that state only.
+
+The returned value is guaranteed to remain valid only as long as no
+successful calls to ``rte_flow_create()`` or ``rte_flow_destroy()`` are made
+in the meantime and no device parameter affecting flow rules in any way are
+modified, due to possible collisions or resource limitations (although in
+such cases ``EINVAL`` should not be returned).
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 if flow rule is valid and can be created. A negative errno value
+  otherwise (``rte_errno`` is also set), the following errors are defined.
+- ``-ENOSYS``: underlying device does not support this functionality.
+- ``-EINVAL``: unknown or invalid rule specification.
+- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
+  bit-masks are unsupported).
+- ``-EEXIST``: collision with an existing rule.
+- ``-ENOMEM``: not enough resources.
+- ``-EBUSY``: action cannot be performed due to busy device resources, may
+  succeed if the affected queues or even the entire port are in a stopped
+  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
+
+Creation
+~~~~~~~~
+
+Creating a flow rule is similar to validating one, except the rule is
+actually created and a handle returned.
+
+.. code-block:: c
+
+   struct rte_flow *
+   rte_flow_create(uint8_t port_id,
+                   const struct rte_flow_attr *attr,
+                   const struct rte_flow_item pattern[],
+                   const struct rte_flow_action *actions[],
+                   struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
+to the positive version of one of the error codes defined for
+``rte_flow_validate()``.
+
+Destruction
+~~~~~~~~~~~
+
+Flow rules destruction is not automatic, and a queue or a port should not be
+released if any are still attached to them. Applications must take care of
+performing this step before releasing resources.
+
+.. code-block:: c
+
+   int
+   rte_flow_destroy(uint8_t port_id,
+                    struct rte_flow *flow,
+                    struct rte_flow_error *error);
+
+
+Failure to destroy a flow rule handle may occur when other flow rules depend
+on it, and destroying it would result in an inconsistent state.
+
+This function is only guaranteed to succeed if handles are destroyed in
+reverse order of their creation.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to destroy.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Flush
+~~~~~
+
+Convenience function to destroy all flow rule handles associated with a
+port. They are released as with successive calls to ``rte_flow_destroy()``.
+
+.. code-block:: c
+
+   int
+   rte_flow_flush(uint8_t port_id,
+                  struct rte_flow_error *error);
+
+In the unlikely event of failure, handles are still considered destroyed and
+no longer valid but the port must be assumed to be in an inconsistent state.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Query
+~~~~~
+
+Query an existing flow rule.
+
+This function allows retrieving flow-specific data such as counters. Data
+is gathered by special actions which must be present in the flow rule
+definition.
+
+.. code-block:: c
+
+   int
+   rte_flow_query(uint8_t port_id,
+                  struct rte_flow *flow,
+                  enum rte_flow_action_type action,
+                  void *data,
+                  struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to query.
+- ``action``: action type to query.
+- ``data``: pointer to storage for the associated query data type.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Verbose error reporting
+-----------------------
+
+The defined *errno* values may not be accurate enough for users or
+application developers who want to investigate issues related to flow rules
+management. A dedicated error object is defined for this purpose:
+
+.. code-block:: c
+
+   enum rte_flow_error_type {
+       RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+       RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+       RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+       RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+       RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+       RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+       RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+       RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+   };
+
+   struct rte_flow_error {
+       enum rte_flow_error_type type; /**< Cause field and error types. */
+       const void *cause; /**< Object responsible for the error. */
+       const char *message; /**< Human-readable error message. */
+   };
+
+Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
+remaining fields can be ignored. Other error types describe the type of the
+object pointed by ``cause``.
+
+If non-NULL, ``cause`` points to the object responsible for the error. For a
+flow rule, this may be a pattern item or an individual action.
+
+If non-NULL, ``message`` provides a human-readable error message.
+
+This object is normally allocated by applications and set by PMDs in case of
+error, the message points to a constant string which does not need to be
+freed by the application, however its pointer can be considered valid only
+as long as its associated DPDK port remains configured. Closing the
+underlying device or unloading the PMD invalidates it.
+
+Caveats
+-------
+
+- DPDK does not keep track of flow rules definitions or flow rule objects
+  automatically. Applications may keep track of the former and must keep
+  track of the latter. PMDs may also do it for internal needs, however this
+  must not be relied on by applications.
+
+- Flow rules are not maintained between successive port initializations. An
+  application exiting without releasing them and restarting must re-create
+  them from scratch.
+
+- API operations are synchronous and blocking (``EAGAIN`` cannot be
+  returned).
+
+- There is no provision for reentrancy/multi-thread safety, although nothing
+  should prevent different devices from being configured at the same
+  time. PMDs may protect their control path functions accordingly.
+
+- Stopping the data path (TX/RX) should not be necessary when managing flow
+  rules. If this cannot be achieved naturally or with workarounds (such as
+  temporarily replacing the burst function pointers), an appropriate error
+  code must be returned (``EBUSY``).
+
+- PMDs, not applications, are responsible for maintaining flow rules
+  configuration when stopping and restarting a port or performing other
+  actions which may affect them. They can only be destroyed explicitly by
+  applications.
+
+For devices exposing multiple ports sharing global settings affected by flow
+rules:
+
+- All ports under DPDK control must behave consistently, PMDs are
+  responsible for making sure that existing flow rules on a port are not
+  affected by other ports.
+
+- Ports not under DPDK control (unaffected or handled by other applications)
+  are user's responsibility. They may affect existing flow rules and cause
+  undefined behavior. PMDs aware of this may prevent flow rules creation
+  altogether in such cases.
+
+PMD interface
+-------------
+
+The PMD interface is defined in ``rte_flow_driver.h``. It is not subject to
+API/ABI versioning constraints as it is not exposed to applications and may
+evolve independently.
+
+It is currently implemented on top of the legacy filtering framework through
+filter type *RTE_ETH_FILTER_GENERIC* that accepts the single operation
+*RTE_ETH_FILTER_GET* to return PMD-specific *rte_flow* callbacks wrapped
+inside ``struct rte_flow_ops``.
+
+This overhead is temporarily necessary in order to keep compatibility with
+the legacy filtering framework, which should eventually disappear.
+
+- PMD callbacks implement exactly the interface described in `Rules
+  management`_, except for the port ID argument which has already been
+  converted to a pointer to the underlying ``struct rte_eth_dev``.
+
+- Public API functions do not process flow rules definitions at all before
+  calling PMD functions (no basic error checking, no validation
+  whatsoever). They only make sure these callbacks are non-NULL or return
+  the ``ENOSYS`` (function not supported) error.
+
+This interface additionally defines the following helper functions:
+
+- ``rte_flow_ops_get()``: get generic flow operations structure from a
+  port.
+
+- ``rte_flow_error_set()``: initialize generic flow error structure.
+
+More will be added over time.
+
+Device compatibility
+--------------------
+
+No known implementation supports all the described features.
+
+Unsupported features or combinations are not expected to be fully emulated
+in software by PMDs for performance reasons. Partially supported features
+may be completed in software as long as hardware performs most of the work
+(such as queue redirection and packet recognition).
+
+However PMDs are expected to do their best to satisfy application requests
+by working around hardware limitations as long as doing so does not affect
+the behavior of existing flow rules.
+
+The following sections provide a few examples of such cases and describe how
+PMDs should handle them, they are based on limitations built into the
+previous APIs.
+
+Global bit-masks
+~~~~~~~~~~~~~~~~
+
+Each flow rule comes with its own, per-layer bit-masks, while hardware may
+support only a single, device-wide bit-mask for a given layer type, so that
+two IPv4 rules cannot use different bit-masks.
+
+The expected behavior in this case is that PMDs automatically configure
+global bit-masks according to the needs of the first flow rule created.
+
+Subsequent rules are allowed only if their bit-masks match those, the
+``EEXIST`` error code should be returned otherwise.
+
+Unsupported layer types
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Many protocols can be simulated by crafting patterns with the `Item: RAW`_
+type.
+
+PMDs can rely on this capability to simulate support for protocols with
+headers not directly recognized by hardware.
+
+``ANY`` pattern item
+~~~~~~~~~~~~~~~~~~~~
+
+This pattern item stands for anything, which can be difficult to translate
+to something hardware would understand, particularly if followed by more
+specific types.
+
+Consider the following pattern:
+
+.. _table_rte_flow_unsupported_any:
+
+.. table:: Pattern with ANY as L3
+
+   +-------+-----------------------+
+   | Index | Item                  |
+   +=======+=======================+
+   | 0     | ETHER                 |
+   +-------+-----+---------+-------+
+   | 1     | ANY | ``num`` | ``1`` |
+   +-------+-----+---------+-------+
+   | 2     | TCP                   |
+   +-------+-----------------------+
+   | 3     | END                   |
+   +-------+-----------------------+
+
+Knowing that TCP does not make sense with something other than IPv4 and IPv6
+as L3, such a pattern may be translated to two flow rules instead:
+
+.. _table_rte_flow_unsupported_any_ipv4:
+
+.. table:: ANY replaced with IPV4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETHER              |
+   +-------+--------------------+
+   | 1     | IPV4 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+|
+
+.. _table_rte_flow_unsupported_any_ipv6:
+
+.. table:: ANY replaced with IPV6
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETHER              |
+   +-------+--------------------+
+   | 1     | IPV6 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+Note that as soon as a ANY rule covers several layers, this approach may
+yield a large number of hidden flow rules. It is thus suggested to only
+support the most common scenarios (anything as L2 and/or L3).
+
+Unsupported actions
+~~~~~~~~~~~~~~~~~~~
+
+- When combined with `Action: QUEUE`_, packet counting (`Action: COUNT`_)
+  and tagging (`Action: MARK`_ or `Action: FLAG`_) may be implemented in
+  software as long as the target queue is used by a single rule.
+
+- A rule specifying both `Action: DUP`_ + `Action: QUEUE`_ may be translated
+  to two hidden rules combining `Action: QUEUE`_ and `Action: PASSTHRU`_.
+
+- When a single target queue is provided, `Action: RSS`_ can also be
+  implemented through `Action: QUEUE`_.
+
+Flow rules priority
+~~~~~~~~~~~~~~~~~~~
+
+While it would naturally make sense, flow rules cannot be assumed to be
+processed by hardware in the same order as their creation for several
+reasons:
+
+- They may be managed internally as a tree or a hash table instead of a
+  list.
+- Removing a flow rule before adding another one can either put the new rule
+  at the end of the list or reuse a freed entry.
+- Duplication may occur when packets are matched by several rules.
+
+For overlapping rules (particularly in order to use `Action: PASSTHRU`_)
+predictable behavior is only guaranteed by using different priority levels.
+
+Priority levels are not necessarily implemented in hardware, or may be
+severely limited (e.g. a single priority bit).
+
+For these reasons, priority levels may be implemented purely in software by
+PMDs.
+
+- For devices expecting flow rules to be added in the correct order, PMDs
+  may destroy and re-create existing rules after adding a new one with
+  a higher priority.
+
+- A configurable number of dummy or empty rules can be created at
+  initialization time to save high priority slots for later.
+
+- In order to save priority levels, PMDs may evaluate whether rules are
+  likely to collide and adjust their priority accordingly.
+
+Future evolutions
+-----------------
+
+- A device profile selection function which could be used to force a
+  permanent profile instead of relying on its automatic configuration based
+  on existing flow rules.
+
+- A method to optimize *rte_flow* rules with specific pattern items and
+  action types generated on the fly by PMDs. DPDK should assign negative
+  numbers to these in order to not collide with the existing types. See
+  `Negative types`_.
+
+- Adding specific egress pattern items and actions as described in
+  `Attribute: Traffic direction`_.
+
+- Optional software fallback when PMDs are unable to handle requested flow
+  rules so applications do not have to implement their own.
+
+API migration
+-------------
+
+Exhaustive list of deprecated filter types (normally prefixed with
+*RTE_ETH_FILTER_*) found in ``rte_eth_ctrl.h`` and methods to convert them
+to *rte_flow* rules.
+
+``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*MACVLAN* can be translated to a basic `Item: ETH`_ flow rule with a
+terminating `Action: VF`_ or `Action: PF`_.
+
+.. _table_rte_flow_migration_macvlan:
+
+.. table:: MACVLAN conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | VF,     |
+   |   |     +----------+-----+ PF      |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*ETHERTYPE* is basically an `Item: ETH`_ flow rule with a terminating
+`Action: QUEUE`_ or `Action: DROP`_.
+
+.. _table_rte_flow_migration_ethertype:
+
+.. table:: ETHERTYPE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | QUEUE,  |
+   |   |     +----------+-----+ DROP    |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``FLEXIBLE`` to ``RAW`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FLEXIBLE* can be translated to one `Item: RAW`_ pattern with a terminating
+`Action: QUEUE`_ and a defined priority level.
+
+.. _table_rte_flow_migration_flexible:
+
+.. table:: FLEXIBLE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | RAW | ``spec`` | any | QUEUE   |
+   |   |     +----------+-----+         |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``SYN`` to ``TCP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*SYN* is a `Item: TCP`_ rule with only the ``syn`` bit enabled and masked,
+and a terminating `Action: QUEUE`_.
+
+Priority level can be set to simulate the high priority bit.
+
+.. _table_rte_flow_migration_syn:
+
+.. table:: SYN conversion
+
+   +-----------------------------------+---------+
+   | Pattern                           | Actions |
+   +===+======+==========+=============+=========+
+   | 0 | ETH  | ``spec`` | unset       | QUEUE   |
+   |   |      +----------+-------------+         |
+   |   |      | ``last`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+-------------+         |
+   | 1 | IPV4 | ``spec`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+---------+---+         |
+   | 2 | TCP  | ``spec`` | ``syn`` | 1 |         |
+   |   |      +----------+---------+---+         |
+   |   |      | ``mask`` | ``syn`` | 1 |         |
+   +---+------+----------+---------+---+---------+
+   | 3 | END                           | END     |
+   +---+-------------------------------+---------+
+
+``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*NTUPLE* is similar to specifying an empty L2, `Item: IPV4`_ as L3 with
+`Item: TCP`_ or `Item: UDP`_ as L4 and a terminating `Action: QUEUE`_.
+
+A priority level can be specified as well.
+
+.. _table_rte_flow_migration_ntuple:
+
+.. table:: NTUPLE conversion
+
+   +-----------------------------+---------+
+   | Pattern                     | Actions |
+   +===+======+==========+=======+=========+
+   | 0 | ETH  | ``spec`` | unset | QUEUE   |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | unset |         |
+   +---+------+----------+-------+         |
+   | 1 | IPV4 | ``spec`` | any   |         |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 2 | TCP, | ``spec`` | any   |         |
+   |   | UDP  +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+---------+
+   | 3 | END                     | END     |
+   +---+-------------------------+---------+
+
+``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*TUNNEL* matches common IPv4 and IPv6 L3/L4-based tunnel types.
+
+In the following table, `Item: ANY`_ is used to cover the optional L4.
+
+.. _table_rte_flow_migration_tunnel:
+
+.. table:: TUNNEL conversion
+
+   +--------------------------------------+---------+
+   | Pattern                              | Actions |
+   +===+=========+==========+=============+=========+
+   | 0 | ETH     | ``spec`` | any         | QUEUE   |
+   |   |         +----------+-------------+         |
+   |   |         | ``last`` | unset       |         |
+   |   |         +----------+-------------+         |
+   |   |         | ``mask`` | any         |         |
+   +---+---------+----------+-------------+         |
+   | 1 | IPV4,   | ``spec`` | any         |         |
+   |   | IPV6    +----------+-------------+         |
+   |   |         | ``last`` | unset       |         |
+   |   |         +----------+-------------+         |
+   |   |         | ``mask`` | any         |         |
+   +---+---------+----------+-------------+         |
+   | 2 | ANY     | ``spec`` | any         |         |
+   |   |         +----------+-------------+         |
+   |   |         | ``last`` | unset       |         |
+   |   |         +----------+---------+---+         |
+   |   |         | ``mask`` | ``num`` | 0 |         |
+   +---+---------+----------+---------+---+         |
+   | 3 | VXLAN,  | ``spec`` | any         |         |
+   |   | GENEVE, +----------+-------------+         |
+   |   | TEREDO, | ``last`` | unset       |         |
+   |   | NVGRE,  +----------+-------------+         |
+   |   | GRE,    | ``mask`` | any         |         |
+   |   | ...     |          |             |         |
+   |   |         |          |             |         |
+   |   |         |          |             |         |
+   +---+---------+----------+-------------+---------+
+   | 4 | END                              | END     |
+   +---+----------------------------------+---------+
+
+``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FDIR* is more complex than any other type, there are several methods to
+emulate its functionality. It is summarized for the most part in the table
+below.
+
+A few features are intentionally not supported:
+
+- The ability to configure the matching input set and masks for the entire
+  device, PMDs should take care of it automatically according to the
+  requested flow rules.
+
+  For example if a device supports only one bit-mask per protocol type,
+  source/address IPv4 bit-masks can be made immutable by the first created
+  rule. Subsequent IPv4 or TCPv4 rules can only be created if they are
+  compatible.
+
+  Note that only protocol bit-masks affected by existing flow rules are
+  immutable, others can be changed later. They become mutable again after
+  the related flow rules are destroyed.
+
+- Returning four or eight bytes of matched data when using flex bytes
+  filtering. Although a specific action could implement it, it conflicts
+  with the much more useful 32 bits tagging on devices that support it.
+
+- Side effects on RSS processing of the entire device. Flow rules that
+  conflict with the current device configuration should not be
+  allowed. Similarly, device configuration should not be allowed when it
+  affects existing flow rules.
+
+- Device modes of operation. "none" is unsupported since filtering cannot be
+  disabled as long as a flow rule is present.
+
+- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
+  according to the created flow rules.
+
+- Signature mode of operation is not defined but could be handled through a
+  specific item type if needed.
+
+.. _table_rte_flow_migration_fdir:
+
+.. table:: FDIR conversion
+
+   +---------------------------------+------------+
+   | Pattern                         | Actions    |
+   +===+============+==========+=====+============+
+   | 0 | ETH,       | ``spec`` | any | QUEUE,     |
+   |   | RAW        +----------+-----+ DROP,      |
+   |   |            | ``last`` | N/A | PASSTHRU   |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+------------+
+   | 1 | IPV4,      | ``spec`` | any | MARK       |
+   |   | IPV6       +----------+-----+            |
+   |   |            | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+            |
+   | 2 | TCP,       | ``spec`` | any |            |
+   |   | UDP,       +----------+-----+            |
+   |   | SCTP       | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+            |
+   | 3 | VF,        | ``spec`` | any |            |
+   |   | PF         +----------+-----+            |
+   |   | (optional) | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+------------+
+   | 4 | END                         | END        |
+   +---+-----------------------------+------------+
+
+``HASH``
+~~~~~~~~
+
+There is no counterpart to this filter type because it translates to a
+global device setting instead of a pattern item. Device settings are
+automatically set according to the created flow rules.
+
+``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All packets are matched. This type alters incoming packets to encapsulate
+them in a chosen tunnel type, optionally redirect them to a VF as well.
+
+The destination pool for tag based forwarding can be emulated with other
+flow rules using `Action: DUP`_.
+
+.. _table_rte_flow_migration_l2tunnel:
+
+.. table:: L2_TUNNEL conversion
+
+   +---------------------------+------------+
+   | Pattern                   | Actions    |
+   +===+======+==========+=====+============+
+   | 0 | VOID | ``spec`` | N/A | VXLAN,     |
+   |   |      |          |     | GENEVE,    |
+   |   |      |          |     | ...        |
+   |   |      +----------+-----+------------+
+   |   |      | ``last`` | N/A | VF         |
+   |   |      +----------+-----+ (optional) |
+   |   |      | ``mask`` | N/A |            |
+   |   |      |          |     |            |
+   +---+------+----------+-----+------------+
+   | 1 | END                   | END        |
+   +---+-----------------------+------------+
-- 
2.1.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v3 01/25] ethdev: introduce generic flow API
  @ 2016-12-19 17:48  2%       ` Adrien Mazarguil
  2016-12-19 17:48  1%       ` [dpdk-dev] [PATCH v3 02/25] doc: add rte_flow prog guide Adrien Mazarguil
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-19 17:48 UTC (permalink / raw)
  To: dev

This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

Benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

Existing filter types will be deprecated and removed in the near future.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 MAINTAINERS                            |   4 +
 doc/api/doxy-api-index.md              |   2 +
 lib/librte_ether/Makefile              |   3 +
 lib/librte_ether/rte_eth_ctrl.h        |   1 +
 lib/librte_ether/rte_ether_version.map |  11 +
 lib/librte_ether/rte_flow.c            | 159 +++++
 lib/librte_ether/rte_flow.h            | 947 ++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow_driver.h     | 182 ++++++
 8 files changed, 1309 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 26d9590..5975cff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -243,6 +243,10 @@ M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
 F: scripts/test-null.sh
 
+Generic flow API
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: lib/librte_ether/rte_flow*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index de65b4c..4951552 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -39,6 +39,8 @@ There are many libraries, so their headers may be grouped by topics:
   [dev]                (@ref rte_dev.h),
   [ethdev]             (@ref rte_ethdev.h),
   [ethctrl]            (@ref rte_eth_ctrl.h),
+  [rte_flow]           (@ref rte_flow.h),
+  [rte_flow_driver]    (@ref rte_flow_driver.h),
   [cryptodev]          (@ref rte_cryptodev.h),
   [devargs]            (@ref rte_devargs.h),
   [bond]               (@ref rte_eth_bond.h),
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..9335361 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
 LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
+SRCS-y += rte_flow.c
 
 #
 # Export include files
@@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
+SYMLINK-y-include += rte_flow_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index fe80eb0..8386904 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -99,6 +99,7 @@ enum rte_filter_type {
 	RTE_ETH_FILTER_FDIR,
 	RTE_ETH_FILTER_HASH,
 	RTE_ETH_FILTER_L2_TUNNEL,
+	RTE_ETH_FILTER_GENERIC,
 	RTE_ETH_FILTER_MAX
 };
 
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..384cdee 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -147,3 +147,14 @@ DPDK_16.11 {
 	rte_eth_dev_pci_remove;
 
 } DPDK_16.07;
+
+DPDK_17.02 {
+	global:
+
+	rte_flow_validate;
+	rte_flow_create;
+	rte_flow_destroy;
+	rte_flow_flush;
+	rte_flow_query;
+
+} DPDK_16.11;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
new file mode 100644
index 0000000..d98fb1b
--- /dev/null
+++ b/lib/librte_ether/rte_flow.c
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_flow_driver.h"
+#include "rte_flow.h"
+
+/* Get generic flow operations structure from a port. */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops;
+	int code;
+
+	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
+		code = ENODEV;
+	else if (unlikely(!dev->dev_ops->filter_ctrl ||
+			  dev->dev_ops->filter_ctrl(dev,
+						    RTE_ETH_FILTER_GENERIC,
+						    RTE_ETH_FILTER_GET,
+						    &ops) ||
+			  !ops))
+		code = ENOSYS;
+	else
+		return ops;
+	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(code));
+	return NULL;
+}
+
+/* Check whether a flow rule can be created on a given port. */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error)
+{
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->validate))
+		return ops->validate(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Create a flow rule on a given port. */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return NULL;
+	if (likely(!!ops->create))
+		return ops->create(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return NULL;
+}
+
+/* Destroy a flow rule on a given port. */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->destroy))
+		return ops->destroy(dev, flow, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Destroy all flow rules associated with a port. */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->flush))
+		return ops->flush(dev, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
+
+/* Query an existing flow rule. */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (!ops)
+		return -rte_errno;
+	if (likely(!!ops->query))
+		return ops->query(dev, flow, action, data, error);
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOSYS));
+	return -rte_errno;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
new file mode 100644
index 0000000..98084ac
--- /dev/null
+++ b/lib/librte_ether/rte_flow.h
@@ -0,0 +1,947 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_H_
+#define RTE_FLOW_H_
+
+/**
+ * @file
+ * RTE generic flow API
+ *
+ * This interface provides the ability to program packet matching and
+ * associated actions in hardware through flow rules.
+ */
+
+#include <rte_arp.h>
+#include <rte_ether.h>
+#include <rte_icmp.h>
+#include <rte_ip.h>
+#include <rte_sctp.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Flow rule attributes.
+ *
+ * Priorities are set on two levels: per group and per rule within groups.
+ *
+ * Lower values denote higher priority, the highest priority for both levels
+ * is 0, so that a rule with priority 0 in group 8 is always matched after a
+ * rule with priority 8 in group 0.
+ *
+ * Although optional, applications are encouraged to group similar rules as
+ * much as possible to fully take advantage of hardware capabilities
+ * (e.g. optimized matching) and work around limitations (e.g. a single
+ * pattern type possibly allowed in a given group).
+ *
+ * Group and priority levels are arbitrary and up to the application, they
+ * do not need to be contiguous nor start from 0, however the maximum number
+ * varies between devices and may be affected by existing flow rules.
+ *
+ * If a packet is matched by several rules of a given group for a given
+ * priority level, the outcome is undefined. It can take any path, may be
+ * duplicated or even cause unrecoverable errors.
+ *
+ * Note that support for more than a single group and priority level is not
+ * guaranteed.
+ *
+ * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+ *
+ * Several pattern items and actions are valid and can be used in both
+ * directions. Those valid for only one direction are described as such.
+ *
+ * At least one direction must be specified.
+ *
+ * Specifying both directions at once for a given rule is not recommended
+ * but may be valid in a few cases (e.g. shared counter).
+ */
+struct rte_flow_attr {
+	uint32_t group; /**< Priority group. */
+	uint32_t priority; /**< Priority level within group. */
+	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
+	uint32_t egress:1; /**< Rule applies to egress traffic. */
+	uint32_t reserved:30; /**< Reserved, must be zero. */
+};
+
+/**
+ * Matching pattern item types.
+ *
+ * Pattern items fall in two categories:
+ *
+ * - Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+ *   IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+ *   specification structure. These must be stacked in the same order as the
+ *   protocol layers to match, starting from the lowest.
+ *
+ * - Matching meta-data or affecting pattern processing (END, VOID, INVERT,
+ *   PF, VF, PORT and so on), often without a specification structure. Since
+ *   they do not match packet contents, these can be specified anywhere
+ *   within item lists without affecting others.
+ *
+ * See the description of individual types for more information. Those
+ * marked with [META] fall into the second category.
+ */
+enum rte_flow_item_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for item lists. Prevents further processing of items,
+	 * thereby ending the pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_VOID,
+
+	/**
+	 * [META]
+	 *
+	 * Inverted matching, i.e. process packets that do not match the
+	 * pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_INVERT,
+
+	/**
+	 * Matches any protocol in place of the current layer, a single ANY
+	 * may also stand for several protocol layers.
+	 *
+	 * See struct rte_flow_item_any.
+	 */
+	RTE_FLOW_ITEM_TYPE_ANY,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to the physical function of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a PF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_PF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to a virtual function ID of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a VF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * See struct rte_flow_item_vf.
+	 */
+	RTE_FLOW_ITEM_TYPE_VF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets coming from the specified physical port of the
+	 * underlying device.
+	 *
+	 * The first PORT item overrides the physical port normally
+	 * associated with the specified DPDK input port (port_id). This
+	 * item can be provided several times to match additional physical
+	 * ports.
+	 *
+	 * See struct rte_flow_item_port.
+	 */
+	RTE_FLOW_ITEM_TYPE_PORT,
+
+	/**
+	 * Matches a byte string of a given length at a given offset.
+	 *
+	 * See struct rte_flow_item_raw.
+	 */
+	RTE_FLOW_ITEM_TYPE_RAW,
+
+	/**
+	 * Matches an Ethernet header.
+	 *
+	 * See struct rte_flow_item_eth.
+	 */
+	RTE_FLOW_ITEM_TYPE_ETH,
+
+	/**
+	 * Matches an 802.1Q/ad VLAN tag.
+	 *
+	 * See struct rte_flow_item_vlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VLAN,
+
+	/**
+	 * Matches an IPv4 header.
+	 *
+	 * See struct rte_flow_item_ipv4.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV4,
+
+	/**
+	 * Matches an IPv6 header.
+	 *
+	 * See struct rte_flow_item_ipv6.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6,
+
+	/**
+	 * Matches an ICMP header.
+	 *
+	 * See struct rte_flow_item_icmp.
+	 */
+	RTE_FLOW_ITEM_TYPE_ICMP,
+
+	/**
+	 * Matches a UDP header.
+	 *
+	 * See struct rte_flow_item_udp.
+	 */
+	RTE_FLOW_ITEM_TYPE_UDP,
+
+	/**
+	 * Matches a TCP header.
+	 *
+	 * See struct rte_flow_item_tcp.
+	 */
+	RTE_FLOW_ITEM_TYPE_TCP,
+
+	/**
+	 * Matches a SCTP header.
+	 *
+	 * See struct rte_flow_item_sctp.
+	 */
+	RTE_FLOW_ITEM_TYPE_SCTP,
+
+	/**
+	 * Matches a VXLAN header.
+	 *
+	 * See struct rte_flow_item_vxlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VXLAN,
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ANY
+ *
+ * Matches any protocol in place of the current layer, a single ANY may also
+ * stand for several protocol layers.
+ *
+ * This is usually specified as the first pattern item when looking for a
+ * protocol anywhere in a packet.
+ *
+ * A zeroed mask stands for any number of layers.
+ */
+struct rte_flow_item_any {
+	uint32_t num; /* Number of layers covered. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VF
+ *
+ * Matches packets addressed to a virtual function ID of the device.
+ *
+ * If the underlying device function differs from the one that would
+ * normally receive the matched traffic, specifying this item prevents it
+ * from reaching that device unless the flow rule contains a VF
+ * action. Packets are not duplicated between device instances by default.
+ *
+ * - Likely to return an error or never match any traffic if this causes a
+ *   VF device to match traffic addressed to a different VF.
+ * - Can be specified multiple times to match traffic addressed to several
+ *   VF IDs.
+ * - Can be combined with a PF item to match both PF and VF traffic.
+ *
+ * A zeroed mask can be used to match any VF ID.
+ */
+struct rte_flow_item_vf {
+	uint32_t id; /**< Destination VF ID. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_PORT
+ *
+ * Matches packets coming from the specified physical port of the underlying
+ * device.
+ *
+ * The first PORT item overrides the physical port normally associated with
+ * the specified DPDK input port (port_id). This item can be provided
+ * several times to match additional physical ports.
+ *
+ * Note that physical ports are not necessarily tied to DPDK input ports
+ * (port_id) when those are not under DPDK control. Possible values are
+ * specific to each device, they are not necessarily indexed from zero and
+ * may not be contiguous.
+ *
+ * As a device property, the list of allowed values as well as the value
+ * associated with a port_id should be retrieved by other means.
+ *
+ * A zeroed mask can be used to match any port index.
+ */
+struct rte_flow_item_port {
+	uint32_t index; /**< Physical port index. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_RAW
+ *
+ * Matches a byte string of a given length at a given offset.
+ *
+ * Offset is either absolute (using the start of the packet) or relative to
+ * the end of the previous matched item in the stack, in which case negative
+ * values are allowed.
+ *
+ * If search is enabled, offset is used as the starting point. The search
+ * area can be delimited by setting limit to a nonzero value, which is the
+ * maximum number of bytes after offset where the pattern may start.
+ *
+ * Matching a zero-length pattern is allowed, doing so resets the relative
+ * offset for subsequent items.
+ *
+ * This type does not support ranges (struct rte_flow_item.last).
+ */
+struct rte_flow_item_raw {
+	uint32_t relative:1; /**< Look for pattern after the previous item. */
+	uint32_t search:1; /**< Search pattern from offset (see also limit). */
+	uint32_t reserved:30; /**< Reserved, must be set to zero. */
+	int32_t offset; /**< Absolute or relative offset for pattern. */
+	uint16_t limit; /**< Search area limit for start of pattern. */
+	uint16_t length; /**< Pattern length. */
+	uint8_t pattern[]; /**< Byte string to look for. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ETH
+ *
+ * Matches an Ethernet header.
+ */
+struct rte_flow_item_eth {
+	struct ether_addr dst; /**< Destination MAC. */
+	struct ether_addr src; /**< Source MAC. */
+	uint16_t type; /**< EtherType. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VLAN
+ *
+ * Matches an 802.1Q/ad VLAN tag.
+ *
+ * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
+ * RTE_FLOW_ITEM_TYPE_VLAN.
+ */
+struct rte_flow_item_vlan {
+	uint16_t tpid; /**< Tag protocol identifier. */
+	uint16_t tci; /**< Tag control information. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV4
+ *
+ * Matches an IPv4 header.
+ *
+ * Note: IPv4 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv4 {
+	struct ipv4_hdr hdr; /**< IPv4 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV6.
+ *
+ * Matches an IPv6 header.
+ *
+ * Note: IPv6 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv6 {
+	struct ipv6_hdr hdr; /**< IPv6 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ICMP.
+ *
+ * Matches an ICMP header.
+ */
+struct rte_flow_item_icmp {
+	struct icmp_hdr hdr; /**< ICMP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_UDP.
+ *
+ * Matches a UDP header.
+ */
+struct rte_flow_item_udp {
+	struct udp_hdr hdr; /**< UDP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_TCP.
+ *
+ * Matches a TCP header.
+ */
+struct rte_flow_item_tcp {
+	struct tcp_hdr hdr; /**< TCP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_SCTP.
+ *
+ * Matches a SCTP header.
+ */
+struct rte_flow_item_sctp {
+	struct sctp_hdr hdr; /**< SCTP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VXLAN.
+ *
+ * Matches a VXLAN header (RFC 7348).
+ */
+struct rte_flow_item_vxlan {
+	uint8_t flags; /**< Normally 0x08 (I flag). */
+	uint8_t rsvd0[3]; /**< Reserved, normally 0x000000. */
+	uint8_t vni[3]; /**< VXLAN identifier. */
+	uint8_t rsvd1; /**< Reserved, normally 0x00. */
+};
+
+/**
+ * Matching pattern item definition.
+ *
+ * A pattern is formed by stacking items starting from the lowest protocol
+ * layer to match. This stacking restriction does not apply to meta items
+ * which can be placed anywhere in the stack without affecting the meaning
+ * of the resulting pattern.
+ *
+ * Patterns are terminated by END items.
+ *
+ * The spec field should be a valid pointer to a structure of the related
+ * item type. It may be set to NULL in many cases to use default values.
+ *
+ * Optionally, last can point to a structure of the same type to define an
+ * inclusive range. This is mostly supported by integer and address fields,
+ * may cause errors otherwise. Fields that do not support ranges must be set
+ * to 0 or to the same value as the corresponding fields in spec.
+ *
+ * By default all fields present in spec are considered relevant (see note
+ * below). This behavior can be altered by providing a mask structure of the
+ * same type with applicable bits set to one. It can also be used to
+ * partially filter out specific fields (e.g. as an alternate mean to match
+ * ranges of IP addresses).
+ *
+ * Mask is a simple bit-mask applied before interpreting the contents of
+ * spec and last, which may yield unexpected results if not used
+ * carefully. For example, if for an IPv4 address field, spec provides
+ * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
+ * effective range becomes 10.1.0.0 to 10.3.255.255.
+ *
+ * Note: the defaults for data-matching items such as IPv4 when mask is not
+ * specified actually depend on the underlying implementation since only
+ * recognized fields can be taken into account.
+ */
+struct rte_flow_item {
+	enum rte_flow_item_type type; /**< Item type. */
+	const void *spec; /**< Pointer to item specification structure. */
+	const void *last; /**< Defines an inclusive range (spec to last). */
+	const void *mask; /**< Bit-mask applied to spec and last. */
+};
+
+/**
+ * Action types.
+ *
+ * Each possible action is represented by a type. Some have associated
+ * configuration structures. Several actions combined in a list can be
+ * affected to a flow rule. That list is not ordered.
+ *
+ * They fall in three categories:
+ *
+ * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+ *   processing matched packets by subsequent flow rules, unless overridden
+ *   with PASSTHRU.
+ *
+ * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
+ *   for additional processing by subsequent flow rules.
+ *
+ * - Other non terminating meta actions that do not affect the fate of
+ *   packets (END, VOID, MARK, FLAG, COUNT).
+ *
+ * When several actions are combined in a flow rule, they should all have
+ * different types (e.g. dropping a packet twice is not possible).
+ *
+ * Only the last action of a given type is taken into account. PMDs still
+ * perform error checking on the entire list.
+ *
+ * Note that PASSTHRU is the only action able to override a terminating
+ * rule.
+ */
+enum rte_flow_action_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for action lists. Prevents further processing of
+	 * actions, thereby ending the list.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_VOID,
+
+	/**
+	 * Leaves packets up for additional processing by subsequent flow
+	 * rules. This is the default when a rule does not contain a
+	 * terminating action, but can be specified to force a rule to
+	 * become non-terminating.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PASSTHRU,
+
+	/**
+	 * [META]
+	 *
+	 * Attaches a 32 bit value to packets.
+	 *
+	 * See struct rte_flow_action_mark.
+	 */
+	RTE_FLOW_ACTION_TYPE_MARK,
+
+	/**
+	 * [META]
+	 *
+	 * Flag packets. Similar to MARK but only affects ol_flags.
+	 *
+	 * Note: a distinctive flag must be defined for it.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_FLAG,
+
+	/**
+	 * Assigns packets to a given queue index.
+	 *
+	 * See struct rte_flow_action_queue.
+	 */
+	RTE_FLOW_ACTION_TYPE_QUEUE,
+
+	/**
+	 * Drops packets.
+	 *
+	 * PASSTHRU overrides this action if both are specified.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_DROP,
+
+	/**
+	 * [META]
+	 *
+	 * Enables counters for this rule.
+	 *
+	 * These counters can be retrieved and reset through rte_flow_query(),
+	 * see struct rte_flow_query_count.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_COUNT,
+
+	/**
+	 * Duplicates packets to a given queue index.
+	 *
+	 * This is normally combined with QUEUE, however when used alone, it
+	 * is actually similar to QUEUE + PASSTHRU.
+	 *
+	 * See struct rte_flow_action_dup.
+	 */
+	RTE_FLOW_ACTION_TYPE_DUP,
+
+	/**
+	 * Similar to QUEUE, except RSS is additionally performed on packets
+	 * to spread them among several queues according to the provided
+	 * parameters.
+	 *
+	 * See struct rte_flow_action_rss.
+	 */
+	RTE_FLOW_ACTION_TYPE_RSS,
+
+	/**
+	 * Redirects packets to the physical function (PF) of the current
+	 * device.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PF,
+
+	/**
+	 * Redirects packets to the virtual function (VF) of the current
+	 * device with the specified ID.
+	 *
+	 * See struct rte_flow_action_vf.
+	 */
+	RTE_FLOW_ACTION_TYPE_VF,
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_MARK
+ *
+ * Attaches a 32 bit value to packets.
+ *
+ * This value is arbitrary and application-defined. For compatibility with
+ * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
+ * also set in ol_flags.
+ */
+struct rte_flow_action_mark {
+	uint32_t id; /**< 32 bit value to return with packets. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_QUEUE
+ *
+ * Assign packets to a given queue index.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_queue {
+	uint16_t index; /**< Queue index to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_COUNT (query)
+ *
+ * Query structure to retrieve and reset flow rule counters.
+ */
+struct rte_flow_query_count {
+	uint32_t reset:1; /**< Reset counters after query [in]. */
+	uint32_t hits_set:1; /**< hits field is set [out]. */
+	uint32_t bytes_set:1; /**< bytes field is set [out]. */
+	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
+	uint64_t hits; /**< Number of hits for this rule [out]. */
+	uint64_t bytes; /**< Number of bytes through this rule [out]. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_DUP
+ *
+ * Duplicates packets to a given queue index.
+ *
+ * This is normally combined with QUEUE, however when used alone, it is
+ * actually similar to QUEUE + PASSTHRU.
+ *
+ * Non-terminating by default.
+ */
+struct rte_flow_action_dup {
+	uint16_t index; /**< Queue index to duplicate packets to. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_RSS
+ *
+ * Similar to QUEUE, except RSS is additionally performed on packets to
+ * spread them among several queues according to the provided parameters.
+ *
+ * Note: RSS hash result is normally stored in the hash.rss mbuf field,
+ * however it conflicts with the MARK action as they share the same
+ * space. When both actions are specified, the RSS hash is discarded and
+ * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
+ * structure should eventually evolve to store both.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_rss {
+	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
+	uint16_t num; /**< Number of entries in queue[]. */
+	uint16_t queue[]; /**< Queues indices to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_VF
+ *
+ * Redirects packets to a virtual function (VF) of the current device.
+ *
+ * Packets matched by a VF pattern item can be redirected to their original
+ * VF ID instead of the specified one. This parameter may not be available
+ * and is not guaranteed to work properly if the VF part is matched by a
+ * prior flow rule or if packets are not addressed to a VF in the first
+ * place.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_vf {
+	uint32_t original:1; /**< Use original VF ID if possible. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint32_t id; /**< VF ID to redirect packets to. */
+};
+
+/**
+ * Definition of a single action.
+ *
+ * A list of actions is terminated by a END action.
+ *
+ * For simple actions without a configuration structure, conf remains NULL.
+ */
+struct rte_flow_action {
+	enum rte_flow_action_type type; /**< Action type. */
+	const void *conf; /**< Pointer to action configuration structure. */
+};
+
+/**
+ * Opaque type returned after successfully creating a flow.
+ *
+ * This handle can be used to manage and query the related flow (e.g. to
+ * destroy it or retrieve counters).
+ */
+struct rte_flow;
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_flow_error.cause.
+ */
+enum rte_flow_error_type {
+	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+	RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+	RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+	RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_flow_error {
+	enum rte_flow_error_type type; /**< Cause field and error types. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Check whether a flow rule can be created on a given port.
+ *
+ * While this function has no effect on the target device, the flow rule is
+ * validated against its current configuration state and the returned value
+ * should be considered valid by the caller for that state only.
+ *
+ * The returned value is guaranteed to remain valid only as long as no
+ * successful calls to rte_flow_create() or rte_flow_destroy() are made in
+ * the meantime and no device parameter affecting flow rules in any way are
+ * modified, due to possible collisions or resource limitations (although in
+ * such cases EINVAL should not be returned).
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 if flow rule is valid and can be created. A negative errno value
+ *   otherwise (rte_errno is also set), the following errors are defined:
+ *
+ *   -ENOSYS: underlying device does not support this functionality.
+ *
+ *   -EINVAL: unknown or invalid rule specification.
+ *
+ *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
+ *   bit-masks are unsupported).
+ *
+ *   -EEXIST: collision with an existing rule.
+ *
+ *   -ENOMEM: not enough resources.
+ *
+ *   -EBUSY: action cannot be performed due to busy device resources, may
+ *   succeed if the affected queues or even the entire port are in a stopped
+ *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
+ */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error);
+
+/**
+ * Create a flow rule on a given port.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to the positive version of one of the error codes defined for
+ *   rte_flow_validate().
+ */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error);
+
+/**
+ * Destroy a flow rule on a given port.
+ *
+ * Failure to destroy a flow rule handle may occur when other flow rules
+ * depend on it, and destroying it would result in an inconsistent state.
+ *
+ * This function is only guaranteed to succeed if handles are destroyed in
+ * reverse order of their creation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to destroy.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error);
+
+/**
+ * Destroy all flow rules associated with a port.
+ *
+ * In the unlikely event of failure, handles are still considered destroyed
+ * and no longer valid but the port must be assumed to be in an inconsistent
+ * state.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error);
+
+/**
+ * Query an existing flow rule.
+ *
+ * This function allows retrieving flow-specific data such as counters.
+ * Data is gathered by special actions which must be present in the flow
+ * rule definition.
+ *
+ * \see RTE_FLOW_ACTION_TYPE_COUNT
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to query.
+ * @param action
+ *   Action type to query.
+ * @param[in, out] data
+ *   Pointer to storage for the associated query data type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL. PMDs initialize this
+ *   structure in case of error only.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_H_ */
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
new file mode 100644
index 0000000..274562c
--- /dev/null
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -0,0 +1,182 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_DRIVER_H_
+#define RTE_FLOW_DRIVER_H_
+
+/**
+ * @file
+ * RTE generic flow API (driver side)
+ *
+ * This file provides implementation helpers for internal use by PMDs, they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include "rte_flow.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Generic flow operations structure implemented and returned by PMDs.
+ *
+ * To implement this API, PMDs must handle the RTE_ETH_FILTER_GENERIC filter
+ * type in their .filter_ctrl callback function (struct eth_dev_ops) as well
+ * as the RTE_ETH_FILTER_GET filter operation.
+ *
+ * If successful, this operation must result in a pointer to a PMD-specific
+ * struct rte_flow_ops written to the argument address as described below:
+ *
+ * \code
+ *
+ * // PMD filter_ctrl callback
+ *
+ * static const struct rte_flow_ops pmd_flow_ops = { ... };
+ *
+ * switch (filter_type) {
+ * case RTE_ETH_FILTER_GENERIC:
+ *     if (filter_op != RTE_ETH_FILTER_GET)
+ *         return -EINVAL;
+ *     *(const void **)arg = &pmd_flow_ops;
+ *     return 0;
+ * }
+ *
+ * \endcode
+ *
+ * See also rte_flow_ops_get().
+ *
+ * These callback functions are not supposed to be used by applications
+ * directly, which must rely on the API defined in rte_flow.h.
+ *
+ * Public-facing wrapper functions perform a few consistency checks so that
+ * unimplemented (i.e. NULL) callbacks simply return -ENOTSUP. These
+ * callbacks otherwise only differ by their first argument (with port ID
+ * already resolved to a pointer to struct rte_eth_dev).
+ */
+struct rte_flow_ops {
+	/** See rte_flow_validate(). */
+	int (*validate)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_create(). */
+	struct rte_flow *(*create)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_destroy(). */
+	int (*destroy)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 struct rte_flow_error *);
+	/** See rte_flow_flush(). */
+	int (*flush)
+		(struct rte_eth_dev *,
+		 struct rte_flow_error *);
+	/** See rte_flow_query(). */
+	int (*query)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 enum rte_flow_action_type,
+		 void *,
+		 struct rte_flow_error *);
+};
+
+/**
+ * Initialize generic flow error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Pointer to flow error structure.
+ */
+static inline struct rte_flow_error *
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return error;
+}
+
+/**
+ * Get generic flow operations structure from a port.
+ *
+ * @param port_id
+ *   Port identifier to query.
+ * @param[out] error
+ *   Pointer to flow error structure.
+ *
+ * @return
+ *   The flow operations structure associated with port_id, NULL in case of
+ *   error, in which case rte_errno is set and the error structure contains
+ *   additional details.
+ */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_DRIVER_H_ */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst
  2016-12-17 10:43  0%     ` tom.barbette
@ 2016-12-19 10:25  0%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2016-12-19 10:25 UTC (permalink / raw)
  To: tom.barbette; +Cc: dev

On Sat, Dec 17, 2016 at 11:43:25AM +0100, tom.barbette@ulg.ac.be wrote:
> Hi,
> 
> Your comments made me saw the line "PMD: i40e_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=0)."
> 
> The problem was probably that I was under this limit... Is there a way to get that limit through a function or something? 
> 
> With 16.04 I received sometimes 5 or 7 packets with a burst_size of 4 which respects this limit. I see that "[dpdk-dev] net/i40e: fix out-of-bounds writes during vector Rx" fixed that, as the limit was in fact 32 no matter the message.
> 
> At the end, what should be the minimal rx burst size? How to find it at runtime for any NIC? I imagine that vector rx will create a problem if I give a burst size of 1 even with a recent DPDK version, right?
> 

Sadly, there doesn't appear to be any way to discover this, and the i40e
driver requires at least a burst size of 4 even with the latest DPDK.
>From i40e_rxtx_vec_sse.c:

243         /* nb_pkts has to be floor-aligned to RTE_I40E_DESCS_PER_LOOP */
244         nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_I40E_DESCS_PER_LOOP);
245

I think in this case the gap is not so much having a discovery mechanism
to determine min burst size, but rather a driver gap so as to allow some
form of slower-path fallback when we get below min-size bursts for the
vector driver.

/Bruce

> Thanks,
> Tom
> 
> Tom Barbette 
> PhD Student @ Université de Liège 
> 
> Office 1/13 
> Bâtiment B37 
> Quartier Polytech 
> Allée de la découverte, 12 
> 4000 Liège 
> 
> 04/366 91 75 
> 0479/60 94 63 
> 
> 
> ----- Mail original -----
> De: "Bruce Richardson" <bruce.richardson@intel.com>
> À: "tom barbette" <tom.barbette@ulg.ac.be>
> Cc: dev@dpdk.org
> Envoyé: Mercredi 14 Décembre 2016 17:52:21
> Objet: Re: [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst
> 
> On Wed, Dec 14, 2016 at 04:13:53PM +0100, tom.barbette@ulg.ac.be wrote:
> > Hi list,
> > 
> > Between 2.2.0 and 16.04 (up to at least 16.07.2 if not current), with the XL710 controller I do not get any packet when calling rte_eth_rx_burst if nb_pkts is too small. I would say smaller than 32. The input rate is not big, if that helps. But It should definitely get at least one packet per second.
> > 
> > Any ideas? Is that a bug or expected behaviour? Could be caused by other ABI changes?
> > 
> Does this issue still occur even if you disable the vector driver in
> your build-time configuration?
> 
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: fix required tools list layout
  2016-12-18 19:11  0%   ` Baruch Siach
@ 2016-12-18 20:50  3%     ` Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2016-12-18 20:50 UTC (permalink / raw)
  To: Baruch Siach; +Cc: dev



> -----Original Message-----
> From: Baruch Siach [mailto:baruch@tkos.co.il]
> Sent: Sunday, December 18, 2016 7:11 PM
> To: Mcnamara, John <john.mcnamara@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: fix required tools list layout
> 
> Hi John,
> 
> On Thu, Dec 15, 2016 at 03:09:32PM +0000, Mcnamara, John wrote:
> > > -----Original Message-----
> > > From: Baruch Siach [mailto:baruch at tkos.co.il]
> > > Sent: Tuesday, December 13, 2016 10:04 AM
> > > To: dev at dpdk.org
> > > Cc: Mcnamara, John <john.mcnamara at intel.com>; David Marchand
> > > <david.marchand at 6wind.com>; Baruch Siach <baruch at tkos.co.il>
> > > Subject: [PATCH] doc: fix required tools list layout
> > >
> > > The Python requirement should appear in the bullet list.
> > >
> > > Signed-off-by: Baruch Siach <baruch at tkos.co.il>
> > > ---
> > >  doc/guides/linux_gsg/sys_reqs.rst | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/doc/guides/linux_gsg/sys_reqs.rst
> > > b/doc/guides/linux_gsg/sys_reqs.rst
> > > index 3d743421595a..621cc9ddaef6 100644
> > > --- a/doc/guides/linux_gsg/sys_reqs.rst
> > > +++ b/doc/guides/linux_gsg/sys_reqs.rst
> > > @@ -84,9 +84,7 @@ Compilation of the DPDK
> > >      x86_x32 ABI is currently supported with distribution packages
> > > only on Ubuntu
> > >      higher than 13.10 or recent Debian distribution. The only
> > > supported compiler is gcc 4.9+.
> > >
> > > -.. note::
> > > -
> > > -    Python, version 2.6 or 2.7, to use various helper scripts
> included in
> > > the DPDK package.
> > > +*   Python, version 2.6 or 2.7, to use various helper scripts
> included in
> > > the DPDK package.
> >
> > In addition to this change the note on the previous item should be
> > indented to the level of the bullet item. It is probably worth making
> > that change at the same time.
> 
> All items are equally aligned as far as I can see. The 32bit on 64bit
> requirement bullets are sub-items of the previous item. Am I missing
> anything?

Hi Baruch,

The note should be indented to the level of the first level bullet item text rather
than the margin since it is a note on that particular item and not a general note.
Like this:

    *   Additional packages required for 32-bit compilation on 64-bit systems are:

        * glibc.i686, libgcc.i686, libstdc++.i686 and glibc-devel.i686 for Intel i686/x86_64;

        * glibc.ppc64, libgcc.ppc64, libstdc++.ppc64 and glibc-devel.ppc64 for IBM ppc_64;

        .. note::

           x86_x32 ABI is currently supported with distribution packages only on Ubuntu
           higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.

If you generate the html before and after you will see the difference.

> 
> > Also, the Python version should probably say 2.7+ and 3.2+ if this
> > patch is
> > accepted:
> >
> >     http://dpdk.org/dev/patchwork/patch/17775/
> >
> > However, since that change hasn't been acked/merged yet you can leave
> > that part of your patch as it is and I'll fix the version numbers in
> > the other patch.
> 
> Note that your updated patch[1] conflicts with this one.
> 

Yes. :-)

It also conflicts with Thomas' patch to move the directories. I'll rebase based
on whatever order the patches are applied in.

John

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: fix required tools list layout
  2016-12-15 15:09  0% ` Mcnamara, John
@ 2016-12-18 19:11  0%   ` Baruch Siach
  2016-12-18 20:50  3%     ` Mcnamara, John
  0 siblings, 1 reply; 200+ results
From: Baruch Siach @ 2016-12-18 19:11 UTC (permalink / raw)
  To: Mcnamara, John; +Cc: dev

Hi John,

On Thu, Dec 15, 2016 at 03:09:32PM +0000, Mcnamara, John wrote:
> > -----Original Message-----
> > From: Baruch Siach [mailto:baruch at tkos.co.il]
> > Sent: Tuesday, December 13, 2016 10:04 AM
> > To: dev at dpdk.org
> > Cc: Mcnamara, John <john.mcnamara at intel.com>; David Marchand
> > <david.marchand at 6wind.com>; Baruch Siach <baruch at tkos.co.il>
> > Subject: [PATCH] doc: fix required tools list layout
> > 
> > The Python requirement should appear in the bullet list.
> > 
> > Signed-off-by: Baruch Siach <baruch at tkos.co.il>
> > ---
> >  doc/guides/linux_gsg/sys_reqs.rst | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/doc/guides/linux_gsg/sys_reqs.rst
> > b/doc/guides/linux_gsg/sys_reqs.rst
> > index 3d743421595a..621cc9ddaef6 100644
> > --- a/doc/guides/linux_gsg/sys_reqs.rst
> > +++ b/doc/guides/linux_gsg/sys_reqs.rst
> > @@ -84,9 +84,7 @@ Compilation of the DPDK
> >      x86_x32 ABI is currently supported with distribution packages only on
> > Ubuntu
> >      higher than 13.10 or recent Debian distribution. The only supported
> > compiler is gcc 4.9+.
> > 
> > -.. note::
> > -
> > -    Python, version 2.6 or 2.7, to use various helper scripts included in
> > the DPDK package.
> > +*   Python, version 2.6 or 2.7, to use various helper scripts included in
> > the DPDK package.
> 
> In addition to this change the note on the previous item should be indented 
> to the level of the bullet item. It is probably worth making that change at 
> the same time.

All items are equally aligned as far as I can see. The 32bit on 64bit 
requirement bullets are sub-items of the previous item. Am I missing anything?

> Also, the Python version should probably say 2.7+ and 3.2+ if this patch is 
> accepted: 
> 
>     http://dpdk.org/dev/patchwork/patch/17775/
> 
> However, since that change hasn't been acked/merged yet you can leave that 
> part of your patch as it is and I'll fix the version numbers in the other 
> patch.

Note that your updated patch[1] conflicts with this one.

[1] http://dpdk.org/dev/patchwork/patch/18152/

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst
  2016-12-14 16:52  0%   ` Bruce Richardson
@ 2016-12-17 10:43  0%     ` tom.barbette
  2016-12-19 10:25  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: tom.barbette @ 2016-12-17 10:43 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Hi,

Your comments made me saw the line "PMD: i40e_set_rx_function(): Vector rx enabled, please make sure RX burst size no less than 4 (port=0)."

The problem was probably that I was under this limit... Is there a way to get that limit through a function or something? 

With 16.04 I received sometimes 5 or 7 packets with a burst_size of 4 which respects this limit. I see that "[dpdk-dev] net/i40e: fix out-of-bounds writes during vector Rx" fixed that, as the limit was in fact 32 no matter the message.

At the end, what should be the minimal rx burst size? How to find it at runtime for any NIC? I imagine that vector rx will create a problem if I give a burst size of 1 even with a recent DPDK version, right?

Thanks,
Tom

Tom Barbette 
PhD Student @ Université de Liège 

Office 1/13 
Bâtiment B37 
Quartier Polytech 
Allée de la découverte, 12 
4000 Liège 

04/366 91 75 
0479/60 94 63 


----- Mail original -----
De: "Bruce Richardson" <bruce.richardson@intel.com>
À: "tom barbette" <tom.barbette@ulg.ac.be>
Cc: dev@dpdk.org
Envoyé: Mercredi 14 Décembre 2016 17:52:21
Objet: Re: [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst

On Wed, Dec 14, 2016 at 04:13:53PM +0100, tom.barbette@ulg.ac.be wrote:
> Hi list,
> 
> Between 2.2.0 and 16.04 (up to at least 16.07.2 if not current), with the XL710 controller I do not get any packet when calling rte_eth_rx_burst if nb_pkts is too small. I would say smaller than 32. The input rate is not big, if that helps. But It should definitely get at least one packet per second.
> 
> Any ideas? Is that a bug or expected behaviour? Could be caused by other ABI changes?
> 
Does this issue still occur even if you disable the vector driver in
your build-time configuration?

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 04/25] cmdline: add support for dynamic tokens
    2016-12-16 16:24  2%     ` [dpdk-dev] [PATCH v2 01/25] ethdev: introduce generic flow API Adrien Mazarguil
  2016-12-16 16:24  1%     ` [dpdk-dev] [PATCH v2 02/25] doc: add rte_flow prog guide Adrien Mazarguil
@ 2016-12-16 16:25  2%     ` Adrien Mazarguil
    3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-16 16:25 UTC (permalink / raw)
  To: dev

Considering tokens must be hard-coded in a list part of the instruction
structure, context-dependent tokens cannot be expressed.

This commit adds support for building dynamic token lists through a
user-provided function, which is called when the static token list is empty
(a single NULL entry).

Because no structures are modified (existing fields are reused), this
commit has no impact on the current ABI.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_cmdline/cmdline_parse.c | 60 +++++++++++++++++++++++++++++----
 lib/librte_cmdline/cmdline_parse.h | 21 ++++++++++++
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index b496067..14f5553 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -146,7 +146,9 @@ nb_common_chars(const char * s1, const char * s2)
  */
 static int
 match_inst(cmdline_parse_inst_t *inst, const char *buf,
-	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size)
+	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size,
+	   cmdline_parse_token_hdr_t
+		*(*dyn_tokens)[CMDLINE_PARSE_DYNAMIC_TOKENS])
 {
 	unsigned int token_num=0;
 	cmdline_parse_token_hdr_t * token_p;
@@ -155,6 +157,11 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 	struct cmdline_token_hdr token_hdr;
 
 	token_p = inst->tokens[token_num];
+	if (!token_p && dyn_tokens && inst->f) {
+		if (!(*dyn_tokens)[0])
+			inst->f(&(*dyn_tokens)[0], NULL, dyn_tokens);
+		token_p = (*dyn_tokens)[0];
+	}
 	if (token_p)
 		memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -196,7 +203,17 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 		buf += n;
 
 		token_num ++;
-		token_p = inst->tokens[token_num];
+		if (!inst->tokens[0]) {
+			if (token_num < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!(*dyn_tokens)[token_num])
+					inst->f(&(*dyn_tokens)[token_num],
+						NULL,
+						dyn_tokens);
+				token_p = (*dyn_tokens)[token_num];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[token_num];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 	}
@@ -239,6 +256,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	cmdline_parse_inst_t *inst;
 	const char *curbuf;
 	char result_buf[CMDLINE_PARSE_RESULT_BUFSIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	void (*f)(void *, struct cmdline *, void *) = NULL;
 	void *data = NULL;
 	int comment = 0;
@@ -255,6 +273,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		return CMDLINE_PARSE_BAD_ARGS;
 
 	ctx = cl->ctx;
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/*
 	 * - look if the buffer contains at least one line
@@ -299,7 +318,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		debug_printf("INST %d\n", inst_num);
 
 		/* fully parsed */
-		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf));
+		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf),
+				 &dyn_tokens);
 
 		if (tok > 0) /* we matched at least one token */
 			err = CMDLINE_PARSE_BAD_ARGS;
@@ -355,6 +375,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 	cmdline_parse_token_hdr_t *token_p;
 	struct cmdline_token_hdr token_hdr;
 	char tmpbuf[CMDLINE_BUFFER_SIZE], comp_buf[CMDLINE_BUFFER_SIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	unsigned int partial_tok_len;
 	int comp_len = -1;
 	int tmp_len = -1;
@@ -374,6 +395,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 
 	debug_printf("%s called\n", __func__);
 	memset(&token_hdr, 0, sizeof(token_hdr));
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/* count the number of complete token to parse */
 	for (i=0 ; buf[i] ; i++) {
@@ -396,11 +418,24 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		inst = ctx[inst_num];
 		while (inst) {
 			/* parse the first tokens of the inst */
-			if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+			if (nb_token &&
+			    match_inst(inst, buf, nb_token, NULL, 0,
+				       &dyn_tokens))
 				goto next;
 
 			debug_printf("instruction match\n");
-			token_p = inst->tokens[nb_token];
+			if (!inst->tokens[0]) {
+				if (nb_token <
+				    (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+					if (!dyn_tokens[nb_token])
+						inst->f(&dyn_tokens[nb_token],
+							NULL,
+							&dyn_tokens);
+					token_p = dyn_tokens[nb_token];
+				} else
+					token_p = NULL;
+			} else
+				token_p = inst->tokens[nb_token];
 			if (token_p)
 				memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -490,10 +525,21 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		/* we need to redo it */
 		inst = ctx[inst_num];
 
-		if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+		if (nb_token &&
+		    match_inst(inst, buf, nb_token, NULL, 0, &dyn_tokens))
 			goto next2;
 
-		token_p = inst->tokens[nb_token];
+		if (!inst->tokens[0]) {
+			if (nb_token < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!dyn_tokens[nb_token])
+					inst->f(&dyn_tokens[nb_token],
+						NULL,
+						&dyn_tokens);
+				token_p = dyn_tokens[nb_token];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[nb_token];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
diff --git a/lib/librte_cmdline/cmdline_parse.h b/lib/librte_cmdline/cmdline_parse.h
index 4ac05d6..65b18d4 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/cmdline_parse.h
@@ -83,6 +83,9 @@ extern "C" {
 /* maximum buffer size for parsed result */
 #define CMDLINE_PARSE_RESULT_BUFSIZE 8192
 
+/* maximum number of dynamic tokens */
+#define CMDLINE_PARSE_DYNAMIC_TOKENS 128
+
 /**
  * Stores a pointer to the ops struct, and the offset: the place to
  * write the parsed result in the destination structure.
@@ -130,6 +133,24 @@ struct cmdline;
  * Store a instruction, which is a pointer to a callback function and
  * its parameter that is called when the instruction is parsed, a help
  * string, and a list of token composing this instruction.
+ *
+ * When no tokens are defined (tokens[0] == NULL), they are retrieved
+ * dynamically by calling f() as follows:
+ *
+ *  f((struct cmdline_token_hdr **)&token_hdr,
+ *    NULL,
+ *    (struct cmdline_token_hdr *[])tokens));
+ *
+ * The address of the resulting token is expected at the location pointed by
+ * the first argument. Can be set to NULL to end the list.
+ *
+ * The cmdline argument (struct cmdline *) is always NULL.
+ *
+ * The last argument points to the NULL-terminated list of dynamic tokens
+ * defined so far. Since token_hdr points to an index of that list, the
+ * current index can be derived as follows:
+ *
+ *  int index = token_hdr - &(*tokens)[0];
  */
 struct cmdline_inst {
 	/* f(parsed_struct, data) */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 02/25] doc: add rte_flow prog guide
    2016-12-16 16:24  2%     ` [dpdk-dev] [PATCH v2 01/25] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-12-16 16:24  1%     ` Adrien Mazarguil
  2016-12-16 16:25  2%     ` [dpdk-dev] [PATCH v2 04/25] cmdline: add support for dynamic tokens Adrien Mazarguil
    3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-16 16:24 UTC (permalink / raw)
  To: dev

This documentation is based on the latest RFC submission, subsequently
updated according to feedback from the community.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/prog_guide/index.rst    |    1 +
 doc/guides/prog_guide/rte_flow.rst | 1853 +++++++++++++++++++++++++++++++
 2 files changed, 1854 insertions(+)

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index e5a50a8..ed7f770 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -42,6 +42,7 @@ Programmer's Guide
     mempool_lib
     mbuf_lib
     poll_mode_drv
+    rte_flow
     cryptodev_lib
     link_bonding_poll_mode_drv_lib
     timer_lib
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
new file mode 100644
index 0000000..63413d1
--- /dev/null
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -0,0 +1,1853 @@
+..  BSD LICENSE
+    Copyright 2016 6WIND S.A.
+    Copyright 2016 Mellanox.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Generic_flow_API:
+
+Generic flow API (rte_flow)
+===========================
+
+Overview
+--------
+
+This API provides a generic means to configure hardware to match specific
+ingress or egress traffic, alter its fate and query related counters
+according to any number of user-defined rules.
+
+It is named *rte_flow* after the prefix used for all its symbols, and is
+defined in ``rte_flow.h``.
+
+- Matching can be performed on packet data (protocol headers, payload) and
+  properties (e.g. associated physical port, virtual device function ID).
+
+- Possible operations include dropping traffic, diverting it to specific
+  queues, to virtual/physical device functions or ports, performing tunnel
+  offloads, adding marks and so on.
+
+It is slightly higher-level than the legacy filtering framework which it
+encompasses and supersedes (including all functions and filter types) in
+order to expose a single interface with an unambiguous behavior that is
+common to all poll-mode drivers (PMDs).
+
+Several methods to migrate existing applications are described in `API
+migration`_.
+
+Flow rule
+---------
+
+Description
+~~~~~~~~~~~
+
+A flow rule is the combination of attributes with a matching pattern and a
+list of actions. Flow rules form the basis of this API.
+
+Flow rules can have several distinct actions (such as counting,
+encapsulating, decapsulating before redirecting packets to a particular
+queue, etc.), instead of relying on several rules to achieve this and having
+applications deal with hardware implementation details regarding their
+order.
+
+Support for different priority levels on a rule basis is provided, for
+example in order to force a more specific rule to come before a more generic
+one for packets matched by both. However hardware support for more than a
+single priority level cannot be guaranteed. When supported, the number of
+available priority levels is usually low, which is why they can also be
+implemented in software by PMDs (e.g. missing priority levels may be
+emulated by reordering rules).
+
+In order to remain as hardware-agnostic as possible, by default all rules
+are considered to have the same priority, which means that the order between
+overlapping rules (when a packet is matched by several filters) is
+undefined.
+
+PMDs may refuse to create overlapping rules at a given priority level when
+they can be detected (e.g. if a pattern matches an existing filter).
+
+Thus predictable results for a given priority level can only be achieved
+with non-overlapping rules, using perfect matching on all protocol layers.
+
+Flow rules can also be grouped, the flow rule priority is specific to the
+group they belong to. All flow rules in a given group are thus processed
+either before or after another group.
+
+Support for multiple actions per rule may be implemented internally on top
+of non-default hardware priorities, as a result both features may not be
+simultaneously available to applications.
+
+Considering that allowed pattern/actions combinations cannot be known in
+advance and would result in an unpractically large number of capabilities to
+expose, a method is provided to validate a given rule from the current
+device configuration state.
+
+This enables applications to check if the rule types they need is supported
+at initialization time, before starting their data path. This method can be
+used anytime, its only requirement being that the resources needed by a rule
+should exist (e.g. a target RX queue should be configured first).
+
+Each defined rule is associated with an opaque handle managed by the PMD,
+applications are responsible for keeping it. These can be used for queries
+and rules management, such as retrieving counters or other data and
+destroying them.
+
+To avoid resource leaks on the PMD side, handles must be explicitly
+destroyed by the application before releasing associated resources such as
+queues and ports.
+
+The following sections cover:
+
+- **Attributes** (represented by ``struct rte_flow_attr``): properties of a
+  flow rule such as its direction (ingress or egress) and priority.
+
+- **Pattern item** (represented by ``struct rte_flow_item``): part of a
+  matching pattern that either matches specific packet data or traffic
+  properties. It can also describe properties of the pattern itself, such as
+  inverted matching.
+
+- **Matching pattern**: traffic properties to look for, a combination of any
+  number of items.
+
+- **Actions** (represented by ``struct rte_flow_action``): operations to
+  perform whenever a packet is matched by a pattern.
+
+Attributes
+~~~~~~~~~~
+
+Group
+^^^^^
+
+Flow rules can be grouped by assigning them a common group number. Lower
+values have higher priority. Group 0 has the highest priority.
+
+Although optional, applications are encouraged to group similar rules as
+much as possible to fully take advantage of hardware capabilities
+(e.g. optimized matching) and work around limitations (e.g. a single pattern
+type possibly allowed in a given group).
+
+Note that support for more than a single group is not guaranteed.
+
+Priority
+^^^^^^^^
+
+A priority level can be assigned to a flow rule. Like groups, lower values
+denote higher priority, with 0 as the maximum.
+
+A rule with priority 0 in group 8 is always matched after a rule with
+priority 8 in group 0.
+
+Group and priority levels are arbitrary and up to the application, they do
+not need to be contiguous nor start from 0, however the maximum number
+varies between devices and may be affected by existing flow rules.
+
+If a packet is matched by several rules of a given group for a given
+priority level, the outcome is undefined. It can take any path, may be
+duplicated or even cause unrecoverable errors.
+
+Note that support for more than a single priority level is not guaranteed.
+
+Traffic direction
+^^^^^^^^^^^^^^^^^
+
+Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+
+Several pattern items and actions are valid and can be used in both
+directions. At least one direction must be specified.
+
+Specifying both directions at once for a given rule is not recommended but
+may be valid in a few cases (e.g. shared counters).
+
+Pattern item
+~~~~~~~~~~~~
+
+Pattern items fall in two categories:
+
+- Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+  IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+  specification structure.
+
+- Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF,
+  VF, PORT and so on), often without a specification structure.
+
+Item specification structures are used to match specific values among
+protocol fields (or item properties). Documentation describes for each item
+whether they are associated with one and their type name if so.
+
+Up to three structures of the same type can be set for a given item:
+
+- ``spec``: values to match (e.g. a given IPv4 address).
+
+- ``last``: upper bound for an inclusive range with corresponding fields in
+  ``spec``.
+
+- ``mask``: bit-mask applied to both ``spec`` and ``last`` whose purpose is
+  to distinguish the values to take into account and/or partially mask them
+  out (e.g. in order to match an IPv4 address prefix).
+
+Usage restrictions and expected behavior:
+
+- Setting either ``mask`` or ``last`` without ``spec`` is an error.
+
+- Field values in ``last`` which are either 0 or equal to the corresponding
+  values in ``spec`` are ignored; they do not generate a range. Nonzero
+  values lower than those in ``spec`` are not supported.
+
+- Setting ``spec`` and optionally ``last`` without ``mask`` causes the PMD
+  to only take the fields it can recognize into account. There is no error
+  checking for unsupported fields.
+
+- Not setting any of them (assuming item type allows it) uses default
+  parameters that depend on the item type. Most of the time, particularly
+  for protocol header items, it is equivalent to providing an empty (zeroed)
+  ``mask``.
+
+- ``mask`` is a simple bit-mask applied before interpreting the contents of
+  ``spec`` and ``last``, which may yield unexpected results if not used
+  carefully. For example, if for an IPv4 address field, ``spec`` provides
+  *10.1.2.3*, ``last`` provides *10.3.4.5* and ``mask`` provides
+  *255.255.0.0*, the effective range becomes *10.1.0.0* to *10.3.255.255*.
+
+Example of an item specification matching an Ethernet header:
+
++------------------------------------------+
+| Ethernet                                 |
++==========+==========+====================+
+| ``spec`` | ``src``  | ``00:01:02:03:04`` |
+|          +----------+--------------------+
+|          | ``dst``  | ``00:2a:66:00:01`` |
+|          +----------+--------------------+
+|          | ``type`` | ``0x22aa``         |
++----------+----------+--------------------+
+| ``last`` | unspecified                   |
++----------+----------+--------------------+
+| ``mask`` | ``src``  | ``00:ff:ff:ff:00`` |
+|          +----------+--------------------+
+|          | ``dst``  | ``00:00:00:00:ff`` |
+|          +----------+--------------------+
+|          | ``type`` | ``0x0000``         |
++----------+----------+--------------------+
+
+Non-masked bits stand for any value (shown as ``?`` below), Ethernet headers
+with the following properties are thus matched:
+
+- ``src``: ``??:01:02:03:??``
+- ``dst``: ``??:??:??:??:01``
+- ``type``: ``0x????``
+
+Matching pattern
+~~~~~~~~~~~~~~~~
+
+A pattern is formed by stacking items starting from the lowest protocol
+layer to match. This stacking restriction does not apply to meta items which
+can be placed anywhere in the stack without affecting the meaning of the
+resulting pattern.
+
+Patterns are terminated by END items.
+
+Examples:
+
++--------------+
+| TCPv4 as L4  |
++===+==========+
+| 0 | Ethernet |
++---+----------+
+| 1 | IPv4     |
++---+----------+
+| 2 | TCP      |
++---+----------+
+| 3 | END      |
++---+----------+
+
+|
+
++----------------+
+| TCPv6 in VXLAN |
++===+============+
+| 0 | Ethernet   |
++---+------------+
+| 1 | IPv4       |
++---+------------+
+| 2 | UDP        |
++---+------------+
+| 3 | VXLAN      |
++---+------------+
+| 4 | Ethernet   |
++---+------------+
+| 5 | IPv6       |
++---+------------+
+| 6 | TCP        |
++---+------------+
+| 7 | END        |
++---+------------+
+
+|
+
++-----------------------------+
+| TCPv4 as L4 with meta items |
++===+=========================+
+| 0 | VOID                    |
++---+-------------------------+
+| 1 | Ethernet                |
++---+-------------------------+
+| 2 | VOID                    |
++---+-------------------------+
+| 3 | IPv4                    |
++---+-------------------------+
+| 4 | TCP                     |
++---+-------------------------+
+| 5 | VOID                    |
++---+-------------------------+
+| 6 | VOID                    |
++---+-------------------------+
+| 7 | END                     |
++---+-------------------------+
+
+The above example shows how meta items do not affect packet data matching
+items, as long as those remain stacked properly. The resulting matching
+pattern is identical to "TCPv4 as L4".
+
++----------------+
+| UDPv6 anywhere |
++===+============+
+| 0 | IPv6       |
++---+------------+
+| 1 | UDP        |
++---+------------+
+| 2 | END        |
++---+------------+
+
+If supported by the PMD, omitting one or several protocol layers at the
+bottom of the stack as in the above example (missing an Ethernet
+specification) enables looking up anywhere in packets.
+
+It is unspecified whether the payload of supported encapsulations
+(e.g. VXLAN payload) is matched by such a pattern, which may apply to inner,
+outer or both packets.
+
++---------------------+
+| Invalid, missing L3 |
++===+=================+
+| 0 | Ethernet        |
++---+-----------------+
+| 1 | UDP             |
++---+-----------------+
+| 2 | END             |
++---+-----------------+
+
+The above pattern is invalid due to a missing L3 specification between L2
+(Ethernet) and L4 (UDP). Doing so is only allowed at the bottom and at the
+top of the stack.
+
+Meta item types
+~~~~~~~~~~~~~~~
+
+They match meta-data or affect pattern processing instead of matching packet
+data directly, most of them do not need a specification structure. This
+particularity allows them to be specified anywhere in the stack without
+causing any side effect.
+
+``END``
+^^^^^^^
+
+End marker for item lists. Prevents further processing of items, thereby
+ending the pattern.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
++--------------------+
+| END                |
++==========+=========+
+| ``spec`` | ignored |
++----------+---------+
+| ``last`` | ignored |
++----------+---------+
+| ``mask`` | ignored |
++----------+---------+
+
+``VOID``
+^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
++--------------------+
+| VOID               |
++==========+=========+
+| ``spec`` | ignored |
++----------+---------+
+| ``last`` | ignored |
++----------+---------+
+| ``mask`` | ignored |
++----------+---------+
+
+One usage example for this type is generating rules that share a common
+prefix quickly without reallocating memory, only by updating item types:
+
++------------------------+
+| TCP, UDP or ICMP as L4 |
++===+====================+
+| 0 | Ethernet           |
++---+--------------------+
+| 1 | IPv4               |
++---+------+------+------+
+| 2 | UDP  | VOID | VOID |
++---+------+------+------+
+| 3 | VOID | TCP  | VOID |
++---+------+------+------+
+| 4 | VOID | VOID | ICMP |
++---+------+------+------+
+| 5 | END                |
++---+--------------------+
+
+``INVERT``
+^^^^^^^^^^
+
+Inverted matching, i.e. process packets that do not match the pattern.
+
+- ``spec``, ``last`` and ``mask`` are ignored.
+
++--------------------+
+| INVERT             |
++==========+=========+
+| ``spec`` | ignored |
++----------+---------+
+| ``last`` | ignored |
++----------+---------+
+| ``mask`` | ignored |
++----------+---------+
+
+Usage example, matching non-TCPv4 packets only:
+
++--------------------+
+| Anything but TCPv4 |
++===+================+
+| 0 | INVERT         |
++---+----------------+
+| 1 | Ethernet       |
++---+----------------+
+| 2 | IPv4           |
++---+----------------+
+| 3 | TCP            |
++---+----------------+
+| 4 | END            |
++---+----------------+
+
+``PF``
+^^^^^^
+
+Matches packets addressed to the physical function of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `PF (action)`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if applied to a VF
+  device.
+- Can be combined with any number of `VF`_ items to match both PF and VF
+  traffic.
+- ``spec``, ``last`` and ``mask`` must not be set.
+
++------------------+
+| PF               |
++==========+=======+
+| ``spec`` | unset |
++----------+-------+
+| ``last`` | unset |
++----------+-------+
+| ``mask`` | unset |
++----------+-------+
+
+``VF``
+^^^^^^
+
+Matches packets addressed to a virtual function ID of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains a `VF (action)`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if this causes a VF
+  device to match traffic addressed to a different VF.
+- Can be specified multiple times to match traffic addressed to several VF
+  IDs.
+- Can be combined with a PF item to match both PF and VF traffic.
+
++------------------------------------------------+
+| VF                                             |
++==========+=========+===========================+
+| ``spec`` | ``id``  | destination VF ID         |
++----------+---------+---------------------------+
+| ``last`` | ``id``  | upper range value         |
++----------+---------+---------------------------+
+| ``mask`` | ``id``  | zeroed to match any VF ID |
++----------+---------+---------------------------+
+
+``PORT``
+^^^^^^^^
+
+Matches packets coming from the specified physical port of the underlying
+device.
+
+The first PORT item overrides the physical port normally associated with the
+specified DPDK input port (port_id). This item can be provided several times
+to match additional physical ports.
+
+Note that physical ports are not necessarily tied to DPDK input ports
+(port_id) when those are not under DPDK control. Possible values are
+specific to each device, they are not necessarily indexed from zero and may
+not be contiguous.
+
+As a device property, the list of allowed values as well as the value
+associated with a port_id should be retrieved by other means.
+
++-------------------------------------------------------+
+| PORT                                                  |
++==========+===========+================================+
+| ``spec`` | ``index`` | physical port index            |
++----------+-----------+--------------------------------+
+| ``last`` | ``index`` | upper range value              |
++----------+-----------+--------------------------------+
+| ``mask`` | ``index`` | zeroed to match any port index |
++----------+-----------+--------------------------------+
+
+Data matching item types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most of these are basically protocol header definitions with associated
+bit-masks. They must be specified (stacked) from lowest to highest protocol
+layer to form a matching pattern.
+
+The following list is not exhaustive, new protocols will be added in the
+future.
+
+``ANY``
+^^^^^^^
+
+Matches any protocol in place of the current layer, a single ANY may also
+stand for several protocol layers.
+
+This is usually specified as the first pattern item when looking for a
+protocol anywhere in a packet.
+
++-----------------------------------------------------------+
+| ANY                                                       |
++==========+=========+======================================+
+| ``spec`` | ``num`` | number of layers covered             |
++----------+---------+--------------------------------------+
+| ``last`` | ``num`` | upper range value                    |
++----------+---------+--------------------------------------+
+| ``mask`` | ``num`` | zeroed to cover any number of layers |
++----------+---------+--------------------------------------+
+
+Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
+and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
+or IPv6) matched by the second ANY specification:
+
++----------------------------------+
+| TCP in VXLAN with wildcards      |
++===+==============================+
+| 0 | Ethernet                     |
++---+-----+----------+---------+---+
+| 1 | ANY | ``spec`` | ``num`` | 2 |
++---+-----+----------+---------+---+
+| 2 | VXLAN                        |
++---+------------------------------+
+| 3 | Ethernet                     |
++---+-----+----------+---------+---+
+| 4 | ANY | ``spec`` | ``num`` | 1 |
++---+-----+----------+---------+---+
+| 5 | TCP                          |
++---+------------------------------+
+| 6 | END                          |
++---+------------------------------+
+
+``RAW``
+^^^^^^^
+
+Matches a byte string of a given length at a given offset.
+
+Offset is either absolute (using the start of the packet) or relative to the
+end of the previous matched item in the stack, in which case negative values
+are allowed.
+
+If search is enabled, offset is used as the starting point. The search area
+can be delimited by setting limit to a nonzero value, which is the maximum
+number of bytes after offset where the pattern may start.
+
+Matching a zero-length pattern is allowed, doing so resets the relative
+offset for subsequent items.
+
+- This type does not support ranges (``last`` field).
+
++---------------------------------------------------------------------------+
+| RAW                                                                       |
++==========+==============+=================================================+
+| ``spec`` | ``relative`` | look for pattern after the previous item        |
+|          +--------------+-------------------------------------------------+
+|          | ``search``   | search pattern from offset (see also ``limit``) |
+|          +--------------+-------------------------------------------------+
+|          | ``reserved`` | reserved, must be set to zero                   |
+|          +--------------+-------------------------------------------------+
+|          | ``offset``   | absolute or relative offset for ``pattern``     |
+|          +--------------+-------------------------------------------------+
+|          | ``limit``    | search area limit for start of ``pattern``      |
+|          +--------------+-------------------------------------------------+
+|          | ``length``   | ``pattern`` length                              |
+|          +--------------+-------------------------------------------------+
+|          | ``pattern``  | byte string to look for                         |
++----------+--------------+-------------------------------------------------+
+| ``last`` | if specified, either all 0 or with the same values as ``spec`` |
++----------+----------------------------------------------------------------+
+| ``mask`` | bit-mask applied to ``spec`` values with usual behavior        |
++----------+----------------------------------------------------------------+
+
+Example pattern looking for several strings at various offsets of a UDP
+payload, using combined RAW items:
+
++-------------------------------------------+
+| UDP payload matching                      |
++===+=======================================+
+| 0 | Ethernet                              |
++---+---------------------------------------+
+| 1 | IPv4                                  |
++---+---------------------------------------+
+| 2 | UDP                                   |
++---+-----+----------+--------------+-------+
+| 3 | RAW | ``spec`` | ``relative`` | 1     |
+|   |     |          +--------------+-------+
+|   |     |          | ``search``   | 1     |
+|   |     |          +--------------+-------+
+|   |     |          | ``offset``   | 10    |
+|   |     |          +--------------+-------+
+|   |     |          | ``limit``    | 0     |
+|   |     |          +--------------+-------+
+|   |     |          | ``length``   | 3     |
+|   |     |          +--------------+-------+
+|   |     |          | ``pattern``  | "foo" |
++---+-----+----------+--------------+-------+
+| 4 | RAW | ``spec`` | ``relative`` | 1     |
+|   |     |          +--------------+-------+
+|   |     |          | ``search``   | 0     |
+|   |     |          +--------------+-------+
+|   |     |          | ``offset``   | 20    |
+|   |     |          +--------------+-------+
+|   |     |          | ``limit``    | 0     |
+|   |     |          +--------------+-------+
+|   |     |          | ``length``   | 3     |
+|   |     |          +--------------+-------+
+|   |     |          | ``pattern``  | "bar" |
++---+-----+----------+--------------+-------+
+| 5 | RAW | ``spec`` | ``relative`` | 1     |
+|   |     |          +--------------+-------+
+|   |     |          | ``search``   | 0     |
+|   |     |          +--------------+-------+
+|   |     |          | ``offset``   | -29   |
+|   |     |          +--------------+-------+
+|   |     |          | ``limit``    | 0     |
+|   |     |          +--------------+-------+
+|   |     |          | ``length``   | 3     |
+|   |     |          +--------------+-------+
+|   |     |          | ``pattern``  | "baz" |
++---+-----+----------+--------------+-------+
+| 6 | END                                   |
++---+---------------------------------------+
+
+This translates to:
+
+- Locate "foo" at least 10 bytes deep inside UDP payload.
+- Locate "bar" after "foo" plus 20 bytes.
+- Locate "baz" after "bar" minus 29 bytes.
+
+Such a packet may be represented as follows (not to scale)::
+
+ 0                     >= 10 B           == 20 B
+ |                  |<--------->|     |<--------->|
+ |                  |           |     |           |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+ | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+                          |                             |
+                          |<--------------------------->|
+                                      == 29 B
+
+Note that matching subsequent pattern items would resume after "baz", not
+"bar" since matching is always performed after the previous item of the
+stack.
+
+``ETH``
+^^^^^^^
+
+Matches an Ethernet header.
+
+- ``dst``: destination MAC.
+- ``src``: source MAC.
+- ``type``: EtherType.
+
+``VLAN``
+^^^^^^^^
+
+Matches an 802.1Q/ad VLAN tag.
+
+- ``tpid``: tag protocol identifier.
+- ``tci``: tag control information.
+
+``IPV4``
+^^^^^^^^
+
+Matches an IPv4 header.
+
+Note: IPv4 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv4 header definition (``rte_ip.h``).
+
+``IPV6``
+^^^^^^^^
+
+Matches an IPv6 header.
+
+Note: IPv6 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv6 header definition (``rte_ip.h``).
+
+``ICMP``
+^^^^^^^^
+
+Matches an ICMP header.
+
+- ``hdr``: ICMP header definition (``rte_icmp.h``).
+
+``UDP``
+^^^^^^^
+
+Matches a UDP header.
+
+- ``hdr``: UDP header definition (``rte_udp.h``).
+
+``TCP``
+^^^^^^^
+
+Matches a TCP header.
+
+- ``hdr``: TCP header definition (``rte_tcp.h``).
+
+``SCTP``
+^^^^^^^^
+
+Matches a SCTP header.
+
+- ``hdr``: SCTP header definition (``rte_sctp.h``).
+
+``VXLAN``
+^^^^^^^^^
+
+Matches a VXLAN header (RFC 7348).
+
+- ``flags``: normally 0x08 (I flag).
+- ``rsvd0``: reserved, normally 0x000000.
+- ``vni``: VXLAN network identifier.
+- ``rsvd1``: reserved, normally 0x00.
+
+Actions
+~~~~~~~
+
+Each possible action is represented by a type. Some have associated
+configuration structures. Several actions combined in a list can be affected
+to a flow rule. That list is not ordered.
+
+They fall in three categories:
+
+- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+  processing matched packets by subsequent flow rules, unless overridden
+  with PASSTHRU.
+
+- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for
+  additional processing by subsequent flow rules.
+
+- Other non-terminating meta actions that do not affect the fate of packets
+  (END, VOID, MARK, FLAG, COUNT).
+
+When several actions are combined in a flow rule, they should all have
+different types (e.g. dropping a packet twice is not possible).
+
+Only the last action of a given type is taken into account. PMDs still
+perform error checking on the entire list.
+
+Like matching patterns, action lists are terminated by END items.
+
+*Note that PASSTHRU is the only action able to override a terminating rule.*
+
+Example of action that redirects packets to queue index 10:
+
++----------------+
+| QUEUE          |
++===========+====+
+| ``index`` | 10 |
++-----------+----+
+
+Action lists examples, their order is not significant, applications must
+consider all actions to be performed simultaneously:
+
++----------------+
+| Count and drop |
++================+
+| COUNT          |
++----------------+
+| DROP           |
++----------------+
+| END            |
++----------------+
+
+|
+
++--------------------------+
+| Mark, count and redirect |
++=======+===========+======+
+| MARK  | ``mark``  | 0x2a |
++-------+-----------+------+
+| COUNT                    |
++-------+-----------+------+
+| QUEUE | ``queue`` | 10   |
++-------+-----------+------+
+| END                      |
++--------------------------+
+
+|
+
++-----------------------+
+| Redirect to queue 5   |
++=======================+
+| DROP                  |
++-------+-----------+---+
+| QUEUE | ``queue`` | 5 |
++-------+-----------+---+
+| END                   |
++-----------------------+
+
+In the above example, considering both actions are performed simultaneously,
+the end result is that only QUEUE has any effect.
+
++-----------------------+
+| Redirect to queue 3   |
++=======+===========+===+
+| QUEUE | ``queue`` | 5 |
++-------+-----------+---+
+| VOID                  |
++-------+-----------+---+
+| QUEUE | ``queue`` | 3 |
++-------+-----------+---+
+| END                   |
++-----------------------+
+
+As previously described, only the last action of a given type found in the
+list is taken into account. The above example also shows that VOID is
+ignored.
+
+Action types
+~~~~~~~~~~~~
+
+Common action types are described in this section. Like pattern item types,
+this list is not exhaustive as new actions will be added in the future.
+
+``END`` (action)
+^^^^^^^^^^^^^^^^
+
+End marker for action lists. Prevents further processing of actions, thereby
+ending the list.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- No configurable properties.
+
++---------------+
+| END           |
++===============+
+| no properties |
++---------------+
+
+``VOID`` (action)
+^^^^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- No configurable properties.
+
++---------------+
+| VOID          |
++===============+
+| no properties |
++---------------+
+
+``PASSTHRU``
+^^^^^^^^^^^^
+
+Leaves packets up for additional processing by subsequent flow rules. This
+is the default when a rule does not contain a terminating action, but can be
+specified to force a rule to become non-terminating.
+
+- No configurable properties.
+
++---------------+
+| PASSTHRU      |
++===============+
+| no properties |
++---------------+
+
+Example to copy a packet to a queue and continue processing by subsequent
+flow rules:
+
++--------------------------+
+| Copy to queue 8          |
++==========================+
+| PASSTHRU                 |
++----------+-----------+---+
+| QUEUE    | ``queue`` | 8 |
++----------+-----------+---+
+| END                      |
++--------------------------+
+
+``MARK``
+^^^^^^^^
+
+Attaches a 32 bit value to packets.
+
+This value is arbitrary and application-defined. For compatibility with FDIR
+it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
+also set in ``ol_flags``.
+
++----------------------------------------------+
+| MARK                                         |
++========+=====================================+
+| ``id`` | 32 bit value to return with packets |
++--------+-------------------------------------+
+
+``FLAG``
+^^^^^^^^
+
+Flag packets. Similar to `MARK`_ but only affects ``ol_flags``.
+
+- No configurable properties.
+
+Note: a distinctive flag must be defined for it.
+
++---------------+
+| FLAG          |
++===============+
+| no properties |
++---------------+
+
+``QUEUE``
+^^^^^^^^^
+
+Assigns packets to a given queue index.
+
+- Terminating by default.
+
++--------------------------------+
+| QUEUE                          |
++===========+====================+
+| ``index`` | queue index to use |
++-----------+--------------------+
+
+``DROP``
+^^^^^^^^
+
+Drop packets.
+
+- No configurable properties.
+- Terminating by default.
+- PASSTHRU overrides this action if both are specified.
+
++---------------+
+| DROP          |
++===============+
+| no properties |
++---------------+
+
+``COUNT``
+^^^^^^^^^
+
+Enables counters for this rule.
+
+These counters can be retrieved and reset through ``rte_flow_query()``, see
+``struct rte_flow_query_count``.
+
+- Counters can be retrieved with ``rte_flow_query()``.
+- No configurable properties.
+
++---------------+
+| COUNT         |
++===============+
+| no properties |
++---------------+
+
+Query structure to retrieve and reset flow rule counters:
+
++---------------------------------------------------------+
+| COUNT query                                             |
++===============+=====+===================================+
+| ``reset``     | in  | reset counter after query         |
++---------------+-----+-----------------------------------+
+| ``hits_set``  | out | ``hits`` field is set             |
++---------------+-----+-----------------------------------+
+| ``bytes_set`` | out | ``bytes`` field is set            |
++---------------+-----+-----------------------------------+
+| ``hits``      | out | number of hits for this rule      |
++---------------+-----+-----------------------------------+
+| ``bytes``     | out | number of bytes through this rule |
++---------------+-----+-----------------------------------+
+
+``DUP``
+^^^^^^^
+
+Duplicates packets to a given queue index.
+
+This is normally combined with QUEUE, however when used alone, it is
+actually similar to QUEUE + PASSTHRU.
+
+- Non-terminating by default.
+
++------------------------------------------------+
+| DUP                                            |
++===========+====================================+
+| ``index`` | queue index to duplicate packet to |
++-----------+------------------------------------+
+
+``RSS``
+^^^^^^^
+
+Similar to QUEUE, except RSS is additionally performed on packets to spread
+them among several queues according to the provided parameters.
+
+Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
+however it conflicts with the `MARK`_ action as they share the same
+space. When both actions are specified, the RSS hash is discarded and
+``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
+structure should eventually evolve to store both.
+
+- Terminating by default.
+
++---------------------------------------------+
+| RSS                                         |
++==============+==============================+
+| ``rss_conf`` | RSS parameters               |
++--------------+------------------------------+
+| ``num``      | number of entries in queue[] |
++--------------+------------------------------+
+| ``queue[]``  | queue indices to use         |
++--------------+------------------------------+
+
+``PF`` (action)
+^^^^^^^^^^^^^^^
+
+Redirects packets to the physical function (PF) of the current device.
+
+- No configurable properties.
+- Terminating by default.
+
++---------------+
+| PF            |
++===============+
+| no properties |
++---------------+
+
+``VF`` (action)
+^^^^^^^^^^^^^^^
+
+Redirects packets to a virtual function (VF) of the current device.
+
+Packets matched by a VF pattern item can be redirected to their original VF
+ID instead of the specified one. This parameter may not be available and is
+not guaranteed to work properly if the VF part is matched by a prior flow
+rule or if packets are not addressed to a VF in the first place.
+
+- Terminating by default.
+
++-----------------------------------------------+
+| VF                                            |
++==============+================================+
+| ``original`` | use original VF ID if possible |
++--------------+--------------------------------+
+| ``vf``       | VF ID to redirect packets to   |
++--------------+--------------------------------+
+
+Negative types
+~~~~~~~~~~~~~~
+
+All specified pattern items (``enum rte_flow_item_type``) and actions
+(``enum rte_flow_action_type``) use positive identifiers.
+
+The negative space is reserved for dynamic types generated by PMDs during
+run-time. PMDs may encounter them as a result but must not accept negative
+identifiers they are not aware of.
+
+A method to generate them remains to be defined.
+
+Planned types
+~~~~~~~~~~~~~
+
+Pattern item types will be added as new protocols are implemented.
+
+Variable headers support through dedicated pattern items, for example in
+order to match specific IPv4 options and IPv6 extension headers would be
+stacked after IPv4/IPv6 items.
+
+Other action types are planned but are not defined yet. These include the
+ability to alter packet data in several ways, such as performing
+encapsulation/decapsulation of tunnel headers.
+
+Rules management
+----------------
+
+A rather simple API with few functions is provided to fully manage flow
+rules.
+
+Each created flow rule is associated with an opaque, PMD-specific handle
+pointer. The application is responsible for keeping it until the rule is
+destroyed.
+
+Flows rules are represented by ``struct rte_flow`` objects.
+
+Validation
+~~~~~~~~~~
+
+Given that expressing a definite set of device capabilities is not
+practical, a dedicated function is provided to check if a flow rule is
+supported and can be created.
+
+::
+
+ int
+ rte_flow_validate(uint8_t port_id,
+                   const struct rte_flow_attr *attr,
+                   const struct rte_flow_item pattern[],
+                   const struct rte_flow_action actions[],
+                   struct rte_flow_error *error);
+
+While this function has no effect on the target device, the flow rule is
+validated against its current configuration state and the returned value
+should be considered valid by the caller for that state only.
+
+The returned value is guaranteed to remain valid only as long as no
+successful calls to ``rte_flow_create()`` or ``rte_flow_destroy()`` are made
+in the meantime and no device parameter affecting flow rules in any way are
+modified, due to possible collisions or resource limitations (although in
+such cases ``EINVAL`` should not be returned).
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+- 0 if flow rule is valid and can be created. A negative errno value
+  otherwise (``rte_errno`` is also set), the following errors are defined.
+- ``-ENOSYS``: underlying device does not support this functionality.
+- ``-EINVAL``: unknown or invalid rule specification.
+- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
+  bit-masks are unsupported).
+- ``-EEXIST``: collision with an existing rule.
+- ``-ENOMEM``: not enough resources.
+- ``-EBUSY``: action cannot be performed due to busy device resources, may
+  succeed if the affected queues or even the entire port are in a stopped
+  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
+
+Creation
+~~~~~~~~
+
+Creating a flow rule is similar to validating one, except the rule is
+actually created and a handle returned.
+
+::
+
+ struct rte_flow *
+ rte_flow_create(uint8_t port_id,
+                 const struct rte_flow_attr *attr,
+                 const struct rte_flow_item pattern[],
+                 const struct rte_flow_action *actions[],
+                 struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
+to the positive version of one of the error codes defined for
+``rte_flow_validate()``.
+
+Destruction
+~~~~~~~~~~~
+
+Flow rules destruction is not automatic, and a queue or a port should not be
+released if any are still attached to them. Applications must take care of
+performing this step before releasing resources.
+
+::
+
+ int
+ rte_flow_destroy(uint8_t port_id,
+                  struct rte_flow *flow,
+                  struct rte_flow_error *error);
+
+
+Failure to destroy a flow rule handle may occur when other flow rules depend
+on it, and destroying it would result in an inconsistent state.
+
+This function is only guaranteed to succeed if handles are destroyed in
+reverse order of their creation.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to destroy.
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Flush
+~~~~~
+
+Convenience function to destroy all flow rule handles associated with a
+port. They are released as with successive calls to ``rte_flow_destroy()``.
+
+::
+
+ int
+ rte_flow_flush(uint8_t port_id,
+                struct rte_flow_error *error);
+
+In the unlikely event of failure, handles are still considered destroyed and
+no longer valid but the port must be assumed to be in an inconsistent state.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Query
+~~~~~
+
+Query an existing flow rule.
+
+This function allows retrieving flow-specific data such as counters. Data
+is gathered by special actions which must be present in the flow rule
+definition.
+
+::
+
+ int
+ rte_flow_query(uint8_t port_id,
+                struct rte_flow *flow,
+                enum rte_flow_action_type action,
+                void *data,
+                struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to query.
+- ``action``: action type to query.
+- ``data``: pointer to storage for the associated query data type.
+- ``error``: perform verbose error reporting if not NULL.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Verbose error reporting
+-----------------------
+
+The defined *errno* values may not be accurate enough for users or
+application developers who want to investigate issues related to flow rules
+management. A dedicated error object is defined for this purpose::
+
+ enum rte_flow_error_type {
+     RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+     RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+     RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+     RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+     RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+     RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+     RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+     RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+     RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+     RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+     RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+     RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+ };
+
+ struct rte_flow_error {
+     enum rte_flow_error_type type; /**< Cause field and error types. */
+     const void *cause; /**< Object responsible for the error. */
+     const char *message; /**< Human-readable error message. */
+ };
+
+Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
+remaining fields can be ignored. Other error types describe the type of the
+object pointed by ``cause``.
+
+If non-NULL, ``cause`` points to the object responsible for the error. For a
+flow rule, this may be a pattern item or an individual action.
+
+If non-NULL, ``message`` provides a human-readable error message.
+
+This object is normally allocated by applications and set by PMDs, the
+message points to a constant string which does not need to be freed by the
+application, however its pointer can be considered valid only as long as its
+associated DPDK port remains configured. Closing the underlying device or
+unloading the PMD invalidates it.
+
+Caveats
+-------
+
+- DPDK does not keep track of flow rules definitions or flow rule objects
+  automatically. Applications may keep track of the former and must keep
+  track of the latter. PMDs may also do it for internal needs, however this
+  must not be relied on by applications.
+
+- Flow rules are not maintained between successive port initializations. An
+  application exiting without releasing them and restarting must re-create
+  them from scratch.
+
+- API operations are synchronous and blocking (``EAGAIN`` cannot be
+  returned).
+
+- There is no provision for reentrancy/multi-thread safety, although nothing
+  should prevent different devices from being configured at the same
+  time. PMDs may protect their control path functions accordingly.
+
+- Stopping the data path (TX/RX) should not be necessary when managing flow
+  rules. If this cannot be achieved naturally or with workarounds (such as
+  temporarily replacing the burst function pointers), an appropriate error
+  code must be returned (``EBUSY``).
+
+- PMDs, not applications, are responsible for maintaining flow rules
+  configuration when stopping and restarting a port or performing other
+  actions which may affect them. They can only be destroyed explicitly by
+  applications.
+
+For devices exposing multiple ports sharing global settings affected by flow
+rules:
+
+- All ports under DPDK control must behave consistently, PMDs are
+  responsible for making sure that existing flow rules on a port are not
+  affected by other ports.
+
+- Ports not under DPDK control (unaffected or handled by other applications)
+  are user's responsibility. They may affect existing flow rules and cause
+  undefined behavior. PMDs aware of this may prevent flow rules creation
+  altogether in such cases.
+
+PMD interface
+-------------
+
+The PMD interface is defined in ``rte_flow_driver.h``. It is not subject to
+API/ABI versioning constraints as it is not exposed to applications and may
+evolve independently.
+
+It is currently implemented on top of the legacy filtering framework through
+filter type *RTE_ETH_FILTER_GENERIC* that accepts the single operation
+*RTE_ETH_FILTER_GET* to return PMD-specific *rte_flow* callbacks wrapped
+inside ``struct rte_flow_ops``.
+
+This overhead is temporarily necessary in order to keep compatibility with
+the legacy filtering framework, which should eventually disappear.
+
+- PMD callbacks implement exactly the interface described in `Rules
+  management`_, except for the port ID argument which has already been
+  converted to a pointer to the underlying ``struct rte_eth_dev``.
+
+- Public API functions do not process flow rules definitions at all before
+  calling PMD functions (no basic error checking, no validation
+  whatsoever). They only make sure these callbacks are non-NULL or return
+  the ``ENOSYS`` (function not supported) error.
+
+This interface additionally defines the following helper functions:
+
+- ``rte_flow_ops_get()``: get generic flow operations structure from a
+  port.
+
+- ``rte_flow_error_set()``: initialize generic flow error structure.
+
+More will be added over time.
+
+Device compatibility
+--------------------
+
+No known implementation supports all the described features.
+
+Unsupported features or combinations are not expected to be fully emulated
+in software by PMDs for performance reasons. Partially supported features
+may be completed in software as long as hardware performs most of the work
+(such as queue redirection and packet recognition).
+
+However PMDs are expected to do their best to satisfy application requests
+by working around hardware limitations as long as doing so does not affect
+the behavior of existing flow rules.
+
+The following sections provide a few examples of such cases and describe how
+PMDs should handle them, they are based on limitations built into the
+previous APIs.
+
+Global bit-masks
+~~~~~~~~~~~~~~~~
+
+Each flow rule comes with its own, per-layer bit-masks, while hardware may
+support only a single, device-wide bit-mask for a given layer type, so that
+two IPv4 rules cannot use different bit-masks.
+
+The expected behavior in this case is that PMDs automatically configure
+global bit-masks according to the needs of the first flow rule created.
+
+Subsequent rules are allowed only if their bit-masks match those, the
+``EEXIST`` error code should be returned otherwise.
+
+Unsupported layer types
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Many protocols can be simulated by crafting patterns with the `RAW`_ type.
+
+PMDs can rely on this capability to simulate support for protocols with
+headers not directly recognized by hardware.
+
+``ANY`` pattern item
+~~~~~~~~~~~~~~~~~~~~
+
+This pattern item stands for anything, which can be difficult to translate
+to something hardware would understand, particularly if followed by more
+specific types.
+
+Consider the following pattern:
+
++---+-------------------------+
+| 0 | ETHER                   |
++---+-------+---------+-------+
+| 1 | ANY   | ``num`` | ``1`` |
++---+-------+---------+-------+
+| 2 | TCP                     |
++---+-------------------------+
+| 3 | END                     |
++---+-------------------------+
+
+Knowing that TCP does not make sense with something other than IPv4 and IPv6
+as L3, such a pattern may be translated to two flow rules instead:
+
++---+--------------------+
+| 0 | ETHER              |
++---+--------------------+
+| 1 | IPV4 (zeroed mask) |
++---+--------------------+
+| 2 | TCP                |
++---+--------------------+
+| 3 | END                |
++---+--------------------+
+
+..
+
++---+--------------------+
+| 0 | ETHER              |
++---+--------------------+
+| 1 | IPV6 (zeroed mask) |
++---+--------------------+
+| 2 | TCP                |
++---+--------------------+
+| 3 | END                |
++---+--------------------+
+
+Note that as soon as a ANY rule covers several layers, this approach may
+yield a large number of hidden flow rules. It is thus suggested to only
+support the most common scenarios (anything as L2 and/or L3).
+
+Unsupported actions
+~~~~~~~~~~~~~~~~~~~
+
+- When combined with a `QUEUE`_ action, packet counting (`COUNT`_) and
+  tagging (`MARK`_ or `FLAG`_) may be implemented in software as long as the
+  target queue is used by a single rule.
+
+- A rule specifying both `DUP`_ + `QUEUE`_ may be translated to two hidden
+  rules combining `QUEUE`_ and `PASSTHRU`_.
+
+- When a single target queue is provided, `RSS`_ can also be implemented
+  through `QUEUE`_.
+
+Flow rules priority
+~~~~~~~~~~~~~~~~~~~
+
+While it would naturally make sense, flow rules cannot be assumed to be
+processed by hardware in the same order as their creation for several
+reasons:
+
+- They may be managed internally as a tree or a hash table instead of a
+  list.
+- Removing a flow rule before adding another one can either put the new rule
+  at the end of the list or reuse a freed entry.
+- Duplication may occur when packets are matched by several rules.
+
+For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
+predictable behavior is only guaranteed by using different priority levels.
+
+Priority levels are not necessarily implemented in hardware, or may be
+severely limited (e.g. a single priority bit).
+
+For these reasons, priority levels may be implemented purely in software by
+PMDs.
+
+- For devices expecting flow rules to be added in the correct order, PMDs
+  may destroy and re-create existing rules after adding a new one with
+  a higher priority.
+
+- A configurable number of dummy or empty rules can be created at
+  initialization time to save high priority slots for later.
+
+- In order to save priority levels, PMDs may evaluate whether rules are
+  likely to collide and adjust their priority accordingly.
+
+Future evolutions
+-----------------
+
+- A device profile selection function which could be used to force a
+  permanent profile instead of relying on its automatic configuration based
+  on existing flow rules.
+
+- A method to optimize *rte_flow* rules with specific pattern items and
+  action types generated on the fly by PMDs. DPDK should assign negative
+  numbers to these in order to not collide with the existing types. See
+  `Negative types`_.
+
+- Adding specific egress pattern items and actions as described in `Traffic
+  direction`_.
+
+- Optional software fallback when PMDs are unable to handle requested flow
+  rules so applications do not have to implement their own.
+
+API migration
+-------------
+
+Exhaustive list of deprecated filter types (normally prefixed with
+*RTE_ETH_FILTER_*) found in ``rte_eth_ctrl.h`` and methods to convert them
+to *rte_flow* rules.
+
+``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*MACVLAN* can be translated to a basic `ETH`_ flow rule with a `VF
+(action)`_ or `PF (action)`_ terminating action.
+
++------------------------------------+
+| MACVLAN                            |
++--------------------------+---------+
+| Pattern                  | Actions |
++===+=====+==========+=====+=========+
+| 0 | ETH | ``spec`` | any | VF,     |
+|   |     +----------+-----+ PF      |
+|   |     | ``last`` | N/A |         |
+|   |     +----------+-----+         |
+|   |     | ``mask`` | any |         |
++---+-----+----------+-----+---------+
+| 1 | END                  | END     |
++---+----------------------+---------+
+
+``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*ETHERTYPE* is basically an `ETH`_ flow rule with `QUEUE`_ or `DROP`_ as a
+terminating action.
+
++------------------------------------+
+| ETHERTYPE                          |
++--------------------------+---------+
+| Pattern                  | Actions |
++===+=====+==========+=====+=========+
+| 0 | ETH | ``spec`` | any | QUEUE,  |
+|   |     +----------+-----+ DROP    |
+|   |     | ``last`` | N/A |         |
+|   |     +----------+-----+         |
+|   |     | ``mask`` | any |         |
++---+-----+----------+-----+---------+
+| 1 | END                  | END     |
++---+----------------------+---------+
+
+``FLEXIBLE`` to ``RAW`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FLEXIBLE* can be translated to one `RAW`_ pattern with `QUEUE`_ as the
+terminating action and a defined priority level.
+
++------------------------------------+
+| FLEXIBLE                           |
++--------------------------+---------+
+| Pattern                  | Actions |
++===+=====+==========+=====+=========+
+| 0 | RAW | ``spec`` | any | QUEUE   |
+|   |     +----------+-----+         |
+|   |     | ``last`` | N/A |         |
+|   |     +----------+-----+         |
+|   |     | ``mask`` | any |         |
++---+-----+----------+-----+---------+
+| 1 | END                  | END     |
++---+----------------------+---------+
+
+``SYN`` to ``TCP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*SYN* is a `TCP`_ rule with only the ``syn`` bit enabled and masked, and
+`QUEUE`_ as the terminating action.
+
+Priority level can be set to simulate the high priority bit.
+
++---------------------------------------------+
+| SYN                                         |
++-----------------------------------+---------+
+| Pattern                           | Actions |
++===+======+==========+=============+=========+
+| 0 | ETH  | ``spec`` | unset       | QUEUE   |
+|   |      +----------+-------------+         |
+|   |      | ``last`` | unset       |         |
+|   |      +----------+-------------+         |
+|   |      | ``mask`` | unset       |         |
++---+------+----------+-------------+         |
+| 1 | IPV4 | ``spec`` | unset       |         |
+|   |      +----------+-------------+         |
+|   |      | ``mask`` | unset       |         |
+|   |      +----------+-------------+         |
+|   |      | ``mask`` | unset       |         |
++---+------+----------+---------+---+         |
+| 2 | TCP  | ``spec`` | ``syn`` | 1 |         |
+|   |      +----------+---------+---+         |
+|   |      | ``mask`` | ``syn`` | 1 |         |
++---+------+----------+---------+---+---------+
+| 3 | END                           | END     |
++---+-------------------------------+---------+
+
+``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*NTUPLE* is similar to specifying an empty L2, `IPV4`_ as L3 with `TCP`_ or
+`UDP`_ as L4 and `QUEUE`_ as the terminating action.
+
+A priority level can be specified as well.
+
++---------------------------------------+
+| NTUPLE                                |
++-----------------------------+---------+
+| Pattern                     | Actions |
++===+======+==========+=======+=========+
+| 0 | ETH  | ``spec`` | unset | QUEUE   |
+|   |      +----------+-------+         |
+|   |      | ``last`` | unset |         |
+|   |      +----------+-------+         |
+|   |      | ``mask`` | unset |         |
++---+------+----------+-------+         |
+| 1 | IPV4 | ``spec`` | any   |         |
+|   |      +----------+-------+         |
+|   |      | ``last`` | unset |         |
+|   |      +----------+-------+         |
+|   |      | ``mask`` | any   |         |
++---+------+----------+-------+         |
+| 2 | TCP, | ``spec`` | any   |         |
+|   | UDP  +----------+-------+         |
+|   |      | ``last`` | unset |         |
+|   |      +----------+-------+         |
+|   |      | ``mask`` | any   |         |
++---+------+----------+-------+---------+
+| 3 | END                     | END     |
++---+-------------------------+---------+
+
+``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*TUNNEL* matches common IPv4 and IPv6 L3/L4-based tunnel types.
+
+In the following table, `ANY`_ is used to cover the optional L4.
+
++------------------------------------------------+
+| TUNNEL                                         |
++--------------------------------------+---------+
+| Pattern                              | Actions |
++===+=========+==========+=============+=========+
+| 0 | ETH     | ``spec`` | any         | QUEUE   |
+|   |         +----------+-------------+         |
+|   |         | ``last`` | unset       |         |
+|   |         +----------+-------------+         |
+|   |         | ``mask`` | any         |         |
++---+---------+----------+-------------+         |
+| 1 | IPV4,   | ``spec`` | any         |         |
+|   | IPV6    +----------+-------------+         |
+|   |         | ``last`` | unset       |         |
+|   |         +----------+-------------+         |
+|   |         | ``mask`` | any         |         |
++---+---------+----------+-------------+         |
+| 2 | ANY     | ``spec`` | any         |         |
+|   |         +----------+-------------+         |
+|   |         | ``last`` | unset       |         |
+|   |         +----------+---------+---+         |
+|   |         | ``mask`` | ``num`` | 0 |         |
++---+---------+----------+---------+---+         |
+| 3 | VXLAN,  | ``spec`` | any         |         |
+|   | GENEVE, +----------+-------------+         |
+|   | TEREDO, | ``last`` | unset       |         |
+|   | NVGRE,  +----------+-------------+         |
+|   | GRE,    | ``mask`` | any         |         |
+|   | ...     |          |             |         |
+|   |         |          |             |         |
+|   |         |          |             |         |
++---+---------+----------+-------------+---------+
+| 4 | END                              | END     |
++---+----------------------------------+---------+
+
+``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FDIR* is more complex than any other type, there are several methods to
+emulate its functionality. It is summarized for the most part in the table
+below.
+
+A few features are intentionally not supported:
+
+- The ability to configure the matching input set and masks for the entire
+  device, PMDs should take care of it automatically according to the
+  requested flow rules.
+
+  For example if a device supports only one bit-mask per protocol type,
+  source/address IPv4 bit-masks can be made immutable by the first created
+  rule. Subsequent IPv4 or TCPv4 rules can only be created if they are
+  compatible.
+
+  Note that only protocol bit-masks affected by existing flow rules are
+  immutable, others can be changed later. They become mutable again after
+  the related flow rules are destroyed.
+
+- Returning four or eight bytes of matched data when using flex bytes
+  filtering. Although a specific action could implement it, it conflicts
+  with the much more useful 32 bits tagging on devices that support it.
+
+- Side effects on RSS processing of the entire device. Flow rules that
+  conflict with the current device configuration should not be
+  allowed. Similarly, device configuration should not be allowed when it
+  affects existing flow rules.
+
+- Device modes of operation. "none" is unsupported since filtering cannot be
+  disabled as long as a flow rule is present.
+
+- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
+  according to the created flow rules.
+
+- Signature mode of operation is not defined but could be handled through a
+  specific item type if needed.
+
++----------------------------------------------+
+| FDIR                                         |
++---------------------------------+------------+
+| Pattern                         | Actions    |
++===+============+==========+=====+============+
+| 0 | ETH,       | ``spec`` | any | QUEUE,     |
+|   | RAW        +----------+-----+ DROP,      |
+|   |            | ``last`` | N/A | PASSTHRU   |
+|   |            +----------+-----+            |
+|   |            | ``mask`` | any |            |
++---+------------+----------+-----+------------+
+| 1 | IPV4,      | ``spec`` | any | MARK       |
+|   | IPV6       +----------+-----+            |
+|   |            | ``last`` | N/A |            |
+|   |            +----------+-----+            |
+|   |            | ``mask`` | any |            |
++---+------------+----------+-----+            |
+| 2 | TCP,       | ``spec`` | any |            |
+|   | UDP,       +----------+-----+            |
+|   | SCTP       | ``last`` | N/A |            |
+|   |            +----------+-----+            |
+|   |            | ``mask`` | any |            |
++---+------------+----------+-----+            |
+| 3 | VF,        | ``spec`` | any |            |
+|   | PF         +----------+-----+            |
+|   | (optional) | ``last`` | N/A |            |
+|   |            +----------+-----+            |
+|   |            | ``mask`` | any |            |
++---+------------+----------+-----+------------+
+| 4 | END                         | END        |
++---+-----------------------------+------------+
+
+
+``HASH``
+~~~~~~~~
+
+There is no counterpart to this filter type because it translates to a
+global device setting instead of a pattern item. Device settings are
+automatically set according to the created flow rules.
+
+``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All packets are matched. This type alters incoming packets to encapsulate
+them in a chosen tunnel type, optionally redirect them to a VF as well.
+
+The destination pool for tag based forwarding can be emulated with other
+flow rules using `DUP`_ as the action.
+
++----------------------------------------+
+| L2_TUNNEL                              |
++---------------------------+------------+
+| Pattern                   | Actions    |
++===+======+==========+=====+============+
+| 0 | VOID | ``spec`` | N/A | VXLAN,     |
+|   |      |          |     | GENEVE,    |
+|   |      |          |     | ...        |
+|   |      +----------+-----+------------+
+|   |      | ``last`` | N/A | VF         |
+|   |      +----------+-----+ (optional) |
+|   |      | ``mask`` | N/A |            |
+|   |      |          |     |            |
++---+------+----------+-----+------------+
+| 1 | END                   | END        |
++---+-----------------------+------------+
-- 
2.1.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v2 01/25] ethdev: introduce generic flow API
  @ 2016-12-16 16:24  2%     ` Adrien Mazarguil
  2016-12-16 16:24  1%     ` [dpdk-dev] [PATCH v2 02/25] doc: add rte_flow prog guide Adrien Mazarguil
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-16 16:24 UTC (permalink / raw)
  To: dev

This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

Benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

Existing filter types will be deprecated and removed in the near future.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 MAINTAINERS                            |   4 +
 doc/api/doxy-api-index.md              |   2 +
 lib/librte_ether/Makefile              |   3 +
 lib/librte_ether/rte_eth_ctrl.h        |   1 +
 lib/librte_ether/rte_ether_version.map |  11 +
 lib/librte_ether/rte_flow.c            | 159 +++++
 lib/librte_ether/rte_flow.h            | 942 ++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow_driver.h     | 181 ++++++
 8 files changed, 1303 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 26d9590..5975cff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -243,6 +243,10 @@ M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
 F: scripts/test-null.sh
 
+Generic flow API
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: lib/librte_ether/rte_flow*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index de65b4c..4951552 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -39,6 +39,8 @@ There are many libraries, so their headers may be grouped by topics:
   [dev]                (@ref rte_dev.h),
   [ethdev]             (@ref rte_ethdev.h),
   [ethctrl]            (@ref rte_eth_ctrl.h),
+  [rte_flow]           (@ref rte_flow.h),
+  [rte_flow_driver]    (@ref rte_flow_driver.h),
   [cryptodev]          (@ref rte_cryptodev.h),
   [devargs]            (@ref rte_devargs.h),
   [bond]               (@ref rte_eth_bond.h),
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..9335361 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
 LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
+SRCS-y += rte_flow.c
 
 #
 # Export include files
@@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
+SYMLINK-y-include += rte_flow_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index fe80eb0..8386904 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -99,6 +99,7 @@ enum rte_filter_type {
 	RTE_ETH_FILTER_FDIR,
 	RTE_ETH_FILTER_HASH,
 	RTE_ETH_FILTER_L2_TUNNEL,
+	RTE_ETH_FILTER_GENERIC,
 	RTE_ETH_FILTER_MAX
 };
 
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..384cdee 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -147,3 +147,14 @@ DPDK_16.11 {
 	rte_eth_dev_pci_remove;
 
 } DPDK_16.07;
+
+DPDK_17.02 {
+	global:
+
+	rte_flow_validate;
+	rte_flow_create;
+	rte_flow_destroy;
+	rte_flow_flush;
+	rte_flow_query;
+
+} DPDK_16.11;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
new file mode 100644
index 0000000..064963d
--- /dev/null
+++ b/lib/librte_ether/rte_flow.c
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_flow_driver.h"
+#include "rte_flow.h"
+
+/* Get generic flow operations structure from a port. */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops;
+	int code;
+
+	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
+		code = ENODEV;
+	else if (unlikely(!dev->dev_ops->filter_ctrl ||
+			  dev->dev_ops->filter_ctrl(dev,
+						    RTE_ETH_FILTER_GENERIC,
+						    RTE_ETH_FILTER_GET,
+						    &ops) ||
+			  !ops))
+		code = ENOTSUP;
+	else
+		return ops;
+	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(code));
+	return NULL;
+}
+
+/* Check whether a flow rule can be created on a given port. */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error)
+{
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->validate))
+		return ops->validate(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
+
+/* Create a flow rule on a given port. */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return NULL;
+	if (likely(!!ops->create))
+		return ops->create(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return NULL;
+}
+
+/* Destroy a flow rule on a given port. */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->destroy))
+		return ops->destroy(dev, flow, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
+
+/* Destroy all flow rules associated with a port. */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->flush))
+		return ops->flush(dev, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
+
+/* Query an existing flow rule. */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (!ops)
+		return -rte_errno;
+	if (likely(!!ops->query))
+		return ops->query(dev, flow, action, data, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
new file mode 100644
index 0000000..0bd5957
--- /dev/null
+++ b/lib/librte_ether/rte_flow.h
@@ -0,0 +1,942 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_H_
+#define RTE_FLOW_H_
+
+/**
+ * @file
+ * RTE generic flow API
+ *
+ * This interface provides the ability to program packet matching and
+ * associated actions in hardware through flow rules.
+ */
+
+#include <rte_arp.h>
+#include <rte_ether.h>
+#include <rte_icmp.h>
+#include <rte_ip.h>
+#include <rte_sctp.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Flow rule attributes.
+ *
+ * Priorities are set on two levels: per group and per rule within groups.
+ *
+ * Lower values denote higher priority, the highest priority for both levels
+ * is 0, so that a rule with priority 0 in group 8 is always matched after a
+ * rule with priority 8 in group 0.
+ *
+ * Although optional, applications are encouraged to group similar rules as
+ * much as possible to fully take advantage of hardware capabilities
+ * (e.g. optimized matching) and work around limitations (e.g. a single
+ * pattern type possibly allowed in a given group).
+ *
+ * Group and priority levels are arbitrary and up to the application, they
+ * do not need to be contiguous nor start from 0, however the maximum number
+ * varies between devices and may be affected by existing flow rules.
+ *
+ * If a packet is matched by several rules of a given group for a given
+ * priority level, the outcome is undefined. It can take any path, may be
+ * duplicated or even cause unrecoverable errors.
+ *
+ * Note that support for more than a single group and priority level is not
+ * guaranteed.
+ *
+ * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+ *
+ * Several pattern items and actions are valid and can be used in both
+ * directions. Those valid for only one direction are described as such.
+ *
+ * At least one direction must be specified.
+ *
+ * Specifying both directions at once for a given rule is not recommended
+ * but may be valid in a few cases (e.g. shared counter).
+ */
+struct rte_flow_attr {
+	uint32_t group; /**< Priority group. */
+	uint32_t priority; /**< Priority level within group. */
+	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
+	uint32_t egress:1; /**< Rule applies to egress traffic. */
+	uint32_t reserved:30; /**< Reserved, must be zero. */
+};
+
+/**
+ * Matching pattern item types.
+ *
+ * Pattern items fall in two categories:
+ *
+ * - Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+ *   IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+ *   specification structure. These must be stacked in the same order as the
+ *   protocol layers to match, starting from the lowest.
+ *
+ * - Matching meta-data or affecting pattern processing (END, VOID, INVERT,
+ *   PF, VF, PORT and so on), often without a specification structure. Since
+ *   they do not match packet contents, these can be specified anywhere
+ *   within item lists without affecting others.
+ *
+ * See the description of individual types for more information. Those
+ * marked with [META] fall into the second category.
+ */
+enum rte_flow_item_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for item lists. Prevents further processing of items,
+	 * thereby ending the pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_VOID,
+
+	/**
+	 * [META]
+	 *
+	 * Inverted matching, i.e. process packets that do not match the
+	 * pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_INVERT,
+
+	/**
+	 * Matches any protocol in place of the current layer, a single ANY
+	 * may also stand for several protocol layers.
+	 *
+	 * See struct rte_flow_item_any.
+	 */
+	RTE_FLOW_ITEM_TYPE_ANY,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to the physical function of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a PF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_PF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to a virtual function ID of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a VF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * See struct rte_flow_item_vf.
+	 */
+	RTE_FLOW_ITEM_TYPE_VF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets coming from the specified physical port of the
+	 * underlying device.
+	 *
+	 * The first PORT item overrides the physical port normally
+	 * associated with the specified DPDK input port (port_id). This
+	 * item can be provided several times to match additional physical
+	 * ports.
+	 *
+	 * See struct rte_flow_item_port.
+	 */
+	RTE_FLOW_ITEM_TYPE_PORT,
+
+	/**
+	 * Matches a byte string of a given length at a given offset.
+	 *
+	 * See struct rte_flow_item_raw.
+	 */
+	RTE_FLOW_ITEM_TYPE_RAW,
+
+	/**
+	 * Matches an Ethernet header.
+	 *
+	 * See struct rte_flow_item_eth.
+	 */
+	RTE_FLOW_ITEM_TYPE_ETH,
+
+	/**
+	 * Matches an 802.1Q/ad VLAN tag.
+	 *
+	 * See struct rte_flow_item_vlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VLAN,
+
+	/**
+	 * Matches an IPv4 header.
+	 *
+	 * See struct rte_flow_item_ipv4.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV4,
+
+	/**
+	 * Matches an IPv6 header.
+	 *
+	 * See struct rte_flow_item_ipv6.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6,
+
+	/**
+	 * Matches an ICMP header.
+	 *
+	 * See struct rte_flow_item_icmp.
+	 */
+	RTE_FLOW_ITEM_TYPE_ICMP,
+
+	/**
+	 * Matches a UDP header.
+	 *
+	 * See struct rte_flow_item_udp.
+	 */
+	RTE_FLOW_ITEM_TYPE_UDP,
+
+	/**
+	 * Matches a TCP header.
+	 *
+	 * See struct rte_flow_item_tcp.
+	 */
+	RTE_FLOW_ITEM_TYPE_TCP,
+
+	/**
+	 * Matches a SCTP header.
+	 *
+	 * See struct rte_flow_item_sctp.
+	 */
+	RTE_FLOW_ITEM_TYPE_SCTP,
+
+	/**
+	 * Matches a VXLAN header.
+	 *
+	 * See struct rte_flow_item_vxlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VXLAN,
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ANY
+ *
+ * Matches any protocol in place of the current layer, a single ANY may also
+ * stand for several protocol layers.
+ *
+ * This is usually specified as the first pattern item when looking for a
+ * protocol anywhere in a packet.
+ *
+ * A zeroed mask stands for any number of layers.
+ */
+struct rte_flow_item_any {
+	uint32_t num; /* Number of layers covered. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VF
+ *
+ * Matches packets addressed to a virtual function ID of the device.
+ *
+ * If the underlying device function differs from the one that would
+ * normally receive the matched traffic, specifying this item prevents it
+ * from reaching that device unless the flow rule contains a VF
+ * action. Packets are not duplicated between device instances by default.
+ *
+ * - Likely to return an error or never match any traffic if this causes a
+ *   VF device to match traffic addressed to a different VF.
+ * - Can be specified multiple times to match traffic addressed to several
+ *   VF IDs.
+ * - Can be combined with a PF item to match both PF and VF traffic.
+ *
+ * A zeroed mask can be used to match any VF ID.
+ */
+struct rte_flow_item_vf {
+	uint32_t id; /**< Destination VF ID. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_PORT
+ *
+ * Matches packets coming from the specified physical port of the underlying
+ * device.
+ *
+ * The first PORT item overrides the physical port normally associated with
+ * the specified DPDK input port (port_id). This item can be provided
+ * several times to match additional physical ports.
+ *
+ * Note that physical ports are not necessarily tied to DPDK input ports
+ * (port_id) when those are not under DPDK control. Possible values are
+ * specific to each device, they are not necessarily indexed from zero and
+ * may not be contiguous.
+ *
+ * As a device property, the list of allowed values as well as the value
+ * associated with a port_id should be retrieved by other means.
+ *
+ * A zeroed mask can be used to match any port index.
+ */
+struct rte_flow_item_port {
+	uint32_t index; /**< Physical port index. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_RAW
+ *
+ * Matches a byte string of a given length at a given offset.
+ *
+ * Offset is either absolute (using the start of the packet) or relative to
+ * the end of the previous matched item in the stack, in which case negative
+ * values are allowed.
+ *
+ * If search is enabled, offset is used as the starting point. The search
+ * area can be delimited by setting limit to a nonzero value, which is the
+ * maximum number of bytes after offset where the pattern may start.
+ *
+ * Matching a zero-length pattern is allowed, doing so resets the relative
+ * offset for subsequent items.
+ *
+ * This type does not support ranges (struct rte_flow_item.last).
+ */
+struct rte_flow_item_raw {
+	uint32_t relative:1; /**< Look for pattern after the previous item. */
+	uint32_t search:1; /**< Search pattern from offset (see also limit). */
+	uint32_t reserved:30; /**< Reserved, must be set to zero. */
+	int32_t offset; /**< Absolute or relative offset for pattern. */
+	uint16_t limit; /**< Search area limit for start of pattern. */
+	uint16_t length; /**< Pattern length. */
+	uint8_t pattern[]; /**< Byte string to look for. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ETH
+ *
+ * Matches an Ethernet header.
+ */
+struct rte_flow_item_eth {
+	struct ether_addr dst; /**< Destination MAC. */
+	struct ether_addr src; /**< Source MAC. */
+	uint16_t type; /**< EtherType. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VLAN
+ *
+ * Matches an 802.1Q/ad VLAN tag.
+ *
+ * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
+ * RTE_FLOW_ITEM_TYPE_VLAN.
+ */
+struct rte_flow_item_vlan {
+	uint16_t tpid; /**< Tag protocol identifier. */
+	uint16_t tci; /**< Tag control information. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV4
+ *
+ * Matches an IPv4 header.
+ *
+ * Note: IPv4 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv4 {
+	struct ipv4_hdr hdr; /**< IPv4 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV6.
+ *
+ * Matches an IPv6 header.
+ *
+ * Note: IPv6 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv6 {
+	struct ipv6_hdr hdr; /**< IPv6 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ICMP.
+ *
+ * Matches an ICMP header.
+ */
+struct rte_flow_item_icmp {
+	struct icmp_hdr hdr; /**< ICMP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_UDP.
+ *
+ * Matches a UDP header.
+ */
+struct rte_flow_item_udp {
+	struct udp_hdr hdr; /**< UDP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_TCP.
+ *
+ * Matches a TCP header.
+ */
+struct rte_flow_item_tcp {
+	struct tcp_hdr hdr; /**< TCP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_SCTP.
+ *
+ * Matches a SCTP header.
+ */
+struct rte_flow_item_sctp {
+	struct sctp_hdr hdr; /**< SCTP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VXLAN.
+ *
+ * Matches a VXLAN header (RFC 7348).
+ */
+struct rte_flow_item_vxlan {
+	uint8_t flags; /**< Normally 0x08 (I flag). */
+	uint8_t rsvd0[3]; /**< Reserved, normally 0x000000. */
+	uint8_t vni[3]; /**< VXLAN identifier. */
+	uint8_t rsvd1; /**< Reserved, normally 0x00. */
+};
+
+/**
+ * Matching pattern item definition.
+ *
+ * A pattern is formed by stacking items starting from the lowest protocol
+ * layer to match. This stacking restriction does not apply to meta items
+ * which can be placed anywhere in the stack without affecting the meaning
+ * of the resulting pattern.
+ *
+ * Patterns are terminated by END items.
+ *
+ * The spec field should be a valid pointer to a structure of the related
+ * item type. It may be set to NULL in many cases to use default values.
+ *
+ * Optionally, last can point to a structure of the same type to define an
+ * inclusive range. This is mostly supported by integer and address fields,
+ * may cause errors otherwise. Fields that do not support ranges must be set
+ * to 0 or to the same value as the corresponding fields in spec.
+ *
+ * By default all fields present in spec are considered relevant (see note
+ * below). This behavior can be altered by providing a mask structure of the
+ * same type with applicable bits set to one. It can also be used to
+ * partially filter out specific fields (e.g. as an alternate mean to match
+ * ranges of IP addresses).
+ *
+ * Mask is a simple bit-mask applied before interpreting the contents of
+ * spec and last, which may yield unexpected results if not used
+ * carefully. For example, if for an IPv4 address field, spec provides
+ * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
+ * effective range becomes 10.1.0.0 to 10.3.255.255.
+ *
+ * Note: the defaults for data-matching items such as IPv4 when mask is not
+ * specified actually depend on the underlying implementation since only
+ * recognized fields can be taken into account.
+ */
+struct rte_flow_item {
+	enum rte_flow_item_type type; /**< Item type. */
+	const void *spec; /**< Pointer to item specification structure. */
+	const void *last; /**< Defines an inclusive range (spec to last). */
+	const void *mask; /**< Bit-mask applied to spec and last. */
+};
+
+/**
+ * Action types.
+ *
+ * Each possible action is represented by a type. Some have associated
+ * configuration structures. Several actions combined in a list can be
+ * affected to a flow rule. That list is not ordered.
+ *
+ * They fall in three categories:
+ *
+ * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+ *   processing matched packets by subsequent flow rules, unless overridden
+ *   with PASSTHRU.
+ *
+ * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
+ *   for additional processing by subsequent flow rules.
+ *
+ * - Other non terminating meta actions that do not affect the fate of
+ *   packets (END, VOID, MARK, FLAG, COUNT).
+ *
+ * When several actions are combined in a flow rule, they should all have
+ * different types (e.g. dropping a packet twice is not possible).
+ *
+ * Only the last action of a given type is taken into account. PMDs still
+ * perform error checking on the entire list.
+ *
+ * Note that PASSTHRU is the only action able to override a terminating
+ * rule.
+ */
+enum rte_flow_action_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for action lists. Prevents further processing of
+	 * actions, thereby ending the list.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_VOID,
+
+	/**
+	 * Leaves packets up for additional processing by subsequent flow
+	 * rules. This is the default when a rule does not contain a
+	 * terminating action, but can be specified to force a rule to
+	 * become non-terminating.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PASSTHRU,
+
+	/**
+	 * [META]
+	 *
+	 * Attaches a 32 bit value to packets.
+	 *
+	 * See struct rte_flow_action_mark.
+	 */
+	RTE_FLOW_ACTION_TYPE_MARK,
+
+	/**
+	 * [META]
+	 *
+	 * Flag packets. Similar to MARK but only affects ol_flags.
+	 *
+	 * Note: a distinctive flag must be defined for it.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_FLAG,
+
+	/**
+	 * Assigns packets to a given queue index.
+	 *
+	 * See struct rte_flow_action_queue.
+	 */
+	RTE_FLOW_ACTION_TYPE_QUEUE,
+
+	/**
+	 * Drops packets.
+	 *
+	 * PASSTHRU overrides this action if both are specified.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_DROP,
+
+	/**
+	 * [META]
+	 *
+	 * Enables counters for this rule.
+	 *
+	 * These counters can be retrieved and reset through rte_flow_query(),
+	 * see struct rte_flow_query_count.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_COUNT,
+
+	/**
+	 * Duplicates packets to a given queue index.
+	 *
+	 * This is normally combined with QUEUE, however when used alone, it
+	 * is actually similar to QUEUE + PASSTHRU.
+	 *
+	 * See struct rte_flow_action_dup.
+	 */
+	RTE_FLOW_ACTION_TYPE_DUP,
+
+	/**
+	 * Similar to QUEUE, except RSS is additionally performed on packets
+	 * to spread them among several queues according to the provided
+	 * parameters.
+	 *
+	 * See struct rte_flow_action_rss.
+	 */
+	RTE_FLOW_ACTION_TYPE_RSS,
+
+	/**
+	 * Redirects packets to the physical function (PF) of the current
+	 * device.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PF,
+
+	/**
+	 * Redirects packets to the virtual function (VF) of the current
+	 * device with the specified ID.
+	 *
+	 * See struct rte_flow_action_vf.
+	 */
+	RTE_FLOW_ACTION_TYPE_VF,
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_MARK
+ *
+ * Attaches a 32 bit value to packets.
+ *
+ * This value is arbitrary and application-defined. For compatibility with
+ * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
+ * also set in ol_flags.
+ */
+struct rte_flow_action_mark {
+	uint32_t id; /**< 32 bit value to return with packets. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_QUEUE
+ *
+ * Assign packets to a given queue index.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_queue {
+	uint16_t index; /**< Queue index to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_COUNT (query)
+ *
+ * Query structure to retrieve and reset flow rule counters.
+ */
+struct rte_flow_query_count {
+	uint32_t reset:1; /**< Reset counters after query [in]. */
+	uint32_t hits_set:1; /**< hits field is set [out]. */
+	uint32_t bytes_set:1; /**< bytes field is set [out]. */
+	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
+	uint64_t hits; /**< Number of hits for this rule [out]. */
+	uint64_t bytes; /**< Number of bytes through this rule [out]. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_DUP
+ *
+ * Duplicates packets to a given queue index.
+ *
+ * This is normally combined with QUEUE, however when used alone, it is
+ * actually similar to QUEUE + PASSTHRU.
+ *
+ * Non-terminating by default.
+ */
+struct rte_flow_action_dup {
+	uint16_t index; /**< Queue index to duplicate packets to. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_RSS
+ *
+ * Similar to QUEUE, except RSS is additionally performed on packets to
+ * spread them among several queues according to the provided parameters.
+ *
+ * Note: RSS hash result is normally stored in the hash.rss mbuf field,
+ * however it conflicts with the MARK action as they share the same
+ * space. When both actions are specified, the RSS hash is discarded and
+ * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
+ * structure should eventually evolve to store both.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_rss {
+	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
+	uint16_t num; /**< Number of entries in queue[]. */
+	uint16_t queue[]; /**< Queues indices to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_VF
+ *
+ * Redirects packets to a virtual function (VF) of the current device.
+ *
+ * Packets matched by a VF pattern item can be redirected to their original
+ * VF ID instead of the specified one. This parameter may not be available
+ * and is not guaranteed to work properly if the VF part is matched by a
+ * prior flow rule or if packets are not addressed to a VF in the first
+ * place.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_vf {
+	uint32_t original:1; /**< Use original VF ID if possible. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint32_t id; /**< VF ID to redirect packets to. */
+};
+
+/**
+ * Definition of a single action.
+ *
+ * A list of actions is terminated by a END action.
+ *
+ * For simple actions without a configuration structure, conf remains NULL.
+ */
+struct rte_flow_action {
+	enum rte_flow_action_type type; /**< Action type. */
+	const void *conf; /**< Pointer to action configuration structure. */
+};
+
+/**
+ * Opaque type returned after successfully creating a flow.
+ *
+ * This handle can be used to manage and query the related flow (e.g. to
+ * destroy it or retrieve counters).
+ */
+struct rte_flow;
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_flow_error.cause.
+ */
+enum rte_flow_error_type {
+	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+	RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+	RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+	RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_flow_error {
+	enum rte_flow_error_type type; /**< Cause field and error types. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Check whether a flow rule can be created on a given port.
+ *
+ * While this function has no effect on the target device, the flow rule is
+ * validated against its current configuration state and the returned value
+ * should be considered valid by the caller for that state only.
+ *
+ * The returned value is guaranteed to remain valid only as long as no
+ * successful calls to rte_flow_create() or rte_flow_destroy() are made in
+ * the meantime and no device parameter affecting flow rules in any way are
+ * modified, due to possible collisions or resource limitations (although in
+ * such cases EINVAL should not be returned).
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 if flow rule is valid and can be created. A negative errno value
+ *   otherwise (rte_errno is also set), the following errors are defined:
+ *
+ *   -ENOSYS: underlying device does not support this functionality.
+ *
+ *   -EINVAL: unknown or invalid rule specification.
+ *
+ *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
+ *   bit-masks are unsupported).
+ *
+ *   -EEXIST: collision with an existing rule.
+ *
+ *   -ENOMEM: not enough resources.
+ *
+ *   -EBUSY: action cannot be performed due to busy device resources, may
+ *   succeed if the affected queues or even the entire port are in a stopped
+ *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
+ */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error);
+
+/**
+ * Create a flow rule on a given port.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to the positive version of one of the error codes defined for
+ *   rte_flow_validate().
+ */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error);
+
+/**
+ * Destroy a flow rule on a given port.
+ *
+ * Failure to destroy a flow rule handle may occur when other flow rules
+ * depend on it, and destroying it would result in an inconsistent state.
+ *
+ * This function is only guaranteed to succeed if handles are destroyed in
+ * reverse order of their creation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to destroy.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error);
+
+/**
+ * Destroy all flow rules associated with a port.
+ *
+ * In the unlikely event of failure, handles are still considered destroyed
+ * and no longer valid but the port must be assumed to be in an inconsistent
+ * state.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error);
+
+/**
+ * Query an existing flow rule.
+ *
+ * This function allows retrieving flow-specific data such as counters.
+ * Data is gathered by special actions which must be present in the flow
+ * rule definition.
+ *
+ * \see RTE_FLOW_ACTION_TYPE_COUNT
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to query.
+ * @param action
+ *   Action type to query.
+ * @param[in, out] data
+ *   Pointer to storage for the associated query data type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_H_ */
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
new file mode 100644
index 0000000..b75cfdd
--- /dev/null
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -0,0 +1,181 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_DRIVER_H_
+#define RTE_FLOW_DRIVER_H_
+
+/**
+ * @file
+ * RTE generic flow API (driver side)
+ *
+ * This file provides implementation helpers for internal use by PMDs, they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_flow.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Generic flow operations structure implemented and returned by PMDs.
+ *
+ * To implement this API, PMDs must handle the RTE_ETH_FILTER_GENERIC filter
+ * type in their .filter_ctrl callback function (struct eth_dev_ops) as well
+ * as the RTE_ETH_FILTER_GET filter operation.
+ *
+ * If successful, this operation must result in a pointer to a PMD-specific
+ * struct rte_flow_ops written to the argument address as described below:
+ *
+ * \code
+ *
+ * // PMD filter_ctrl callback
+ *
+ * static const struct rte_flow_ops pmd_flow_ops = { ... };
+ *
+ * switch (filter_type) {
+ * case RTE_ETH_FILTER_GENERIC:
+ *     if (filter_op != RTE_ETH_FILTER_GET)
+ *         return -EINVAL;
+ *     *(const void **)arg = &pmd_flow_ops;
+ *     return 0;
+ * }
+ *
+ * \endcode
+ *
+ * See also rte_flow_ops_get().
+ *
+ * These callback functions are not supposed to be used by applications
+ * directly, which must rely on the API defined in rte_flow.h.
+ *
+ * Public-facing wrapper functions perform a few consistency checks so that
+ * unimplemented (i.e. NULL) callbacks simply return -ENOTSUP. These
+ * callbacks otherwise only differ by their first argument (with port ID
+ * already resolved to a pointer to struct rte_eth_dev).
+ */
+struct rte_flow_ops {
+	/** See rte_flow_validate(). */
+	int (*validate)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_create(). */
+	struct rte_flow *(*create)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_destroy(). */
+	int (*destroy)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 struct rte_flow_error *);
+	/** See rte_flow_flush(). */
+	int (*flush)
+		(struct rte_eth_dev *,
+		 struct rte_flow_error *);
+	/** See rte_flow_query(). */
+	int (*query)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 enum rte_flow_action_type,
+		 void *,
+		 struct rte_flow_error *);
+};
+
+/**
+ * Initialize generic flow error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Pointer to flow error structure.
+ */
+static inline struct rte_flow_error *
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return error;
+}
+
+/**
+ * Get generic flow operations structure from a port.
+ *
+ * @param port_id
+ *   Port identifier to query.
+ * @param[out] error
+ *   Pointer to flow error structure.
+ *
+ * @return
+ *   The flow operations structure associated with port_id, NULL in case of
+ *   error, in which case rte_errno is set and the error structure contains
+ *   additional details.
+ */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_DRIVER_H_ */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 3/3] tools: move to usertools
  2016-12-15 21:59  4% [dpdk-dev] [PATCH 0/3] buildtools/devtools/usertools Thomas Monjalon
  2016-12-15 21:59 32% ` [dpdk-dev] [PATCH 2/3] scripts: move to devtools Thomas Monjalon
@ 2016-12-15 21:59  2% ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-12-15 21:59 UTC (permalink / raw)
  To: dev

Rename tools/ into usertools/ to differentiate from buildtools/
and devtools/ while making clear these scripts are part of
DPDK runtime.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 MAINTAINERS                                      |  2 +-
 doc/guides/cryptodevs/qat.rst                    |  2 +-
 doc/guides/faq/faq.rst                           |  2 +-
 doc/guides/howto/lm_bond_virtio_sriov.rst        |  8 ++++----
 doc/guides/howto/lm_virtio_vhost_user.rst        | 16 +++++++--------
 doc/guides/linux_gsg/build_dpdk.rst              | 14 ++++++-------
 doc/guides/linux_gsg/nic_perf_intel_platform.rst |  6 +++---
 doc/guides/linux_gsg/quick_start.rst             |  4 ++--
 doc/guides/nics/bnx2x.rst                        |  4 ++--
 doc/guides/nics/cxgbe.rst                        |  4 ++--
 doc/guides/nics/ena.rst                          |  2 +-
 doc/guides/nics/i40e.rst                         |  4 ++--
 doc/guides/nics/qede.rst                         |  2 +-
 doc/guides/nics/thunderx.rst                     | 26 ++++++++++++------------
 doc/guides/nics/virtio.rst                       |  2 +-
 doc/guides/sample_app_ug/vhost.rst               |  2 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst      | 10 ++++-----
 doc/guides/xen/pkt_switch.rst                    |  2 +-
 lib/librte_eal/common/eal_common_options.c       |  2 +-
 mk/rte.sdkinstall.mk                             | 10 +++------
 pkg/dpdk.spec                                    |  2 +-
 {tools => usertools}/cpu_layout.py               |  0
 {tools => usertools}/dpdk-devbind.py             |  0
 {tools => usertools}/dpdk-pmdinfo.py             |  0
 {tools => usertools}/dpdk-setup.sh               | 14 ++++++-------
 25 files changed, 68 insertions(+), 72 deletions(-)
 rename {tools => usertools}/cpu_layout.py (100%)
 rename {tools => usertools}/dpdk-devbind.py (100%)
 rename {tools => usertools}/dpdk-pmdinfo.py (100%)
 rename {tools => usertools}/dpdk-setup.sh (97%)

diff --git a/MAINTAINERS b/MAINTAINERS
index e779a5d..3fdd92a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -74,7 +74,7 @@ F: scripts/validate-abi.sh
 
 Driver information
 F: buildtools/pmdinfogen/
-F: tools/dpdk-pmdinfo.py
+F: usertools/dpdk-pmdinfo.py
 F: doc/guides/tools/pmdinfo.rst
 
 
diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 52a9ae3..03d5c2d 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -413,4 +413,4 @@ The other way to bind the VFs to the DPDK UIO driver is by using the ``dpdk-devb
 .. code-block:: console
 
     cd $RTE_SDK
-    ./tools/dpdk-devbind.py -b igb_uio 0000:03:01.1
+    ./usertools/dpdk-devbind.py -b igb_uio 0000:03:01.1
diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index 0adc549..5a324b2 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -50,7 +50,7 @@ When you stop and restart the test application, it looks to see if the pages are
 If you look in the directory, you will see ``n`` number of 2M pages files. If you specified 1024, you will see 1024 page files.
 These are then placed in memory segments to get contiguous memory.
 
-If you need to change the number of pages, it is easier to first remove the pages. The tools/dpdk-setup.sh script provides an option to do this.
+If you need to change the number of pages, it is easier to first remove the pages. The usertools/dpdk-setup.sh script provides an option to do this.
 See the "Quick Start Setup Script" section in the :ref:`DPDK Getting Started Guide <linux_gsg>` for more information.
 
 
diff --git a/doc/guides/howto/lm_bond_virtio_sriov.rst b/doc/guides/howto/lm_bond_virtio_sriov.rst
index fe9803e..169b64e 100644
--- a/doc/guides/howto/lm_bond_virtio_sriov.rst
+++ b/doc/guides/howto/lm_bond_virtio_sriov.rst
@@ -613,17 +613,17 @@ Set up DPDK in the Virtual Machine
    cat  /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 
    ifconfig -a
-   /root/dpdk/tools/dpdk-devbind.py --status
+   /root/dpdk/usertools/dpdk-devbind.py --status
 
    rmmod virtio-pci ixgbevf
 
    modprobe uio
    insmod /root/dpdk/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
 
-   /root/dpdk/tools/dpdk-devbind.py -b igb_uio 0000:00:03.0
-   /root/dpdk/tools/dpdk-devbind.py -b igb_uio 0000:00:04.0
+   /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:03.0
+   /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:04.0
 
-   /root/dpdk/tools/dpdk-devbind.py --status
+   /root/dpdk/usertools/dpdk-devbind.py --status
 
 run_testpmd_bonding_in_vm.sh
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/howto/lm_virtio_vhost_user.rst b/doc/guides/howto/lm_virtio_vhost_user.rst
index 4937781..0a0fcfc 100644
--- a/doc/guides/howto/lm_virtio_vhost_user.rst
+++ b/doc/guides/howto/lm_virtio_vhost_user.rst
@@ -90,14 +90,14 @@ For Fortville NIC.
 
 .. code-block:: console
 
-   cd /root/dpdk/tools
+   cd /root/dpdk/usertools
    ./dpdk-devbind.py -b igb_uio 0000:02:00.0
 
 For Niantic NIC.
 
 .. code-block:: console
 
-   cd /root/dpdk/tools
+   cd /root/dpdk/usertools
    ./dpdk-devbind.py -b igb_uio 0000:09:00.0
 
 On host_server_1: Terminal 3
@@ -171,14 +171,14 @@ For Fortville NIC.
 
 .. code-block:: console
 
-   cd /root/dpdk/tools
+   cd /root/dpdk/usertools
    ./dpdk-devbind.py -b igb_uio 0000:03:00.0
 
 For Niantic NIC.
 
 .. code-block:: console
 
-   cd /root/dpdk/tools
+   cd /root/dpdk/usertools
    ./dpdk-devbind.py -b igb_uio 0000:06:00.0
 
 On host_server_2: Terminal 3
@@ -444,17 +444,17 @@ setup_dpdk_virtio_in_vm.sh
    cat  /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 
    ifconfig -a
-   /root/dpdk/tools/dpdk-devbind.py --status
+   /root/dpdk/usertools/dpdk-devbind.py --status
 
    rmmod virtio-pci
 
    modprobe uio
    insmod /root/dpdk/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
 
-   /root/dpdk/tools/dpdk-devbind.py -b igb_uio 0000:00:03.0
-   /root/dpdk/tools/dpdk-devbind.py -b igb_uio 0000:00:04.0
+   /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:03.0
+   /root/dpdk/usertools/dpdk-devbind.py -b igb_uio 0000:00:04.0
 
-   /root/dpdk/tools/dpdk-devbind.py --status
+   /root/dpdk/usertools/dpdk-devbind.py --status
 
 run_testpmd_in_vm.sh
 ~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/linux_gsg/build_dpdk.rst b/doc/guides/linux_gsg/build_dpdk.rst
index 527c38d..f0a096e 100644
--- a/doc/guides/linux_gsg/build_dpdk.rst
+++ b/doc/guides/linux_gsg/build_dpdk.rst
@@ -58,7 +58,7 @@ The DPDK is composed of several directories:
 
 *   examples: Source code of DPDK application examples
 
-*   config, tools, scripts, mk: Framework-related makefiles, scripts and configuration
+*   config, buildtools, mk: Framework-related makefiles, scripts and configuration
 
 Installation of DPDK Target Environments
 ----------------------------------------
@@ -188,7 +188,7 @@ however please consult your distributions documentation to make sure that is the
 Also, to use VFIO, both kernel and BIOS must support and be configured to use IO virtualization (such as Intel® VT-d).
 
 For proper operation of VFIO when running DPDK applications as a non-privileged user, correct permissions should also be set up.
-This can be done by using the DPDK setup script (called dpdk-setup.sh and located in the tools directory).
+This can be done by using the DPDK setup script (called dpdk-setup.sh and located in the usertools directory).
 
 .. _linux_gsg_binding_kernel:
 
@@ -208,7 +208,7 @@ Any network ports under Linux* control will be ignored by the DPDK poll-mode dri
 
 To bind ports to the ``uio_pci_generic``, ``igb_uio`` or ``vfio-pci`` module for DPDK use,
 and then subsequently return ports to Linux* control,
-a utility script called dpdk_nic _bind.py is provided in the tools subdirectory.
+a utility script called dpdk_nic _bind.py is provided in the usertools subdirectory.
 This utility can be used to provide a view of the current state of the network ports on the system,
 and to bind and unbind those ports from the different kernel modules, including the uio and vfio modules.
 The following are some examples of how the script can be used.
@@ -235,7 +235,7 @@ To see the status of all network ports on the system:
 
 .. code-block:: console
 
-    ./tools/dpdk-devbind.py --status
+    ./usertools/dpdk-devbind.py --status
 
     Network devices using DPDK-compatible driver
     ============================================
@@ -257,16 +257,16 @@ To bind device ``eth1``,``04:00.1``, to the ``uio_pci_generic`` driver:
 
 .. code-block:: console
 
-    ./tools/dpdk-devbind.py --bind=uio_pci_generic 04:00.1
+    ./usertools/dpdk-devbind.py --bind=uio_pci_generic 04:00.1
 
 or, alternatively,
 
 .. code-block:: console
 
-    ./tools/dpdk-devbind.py --bind=uio_pci_generic eth1
+    ./usertools/dpdk-devbind.py --bind=uio_pci_generic eth1
 
 To restore device ``82:00.0`` to its original kernel binding:
 
 .. code-block:: console
 
-    ./tools/dpdk-devbind.py --bind=ixgbe 82:00.0
+    ./usertools/dpdk-devbind.py --bind=ixgbe 82:00.0
diff --git a/doc/guides/linux_gsg/nic_perf_intel_platform.rst b/doc/guides/linux_gsg/nic_perf_intel_platform.rst
index d4a8362..8f34faf 100644
--- a/doc/guides/linux_gsg/nic_perf_intel_platform.rst
+++ b/doc/guides/linux_gsg/nic_perf_intel_platform.rst
@@ -158,7 +158,7 @@ Configurations before running DPDK
 
       cd dpdk_folder
 
-      tools/cpu_layout.py
+      usertools/cpu_layout.py
 
    Or run ``lscpu`` to check the the cores on each socket.
 
@@ -192,10 +192,10 @@ Configurations before running DPDK
 
 
       # Bind ports 82:00.0 and 85:00.0 to dpdk driver
-      ./dpdk_folder/tools/dpdk-devbind.py -b igb_uio 82:00.0 85:00.0
+      ./dpdk_folder/usertools/dpdk-devbind.py -b igb_uio 82:00.0 85:00.0
 
       # Check the port driver status
-      ./dpdk_folder/tools/dpdk-devbind.py --status
+      ./dpdk_folder/usertools/dpdk-devbind.py --status
 
    See ``dpdk-devbind.py --help`` for more details.
 
diff --git a/doc/guides/linux_gsg/quick_start.rst b/doc/guides/linux_gsg/quick_start.rst
index 6e858c2..b158d0f 100644
--- a/doc/guides/linux_gsg/quick_start.rst
+++ b/doc/guides/linux_gsg/quick_start.rst
@@ -33,7 +33,7 @@
 Quick Start Setup Script
 ========================
 
-The dpdk-setup.sh script, found in the tools subdirectory, allows the user to perform the following tasks:
+The dpdk-setup.sh script, found in the usertools subdirectory, allows the user to perform the following tasks:
 
 *   Build the DPDK libraries
 
@@ -108,7 +108,7 @@ Some options in the script prompt the user for further data before proceeding.
 
 .. code-block:: console
 
-    source tools/dpdk-setup.sh
+    source usertools/dpdk-setup.sh
 
     ------------------------------------------------------------------------
 
diff --git a/doc/guides/nics/bnx2x.rst b/doc/guides/nics/bnx2x.rst
index 6d1768a..c011df1 100644
--- a/doc/guides/nics/bnx2x.rst
+++ b/doc/guides/nics/bnx2x.rst
@@ -207,7 +207,7 @@ devices managed by ``librte_pmd_bnx2x`` in Linux operating system.
 #. Bind the QLogic adapters to ``igb_uio`` or ``vfio-pci`` loaded in the
    previous step::
 
-      ./tools/dpdk-devbind.py --bind igb_uio 0000:84:00.0 0000:84:00.1
+      ./usertools/dpdk-devbind.py --bind igb_uio 0000:84:00.0 0000:84:00.1
 
    or
 
@@ -219,7 +219,7 @@ devices managed by ``librte_pmd_bnx2x`` in Linux operating system.
 
       sudo chmod 0666 /dev/vfio/*
 
-      ./tools/dpdk-devbind.py --bind vfio-pci 0000:84:00.0 0000:84:00.1
+      ./usertools/dpdk-devbind.py --bind vfio-pci 0000:84:00.0 0000:84:00.1
 
 #. Start ``testpmd`` with basic parameters:
 
diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst
index d8236b0..7aa6953 100644
--- a/doc/guides/nics/cxgbe.rst
+++ b/doc/guides/nics/cxgbe.rst
@@ -285,7 +285,7 @@ devices managed by librte_pmd_cxgbe in Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind igb_uio 0000:02:00.4
+      ./usertools/dpdk-devbind.py --bind igb_uio 0000:02:00.4
 
    or
 
@@ -297,7 +297,7 @@ devices managed by librte_pmd_cxgbe in Linux operating system.
 
       sudo chmod 0666 /dev/vfio/*
 
-      ./tools/dpdk-devbind.py --bind vfio-pci 0000:02:00.4
+      ./usertools/dpdk-devbind.py --bind vfio-pci 0000:02:00.4
 
    .. note::
 
diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 073b35a..c2738e8 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -225,7 +225,7 @@ devices managed by librte_pmd_ena.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind=igb_uio 0000:02:00.1
+      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:02:00.1
 
 #. Start testpmd with basic parameters:
 
diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 5780268..0cc9268 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -164,13 +164,13 @@ devices managed by ``librte_pmd_i40e`` in the Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind igb_uio 0000:83:00.0
+      ./usertools/dpdk-devbind.py --bind igb_uio 0000:83:00.0
 
    Or setup VFIO permissions for regular users and then bind to ``vfio-pci``:
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind vfio-pci 0000:83:00.0
+      ./usertools/dpdk-devbind.py --bind vfio-pci 0000:83:00.0
 
 #. Start ``testpmd`` with basic parameters:
 
diff --git a/doc/guides/nics/qede.rst b/doc/guides/nics/qede.rst
index d22ecdd..9d70217 100644
--- a/doc/guides/nics/qede.rst
+++ b/doc/guides/nics/qede.rst
@@ -175,7 +175,7 @@ devices managed by ``librte_pmd_qede`` in Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind igb_uio 0000:84:00.0 0000:84:00.1 \
+      ./usertools/dpdk-devbind.py --bind igb_uio 0000:84:00.0 0000:84:00.1 \
                                               0000:84:00.2 0000:84:00.3
 
 #. Start ``testpmd`` with basic parameters:
diff --git a/doc/guides/nics/thunderx.rst b/doc/guides/nics/thunderx.rst
index 187c9a4..e6ac441 100644
--- a/doc/guides/nics/thunderx.rst
+++ b/doc/guides/nics/thunderx.rst
@@ -149,7 +149,7 @@ managed by ``librte_pmd_thunderx_nicvf`` in the Linux operating system.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind vfio-pci 0002:01:00.2
+      ./usertools/dpdk-devbind.py --bind vfio-pci 0002:01:00.2
 
 #. Start ``testpmd`` with basic parameters:
 
@@ -253,7 +253,7 @@ This section provides instructions to configure SR-IOV with Linux OS.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --status
+      ./usertools/dpdk-devbind.py --status
 
    Example output:
 
@@ -275,14 +275,14 @@ This section provides instructions to configure SR-IOV with Linux OS.
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --bind vfio-pci 0002:01:00.1
-      ./tools/dpdk-devbind.py --bind vfio-pci 0002:01:00.2
+      ./usertools/dpdk-devbind.py --bind vfio-pci 0002:01:00.1
+      ./usertools/dpdk-devbind.py --bind vfio-pci 0002:01:00.2
 
 #. Verify VF bind using ``dpdk-devbind.py``:
 
    .. code-block:: console
 
-      ./tools/dpdk-devbind.py --status
+      ./usertools/dpdk-devbind.py --status
 
    Example output:
 
@@ -352,7 +352,7 @@ driver' list, secondary VFs are on the remaining on the remaining part of the li
    .. note::
 
       The VNIC driver in the multiqueue setup works differently than other drivers like `ixgbe`.
-      We need to bind separately each specific queue set device with the ``tools/dpdk-devbind.py`` utility.
+      We need to bind separately each specific queue set device with the ``usertools/dpdk-devbind.py`` utility.
 
    .. note::
 
@@ -372,7 +372,7 @@ on a non-NUMA machine.
 
    .. code-block:: console
 
-      # tools/dpdk-devbind.py --status
+      # usertools/dpdk-devbind.py --status
 
       Network devices using DPDK-compatible driver
       ============================================
@@ -416,17 +416,17 @@ We will choose four secondary queue sets from the ending of the list (0002:01:01
 
    .. code-block:: console
 
-      tools/dpdk-devbind.py -b vfio-pci 0002:01:00.2
-      tools/dpdk-devbind.py -b vfio-pci 0002:01:00.3
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:00.2
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:00.3
 
 #. Bind four primary VFs to the ``vfio-pci`` driver:
 
    .. code-block:: console
 
-      tools/dpdk-devbind.py -b vfio-pci 0002:01:01.7
-      tools/dpdk-devbind.py -b vfio-pci 0002:01:02.0
-      tools/dpdk-devbind.py -b vfio-pci 0002:01:02.1
-      tools/dpdk-devbind.py -b vfio-pci 0002:01:02.2
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:01.7
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.0
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.1
+      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.2
 
 The nicvf thunderx driver will make use of attached secondary VFs automatically during the interface configuration stage.
 
diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
index 5431015..c90e517 100644
--- a/doc/guides/nics/virtio.rst
+++ b/doc/guides/nics/virtio.rst
@@ -172,7 +172,7 @@ Host2VM communication example
         modprobe uio
         echo 512 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
         modprobe uio_pci_generic
-        python tools/dpdk-devbind.py -b uio_pci_generic 00:03.0
+        python usertools/dpdk-devbind.py -b uio_pci_generic 00:03.0
 
     We use testpmd as the forwarding application in this example.
 
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 1f6d0d9..95db988 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -115,7 +115,7 @@ could be done by:
 .. code-block:: console
 
    modprobe uio_pci_generic
-   $RTE_SDK/tools/dpdk-devbind.py -b=uio_pci_generic 0000:00:04.0
+   $RTE_SDK/usertools/dpdk-devbind.py -b=uio_pci_generic 0000:00:04.0
 
 Then start testpmd for packet forwarding testing.
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index f1c269a..f82dcfb 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1061,7 +1061,7 @@ For example, to move a pci device using ixgbe under DPDK management:
 .. code-block:: console
 
    # Check the status of the available devices.
-   ./tools/dpdk-devbind.py --status
+   ./usertools/dpdk-devbind.py --status
 
    Network devices using DPDK-compatible driver
    ============================================
@@ -1073,11 +1073,11 @@ For example, to move a pci device using ixgbe under DPDK management:
 
 
    # Bind the device to igb_uio.
-   sudo ./tools/dpdk-devbind.py -b igb_uio 0000:0a:00.0
+   sudo ./usertools/dpdk-devbind.py -b igb_uio 0000:0a:00.0
 
 
    # Recheck the status of the devices.
-   ./tools/dpdk-devbind.py --status
+   ./usertools/dpdk-devbind.py --status
    Network devices using DPDK-compatible driver
    ============================================
    0000:0a:00.0 '82599ES 10-Gigabit' drv=igb_uio unused=
@@ -1180,9 +1180,9 @@ For example, to move a pci device under kernel management:
 
 .. code-block:: console
 
-   sudo ./tools/dpdk-devbind.py -b ixgbe 0000:0a:00.0
+   sudo ./usertools/dpdk-devbind.py -b ixgbe 0000:0a:00.0
 
-   ./tools/dpdk-devbind.py --status
+   ./usertools/dpdk-devbind.py --status
 
    Network devices using DPDK-compatible driver
    ============================================
diff --git a/doc/guides/xen/pkt_switch.rst b/doc/guides/xen/pkt_switch.rst
index a45841b..0b4ddfd 100644
--- a/doc/guides/xen/pkt_switch.rst
+++ b/doc/guides/xen/pkt_switch.rst
@@ -323,7 +323,7 @@ Building and Running the Switching Backend
     .. code-block:: console
 
         modprobe uio_pci_generic
-        python tools/dpdk-devbind.py -b uio_pci_generic 0000:09:00:00.0
+        python usertools/dpdk-devbind.py -b uio_pci_generic 0000:09:00:00.0
 
     In this case, 0000:09:00.0 is the PCI address for the NIC controller.
 
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 6ca8af1..a9936bf 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -118,7 +118,7 @@ static const char *default_solib_dir = RTE_EAL_PMD_PATH;
 /*
  * Stringified version of solib path used by dpdk-pmdinfo.py
  * Note: PLEASE DO NOT ALTER THIS without making a corresponding
- * change to tools/dpdk-pmdinfo.py
+ * change to usertools/dpdk-pmdinfo.py
  */
 static const char dpdk_solib_path[] __attribute__((used)) =
 "DPDK_PLUGIN_PATH=" RTE_EAL_PMD_PATH;
diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 896bc14..dbac2a2 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -124,15 +124,11 @@ install-runtime:
 	    tar -xf -      -C $(DESTDIR)$(bindir) --strip-components=1 \
 		--keep-newer-files
 	$(Q)$(call rte_mkdir,      $(DESTDIR)$(datadir))
-	$(Q)cp -a $(RTE_SDK)/tools $(DESTDIR)$(datadir)
-	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-setup.sh, \
-	                           $(DESTDIR)$(datadir)/tools/setup.sh)
-	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-devbind.py, \
-	                           $(DESTDIR)$(datadir)/tools/dpdk_nic_bind.py)
+	$(Q)cp -a $(RTE_SDK)/usertools $(DESTDIR)$(datadir)
 	$(Q)$(call rte_mkdir,      $(DESTDIR)$(sbindir))
-	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-devbind.py, \
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/usertools/dpdk-devbind.py, \
 	                           $(DESTDIR)$(sbindir)/dpdk-devbind)
-	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/tools/dpdk-pmdinfo.py, \
+	$(Q)$(call rte_symlink,    $(DESTDIR)$(datadir)/usertools/dpdk-pmdinfo.py, \
 	                           $(DESTDIR)$(bindir)/dpdk-pmdinfo)
 ifneq ($(wildcard $O/doc/man/*/*.1),)
 	$(Q)$(call rte_mkdir,      $(DESTDIR)$(mandir)/man1)
diff --git a/pkg/dpdk.spec b/pkg/dpdk.spec
index d12509a..43ff954 100644
--- a/pkg/dpdk.spec
+++ b/pkg/dpdk.spec
@@ -94,7 +94,7 @@ make install O=%{target} DESTDIR=%{buildroot} \
 
 %files
 %dir %{_datadir}/dpdk
-%{_datadir}/dpdk/tools
+%{_datadir}/dpdk/usertools
 /lib/modules/%(uname -r)/extra/*
 %{_sbindir}/*
 %{_bindir}/*
diff --git a/tools/cpu_layout.py b/usertools/cpu_layout.py
similarity index 100%
rename from tools/cpu_layout.py
rename to usertools/cpu_layout.py
diff --git a/tools/dpdk-devbind.py b/usertools/dpdk-devbind.py
similarity index 100%
rename from tools/dpdk-devbind.py
rename to usertools/dpdk-devbind.py
diff --git a/tools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
similarity index 100%
rename from tools/dpdk-pmdinfo.py
rename to usertools/dpdk-pmdinfo.py
diff --git a/tools/dpdk-setup.sh b/usertools/dpdk-setup.sh
similarity index 97%
rename from tools/dpdk-setup.sh
rename to usertools/dpdk-setup.sh
index 14ed590..c4fec5a 100755
--- a/tools/dpdk-setup.sh
+++ b/usertools/dpdk-setup.sh
@@ -428,7 +428,7 @@ grep_meminfo()
 show_devices()
 {
 	if [ -d /sys/module/vfio_pci -o -d /sys/module/igb_uio ]; then
-		${RTE_SDK}/tools/dpdk-devbind.py --status
+		${RTE_SDK}/usertools/dpdk-devbind.py --status
 	else
 		echo "# Please load the 'igb_uio' or 'vfio-pci' kernel module before "
 		echo "# querying or adjusting device bindings"
@@ -441,11 +441,11 @@ show_devices()
 bind_devices_to_vfio()
 {
 	if [ -d /sys/module/vfio_pci ]; then
-		${RTE_SDK}/tools/dpdk-devbind.py --status
+		${RTE_SDK}/usertools/dpdk-devbind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to VFIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/dpdk-devbind.py -b vfio-pci $PCI_PATH &&
+		sudo ${RTE_SDK}/usertools/dpdk-devbind.py -b vfio-pci $PCI_PATH &&
 			echo "OK"
 	else
 		echo "# Please load the 'vfio-pci' kernel module before querying or "
@@ -459,11 +459,11 @@ bind_devices_to_vfio()
 bind_devices_to_igb_uio()
 {
 	if [ -d /sys/module/igb_uio ]; then
-		${RTE_SDK}/tools/dpdk-devbind.py --status
+		${RTE_SDK}/usertools/dpdk-devbind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/dpdk-devbind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/usertools/dpdk-devbind.py -b igb_uio $PCI_PATH && echo "OK"
 	else
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting device bindings"
@@ -475,14 +475,14 @@ bind_devices_to_igb_uio()
 #
 unbind_devices()
 {
-	${RTE_SDK}/tools/dpdk-devbind.py --status
+	${RTE_SDK}/usertools/dpdk-devbind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to unbind: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/dpdk-devbind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/usertools/dpdk-devbind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
-- 
2.7.0

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 2/3] scripts: move to devtools
  2016-12-15 21:59  4% [dpdk-dev] [PATCH 0/3] buildtools/devtools/usertools Thomas Monjalon
@ 2016-12-15 21:59 32% ` Thomas Monjalon
  2016-12-15 21:59  2% ` [dpdk-dev] [PATCH 3/3] tools: move to usertools Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-12-15 21:59 UTC (permalink / raw)
  To: dev

The remaining scripts in the scripts/ directory are only useful
to developers. That's why devtools/ is a better name.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 MAINTAINERS                                   | 14 +++++++-------
 {scripts => devtools}/check-git-log.sh        |  0
 {scripts => devtools}/check-includes.sh       |  0
 {scripts => devtools}/check-maintainers.sh    |  0
 {scripts => devtools}/checkpatches.sh         |  0
 {scripts => devtools}/cocci.sh                |  2 +-
 {scripts => devtools}/cocci/mtod-offset.cocci |  0
 {scripts => devtools}/git-log-fixes.sh        |  0
 {scripts => devtools}/load-devel-config       |  0
 {scripts => devtools}/test-build.sh           |  0
 {scripts => devtools}/test-null.sh            |  0
 {scripts => devtools}/validate-abi.sh         |  0
 doc/guides/contributing/patches.rst           |  8 ++++----
 doc/guides/contributing/versioning.rst        | 10 +++++-----
 14 files changed, 17 insertions(+), 17 deletions(-)
 rename {scripts => devtools}/check-git-log.sh (100%)
 rename {scripts => devtools}/check-includes.sh (100%)
 rename {scripts => devtools}/check-maintainers.sh (100%)
 rename {scripts => devtools}/checkpatches.sh (100%)
 rename {scripts => devtools}/cocci.sh (98%)
 rename {scripts => devtools}/cocci/mtod-offset.cocci (100%)
 rename {scripts => devtools}/git-log-fixes.sh (100%)
 rename {scripts => devtools}/load-devel-config (100%)
 rename {scripts => devtools}/test-build.sh (100%)
 rename {scripts => devtools}/test-null.sh (100%)
 rename {scripts => devtools}/validate-abi.sh (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index b0f5b8a..e779a5d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24,13 +24,13 @@ General Project Administration
 M: Thomas Monjalon <thomas.monjalon@6wind.com>
 T: git://dpdk.org/dpdk
 F: MAINTAINERS
-F: scripts/check-maintainers.sh
-F: scripts/check-git-log.sh
-F: scripts/check-includes.sh
-F: scripts/checkpatches.sh
-F: scripts/git-log-fixes.sh
-F: scripts/load-devel-config
-F: scripts/test-build.sh
+F: devtools/check-maintainers.sh
+F: devtools/check-git-log.sh
+F: devtools/check-includes.sh
+F: devtools/checkpatches.sh
+F: devtools/git-log-fixes.sh
+F: devtools/load-devel-config
+F: devtools/test-build.sh
 
 Stable Branches
 ---------------
diff --git a/scripts/check-git-log.sh b/devtools/check-git-log.sh
similarity index 100%
rename from scripts/check-git-log.sh
rename to devtools/check-git-log.sh
diff --git a/scripts/check-includes.sh b/devtools/check-includes.sh
similarity index 100%
rename from scripts/check-includes.sh
rename to devtools/check-includes.sh
diff --git a/scripts/check-maintainers.sh b/devtools/check-maintainers.sh
similarity index 100%
rename from scripts/check-maintainers.sh
rename to devtools/check-maintainers.sh
diff --git a/scripts/checkpatches.sh b/devtools/checkpatches.sh
similarity index 100%
rename from scripts/checkpatches.sh
rename to devtools/checkpatches.sh
diff --git a/scripts/cocci.sh b/devtools/cocci.sh
similarity index 98%
rename from scripts/cocci.sh
rename to devtools/cocci.sh
index 7acc256..4ca5025 100755
--- a/scripts/cocci.sh
+++ b/devtools/cocci.sh
@@ -33,7 +33,7 @@
 # Apply coccinelle transforms.
 
 SRCTREE=$(readlink -f $(dirname $0)/..)
-COCCI=$SRCTREE/scripts/cocci
+COCCI=$SRCTREE/devtools/cocci
 [ -n "$SPATCH" ] || SPATCH=$(which spatch)
 
 PATCH_LIST="$@"
diff --git a/scripts/cocci/mtod-offset.cocci b/devtools/cocci/mtod-offset.cocci
similarity index 100%
rename from scripts/cocci/mtod-offset.cocci
rename to devtools/cocci/mtod-offset.cocci
diff --git a/scripts/git-log-fixes.sh b/devtools/git-log-fixes.sh
similarity index 100%
rename from scripts/git-log-fixes.sh
rename to devtools/git-log-fixes.sh
diff --git a/scripts/load-devel-config b/devtools/load-devel-config
similarity index 100%
rename from scripts/load-devel-config
rename to devtools/load-devel-config
diff --git a/scripts/test-build.sh b/devtools/test-build.sh
similarity index 100%
rename from scripts/test-build.sh
rename to devtools/test-build.sh
diff --git a/scripts/test-null.sh b/devtools/test-null.sh
similarity index 100%
rename from scripts/test-null.sh
rename to devtools/test-null.sh
diff --git a/scripts/validate-abi.sh b/devtools/validate-abi.sh
similarity index 100%
rename from scripts/validate-abi.sh
rename to devtools/validate-abi.sh
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index fabddbe..fe42679 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -242,7 +242,7 @@ For example::
 Checking the Patches
 --------------------
 
-Patches should be checked for formatting and syntax issues using the ``checkpatches.sh`` script in the ``scripts``
+Patches should be checked for formatting and syntax issues using the ``checkpatches.sh`` script in the ``devtools``
 directory of the DPDK repo.
 This uses the Linux kernel development tool ``checkpatch.pl`` which  can be obtained by cloning, and periodically,
 updating the Linux kernel sources.
@@ -257,7 +257,7 @@ files, in order of preference::
 
 Once the environment variable the script can be run as follows::
 
-   scripts/checkpatches.sh ~/patch/
+   devtools/checkpatches.sh ~/patch/
 
 The script usage is::
 
@@ -284,10 +284,10 @@ Where the range is a ``git log`` option.
 Checking Compilation
 --------------------
 
-Compilation of patches and changes should be tested using the the ``test-build.sh`` script in the ``scripts``
+Compilation of patches and changes should be tested using the the ``test-build.sh`` script in the ``devtools``
 directory of the DPDK repo::
 
-  scripts/test-build.sh x86_64-native-linuxapp-gcc+next+shared
+  devtools/test-build.sh x86_64-native-linuxapp-gcc+next+shared
 
 The script usage is::
 
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index 08e2e21..fbc44a7 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -457,7 +457,7 @@ versions of the symbol.
 Running the ABI Validator
 -------------------------
 
-The ``scripts`` directory in the DPDK source tree contains a utility program,
+The ``devtools`` directory in the DPDK source tree contains a utility program,
 ``validate-abi.sh``, for validating the DPDK ABI based on the Linux `ABI
 Compliance Checker
 <http://ispras.linuxbase.org/index.php/ABI_compliance_checker>`_.
@@ -470,7 +470,7 @@ utilities which can be installed via a package manager. For example::
 
 The syntax of the ``validate-abi.sh`` utility is::
 
-   ./scripts/validate-abi.sh <REV1> <REV2> <TARGET>
+   ./devtools/validate-abi.sh <REV1> <REV2> <TARGET>
 
 Where ``REV1`` and ``REV2`` are valid gitrevisions(7)
 https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html
@@ -479,13 +479,13 @@ on the local repo and target is the usual DPDK compilation target.
 For example::
 
    # Check between the previous and latest commit:
-   ./scripts/validate-abi.sh HEAD~1 HEAD x86_64-native-linuxapp-gcc
+   ./devtools/validate-abi.sh HEAD~1 HEAD x86_64-native-linuxapp-gcc
 
    # Check between two tags:
-   ./scripts/validate-abi.sh v2.0.0 v2.1.0 x86_64-native-linuxapp-gcc
+   ./devtools/validate-abi.sh v2.0.0 v2.1.0 x86_64-native-linuxapp-gcc
 
    # Check between git master and local topic-branch "vhost-hacking":
-   ./scripts/validate-abi.sh master vhost-hacking x86_64-native-linuxapp-gcc
+   ./devtools/validate-abi.sh master vhost-hacking x86_64-native-linuxapp-gcc
 
 After the validation script completes (it can take a while since it need to
 compile both tags) it will create compatibility reports in the
-- 
2.7.0

^ permalink raw reply	[relevance 32%]

* [dpdk-dev] [PATCH 0/3] buildtools/devtools/usertools
@ 2016-12-15 21:59  4% Thomas Monjalon
  2016-12-15 21:59 32% ` [dpdk-dev] [PATCH 2/3] scripts: move to devtools Thomas Monjalon
  2016-12-15 21:59  2% ` [dpdk-dev] [PATCH 3/3] tools: move to usertools Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2016-12-15 21:59 UTC (permalink / raw)
  To: dev

The current tools/ and scripts/ directory names are not
self describing.
These patches create devtools/ and usertools/ directories.

Thomas Monjalon (3):
  scripts: move to buildtools
  scripts: move to devtools
  tools: move to usertools

 MAINTAINERS                                      | 26 ++++++++++++------------
 {scripts => buildtools}/auto-config-h.sh         |  0
 {scripts => buildtools}/depdirs-rule.sh          |  0
 {scripts => buildtools}/gen-build-mk.sh          |  0
 {scripts => buildtools}/gen-config-h.sh          |  0
 {scripts => buildtools}/relpath.sh               |  0
 {scripts => devtools}/check-git-log.sh           |  0
 {scripts => devtools}/check-includes.sh          |  0
 {scripts => devtools}/check-maintainers.sh       |  0
 {scripts => devtools}/checkpatches.sh            |  0
 {scripts => devtools}/cocci.sh                   |  2 +-
 {scripts => devtools}/cocci/mtod-offset.cocci    |  0
 {scripts => devtools}/git-log-fixes.sh           |  0
 {scripts => devtools}/load-devel-config          |  0
 {scripts => devtools}/test-build.sh              |  0
 {scripts => devtools}/test-null.sh               |  0
 {scripts => devtools}/validate-abi.sh            |  0
 doc/guides/contributing/patches.rst              |  8 ++++----
 doc/guides/contributing/versioning.rst           | 10 ++++-----
 doc/guides/cryptodevs/qat.rst                    |  2 +-
 doc/guides/faq/faq.rst                           |  2 +-
 doc/guides/freebsd_gsg/build_dpdk.rst            |  2 +-
 doc/guides/howto/lm_bond_virtio_sriov.rst        |  8 ++++----
 doc/guides/howto/lm_virtio_vhost_user.rst        | 16 +++++++--------
 doc/guides/linux_gsg/build_dpdk.rst              | 14 ++++++-------
 doc/guides/linux_gsg/nic_perf_intel_platform.rst |  6 +++---
 doc/guides/linux_gsg/quick_start.rst             |  4 ++--
 doc/guides/nics/bnx2x.rst                        |  4 ++--
 doc/guides/nics/cxgbe.rst                        |  4 ++--
 doc/guides/nics/ena.rst                          |  2 +-
 doc/guides/nics/i40e.rst                         |  4 ++--
 doc/guides/nics/qede.rst                         |  2 +-
 doc/guides/nics/thunderx.rst                     | 26 ++++++++++++------------
 doc/guides/nics/virtio.rst                       |  2 +-
 doc/guides/sample_app_ug/vhost.rst               |  2 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst      | 10 ++++-----
 doc/guides/xen/pkt_switch.rst                    |  2 +-
 drivers/net/mlx4/Makefile                        |  2 +-
 drivers/net/mlx5/Makefile                        |  2 +-
 lib/librte_eal/common/eal_common_options.c       |  2 +-
 mk/internal/rte.depdirs-post.mk                  |  2 +-
 mk/internal/rte.install-post.mk                  |  2 +-
 mk/rte.sdkbuild.mk                               |  2 +-
 mk/rte.sdkconfig.mk                              |  8 ++++----
 mk/rte.sdkinstall.mk                             | 14 +++++--------
 pkg/dpdk.spec                                    |  4 ++--
 {tools => usertools}/cpu_layout.py               |  0
 {tools => usertools}/dpdk-devbind.py             |  0
 {tools => usertools}/dpdk-pmdinfo.py             |  0
 {tools => usertools}/dpdk-setup.sh               | 14 ++++++-------
 50 files changed, 103 insertions(+), 107 deletions(-)
 rename {scripts => buildtools}/auto-config-h.sh (100%)
 rename {scripts => buildtools}/depdirs-rule.sh (100%)
 rename {scripts => buildtools}/gen-build-mk.sh (100%)
 rename {scripts => buildtools}/gen-config-h.sh (100%)
 rename {scripts => buildtools}/relpath.sh (100%)
 rename {scripts => devtools}/check-git-log.sh (100%)
 rename {scripts => devtools}/check-includes.sh (100%)
 rename {scripts => devtools}/check-maintainers.sh (100%)
 rename {scripts => devtools}/checkpatches.sh (100%)
 rename {scripts => devtools}/cocci.sh (98%)
 rename {scripts => devtools}/cocci/mtod-offset.cocci (100%)
 rename {scripts => devtools}/git-log-fixes.sh (100%)
 rename {scripts => devtools}/load-devel-config (100%)
 rename {scripts => devtools}/test-build.sh (100%)
 rename {scripts => devtools}/test-null.sh (100%)
 rename {scripts => devtools}/validate-abi.sh (100%)
 rename {tools => usertools}/cpu_layout.py (100%)
 rename {tools => usertools}/dpdk-devbind.py (100%)
 rename {tools => usertools}/dpdk-pmdinfo.py (100%)
 rename {tools => usertools}/dpdk-setup.sh (97%)

-- 
2.7.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: fix required tools list layout
  2016-12-13 10:03  4% [dpdk-dev] [PATCH] doc: fix required tools list layout Baruch Siach
@ 2016-12-15 15:09  0% ` Mcnamara, John
  2016-12-18 19:11  0%   ` Baruch Siach
  0 siblings, 1 reply; 200+ results
From: Mcnamara, John @ 2016-12-15 15:09 UTC (permalink / raw)
  To: Baruch Siach, dev; +Cc: David Marchand



> -----Original Message-----
> From: Baruch Siach [mailto:baruch@tkos.co.il]
> Sent: Tuesday, December 13, 2016 10:04 AM
> To: dev@dpdk.org
> Cc: Mcnamara, John <john.mcnamara@intel.com>; David Marchand
> <david.marchand@6wind.com>; Baruch Siach <baruch@tkos.co.il>
> Subject: [PATCH] doc: fix required tools list layout
> 
> The Python requirement should appear in the bullet list.
> 
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>  doc/guides/linux_gsg/sys_reqs.rst | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/doc/guides/linux_gsg/sys_reqs.rst
> b/doc/guides/linux_gsg/sys_reqs.rst
> index 3d743421595a..621cc9ddaef6 100644
> --- a/doc/guides/linux_gsg/sys_reqs.rst
> +++ b/doc/guides/linux_gsg/sys_reqs.rst
> @@ -84,9 +84,7 @@ Compilation of the DPDK
>      x86_x32 ABI is currently supported with distribution packages only on
> Ubuntu
>      higher than 13.10 or recent Debian distribution. The only supported
> compiler is gcc 4.9+.
> 
> -.. note::
> -
> -    Python, version 2.6 or 2.7, to use various helper scripts included in
> the DPDK package.
> +*   Python, version 2.6 or 2.7, to use various helper scripts included in
> the DPDK package.
> 

Hi Baruch,

In addition to this change the note on the previous item should be indented to the level of the bullet item. It is probably worth making that change at the same time.

Also, the Python version should probably say 2.7+ and 3.2+ if this patch is accepted: 

    http://dpdk.org/dev/patchwork/patch/17775/

However, since that change hasn't been acked/merged yet you can leave that part of your patch as it is and I'll fix the version numbers in the other patch.

John

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] KNI broken again with 4.9 kernel
  2016-12-15 12:01  0% ` [dpdk-dev] KNI broken again with 4.9 kernel Mcnamara, John
@ 2016-12-15 12:55  0%   ` Jay Rolette
  0 siblings, 0 replies; 200+ results
From: Jay Rolette @ 2016-12-15 12:55 UTC (permalink / raw)
  To: Mcnamara, John; +Cc: Stephen Hemminger, dev, Yigit, Ferruh

On Thu, Dec 15, 2016 at 6:01 AM, Mcnamara, John <john.mcnamara@intel.com>
wrote:

>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Wednesday, December 14, 2016 11:41 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] KNI broken again with 4.9 kernel
> >
> > /build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error:
> > initialization from incompatible pointer type [-Werror=incompatible-
> > pointer-types]
> >   .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
> >                      ^~~~~~~~~~~~~~~~~~~
> >
> > I am sure Ferruh Yigit will fix it.
> >
> > Which raises a couple of questions:
> >  1. Why is DPDK still keeping KNI support for Intel specific ethtool
> > functionality.
> >     This always breaks, is code bloat, and means a 3rd copy of base code
> > (Linux, DPDK PMD, + KNI)
> >
> >  2. Why is KNI not upstream?
> >     If not acceptable due to security or supportablity then why does it
> > still exist?
> >
> >  3. If not upstream, then maintainer should track upstream kernel changes
> > and fix DPDK before
> >     kernel is released.  The ABI is normally set early in the rc cycle
> > weeks before release.
>
>
> Hi Stephen,
>
> On point 2: The feedback we have always received is that the KNI code
> isn't upstreamable. Do you think there is an upstream path?
>
> > If not acceptable due to security or supportablity then why does it
> > still exist?
>
> The most commonly expressed reason when we have asked this question in the
> past (and we did again at Userspace a few months ago) is that the people
> who use it want the performance.
>

We use KNI in our product. In our case, it's because it allows "normal"
non-DPDK apps in the control plane to interact with traffic on the fastpath
as needed. Having everything under the sun live in DPDK's essentially flat
memory space is not great for security or stability.

It helps time to market by being able to use existing programs that
interface to the network via sockets instead of having to limit ourselves
to the relatively tiny set of libraries out there that work directly DPDK.

Double bonus on the time-to-market argument since we can implement
functionality in other higher-level languages as appropriate.

Performance-wise, KNI is "ok" but not great. It's not clear to me why it is
so much slower than using a NIC normally (not via DPDK) via the Linux
network stack. Copying data between sk_buf and mbuf is never going to be
cheap, but comparing that to what happens through all the kernel network
stack layers, the end result seems too slow.

That said, it's still faster than TAP/TUN interfaces and similar approaches.

Jay

On point 3: We do have an internal continuous integration system that runs
> nightly compiles of DPDK against the latest kernel and flags any issues.
>
> John
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] KNI broken again with 4.9 kernel
  2016-12-14 23:40  3% [dpdk-dev] KNI broken again with 4.9 kernel Stephen Hemminger
  2016-12-15 11:53  0% ` [dpdk-dev] KNI Questions Ferruh Yigit
@ 2016-12-15 12:01  0% ` Mcnamara, John
  2016-12-15 12:55  0%   ` Jay Rolette
  1 sibling, 1 reply; 200+ results
From: Mcnamara, John @ 2016-12-15 12:01 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Yigit, Ferruh



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, December 14, 2016 11:41 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] KNI broken again with 4.9 kernel
> 
> /build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error:
> initialization from incompatible pointer type [-Werror=incompatible-
> pointer-types]
>   .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
>                      ^~~~~~~~~~~~~~~~~~~
> 
> I am sure Ferruh Yigit will fix it.
> 
> Which raises a couple of questions:
>  1. Why is DPDK still keeping KNI support for Intel specific ethtool
> functionality.
>     This always breaks, is code bloat, and means a 3rd copy of base code
> (Linux, DPDK PMD, + KNI)
> 
>  2. Why is KNI not upstream?
>     If not acceptable due to security or supportablity then why does it
> still exist?
> 
>  3. If not upstream, then maintainer should track upstream kernel changes
> and fix DPDK before
>     kernel is released.  The ABI is normally set early in the rc cycle
> weeks before release.


Hi Stephen,

On point 2: The feedback we have always received is that the KNI code isn't upstreamable. Do you think there is an upstream path? 

> If not acceptable due to security or supportablity then why does it
> still exist?

The most commonly expressed reason when we have asked this question in the past (and we did again at Userspace a few months ago) is that the people who use it want the performance.

On point 3: We do have an internal continuous integration system that runs nightly compiles of DPDK against the latest kernel and flags any issues.

John

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] KNI Questions
  2016-12-14 23:40  3% [dpdk-dev] KNI broken again with 4.9 kernel Stephen Hemminger
@ 2016-12-15 11:53  0% ` Ferruh Yigit
  2016-12-15 12:01  0% ` [dpdk-dev] KNI broken again with 4.9 kernel Mcnamara, John
  1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2016-12-15 11:53 UTC (permalink / raw)
  To: Stephen Hemminger, dev

Hi Stephen,

<...>

> 
> Which raises a couple of questions:
>  1. Why is DPDK still keeping KNI support for Intel specific ethtool functionality.
>     This always breaks, is code bloat, and means a 3rd copy of base code (Linux, DPDK PMD, + KNI)

I agree on you comments related to the ethtool functionality,
but right now that is a functionality that people may be using, I think
we should not remove it without providing an alternative to it.

> 
>  2. Why is KNI not upstream?
>     If not acceptable due to security or supportablity then why does it still exist?

I believe you are one of the most knowledgeable person in the mail list
on upstreaming, any support is welcome.

> 
>  3. If not upstream, then maintainer should track upstream kernel changes and fix DPDK before
>     kernel is released.  The ABI is normally set early in the rc cycle weeks before release.

I am trying to track as much as possible, any help appreciated.

> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 0/2] support for Hyper-V VMBUS
@ 2016-12-14 23:59  3% Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2016-12-14 23:59 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is the core changes required to support VMBUS.
It does cause some ABI change to ethdev etc.

Stephen Hemminger (2):
  ethdev: increase length ethernet device internal name
  hyperv: VMBUS support infrastucture

 doc/guides/rel_notes/deprecation.rst        |   3 +
 lib/librte_eal/common/Makefile              |   2 +-
 lib/librte_eal/common/eal_common_devargs.c  |   7 +
 lib/librte_eal/common/eal_common_options.c  |  38 ++
 lib/librte_eal/common/eal_internal_cfg.h    |   3 +-
 lib/librte_eal/common/eal_options.h         |   6 +
 lib/librte_eal/common/eal_private.h         |   5 +
 lib/librte_eal/common/include/rte_devargs.h |   8 +
 lib/librte_eal/common/include/rte_vmbus.h   | 247 ++++++++
 lib/librte_eal/linuxapp/eal/Makefile        |   6 +
 lib/librte_eal/linuxapp/eal/eal.c           |  11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c     | 906 ++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.c               |  90 +++
 lib/librte_ether/rte_ethdev.h               |  34 +-
 mk/rte.app.mk                               |   1 +
 15 files changed, 1362 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c

-- 
2.10.2

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] KNI broken again with 4.9 kernel
@ 2016-12-14 23:40  3% Stephen Hemminger
  2016-12-15 11:53  0% ` [dpdk-dev] KNI Questions Ferruh Yigit
  2016-12-15 12:01  0% ` [dpdk-dev] KNI broken again with 4.9 kernel Mcnamara, John
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2016-12-14 23:40 UTC (permalink / raw)
  To: dev

/build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
  .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
                     ^~~~~~~~~~~~~~~~~~~

I am sure Ferruh Yigit will fix it.

Which raises a couple of questions:
 1. Why is DPDK still keeping KNI support for Intel specific ethtool functionality.
    This always breaks, is code bloat, and means a 3rd copy of base code (Linux, DPDK PMD, + KNI)

 2. Why is KNI not upstream?
    If not acceptable due to security or supportablity then why does it still exist?

 3. If not upstream, then maintainer should track upstream kernel changes and fix DPDK before
    kernel is released.  The ABI is normally set early in the rc cycle weeks before release.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst
  2016-12-14 15:13  3% ` [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst tom.barbette
@ 2016-12-14 16:52  0%   ` Bruce Richardson
  2016-12-17 10:43  0%     ` tom.barbette
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2016-12-14 16:52 UTC (permalink / raw)
  To: tom.barbette; +Cc: dev

On Wed, Dec 14, 2016 at 04:13:53PM +0100, tom.barbette@ulg.ac.be wrote:
> Hi list,
> 
> Between 2.2.0 and 16.04 (up to at least 16.07.2 if not current), with the XL710 controller I do not get any packet when calling rte_eth_rx_burst if nb_pkts is too small. I would say smaller than 32. The input rate is not big, if that helps. But It should definitely get at least one packet per second.
> 
> Any ideas? Is that a bug or expected behaviour? Could be caused by other ABI changes?
> 
Does this issue still occur even if you disable the vector driver in
your build-time configuration?

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model
  2016-12-14 13:13  3%           ` Jerin Jacob
@ 2016-12-14 15:15  0%             ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2016-12-14 15:15 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Van Haaren, Harry, dev, thomas.monjalon, hemant.agrawal, Eads, Gage

On Wed, Dec 14, 2016 at 06:43:58PM +0530, Jerin Jacob wrote:
> On Thu, Dec 08, 2016 at 11:02:16AM +0000, Van Haaren, Harry wrote:
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Thursday, December 8, 2016 1:24 AM
> > > To: Van Haaren, Harry <harry.van.haaren@intel.com>
> > 
> > <snip>
> > 
> > > > * Operation and sched_type *increased* to 4 bits each (from previous value of 2) to
> > > allow future expansion without ABI changes
> > > 
> > > Anyway it will break ABI if we add new operation. I would propose to keep 4bit
> > > reserved and add it when required.
> > 
> > Ok sounds good. I'll suggest to move it to the middle between operation or sched type, which would allow expanding operation without ABI breaks. On expanding the field would remain in the same place with the same bits available in that place (no ABI break), but new bits can be added into the currently reserved space.
> 
> OK. We will move the rsvd field as you suggested.
> 
> > 
> > 
> > > > * Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
> > > > * sub-event-type reduced to 4 bits (previous value was 8 bits). Can we think of
> > > situations where 16 values for application specified identifiers of each event-type is
> > > genuinely not enough?
> > > One packet will not go beyond 16 stages but an application may have more stages and
> > > each packet may go mutually exclusive stages. For example,
> > > 
> > > packet 0: stagex_0 ->stagex_1
> > > packet 1: stagey_0 ->stagey_1
> > > 
> > > In that sense, IMO, more than 16 is required.(AFIAK, VPP has any much larger limit on
> > > number of stages)
> > 
> > My understanding was that stages are linked to event queues, so the application can determine the stage the packet comes from by reading queue_id?
> 
> That is one way of doing it. But it is limited to number of queues
> therefore scalability issues.Another approach is through
> sub_event_type scheme without depended on the number of queues.
> 
> > 
> > I'm not opposed to having an 8 bit sub_event_type, but it seems unnecessarily large from my point of view. If you have a use for it, I'm ok with 8 bits.
> 
> OK
> 
> > 
> > 
> > > > In my opinion this structure layout is more balanced, and will perform better due to
> > > less loads that will need masking to access the required value.
> > > OK. Considering more balanced layout and above points. I propose following scheme(based on
> > > your input)
> > > 
> > > 	union {
> > > 		uint64_t event;
> > > 		struct {
> > > 			uint32_t flow_id: 20;
> > > 			uint32_t sub_event_type : 8;
> > > 			uint32_t event_type : 4;
> > > 
> > > 			uint8_t rsvd: 4; /* for future additions */
> > > 			uint8_t operation  : 2; /* new fwd drop */
> > > 			uint8_t sched_type : 2;
> > > 
> > > 			uint8_t queue_id;
> > > 			uint8_t priority;
> > > 			uint8_t impl_opaque;
> > > 		};
> > > 	};
> > > 
> > > Feedback and improvements welcomed,
> > 
> > 
> > So incorporating my latest suggestions on moving fields around, excluding sub_event_type *size* changes:
> > 
> > union {
> > 	uint64_t event;
> > 	struct {
> > 		uint32_t flow_id: 20;
> > 		uint32_t event_type : 4;
> > 		uint32_t sub_event_type : 8; /* 8 bits now naturally aligned */
> 
> Just one suggestion here. I am not sure about the correct split between
> number of bits to represent flow_id and sub_event_type fields. And its
> connected in our implementation, so I propose to move sub_event_type up so
> that future flow_id/sub_event_type bit field size change request wont
> impact our implementation. Since it is represented as 32bit as whole, I
> don't think there is an alignment issue.
> 
> So incorporating my latest suggestions on moving sub_event_type field around:
> 
> union {
> 	uint64_t event;
> 	struct {
> 		uint32_t flow_id: 20;
> 		uint32_t sub_event_type : 8;
> 		uint32_t event_type : 4;
> 

The issue with the above layout is that you have an 8-bit value which
can never be accessed as a byte. With the layout above proposed by
Harry, the sub_event_type can be accessed without any bit manipultaion
operations just by doing a byte read. With the layout you propose, all
fields require masking and/or shifting to access. It won't affect the
scheduler performance for us, but it means potentially more cycles in
the app to access those fields.

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst
       [not found]     <415214732.17903310.1481728244157.JavaMail.zimbra@ulg.ac.be>
@ 2016-12-14 15:13  3% ` tom.barbette
  2016-12-14 16:52  0%   ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: tom.barbette @ 2016-12-14 15:13 UTC (permalink / raw)
  To: dev

Hi list,

Between 2.2.0 and 16.04 (up to at least 16.07.2 if not current), with the XL710 controller I do not get any packet when calling rte_eth_rx_burst if nb_pkts is too small. I would say smaller than 32. The input rate is not big, if that helps. But It should definitely get at least one packet per second.

Any ideas? Is that a bug or expected behaviour? Could be caused by other ABI changes?

Thanks,
Tom

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-12-14 11:48  0%             ` Kevin Traynor
@ 2016-12-14 13:54  0%               ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-12-14 13:54 UTC (permalink / raw)
  To: Kevin Traynor
  Cc: dev, Thomas Monjalon, Pablo de Lara, Olivier Matz, sugesh.chandran

Hi Kevin,

On Wed, Dec 14, 2016 at 11:48:04AM +0000, Kevin Traynor wrote:
> hi Adrien, sorry for the delay
> 
> <...>
> 
> >>>>
> >>>> Is it expected that the application or pmd will provide locking between
> >>>> these functions if required? I think it's going to have to be the app.
> >>>
> >>> Locking is indeed expected to be performed by applications. This API only
> >>> documents places where locking would make sense if necessary and expected
> >>> behavior.
> >>>
> >>> Like all control path APIs, this one assumes a single control thread.
> >>> Applications must take the necessary precautions.
> >>
> >> If you look at OVS now it's quite possible that you have 2 rx queues
> >> serviced by different threads, that would also install the flow rules in
> >> the software flow caches - possibly that could extend to adding hardware
> >> flows. There could also be another thread that is querying for stats. So
> >> anything that can be done to minimise the locking would be helpful -
> >> maybe query() could be atomic and not require any locking?
> > 
> > I think we need basic functions with as few constraints as possible on PMDs
> > first, this API being somewhat complex to implement on their side. That
> > covers the common use case where applications have a single control thread
> > or otherwise perform locking on their own.
> > 
> > Once the basics are there for most PMDs, we may add new functions, items,
> > properties and actions that provide additional constraints (timing,
> > multi-threading and so on), which remain to be defined according to
> > feedback. It is designed to be extended without causing ABI breakage.
> 
> I think Sugesh and I are trying to foresee some of the issues that may
> arise when integrating with something like OVS. OTOH it's
> hard/impossible to say what will be needed exactly in the API right now
> to make it suitable for OVS.
> 
> So, I'm ok with the approach you are taking by exposing a basic API
> but I think there should be an expectation that it may not be sufficient
> for a project like OVS to integrate in and may take several
> iterations/extensions - don't go anywhere!
> 
> > 
> > As for query(), let's see how PMDs handle it first. A race between query()
> > and create() on a given device is almost unavoidable without locking, same
> > for queries that reset counters in a given flow rule. Basic parallel queries
> > should not cause any harm otherwise, although this cannot be guaranteed yet.
> 
> You still have a race if there is locking, except it is for the lock,
> but it has the same effect. The downside of my suggestion is that all
> the PMDs would need to guarantee they could gets stats atomically - I'm
> not sure if they can or it's too restrictive.
> 
> > 
> 
> <...>
> 
> >>
> >>>
> >>>>> +
> >>>>> +/**
> >>>>> + * Destroy a flow rule on a given port.
> >>>>> + *
> >>>>> + * Failure to destroy a flow rule handle may occur when other flow rules
> >>>>> + * depend on it, and destroying it would result in an inconsistent state.
> >>>>> + *
> >>>>> + * This function is only guaranteed to succeed if handles are destroyed in
> >>>>> + * reverse order of their creation.
> >>>>
> >>>> How can the application find this information out on error?
> >>>
> >>> Without maintaining a list, they cannot. The specified case is the only
> >>> possible guarantee. That does not mean PMDs should not do their best to
> >>> destroy flow rules, only that ordering must remain consistent in case of
> >>> inability to destroy one.
> >>>
> >>> What do you suggest?
> >>
> >> I think if the app cannot remove a specific rule it may want to remove
> >> all rules and deal with flows in software for a time. So once the app
> >> knows it fails that should be enough.
> > 
> > OK, then since destruction may return an error already, is it fine?
> > Applications may call rte_flow_flush() (not supposed to fail unless there is
> > a serious issue, abort() in that case) and switch to SW fallback.
> 
> yes, it's fine.
> 
> > 
> 
> <...>
> 
> >>>>> + * @param[out] error
> >>>>> + *   Perform verbose error reporting if not NULL.
> >>>>> + *
> >>>>> + * @return
> >>>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> >>>>> + */
> >>>>> +int
> >>>>> +rte_flow_query(uint8_t port_id,
> >>>>> +	       struct rte_flow *flow,
> >>>>> +	       enum rte_flow_action_type action,
> >>>>> +	       void *data,
> >>>>> +	       struct rte_flow_error *error);
> >>>>> +
> >>>>> +#ifdef __cplusplus
> >>>>> +}
> >>>>> +#endif
> >>>>
> >>>> I don't see a way to dump all the rules for a port out. I think this is
> >>>> neccessary for degbugging. You could have a look through dpif.h in OVS
> >>>> and see how dpif_flow_dump_next() is used, it might be a good reference.
> >>>
> >>> DPDK does not maintain flow rules and, depending on hardware capabilities
> >>> and level of compliance, PMDs do not necessarily do it either, particularly
> >>> since it requires space and application probably have a better method to
> >>> store these pointers for their own needs.
> >>
> >> understood
> >>
> >>>
> >>> What you see here is only a PMD interface. Depending on applications needs,
> >>> generic helper functions built on top of these may be added to manage flow
> >>> rules in the future.
> >>
> >> I'm thinking of the case where something goes wrong and I want to get a
> >> dump of all the flow rules from hardware, not query the rules I think I
> >> have. I don't see a way to do it or something to build a helper on top of?
> > 
> > Generic helper functions would exist on top of this API and would likely
> > maintain a list of flow rules themselves. The dump in that case would be
> > entirely implemented in software. I think that recovering flow rules from HW
> > may be complicated in many cases (even without taking storage allocation and
> > rules conversion issues into account), therefore if there is really a need
> > for it, we could perhaps add a dump() function that PMDs are free to
> > implement later.
> > 
> 
> ok. Maybe there are some more generic stats that can be got from the
> hardware that would help debugging that would suffice, like total flow
> rule hits/misses (i.e. not on a per flow rule basis).
> 
> You can get this from the software flow caches and it's widely used for
> debugging. e.g.
> 
> pmd thread numa_id 0 core_id 3:
> 	emc hits:0
> 	megaflow hits:0
> 	avg. subtable lookups per hit:0.00
> 	miss:0
> 

Perhaps a rule such as the following could do the trick:

 group: 42 (or priority 42)
 pattern: void
 actions: count / passthru

Assuming useful flow rules are defined with higher priorities (using lower
group ID or priority level) and provide a terminating action, this one would
count all packets that were not caught by them.

That is one example to illustrate how "global" counters can be requested by
applications.

Otherwise you could just make sure all rules contain mark / flag actions, in
which case mbufs would tell directly if they went through them or need
additional SW processing.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model
  2016-12-08 11:02  4%         ` Van Haaren, Harry
@ 2016-12-14 13:13  3%           ` Jerin Jacob
  2016-12-14 15:15  0%             ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-12-14 13:13 UTC (permalink / raw)
  To: Van Haaren, Harry
  Cc: dev, thomas.monjalon, Richardson, Bruce, hemant.agrawal, Eads, Gage

On Thu, Dec 08, 2016 at 11:02:16AM +0000, Van Haaren, Harry wrote:
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Thursday, December 8, 2016 1:24 AM
> > To: Van Haaren, Harry <harry.van.haaren@intel.com>
> 
> <snip>
> 
> > > * Operation and sched_type *increased* to 4 bits each (from previous value of 2) to
> > allow future expansion without ABI changes
> > 
> > Anyway it will break ABI if we add new operation. I would propose to keep 4bit
> > reserved and add it when required.
> 
> Ok sounds good. I'll suggest to move it to the middle between operation or sched type, which would allow expanding operation without ABI breaks. On expanding the field would remain in the same place with the same bits available in that place (no ABI break), but new bits can be added into the currently reserved space.

OK. We will move the rsvd field as you suggested.

> 
> 
> > > * Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
> > > * sub-event-type reduced to 4 bits (previous value was 8 bits). Can we think of
> > situations where 16 values for application specified identifiers of each event-type is
> > genuinely not enough?
> > One packet will not go beyond 16 stages but an application may have more stages and
> > each packet may go mutually exclusive stages. For example,
> > 
> > packet 0: stagex_0 ->stagex_1
> > packet 1: stagey_0 ->stagey_1
> > 
> > In that sense, IMO, more than 16 is required.(AFIAK, VPP has any much larger limit on
> > number of stages)
> 
> My understanding was that stages are linked to event queues, so the application can determine the stage the packet comes from by reading queue_id?

That is one way of doing it. But it is limited to number of queues
therefore scalability issues.Another approach is through
sub_event_type scheme without depended on the number of queues.

> 
> I'm not opposed to having an 8 bit sub_event_type, but it seems unnecessarily large from my point of view. If you have a use for it, I'm ok with 8 bits.

OK

> 
> 
> > > In my opinion this structure layout is more balanced, and will perform better due to
> > less loads that will need masking to access the required value.
> > OK. Considering more balanced layout and above points. I propose following scheme(based on
> > your input)
> > 
> > 	union {
> > 		uint64_t event;
> > 		struct {
> > 			uint32_t flow_id: 20;
> > 			uint32_t sub_event_type : 8;
> > 			uint32_t event_type : 4;
> > 
> > 			uint8_t rsvd: 4; /* for future additions */
> > 			uint8_t operation  : 2; /* new fwd drop */
> > 			uint8_t sched_type : 2;
> > 
> > 			uint8_t queue_id;
> > 			uint8_t priority;
> > 			uint8_t impl_opaque;
> > 		};
> > 	};
> > 
> > Feedback and improvements welcomed,
> 
> 
> So incorporating my latest suggestions on moving fields around, excluding sub_event_type *size* changes:
> 
> union {
> 	uint64_t event;
> 	struct {
> 		uint32_t flow_id: 20;
> 		uint32_t event_type : 4;
> 		uint32_t sub_event_type : 8; /* 8 bits now naturally aligned */

Just one suggestion here. I am not sure about the correct split between
number of bits to represent flow_id and sub_event_type fields. And its
connected in our implementation, so I propose to move sub_event_type up so
that future flow_id/sub_event_type bit field size change request wont
impact our implementation. Since it is represented as 32bit as whole, I
don't think there is an alignment issue.

So incorporating my latest suggestions on moving sub_event_type field around:

union {
	uint64_t event;
	struct {
		uint32_t flow_id: 20;
		uint32_t sub_event_type : 8;
		uint32_t event_type : 4;

		uint8_t operation  : 2; /* new fwd drop */
		uint8_t rsvd: 4; /* for future additions, can be expanded into without ABI break */
		uint8_t sched_type : 2;

		uint8_t queue_id;
		uint8_t priority;
		uint8_t impl_opaque;
	};
};

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-12-08 17:07  3%           ` Adrien Mazarguil
@ 2016-12-14 11:48  0%             ` Kevin Traynor
  2016-12-14 13:54  0%               ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Kevin Traynor @ 2016-12-14 11:48 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: dev, Thomas Monjalon, Pablo de Lara, Olivier Matz, sugesh.chandran

hi Adrien, sorry for the delay

<...>

>>>>
>>>> Is it expected that the application or pmd will provide locking between
>>>> these functions if required? I think it's going to have to be the app.
>>>
>>> Locking is indeed expected to be performed by applications. This API only
>>> documents places where locking would make sense if necessary and expected
>>> behavior.
>>>
>>> Like all control path APIs, this one assumes a single control thread.
>>> Applications must take the necessary precautions.
>>
>> If you look at OVS now it's quite possible that you have 2 rx queues
>> serviced by different threads, that would also install the flow rules in
>> the software flow caches - possibly that could extend to adding hardware
>> flows. There could also be another thread that is querying for stats. So
>> anything that can be done to minimise the locking would be helpful -
>> maybe query() could be atomic and not require any locking?
> 
> I think we need basic functions with as few constraints as possible on PMDs
> first, this API being somewhat complex to implement on their side. That
> covers the common use case where applications have a single control thread
> or otherwise perform locking on their own.
> 
> Once the basics are there for most PMDs, we may add new functions, items,
> properties and actions that provide additional constraints (timing,
> multi-threading and so on), which remain to be defined according to
> feedback. It is designed to be extended without causing ABI breakage.

I think Sugesh and I are trying to foresee some of the issues that may
arise when integrating with something like OVS. OTOH it's
hard/impossible to say what will be needed exactly in the API right now
to make it suitable for OVS.

So, I'm ok with the approach you are taking by exposing a basic API
but I think there should be an expectation that it may not be sufficient
for a project like OVS to integrate in and may take several
iterations/extensions - don't go anywhere!

> 
> As for query(), let's see how PMDs handle it first. A race between query()
> and create() on a given device is almost unavoidable without locking, same
> for queries that reset counters in a given flow rule. Basic parallel queries
> should not cause any harm otherwise, although this cannot be guaranteed yet.

You still have a race if there is locking, except it is for the lock,
but it has the same effect. The downside of my suggestion is that all
the PMDs would need to guarantee they could gets stats atomically - I'm
not sure if they can or it's too restrictive.

> 

<...>

>>
>>>
>>>>> +
>>>>> +/**
>>>>> + * Destroy a flow rule on a given port.
>>>>> + *
>>>>> + * Failure to destroy a flow rule handle may occur when other flow rules
>>>>> + * depend on it, and destroying it would result in an inconsistent state.
>>>>> + *
>>>>> + * This function is only guaranteed to succeed if handles are destroyed in
>>>>> + * reverse order of their creation.
>>>>
>>>> How can the application find this information out on error?
>>>
>>> Without maintaining a list, they cannot. The specified case is the only
>>> possible guarantee. That does not mean PMDs should not do their best to
>>> destroy flow rules, only that ordering must remain consistent in case of
>>> inability to destroy one.
>>>
>>> What do you suggest?
>>
>> I think if the app cannot remove a specific rule it may want to remove
>> all rules and deal with flows in software for a time. So once the app
>> knows it fails that should be enough.
> 
> OK, then since destruction may return an error already, is it fine?
> Applications may call rte_flow_flush() (not supposed to fail unless there is
> a serious issue, abort() in that case) and switch to SW fallback.

yes, it's fine.

> 

<...>

>>>>> + * @param[out] error
>>>>> + *   Perform verbose error reporting if not NULL.
>>>>> + *
>>>>> + * @return
>>>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>>>> + */
>>>>> +int
>>>>> +rte_flow_query(uint8_t port_id,
>>>>> +	       struct rte_flow *flow,
>>>>> +	       enum rte_flow_action_type action,
>>>>> +	       void *data,
>>>>> +	       struct rte_flow_error *error);
>>>>> +
>>>>> +#ifdef __cplusplus
>>>>> +}
>>>>> +#endif
>>>>
>>>> I don't see a way to dump all the rules for a port out. I think this is
>>>> neccessary for degbugging. You could have a look through dpif.h in OVS
>>>> and see how dpif_flow_dump_next() is used, it might be a good reference.
>>>
>>> DPDK does not maintain flow rules and, depending on hardware capabilities
>>> and level of compliance, PMDs do not necessarily do it either, particularly
>>> since it requires space and application probably have a better method to
>>> store these pointers for their own needs.
>>
>> understood
>>
>>>
>>> What you see here is only a PMD interface. Depending on applications needs,
>>> generic helper functions built on top of these may be added to manage flow
>>> rules in the future.
>>
>> I'm thinking of the case where something goes wrong and I want to get a
>> dump of all the flow rules from hardware, not query the rules I think I
>> have. I don't see a way to do it or something to build a helper on top of?
> 
> Generic helper functions would exist on top of this API and would likely
> maintain a list of flow rules themselves. The dump in that case would be
> entirely implemented in software. I think that recovering flow rules from HW
> may be complicated in many cases (even without taking storage allocation and
> rules conversion issues into account), therefore if there is really a need
> for it, we could perhaps add a dump() function that PMDs are free to
> implement later.
> 

ok. Maybe there are some more generic stats that can be got from the
hardware that would help debugging that would suffice, like total flow
rule hits/misses (i.e. not on a per flow rule basis).

You can get this from the software flow caches and it's widely used for
debugging. e.g.

pmd thread numa_id 0 core_id 3:
	emc hits:0
	megaflow hits:0
	avg. subtable lookups per hit:0.00
	miss:0

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 0/7] net/ixgbe: move set VF functions.
  2016-12-13 13:36  0%       ` Ferruh Yigit
@ 2016-12-13 13:46  0%         ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2016-12-13 13:46 UTC (permalink / raw)
  To: Yigit, Ferruh, thomas.monjalon, dev


Hi Ferruh,

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Tuesday, December 13, 2016 1:37 PM
> To: Iremonger, Bernard <bernard.iremonger@intel.com>;
> thomas.monjalon@6wind.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 0/7] net/ixgbe: move set VF functions.
> 
> On 12/13/2016 11:40 AM, Bernard Iremonger wrote:
> > This patchset implements the following deprecation notice:
> > [PATCH v1] doc: announce API and ABI change for librte_ether
> >
> > Changes in  V4:
> > Fixed compile issues when ixgbe PMD is not present.
> > Removed duplicate testpmd commands.
> > Added cleanup patch for testpmd.
> > Updated release note.
> >
> > Changes in V3:
> > Updated LIBABIVER in Makefile in librte_ether patch.
> > Updated rte_ethdev.h and ret_ether_version.map in librte_ether patch.
> > Squashed deprecation notice patch into librte_ether patch.
> > Added release_note patch.
> >
> > Changes in V2:
> > Update testpmd set vf commands help messages.
> > Updated ethtool to use the ixgbe public API's.
> > Removed the ixgbe_set_pool_* and ixgbe_set_vf_rate_limit functions.
> > Removed the rte_eth_dev_set_vf_* API's Removed the deprecation
> notice.
> >
> > Changes in V1:
> > The following functions from eth_dev_ops have been moved to the ixgbe
> > PMD and renamed:
> >
> > ixgbe_set_pool_rx_mode
> > ixgbe_set_pool_rx
> > ixgbe_set_pool_tx
> > ixgbe_set_pool_vlan_filter
> > ixgbe_set_vf_rate_limit
> >
> > Renamed the functions to the following:
> >
> > rte_pmd_ixgbe_set_vf_rxmode
> > rte_pmd_ixgbe_set_vf_rx
> > rte_pmd_ixgbe_set_vf_tx
> > rte_pmd_ixgbe_set_vf_vlan_filter
> > rte_pmd_ixgbe_set_vf_rate_limit
> >
> > Testpmd has been modified to use the following functions:
> > rte_pmd_ixgbe_set_vf_rxmode
> > rte_pmd_ixgbe_set_vf_rate_limit
> >
> > New testpmd commands have been added to test the following functions:
> > rte_pmd_ixgbe_set_vf_rx
> > rte_pmd_ixgbe_set_vf_tx
> > rte_pmd_ixgbe_set_vf_vlan_filter
> >
> > The testpmd user guide has been updated for the new commands.
> >
> > Bernard Iremonger (7):
> >   net/ixgbe: move set VF functions from the ethdev
> >   app/testpmd: use ixgbe public functions
> >   app/testpmd: cleanup parameter checking
> >   examples/ethtool: use ixgbe public function
> >   net/ixgbe: remove static set VF functions
> >   librte_ether: remove the set VF API's
> >   doc: update release notes
> >
> 
> Series applied to dpdk-next-net/master, thanks.
> 
> Last patch squashed.

Thanks.

Regards,

Bernard.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 0/7] net/ixgbe: move set VF functions.
  2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 0/7] " Bernard Iremonger
@ 2016-12-13 13:36  0%       ` Ferruh Yigit
  2016-12-13 13:46  0%         ` Iremonger, Bernard
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-12-13 13:36 UTC (permalink / raw)
  To: Bernard Iremonger, thomas.monjalon, dev

On 12/13/2016 11:40 AM, Bernard Iremonger wrote:
> This patchset implements the following deprecation notice:
> [PATCH v1] doc: announce API and ABI change for librte_ether
> 
> Changes in  V4:
> Fixed compile issues when ixgbe PMD is not present.
> Removed duplicate testpmd commands.
> Added cleanup patch for testpmd.
> Updated release note.
> 
> Changes in V3:
> Updated LIBABIVER in Makefile in librte_ether patch.
> Updated rte_ethdev.h and ret_ether_version.map in librte_ether patch.
> Squashed deprecation notice patch into librte_ether patch.
> Added release_note patch.
> 
> Changes in V2:
> Update testpmd set vf commands help messages.
> Updated ethtool to use the ixgbe public API's.
> Removed the ixgbe_set_pool_* and ixgbe_set_vf_rate_limit functions.
> Removed the rte_eth_dev_set_vf_* API's
> Removed the deprecation notice.
> 
> Changes in V1:
> The following functions from eth_dev_ops have been moved to the ixgbe PMD
> and renamed:
> 
> ixgbe_set_pool_rx_mode
> ixgbe_set_pool_rx
> ixgbe_set_pool_tx
> ixgbe_set_pool_vlan_filter
> ixgbe_set_vf_rate_limit
> 
> Renamed the functions to the following:
> 
> rte_pmd_ixgbe_set_vf_rxmode
> rte_pmd_ixgbe_set_vf_rx
> rte_pmd_ixgbe_set_vf_tx
> rte_pmd_ixgbe_set_vf_vlan_filter
> rte_pmd_ixgbe_set_vf_rate_limit
> 
> Testpmd has been modified to use the following functions:
> rte_pmd_ixgbe_set_vf_rxmode
> rte_pmd_ixgbe_set_vf_rate_limit
> 
> New testpmd commands have been added to test the following functions:
> rte_pmd_ixgbe_set_vf_rx
> rte_pmd_ixgbe_set_vf_tx
> rte_pmd_ixgbe_set_vf_vlan_filter
> 
> The testpmd user guide has been updated for the new commands.
> 
> Bernard Iremonger (7):
>   net/ixgbe: move set VF functions from the ethdev
>   app/testpmd: use ixgbe public functions
>   app/testpmd: cleanup parameter checking
>   examples/ethtool: use ixgbe public function
>   net/ixgbe: remove static set VF functions
>   librte_ether: remove the set VF API's
>   doc: update release notes
> 

Series applied to dpdk-next-net/master, thanks.

Last patch squashed.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 7/7] doc: update release notes
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 " Bernard Iremonger
  2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 0/7] " Bernard Iremonger
  2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 6/7] librte_ether: remove the set VF API's Bernard Iremonger
@ 2016-12-13 11:40  4%     ` Bernard Iremonger
  2 siblings, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-12-13 11:40 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1871 bytes --]

Add release note for removing set VF API's from the ethdev,
renaming the API's and moving them to the ixgbe PMD.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 doc/guides/rel_notes/release_17_02.rst | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..7a40057 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -38,7 +38,6 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
-
 Resolved Issues
 ---------------
 
@@ -102,6 +101,26 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Moved five APIs for VF management from the ethdev to the ixgbe PMD.**
+
+  The following five APIs for VF management from the PF have been removed from the ethdev,
+  renamed and added to the ixgbe PMD::
+
+    rte_eth_dev_set_vf_rate_limit
+    rte_eth_dev_set_vf_rx
+    rte_eth_dev_set_vf_rxmode
+    rte_eth_dev_set_vf_tx
+    rte_eth_dev_set_vf_vlan_filter
+
+  The API's have been renamed to the following::
+
+    rte_pmd_ixgbe_set_vf_rate_limit
+    rte_pmd_ixgbe_set_vf_rx
+    rte_pmd_ixgbe_set_vf_rxmode
+    rte_pmd_ixgbe_set_vf_tx
+    rte_pmd_ixgbe_set_vf_vlan_filter
+
+  The declarations for the API’s can be found in ``rte_pmd_ixgbe.h``.
 
 ABI Changes
 -----------
@@ -142,7 +161,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_cryptodev.so.2
      librte_distributor.so.1
      librte_eal.so.3
-     librte_ethdev.so.5
+     +librte_ethdev.so.6
      librte_hash.so.2
      librte_ip_frag.so.1
      librte_jobstats.so.1
-- 
2.10.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4 6/7] librte_ether: remove the set VF API's
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 " Bernard Iremonger
  2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 0/7] " Bernard Iremonger
@ 2016-12-13 11:40  3%     ` Bernard Iremonger
  2016-12-13 11:40  4%     ` [dpdk-dev] [PATCH v4 7/7] doc: update release notes Bernard Iremonger
  2 siblings, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-12-13 11:40 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

remove the following API's:

rte_eth_dev_set_vf_rxmode
rte_eth_dev_set_vf_rx
rte_eth_dev_set_vf_tx
rte_eth_dev_set_vf_vlan_filter
rte_eth_dev_set_vf_rate_limit

Increment LIBABIVER in Makefile
Remove deprecation notice for removing rte_eth_dev_set_vf_* API's.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 doc/guides/rel_notes/deprecation.rst   |  13 ---
 lib/librte_ether/Makefile              |   4 +-
 lib/librte_ether/rte_ethdev.c          | 129 ------------------------------
 lib/librte_ether/rte_ethdev.h          | 140 ---------------------------------
 lib/librte_ether/rte_ether_version.map |   7 +-
 5 files changed, 3 insertions(+), 290 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 2d17bc6..c897c18 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -38,19 +38,6 @@ Deprecation Notices
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
-* ethdev: for 17.02 it is planned to deprecate the following five functions
-  and move them in ixgbe:
-
-  ``rte_eth_dev_set_vf_rxmode``
-
-  ``rte_eth_dev_set_vf_rx``
-
-  ``rte_eth_dev_set_vf_tx``
-
-  ``rte_eth_dev_set_vf_vlan_filter``
-
-  ``rte_eth_set_vf_rate_limit``
-
 * ABI changes are planned for 17.02 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..d23015c 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_ether_version.map
 
-LIBABIVER := 5
+LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1e0f206..6a93014 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2137,32 +2137,6 @@ rte_eth_dev_default_mac_addr_set(uint8_t port_id, struct ether_addr *addr)
 	return 0;
 }
 
-int
-rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
-				uint16_t rx_mode, uint8_t on)
-{
-	uint16_t num_vfs;
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-
-	num_vfs = dev_info.max_vfs;
-	if (vf > num_vfs) {
-		RTE_PMD_DEBUG_TRACE("set VF RX mode:invalid VF id %d\n", vf);
-		return -EINVAL;
-	}
-
-	if (rx_mode == 0) {
-		RTE_PMD_DEBUG_TRACE("set VF RX mode:mode mask ca not be zero\n");
-		return -EINVAL;
-	}
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rx_mode, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_rx_mode)(dev, vf, rx_mode, on);
-}
 
 /*
  * Returns index into MAC address array of addr. Use 00:00:00:00:00:00 to find
@@ -2252,76 +2226,6 @@ rte_eth_dev_uc_all_hash_table_set(uint8_t port_id, uint8_t on)
 	return (*dev->dev_ops->uc_all_hash_table_set)(dev, on);
 }
 
-int
-rte_eth_dev_set_vf_rx(uint8_t port_id, uint16_t vf, uint8_t on)
-{
-	uint16_t num_vfs;
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-
-	num_vfs = dev_info.max_vfs;
-	if (vf > num_vfs) {
-		RTE_PMD_DEBUG_TRACE("port %d: invalid vf id\n", port_id);
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rx, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_rx)(dev, vf, on);
-}
-
-int
-rte_eth_dev_set_vf_tx(uint8_t port_id, uint16_t vf, uint8_t on)
-{
-	uint16_t num_vfs;
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-
-	num_vfs = dev_info.max_vfs;
-	if (vf > num_vfs) {
-		RTE_PMD_DEBUG_TRACE("set pool tx:invalid pool id=%d\n", vf);
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_tx, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_tx)(dev, vf, on);
-}
-
-int
-rte_eth_dev_set_vf_vlan_filter(uint8_t port_id, uint16_t vlan_id,
-			       uint64_t vf_mask, uint8_t vlan_on)
-{
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-
-	if (vlan_id > ETHER_MAX_VLAN_ID) {
-		RTE_PMD_DEBUG_TRACE("VF VLAN filter:invalid VLAN id=%d\n",
-			vlan_id);
-		return -EINVAL;
-	}
-
-	if (vf_mask == 0) {
-		RTE_PMD_DEBUG_TRACE("VF VLAN filter:pool_mask can not be 0\n");
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_vlan_filter, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_vlan_filter)(dev, vlan_id,
-						   vf_mask, vlan_on);
-}
-
 int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
 					uint16_t tx_rate)
 {
@@ -2352,39 +2256,6 @@ int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
 	return (*dev->dev_ops->set_queue_rate_limit)(dev, queue_idx, tx_rate);
 }
 
-int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf, uint16_t tx_rate,
-				uint64_t q_msk)
-{
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-	struct rte_eth_link link;
-
-	if (q_msk == 0)
-		return 0;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-	link = dev->data->dev_link;
-
-	if (vf > dev_info.max_vfs) {
-		RTE_PMD_DEBUG_TRACE("set VF rate limit:port %d: "
-				"invalid vf id=%d\n", port_id, vf);
-		return -EINVAL;
-	}
-
-	if (tx_rate > link.link_speed) {
-		RTE_PMD_DEBUG_TRACE("set VF rate limit:invalid tx_rate=%d, "
-				"bigger than link speed= %d\n",
-				tx_rate, link.link_speed);
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rate_limit, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_rate_limit)(dev, vf, tx_rate, q_msk);
-}
-
 int
 rte_eth_mirror_rule_set(uint8_t port_id,
 			struct rte_eth_mirror_conf *mirror_conf,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..c602d7d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1249,39 +1249,11 @@ typedef int (*eth_uc_all_hash_table_set_t)(struct rte_eth_dev *dev,
 				  uint8_t on);
 /**< @internal Set all Unicast Hash bitmap */
 
-typedef int (*eth_set_vf_rx_mode_t)(struct rte_eth_dev *dev,
-				  uint16_t vf,
-				  uint16_t rx_mode,
-				  uint8_t on);
-/**< @internal Set a VF receive mode */
-
-typedef int (*eth_set_vf_rx_t)(struct rte_eth_dev *dev,
-				uint16_t vf,
-				uint8_t on);
-/**< @internal Set a VF receive  mode */
-
-typedef int (*eth_set_vf_tx_t)(struct rte_eth_dev *dev,
-				uint16_t vf,
-				uint8_t on);
-/**< @internal Enable or disable a VF transmit   */
-
-typedef int (*eth_set_vf_vlan_filter_t)(struct rte_eth_dev *dev,
-				  uint16_t vlan,
-				  uint64_t vf_mask,
-				  uint8_t vlan_on);
-/**< @internal Set VF VLAN pool filter */
-
 typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint16_t tx_rate);
 /**< @internal Set queue TX rate */
 
-typedef int (*eth_set_vf_rate_limit_t)(struct rte_eth_dev *dev,
-				uint16_t vf,
-				uint16_t tx_rate,
-				uint64_t q_msk);
-/**< @internal Set VF TX rate */
-
 typedef int (*eth_mirror_rule_set_t)(struct rte_eth_dev *dev,
 				  struct rte_eth_mirror_conf *mirror_conf,
 				  uint8_t rule_id,
@@ -1479,16 +1451,11 @@ struct eth_dev_ops {
 	eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast hash bitmap */
 	eth_mirror_rule_set_t	   mirror_rule_set;  /**< Add a traffic mirror rule.*/
 	eth_mirror_rule_reset_t	   mirror_rule_reset;  /**< reset a traffic mirror rule.*/
-	eth_set_vf_rx_mode_t       set_vf_rx_mode;   /**< Set VF RX mode */
-	eth_set_vf_rx_t            set_vf_rx;  /**< enable/disable a VF receive */
-	eth_set_vf_tx_t            set_vf_tx;  /**< enable/disable a VF transmit */
-	eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter */
 	/** Add UDP tunnel port. */
 	eth_udp_tunnel_port_add_t udp_tunnel_port_add;
 	/** Del UDP tunnel port. */
 	eth_udp_tunnel_port_del_t udp_tunnel_port_del;
 	eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate limit */
-	eth_set_vf_rate_limit_t    set_vf_rate_limit;   /**< Set VF rate limit */
 	/** Update redirection table. */
 	reta_update_t reta_update;
 	/** Query redirection table. */
@@ -3403,93 +3370,6 @@ int rte_eth_dev_uc_hash_table_set(uint8_t port,struct ether_addr *addr,
  */
 int rte_eth_dev_uc_all_hash_table_set(uint8_t port,uint8_t on);
 
- /**
- * Set RX L2 Filtering mode of a VF of an Ethernet device.
- *
- * @param port
- *   The port identifier of the Ethernet device.
- * @param vf
- *   VF id.
- * @param rx_mode
- *    The RX mode mask, which  is one or more of  accepting Untagged Packets,
- *    packets that match the PFUTA table, Broadcast and Multicast Promiscuous.
- *    ETH_VMDQ_ACCEPT_UNTAG,ETH_VMDQ_ACCEPT_HASH_UC,
- *    ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST will be used
- *    in rx_mode.
- * @param on
- *    1 - Enable a VF RX mode.
- *    0 - Disable a VF RX mode.
- * @return
- *   - (0) if successful.
- *   - (-ENOTSUP) if hardware doesn't support.
- *   - (-ENOTSUP) if hardware doesn't support.
- *   - (-EINVAL) if bad parameter.
- */
-int rte_eth_dev_set_vf_rxmode(uint8_t port, uint16_t vf, uint16_t rx_mode,
-				uint8_t on);
-
-/**
-* Enable or disable a VF traffic transmit of the Ethernet device.
-*
-* @param port
-*   The port identifier of the Ethernet device.
-* @param vf
-*   VF id.
-* @param on
-*    1 - Enable a VF traffic transmit.
-*    0 - Disable a VF traffic transmit.
-* @return
-*   - (0) if successful.
-*   - (-ENODEV) if *port_id* invalid.
-*   - (-ENOTSUP) if hardware doesn't support.
-*   - (-EINVAL) if bad parameter.
-*/
-int
-rte_eth_dev_set_vf_tx(uint8_t port,uint16_t vf, uint8_t on);
-
-/**
-* Enable or disable a VF traffic receive of an Ethernet device.
-*
-* @param port
-*   The port identifier of the Ethernet device.
-* @param vf
-*   VF id.
-* @param on
-*    1 - Enable a VF traffic receive.
-*    0 - Disable a VF traffic receive.
-* @return
-*   - (0) if successful.
-*   - (-ENOTSUP) if hardware doesn't support.
-*   - (-ENODEV) if *port_id* invalid.
-*   - (-EINVAL) if bad parameter.
-*/
-int
-rte_eth_dev_set_vf_rx(uint8_t port,uint16_t vf, uint8_t on);
-
-/**
-* Enable/Disable hardware VF VLAN filtering by an Ethernet device of
-* received VLAN packets tagged with a given VLAN Tag Identifier.
-*
-* @param port id
-*   The port identifier of the Ethernet device.
-* @param vlan_id
-*   The VLAN Tag Identifier whose filtering must be enabled or disabled.
-* @param vf_mask
-*    Bitmap listing which VFs participate in the VLAN filtering.
-* @param vlan_on
-*    1 - Enable VFs VLAN filtering.
-*    0 - Disable VFs VLAN filtering.
-* @return
-*   - (0) if successful.
-*   - (-ENOTSUP) if hardware doesn't support.
-*   - (-ENODEV) if *port_id* invalid.
-*   - (-EINVAL) if bad parameter.
-*/
-int
-rte_eth_dev_set_vf_vlan_filter(uint8_t port, uint16_t vlan_id,
-				uint64_t vf_mask,
-				uint8_t vlan_on);
-
 /**
  * Set a traffic mirroring rule on an Ethernet device
  *
@@ -3551,26 +3431,6 @@ int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
 			uint16_t tx_rate);
 
 /**
- * Set the rate limitation for a vf on an Ethernet device.
- *
- * @param port_id
- *   The port identifier of the Ethernet device.
- * @param vf
- *   VF id.
- * @param tx_rate
- *   The tx rate allocated from the total link speed for this VF id.
- * @param q_msk
- *   The queue mask which need to set the rate.
- * @return
- *   - (0) if successful.
- *   - (-ENOTSUP) if hardware doesn't support this feature.
- *   - (-ENODEV) if *port_id* invalid.
- *   - (-EINVAL) if bad parameter.
- */
-int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf,
-			uint16_t tx_rate, uint64_t q_msk);
-
-/**
  * Initialize bypass logic. This function needs to be called before
  * executing any other bypass API.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..7594416 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -61,10 +61,6 @@ DPDK_2.2 {
 	rte_eth_dev_set_mtu;
 	rte_eth_dev_set_rx_queue_stats_mapping;
 	rte_eth_dev_set_tx_queue_stats_mapping;
-	rte_eth_dev_set_vf_rx;
-	rte_eth_dev_set_vf_rxmode;
-	rte_eth_dev_set_vf_tx;
-	rte_eth_dev_set_vf_vlan_filter;
 	rte_eth_dev_set_vlan_offload;
 	rte_eth_dev_set_vlan_pvid;
 	rte_eth_dev_set_vlan_strip_on_queue;
@@ -94,7 +90,6 @@ DPDK_2.2 {
 	rte_eth_rx_queue_info_get;
 	rte_eth_rx_queue_setup;
 	rte_eth_set_queue_rate_limit;
-	rte_eth_set_vf_rate_limit;
 	rte_eth_stats;
 	rte_eth_stats_get;
 	rte_eth_stats_reset;
@@ -146,4 +141,4 @@ DPDK_16.11 {
 	rte_eth_dev_pci_probe;
 	rte_eth_dev_pci_remove;
 
-} DPDK_16.07;
+} DPDK_16.07;
\ No newline at end of file
-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 0/7] net/ixgbe: move set VF functions.
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 " Bernard Iremonger
@ 2016-12-13 11:40  3%     ` Bernard Iremonger
  2016-12-13 13:36  0%       ` Ferruh Yigit
  2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 6/7] librte_ether: remove the set VF API's Bernard Iremonger
  2016-12-13 11:40  4%     ` [dpdk-dev] [PATCH v4 7/7] doc: update release notes Bernard Iremonger
  2 siblings, 1 reply; 200+ results
From: Bernard Iremonger @ 2016-12-13 11:40 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

This patchset implements the following deprecation notice:
[PATCH v1] doc: announce API and ABI change for librte_ether

Changes in  V4:
Fixed compile issues when ixgbe PMD is not present.
Removed duplicate testpmd commands.
Added cleanup patch for testpmd.
Updated release note.

Changes in V3:
Updated LIBABIVER in Makefile in librte_ether patch.
Updated rte_ethdev.h and ret_ether_version.map in librte_ether patch.
Squashed deprecation notice patch into librte_ether patch.
Added release_note patch.

Changes in V2:
Update testpmd set vf commands help messages.
Updated ethtool to use the ixgbe public API's.
Removed the ixgbe_set_pool_* and ixgbe_set_vf_rate_limit functions.
Removed the rte_eth_dev_set_vf_* API's
Removed the deprecation notice.

Changes in V1:
The following functions from eth_dev_ops have been moved to the ixgbe PMD
and renamed:

ixgbe_set_pool_rx_mode
ixgbe_set_pool_rx
ixgbe_set_pool_tx
ixgbe_set_pool_vlan_filter
ixgbe_set_vf_rate_limit

Renamed the functions to the following:

rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter
rte_pmd_ixgbe_set_vf_rate_limit

Testpmd has been modified to use the following functions:
rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rate_limit

New testpmd commands have been added to test the following functions:
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter

The testpmd user guide has been updated for the new commands.

Bernard Iremonger (7):
  net/ixgbe: move set VF functions from the ethdev
  app/testpmd: use ixgbe public functions
  app/testpmd: cleanup parameter checking
  examples/ethtool: use ixgbe public function
  net/ixgbe: remove static set VF functions
  librte_ether: remove the set VF API's
  doc: update release notes

 app/test-pmd/cmdline.c                      |  18 +-
 app/test-pmd/config.c                       |  43 ++-
 doc/guides/rel_notes/deprecation.rst        |  13 -
 doc/guides/rel_notes/release_17_02.rst      |  23 +-
 drivers/net/ixgbe/ixgbe_ethdev.c            | 459 ++++++++++++++++------------
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 +
 examples/ethtool/lib/rte_ethtool.c          |  12 +-
 lib/librte_ether/Makefile                   |   4 +-
 lib/librte_ether/rte_ethdev.c               | 129 --------
 lib/librte_ether/rte_ethdev.h               | 140 ---------
 lib/librte_ether/rte_ether_version.map      |   7 +-
 12 files changed, 442 insertions(+), 520 deletions(-)

-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: fix required tools list layout
@ 2016-12-13 10:03  4% Baruch Siach
  2016-12-15 15:09  0% ` Mcnamara, John
  0 siblings, 1 reply; 200+ results
From: Baruch Siach @ 2016-12-13 10:03 UTC (permalink / raw)
  To: dev; +Cc: John McNamara, David Marchand, Baruch Siach

The Python requirement should appear in the bullet list.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 doc/guides/linux_gsg/sys_reqs.rst | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 3d743421595a..621cc9ddaef6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -84,9 +84,7 @@ Compilation of the DPDK
     x86_x32 ABI is currently supported with distribution packages only on Ubuntu
     higher than 13.10 or recent Debian distribution. The only supported  compiler is gcc 4.9+.
 
-.. note::
-
-    Python, version 2.6 or 2.7, to use various helper scripts included in the DPDK package.
+*   Python, version 2.6 or 2.7, to use various helper scripts included in the DPDK package.
 
 
 **Optional Tools:**
-- 
2.10.2

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 8/9] librte_ether: remove the set VF API's
  2016-12-09 17:25  3% ` [dpdk-dev] [PATCH v2 0/9] " Bernard Iremonger
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 " Bernard Iremonger
@ 2016-12-12 13:50  3%   ` Bernard Iremonger
  1 sibling, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-12-12 13:50 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

remove the following API's:

rte_eth_dev_set_vf_rxmode
rte_eth_dev_set_vf_rx
rte_eth_dev_set_vf_tx
rte_eth_dev_set_vf_vlan_filter
rte_eth_dev_set_vf_rate_limit

Increment LIBABIVER in Makefile
Remove deprecation notice for removing rte_eth_dev_set_vf_* API's.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 doc/guides/rel_notes/deprecation.rst   |  13 ---
 lib/librte_ether/Makefile              |   4 +-
 lib/librte_ether/rte_ethdev.c          | 129 ------------------------------
 lib/librte_ether/rte_ethdev.h          | 140 ---------------------------------
 lib/librte_ether/rte_ether_version.map |   7 +-
 5 files changed, 3 insertions(+), 290 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 2d17bc6..c897c18 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -38,19 +38,6 @@ Deprecation Notices
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
-* ethdev: for 17.02 it is planned to deprecate the following five functions
-  and move them in ixgbe:
-
-  ``rte_eth_dev_set_vf_rxmode``
-
-  ``rte_eth_dev_set_vf_rx``
-
-  ``rte_eth_dev_set_vf_tx``
-
-  ``rte_eth_dev_set_vf_vlan_filter``
-
-  ``rte_eth_set_vf_rate_limit``
-
 * ABI changes are planned for 17.02 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..d23015c 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_ether_version.map
 
-LIBABIVER := 5
+LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1e0f206..6a93014 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2137,32 +2137,6 @@ rte_eth_dev_default_mac_addr_set(uint8_t port_id, struct ether_addr *addr)
 	return 0;
 }
 
-int
-rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
-				uint16_t rx_mode, uint8_t on)
-{
-	uint16_t num_vfs;
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-
-	num_vfs = dev_info.max_vfs;
-	if (vf > num_vfs) {
-		RTE_PMD_DEBUG_TRACE("set VF RX mode:invalid VF id %d\n", vf);
-		return -EINVAL;
-	}
-
-	if (rx_mode == 0) {
-		RTE_PMD_DEBUG_TRACE("set VF RX mode:mode mask ca not be zero\n");
-		return -EINVAL;
-	}
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rx_mode, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_rx_mode)(dev, vf, rx_mode, on);
-}
 
 /*
  * Returns index into MAC address array of addr. Use 00:00:00:00:00:00 to find
@@ -2252,76 +2226,6 @@ rte_eth_dev_uc_all_hash_table_set(uint8_t port_id, uint8_t on)
 	return (*dev->dev_ops->uc_all_hash_table_set)(dev, on);
 }
 
-int
-rte_eth_dev_set_vf_rx(uint8_t port_id, uint16_t vf, uint8_t on)
-{
-	uint16_t num_vfs;
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-
-	num_vfs = dev_info.max_vfs;
-	if (vf > num_vfs) {
-		RTE_PMD_DEBUG_TRACE("port %d: invalid vf id\n", port_id);
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rx, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_rx)(dev, vf, on);
-}
-
-int
-rte_eth_dev_set_vf_tx(uint8_t port_id, uint16_t vf, uint8_t on)
-{
-	uint16_t num_vfs;
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-
-	num_vfs = dev_info.max_vfs;
-	if (vf > num_vfs) {
-		RTE_PMD_DEBUG_TRACE("set pool tx:invalid pool id=%d\n", vf);
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_tx, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_tx)(dev, vf, on);
-}
-
-int
-rte_eth_dev_set_vf_vlan_filter(uint8_t port_id, uint16_t vlan_id,
-			       uint64_t vf_mask, uint8_t vlan_on)
-{
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-
-	if (vlan_id > ETHER_MAX_VLAN_ID) {
-		RTE_PMD_DEBUG_TRACE("VF VLAN filter:invalid VLAN id=%d\n",
-			vlan_id);
-		return -EINVAL;
-	}
-
-	if (vf_mask == 0) {
-		RTE_PMD_DEBUG_TRACE("VF VLAN filter:pool_mask can not be 0\n");
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_vlan_filter, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_vlan_filter)(dev, vlan_id,
-						   vf_mask, vlan_on);
-}
-
 int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
 					uint16_t tx_rate)
 {
@@ -2352,39 +2256,6 @@ int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
 	return (*dev->dev_ops->set_queue_rate_limit)(dev, queue_idx, tx_rate);
 }
 
-int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf, uint16_t tx_rate,
-				uint64_t q_msk)
-{
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
-	struct rte_eth_link link;
-
-	if (q_msk == 0)
-		return 0;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	rte_eth_dev_info_get(port_id, &dev_info);
-	link = dev->data->dev_link;
-
-	if (vf > dev_info.max_vfs) {
-		RTE_PMD_DEBUG_TRACE("set VF rate limit:port %d: "
-				"invalid vf id=%d\n", port_id, vf);
-		return -EINVAL;
-	}
-
-	if (tx_rate > link.link_speed) {
-		RTE_PMD_DEBUG_TRACE("set VF rate limit:invalid tx_rate=%d, "
-				"bigger than link speed= %d\n",
-				tx_rate, link.link_speed);
-		return -EINVAL;
-	}
-
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rate_limit, -ENOTSUP);
-	return (*dev->dev_ops->set_vf_rate_limit)(dev, vf, tx_rate, q_msk);
-}
-
 int
 rte_eth_mirror_rule_set(uint8_t port_id,
 			struct rte_eth_mirror_conf *mirror_conf,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..c602d7d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1249,39 +1249,11 @@ typedef int (*eth_uc_all_hash_table_set_t)(struct rte_eth_dev *dev,
 				  uint8_t on);
 /**< @internal Set all Unicast Hash bitmap */
 
-typedef int (*eth_set_vf_rx_mode_t)(struct rte_eth_dev *dev,
-				  uint16_t vf,
-				  uint16_t rx_mode,
-				  uint8_t on);
-/**< @internal Set a VF receive mode */
-
-typedef int (*eth_set_vf_rx_t)(struct rte_eth_dev *dev,
-				uint16_t vf,
-				uint8_t on);
-/**< @internal Set a VF receive  mode */
-
-typedef int (*eth_set_vf_tx_t)(struct rte_eth_dev *dev,
-				uint16_t vf,
-				uint8_t on);
-/**< @internal Enable or disable a VF transmit   */
-
-typedef int (*eth_set_vf_vlan_filter_t)(struct rte_eth_dev *dev,
-				  uint16_t vlan,
-				  uint64_t vf_mask,
-				  uint8_t vlan_on);
-/**< @internal Set VF VLAN pool filter */
-
 typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint16_t tx_rate);
 /**< @internal Set queue TX rate */
 
-typedef int (*eth_set_vf_rate_limit_t)(struct rte_eth_dev *dev,
-				uint16_t vf,
-				uint16_t tx_rate,
-				uint64_t q_msk);
-/**< @internal Set VF TX rate */
-
 typedef int (*eth_mirror_rule_set_t)(struct rte_eth_dev *dev,
 				  struct rte_eth_mirror_conf *mirror_conf,
 				  uint8_t rule_id,
@@ -1479,16 +1451,11 @@ struct eth_dev_ops {
 	eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast hash bitmap */
 	eth_mirror_rule_set_t	   mirror_rule_set;  /**< Add a traffic mirror rule.*/
 	eth_mirror_rule_reset_t	   mirror_rule_reset;  /**< reset a traffic mirror rule.*/
-	eth_set_vf_rx_mode_t       set_vf_rx_mode;   /**< Set VF RX mode */
-	eth_set_vf_rx_t            set_vf_rx;  /**< enable/disable a VF receive */
-	eth_set_vf_tx_t            set_vf_tx;  /**< enable/disable a VF transmit */
-	eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter */
 	/** Add UDP tunnel port. */
 	eth_udp_tunnel_port_add_t udp_tunnel_port_add;
 	/** Del UDP tunnel port. */
 	eth_udp_tunnel_port_del_t udp_tunnel_port_del;
 	eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate limit */
-	eth_set_vf_rate_limit_t    set_vf_rate_limit;   /**< Set VF rate limit */
 	/** Update redirection table. */
 	reta_update_t reta_update;
 	/** Query redirection table. */
@@ -3403,93 +3370,6 @@ int rte_eth_dev_uc_hash_table_set(uint8_t port,struct ether_addr *addr,
  */
 int rte_eth_dev_uc_all_hash_table_set(uint8_t port,uint8_t on);
 
- /**
- * Set RX L2 Filtering mode of a VF of an Ethernet device.
- *
- * @param port
- *   The port identifier of the Ethernet device.
- * @param vf
- *   VF id.
- * @param rx_mode
- *    The RX mode mask, which  is one or more of  accepting Untagged Packets,
- *    packets that match the PFUTA table, Broadcast and Multicast Promiscuous.
- *    ETH_VMDQ_ACCEPT_UNTAG,ETH_VMDQ_ACCEPT_HASH_UC,
- *    ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST will be used
- *    in rx_mode.
- * @param on
- *    1 - Enable a VF RX mode.
- *    0 - Disable a VF RX mode.
- * @return
- *   - (0) if successful.
- *   - (-ENOTSUP) if hardware doesn't support.
- *   - (-ENOTSUP) if hardware doesn't support.
- *   - (-EINVAL) if bad parameter.
- */
-int rte_eth_dev_set_vf_rxmode(uint8_t port, uint16_t vf, uint16_t rx_mode,
-				uint8_t on);
-
-/**
-* Enable or disable a VF traffic transmit of the Ethernet device.
-*
-* @param port
-*   The port identifier of the Ethernet device.
-* @param vf
-*   VF id.
-* @param on
-*    1 - Enable a VF traffic transmit.
-*    0 - Disable a VF traffic transmit.
-* @return
-*   - (0) if successful.
-*   - (-ENODEV) if *port_id* invalid.
-*   - (-ENOTSUP) if hardware doesn't support.
-*   - (-EINVAL) if bad parameter.
-*/
-int
-rte_eth_dev_set_vf_tx(uint8_t port,uint16_t vf, uint8_t on);
-
-/**
-* Enable or disable a VF traffic receive of an Ethernet device.
-*
-* @param port
-*   The port identifier of the Ethernet device.
-* @param vf
-*   VF id.
-* @param on
-*    1 - Enable a VF traffic receive.
-*    0 - Disable a VF traffic receive.
-* @return
-*   - (0) if successful.
-*   - (-ENOTSUP) if hardware doesn't support.
-*   - (-ENODEV) if *port_id* invalid.
-*   - (-EINVAL) if bad parameter.
-*/
-int
-rte_eth_dev_set_vf_rx(uint8_t port,uint16_t vf, uint8_t on);
-
-/**
-* Enable/Disable hardware VF VLAN filtering by an Ethernet device of
-* received VLAN packets tagged with a given VLAN Tag Identifier.
-*
-* @param port id
-*   The port identifier of the Ethernet device.
-* @param vlan_id
-*   The VLAN Tag Identifier whose filtering must be enabled or disabled.
-* @param vf_mask
-*    Bitmap listing which VFs participate in the VLAN filtering.
-* @param vlan_on
-*    1 - Enable VFs VLAN filtering.
-*    0 - Disable VFs VLAN filtering.
-* @return
-*   - (0) if successful.
-*   - (-ENOTSUP) if hardware doesn't support.
-*   - (-ENODEV) if *port_id* invalid.
-*   - (-EINVAL) if bad parameter.
-*/
-int
-rte_eth_dev_set_vf_vlan_filter(uint8_t port, uint16_t vlan_id,
-				uint64_t vf_mask,
-				uint8_t vlan_on);
-
 /**
  * Set a traffic mirroring rule on an Ethernet device
  *
@@ -3551,26 +3431,6 @@ int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
 			uint16_t tx_rate);
 
 /**
- * Set the rate limitation for a vf on an Ethernet device.
- *
- * @param port_id
- *   The port identifier of the Ethernet device.
- * @param vf
- *   VF id.
- * @param tx_rate
- *   The tx rate allocated from the total link speed for this VF id.
- * @param q_msk
- *   The queue mask which need to set the rate.
- * @return
- *   - (0) if successful.
- *   - (-ENOTSUP) if hardware doesn't support this feature.
- *   - (-ENODEV) if *port_id* invalid.
- *   - (-EINVAL) if bad parameter.
- */
-int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf,
-			uint16_t tx_rate, uint64_t q_msk);
-
-/**
  * Initialize bypass logic. This function needs to be called before
  * executing any other bypass API.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..7594416 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -61,10 +61,6 @@ DPDK_2.2 {
 	rte_eth_dev_set_mtu;
 	rte_eth_dev_set_rx_queue_stats_mapping;
 	rte_eth_dev_set_tx_queue_stats_mapping;
-	rte_eth_dev_set_vf_rx;
-	rte_eth_dev_set_vf_rxmode;
-	rte_eth_dev_set_vf_tx;
-	rte_eth_dev_set_vf_vlan_filter;
 	rte_eth_dev_set_vlan_offload;
 	rte_eth_dev_set_vlan_pvid;
 	rte_eth_dev_set_vlan_strip_on_queue;
@@ -94,7 +90,6 @@ DPDK_2.2 {
 	rte_eth_rx_queue_info_get;
 	rte_eth_rx_queue_setup;
 	rte_eth_set_queue_rate_limit;
-	rte_eth_set_vf_rate_limit;
 	rte_eth_stats;
 	rte_eth_stats_get;
 	rte_eth_stats_reset;
@@ -146,4 +141,4 @@ DPDK_16.11 {
 	rte_eth_dev_pci_probe;
 	rte_eth_dev_pci_remove;
 
-} DPDK_16.07;
+} DPDK_16.07;
\ No newline at end of file
-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 0/9] net/ixgbe: move set VF functions.
  2016-12-09 17:25  3% ` [dpdk-dev] [PATCH v2 0/9] " Bernard Iremonger
@ 2016-12-12 13:50  3%   ` Bernard Iremonger
  2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 0/7] " Bernard Iremonger
                       ` (2 more replies)
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 8/9] librte_ether: remove the set VF API's Bernard Iremonger
  1 sibling, 3 replies; 200+ results
From: Bernard Iremonger @ 2016-12-12 13:50 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

This patchset implements the following deprecation notice:
[PATCH v1] doc: announce API and ABI change for librte_ether

Changes in V3:
Updated LIBABIVER in Makefile in librte_ether patch.
Updated rte_ethdev.h and ret_ether_version.map in librte_ether patch.
Squashed deprecation notice patch into librte_ether patch.
Added release_note patch.

Changes in V2:
Update testpmd set vf commands help messages.
Updated ethtool to use the ixgbe public API's.
Removed the ixgbe_set_pool_* and ixgbe_set_vf_rate_limit functions.
Removed the rte_eth_dev_set_vf_* API's
Removed the deprecation notice.

Changes in V1:
The following functions from eth_dev_ops have been moved to the ixgbe PMD
and renamed:

ixgbe_set_pool_rx_mode
ixgbe_set_pool_rx
ixgbe_set_pool_tx
ixgbe_set_pool_vlan_filter
ixgbe_set_vf_rate_limit

Renamed the functions to the following:

rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter
rte_pmd_ixgbe_set_vf_rate_limit

Testpmd has been modified to use the following functions:
rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rate_limit

New testpmd commands have been added to test the following functions:
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter

The testpmd user guide has been updated for the new commands.

Bernard Iremonger (9):
  net/ixgbe: move set VF functions from the ethdev
  app/testpmd: use ixgbe public functions
  app/testpmd: add command for set VF VLAN filter
  app/testpmd: add command for set VF receive
  app/testpmd: add command for set VF transmit
  examples/ethtool: use ixgbe public function
  net/ixgbe: remove static set VF functions
  librte_ether: remove the set VF API's
  doc: update release notes

 app/test-pmd/cmdline.c                      | 270 +++++++++++++++-
 app/test-pmd/config.c                       |  31 +-
 doc/guides/rel_notes/deprecation.rst        |  13 -
 doc/guides/rel_notes/release_17_02.rst      |  20 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 ++
 drivers/net/ixgbe/ixgbe_ethdev.c            | 459 ++++++++++++++++------------
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 +
 examples/ethtool/lib/rte_ethtool.c          |   5 +-
 lib/librte_ether/Makefile                   |   4 +-
 lib/librte_ether/rte_ethdev.c               | 129 --------
 lib/librte_ether/rte_ethdev.h               | 140 ---------
 lib/librte_ether/rte_ether_version.map      |   7 +-
 13 files changed, 706 insertions(+), 507 deletions(-)

-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add firmware version get
  2016-12-08 11:07  3%     ` Ferruh Yigit
@ 2016-12-12  1:28  4%       ` Yang, Qiming
  0 siblings, 0 replies; 200+ results
From: Yang, Qiming @ 2016-12-12  1:28 UTC (permalink / raw)
  To: Yigit, Ferruh, dev; +Cc: Thomas Monjalon, Horton, Remy

Hi, Yigit
Yes, we had planned to add  fw_version in rte_eth_dev_info_get(). But Remy think we should better to implement this feature through a way don't break the original ABI. So I change the implement.

-----Original Message-----
From: Yigit, Ferruh 
Sent: Thursday, December 8, 2016 7:07 PM
To: Yang, Qiming <qiming.yang@intel.com>; dev@dpdk.org
Cc: Thomas Monjalon <thomas.monjalon@6wind.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add firmware version get

Hi Qiming,

On 12/6/2016 7:16 AM, Qiming Yang wrote:
> This patch adds a new API 'rte_eth_dev_fwver_get' for fetching 
> firmware version by a given device.
> 
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>

<...>

> @@ -1444,6 +1448,7 @@ struct eth_dev_ops {
>  	/**< Get names of extended statistics. */
>  	eth_queue_stats_mapping_set_t queue_stats_mapping_set;
>  	/**< Configure per queue stat counter mapping. */
> +	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */

Hi Qiming,

Not sure if I am missing something but this change is for following [1] deprecation notice, right?
If so, notice suggest updating rte_eth_dev_info_get() to include fw_version, but this patch adds a new eth_dev_ops.

Is it agreed to add a new eth_dev_ops for this?


[1]
* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
  will be extended with a new member ``fw_version`` in order to store
  the NIC firmware version.



>  	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
>  	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
>  	/**< Get packet types supported and identified by device*/

<...>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 8/9] librte_ether: remove the set VF API's
  @ 2016-12-09 18:00  3%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2016-12-09 18:00 UTC (permalink / raw)
  To: Bernard Iremonger, thomas.monjalon, dev

On 12/9/2016 5:26 PM, Bernard Iremonger wrote:
> remove the following API's:
> 
> rte_eth_dev_set_vf_rxmode
> rte_eth_dev_set_vf_rx
> rte_eth_dev_set_vf_tx
> rte_eth_dev_set_vf_vlan_filter
> rte_eth_dev_set_vf_rate_limit

This patch should also remove above function definitions from
rte_ethdev.h and rte_ether_version.map too.

And it may be good to squash next patch (remove deprecation notice) to
this one to show what caused to notice removal.

Also need to increase LIBABIVER, and update release notes for ABI
breakage, somewhere in this patchset, I would do in this patch but not
quite sure.

> 
> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c | 129 ------------------------------------------
>  lib/librte_ether/rte_ethdev.h |  33 -----------
>  2 files changed, 162 deletions(-)
> 

<...>

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 9/9] doc: remove deprecation notice
  2016-12-09 11:27  3% Bernard Iremonger
                   ` (2 preceding siblings ...)
  @ 2016-12-09 17:26  4% ` Bernard Iremonger
  3 siblings, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-12-09 17:26 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

remove deprecation notice for removing rte_eth_dev_set_vf_* API's.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 2d17bc6..c897c18 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -38,19 +38,6 @@ Deprecation Notices
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
-* ethdev: for 17.02 it is planned to deprecate the following five functions
-  and move them in ixgbe:
-
-  ``rte_eth_dev_set_vf_rxmode``
-
-  ``rte_eth_dev_set_vf_rx``
-
-  ``rte_eth_dev_set_vf_tx``
-
-  ``rte_eth_dev_set_vf_vlan_filter``
-
-  ``rte_eth_set_vf_rate_limit``
-
 * ABI changes are planned for 17.02 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
-- 
2.10.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 0/9] net/ixgbe: move set VF functions.
  2016-12-09 11:27  3% Bernard Iremonger
  2016-12-09 11:54  0% ` Ferruh Yigit
@ 2016-12-09 17:25  3% ` Bernard Iremonger
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 " Bernard Iremonger
  2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 8/9] librte_ether: remove the set VF API's Bernard Iremonger
    2016-12-09 17:26  4% ` [dpdk-dev] [PATCH v2 9/9] doc: remove deprecation notice Bernard Iremonger
  3 siblings, 2 replies; 200+ results
From: Bernard Iremonger @ 2016-12-09 17:25 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

This patchset implements the following deprecation notice:
[PATCH v1] doc: announce API and ABI change for librte_ether

Changes in V2:
Update testpmd set vf commands help messages.
Updated ethtool to use the ixgbe public API's.
Removed the ixgbe_set_pool_* and ixgbe_set_vf_rate_limit functions.
Removed the rte_eth_dev_set_vf_* API's
Removed the deprecation notice.

Changes in V1:
The following functions from eth_dev_ops have been moved to the ixgbe PMD
and renamed:

ixgbe_set_pool_rx_mode
ixgbe_set_pool_rx
ixgbe_set_pool_tx
ixgbe_set_pool_vlan_filter
ixgbe_set_vf_rate_limit

Renamed the functions to the following:

rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter
rte_pmd_ixgbe_set_vf_rate_limit

Testpmd has been modified to use the following functions:
rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rate_limit

New testpmd commands have been added to test the following functions:
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter

The testpmd user guide has been updated for the new commands.

Bernard Iremonger (9):
  net/ixgbe: move set VF functions from the ethdev
  app/testpmd: use ixgbe public functions
  app/testpmd: add command for set VF VLAN filter
  app/testpmd: add command for set VF receive
  app/testpmd: add command for set VF transmit
  examples/ethtool: use ixgbe public function
  net/ixgbe: remove static set VF functions
  librte_ether: remove the set VF API's
  doc: remove deprecation notice

 app/test-pmd/cmdline.c                      | 270 +++++++++++++++-
 app/test-pmd/config.c                       |  31 +-
 doc/guides/rel_notes/deprecation.rst        |  13 -
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 ++
 drivers/net/ixgbe/ixgbe_ethdev.c            | 459 ++++++++++++++++------------
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 +
 examples/ethtool/lib/rte_ethtool.c          |   5 +-
 lib/librte_ether/rte_ethdev.c               | 129 --------
 lib/librte_ether/rte_ethdev.h               |  33 --
 10 files changed, 683 insertions(+), 392 deletions(-)

-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 02/12] lib: add cryptodev type for the upcoming ARMv8 PMD
  @ 2016-12-09 12:06  3%       ` Declan Doherty
  0 siblings, 0 replies; 200+ results
From: Declan Doherty @ 2016-12-09 12:06 UTC (permalink / raw)
  To: Thomas Monjalon, Zbigniew Bodek
  Cc: dev, zbigniew.bodek, pablo.de.lara.guarch, jerin.jacob

On 07/12/16 20:09, Thomas Monjalon wrote:
> 2016-12-07 20:04, Zbigniew Bodek:
>> On 06.12.2016 21:27, Thomas Monjalon wrote:
>>> 2016-12-06 18:32, zbigniew.bodek@caviumnetworks.com:
>>>> From: Zbigniew Bodek <zbigniew.bodek@caviumnetworks.com>
>>>>
>>>> Add type and name for ARMv8 crypto PMD
>>>>
>>>> Signed-off-by: Zbigniew Bodek <zbigniew.bodek@caviumnetworks.com>
>>> [...]
>>>> --- a/lib/librte_cryptodev/rte_cryptodev.h
>>>> +++ b/lib/librte_cryptodev/rte_cryptodev.h
>>>> @@ -66,6 +66,8 @@
>>>>  /**< KASUMI PMD device name */
>>>>  #define CRYPTODEV_NAME_ZUC_PMD		crypto_zuc
>>>>  /**< KASUMI PMD device name */
>>>> +#define CRYPTODEV_NAME_ARMV8_PMD	crypto_armv8
>>>> +/**< ARMv8 CM device name */
>>>>
>>>>  /** Crypto device type */
>>>>  enum rte_cryptodev_type {
>>>> @@ -77,6 +79,7 @@ enum rte_cryptodev_type {
>>>>  	RTE_CRYPTODEV_KASUMI_PMD,	/**< KASUMI PMD */
>>>>  	RTE_CRYPTODEV_ZUC_PMD,		/**< ZUC PMD */
>>>>  	RTE_CRYPTODEV_OPENSSL_PMD,    /**<  OpenSSL PMD */
>>>> +	RTE_CRYPTODEV_ARMV8_PMD,	/**< ARMv8 crypto PMD */
>>>>  };
>>>
>>> Can we remove all these types and names in the generic crypto API?
>>>
>>
>> Hello Thomas,
>>
>> I added another PMD type and therefore we need new, unique number for
>> it. I'm not sure if I understand correctly what you mean here, so please
>> elaborate.
>
> My comment is not specific to your PMD.
> I think there is something wrong in the design of cryptodev if we need
> to update rte_cryptodev.h each time a new driver is added.
> There is no such thing in ethdev.
>

Hey Thomas, I've been meaning to have a look at removing this enum, I 
just haven't had the time as yet, I think since there is now a standard 
naming convention for all pmds, the use for this is redundant.

This change will require a ABI/API deprecation notice, so I'll put that 
into 17.02 and then do the patches to remove for 17.05

Declan

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 0/5] net/ixgbe: move set VF functions.
  2016-12-09 11:54  0% ` Ferruh Yigit
@ 2016-12-09 12:00  0%   ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2016-12-09 12:00 UTC (permalink / raw)
  To: Yigit, Ferruh, thomas.monjalon, dev; +Cc: Iremonger, Bernard

Hi Ferruh,

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Friday, December 9, 2016 11:54 AM
> To: Iremonger, Bernard <bernard.iremonger@intel.com>;
> thomas.monjalon@6wind.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 0/5] net/ixgbe: move set VF functions.
> 
> On 12/9/2016 11:27 AM, Bernard Iremonger wrote:
> > This patchset implements the following deprecation notice:
> > [PATCH v1] doc: announce API and ABI change for librte_ether
> >
> > The following functions from eth_dev_ops have been moved to the ixgbe
> > PMD and renamed:
> >
> > ixgbe_set_pool_rx_mode
> > ixgbe_set_pool_rx
> > ixgbe_set_pool_tx
> > ixgbe_set_pool_vlan_filter
> > ixgbe_set_vf_rate_limit
> >
> > Renamed the functions to the following:
> >
> > rte_pmd_ixgbe_set_vf_rxmode
> > rte_pmd_ixgbe_set_vf_rx
> > rte_pmd_ixgbe_set_vf_tx
> > rte_pmd_ixgbe_set_vf_vlan_filter
> > rte_pmd_ixgbe_set_vf_rate_limit
> >
> > Testpmd has been modified to use the following functions:
> > rte_pmd_ixgbe_set_vf_rxmode
> > rte_pmd_ixgbe_set_vf_rate_limit
> >
> > New testpmd commands have been added to test the following functions:
> > rte_pmd_ixgbe_set_vf_rx
> > rte_pmd_ixgbe_set_vf_tx
> > rte_pmd_ixgbe_set_vf_vlan_filter
> >
> > The testpmd user guide has been updated for the new commands.
> >
> > Bernard Iremonger (5):
> >   net/ixgbe: move set VF functions from the ethdev
> >   app/testpmd: use ixgbe public functions
> >   app/testpmd: add command for set VF VLAN filter
> >   app/testpmd: add command for set VF receive
> >   app/testpmd: add command for set VF transmit
> >
> >  app/test-pmd/cmdline.c                      | 270
> +++++++++++++++++++++++++++-
> >  app/test-pmd/config.c                       |  31 ++--
> >  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 +++
> >  drivers/net/ixgbe/ixgbe_ethdev.c            | 263
> +++++++++++++++++++++++++++
> >  drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++++++
> >  drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 ++
> >  6 files changed, 678 insertions(+), 21 deletions(-)
> >
> 
> Why this patchset doesn't remove ethdev updates for these functions?
> 
> ixgbe is the only user for these eth-dev_ops, since code moved to ixgbe
> driver, they and relevant rte_eth_xx functions (and deprecation notice) can
> be removed in this patchset. Most probably after testpmd updated to
> prevent compilation errors.

My understanding is that the functions should be copied and reworked before being removed from the ethdev, and that the removal should be done in a separate patch set.

Hi Thomas,

Could you clarify please.

Regards,

Bernard.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 0/5] net/ixgbe: move set VF functions.
  2016-12-09 11:27  3% Bernard Iremonger
@ 2016-12-09 11:54  0% ` Ferruh Yigit
  2016-12-09 12:00  0%   ` Iremonger, Bernard
  2016-12-09 17:25  3% ` [dpdk-dev] [PATCH v2 0/9] " Bernard Iremonger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-12-09 11:54 UTC (permalink / raw)
  To: Bernard Iremonger, thomas.monjalon, dev

On 12/9/2016 11:27 AM, Bernard Iremonger wrote:
> This patchset implements the following deprecation notice:
> [PATCH v1] doc: announce API and ABI change for librte_ether
> 
> The following functions from eth_dev_ops have been moved to the ixgbe PMD
> and renamed:
> 
> ixgbe_set_pool_rx_mode
> ixgbe_set_pool_rx
> ixgbe_set_pool_tx
> ixgbe_set_pool_vlan_filter
> ixgbe_set_vf_rate_limit
> 
> Renamed the functions to the following:
> 
> rte_pmd_ixgbe_set_vf_rxmode
> rte_pmd_ixgbe_set_vf_rx
> rte_pmd_ixgbe_set_vf_tx
> rte_pmd_ixgbe_set_vf_vlan_filter
> rte_pmd_ixgbe_set_vf_rate_limit
> 
> Testpmd has been modified to use the following functions:
> rte_pmd_ixgbe_set_vf_rxmode
> rte_pmd_ixgbe_set_vf_rate_limit
> 
> New testpmd commands have been added to test the following functions:
> rte_pmd_ixgbe_set_vf_rx
> rte_pmd_ixgbe_set_vf_tx
> rte_pmd_ixgbe_set_vf_vlan_filter
> 
> The testpmd user guide has been updated for the new commands.
> 
> Bernard Iremonger (5):
>   net/ixgbe: move set VF functions from the ethdev
>   app/testpmd: use ixgbe public functions
>   app/testpmd: add command for set VF VLAN filter
>   app/testpmd: add command for set VF receive
>   app/testpmd: add command for set VF transmit
> 
>  app/test-pmd/cmdline.c                      | 270 +++++++++++++++++++++++++++-
>  app/test-pmd/config.c                       |  31 ++--
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 +++
>  drivers/net/ixgbe/ixgbe_ethdev.c            | 263 +++++++++++++++++++++++++++
>  drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++++++
>  drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 ++
>  6 files changed, 678 insertions(+), 21 deletions(-)
> 

Why this patchset doesn't remove ethdev updates for these functions?

ixgbe is the only user for these eth-dev_ops, since code moved to ixgbe
driver, they and relevant rte_eth_xx functions (and deprecation notice)
can be removed in this patchset. Most probably after testpmd updated to
prevent compilation errors.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1 0/5] net/ixgbe: move set VF functions.
@ 2016-12-09 11:27  3% Bernard Iremonger
  2016-12-09 11:54  0% ` Ferruh Yigit
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Bernard Iremonger @ 2016-12-09 11:27 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

This patchset implements the following deprecation notice:
[PATCH v1] doc: announce API and ABI change for librte_ether

The following functions from eth_dev_ops have been moved to the ixgbe PMD
and renamed:

ixgbe_set_pool_rx_mode
ixgbe_set_pool_rx
ixgbe_set_pool_tx
ixgbe_set_pool_vlan_filter
ixgbe_set_vf_rate_limit

Renamed the functions to the following:

rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter
rte_pmd_ixgbe_set_vf_rate_limit

Testpmd has been modified to use the following functions:
rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rate_limit

New testpmd commands have been added to test the following functions:
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter

The testpmd user guide has been updated for the new commands.

Bernard Iremonger (5):
  net/ixgbe: move set VF functions from the ethdev
  app/testpmd: use ixgbe public functions
  app/testpmd: add command for set VF VLAN filter
  app/testpmd: add command for set VF receive
  app/testpmd: add command for set VF transmit

 app/test-pmd/cmdline.c                      | 270 +++++++++++++++++++++++++++-
 app/test-pmd/config.c                       |  31 ++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 +++
 drivers/net/ixgbe/ixgbe_ethdev.c            | 263 +++++++++++++++++++++++++++
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 ++
 6 files changed, 678 insertions(+), 21 deletions(-)

-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 0/5] net/ixgbe: move set VF functions.
@ 2016-12-09 11:17  3% Bernard Iremonger
  0 siblings, 0 replies; 200+ results
From: Bernard Iremonger @ 2016-12-09 11:17 UTC (permalink / raw)
  To: thomas.monjalon, dev; +Cc: Bernard Iremonger

This patchset implements the following deprecation notice:
[PATCH v1] doc: announce API and ABI change for librte_ether

The following functions from eth_dev_ops have been moved to the ixgbe PMD
and renamed:

ixgbe_set_pool_rx_mode
ixgbe_set_pool_rx
ixgbe_set_pool_tx
ixgbe_set_pool_vlan_filter
ixgbe_set_vf_rate_limit

Renamed the functions to the following:

rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter
rte_pmd_ixgbe_set_vf_rate_limit

Testpmd has been modified to use the following functions:
rte_pmd_ixgbe_set_vf_rxmode
rte_pmd_ixgbe_set_vf_rate_limit

New testpmd commands have been added to test the following functions:
rte_pmd_ixgbe_set_vf_rx
rte_pmd_ixgbe_set_vf_tx
rte_pmd_ixgbe_set_vf_vlan_filter

The testpmd user guide has been updated for the new commands.

Bernard Iremonger (5):
  net/ixgbe: move set VF functions from the ethdev
  app/testpmd: use ixgbe public functions
  app/testpmd: add command for set VF VLAN filter
  app/testpmd: add command for set VF receive
  app/testpmd: add command for set VF transmit

 app/test-pmd/cmdline.c                      | 270 +++++++++++++++++++++++++++-
 app/test-pmd/config.c                       |  31 ++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  21 +++
 drivers/net/ixgbe/ixgbe_ethdev.c            | 263 +++++++++++++++++++++++++++
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 104 +++++++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  10 ++
 6 files changed, 678 insertions(+), 21 deletions(-)

-- 
2.10.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-12-02 21:06  0%         ` Kevin Traynor
  2016-12-06 18:11  0%           ` Chandran, Sugesh
@ 2016-12-08 17:07  3%           ` Adrien Mazarguil
  2016-12-14 11:48  0%             ` Kevin Traynor
  1 sibling, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-12-08 17:07 UTC (permalink / raw)
  To: Kevin Traynor
  Cc: dev, Thomas Monjalon, Pablo de Lara, Olivier Matz, sugesh.chandran

On Fri, Dec 02, 2016 at 09:06:42PM +0000, Kevin Traynor wrote:
> On 12/01/2016 08:36 AM, Adrien Mazarguil wrote:
> > Hi Kevin,
> > 
> > On Wed, Nov 30, 2016 at 05:47:17PM +0000, Kevin Traynor wrote:
> >> Hi Adrien,
> >>
> >> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> >>> This new API supersedes all the legacy filter types described in
> >>> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> >>> PMDs to process and validate flow rules.
> >>>
> >>> Benefits:
> >>>
> >>> - A unified API is easier to program for, applications do not have to be
> >>>   written for a specific filter type which may or may not be supported by
> >>>   the underlying device.
> >>>
> >>> - The behavior of a flow rule is the same regardless of the underlying
> >>>   device, applications do not need to be aware of hardware quirks.
> >>>
> >>> - Extensible by design, API/ABI breakage should rarely occur if at all.
> >>>
> >>> - Documentation is self-standing, no need to look up elsewhere.
> >>>
> >>> Existing filter types will be deprecated and removed in the near future.
> >>
> >> I'd suggest to add a deprecation notice to deprecation.rst, ideally with
> >> a target release.
> > 
> > Will do, not a sure about the target release though. It seems a bit early
> > since no PMD really supports this API yet.
> > 
> > [...]
> >>> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> >>> new file mode 100644
> >>> index 0000000..064963d
> >>> --- /dev/null
> >>> +++ b/lib/librte_ether/rte_flow.c
> >>> @@ -0,0 +1,159 @@
> >>> +/*-
> >>> + *   BSD LICENSE
> >>> + *
> >>> + *   Copyright 2016 6WIND S.A.
> >>> + *   Copyright 2016 Mellanox.
> >>
> >> There's Mellanox copyright but you are the only signed-off-by - is that
> >> right?
> > 
> > Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed
> > on their behalf to expose several features from mlx4/mlx5 as the existing
> > filter types had too many limitations.
> > 
> > [...]
> >>> +/* Get generic flow operations structure from a port. */
> >>> +const struct rte_flow_ops *
> >>> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
> >>> +{
> >>> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >>> +	const struct rte_flow_ops *ops;
> >>> +	int code;
> >>> +
> >>> +	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> >>> +		code = ENODEV;
> >>> +	else if (unlikely(!dev->dev_ops->filter_ctrl ||
> >>> +			  dev->dev_ops->filter_ctrl(dev,
> >>> +						    RTE_ETH_FILTER_GENERIC,
> >>> +						    RTE_ETH_FILTER_GET,
> >>> +						    &ops) ||
> >>> +			  !ops))
> >>> +		code = ENOTSUP;
> >>> +	else
> >>> +		return ops;
> >>> +	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> >>> +			   NULL, rte_strerror(code));
> >>> +	return NULL;
> >>> +}
> >>> +
> >>
> >> Is it expected that the application or pmd will provide locking between
> >> these functions if required? I think it's going to have to be the app.
> > 
> > Locking is indeed expected to be performed by applications. This API only
> > documents places where locking would make sense if necessary and expected
> > behavior.
> > 
> > Like all control path APIs, this one assumes a single control thread.
> > Applications must take the necessary precautions.
> 
> If you look at OVS now it's quite possible that you have 2 rx queues
> serviced by different threads, that would also install the flow rules in
> the software flow caches - possibly that could extend to adding hardware
> flows. There could also be another thread that is querying for stats. So
> anything that can be done to minimise the locking would be helpful -
> maybe query() could be atomic and not require any locking?

I think we need basic functions with as few constraints as possible on PMDs
first, this API being somewhat complex to implement on their side. That
covers the common use case where applications have a single control thread
or otherwise perform locking on their own.

Once the basics are there for most PMDs, we may add new functions, items,
properties and actions that provide additional constraints (timing,
multi-threading and so on), which remain to be defined according to
feedback. It is designed to be extended without causing ABI breakage.

As for query(), let's see how PMDs handle it first. A race between query()
and create() on a given device is almost unavoidable without locking, same
for queries that reset counters in a given flow rule. Basic parallel queries
should not cause any harm otherwise, although this cannot be guaranteed yet.

> > [...]
> >>> +/**
> >>> + * Flow rule attributes.
> >>> + *
> >>> + * Priorities are set on two levels: per group and per rule within groups.
> >>> + *
> >>> + * Lower values denote higher priority, the highest priority for both levels
> >>> + * is 0, so that a rule with priority 0 in group 8 is always matched after a
> >>> + * rule with priority 8 in group 0.
> >>> + *
> >>> + * Although optional, applications are encouraged to group similar rules as
> >>> + * much as possible to fully take advantage of hardware capabilities
> >>> + * (e.g. optimized matching) and work around limitations (e.g. a single
> >>> + * pattern type possibly allowed in a given group).
> >>> + *
> >>> + * Group and priority levels are arbitrary and up to the application, they
> >>> + * do not need to be contiguous nor start from 0, however the maximum number
> >>> + * varies between devices and may be affected by existing flow rules.
> >>> + *
> >>> + * If a packet is matched by several rules of a given group for a given
> >>> + * priority level, the outcome is undefined. It can take any path, may be
> >>> + * duplicated or even cause unrecoverable errors.
> >>
> >> I get what you are trying to do here wrt supporting multiple
> >> pmds/hardware implementations and it's a good idea to keep it flexible.
> >>
> >> Given that the outcome is undefined, it would be nice that the
> >> application has a way of finding the specific effects for verification
> >> and debugging.
> > 
> > Right, however it was deemed a bit difficult to manage in many cases hence
> > the vagueness.
> > 
> > For example, suppose two rules with the same group and priority, one
> > matching any IPv4 header, the other one any UDP header:
> > 
> > - TCPv4 packets => rule #1.
> > - UDPv6 packets => rule #2.
> > - UDPv4 packets => both?
> > 
> > That last one is perhaps invalid, checking that some unspecified protocol
> > combination does not overlap is expensive and may miss corner cases, even
> > assuming this is not an issue, what if the application guarantees that no
> > UDPv4 packets can ever hit that rule?
> 
> that's fine - I don't expect the software to be able to know what the
> hardware will do with those rules. It's more about trying to get a dump
> from the hardware if something goes wrong. Anyway covered in comment later.
> 
> > 
> > Suggestions are welcome though, perhaps we can refine the description
> > 
> >>> + *
> >>> + * Note that support for more than a single group and priority level is not
> >>> + * guaranteed.
> >>> + *
> >>> + * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
> >>> + *
> >>> + * Several pattern items and actions are valid and can be used in both
> >>> + * directions. Those valid for only one direction are described as such.
> >>> + *
> >>> + * Specifying both directions at once is not recommended but may be valid in
> >>> + * some cases, such as incrementing the same counter twice.
> >>> + *
> >>> + * Not specifying any direction is currently an error.
> >>> + */
> >>> +struct rte_flow_attr {
> >>> +	uint32_t group; /**< Priority group. */
> >>> +	uint32_t priority; /**< Priority level within group. */
> >>> +	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
> >>> +	uint32_t egress:1; /**< Rule applies to egress traffic. */
> >>> +	uint32_t reserved:30; /**< Reserved, must be zero. */
> >>> +};
> > [...]
> >>> +/**
> >>> + * RTE_FLOW_ITEM_TYPE_VF
> >>> + *
> >>> + * Matches packets addressed to a virtual function ID of the device.
> >>> + *
> >>> + * If the underlying device function differs from the one that would
> >>> + * normally receive the matched traffic, specifying this item prevents it
> >>> + * from reaching that device unless the flow rule contains a VF
> >>> + * action. Packets are not duplicated between device instances by default.
> >>> + *
> >>> + * - Likely to return an error or never match any traffic if this causes a
> >>> + *   VF device to match traffic addressed to a different VF.
> >>> + * - Can be specified multiple times to match traffic addressed to several
> >>> + *   specific VFs.
> >>> + * - Can be combined with a PF item to match both PF and VF traffic.
> >>> + *
> >>> + * A zeroed mask can be used to match any VF.
> >>
> >> can you refer explicitly to id
> > 
> > If you mean "VF" to "VF ID" then yes, will do it for v2.
> > 
> >>> + */
> >>> +struct rte_flow_item_vf {
> >>> +	uint32_t id; /**< Destination VF ID. */
> >>> +};
> > [...]
> >>> +/**
> >>> + * Matching pattern item definition.
> >>> + *
> >>> + * A pattern is formed by stacking items starting from the lowest protocol
> >>> + * layer to match. This stacking restriction does not apply to meta items
> >>> + * which can be placed anywhere in the stack with no effect on the meaning
> >>> + * of the resulting pattern.
> >>> + *
> >>> + * A stack is terminated by a END item.
> >>> + *
> >>> + * The spec field should be a valid pointer to a structure of the related
> >>> + * item type. It may be set to NULL in many cases to use default values.
> >>> + *
> >>> + * Optionally, last can point to a structure of the same type to define an
> >>> + * inclusive range. This is mostly supported by integer and address fields,
> >>> + * may cause errors otherwise. Fields that do not support ranges must be set
> >>> + * to the same value as their spec counterparts.
> >>> + *
> >>> + * By default all fields present in spec are considered relevant.* This
> >>
> >> typo "*"
> > 
> > No, that's an asterisk for a footnote below. Perhaps it is a bit unusual,
> > would something like "[1]" look better?
> 
> oh, I thought it was the start of a comment line gone astray. Maybe "See
> note below", no big deal though.

OK, will change it anyway for clarity.

> >>> + * behavior can be altered by providing a mask structure of the same type
> >>> + * with applicable bits set to one. It can also be used to partially filter
> >>> + * out specific fields (e.g. as an alternate mean to match ranges of IP
> >>> + * addresses).
> >>> + *
> >>> + * Note this is a simple bit-mask applied before interpreting the contents
> >>> + * of spec and last, which may yield unexpected results if not used
> >>> + * carefully. For example, if for an IPv4 address field, spec provides
> >>> + * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
> >>> + * effective range is 10.1.0.0 to 10.3.255.255.
> >>> + *
> > 
> > See footnote below:
> > 
> >>> + * * The defaults for data-matching items such as IPv4 when mask is not
> >>> + *   specified actually depend on the underlying implementation since only
> >>> + *   recognized fields can be taken into account.
> >>> + */
> >>> +struct rte_flow_item {
> >>> +	enum rte_flow_item_type type; /**< Item type. */
> >>> +	const void *spec; /**< Pointer to item specification structure. */
> >>> +	const void *last; /**< Defines an inclusive range (spec to last). */
> >>> +	const void *mask; /**< Bit-mask applied to spec and last. */
> >>> +};
> >>> +
> >>> +/**
> >>> + * Action types.
> >>> + *
> >>> + * Each possible action is represented by a type. Some have associated
> >>> + * configuration structures. Several actions combined in a list can be
> >>> + * affected to a flow rule. That list is not ordered.
> >>> + *
> >>> + * They fall in three categories:
> >>> + *
> >>> + * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
> >>> + *   processing matched packets by subsequent flow rules, unless overridden
> >>> + *   with PASSTHRU.
> >>> + *
> >>> + * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
> >>> + *   for additional processing by subsequent flow rules.
> >>> + *
> >>> + * - Other non terminating meta actions that do not affect the fate of
> >>> + *   packets (END, VOID, MARK, FLAG, COUNT).
> >>> + *
> >>> + * When several actions are combined in a flow rule, they should all have
> >>> + * different types (e.g. dropping a packet twice is not possible). The
> >>> + * defined behavior is for PMDs to only take into account the last action of
> >>> + * a given type found in the list. PMDs still perform error checking on the
> >>> + * entire list.
> >>
> >> why do you define that the pmd will interpret multiple same type rules
> >> in this way...would it not make more sense for the pmd to just return
> >> EINVAL for an invalid set of rules? It seems more transparent for the
> >> application.
> > 
> > Well, I had to define something as a default. The reason is that any number
> > of VOID actions may specified and did not want that to be a special case in
> > order to keep PMD parsers as simple as possible. I'll settle for EINVAL (or
> > some other error) if at least one PMD maintainer other than Nelio who
> > intends to implement this API is not convinced by this explanation, all
> > right?
> 
> From an API perspective I think it's cleaner to pass or fail with the
> input rather than change it. But yes, please take pmd maintainers input
> as to what is reasonable to check also.
> 
> > 
> > [...]
> >>> +/**
> >>> + * RTE_FLOW_ACTION_TYPE_MARK
> >>> + *
> >>> + * Attaches a 32 bit value to packets.
> >>> + *
> >>> + * This value is arbitrary and application-defined. For compatibility with
> >>> + * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
> >>> + * also set in ol_flags.
> >>> + */
> >>> +struct rte_flow_action_mark {
> >>> +	uint32_t id; /**< 32 bit value to return with packets. */
> >>> +};
> >>
> >> One use case I thought we would be able to do for OVS is classification
> >> in hardware and the unique flow id is sent with the packet to software.
> >> But in OVS the ufid is 128 bits, so it means we can't and there is still
> >> the miniflow extract overhead. I'm not sure if there is a practical way
> >> around this.
> >>
> >> Sugesh (cc'd) has looked at this before and may be able to comment or
> >> correct me.
> > 
> > Yes, we settled on 32 bit because currently no known hardware implementation
> > supports more than this. If that changes, another action with a larger type
> > shall be provided (no ABI breakage).
> > 
> > Also since even 64 bit would not be enough for the use case you mention,
> > there is no choice but use this as an indirect value (such as an array or
> > hash table index/value).
> 
> ok, cool. I think Sugesh has other ideas anyway!
> 
> > 
> > [...]
> >>> +/**
> >>> + * RTE_FLOW_ACTION_TYPE_RSS
> >>> + *
> >>> + * Similar to QUEUE, except RSS is additionally performed on packets to
> >>> + * spread them among several queues according to the provided parameters.
> >>> + *
> >>> + * Note: RSS hash result is normally stored in the hash.rss mbuf field,
> >>> + * however it conflicts with the MARK action as they share the same
> >>> + * space. When both actions are specified, the RSS hash is discarded and
> >>> + * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
> >>> + * structure should eventually evolve to store both.
> >>> + *
> >>> + * Terminating by default.
> >>> + */
> >>> +struct rte_flow_action_rss {
> >>> +	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
> >>> +	uint16_t queues; /**< Number of entries in queue[]. */
> >>> +	uint16_t queue[]; /**< Queues indices to use. */
> >>
> >> I'd try and avoid queue and queues - someone will say "huh?" when
> >> reading code. s/queues/num ?
> > 
> > Agreed, will update for v2.
> > 
> >>> +};
> >>> +
> >>> +/**
> >>> + * RTE_FLOW_ACTION_TYPE_VF
> >>> + *
> >>> + * Redirects packets to a virtual function (VF) of the current device.
> >>> + *
> >>> + * Packets matched by a VF pattern item can be redirected to their original
> >>> + * VF ID instead of the specified one. This parameter may not be available
> >>> + * and is not guaranteed to work properly if the VF part is matched by a
> >>> + * prior flow rule or if packets are not addressed to a VF in the first
> >>> + * place.
> >>
> >> Not clear what you mean by "not guaranteed to work if...". Please return
> >> fail when this action is used if this is not going to work.
> > 
> > Again, this is a case where it is difficult for a PMD to determine if the
> > entire list of flow rules makes sense. Perhaps it does, perhaps whatever
> > goes through has already been filtered out of possible issues.
> > 
> > Here the documentation states the precautions an application should take to
> > guarantee it will work as intended. Perhaps it can be reworded (any
> > suggestion?), but a PMD can certainly not provide any strong guarantee.
> 
> I see your point. Maybe for easy check things the pmd would return fail,
> but for more complex I agree it's too difficult.
> 
> > 
> >>> + *
> >>> + * Terminating by default.
> >>> + */
> >>> +struct rte_flow_action_vf {
> >>> +	uint32_t original:1; /**< Use original VF ID if possible. */
> >>> +	uint32_t reserved:31; /**< Reserved, must be zero. */
> >>> +	uint32_t id; /**< VF ID to redirect packets to. */
> >>> +};
> > [...]
> >>> +/**
> >>> + * Check whether a flow rule can be created on a given port.
> >>> + *
> >>> + * While this function has no effect on the target device, the flow rule is
> >>> + * validated against its current configuration state and the returned value
> >>> + * should be considered valid by the caller for that state only.
> >>> + *
> >>> + * The returned value is guaranteed to remain valid only as long as no
> >>> + * successful calls to rte_flow_create() or rte_flow_destroy() are made in
> >>> + * the meantime and no device parameter affecting flow rules in any way are
> >>> + * modified, due to possible collisions or resource limitations (although in
> >>> + * such cases EINVAL should not be returned).
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param[in] attr
> >>> + *   Flow rule attributes.
> >>> + * @param[in] pattern
> >>> + *   Pattern specification (list terminated by the END pattern item).
> >>> + * @param[in] actions
> >>> + *   Associated actions (list terminated by the END action).
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   0 if flow rule is valid and can be created. A negative errno value
> >>> + *   otherwise (rte_errno is also set), the following errors are defined:
> >>> + *
> >>> + *   -ENOSYS: underlying device does not support this functionality.
> >>> + *
> >>> + *   -EINVAL: unknown or invalid rule specification.
> >>> + *
> >>> + *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
> >>> + *   bit-masks are unsupported).
> >>> + *
> >>> + *   -EEXIST: collision with an existing rule.
> >>> + *
> >>> + *   -ENOMEM: not enough resources.
> >>> + *
> >>> + *   -EBUSY: action cannot be performed due to busy device resources, may
> >>> + *   succeed if the affected queues or even the entire port are in a stopped
> >>> + *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
> >>> + */
> >>> +int
> >>> +rte_flow_validate(uint8_t port_id,
> >>> +		  const struct rte_flow_attr *attr,
> >>> +		  const struct rte_flow_item pattern[],
> >>> +		  const struct rte_flow_action actions[],
> >>> +		  struct rte_flow_error *error);
> >>
> >> Why not just use rte_flow_create() and get an error? Is it less
> >> disruptive to do a validate and find the rule cannot be created, than
> >> using a create directly?
> > 
> > The rationale can be found in the original RFC, which I'll convert to actual
> > documentation in v2. In short:
> > 
> > - Calling rte_flow_validate() before rte_flow_create() is useless since
> >   rte_flow_create() also performs validation.
> > 
> > - We cannot possibly express a full static set of allowed flow rules, even
> >   if we could, it usually depends on the current hardware configuration
> >   therefore would not be static.
> > 
> > - rte_flow_validate() is thus provided as a replacement for capability
> >   flags. It can be used to determine during initialization if the underlying
> >   device can support the typical flow rules an application might want to
> >   provide later and do something useful with that information (e.g. always
> >   use software fallback due to HW limitations).
> > 
> > - rte_flow_validate() being a subset of rte_flow_create(), it is essentially
> >   free to expose.
> 
> make sense now, thanks.
> 
> > 
> >>> +
> >>> +/**
> >>> + * Create a flow rule on a given port.
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param[in] attr
> >>> + *   Flow rule attributes.
> >>> + * @param[in] pattern
> >>> + *   Pattern specification (list terminated by the END pattern item).
> >>> + * @param[in] actions
> >>> + *   Associated actions (list terminated by the END action).
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   A valid handle in case of success, NULL otherwise and rte_errno is set
> >>> + *   to the positive version of one of the error codes defined for
> >>> + *   rte_flow_validate().
> >>> + */
> >>> +struct rte_flow *
> >>> +rte_flow_create(uint8_t port_id,
> >>> +		const struct rte_flow_attr *attr,
> >>> +		const struct rte_flow_item pattern[],
> >>> +		const struct rte_flow_action actions[],
> >>> +		struct rte_flow_error *error);
> >>
> >> General question - are these functions threadsafe? In the OVS example
> >> you could have several threads wanting to create flow rules at the same
> >> time for same or different ports.
> > 
> > No they aren't, applications have to perform their own locking. The RFC (to
> > be converted to actual documentation in v2) says that:
> > 
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> > 
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> 
> other comment above wrt locking.
> 
> > 
> >>> +
> >>> +/**
> >>> + * Destroy a flow rule on a given port.
> >>> + *
> >>> + * Failure to destroy a flow rule handle may occur when other flow rules
> >>> + * depend on it, and destroying it would result in an inconsistent state.
> >>> + *
> >>> + * This function is only guaranteed to succeed if handles are destroyed in
> >>> + * reverse order of their creation.
> >>
> >> How can the application find this information out on error?
> > 
> > Without maintaining a list, they cannot. The specified case is the only
> > possible guarantee. That does not mean PMDs should not do their best to
> > destroy flow rules, only that ordering must remain consistent in case of
> > inability to destroy one.
> > 
> > What do you suggest?
> 
> I think if the app cannot remove a specific rule it may want to remove
> all rules and deal with flows in software for a time. So once the app
> knows it fails that should be enough.

OK, then since destruction may return an error already, is it fine?
Applications may call rte_flow_flush() (not supposed to fail unless there is
a serious issue, abort() in that case) and switch to SW fallback.

> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param flow
> >>> + *   Flow rule handle to destroy.
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> >>> + */
> >>> +int
> >>> +rte_flow_destroy(uint8_t port_id,
> >>> +		 struct rte_flow *flow,
> >>> +		 struct rte_flow_error *error);
> >>> +
> >>> +/**
> >>> + * Destroy all flow rules associated with a port.
> >>> + *
> >>> + * In the unlikely event of failure, handles are still considered destroyed
> >>> + * and no longer valid but the port must be assumed to be in an inconsistent
> >>> + * state.
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> >>> + */
> >>> +int
> >>> +rte_flow_flush(uint8_t port_id,
> >>> +	       struct rte_flow_error *error);
> >>
> >> rte_flow_destroy_all() would be more descriptive (but breaks your style)
> > 
> > There are enough underscores as it is. I like flush, if enough people
> > complain we'll change it but it has to occur before the first public
> > release.
> > 
> >>> +
> >>> +/**
> >>> + * Query an existing flow rule.
> >>> + *
> >>> + * This function allows retrieving flow-specific data such as counters.
> >>> + * Data is gathered by special actions which must be present in the flow
> >>> + * rule definition.
> >>
> >> re last sentence, it would be good if you can put a link to
> >> RTE_FLOW_ACTION_TYPE_COUNT
> > 
> > Will do, I did not know how until very recently.
> > 
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param flow
> >>> + *   Flow rule handle to query.
> >>> + * @param action
> >>> + *   Action type to query.
> >>> + * @param[in, out] data
> >>> + *   Pointer to storage for the associated query data type.
> >>
> >> can this be anything other than rte_flow_query_count?
> > 
> > Likely in the future. I've only defined this one as a counterpart for
> > existing API functionality and because we wanted to expose it in mlx5.
> > 
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> >>> + */
> >>> +int
> >>> +rte_flow_query(uint8_t port_id,
> >>> +	       struct rte_flow *flow,
> >>> +	       enum rte_flow_action_type action,
> >>> +	       void *data,
> >>> +	       struct rte_flow_error *error);
> >>> +
> >>> +#ifdef __cplusplus
> >>> +}
> >>> +#endif
> >>
> >> I don't see a way to dump all the rules for a port out. I think this is
> >> neccessary for degbugging. You could have a look through dpif.h in OVS
> >> and see how dpif_flow_dump_next() is used, it might be a good reference.
> > 
> > DPDK does not maintain flow rules and, depending on hardware capabilities
> > and level of compliance, PMDs do not necessarily do it either, particularly
> > since it requires space and application probably have a better method to
> > store these pointers for their own needs.
> 
> understood
> 
> > 
> > What you see here is only a PMD interface. Depending on applications needs,
> > generic helper functions built on top of these may be added to manage flow
> > rules in the future.
> 
> I'm thinking of the case where something goes wrong and I want to get a
> dump of all the flow rules from hardware, not query the rules I think I
> have. I don't see a way to do it or something to build a helper on top of?

Generic helper functions would exist on top of this API and would likely
maintain a list of flow rules themselves. The dump in that case would be
entirely implemented in software. I think that recovering flow rules from HW
may be complicated in many cases (even without taking storage allocation and
rules conversion issues into account), therefore if there is really a need
for it, we could perhaps add a dump() function that PMDs are free to
implement later.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-08 15:07  0%   ` Neil Horman
@ 2016-12-08 15:10  0%     ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2016-12-08 15:10 UTC (permalink / raw)
  To: Neil Horman, Nélio Laranjeiro
  Cc: dev, Richardson, Bruce, Wiles, Keith, Morten Brørup, Lu,
	Wenzhuo, Olivier Matz



> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, December 8, 2016 3:07 PM
> To: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Wiles,
> Keith <keith.wiles@intel.com>; Morten Brørup <mb@smartsharesystems.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Subject: Re: [PATCH] net: introduce big and little endian types
> 
> On Thu, Dec 08, 2016 at 10:30:05AM +0100, Nélio Laranjeiro wrote:
> > Hi all,
> >
> > Following previous discussions, I would like to gather requirements for
> > v2, currently we have:
> >
> > 1. Introduction of new typedefs.
> > 2. Modification of network headers.
> > 3. Modification of rte_*_to_*() functions.
> >
> > Point 1. seems not to be an issue, everyone seems to agree on the fact
> > having those types could help to document some parts of the code.
> >
> No objection here
> 
> > Point 2. does not cause any ABI change as it is only a documentation
> > commit, not sure if anyone disagrees about this.
> >
> I have an objection here, and I think it was stated by others previously.  While
> its fine to offer endian encoded types so that developers can use them
> expediently, I don't like the idea of coding them into network headers
> specifically.  I assert that because network headers represent multiple views of
> network data (both network byte order if the data is taken off the wire and cpu
> byte order if its translated.  To implement such a network header change
> efficiently what you would need is something like the following:
> 
> struct rte_ip_network_hdr {
> 	rte_le_u32 dst;
> 	rte_le_u32 src;
> 	...
> };
> 
> struct rte_ip_cpu_hdr {
> 	rte_cpu_u32 dst;
> 	rte_cpu_u32 src;
> 	...
> };
> 
> where rte_cpu_* is defined to a big endian or little endian type based on the
> cpu being targeted.
> 
> Then of course you need to define translation macros to do all the appropriate
> conversions convieniently (or you need to do specific translations on the
> network byte order as needed, which may lead to lots of repeated conversions).
> Regardless, this seems to be unscalable. Endian types are the sort of thing that
> you should only use sparingly, not by default.

+1

> 
> > Point 3. documentation commit most people are uncomfortable with.
> > I propose to drop it from v2.
> >
> > Any objection to this plan?
> >
> > --
> > Nélio Laranjeiro
> > 6WIND
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-08  9:30  3% ` Nélio Laranjeiro
  2016-12-08 13:59  3%   ` Wiles, Keith
@ 2016-12-08 15:07  0%   ` Neil Horman
  2016-12-08 15:10  0%     ` Ananyev, Konstantin
  1 sibling, 1 reply; 200+ results
From: Neil Horman @ 2016-12-08 15:07 UTC (permalink / raw)
  To: Nélio Laranjeiro
  Cc: dev, Ananyev, Konstantin, Bruce Richardson, Wiles, Keith,
	Morten Brørup, wenzhuo.lu, Olivier Matz

On Thu, Dec 08, 2016 at 10:30:05AM +0100, Nélio Laranjeiro wrote:
> Hi all,
> 
> Following previous discussions, I would like to gather requirements for
> v2, currently we have:
> 
> 1. Introduction of new typedefs.
> 2. Modification of network headers.
> 3. Modification of rte_*_to_*() functions.
> 
> Point 1. seems not to be an issue, everyone seems to agree on the fact
> having those types could help to document some parts of the code.
> 
No objection here

> Point 2. does not cause any ABI change as it is only a documentation
> commit, not sure if anyone disagrees about this.
> 
I have an objection here, and I think it was stated by others previously.  While
its fine to offer endian encoded types so that developers can use them
expediently, I don't like the idea of coding them into network headers
specifically.  I assert that because network headers represent multiple views of
network data (both network byte order if the data is taken off the wire and cpu
byte order if its translated.  To implement such a network header change
efficiently what you would need is something like the following:

struct rte_ip_network_hdr {
	rte_le_u32 dst;
	rte_le_u32 src;
	...
}; 

struct rte_ip_cpu_hdr {
	rte_cpu_u32 dst;
	rte_cpu_u32 src;
	...
};

where rte_cpu_* is defined to a big endian or little endian type based on the
cpu being targeted.  

Then of course you need to define translation macros to do all the appropriate
conversions convieniently (or you need to do specific translations on the
network byte order as needed, which may lead to lots of repeated conversions).
Regardless, this seems to be unscalable. Endian types are the sort of thing that
you should only use sparingly, not by default.

> Point 3. documentation commit most people are uncomfortable with.
> I propose to drop it from v2.
> 
> Any objection to this plan?
> 
> -- 
> Nélio Laranjeiro
> 6WIND
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ethdev: expand size of eth_dev_name in next release
  2016-12-08  2:27  5% [dpdk-dev] [RFC] ethdev: expand size of eth_dev_name in next release Stephen Hemminger
@ 2016-12-08 15:04  4% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-12-08 15:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2016-12-07 18:27, Stephen Hemminger:
> In order to support Hyper-V in a direct fashion, the eth_dev name
> needs to be expanded. The standard format for text representation of GUID
> is 36 bytes (plus null).  See uuid_unparse(3).
[...]
> --- a/doc/guides/rel_notes/release_17_02.rst
> +++ b/doc/guides/rel_notes/release_17_02.rst
> @@ -116,7 +116,9 @@ ABI Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
>  
> -
> + * The macro ``RTE_ETH_NAME_MAX_LEN`` used in rte_eth_dev_data will be
> +   increased from 32 to 40 characters to allow for longer values such
> +   as GUID which is 36 characters long (plus null character).

Please start at the margin and keep the double blank lines before the next title.

>  Shared Library Versions
>  -----------------------
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 9678179..68cb956 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1652,7 +1652,11 @@ struct rte_eth_dev_sriov {
>  };
>  #define RTE_ETH_DEV_SRIOV(dev)         ((dev)->data->sriov)
>  
> +#ifdef RTE_NEXT_ABI
> +#define RTE_ETH_NAME_MAX_LEN 40
> +#else
>  #define RTE_ETH_NAME_MAX_LEN (32)
> +#endif

No need for RTE_NEXT_ABI as it was planned to break ethdev ABI for
several reasons (see doc/guides/rel_notes/deprecation.rst).

Note that we should continue the discussion about the ABI process,
but I prefer avoiding this debate during December as we are really
too busy until the RC1.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-08  9:30  3% ` Nélio Laranjeiro
@ 2016-12-08 13:59  3%   ` Wiles, Keith
  2016-12-08 15:07  0%   ` Neil Horman
  1 sibling, 0 replies; 200+ results
From: Wiles, Keith @ 2016-12-08 13:59 UTC (permalink / raw)
  To: Nélio Laranjeiro
  Cc: DPDK, Ananyev, Konstantin, Richardson, Bruce, Morten Brørup,
	Neil Horman, Lu, Wenzhuo, Olivier Matz


> On Dec 8, 2016, at 3:30 AM, Nélio Laranjeiro <nelio.laranjeiro@6wind.com> wrote:
> 
> Hi all,
> 
> Following previous discussions, I would like to gather requirements for
> v2, currently we have:
> 
> 1. Introduction of new typedefs.
> 2. Modification of network headers.
> 3. Modification of rte_*_to_*() functions.
> 
> Point 1. seems not to be an issue, everyone seems to agree on the fact
> having those types could help to document some parts of the code.

I never stated these new types were useful in any way, I still believe documentation of the code is the better solution then forcing yet another restriction in submitting patches. 

> 
> Point 2. does not cause any ABI change as it is only a documentation
> commit, not sure if anyone disagrees about this.

I guess no ABI change is done, but I feel it should be as the developer now need to adjust his to reflex these new type even if the compiler does not complain.

> 
> Point 3. documentation commit most people are uncomfortable with.

Not sure what this one is stating, but I whole heartily believe documentation of the code is the best way forward.

The main reasons are:
 - We do not need to add yet another type to DPDK to make the patch process even more restrictive.
 - The new types do not add any type of checking for the compiler and the developer can still get it wrong.
- If any common code used in other platform (say Linux kernel driver) we have to include these new types in that environment.
 - Documentation is the best solution IMO to resolve these types of issues and it does not require any new types or code changes in DPDK or developers code.

Sorry, I strongly disagree with this patch in any form expect documentation changes.

> I propose to drop it from v2.
> 
> Any objection to this plan?
> 
> -- 
> Nélio Laranjeiro
> 6WIND

Regards,
Keith


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/5] ethdev: add firmware version get
  @ 2016-12-08 11:07  3%     ` Ferruh Yigit
  2016-12-12  1:28  4%       ` Yang, Qiming
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-12-08 11:07 UTC (permalink / raw)
  To: Qiming Yang, dev; +Cc: Thomas Monjalon

Hi Qiming,

On 12/6/2016 7:16 AM, Qiming Yang wrote:
> This patch adds a new API 'rte_eth_dev_fwver_get' for fetching firmware
> version by a given device.
> 
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>

<...>

> @@ -1444,6 +1448,7 @@ struct eth_dev_ops {
>  	/**< Get names of extended statistics. */
>  	eth_queue_stats_mapping_set_t queue_stats_mapping_set;
>  	/**< Configure per queue stat counter mapping. */
> +	eth_fw_version_get_t       fw_version_get; /**< Get firmware version. */

Hi Qiming,

Not sure if I am missing something but this change is for following [1]
deprecation notice, right?
If so, notice suggest updating rte_eth_dev_info_get() to include
fw_version, but this patch adds a new eth_dev_ops.

Is it agreed to add a new eth_dev_ops for this?


[1]
* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
  will be extended with a new member ``fw_version`` in order to store
  the NIC firmware version.



>  	eth_dev_infos_get_t        dev_infos_get; /**< Get device info. */
>  	eth_dev_supported_ptypes_get_t dev_supported_ptypes_get;
>  	/**< Get packet types supported and identified by device*/

<...>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model
  2016-12-08  1:24  3%       ` Jerin Jacob
@ 2016-12-08 11:02  4%         ` Van Haaren, Harry
  2016-12-14 13:13  3%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2016-12-08 11:02 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, thomas.monjalon, Richardson, Bruce, hemant.agrawal, Eads, Gage

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Thursday, December 8, 2016 1:24 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>

<snip>

> > * Operation and sched_type *increased* to 4 bits each (from previous value of 2) to
> allow future expansion without ABI changes
> 
> Anyway it will break ABI if we add new operation. I would propose to keep 4bit
> reserved and add it when required.

Ok sounds good. I'll suggest to move it to the middle between operation or sched type, which would allow expanding operation without ABI breaks. On expanding the field would remain in the same place with the same bits available in that place (no ABI break), but new bits can be added into the currently reserved space.


> > * Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
> > * sub-event-type reduced to 4 bits (previous value was 8 bits). Can we think of
> situations where 16 values for application specified identifiers of each event-type is
> genuinely not enough?
> One packet will not go beyond 16 stages but an application may have more stages and
> each packet may go mutually exclusive stages. For example,
> 
> packet 0: stagex_0 ->stagex_1
> packet 1: stagey_0 ->stagey_1
> 
> In that sense, IMO, more than 16 is required.(AFIAK, VPP has any much larger limit on
> number of stages)

My understanding was that stages are linked to event queues, so the application can determine the stage the packet comes from by reading queue_id?

I'm not opposed to having an 8 bit sub_event_type, but it seems unnecessarily large from my point of view. If you have a use for it, I'm ok with 8 bits.


> > In my opinion this structure layout is more balanced, and will perform better due to
> less loads that will need masking to access the required value.
> OK. Considering more balanced layout and above points. I propose following scheme(based on
> your input)
> 
> 	union {
> 		uint64_t event;
> 		struct {
> 			uint32_t flow_id: 20;
> 			uint32_t sub_event_type : 8;
> 			uint32_t event_type : 4;
> 
> 			uint8_t rsvd: 4; /* for future additions */
> 			uint8_t operation  : 2; /* new fwd drop */
> 			uint8_t sched_type : 2;
> 
> 			uint8_t queue_id;
> 			uint8_t priority;
> 			uint8_t impl_opaque;
> 		};
> 	};
> 
> Feedback and improvements welcomed,


So incorporating my latest suggestions on moving fields around, excluding sub_event_type *size* changes:

union {
	uint64_t event;
	struct {
		uint32_t flow_id: 20;
		uint32_t event_type : 4;
		uint32_t sub_event_type : 8; /* 8 bits now naturally aligned */

		uint8_t operation  : 2; /* new fwd drop */
		uint8_t rsvd: 4; /* for future additions, can be expanded into without ABI break */
		uint8_t sched_type : 2;

		uint8_t queue_id;
		uint8_t priority;
		uint8_t impl_opaque;
	};
};


-Harry

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] pci: remove unused UNBIND support
  @ 2016-12-08 10:53  3% ` David Marchand
  2016-12-21 15:15  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2016-12-08 10:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Thomas Monjalon

On Wed, Dec 7, 2016 at 7:04 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> No device driver sets the unbind flag in current public code base.
> Therefore it is good time to remove the unused dead code.

Yes, this has been unused for some time now.

I would say this is not subject to abi enforcement as this only
matters to driver api not application api.
So this can go into 17.02.

The patch looks good to me.


-- 
David Marchand

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-11-09 15:04  2% [dpdk-dev] [PATCH] net: introduce big and little endian types Nelio Laranjeiro
  2016-12-05 10:09  0% ` Ananyev, Konstantin
@ 2016-12-08  9:30  3% ` Nélio Laranjeiro
  2016-12-08 13:59  3%   ` Wiles, Keith
  2016-12-08 15:07  0%   ` Neil Horman
  1 sibling, 2 replies; 200+ results
From: Nélio Laranjeiro @ 2016-12-08  9:30 UTC (permalink / raw)
  To: dev, Ananyev, Konstantin, Bruce Richardson, Wiles, Keith,
	Morten Brørup, Neil Horman
  Cc: wenzhuo.lu, Olivier Matz

Hi all,

Following previous discussions, I would like to gather requirements for
v2, currently we have:

1. Introduction of new typedefs.
2. Modification of network headers.
3. Modification of rte_*_to_*() functions.

Point 1. seems not to be an issue, everyone seems to agree on the fact
having those types could help to document some parts of the code.

Point 2. does not cause any ABI change as it is only a documentation
commit, not sure if anyone disagrees about this.

Point 3. documentation commit most people are uncomfortable with.
I propose to drop it from v2.

Any objection to this plan?

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API Adrien Mazarguil
  2016-11-18  6:36  0%     ` Xing, Beilei
  2016-11-30 17:47  0%     ` Kevin Traynor
@ 2016-12-08  9:00  0%     ` Xing, Beilei
  2 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2016-12-08  9:00 UTC (permalink / raw)
  To: Adrien Mazarguil, dev
  Cc: Thomas Monjalon, De Lara Guarch, Pablo, Olivier Matz



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Thursday, November 17, 2016 12:23 AM
> To: dev@dpdk.org
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
> This new API supersedes all the legacy filter types described in rte_eth_ctrl.h.
> It is slightly higher level and as a result relies more on PMDs to process and
> validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
>  MAINTAINERS                            |   4 +
>  lib/librte_ether/Makefile              |   3 +
>  lib/librte_ether/rte_eth_ctrl.h        |   1 +
>  lib/librte_ether/rte_ether_version.map |  10 +
>  lib/librte_ether/rte_flow.c            | 159 +++++
>  lib/librte_ether/rte_flow.h            | 947 ++++++++++++++++++++++++++++
>  lib/librte_ether/rte_flow_driver.h     | 177 ++++++
>  7 files changed, 1301 insertions(+)
> 
> +/**
> + * RTE_FLOW_ITEM_TYPE_ETH
> + *
> + * Matches an Ethernet header.
> + */
> +struct rte_flow_item_eth {
> +	struct ether_addr dst; /**< Destination MAC. */
> +	struct ether_addr src; /**< Source MAC. */
> +	unsigned int type; /**< EtherType. */
Hi Adrien,

ETHERTYPE in ether header is 2 bytes, so I think "uint16_t type" is more appropriate here, what do you think?

Thanks,
Beilei Xing
> +};
> +

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC] ethdev: expand size of eth_dev_name in next release
@ 2016-12-08  2:27  5% Stephen Hemminger
  2016-12-08 15:04  4% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2016-12-08  2:27 UTC (permalink / raw)
  To: dev

This came up while revising earlier work on Hyper-V.
The current versions of DPDK does not have enough space to support
a logical device name in VMBUS. The kernel exposes the VMBUS
devices by GUID in a manner similar to how PCI is expressed
with domain:host:function notation.

In order to support Hyper-V in a direct fashion, the eth_dev name
needs to be expanded. The standard format for text representation of GUID
is 36 bytes (plus null).  See uuid_unparse(3).

The other alternative is to use base64 encoding, but this worse for
humans to read, and isn't directly handled by lib uuid.

---
 doc/guides/rel_notes/release_17_02.rst | 4 +++-
 lib/librte_ether/rte_ethdev.h          | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..52c97c6 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -116,7 +116,9 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
-
+ * The macro ``RTE_ETH_NAME_MAX_LEN`` used in rte_eth_dev_data will be
+   increased from 32 to 40 characters to allow for longer values such
+   as GUID which is 36 characters long (plus null character).
 
 Shared Library Versions
 -----------------------
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..68cb956 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1652,7 +1652,11 @@ struct rte_eth_dev_sriov {
 };
 #define RTE_ETH_DEV_SRIOV(dev)         ((dev)->data->sriov)
 
+#ifdef RTE_NEXT_ABI
+#define RTE_ETH_NAME_MAX_LEN 40
+#else
 #define RTE_ETH_NAME_MAX_LEN (32)
+#endif
 
 /**
  * @internal
-- 
2.10.2

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model
  2016-12-07 10:57  3%     ` Van Haaren, Harry
@ 2016-12-08  1:24  3%       ` Jerin Jacob
  2016-12-08 11:02  4%         ` Van Haaren, Harry
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-12-08  1:24 UTC (permalink / raw)
  To: Van Haaren, Harry
  Cc: dev, thomas.monjalon, Richardson, Bruce, hemant.agrawal, Eads, Gage

On Wed, Dec 07, 2016 at 10:57:13AM +0000, Van Haaren, Harry wrote:
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> 
> Hi Jerin,

Hi Harry,

> 
> Re v2 rte_event struct, there seems to be some changes in the struct layout and field sizes. I've investigated them, and would like to propose some changes to balance the byte-alignment and accessing of the fields.

OK. Looks like balanced byte-alignment makes more sense on IA.We will go with that then.
Few comments below,


> 
> These changes target only the first 64 bits of the rte_event struct. I've left the current v2 code for reference, please find my proposed changes below.
> 
> > +struct rte_event {
> > +	/** WORD0 */
> > +	RTE_STD_C11
> > +	union {
> > +		uint64_t event;
> > +		/** Event attributes for dequeue or enqueue operation */
> > +		struct {
> > +			uint64_t flow_id:20;
> > +			/**< Targeted flow identifier for the enqueue and
> > +			 * dequeue operation.
> > +			 * The value must be in the range of
> > +			 * [0, nb_event_queue_flows - 1] which
> > +			 * previously supplied to rte_event_dev_configure().
> > +			 */
> > +			uint64_t sub_event_type:8;
> > +			/**< Sub-event types based on the event source.
> > +			 * @see RTE_EVENT_TYPE_CPU
> > +			 */
> > +			uint64_t event_type:4;
> > +			/**< Event type to classify the event source.
> > +			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> > +			 */
> > +			uint64_t sched_type:2;
> > +			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> > +			 * associated with flow id on a given event queue
> > +			 * for the enqueue and dequeue operation.
> > +			 */
> > +			uint64_t queue_id:8;
> > +			/**< Targeted event queue identifier for the enqueue or
> > +			 * dequeue operation.
> > +			 * The value must be in the range of
> > +			 * [0, nb_event_queues - 1] which previously supplied to
> > +			 * rte_event_dev_configure().
> > +			 */
> > +			uint64_t priority:8;
> > +			/**< Event priority relative to other events in the
> > +			 * event queue. The requested priority should in the
> > +			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> > +			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
> > +			 * The implementation shall normalize the requested
> > +			 * priority to supported priority value.
> > +			 * Valid when the device has
> > +			 * RTE_EVENT_DEV_CAP_FLAG_EVENT_QOS capability.
> > +			 */
> > +			uint64_t op:2;
> > +			/**< The type of event enqueue operation - new/forward/
> > +			 * etc.This field is not preserved across an instance
> > +			 * and is undefined on dequeue.
> > +			 *  @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> > +			 */
> > +			uint64_t impl_opaque:12;
> > +			/**< Implementation specific opaque value.
> > +			 * An implementation may use this field to hold
> > +			 * implementation specific value to share between
> > +			 * dequeue and enqueue operation.
> > +			 * The application should not modify this field.
> > +			 */
> > +		};
> > +	};
> 
> struct rte_event {
> 	/** WORD0 */
> 	RTE_STD_C11
> 	union {
> 		uint64_t event;
> 		struct {
> 			uint32_t flow_id: 24;
> 			uint32_t impl_opaque : 8; /* not defined on deq */
> 
> 			uint8_t queue_id;
> 			uint8_t priority;
> 
> 			uint8_t operation  : 4; /* new fwd drop */
> 			uint8_t sched_type : 4;
> 
> 			uint8_t event_type : 4;
> 			uint8_t sub_event_type : 4;
> 		};
> 	};
> 	/** word 1 */
> <snip>
> 
> 
> The changes made are as follows:
> * Add impl_opaque to the remaining 8 bits of those 32 bits (previous size was 12 bits)
OK
> 
> * QueueID and Priority remain 8 bit integers - but now accessible as 8 bit ints.
OK
> 
> * Operation and sched_type *increased* to 4 bits each (from previous value of 2) to allow future expansion without ABI changes

Anyway it will break ABI if we add new operation. I would propose to keep 4bit
reserved and add it when required.

> 
> * Event type remains constant at 4 bits

OK

> * Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
> * sub-event-type reduced to 4 bits (previous value was 8 bits). Can we think of situations where 16 values for application specified identifiers of each event-type is genuinely not enough?
One packet will not go beyond 16 stages but an application may have more stages and
each packet may go mutually exclusive stages. For example,

packet 0: stagex_0 ->stagex_1
packet 1: stagey_0 ->stagey_1

In that sense, IMO, more than 16 is required.(AFIAK, VPP has any much larger limit on number of stages)

> 
> In my opinion this structure layout is more balanced, and will perform better due to less loads that will need masking to access the required value.
OK. Considering more balanced layout and above points. I propose following scheme(based on your input)

	union {
		uint64_t event;
		struct {
			uint32_t flow_id: 20;
			uint32_t sub_event_type : 8;
			uint32_t event_type : 4;

			uint8_t rsvd: 4; /* for future additions */
			uint8_t operation  : 2; /* new fwd drop */
			uint8_t sched_type : 2;

			uint8_t queue_id;
			uint8_t priority;
			uint8_t impl_opaque;
		};
	};

Feedback and improvements welcomed,

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model
  @ 2016-12-07 10:57  3%     ` Van Haaren, Harry
  2016-12-08  1:24  3%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2016-12-07 10:57 UTC (permalink / raw)
  To: Jerin Jacob, dev
  Cc: thomas.monjalon, Richardson, Bruce, hemant.agrawal, Eads, Gage

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]

Hi Jerin,

Re v2 rte_event struct, there seems to be some changes in the struct layout and field sizes. I've investigated them, and would like to propose some changes to balance the byte-alignment and accessing of the fields.

These changes target only the first 64 bits of the rte_event struct. I've left the current v2 code for reference, please find my proposed changes below.

> +struct rte_event {
> +	/** WORD0 */
> +	RTE_STD_C11
> +	union {
> +		uint64_t event;
> +		/** Event attributes for dequeue or enqueue operation */
> +		struct {
> +			uint64_t flow_id:20;
> +			/**< Targeted flow identifier for the enqueue and
> +			 * dequeue operation.
> +			 * The value must be in the range of
> +			 * [0, nb_event_queue_flows - 1] which
> +			 * previously supplied to rte_event_dev_configure().
> +			 */
> +			uint64_t sub_event_type:8;
> +			/**< Sub-event types based on the event source.
> +			 * @see RTE_EVENT_TYPE_CPU
> +			 */
> +			uint64_t event_type:4;
> +			/**< Event type to classify the event source.
> +			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> +			 */
> +			uint64_t sched_type:2;
> +			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> +			 * associated with flow id on a given event queue
> +			 * for the enqueue and dequeue operation.
> +			 */
> +			uint64_t queue_id:8;
> +			/**< Targeted event queue identifier for the enqueue or
> +			 * dequeue operation.
> +			 * The value must be in the range of
> +			 * [0, nb_event_queues - 1] which previously supplied to
> +			 * rte_event_dev_configure().
> +			 */
> +			uint64_t priority:8;
> +			/**< Event priority relative to other events in the
> +			 * event queue. The requested priority should in the
> +			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> +			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
> +			 * The implementation shall normalize the requested
> +			 * priority to supported priority value.
> +			 * Valid when the device has
> +			 * RTE_EVENT_DEV_CAP_FLAG_EVENT_QOS capability.
> +			 */
> +			uint64_t op:2;
> +			/**< The type of event enqueue operation - new/forward/
> +			 * etc.This field is not preserved across an instance
> +			 * and is undefined on dequeue.
> +			 *  @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> +			 */
> +			uint64_t impl_opaque:12;
> +			/**< Implementation specific opaque value.
> +			 * An implementation may use this field to hold
> +			 * implementation specific value to share between
> +			 * dequeue and enqueue operation.
> +			 * The application should not modify this field.
> +			 */
> +		};
> +	};

struct rte_event {
	/** WORD0 */
	RTE_STD_C11
	union {
		uint64_t event;
		struct {
			uint32_t flow_id: 24;
			uint32_t impl_opaque : 8; /* not defined on deq */

			uint8_t queue_id;
			uint8_t priority;

			uint8_t operation  : 4; /* new fwd drop */
			uint8_t sched_type : 4;

			uint8_t event_type : 4;
			uint8_t sub_event_type : 4;
		};
	};
	/** word 1 */
<snip>


The changes made are as follows:
* Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
* Add impl_opaque to the remaining 8 bits of those 32 bits (previous size was 12 bits)

* QueueID and Priority remain 8 bit integers - but now accessible as 8 bit ints.

* Operation and sched_type *increased* to 4 bits each (from previous value of 2) to allow future expansion without ABI changes

* Event type remains constant at 4 bits
* sub-event-type reduced to 4 bits (previous value was 8 bits). Can we think of situations where 16 values for application specified identifiers of each event-type is genuinely not enough?

In my opinion this structure layout is more balanced, and will perform better due to less loads that will need masking to access the required value.


Feedback and improvements welcomed, -Harry

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-12-02 21:06  0%         ` Kevin Traynor
@ 2016-12-06 18:11  0%           ` Chandran, Sugesh
  2016-12-08 17:07  3%           ` Adrien Mazarguil
  1 sibling, 0 replies; 200+ results
From: Chandran, Sugesh @ 2016-12-06 18:11 UTC (permalink / raw)
  To: Kevin Traynor, Adrien Mazarguil
  Cc: dev, Thomas Monjalon, De Lara Guarch, Pablo, Olivier Matz,
	sugesh.chandran

Hi Adrien,
Thanks for sending out the patches,

Please find few comments below,


Regards
_Sugesh


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Kevin Traynor
> Sent: Friday, December 2, 2016 9:07 PM
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas.monjalon@6wind.com>; De
> Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>; sugesh.chandran@intel.comn
> Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
>>>>>>Snipp
> >>> + *
> >>> + * Attaches a 32 bit value to packets.
> >>> + *
> >>> + * This value is arbitrary and application-defined. For
> >>> +compatibility with
> >>> + * FDIR it is returned in the hash.fdir.hi mbuf field.
> >>> +PKT_RX_FDIR_ID is
> >>> + * also set in ol_flags.
> >>> + */
> >>> +struct rte_flow_action_mark {
> >>> +	uint32_t id; /**< 32 bit value to return with packets. */ };
> >>
> >> One use case I thought we would be able to do for OVS is
> >> classification in hardware and the unique flow id is sent with the packet to
> software.
> >> But in OVS the ufid is 128 bits, so it means we can't and there is
> >> still the miniflow extract overhead. I'm not sure if there is a
> >> practical way around this.
> >>
> >> Sugesh (cc'd) has looked at this before and may be able to comment or
> >> correct me.
> >
> > Yes, we settled on 32 bit because currently no known hardware
> > implementation supports more than this. If that changes, another
> > action with a larger type shall be provided (no ABI breakage).
> >
> > Also since even 64 bit would not be enough for the use case you
> > mention, there is no choice but use this as an indirect value (such as
> > an array or hash table index/value).
> 
> ok, cool. I think Sugesh has other ideas anyway!
[Sugesh] It should be fine with 32 bit . we can manage it in OVS accordingly.
> 
> >
> > [...]
> >>> +/**
> >>> + * RTE_FLOW_ACTION_TYPE_RSS
> >>> + *
> >>> +
> >>> + *
> >>> + * Terminating by default.
> >>> + */
> >>> +struct rte_flow_action_vf {
> >>> +	uint32_t original:1; /**< Use original VF ID if possible. */
> >>> +	uint32_t reserved:31; /**< Reserved, must be zero. */
> >>> +	uint32_t id; /**< VF ID to redirect packets to. */ };
> > [...]
> >>> +/**
> >>> + * Check whether a flow rule can be created on a given port.
> >>> + *
> >>> + * While this function has no effect on the target device, the flow
> >>> +rule is
> >>> + * validated against its current configuration state and the
> >>> +returned value
> >>> + * should be considered valid by the caller for that state only.
> >>> + *
> >>> + * The returned value is guaranteed to remain valid only as long as
> >>> +no
> >>> + * successful calls to rte_flow_create() or rte_flow_destroy() are
> >>> +made in
> >>> + * the meantime and no device parameter affecting flow rules in any
> >>> +way are
> >>> + * modified, due to possible collisions or resource limitations
> >>> +(although in
> >>> + * such cases EINVAL should not be returned).
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param[in] attr
> >>> + *   Flow rule attributes.
> >>> + * @param[in] pattern
> >>> + *   Pattern specification (list terminated by the END pattern item).
> >>> + * @param[in] actions
> >>> + *   Associated actions (list terminated by the END action).
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   0 if flow rule is valid and can be created. A negative errno value
> >>> + *   otherwise (rte_errno is also set), the following errors are defined:
> >>> + *
> >>> + *   -ENOSYS: underlying device does not support this functionality.
> >>> + *
> >>> + *   -EINVAL: unknown or invalid rule specification.
> >>> + *
> >>> + *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
> >>> + *   bit-masks are unsupported).
> >>> + *
> >>> + *   -EEXIST: collision with an existing rule.
> >>> + *
> >>> + *   -ENOMEM: not enough resources.
> >>> + *
> >>> + *   -EBUSY: action cannot be performed due to busy device resources,
> may
> >>> + *   succeed if the affected queues or even the entire port are in a
> stopped
> >>> + *   state (see rte_eth_dev_rx_queue_stop() and
> rte_eth_dev_stop()).
> >>> + */
> >>> +int
> >>> +rte_flow_validate(uint8_t port_id,
> >>> +		  const struct rte_flow_attr *attr,
> >>> +		  const struct rte_flow_item pattern[],
> >>> +		  const struct rte_flow_action actions[],
> >>> +		  struct rte_flow_error *error);
> >>
> >> Why not just use rte_flow_create() and get an error? Is it less
> >> disruptive to do a validate and find the rule cannot be created, than
> >> using a create directly?
> >
> > The rationale can be found in the original RFC, which I'll convert to
> > actual documentation in v2. In short:
> >
> > - Calling rte_flow_validate() before rte_flow_create() is useless since
> >   rte_flow_create() also performs validation.
> >
> > - We cannot possibly express a full static set of allowed flow rules, even
> >   if we could, it usually depends on the current hardware configuration
> >   therefore would not be static.
> >
> > - rte_flow_validate() is thus provided as a replacement for capability
> >   flags. It can be used to determine during initialization if the underlying
> >   device can support the typical flow rules an application might want to
> >   provide later and do something useful with that information (e.g. always
> >   use software fallback due to HW limitations).
> >
> > - rte_flow_validate() being a subset of rte_flow_create(), it is essentially
> >   free to expose.
> 
> make sense now, thanks.
[Sugesh] : We had this discussion earlier at the design stage about the time taken for programming the hardware,
and how to make it deterministic. How about having a timeout parameter as well for the rte_flow_*
If the hardware flow insert is timed out, error out than waiting indefinitely, so that application have some control over
The time to program the flow. It can be another set of APIs something like, rte_flow_create_timeout()

Are you going to provide any control over the initialization of NIC  to define the capability matrices
For eg; To operate in a L3 router mode,  software wanted to initialize the NIC port only to consider the L2 and L3 fields.
I assume the initialization is done based on the first rules that are programmed into the NIC.?
> 
> >
> >>> +
> >>> +/**
> >>> + * Create a flow rule on a given port.
> >>> + *
> >>> + * @param port_id
> >>> + *   Port identifier of Ethernet device.
> >>> + * @param[in] attr
> >>> + *   Flow rule attributes.
> >>> + * @param[in] pattern
> >>> + *   Pattern specification (list terminated by the END pattern item).
> >>> + * @param[in] actions
> >>> + *   Associated actions (list terminated by the END action).
> >>> + * @param[out] error
> >>> + *   Perform verbose error reporting if not NULL.
> >>> + *
> >>> + * @return
> >>> + *   A valid handle in case of success, NULL otherwise and rte_errno is
> set
> >>> + *   to the positive version of one of the error codes defined for
> >>> + *   rte_flow_validate().
> >>> + */
> >>> +struct rte_flow *
> >>> +rte_flow_create(uint8_t port_id,
> >>> +		const struct rte_flow_attr *attr,
> >>> +		const struct rte_flow_item pattern[],
> >>> +		const struct rte_flow_action actions[],
> >>> +		struct rte_flow_error *error);
> >>
> >> General question - are these functions threadsafe? In the OVS example
> >> you could have several threads wanting to create flow rules at the
> >> same time for same or different ports.
> >
> > No they aren't, applications have to perform their own locking. The
> > RFC (to be converted to actual documentation in v2) says that:
> >
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> >
> > - There is no provision for reentrancy/multi-thread safety, although
> nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> 
> other comment above wrt locking.
> 
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 16:31  3%                     ` Wiles, Keith
@ 2016-12-06 16:36  4%                       ` Richardson, Bruce
  0 siblings, 0 replies; 200+ results
From: Richardson, Bruce @ 2016-12-06 16:36 UTC (permalink / raw)
  To: Wiles, Keith, Nélio Laranjeiro
  Cc: Morten Brørup, Ananyev, Konstantin, DPDK, Olivier Matz, Lu,
	Wenzhuo, Adrien Mazarguil

> -----Original Message-----
> From: Wiles, Keith
> Sent: Tuesday, December 6, 2016 4:32 PM
> To: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Cc: Morten Brørup <mb@smartsharesystems.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; DPDK <dev@dpdk.org>; Olivier Matz
> <olivier.matz@6wind.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Adrien
> Mazarguil <adrien.mazarguil@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
> 
> 
> > On Dec 6, 2016, at 10:28 AM, Nélio Laranjeiro
> <nelio.laranjeiro@6wind.com> wrote:
> >
> > Hi all,
> >
> > On Tue, Dec 06, 2016 at 04:34:07PM +0100, Morten Brørup wrote:
> >> Hi all,
> >>
> >> Being a big fan of strong typing, I really like the concept of
> >> explicit endian types. Especially if type mismatches can be caught at
> >> compile time.
> >
> > +1,
> >
> >> However, I think it is too late! That train left the station when the
> >> rest of the world - including libraries and headers that might be
> >> linked with a DPDK application - decided to use implicit big endian
> >> types for network protocols, and has been doing so for decades. And,
> >> with all respect, I don't think the DPDK community has the momentum
> >> required to change this tradition outside the community.
> >
> > I don't think, those types can be use from now on to help new API to
> > expose explicitly the type they are handling.  For older ones, it can
> > come in a second step, even if there are not so numerous.  Only few of
> > them touches the network types.
> >
> >> Furthermore: If not enforced throughout DPDK (and beyond), it might
> >> confuse more than it helps.
> >
> > The current situation is more confusing,  nobody at any layer can rely
> > on a precise information, at each function entry we need to verify if
> > the callee has already handled the job.  The only solution is to
> > browse the code to have this information.
> >
> > Think about any function manipulating network headers (like flow
> > director or rte_flow) from the API down to the PMD, it may take a lot
> > of time to know at the end if the data is CPU or network ordered, with
> > those types it takes less than a second.
> 
> Still Documentation should handle this problem without code and ABI
> changes.
> 

While my additional suggestion of compiler-enforced endian correctness may break the ABI (though even that is not certain since parameters would be the same size, just from a compiler syntax analysis side the result would be different, I think), if I'm reading it correctly, Neilo's original suggestion of just using typedefs for big or little endian won't affect the ABI as the underlying types are exactly the same as before. Only the type name has changed to document for the user the endianness the expected data is to be. In effect, the original suggestion is a documentation patch - just the code is the documentation.

Regards,
/Bruce


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  @ 2016-12-06 16:31  3%                     ` Wiles, Keith
  2016-12-06 16:36  4%                       ` Richardson, Bruce
  0 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2016-12-06 16:31 UTC (permalink / raw)
  To: Nélio Laranjeiro
  Cc: Morten Brørup, Ananyev, Konstantin, Richardson, Bruce, DPDK,
	Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil


> On Dec 6, 2016, at 10:28 AM, Nélio Laranjeiro <nelio.laranjeiro@6wind.com> wrote:
> 
> Hi all,
> 
> On Tue, Dec 06, 2016 at 04:34:07PM +0100, Morten Brørup wrote:
>> Hi all,
>> 
>> Being a big fan of strong typing, I really like the concept of
>> explicit endian types. Especially if type mismatches can be caught at
>> compile time.
> 
> +1,
> 
>> However, I think it is too late! That train left the station when the
>> rest of the world - including libraries and headers that might be
>> linked with a DPDK application - decided to use implicit big endian
>> types for network protocols, and has been doing so for decades. And,
>> with all respect, I don't think the DPDK community has the momentum
>> required to change this tradition outside the community.
> 
> I don't think, those types can be use from now on to help new API to
> expose explicitly the type they are handling.  For older ones, it can
> come in a second step, even if there are not so numerous.  Only few of
> them touches the network types.
> 
>> Furthermore: If not enforced throughout DPDK (and beyond), it might
>> confuse more than it helps.
> 
> The current situation is more confusing,  nobody at any layer can rely
> on a precise information, at each function entry we need to verify if
> the callee has already handled the job.  The only solution is to browse
> the code to have this information.
> 
> Think about any function manipulating network headers (like flow director
> or rte_flow) from the API down to the PMD, it may take a lot of time to
> know at the end if the data is CPU or network ordered, with those types
> it takes less than a second.

Still Documentation should handle this problem without code and ABI changes.

> 
> Regards,
> 
> -- 
> Nélio Laranjeiro
> 6WIND

Regards,
Keith


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 14:45  3%             ` Ananyev, Konstantin
@ 2016-12-06 14:56  4%               ` Wiles, Keith
    0 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2016-12-06 14:56 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Richardson, Bruce, Nélio Laranjeiro, dev, Olivier Matz, Lu,
	Wenzhuo, Adrien Mazarguil


> On Dec 6, 2016, at 8:45 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
> 
> 
> 
>> 
>> On Tue, Dec 06, 2016 at 12:41:00PM +0000, Ananyev, Konstantin wrote:
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Richardson, Bruce
>>>> Sent: Tuesday, December 6, 2016 11:55 AM
>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>> Cc: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>; dev@dpdk.org; Olivier Matz <olivier.matz@6wind.com>; Lu, Wenzhuo
>>>> <wenzhuo.lu@intel.com>; Adrien Mazarguil <adrien.mazarguil@6wind.com>
>>>> Subject: Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
>>>> 
>>>> On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
>>>>> Hi Neilo,
>>>>> 
>>>>> 
>>>>> Hi Neilo,
>>>>>>>> 
>>>>>>>> This commit introduces new rte_{le,be}{16,32,64}_t types and updates
>>>>>>>> rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
>>>>>>>> accordingly.
>>>>>>>> 
>>>>>>>> Specific big/little endian types avoid uncertainty and conversion mistakes.
>>>>>>>> 
>>>>>>>> No ABI change since these are simply typedefs to the original types.
>>>>>>> 
>>>>>>> It seems like quite a lot of changes...
>>>>>>> Could you probably explain what will be the benefit in return?
>>>>>>> Konstantin
>>>>>> 
>>>>>> Hi Konstantin,
>>>>>> 
>>>>>> The benefit is to provide documented byte ordering for data types
>>>>>> software is manipulating to determine when network to CPU (or CPU to
>>>>>> network) conversion must be performed.
>>>>> 
>>>>> Ok, but is it really worth it?
>>>>> User can still make a mistake and forget to call ntoh()/hton() at some particular place.
>>>>> From other side most people do know that network protocols headers are usually in BE format.
>>>>> I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
>>>>> based on these special types or so.
>>>>> Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
>>>>> (and might be  in some others too) to be consistent?
>>>>> Konstantin
>>>>> 
>>>> 
>>>> I actually quite like this patch as I think it will help make things
>>>> clear when the user is possibly doing something wrong. I don't think we
>>>> need to globally change all PMDs to use the types, though.
>>> 
>>> Ok, so where do you believe we should draw a line?
>>> Why let say inside lib/librte_net people should use these typedefs, but
>>> inside drivers/net/ixgbe they don't?
>> 
>> Because those are not public APIs. It would be great if driver writers
>> used the typedefs, but I don't think it should be mandatory.
> 
> Ok, so only public API would have to use these typedefs when appropriate, correct?
> I still think that the effort to make these changes and keep this rule outweighs the benefit,
> but ok if everyone else think it is useful - I wouldn't object too much. 

I believe the effort and advantages to this change have no real benefit when you can document the type in the function header. Adding a structure around the simple type just adds more typing and still will be difficult to manage even if it gives some compiler checking. The change would not prevent someone putting a BE value into a LE variable, right?

I would not like to see this type of change when documentation would be enough here. Breaking the ABI is a big thing here for a large number of APIs. We keep breaking the ABI and we need to stop doing it on every release of DPDK.

> 
>> 
>>> 
>>>> 
>>>> One thing I'm wondering though, is if we might want to take this
>>>> further. For little endian environments, we could define the big endian
>>>> types as structs using typedefs, and similarly the le types on be
>>>> platforms, so that assigning from the non-native type to the native one
>>>> without a transformation function would cause a compiler error.
>>> 
>>> Not sure I understand you here.
>>> Could you possibly provide some example?
>>> 
>> typedef struct {
>> 	short val;
>> } rte_be16_t;
> 
> Hmm, so:
> uint32_t x = rte_be_to_cpu_32(1);
> would suddenly stop compiling?
> That definitely looks like an ABI breakage to me.
> Konstantin
> 
>> 
>> That way if you try to assign a value of type rte_be16_t to a uint16_t
>> variable you'll get a compiler error, unless you use an appropriate
>> conversion function. In short, it changes things from not just looking
>> wrong - which is the main purpose of Neilo's patchset - to actually
>> making it incorrect from the compiler's point of view too.
>> 
>> /Bruce

Regards,
Keith


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 13:34  0%           ` Bruce Richardson
@ 2016-12-06 14:45  3%             ` Ananyev, Konstantin
  2016-12-06 14:56  4%               ` Wiles, Keith
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-12-06 14:45 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Nélio Laranjeiro, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil



> 
> On Tue, Dec 06, 2016 at 12:41:00PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Richardson, Bruce
> > > Sent: Tuesday, December 6, 2016 11:55 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>; dev@dpdk.org; Olivier Matz <olivier.matz@6wind.com>; Lu, Wenzhuo
> > > <wenzhuo.lu@intel.com>; Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Subject: Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
> > >
> > > On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
> > > > Hi Neilo,
> > > >
> > > >
> > > > Hi Neilo,
> > > > > > >
> > > > > > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > > > > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > > > > > accordingly.
> > > > > > >
> > > > > > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > > > > > >
> > > > > > > No ABI change since these are simply typedefs to the original types.
> > > > > >
> > > > > > It seems like quite a lot of changes...
> > > > > > Could you probably explain what will be the benefit in return?
> > > > > > Konstantin
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > The benefit is to provide documented byte ordering for data types
> > > > > software is manipulating to determine when network to CPU (or CPU to
> > > > > network) conversion must be performed.
> > > >
> > > > Ok, but is it really worth it?
> > > > User can still make a mistake and forget to call ntoh()/hton() at some particular place.
> > > > From other side most people do know that network protocols headers are usually in BE format.
> > > > I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
> > > > based on these special types or so.
> > > > Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
> > > > (and might be  in some others too) to be consistent?
> > > > Konstantin
> > > >
> > >
> > > I actually quite like this patch as I think it will help make things
> > > clear when the user is possibly doing something wrong. I don't think we
> > > need to globally change all PMDs to use the types, though.
> >
> > Ok, so where do you believe we should draw a line?
> > Why let say inside lib/librte_net people should use these typedefs, but
> > inside drivers/net/ixgbe they don't?
> 
> Because those are not public APIs. It would be great if driver writers
> used the typedefs, but I don't think it should be mandatory.

Ok, so only public API would have to use these typedefs when appropriate, correct?
I still think that the effort to make these changes and keep this rule outweighs the benefit,
but ok if everyone else think it is useful - I wouldn't object too much. 

> 
> >
> > >
> > > One thing I'm wondering though, is if we might want to take this
> > > further. For little endian environments, we could define the big endian
> > > types as structs using typedefs, and similarly the le types on be
> > > platforms, so that assigning from the non-native type to the native one
> > > without a transformation function would cause a compiler error.
> >
> > Not sure I understand you here.
> > Could you possibly provide some example?
> >
> typedef struct {
> 	short val;
> } rte_be16_t;

Hmm, so:
uint32_t x = rte_be_to_cpu_32(1);
would suddenly stop compiling?
That definitely looks like an ABI breakage to me.
Konstantin

> 
> That way if you try to assign a value of type rte_be16_t to a uint16_t
> variable you'll get a compiler error, unless you use an appropriate
> conversion function. In short, it changes things from not just looking
> wrong - which is the main purpose of Neilo's patchset - to actually
> making it incorrect from the compiler's point of view too.
> 
> /Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-05 12:06  0%   ` Nélio Laranjeiro
  2016-12-06 11:23  0%     ` Ananyev, Konstantin
@ 2016-12-06 14:06  0%     ` Wiles, Keith
  1 sibling, 0 replies; 200+ results
From: Wiles, Keith @ 2016-12-06 14:06 UTC (permalink / raw)
  To: Nélio Laranjeiro
  Cc: Ananyev, Konstantin, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil


> On Dec 5, 2016, at 6:06 AM, Nélio Laranjeiro <nelio.laranjeiro@6wind.com> wrote:
> 
> On Mon, Dec 05, 2016 at 10:09:05AM +0000, Ananyev, Konstantin wrote:
>> Hi Neilo,
>> 
>>> 
>>> This commit introduces new rte_{le,be}{16,32,64}_t types and updates
>>> rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
>>> accordingly.
>>> 
>>> Specific big/little endian types avoid uncertainty and conversion mistakes.
>>> 
>>> No ABI change since these are simply typedefs to the original types.
>> 
>> It seems like quite a lot of changes...
>> Could you probably explain what will be the benefit in return?
>> Konstantin
> 
> Hi Konstantin,
> 
> The benefit is to provide documented byte ordering for data types
> software is manipulating to determine when network to CPU (or CPU to
> network) conversion must be performed.

Why can we not just document the variables with doxygen as to the BE or LE expected type. Adding a new type is not going to solve the problem of someone using it incorrectly. In the function header doc it should state the expected type.

Adding yet another type is just going to confuse people as some drivers or code may never get changed. Also some drivers are common to many other systems, which means we would have to move the typedefs over to those systems like Linux and FreeBSD for the common sections.

> 
> Regards,
> 
> -- 
> Nélio Laranjeiro
> 6WIND

Regards,
Keith


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 12:41  0%         ` Ananyev, Konstantin
@ 2016-12-06 13:34  0%           ` Bruce Richardson
  2016-12-06 14:45  3%             ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2016-12-06 13:34 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Nélio Laranjeiro, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil

On Tue, Dec 06, 2016 at 12:41:00PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Tuesday, December 6, 2016 11:55 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>; dev@dpdk.org; Olivier Matz <olivier.matz@6wind.com>; Lu, Wenzhuo
> > <wenzhuo.lu@intel.com>; Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > Subject: Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
> > 
> > On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
> > > Hi Neilo,
> > >
> > >
> > > Hi Neilo,
> > > > > >
> > > > > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > > > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > > > > accordingly.
> > > > > >
> > > > > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > > > > >
> > > > > > No ABI change since these are simply typedefs to the original types.
> > > > >
> > > > > It seems like quite a lot of changes...
> > > > > Could you probably explain what will be the benefit in return?
> > > > > Konstantin
> > > >
> > > > Hi Konstantin,
> > > >
> > > > The benefit is to provide documented byte ordering for data types
> > > > software is manipulating to determine when network to CPU (or CPU to
> > > > network) conversion must be performed.
> > >
> > > Ok, but is it really worth it?
> > > User can still make a mistake and forget to call ntoh()/hton() at some particular place.
> > > From other side most people do know that network protocols headers are usually in BE format.
> > > I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
> > > based on these special types or so.
> > > Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
> > > (and might be  in some others too) to be consistent?
> > > Konstantin
> > >
> > 
> > I actually quite like this patch as I think it will help make things
> > clear when the user is possibly doing something wrong. I don't think we
> > need to globally change all PMDs to use the types, though.
> 
> Ok, so where do you believe we should draw a line?
> Why let say inside lib/librte_net people should use these typedefs, but
> inside drivers/net/ixgbe they don't?

Because those are not public APIs. It would be great if driver writers
used the typedefs, but I don't think it should be mandatory.

>  
> > 
> > One thing I'm wondering though, is if we might want to take this
> > further. For little endian environments, we could define the big endian
> > types as structs using typedefs, and similarly the le types on be
> > platforms, so that assigning from the non-native type to the native one
> > without a transformation function would cause a compiler error.
> 
> Not sure I understand you here.
> Could you possibly provide some example?
> 
typedef struct {
	short val;
} rte_be16_t;

That way if you try to assign a value of type rte_be16_t to a uint16_t
variable you'll get a compiler error, unless you use an appropriate
conversion function. In short, it changes things from not just looking
wrong - which is the main purpose of Neilo's patchset - to actually
making it incorrect from the compiler's point of view too.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 13:14  0%         ` Nélio Laranjeiro
@ 2016-12-06 13:30  0%           ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2016-12-06 13:30 UTC (permalink / raw)
  To: Nélio Laranjeiro
  Cc: Ananyev, Konstantin, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil

On Tue, Dec 06, 2016 at 02:14:17PM +0100, Nélio Laranjeiro wrote:
> Hi Konstantin, Bruce,
> 
> On Tue, Dec 06, 2016 at 11:55:02AM +0000, Bruce Richardson wrote:
> > On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
> > > Hi Neilo,
> > > 
> > > 
> > > Hi Neilo,
> > > > > >
> > > > > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > > > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > > > > accordingly.
> > > > > >
> > > > > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > > > > >
> > > > > > No ABI change since these are simply typedefs to the original types.
> > > > >
> > > > > It seems like quite a lot of changes...
> > > > > Could you probably explain what will be the benefit in return?
> > > > > Konstantin
> > > > 
> > > > Hi Konstantin,
> > > > 
> > > > The benefit is to provide documented byte ordering for data types
> > > > software is manipulating to determine when network to CPU (or CPU to
> > > > network) conversion must be performed.
> > > 
> > > Ok, but is it really worth it?
> > > User can still make a mistake and forget to call ntoh()/hton() at some particular place.
> > > From other side most people do know that network protocols headers are usually in BE format. 
> > > I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
> > > based on these special types or so.
> > > Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
> > > (and might be  in some others too) to be consistent?
> > > Konstantin
> > > 
> > 
> > I actually quite like this patch as I think it will help make things
> > clear when the user is possibly doing something wrong. I don't think we
> > need to globally change all PMDs to use the types, though.
> 
> I agree, at least APIs should use this, PMDs can do as they want.
> 
> > One thing I'm wondering though, is if we might want to take this
> > further. For little endian environments, we could define the big endian
> > types as structs using typedefs, and similarly the le types on be
> > platforms, so that assigning from the non-native type to the native one
> > without a transformation function would cause a compiler error.
> > 
> > /Bruce
> 
> If I understand you correctly, this will break hton like functions which
> expects an uint*_t not a structure.
> 
Yes, it would break the standard ones, which is the downside of doing
this. We could try "fixing" that with a macro, but that too won't always
work. It's a question of whether the additional safety given by having
the compiler flag an error on an invalid assignment, e.g. of a big-endian
value to a native-little endian value, is worth having to change existing
code using htons to use e.g. rte_htons. Given the cost of changing a lot of
existing code, it may just not be worthwhile, but I thought I'd suggest
it anyway as a way of even better guaranteeing endian-ness safety.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 11:55  0%       ` Bruce Richardson
  2016-12-06 12:41  0%         ` Ananyev, Konstantin
@ 2016-12-06 13:14  0%         ` Nélio Laranjeiro
  2016-12-06 13:30  0%           ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Nélio Laranjeiro @ 2016-12-06 13:14 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Ananyev, Konstantin, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil

Hi Konstantin, Bruce,

On Tue, Dec 06, 2016 at 11:55:02AM +0000, Bruce Richardson wrote:
> On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
> > Hi Neilo,
> > 
> > 
> > Hi Neilo,
> > > > >
> > > > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > > > accordingly.
> > > > >
> > > > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > > > >
> > > > > No ABI change since these are simply typedefs to the original types.
> > > >
> > > > It seems like quite a lot of changes...
> > > > Could you probably explain what will be the benefit in return?
> > > > Konstantin
> > > 
> > > Hi Konstantin,
> > > 
> > > The benefit is to provide documented byte ordering for data types
> > > software is manipulating to determine when network to CPU (or CPU to
> > > network) conversion must be performed.
> > 
> > Ok, but is it really worth it?
> > User can still make a mistake and forget to call ntoh()/hton() at some particular place.
> > From other side most people do know that network protocols headers are usually in BE format. 
> > I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
> > based on these special types or so.
> > Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
> > (and might be  in some others too) to be consistent?
> > Konstantin
> > 
> 
> I actually quite like this patch as I think it will help make things
> clear when the user is possibly doing something wrong. I don't think we
> need to globally change all PMDs to use the types, though.

I agree, at least APIs should use this, PMDs can do as they want.

> One thing I'm wondering though, is if we might want to take this
> further. For little endian environments, we could define the big endian
> types as structs using typedefs, and similarly the le types on be
> platforms, so that assigning from the non-native type to the native one
> without a transformation function would cause a compiler error.
> 
> /Bruce

If I understand you correctly, this will break hton like functions which
expects an uint*_t not a structure.

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 11:55  0%       ` Bruce Richardson
@ 2016-12-06 12:41  0%         ` Ananyev, Konstantin
  2016-12-06 13:34  0%           ` Bruce Richardson
  2016-12-06 13:14  0%         ` Nélio Laranjeiro
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-12-06 12:41 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Nélio Laranjeiro, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil



> -----Original Message-----
> From: Richardson, Bruce
> Sent: Tuesday, December 6, 2016 11:55 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>; dev@dpdk.org; Olivier Matz <olivier.matz@6wind.com>; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
> 
> On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
> > Hi Neilo,
> >
> >
> > Hi Neilo,
> > > > >
> > > > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > > > accordingly.
> > > > >
> > > > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > > > >
> > > > > No ABI change since these are simply typedefs to the original types.
> > > >
> > > > It seems like quite a lot of changes...
> > > > Could you probably explain what will be the benefit in return?
> > > > Konstantin
> > >
> > > Hi Konstantin,
> > >
> > > The benefit is to provide documented byte ordering for data types
> > > software is manipulating to determine when network to CPU (or CPU to
> > > network) conversion must be performed.
> >
> > Ok, but is it really worth it?
> > User can still make a mistake and forget to call ntoh()/hton() at some particular place.
> > From other side most people do know that network protocols headers are usually in BE format.
> > I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
> > based on these special types or so.
> > Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
> > (and might be  in some others too) to be consistent?
> > Konstantin
> >
> 
> I actually quite like this patch as I think it will help make things
> clear when the user is possibly doing something wrong. I don't think we
> need to globally change all PMDs to use the types, though.

Ok, so where do you believe we should draw a line?
Why let say inside lib/librte_net people should use these typedefs, but
inside drivers/net/ixgbe they don't?
 
> 
> One thing I'm wondering though, is if we might want to take this
> further. For little endian environments, we could define the big endian
> types as structs using typedefs, and similarly the le types on be
> platforms, so that assigning from the non-native type to the native one
> without a transformation function would cause a compiler error.

Not sure I understand you here.
Could you possibly provide some example?

Konstantin

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-06 11:23  0%     ` Ananyev, Konstantin
@ 2016-12-06 11:55  0%       ` Bruce Richardson
  2016-12-06 12:41  0%         ` Ananyev, Konstantin
  2016-12-06 13:14  0%         ` Nélio Laranjeiro
  0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2016-12-06 11:55 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Nélio Laranjeiro, dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil

On Tue, Dec 06, 2016 at 11:23:42AM +0000, Ananyev, Konstantin wrote:
> Hi Neilo,
> 
> 
> Hi Neilo,
> > > >
> > > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > > accordingly.
> > > >
> > > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > > >
> > > > No ABI change since these are simply typedefs to the original types.
> > >
> > > It seems like quite a lot of changes...
> > > Could you probably explain what will be the benefit in return?
> > > Konstantin
> > 
> > Hi Konstantin,
> > 
> > The benefit is to provide documented byte ordering for data types
> > software is manipulating to determine when network to CPU (or CPU to
> > network) conversion must be performed.
> 
> Ok, but is it really worth it?
> User can still make a mistake and forget to call ntoh()/hton() at some particular place.
> From other side most people do know that network protocols headers are usually in BE format. 
> I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
> based on these special types or so.
> Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
> (and might be  in some others too) to be consistent?
> Konstantin
> 

I actually quite like this patch as I think it will help make things
clear when the user is possibly doing something wrong. I don't think we
need to globally change all PMDs to use the types, though.

One thing I'm wondering though, is if we might want to take this
further. For little endian environments, we could define the big endian
types as structs using typedefs, and similarly the le types on be
platforms, so that assigning from the non-native type to the native one
without a transformation function would cause a compiler error.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-05 12:06  0%   ` Nélio Laranjeiro
@ 2016-12-06 11:23  0%     ` Ananyev, Konstantin
  2016-12-06 11:55  0%       ` Bruce Richardson
  2016-12-06 14:06  0%     ` Wiles, Keith
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-12-06 11:23 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil

Hi Neilo,


Hi Neilo,
> > >
> > > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > > accordingly.
> > >
> > > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > >
> > > No ABI change since these are simply typedefs to the original types.
> >
> > It seems like quite a lot of changes...
> > Could you probably explain what will be the benefit in return?
> > Konstantin
> 
> Hi Konstantin,
> 
> The benefit is to provide documented byte ordering for data types
> software is manipulating to determine when network to CPU (or CPU to
> network) conversion must be performed.

Ok, but is it really worth it?
User can still make a mistake and forget to call ntoh()/hton() at some particular place.
>From other side most people do know that network protocols headers are usually in BE format. 
I would understand the effort, if we'll have some sort of tool that would do some sort of static code analysis
based on these special types or so.
Again, does it mean that we should go and change uint32_t to rte_le_32 inside all Intel PMDs
(and might be  in some others too) to be consistent?
Konstantin

> 
> Regards,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Intent to upstream Atomic Rules net/ark "Arkville" in DPDK 17.05
  2016-12-03 15:14  3% [dpdk-dev] Intent to upstream Atomic Rules net/ark "Arkville" in DPDK 17.05 Shepard Siegel
@ 2016-12-05 14:10  0% ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2016-12-05 14:10 UTC (permalink / raw)
  To: Shepard Siegel, dev

On 12/3/2016 3:14 PM, Shepard Siegel wrote:
> Atomic Rules would like to include our Arkville DPDK PMD net/ark in the
> DPDK 17.05 release.

Welcome to the DPDK community, it is great to see new hardware support.

> We have been watching the recent process of
> Solarflare’s net/sfc upstreaming and we decided it would be too aggressive
> for us to get in on 17.02. Rather than be the last in queue for 17.02, we
> would prefer to be one of the first in the queue for 17.05. This post is
> our statement of that intent.

We already have two new physical PMDs this release: sfc and DPAA2 and a
few virtual ones, it is good for reviewers to have some in 17.05 J

> 
> 
> Arkville is a product from Atomic Rules which is a combination of hardware
> and software. In the DPDK community, the easy way to describe Arkville is
> that it is a line-rate agnostic FPGA-based NIC that does include any
> specific MAC. Arkville is unique in that the design process worked backward
> from the DPDK API/ABI to allow us to design RTL DPDK-aware data movers.
> Arkville’s customers are the small and brave set of users that demand an
> FPGA exist between their MAC ports and their host. A link to a slide deck
> and product preview shown last month at SC16 is at the end of this post.
> 
> 
> Although we’ve done substantial testing; we are just now setting up a
> proper DTS environment. Our first course of business is to add two 10 GbE
> ports and make Arkville look like a Fortville X710-DA2. This is strange for
> us because we started out with four 100 GbE ports, and not much else to
> talk to! We are eager to work with merchant 100 GbE ASIC NICs to help bring
> DTS into the 100 GbE realm. But 100 GbE aside, as soon as we see our
> net/ark PMD playing nice in DTS with a Fortville, and the 17.05 aperture
> opens;  we will commence the patch submission process.

You don't have to wait for 17.05 merge window, you can send the patches
with note that it is targeted for 17.05. It is good to send patches as
early as possible.

> 
> 
> Thanks all who have helped us get this far so soon. Anyone needing
> additional details that aren’t DPDK community wide, please contact me
> directly.
> 
> 
> Shep for AR Team
> 
> 
> Shepard Siegel, CTO
> 
> atomicrules.com
> 
> 
> 
> Links:
> 
> https://dl.dropboxusercontent.com/u/5548901/share/AtomicRules_Arkville_SC16.pdf
> 
> 
> <https://dl.dropboxusercontent.com/u/5548901/share/AtomicRules_Arkville_SC16.pdf>
> 
> https://forums.xilinx.com/t5/Xcell-Daily-Blog/BittWare-s-UltraScale-XUPP3R-board-and-Atomic-Rules-IP-run-Intel/ba-p/734110
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 2/2] eal: rename dev init API for consistency
  2016-12-05 10:24  3%       ` Jerin Jacob
@ 2016-12-05 14:03  0%         ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-12-05 14:03 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, declan.doherty, david.marchand, thomas.monjalon

On Monday 05 December 2016 03:54 PM, Jerin Jacob wrote:
> On Mon, Dec 05, 2016 at 03:42:18PM +0530, Shreyansh Jain wrote:
>> Hello Jerin,
>
> Hello Shreyansh,
>
>>
>> On Sunday 04 December 2016 02:25 AM, Jerin Jacob wrote:
>>> rte_eal_dev_init() is a misleading name.
>>> It actually performs the driver->probe for vdev,
>>> which is parallel to rte_eal_pci_probe.
>>>
>>> Changed to rte_eal_vdev_probe for consistency and
>>> moved the vdev specific probe to eal_common_vdev.c
>>>
>>> Suggested-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> ---
>>> +int
>>> +rte_eal_vdev_probe(void)
>>> +{
>>> +	struct rte_devargs *devargs;
>>> +
>>> +	/*
>>> +	 * Note that the dev_driver_list is populated here
>>> +	 * from calls made to rte_eal_driver_register from constructor functions
>>> +	 * embedded into PMD modules via the RTE_PMD_REGISTER_VDEV macro
>>> +	 */
>>> +
>>> +	/* call the init function for each virtual device */
>>> +	TAILQ_FOREACH(devargs, &devargs_list, next) {
>>> +
>>> +		if (devargs->type != RTE_DEVTYPE_VIRTUAL)
>>> +			continue;
>>> +
>>> +		if (rte_eal_vdev_init(devargs->virt.drv_name,
>>
>> The situation now is:
>> rte_eal_init=>rte_eal_vdev_probe()=>rte_eal_vdev_init()=> driver->probe()
>>
>> Even though I had suggested this, my intention was to completely do away
>> with rte_*_[v]dev_init as it is misleading.
>>
>> rte_eal_init=>rte_eal_vdev_probe=>driver->probe()
>
> IMO, We don't need to remove rte_eal_vdev_init() as it is an
> application API that uses to create vdev driver instance.Moreover,
> change and removing that name will result in ABI breakage.
>
> grep -ri "rte_eal_vdev_init" app/
> app/test/test_cryptodev.c:				ret = rte_eal_vdev_init(
> app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
> app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
> app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
> app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
> app/test/test_cryptodev.c: int dev_id = rte_eal_vdev_init(
> app/test/test_cryptodev.c: ret = rte_eal_vdev_init(
> app/test/test_cryptodev_perf.c: ret = rte_eal_vdev_init(
> app/test/test_cryptodev_perf.c:	ret = rte_eal_vdev_init(
> app/test/test_cryptodev_perf.c:	ret = rte_eal_vdev_init(
> app/test/test_cryptodev_perf.c:	ret = rte_eal_vdev_init(
>

Got it.

Have you noticed patches from Ben which actually merges init and probe 
all together? [1]. It is for PCI right now (and that too would break 
ABIs, I am assuming).

[1] http://dpdk.org/dev/patchwork/patch/17206/

>
>>
>> should be the ideal order, IMO.
>> Apologies, I was not completely clear then.
>>
>>> +					devargs->args)) {
>>> +			RTE_LOG(ERR, EAL, "failed to initialize %s device\n",
>>> +					devargs->virt.drv_name);
>>> +			return -1;
>>> +		}
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
>>> index 8840380..146f505 100644
>>> --- a/lib/librte_eal/common/include/rte_dev.h
>>> +++ b/lib/librte_eal/common/include/rte_dev.h
>>> @@ -171,9 +171,9 @@ void rte_eal_driver_register(struct rte_driver *driver);
>>>  void rte_eal_driver_unregister(struct rte_driver *driver);
>>>
>>>  /**
>>> - * Initalize all the registered drivers in this process
>>> + * Probe all the registered vdev drivers in this process
>>>   */
>>> -int rte_eal_dev_init(void);
>>> +int rte_eal_vdev_probe(void);
>>>
>>>  /**
>>>   * Initialize a driver specified by name.
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>>> index 16dd5b9..faf75cf 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>>> @@ -884,8 +884,8 @@ rte_eal_init(int argc, char **argv)
>>>  	if (rte_eal_pci_probe())
>>>  		rte_panic("Cannot probe PCI\n");
>>>
>>> -	if (rte_eal_dev_init() < 0)
>>> -		rte_panic("Cannot init pmd devices\n");
>>> +	if (rte_eal_vdev_probe() < 0)
>>> +		rte_panic("Cannot probe vdev drivers\n");
>>>
>>>  	rte_eal_mcfg_complete();
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> index 83721ba..67fc95b 100644
>>> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
>>> @@ -22,7 +22,7 @@ DPDK_2.0 {
>>>  	rte_dump_tailq;
>>>  	rte_eal_alarm_cancel;
>>>  	rte_eal_alarm_set;
>>> -	rte_eal_dev_init;
>>> +	rte_eal_vdev_probe;
>>>  	rte_eal_devargs_add;
>>>  	rte_eal_devargs_dump;
>>>  	rte_eal_devargs_type_count;
>>>
>>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-12-05 10:09  0% ` Ananyev, Konstantin
@ 2016-12-05 12:06  0%   ` Nélio Laranjeiro
  2016-12-06 11:23  0%     ` Ananyev, Konstantin
  2016-12-06 14:06  0%     ` Wiles, Keith
  0 siblings, 2 replies; 200+ results
From: Nélio Laranjeiro @ 2016-12-05 12:06 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, Olivier Matz, Lu, Wenzhuo, Adrien Mazarguil

On Mon, Dec 05, 2016 at 10:09:05AM +0000, Ananyev, Konstantin wrote:
> Hi Neilo,
> 
> > 
> > This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> > rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> > accordingly.
> > 
> > Specific big/little endian types avoid uncertainty and conversion mistakes.
> > 
> > No ABI change since these are simply typedefs to the original types.
> 
> It seems like quite a lot of changes...
> Could you probably explain what will be the benefit in return?
> Konstantin

Hi Konstantin,

The benefit is to provide documented byte ordering for data types
software is manipulating to determine when network to CPU (or CPU to
network) conversion must be performed.

Regards,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 2/2] eal: rename dev init API for consistency
  @ 2016-12-05 10:24  3%       ` Jerin Jacob
  2016-12-05 14:03  0%         ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2016-12-05 10:24 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, declan.doherty, david.marchand, thomas.monjalon

On Mon, Dec 05, 2016 at 03:42:18PM +0530, Shreyansh Jain wrote:
> Hello Jerin,

Hello Shreyansh,

> 
> On Sunday 04 December 2016 02:25 AM, Jerin Jacob wrote:
> > rte_eal_dev_init() is a misleading name.
> > It actually performs the driver->probe for vdev,
> > which is parallel to rte_eal_pci_probe.
> > 
> > Changed to rte_eal_vdev_probe for consistency and
> > moved the vdev specific probe to eal_common_vdev.c
> > 
> > Suggested-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > +int
> > +rte_eal_vdev_probe(void)
> > +{
> > +	struct rte_devargs *devargs;
> > +
> > +	/*
> > +	 * Note that the dev_driver_list is populated here
> > +	 * from calls made to rte_eal_driver_register from constructor functions
> > +	 * embedded into PMD modules via the RTE_PMD_REGISTER_VDEV macro
> > +	 */
> > +
> > +	/* call the init function for each virtual device */
> > +	TAILQ_FOREACH(devargs, &devargs_list, next) {
> > +
> > +		if (devargs->type != RTE_DEVTYPE_VIRTUAL)
> > +			continue;
> > +
> > +		if (rte_eal_vdev_init(devargs->virt.drv_name,
> 
> The situation now is:
> rte_eal_init=>rte_eal_vdev_probe()=>rte_eal_vdev_init()=> driver->probe()
> 
> Even though I had suggested this, my intention was to completely do away
> with rte_*_[v]dev_init as it is misleading.
> 
> rte_eal_init=>rte_eal_vdev_probe=>driver->probe()

IMO, We don't need to remove rte_eal_vdev_init() as it is an
application API that uses to create vdev driver instance.Moreover,
change and removing that name will result in ABI breakage.

grep -ri "rte_eal_vdev_init" app/
app/test/test_cryptodev.c:				ret = rte_eal_vdev_init(
app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
app/test/test_cryptodev.c: TEST_ASSERT_SUCCESS(rte_eal_vdev_init(
app/test/test_cryptodev.c: int dev_id = rte_eal_vdev_init(
app/test/test_cryptodev.c: ret = rte_eal_vdev_init(
app/test/test_cryptodev_perf.c: ret = rte_eal_vdev_init(
app/test/test_cryptodev_perf.c:	ret = rte_eal_vdev_init(
app/test/test_cryptodev_perf.c:	ret = rte_eal_vdev_init(
app/test/test_cryptodev_perf.c:	ret = rte_eal_vdev_init(


> 
> should be the ideal order, IMO.
> Apologies, I was not completely clear then.
> 
> > +					devargs->args)) {
> > +			RTE_LOG(ERR, EAL, "failed to initialize %s device\n",
> > +					devargs->virt.drv_name);
> > +			return -1;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> > index 8840380..146f505 100644
> > --- a/lib/librte_eal/common/include/rte_dev.h
> > +++ b/lib/librte_eal/common/include/rte_dev.h
> > @@ -171,9 +171,9 @@ void rte_eal_driver_register(struct rte_driver *driver);
> >  void rte_eal_driver_unregister(struct rte_driver *driver);
> > 
> >  /**
> > - * Initalize all the registered drivers in this process
> > + * Probe all the registered vdev drivers in this process
> >   */
> > -int rte_eal_dev_init(void);
> > +int rte_eal_vdev_probe(void);
> > 
> >  /**
> >   * Initialize a driver specified by name.
> > diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> > index 16dd5b9..faf75cf 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal.c
> > @@ -884,8 +884,8 @@ rte_eal_init(int argc, char **argv)
> >  	if (rte_eal_pci_probe())
> >  		rte_panic("Cannot probe PCI\n");
> > 
> > -	if (rte_eal_dev_init() < 0)
> > -		rte_panic("Cannot init pmd devices\n");
> > +	if (rte_eal_vdev_probe() < 0)
> > +		rte_panic("Cannot probe vdev drivers\n");
> > 
> >  	rte_eal_mcfg_complete();
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> > index 83721ba..67fc95b 100644
> > --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> > +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> > @@ -22,7 +22,7 @@ DPDK_2.0 {
> >  	rte_dump_tailq;
> >  	rte_eal_alarm_cancel;
> >  	rte_eal_alarm_set;
> > -	rte_eal_dev_init;
> > +	rte_eal_vdev_probe;
> >  	rte_eal_devargs_add;
> >  	rte_eal_devargs_dump;
> >  	rte_eal_devargs_type_count;
> > 
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] net: introduce big and little endian types
  2016-11-09 15:04  2% [dpdk-dev] [PATCH] net: introduce big and little endian types Nelio Laranjeiro
@ 2016-12-05 10:09  0% ` Ananyev, Konstantin
  2016-12-05 12:06  0%   ` Nélio Laranjeiro
  2016-12-08  9:30  3% ` Nélio Laranjeiro
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2016-12-05 10:09 UTC (permalink / raw)
  To: Nelio Laranjeiro, dev, Olivier Matz; +Cc: Lu, Wenzhuo, Adrien Mazarguil

Hi Neilo,

> 
> This commit introduces new rte_{le,be}{16,32,64}_t types and updates
> rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
> accordingly.
> 
> Specific big/little endian types avoid uncertainty and conversion mistakes.
> 
> No ABI change since these are simply typedefs to the original types.

It seems like quite a lot of changes...
Could you probably explain what will be the benefit in return?
Konstantin

> 
> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> ---
>  .../common/include/generic/rte_byteorder.h         | 31 +++++++++++-------
>  lib/librte_net/rte_arp.h                           | 15 +++++----
>  lib/librte_net/rte_ether.h                         | 10 +++---
>  lib/librte_net/rte_gre.h                           | 30 ++++++++---------
>  lib/librte_net/rte_icmp.h                          | 11 ++++---
>  lib/librte_net/rte_ip.h                            | 38 +++++++++++-----------
>  lib/librte_net/rte_net.c                           | 10 +++---
>  lib/librte_net/rte_sctp.h                          |  9 ++---
>  lib/librte_net/rte_tcp.h                           | 19 ++++++-----
>  lib/librte_net/rte_udp.h                           |  9 ++---
>  10 files changed, 97 insertions(+), 85 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/generic/rte_byteorder.h b/lib/librte_eal/common/include/generic/rte_byteorder.h
> index e00bccb..059c2a5 100644
> --- a/lib/librte_eal/common/include/generic/rte_byteorder.h
> +++ b/lib/librte_eal/common/include/generic/rte_byteorder.h
> @@ -75,6 +75,13 @@
>  #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
>  #endif
> 
> +typedef uint16_t rte_be16_t;
> +typedef uint32_t rte_be32_t;
> +typedef uint64_t rte_be64_t;
> +typedef uint16_t rte_le16_t;
> +typedef uint32_t rte_le32_t;
> +typedef uint64_t rte_le64_t;
> +
>  /*
>   * An internal function to swap bytes in a 16-bit value.
>   *
> @@ -143,65 +150,65 @@ static uint64_t rte_bswap64(uint64_t x);
>  /**
>   * Convert a 16-bit value from CPU order to little endian.
>   */
> -static uint16_t rte_cpu_to_le_16(uint16_t x);
> +static rte_le16_t rte_cpu_to_le_16(uint16_t x);
> 
>  /**
>   * Convert a 32-bit value from CPU order to little endian.
>   */
> -static uint32_t rte_cpu_to_le_32(uint32_t x);
> +static rte_le32_t rte_cpu_to_le_32(uint32_t x);
> 
>  /**
>   * Convert a 64-bit value from CPU order to little endian.
>   */
> -static uint64_t rte_cpu_to_le_64(uint64_t x);
> +static rte_le64_t rte_cpu_to_le_64(uint64_t x);
> 
> 
>  /**
>   * Convert a 16-bit value from CPU order to big endian.
>   */
> -static uint16_t rte_cpu_to_be_16(uint16_t x);
> +static rte_be16_t rte_cpu_to_be_16(uint16_t x);
> 
>  /**
>   * Convert a 32-bit value from CPU order to big endian.
>   */
> -static uint32_t rte_cpu_to_be_32(uint32_t x);
> +static rte_be32_t rte_cpu_to_be_32(uint32_t x);
> 
>  /**
>   * Convert a 64-bit value from CPU order to big endian.
>   */
> -static uint64_t rte_cpu_to_be_64(uint64_t x);
> +static rte_be64_t rte_cpu_to_be_64(uint64_t x);
> 
> 
>  /**
>   * Convert a 16-bit value from little endian to CPU order.
>   */
> -static uint16_t rte_le_to_cpu_16(uint16_t x);
> +static uint16_t rte_le_to_cpu_16(rte_le16_t x);
> 
>  /**
>   * Convert a 32-bit value from little endian to CPU order.
>   */
> -static uint32_t rte_le_to_cpu_32(uint32_t x);
> +static uint32_t rte_le_to_cpu_32(rte_le32_t x);
> 
>  /**
>   * Convert a 64-bit value from little endian to CPU order.
>   */
> -static uint64_t rte_le_to_cpu_64(uint64_t x);
> +static uint64_t rte_le_to_cpu_64(rte_le64_t x);
> 
> 
>  /**
>   * Convert a 16-bit value from big endian to CPU order.
>   */
> -static uint16_t rte_be_to_cpu_16(uint16_t x);
> +static uint16_t rte_be_to_cpu_16(rte_be16_t x);
> 
>  /**
>   * Convert a 32-bit value from big endian to CPU order.
>   */
> -static uint32_t rte_be_to_cpu_32(uint32_t x);
> +static uint32_t rte_be_to_cpu_32(rte_be32_t x);
> 
>  /**
>   * Convert a 64-bit value from big endian to CPU order.
>   */
> -static uint64_t rte_be_to_cpu_64(uint64_t x);
> +static uint64_t rte_be_to_cpu_64(rte_be64_t x);
> 
>  #endif /* __DOXYGEN__ */
> 
> diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h
> index 1836418..95f123e 100644
> --- a/lib/librte_net/rte_arp.h
> +++ b/lib/librte_net/rte_arp.h
> @@ -40,6 +40,7 @@
> 
>  #include <stdint.h>
>  #include <rte_ether.h>
> +#include <rte_byteorder.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -50,22 +51,22 @@ extern "C" {
>   */
>  struct arp_ipv4 {
>  	struct ether_addr arp_sha;  /**< sender hardware address */
> -	uint32_t          arp_sip;  /**< sender IP address */
> +	rte_be32_t        arp_sip;  /**< sender IP address */
>  	struct ether_addr arp_tha;  /**< target hardware address */
> -	uint32_t          arp_tip;  /**< target IP address */
> +	rte_be32_t        arp_tip;  /**< target IP address */
>  } __attribute__((__packed__));
> 
>  /**
>   * ARP header.
>   */
>  struct arp_hdr {
> -	uint16_t arp_hrd;    /* format of hardware address */
> +	rte_be16_t arp_hrd;  /* format of hardware address */
>  #define ARP_HRD_ETHER     1  /* ARP Ethernet address format */
> 
> -	uint16_t arp_pro;    /* format of protocol address */
> -	uint8_t  arp_hln;    /* length of hardware address */
> -	uint8_t  arp_pln;    /* length of protocol address */
> -	uint16_t arp_op;     /* ARP opcode (command) */
> +	rte_be16_t arp_pro;  /* format of protocol address */
> +	uint8_t    arp_hln;  /* length of hardware address */
> +	uint8_t    arp_pln;  /* length of protocol address */
> +	rte_be16_t arp_op;   /* ARP opcode (command) */
>  #define	ARP_OP_REQUEST    1 /* request to resolve address */
>  #define	ARP_OP_REPLY      2 /* response to previous request */
>  #define	ARP_OP_REVREQUEST 3 /* request proto addr given hardware */
> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index ff3d065..159e061 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -300,7 +300,7 @@ ether_format_addr(char *buf, uint16_t size,
>  struct ether_hdr {
>  	struct ether_addr d_addr; /**< Destination address. */
>  	struct ether_addr s_addr; /**< Source address. */
> -	uint16_t ether_type;      /**< Frame type. */
> +	rte_be16_t ether_type;    /**< Frame type. */
>  } __attribute__((__packed__));
> 
>  /**
> @@ -309,8 +309,8 @@ struct ether_hdr {
>   * of the encapsulated frame.
>   */
>  struct vlan_hdr {
> -	uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
> -	uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
> +	rte_be16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
> +	rte_be16_t eth_proto;/**< Ethernet type of encapsulated frame. */
>  } __attribute__((__packed__));
> 
>  /**
> @@ -319,8 +319,8 @@ struct vlan_hdr {
>   * Reserved fields (24 bits and 8 bits)
>   */
>  struct vxlan_hdr {
> -	uint32_t vx_flags; /**< flag (8) + Reserved (24). */
> -	uint32_t vx_vni;   /**< VNI (24) + Reserved (8). */
> +	rte_be32_t vx_flags; /**< flag (8) + Reserved (24). */
> +	rte_be32_t vx_vni;   /**< VNI (24) + Reserved (8). */
>  } __attribute__((__packed__));
> 
>  /* Ethernet frame types */
> diff --git a/lib/librte_net/rte_gre.h b/lib/librte_net/rte_gre.h
> index 46568ff..b651af0 100644
> --- a/lib/librte_net/rte_gre.h
> +++ b/lib/librte_net/rte_gre.h
> @@ -45,23 +45,23 @@ extern "C" {
>   */
>  struct gre_hdr {
>  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> -	uint16_t res2:4; /**< Reserved */
> -	uint16_t s:1;    /**< Sequence Number Present bit */
> -	uint16_t k:1;    /**< Key Present bit */
> -	uint16_t res1:1; /**< Reserved */
> -	uint16_t c:1;    /**< Checksum Present bit */
> -	uint16_t ver:3;  /**< Version Number */
> -	uint16_t res3:5; /**< Reserved */
> +	uint16_t res2:4;  /**< Reserved */
> +	uint16_t s:1;     /**< Sequence Number Present bit */
> +	uint16_t k:1;     /**< Key Present bit */
> +	uint16_t res1:1;  /**< Reserved */
> +	uint16_t c:1;     /**< Checksum Present bit */
> +	uint16_t ver:3;   /**< Version Number */
> +	uint16_t res3:5;  /**< Reserved */
>  #elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> -	uint16_t c:1;    /**< Checksum Present bit */
> -	uint16_t res1:1; /**< Reserved */
> -	uint16_t k:1;    /**< Key Present bit */
> -	uint16_t s:1;    /**< Sequence Number Present bit */
> -	uint16_t res2:4; /**< Reserved */
> -	uint16_t res3:5; /**< Reserved */
> -	uint16_t ver:3;  /**< Version Number */
> +	uint16_t c:1;     /**< Checksum Present bit */
> +	uint16_t res1:1;  /**< Reserved */
> +	uint16_t k:1;     /**< Key Present bit */
> +	uint16_t s:1;     /**< Sequence Number Present bit */
> +	uint16_t res2:4;  /**< Reserved */
> +	uint16_t res3:5;  /**< Reserved */
> +	uint16_t ver:3;   /**< Version Number */
>  #endif
> -	uint16_t proto;  /**< Protocol Type */
> +	rte_be16_t proto; /**< Protocol Type */
>  } __attribute__((__packed__));
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_net/rte_icmp.h b/lib/librte_net/rte_icmp.h
> index 8b287f6..81bd907 100644
> --- a/lib/librte_net/rte_icmp.h
> +++ b/lib/librte_net/rte_icmp.h
> @@ -74,6 +74,7 @@
>   */
> 
>  #include <stdint.h>
> +#include <rte_byteorder.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -83,11 +84,11 @@ extern "C" {
>   * ICMP Header
>   */
>  struct icmp_hdr {
> -	uint8_t  icmp_type;   /* ICMP packet type. */
> -	uint8_t  icmp_code;   /* ICMP packet code. */
> -	uint16_t icmp_cksum;  /* ICMP packet checksum. */
> -	uint16_t icmp_ident;  /* ICMP packet identifier. */
> -	uint16_t icmp_seq_nb; /* ICMP packet sequence number. */
> +	uint8_t  icmp_type;     /* ICMP packet type. */
> +	uint8_t  icmp_code;     /* ICMP packet code. */
> +	rte_be16_t icmp_cksum;  /* ICMP packet checksum. */
> +	rte_be16_t icmp_ident;  /* ICMP packet identifier. */
> +	rte_be16_t icmp_seq_nb; /* ICMP packet sequence number. */
>  } __attribute__((__packed__));
> 
>  /* ICMP packet types */
> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> index 4491b86..6f7da36 100644
> --- a/lib/librte_net/rte_ip.h
> +++ b/lib/librte_net/rte_ip.h
> @@ -93,14 +93,14 @@ extern "C" {
>  struct ipv4_hdr {
>  	uint8_t  version_ihl;		/**< version and header length */
>  	uint8_t  type_of_service;	/**< type of service */
> -	uint16_t total_length;		/**< length of packet */
> -	uint16_t packet_id;		/**< packet ID */
> -	uint16_t fragment_offset;	/**< fragmentation offset */
> +	rte_be16_t total_length;	/**< length of packet */
> +	rte_be16_t packet_id;		/**< packet ID */
> +	rte_be16_t fragment_offset;	/**< fragmentation offset */
>  	uint8_t  time_to_live;		/**< time to live */
>  	uint8_t  next_proto_id;		/**< protocol ID */
> -	uint16_t hdr_checksum;		/**< header checksum */
> -	uint32_t src_addr;		/**< source address */
> -	uint32_t dst_addr;		/**< destination address */
> +	rte_be16_t hdr_checksum;	/**< header checksum */
> +	rte_be32_t src_addr;		/**< source address */
> +	rte_be32_t dst_addr;		/**< destination address */
>  } __attribute__((__packed__));
> 
>  /** Create IPv4 address */
> @@ -340,11 +340,11 @@ static inline uint16_t
>  rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr, uint64_t ol_flags)
>  {
>  	struct ipv4_psd_header {
> -		uint32_t src_addr; /* IP address of source host. */
> -		uint32_t dst_addr; /* IP address of destination host. */
> -		uint8_t  zero;     /* zero. */
> -		uint8_t  proto;    /* L4 protocol type. */
> -		uint16_t len;      /* L4 length. */
> +		rte_be32_t src_addr; /* IP address of source host. */
> +		rte_be32_t dst_addr; /* IP address of destination host. */
> +		uint8_t    zero;     /* zero. */
> +		uint8_t    proto;    /* L4 protocol type. */
> +		rte_be16_t len;      /* L4 length. */
>  	} psd_hdr;
> 
>  	psd_hdr.src_addr = ipv4_hdr->src_addr;
> @@ -398,12 +398,12 @@ rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
>   * IPv6 Header
>   */
>  struct ipv6_hdr {
> -	uint32_t vtc_flow;     /**< IP version, traffic class & flow label. */
> -	uint16_t payload_len;  /**< IP packet length - includes sizeof(ip_header). */
> -	uint8_t  proto;        /**< Protocol, next header. */
> -	uint8_t  hop_limits;   /**< Hop limits. */
> -	uint8_t  src_addr[16]; /**< IP address of source host. */
> -	uint8_t  dst_addr[16]; /**< IP address of destination host(s). */
> +	rte_be32_t vtc_flow;     /**< IP version, traffic class & flow label. */
> +	rte_be16_t payload_len;  /**< IP packet length - includes sizeof(ip_header). */
> +	uint8_t    proto;        /**< Protocol, next header. */
> +	uint8_t    hop_limits;   /**< Hop limits. */
> +	uint8_t    src_addr[16]; /**< IP address of source host. */
> +	uint8_t    dst_addr[16]; /**< IP address of destination host(s). */
>  } __attribute__((__packed__));
> 
>  /**
> @@ -427,8 +427,8 @@ rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
>  {
>  	uint32_t sum;
>  	struct {
> -		uint32_t len;   /* L4 length. */
> -		uint32_t proto; /* L4 protocol - top 3 bytes must be zero */
> +		rte_be32_t len;   /* L4 length. */
> +		rte_be32_t proto; /* L4 protocol - top 3 bytes must be zero */
>  	} psd_hdr;
> 
>  	psd_hdr.proto = (ipv6_hdr->proto << 24);
> diff --git a/lib/librte_net/rte_net.c b/lib/librte_net/rte_net.c
> index a8c7aff..9014ca5 100644
> --- a/lib/librte_net/rte_net.c
> +++ b/lib/librte_net/rte_net.c
> @@ -153,8 +153,8 @@ ptype_inner_l4(uint8_t proto)
> 
>  /* get the tunnel packet type if any, update proto and off. */
>  static uint32_t
> -ptype_tunnel(uint16_t *proto, const struct rte_mbuf *m,
> -	uint32_t *off)
> +ptype_tunnel(rte_be16_t *proto, const struct rte_mbuf *m,
> +	     uint32_t *off)
>  {
>  	switch (*proto) {
>  	case IPPROTO_GRE: {
> @@ -208,8 +208,8 @@ ip4_hlen(const struct ipv4_hdr *hdr)
> 
>  /* parse ipv6 extended headers, update offset and return next proto */
>  static uint16_t
> -skip_ip6_ext(uint16_t proto, const struct rte_mbuf *m, uint32_t *off,
> -	int *frag)
> +skip_ip6_ext(rte_be16_t proto, const struct rte_mbuf *m, uint32_t *off,
> +	     int *frag)
>  {
>  	struct ext_hdr {
>  		uint8_t next_hdr;
> @@ -261,7 +261,7 @@ uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
>  	struct ether_hdr eh_copy;
>  	uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
>  	uint32_t off = 0;
> -	uint16_t proto;
> +	rte_be16_t proto;
> 
>  	if (hdr_lens == NULL)
>  		hdr_lens = &local_hdr_lens;
> diff --git a/lib/librte_net/rte_sctp.h b/lib/librte_net/rte_sctp.h
> index 688e126..8c646c7 100644
> --- a/lib/librte_net/rte_sctp.h
> +++ b/lib/librte_net/rte_sctp.h
> @@ -81,15 +81,16 @@ extern "C" {
>  #endif
> 
>  #include <stdint.h>
> +#include <rte_byteorder.h>
> 
>  /**
>   * SCTP Header
>   */
>  struct sctp_hdr {
> -	uint16_t src_port; /**< Source port. */
> -	uint16_t dst_port; /**< Destin port. */
> -	uint32_t tag;      /**< Validation tag. */
> -	uint32_t cksum;    /**< Checksum. */
> +	rte_be16_t src_port; /**< Source port. */
> +	rte_be16_t dst_port; /**< Destin port. */
> +	rte_be32_t tag;      /**< Validation tag. */
> +	rte_le32_t cksum;    /**< Checksum. */
>  } __attribute__((__packed__));
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_net/rte_tcp.h b/lib/librte_net/rte_tcp.h
> index 28b61e6..545d4ab 100644
> --- a/lib/librte_net/rte_tcp.h
> +++ b/lib/librte_net/rte_tcp.h
> @@ -77,6 +77,7 @@
>   */
> 
>  #include <stdint.h>
> +#include <rte_byteorder.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -86,15 +87,15 @@ extern "C" {
>   * TCP Header
>   */
>  struct tcp_hdr {
> -	uint16_t src_port;  /**< TCP source port. */
> -	uint16_t dst_port;  /**< TCP destination port. */
> -	uint32_t sent_seq;  /**< TX data sequence number. */
> -	uint32_t recv_ack;  /**< RX data acknowledgement sequence number. */
> -	uint8_t  data_off;  /**< Data offset. */
> -	uint8_t  tcp_flags; /**< TCP flags */
> -	uint16_t rx_win;    /**< RX flow control window. */
> -	uint16_t cksum;     /**< TCP checksum. */
> -	uint16_t tcp_urp;   /**< TCP urgent pointer, if any. */
> +	rte_be16_t src_port;  /**< TCP source port. */
> +	rte_be16_t dst_port;  /**< TCP destination port. */
> +	rte_be32_t sent_seq;  /**< TX data sequence number. */
> +	rte_be32_t recv_ack;  /**< RX data acknowledgement sequence number. */
> +	uint8_t    data_off;  /**< Data offset. */
> +	uint8_t    tcp_flags; /**< TCP flags */
> +	rte_be16_t rx_win;    /**< RX flow control window. */
> +	rte_be16_t cksum;     /**< TCP checksum. */
> +	rte_be16_t tcp_urp;   /**< TCP urgent pointer, if any. */
>  } __attribute__((__packed__));
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_net/rte_udp.h b/lib/librte_net/rte_udp.h
> index bc5be4a..89fdded 100644
> --- a/lib/librte_net/rte_udp.h
> +++ b/lib/librte_net/rte_udp.h
> @@ -77,6 +77,7 @@
>   */
> 
>  #include <stdint.h>
> +#include <rte_byteorder.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -86,10 +87,10 @@ extern "C" {
>   * UDP Header
>   */
>  struct udp_hdr {
> -	uint16_t src_port;    /**< UDP source port. */
> -	uint16_t dst_port;    /**< UDP destination port. */
> -	uint16_t dgram_len;   /**< UDP datagram length */
> -	uint16_t dgram_cksum; /**< UDP datagram checksum */
> +	rte_be16_t src_port;    /**< UDP source port. */
> +	rte_be16_t dst_port;    /**< UDP destination port. */
> +	rte_be16_t dgram_len;   /**< UDP datagram length */
> +	rte_be16_t dgram_cksum; /**< UDP datagram checksum */
>  } __attribute__((__packed__));
> 
>  #ifdef __cplusplus
> --
> 2.1.4

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Intent to upstream Atomic Rules net/ark "Arkville" in DPDK 17.05
@ 2016-12-03 15:14  3% Shepard Siegel
  2016-12-05 14:10  0% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Shepard Siegel @ 2016-12-03 15:14 UTC (permalink / raw)
  To: dev

Atomic Rules would like to include our Arkville DPDK PMD net/ark in the
DPDK 17.05 release. We have been watching the recent process of
Solarflare’s net/sfc upstreaming and we decided it would be too aggressive
for us to get in on 17.02. Rather than be the last in queue for 17.02, we
would prefer to be one of the first in the queue for 17.05. This post is
our statement of that intent.


Arkville is a product from Atomic Rules which is a combination of hardware
and software. In the DPDK community, the easy way to describe Arkville is
that it is a line-rate agnostic FPGA-based NIC that does include any
specific MAC. Arkville is unique in that the design process worked backward
from the DPDK API/ABI to allow us to design RTL DPDK-aware data movers.
Arkville’s customers are the small and brave set of users that demand an
FPGA exist between their MAC ports and their host. A link to a slide deck
and product preview shown last month at SC16 is at the end of this post.


Although we’ve done substantial testing; we are just now setting up a
proper DTS environment. Our first course of business is to add two 10 GbE
ports and make Arkville look like a Fortville X710-DA2. This is strange for
us because we started out with four 100 GbE ports, and not much else to
talk to! We are eager to work with merchant 100 GbE ASIC NICs to help bring
DTS into the 100 GbE realm. But 100 GbE aside, as soon as we see our
net/ark PMD playing nice in DTS with a Fortville, and the 17.05 aperture
opens;  we will commence the patch submission process.


Thanks all who have helped us get this far so soon. Anyone needing
additional details that aren’t DPDK community wide, please contact me
directly.


Shep for AR Team


Shepard Siegel, CTO

atomicrules.com



Links:

https://dl.dropboxusercontent.com/u/5548901/share/AtomicRules_Arkville_SC16.pdf


<https://dl.dropboxusercontent.com/u/5548901/share/AtomicRules_Arkville_SC16.pdf>

https://forums.xilinx.com/t5/Xcell-Daily-Blog/BittWare-s-UltraScale-XUPP3R-board-and-Atomic-Rules-IP-run-Intel/ba-p/734110

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-12-01  8:36  2%       ` Adrien Mazarguil
@ 2016-12-02 21:06  0%         ` Kevin Traynor
  2016-12-06 18:11  0%           ` Chandran, Sugesh
  2016-12-08 17:07  3%           ` Adrien Mazarguil
  0 siblings, 2 replies; 200+ results
From: Kevin Traynor @ 2016-12-02 21:06 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: dev, Thomas Monjalon, Pablo de Lara, Olivier Matz, sugesh.chandran

On 12/01/2016 08:36 AM, Adrien Mazarguil wrote:
> Hi Kevin,
> 
> On Wed, Nov 30, 2016 at 05:47:17PM +0000, Kevin Traynor wrote:
>> Hi Adrien,
>>
>> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
>>> This new API supersedes all the legacy filter types described in
>>> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
>>> PMDs to process and validate flow rules.
>>>
>>> Benefits:
>>>
>>> - A unified API is easier to program for, applications do not have to be
>>>   written for a specific filter type which may or may not be supported by
>>>   the underlying device.
>>>
>>> - The behavior of a flow rule is the same regardless of the underlying
>>>   device, applications do not need to be aware of hardware quirks.
>>>
>>> - Extensible by design, API/ABI breakage should rarely occur if at all.
>>>
>>> - Documentation is self-standing, no need to look up elsewhere.
>>>
>>> Existing filter types will be deprecated and removed in the near future.
>>
>> I'd suggest to add a deprecation notice to deprecation.rst, ideally with
>> a target release.
> 
> Will do, not a sure about the target release though. It seems a bit early
> since no PMD really supports this API yet.
> 
> [...]
>>> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
>>> new file mode 100644
>>> index 0000000..064963d
>>> --- /dev/null
>>> +++ b/lib/librte_ether/rte_flow.c
>>> @@ -0,0 +1,159 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright 2016 6WIND S.A.
>>> + *   Copyright 2016 Mellanox.
>>
>> There's Mellanox copyright but you are the only signed-off-by - is that
>> right?
> 
> Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed
> on their behalf to expose several features from mlx4/mlx5 as the existing
> filter types had too many limitations.
> 
> [...]
>>> +/* Get generic flow operations structure from a port. */
>>> +const struct rte_flow_ops *
>>> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
>>> +{
>>> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>>> +	const struct rte_flow_ops *ops;
>>> +	int code;
>>> +
>>> +	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
>>> +		code = ENODEV;
>>> +	else if (unlikely(!dev->dev_ops->filter_ctrl ||
>>> +			  dev->dev_ops->filter_ctrl(dev,
>>> +						    RTE_ETH_FILTER_GENERIC,
>>> +						    RTE_ETH_FILTER_GET,
>>> +						    &ops) ||
>>> +			  !ops))
>>> +		code = ENOTSUP;
>>> +	else
>>> +		return ops;
>>> +	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
>>> +			   NULL, rte_strerror(code));
>>> +	return NULL;
>>> +}
>>> +
>>
>> Is it expected that the application or pmd will provide locking between
>> these functions if required? I think it's going to have to be the app.
> 
> Locking is indeed expected to be performed by applications. This API only
> documents places where locking would make sense if necessary and expected
> behavior.
> 
> Like all control path APIs, this one assumes a single control thread.
> Applications must take the necessary precautions.

If you look at OVS now it's quite possible that you have 2 rx queues
serviced by different threads, that would also install the flow rules in
the software flow caches - possibly that could extend to adding hardware
flows. There could also be another thread that is querying for stats. So
anything that can be done to minimise the locking would be helpful -
maybe query() could be atomic and not require any locking?

> 
> [...]
>>> +/**
>>> + * Flow rule attributes.
>>> + *
>>> + * Priorities are set on two levels: per group and per rule within groups.
>>> + *
>>> + * Lower values denote higher priority, the highest priority for both levels
>>> + * is 0, so that a rule with priority 0 in group 8 is always matched after a
>>> + * rule with priority 8 in group 0.
>>> + *
>>> + * Although optional, applications are encouraged to group similar rules as
>>> + * much as possible to fully take advantage of hardware capabilities
>>> + * (e.g. optimized matching) and work around limitations (e.g. a single
>>> + * pattern type possibly allowed in a given group).
>>> + *
>>> + * Group and priority levels are arbitrary and up to the application, they
>>> + * do not need to be contiguous nor start from 0, however the maximum number
>>> + * varies between devices and may be affected by existing flow rules.
>>> + *
>>> + * If a packet is matched by several rules of a given group for a given
>>> + * priority level, the outcome is undefined. It can take any path, may be
>>> + * duplicated or even cause unrecoverable errors.
>>
>> I get what you are trying to do here wrt supporting multiple
>> pmds/hardware implementations and it's a good idea to keep it flexible.
>>
>> Given that the outcome is undefined, it would be nice that the
>> application has a way of finding the specific effects for verification
>> and debugging.
> 
> Right, however it was deemed a bit difficult to manage in many cases hence
> the vagueness.
> 
> For example, suppose two rules with the same group and priority, one
> matching any IPv4 header, the other one any UDP header:
> 
> - TCPv4 packets => rule #1.
> - UDPv6 packets => rule #2.
> - UDPv4 packets => both?
> 
> That last one is perhaps invalid, checking that some unspecified protocol
> combination does not overlap is expensive and may miss corner cases, even
> assuming this is not an issue, what if the application guarantees that no
> UDPv4 packets can ever hit that rule?

that's fine - I don't expect the software to be able to know what the
hardware will do with those rules. It's more about trying to get a dump
from the hardware if something goes wrong. Anyway covered in comment later.

> 
> Suggestions are welcome though, perhaps we can refine the description
> 
>>> + *
>>> + * Note that support for more than a single group and priority level is not
>>> + * guaranteed.
>>> + *
>>> + * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
>>> + *
>>> + * Several pattern items and actions are valid and can be used in both
>>> + * directions. Those valid for only one direction are described as such.
>>> + *
>>> + * Specifying both directions at once is not recommended but may be valid in
>>> + * some cases, such as incrementing the same counter twice.
>>> + *
>>> + * Not specifying any direction is currently an error.
>>> + */
>>> +struct rte_flow_attr {
>>> +	uint32_t group; /**< Priority group. */
>>> +	uint32_t priority; /**< Priority level within group. */
>>> +	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
>>> +	uint32_t egress:1; /**< Rule applies to egress traffic. */
>>> +	uint32_t reserved:30; /**< Reserved, must be zero. */
>>> +};
> [...]
>>> +/**
>>> + * RTE_FLOW_ITEM_TYPE_VF
>>> + *
>>> + * Matches packets addressed to a virtual function ID of the device.
>>> + *
>>> + * If the underlying device function differs from the one that would
>>> + * normally receive the matched traffic, specifying this item prevents it
>>> + * from reaching that device unless the flow rule contains a VF
>>> + * action. Packets are not duplicated between device instances by default.
>>> + *
>>> + * - Likely to return an error or never match any traffic if this causes a
>>> + *   VF device to match traffic addressed to a different VF.
>>> + * - Can be specified multiple times to match traffic addressed to several
>>> + *   specific VFs.
>>> + * - Can be combined with a PF item to match both PF and VF traffic.
>>> + *
>>> + * A zeroed mask can be used to match any VF.
>>
>> can you refer explicitly to id
> 
> If you mean "VF" to "VF ID" then yes, will do it for v2.
> 
>>> + */
>>> +struct rte_flow_item_vf {
>>> +	uint32_t id; /**< Destination VF ID. */
>>> +};
> [...]
>>> +/**
>>> + * Matching pattern item definition.
>>> + *
>>> + * A pattern is formed by stacking items starting from the lowest protocol
>>> + * layer to match. This stacking restriction does not apply to meta items
>>> + * which can be placed anywhere in the stack with no effect on the meaning
>>> + * of the resulting pattern.
>>> + *
>>> + * A stack is terminated by a END item.
>>> + *
>>> + * The spec field should be a valid pointer to a structure of the related
>>> + * item type. It may be set to NULL in many cases to use default values.
>>> + *
>>> + * Optionally, last can point to a structure of the same type to define an
>>> + * inclusive range. This is mostly supported by integer and address fields,
>>> + * may cause errors otherwise. Fields that do not support ranges must be set
>>> + * to the same value as their spec counterparts.
>>> + *
>>> + * By default all fields present in spec are considered relevant.* This
>>
>> typo "*"
> 
> No, that's an asterisk for a footnote below. Perhaps it is a bit unusual,
> would something like "[1]" look better?

oh, I thought it was the start of a comment line gone astray. Maybe "See
note below", no big deal though.

> 
>>> + * behavior can be altered by providing a mask structure of the same type
>>> + * with applicable bits set to one. It can also be used to partially filter
>>> + * out specific fields (e.g. as an alternate mean to match ranges of IP
>>> + * addresses).
>>> + *
>>> + * Note this is a simple bit-mask applied before interpreting the contents
>>> + * of spec and last, which may yield unexpected results if not used
>>> + * carefully. For example, if for an IPv4 address field, spec provides
>>> + * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
>>> + * effective range is 10.1.0.0 to 10.3.255.255.
>>> + *
> 
> See footnote below:
> 
>>> + * * The defaults for data-matching items such as IPv4 when mask is not
>>> + *   specified actually depend on the underlying implementation since only
>>> + *   recognized fields can be taken into account.
>>> + */
>>> +struct rte_flow_item {
>>> +	enum rte_flow_item_type type; /**< Item type. */
>>> +	const void *spec; /**< Pointer to item specification structure. */
>>> +	const void *last; /**< Defines an inclusive range (spec to last). */
>>> +	const void *mask; /**< Bit-mask applied to spec and last. */
>>> +};
>>> +
>>> +/**
>>> + * Action types.
>>> + *
>>> + * Each possible action is represented by a type. Some have associated
>>> + * configuration structures. Several actions combined in a list can be
>>> + * affected to a flow rule. That list is not ordered.
>>> + *
>>> + * They fall in three categories:
>>> + *
>>> + * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
>>> + *   processing matched packets by subsequent flow rules, unless overridden
>>> + *   with PASSTHRU.
>>> + *
>>> + * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
>>> + *   for additional processing by subsequent flow rules.
>>> + *
>>> + * - Other non terminating meta actions that do not affect the fate of
>>> + *   packets (END, VOID, MARK, FLAG, COUNT).
>>> + *
>>> + * When several actions are combined in a flow rule, they should all have
>>> + * different types (e.g. dropping a packet twice is not possible). The
>>> + * defined behavior is for PMDs to only take into account the last action of
>>> + * a given type found in the list. PMDs still perform error checking on the
>>> + * entire list.
>>
>> why do you define that the pmd will interpret multiple same type rules
>> in this way...would it not make more sense for the pmd to just return
>> EINVAL for an invalid set of rules? It seems more transparent for the
>> application.
> 
> Well, I had to define something as a default. The reason is that any number
> of VOID actions may specified and did not want that to be a special case in
> order to keep PMD parsers as simple as possible. I'll settle for EINVAL (or
> some other error) if at least one PMD maintainer other than Nelio who
> intends to implement this API is not convinced by this explanation, all
> right?

>From an API perspective I think it's cleaner to pass or fail with the
input rather than change it. But yes, please take pmd maintainers input
as to what is reasonable to check also.

> 
> [...]
>>> +/**
>>> + * RTE_FLOW_ACTION_TYPE_MARK
>>> + *
>>> + * Attaches a 32 bit value to packets.
>>> + *
>>> + * This value is arbitrary and application-defined. For compatibility with
>>> + * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
>>> + * also set in ol_flags.
>>> + */
>>> +struct rte_flow_action_mark {
>>> +	uint32_t id; /**< 32 bit value to return with packets. */
>>> +};
>>
>> One use case I thought we would be able to do for OVS is classification
>> in hardware and the unique flow id is sent with the packet to software.
>> But in OVS the ufid is 128 bits, so it means we can't and there is still
>> the miniflow extract overhead. I'm not sure if there is a practical way
>> around this.
>>
>> Sugesh (cc'd) has looked at this before and may be able to comment or
>> correct me.
> 
> Yes, we settled on 32 bit because currently no known hardware implementation
> supports more than this. If that changes, another action with a larger type
> shall be provided (no ABI breakage).
> 
> Also since even 64 bit would not be enough for the use case you mention,
> there is no choice but use this as an indirect value (such as an array or
> hash table index/value).

ok, cool. I think Sugesh has other ideas anyway!

> 
> [...]
>>> +/**
>>> + * RTE_FLOW_ACTION_TYPE_RSS
>>> + *
>>> + * Similar to QUEUE, except RSS is additionally performed on packets to
>>> + * spread them among several queues according to the provided parameters.
>>> + *
>>> + * Note: RSS hash result is normally stored in the hash.rss mbuf field,
>>> + * however it conflicts with the MARK action as they share the same
>>> + * space. When both actions are specified, the RSS hash is discarded and
>>> + * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
>>> + * structure should eventually evolve to store both.
>>> + *
>>> + * Terminating by default.
>>> + */
>>> +struct rte_flow_action_rss {
>>> +	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
>>> +	uint16_t queues; /**< Number of entries in queue[]. */
>>> +	uint16_t queue[]; /**< Queues indices to use. */
>>
>> I'd try and avoid queue and queues - someone will say "huh?" when
>> reading code. s/queues/num ?
> 
> Agreed, will update for v2.
> 
>>> +};
>>> +
>>> +/**
>>> + * RTE_FLOW_ACTION_TYPE_VF
>>> + *
>>> + * Redirects packets to a virtual function (VF) of the current device.
>>> + *
>>> + * Packets matched by a VF pattern item can be redirected to their original
>>> + * VF ID instead of the specified one. This parameter may not be available
>>> + * and is not guaranteed to work properly if the VF part is matched by a
>>> + * prior flow rule or if packets are not addressed to a VF in the first
>>> + * place.
>>
>> Not clear what you mean by "not guaranteed to work if...". Please return
>> fail when this action is used if this is not going to work.
> 
> Again, this is a case where it is difficult for a PMD to determine if the
> entire list of flow rules makes sense. Perhaps it does, perhaps whatever
> goes through has already been filtered out of possible issues.
> 
> Here the documentation states the precautions an application should take to
> guarantee it will work as intended. Perhaps it can be reworded (any
> suggestion?), but a PMD can certainly not provide any strong guarantee.

I see your point. Maybe for easy check things the pmd would return fail,
but for more complex I agree it's too difficult.

> 
>>> + *
>>> + * Terminating by default.
>>> + */
>>> +struct rte_flow_action_vf {
>>> +	uint32_t original:1; /**< Use original VF ID if possible. */
>>> +	uint32_t reserved:31; /**< Reserved, must be zero. */
>>> +	uint32_t id; /**< VF ID to redirect packets to. */
>>> +};
> [...]
>>> +/**
>>> + * Check whether a flow rule can be created on a given port.
>>> + *
>>> + * While this function has no effect on the target device, the flow rule is
>>> + * validated against its current configuration state and the returned value
>>> + * should be considered valid by the caller for that state only.
>>> + *
>>> + * The returned value is guaranteed to remain valid only as long as no
>>> + * successful calls to rte_flow_create() or rte_flow_destroy() are made in
>>> + * the meantime and no device parameter affecting flow rules in any way are
>>> + * modified, due to possible collisions or resource limitations (although in
>>> + * such cases EINVAL should not be returned).
>>> + *
>>> + * @param port_id
>>> + *   Port identifier of Ethernet device.
>>> + * @param[in] attr
>>> + *   Flow rule attributes.
>>> + * @param[in] pattern
>>> + *   Pattern specification (list terminated by the END pattern item).
>>> + * @param[in] actions
>>> + *   Associated actions (list terminated by the END action).
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   0 if flow rule is valid and can be created. A negative errno value
>>> + *   otherwise (rte_errno is also set), the following errors are defined:
>>> + *
>>> + *   -ENOSYS: underlying device does not support this functionality.
>>> + *
>>> + *   -EINVAL: unknown or invalid rule specification.
>>> + *
>>> + *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
>>> + *   bit-masks are unsupported).
>>> + *
>>> + *   -EEXIST: collision with an existing rule.
>>> + *
>>> + *   -ENOMEM: not enough resources.
>>> + *
>>> + *   -EBUSY: action cannot be performed due to busy device resources, may
>>> + *   succeed if the affected queues or even the entire port are in a stopped
>>> + *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
>>> + */
>>> +int
>>> +rte_flow_validate(uint8_t port_id,
>>> +		  const struct rte_flow_attr *attr,
>>> +		  const struct rte_flow_item pattern[],
>>> +		  const struct rte_flow_action actions[],
>>> +		  struct rte_flow_error *error);
>>
>> Why not just use rte_flow_create() and get an error? Is it less
>> disruptive to do a validate and find the rule cannot be created, than
>> using a create directly?
> 
> The rationale can be found in the original RFC, which I'll convert to actual
> documentation in v2. In short:
> 
> - Calling rte_flow_validate() before rte_flow_create() is useless since
>   rte_flow_create() also performs validation.
> 
> - We cannot possibly express a full static set of allowed flow rules, even
>   if we could, it usually depends on the current hardware configuration
>   therefore would not be static.
> 
> - rte_flow_validate() is thus provided as a replacement for capability
>   flags. It can be used to determine during initialization if the underlying
>   device can support the typical flow rules an application might want to
>   provide later and do something useful with that information (e.g. always
>   use software fallback due to HW limitations).
> 
> - rte_flow_validate() being a subset of rte_flow_create(), it is essentially
>   free to expose.

make sense now, thanks.

> 
>>> +
>>> +/**
>>> + * Create a flow rule on a given port.
>>> + *
>>> + * @param port_id
>>> + *   Port identifier of Ethernet device.
>>> + * @param[in] attr
>>> + *   Flow rule attributes.
>>> + * @param[in] pattern
>>> + *   Pattern specification (list terminated by the END pattern item).
>>> + * @param[in] actions
>>> + *   Associated actions (list terminated by the END action).
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   A valid handle in case of success, NULL otherwise and rte_errno is set
>>> + *   to the positive version of one of the error codes defined for
>>> + *   rte_flow_validate().
>>> + */
>>> +struct rte_flow *
>>> +rte_flow_create(uint8_t port_id,
>>> +		const struct rte_flow_attr *attr,
>>> +		const struct rte_flow_item pattern[],
>>> +		const struct rte_flow_action actions[],
>>> +		struct rte_flow_error *error);
>>
>> General question - are these functions threadsafe? In the OVS example
>> you could have several threads wanting to create flow rules at the same
>> time for same or different ports.
> 
> No they aren't, applications have to perform their own locking. The RFC (to
> be converted to actual documentation in v2) says that:
> 
> - API operations are synchronous and blocking (``EAGAIN`` cannot be
>   returned).
> 
> - There is no provision for reentrancy/multi-thread safety, although nothing
>   should prevent different devices from being configured at the same
>   time. PMDs may protect their control path functions accordingly.

other comment above wrt locking.

> 
>>> +
>>> +/**
>>> + * Destroy a flow rule on a given port.
>>> + *
>>> + * Failure to destroy a flow rule handle may occur when other flow rules
>>> + * depend on it, and destroying it would result in an inconsistent state.
>>> + *
>>> + * This function is only guaranteed to succeed if handles are destroyed in
>>> + * reverse order of their creation.
>>
>> How can the application find this information out on error?
> 
> Without maintaining a list, they cannot. The specified case is the only
> possible guarantee. That does not mean PMDs should not do their best to
> destroy flow rules, only that ordering must remain consistent in case of
> inability to destroy one.
> 
> What do you suggest?

I think if the app cannot remove a specific rule it may want to remove
all rules and deal with flows in software for a time. So once the app
knows it fails that should be enough.

> 
>>> + *
>>> + * @param port_id
>>> + *   Port identifier of Ethernet device.
>>> + * @param flow
>>> + *   Flow rule handle to destroy.
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>> + */
>>> +int
>>> +rte_flow_destroy(uint8_t port_id,
>>> +		 struct rte_flow *flow,
>>> +		 struct rte_flow_error *error);
>>> +
>>> +/**
>>> + * Destroy all flow rules associated with a port.
>>> + *
>>> + * In the unlikely event of failure, handles are still considered destroyed
>>> + * and no longer valid but the port must be assumed to be in an inconsistent
>>> + * state.
>>> + *
>>> + * @param port_id
>>> + *   Port identifier of Ethernet device.
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>> + */
>>> +int
>>> +rte_flow_flush(uint8_t port_id,
>>> +	       struct rte_flow_error *error);
>>
>> rte_flow_destroy_all() would be more descriptive (but breaks your style)
> 
> There are enough underscores as it is. I like flush, if enough people
> complain we'll change it but it has to occur before the first public
> release.
> 
>>> +
>>> +/**
>>> + * Query an existing flow rule.
>>> + *
>>> + * This function allows retrieving flow-specific data such as counters.
>>> + * Data is gathered by special actions which must be present in the flow
>>> + * rule definition.
>>
>> re last sentence, it would be good if you can put a link to
>> RTE_FLOW_ACTION_TYPE_COUNT
> 
> Will do, I did not know how until very recently.
> 
>>> + *
>>> + * @param port_id
>>> + *   Port identifier of Ethernet device.
>>> + * @param flow
>>> + *   Flow rule handle to query.
>>> + * @param action
>>> + *   Action type to query.
>>> + * @param[in, out] data
>>> + *   Pointer to storage for the associated query data type.
>>
>> can this be anything other than rte_flow_query_count?
> 
> Likely in the future. I've only defined this one as a counterpart for
> existing API functionality and because we wanted to expose it in mlx5.
> 
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>>> + */
>>> +int
>>> +rte_flow_query(uint8_t port_id,
>>> +	       struct rte_flow *flow,
>>> +	       enum rte_flow_action_type action,
>>> +	       void *data,
>>> +	       struct rte_flow_error *error);
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>
>> I don't see a way to dump all the rules for a port out. I think this is
>> neccessary for degbugging. You could have a look through dpif.h in OVS
>> and see how dpif_flow_dump_next() is used, it might be a good reference.
> 
> DPDK does not maintain flow rules and, depending on hardware capabilities
> and level of compliance, PMDs do not necessarily do it either, particularly
> since it requires space and application probably have a better method to
> store these pointers for their own needs.

understood

> 
> What you see here is only a PMD interface. Depending on applications needs,
> generic helper functions built on top of these may be added to manage flow
> rules in the future.

I'm thinking of the case where something goes wrong and I want to get a
dump of all the flow rules from hardware, not query the rules I think I
have. I don't see a way to do it or something to build a helper on top of?

> 
>> Also, it would be nice if there were an api that would allow a test
>> packet to be injected and traced for debugging - although I'm not
>> exactly sure how well it could be traced. For reference:
>> http://developers.redhat.com/blog/2016/10/12/tracing-packets-inside-open-vswitch/
> 
> Thanks for the link, I'm not sure how you'd do this either. Remember, as
> generic as it looks, this interface is only meant to configure the
> underlying device. You need to see it as one big offload, everything else
> is left to applications.
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-11-30 17:47  0%     ` Kevin Traynor
@ 2016-12-01  8:36  2%       ` Adrien Mazarguil
  2016-12-02 21:06  0%         ` Kevin Traynor
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-12-01  8:36 UTC (permalink / raw)
  To: Kevin Traynor
  Cc: dev, Thomas Monjalon, Pablo de Lara, Olivier Matz, sugesh.chandra

Hi Kevin,

On Wed, Nov 30, 2016 at 05:47:17PM +0000, Kevin Traynor wrote:
> Hi Adrien,
> 
> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> > This new API supersedes all the legacy filter types described in
> > rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> > PMDs to process and validate flow rules.
> > 
> > Benefits:
> > 
> > - A unified API is easier to program for, applications do not have to be
> >   written for a specific filter type which may or may not be supported by
> >   the underlying device.
> > 
> > - The behavior of a flow rule is the same regardless of the underlying
> >   device, applications do not need to be aware of hardware quirks.
> > 
> > - Extensible by design, API/ABI breakage should rarely occur if at all.
> > 
> > - Documentation is self-standing, no need to look up elsewhere.
> > 
> > Existing filter types will be deprecated and removed in the near future.
> 
> I'd suggest to add a deprecation notice to deprecation.rst, ideally with
> a target release.

Will do, not a sure about the target release though. It seems a bit early
since no PMD really supports this API yet.

[...]
> > diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> > new file mode 100644
> > index 0000000..064963d
> > --- /dev/null
> > +++ b/lib/librte_ether/rte_flow.c
> > @@ -0,0 +1,159 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright 2016 6WIND S.A.
> > + *   Copyright 2016 Mellanox.
> 
> There's Mellanox copyright but you are the only signed-off-by - is that
> right?

Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed
on their behalf to expose several features from mlx4/mlx5 as the existing
filter types had too many limitations.

[...]
> > +/* Get generic flow operations structure from a port. */
> > +const struct rte_flow_ops *
> > +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_flow_ops *ops;
> > +	int code;
> > +
> > +	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> > +		code = ENODEV;
> > +	else if (unlikely(!dev->dev_ops->filter_ctrl ||
> > +			  dev->dev_ops->filter_ctrl(dev,
> > +						    RTE_ETH_FILTER_GENERIC,
> > +						    RTE_ETH_FILTER_GET,
> > +						    &ops) ||
> > +			  !ops))
> > +		code = ENOTSUP;
> > +	else
> > +		return ops;
> > +	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > +			   NULL, rte_strerror(code));
> > +	return NULL;
> > +}
> > +
> 
> Is it expected that the application or pmd will provide locking between
> these functions if required? I think it's going to have to be the app.

Locking is indeed expected to be performed by applications. This API only
documents places where locking would make sense if necessary and expected
behavior.

Like all control path APIs, this one assumes a single control thread.
Applications must take the necessary precautions.

[...]
> > +/**
> > + * Flow rule attributes.
> > + *
> > + * Priorities are set on two levels: per group and per rule within groups.
> > + *
> > + * Lower values denote higher priority, the highest priority for both levels
> > + * is 0, so that a rule with priority 0 in group 8 is always matched after a
> > + * rule with priority 8 in group 0.
> > + *
> > + * Although optional, applications are encouraged to group similar rules as
> > + * much as possible to fully take advantage of hardware capabilities
> > + * (e.g. optimized matching) and work around limitations (e.g. a single
> > + * pattern type possibly allowed in a given group).
> > + *
> > + * Group and priority levels are arbitrary and up to the application, they
> > + * do not need to be contiguous nor start from 0, however the maximum number
> > + * varies between devices and may be affected by existing flow rules.
> > + *
> > + * If a packet is matched by several rules of a given group for a given
> > + * priority level, the outcome is undefined. It can take any path, may be
> > + * duplicated or even cause unrecoverable errors.
> 
> I get what you are trying to do here wrt supporting multiple
> pmds/hardware implementations and it's a good idea to keep it flexible.
> 
> Given that the outcome is undefined, it would be nice that the
> application has a way of finding the specific effects for verification
> and debugging.

Right, however it was deemed a bit difficult to manage in many cases hence
the vagueness.

For example, suppose two rules with the same group and priority, one
matching any IPv4 header, the other one any UDP header:

- TCPv4 packets => rule #1.
- UDPv6 packets => rule #2.
- UDPv4 packets => both?

That last one is perhaps invalid, checking that some unspecified protocol
combination does not overlap is expensive and may miss corner cases, even
assuming this is not an issue, what if the application guarantees that no
UDPv4 packets can ever hit that rule?

Suggestions are welcome though, perhaps we can refine the description.

> > + *
> > + * Note that support for more than a single group and priority level is not
> > + * guaranteed.
> > + *
> > + * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
> > + *
> > + * Several pattern items and actions are valid and can be used in both
> > + * directions. Those valid for only one direction are described as such.
> > + *
> > + * Specifying both directions at once is not recommended but may be valid in
> > + * some cases, such as incrementing the same counter twice.
> > + *
> > + * Not specifying any direction is currently an error.
> > + */
> > +struct rte_flow_attr {
> > +	uint32_t group; /**< Priority group. */
> > +	uint32_t priority; /**< Priority level within group. */
> > +	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
> > +	uint32_t egress:1; /**< Rule applies to egress traffic. */
> > +	uint32_t reserved:30; /**< Reserved, must be zero. */
> > +};
[...]
> > +/**
> > + * RTE_FLOW_ITEM_TYPE_VF
> > + *
> > + * Matches packets addressed to a virtual function ID of the device.
> > + *
> > + * If the underlying device function differs from the one that would
> > + * normally receive the matched traffic, specifying this item prevents it
> > + * from reaching that device unless the flow rule contains a VF
> > + * action. Packets are not duplicated between device instances by default.
> > + *
> > + * - Likely to return an error or never match any traffic if this causes a
> > + *   VF device to match traffic addressed to a different VF.
> > + * - Can be specified multiple times to match traffic addressed to several
> > + *   specific VFs.
> > + * - Can be combined with a PF item to match both PF and VF traffic.
> > + *
> > + * A zeroed mask can be used to match any VF.
> 
> can you refer explicitly to id

If you mean "VF" to "VF ID" then yes, will do it for v2.

> > + */
> > +struct rte_flow_item_vf {
> > +	uint32_t id; /**< Destination VF ID. */
> > +};
[...]
> > +/**
> > + * Matching pattern item definition.
> > + *
> > + * A pattern is formed by stacking items starting from the lowest protocol
> > + * layer to match. This stacking restriction does not apply to meta items
> > + * which can be placed anywhere in the stack with no effect on the meaning
> > + * of the resulting pattern.
> > + *
> > + * A stack is terminated by a END item.
> > + *
> > + * The spec field should be a valid pointer to a structure of the related
> > + * item type. It may be set to NULL in many cases to use default values.
> > + *
> > + * Optionally, last can point to a structure of the same type to define an
> > + * inclusive range. This is mostly supported by integer and address fields,
> > + * may cause errors otherwise. Fields that do not support ranges must be set
> > + * to the same value as their spec counterparts.
> > + *
> > + * By default all fields present in spec are considered relevant.* This
> 
> typo "*"

No, that's an asterisk for a footnote below. Perhaps it is a bit unusual,
would something like "[1]" look better?

> > + * behavior can be altered by providing a mask structure of the same type
> > + * with applicable bits set to one. It can also be used to partially filter
> > + * out specific fields (e.g. as an alternate mean to match ranges of IP
> > + * addresses).
> > + *
> > + * Note this is a simple bit-mask applied before interpreting the contents
> > + * of spec and last, which may yield unexpected results if not used
> > + * carefully. For example, if for an IPv4 address field, spec provides
> > + * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
> > + * effective range is 10.1.0.0 to 10.3.255.255.
> > + *

See footnote below:

> > + * * The defaults for data-matching items such as IPv4 when mask is not
> > + *   specified actually depend on the underlying implementation since only
> > + *   recognized fields can be taken into account.
> > + */
> > +struct rte_flow_item {
> > +	enum rte_flow_item_type type; /**< Item type. */
> > +	const void *spec; /**< Pointer to item specification structure. */
> > +	const void *last; /**< Defines an inclusive range (spec to last). */
> > +	const void *mask; /**< Bit-mask applied to spec and last. */
> > +};
> > +
> > +/**
> > + * Action types.
> > + *
> > + * Each possible action is represented by a type. Some have associated
> > + * configuration structures. Several actions combined in a list can be
> > + * affected to a flow rule. That list is not ordered.
> > + *
> > + * They fall in three categories:
> > + *
> > + * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
> > + *   processing matched packets by subsequent flow rules, unless overridden
> > + *   with PASSTHRU.
> > + *
> > + * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
> > + *   for additional processing by subsequent flow rules.
> > + *
> > + * - Other non terminating meta actions that do not affect the fate of
> > + *   packets (END, VOID, MARK, FLAG, COUNT).
> > + *
> > + * When several actions are combined in a flow rule, they should all have
> > + * different types (e.g. dropping a packet twice is not possible). The
> > + * defined behavior is for PMDs to only take into account the last action of
> > + * a given type found in the list. PMDs still perform error checking on the
> > + * entire list.
> 
> why do you define that the pmd will interpret multiple same type rules
> in this way...would it not make more sense for the pmd to just return
> EINVAL for an invalid set of rules? It seems more transparent for the
> application.

Well, I had to define something as a default. The reason is that any number
of VOID actions may specified and did not want that to be a special case in
order to keep PMD parsers as simple as possible. I'll settle for EINVAL (or
some other error) if at least one PMD maintainer other than Nelio who
intends to implement this API is not convinced by this explanation, all
right?

[...]
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_MARK
> > + *
> > + * Attaches a 32 bit value to packets.
> > + *
> > + * This value is arbitrary and application-defined. For compatibility with
> > + * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
> > + * also set in ol_flags.
> > + */
> > +struct rte_flow_action_mark {
> > +	uint32_t id; /**< 32 bit value to return with packets. */
> > +};
> 
> One use case I thought we would be able to do for OVS is classification
> in hardware and the unique flow id is sent with the packet to software.
> But in OVS the ufid is 128 bits, so it means we can't and there is still
> the miniflow extract overhead. I'm not sure if there is a practical way
> around this.
> 
> Sugesh (cc'd) has looked at this before and may be able to comment or
> correct me.

Yes, we settled on 32 bit because currently no known hardware implementation
supports more than this. If that changes, another action with a larger type
shall be provided (no ABI breakage).

Also since even 64 bit would not be enough for the use case you mention,
there is no choice but use this as an indirect value (such as an array or
hash table index/value).

[...]
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_RSS
> > + *
> > + * Similar to QUEUE, except RSS is additionally performed on packets to
> > + * spread them among several queues according to the provided parameters.
> > + *
> > + * Note: RSS hash result is normally stored in the hash.rss mbuf field,
> > + * however it conflicts with the MARK action as they share the same
> > + * space. When both actions are specified, the RSS hash is discarded and
> > + * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
> > + * structure should eventually evolve to store both.
> > + *
> > + * Terminating by default.
> > + */
> > +struct rte_flow_action_rss {
> > +	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
> > +	uint16_t queues; /**< Number of entries in queue[]. */
> > +	uint16_t queue[]; /**< Queues indices to use. */
> 
> I'd try and avoid queue and queues - someone will say "huh?" when
> reading code. s/queues/num ?

Agreed, will update for v2.

> > +};
> > +
> > +/**
> > + * RTE_FLOW_ACTION_TYPE_VF
> > + *
> > + * Redirects packets to a virtual function (VF) of the current device.
> > + *
> > + * Packets matched by a VF pattern item can be redirected to their original
> > + * VF ID instead of the specified one. This parameter may not be available
> > + * and is not guaranteed to work properly if the VF part is matched by a
> > + * prior flow rule or if packets are not addressed to a VF in the first
> > + * place.
> 
> Not clear what you mean by "not guaranteed to work if...". Please return
> fail when this action is used if this is not going to work.

Again, this is a case where it is difficult for a PMD to determine if the
entire list of flow rules makes sense. Perhaps it does, perhaps whatever
goes through has already been filtered out of possible issues.

Here the documentation states the precautions an application should take to
guarantee it will work as intended. Perhaps it can be reworded (any
suggestion?), but a PMD can certainly not provide any strong guarantee.

> > + *
> > + * Terminating by default.
> > + */
> > +struct rte_flow_action_vf {
> > +	uint32_t original:1; /**< Use original VF ID if possible. */
> > +	uint32_t reserved:31; /**< Reserved, must be zero. */
> > +	uint32_t id; /**< VF ID to redirect packets to. */
> > +};
[...]
> > +/**
> > + * Check whether a flow rule can be created on a given port.
> > + *
> > + * While this function has no effect on the target device, the flow rule is
> > + * validated against its current configuration state and the returned value
> > + * should be considered valid by the caller for that state only.
> > + *
> > + * The returned value is guaranteed to remain valid only as long as no
> > + * successful calls to rte_flow_create() or rte_flow_destroy() are made in
> > + * the meantime and no device parameter affecting flow rules in any way are
> > + * modified, due to possible collisions or resource limitations (although in
> > + * such cases EINVAL should not be returned).
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] attr
> > + *   Flow rule attributes.
> > + * @param[in] pattern
> > + *   Pattern specification (list terminated by the END pattern item).
> > + * @param[in] actions
> > + *   Associated actions (list terminated by the END action).
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   0 if flow rule is valid and can be created. A negative errno value
> > + *   otherwise (rte_errno is also set), the following errors are defined:
> > + *
> > + *   -ENOSYS: underlying device does not support this functionality.
> > + *
> > + *   -EINVAL: unknown or invalid rule specification.
> > + *
> > + *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
> > + *   bit-masks are unsupported).
> > + *
> > + *   -EEXIST: collision with an existing rule.
> > + *
> > + *   -ENOMEM: not enough resources.
> > + *
> > + *   -EBUSY: action cannot be performed due to busy device resources, may
> > + *   succeed if the affected queues or even the entire port are in a stopped
> > + *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
> > + */
> > +int
> > +rte_flow_validate(uint8_t port_id,
> > +		  const struct rte_flow_attr *attr,
> > +		  const struct rte_flow_item pattern[],
> > +		  const struct rte_flow_action actions[],
> > +		  struct rte_flow_error *error);
> 
> Why not just use rte_flow_create() and get an error? Is it less
> disruptive to do a validate and find the rule cannot be created, than
> using a create directly?

The rationale can be found in the original RFC, which I'll convert to actual
documentation in v2. In short:

- Calling rte_flow_validate() before rte_flow_create() is useless since
  rte_flow_create() also performs validation.

- We cannot possibly express a full static set of allowed flow rules, even
  if we could, it usually depends on the current hardware configuration
  therefore would not be static.

- rte_flow_validate() is thus provided as a replacement for capability
  flags. It can be used to determine during initialization if the underlying
  device can support the typical flow rules an application might want to
  provide later and do something useful with that information (e.g. always
  use software fallback due to HW limitations).

- rte_flow_validate() being a subset of rte_flow_create(), it is essentially
  free to expose.

> > +
> > +/**
> > + * Create a flow rule on a given port.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[in] attr
> > + *   Flow rule attributes.
> > + * @param[in] pattern
> > + *   Pattern specification (list terminated by the END pattern item).
> > + * @param[in] actions
> > + *   Associated actions (list terminated by the END action).
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   A valid handle in case of success, NULL otherwise and rte_errno is set
> > + *   to the positive version of one of the error codes defined for
> > + *   rte_flow_validate().
> > + */
> > +struct rte_flow *
> > +rte_flow_create(uint8_t port_id,
> > +		const struct rte_flow_attr *attr,
> > +		const struct rte_flow_item pattern[],
> > +		const struct rte_flow_action actions[],
> > +		struct rte_flow_error *error);
> 
> General question - are these functions threadsafe? In the OVS example
> you could have several threads wanting to create flow rules at the same
> time for same or different ports.

No they aren't, applications have to perform their own locking. The RFC (to
be converted to actual documentation in v2) says that:

- API operations are synchronous and blocking (``EAGAIN`` cannot be
  returned).

- There is no provision for reentrancy/multi-thread safety, although nothing
  should prevent different devices from being configured at the same
  time. PMDs may protect their control path functions accordingly.

> > +
> > +/**
> > + * Destroy a flow rule on a given port.
> > + *
> > + * Failure to destroy a flow rule handle may occur when other flow rules
> > + * depend on it, and destroying it would result in an inconsistent state.
> > + *
> > + * This function is only guaranteed to succeed if handles are destroyed in
> > + * reverse order of their creation.
> 
> How can the application find this information out on error?

Without maintaining a list, they cannot. The specified case is the only
possible guarantee. That does not mean PMDs should not do their best to
destroy flow rules, only that ordering must remain consistent in case of
inability to destroy one.

What do you suggest?

> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param flow
> > + *   Flow rule handle to destroy.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +int
> > +rte_flow_destroy(uint8_t port_id,
> > +		 struct rte_flow *flow,
> > +		 struct rte_flow_error *error);
> > +
> > +/**
> > + * Destroy all flow rules associated with a port.
> > + *
> > + * In the unlikely event of failure, handles are still considered destroyed
> > + * and no longer valid but the port must be assumed to be in an inconsistent
> > + * state.
> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +int
> > +rte_flow_flush(uint8_t port_id,
> > +	       struct rte_flow_error *error);
> 
> rte_flow_destroy_all() would be more descriptive (but breaks your style)

There are enough underscores as it is. I like flush, if enough people
complain we'll change it but it has to occur before the first public
release.

> > +
> > +/**
> > + * Query an existing flow rule.
> > + *
> > + * This function allows retrieving flow-specific data such as counters.
> > + * Data is gathered by special actions which must be present in the flow
> > + * rule definition.
> 
> re last sentence, it would be good if you can put a link to
> RTE_FLOW_ACTION_TYPE_COUNT

Will do, I did not know how until very recently.

> > + *
> > + * @param port_id
> > + *   Port identifier of Ethernet device.
> > + * @param flow
> > + *   Flow rule handle to query.
> > + * @param action
> > + *   Action type to query.
> > + * @param[in, out] data
> > + *   Pointer to storage for the associated query data type.
> 
> can this be anything other than rte_flow_query_count?

Likely in the future. I've only defined this one as a counterpart for
existing API functionality and because we wanted to expose it in mlx5.

> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is set.
> > + */
> > +int
> > +rte_flow_query(uint8_t port_id,
> > +	       struct rte_flow *flow,
> > +	       enum rte_flow_action_type action,
> > +	       void *data,
> > +	       struct rte_flow_error *error);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> 
> I don't see a way to dump all the rules for a port out. I think this is
> neccessary for degbugging. You could have a look through dpif.h in OVS
> and see how dpif_flow_dump_next() is used, it might be a good reference.

DPDK does not maintain flow rules and, depending on hardware capabilities
and level of compliance, PMDs do not necessarily do it either, particularly
since it requires space and application probably have a better method to
store these pointers for their own needs.

What you see here is only a PMD interface. Depending on applications needs,
generic helper functions built on top of these may be added to manage flow
rules in the future.

> Also, it would be nice if there were an api that would allow a test
> packet to be injected and traced for debugging - although I'm not
> exactly sure how well it could be traced. For reference:
> http://developers.redhat.com/blog/2016/10/12/tracing-packets-inside-open-vswitch/

Thanks for the link, I'm not sure how you'd do this either. Remember, as
generic as it looks, this interface is only meant to configure the
underlying device. You need to see it as one big offload, everything else
is left to applications.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API Adrien Mazarguil
  2016-11-18  6:36  0%     ` Xing, Beilei
@ 2016-11-30 17:47  0%     ` Kevin Traynor
  2016-12-01  8:36  2%       ` Adrien Mazarguil
  2016-12-08  9:00  0%     ` Xing, Beilei
  2 siblings, 1 reply; 200+ results
From: Kevin Traynor @ 2016-11-30 17:47 UTC (permalink / raw)
  To: Adrien Mazarguil, dev
  Cc: Thomas Monjalon, Pablo de Lara, Olivier Matz, sugesh.chandra

Hi Adrien,

On 11/16/2016 04:23 PM, Adrien Mazarguil wrote:
> This new API supersedes all the legacy filter types described in
> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
> PMDs to process and validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.

I'd suggest to add a deprecation notice to deprecation.rst, ideally with
a target release.

> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
>  MAINTAINERS                            |   4 +
>  lib/librte_ether/Makefile              |   3 +
>  lib/librte_ether/rte_eth_ctrl.h        |   1 +
>  lib/librte_ether/rte_ether_version.map |  10 +
>  lib/librte_ether/rte_flow.c            | 159 +++++
>  lib/librte_ether/rte_flow.h            | 947 ++++++++++++++++++++++++++++
>  lib/librte_ether/rte_flow_driver.h     | 177 ++++++
>  7 files changed, 1301 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d6bb8f8..3b46630 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -243,6 +243,10 @@ M: Thomas Monjalon <thomas.monjalon@6wind.com>
>  F: lib/librte_ether/
>  F: scripts/test-null.sh
>  
> +Generic flow API
> +M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> +F: lib/librte_ether/rte_flow*
> +
>  Crypto API
>  M: Declan Doherty <declan.doherty@intel.com>
>  F: lib/librte_cryptodev/
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index efe1e5f..9335361 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
>  LIBABIVER := 5
>  
>  SRCS-y += rte_ethdev.c
> +SRCS-y += rte_flow.c
>  
>  #
>  # Export include files
> @@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
>  SYMLINK-y-include += rte_ethdev.h
>  SYMLINK-y-include += rte_eth_ctrl.h
>  SYMLINK-y-include += rte_dev_info.h
> +SYMLINK-y-include += rte_flow.h
> +SYMLINK-y-include += rte_flow_driver.h
>  
>  # this lib depends upon:
>  DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
> diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
> index fe80eb0..8386904 100644
> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -99,6 +99,7 @@ enum rte_filter_type {
>  	RTE_ETH_FILTER_FDIR,
>  	RTE_ETH_FILTER_HASH,
>  	RTE_ETH_FILTER_L2_TUNNEL,
> +	RTE_ETH_FILTER_GENERIC,
>  	RTE_ETH_FILTER_MAX
>  };
>  
> diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
> index 72be66d..b5d2547 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -147,3 +147,13 @@ DPDK_16.11 {
>  	rte_eth_dev_pci_remove;
>  
>  } DPDK_16.07;
> +
> +DPDK_17.02 {
> +	global:
> +
> +	rte_flow_validate;
> +	rte_flow_create;
> +	rte_flow_destroy;
> +	rte_flow_query;
> +
> +} DPDK_16.11;
> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
> new file mode 100644
> index 0000000..064963d
> --- /dev/null
> +++ b/lib/librte_ether/rte_flow.c
> @@ -0,0 +1,159 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 6WIND S.A.
> + *   Copyright 2016 Mellanox.

There's Mellanox copyright but you are the only signed-off-by - is that
right?

> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include <rte_branch_prediction.h>
> +#include "rte_ethdev.h"
> +#include "rte_flow_driver.h"
> +#include "rte_flow.h"
> +
> +/* Get generic flow operations structure from a port. */
> +const struct rte_flow_ops *
> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops;
> +	int code;
> +
> +	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
> +		code = ENODEV;
> +	else if (unlikely(!dev->dev_ops->filter_ctrl ||
> +			  dev->dev_ops->filter_ctrl(dev,
> +						    RTE_ETH_FILTER_GENERIC,
> +						    RTE_ETH_FILTER_GET,
> +						    &ops) ||
> +			  !ops))
> +		code = ENOTSUP;
> +	else
> +		return ops;
> +	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			   NULL, rte_strerror(code));
> +	return NULL;
> +}
> +

Is it expected that the application or pmd will provide locking between
these functions if required? I think it's going to have to be the app.

> +/* Check whether a flow rule can be created on a given port. */
> +int
> +rte_flow_validate(uint8_t port_id,
> +		  const struct rte_flow_attr *attr,
> +		  const struct rte_flow_item pattern[],
> +		  const struct rte_flow_action actions[],
> +		  struct rte_flow_error *error)
> +{
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->validate))
> +		return ops->validate(dev, attr, pattern, actions, error);
> +	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			   NULL, rte_strerror(ENOTSUP));
> +	return -rte_errno;
> +}
> +
> +/* Create a flow rule on a given port. */
> +struct rte_flow *
> +rte_flow_create(uint8_t port_id,
> +		const struct rte_flow_attr *attr,
> +		const struct rte_flow_item pattern[],
> +		const struct rte_flow_action actions[],
> +		struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return NULL;
> +	if (likely(!!ops->create))
> +		return ops->create(dev, attr, pattern, actions, error);
> +	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			   NULL, rte_strerror(ENOTSUP));
> +	return NULL;
> +}
> +
> +/* Destroy a flow rule on a given port. */
> +int
> +rte_flow_destroy(uint8_t port_id,
> +		 struct rte_flow *flow,
> +		 struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->destroy))
> +		return ops->destroy(dev, flow, error);
> +	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			   NULL, rte_strerror(ENOTSUP));
> +	return -rte_errno;
> +}
> +
> +/* Destroy all flow rules associated with a port. */
> +int
> +rte_flow_flush(uint8_t port_id,
> +	       struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (unlikely(!ops))
> +		return -rte_errno;
> +	if (likely(!!ops->flush))
> +		return ops->flush(dev, error);
> +	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			   NULL, rte_strerror(ENOTSUP));
> +	return -rte_errno;
> +}
> +
> +/* Query an existing flow rule. */
> +int
> +rte_flow_query(uint8_t port_id,
> +	       struct rte_flow *flow,
> +	       enum rte_flow_action_type action,
> +	       void *data,
> +	       struct rte_flow_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
> +
> +	if (!ops)
> +		return -rte_errno;
> +	if (likely(!!ops->query))
> +		return ops->query(dev, flow, action, data, error);
> +	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			   NULL, rte_strerror(ENOTSUP));
> +	return -rte_errno;
> +}
> diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
> new file mode 100644
> index 0000000..211f307
> --- /dev/null
> +++ b/lib/librte_ether/rte_flow.h
> @@ -0,0 +1,947 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 6WIND S.A.
> + *   Copyright 2016 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef RTE_FLOW_H_
> +#define RTE_FLOW_H_
> +
> +/**
> + * @file
> + * RTE generic flow API
> + *
> + * This interface provides the ability to program packet matching and
> + * associated actions in hardware through flow rules.
> + */
> +
> +#include <rte_arp.h>
> +#include <rte_ether.h>
> +#include <rte_icmp.h>
> +#include <rte_ip.h>
> +#include <rte_sctp.h>
> +#include <rte_tcp.h>
> +#include <rte_udp.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Flow rule attributes.
> + *
> + * Priorities are set on two levels: per group and per rule within groups.
> + *
> + * Lower values denote higher priority, the highest priority for both levels
> + * is 0, so that a rule with priority 0 in group 8 is always matched after a
> + * rule with priority 8 in group 0.
> + *
> + * Although optional, applications are encouraged to group similar rules as
> + * much as possible to fully take advantage of hardware capabilities
> + * (e.g. optimized matching) and work around limitations (e.g. a single
> + * pattern type possibly allowed in a given group).
> + *
> + * Group and priority levels are arbitrary and up to the application, they
> + * do not need to be contiguous nor start from 0, however the maximum number
> + * varies between devices and may be affected by existing flow rules.
> + *
> + * If a packet is matched by several rules of a given group for a given
> + * priority level, the outcome is undefined. It can take any path, may be
> + * duplicated or even cause unrecoverable errors.

I get what you are trying to do here wrt supporting multiple
pmds/hardware implementations and it's a good idea to keep it flexible.

Given that the outcome is undefined, it would be nice that the
application has a way of finding the specific effects for verification
and debugging.

> + *
> + * Note that support for more than a single group and priority level is not
> + * guaranteed.
> + *
> + * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
> + *
> + * Several pattern items and actions are valid and can be used in both
> + * directions. Those valid for only one direction are described as such.
> + *
> + * Specifying both directions at once is not recommended but may be valid in
> + * some cases, such as incrementing the same counter twice.
> + *
> + * Not specifying any direction is currently an error.
> + */
> +struct rte_flow_attr {
> +	uint32_t group; /**< Priority group. */
> +	uint32_t priority; /**< Priority level within group. */
> +	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
> +	uint32_t egress:1; /**< Rule applies to egress traffic. */
> +	uint32_t reserved:30; /**< Reserved, must be zero. */
> +};
> +
> +/**
> + * Matching pattern item types.
> + *
> + * Items are arranged in a list to form a matching pattern for packets.
> + * They fall in two categories:
> + *
> + * - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, SCTP,
> + *   VXLAN and so on), usually associated with a specification
> + *   structure. These must be stacked in the same order as the protocol
> + *   layers to match, starting from L2.
> + *
> + * - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF, PORT
> + *   and so on), often without a specification structure. Since they are
> + *   meta data that does not match packet contents, these can be specified
> + *   anywhere within item lists without affecting the protocol matching
> + *   items.
> + *
> + * See the description of individual types for more information. Those
> + * marked with [META] fall into the second category.
> + */
> +enum rte_flow_item_type {
> +	/**
> +	 * [META]
> +	 *
> +	 * End marker for item lists. Prevents further processing of items,
> +	 * thereby ending the pattern.
> +	 *
> +	 * No associated specification structure.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_END,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Used as a placeholder for convenience. It is ignored and simply
> +	 * discarded by PMDs.
> +	 *
> +	 * No associated specification structure.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_VOID,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Inverted matching, i.e. process packets that do not match the
> +	 * pattern.
> +	 *
> +	 * No associated specification structure.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_INVERT,
> +
> +	/**
> +	 * Matches any protocol in place of the current layer, a single ANY
> +	 * may also stand for several protocol layers.
> +	 *
> +	 * See struct rte_flow_item_any.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_ANY,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches packets addressed to the physical function of the device.
> +	 *
> +	 * If the underlying device function differs from the one that would
> +	 * normally receive the matched traffic, specifying this item
> +	 * prevents it from reaching that device unless the flow rule
> +	 * contains a PF action. Packets are not duplicated between device
> +	 * instances by default.
> +	 *
> +	 * No associated specification structure.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_PF,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches packets addressed to a virtual function ID of the device.
> +	 *
> +	 * If the underlying device function differs from the one that would
> +	 * normally receive the matched traffic, specifying this item
> +	 * prevents it from reaching that device unless the flow rule
> +	 * contains a VF action. Packets are not duplicated between device
> +	 * instances by default.
> +	 *
> +	 * See struct rte_flow_item_vf.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_VF,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Matches packets coming from the specified physical port of the
> +	 * underlying device.
> +	 *
> +	 * The first PORT item overrides the physical port normally
> +	 * associated with the specified DPDK input port (port_id). This
> +	 * item can be provided several times to match additional physical
> +	 * ports.
> +	 *
> +	 * See struct rte_flow_item_port.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_PORT,
> +
> +	/**
> +	 * Matches a byte string of a given length at a given offset.
> +	 *
> +	 * See struct rte_flow_item_raw.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_RAW,
> +
> +	/**
> +	 * Matches an Ethernet header.
> +	 *
> +	 * See struct rte_flow_item_eth.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_ETH,
> +
> +	/**
> +	 * Matches an 802.1Q/ad VLAN tag.
> +	 *
> +	 * See struct rte_flow_item_vlan.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_VLAN,
> +
> +	/**
> +	 * Matches an IPv4 header.
> +	 *
> +	 * See struct rte_flow_item_ipv4.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_IPV4,
> +
> +	/**
> +	 * Matches an IPv6 header.
> +	 *
> +	 * See struct rte_flow_item_ipv6.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_IPV6,
> +
> +	/**
> +	 * Matches an ICMP header.
> +	 *
> +	 * See struct rte_flow_item_icmp.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_ICMP,
> +
> +	/**
> +	 * Matches a UDP header.
> +	 *
> +	 * See struct rte_flow_item_udp.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_UDP,
> +
> +	/**
> +	 * Matches a TCP header.
> +	 *
> +	 * See struct rte_flow_item_tcp.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_TCP,
> +
> +	/**
> +	 * Matches a SCTP header.
> +	 *
> +	 * See struct rte_flow_item_sctp.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_SCTP,
> +
> +	/**
> +	 * Matches a VXLAN header.
> +	 *
> +	 * See struct rte_flow_item_vxlan.
> +	 */
> +	RTE_FLOW_ITEM_TYPE_VXLAN,
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_ANY
> + *
> + * Matches any protocol in place of the current layer, a single ANY may also
> + * stand for several protocol layers.
> + *
> + * This is usually specified as the first pattern item when looking for a
> + * protocol anywhere in a packet.
> + *
> + * A maximum value of 0 requests matching any number of protocol layers
> + * above or equal to the minimum value, a maximum value lower than the
> + * minimum one is otherwise invalid.
> + *
> + * This type does not work with a range (struct rte_flow_item.last).
> + */
> +struct rte_flow_item_any {
> +	uint16_t min; /**< Minimum number of layers covered. */
> +	uint16_t max; /**< Maximum number of layers covered, 0 for infinity. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_VF
> + *
> + * Matches packets addressed to a virtual function ID of the device.
> + *
> + * If the underlying device function differs from the one that would
> + * normally receive the matched traffic, specifying this item prevents it
> + * from reaching that device unless the flow rule contains a VF
> + * action. Packets are not duplicated between device instances by default.
> + *
> + * - Likely to return an error or never match any traffic if this causes a
> + *   VF device to match traffic addressed to a different VF.
> + * - Can be specified multiple times to match traffic addressed to several
> + *   specific VFs.
> + * - Can be combined with a PF item to match both PF and VF traffic.
> + *
> + * A zeroed mask can be used to match any VF.

can you refer explicitly to id

> + */
> +struct rte_flow_item_vf {
> +	uint32_t id; /**< Destination VF ID. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_PORT
> + *
> + * Matches packets coming from the specified physical port of the underlying
> + * device.
> + *
> + * The first PORT item overrides the physical port normally associated with
> + * the specified DPDK input port (port_id). This item can be provided
> + * several times to match additional physical ports.
> + *
> + * Note that physical ports are not necessarily tied to DPDK input ports
> + * (port_id) when those are not under DPDK control. Possible values are
> + * specific to each device, they are not necessarily indexed from zero and
> + * may not be contiguous.
> + *
> + * As a device property, the list of allowed values as well as the value
> + * associated with a port_id should be retrieved by other means.
> + *
> + * A zeroed mask can be used to match any port index.
> + */
> +struct rte_flow_item_port {
> +	uint32_t index; /**< Physical port index. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_RAW
> + *
> + * Matches a byte string of a given length at a given offset.
> + *
> + * Offset is either absolute (using the start of the packet) or relative to
> + * the end of the previous matched item in the stack, in which case negative
> + * values are allowed.
> + *
> + * If search is enabled, offset is used as the starting point. The search
> + * area can be delimited by setting limit to a nonzero value, which is the
> + * maximum number of bytes after offset where the pattern may start.
> + *
> + * Matching a zero-length pattern is allowed, doing so resets the relative
> + * offset for subsequent items.
> + *
> + * This type does not work with a range (struct rte_flow_item.last).
> + */
> +struct rte_flow_item_raw {
> +	uint32_t relative:1; /**< Look for pattern after the previous item. */
> +	uint32_t search:1; /**< Search pattern from offset (see also limit). */
> +	uint32_t reserved:30; /**< Reserved, must be set to zero. */
> +	int32_t offset; /**< Absolute or relative offset for pattern. */
> +	uint16_t limit; /**< Search area limit for start of pattern. */
> +	uint16_t length; /**< Pattern length. */
> +	uint8_t pattern[]; /**< Byte string to look for. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_ETH
> + *
> + * Matches an Ethernet header.
> + */
> +struct rte_flow_item_eth {
> +	struct ether_addr dst; /**< Destination MAC. */
> +	struct ether_addr src; /**< Source MAC. */
> +	unsigned int type; /**< EtherType. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_VLAN
> + *
> + * Matches an 802.1Q/ad VLAN tag.
> + *
> + * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
> + * RTE_FLOW_ITEM_TYPE_VLAN.
> + */
> +struct rte_flow_item_vlan {
> +	uint16_t tpid; /**< Tag protocol identifier. */
> +	uint16_t tci; /**< Tag control information. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_IPV4
> + *
> + * Matches an IPv4 header.
> + *
> + * Note: IPv4 options are handled by dedicated pattern items.
> + */
> +struct rte_flow_item_ipv4 {
> +	struct ipv4_hdr hdr; /**< IPv4 header definition. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_IPV6.
> + *
> + * Matches an IPv6 header.
> + *
> + * Note: IPv6 options are handled by dedicated pattern items.
> + */
> +struct rte_flow_item_ipv6 {
> +	struct ipv6_hdr hdr; /**< IPv6 header definition. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_ICMP.
> + *
> + * Matches an ICMP header.
> + */
> +struct rte_flow_item_icmp {
> +	struct icmp_hdr hdr; /**< ICMP header definition. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_UDP.
> + *
> + * Matches a UDP header.
> + */
> +struct rte_flow_item_udp {
> +	struct udp_hdr hdr; /**< UDP header definition. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_TCP.
> + *
> + * Matches a TCP header.
> + */
> +struct rte_flow_item_tcp {
> +	struct tcp_hdr hdr; /**< TCP header definition. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_SCTP.
> + *
> + * Matches a SCTP header.
> + */
> +struct rte_flow_item_sctp {
> +	struct sctp_hdr hdr; /**< SCTP header definition. */
> +};
> +
> +/**
> + * RTE_FLOW_ITEM_TYPE_VXLAN.
> + *
> + * Matches a VXLAN header (RFC 7348).
> + */
> +struct rte_flow_item_vxlan {
> +	uint8_t flags; /**< Normally 0x08 (I flag). */
> +	uint8_t rsvd0[3]; /**< Reserved, normally 0x000000. */
> +	uint8_t vni[3]; /**< VXLAN identifier. */
> +	uint8_t rsvd1; /**< Reserved, normally 0x00. */
> +};
> +
> +/**
> + * Matching pattern item definition.
> + *
> + * A pattern is formed by stacking items starting from the lowest protocol
> + * layer to match. This stacking restriction does not apply to meta items
> + * which can be placed anywhere in the stack with no effect on the meaning
> + * of the resulting pattern.
> + *
> + * A stack is terminated by a END item.
> + *
> + * The spec field should be a valid pointer to a structure of the related
> + * item type. It may be set to NULL in many cases to use default values.
> + *
> + * Optionally, last can point to a structure of the same type to define an
> + * inclusive range. This is mostly supported by integer and address fields,
> + * may cause errors otherwise. Fields that do not support ranges must be set
> + * to the same value as their spec counterparts.
> + *
> + * By default all fields present in spec are considered relevant.* This

typo "*"

> + * behavior can be altered by providing a mask structure of the same type
> + * with applicable bits set to one. It can also be used to partially filter
> + * out specific fields (e.g. as an alternate mean to match ranges of IP
> + * addresses).
> + *
> + * Note this is a simple bit-mask applied before interpreting the contents
> + * of spec and last, which may yield unexpected results if not used
> + * carefully. For example, if for an IPv4 address field, spec provides
> + * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
> + * effective range is 10.1.0.0 to 10.3.255.255.
> + *
> + * * The defaults for data-matching items such as IPv4 when mask is not
> + *   specified actually depend on the underlying implementation since only
> + *   recognized fields can be taken into account.
> + */
> +struct rte_flow_item {
> +	enum rte_flow_item_type type; /**< Item type. */
> +	const void *spec; /**< Pointer to item specification structure. */
> +	const void *last; /**< Defines an inclusive range (spec to last). */
> +	const void *mask; /**< Bit-mask applied to spec and last. */
> +};
> +
> +/**
> + * Action types.
> + *
> + * Each possible action is represented by a type. Some have associated
> + * configuration structures. Several actions combined in a list can be
> + * affected to a flow rule. That list is not ordered.
> + *
> + * They fall in three categories:
> + *
> + * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
> + *   processing matched packets by subsequent flow rules, unless overridden
> + *   with PASSTHRU.
> + *
> + * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
> + *   for additional processing by subsequent flow rules.
> + *
> + * - Other non terminating meta actions that do not affect the fate of
> + *   packets (END, VOID, MARK, FLAG, COUNT).
> + *
> + * When several actions are combined in a flow rule, they should all have
> + * different types (e.g. dropping a packet twice is not possible). The
> + * defined behavior is for PMDs to only take into account the last action of
> + * a given type found in the list. PMDs still perform error checking on the
> + * entire list.

why do you define that the pmd will interpret multiple same type rules
in this way...would it not make more sense for the pmd to just return
EINVAL for an invalid set of rules? It seems more transparent for the
application.

> + *
> + * Note that PASSTHRU is the only action able to override a terminating
> + * rule.
> + */
> +enum rte_flow_action_type {
> +	/**
> +	 * [META]
> +	 *
> +	 * End marker for action lists. Prevents further processing of
> +	 * actions, thereby ending the list.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_END,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Used as a placeholder for convenience. It is ignored and simply
> +	 * discarded by PMDs.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_VOID,
> +
> +	/**
> +	 * Leaves packets up for additional processing by subsequent flow
> +	 * rules. This is the default when a rule does not contain a
> +	 * terminating action, but can be specified to force a rule to
> +	 * become non-terminating.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_PASSTHRU,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Attaches a 32 bit value to packets.
> +	 *
> +	 * See struct rte_flow_action_mark.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_MARK,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Flag packets. Similar to MARK but only affects ol_flags.
> +	 *
> +	 * Note: a distinctive flag must be defined for it.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_FLAG,
> +
> +	/**
> +	 * Assigns packets to a given queue index.
> +	 *
> +	 * See struct rte_flow_action_queue.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_QUEUE,
> +
> +	/**
> +	 * Drops packets.
> +	 *
> +	 * PASSTHRU overrides this action if both are specified.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_DROP,
> +
> +	/**
> +	 * [META]
> +	 *
> +	 * Enables counters for this rule.
> +	 *
> +	 * These counters can be retrieved and reset through rte_flow_query(),
> +	 * see struct rte_flow_query_count.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_COUNT,
> +
> +	/**
> +	 * Duplicates packets to a given queue index.
> +	 *
> +	 * This is normally combined with QUEUE, however when used alone, it
> +	 * is actually similar to QUEUE + PASSTHRU.
> +	 *
> +	 * See struct rte_flow_action_dup.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_DUP,
> +
> +	/**
> +	 * Similar to QUEUE, except RSS is additionally performed on packets
> +	 * to spread them among several queues according to the provided
> +	 * parameters.
> +	 *
> +	 * See struct rte_flow_action_rss.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_RSS,
> +
> +	/**
> +	 * Redirects packets to the physical function (PF) of the current
> +	 * device.
> +	 *
> +	 * No associated configuration structure.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_PF,
> +
> +	/**
> +	 * Redirects packets to the virtual function (VF) of the current
> +	 * device with the specified ID.
> +	 *
> +	 * See struct rte_flow_action_vf.
> +	 */
> +	RTE_FLOW_ACTION_TYPE_VF,
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_MARK
> + *
> + * Attaches a 32 bit value to packets.
> + *
> + * This value is arbitrary and application-defined. For compatibility with
> + * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
> + * also set in ol_flags.
> + */
> +struct rte_flow_action_mark {
> +	uint32_t id; /**< 32 bit value to return with packets. */
> +};

One use case I thought we would be able to do for OVS is classification
in hardware and the unique flow id is sent with the packet to software.
But in OVS the ufid is 128 bits, so it means we can't and there is still
the miniflow extract overhead. I'm not sure if there is a practical way
around this.

Sugesh (cc'd) has looked at this before and may be able to comment or
correct me.

> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_QUEUE
> + *
> + * Assign packets to a given queue index.
> + *
> + * Terminating by default.
> + */
> +struct rte_flow_action_queue {
> +	uint16_t index; /**< Queue index to use. */
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_COUNT (query)
> + *
> + * Query structure to retrieve and reset flow rule counters.
> + */
> +struct rte_flow_query_count {
> +	uint32_t reset:1; /**< Reset counters after query [in]. */
> +	uint32_t hits_set:1; /**< hits field is set [out]. */
> +	uint32_t bytes_set:1; /**< bytes field is set [out]. */
> +	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
> +	uint64_t hits; /**< Number of hits for this rule [out]. */
> +	uint64_t bytes; /**< Number of bytes through this rule [out]. */
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_DUP
> + *
> + * Duplicates packets to a given queue index.
> + *
> + * This is normally combined with QUEUE, however when used alone, it is
> + * actually similar to QUEUE + PASSTHRU.
> + *
> + * Non-terminating by default.
> + */
> +struct rte_flow_action_dup {
> +	uint16_t index; /**< Queue index to duplicate packets to. */
> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_RSS
> + *
> + * Similar to QUEUE, except RSS is additionally performed on packets to
> + * spread them among several queues according to the provided parameters.
> + *
> + * Note: RSS hash result is normally stored in the hash.rss mbuf field,
> + * however it conflicts with the MARK action as they share the same
> + * space. When both actions are specified, the RSS hash is discarded and
> + * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
> + * structure should eventually evolve to store both.
> + *
> + * Terminating by default.
> + */
> +struct rte_flow_action_rss {
> +	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
> +	uint16_t queues; /**< Number of entries in queue[]. */
> +	uint16_t queue[]; /**< Queues indices to use. */

I'd try and avoid queue and queues - someone will say "huh?" when
reading code. s/queues/num ?

> +};
> +
> +/**
> + * RTE_FLOW_ACTION_TYPE_VF
> + *
> + * Redirects packets to a virtual function (VF) of the current device.
> + *
> + * Packets matched by a VF pattern item can be redirected to their original
> + * VF ID instead of the specified one. This parameter may not be available
> + * and is not guaranteed to work properly if the VF part is matched by a
> + * prior flow rule or if packets are not addressed to a VF in the first
> + * place.

Not clear what you mean by "not guaranteed to work if...". Please return
fail when this action is used if this is not going to work.

> + *
> + * Terminating by default.
> + */
> +struct rte_flow_action_vf {
> +	uint32_t original:1; /**< Use original VF ID if possible. */
> +	uint32_t reserved:31; /**< Reserved, must be zero. */
> +	uint32_t id; /**< VF ID to redirect packets to. */
> +};
> +
> +/**
> + * Definition of a single action.
> + *
> + * A list of actions is terminated by a END action.
> + *
> + * For simple actions without a configuration structure, conf remains NULL.
> + */
> +struct rte_flow_action {
> +	enum rte_flow_action_type type; /**< Action type. */
> +	const void *conf; /**< Pointer to action configuration structure. */
> +};
> +
> +/**
> + * Opaque type returned after successfully creating a flow.
> + *
> + * This handle can be used to manage and query the related flow (e.g. to
> + * destroy it or retrieve counters).
> + */
> +struct rte_flow;
> +
> +/**
> + * Verbose error types.
> + *
> + * Most of them provide the type of the object referenced by struct
> + * rte_flow_error.cause.
> + */
> +enum rte_flow_error_type {
> +	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
> +	RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
> +	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
> +	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
> +	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
> +	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
> +	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
> +	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
> +	RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
> +	RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
> +	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
> +	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
> +};
> +
> +/**
> + * Verbose error structure definition.
> + *
> + * This object is normally allocated by applications and set by PMDs, the
> + * message points to a constant string which does not need to be freed by
> + * the application, however its pointer can be considered valid only as long
> + * as its associated DPDK port remains configured. Closing the underlying
> + * device or unloading the PMD invalidates it.
> + *
> + * Both cause and message may be NULL regardless of the error type.
> + */
> +struct rte_flow_error {
> +	enum rte_flow_error_type type; /**< Cause field and error types. */
> +	const void *cause; /**< Object responsible for the error. */
> +	const char *message; /**< Human-readable error message. */
> +};
> +
> +/**
> + * Check whether a flow rule can be created on a given port.
> + *
> + * While this function has no effect on the target device, the flow rule is
> + * validated against its current configuration state and the returned value
> + * should be considered valid by the caller for that state only.
> + *
> + * The returned value is guaranteed to remain valid only as long as no
> + * successful calls to rte_flow_create() or rte_flow_destroy() are made in
> + * the meantime and no device parameter affecting flow rules in any way are
> + * modified, due to possible collisions or resource limitations (although in
> + * such cases EINVAL should not be returned).
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] attr
> + *   Flow rule attributes.
> + * @param[in] pattern
> + *   Pattern specification (list terminated by the END pattern item).
> + * @param[in] actions
> + *   Associated actions (list terminated by the END action).
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 if flow rule is valid and can be created. A negative errno value
> + *   otherwise (rte_errno is also set), the following errors are defined:
> + *
> + *   -ENOSYS: underlying device does not support this functionality.
> + *
> + *   -EINVAL: unknown or invalid rule specification.
> + *
> + *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
> + *   bit-masks are unsupported).
> + *
> + *   -EEXIST: collision with an existing rule.
> + *
> + *   -ENOMEM: not enough resources.
> + *
> + *   -EBUSY: action cannot be performed due to busy device resources, may
> + *   succeed if the affected queues or even the entire port are in a stopped
> + *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
> + */
> +int
> +rte_flow_validate(uint8_t port_id,
> +		  const struct rte_flow_attr *attr,
> +		  const struct rte_flow_item pattern[],
> +		  const struct rte_flow_action actions[],
> +		  struct rte_flow_error *error);

Why not just use rte_flow_create() and get an error? Is it less
disruptive to do a validate and find the rule cannot be created, than
using a create directly?

> +
> +/**
> + * Create a flow rule on a given port.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[in] attr
> + *   Flow rule attributes.
> + * @param[in] pattern
> + *   Pattern specification (list terminated by the END pattern item).
> + * @param[in] actions
> + *   Associated actions (list terminated by the END action).
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   A valid handle in case of success, NULL otherwise and rte_errno is set
> + *   to the positive version of one of the error codes defined for
> + *   rte_flow_validate().
> + */
> +struct rte_flow *
> +rte_flow_create(uint8_t port_id,
> +		const struct rte_flow_attr *attr,
> +		const struct rte_flow_item pattern[],
> +		const struct rte_flow_action actions[],
> +		struct rte_flow_error *error);

General question - are these functions threadsafe? In the OVS example
you could have several threads wanting to create flow rules at the same
time for same or different ports.

> +
> +/**
> + * Destroy a flow rule on a given port.
> + *
> + * Failure to destroy a flow rule handle may occur when other flow rules
> + * depend on it, and destroying it would result in an inconsistent state.
> + *
> + * This function is only guaranteed to succeed if handles are destroyed in
> + * reverse order of their creation.

How can the application find this information out on error?

> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param flow
> + *   Flow rule handle to destroy.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +rte_flow_destroy(uint8_t port_id,
> +		 struct rte_flow *flow,
> +		 struct rte_flow_error *error);
> +
> +/**
> + * Destroy all flow rules associated with a port.
> + *
> + * In the unlikely event of failure, handles are still considered destroyed
> + * and no longer valid but the port must be assumed to be in an inconsistent
> + * state.
> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +rte_flow_flush(uint8_t port_id,
> +	       struct rte_flow_error *error);

rte_flow_destroy_all() would be more descriptive (but breaks your style)

> +
> +/**
> + * Query an existing flow rule.
> + *
> + * This function allows retrieving flow-specific data such as counters.
> + * Data is gathered by special actions which must be present in the flow
> + * rule definition.

re last sentence, it would be good if you can put a link to
RTE_FLOW_ACTION_TYPE_COUNT

> + *
> + * @param port_id
> + *   Port identifier of Ethernet device.
> + * @param flow
> + *   Flow rule handle to query.
> + * @param action
> + *   Action type to query.
> + * @param[in, out] data
> + *   Pointer to storage for the associated query data type.

can this be anything other than rte_flow_query_count?

> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +rte_flow_query(uint8_t port_id,
> +	       struct rte_flow *flow,
> +	       enum rte_flow_action_type action,
> +	       void *data,
> +	       struct rte_flow_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif

I don't see a way to dump all the rules for a port out. I think this is
neccessary for degbugging. You could have a look through dpif.h in OVS
and see how dpif_flow_dump_next() is used, it might be a good reference.

Also, it would be nice if there were an api that would allow a test
packet to be injected and traced for debugging - although I'm not
exactly sure how well it could be traced. For reference:
http://developers.redhat.com/blog/2016/10/12/tracing-packets-inside-open-vswitch/

thanks,
Kevin.

> +
> +#endif /* RTE_FLOW_H_ */
> diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
> new file mode 100644
> index 0000000..a88c621
> --- /dev/null
> +++ b/lib/librte_ether/rte_flow_driver.h
> @@ -0,0 +1,177 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2016 6WIND S.A.
> + *   Copyright 2016 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef RTE_FLOW_DRIVER_H_
> +#define RTE_FLOW_DRIVER_H_
> +
> +/**
> + * @file
> + * RTE generic flow API (driver side)
> + *
> + * This file provides implementation helpers for internal use by PMDs, they
> + * are not intended to be exposed to applications and are not subject to ABI
> + * versioning.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include "rte_flow.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Generic flow operations structure implemented and returned by PMDs.
> + *
> + * To implement this API, PMDs must handle the RTE_ETH_FILTER_GENERIC filter
> + * type in their .filter_ctrl callback function (struct eth_dev_ops) as well
> + * as the RTE_ETH_FILTER_GET filter operation.
> + *
> + * If successful, this operation must result in a pointer to a PMD-specific
> + * struct rte_flow_ops written to the argument address as described below:
> + *
> + *  // PMD filter_ctrl callback
> + *
> + *  static const struct rte_flow_ops pmd_flow_ops = { ... };
> + *
> + *  switch (filter_type) {
> + *  case RTE_ETH_FILTER_GENERIC:
> + *      if (filter_op != RTE_ETH_FILTER_GET)
> + *          return -EINVAL;
> + *      *(const void **)arg = &pmd_flow_ops;
> + *      return 0;
> + *  }
> + *
> + * See also rte_flow_ops_get().
> + *
> + * These callback functions are not supposed to be used by applications
> + * directly, which must rely on the API defined in rte_flow.h.
> + *
> + * Public-facing wrapper functions perform a few consistency checks so that
> + * unimplemented (i.e. NULL) callbacks simply return -ENOTSUP. These
> + * callbacks otherwise only differ by their first argument (with port ID
> + * already resolved to a pointer to struct rte_eth_dev).
> + */
> +struct rte_flow_ops {
> +	/** See rte_flow_validate(). */
> +	int (*validate)
> +		(struct rte_eth_dev *,
> +		 const struct rte_flow_attr *,
> +		 const struct rte_flow_item [],
> +		 const struct rte_flow_action [],
> +		 struct rte_flow_error *);
> +	/** See rte_flow_create(). */
> +	struct rte_flow *(*create)
> +		(struct rte_eth_dev *,
> +		 const struct rte_flow_attr *,
> +		 const struct rte_flow_item [],
> +		 const struct rte_flow_action [],
> +		 struct rte_flow_error *);
> +	/** See rte_flow_destroy(). */
> +	int (*destroy)
> +		(struct rte_eth_dev *,
> +		 struct rte_flow *,
> +		 struct rte_flow_error *);
> +	/** See rte_flow_flush(). */
> +	int (*flush)
> +		(struct rte_eth_dev *,
> +		 struct rte_flow_error *);
> +	/** See rte_flow_query(). */
> +	int (*query)
> +		(struct rte_eth_dev *,
> +		 struct rte_flow *,
> +		 enum rte_flow_action_type,
> +		 void *,
> +		 struct rte_flow_error *);
> +};
> +
> +/**
> + * Initialize generic flow error structure.
> + *
> + * This function also sets rte_errno to a given value.
> + *
> + * @param[out] error
> + *   Pointer to flow error structure (may be NULL).
> + * @param code
> + *   Related error code (rte_errno).
> + * @param type
> + *   Cause field and error types.
> + * @param cause
> + *   Object responsible for the error.
> + * @param message
> + *   Human-readable error message.
> + *
> + * @return
> + *   Pointer to flow error structure.
> + */
> +static inline struct rte_flow_error *
> +rte_flow_error_set(struct rte_flow_error *error,
> +		   int code,
> +		   enum rte_flow_error_type type,
> +		   void *cause,
> +		   const char *message)
> +{
> +	if (error) {
> +		*error = (struct rte_flow_error){
> +			.type = type,
> +			.cause = cause,
> +			.message = message,
> +		};
> +	}
> +	rte_errno = code;
> +	return error;
> +}
> +
> +/**
> + * Get generic flow operations structure from a port.
> + *
> + * @param port_id
> + *   Port identifier to query.
> + * @param[out] error
> + *   Pointer to flow error structure.
> + *
> + * @return
> + *   The flow operations structure associated with port_id, NULL in case of
> + *   error, in which case rte_errno is set and the error structure contains
> + *   additional details.
> + */
> +const struct rte_flow_ops *
> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_FLOW_DRIVER_H_ */
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC PATCH 6/6] eal: removing eth_driver
  2016-11-17 12:53  4%   ` Jan Blunck
@ 2016-11-18 13:05  3%     ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-11-18 13:05 UTC (permalink / raw)
  To: Jan Blunck; +Cc: David Marchand, dev

sorry for delay in responding; somehow I didn't notice this email.

On Thursday 17 November 2016 06:23 PM, Jan Blunck wrote:
> On Thu, Nov 17, 2016 at 6:30 AM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>> This patch demonstrates how eth_driver can be replaced with appropriate
>> changes for rte_xxx_driver from the PMD itself. It uses ixgbe_ethernet as
>> an example.
>>
>> A large set of changes exists in the rte_ethdev.c - primarily because too
>> much PCI centric code (names, assumption of rte_pci_device) still exists
>> in it. Most, except symbol naming, has been changed in this patch.
>>
>> This proposes that:
>>  - PMD would declare the rte_xxx_driver. In case of ixgbe, it would be
>>    rte_pci_driver.
>>  - Probe and remove continue to exists in rte_pci_driver. But, the
>>    rte_driver has new hooks for init and uninit. The rationale is that
>>    once a ethernet or cryto device is created, the rte_driver->init would
>>    be responsible for initializing the device.
>>    -- Eth_dev -> rte_driver -> rte_pci_driver
>>                           |    `-> probe/remove
>>                           `--> init/uninit
>
> Hmm, from my perspective this moves struct rte_driver a step closer to
> struct rte_eth_dev instead of decoupling them. It is up to the
> rte_driver->probe if it wants to allocate a struct rte_eth_dev,
> rte_crypto_dev or the famous rte_foo_dev.

That 'closeness' was my intention - to make rte_eth_dev an 
implementation of rte_device type.

rte_eth_dev == rte_cryptodev == rte_anyother_functional_device
- for the above context. All would include rte_device.

As for rte_driver->probe(), it still comes in the rte_driver->init()'s 
role to initialize the 'generic' functional device associated with the 
driver. And, allowing bus specific driver (like PCI) for its individual 
initialization using rte_xxx_driver->probe.

>
> Instead of explicitly modelling rte_eth_dev specifics like init, unit
> or dev_private_size I think we should delegate this to the
> rte_driver->probe instead. Most of what is in rte_eth_dev_pci_probe()
> today is anyway a rte_eth_dev_allocate_priv() anyway. I already have
> some patches in this area in my patch stack.

Can be done - either way rte_pci_driver->probe() ends up calling 
driver->init() (or erstwhile eth_driver->eth_dev_init()).

But, I still think it is better to keep them separate.
A PCI device is type of rte_device, physically.
A ethernet device is type of rte_device, logically.
They both should exist independently. It will help in splitting the 
functionality from physical layout in future - if need be.

>
>
>>  - necessary changes in the rte_eth_dev have also been done so that it
>>    refers to the rte_device and rte_driver rather than rte_xxx_*. This
>>    would imply, ethernet device is 'linked' to a rte_device/rte_driver
>>    which in turn is a rte_xxx_device/rte_xxx_driver type.
>>    - for all operations related to extraction relvant xxx type,
>>      container_of would have to be used.
>>
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> ---
>>  drivers/net/ixgbe/ixgbe_ethdev.c | 49 +++++++++++++++++++++-------------------
>>  lib/librte_ether/rte_ethdev.c    | 36 +++++++++++++++++------------
>>  lib/librte_ether/rte_ethdev.h    |  6 ++---
>>  3 files changed, 51 insertions(+), 40 deletions(-)
>>
>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
>> index edc9b22..acead31 100644
>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>> @@ -1419,7 +1419,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
>>                 return 0;
>>         }
>>
>> -       pci_dev = eth_dev->pci_dev;
>> +       pci_dev = container_of(eth_dev->device, struct rte_pci_device, device);
>>
>>         rte_eth_copy_pci_info(eth_dev, pci_dev);
>>
>> @@ -1532,7 +1532,9 @@ static int
>>  eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
>>  {
>>         struct ixgbe_hw *hw;
>> -       struct rte_pci_device *pci_dev = eth_dev->pci_dev;
>> +       struct rte_pci_device *pci_dev;
>> +
>> +       pci_dev = container_of(eth_dev->device, struct rte_pci_device, device);
>>
>>         PMD_INIT_FUNC_TRACE();
>>
>> @@ -1562,32 +1564,33 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
>>         return 0;
>>  }
>>
>> -static struct eth_driver rte_ixgbe_pmd = {
>> -       .pci_drv = {
>> -               .id_table = pci_id_ixgbe_map,
>> -               .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
>> -                       RTE_PCI_DRV_DETACHABLE,
>> -               .probe = rte_eth_dev_pci_probe,
>> -               .remove = rte_eth_dev_pci_remove,
>> +static struct rte_pci_driver rte_ixgbe_pci_driver = {
>> +       .id_table = pci_id_ixgbe_map,
>> +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
>> +                    RTE_PCI_DRV_DETACHABLE,
>> +       .probe = rte_eth_dev_pci_probe,
>> +       .remove = rte_eth_dev_pci_remove,
>> +       .driver = {
>> +               .driver_init_t= eth_ixgbe_dev_init,
>> +               .driver_uninit_t= eth_ixgbe_dev_uninit,
>> +               .dev_private_size = sizeof(struct ixgbe_adapter),
>>         },
>> -       .eth_dev_init = eth_ixgbe_dev_init,
>> -       .eth_dev_uninit = eth_ixgbe_dev_uninit,
>> -       .dev_private_size = sizeof(struct ixgbe_adapter),
>>  };
>>
>>  /*
>>   * virtual function driver struct
>>   */
>> -static struct eth_driver rte_ixgbevf_pmd = {
>> -       .pci_drv = {
>> -               .id_table = pci_id_ixgbevf_map,
>> -               .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_DETACHABLE,
>> -               .probe = rte_eth_dev_pci_probe,
>> -               .remove = rte_eth_dev_pci_remove,
>> +static struct rte_pci_driver rte_ixgbevf_pci_driver = {
>> +       .id_table = pci_id_ixgbevf_map,
>> +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_DETACHABLE,
>> +       .probe = rte_eth_dev_pci_probe,
>> +       .remove = rte_eth_dev_pci_remove,
>> +       .driver = {
>> +               /* rte_driver hooks */
>> +               .init = eth_ixgbevf_dev_init,
>> +               .uninit = eth_ixgbevf_dev_uninit,
>> +               .dev_private_size = sizeof(struct ixgbe_adapter),
>>         },
>> -       .eth_dev_init = eth_ixgbevf_dev_init,
>> -       .eth_dev_uninit = eth_ixgbevf_dev_uninit,
>> -       .dev_private_size = sizeof(struct ixgbe_adapter),
>>  };
>>
>>  static int
>> @@ -7592,7 +7595,7 @@ ixgbevf_dev_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
>>         ixgbevf_dev_interrupt_action(dev);
>>  }
>>
>> -RTE_PMD_REGISTER_PCI(net_ixgbe, rte_ixgbe_pmd.pci_drv);
>> +RTE_PMD_REGISTER_PCI(net_ixgbe, rte_ixgbe_pci_driver);
>>  RTE_PMD_REGISTER_PCI_TABLE(net_ixgbe, pci_id_ixgbe_map);
>> -RTE_PMD_REGISTER_PCI(net_ixgbe_vf, rte_ixgbevf_pmd.pci_drv);
>> +RTE_PMD_REGISTER_PCI(net_ixgbe_vf, rte_ixgbevf_pci_driver);
>>  RTE_PMD_REGISTER_PCI_TABLE(net_ixgbe_vf, pci_id_ixgbevf_map);
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index fde8112..3535ff4 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -235,13 +235,13 @@ int
>>  rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>>                       struct rte_pci_device *pci_dev)
>>  {
>> -       struct eth_driver    *eth_drv;
>> +       struct rte_driver *drv;
>>         struct rte_eth_dev *eth_dev;
>>         char ethdev_name[RTE_ETH_NAME_MAX_LEN];
>>
>>         int diag;
>>
>> -       eth_drv = (struct eth_driver *)pci_drv;
>> +       drv = pci_drv->driver;
>>
>>         rte_eal_pci_device_name(&pci_dev->addr, ethdev_name,
>>                         sizeof(ethdev_name));
>> @@ -252,13 +252,13 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>>
>>         if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>>                 eth_dev->data->dev_private = rte_zmalloc("ethdev private structure",
>> -                                 eth_drv->dev_private_size,
>> +                                 drv->dev_private_size,
>>                                   RTE_CACHE_LINE_SIZE);
>>                 if (eth_dev->data->dev_private == NULL)
>>                         rte_panic("Cannot allocate memzone for private port data\n");
>>         }
>> -       eth_dev->pci_dev = pci_dev;
>> -       eth_dev->driver = eth_drv;
>> +       eth_dev->device = pci_dev->device;
>> +       eth_dev->driver = drv;
>>         eth_dev->data->rx_mbuf_alloc_failed = 0;
>>
>>         /* init user callbacks */
>> @@ -270,7 +270,7 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>>         eth_dev->data->mtu = ETHER_MTU;
>>
>>         /* Invoke PMD device initialization function */
>> -       diag = (*eth_drv->eth_dev_init)(eth_dev);
>> +       diag = (*drv->init)(eth_dev);
>>         if (diag == 0)
>>                 return 0;
>>
>> @@ -287,7 +287,7 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>>  int
>>  rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>>  {
>> -       const struct eth_driver *eth_drv;
>> +       const struct rte_driver *drv;
>>         struct rte_eth_dev *eth_dev;
>>         char ethdev_name[RTE_ETH_NAME_MAX_LEN];
>>         int ret;
>> @@ -302,11 +302,11 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>>         if (eth_dev == NULL)
>>                 return -ENODEV;
>>
>> -       eth_drv = (const struct eth_driver *)pci_dev->driver;
>> +       drv = pci_dev->driver;
>>
>>         /* Invoke PMD device uninit function */
>> -       if (*eth_drv->eth_dev_uninit) {
>> -               ret = (*eth_drv->eth_dev_uninit)(eth_dev);
>> +       if (*drv->uninit) {
>> +               ret = (*drv->uninit)(eth_dev);
>>                 if (ret)
>>                         return ret;
>>         }
>> @@ -317,7 +317,7 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>>         if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>>                 rte_free(eth_dev->data->dev_private);
>>
>> -       eth_dev->pci_dev = NULL;
>> +       eth_dev->device = NULL;
>>         eth_dev->driver = NULL;
>>         eth_dev->data = NULL;
>>
>> @@ -1556,7 +1556,7 @@ rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
>>
>>         RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>>         (*dev->dev_ops->dev_infos_get)(dev, dev_info);
>> -       dev_info->pci_dev = dev->pci_dev;
>> +       dev_info->device = dev->device;
>>         dev_info->driver_name = dev->data->drv_name;
>>         dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>>         dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>> @@ -2537,6 +2537,7 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
>>  {
>>         uint32_t vec;
>>         struct rte_eth_dev *dev;
>> +       struct rte_pci_device *pci_dev;
>>         struct rte_intr_handle *intr_handle;
>>         uint16_t qid;
>>         int rc;
>> @@ -2544,6 +2545,10 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
>>         RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>
>>         dev = &rte_eth_devices[port_id];
>> +       /* TODO intr_handle is currently in rte_pci_device;
>> +        * Below is incorrect until that time
>> +        */
>> +       pci_dev = container_of(dev->device, struct rte_pci_device, device);
>>         intr_handle = &dev->pci_dev->intr_handle;
>>         if (!intr_handle->intr_vec) {
>>                 RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
>> @@ -2572,7 +2577,7 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>         const struct rte_memzone *mz;
>>
>>         snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
>> -                dev->driver->pci_drv.driver.name, ring_name,
>> +                dev->driver->name, ring_name,
>>                  dev->data->port_id, queue_id);
>>
>>         mz = rte_memzone_lookup(z_name);
>> @@ -2593,6 +2598,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
>>  {
>>         uint32_t vec;
>>         struct rte_eth_dev *dev;
>> +       struct rte_pci_device *pci_dev;
>>         struct rte_intr_handle *intr_handle;
>>         int rc;
>>
>> @@ -2604,7 +2610,9 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
>>                 return -EINVAL;
>>         }
>>
>> -       intr_handle = &dev->pci_dev->intr_handle;
>> +       /* TODO; Until intr_handle is available in rte_device, below is incorrect */
>> +       pci_dev = container_of(dev->device, struct rte_pci_device, device);
>> +       intr_handle = &pci_dev->intr_handle;
>>         if (!intr_handle->intr_vec) {
>>                 RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
>>                 return -EPERM;
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index 38641e8..2b1d826 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -876,7 +876,7 @@ struct rte_eth_conf {
>>   * Ethernet device information
>>   */
>>  struct rte_eth_dev_info {
>> -       struct rte_pci_device *pci_dev; /**< Device PCI information. */
>> +       struct rte_device *device; /**< Device PCI information. */
>
> We already the situation that virtual devices don't set the pci_dev
> field. I wonder if it really makes sense to replace it with a struct
> rte_device because that is not adding a lot of value (only numa_node).

Sorry, I couldn't understand which way you are pointing:
  - continuing with 'rte_pci_device' in rte_eth_dev_info.
  - completely removing both, rte_pci_device and rte_device

In either case, I am ok. I went through the code usage of 
rte_eth_dev_info and it is mostly being used for getting information. I 
couldn't point out a situation where based on available info 
(rte_eth_dev_info), we need to extract back the device it is 
representing. Is that understanding correct?

If yes, I can remove this (after checking that this member is not being 
used).

> If we break ABI we might want to add numa_node and dev_flags as
> suggested by Stephen Hemminger already. If we choose to not break ABI
> we can delegate the population of the pci_dev field to dev_infos_get.
> I already have that patch in my patch stack too.

We can't avoid the ABI breakage - it is anyway going to happen.
As for 'dev_flags', I am assuming you are referring to moving 
'drv_flags' from rte_pci_driver. And you are suggesting moving that to 
'rte_driver' - is that correct understanding?

I don't know if drv_flags have any significance in rte_device. I thought 
they are driver specific flags (mmap, etc). Or, maybe they are just 
placed in driver for acting on all compatible devices.

>
> The problem with rte_eth_dev_info is that it doesn't support
> extensions. Maybe its time to add rte_eth_dev_info_ex() ... or
> rte_eth_dev_xinfo() if you don't like the IB verbs API.

I have no idea about IB verbs. And as for extensions, I will have to see 
- I don't prefer mixing that with current set. Though, the idea is nice.

>
>
>>         const char *driver_name; /**< Device Driver name. */
>>         unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>>                 Use if_indextoname() to translate into an interface name. */
>> @@ -1623,9 +1623,9 @@ struct rte_eth_dev {
>>         eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>>         eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
>>         struct rte_eth_dev_data *data;  /**< Pointer to device data */
>> -       const struct eth_driver *driver;/**< Driver for this device */
>> +       const struct rte_driver *driver;/**< Driver for this device */
>>         const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
>> -       struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
>> +       struct rte_device *device; /**< Device instance */
>>         /** User application callbacks for NIC interrupts */
>>         struct rte_eth_dev_cb_list link_intr_cbs;
>>         /**
>> --
>> 2.7.4
>>
>

-
Shreyansh

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-11-18  6:36  0%     ` Xing, Beilei
@ 2016-11-18 10:28  3%       ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-11-18 10:28 UTC (permalink / raw)
  To: Xing, Beilei; +Cc: dev, Thomas Monjalon, De Lara Guarch, Pablo, Olivier Matz

Hi Beilei,

On Fri, Nov 18, 2016 at 06:36:31AM +0000, Xing, Beilei wrote:
> Hi Adrien,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Thursday, November 17, 2016 12:23 AM
> > To: dev@dpdk.org
> > Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; De Lara Guarch,
> > Pablo <pablo.de.lara.guarch@intel.com>; Olivier Matz
> > <olivier.matz@6wind.com>
> > Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> > 
> > This new API supersedes all the legacy filter types described in rte_eth_ctrl.h.
> > It is slightly higher level and as a result relies more on PMDs to process and
> > validate flow rules.
> > 
> > Benefits:
> > 
> > - A unified API is easier to program for, applications do not have to be
> >   written for a specific filter type which may or may not be supported by
> >   the underlying device.
> > 
> > - The behavior of a flow rule is the same regardless of the underlying
> >   device, applications do not need to be aware of hardware quirks.
> > 
> > - Extensible by design, API/ABI breakage should rarely occur if at all.
> > 
> > - Documentation is self-standing, no need to look up elsewhere.
> > 
> > Existing filter types will be deprecated and removed in the near future.
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> 
> 
> > +
> > +/**
> > + * Opaque type returned after successfully creating a flow.
> > + *
> > + * This handle can be used to manage and query the related flow (e.g.
> > +to
> > + * destroy it or retrieve counters).
> > + */
> > +struct rte_flow;
> > +
> 
> As we talked before, we use attr/pattern/actions to create and destroy a flow in PMD, 
> but I don't think it's easy to clone the user-provided parameters and return the result
> to the application as a rte_flow pointer.  As you suggested:
> /* PMD-specific code. */
>  struct rte_flow {
>     struct rte_flow_attr attr;
>     struct rte_flow_item *pattern;
>     struct rte_flow_action *actions;
>  };

Just to provide some context to the community since the above snippet comes
from private exchanges, I've suggested the above structure as a mean to
create and remove rules in the same fashion as FDIR, by providing the rule
used for creation to the destroy callback.

As an opaque type, each PMD currently needs to implement its own version of
struct rte_flow. The above definition may ease transition from FDIR to
rte_flow for some PMDs, however they need to clone the entire
application-provided rule to do so because there is no requirement for it to
be kept allocated.

I've implemented such a function in testpmd (port_flow_new() in commit [1])
as an example.

 [1] http://dpdk.org/ml/archives/dev/2016-November/050266.html

However my suggestion is for PMDs to use their own HW-specific structure
that only contains relevant information instead of being forced to drag
large, non-native data around, missing useful context and that requires
parsing every time. This is one benefit of using an opaque type in the first
place, the other being ABI breakage avoidance.

> Because both pattern and actions are pointers, and there're also pointers in structure
> rte_flow_item and struct rte_flow_action. We need to iterate allocation during clone
> and iterate free during destroy, then seems that the code is something ugly, right?

Well since I wrote that code, I won't easily admit it's ugly. I think PMDs
should not require the duplication of generic rules actually, which are only
defined as a common language between applications and PMDs. Both are free to
store rules in their own preferred and efficient format internally.

> I think application saves info when creating a flow rule, so why not application provide
> attr/pattern/actions info to PMD before calling PMD API?

They have to do so temporarily (e.g. allocated on the stack) while calling
rte_flow_create() and rte_flow_validate(), that's it. Once a rule is
created, there's no requirement for applications to keep anything around.

For simple applications such as testpmd, the generic format is probably
enough. More complex and existing applications such as ovs-dpdk may rather
choose to keep using their internal format that already fits their needs,
partially duplicating this information in rte_flow_attr and
rte_flow_item/rte_flow_action lists would waste memory. The conversion in
this case should only be performed when creating/validating flow rules.

In short, I fail to see any downside with maintaining struct rte_flow opaque
to applications.

Best regards,

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 2/4] lib: add bitrate statistics library
    2016-11-18  8:00  2% ` [dpdk-dev] [PATCH v5 1/4] lib: add information metrics library Remy Horton
@ 2016-11-18  8:00  3% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-11-18  8:00 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a library that calculates peak and average data-rate
statistics. For ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_17_02.rst             |   6 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 128 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 11 files changed, 289 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 52bd8a9..d6bbdd5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -600,6 +600,10 @@ M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 F: doc/guides/sample_app_ug/keep_alive.rst
 
+Bit-rate statistica
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index dedc4c3..beca7ec 100644
--- a/config/common_base
+++ b/config/common_base
@@ -594,3 +594,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the device metrics library
 #
 CONFIG_RTE_LIBRTE_METRICS=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index ca50fa6..91e8ea6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -148,4 +148,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [Device Metrics]     (@ref rte_metrics.h),
+  [Bitrate Statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fe830eb..8765ddd 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -58,6 +58,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_ring \
                           lib/librte_sched \
                           lib/librte_metrics \
+                          lib/librte_bitratestats \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 2d82dd1..0f7c06d 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,6 +40,11 @@ New Features
      intended to provide a reporting mechanism that is independent of the
      ethdev library.
 
+   * **Added bit-rate calculation library.**
+
+     A library that can be used to calculate device bit-rates. Calculated
+     bitrates are reported using the metrics library.
+
      This section is a comment. do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =========================================================
@@ -143,6 +148,7 @@ The libraries prepended with a plus sign were incremented in this version.
 .. code-block:: diff
 
      librte_acl.so.2
+   + librte_bitratestats.so.1
      librte_cfgfile.so.2
      librte_cmdline.so.2
      librte_cryptodev.so.2
diff --git a/lib/Makefile b/lib/Makefile
index 5d85dcf..e211bc0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..b725d4e
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..6346bb1
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,128 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate_s {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates_s {
+	struct rte_stats_bitrate_s port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates_s *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates_s), 0);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data)
+{
+	const char *names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_metrics(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate_s *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Expomentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent.
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +50 fixes integer rounding during divison */
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_metrics(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..bc87c5e
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates_s;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates_s *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..66f232f
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.02 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 40fcf33..6aac5ac 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 1/4] lib: add information metrics library
  @ 2016-11-18  8:00  2% ` Remy Horton
  2016-11-18  8:00  3% ` [dpdk-dev] [PATCH v5 2/4] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-11-18  8:00 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a new information metric library that allows other
modules to register named metrics and update their values. It is
intended to be independent of ethdev, rather than mixing ethdev
and non-ethdev information in xstats.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   5 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/rel_notes/release_17_02.rst     |   7 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 308 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 190 ++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 11 files changed, 584 insertions(+)
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..52bd8a9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -595,6 +595,11 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+F: doc/guides/sample_app_ug/keep_alive.rst
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 4bff83a..dedc4c3 100644
--- a/config/common_base
+++ b/config/common_base
@@ -589,3 +589,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..ca50fa6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -147,4 +147,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [Device Metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..fe830eb 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -57,6 +57,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_reorder \
                           lib/librte_ring \
                           lib/librte_sched \
+                          lib/librte_metrics \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..2d82dd1 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -34,6 +34,12 @@ New Features
 
      Refer to the previous release notes for examples.
 
+   * **Added information metric library.**
+
+     A library that allows information metrics to be added and update. It is
+     intended to provide a reporting mechanism that is independent of the
+     ethdev library.
+
      This section is a comment. do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =========================================================
@@ -152,6 +158,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_mbuf.so.2
      librte_mempool.so.2
      librte_meter.so.1
+   + librte_metrics.so.1
      librte_net.so.1
      librte_pdump.so.1
      librte_pipeline.so.3
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..5d85dcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..5edacc6
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,308 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ * @param name
+ *   Name of metric
+ * @param value
+ *   Current value for metric
+ * @param idx_next_set
+ *   Index of next root element (zero for none)
+ * @param idx_next_metric
+ *   Index of next metric in set (zero for none)
+ *
+ * Only the root of each set needs idx_next_set but since it has to be
+ * assumed that number of sets could equal total number of metrics,
+ * having a separate set metadata table doesn't save any memory.
+ */
+struct rte_metrics_meta_s {
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	uint64_t value[RTE_MAX_ETHPORTS];
+	uint64_t nonport_value;
+	uint16_t idx_next_set;
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * @param idx_last_set
+ *   Index of last metadata entry with valid data. This value is
+ *   not valid if cnt_stats is zero.
+ * @param cnt_stats
+ *   Number of metrics.
+ * @param metadata
+ *   Stat data memory block.
+ *
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	uint16_t idx_last_set;
+	uint16_t cnt_stats;
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(void)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), rte_socket_id(), 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_metric(const char *name)
+{
+	const char *list_names[] = {name};
+
+	return rte_metrics_reg_metrics(list_names, 1);
+}
+
+int
+rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_metric(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_metrics(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_metrics(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_NONPORT)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].nonport_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		if (port_id == RTE_METRICS_NONPORT)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->nonport_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..c58b366
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,190 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * RTE Metrics module
+ *
+ * Metric information is populated using a push model, where the
+ * information provider calls an update function on the relevant
+ * metrics. Currently only bulk querying of metrics is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/** Used to indicate port-independent information */
+#define RTE_METRICS_NONPORT -1
+
+
+/**
+ * Metric name
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric name.
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This only has to be explicitly called if you
+ * intend to use rte_metrics_reg_metric() or rte_metrics_reg_metrics() from a
+ * secondary process. This function must be called from a primary process.
+ */
+void rte_metrics_init(void);
+
+
+/**
+ * Register a metric
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metric(const char *name);
+
+/**
+ * Register a set of metrics
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metrics(const char **names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   Array of names to receive key names
+ *
+ * @param capacity
+ *   Space available in names
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Fetch metrics.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   Array to receive values and their keys
+ *
+ * @param capacity
+ *   Space available in values
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metric(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if upable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metrics(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..f904814
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.02 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_metric;
+	rte_metrics_reg_metrics;
+	rte_metrics_update_metric;
+	rte_metrics_update_metrics;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index f75f0e2..40fcf33 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-11-18  6:36  0%     ` Xing, Beilei
  2016-11-18 10:28  3%       ` Adrien Mazarguil
  2016-11-30 17:47  0%     ` Kevin Traynor
  2016-12-08  9:00  0%     ` Xing, Beilei
  2 siblings, 1 reply; 200+ results
From: Xing, Beilei @ 2016-11-18  6:36 UTC (permalink / raw)
  To: Adrien Mazarguil, dev
  Cc: Thomas Monjalon, De Lara Guarch, Pablo, Olivier Matz

Hi Adrien,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Thursday, November 17, 2016 12:23 AM
> To: dev@dpdk.org
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> 
> This new API supersedes all the legacy filter types described in rte_eth_ctrl.h.
> It is slightly higher level and as a result relies more on PMDs to process and
> validate flow rules.
> 
> Benefits:
> 
> - A unified API is easier to program for, applications do not have to be
>   written for a specific filter type which may or may not be supported by
>   the underlying device.
> 
> - The behavior of a flow rule is the same regardless of the underlying
>   device, applications do not need to be aware of hardware quirks.
> 
> - Extensible by design, API/ABI breakage should rarely occur if at all.
> 
> - Documentation is self-standing, no need to look up elsewhere.
> 
> Existing filter types will be deprecated and removed in the near future.
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>


> +
> +/**
> + * Opaque type returned after successfully creating a flow.
> + *
> + * This handle can be used to manage and query the related flow (e.g.
> +to
> + * destroy it or retrieve counters).
> + */
> +struct rte_flow;
> +

As we talked before, we use attr/pattern/actions to create and destroy a flow in PMD, 
but I don't think it's easy to clone the user-provided parameters and return the result
to the application as a rte_flow pointer.  As you suggested:
/* PMD-specific code. */
 struct rte_flow {
    struct rte_flow_attr attr;
    struct rte_flow_item *pattern;
    struct rte_flow_action *actions;
 };

Because both pattern and actions are pointers, and there're also pointers in structure
rte_flow_item and struct rte_flow_action. We need to iterate allocation during clone
and iterate free during destroy, then seems that the code is something ugly, right?

I think application saves info when creating a flow rule, so why not application provide
attr/pattern/actions info to PMD before calling PMD API?

Thanks,
Beilei Xing

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC PATCH 6/6] eal: removing eth_driver
  @ 2016-11-17 12:53  4%   ` Jan Blunck
  2016-11-18 13:05  3%     ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2016-11-17 12:53 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: David Marchand, dev

On Thu, Nov 17, 2016 at 6:30 AM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
> This patch demonstrates how eth_driver can be replaced with appropriate
> changes for rte_xxx_driver from the PMD itself. It uses ixgbe_ethernet as
> an example.
>
> A large set of changes exists in the rte_ethdev.c - primarily because too
> much PCI centric code (names, assumption of rte_pci_device) still exists
> in it. Most, except symbol naming, has been changed in this patch.
>
> This proposes that:
>  - PMD would declare the rte_xxx_driver. In case of ixgbe, it would be
>    rte_pci_driver.
>  - Probe and remove continue to exists in rte_pci_driver. But, the
>    rte_driver has new hooks for init and uninit. The rationale is that
>    once a ethernet or cryto device is created, the rte_driver->init would
>    be responsible for initializing the device.
>    -- Eth_dev -> rte_driver -> rte_pci_driver
>                           |    `-> probe/remove
>                           `--> init/uninit

Hmm, from my perspective this moves struct rte_driver a step closer to
struct rte_eth_dev instead of decoupling them. It is up to the
rte_driver->probe if it wants to allocate a struct rte_eth_dev,
rte_crypto_dev or the famous rte_foo_dev.

Instead of explicitly modelling rte_eth_dev specifics like init, unit
or dev_private_size I think we should delegate this to the
rte_driver->probe instead. Most of what is in rte_eth_dev_pci_probe()
today is anyway a rte_eth_dev_allocate_priv() anyway. I already have
some patches in this area in my patch stack.


>  - necessary changes in the rte_eth_dev have also been done so that it
>    refers to the rte_device and rte_driver rather than rte_xxx_*. This
>    would imply, ethernet device is 'linked' to a rte_device/rte_driver
>    which in turn is a rte_xxx_device/rte_xxx_driver type.
>    - for all operations related to extraction relvant xxx type,
>      container_of would have to be used.
>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 49 +++++++++++++++++++++-------------------
>  lib/librte_ether/rte_ethdev.c    | 36 +++++++++++++++++------------
>  lib/librte_ether/rte_ethdev.h    |  6 ++---
>  3 files changed, 51 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index edc9b22..acead31 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -1419,7 +1419,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
>                 return 0;
>         }
>
> -       pci_dev = eth_dev->pci_dev;
> +       pci_dev = container_of(eth_dev->device, struct rte_pci_device, device);
>
>         rte_eth_copy_pci_info(eth_dev, pci_dev);
>
> @@ -1532,7 +1532,9 @@ static int
>  eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
>  {
>         struct ixgbe_hw *hw;
> -       struct rte_pci_device *pci_dev = eth_dev->pci_dev;
> +       struct rte_pci_device *pci_dev;
> +
> +       pci_dev = container_of(eth_dev->device, struct rte_pci_device, device);
>
>         PMD_INIT_FUNC_TRACE();
>
> @@ -1562,32 +1564,33 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
>         return 0;
>  }
>
> -static struct eth_driver rte_ixgbe_pmd = {
> -       .pci_drv = {
> -               .id_table = pci_id_ixgbe_map,
> -               .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
> -                       RTE_PCI_DRV_DETACHABLE,
> -               .probe = rte_eth_dev_pci_probe,
> -               .remove = rte_eth_dev_pci_remove,
> +static struct rte_pci_driver rte_ixgbe_pci_driver = {
> +       .id_table = pci_id_ixgbe_map,
> +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
> +                    RTE_PCI_DRV_DETACHABLE,
> +       .probe = rte_eth_dev_pci_probe,
> +       .remove = rte_eth_dev_pci_remove,
> +       .driver = {
> +               .driver_init_t= eth_ixgbe_dev_init,
> +               .driver_uninit_t= eth_ixgbe_dev_uninit,
> +               .dev_private_size = sizeof(struct ixgbe_adapter),
>         },
> -       .eth_dev_init = eth_ixgbe_dev_init,
> -       .eth_dev_uninit = eth_ixgbe_dev_uninit,
> -       .dev_private_size = sizeof(struct ixgbe_adapter),
>  };
>
>  /*
>   * virtual function driver struct
>   */
> -static struct eth_driver rte_ixgbevf_pmd = {
> -       .pci_drv = {
> -               .id_table = pci_id_ixgbevf_map,
> -               .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_DETACHABLE,
> -               .probe = rte_eth_dev_pci_probe,
> -               .remove = rte_eth_dev_pci_remove,
> +static struct rte_pci_driver rte_ixgbevf_pci_driver = {
> +       .id_table = pci_id_ixgbevf_map,
> +       .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_DETACHABLE,
> +       .probe = rte_eth_dev_pci_probe,
> +       .remove = rte_eth_dev_pci_remove,
> +       .driver = {
> +               /* rte_driver hooks */
> +               .init = eth_ixgbevf_dev_init,
> +               .uninit = eth_ixgbevf_dev_uninit,
> +               .dev_private_size = sizeof(struct ixgbe_adapter),
>         },
> -       .eth_dev_init = eth_ixgbevf_dev_init,
> -       .eth_dev_uninit = eth_ixgbevf_dev_uninit,
> -       .dev_private_size = sizeof(struct ixgbe_adapter),
>  };
>
>  static int
> @@ -7592,7 +7595,7 @@ ixgbevf_dev_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
>         ixgbevf_dev_interrupt_action(dev);
>  }
>
> -RTE_PMD_REGISTER_PCI(net_ixgbe, rte_ixgbe_pmd.pci_drv);
> +RTE_PMD_REGISTER_PCI(net_ixgbe, rte_ixgbe_pci_driver);
>  RTE_PMD_REGISTER_PCI_TABLE(net_ixgbe, pci_id_ixgbe_map);
> -RTE_PMD_REGISTER_PCI(net_ixgbe_vf, rte_ixgbevf_pmd.pci_drv);
> +RTE_PMD_REGISTER_PCI(net_ixgbe_vf, rte_ixgbevf_pci_driver);
>  RTE_PMD_REGISTER_PCI_TABLE(net_ixgbe_vf, pci_id_ixgbevf_map);
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index fde8112..3535ff4 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -235,13 +235,13 @@ int
>  rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>                       struct rte_pci_device *pci_dev)
>  {
> -       struct eth_driver    *eth_drv;
> +       struct rte_driver *drv;
>         struct rte_eth_dev *eth_dev;
>         char ethdev_name[RTE_ETH_NAME_MAX_LEN];
>
>         int diag;
>
> -       eth_drv = (struct eth_driver *)pci_drv;
> +       drv = pci_drv->driver;
>
>         rte_eal_pci_device_name(&pci_dev->addr, ethdev_name,
>                         sizeof(ethdev_name));
> @@ -252,13 +252,13 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>
>         if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>                 eth_dev->data->dev_private = rte_zmalloc("ethdev private structure",
> -                                 eth_drv->dev_private_size,
> +                                 drv->dev_private_size,
>                                   RTE_CACHE_LINE_SIZE);
>                 if (eth_dev->data->dev_private == NULL)
>                         rte_panic("Cannot allocate memzone for private port data\n");
>         }
> -       eth_dev->pci_dev = pci_dev;
> -       eth_dev->driver = eth_drv;
> +       eth_dev->device = pci_dev->device;
> +       eth_dev->driver = drv;
>         eth_dev->data->rx_mbuf_alloc_failed = 0;
>
>         /* init user callbacks */
> @@ -270,7 +270,7 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>         eth_dev->data->mtu = ETHER_MTU;
>
>         /* Invoke PMD device initialization function */
> -       diag = (*eth_drv->eth_dev_init)(eth_dev);
> +       diag = (*drv->init)(eth_dev);
>         if (diag == 0)
>                 return 0;
>
> @@ -287,7 +287,7 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>  int
>  rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>  {
> -       const struct eth_driver *eth_drv;
> +       const struct rte_driver *drv;
>         struct rte_eth_dev *eth_dev;
>         char ethdev_name[RTE_ETH_NAME_MAX_LEN];
>         int ret;
> @@ -302,11 +302,11 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>         if (eth_dev == NULL)
>                 return -ENODEV;
>
> -       eth_drv = (const struct eth_driver *)pci_dev->driver;
> +       drv = pci_dev->driver;
>
>         /* Invoke PMD device uninit function */
> -       if (*eth_drv->eth_dev_uninit) {
> -               ret = (*eth_drv->eth_dev_uninit)(eth_dev);
> +       if (*drv->uninit) {
> +               ret = (*drv->uninit)(eth_dev);
>                 if (ret)
>                         return ret;
>         }
> @@ -317,7 +317,7 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>         if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>                 rte_free(eth_dev->data->dev_private);
>
> -       eth_dev->pci_dev = NULL;
> +       eth_dev->device = NULL;
>         eth_dev->driver = NULL;
>         eth_dev->data = NULL;
>
> @@ -1556,7 +1556,7 @@ rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
>
>         RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>         (*dev->dev_ops->dev_infos_get)(dev, dev_info);
> -       dev_info->pci_dev = dev->pci_dev;
> +       dev_info->device = dev->device;
>         dev_info->driver_name = dev->data->drv_name;
>         dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>         dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> @@ -2537,6 +2537,7 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
>  {
>         uint32_t vec;
>         struct rte_eth_dev *dev;
> +       struct rte_pci_device *pci_dev;
>         struct rte_intr_handle *intr_handle;
>         uint16_t qid;
>         int rc;
> @@ -2544,6 +2545,10 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
>         RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>
>         dev = &rte_eth_devices[port_id];
> +       /* TODO intr_handle is currently in rte_pci_device;
> +        * Below is incorrect until that time
> +        */
> +       pci_dev = container_of(dev->device, struct rte_pci_device, device);
>         intr_handle = &dev->pci_dev->intr_handle;
>         if (!intr_handle->intr_vec) {
>                 RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
> @@ -2572,7 +2577,7 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>         const struct rte_memzone *mz;
>
>         snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
> -                dev->driver->pci_drv.driver.name, ring_name,
> +                dev->driver->name, ring_name,
>                  dev->data->port_id, queue_id);
>
>         mz = rte_memzone_lookup(z_name);
> @@ -2593,6 +2598,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
>  {
>         uint32_t vec;
>         struct rte_eth_dev *dev;
> +       struct rte_pci_device *pci_dev;
>         struct rte_intr_handle *intr_handle;
>         int rc;
>
> @@ -2604,7 +2610,9 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
>                 return -EINVAL;
>         }
>
> -       intr_handle = &dev->pci_dev->intr_handle;
> +       /* TODO; Until intr_handle is available in rte_device, below is incorrect */
> +       pci_dev = container_of(dev->device, struct rte_pci_device, device);
> +       intr_handle = &pci_dev->intr_handle;
>         if (!intr_handle->intr_vec) {
>                 RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n");
>                 return -EPERM;
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 38641e8..2b1d826 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -876,7 +876,7 @@ struct rte_eth_conf {
>   * Ethernet device information
>   */
>  struct rte_eth_dev_info {
> -       struct rte_pci_device *pci_dev; /**< Device PCI information. */
> +       struct rte_device *device; /**< Device PCI information. */

We already the situation that virtual devices don't set the pci_dev
field. I wonder if it really makes sense to replace it with a struct
rte_device because that is not adding a lot of value (only numa_node).
If we break ABI we might want to add numa_node and dev_flags as
suggested by Stephen Hemminger already. If we choose to not break ABI
we can delegate the population of the pci_dev field to dev_infos_get.
I already have that patch in my patch stack too.

The problem with rte_eth_dev_info is that it doesn't support
extensions. Maybe its time to add rte_eth_dev_info_ex() ... or
rte_eth_dev_xinfo() if you don't like the IB verbs API.


>         const char *driver_name; /**< Device Driver name. */
>         unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>                 Use if_indextoname() to translate into an interface name. */
> @@ -1623,9 +1623,9 @@ struct rte_eth_dev {
>         eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>         eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
>         struct rte_eth_dev_data *data;  /**< Pointer to device data */
> -       const struct eth_driver *driver;/**< Driver for this device */
> +       const struct rte_driver *driver;/**< Driver for this device */
>         const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> -       struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
> +       struct rte_device *device; /**< Device instance */
>         /** User application callbacks for NIC interrupts */
>         struct rte_eth_dev_cb_list link_intr_cbs;
>         /**
> --
> 2.7.4
>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 02/22] cmdline: add support for dynamic tokens
    2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API Adrien Mazarguil
@ 2016-11-16 16:23  2%   ` Adrien Mazarguil
    2 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-11-16 16:23 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, Pablo de Lara, Olivier Matz

Considering tokens must be hard-coded in a list part of the instruction
structure, context-dependent tokens cannot be expressed.

This commit adds support for building dynamic token lists through a
user-provided function, which is called when the static token list is empty
(a single NULL entry).

Because no structures are modified (existing fields are reused), this
commit has no impact on the current ABI.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_cmdline/cmdline_parse.c | 60 +++++++++++++++++++++++++++++----
 lib/librte_cmdline/cmdline_parse.h | 21 ++++++++++++
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index b496067..14f5553 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -146,7 +146,9 @@ nb_common_chars(const char * s1, const char * s2)
  */
 static int
 match_inst(cmdline_parse_inst_t *inst, const char *buf,
-	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size)
+	   unsigned int nb_match_token, void *resbuf, unsigned resbuf_size,
+	   cmdline_parse_token_hdr_t
+		*(*dyn_tokens)[CMDLINE_PARSE_DYNAMIC_TOKENS])
 {
 	unsigned int token_num=0;
 	cmdline_parse_token_hdr_t * token_p;
@@ -155,6 +157,11 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 	struct cmdline_token_hdr token_hdr;
 
 	token_p = inst->tokens[token_num];
+	if (!token_p && dyn_tokens && inst->f) {
+		if (!(*dyn_tokens)[0])
+			inst->f(&(*dyn_tokens)[0], NULL, dyn_tokens);
+		token_p = (*dyn_tokens)[0];
+	}
 	if (token_p)
 		memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -196,7 +203,17 @@ match_inst(cmdline_parse_inst_t *inst, const char *buf,
 		buf += n;
 
 		token_num ++;
-		token_p = inst->tokens[token_num];
+		if (!inst->tokens[0]) {
+			if (token_num < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!(*dyn_tokens)[token_num])
+					inst->f(&(*dyn_tokens)[token_num],
+						NULL,
+						dyn_tokens);
+				token_p = (*dyn_tokens)[token_num];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[token_num];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 	}
@@ -239,6 +256,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	cmdline_parse_inst_t *inst;
 	const char *curbuf;
 	char result_buf[CMDLINE_PARSE_RESULT_BUFSIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	void (*f)(void *, struct cmdline *, void *) = NULL;
 	void *data = NULL;
 	int comment = 0;
@@ -255,6 +273,7 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		return CMDLINE_PARSE_BAD_ARGS;
 
 	ctx = cl->ctx;
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/*
 	 * - look if the buffer contains at least one line
@@ -299,7 +318,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		debug_printf("INST %d\n", inst_num);
 
 		/* fully parsed */
-		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf));
+		tok = match_inst(inst, buf, 0, result_buf, sizeof(result_buf),
+				 &dyn_tokens);
 
 		if (tok > 0) /* we matched at least one token */
 			err = CMDLINE_PARSE_BAD_ARGS;
@@ -355,6 +375,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 	cmdline_parse_token_hdr_t *token_p;
 	struct cmdline_token_hdr token_hdr;
 	char tmpbuf[CMDLINE_BUFFER_SIZE], comp_buf[CMDLINE_BUFFER_SIZE];
+	cmdline_parse_token_hdr_t *dyn_tokens[CMDLINE_PARSE_DYNAMIC_TOKENS];
 	unsigned int partial_tok_len;
 	int comp_len = -1;
 	int tmp_len = -1;
@@ -374,6 +395,7 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 
 	debug_printf("%s called\n", __func__);
 	memset(&token_hdr, 0, sizeof(token_hdr));
+	memset(&dyn_tokens, 0, sizeof(dyn_tokens));
 
 	/* count the number of complete token to parse */
 	for (i=0 ; buf[i] ; i++) {
@@ -396,11 +418,24 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		inst = ctx[inst_num];
 		while (inst) {
 			/* parse the first tokens of the inst */
-			if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+			if (nb_token &&
+			    match_inst(inst, buf, nb_token, NULL, 0,
+				       &dyn_tokens))
 				goto next;
 
 			debug_printf("instruction match\n");
-			token_p = inst->tokens[nb_token];
+			if (!inst->tokens[0]) {
+				if (nb_token <
+				    (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+					if (!dyn_tokens[nb_token])
+						inst->f(&dyn_tokens[nb_token],
+							NULL,
+							&dyn_tokens);
+					token_p = dyn_tokens[nb_token];
+				} else
+					token_p = NULL;
+			} else
+				token_p = inst->tokens[nb_token];
 			if (token_p)
 				memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
@@ -490,10 +525,21 @@ cmdline_complete(struct cmdline *cl, const char *buf, int *state,
 		/* we need to redo it */
 		inst = ctx[inst_num];
 
-		if (nb_token && match_inst(inst, buf, nb_token, NULL, 0))
+		if (nb_token &&
+		    match_inst(inst, buf, nb_token, NULL, 0, &dyn_tokens))
 			goto next2;
 
-		token_p = inst->tokens[nb_token];
+		if (!inst->tokens[0]) {
+			if (nb_token < (CMDLINE_PARSE_DYNAMIC_TOKENS - 1)) {
+				if (!dyn_tokens[nb_token])
+					inst->f(&dyn_tokens[nb_token],
+						NULL,
+						&dyn_tokens);
+				token_p = dyn_tokens[nb_token];
+			} else
+				token_p = NULL;
+		} else
+			token_p = inst->tokens[nb_token];
 		if (token_p)
 			memcpy(&token_hdr, token_p, sizeof(token_hdr));
 
diff --git a/lib/librte_cmdline/cmdline_parse.h b/lib/librte_cmdline/cmdline_parse.h
index 4ac05d6..65b18d4 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/cmdline_parse.h
@@ -83,6 +83,9 @@ extern "C" {
 /* maximum buffer size for parsed result */
 #define CMDLINE_PARSE_RESULT_BUFSIZE 8192
 
+/* maximum number of dynamic tokens */
+#define CMDLINE_PARSE_DYNAMIC_TOKENS 128
+
 /**
  * Stores a pointer to the ops struct, and the offset: the place to
  * write the parsed result in the destination structure.
@@ -130,6 +133,24 @@ struct cmdline;
  * Store a instruction, which is a pointer to a callback function and
  * its parameter that is called when the instruction is parsed, a help
  * string, and a list of token composing this instruction.
+ *
+ * When no tokens are defined (tokens[0] == NULL), they are retrieved
+ * dynamically by calling f() as follows:
+ *
+ *  f((struct cmdline_token_hdr **)&token_hdr,
+ *    NULL,
+ *    (struct cmdline_token_hdr *[])tokens));
+ *
+ * The address of the resulting token is expected at the location pointed by
+ * the first argument. Can be set to NULL to end the list.
+ *
+ * The cmdline argument (struct cmdline *) is always NULL.
+ *
+ * The last argument points to the NULL-terminated list of dynamic tokens
+ * defined so far. Since token_hdr points to an index of that list, the
+ * current index can be derived as follows:
+ *
+ *  int index = token_hdr - &(*tokens)[0];
  */
 struct cmdline_inst {
 	/* f(parsed_struct, data) */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
  @ 2016-11-16 16:23  2%   ` Adrien Mazarguil
  2016-11-18  6:36  0%     ` Xing, Beilei
                       ` (2 more replies)
  2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 02/22] cmdline: add support for dynamic tokens Adrien Mazarguil
    2 siblings, 3 replies; 200+ results
From: Adrien Mazarguil @ 2016-11-16 16:23 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, Pablo de Lara, Olivier Matz

This new API supersedes all the legacy filter types described in
rte_eth_ctrl.h. It is slightly higher level and as a result relies more on
PMDs to process and validate flow rules.

Benefits:

- A unified API is easier to program for, applications do not have to be
  written for a specific filter type which may or may not be supported by
  the underlying device.

- The behavior of a flow rule is the same regardless of the underlying
  device, applications do not need to be aware of hardware quirks.

- Extensible by design, API/ABI breakage should rarely occur if at all.

- Documentation is self-standing, no need to look up elsewhere.

Existing filter types will be deprecated and removed in the near future.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 MAINTAINERS                            |   4 +
 lib/librte_ether/Makefile              |   3 +
 lib/librte_ether/rte_eth_ctrl.h        |   1 +
 lib/librte_ether/rte_ether_version.map |  10 +
 lib/librte_ether/rte_flow.c            | 159 +++++
 lib/librte_ether/rte_flow.h            | 947 ++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow_driver.h     | 177 ++++++
 7 files changed, 1301 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..3b46630 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -243,6 +243,10 @@ M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
 F: scripts/test-null.sh
 
+Generic flow API
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: lib/librte_ether/rte_flow*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index efe1e5f..9335361 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map
 LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
+SRCS-y += rte_flow.c
 
 #
 # Export include files
@@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
+SYMLINK-y-include += rte_flow.h
+SYMLINK-y-include += rte_flow_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index fe80eb0..8386904 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -99,6 +99,7 @@ enum rte_filter_type {
 	RTE_ETH_FILTER_FDIR,
 	RTE_ETH_FILTER_HASH,
 	RTE_ETH_FILTER_L2_TUNNEL,
+	RTE_ETH_FILTER_GENERIC,
 	RTE_ETH_FILTER_MAX
 };
 
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 72be66d..b5d2547 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -147,3 +147,13 @@ DPDK_16.11 {
 	rte_eth_dev_pci_remove;
 
 } DPDK_16.07;
+
+DPDK_17.02 {
+	global:
+
+	rte_flow_validate;
+	rte_flow_create;
+	rte_flow_destroy;
+	rte_flow_query;
+
+} DPDK_16.11;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
new file mode 100644
index 0000000..064963d
--- /dev/null
+++ b/lib/librte_ether/rte_flow.c
@@ -0,0 +1,159 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_flow_driver.h"
+#include "rte_flow.h"
+
+/* Get generic flow operations structure from a port. */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops;
+	int code;
+
+	if (unlikely(!rte_eth_dev_is_valid_port(port_id)))
+		code = ENODEV;
+	else if (unlikely(!dev->dev_ops->filter_ctrl ||
+			  dev->dev_ops->filter_ctrl(dev,
+						    RTE_ETH_FILTER_GENERIC,
+						    RTE_ETH_FILTER_GET,
+						    &ops) ||
+			  !ops))
+		code = ENOTSUP;
+	else
+		return ops;
+	rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(code));
+	return NULL;
+}
+
+/* Check whether a flow rule can be created on a given port. */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error)
+{
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->validate))
+		return ops->validate(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
+
+/* Create a flow rule on a given port. */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return NULL;
+	if (likely(!!ops->create))
+		return ops->create(dev, attr, pattern, actions, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return NULL;
+}
+
+/* Destroy a flow rule on a given port. */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->destroy))
+		return ops->destroy(dev, flow, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
+
+/* Destroy all flow rules associated with a port. */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (unlikely(!ops))
+		return -rte_errno;
+	if (likely(!!ops->flush))
+		return ops->flush(dev, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
+
+/* Query an existing flow rule. */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
+
+	if (!ops)
+		return -rte_errno;
+	if (likely(!!ops->query))
+		return ops->query(dev, flow, action, data, error);
+	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			   NULL, rte_strerror(ENOTSUP));
+	return -rte_errno;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
new file mode 100644
index 0000000..211f307
--- /dev/null
+++ b/lib/librte_ether/rte_flow.h
@@ -0,0 +1,947 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_H_
+#define RTE_FLOW_H_
+
+/**
+ * @file
+ * RTE generic flow API
+ *
+ * This interface provides the ability to program packet matching and
+ * associated actions in hardware through flow rules.
+ */
+
+#include <rte_arp.h>
+#include <rte_ether.h>
+#include <rte_icmp.h>
+#include <rte_ip.h>
+#include <rte_sctp.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Flow rule attributes.
+ *
+ * Priorities are set on two levels: per group and per rule within groups.
+ *
+ * Lower values denote higher priority, the highest priority for both levels
+ * is 0, so that a rule with priority 0 in group 8 is always matched after a
+ * rule with priority 8 in group 0.
+ *
+ * Although optional, applications are encouraged to group similar rules as
+ * much as possible to fully take advantage of hardware capabilities
+ * (e.g. optimized matching) and work around limitations (e.g. a single
+ * pattern type possibly allowed in a given group).
+ *
+ * Group and priority levels are arbitrary and up to the application, they
+ * do not need to be contiguous nor start from 0, however the maximum number
+ * varies between devices and may be affected by existing flow rules.
+ *
+ * If a packet is matched by several rules of a given group for a given
+ * priority level, the outcome is undefined. It can take any path, may be
+ * duplicated or even cause unrecoverable errors.
+ *
+ * Note that support for more than a single group and priority level is not
+ * guaranteed.
+ *
+ * Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+ *
+ * Several pattern items and actions are valid and can be used in both
+ * directions. Those valid for only one direction are described as such.
+ *
+ * Specifying both directions at once is not recommended but may be valid in
+ * some cases, such as incrementing the same counter twice.
+ *
+ * Not specifying any direction is currently an error.
+ */
+struct rte_flow_attr {
+	uint32_t group; /**< Priority group. */
+	uint32_t priority; /**< Priority level within group. */
+	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
+	uint32_t egress:1; /**< Rule applies to egress traffic. */
+	uint32_t reserved:30; /**< Reserved, must be zero. */
+};
+
+/**
+ * Matching pattern item types.
+ *
+ * Items are arranged in a list to form a matching pattern for packets.
+ * They fall in two categories:
+ *
+ * - Protocol matching (ANY, RAW, ETH, IPV4, IPV6, ICMP, UDP, TCP, SCTP,
+ *   VXLAN and so on), usually associated with a specification
+ *   structure. These must be stacked in the same order as the protocol
+ *   layers to match, starting from L2.
+ *
+ * - Affecting how the pattern is processed (END, VOID, INVERT, PF, VF, PORT
+ *   and so on), often without a specification structure. Since they are
+ *   meta data that does not match packet contents, these can be specified
+ *   anywhere within item lists without affecting the protocol matching
+ *   items.
+ *
+ * See the description of individual types for more information. Those
+ * marked with [META] fall into the second category.
+ */
+enum rte_flow_item_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for item lists. Prevents further processing of items,
+	 * thereby ending the pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_VOID,
+
+	/**
+	 * [META]
+	 *
+	 * Inverted matching, i.e. process packets that do not match the
+	 * pattern.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_INVERT,
+
+	/**
+	 * Matches any protocol in place of the current layer, a single ANY
+	 * may also stand for several protocol layers.
+	 *
+	 * See struct rte_flow_item_any.
+	 */
+	RTE_FLOW_ITEM_TYPE_ANY,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to the physical function of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a PF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * No associated specification structure.
+	 */
+	RTE_FLOW_ITEM_TYPE_PF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets addressed to a virtual function ID of the device.
+	 *
+	 * If the underlying device function differs from the one that would
+	 * normally receive the matched traffic, specifying this item
+	 * prevents it from reaching that device unless the flow rule
+	 * contains a VF action. Packets are not duplicated between device
+	 * instances by default.
+	 *
+	 * See struct rte_flow_item_vf.
+	 */
+	RTE_FLOW_ITEM_TYPE_VF,
+
+	/**
+	 * [META]
+	 *
+	 * Matches packets coming from the specified physical port of the
+	 * underlying device.
+	 *
+	 * The first PORT item overrides the physical port normally
+	 * associated with the specified DPDK input port (port_id). This
+	 * item can be provided several times to match additional physical
+	 * ports.
+	 *
+	 * See struct rte_flow_item_port.
+	 */
+	RTE_FLOW_ITEM_TYPE_PORT,
+
+	/**
+	 * Matches a byte string of a given length at a given offset.
+	 *
+	 * See struct rte_flow_item_raw.
+	 */
+	RTE_FLOW_ITEM_TYPE_RAW,
+
+	/**
+	 * Matches an Ethernet header.
+	 *
+	 * See struct rte_flow_item_eth.
+	 */
+	RTE_FLOW_ITEM_TYPE_ETH,
+
+	/**
+	 * Matches an 802.1Q/ad VLAN tag.
+	 *
+	 * See struct rte_flow_item_vlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VLAN,
+
+	/**
+	 * Matches an IPv4 header.
+	 *
+	 * See struct rte_flow_item_ipv4.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV4,
+
+	/**
+	 * Matches an IPv6 header.
+	 *
+	 * See struct rte_flow_item_ipv6.
+	 */
+	RTE_FLOW_ITEM_TYPE_IPV6,
+
+	/**
+	 * Matches an ICMP header.
+	 *
+	 * See struct rte_flow_item_icmp.
+	 */
+	RTE_FLOW_ITEM_TYPE_ICMP,
+
+	/**
+	 * Matches a UDP header.
+	 *
+	 * See struct rte_flow_item_udp.
+	 */
+	RTE_FLOW_ITEM_TYPE_UDP,
+
+	/**
+	 * Matches a TCP header.
+	 *
+	 * See struct rte_flow_item_tcp.
+	 */
+	RTE_FLOW_ITEM_TYPE_TCP,
+
+	/**
+	 * Matches a SCTP header.
+	 *
+	 * See struct rte_flow_item_sctp.
+	 */
+	RTE_FLOW_ITEM_TYPE_SCTP,
+
+	/**
+	 * Matches a VXLAN header.
+	 *
+	 * See struct rte_flow_item_vxlan.
+	 */
+	RTE_FLOW_ITEM_TYPE_VXLAN,
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ANY
+ *
+ * Matches any protocol in place of the current layer, a single ANY may also
+ * stand for several protocol layers.
+ *
+ * This is usually specified as the first pattern item when looking for a
+ * protocol anywhere in a packet.
+ *
+ * A maximum value of 0 requests matching any number of protocol layers
+ * above or equal to the minimum value, a maximum value lower than the
+ * minimum one is otherwise invalid.
+ *
+ * This type does not work with a range (struct rte_flow_item.last).
+ */
+struct rte_flow_item_any {
+	uint16_t min; /**< Minimum number of layers covered. */
+	uint16_t max; /**< Maximum number of layers covered, 0 for infinity. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VF
+ *
+ * Matches packets addressed to a virtual function ID of the device.
+ *
+ * If the underlying device function differs from the one that would
+ * normally receive the matched traffic, specifying this item prevents it
+ * from reaching that device unless the flow rule contains a VF
+ * action. Packets are not duplicated between device instances by default.
+ *
+ * - Likely to return an error or never match any traffic if this causes a
+ *   VF device to match traffic addressed to a different VF.
+ * - Can be specified multiple times to match traffic addressed to several
+ *   specific VFs.
+ * - Can be combined with a PF item to match both PF and VF traffic.
+ *
+ * A zeroed mask can be used to match any VF.
+ */
+struct rte_flow_item_vf {
+	uint32_t id; /**< Destination VF ID. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_PORT
+ *
+ * Matches packets coming from the specified physical port of the underlying
+ * device.
+ *
+ * The first PORT item overrides the physical port normally associated with
+ * the specified DPDK input port (port_id). This item can be provided
+ * several times to match additional physical ports.
+ *
+ * Note that physical ports are not necessarily tied to DPDK input ports
+ * (port_id) when those are not under DPDK control. Possible values are
+ * specific to each device, they are not necessarily indexed from zero and
+ * may not be contiguous.
+ *
+ * As a device property, the list of allowed values as well as the value
+ * associated with a port_id should be retrieved by other means.
+ *
+ * A zeroed mask can be used to match any port index.
+ */
+struct rte_flow_item_port {
+	uint32_t index; /**< Physical port index. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_RAW
+ *
+ * Matches a byte string of a given length at a given offset.
+ *
+ * Offset is either absolute (using the start of the packet) or relative to
+ * the end of the previous matched item in the stack, in which case negative
+ * values are allowed.
+ *
+ * If search is enabled, offset is used as the starting point. The search
+ * area can be delimited by setting limit to a nonzero value, which is the
+ * maximum number of bytes after offset where the pattern may start.
+ *
+ * Matching a zero-length pattern is allowed, doing so resets the relative
+ * offset for subsequent items.
+ *
+ * This type does not work with a range (struct rte_flow_item.last).
+ */
+struct rte_flow_item_raw {
+	uint32_t relative:1; /**< Look for pattern after the previous item. */
+	uint32_t search:1; /**< Search pattern from offset (see also limit). */
+	uint32_t reserved:30; /**< Reserved, must be set to zero. */
+	int32_t offset; /**< Absolute or relative offset for pattern. */
+	uint16_t limit; /**< Search area limit for start of pattern. */
+	uint16_t length; /**< Pattern length. */
+	uint8_t pattern[]; /**< Byte string to look for. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ETH
+ *
+ * Matches an Ethernet header.
+ */
+struct rte_flow_item_eth {
+	struct ether_addr dst; /**< Destination MAC. */
+	struct ether_addr src; /**< Source MAC. */
+	unsigned int type; /**< EtherType. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VLAN
+ *
+ * Matches an 802.1Q/ad VLAN tag.
+ *
+ * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
+ * RTE_FLOW_ITEM_TYPE_VLAN.
+ */
+struct rte_flow_item_vlan {
+	uint16_t tpid; /**< Tag protocol identifier. */
+	uint16_t tci; /**< Tag control information. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV4
+ *
+ * Matches an IPv4 header.
+ *
+ * Note: IPv4 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv4 {
+	struct ipv4_hdr hdr; /**< IPv4 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_IPV6.
+ *
+ * Matches an IPv6 header.
+ *
+ * Note: IPv6 options are handled by dedicated pattern items.
+ */
+struct rte_flow_item_ipv6 {
+	struct ipv6_hdr hdr; /**< IPv6 header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_ICMP.
+ *
+ * Matches an ICMP header.
+ */
+struct rte_flow_item_icmp {
+	struct icmp_hdr hdr; /**< ICMP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_UDP.
+ *
+ * Matches a UDP header.
+ */
+struct rte_flow_item_udp {
+	struct udp_hdr hdr; /**< UDP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_TCP.
+ *
+ * Matches a TCP header.
+ */
+struct rte_flow_item_tcp {
+	struct tcp_hdr hdr; /**< TCP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_SCTP.
+ *
+ * Matches a SCTP header.
+ */
+struct rte_flow_item_sctp {
+	struct sctp_hdr hdr; /**< SCTP header definition. */
+};
+
+/**
+ * RTE_FLOW_ITEM_TYPE_VXLAN.
+ *
+ * Matches a VXLAN header (RFC 7348).
+ */
+struct rte_flow_item_vxlan {
+	uint8_t flags; /**< Normally 0x08 (I flag). */
+	uint8_t rsvd0[3]; /**< Reserved, normally 0x000000. */
+	uint8_t vni[3]; /**< VXLAN identifier. */
+	uint8_t rsvd1; /**< Reserved, normally 0x00. */
+};
+
+/**
+ * Matching pattern item definition.
+ *
+ * A pattern is formed by stacking items starting from the lowest protocol
+ * layer to match. This stacking restriction does not apply to meta items
+ * which can be placed anywhere in the stack with no effect on the meaning
+ * of the resulting pattern.
+ *
+ * A stack is terminated by a END item.
+ *
+ * The spec field should be a valid pointer to a structure of the related
+ * item type. It may be set to NULL in many cases to use default values.
+ *
+ * Optionally, last can point to a structure of the same type to define an
+ * inclusive range. This is mostly supported by integer and address fields,
+ * may cause errors otherwise. Fields that do not support ranges must be set
+ * to the same value as their spec counterparts.
+ *
+ * By default all fields present in spec are considered relevant.* This
+ * behavior can be altered by providing a mask structure of the same type
+ * with applicable bits set to one. It can also be used to partially filter
+ * out specific fields (e.g. as an alternate mean to match ranges of IP
+ * addresses).
+ *
+ * Note this is a simple bit-mask applied before interpreting the contents
+ * of spec and last, which may yield unexpected results if not used
+ * carefully. For example, if for an IPv4 address field, spec provides
+ * 10.1.2.3, last provides 10.3.4.5 and mask provides 255.255.0.0, the
+ * effective range is 10.1.0.0 to 10.3.255.255.
+ *
+ * * The defaults for data-matching items such as IPv4 when mask is not
+ *   specified actually depend on the underlying implementation since only
+ *   recognized fields can be taken into account.
+ */
+struct rte_flow_item {
+	enum rte_flow_item_type type; /**< Item type. */
+	const void *spec; /**< Pointer to item specification structure. */
+	const void *last; /**< Defines an inclusive range (spec to last). */
+	const void *mask; /**< Bit-mask applied to spec and last. */
+};
+
+/**
+ * Action types.
+ *
+ * Each possible action is represented by a type. Some have associated
+ * configuration structures. Several actions combined in a list can be
+ * affected to a flow rule. That list is not ordered.
+ *
+ * They fall in three categories:
+ *
+ * - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+ *   processing matched packets by subsequent flow rules, unless overridden
+ *   with PASSTHRU.
+ *
+ * - Non terminating actions (PASSTHRU, DUP) that leave matched packets up
+ *   for additional processing by subsequent flow rules.
+ *
+ * - Other non terminating meta actions that do not affect the fate of
+ *   packets (END, VOID, MARK, FLAG, COUNT).
+ *
+ * When several actions are combined in a flow rule, they should all have
+ * different types (e.g. dropping a packet twice is not possible). The
+ * defined behavior is for PMDs to only take into account the last action of
+ * a given type found in the list. PMDs still perform error checking on the
+ * entire list.
+ *
+ * Note that PASSTHRU is the only action able to override a terminating
+ * rule.
+ */
+enum rte_flow_action_type {
+	/**
+	 * [META]
+	 *
+	 * End marker for action lists. Prevents further processing of
+	 * actions, thereby ending the list.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_END,
+
+	/**
+	 * [META]
+	 *
+	 * Used as a placeholder for convenience. It is ignored and simply
+	 * discarded by PMDs.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_VOID,
+
+	/**
+	 * Leaves packets up for additional processing by subsequent flow
+	 * rules. This is the default when a rule does not contain a
+	 * terminating action, but can be specified to force a rule to
+	 * become non-terminating.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PASSTHRU,
+
+	/**
+	 * [META]
+	 *
+	 * Attaches a 32 bit value to packets.
+	 *
+	 * See struct rte_flow_action_mark.
+	 */
+	RTE_FLOW_ACTION_TYPE_MARK,
+
+	/**
+	 * [META]
+	 *
+	 * Flag packets. Similar to MARK but only affects ol_flags.
+	 *
+	 * Note: a distinctive flag must be defined for it.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_FLAG,
+
+	/**
+	 * Assigns packets to a given queue index.
+	 *
+	 * See struct rte_flow_action_queue.
+	 */
+	RTE_FLOW_ACTION_TYPE_QUEUE,
+
+	/**
+	 * Drops packets.
+	 *
+	 * PASSTHRU overrides this action if both are specified.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_DROP,
+
+	/**
+	 * [META]
+	 *
+	 * Enables counters for this rule.
+	 *
+	 * These counters can be retrieved and reset through rte_flow_query(),
+	 * see struct rte_flow_query_count.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_COUNT,
+
+	/**
+	 * Duplicates packets to a given queue index.
+	 *
+	 * This is normally combined with QUEUE, however when used alone, it
+	 * is actually similar to QUEUE + PASSTHRU.
+	 *
+	 * See struct rte_flow_action_dup.
+	 */
+	RTE_FLOW_ACTION_TYPE_DUP,
+
+	/**
+	 * Similar to QUEUE, except RSS is additionally performed on packets
+	 * to spread them among several queues according to the provided
+	 * parameters.
+	 *
+	 * See struct rte_flow_action_rss.
+	 */
+	RTE_FLOW_ACTION_TYPE_RSS,
+
+	/**
+	 * Redirects packets to the physical function (PF) of the current
+	 * device.
+	 *
+	 * No associated configuration structure.
+	 */
+	RTE_FLOW_ACTION_TYPE_PF,
+
+	/**
+	 * Redirects packets to the virtual function (VF) of the current
+	 * device with the specified ID.
+	 *
+	 * See struct rte_flow_action_vf.
+	 */
+	RTE_FLOW_ACTION_TYPE_VF,
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_MARK
+ *
+ * Attaches a 32 bit value to packets.
+ *
+ * This value is arbitrary and application-defined. For compatibility with
+ * FDIR it is returned in the hash.fdir.hi mbuf field. PKT_RX_FDIR_ID is
+ * also set in ol_flags.
+ */
+struct rte_flow_action_mark {
+	uint32_t id; /**< 32 bit value to return with packets. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_QUEUE
+ *
+ * Assign packets to a given queue index.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_queue {
+	uint16_t index; /**< Queue index to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_COUNT (query)
+ *
+ * Query structure to retrieve and reset flow rule counters.
+ */
+struct rte_flow_query_count {
+	uint32_t reset:1; /**< Reset counters after query [in]. */
+	uint32_t hits_set:1; /**< hits field is set [out]. */
+	uint32_t bytes_set:1; /**< bytes field is set [out]. */
+	uint32_t reserved:29; /**< Reserved, must be zero [in, out]. */
+	uint64_t hits; /**< Number of hits for this rule [out]. */
+	uint64_t bytes; /**< Number of bytes through this rule [out]. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_DUP
+ *
+ * Duplicates packets to a given queue index.
+ *
+ * This is normally combined with QUEUE, however when used alone, it is
+ * actually similar to QUEUE + PASSTHRU.
+ *
+ * Non-terminating by default.
+ */
+struct rte_flow_action_dup {
+	uint16_t index; /**< Queue index to duplicate packets to. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_RSS
+ *
+ * Similar to QUEUE, except RSS is additionally performed on packets to
+ * spread them among several queues according to the provided parameters.
+ *
+ * Note: RSS hash result is normally stored in the hash.rss mbuf field,
+ * however it conflicts with the MARK action as they share the same
+ * space. When both actions are specified, the RSS hash is discarded and
+ * PKT_RX_RSS_HASH is not set in ol_flags. MARK has priority. The mbuf
+ * structure should eventually evolve to store both.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_rss {
+	const struct rte_eth_rss_conf *rss_conf; /**< RSS parameters. */
+	uint16_t queues; /**< Number of entries in queue[]. */
+	uint16_t queue[]; /**< Queues indices to use. */
+};
+
+/**
+ * RTE_FLOW_ACTION_TYPE_VF
+ *
+ * Redirects packets to a virtual function (VF) of the current device.
+ *
+ * Packets matched by a VF pattern item can be redirected to their original
+ * VF ID instead of the specified one. This parameter may not be available
+ * and is not guaranteed to work properly if the VF part is matched by a
+ * prior flow rule or if packets are not addressed to a VF in the first
+ * place.
+ *
+ * Terminating by default.
+ */
+struct rte_flow_action_vf {
+	uint32_t original:1; /**< Use original VF ID if possible. */
+	uint32_t reserved:31; /**< Reserved, must be zero. */
+	uint32_t id; /**< VF ID to redirect packets to. */
+};
+
+/**
+ * Definition of a single action.
+ *
+ * A list of actions is terminated by a END action.
+ *
+ * For simple actions without a configuration structure, conf remains NULL.
+ */
+struct rte_flow_action {
+	enum rte_flow_action_type type; /**< Action type. */
+	const void *conf; /**< Pointer to action configuration structure. */
+};
+
+/**
+ * Opaque type returned after successfully creating a flow.
+ *
+ * This handle can be used to manage and query the related flow (e.g. to
+ * destroy it or retrieve counters).
+ */
+struct rte_flow;
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_flow_error.cause.
+ */
+enum rte_flow_error_type {
+	RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+	RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+	RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+	RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+	RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+	RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+	RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+	RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_flow_error {
+	enum rte_flow_error_type type; /**< Cause field and error types. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Check whether a flow rule can be created on a given port.
+ *
+ * While this function has no effect on the target device, the flow rule is
+ * validated against its current configuration state and the returned value
+ * should be considered valid by the caller for that state only.
+ *
+ * The returned value is guaranteed to remain valid only as long as no
+ * successful calls to rte_flow_create() or rte_flow_destroy() are made in
+ * the meantime and no device parameter affecting flow rules in any way are
+ * modified, due to possible collisions or resource limitations (although in
+ * such cases EINVAL should not be returned).
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 if flow rule is valid and can be created. A negative errno value
+ *   otherwise (rte_errno is also set), the following errors are defined:
+ *
+ *   -ENOSYS: underlying device does not support this functionality.
+ *
+ *   -EINVAL: unknown or invalid rule specification.
+ *
+ *   -ENOTSUP: valid but unsupported rule specification (e.g. partial
+ *   bit-masks are unsupported).
+ *
+ *   -EEXIST: collision with an existing rule.
+ *
+ *   -ENOMEM: not enough resources.
+ *
+ *   -EBUSY: action cannot be performed due to busy device resources, may
+ *   succeed if the affected queues or even the entire port are in a stopped
+ *   state (see rte_eth_dev_rx_queue_stop() and rte_eth_dev_stop()).
+ */
+int
+rte_flow_validate(uint8_t port_id,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item pattern[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error);
+
+/**
+ * Create a flow rule on a given port.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   A valid handle in case of success, NULL otherwise and rte_errno is set
+ *   to the positive version of one of the error codes defined for
+ *   rte_flow_validate().
+ */
+struct rte_flow *
+rte_flow_create(uint8_t port_id,
+		const struct rte_flow_attr *attr,
+		const struct rte_flow_item pattern[],
+		const struct rte_flow_action actions[],
+		struct rte_flow_error *error);
+
+/**
+ * Destroy a flow rule on a given port.
+ *
+ * Failure to destroy a flow rule handle may occur when other flow rules
+ * depend on it, and destroying it would result in an inconsistent state.
+ *
+ * This function is only guaranteed to succeed if handles are destroyed in
+ * reverse order of their creation.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to destroy.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_destroy(uint8_t port_id,
+		 struct rte_flow *flow,
+		 struct rte_flow_error *error);
+
+/**
+ * Destroy all flow rules associated with a port.
+ *
+ * In the unlikely event of failure, handles are still considered destroyed
+ * and no longer valid but the port must be assumed to be in an inconsistent
+ * state.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_flush(uint8_t port_id,
+	       struct rte_flow_error *error);
+
+/**
+ * Query an existing flow rule.
+ *
+ * This function allows retrieving flow-specific data such as counters.
+ * Data is gathered by special actions which must be present in the flow
+ * rule definition.
+ *
+ * @param port_id
+ *   Port identifier of Ethernet device.
+ * @param flow
+ *   Flow rule handle to query.
+ * @param action
+ *   Action type to query.
+ * @param[in, out] data
+ *   Pointer to storage for the associated query data type.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+rte_flow_query(uint8_t port_id,
+	       struct rte_flow *flow,
+	       enum rte_flow_action_type action,
+	       void *data,
+	       struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_H_ */
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
new file mode 100644
index 0000000..a88c621
--- /dev/null
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -0,0 +1,177 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_FLOW_DRIVER_H_
+#define RTE_FLOW_DRIVER_H_
+
+/**
+ * @file
+ * RTE generic flow API (driver side)
+ *
+ * This file provides implementation helpers for internal use by PMDs, they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_flow.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Generic flow operations structure implemented and returned by PMDs.
+ *
+ * To implement this API, PMDs must handle the RTE_ETH_FILTER_GENERIC filter
+ * type in their .filter_ctrl callback function (struct eth_dev_ops) as well
+ * as the RTE_ETH_FILTER_GET filter operation.
+ *
+ * If successful, this operation must result in a pointer to a PMD-specific
+ * struct rte_flow_ops written to the argument address as described below:
+ *
+ *  // PMD filter_ctrl callback
+ *
+ *  static const struct rte_flow_ops pmd_flow_ops = { ... };
+ *
+ *  switch (filter_type) {
+ *  case RTE_ETH_FILTER_GENERIC:
+ *      if (filter_op != RTE_ETH_FILTER_GET)
+ *          return -EINVAL;
+ *      *(const void **)arg = &pmd_flow_ops;
+ *      return 0;
+ *  }
+ *
+ * See also rte_flow_ops_get().
+ *
+ * These callback functions are not supposed to be used by applications
+ * directly, which must rely on the API defined in rte_flow.h.
+ *
+ * Public-facing wrapper functions perform a few consistency checks so that
+ * unimplemented (i.e. NULL) callbacks simply return -ENOTSUP. These
+ * callbacks otherwise only differ by their first argument (with port ID
+ * already resolved to a pointer to struct rte_eth_dev).
+ */
+struct rte_flow_ops {
+	/** See rte_flow_validate(). */
+	int (*validate)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_create(). */
+	struct rte_flow *(*create)
+		(struct rte_eth_dev *,
+		 const struct rte_flow_attr *,
+		 const struct rte_flow_item [],
+		 const struct rte_flow_action [],
+		 struct rte_flow_error *);
+	/** See rte_flow_destroy(). */
+	int (*destroy)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 struct rte_flow_error *);
+	/** See rte_flow_flush(). */
+	int (*flush)
+		(struct rte_eth_dev *,
+		 struct rte_flow_error *);
+	/** See rte_flow_query(). */
+	int (*query)
+		(struct rte_eth_dev *,
+		 struct rte_flow *,
+		 enum rte_flow_action_type,
+		 void *,
+		 struct rte_flow_error *);
+};
+
+/**
+ * Initialize generic flow error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Pointer to flow error structure.
+ */
+static inline struct rte_flow_error *
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return error;
+}
+
+/**
+ * Get generic flow operations structure from a port.
+ *
+ * @param port_id
+ *   Port identifier to query.
+ * @param[out] error
+ *   Pointer to flow error structure.
+ *
+ * @return
+ *   The flow operations structure associated with port_id, NULL in case of
+ *   error, in which case rte_errno is set and the error structure contains
+ *   additional details.
+ */
+const struct rte_flow_ops *
+rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_FLOW_DRIVER_H_ */
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v5] latencystats: added new library for latency stats
  @ 2016-11-15 13:37  1% ` Reshma Pattan
  0 siblings, 0 replies; 200+ results
From: Reshma Pattan @ 2016-11-15 13:37 UTC (permalink / raw)
  To: dev; +Cc: Reshma Pattan

Add a library designed to calculate latency statistics and report them
to the application when queried. The library measures minimum, average and
maximum latencies, and jitter in nano seconds. The current implementation
supports global latency stats, i.e. per application stats.

Added new field to mbuf struct to mark the packet arrival time on Rx.

Modify testpmd code to initialize/uninitialize latency statistics
calulation.

Modify the dpdk-procinfo process to display the newly added metrics.
Added new command line option "--metrics" to display metrics.

This pacth is dependent on http://dpdk.org/dev/patchwork/patch/16927/

APIs:

* Added APIs to initialize and un initialize latency stats
  calculation.
* Added API to retrieve latency stats names and values.

Functionality:

* The library will register ethdev Rx/Tx callbacks for each active port,
  queue combinations.
* The library will register latency stats names with new metrics library.
* Rx packets will be marked with time stamp on each sampling interval.
* On Tx side, packets with time stamp will be considered for calculating
  the minimum, maximum, average latencies and also jitter.
* Average latency is calculated using exponential weighted moving average
  method.
* Minimum and maximum latencies will be low and high latency values
  observed so far.
* Jitter calculation is done based on inter packet delay variation.
* Measured stats are reported to the metrics library in a separate
  pthread.
* Measured stats can be retrieved via get API of the libray (or)
  by calling generic get API of the new metrics library.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
---
v5:
* References to 16.11 changed to 17.02
* Updated comments and doxygen
* rte_stat_value changed to rte_metric_value in library and proc_info
* Updated doc for doxygen
* Updated release notes

---
 MAINTAINERS                                        |   4 +
 app/proc_info/main.c                               |  70 ++++
 app/test-pmd/testpmd.c                             |  10 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_17_02.rst             |   5 +
 lib/Makefile                                       |   1 +
 lib/librte_latencystats/Makefile                   |  57 +++
 lib/librte_latencystats/rte_latencystats.c         | 389 +++++++++++++++++++++
 lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
 .../rte_latencystats_version.map                   |  10 +
 lib/librte_mbuf/rte_mbuf.h                         |   3 +
 mk/rte.app.mk                                      |   2 +
 14 files changed, 704 insertions(+)
 create mode 100644 lib/librte_latencystats/Makefile
 create mode 100644 lib/librte_latencystats/rte_latencystats.c
 create mode 100644 lib/librte_latencystats/rte_latencystats.h
 create mode 100644 lib/librte_latencystats/rte_latencystats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..6e5e26b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -704,3 +704,7 @@ F: examples/tep_termination/
 F: examples/vmdq/
 F: examples/vmdq_dcb/
 F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+
+Latency Stats
+M: Reshma Pattan <reshma.pattan@intel.com>
+F: lib/librte_latencystats/
diff --git a/app/proc_info/main.c b/app/proc_info/main.c
index 2c56d10..33a4b39 100644
--- a/app/proc_info/main.c
+++ b/app/proc_info/main.c
@@ -57,6 +57,7 @@
 #include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
+#include <rte_metrics.h>
 
 /* Maximum long option length for option parsing. */
 #define MAX_LONG_OPT_SZ 64
@@ -68,6 +69,8 @@ static uint32_t enabled_port_mask;
 static uint32_t enable_stats;
 /**< Enable xstats. */
 static uint32_t enable_xstats;
+/**< Enable metrics. */
+static uint32_t enable_metrics;
 /**< Enable stats reset. */
 static uint32_t reset_stats;
 /**< Enable xstats reset. */
@@ -85,6 +88,8 @@ proc_info_usage(const char *prgname)
 		"  --stats: to display port statistics, enabled by default\n"
 		"  --xstats: to display extended port statistics, disabled by "
 			"default\n"
+		"  --metrics: to display derived metrics of the ports, disabled by "
+			"default\n"
 		"  --stats-reset: to reset port statistics\n"
 		"  --xstats-reset: to reset port extended statistics\n",
 		prgname);
@@ -127,6 +132,7 @@ proc_info_parse_args(int argc, char **argv)
 		{"stats", 0, NULL, 0},
 		{"stats-reset", 0, NULL, 0},
 		{"xstats", 0, NULL, 0},
+		{"metrics", 0, NULL, 0},
 		{"xstats-reset", 0, NULL, 0},
 		{NULL, 0, 0, 0}
 	};
@@ -159,6 +165,10 @@ proc_info_parse_args(int argc, char **argv)
 			else if (!strncmp(long_option[option_index].name, "xstats",
 					MAX_LONG_OPT_SZ))
 				enable_xstats = 1;
+			else if (!strncmp(long_option[option_index].name,
+					"metrics",
+					MAX_LONG_OPT_SZ))
+				enable_metrics = 1;
 			/* Reset stats */
 			if (!strncmp(long_option[option_index].name, "stats-reset",
 					MAX_LONG_OPT_SZ))
@@ -301,6 +311,60 @@ nic_xstats_clear(uint8_t port_id)
 	printf("\n  NIC extended statistics for port %d cleared\n", port_id);
 }
 
+static void
+metrics_display(int port_id)
+{
+	struct rte_metric_value *metrics;
+	struct rte_metric_name *names;
+	int len, ret;
+	static const char *nic_stats_border = "########################";
+
+	memset(&metrics, 0, sizeof(struct rte_metric_value));
+	len = rte_metrics_get_names(NULL, 0);
+	if (len < 0) {
+		printf("Cannot get metrics count\n");
+		return;
+	}
+
+	metrics = malloc(sizeof(struct rte_metric_value) * len);
+	if (metrics == NULL) {
+		printf("Cannot allocate memory for metrics\n");
+		return;
+	}
+
+	names =  malloc(sizeof(struct rte_metric_name) * len);
+	if (names == NULL) {
+		printf("Cannot allocate memory for metrcis names\n");
+		free(metrics);
+		return;
+	}
+
+	if (len != rte_metrics_get_names(names, len)) {
+		printf("Cannot get metrics names\n");
+		free(metrics);
+		free(names);
+		return;
+	}
+
+	printf("###### metrics for port %-2d #########\n", port_id);
+	printf("%s############################\n", nic_stats_border);
+	ret = rte_metrics_get_values(port_id, metrics, len);
+	if (ret < 0 || ret > len) {
+		printf("Cannot get metrics values\n");
+		free(metrics);
+		free(names);
+		return;
+	}
+
+	int i;
+	for (i = 0; i < len; i++)
+		printf("%s: %"PRIu64"\n", names[i].name, metrics[i].value);
+
+	printf("%s############################\n", nic_stats_border);
+	free(metrics);
+	free(names);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -360,8 +424,14 @@ main(int argc, char **argv)
 				nic_stats_clear(i);
 			else if (reset_xstats)
 				nic_xstats_clear(i);
+			else if (enable_metrics)
+				metrics_display(i);
 		}
 	}
 
+	/* print port independent stats */
+	if (enable_metrics)
+		metrics_display(RTE_METRICS_NONPORT);
+
 	return 0;
 }
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c2..aba6d78 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -78,6 +78,10 @@
 #ifdef RTE_LIBRTE_PDUMP
 #include <rte_pdump.h>
 #endif
+#include <rte_metrics.h>
+#ifdef RTE_LIBRTE_LATENCY_STATS
+#include <rte_latencystats.h>
+#endif
 
 #include "testpmd.h"
 
@@ -2075,6 +2079,9 @@ signal_handler(int signum)
 		/* uninitialize packet capture framework */
 		rte_pdump_uninit();
 #endif
+#ifdef RTE_LIBRTE_LATENCY_STATS
+		rte_latencystats_uninit();
+#endif
 		force_quit();
 		/* exit with the expected status */
 		signal(signum, SIG_DFL);
@@ -2132,6 +2139,9 @@ main(int argc, char** argv)
 	/* set all ports to promiscuous mode by default */
 	FOREACH_PORT(port_id, ports)
 		rte_eth_promiscuous_enable(port_id);
+#ifdef RTE_LIBRTE_LATENCY_STATS
+	rte_latencystats_init(1, NULL);
+#endif
 
 #ifdef RTE_LIBRTE_CMDLINE
 	if (interactive == 1) {
diff --git a/config/common_base b/config/common_base
index 4bff83a..a15e8e9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -589,3 +589,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the latency statistics library
+#
+CONFIG_RTE_LIBRTE_LATENCY_STATS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..6df3ca6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -147,4 +147,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [Latency stats]      (@ref rte_latencystats.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..8964ee8 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -46,6 +46,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_jobstats \
                           lib/librte_kni \
                           lib/librte_kvargs \
+                          lib/librte_latencystats \
                           lib/librte_lpm \
                           lib/librte_mbuf \
                           lib/librte_mempool \
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..bf8a460 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -38,6 +38,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added latency stats library.**
+  A library that facilitates latency stats measurment of the dpdk based applications.
+  The library measures minimum, average, maximum latencies and also jitter
+  in nano seconds.
 
 Resolved Issues
 ---------------
@@ -127,6 +131,7 @@ Shared Library Versions
      librte_acl.so.2
    + librte_cfgfile.so.2
      librte_cmdline.so.2
+     librte_latencystats.so.1
 
    This section is a comment. do not overwrite or remove it.
    =========================================================
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..2111349 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += librte_latencystats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_latencystats/Makefile b/lib/librte_latencystats/Makefile
new file mode 100644
index 0000000..f744da6
--- /dev/null
+++ b/lib/librte_latencystats/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_latencystats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+LDLIBS += -lm
+LDLIBS += -lpthread
+
+EXPORT_MAP := rte_latencystats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) := rte_latencystats.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)-include := rte_latencystats.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_latencystats/rte_latencystats.c b/lib/librte_latencystats/rte_latencystats.c
new file mode 100644
index 0000000..dcde7f6
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats.c
@@ -0,0 +1,389 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <stdbool.h>
+#include <math.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_log.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_metrics.h>
+#include <rte_memzone.h>
+#include <rte_lcore.h>
+#include <rte_timer.h>
+
+#include "rte_latencystats.h"
+
+/** Nano seconds per second */
+#define NS_PER_SEC 1E9
+
+/** Clock cycles per nano second */
+#define CYCLES_PER_NS (rte_get_timer_hz() / NS_PER_SEC)
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_LATENCY_STATS RTE_LOGTYPE_USER1
+
+static pthread_t latency_stats_thread;
+static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats";
+static int latency_stats_index;
+static uint64_t samp_intvl;
+static uint64_t timer_tsc;
+static uint64_t prev_tsc;
+
+static struct rte_latency_stats {
+	float min_latency; /**< Minimum latency in nano seconds */
+	float avg_latency; /**< Average latency in nano seconds */
+	float max_latency; /**< Maximum latency in nano seconds */
+	float jitter; /** Latency variation */
+} *glob_stats;
+
+static struct rxtx_cbs {
+	struct rte_eth_rxtx_callback *cb;
+} rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+	tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+
+struct latency_stats_nameoff {
+	char name[RTE_ETH_XSTATS_NAME_SIZE];
+	unsigned int offset;
+};
+
+static const struct latency_stats_nameoff lat_stats_strings[] = {
+	{"min_latency_ns", offsetof(struct rte_latency_stats, min_latency)},
+	{"avg_latency_ns", offsetof(struct rte_latency_stats, avg_latency)},
+	{"max_latency_ns", offsetof(struct rte_latency_stats, max_latency)},
+	{"jitter_ns", offsetof(struct rte_latency_stats, jitter)},
+};
+
+#define NUM_LATENCY_STATS (sizeof(lat_stats_strings) / \
+				sizeof(lat_stats_strings[0]))
+
+static __attribute__((noreturn)) void *
+report_latency_stats(__rte_unused void *arg)
+{
+	for (;;) {
+		unsigned int i;
+		float *stats_ptr = NULL;
+		uint64_t values[NUM_LATENCY_STATS] = {0};
+		int ret;
+
+		for (i = 0; i < NUM_LATENCY_STATS; i++) {
+			stats_ptr = RTE_PTR_ADD(glob_stats,
+					lat_stats_strings[i].offset);
+			values[i] = (uint64_t)floor((*stats_ptr)/
+					CYCLES_PER_NS);
+		}
+
+		ret = rte_metrics_update_metrics(RTE_METRICS_NONPORT,
+						latency_stats_index,
+						values, NUM_LATENCY_STATS);
+		if (ret < 0)
+			RTE_LOG(INFO, LATENCY_STATS,
+				"Failed to push the stats\n");
+	}
+}
+
+static void
+rte_latencystats_fill_values(struct rte_metric_value *values)
+{
+	unsigned int i;
+	float *stats_ptr = NULL;
+
+	for (i = 0; i < NUM_LATENCY_STATS; i++) {
+		stats_ptr = RTE_PTR_ADD(glob_stats,
+				lat_stats_strings[i].offset);
+		values[i].key = i;
+		values[i].value = (uint64_t)floor((*stats_ptr)/
+						CYCLES_PER_NS);
+	}
+}
+
+static uint16_t
+add_time_stamps(uint8_t pid __rte_unused,
+		uint16_t qid __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		uint16_t max_pkts __rte_unused,
+		void *user_cb __rte_unused)
+{
+	unsigned int i;
+	uint64_t diff_tsc, now;
+
+	/*
+	 * For every sample interval,
+	 * time stamp is marked on one received packet.
+	 */
+	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		diff_tsc = now - prev_tsc;
+		timer_tsc += diff_tsc;
+		if (timer_tsc >= samp_intvl) {
+			/*
+			 * TBD: Mark the timestamp only
+			 * if not already marked by the
+			 * hardware or the PMD.
+			 */
+			pkts[i]->timestamp = now;
+			timer_tsc = 0;
+		}
+		prev_tsc = now;
+		now = rte_rdtsc();
+	}
+
+	return nb_pkts;
+}
+
+static uint16_t
+calc_latency(uint8_t pid __rte_unused,
+		uint16_t qid __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *_ __rte_unused)
+{
+	unsigned int i, cnt = 0;
+	uint64_t now;
+	float latency[nb_pkts];
+	static float prev_latency;
+	/*
+	 * Alpha represents degree of weighting decrease in EWMA,
+	 * a constant smoothing factor between 0 and 1. The value
+	 * is used below for measuring average latency.
+	 */
+	const float alpha = 0.2;
+
+	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		if (pkts[i]->timestamp)
+			latency[cnt++] = now - pkts[i]->timestamp;
+	}
+
+	for (i = 0; i < cnt; i++) {
+		/*
+		 * The jitter is calculated as statistical mean of interpacket
+		 * delay variation. The "jitter estimate" is computed by taking
+		 * the absolute values of the ipdv sequence and applying an
+		 * exponential filter with parameter 1/16 to generate the
+		 * estimate. i.e J=J+(|D(i-1,i)|-J)/16. Where J is jitter,
+		 * D(i-1,i) is difference in latency of two consecutive packets
+		 * i-1 and i.
+		 * Reference: Calculated as per RFC 5481, sec 4.1,
+		 * RFC 3393 sec 4.5, RFC 1889 sec.
+		 */
+		glob_stats->jitter +=  (abs(prev_latency - latency[i])
+					- glob_stats->jitter)/16;
+		if (glob_stats->min_latency == 0)
+			glob_stats->min_latency = latency[i];
+		else if (latency[i] < glob_stats->min_latency)
+			glob_stats->min_latency = latency[i];
+		else if (latency[i] > glob_stats->max_latency)
+			glob_stats->max_latency = latency[i];
+		/*
+		 * The average latency is measured using exponential moving
+		 * average, i.e. using EWMA
+		 * https://en.wikipedia.org/wiki/Moving_average
+		 */
+		glob_stats->avg_latency +=
+			alpha * (latency[i] - glob_stats->avg_latency);
+		prev_latency = latency[i];
+	}
+
+	return nb_pkts;
+}
+
+int
+rte_latencystats_init(uint64_t samp_intvl,
+		rte_latency_stats_flow_type_fn user_cb)
+{
+	unsigned int i;
+	uint8_t pid;
+	uint16_t qid;
+	struct rxtx_cbs *cbs = NULL;
+	const uint8_t nb_ports = rte_eth_dev_count();
+	const char *ptr_strings[NUM_LATENCY_STATS] = {0};
+	const struct rte_memzone *mz = NULL;
+	const unsigned int flags = 0;
+
+	/** Allocate stats in shared memory fo muliti process support */
+	mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*glob_stats),
+					rte_socket_id(), flags);
+	if (mz == NULL) {
+		RTE_LOG(ERR, LATENCY_STATS, "Cannot reserve memory: %s:%d\n",
+			__func__, __LINE__);
+		return -ENOMEM;
+	}
+
+	glob_stats = mz->addr;
+	samp_intvl *= CYCLES_PER_NS;
+
+	/** Register latency stats with stats library */
+	for (i = 0; i < NUM_LATENCY_STATS; i++)
+		ptr_strings[i] = lat_stats_strings[i].name;
+
+	latency_stats_index = rte_metrics_reg_metrics(ptr_strings,
+							NUM_LATENCY_STATS);
+	if (latency_stats_index < 0) {
+		RTE_LOG(DEBUG, LATENCY_STATS,
+			"Failed to register latency stats names\n");
+		return -1;
+	}
+
+	/** Register Rx/Tx callbacks */
+	for (pid = 0; pid < nb_ports; pid++) {
+		struct rte_eth_dev_info dev_info;
+		rte_eth_dev_info_get(pid, &dev_info);
+		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
+			cbs = &rx_cbs[pid][qid];
+			cbs->cb = rte_eth_add_first_rx_callback(pid, qid,
+					add_time_stamps, user_cb);
+			if (!cbs->cb)
+				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
+					"register Rx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
+			cbs = &tx_cbs[pid][qid];
+			cbs->cb =  rte_eth_add_tx_callback(pid, qid,
+					calc_latency, user_cb);
+			if (!cbs->cb)
+				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
+					"register Tx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+	}
+
+	int ret = 0;
+	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+
+	/** Create the host thread to update latency stats to stats library */
+	ret = pthread_create(&latency_stats_thread, NULL, report_latency_stats,
+				NULL);
+	if (ret != 0) {
+		RTE_LOG(ERR, LATENCY_STATS,
+			"Failed to create the latency stats thread:%s, %s:%d\n",
+			strerror(errno), __func__, __LINE__);
+		return -1;
+	}
+	/** Set thread_name for aid in debugging */
+	snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "latency-stats-thread");
+	ret = rte_thread_setname(latency_stats_thread, thread_name);
+	if (ret != 0)
+		RTE_LOG(DEBUG, LATENCY_STATS,
+			"Failed to set thread name for latency stats handling\n");
+
+	return 0;
+}
+
+int
+rte_latencystats_uninit(void)
+{
+	uint8_t pid;
+	uint16_t qid;
+	int ret = 0;
+	struct rxtx_cbs *cbs = NULL;
+	const uint8_t nb_ports = rte_eth_dev_count();
+
+	/** De register Rx/Tx callbacks */
+	for (pid = 0; pid < nb_ports; pid++) {
+		struct rte_eth_dev_info dev_info;
+		rte_eth_dev_info_get(pid, &dev_info);
+		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
+			cbs = &rx_cbs[pid][qid];
+			ret = rte_eth_remove_rx_callback(pid, qid, cbs->cb);
+			if (ret)
+				RTE_LOG(INFO, LATENCY_STATS, "failed to "
+					"remove Rx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
+			cbs = &tx_cbs[pid][qid];
+			ret = rte_eth_remove_tx_callback(pid, qid, cbs->cb);
+			if (ret)
+				RTE_LOG(INFO, LATENCY_STATS, "failed to "
+					"remove Tx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+	}
+
+	/** Cancel the thread */
+	ret = pthread_cancel(latency_stats_thread);
+	if (ret != 0) {
+		RTE_LOG(ERR, LATENCY_STATS,
+			"Failed to cancel latency stats update thread:"
+			"%s,%s:%d\n",
+			strerror(errno), __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_latencystats_get_names(struct rte_metric_name *names, uint16_t size)
+{
+	unsigned int i;
+
+	if (names == NULL || size < NUM_LATENCY_STATS)
+		return NUM_LATENCY_STATS;
+
+	for (i = 0; i < NUM_LATENCY_STATS; i++)
+		snprintf(names[i].name, sizeof(names[i].name),
+				"%s", lat_stats_strings[i].name);
+
+	return NUM_LATENCY_STATS;
+}
+
+int
+rte_latencystats_get(struct rte_metric_value *values, uint16_t size)
+{
+	if (size < NUM_LATENCY_STATS || values == NULL)
+		return NUM_LATENCY_STATS;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		const struct rte_memzone *mz;
+		mz = rte_memzone_lookup(MZ_RTE_LATENCY_STATS);
+		if (mz == NULL) {
+			RTE_LOG(ERR, LATENCY_STATS,
+				"Latency stats memzone not found\n");
+			return -ENOMEM;
+		}
+		glob_stats =  mz->addr;
+	}
+
+	/* Retrieve latency stats */
+	rte_latencystats_fill_values(values);
+
+	return NUM_LATENCY_STATS;
+}
diff --git a/lib/librte_latencystats/rte_latencystats.h b/lib/librte_latencystats/rte_latencystats.h
new file mode 100644
index 0000000..405b878
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats.h
@@ -0,0 +1,146 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_LATENCYSTATS_H_
+#define _RTE_LATENCYSTATS_H_
+
+/**
+ * @file
+ * RTE latency stats
+ *
+ * library to provide application and flow based latency stats.
+ */
+
+#include <rte_metrics.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ *  Note: This function pointer is for future flow based latency stats
+ *  implementation.
+ *
+ * Function type used for identifting flow types of a Rx packet.
+ *
+ * The callback function is called on Rx for each packet.
+ * This function is used for flow based latency calculations.
+ *
+ * @param pkt
+ *   Packet that has to be identified with its flow types.
+ * @param user_param
+ *   The arbitrary user parameter passed in by the application when
+ *   the callback was originally configured.
+ * @return
+ *   The flow_mask, representing the multiple flow types of a packet.
+ */
+typedef uint16_t (*rte_latency_stats_flow_type_fn)(struct rte_mbuf *pkt,
+							void *user_param);
+
+/**
+ *  Registers Rx/Tx callbacks for each active port, queue.
+ *
+ * @param samp_intvl
+ *  Sampling time period in nano seconds, at which packet
+ *  should be marked with time stamp.
+ * @param user_cb
+ *  Note: This param is for future flow based latency stats
+ *  implementation.
+ *  User callback to be called to get flow types of a packet.
+ *  Used for flow based latency calculation.
+ *  If the value is NULL, global stats will be calculated,
+ *  else flow based latency stats will be calculated.
+ *  For now just pass on the NULL value to this param.
+ *  @return
+ *   -1     : On error
+ *   -ENOMEM: On error
+ *    0     : On success
+ */
+int rte_latencystats_init(uint64_t samp_intvl,
+			rte_latency_stats_flow_type_fn user_cb);
+
+/**
+ *  Removes registered Rx/Tx callbacks for each active port, queue.
+ *
+ *  @return
+ *   -1: On error
+ *    0: On success
+ */
+int rte_latencystats_uninit(void);
+
+/**
+ * Retrieve names of latency statistics
+ *
+ * @param names
+ *  Block of memory to insert names into. Must be at least size in capacity.
+ *  If set to NULL, function returns required capacity.
+ * @param size
+ *  Capacity of latency stats names (number of names).
+ * @return
+ *   - positive value lower or equal to size: success. The return value
+ *     is the number of entries filled in the stats table.
+ *   - positive value higher than size: error, the given statistics table
+ *     is too small. The return value corresponds to the size that should
+ *     be given to succeed. The entries in the table are not valid and
+ *     shall not be used by the caller.
+ */
+int rte_latencystats_get_names(struct rte_metric_name *names,
+				uint16_t size);
+
+/**
+ * Retrieve latency statistics.
+ *
+ * @param values
+ *   A pointer to a table of structure of type *rte_metric_value*
+ *   to be filled with latency statistics ids and values.
+ *   This parameter can be set to NULL if size is 0.
+ * @param size
+ *   The size of the stats table, which should be large enough to store
+ *   all the latency stats.
+ * @return
+ *   - positive value lower or equal to size: success. The return value
+ *     is the number of entries filled in the stats table.
+ *   - positive value higher than size: error, the given statistics table
+ *     is too small. The return value corresponds to the size that should
+ *     be given to succeed. The entries in the table are not valid and
+ *     shall not be used by the caller.
+ *   -ENOMEM: On failure.
+ */
+int rte_latencystats_get(struct rte_metric_value *values,
+			uint16_t size);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_LATENCYSTATS_H_ */
diff --git a/lib/librte_latencystats/rte_latencystats_version.map b/lib/librte_latencystats/rte_latencystats_version.map
new file mode 100644
index 0000000..502018e
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats_version.map
@@ -0,0 +1,10 @@
+DPDK_17.02 {
+	global:
+
+	rte_latencystats_get;
+	rte_latencystats_get_names;
+	rte_latencystats_init;
+	rte_latencystats_uninit;
+
+	local: *;
+};
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ead7c6e..44ba922 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -493,6 +493,9 @@ struct rte_mbuf {
 
 	/** Timesync flags for use with IEEE1588. */
 	uint16_t timesync;
+
+	/** Timestamp for measuring latency. */
+	uint64_t timestamp;
 } __rte_cache_aligned;
 
 /**
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index f75f0e2..4e5289a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
+_LDLIBS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)  += -lrte_latencystats
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v4 2/3] lib: add bitrate statistics library
    2016-11-15  7:15  2% ` [dpdk-dev] [PATCH v4 1/3] lib: add information metrics library Remy Horton
@ 2016-11-15  7:15  3% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-11-15  7:15 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon

This patch adds a library that calculates peak and average data-rate
statistics. For ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_17_02.rst             |   5 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 128 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 11 files changed, 288 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 52bd8a9..d6bbdd5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -600,6 +600,10 @@ M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 F: doc/guides/sample_app_ug/keep_alive.rst
 
+Bit-rate statistica
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index dedc4c3..beca7ec 100644
--- a/config/common_base
+++ b/config/common_base
@@ -594,3 +594,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the device metrics library
 #
 CONFIG_RTE_LIBRTE_METRICS=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index ca50fa6..91e8ea6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -148,4 +148,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [Device Metrics]     (@ref rte_metrics.h),
+  [Bitrate Statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fe830eb..8765ddd 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -58,6 +58,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_ring \
                           lib/librte_sched \
                           lib/librte_metrics \
+                          lib/librte_bitratestats \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index e1b8894..f949e88 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,6 +40,11 @@ New Features
      intended to provide a reporting mechanism that is independent of the
      ethdev library.
 
+   * **Added bit-rate calculation library.**
+
+     A library that can be used to calculate device bit-rates. Calculated
+     bitrates are reported using the metrics library.
+
      This section is a comment. do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =========================================================
diff --git a/lib/Makefile b/lib/Makefile
index 5d85dcf..e211bc0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..b725d4e
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..6346bb1
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,128 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate_s {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates_s {
+	struct rte_stats_bitrate_s port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates_s *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates_s), 0);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data)
+{
+	const char *names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_metrics(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate_s *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Expomentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent.
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +50 fixes integer rounding during divison */
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_metrics(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..bc87c5e
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates_s;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates_s *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..66f232f
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.02 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 40fcf33..6aac5ac 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 1/3] lib: add information metrics library
  @ 2016-11-15  7:15  2% ` Remy Horton
  2016-11-15  7:15  3% ` [dpdk-dev] [PATCH v4 2/3] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-11-15  7:15 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon

This patch adds a new information metric library that allows other
modules to register named metrics and update their values. It is
intended to be independent of ethdev, rather than mixing ethdev
and non-ethdev information in xstats.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   5 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/rel_notes/release_17_02.rst     |   6 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 308 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 190 ++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 11 files changed, 583 insertions(+)
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..52bd8a9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -595,6 +595,11 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+F: doc/guides/sample_app_ug/keep_alive.rst
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 4bff83a..dedc4c3 100644
--- a/config/common_base
+++ b/config/common_base
@@ -589,3 +589,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..ca50fa6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -147,4 +147,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [Device Metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..fe830eb 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -57,6 +57,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_reorder \
                           lib/librte_ring \
                           lib/librte_sched \
+                          lib/librte_metrics \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 3b65038..e1b8894 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -34,6 +34,12 @@ New Features
 
      Refer to the previous release notes for examples.
 
+   * **Added information metric library.**
+
+     A library that allows information metrics to be added and update. It is
+     intended to provide a reporting mechanism that is independent of the
+     ethdev library.
+
      This section is a comment. do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      =========================================================
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..5d85dcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..5edacc6
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,308 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ * @param name
+ *   Name of metric
+ * @param value
+ *   Current value for metric
+ * @param idx_next_set
+ *   Index of next root element (zero for none)
+ * @param idx_next_metric
+ *   Index of next metric in set (zero for none)
+ *
+ * Only the root of each set needs idx_next_set but since it has to be
+ * assumed that number of sets could equal total number of metrics,
+ * having a separate set metadata table doesn't save any memory.
+ */
+struct rte_metrics_meta_s {
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	uint64_t value[RTE_MAX_ETHPORTS];
+	uint64_t nonport_value;
+	uint16_t idx_next_set;
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * @param idx_last_set
+ *   Index of last metadata entry with valid data. This value is
+ *   not valid if cnt_stats is zero.
+ * @param cnt_stats
+ *   Number of metrics.
+ * @param metadata
+ *   Stat data memory block.
+ *
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	uint16_t idx_last_set;
+	uint16_t cnt_stats;
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(void)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), rte_socket_id(), 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_metric(const char *name)
+{
+	const char *list_names[] = {name};
+
+	return rte_metrics_reg_metrics(list_names, 1);
+}
+
+int
+rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_metric(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_metrics(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_metrics(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_NONPORT)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].nonport_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		if (port_id == RTE_METRICS_NONPORT)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->nonport_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..c58b366
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,190 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * RTE Metrics module
+ *
+ * Metric information is populated using a push model, where the
+ * information provider calls an update function on the relevant
+ * metrics. Currently only bulk querying of metrics is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/** Used to indicate port-independent information */
+#define RTE_METRICS_NONPORT -1
+
+
+/**
+ * Metric name
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric name.
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This only has to be explicitly called if you
+ * intend to use rte_metrics_reg_metric() or rte_metrics_reg_metrics() from a
+ * secondary process. This function must be called from a primary process.
+ */
+void rte_metrics_init(void);
+
+
+/**
+ * Register a metric
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metric(const char *name);
+
+/**
+ * Register a set of metrics
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metrics(const char **names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   Array of names to receive key names
+ *
+ * @param capacity
+ *   Space available in names
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Fetch metrics.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   Array to receive values and their keys
+ *
+ * @param capacity
+ *   Space available in values
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metric(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if upable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metrics(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..f904814
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.02 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_metric;
+	rte_metrics_reg_metrics;
+	rte_metrics_update_metric;
+	rte_metrics_update_metrics;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index f75f0e2..40fcf33 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1] doc: add template release notes for 17.02
@ 2016-11-14 12:31  6% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2016-11-14 12:31 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Add template release notes for DPDK 17.02 with inline
comments and explanations of the various sections.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_17_02.rst | 223 +++++++++++++++++++++++++++++++++
 2 files changed, 224 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_17_02.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 7e51b2c..cf8f167 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -36,6 +36,7 @@ Release Notes
     :numbered:
 
     rel_description
+    release_17_02
     release_16_11
     release_16_07
     release_16_04
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
new file mode 100644
index 0000000..d251752
--- /dev/null
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -0,0 +1,223 @@
+DPDK Release 17.02
+==================
+
+.. **Read this first.**
+
+   The text below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text: ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      firefox build/doc/html/guides/rel_notes/release_17_02.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to understand
+     the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like this.
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     This section is a comment. do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =========================================================
+
+
+Resolved Issues
+---------------
+
+.. This section should contain bug fixes added to the relevant sections. Sample format:
+
+   * **code/section Fixed issue in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description of the resolved issue in the past tense.
+     The title should contain the code/lib section like a commit message.
+     Add the entries in alphabetic order in the relevant sections below.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+EAL
+~~~
+
+
+Drivers
+~~~~~~~
+
+
+Libraries
+~~~~~~~~~
+
+
+Examples
+~~~~~~~~
+
+
+Other
+~~~~~
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change that was announced in
+     the previous releases and made in this release. Use fixed width quotes for
+     ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+``
+   sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. do not overwrite or remove it.
+   =========================================================
+
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_cryptodev.so.2
+     librte_distributor.so.1
+     librte_eal.so.3
+     librte_ethdev.so.5
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.2
+     librte_mempool.so.2
+     librte_meter.so.1
+     librte_net.so.1
+     librte_pdump.so.1
+     librte_pipeline.so.3
+     librte_pmd_bond.so.1
+     librte_pmd_ring.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_table.so.2
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this release.
+
+   The format is:
+
+   #. Platform name.
+
+      * Platform details.
+      * Platform details.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Tested NICs
+-----------
+
+.. This section should contain a list of NICs that were tested with this release.
+
+   The format is:
+
+   #. NIC name.
+
+      * NIC details.
+      * NIC details.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Tested OSes
+-----------
+
+.. This section should contain a list of OSes that were tested with this release.
+   The format is as follows, in alphabetical order:
+
+   * CentOS 7.0
+   * Fedora 23
+   * Fedora 24
+   * FreeBSD 10.3
+   * Red Hat Enterprise Linux 7.2
+   * SUSE Enterprise Linux 12
+   * Ubuntu 15.10
+   * Ubuntu 16.04 LTS
+   * Wind River Linux 8
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance
  2016-11-11  1:26  4% ` Zhang, Helin
@ 2016-11-13 13:57  4%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-13 13:57 UTC (permalink / raw)
  To: Yang, Qiming; +Cc: dev, Zhang, Helin

> > This patch adds a notice that the ABI change for ethtool app to get the NIC
> > firmware version in the 17.02 release.
> > 
> > Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Beilei Xing <beilei.xing@intel.com>
> Acked-by: Helin Zhang <helin.zhang@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether
  2016-11-10 10:36  4%   ` Ferruh Yigit
@ 2016-11-13 13:46  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-13 13:46 UTC (permalink / raw)
  To: Iremonger, Bernard; +Cc: dev, Ferruh Yigit, Mcnamara, John

> >> In 17.02 five rte_eth_dev_set_vf_*** functions will be removed from
> >> librte_ether, renamed and moved to the ixgbe PMD.
> >>
> >> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> > 
> > Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal
  2016-11-11 15:02  4%     ` Pattan, Reshma
@ 2016-11-13  9:02  4%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-13  9:02 UTC (permalink / raw)
  To: Shreyansh Jain
  Cc: Pattan, Reshma, Neil Horman, dev, Yigit, Ferruh, David Marchand

> > >> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> > >
> > > Acked-by: David Marchand <david.marchand@6wind.com>
> > 
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Acked-by: Reshma Pattan <reshma.pattan@intel.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal
  2016-11-11 13:05  4%   ` Ferruh Yigit
@ 2016-11-11 15:02  4%     ` Pattan, Reshma
  2016-11-13  9:02  4%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Pattan, Reshma @ 2016-11-11 15:02 UTC (permalink / raw)
  To: Shreyansh Jain
  Cc: Neil Horman, dev, Thomas Monjalon, Yigit, Ferruh, David Marchand



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Friday, November 11, 2016 1:05 PM
> To: David Marchand <david.marchand@6wind.com>; Shreyansh Jain
> <shreyansh.jain@nxp.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>; dev@dpdk.org; Thomas
> Monjalon <thomas.monjalon@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: announce API and ABI changes for
> librte_eal
> 
> On 11/10/2016 3:51 PM, David Marchand wrote:
> > On Thu, Nov 10, 2016 at 12:17 PM, Shreyansh Jain
> <shreyansh.jain@nxp.com> wrote:
> >> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> >> index 1a9e1ae..2af2476 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -35,3 +35,13 @@ Deprecation Notices
> >>  * mempool: The functions for single/multi producer/consumer are
> deprecated
> >>    and will be removed in 17.02.
> >>    It is replaced by ``rte_mempool_generic_get/put`` functions.
> >> +
> >> +* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will
> be
> >> +  impacted because of introduction of a new ``rte_bus`` hierarchy. This
> would
> >> +  also impact the way devices are identified by EAL. A bus-device-driver
> model
> >> +  will be introduced providing a hierarchical view of devices.
> >> +
> >> +* ``eth_driver`` is planned to be removed in 17.02. This currently serves
> as
> >> +  a placeholder for PMDs to register themselves. Changes for ``rte_bus``
> will
> >> +  provide a way to handle device initialization currently being done in
> >> +  ``eth_driver``.
> >> --
> >> 2.7.4
> >>
> >
> > Acked-by: David Marchand <david.marchand@6wind.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Reshma Pattan <reshma.pattan@intel.com>


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal
  2016-11-10 15:51  4% ` David Marchand
@ 2016-11-11 13:05  4%   ` Ferruh Yigit
  2016-11-11 15:02  4%     ` Pattan, Reshma
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-11-11 13:05 UTC (permalink / raw)
  To: David Marchand, Shreyansh Jain; +Cc: Neil Horman, dev, Thomas Monjalon

On 11/10/2016 3:51 PM, David Marchand wrote:
> On Thu, Nov 10, 2016 at 12:17 PM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index 1a9e1ae..2af2476 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -35,3 +35,13 @@ Deprecation Notices
>>  * mempool: The functions for single/multi producer/consumer are deprecated
>>    and will be removed in 17.02.
>>    It is replaced by ``rte_mempool_generic_get/put`` functions.
>> +
>> +* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
>> +  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
>> +  also impact the way devices are identified by EAL. A bus-device-driver model
>> +  will be introduced providing a hierarchical view of devices.
>> +
>> +* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
>> +  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
>> +  provide a way to handle device initialization currently being done in
>> +  ``eth_driver``.
>> --
>> 2.7.4
>>
> 
> Acked-by: David Marchand <david.marchand@6wind.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: remove iomem and ioport handling in igb_uio
  @ 2016-11-11  2:12  3%   ` Remy Horton
  0 siblings, 0 replies; 200+ results
From: Remy Horton @ 2016-11-11  2:12 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: ferruh.yigit, david.marchand, thomas.monjalon


On 22/09/2016 13:44, Jianfeng Tan wrote:
[..]
>
> Suggested-by: Yigit, Ferruh <ferruh.yigit@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Acked-by: Remy Horton <remy.horton@intel.com>


> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -57,3 +57,8 @@ Deprecation Notices
>  * API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
>    structures. The member ``file_name`` data type will be changed from
>    ``char *`` to ``const char *``. This change targets release 16.11.

As an aside I don't think changing a structure entry to const will 
affect its binary layout, so this ought not be an ABI break..

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance
    2016-10-24  9:23  4% ` Wu, Jingjing
  2016-10-25  2:53  4% ` Xing, Beilei
@ 2016-11-11  1:26  4% ` Zhang, Helin
  2016-11-13 13:57  4%   ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Zhang, Helin @ 2016-11-11  1:26 UTC (permalink / raw)
  To: Yang, Qiming, dev; +Cc: Yang, Qiming



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qiming Yang
> Sent: Sunday, October 9, 2016 11:17 AM
> To: dev@dpdk.org
> Cc: Yang, Qiming
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app
> enhance
> 
> This patch adds a notice that the ABI change for ethtool app to get the NIC
> firmware version in the 17.02 release.
> 
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>
Acked-by: Helin Zhang <helin.zhang@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 845d2aa..60bd7ed 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -62,3 +62,7 @@ Deprecation Notices
>  * API will change for ``rte_port_source_params`` and
> ``rte_port_sink_params``
>    structures. The member ``file_name`` data type will be changed from
>    ``char *`` to ``const char *``. This change targets release 16.11.
> +
> +* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
> +  will be extended with a new member ``fw_version`` in order to store
> +  the NIC firmware version.
> --
> 2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
  2016-11-10 10:26  4% ` Kulasek, TomaszX
@ 2016-11-10 23:33  4%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-10 23:33 UTC (permalink / raw)
  To: Kulasek, TomaszX; +Cc: dev

> > The changes for the feature "Tx prepare" should be made in version 17.02.
> > 
> > Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> 
> Acked-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal
  2016-11-10 11:17  9% [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal Shreyansh Jain
@ 2016-11-10 15:51  4% ` David Marchand
  2016-11-11 13:05  4%   ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2016-11-10 15:51 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: Neil Horman, dev, Thomas Monjalon

On Thu, Nov 10, 2016 at 12:17 PM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 1a9e1ae..2af2476 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -35,3 +35,13 @@ Deprecation Notices
>  * mempool: The functions for single/multi producer/consumer are deprecated
>    and will be removed in 17.02.
>    It is replaced by ``rte_mempool_generic_get/put`` functions.
> +
> +* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
> +  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
> +  also impact the way devices are identified by EAL. A bus-device-driver model
> +  will be introduced providing a hierarchical view of devices.
> +
> +* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
> +  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
> +  provide a way to handle device initialization currently being done in
> +  ``eth_driver``.
> --
> 2.7.4
>

Acked-by: David Marchand <david.marchand@6wind.com>


-- 
David Marchand

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
  2016-11-09 22:31 21% [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare Thomas Monjalon
  2016-11-10 10:16  4% ` Mcnamara, John
  2016-11-10 10:26  4% ` Kulasek, TomaszX
@ 2016-11-10 11:15  4% ` Ananyev, Konstantin
  2 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2016-11-10 11:15 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, November 9, 2016 10:31 PM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
> 
> The changes for the feature "Tx prepare" should be made in version 17.02.
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 1a9e1ae..ab6014d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,8 +8,8 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
> 
> -* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> -  extended with new function pointer ``tx_pkt_prep`` allowing verification
> +* In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prepare`` allowing verification
>    and processing of packet burst to meet HW specific requirements before
>    transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
>    ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.7.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal
@ 2016-11-10 11:17  9% Shreyansh Jain
  2016-11-10 15:51  4% ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-11-10 11:17 UTC (permalink / raw)
  To: nhorman; +Cc: dev, thomas.monjalon, Shreyansh Jain

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1a9e1ae..2af2476 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -35,3 +35,13 @@ Deprecation Notices
 * mempool: The functions for single/multi producer/consumer are deprecated
   and will be removed in 17.02.
   It is replaced by ``rte_mempool_generic_get/put`` functions.
+
+* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
+  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
+  also impact the way devices are identified by EAL. A bus-device-driver model
+  will be introduced providing a hierarchical view of devices.
+
+* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
+  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
+  provide a way to handle device initialization currently being done in
+  ``eth_driver``.
-- 
2.7.4

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] Clarification for eth_driver changes
  2016-11-10 10:51  3%         ` Stephen Hemminger
  2016-11-10 11:07  0%           ` Thomas Monjalon
@ 2016-11-10 11:09  0%           ` Shreyansh Jain
  1 sibling, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-11-10 11:09 UTC (permalink / raw)
  To: Stephen Hemminger, Jianbo Liu; +Cc: David Marchand, dev, Thomas Monjalon

On Thursday 10 November 2016 04:21 PM, Stephen Hemminger wrote:
> I also think drv_flags should part of device not PCI. Most of the flags
> there like link state support are generic. If it isn't changed for this
> release will probably have to break ABI to fully support VMBUS
>

I didn't get your point.
Currently drv_flags is in rte_pci_driver.
You intend to say that it should be moved to rte_device?

And, all the changes being discussed here are for 17.02.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Clarification for eth_driver changes
  2016-11-10 10:51  3%         ` Stephen Hemminger
@ 2016-11-10 11:07  0%           ` Thomas Monjalon
  2016-11-10 11:09  0%           ` Shreyansh Jain
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-10 11:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jianbo Liu, David Marchand, Shreyansh Jain, dev

Hi Stephen,

2016-11-10 02:51, Stephen Hemminger:
> I also think drv_flags should part of device not PCI. Most of the flags
> there like link state support are generic. If it isn't changed for this
> release will probably have to break ABI to fully support VMBUS

When do you plan to send VMBUS patches?
Could you send a deprecation notice for this change?
Are you aware of the work started by Shreyansh to have a generic bus model?
Could you help in 17.02 timeframe to have a solid bus model?

Thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Clarification for eth_driver changes
  @ 2016-11-10 10:51  3%         ` Stephen Hemminger
  2016-11-10 11:07  0%           ` Thomas Monjalon
  2016-11-10 11:09  0%           ` Shreyansh Jain
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2016-11-10 10:51 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: David Marchand, Shreyansh Jain, dev, Thomas Monjalon

I also think drv_flags should part of device not PCI. Most of the flags
there like link state support are generic. If it isn't changed for this
release will probably have to break ABI to fully support VMBUS

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether
  2016-11-04 13:39  4% ` Mcnamara, John
@ 2016-11-10 10:36  4%   ` Ferruh Yigit
  2016-11-13 13:46  4%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2016-11-10 10:36 UTC (permalink / raw)
  To: Mcnamara, John, Iremonger, Bernard, dev

On 11/4/2016 1:39 PM, Mcnamara, John wrote:
> 
> 
>> -----Original Message-----
>> From: Iremonger, Bernard
>> Sent: Tuesday, October 18, 2016 2:38 PM
>> To: dev@dpdk.org; Mcnamara, John <john.mcnamara@intel.com>
>> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>
>> Subject: [PATCH v1] doc: announce API and ABI change for librte_ether
>>
>> In 17.02 five rte_eth_dev_set_vf_*** functions will be removed from
>> librte_ether, renamed and moved to the ixgbe PMD.
>>
>> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> 
> Acked-by: John McNamara <john.mcnamara@intel.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether
  2016-10-18 13:38  4% [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether Bernard Iremonger
  2016-11-04 13:39  4% ` Mcnamara, John
@ 2016-11-10 10:26  4% ` Pattan, Reshma
  1 sibling, 0 replies; 200+ results
From: Pattan, Reshma @ 2016-11-10 10:26 UTC (permalink / raw)
  To: Iremonger, Bernard; +Cc: Iremonger, Bernard, Mcnamara, John, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bernard Iremonger
> Sent: Tuesday, October 18, 2016 2:38 PM
> To: dev@dpdk.org; Mcnamara, John <john.mcnamara@intel.com>
> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>
> Subject: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for
> librte_ether
> 
> In 17.02 five rte_eth_dev_set_vf_*** functions will be removed from
> librte_ether, renamed and moved to the ixgbe PMD.
> 
> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 36
> ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 1d274d8..20e11ac 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -53,3 +53,39 @@ Deprecation Notices
>  * librte_ether: an API change is planned for 17.02 for the function
>    ``_rte_eth_dev_callback_process``. In 17.02 the function will return an
> ``int``
>    instead of ``void`` and a fourth parameter ``void *ret_param`` will be
> added.
> +
> +* librte_ether: for 17.02 it is planned to deprecate the following five
> functions:
> +
> +  ``rte_eth_dev_set_vf_rxmode``
> +
> +  ``rte_eth_dev_set_vf_rx``
> +
> +  ``rte_eth_dev_set_vf_tx``
> +
> +  ``rte_eth_dev_set_vf_vlan_filter``
> +
> +  ``rte_eth_set_vf_rate_limit``
> +
> +  The following fields will be removed from ``struct eth_dev_ops``:
> +
> +  ``eth_set_vf_rx_mode_t``
> +
> +  ``eth_set_vf_rx_t``
> +
> +  ``eth_set_vf_tx_t``
> +
> +  ``eth_set_vf_vlan_filter_t``
> +
> +  ``eth_set_vf_rate_limit_t``
> +
> +  The functions will be renamed to the following, and moved to the ``ixgbe``
> PMD.
> +
> +  ``rte_pmd_ixgbe_set_vf_rxmode``
> +
> +  ``rte_pmd_ixgbe_set_vf_rx``
> +
> +  ``rte_pmd_ixgbe_set_vf_tx``
> +
> +  ``rte_pmd_ixgbe_set_vf_vlan_filter``
> +
> +  ``rte_pmd_ixgbe_set_vf_rate_limit``
> --
> 2.10.1

Acked-by: Reshma Pattan <reshma.pattan@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
  2016-11-09 22:31 21% [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare Thomas Monjalon
  2016-11-10 10:16  4% ` Mcnamara, John
@ 2016-11-10 10:26  4% ` Kulasek, TomaszX
  2016-11-10 23:33  4%   ` Thomas Monjalon
  2016-11-10 11:15  4% ` Ananyev, Konstantin
  2 siblings, 1 reply; 200+ results
From: Kulasek, TomaszX @ 2016-11-10 10:26 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, November 9, 2016 23:31
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: [PATCH] doc: postpone ABI changes for Tx prepare
> 
> The changes for the feature "Tx prepare" should be made in version 17.02.
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 1a9e1ae..ab6014d 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,8 +8,8 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
> 
> -* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
> -  extended with new function pointer ``tx_pkt_prep`` allowing
> verification
> +* In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
> +  extended with new function pointer ``tx_pkt_prepare`` allowing
> verification
>    and processing of packet burst to meet HW specific requirements before
>    transmit. Also new fields will be added to the ``rte_eth_desc_lim``
> structure:
>    ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about
> number of
> --
> 2.7.0

Acked-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
  2016-11-09 22:31 21% [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare Thomas Monjalon
@ 2016-11-10 10:16  4% ` Mcnamara, John
  2016-11-10 10:26  4% ` Kulasek, TomaszX
  2016-11-10 11:15  4% ` Ananyev, Konstantin
  2 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2016-11-10 10:16 UTC (permalink / raw)
  To: Thomas Monjalon, Kulasek, TomaszX; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, November 9, 2016 10:31 PM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
> 
> The changes for the feature "Tx prepare" should be made in version 17.02.
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare
@ 2016-11-09 22:31 21% Thomas Monjalon
  2016-11-10 10:16  4% ` Mcnamara, John
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Thomas Monjalon @ 2016-11-09 22:31 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

The changes for the feature "Tx prepare" should be made in version 17.02.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1a9e1ae..ab6014d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,8 +8,8 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
-  extended with new function pointer ``tx_pkt_prep`` allowing verification
+* In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
+  extended with new function pointer ``tx_pkt_prepare`` allowing verification
   and processing of packet burst to meet HW specific requirements before
   transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
-- 
2.7.0

^ permalink raw reply	[relevance 21%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes for mbuf
  2016-11-09 16:12 15% [dpdk-dev] [PATCH] doc: postpone ABI changes for mbuf Olivier Matz
@ 2016-11-09 22:16  4% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-09 22:16 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, john.mcnamara

2016-11-09 17:12, Olivier Matz:
> Mbuf modifications are not ready for 16.11, postpone them to 17.02.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Applied, thanks

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: postpone ABI changes for mbuf
@ 2016-11-09 16:12 15% Olivier Matz
  2016-11-09 22:16  4% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2016-11-09 16:12 UTC (permalink / raw)
  To: dev, john.mcnamara

Mbuf modifications are not ready for 16.11, postpone them to 17.02.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9f5fa55..1a9e1ae 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -15,16 +15,17 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
+* ABI changes are planned for 17.02 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
   store address is not naturally aligned. Other mbuf fields, such as the
-  ``port`` field, may be moved or removed as part of this mbuf work.
+  ``port`` field, may be moved or removed as part of this mbuf work. A
+  ``timestamp`` will also be added.
 
 * The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
   are respectively replaced by PKT_RX_VLAN_STRIPPED and
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
-  their behavior will be kept in 16.07 and will be removed in 16.11.
+  their behavior will be kept until 16.11 and will be removed in 17.02.
 
 * mempool: The functions ``rte_mempool_count`` and ``rte_mempool_free_count``
   will be removed in 17.02.
-- 
2.8.1

^ permalink raw reply	[relevance 15%]

* [dpdk-dev] [PATCH] net: introduce big and little endian types
@ 2016-11-09 15:04  2% Nelio Laranjeiro
  2016-12-05 10:09  0% ` Ananyev, Konstantin
  2016-12-08  9:30  3% ` Nélio Laranjeiro
  0 siblings, 2 replies; 200+ results
From: Nelio Laranjeiro @ 2016-11-09 15:04 UTC (permalink / raw)
  To: dev, Olivier Matz; +Cc: wenzhuo.lu, Adrien Mazarguil

This commit introduces new rte_{le,be}{16,32,64}_t types and updates
rte_{le,be,cpu}_to_{le,be,cpu}_*() and network header structures
accordingly.

Specific big/little endian types avoid uncertainty and conversion mistakes.

No ABI change since these are simply typedefs to the original types.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 .../common/include/generic/rte_byteorder.h         | 31 +++++++++++-------
 lib/librte_net/rte_arp.h                           | 15 +++++----
 lib/librte_net/rte_ether.h                         | 10 +++---
 lib/librte_net/rte_gre.h                           | 30 ++++++++---------
 lib/librte_net/rte_icmp.h                          | 11 ++++---
 lib/librte_net/rte_ip.h                            | 38 +++++++++++-----------
 lib/librte_net/rte_net.c                           | 10 +++---
 lib/librte_net/rte_sctp.h                          |  9 ++---
 lib/librte_net/rte_tcp.h                           | 19 ++++++-----
 lib/librte_net/rte_udp.h                           |  9 ++---
 10 files changed, 97 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_byteorder.h b/lib/librte_eal/common/include/generic/rte_byteorder.h
index e00bccb..059c2a5 100644
--- a/lib/librte_eal/common/include/generic/rte_byteorder.h
+++ b/lib/librte_eal/common/include/generic/rte_byteorder.h
@@ -75,6 +75,13 @@
 #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
 #endif
 
+typedef uint16_t rte_be16_t;
+typedef uint32_t rte_be32_t;
+typedef uint64_t rte_be64_t;
+typedef uint16_t rte_le16_t;
+typedef uint32_t rte_le32_t;
+typedef uint64_t rte_le64_t;
+
 /*
  * An internal function to swap bytes in a 16-bit value.
  *
@@ -143,65 +150,65 @@ static uint64_t rte_bswap64(uint64_t x);
 /**
  * Convert a 16-bit value from CPU order to little endian.
  */
-static uint16_t rte_cpu_to_le_16(uint16_t x);
+static rte_le16_t rte_cpu_to_le_16(uint16_t x);
 
 /**
  * Convert a 32-bit value from CPU order to little endian.
  */
-static uint32_t rte_cpu_to_le_32(uint32_t x);
+static rte_le32_t rte_cpu_to_le_32(uint32_t x);
 
 /**
  * Convert a 64-bit value from CPU order to little endian.
  */
-static uint64_t rte_cpu_to_le_64(uint64_t x);
+static rte_le64_t rte_cpu_to_le_64(uint64_t x);
 
 
 /**
  * Convert a 16-bit value from CPU order to big endian.
  */
-static uint16_t rte_cpu_to_be_16(uint16_t x);
+static rte_be16_t rte_cpu_to_be_16(uint16_t x);
 
 /**
  * Convert a 32-bit value from CPU order to big endian.
  */
-static uint32_t rte_cpu_to_be_32(uint32_t x);
+static rte_be32_t rte_cpu_to_be_32(uint32_t x);
 
 /**
  * Convert a 64-bit value from CPU order to big endian.
  */
-static uint64_t rte_cpu_to_be_64(uint64_t x);
+static rte_be64_t rte_cpu_to_be_64(uint64_t x);
 
 
 /**
  * Convert a 16-bit value from little endian to CPU order.
  */
-static uint16_t rte_le_to_cpu_16(uint16_t x);
+static uint16_t rte_le_to_cpu_16(rte_le16_t x);
 
 /**
  * Convert a 32-bit value from little endian to CPU order.
  */
-static uint32_t rte_le_to_cpu_32(uint32_t x);
+static uint32_t rte_le_to_cpu_32(rte_le32_t x);
 
 /**
  * Convert a 64-bit value from little endian to CPU order.
  */
-static uint64_t rte_le_to_cpu_64(uint64_t x);
+static uint64_t rte_le_to_cpu_64(rte_le64_t x);
 
 
 /**
  * Convert a 16-bit value from big endian to CPU order.
  */
-static uint16_t rte_be_to_cpu_16(uint16_t x);
+static uint16_t rte_be_to_cpu_16(rte_be16_t x);
 
 /**
  * Convert a 32-bit value from big endian to CPU order.
  */
-static uint32_t rte_be_to_cpu_32(uint32_t x);
+static uint32_t rte_be_to_cpu_32(rte_be32_t x);
 
 /**
  * Convert a 64-bit value from big endian to CPU order.
  */
-static uint64_t rte_be_to_cpu_64(uint64_t x);
+static uint64_t rte_be_to_cpu_64(rte_be64_t x);
 
 #endif /* __DOXYGEN__ */
 
diff --git a/lib/librte_net/rte_arp.h b/lib/librte_net/rte_arp.h
index 1836418..95f123e 100644
--- a/lib/librte_net/rte_arp.h
+++ b/lib/librte_net/rte_arp.h
@@ -40,6 +40,7 @@
 
 #include <stdint.h>
 #include <rte_ether.h>
+#include <rte_byteorder.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -50,22 +51,22 @@ extern "C" {
  */
 struct arp_ipv4 {
 	struct ether_addr arp_sha;  /**< sender hardware address */
-	uint32_t          arp_sip;  /**< sender IP address */
+	rte_be32_t        arp_sip;  /**< sender IP address */
 	struct ether_addr arp_tha;  /**< target hardware address */
-	uint32_t          arp_tip;  /**< target IP address */
+	rte_be32_t        arp_tip;  /**< target IP address */
 } __attribute__((__packed__));
 
 /**
  * ARP header.
  */
 struct arp_hdr {
-	uint16_t arp_hrd;    /* format of hardware address */
+	rte_be16_t arp_hrd;  /* format of hardware address */
 #define ARP_HRD_ETHER     1  /* ARP Ethernet address format */
 
-	uint16_t arp_pro;    /* format of protocol address */
-	uint8_t  arp_hln;    /* length of hardware address */
-	uint8_t  arp_pln;    /* length of protocol address */
-	uint16_t arp_op;     /* ARP opcode (command) */
+	rte_be16_t arp_pro;  /* format of protocol address */
+	uint8_t    arp_hln;  /* length of hardware address */
+	uint8_t    arp_pln;  /* length of protocol address */
+	rte_be16_t arp_op;   /* ARP opcode (command) */
 #define	ARP_OP_REQUEST    1 /* request to resolve address */
 #define	ARP_OP_REPLY      2 /* response to previous request */
 #define	ARP_OP_REVREQUEST 3 /* request proto addr given hardware */
diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
index ff3d065..159e061 100644
--- a/lib/librte_net/rte_ether.h
+++ b/lib/librte_net/rte_ether.h
@@ -300,7 +300,7 @@ ether_format_addr(char *buf, uint16_t size,
 struct ether_hdr {
 	struct ether_addr d_addr; /**< Destination address. */
 	struct ether_addr s_addr; /**< Source address. */
-	uint16_t ether_type;      /**< Frame type. */
+	rte_be16_t ether_type;    /**< Frame type. */
 } __attribute__((__packed__));
 
 /**
@@ -309,8 +309,8 @@ struct ether_hdr {
  * of the encapsulated frame.
  */
 struct vlan_hdr {
-	uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
-	uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
+	rte_be16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
+	rte_be16_t eth_proto;/**< Ethernet type of encapsulated frame. */
 } __attribute__((__packed__));
 
 /**
@@ -319,8 +319,8 @@ struct vlan_hdr {
  * Reserved fields (24 bits and 8 bits)
  */
 struct vxlan_hdr {
-	uint32_t vx_flags; /**< flag (8) + Reserved (24). */
-	uint32_t vx_vni;   /**< VNI (24) + Reserved (8). */
+	rte_be32_t vx_flags; /**< flag (8) + Reserved (24). */
+	rte_be32_t vx_vni;   /**< VNI (24) + Reserved (8). */
 } __attribute__((__packed__));
 
 /* Ethernet frame types */
diff --git a/lib/librte_net/rte_gre.h b/lib/librte_net/rte_gre.h
index 46568ff..b651af0 100644
--- a/lib/librte_net/rte_gre.h
+++ b/lib/librte_net/rte_gre.h
@@ -45,23 +45,23 @@ extern "C" {
  */
 struct gre_hdr {
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	uint16_t res2:4; /**< Reserved */
-	uint16_t s:1;    /**< Sequence Number Present bit */
-	uint16_t k:1;    /**< Key Present bit */
-	uint16_t res1:1; /**< Reserved */
-	uint16_t c:1;    /**< Checksum Present bit */
-	uint16_t ver:3;  /**< Version Number */
-	uint16_t res3:5; /**< Reserved */
+	uint16_t res2:4;  /**< Reserved */
+	uint16_t s:1;     /**< Sequence Number Present bit */
+	uint16_t k:1;     /**< Key Present bit */
+	uint16_t res1:1;  /**< Reserved */
+	uint16_t c:1;     /**< Checksum Present bit */
+	uint16_t ver:3;   /**< Version Number */
+	uint16_t res3:5;  /**< Reserved */
 #elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-	uint16_t c:1;    /**< Checksum Present bit */
-	uint16_t res1:1; /**< Reserved */
-	uint16_t k:1;    /**< Key Present bit */
-	uint16_t s:1;    /**< Sequence Number Present bit */
-	uint16_t res2:4; /**< Reserved */
-	uint16_t res3:5; /**< Reserved */
-	uint16_t ver:3;  /**< Version Number */
+	uint16_t c:1;     /**< Checksum Present bit */
+	uint16_t res1:1;  /**< Reserved */
+	uint16_t k:1;     /**< Key Present bit */
+	uint16_t s:1;     /**< Sequence Number Present bit */
+	uint16_t res2:4;  /**< Reserved */
+	uint16_t res3:5;  /**< Reserved */
+	uint16_t ver:3;   /**< Version Number */
 #endif
-	uint16_t proto;  /**< Protocol Type */
+	rte_be16_t proto; /**< Protocol Type */
 } __attribute__((__packed__));
 
 #ifdef __cplusplus
diff --git a/lib/librte_net/rte_icmp.h b/lib/librte_net/rte_icmp.h
index 8b287f6..81bd907 100644
--- a/lib/librte_net/rte_icmp.h
+++ b/lib/librte_net/rte_icmp.h
@@ -74,6 +74,7 @@
  */
 
 #include <stdint.h>
+#include <rte_byteorder.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -83,11 +84,11 @@ extern "C" {
  * ICMP Header
  */
 struct icmp_hdr {
-	uint8_t  icmp_type;   /* ICMP packet type. */
-	uint8_t  icmp_code;   /* ICMP packet code. */
-	uint16_t icmp_cksum;  /* ICMP packet checksum. */
-	uint16_t icmp_ident;  /* ICMP packet identifier. */
-	uint16_t icmp_seq_nb; /* ICMP packet sequence number. */
+	uint8_t  icmp_type;     /* ICMP packet type. */
+	uint8_t  icmp_code;     /* ICMP packet code. */
+	rte_be16_t icmp_cksum;  /* ICMP packet checksum. */
+	rte_be16_t icmp_ident;  /* ICMP packet identifier. */
+	rte_be16_t icmp_seq_nb; /* ICMP packet sequence number. */
 } __attribute__((__packed__));
 
 /* ICMP packet types */
diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
index 4491b86..6f7da36 100644
--- a/lib/librte_net/rte_ip.h
+++ b/lib/librte_net/rte_ip.h
@@ -93,14 +93,14 @@ extern "C" {
 struct ipv4_hdr {
 	uint8_t  version_ihl;		/**< version and header length */
 	uint8_t  type_of_service;	/**< type of service */
-	uint16_t total_length;		/**< length of packet */
-	uint16_t packet_id;		/**< packet ID */
-	uint16_t fragment_offset;	/**< fragmentation offset */
+	rte_be16_t total_length;	/**< length of packet */
+	rte_be16_t packet_id;		/**< packet ID */
+	rte_be16_t fragment_offset;	/**< fragmentation offset */
 	uint8_t  time_to_live;		/**< time to live */
 	uint8_t  next_proto_id;		/**< protocol ID */
-	uint16_t hdr_checksum;		/**< header checksum */
-	uint32_t src_addr;		/**< source address */
-	uint32_t dst_addr;		/**< destination address */
+	rte_be16_t hdr_checksum;	/**< header checksum */
+	rte_be32_t src_addr;		/**< source address */
+	rte_be32_t dst_addr;		/**< destination address */
 } __attribute__((__packed__));
 
 /** Create IPv4 address */
@@ -340,11 +340,11 @@ static inline uint16_t
 rte_ipv4_phdr_cksum(const struct ipv4_hdr *ipv4_hdr, uint64_t ol_flags)
 {
 	struct ipv4_psd_header {
-		uint32_t src_addr; /* IP address of source host. */
-		uint32_t dst_addr; /* IP address of destination host. */
-		uint8_t  zero;     /* zero. */
-		uint8_t  proto;    /* L4 protocol type. */
-		uint16_t len;      /* L4 length. */
+		rte_be32_t src_addr; /* IP address of source host. */
+		rte_be32_t dst_addr; /* IP address of destination host. */
+		uint8_t    zero;     /* zero. */
+		uint8_t    proto;    /* L4 protocol type. */
+		rte_be16_t len;      /* L4 length. */
 	} psd_hdr;
 
 	psd_hdr.src_addr = ipv4_hdr->src_addr;
@@ -398,12 +398,12 @@ rte_ipv4_udptcp_cksum(const struct ipv4_hdr *ipv4_hdr, const void *l4_hdr)
  * IPv6 Header
  */
 struct ipv6_hdr {
-	uint32_t vtc_flow;     /**< IP version, traffic class & flow label. */
-	uint16_t payload_len;  /**< IP packet length - includes sizeof(ip_header). */
-	uint8_t  proto;        /**< Protocol, next header. */
-	uint8_t  hop_limits;   /**< Hop limits. */
-	uint8_t  src_addr[16]; /**< IP address of source host. */
-	uint8_t  dst_addr[16]; /**< IP address of destination host(s). */
+	rte_be32_t vtc_flow;     /**< IP version, traffic class & flow label. */
+	rte_be16_t payload_len;  /**< IP packet length - includes sizeof(ip_header). */
+	uint8_t    proto;        /**< Protocol, next header. */
+	uint8_t    hop_limits;   /**< Hop limits. */
+	uint8_t    src_addr[16]; /**< IP address of source host. */
+	uint8_t    dst_addr[16]; /**< IP address of destination host(s). */
 } __attribute__((__packed__));
 
 /**
@@ -427,8 +427,8 @@ rte_ipv6_phdr_cksum(const struct ipv6_hdr *ipv6_hdr, uint64_t ol_flags)
 {
 	uint32_t sum;
 	struct {
-		uint32_t len;   /* L4 length. */
-		uint32_t proto; /* L4 protocol - top 3 bytes must be zero */
+		rte_be32_t len;   /* L4 length. */
+		rte_be32_t proto; /* L4 protocol - top 3 bytes must be zero */
 	} psd_hdr;
 
 	psd_hdr.proto = (ipv6_hdr->proto << 24);
diff --git a/lib/librte_net/rte_net.c b/lib/librte_net/rte_net.c
index a8c7aff..9014ca5 100644
--- a/lib/librte_net/rte_net.c
+++ b/lib/librte_net/rte_net.c
@@ -153,8 +153,8 @@ ptype_inner_l4(uint8_t proto)
 
 /* get the tunnel packet type if any, update proto and off. */
 static uint32_t
-ptype_tunnel(uint16_t *proto, const struct rte_mbuf *m,
-	uint32_t *off)
+ptype_tunnel(rte_be16_t *proto, const struct rte_mbuf *m,
+	     uint32_t *off)
 {
 	switch (*proto) {
 	case IPPROTO_GRE: {
@@ -208,8 +208,8 @@ ip4_hlen(const struct ipv4_hdr *hdr)
 
 /* parse ipv6 extended headers, update offset and return next proto */
 static uint16_t
-skip_ip6_ext(uint16_t proto, const struct rte_mbuf *m, uint32_t *off,
-	int *frag)
+skip_ip6_ext(rte_be16_t proto, const struct rte_mbuf *m, uint32_t *off,
+	     int *frag)
 {
 	struct ext_hdr {
 		uint8_t next_hdr;
@@ -261,7 +261,7 @@ uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
 	struct ether_hdr eh_copy;
 	uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
 	uint32_t off = 0;
-	uint16_t proto;
+	rte_be16_t proto;
 
 	if (hdr_lens == NULL)
 		hdr_lens = &local_hdr_lens;
diff --git a/lib/librte_net/rte_sctp.h b/lib/librte_net/rte_sctp.h
index 688e126..8c646c7 100644
--- a/lib/librte_net/rte_sctp.h
+++ b/lib/librte_net/rte_sctp.h
@@ -81,15 +81,16 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#include <rte_byteorder.h>
 
 /**
  * SCTP Header
  */
 struct sctp_hdr {
-	uint16_t src_port; /**< Source port. */
-	uint16_t dst_port; /**< Destin port. */
-	uint32_t tag;      /**< Validation tag. */
-	uint32_t cksum;    /**< Checksum. */
+	rte_be16_t src_port; /**< Source port. */
+	rte_be16_t dst_port; /**< Destin port. */
+	rte_be32_t tag;      /**< Validation tag. */
+	rte_le32_t cksum;    /**< Checksum. */
 } __attribute__((__packed__));
 
 #ifdef __cplusplus
diff --git a/lib/librte_net/rte_tcp.h b/lib/librte_net/rte_tcp.h
index 28b61e6..545d4ab 100644
--- a/lib/librte_net/rte_tcp.h
+++ b/lib/librte_net/rte_tcp.h
@@ -77,6 +77,7 @@
  */
 
 #include <stdint.h>
+#include <rte_byteorder.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -86,15 +87,15 @@ extern "C" {
  * TCP Header
  */
 struct tcp_hdr {
-	uint16_t src_port;  /**< TCP source port. */
-	uint16_t dst_port;  /**< TCP destination port. */
-	uint32_t sent_seq;  /**< TX data sequence number. */
-	uint32_t recv_ack;  /**< RX data acknowledgement sequence number. */
-	uint8_t  data_off;  /**< Data offset. */
-	uint8_t  tcp_flags; /**< TCP flags */
-	uint16_t rx_win;    /**< RX flow control window. */
-	uint16_t cksum;     /**< TCP checksum. */
-	uint16_t tcp_urp;   /**< TCP urgent pointer, if any. */
+	rte_be16_t src_port;  /**< TCP source port. */
+	rte_be16_t dst_port;  /**< TCP destination port. */
+	rte_be32_t sent_seq;  /**< TX data sequence number. */
+	rte_be32_t recv_ack;  /**< RX data acknowledgement sequence number. */
+	uint8_t    data_off;  /**< Data offset. */
+	uint8_t    tcp_flags; /**< TCP flags */
+	rte_be16_t rx_win;    /**< RX flow control window. */
+	rte_be16_t cksum;     /**< TCP checksum. */
+	rte_be16_t tcp_urp;   /**< TCP urgent pointer, if any. */
 } __attribute__((__packed__));
 
 #ifdef __cplusplus
diff --git a/lib/librte_net/rte_udp.h b/lib/librte_net/rte_udp.h
index bc5be4a..89fdded 100644
--- a/lib/librte_net/rte_udp.h
+++ b/lib/librte_net/rte_udp.h
@@ -77,6 +77,7 @@
  */
 
 #include <stdint.h>
+#include <rte_byteorder.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -86,10 +87,10 @@ extern "C" {
  * UDP Header
  */
 struct udp_hdr {
-	uint16_t src_port;    /**< UDP source port. */
-	uint16_t dst_port;    /**< UDP destination port. */
-	uint16_t dgram_len;   /**< UDP datagram length */
-	uint16_t dgram_cksum; /**< UDP datagram checksum */
+	rte_be16_t src_port;    /**< UDP source port. */
+	rte_be16_t dst_port;    /**< UDP destination port. */
+	rte_be16_t dgram_len;   /**< UDP datagram length */
+	rte_be16_t dgram_cksum; /**< UDP datagram checksum */
 } __attribute__((__packed__));
 
 #ifdef __cplusplus
-- 
2.1.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
  2016-11-02  8:28  3%   ` Dai, Wei
  2016-11-02  9:07  3%     ` Mcnamara, John
  2016-11-02  9:21  0%     ` Morten Brørup
@ 2016-11-08 13:33  0%     ` Tahhan, Maryam
  2 siblings, 0 replies; 200+ results
From: Tahhan, Maryam @ 2016-11-08 13:33 UTC (permalink / raw)
  To: Dai, Wei, Thomas Monjalon, Mcnamara, John, Ananyev, Konstantin,
	Wu, Jingjing, Zhang, Helin, Dai, Wei, Curran, Greg
  Cc: dev

> 
> Hi, John & Greg
> 
> Would you please give any opinion for this patch ?
> 
> I have looked through all PMDs and found not all statistics items can be
> supported by some NIC.
> For example,  rx_nombuf,  q_ipackets,  q_opackets,  q_ibytes and q_obytes
> are not supported by i40e.

Queue stats should be supported by i40e as we have access to struct i40e_queue_stats  this is a gap. Same for e1000.
For me (from a stats perspective), we should be able to report everything that ethtool can report for the different kernel network drivers (as we have the same base driver code in DPDK). In other words, the DPDK stats API should provide the same set of stats as a standard networking interface would to an external monitoring tool in case we want to perform some sort of analytics on it afterwards.
At a very minimum the top level stats should include: ipackets, opackets, ibytes,  obytes,  imissed,  ierrors,  oerrors. The queue stats in theory could be migrated to the xstats, it would require a lot of clean up in existing drivers which is why we didn't remove them when we did the original cleanup of the struct for the xstats API.


> But when the function rte_eth_stats_get(uint8_t port_id, struct
> rte_eth_stats *stats) is called for i40e PMD, Above un-supported statistics
> item in output stats are zero, this is not real value.

Agreed - should not output 0 for these. But should ensure where stats are possible to obtain, we support them in DPDK.

> So far, there is no way to know whether an item in struct rte_eth_stats is
> supported or not only from this structure definition.
> Maybe some structure member can be added to indicate each of statistics
> item valid or not.
> But this means ABI change.

Migrating the queue/nonstandard stats to the xstats API would fix this, the only issue is with the existing drivers that are unsupported fields with 0.

> 
> In following list, I list statistics support details of all PMDs.
> Hope it can be displayed in your screen.
> 
Thanks for this, it's very helpful. I'm currently collating a list of the missing stats for e1000, ixgbe and i40e from DPDK. So this is very helpful.

> Thanks
> /Wei
> 
> NIC           ipackets   opackets  ibytes  obytes  imissed  ierrors  oerrors
> rx_nombuf  q_ipackets   q_opacktes     q_ibytes    q_obytes  q_errors
> af_packet      y          y         y       y       n        n        y        n          y            y            y         y
> y
> bnx2x         y          y         y       y       y        y        y        y          n            n            n         n
> n
> bnxt          y          y         y       y       y        y        y        n          y            y            y         y
> y
> bonding       y          y         y       y       y        y        y        y          y            y            y         y
> y
> cxgbe         y          y         y       y       y        y        y        n          y            y            y         y
> y
> e1000(igb)     y          y         y       y       y        y        y        n          n            n            n         n
> n
> e1000(igbvf)   y          y         y       y       n        n        n        n          n            n            n
> n         n
> ena           y          y         y       y       y        y        y        y          n            n            n         n
> n
> enic          y          y         y       y       y        y        y        y          n            n            n         n
> n
> fm10k         y          y         y       y       n        n        n        n          y            y            y         y
> n
> i40e          y          y         y       y       y        y        y        n          n            n            n         n
> n
> i40evf        y          y         y       y       n        y        y        n          n            n            n         n
> n
> ixgbe         y          y         y       y       y        y        y        n          y            y            y         y
> y
> ixgbevf       y          y         y       y       n        n        n        n          n            n            n         n
> n
> mlx4          y          y         y       y       n        y        y        y          y            y            y         y
> y
> mlx5          y          y         y       y       n        y        y        y          y            y            y         y
> y
> mpipe         y          y         y       y       n        y        y        y          y            y            y         y
> y
> nfp           y          y         y       y       y        y        y        y          y            y            y         y
> n
> null          y          y         n       n       n        n        y        n          y            y            n         n
> y
> pcap          y          y         y       y       n        n        y        n          y            y            y         y
> y
> qede          y          y         y       y       y        y        y        y          n            n            n         n
> n
> ring          y          y         n       n       n        n        y        n          y            y            n         n
> y
> szedata2      y          y         y       y       n        n        y        n          y            y            y         y
> y
> thunderx      y          y         y       y       y        y        y        n          y            y            y         y
> n
> vhost         y          y         y       y       n        n        y        n          y            y            y         y
> n
> virtio         y          y         y       y       n        y        y        y          y            y            y         y
> n
> vmxnet3      y          y         y       y       n        y        y        y          y            y            y         y
> y
> xenvirt       y          y         n       n       n        n        n        n          n            n            n         n
> n
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Tuesday, October 4, 2016 5:35 PM
> > To: Dai, Wei <wei.dai@intel.com>; Mcnamara, John
> > <john.mcnamara@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
> >
> > 2016-08-26 18:08, Wei Dai:
> > >  /**
> > >   * A structure used to retrieve statistics for an Ethernet port.
> > > + * Not all statistics fields in struct rte_eth_stats are supported
> > > + * by any type of network interface card (NIC). If any statistics
> > > + * field is not supported, its value is 0 .
> > >   */
> > >  struct rte_eth_stats {
> >
> > I'm missing the point of this patch.
> > Why do you think it is a fix?
> >
> > John, any opinion?
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: rename library for consistency
  2016-11-06 18:21 13% [dpdk-dev] [PATCH] ethdev: rename library for consistency Thomas Monjalon
@ 2016-11-06 19:54  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-11-06 19:54 UTC (permalink / raw)
  To: dev

2016-11-06 19:21, Thomas Monjalon:
> The library was named libethdev without rte_ prefix.
> It is now fixed, the library namespace is consistent.
> 
> Note: the ABI version has already been changed in this release cycle.
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied for 16.11, as announced in the deprecation notice.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] ethdev: rename library for consistency
@ 2016-11-06 18:21 13% Thomas Monjalon
  2016-11-06 19:54  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-11-06 18:21 UTC (permalink / raw)
  To: dev

The library was named libethdev without rte_ prefix.
It is now fixed, the library namespace is consistent.

Note: the ABI version has already been changed in this release cycle.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst   | 3 ---
 doc/guides/rel_notes/release_16_11.rst | 2 +-
 lib/librte_ether/Makefile              | 2 +-
 mk/rte.app.mk                          | 2 +-
 mk/rte.lib.mk                          | 2 +-
 5 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 884a231..9f5fa55 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,9 +8,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* The ethdev library file will be renamed from libethdev.* to librte_ethdev.*
-  in release 16.11 in order to have a more consistent namespace.
-
 * In 16.11 ABI changes are planned: the ``rte_eth_dev`` structure will be
   extended with new function pointer ``tx_pkt_prep`` allowing verification
   and processing of packet burst to meet HW specific requirements before
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index db20567..aad21ba 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -258,13 +258,13 @@ The libraries prepended with a plus sign were incremented in this version.
 
 .. code-block:: diff
 
-   + libethdev.so.5
      librte_acl.so.2
      librte_cfgfile.so.2
      librte_cmdline.so.2
    + librte_cryptodev.so.2
      librte_distributor.so.1
    + librte_eal.so.3
+   + librte_ethdev.so.5
      librte_hash.so.2
      librte_ip_frag.so.1
      librte_jobstats.so.1
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index bc2e5f6..efe1e5f 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 # library name
 #
-LIB = libethdev.a
+LIB = librte_ethdev.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 51bc3b0..f75f0e2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -91,7 +91,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)          += -lrte_vhost
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KVARGS)         += -lrte_kvargs
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MBUF)           += -lrte_mbuf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NET)            += -lrte_net
-_LDLIBS-$(CONFIG_RTE_LIBRTE_ETHER)          += -lethdev
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ETHER)          += -lrte_ethdev
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CRYPTODEV)      += -lrte_cryptodev
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MEMPOOL)        += -lrte_mempool
 _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 7b96fd4..33a5f5a 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -81,7 +81,7 @@ endif
 # Ignore (sub)directory dependencies which do not provide an actual library
 _IGNORE_DIRS = lib/librte_eal/% lib/librte_compat
 _DEPDIRS = $(filter-out $(_IGNORE_DIRS),$(DEPDIRS-y))
-_LDDIRS = $(subst librte_ether,libethdev,$(_DEPDIRS))
+_LDDIRS = $(subst librte_ether,librte_ethdev,$(_DEPDIRS))
 LDLIBS += $(subst lib/lib,-l,$(_LDDIRS))
 
 O_TO_A = $(AR) crDs $(LIB) $(OBJS-y)
-- 
2.7.0

^ permalink raw reply	[relevance 13%]

* Re: [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether
  2016-10-18 13:38  4% [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether Bernard Iremonger
@ 2016-11-04 13:39  4% ` Mcnamara, John
  2016-11-10 10:36  4%   ` Ferruh Yigit
  2016-11-10 10:26  4% ` Pattan, Reshma
  1 sibling, 1 reply; 200+ results
From: Mcnamara, John @ 2016-11-04 13:39 UTC (permalink / raw)
  To: Iremonger, Bernard, dev



> -----Original Message-----
> From: Iremonger, Bernard
> Sent: Tuesday, October 18, 2016 2:38 PM
> To: dev@dpdk.org; Mcnamara, John <john.mcnamara@intel.com>
> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>
> Subject: [PATCH v1] doc: announce API and ABI change for librte_ether
> 
> In 17.02 five rte_eth_dev_set_vf_*** functions will be removed from
> librte_ether, renamed and moved to the ixgbe PMD.
> 
> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] dpdk16.11 RC2 package ipv4 reassembly example can't work
  2016-11-04  6:36  0%     ` Lu, Wenzhuo
@ 2016-11-04 10:20  0%       ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2016-11-04 10:20 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Ananyev, Konstantin, Liu, Yu Y, Chen, WeichunX, Xu, HuilongX, dev

On Fri, Nov 04, 2016 at 06:36:30AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Wednesday, November 2, 2016 11:21 PM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Liu, Yu Y; Chen, WeichunX; Xu, HuilongX;
> > dev@dpdk.org
> > Subject: Re: dpdk16.11 RC2 package ipv4 reassembly example can't work
> > 
> > Hi all,
> > 
> > On Wed, Nov 02, 2016 at 08:39:31AM +0000, Lu, Wenzhuo wrote:
> > > Correct the typo of receiver.
> > >
> > > Hi Adrien,
> > > The change from struct ip_frag_pkt pkt[0]  to struct ip_frag_pkt pkt[] will
> > make IP reassembly not working. I think this is not the root cause. Maybe
> > Konstantin can give us some idea.
> > > But I notice one thing, you change some from [0] to [], but others just add
> > '__extension__'. I believe if you add '__extension__' for struct ip_frag_pkt pkt[0],
> > we'll not hit this issue. Just curious why you use 2 ways to resolve the same
> > problem.
> > 
> > I've used the __extension__ method whenever the C99 syntax could not work
> > due to invalid usage in the code, e.g. a flexible array cannot be the only member
> > of a struct, you cannot make arrays out of structures that contain such fields,
> > while there is no such constraint with the GNU syntax.
> > 
> > For example see __extension__ uint8_t action_data[0] in struct
> > rte_pipeline_table_entry. The C99 could not be used because of
> > test_table_acl.c:
> > 
> >   struct rte_pipeline_table_entry entries[5];
> > 
> > If replacing ip_frag_pkt[] with __extension__ ip_frag_pkt pkt[0] in rte_ip_frag.h
> > solves the issue, either some code is breaking some constraint somewhere or
> > this change broke the ABI (unlikely considering a simple recompilation should
> > have taken care of the issue). I did not notice any change in sizeof(struct
> > rte_ip_frag_tbl) nor offsetof(struct rte_ip_frag_tbl, pkt) on my setup, perhaps
> > the compilation flags used in your test affect them somehow.
> Thanks for your explanation. I also checked sizeof(struct rte_ip_frag_tbl). I don't see any change either.
> 
> > 
> > Can you confirm whether only reverting this particular field solves the issue?
> Yes. ip_frag_pkt pkt[0] or even ip_frag_pkt pkt[1] can work but ip_frag_pkt pkt[] cannot :(
> Do you like the idea of changing the ip_frag_pkt[] to __extension__ ip_frag_pkt pkt[0]?

Yes, restoring the original code (with __extension__) as a workaround until
we understand what is going on is safer, that's fine by me. The commit log
should explicitly state that weirdness occurs for an unknown reason with the
C99 syntax though (compiler bug is also a possibility).

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] dpdk16.11 RC2 package ipv4 reassembly example can't work
  2016-11-02 15:21  3%   ` [dpdk-dev] dpdk16.11 RC2 package ipv4 reassembly example can't work Adrien Mazarguil
@ 2016-11-04  6:36  0%     ` Lu, Wenzhuo
  2016-11-04 10:20  0%       ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Lu, Wenzhuo @ 2016-11-04  6:36 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Ananyev, Konstantin, Liu, Yu Y, Chen, WeichunX, Xu, HuilongX, dev

Hi Adrien,

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Wednesday, November 2, 2016 11:21 PM
> To: Lu, Wenzhuo
> Cc: Ananyev, Konstantin; Liu, Yu Y; Chen, WeichunX; Xu, HuilongX;
> dev@dpdk.org
> Subject: Re: dpdk16.11 RC2 package ipv4 reassembly example can't work
> 
> Hi all,
> 
> On Wed, Nov 02, 2016 at 08:39:31AM +0000, Lu, Wenzhuo wrote:
> > Correct the typo of receiver.
> >
> > Hi Adrien,
> > The change from struct ip_frag_pkt pkt[0]  to struct ip_frag_pkt pkt[] will
> make IP reassembly not working. I think this is not the root cause. Maybe
> Konstantin can give us some idea.
> > But I notice one thing, you change some from [0] to [], but others just add
> '__extension__'. I believe if you add '__extension__' for struct ip_frag_pkt pkt[0],
> we'll not hit this issue. Just curious why you use 2 ways to resolve the same
> problem.
> 
> I've used the __extension__ method whenever the C99 syntax could not work
> due to invalid usage in the code, e.g. a flexible array cannot be the only member
> of a struct, you cannot make arrays out of structures that contain such fields,
> while there is no such constraint with the GNU syntax.
> 
> For example see __extension__ uint8_t action_data[0] in struct
> rte_pipeline_table_entry. The C99 could not be used because of
> test_table_acl.c:
> 
>   struct rte_pipeline_table_entry entries[5];
> 
> If replacing ip_frag_pkt[] with __extension__ ip_frag_pkt pkt[0] in rte_ip_frag.h
> solves the issue, either some code is breaking some constraint somewhere or
> this change broke the ABI (unlikely considering a simple recompilation should
> have taken care of the issue). I did not notice any change in sizeof(struct
> rte_ip_frag_tbl) nor offsetof(struct rte_ip_frag_tbl, pkt) on my setup, perhaps
> the compilation flags used in your test affect them somehow.
Thanks for your explanation. I also checked sizeof(struct rte_ip_frag_tbl). I don't see any change either.

> 
> Can you confirm whether only reverting this particular field solves the issue?
Yes. ip_frag_pkt pkt[0] or even ip_frag_pkt pkt[1] can work but ip_frag_pkt pkt[] cannot :(
Do you like the idea of changing the ip_frag_pkt[] to __extension__ ip_frag_pkt pkt[0]?

> 
> > From: Xu, HuilongX
> > Sent: Wednesday, November 2, 2016 4:29 PM
> > To: drien.mazarguil@6wind.com
> > Cc: Ananyev, Konstantin; Liu, Yu Y; Chen, WeichunX; Lu, Wenzhuo; Xu,
> > HuilongX
> > Subject: dpdk16.11 RC2 package ipv4 reassembly example can't work
> >
> > Hi mazarguil,
> > I find ip reassembly example can't work with dpdk16.11 rc2 package.
> > But when I reset dpdk code before
> 347a1e037fd323e6c2af55d17f7f0dc4bfe1d479, it works ok.
> > Could you have time to check this issue, thanks  a lot.
> > Unzip password: intel123
> >
> > Test detail info:
> >
> > os&kernel:4.2.3-300.fc23.x86_64
> > gcc version:5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> > NIC:03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet
> > Connection X552/X557-AT 10GBASE-T [8086:15ad] and
> > 84:00.0 Ethernet controller [0200]: Intel Corporation 82599ES
> > 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
> > package: dpdk16.11.rc2.tar.gz
> > test steps:
> > 1. build and install dpdk
> > 2. build ip_reassembly example
> > 3. run ip_reassembly
> > ./examples/ip_reassembly/build/ip_reassembly -c 0x2 -n 4 - -p 0x1
> > --maxflows=1024 --flowttl=10s 4. set tester port mtu ip link set mtu
> > 9000 dev ens160f1 5. setup scapy on tester and send packet scapy pcap
> > = rdpcap("file.pcap") sendp(pcap, iface="ens160f1") 6. sniff packet on
> > tester and check packet test result:
> > dpdk16.04 reassembly packet successful but dpdk16.11 reassembly pack failed.
> >
> > comments:
> > file.pcap: send packets pcap file
> > tcpdump_16.04_reassembly_successful.pcap: sniff packets by tcpdump on
> 16.04.
> > tcpdump_reset_code_reassembly_failed.pcap: sniff packets by tcpdump on
> > 16.11
> > reset_code_reassembly_successful_.jpg: reassembly a packets successful
> > detail info
> > dpdk16.11_reassembly_failed.jpg: reassembly a packets failed detail
> > info
> >
> 
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 2/3] lib: add bitrate statistics library
    2016-11-04  3:36  2% ` [dpdk-dev] [PATCH v3 1/3] lib: add information metrics library Remy Horton
@ 2016-11-04  3:36  3% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-11-04  3:36 UTC (permalink / raw)
  To: dev

This patch adds a library that calculates peak and average data-rate
statistics. For ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_16_11.rst             |   5 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 128 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 10 files changed, 284 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/config/common_base b/config/common_base
index 2277727..25c3911 100644
--- a/config/common_base
+++ b/config/common_base
@@ -594,3 +594,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the device metrics library
 #
 CONFIG_RTE_LIBRTE_METRICS=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index ca50fa6..91e8ea6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -148,4 +148,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [Device Metrics]     (@ref rte_metrics.h),
+  [Bitrate Statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fe830eb..8765ddd 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -58,6 +58,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_ring \
                           lib/librte_sched \
                           lib/librte_metrics \
+                          lib/librte_bitratestats \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 507f715..b690b72 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -137,6 +137,11 @@ New Features
   intended to provide a reporting mechanism that is independent of the
   ethdev library.
 
+* **Added bit-rate calculation library.**
+
+  A library that can be used to calculate device bit-rates. Calculated
+  bitrates are reported using the metrics library.
+
 
 Resolved Issues
 ---------------
diff --git a/lib/Makefile b/lib/Makefile
index 5d85dcf..e211bc0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..b725d4e
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..d97a526
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,128 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate_s {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates_s {
+	struct rte_stats_bitrate_s port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+
+struct rte_stats_bitrates_s *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates_s), 0);
+}
+
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data)
+{
+	const char *names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_metrics(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate_s *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +50 fixes integer rounding during divison */
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_metrics(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..cd566d6
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+
+/**
+ *  Bitrate statistics data structure
+ */
+struct rte_stats_bitrates_s;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates_s *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period this function
+ * is called should be the intended time window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..9de6be9
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_16.11 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 2db5427..5b5e547 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 1/3] lib: add information metrics library
  @ 2016-11-04  3:36  2% ` Remy Horton
  2016-11-04  3:36  3% ` [dpdk-dev] [PATCH v3 2/3] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-11-04  3:36 UTC (permalink / raw)
  To: dev

This patch adds a new information metric library that allows other
modules to register named metrics and update their values. It is
intended to be independent of ethdev, rather than mixing ethdev
and non-ethdev information in xstats.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/rel_notes/release_16_11.rst     |   6 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 300 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 204 ++++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 10 files changed, 584 insertions(+)
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/config/common_base b/config/common_base
index 21d18f8..2277727 100644
--- a/config/common_base
+++ b/config/common_base
@@ -589,3 +589,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..ca50fa6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -147,4 +147,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [Device Metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..fe830eb 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -57,6 +57,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_reorder \
                           lib/librte_ring \
                           lib/librte_sched \
+                          lib/librte_metrics \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index aa0c09a..507f715 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -131,6 +131,12 @@ New Features
   The GCC 4.9 ``-march`` option supports the Intel processor code names.
   The config option ``RTE_MACHINE`` can be used to pass code names to the compiler as ``-march`` flag.
 
+* **Added information metric library.**
+
+  A library that allows information metrics to be added and update. It is
+  intended to provide a reporting mechanism that is independent of the
+  ethdev library.
+
 
 Resolved Issues
 ---------------
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..5d85dcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..220c2ac
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,300 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ * @param name
+ *   Name of metric
+ * @param value
+ *   Current value for metric
+ * @param idx_next_set
+ *   Index of next root element (zero for none)
+ * @param idx_next_metric
+ *   Index of next metric in set (zero for none)
+ *
+ * Only the root of each set needs idx_next_set but since it has to be
+ * assumed that number of sets could equal total number of metrics,
+ * having a separate set metadata table doesn't save any memory.
+ */
+struct rte_metrics_meta_s {
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	uint64_t value[RTE_MAX_ETHPORTS];
+	uint64_t nonport_value;
+	uint16_t idx_next_set;
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * @param idx_last_set
+ *   Index of last metadata entry with valid data. This value is
+ *   not valid if cnt_stats is zero.
+ * @param cnt_stats
+ *   Number of metrics.
+ * @param metadata
+ *   Stat data memory block.
+ *
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	uint16_t idx_last_set;
+	uint16_t cnt_stats;
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	rte_spinlock_t lock;
+};
+
+
+void
+rte_metrics_init(void)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), rte_socket_id(), 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+
+int
+rte_metrics_reg_metric(const char *name)
+{
+	const char *list_names[] = {name};
+
+	return rte_metrics_reg_metrics(list_names, 1);
+}
+
+
+int
+rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+
+int
+rte_metrics_update_metric(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_metrics(port_id, key, &value, 1);
+}
+
+
+int
+rte_metrics_update_metrics(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_NONPORT &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_NONPORT)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].nonport_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_stat_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			rte_spinlock_unlock(&stats->lock);
+			return -ERANGE;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++) {
+			entry = &stats->metadata[idx_name];
+			values[idx_name].key = idx_name;
+			values[idx_name].value = entry->value[port_id];
+		}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..6b75404
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,204 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * RTE Statistics module
+ *
+ * Statistic information is populated using callbacks, each of which
+ * is associated with one or more metric names. When queried, the
+ * callback is used to update all metric in the set at once. Currently
+ * only bulk querying of all metric is supported.
+ *
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of statistic name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/** Used to indicate port-independent information */
+#define RTE_METRICS_NONPORT -1
+
+
+/**
+ * Statistic name
+ */
+struct rte_metric_name {
+	/** String describing statistic */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Statistic name.
+ */
+struct rte_stat_value {
+	/** Numeric identifier of statistic */
+	uint16_t key;
+	/** Value for statistic */
+	uint64_t value;
+};
+
+
+/**
+ * Initialises statistic module. This only has to be explicitly called if you
+ * intend to use rte_metrics_reg_metric() or rte_metrics_reg_metrics() from a
+ * secondary process.
+ */
+void rte_metrics_init(void);
+
+
+/**
+ * Register a statistic and its associated callback.
+ *
+ * @param name
+ *   Statistic name
+ *
+ * @param callback
+ *   Callback to use when fetching statistic
+ *
+ * @param data
+ *   Data pointer to pass to callback
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metric(const char *name);
+
+/**
+ * Register a set of statistic and their associated callback.
+ *
+ * @param names
+ *   List of statistic names
+ *
+ * @param cnt_names
+ *   Number of statistics in set
+ *
+ * @param callback
+ *   Callback to use when fetching statsitics
+ *
+ * @param data
+ *   Data pointer to pass to callback
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metrics(const char **names, uint16_t cnt_names);
+
+/**
+ * Get statistic name-key lookup table.
+ *
+ * @param names
+ *   Array of names to receive key names
+ *
+ * @param capacity
+ *   Space available in names
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Fetch statistics.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   Array to receive values and their keys
+ *
+ * @param capacity
+ *   Space available in values
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_stat_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a statistic metric
+ *
+ * @param port_id
+ *   Port to update statistics for
+ * @param key
+ *   Id of statistic metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared statistics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metric(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a statistic metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update statistics for
+ * @param key
+ *   Base id of statistics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds statistic set size
+ *   - -EIO if upable to access shared statistics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metrics(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..a31a80a
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_16.11 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_metric;
+	rte_metrics_reg_metrics;
+	rte_metrics_update_metric;
+	rte_metrics_update_metrics;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 51bc3b0..2db5427 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI changes in filtering support
  2016-11-02 15:12 14% ` Stroe, Laura
@ 2016-11-03 11:42  7%   ` Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2016-11-03 11:42 UTC (permalink / raw)
  To: Stroe, Laura, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stroe, Laura
> Sent: Wednesday, November 2, 2016 3:12 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI changes in filtering
> support
> 
> Self-Nack.
> After an internal review of ABI breakage announcements we found a way of
> achieving this with an ABI change.
> 

Hi Laura,

I guess this should say *without* an ABI change. :-)

Thanks,

John

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
  2016-11-03  2:00  0%       ` Remy Horton
@ 2016-11-03  9:07  0%         ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2016-11-03  9:07 UTC (permalink / raw)
  To: Remy Horton, Mcnamara, John, Dai, Wei, Thomas Monjalon, Ananyev,
	Konstantin, Wu, Jingjing, Zhang, Helin, Curran, Greg, Van Haaren,
	Harry
  Cc: dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Remy Horton
> Sent: Thursday, November 3, 2016 3:01 AM
> 
> On 02/11/2016 17:07, Mcnamara, John wrote:
> [..]
> > Perhaps we could an API that returns a struct, or otherwise, that
> > indicated what stats are returned by a PMD. An application that
> > required stats could call it once to establish what stats were
> > available. It would have to be done in some way that wouldn't break
> > ABI every time a new stat was added.
> >
> > Harry, Remy, how would this fit in with the existing stats scheme or
> > the new metrics library.
> 
> At the moment xstats (rte_eth_xstats_get()) pulls stuff out of
> rte_eth_stats and reports them unconditionally alongside all the
> driver-specific xstats. This could change so that it only reports the
> (legacy) stats the PMDs actually fills in.
> 
> Personally in the longer term I think xstats should get all the info it
> requires directly rather than relying on the legacy stats for some of
> its info, but that would involve pushing a lot of common code into the
> PMDs..
> 
> ..Remy

Adding eth_stats to eth_xstats or not is not important - it's not a synchronized snapshot of the entire counter set, just a question of calling one or two functions to obtain the values.

Regarding eth_xstats, I would dare to say that the NIC HW designers chose their statistics counters wisely, based on a combination of industry standards (e.g. common SNMP MIBs, such as the Interfaces MIB and etherStats) and customer feedback, so the hardware counters are probably useful to a DPDK application, and thus it makes sense to expose them directly. The application can transform them into industry standard counter sets (IF-MIB, etherStats, etc.) if required. DPDK could offer a common library for this transformation.

-Morten

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
  2016-11-02  9:07  3%     ` Mcnamara, John
@ 2016-11-03  2:00  0%       ` Remy Horton
  2016-11-03  9:07  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Remy Horton @ 2016-11-03  2:00 UTC (permalink / raw)
  To: Mcnamara, John, Dai, Wei, Thomas Monjalon, Ananyev, Konstantin,
	Wu, Jingjing, Zhang, Helin, Curran, Greg, Van Haaren, Harry
  Cc: dev


On 02/11/2016 17:07, Mcnamara, John wrote:
[..]
> Perhaps we could an API that returns a struct, or otherwise, that
> indicated what stats are returned by a PMD. An application that
> required stats could call it once to establish what stats were
> available. It would have to be done in some way that wouldn't break
> ABI every time a new stat was added.
>
> Harry, Remy, how would this fit in with the existing stats scheme or
> the new metrics library.

At the moment xstats (rte_eth_xstats_get()) pulls stuff out of 
rte_eth_stats and reports them unconditionally alongside all the 
driver-specific xstats. This could change so that it only reports the 
(legacy) stats the PMDs actually fills in.

Personally in the longer term I think xstats should get all the info it 
requires directly rather than relying on the legacy stats for some of 
its info, but that would involve pushing a lot of common code into the 
PMDs..

..Remy

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] dpdk16.11 RC2 package ipv4 reassembly example can't work
       [not found]     ` <6A0DE07E22DDAD4C9103DF62FEBC09093934068B@shsmsx102.ccr.corp.intel.com>
@ 2016-11-02 15:21  3%   ` Adrien Mazarguil
  2016-11-04  6:36  0%     ` Lu, Wenzhuo
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-11-02 15:21 UTC (permalink / raw)
  To: Lu, Wenzhuo; +Cc: dev

Hi all,

On Wed, Nov 02, 2016 at 08:39:31AM +0000, Lu, Wenzhuo wrote:
> Correct the typo of receiver.
> 
> Hi Adrien,
> The change from struct ip_frag_pkt pkt[0]  to struct ip_frag_pkt pkt[] will make IP reassembly not working. I think this is not the root cause. Maybe Konstantin can give us some idea.
> But I notice one thing, you change some from [0] to [], but others just add '__extension__'. I believe if you add '__extension__' for struct ip_frag_pkt pkt[0], we'll not hit this issue. Just curious why you use 2 ways to resolve the same problem.

I've used the __extension__ method whenever the C99 syntax could not work
due to invalid usage in the code, e.g. a flexible array cannot be the only
member of a struct, you cannot make arrays out of structures that contain
such fields, while there is no such constraint with the GNU syntax.

For example see __extension__ uint8_t action_data[0] in struct
rte_pipeline_table_entry. The C99 could not be used because of
test_table_acl.c:

  struct rte_pipeline_table_entry entries[5];

If replacing ip_frag_pkt[] with __extension__ ip_frag_pkt pkt[0] in
rte_ip_frag.h solves the issue, either some code is breaking some constraint
somewhere or this change broke the ABI (unlikely considering a simple
recompilation should have taken care of the issue). I did not notice any
change in sizeof(struct rte_ip_frag_tbl) nor offsetof(struct
rte_ip_frag_tbl, pkt) on my setup, perhaps the compilation flags used in
your test affect them somehow.

Can you confirm whether only reverting this particular field solves the
issue?

> From: Xu, HuilongX
> Sent: Wednesday, November 2, 2016 4:29 PM
> To: drien.mazarguil@6wind.com
> Cc: Ananyev, Konstantin; Liu, Yu Y; Chen, WeichunX; Lu, Wenzhuo; Xu, HuilongX
> Subject: dpdk16.11 RC2 package ipv4 reassembly example can't work
> 
> Hi mazarguil,
> I find ip reassembly example can't work with dpdk16.11 rc2 package.
> But when I reset dpdk code before 347a1e037fd323e6c2af55d17f7f0dc4bfe1d479, it works ok.
> Could you have time to check this issue, thanks  a lot.
> Unzip password: intel123
> 
> Test detail info:
> 
> os&kernel:4.2.3-300.fc23.x86_64
> gcc version:5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
> NIC:03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T [8086:15ad] and
> 84:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
> package: dpdk16.11.rc2.tar.gz
> test steps:
> 1. build and install dpdk
> 2. build ip_reassembly example
> 3. run ip_reassembly
> ./examples/ip_reassembly/build/ip_reassembly -c 0x2 -n 4 - -p 0x1 --maxflows=1024 --flowttl=10s
> 4. set tester port mtu
> ip link set mtu 9000 dev ens160f1
> 5. setup scapy on tester and send packet
> scapy
> pcap = rdpcap("file.pcap")
> sendp(pcap, iface="ens160f1")
> 6. sniff packet on tester and check packet
> test result:
> dpdk16.04 reassembly packet successful but dpdk16.11 reassembly pack failed.
> 
> comments:
> file.pcap: send packets pcap file
> tcpdump_16.04_reassembly_successful.pcap: sniff packets by tcpdump on 16.04.
> tcpdump_reset_code_reassembly_failed.pcap: sniff packets by tcpdump on 16.11
> reset_code_reassembly_successful_.jpg: reassembly a packets successful detail info
> dpdk16.11_reassembly_failed.jpg: reassembly a packets failed detail info
> 

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI changes in filtering support
  @ 2016-11-02 15:12 14% ` Stroe, Laura
  2016-11-03 11:42  7%   ` Mcnamara, John
  0 siblings, 1 reply; 200+ results
From: Stroe, Laura @ 2016-11-02 15:12 UTC (permalink / raw)
  To: dev

Self-Nack.
After an internal review of ABI breakage announcements we found a way of achieving this with an ABI change.

-----Original Message-----
From: Stroe, Laura 
Sent: Friday, September 23, 2016 12:23 PM
To: dev@dpdk.org
Cc: Stroe, Laura <laura.stroe@intel.com>
Subject: [PATCH] doc: announce ABI changes in filtering support

From: Laura Stroe <laura.stroe@intel.com>

This patch adds a notice that the ABI for filter types functionality will be enhanced in the 17.02 release with new operation available to manipulate the tunnel filters:
replace filter types.

Signed-off-by: Laura Stroe <laura.stroe@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1a3831f..1cd1d2c 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -57,3 +57,12 @@ Deprecation Notices
 * API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
   structures. The member ``file_name`` data type will be changed from
   ``char *`` to ``const char *``. This change targets release 16.11.
+
+* In 17.02 ABI changes are planned: the ``rte_filter_op `` enum will be 
+extended
+  with a new member RTE_ETH_FILTER_REPLACE in order to facilitate
+  the new operation - replacing the tunnel filters,
+  the ``rte_eth_tunnel_filter_conf`` structure will be extended with a 
+new field
+  ``filter_type_replace`` handling the bitmask combination of the 
+filter types
+  defined by the values  ETH_TUNNEL_FILTER_XX,
+  define new values for Outer VLAN and Outer Ethertype filters
+  ETH_TUNNEL_FILTER_OVLAN and ETH_TUNNEL_FILTER_OETH.
--
2.5.5

^ permalink raw reply	[relevance 14%]

* Re: [dpdk-dev] Best Practices for PMD Verification before Upstream Requests
  @ 2016-11-02 12:21  0%   ` Shepard Siegel
  0 siblings, 0 replies; 200+ results
From: Shepard Siegel @ 2016-11-02 12:21 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Thomas and DPDK devs,

Almost a year into our DPDK development, we have shipped an alpha version
of our "Arkville" product. We've thankful for all the support from this
group. Most everyone has suggested "get your code upstream ASAP"; but our
team is cut from the "if it isn't tested, it doesn't work" cloth. We now
have some solid miles on our Arkville PMD driver "ark" with 16.07. Mostly
testpmd and a suite of user apps; dts not so much, only because our use
case is a little different. We expect almost all of our contribution would
land under $dpdk/drivers/net/ark . We are looking past 16.11 to possibly
jump on board when the 17.02 window opens in December. One question that
came up is "Should we do a thorough port and regression against 16.11 as a
precursor to up streaming at 17.02?". Constructive feedback always welcome!

-Shep

Shepard Siegel, CTO
atomicrules.com

On Mon, Aug 22, 2016 at 9:07 AM, Thomas Monjalon <thomas.monjalon@6wind.com>
wrote:

> 2016-08-17 08:34, Shepard Siegel:
> > Atomic Rules is new to the DPDK community. We attended the DPDK Summit
> last
> > week and received terrific advice and encouragement. We are developing a
> > DPDK PMD for our Arkville product which is a DPDK-aware data mover,
> capable
> > of marshaling packets between FPGA/ASIC gates with AXI interfaces on one
> > side, and the DPDK API/ABI on the other. Arkville plus a MAC looks like a
> > line-rate-agnostic bare-bones L2 NIC. We have testpmd and our first DPDK
> > applications running using our early-alpha Arkville PMD.
>
> Welcome :)
>
> Any release targeted for upstream support?
>
> <snip>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
  2016-11-02  8:28  3%   ` Dai, Wei
  2016-11-02  9:07  3%     ` Mcnamara, John
@ 2016-11-02  9:21  0%     ` Morten Brørup
  2016-11-08 13:33  0%     ` Tahhan, Maryam
  2 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2016-11-02  9:21 UTC (permalink / raw)
  To: Dai, Wei, Thomas Monjalon, Mcnamara, John, Ananyev, Konstantin,
	Wu, Jingjing, Zhang, Helin, Curran, Greg
  Cc: dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Dai, Wei
> Sent: Wednesday, November 2, 2016 9:29 AM
> To: Thomas Monjalon; Mcnamara, John; Ananyev, Konstantin; Wu, Jingjing;
> Zhang, Helin; Dai, Wei; Curran, Greg
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
> 
> Hi, John & Greg
> 
> Would you please give any opinion for this patch ?
> 
> I have looked through all PMDs and found not all statistics items can
> be supported by some NIC.
> For example,  rx_nombuf,  q_ipackets,  q_opackets,  q_ibytes and
> q_obytes are not supported by i40e.
> But when the function rte_eth_stats_get(uint8_t port_id, struct
> rte_eth_stats *stats) is called for i40e PMD, Above un-supported
> statistics item in output stats are zero, this is not real value.
> So far, there is no way to know whether an item in struct rte_eth_stats
> is supported or not only from this structure definition.
> Maybe some structure member can be added to indicate each of statistics
> item valid or not.
> But this means ABI change.
> 
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Tuesday, October 4, 2016 5:35 PM
> > To: Dai, Wei <wei.dai@intel.com>; Mcnamara, John
> > <john.mcnamara@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
> >
> > 2016-08-26 18:08, Wei Dai:
> > >  /**
> > >   * A structure used to retrieve statistics for an Ethernet port.
> > > + * Not all statistics fields in struct rte_eth_stats are supported
> > > + * by any type of network interface card (NIC). If any statistics
> > > + * field is not supported, its value is 0 .
> > >   */
> > >  struct rte_eth_stats {
> >
> > I'm missing the point of this patch.
> > Why do you think it is a fix?
> >
> > John, any opinion?
> 

I think the source code comment is an improvement. I would also like to see a source code comment describing the criteria for choosing which counters go in the eth_stats, and which counters are relegated to the eth_xstats.

It doesn't look like the Interfaces MIB (IF-MIB) or even the etherStats MIB drives this selection.

The ifOutQLen in the IF-MIB is deprecated; I guess it is not important for network monitoring. But fast access to the queue lengths are obviously useful for a fast path application with RED or other intelligent queue overflow handling mechanisms, so in my mind they are useful here.

-Morten


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
  2016-11-02  8:28  3%   ` Dai, Wei
@ 2016-11-02  9:07  3%     ` Mcnamara, John
  2016-11-03  2:00  0%       ` Remy Horton
  2016-11-02  9:21  0%     ` Morten Brørup
  2016-11-08 13:33  0%     ` Tahhan, Maryam
  2 siblings, 1 reply; 200+ results
From: Mcnamara, John @ 2016-11-02  9:07 UTC (permalink / raw)
  To: Dai, Wei, Thomas Monjalon, Ananyev, Konstantin, Wu, Jingjing,
	Zhang, Helin, Curran, Greg, Horton, Remy, Van Haaren, Harry
  Cc: dev

> -----Original Message-----
> From: Dai, Wei
> Sent: Wednesday, November 2, 2016 8:29 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Dai, Wei <wei.dai@intel.com>;
> Curran, Greg <greg.curran@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] ethdev: fix statistics description
> 
> Hi, John & Greg
> 
> Would you please give any opinion for this patch ?
> 
> I have looked through all PMDs and found not all statistics items can be
> supported by some NIC.
> For example,  rx_nombuf,  q_ipackets,  q_opackets,  q_ibytes and q_obytes
> are not supported by i40e.
> But when the function rte_eth_stats_get(uint8_t port_id, struct
> rte_eth_stats *stats) is called for i40e PMD, Above un-supported
> statistics item in output stats are zero, this is not real value.
> So far, there is no way to know whether an item in struct rte_eth_stats is
> supported or not only from this structure definition.
> Maybe some structure member can be added to indicate each of statistics
> item valid or not.
> But this means ABI change.
> 
> In following list, I list statistics support details of all PMDs.
> Hope it can be displayed in your screen.

Hi,

Thanks for the analysis.

Perhaps we could an API that returns a struct, or otherwise, that indicated what stats are returned by a PMD. An application that required stats could call it once to establish what stats were available. It would have to be done in some way that wouldn't break ABI every time a new stat was added.

Harry, Remy, how would this fit in with the existing stats scheme or the new metrics library.

John

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
  @ 2016-11-02  8:28  3%   ` Dai, Wei
  2016-11-02  9:07  3%     ` Mcnamara, John
                       ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Dai, Wei @ 2016-11-02  8:28 UTC (permalink / raw)
  To: Thomas Monjalon, Mcnamara, John, Ananyev, Konstantin, Wu,
	Jingjing, Zhang, Helin, Dai, Wei, Curran, Greg
  Cc: dev

Hi, John & Greg

Would you please give any opinion for this patch ?

I have looked through all PMDs and found not all statistics items can be supported by some NIC.
For example,  rx_nombuf,  q_ipackets,  q_opackets,  q_ibytes and q_obytes are not supported by i40e.
But when the function rte_eth_stats_get(uint8_t port_id, struct rte_eth_stats *stats) is called for i40e PMD,
Above un-supported statistics item in output stats are zero, this is not real value.
So far, there is no way to know whether an item in struct rte_eth_stats is supported or not only from this structure definition.
Maybe some structure member can be added to indicate each of statistics item valid or not.
But this means ABI change.

In following list, I list statistics support details of all PMDs.
Hope it can be displayed in your screen.

Thanks
/Wei

NIC           ipackets   opackets  ibytes  obytes  imissed  ierrors  oerrors  rx_nombuf  q_ipackets   q_opacktes     q_ibytes    q_obytes  q_errors
af_packet      y          y         y       y       n        n        y        n          y            y            y         y         y
bnx2x         y          y         y       y       y        y        y        y          n            n            n         n         n
bnxt          y          y         y       y       y        y        y        n          y            y            y         y         y
bonding       y          y         y       y       y        y        y        y          y            y            y         y         y
cxgbe         y          y         y       y       y        y        y        n          y            y            y         y         y
e1000(igb)     y          y         y       y       y        y        y        n          n            n            n         n         n
e1000(igbvf)   y          y         y       y       n        n        n        n          n            n            n         n         n
ena           y          y         y       y       y        y        y        y          n            n            n         n         n
enic          y          y         y       y       y        y        y        y          n            n            n         n         n
fm10k         y          y         y       y       n        n        n        n          y            y            y         y         n
i40e          y          y         y       y       y        y        y        n          n            n            n         n         n
i40evf        y          y         y       y       n        y        y        n          n            n            n         n         n
ixgbe         y          y         y       y       y        y        y        n          y            y            y         y         y
ixgbevf       y          y         y       y       n        n        n        n          n            n            n         n         n
mlx4          y          y         y       y       n        y        y        y          y            y            y         y         y
mlx5          y          y         y       y       n        y        y        y          y            y            y         y         y
mpipe         y          y         y       y       n        y        y        y          y            y            y         y         y
nfp           y          y         y       y       y        y        y        y          y            y            y         y         n
null          y          y         n       n       n        n        y        n          y            y            n         n         y
pcap          y          y         y       y       n        n        y        n          y            y            y         y         y
qede          y          y         y       y       y        y        y        y          n            n            n         n         n
ring          y          y         n       n       n        n        y        n          y            y            n         n         y
szedata2      y          y         y       y       n        n        y        n          y            y            y         y         y
thunderx      y          y         y       y       y        y        y        n          y            y            y         y         n
vhost         y          y         y       y       n        n        y        n          y            y            y         y         n
virtio         y          y         y       y       n        y        y        y          y            y            y         y         n
vmxnet3      y          y         y       y       n        y        y        y          y            y            y         y         y
xenvirt       y          y         n       n       n        n        n        n          n            n            n         n         n

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Tuesday, October 4, 2016 5:35 PM
> To: Dai, Wei <wei.dai@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ethdev: fix statistics description
> 
> 2016-08-26 18:08, Wei Dai:
> >  /**
> >   * A structure used to retrieve statistics for an Ethernet port.
> > + * Not all statistics fields in struct rte_eth_stats are supported
> > + * by any type of network interface card (NIC). If any statistics
> > + * field is not supported, its value is 0 .
> >   */
> >  struct rte_eth_stats {
> 
> I'm missing the point of this patch.
> Why do you think it is a fix?
> 
> John, any opinion?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] mbuf changes
  2016-10-28 17:00  0%                             ` Richardson, Bruce
@ 2016-10-28 20:27  0%                               ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2016-10-28 20:27 UTC (permalink / raw)
  To: Richardson, Bruce, Adrien Mazarguil; +Cc: dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Richardson, Bruce
> Sent: Friday, October 28, 2016 7:01 PM
> To: Adrien Mazarguil; Morten Brørup
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] mbuf changes
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Friday, October 28, 2016 5:50 PM
> > To: Morten Brørup <mb@smartsharesystems.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> > On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Brørup wrote:
> > > Comments at the end.
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pattan,
> > > > Reshma
> > > > Sent: Friday, October 28, 2016 3:35 PM
> > > > To: Olivier Matz
> > > > Cc: dev@dpdk.org; Morten Brørup
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > > Hi Olivier,
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier
> > > > > Matz
> > > > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > > > To: Richardson, Bruce <bruce.richardson@intel.com>; Morten
> > > > > Brørup <mb@smartsharesystems.com>
> > > > > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> > > > > <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> > > > > <olegk@mellanox.com>
> > > > > Subject: Re: [dpdk-dev] mbuf changes
> > > > >
> > > > >
> > > > >
> > > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > > > >> Comments at the end.
> > > > > >>
> > > > > >> Med venlig hilsen / kind regards
> > > > > >> - Morten Brørup
> > > > > >>
> > > > > >>> -----Original Message-----
> > > > > >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > > > >>> To: Morten Brørup
> > > > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier
> > > > > >>> Matz; Oleg Kuporosov
> > > > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > > > >>>
> > > > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > > > >>>> Comments inline.
> > > > > >>>>
> > > > > >>>>> -----Original Message-----
> > > > > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > > > > >>>>> Richardson
> > > > > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> > > > > >>>>> To: Adrien Mazarguil
> > > > > >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier
> > > > > >>>>> Matz; Oleg Kuporosov
> > > > > >>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > > > >>>>>
> > > > > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > > > wrote:
> > > > > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup
> > wrote:
> > > > > >>>>>>> Comments inline.
> > > > > >>>>>>>
> > > > > >>>>>>> Med venlig hilsen / kind regards
> > > > > >>>>>>> - Morten Brørup
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>> -----Original Message-----
> > > > > >>>>>>>> From: Adrien Mazarguil
> > > > > >>>>>>>> [mailto:adrien.mazarguil@6wind.com]
> > > > > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> > > > > >>>>>>>> To: Bruce Richardson
> > > > > >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier
> > > > > >>>>>>>> Matz; Oleg Kuporosov
> > > > > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce
> > > > > >>>>>>>> Richardson
> > > > > >>> wrote:
> > > > > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> > > > > >>> wrote:
> > > > > >>>>>>>> [...]
> > > > > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > > > >>>>>>>> <mb@smartsharesystems.com> wrote:
> > > > > >>>>>>>> [...]
> > > > > >>>>
> > > > > >>>>>>>>> One other point I'll mention is that we need to have a
> > > > > >>>>>>>>> discussion on how/where to add in a timestamp value
> > > > > >>>>>>>>> into
> > > > > >>> the
> > > > > >>>>>>>>> mbuf. Personally, I think it can be in a union with
> > > > > >>>>>>>>> the
> > > > > >>>>> sequence
> > > > > >>>>>>>>> number value, but I also suspect that 32-bits of a
> > > > > >>> timestamp
> > > > > >>>>>>>>> is not going to be enough for
> > > > > >>>>>>>> many.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thoughts?
> > > > > >>>>>>>>
> > > > > >>>>>>>> If we consider that timestamp representation should use
> > > > > >>>>> nanosecond
> > > > > >>>>>>>> granularity, a 32-bit value may likely wrap around too
> > > > > >>> quickly
> > > > > >>>>>>>> to be useful. We can also assume that applications
> > > > requesting
> > > > > >>>>>>>> timestamps may care more about latency than throughput,
> > > > > >>>>>>>> Oleg
> > > > > >>>>> found
> > > > > >>>>>>>> that using the second cache line for this purpose had a
> > > > > >>>>> noticeable impact [1].
> > > > > >>>>>>>>
> > > > > >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-
> > > > October/049237.html
> > > > > >>>>>>>
> > > > > >>>>>>> I agree with Oleg about the latency vs. throughput
> > > > > >>>>>>> importance for
> > > > > >>>>> such applications.
> > > > > >>>>>>>
> > > > > >>>>>>> If you need high resolution timestamps, consider them to
> > > > > >>>>>>> be
> > > > > >>>>> generated by the NIC RX driver, possibly by the hardware
> > > > > >>>>> itself
> > > > > >>>>> (http://w3new.napatech.com/features/time-precision/hardwar
> > > > > >>>>> e-
> > > > time
> > > > > >>>>> - stamp), so the timestamp belongs in the first cache line.
> > > > > >>>>> And I am proposing that it should have the highest
> > > > > >>>>> possible accuracy, which makes the value hardware dependent.
> > > > > >>>>>>>
> > > > > >>>>>>> Furthermore, I am arguing that we leave it up to the
> > > > > >>> application
> > > > > >>>>>>> to
> > > > > >>>>> keep track of the slowly moving bits (i.e. counting whole
> > > > > >>>>> seconds, hours and calendar date) out of band, so we don't
> > > > > >>>>> use precious
> > > > > >>> space
> > > > > >>>>> in the mbuf. The application doesn't need the NIC RX
> > > > > >>>>> driver's fast path to capture which date (or even which
> > > > > >>>>> second) a packet was received. Yes, it adds complexity to
> > > > > >>>>> the application, but
> > > > we
> > > > > >>>>> can't set aside 64 bit for a generic timestamp. Or as a
> > > > > >>>>> weird
> > > > tradeoff:
> > > > > >>>>> Put the fast moving 32 bit in the first cache line and the
> > > > > >>>>> slow moving 32 bit in the second cache line, as a
> > > > > >>>>> placeholder for
> > > > the
> > > > > >>> application to fill out if needed.
> > > > > >>>>> Yes, it means that the application needs to check the time
> > > > > >>>>> and update its variable holding the slow moving time once
> > > > > >>>>> every second or so; but that should be doable without
> > > > > >>>>> significant
> > > > effort.
> > > > > >>>>>>
> > > > > >>>>>> That's a good point, however without a 64 bit value,
> > > > > >>>>>> elapsed time between two arbitrary mbufs cannot be
> > > > > >>>>>> measured reliably due to
> > > > > >>> not
> > > > > >>>>>> enough context, one way or another the low resolution
> > > > > >>>>>> value is also
> > > > > >>>>> needed.
> > > > > >>>>>>
> > > > > >>>>>> Obviously latency-sensitive applications are unlikely to
> > > > > >>>>>> perform lengthy buffering and require this but I'm not
> > > > > >>>>>> sure about all the possible use-cases. Considering many
> > > > > >>>>>> NICs expose
> > > > > >>>>>> 64 bit
> > > > > >>> timestaps,
> > > > > >>>>>> I suggest we do not truncate them.
> > > > > >>>>>>
> > > > > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be
> > > > > >>>>>> tempted to fill the extra 32 bits whenever they can and
> > > > > >>>>>> negate the performance improvement of the first cache line.
> > > > > >>>>>
> > > > > >>>>> I would tend to agree, and I don't really see any
> > > > > >>>>> convenient
> > > > way
> > > > > >>>>> to avoid putting in a 64-bit field for the timestamp in
> > > > > >>>>> cache-
> > > > line 0.
> > > > > >>>>> If we are ok with having this overlap/partially overlap
> > > > > >>>>> with sequence number, it will use up an extra 4B of
> > > > > >>>>> storage in that
> > > > > >>> cacheline.
> > > > > >>>>
> > > > > >>>> I agree about the lack of convenience! And Adrien certainly
> > > > > >>>> has
> > > > a
> > > > > >>> point about PMD temptations.
> > > > > >>>>
> > > > > >>>> However, I still don't think that a NICs ability to
> > > > > >>>> date-stamp a
> > > > > >>> packet is sufficient reason to put a date-stamp in cache
> > > > > >>> line
> > > > > >>> 0
> > > > of
> > > > > >>> the mbuf. Storing only the fast moving 32 bit in cache line
> > > > > >>> 0 seems like a good compromise to me.
> > > > > >>>>
> > > > > >>>> Maybe you can find just one more byte, so it can hold 17
> > > > > >>>> minutes with nanosecond resolution. (I'm joking!)
> > > > > >>>>
> > > > > >>>> Please don't sacrifice the sequence number for the
> > > > > >>>> seconds/hours/days
> > > > > >>> part a timestamp. Maybe it could be configurable to use a 32
> > > > > >>> bit or
> > > > > >>> 64 bit timestamp.
> > > > > >>>>
> > > > > >>> Do you see both timestamp and sequence numbers being used
> > > > together?
> > > > > >>> I would have thought that apps would either use one or the
> > other?
> > > > > >>> However, your suggestion is workable in any case, to allow
> > > > > >>> the sequence number to overlap just the high 32 bits of the
> > > > timestamp,
> > > > > >>> rather than the low.
> > > > > >>
> > > > > >> In our case, I can foresee sequence numbers used for packet
> > > > > >> processing and
> > > > > timestamps for timing analysis (and possibly for packet
> > > > > capturing, when being used). For timing analysis, we don’t need
> > > > > long durations, e.g. 4 seconds with 32 bit nanosecond resolution
> > > > > suffices. And for packet capturing we are perfectly capable of
> > > > > adding the slowly moving
> > > > > 32 bit of the timestamp to our output data stream without
> > > > > fetching it
> > > > from the mbuf.
> > > > > >>
> > > > >
> > > > > We should keep in mind that today we have the seqn field but it
> > > > > is
> > > > not
> > > > > used by any PMD. In case it is implemented, would it be a
> > > > > per-queue
> > > > sequence number?
> > > > > Is it useful from an application view?
> > > > >
> > > > > This field is only used by the librte_reorder library, and in my
> > > > > opinion, we should consider moving it in the second cache line
> > > > > since
> > > > it is not filled by the PMD.
> > > > >
> > > > >
> > > > > > For the 32-bit timestamp case, it might be useful to have a
> > > > > > right-shift value passed in to the ethdev driver. If we assume
> > > > > > a
> > > > NIC
> > > > > > with nanosecond resolution, (or TSC value with resolution of
> > > > > > that order of magnitude), then the app can choose to have 1 ns
> > > > resolution
> > > > > > with 4 second wraparound, or alternatively 4ns resolution with
> > > > > > 16 second wraparound, or even microsecond resolution with wrap
> > > > > > around of over
> > > > > an hour.
> > > > > > The cost is obviously just a shift op in the driver code per
> > > > > > packet
> > > > > > - hopefully with multiple packets done at a time using vector
> > > > operations.
> > > > >
> > > > >
> > > > > About the timestamp, we can manage to find 64 bits in the first
> > > > > cache line, without sacrifying any field we have today. The
> > > > > question is
> > > > more
> > > > > for the fields we may want to add later.
> > > > >
> > > > > To answer to the question of the size of the timestamp, the
> > > > > first question is to know what is the precision required for the
> > > > applications using it?
> > > >
> > > > As part of the 17.02 latency calculation feature, the requirement
> > > > is to report latency in nanoseconds. So, +1 on keeping timestamp
> > > > as
> > 64bits.
> > > > Since the feature is planned for 17.02, can we finalize on the
> > > > timestamp position in the mbuf struct?
> > > > Based on the decision, I am planning to make the change and send
> > > > ABI notice if needed.
> > > >
> > > > Reshma
> > > >
> > > > >
> > > > > I don't quite like the idea of splitting the timestamp in the 2
> > > > > cache lines, I think it would not be easy to use.
> > > > >
> > > > >
> > > > > Olivier
> > >
> > > Nanosecond precision in latency calculations does not require more
> > > than
> > 32 bit, unless we must also be able to measure more than 4 seconds of
> > latency. And please consider how many packets we would need to hold in
> > memory to keep track of 4 seconds of traffic on a 100 Gbit/s link (if
> > we are measuring network latency): With an average packet size of 300
> > B that would be 40 million packets.
> >
> > Consider another, perhaps more realistic use-case: keeping a few
> > relevant packets from a stream for later analysis, which could occur
> > outside of the normal TX processing path for performance reasons. In
> > this case 4 seconds worth of context may not be enough to avoid losing
> information.

Agreed. And I would say that this exception path can add the higher 32 bit itself; it is not required by the PMD or even in the mbuf.

> >
> > > If I recall correctly, the 17.02 feature is about measuring
> > > application
> > latency, not network latency (as my calculation above is about). I
> > don't consider application latency measuring a common feature for most
> > applications, except for application debugging/profiling purposes (not
> > network debugging purposes). So do we really need both nanosecond
> > precision and the ability to measure 4+ seconds of latency?
> >
> > Existing HW can already provide 64-bit precision, why truncate part of
> it?
> > The only valid reason would be that there is not enough room in the
> > mbuf header, which is not the case. Looks like today there is even
> > enough room in the first 64 bytes.
> >
> > As suggested by Bruce, we can even union (part of) this field with
> > another if necessary, as long as we make sure both cannot be requested
> > at the same time (determining which field should be sacrificed is
> > going to be tricky though).

It would make some applications easier to develop if the mbuf contained a 64 bit timestamp; but I don't think these are very common applications. It's a tradeoff, and I'm advocating for conserving the very precious space in the mbuf, at the cost of requiring some extra work in these uncommon applications.

> >
> > > Furthermore, if the timestamp isn't set by hardware, why not just
> > > take
> > the entire array of packets pulled out from the NIC and set their
> > timestamp to the same value, instead of setting them one by one in the
> > PMD
> > - that would also consolidate the expense of obtaining the time from
> > some clock source.
> >
> > Like any offload, some PMD processing is expected to convert the
> > possibly HW-specific value to a common mbuf format, however if
> > hardware does not report any kind of information relevant to timestamp
> > individual packets, I do not think the PMD should expose that feature
> > and implement it in software. It would be much worse, certainly not
> > what the application wanted as it could have done the same on its own.
Agree. And I would go so far as to say that this should be a guiding principle for DPDK. If the NIC HW doesn't support something, the PMD shouldn't try to emulate it in software; instead, DPDK could provide various libraries to offer these features in software, like the packet type library mentioned by Olivier at the recent DPDK Userspace conference.

> >
> > What matters is to know when NIC receives a packet, not when the
> > application fetches it. You should assume there is a significant delay
> > between these two events, hence the need for it to be offloaded.
Agree.

> >
> 
> I would actually see both use cases being used. In some cases, NICs will
> have hardware offload capability for timestamping which can be used, while
> in many cases, timestamping on retrieval by the core will suffice for
> latency measurements within an app. I don't think we can discount either
> possibility. Thankfully, AFAIK which model is used doesn't really affect
> whether we need 32 or 64 bits for the data.
Agreed about 32/64 bit choice not being affected by this.

But take note of Adrien's comment about the potential delay from the packet arrives on the NIC until the application gets round to calling the PMD to fetch it. Software based timestamps may give a false impression about the application latency. Assume that the application polls the PMD and bulk fetches all the packets, then proceeds to processing them, and then starts over. If a packet arrives just after the first batch was fetched, a software timestamp in the PMD will set the time when the application starts fetching the next batch of packets, ignoring all the time the packet spent waiting for being fetched while the application processed the first batch of packets.

> 
> /Bruce

-Morten

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] mbuf changes
  2016-10-28 16:50  0%                           ` Adrien Mazarguil
@ 2016-10-28 17:00  0%                             ` Richardson, Bruce
  2016-10-28 20:27  0%                               ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Richardson, Bruce @ 2016-10-28 17:00 UTC (permalink / raw)
  To: Adrien Mazarguil, Morten Brørup; +Cc: dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Friday, October 28, 2016 5:50 PM
> To: Morten Brørup <mb@smartsharesystems.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] mbuf changes
> 
> On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Brørup wrote:
> > Comments at the end.
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pattan, Reshma
> > > Sent: Friday, October 28, 2016 3:35 PM
> > > To: Olivier Matz
> > > Cc: dev@dpdk.org; Morten Brørup
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > > Hi Olivier,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > > To: Richardson, Bruce <bruce.richardson@intel.com>; Morten Brørup
> > > > <mb@smartsharesystems.com>
> > > > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> > > > <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> > > > <olegk@mellanox.com>
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > >
> > > >
> > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > > >> Comments at the end.
> > > > >>
> > > > >> Med venlig hilsen / kind regards
> > > > >> - Morten Brørup
> > > > >>
> > > > >>> -----Original Message-----
> > > > >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > > >>> To: Morten Brørup
> > > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier
> > > > >>> Matz; Oleg Kuporosov
> > > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>
> > > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > > >>>> Comments inline.
> > > > >>>>
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > > > >>>>> Richardson
> > > > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> > > > >>>>> To: Adrien Mazarguil
> > > > >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > > > >>>>> Oleg Kuporosov
> > > > >>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>>>
> > > > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > > wrote:
> > > > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup
> wrote:
> > > > >>>>>>> Comments inline.
> > > > >>>>>>>
> > > > >>>>>>> Med venlig hilsen / kind regards
> > > > >>>>>>> - Morten Brørup
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> -----Original Message-----
> > > > >>>>>>>> From: Adrien Mazarguil
> > > > >>>>>>>> [mailto:adrien.mazarguil@6wind.com]
> > > > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> > > > >>>>>>>> To: Bruce Richardson
> > > > >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier
> > > > >>>>>>>> Matz; Oleg Kuporosov
> > > > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > > >>>>>>>>
> > > > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce
> > > > >>>>>>>> Richardson
> > > > >>> wrote:
> > > > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> > > > >>> wrote:
> > > > >>>>>>>> [...]
> > > > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > > >>>>>>>> <mb@smartsharesystems.com> wrote:
> > > > >>>>>>>> [...]
> > > > >>>>
> > > > >>>>>>>>> One other point I'll mention is that we need to have a
> > > > >>>>>>>>> discussion on how/where to add in a timestamp value into
> > > > >>> the
> > > > >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> > > > >>>>> sequence
> > > > >>>>>>>>> number value, but I also suspect that 32-bits of a
> > > > >>> timestamp
> > > > >>>>>>>>> is not going to be enough for
> > > > >>>>>>>> many.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thoughts?
> > > > >>>>>>>>
> > > > >>>>>>>> If we consider that timestamp representation should use
> > > > >>>>> nanosecond
> > > > >>>>>>>> granularity, a 32-bit value may likely wrap around too
> > > > >>> quickly
> > > > >>>>>>>> to be useful. We can also assume that applications
> > > requesting
> > > > >>>>>>>> timestamps may care more about latency than throughput,
> > > > >>>>>>>> Oleg
> > > > >>>>> found
> > > > >>>>>>>> that using the second cache line for this purpose had a
> > > > >>>>> noticeable impact [1].
> > > > >>>>>>>>
> > > > >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-
> > > October/049237.html
> > > > >>>>>>>
> > > > >>>>>>> I agree with Oleg about the latency vs. throughput
> > > > >>>>>>> importance for
> > > > >>>>> such applications.
> > > > >>>>>>>
> > > > >>>>>>> If you need high resolution timestamps, consider them to
> > > > >>>>>>> be
> > > > >>>>> generated by the NIC RX driver, possibly by the hardware
> > > > >>>>> itself
> > > > >>>>> (http://w3new.napatech.com/features/time-precision/hardware-
> > > time
> > > > >>>>> - stamp), so the timestamp belongs in the first cache line.
> > > > >>>>> And I am proposing that it should have the highest possible
> > > > >>>>> accuracy, which makes the value hardware dependent.
> > > > >>>>>>>
> > > > >>>>>>> Furthermore, I am arguing that we leave it up to the
> > > > >>> application
> > > > >>>>>>> to
> > > > >>>>> keep track of the slowly moving bits (i.e. counting whole
> > > > >>>>> seconds, hours and calendar date) out of band, so we don't
> > > > >>>>> use precious
> > > > >>> space
> > > > >>>>> in the mbuf. The application doesn't need the NIC RX
> > > > >>>>> driver's fast path to capture which date (or even which
> > > > >>>>> second) a packet was received. Yes, it adds complexity to
> > > > >>>>> the application, but
> > > we
> > > > >>>>> can't set aside 64 bit for a generic timestamp. Or as a
> > > > >>>>> weird
> > > tradeoff:
> > > > >>>>> Put the fast moving 32 bit in the first cache line and the
> > > > >>>>> slow moving 32 bit in the second cache line, as a
> > > > >>>>> placeholder for
> > > the
> > > > >>> application to fill out if needed.
> > > > >>>>> Yes, it means that the application needs to check the time
> > > > >>>>> and update its variable holding the slow moving time once
> > > > >>>>> every second or so; but that should be doable without
> > > > >>>>> significant
> > > effort.
> > > > >>>>>>
> > > > >>>>>> That's a good point, however without a 64 bit value,
> > > > >>>>>> elapsed time between two arbitrary mbufs cannot be measured
> > > > >>>>>> reliably due to
> > > > >>> not
> > > > >>>>>> enough context, one way or another the low resolution value
> > > > >>>>>> is also
> > > > >>>>> needed.
> > > > >>>>>>
> > > > >>>>>> Obviously latency-sensitive applications are unlikely to
> > > > >>>>>> perform lengthy buffering and require this but I'm not sure
> > > > >>>>>> about all the possible use-cases. Considering many NICs
> > > > >>>>>> expose
> > > > >>>>>> 64 bit
> > > > >>> timestaps,
> > > > >>>>>> I suggest we do not truncate them.
> > > > >>>>>>
> > > > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be
> > > > >>>>>> tempted to fill the extra 32 bits whenever they can and
> > > > >>>>>> negate the performance improvement of the first cache line.
> > > > >>>>>
> > > > >>>>> I would tend to agree, and I don't really see any convenient
> > > way
> > > > >>>>> to avoid putting in a 64-bit field for the timestamp in
> > > > >>>>> cache-
> > > line 0.
> > > > >>>>> If we are ok with having this overlap/partially overlap with
> > > > >>>>> sequence number, it will use up an extra 4B of storage in
> > > > >>>>> that
> > > > >>> cacheline.
> > > > >>>>
> > > > >>>> I agree about the lack of convenience! And Adrien certainly
> > > > >>>> has
> > > a
> > > > >>> point about PMD temptations.
> > > > >>>>
> > > > >>>> However, I still don't think that a NICs ability to
> > > > >>>> date-stamp a
> > > > >>> packet is sufficient reason to put a date-stamp in cache line
> > > > >>> 0
> > > of
> > > > >>> the mbuf. Storing only the fast moving 32 bit in cache line 0
> > > > >>> seems like a good compromise to me.
> > > > >>>>
> > > > >>>> Maybe you can find just one more byte, so it can hold 17
> > > > >>>> minutes with nanosecond resolution. (I'm joking!)
> > > > >>>>
> > > > >>>> Please don't sacrifice the sequence number for the
> > > > >>>> seconds/hours/days
> > > > >>> part a timestamp. Maybe it could be configurable to use a 32
> > > > >>> bit or
> > > > >>> 64 bit timestamp.
> > > > >>>>
> > > > >>> Do you see both timestamp and sequence numbers being used
> > > together?
> > > > >>> I would have thought that apps would either use one or the
> other?
> > > > >>> However, your suggestion is workable in any case, to allow the
> > > > >>> sequence number to overlap just the high 32 bits of the
> > > timestamp,
> > > > >>> rather than the low.
> > > > >>
> > > > >> In our case, I can foresee sequence numbers used for packet
> > > > >> processing and
> > > > timestamps for timing analysis (and possibly for packet capturing,
> > > > when being used). For timing analysis, we don’t need long
> > > > durations, e.g. 4 seconds with 32 bit nanosecond resolution
> > > > suffices. And for packet capturing we are perfectly capable of
> > > > adding the slowly moving
> > > > 32 bit of the timestamp to our output data stream without fetching
> > > > it
> > > from the mbuf.
> > > > >>
> > > >
> > > > We should keep in mind that today we have the seqn field but it is
> > > not
> > > > used by any PMD. In case it is implemented, would it be a
> > > > per-queue
> > > sequence number?
> > > > Is it useful from an application view?
> > > >
> > > > This field is only used by the librte_reorder library, and in my
> > > > opinion, we should consider moving it in the second cache line
> > > > since
> > > it is not filled by the PMD.
> > > >
> > > >
> > > > > For the 32-bit timestamp case, it might be useful to have a
> > > > > right-shift value passed in to the ethdev driver. If we assume a
> > > NIC
> > > > > with nanosecond resolution, (or TSC value with resolution of
> > > > > that order of magnitude), then the app can choose to have 1 ns
> > > resolution
> > > > > with 4 second wraparound, or alternatively 4ns resolution with
> > > > > 16 second wraparound, or even microsecond resolution with wrap
> > > > > around of over
> > > > an hour.
> > > > > The cost is obviously just a shift op in the driver code per
> > > > > packet
> > > > > - hopefully with multiple packets done at a time using vector
> > > operations.
> > > >
> > > >
> > > > About the timestamp, we can manage to find 64 bits in the first
> > > > cache line, without sacrifying any field we have today. The
> > > > question is
> > > more
> > > > for the fields we may want to add later.
> > > >
> > > > To answer to the question of the size of the timestamp, the first
> > > > question is to know what is the precision required for the
> > > applications using it?
> > >
> > > As part of the 17.02 latency calculation feature, the requirement is
> > > to report latency in nanoseconds. So, +1 on keeping timestamp as
> 64bits.
> > > Since the feature is planned for 17.02, can we finalize on the
> > > timestamp position in the mbuf struct?
> > > Based on the decision, I am planning to make the change and send ABI
> > > notice if needed.
> > >
> > > Reshma
> > >
> > > >
> > > > I don't quite like the idea of splitting the timestamp in the 2
> > > > cache lines, I think it would not be easy to use.
> > > >
> > > >
> > > > Olivier
> >
> > Nanosecond precision in latency calculations does not require more than
> 32 bit, unless we must also be able to measure more than 4 seconds of
> latency. And please consider how many packets we would need to hold in
> memory to keep track of 4 seconds of traffic on a 100 Gbit/s link (if we
> are measuring network latency): With an average packet size of 300 B that
> would be 40 million packets.
> 
> Consider another, perhaps more realistic use-case: keeping a few relevant
> packets from a stream for later analysis, which could occur outside of the
> normal TX processing path for performance reasons. In this case 4 seconds
> worth of context may not be enough to avoid losing information.
> 
> > If I recall correctly, the 17.02 feature is about measuring application
> latency, not network latency (as my calculation above is about). I don't
> consider application latency measuring a common feature for most
> applications, except for application debugging/profiling purposes (not
> network debugging purposes). So do we really need both nanosecond
> precision and the ability to measure 4+ seconds of latency?
> 
> Existing HW can already provide 64-bit precision, why truncate part of it?
> The only valid reason would be that there is not enough room in the mbuf
> header, which is not the case. Looks like today there is even enough room
> in the first 64 bytes.
> 
> As suggested by Bruce, we can even union (part of) this field with another
> if necessary, as long as we make sure both cannot be requested at the same
> time (determining which field should be sacrificed is going to be tricky
> though).
> 
> > Furthermore, if the timestamp isn't set by hardware, why not just take
> the entire array of packets pulled out from the NIC and set their
> timestamp to the same value, instead of setting them one by one in the PMD
> - that would also consolidate the expense of obtaining the time from some
> clock source.
> 
> Like any offload, some PMD processing is expected to convert the possibly
> HW-specific value to a common mbuf format, however if hardware does not
> report any kind of information relevant to timestamp individual packets, I
> do not think the PMD should expose that feature and implement it in
> software. It would be much worse, certainly not what the application
> wanted as it could have done the same on its own.
> 
> What matters is to know when NIC receives a packet, not when the
> application fetches it. You should assume there is a significant delay
> between these two events, hence the need for it to be offloaded.
> 

I would actually see both use cases being used. In some cases, NICs will
have hardware offload capability for timestamping which can be used, while
in many cases, timestamping on retrieval by the core will suffice for
latency measurements within an app. I don't think we can discount either
possibility. Thankfully, AFAIK which model is used doesn't really affect
whether we need 32 or 64 bits for the data.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] mbuf changes
  2016-10-28 14:11  0%                         ` Morten Brørup
  2016-10-28 15:25  0%                           ` Pattan, Reshma
@ 2016-10-28 16:50  0%                           ` Adrien Mazarguil
  2016-10-28 17:00  0%                             ` Richardson, Bruce
  1 sibling, 1 reply; 200+ results
From: Adrien Mazarguil @ 2016-10-28 16:50 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Fri, Oct 28, 2016 at 04:11:45PM +0200, Morten Brørup wrote:
> Comments at the end.
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pattan, Reshma
> > Sent: Friday, October 28, 2016 3:35 PM
> > To: Olivier Matz
> > Cc: dev@dpdk.org; Morten Brørup
> > Subject: Re: [dpdk-dev] mbuf changes
> > 
> > Hi Olivier,
> > 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>; Morten Brørup
> > > <mb@smartsharesystems.com>
> > > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> > > <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> > > <olegk@mellanox.com>
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > >
> > >
> > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > >> Comments at the end.
> > > >>
> > > >> Med venlig hilsen / kind regards
> > > >> - Morten Brørup
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > >>> To: Morten Brørup
> > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > > >>> Oleg Kuporosov
> > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>
> > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > >>>> Comments inline.
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > > >>>>> Richardson
> > > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> > > >>>>> To: Adrien Mazarguil
> > > >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > > >>>>> Oleg Kuporosov
> > > >>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>>>
> > > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > wrote:
> > > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > > >>>>>>> Comments inline.
> > > >>>>>>>
> > > >>>>>>> Med venlig hilsen / kind regards
> > > >>>>>>> - Morten Brørup
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> -----Original Message-----
> > > >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> > > >>>>>>>> To: Bruce Richardson
> > > >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz;
> > > >>>>>>>> Oleg Kuporosov
> > > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>>>>>>
> > > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > > >>> wrote:
> > > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> > > >>> wrote:
> > > >>>>>>>> [...]
> > > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > >>>>>>>> <mb@smartsharesystems.com> wrote:
> > > >>>>>>>> [...]
> > > >>>>
> > > >>>>>>>>> One other point I'll mention is that we need to have a
> > > >>>>>>>>> discussion on how/where to add in a timestamp value into
> > > >>> the
> > > >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> > > >>>>> sequence
> > > >>>>>>>>> number value, but I also suspect that 32-bits of a
> > > >>> timestamp
> > > >>>>>>>>> is not going to be enough for
> > > >>>>>>>> many.
> > > >>>>>>>>>
> > > >>>>>>>>> Thoughts?
> > > >>>>>>>>
> > > >>>>>>>> If we consider that timestamp representation should use
> > > >>>>> nanosecond
> > > >>>>>>>> granularity, a 32-bit value may likely wrap around too
> > > >>> quickly
> > > >>>>>>>> to be useful. We can also assume that applications
> > requesting
> > > >>>>>>>> timestamps may care more about latency than throughput, Oleg
> > > >>>>> found
> > > >>>>>>>> that using the second cache line for this purpose had a
> > > >>>>> noticeable impact [1].
> > > >>>>>>>>
> > > >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-
> > October/049237.html
> > > >>>>>>>
> > > >>>>>>> I agree with Oleg about the latency vs. throughput importance
> > > >>>>>>> for
> > > >>>>> such applications.
> > > >>>>>>>
> > > >>>>>>> If you need high resolution timestamps, consider them to be
> > > >>>>> generated by the NIC RX driver, possibly by the hardware itself
> > > >>>>> (http://w3new.napatech.com/features/time-precision/hardware-
> > time
> > > >>>>> - stamp), so the timestamp belongs in the first cache line. And
> > > >>>>> I am proposing that it should have the highest possible
> > > >>>>> accuracy, which makes the value hardware dependent.
> > > >>>>>>>
> > > >>>>>>> Furthermore, I am arguing that we leave it up to the
> > > >>> application
> > > >>>>>>> to
> > > >>>>> keep track of the slowly moving bits (i.e. counting whole
> > > >>>>> seconds, hours and calendar date) out of band, so we don't use
> > > >>>>> precious
> > > >>> space
> > > >>>>> in the mbuf. The application doesn't need the NIC RX driver's
> > > >>>>> fast path to capture which date (or even which second) a packet
> > > >>>>> was received. Yes, it adds complexity to the application, but
> > we
> > > >>>>> can't set aside 64 bit for a generic timestamp. Or as a weird
> > tradeoff:
> > > >>>>> Put the fast moving 32 bit in the first cache line and the slow
> > > >>>>> moving 32 bit in the second cache line, as a placeholder for
> > the
> > > >>> application to fill out if needed.
> > > >>>>> Yes, it means that the application needs to check the time and
> > > >>>>> update its variable holding the slow moving time once every
> > > >>>>> second or so; but that should be doable without significant
> > effort.
> > > >>>>>>
> > > >>>>>> That's a good point, however without a 64 bit value, elapsed
> > > >>>>>> time between two arbitrary mbufs cannot be measured reliably
> > > >>>>>> due to
> > > >>> not
> > > >>>>>> enough context, one way or another the low resolution value is
> > > >>>>>> also
> > > >>>>> needed.
> > > >>>>>>
> > > >>>>>> Obviously latency-sensitive applications are unlikely to
> > > >>>>>> perform lengthy buffering and require this but I'm not sure
> > > >>>>>> about all the possible use-cases. Considering many NICs expose
> > > >>>>>> 64 bit
> > > >>> timestaps,
> > > >>>>>> I suggest we do not truncate them.
> > > >>>>>>
> > > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be
> > > >>>>>> tempted to fill the extra 32 bits whenever they can and negate
> > > >>>>>> the performance improvement of the first cache line.
> > > >>>>>
> > > >>>>> I would tend to agree, and I don't really see any convenient
> > way
> > > >>>>> to avoid putting in a 64-bit field for the timestamp in cache-
> > line 0.
> > > >>>>> If we are ok with having this overlap/partially overlap with
> > > >>>>> sequence number, it will use up an extra 4B of storage in that
> > > >>> cacheline.
> > > >>>>
> > > >>>> I agree about the lack of convenience! And Adrien certainly has
> > a
> > > >>> point about PMD temptations.
> > > >>>>
> > > >>>> However, I still don't think that a NICs ability to date-stamp a
> > > >>> packet is sufficient reason to put a date-stamp in cache line 0
> > of
> > > >>> the mbuf. Storing only the fast moving 32 bit in cache line 0
> > > >>> seems like a good compromise to me.
> > > >>>>
> > > >>>> Maybe you can find just one more byte, so it can hold 17 minutes
> > > >>>> with nanosecond resolution. (I'm joking!)
> > > >>>>
> > > >>>> Please don't sacrifice the sequence number for the
> > > >>>> seconds/hours/days
> > > >>> part a timestamp. Maybe it could be configurable to use a 32 bit
> > > >>> or
> > > >>> 64 bit timestamp.
> > > >>>>
> > > >>> Do you see both timestamp and sequence numbers being used
> > together?
> > > >>> I would have thought that apps would either use one or the other?
> > > >>> However, your suggestion is workable in any case, to allow the
> > > >>> sequence number to overlap just the high 32 bits of the
> > timestamp,
> > > >>> rather than the low.
> > > >>
> > > >> In our case, I can foresee sequence numbers used for packet
> > > >> processing and
> > > timestamps for timing analysis (and possibly for packet capturing,
> > > when being used). For timing analysis, we don’t need long durations,
> > > e.g. 4 seconds with 32 bit nanosecond resolution suffices. And for
> > > packet capturing we are perfectly capable of adding the slowly moving
> > > 32 bit of the timestamp to our output data stream without fetching it
> > from the mbuf.
> > > >>
> > >
> > > We should keep in mind that today we have the seqn field but it is
> > not
> > > used by any PMD. In case it is implemented, would it be a per-queue
> > sequence number?
> > > Is it useful from an application view?
> > >
> > > This field is only used by the librte_reorder library, and in my
> > > opinion, we should consider moving it in the second cache line since
> > it is not filled by the PMD.
> > >
> > >
> > > > For the 32-bit timestamp case, it might be useful to have a
> > > > right-shift value passed in to the ethdev driver. If we assume a
> > NIC
> > > > with nanosecond resolution, (or TSC value with resolution of that
> > > > order of magnitude), then the app can choose to have 1 ns
> > resolution
> > > > with 4 second wraparound, or alternatively 4ns resolution with 16
> > > > second wraparound, or even microsecond resolution with wrap around
> > > > of over
> > > an hour.
> > > > The cost is obviously just a shift op in the driver code per packet
> > > > - hopefully with multiple packets done at a time using vector
> > operations.
> > >
> > >
> > > About the timestamp, we can manage to find 64 bits in the first cache
> > > line, without sacrifying any field we have today. The question is
> > more
> > > for the fields we may want to add later.
> > >
> > > To answer to the question of the size of the timestamp, the first
> > > question is to know what is the precision required for the
> > applications using it?
> > 
> > As part of the 17.02 latency calculation feature, the requirement is to
> > report latency in nanoseconds. So, +1 on keeping timestamp as 64bits.
> > Since the feature is planned for 17.02, can we finalize on the
> > timestamp position in the mbuf struct?
> > Based on the decision, I am planning to make the change and send ABI
> > notice if needed.
> > 
> > Reshma
> > 
> > >
> > > I don't quite like the idea of splitting the timestamp in the 2 cache
> > > lines, I think it would not be easy to use.
> > >
> > >
> > > Olivier
> 
> Nanosecond precision in latency calculations does not require more than 32 bit, unless we must also be able to measure more than 4 seconds of latency. And please consider how many packets we would need to hold in memory to keep track of 4 seconds of traffic on a 100 Gbit/s link (if we are measuring network latency): With an average packet size of 300 B that would be 40 million packets.

Consider another, perhaps more realistic use-case: keeping a few relevant
packets from a stream for later analysis, which could occur outside of the
normal TX processing path for performance reasons. In this case 4 seconds
worth of context may not be enough to avoid losing information.

> If I recall correctly, the 17.02 feature is about measuring application latency, not network latency (as my calculation above is about). I don't consider application latency measuring a common feature for most applications, except for application debugging/profiling purposes (not network debugging purposes). So do we really need both nanosecond precision and the ability to measure 4+ seconds of latency?

Existing HW can already provide 64-bit precision, why truncate part of it?
The only valid reason would be that there is not enough room in the mbuf
header, which is not the case. Looks like today there is even enough room in
the first 64 bytes.

As suggested by Bruce, we can even union (part of) this field with another
if necessary, as long as we make sure both cannot be requested at the same
time (determining which field should be sacrificed is going to be tricky
though).

> Furthermore, if the timestamp isn't set by hardware, why not just take the entire array of packets pulled out from the NIC and set their timestamp to the same value, instead of setting them one by one in the PMD - that would also consolidate the expense of obtaining the time from some clock source.

Like any offload, some PMD processing is expected to convert the possibly
HW-specific value to a common mbuf format, however if hardware does not
report any kind of information relevant to timestamp individual packets, I
do not think the PMD should expose that feature and implement it in
software. It would be much worse, certainly not what the application
wanted as it could have done the same on its own.

What matters is to know when NIC receives a packet, not when the application
fetches it. You should assume there is a significant delay between these two
events, hence the need for it to be offloaded.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] mbuf changes
  2016-10-28 14:11  0%                         ` Morten Brørup
@ 2016-10-28 15:25  0%                           ` Pattan, Reshma
  2016-10-28 16:50  0%                           ` Adrien Mazarguil
  1 sibling, 0 replies; 200+ results
From: Pattan, Reshma @ 2016-10-28 15:25 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz; +Cc: dev



> -----Original Message-----
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Friday, October 28, 2016 3:12 PM
> To: Pattan, Reshma <reshma.pattan@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] mbuf changes
> 
> Comments at the end.
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pattan, Reshma
> > Sent: Friday, October 28, 2016 3:35 PM
> > To: Olivier Matz
> > Cc: dev@dpdk.org; Morten Brørup
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> > Hi Olivier,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Tuesday, October 25, 2016 1:49 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>; Morten Brørup
> > > <mb@smartsharesystems.com>
> > > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> > > <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> > > <olegk@mellanox.com>
> > > Subject: Re: [dpdk-dev] mbuf changes
> > >
> > >
> > >
> > > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > > >> Comments at the end.
> > > >>
> > > >> Med venlig hilsen / kind regards
> > > >> - Morten Brørup
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > > >>> To: Morten Brørup
> > > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > > >>> Oleg Kuporosov
> > > >>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>
> > > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > >>>> Comments inline.
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > > >>>>> Richardson
> > > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> > > >>>>> To: Adrien Mazarguil
> > > >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > > >>>>> Oleg Kuporosov
> > > >>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>>>
> > > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> > wrote:
> > > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > > >>>>>>> Comments inline.
> > > >>>>>>>
> > > >>>>>>> Med venlig hilsen / kind regards
> > > >>>>>>> - Morten Brørup
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> -----Original Message-----
> > > >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> > > >>>>>>>> To: Bruce Richardson
> > > >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier
> > > >>>>>>>> Matz; Oleg Kuporosov
> > > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> > > >>>>>>>>
> > > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > > >>> wrote:
> > > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> > > >>> wrote:
> > > >>>>>>>> [...]
> > > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > >>>>>>>> <mb@smartsharesystems.com> wrote:
> > > >>>>>>>> [...]
> > > >>>>
> > > >>>>>>>>> One other point I'll mention is that we need to have a
> > > >>>>>>>>> discussion on how/where to add in a timestamp value into
> > > >>> the
> > > >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> > > >>>>> sequence
> > > >>>>>>>>> number value, but I also suspect that 32-bits of a
> > > >>> timestamp
> > > >>>>>>>>> is not going to be enough for
> > > >>>>>>>> many.
> > > >>>>>>>>>
> > > >>>>>>>>> Thoughts?
> > > >>>>>>>>
> > > >>>>>>>> If we consider that timestamp representation should use
> > > >>>>> nanosecond
> > > >>>>>>>> granularity, a 32-bit value may likely wrap around too
> > > >>> quickly
> > > >>>>>>>> to be useful. We can also assume that applications
> > requesting
> > > >>>>>>>> timestamps may care more about latency than throughput,
> > > >>>>>>>> Oleg
> > > >>>>> found
> > > >>>>>>>> that using the second cache line for this purpose had a
> > > >>>>> noticeable impact [1].
> > > >>>>>>>>
> > > >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-
> > October/049237.html
> > > >>>>>>>
> > > >>>>>>> I agree with Oleg about the latency vs. throughput
> > > >>>>>>> importance for
> > > >>>>> such applications.
> > > >>>>>>>
> > > >>>>>>> If you need high resolution timestamps, consider them to be
> > > >>>>> generated by the NIC RX driver, possibly by the hardware
> > > >>>>> itself
> > > >>>>> (http://w3new.napatech.com/features/time-precision/hardware-
> > time
> > > >>>>> - stamp), so the timestamp belongs in the first cache line.
> > > >>>>> And I am proposing that it should have the highest possible
> > > >>>>> accuracy, which makes the value hardware dependent.
> > > >>>>>>>
> > > >>>>>>> Furthermore, I am arguing that we leave it up to the
> > > >>> application
> > > >>>>>>> to
> > > >>>>> keep track of the slowly moving bits (i.e. counting whole
> > > >>>>> seconds, hours and calendar date) out of band, so we don't use
> > > >>>>> precious
> > > >>> space
> > > >>>>> in the mbuf. The application doesn't need the NIC RX driver's
> > > >>>>> fast path to capture which date (or even which second) a
> > > >>>>> packet was received. Yes, it adds complexity to the
> > > >>>>> application, but
> > we
> > > >>>>> can't set aside 64 bit for a generic timestamp. Or as a weird
> > tradeoff:
> > > >>>>> Put the fast moving 32 bit in the first cache line and the
> > > >>>>> slow moving 32 bit in the second cache line, as a placeholder
> > > >>>>> for
> > the
> > > >>> application to fill out if needed.
> > > >>>>> Yes, it means that the application needs to check the time and
> > > >>>>> update its variable holding the slow moving time once every
> > > >>>>> second or so; but that should be doable without significant
> > effort.
> > > >>>>>>
> > > >>>>>> That's a good point, however without a 64 bit value, elapsed
> > > >>>>>> time between two arbitrary mbufs cannot be measured reliably
> > > >>>>>> due to
> > > >>> not
> > > >>>>>> enough context, one way or another the low resolution value
> > > >>>>>> is also
> > > >>>>> needed.
> > > >>>>>>
> > > >>>>>> Obviously latency-sensitive applications are unlikely to
> > > >>>>>> perform lengthy buffering and require this but I'm not sure
> > > >>>>>> about all the possible use-cases. Considering many NICs
> > > >>>>>> expose
> > > >>>>>> 64 bit
> > > >>> timestaps,
> > > >>>>>> I suggest we do not truncate them.
> > > >>>>>>
> > > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be
> > > >>>>>> tempted to fill the extra 32 bits whenever they can and
> > > >>>>>> negate the performance improvement of the first cache line.
> > > >>>>>
> > > >>>>> I would tend to agree, and I don't really see any convenient
> > way
> > > >>>>> to avoid putting in a 64-bit field for the timestamp in cache-
> > line 0.
> > > >>>>> If we are ok with having this overlap/partially overlap with
> > > >>>>> sequence number, it will use up an extra 4B of storage in that
> > > >>> cacheline.
> > > >>>>
> > > >>>> I agree about the lack of convenience! And Adrien certainly has
> > a
> > > >>> point about PMD temptations.
> > > >>>>
> > > >>>> However, I still don't think that a NICs ability to date-stamp
> > > >>>> a
> > > >>> packet is sufficient reason to put a date-stamp in cache line 0
> > of
> > > >>> the mbuf. Storing only the fast moving 32 bit in cache line 0
> > > >>> seems like a good compromise to me.
> > > >>>>
> > > >>>> Maybe you can find just one more byte, so it can hold 17
> > > >>>> minutes with nanosecond resolution. (I'm joking!)
> > > >>>>
> > > >>>> Please don't sacrifice the sequence number for the
> > > >>>> seconds/hours/days
> > > >>> part a timestamp. Maybe it could be configurable to use a 32 bit
> > > >>> or
> > > >>> 64 bit timestamp.
> > > >>>>
> > > >>> Do you see both timestamp and sequence numbers being used
> > together?
> > > >>> I would have thought that apps would either use one or the other?
> > > >>> However, your suggestion is workable in any case, to allow the
> > > >>> sequence number to overlap just the high 32 bits of the
> > timestamp,
> > > >>> rather than the low.
> > > >>
> > > >> In our case, I can foresee sequence numbers used for packet
> > > >> processing and
> > > timestamps for timing analysis (and possibly for packet capturing,
> > > when being used). For timing analysis, we don’t need long durations,
> > > e.g. 4 seconds with 32 bit nanosecond resolution suffices. And for
> > > packet capturing we are perfectly capable of adding the slowly
> > > moving
> > > 32 bit of the timestamp to our output data stream without fetching
> > > it
> > from the mbuf.
> > > >>
> > >
> > > We should keep in mind that today we have the seqn field but it is
> > not
> > > used by any PMD. In case it is implemented, would it be a per-queue
> > sequence number?
> > > Is it useful from an application view?
> > >
> > > This field is only used by the librte_reorder library, and in my
> > > opinion, we should consider moving it in the second cache line since
> > it is not filled by the PMD.
> > >
> > >
> > > > For the 32-bit timestamp case, it might be useful to have a
> > > > right-shift value passed in to the ethdev driver. If we assume a
> > NIC
> > > > with nanosecond resolution, (or TSC value with resolution of that
> > > > order of magnitude), then the app can choose to have 1 ns
> > resolution
> > > > with 4 second wraparound, or alternatively 4ns resolution with 16
> > > > second wraparound, or even microsecond resolution with wrap around
> > > > of over
> > > an hour.
> > > > The cost is obviously just a shift op in the driver code per
> > > > packet
> > > > - hopefully with multiple packets done at a time using vector
> > operations.
> > >
> > >
> > > About the timestamp, we can manage to find 64 bits in the first
> > > cache line, without sacrifying any field we have today. The question
> > > is
> > more
> > > for the fields we may want to add later.
> > >
> > > To answer to the question of the size of the timestamp, the first
> > > question is to know what is the precision required for the
> > applications using it?
> >
> > As part of the 17.02 latency calculation feature, the requirement is
> > to report latency in nanoseconds. So, +1 on keeping timestamp as 64bits.
> > Since the feature is planned for 17.02, can we finalize on the
> > timestamp position in the mbuf struct?
> > Based on the decision, I am planning to make the change and send ABI
> > notice if needed.
> >
> > Reshma
> >
> > >
> > > I don't quite like the idea of splitting the timestamp in the 2
> > > cache lines, I think it would not be easy to use.
> > >
> > >
> > > Olivier
> 
> Nanosecond precision in latency calculations does not require more than 32 bit,
> unless we must also be able to measure more than 4 seconds of latency. And
> please consider how many packets we would need to hold in memory to keep
> track of 4 seconds of traffic on a 100 Gbit/s link (if we are measuring network
> latency): With an average packet size of 300 B that would be 40 million packets.
> 

Ok, I got the point, For 17.02 feature, latency is not measured for a particular duration of time, instead
 average latency is measured on the fly for every time stamped packet that is observed on the Tx and 
on Rx side packets are marked with timestamps for every sampling intervals. I should be able to say 
my opinion on the size again after having internal discussion.

Reshma

> If I recall correctly, the 17.02 feature is about measuring application latency, not
> network latency (as my calculation above is about). I don't consider application
> latency measuring a common feature for most applications, except for
> application debugging/profiling purposes (not network debugging purposes). So
> do we really need both nanosecond precision and the ability to measure 4+
> seconds of latency?
> 
> Furthermore, if the timestamp isn't set by hardware, why not just take the entire
> array of packets pulled out from the NIC and set their timestamp to the same
> value, instead of setting them one by one in the PMD - that would also
> consolidate the expense of obtaining the time from some clock source.
> 
> -Morten


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] mbuf changes
  2016-10-28 13:34  3%                       ` Pattan, Reshma
@ 2016-10-28 14:11  0%                         ` Morten Brørup
  2016-10-28 15:25  0%                           ` Pattan, Reshma
  2016-10-28 16:50  0%                           ` Adrien Mazarguil
  0 siblings, 2 replies; 200+ results
From: Morten Brørup @ 2016-10-28 14:11 UTC (permalink / raw)
  To: Pattan, Reshma, Olivier Matz; +Cc: dev

Comments at the end.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pattan, Reshma
> Sent: Friday, October 28, 2016 3:35 PM
> To: Olivier Matz
> Cc: dev@dpdk.org; Morten Brørup
> Subject: Re: [dpdk-dev] mbuf changes
> 
> Hi Olivier,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Tuesday, October 25, 2016 1:49 PM
> > To: Richardson, Bruce <bruce.richardson@intel.com>; Morten Brørup
> > <mb@smartsharesystems.com>
> > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> > <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> > <olegk@mellanox.com>
> > Subject: Re: [dpdk-dev] mbuf changes
> >
> >
> >
> > On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> > >> Comments at the end.
> > >>
> > >> Med venlig hilsen / kind regards
> > >> - Morten Brørup
> > >>
> > >>> -----Original Message-----
> > >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > >>> Sent: Tuesday, October 25, 2016 2:20 PM
> > >>> To: Morten Brørup
> > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > >>> Oleg Kuporosov
> > >>> Subject: Re: [dpdk-dev] mbuf changes
> > >>>
> > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > >>>> Comments inline.
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > >>>>> Richardson
> > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> > >>>>> To: Adrien Mazarguil
> > >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> > >>>>> Oleg Kuporosov
> > >>>>> Subject: Re: [dpdk-dev] mbuf changes
> > >>>>>
> > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil
> wrote:
> > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > >>>>>>> Comments inline.
> > >>>>>>>
> > >>>>>>> Med venlig hilsen / kind regards
> > >>>>>>> - Morten Brørup
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> > >>>>>>>> To: Bruce Richardson
> > >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz;
> > >>>>>>>> Oleg Kuporosov
> > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> > >>>>>>>>
> > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > >>> wrote:
> > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> > >>> wrote:
> > >>>>>>>> [...]
> > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > >>>>>>>> <mb@smartsharesystems.com> wrote:
> > >>>>>>>> [...]
> > >>>>
> > >>>>>>>>> One other point I'll mention is that we need to have a
> > >>>>>>>>> discussion on how/where to add in a timestamp value into
> > >>> the
> > >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> > >>>>> sequence
> > >>>>>>>>> number value, but I also suspect that 32-bits of a
> > >>> timestamp
> > >>>>>>>>> is not going to be enough for
> > >>>>>>>> many.
> > >>>>>>>>>
> > >>>>>>>>> Thoughts?
> > >>>>>>>>
> > >>>>>>>> If we consider that timestamp representation should use
> > >>>>> nanosecond
> > >>>>>>>> granularity, a 32-bit value may likely wrap around too
> > >>> quickly
> > >>>>>>>> to be useful. We can also assume that applications
> requesting
> > >>>>>>>> timestamps may care more about latency than throughput, Oleg
> > >>>>> found
> > >>>>>>>> that using the second cache line for this purpose had a
> > >>>>> noticeable impact [1].
> > >>>>>>>>
> > >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-
> October/049237.html
> > >>>>>>>
> > >>>>>>> I agree with Oleg about the latency vs. throughput importance
> > >>>>>>> for
> > >>>>> such applications.
> > >>>>>>>
> > >>>>>>> If you need high resolution timestamps, consider them to be
> > >>>>> generated by the NIC RX driver, possibly by the hardware itself
> > >>>>> (http://w3new.napatech.com/features/time-precision/hardware-
> time
> > >>>>> - stamp), so the timestamp belongs in the first cache line. And
> > >>>>> I am proposing that it should have the highest possible
> > >>>>> accuracy, which makes the value hardware dependent.
> > >>>>>>>
> > >>>>>>> Furthermore, I am arguing that we leave it up to the
> > >>> application
> > >>>>>>> to
> > >>>>> keep track of the slowly moving bits (i.e. counting whole
> > >>>>> seconds, hours and calendar date) out of band, so we don't use
> > >>>>> precious
> > >>> space
> > >>>>> in the mbuf. The application doesn't need the NIC RX driver's
> > >>>>> fast path to capture which date (or even which second) a packet
> > >>>>> was received. Yes, it adds complexity to the application, but
> we
> > >>>>> can't set aside 64 bit for a generic timestamp. Or as a weird
> tradeoff:
> > >>>>> Put the fast moving 32 bit in the first cache line and the slow
> > >>>>> moving 32 bit in the second cache line, as a placeholder for
> the
> > >>> application to fill out if needed.
> > >>>>> Yes, it means that the application needs to check the time and
> > >>>>> update its variable holding the slow moving time once every
> > >>>>> second or so; but that should be doable without significant
> effort.
> > >>>>>>
> > >>>>>> That's a good point, however without a 64 bit value, elapsed
> > >>>>>> time between two arbitrary mbufs cannot be measured reliably
> > >>>>>> due to
> > >>> not
> > >>>>>> enough context, one way or another the low resolution value is
> > >>>>>> also
> > >>>>> needed.
> > >>>>>>
> > >>>>>> Obviously latency-sensitive applications are unlikely to
> > >>>>>> perform lengthy buffering and require this but I'm not sure
> > >>>>>> about all the possible use-cases. Considering many NICs expose
> > >>>>>> 64 bit
> > >>> timestaps,
> > >>>>>> I suggest we do not truncate them.
> > >>>>>>
> > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be
> > >>>>>> tempted to fill the extra 32 bits whenever they can and negate
> > >>>>>> the performance improvement of the first cache line.
> > >>>>>
> > >>>>> I would tend to agree, and I don't really see any convenient
> way
> > >>>>> to avoid putting in a 64-bit field for the timestamp in cache-
> line 0.
> > >>>>> If we are ok with having this overlap/partially overlap with
> > >>>>> sequence number, it will use up an extra 4B of storage in that
> > >>> cacheline.
> > >>>>
> > >>>> I agree about the lack of convenience! And Adrien certainly has
> a
> > >>> point about PMD temptations.
> > >>>>
> > >>>> However, I still don't think that a NICs ability to date-stamp a
> > >>> packet is sufficient reason to put a date-stamp in cache line 0
> of
> > >>> the mbuf. Storing only the fast moving 32 bit in cache line 0
> > >>> seems like a good compromise to me.
> > >>>>
> > >>>> Maybe you can find just one more byte, so it can hold 17 minutes
> > >>>> with nanosecond resolution. (I'm joking!)
> > >>>>
> > >>>> Please don't sacrifice the sequence number for the
> > >>>> seconds/hours/days
> > >>> part a timestamp. Maybe it could be configurable to use a 32 bit
> > >>> or
> > >>> 64 bit timestamp.
> > >>>>
> > >>> Do you see both timestamp and sequence numbers being used
> together?
> > >>> I would have thought that apps would either use one or the other?
> > >>> However, your suggestion is workable in any case, to allow the
> > >>> sequence number to overlap just the high 32 bits of the
> timestamp,
> > >>> rather than the low.
> > >>
> > >> In our case, I can foresee sequence numbers used for packet
> > >> processing and
> > timestamps for timing analysis (and possibly for packet capturing,
> > when being used). For timing analysis, we don’t need long durations,
> > e.g. 4 seconds with 32 bit nanosecond resolution suffices. And for
> > packet capturing we are perfectly capable of adding the slowly moving
> > 32 bit of the timestamp to our output data stream without fetching it
> from the mbuf.
> > >>
> >
> > We should keep in mind that today we have the seqn field but it is
> not
> > used by any PMD. In case it is implemented, would it be a per-queue
> sequence number?
> > Is it useful from an application view?
> >
> > This field is only used by the librte_reorder library, and in my
> > opinion, we should consider moving it in the second cache line since
> it is not filled by the PMD.
> >
> >
> > > For the 32-bit timestamp case, it might be useful to have a
> > > right-shift value passed in to the ethdev driver. If we assume a
> NIC
> > > with nanosecond resolution, (or TSC value with resolution of that
> > > order of magnitude), then the app can choose to have 1 ns
> resolution
> > > with 4 second wraparound, or alternatively 4ns resolution with 16
> > > second wraparound, or even microsecond resolution with wrap around
> > > of over
> > an hour.
> > > The cost is obviously just a shift op in the driver code per packet
> > > - hopefully with multiple packets done at a time using vector
> operations.
> >
> >
> > About the timestamp, we can manage to find 64 bits in the first cache
> > line, without sacrifying any field we have today. The question is
> more
> > for the fields we may want to add later.
> >
> > To answer to the question of the size of the timestamp, the first
> > question is to know what is the precision required for the
> applications using it?
> 
> As part of the 17.02 latency calculation feature, the requirement is to
> report latency in nanoseconds. So, +1 on keeping timestamp as 64bits.
> Since the feature is planned for 17.02, can we finalize on the
> timestamp position in the mbuf struct?
> Based on the decision, I am planning to make the change and send ABI
> notice if needed.
> 
> Reshma
> 
> >
> > I don't quite like the idea of splitting the timestamp in the 2 cache
> > lines, I think it would not be easy to use.
> >
> >
> > Olivier

Nanosecond precision in latency calculations does not require more than 32 bit, unless we must also be able to measure more than 4 seconds of latency. And please consider how many packets we would need to hold in memory to keep track of 4 seconds of traffic on a 100 Gbit/s link (if we are measuring network latency): With an average packet size of 300 B that would be 40 million packets.

If I recall correctly, the 17.02 feature is about measuring application latency, not network latency (as my calculation above is about). I don't consider application latency measuring a common feature for most applications, except for application debugging/profiling purposes (not network debugging purposes). So do we really need both nanosecond precision and the ability to measure 4+ seconds of latency?

Furthermore, if the timestamp isn't set by hardware, why not just take the entire array of packets pulled out from the NIC and set their timestamp to the same value, instead of setting them one by one in the PMD - that would also consolidate the expense of obtaining the time from some clock source.

-Morten


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] mbuf changes
  @ 2016-10-28 13:34  3%                       ` Pattan, Reshma
  2016-10-28 14:11  0%                         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Pattan, Reshma @ 2016-10-28 13:34 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Morten Brørup

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, October 25, 2016 1:49 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>; Morten Brørup
> <mb@smartsharesystems.com>
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> <olegk@mellanox.com>
> Subject: Re: [dpdk-dev] mbuf changes
> 
> 
> 
> On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> >> Comments at the end.
> >>
> >> Med venlig hilsen / kind regards
> >> - Morten Brørup
> >>
> >>> -----Original Message-----
> >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> >>> Sent: Tuesday, October 25, 2016 2:20 PM
> >>> To: Morten Brørup
> >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> >>> Kuporosov
> >>> Subject: Re: [dpdk-dev] mbuf changes
> >>>
> >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> >>>> Comments inline.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> >>>>> Richardson
> >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> >>>>> To: Adrien Mazarguil
> >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> >>>>> Kuporosov
> >>>>> Subject: Re: [dpdk-dev] mbuf changes
> >>>>>
> >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> >>>>>>> Comments inline.
> >>>>>>>
> >>>>>>> Med venlig hilsen / kind regards
> >>>>>>> - Morten Brørup
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> >>>>>>>> To: Bruce Richardson
> >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz;
> >>>>>>>> Oleg Kuporosov
> >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> >>>>>>>>
> >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> >>> wrote:
> >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> >>> wrote:
> >>>>>>>> [...]
> >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> >>>>>>>> <mb@smartsharesystems.com> wrote:
> >>>>>>>> [...]
> >>>>
> >>>>>>>>> One other point I'll mention is that we need to have a
> >>>>>>>>> discussion on how/where to add in a timestamp value into
> >>> the
> >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> >>>>> sequence
> >>>>>>>>> number value, but I also suspect that 32-bits of a
> >>> timestamp
> >>>>>>>>> is not going to be enough for
> >>>>>>>> many.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> If we consider that timestamp representation should use
> >>>>> nanosecond
> >>>>>>>> granularity, a 32-bit value may likely wrap around too
> >>> quickly
> >>>>>>>> to be useful. We can also assume that applications requesting
> >>>>>>>> timestamps may care more about latency than throughput, Oleg
> >>>>> found
> >>>>>>>> that using the second cache line for this purpose had a
> >>>>> noticeable impact [1].
> >>>>>>>>
> >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> >>>>>>>
> >>>>>>> I agree with Oleg about the latency vs. throughput importance
> >>>>>>> for
> >>>>> such applications.
> >>>>>>>
> >>>>>>> If you need high resolution timestamps, consider them to be
> >>>>> generated by the NIC RX driver, possibly by the hardware itself
> >>>>> (http://w3new.napatech.com/features/time-precision/hardware-time-
> >>>>> stamp), so the timestamp belongs in the first cache line. And I am
> >>>>> proposing that it should have the highest possible accuracy, which
> >>>>> makes the value hardware dependent.
> >>>>>>>
> >>>>>>> Furthermore, I am arguing that we leave it up to the
> >>> application
> >>>>>>> to
> >>>>> keep track of the slowly moving bits (i.e. counting whole seconds,
> >>>>> hours and calendar date) out of band, so we don't use precious
> >>> space
> >>>>> in the mbuf. The application doesn't need the NIC RX driver's fast
> >>>>> path to capture which date (or even which second) a packet was
> >>>>> received. Yes, it adds complexity to the application, but we can't
> >>>>> set aside 64 bit for a generic timestamp. Or as a weird tradeoff:
> >>>>> Put the fast moving 32 bit in the first cache line and the slow
> >>>>> moving 32 bit in the second cache line, as a placeholder for the
> >>> application to fill out if needed.
> >>>>> Yes, it means that the application needs to check the time and
> >>>>> update its variable holding the slow moving time once every second
> >>>>> or so; but that should be doable without significant effort.
> >>>>>>
> >>>>>> That's a good point, however without a 64 bit value, elapsed time
> >>>>>> between two arbitrary mbufs cannot be measured reliably due to
> >>> not
> >>>>>> enough context, one way or another the low resolution value is
> >>>>>> also
> >>>>> needed.
> >>>>>>
> >>>>>> Obviously latency-sensitive applications are unlikely to perform
> >>>>>> lengthy buffering and require this but I'm not sure about all the
> >>>>>> possible use-cases. Considering many NICs expose 64 bit
> >>> timestaps,
> >>>>>> I suggest we do not truncate them.
> >>>>>>
> >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be tempted
> >>>>>> to fill the extra 32 bits whenever they can and negate the
> >>>>>> performance improvement of the first cache line.
> >>>>>
> >>>>> I would tend to agree, and I don't really see any convenient way
> >>>>> to avoid putting in a 64-bit field for the timestamp in cache-line 0.
> >>>>> If we are ok with having this overlap/partially overlap with
> >>>>> sequence number, it will use up an extra 4B of storage in that
> >>> cacheline.
> >>>>
> >>>> I agree about the lack of convenience! And Adrien certainly has a
> >>> point about PMD temptations.
> >>>>
> >>>> However, I still don't think that a NICs ability to date-stamp a
> >>> packet is sufficient reason to put a date-stamp in cache line 0 of
> >>> the mbuf. Storing only the fast moving 32 bit in cache line 0 seems
> >>> like a good compromise to me.
> >>>>
> >>>> Maybe you can find just one more byte, so it can hold 17 minutes
> >>>> with nanosecond resolution. (I'm joking!)
> >>>>
> >>>> Please don't sacrifice the sequence number for the
> >>>> seconds/hours/days
> >>> part a timestamp. Maybe it could be configurable to use a 32 bit or
> >>> 64 bit timestamp.
> >>>>
> >>> Do you see both timestamp and sequence numbers being used together?
> >>> I would have thought that apps would either use one or the other?
> >>> However, your suggestion is workable in any case, to allow the
> >>> sequence number to overlap just the high 32 bits of the timestamp,
> >>> rather than the low.
> >>
> >> In our case, I can foresee sequence numbers used for packet processing and
> timestamps for timing analysis (and possibly for packet capturing, when being
> used). For timing analysis, we don’t need long durations, e.g. 4 seconds with 32
> bit nanosecond resolution suffices. And for packet capturing we are perfectly
> capable of adding the slowly moving 32 bit of the timestamp to our output data
> stream without fetching it from the mbuf.
> >>
> 
> We should keep in mind that today we have the seqn field but it is not used by
> any PMD. In case it is implemented, would it be a per-queue sequence number?
> Is it useful from an application view?
> 
> This field is only used by the librte_reorder library, and in my opinion, we should
> consider moving it in the second cache line since it is not filled by the PMD.
> 
> 
> > For the 32-bit timestamp case, it might be useful to have a
> > right-shift value passed in to the ethdev driver. If we assume a NIC
> > with nanosecond resolution, (or TSC value with resolution of that
> > order of magnitude), then the app can choose to have 1 ns resolution
> > with 4 second wraparound, or alternatively 4ns resolution with 16
> > second wraparound, or even microsecond resolution with wrap around of over
> an hour.
> > The cost is obviously just a shift op in the driver code per packet -
> > hopefully with multiple packets done at a time using vector operations.
> 
> 
> About the timestamp, we can manage to find 64 bits in the first cache line,
> without sacrifying any field we have today. The question is more for the fields
> we may want to add later.
> 
> To answer to the question of the size of the timestamp, the first question is to
> know what is the precision required for the applications using it?

As part of the 17.02 latency calculation feature, the requirement is to report latency in nanoseconds. So, +1 on keeping timestamp as 64bits. 
Since the feature is planned for 17.02, can we finalize on the timestamp position in the mbuf struct? 
Based on the decision, I am planning to make the change and send ABI notice if needed.

Reshma

> 
> I don't quite like the idea of splitting the timestamp in the 2 cache lines, I think it
> would not be easy to use.
> 
> 
> Olivier

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7 00/21] Introduce SoC device/driver framework for EAL
  2016-10-28 12:26  2%   ` [dpdk-dev] [PATCH v7 " Shreyansh Jain
@ 2016-10-28 12:35  0%     ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-10-28 12:35 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon

On Friday 28 October 2016 05:56 PM, Shreyansh Jain wrote:
> Introduction:
> =============
>
> This patch set is direct derivative of Jan's original series [1],[2].
>
>  - This version is based on master HEAD (ca41215)
>
>  - In this, I am merging the series [11] back. It was initially part
>    of this set but I had split considering that those changes in PCI
>    were good standalone as well. But, 1) not much feedback was avail-
>    able and 2) this patchset is a use-case for those patches making
>    it easier to review. Just like what Jan had intended in original
>    series.
>
>  - SoC support is not enabled by default. It needs the 'enable-soc' toggle
>    on command line. This is primarily because this patchset is still
>    experimental and we would like to keep it isolated from non-SoC ops.
>    Though, it does impact the ABI.

Sending v7 as patch 11/21 of v6 wasn't received by mailing list leading 
to automated build failure. No other change has been done.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v7 00/21] Introduce SoC device/driver framework for EAL
  2016-10-27 15:17  2% ` [dpdk-dev] [PATCH v6 " Shreyansh Jain
@ 2016-10-28 12:26  2%   ` Shreyansh Jain
  2016-10-28 12:35  0%     ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-10-28 12:26 UTC (permalink / raw)
  To: dev; +Cc: Shreyansh Jain, thomas.monjalon, viktorin

Introduction:
=============

This patch set is direct derivative of Jan's original series [1],[2].

 - This version is based on master HEAD (ca41215)
 
 - In this, I am merging the series [11] back. It was initially part
   of this set but I had split considering that those changes in PCI
   were good standalone as well. But, 1) not much feedback was avail-
   able and 2) this patchset is a use-case for those patches making
   it easier to review. Just like what Jan had intended in original
   series.

 - SoC support is not enabled by default. It needs the 'enable-soc' toggle
   on command line. This is primarily because this patchset is still
   experimental and we would like to keep it isolated from non-SoC ops.
   Though, it does impact the ABI.

Aim:
====

As of now EAL is primarly focused on PCI initialization/probing.

 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  `- rte_eal_pci_probe(): Driver<=>Device initialization

This patchset introduces SoC framework which would enable SoC drivers and
drivers to be plugged into EAL, very similar to how PCI drivers/devices are
done today.

This is a stripped down version of PCI framework which allows the SoC PMDs
to implement their own routines for detecting devices and linking devices to
drivers.

1) Changes to EAL
 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- rte_eal_soc_init(): Calls PMDs->scan_fn
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  |- rte_eal_pci_probe(): Driver<=>Device initialization, PMD->devinit()
  `- rte_eal_soc_probe(): Calls PMDs->match_fn and PMDs->devinit();

2) New device/driver structures:
  - rte_soc_driver (inheriting rte_driver)
  - rte_soc_device (inheriting rte_device)
  - rte_eth_dev and eth_driver embedded rte_soc_device and rte_soc_driver,
    respectively.

3) The SoC PMDs need to:
 - define rte_soc_driver with necessary scan and match callbacks
 - Register themselves using DRIVER_REGISTER_SOC()
 - Implement respective bus scanning in the scan callbacks to add necessary
   devices to SoC device list
 - Implement necessary eth_dev_init/uninint for ethernet instances

4) Design considerations that are same as PCI:
 - SoC initialization is being done through rte_eal_init(), just after PCI
   initialization is done.
 - As in case of PCI, probe is done after rte_eal_pci_probe() to link the
   devices detected with the drivers registered.
 - Device attach/detach functions are available and have been designed on
   the lines of PCI framework.
 - PMDs register using DRIVER_REGISTER_SOC, very similar to
   DRIVER_REGISTER_PCI for PCI devices.
 - Linked list of SoC driver and devices exists independent of the other
   driver/device list, but inheriting rte_driver/rte_driver, these are
   also part of a global list.

5) Design considerations that are different from PCI:
 - Each driver implements its own scan and match function. PCI uses the BDF
   format to read the device from sysfs, but this _may_not_ be a case for a
   SoC ethernet device.
   = This is an important change from initial proposal by Jan in [2].
   Unlike his attempt to use /sys/bus/platform, this patch relies on the
   PMD to detect the devices. This is because SoC may require specific or
   additional info for device detection. Further, SoC may have embedded
   devices/MACs which require initialization which cannot be covered
   through sysfs parsing.
   `-> Point (6) below is a side note to above.
   = PCI based PMDs rely on EAL's capability to detect devices. This
   proposal puts the onus on PMD to detect devices, add to soc_device_list
   and wait for Probe. Matching, of device<=>driver is again PMD's
   callback.

6) Adding default scan and match helpers for PMDs
 - The design warrrants the PMDs implement their own scan of devices
   on bus, and match routines for probe implementation.
   This patch introduces helpers which can be used by PMDs for scan of
   the platform bus and matching devices against the compatible string
   extracted from the scan.
 - Intention is to make it easier to integrate known SoC which expose
   platform bus compliant information (compat, sys/bus/platform...).
 - PMDs which have deviations from this standard model can implement and
   hook their bus scanning and probe match callbacks while registering
   driver.

Patchset Overview:
==================
 - Patches 0001~0004 are from [11] - moving some PCI specific functions
   and definitions to non-PCI area.
 - Patches 0005~0008 introduce the base infrastructure and test case
 - Patch 0009 is for command line support for no-soc, on lines of no-pci
 - Patch 0010 enables EAL to handle SoC type devices
 - Patch 0011 adds support for scan and probe callbacks and updates the test
   framework with relevant test case.
 - Patch 0012~0014 enable device argument, driver specific flags and
   interrupt handling related basic infra. Subsequent patches build up on
   them.
 - Patch 0015~0016 add support for default function which PMDs can use for
   scanning platform bus. These functions are optional and need to be hooked
   to by PMDs.
 - Patch 0017~0019 makes changes to PCI as well as ethdev code to remove
   assumption that eth_driver is a PCI driver.
 - Patch 0020 adds necessary ethdev probe/remove functions for PMDs to use
 - Patch 0021 adds support for SoC driver/devices, along with probe/remove
   functions for Cryptodev devices.

Future/Pending Changes:
=======================
- Device whitelisting/blacklist still relies on command line '-b' and '-c'
  which are internally implemented using OPT_PCI_BLACKLIST/OPT_PCI_WHITELIST.
  This needs to be changed to a generic form - OPT_DEV_*LIST - probably.
- No cryptodriver currently uses SoC framework - probably a example driver
  can be created to demonstrate usage.
- test case for enable-soc command line parameter
- This patch impacts a couple of ABIs (rte_device/driver) and thus require
  a patch for bump of libraries (if need be) and documentation.

 [1] http://dpdk.org/ml/archives/dev/2016-January/030915.html
 [2] http://www.dpdk.org/ml/archives/dev/2016-May/038486.html
 [3] http://dpdk.org/ml/archives/dev/2016-August/045707.html
 [4] http://dpdk.org/ml/archives/dev/2016-May/038948.html
 [5] http://dpdk.org/ml/archives/dev/2016-May/038953.html
 [6] http://dpdk.org/ml/archives/dev/2016-May/038487.html
 [7] http://dpdk.org/ml/archives/dev/2016-May/038488.html
 [8] http://dpdk.org/ml/archives/dev/2016-May/038489.html
 [9] http://dpdk.org/ml/archives/dev/2016-May/038491.html
[10] http://dpdk.org/ml/archives/dev/2016-September/046256.html
[11] http://dpdk.org/ml/archives/dev/2016-October/048915.html

Changes since v6;
 - Fix patch rebase over HEAD (ca41215)

Changes since v5:
 - fix devinit/devuninit name change; it was in wrong patch
 - rebased over HEAD (ca41215)
 - Update to pending section of coverletter

Changes since v4:
 - change name of rte_eal_soc_scan function name to
   rte_eal_soc_scan_platform_bus. This still remains a helper function.
 - Fix comments over scan and match functions

Changes since v3:
 - rebasing over HEAD (fed622df tag: v16.11-rc1)
 - Add support for default scan function; PMD can use this for
   scanning on platform bus.
 - Support for kernel driver bind/unbind, numa and DMA from
   Jan's original patches.
 - SoC is disabled by default. '--enable-soc' command line parameter
   enables it. doc updated.
 - Updated testcase function names and comments
 - Map file addition alphabetically ordered
 - Patch author corrected

Changes since v2:
 - Rebasing over rte_driver/device patchset v9 [10]
 - Added cryptodev support for SoC
 - Default match function for SoC device<=>Driver
 - Some variables renamed to reflect 'drv' rather than 'dr'

Change since v1 [2]:
 - Removed patch 1-5 which were for generalizing some PCI specific routines
   into EAL. These patches are good-to-have but not directly linked to SoC
   and hence would be proposed separately.
 - Removed support for sysfs parsing (patches 6~9)
 - Rebasing over the recent (v8) version of rte_driver/device patchset
 - Rebasing over master (16.07)
 - Changes to various map file to change API intro to 16.11 from 16.07

Jan Viktorin (19):
  eal: generalize PCI kernel driver enum to EAL
  eal: generalize PCI map/unmap resource to EAL
  eal/linux: generalize PCI kernel unbinding driver to EAL
  eal/linux: generalize PCI kernel driver extraction to EAL
  eal: define container macro
  eal/soc: introduce very essential SoC infra definitions
  eal/soc: add SoC PMD register/unregister logic
  eal/soc: implement SoC device list and dump
  eal: introduce command line enable SoC option
  eal/soc: init SoC infra from EAL
  eal/soc: extend and utilize devargs
  eal/soc: add drv_flags
  eal/soc: add intr_handle
  eal/soc: add default scan for Soc devices
  eal/soc: additional features for SoC
  ether: utilize container_of for pci_drv
  ether: verify we copy info from a PCI device
  ether: extract function eth_dev_get_intr_handle
  ether: introduce ethernet dev probe remove

Shreyansh Jain (2):
  eal/soc: implement probing of drivers
  eal/crypto: Support rte_soc_driver/device for cryptodev

 app/test/Makefile                               |   1 +
 app/test/test_soc.c                             | 404 +++++++++++++++++++
 doc/guides/testpmd_app_ug/run_app.rst           |   4 +
 lib/librte_cryptodev/rte_cryptodev.c            | 122 +++++-
 lib/librte_cryptodev/rte_cryptodev.h            |   3 +
 lib/librte_cryptodev/rte_cryptodev_pmd.h        |  18 +-
 lib/librte_cryptodev/rte_cryptodev_version.map  |   2 +
 lib/librte_eal/bsdapp/eal/Makefile              |   1 +
 lib/librte_eal/bsdapp/eal/eal.c                 |  18 +
 lib/librte_eal/bsdapp/eal/eal_pci.c             |   6 +-
 lib/librte_eal/bsdapp/eal/eal_soc.c             |  46 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  11 +
 lib/librte_eal/common/Makefile                  |   2 +-
 lib/librte_eal/common/eal_common_dev.c          |  66 ++-
 lib/librte_eal/common/eal_common_devargs.c      |  17 +
 lib/librte_eal/common/eal_common_options.c      |   5 +
 lib/librte_eal/common/eal_common_pci.c          |  39 --
 lib/librte_eal/common/eal_common_pci_uio.c      |  16 +-
 lib/librte_eal/common/eal_common_soc.c          | 368 +++++++++++++++++
 lib/librte_eal/common/eal_internal_cfg.h        |   1 +
 lib/librte_eal/common/eal_options.h             |   2 +
 lib/librte_eal/common/eal_private.h             |  64 +++
 lib/librte_eal/common/include/rte_common.h      |  18 +
 lib/librte_eal/common/include/rte_dev.h         |  44 ++
 lib/librte_eal/common/include/rte_devargs.h     |   8 +
 lib/librte_eal/common/include/rte_pci.h         |  41 --
 lib/librte_eal/common/include/rte_soc.h         | 322 +++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile            |   2 +
 lib/librte_eal/linuxapp/eal/eal.c               |  63 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  62 +--
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c       |   2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c      |   5 +-
 lib/librte_eal/linuxapp/eal/eal_soc.c           | 515 ++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  11 +
 lib/librte_ether/rte_ethdev.c                   | 166 +++++++-
 lib/librte_ether/rte_ethdev.h                   |  33 +-
 36 files changed, 2343 insertions(+), 165 deletions(-)
 create mode 100644 app/test/test_soc.c
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_soc.c
 create mode 100644 lib/librte_eal/common/eal_common_soc.c
 create mode 100644 lib/librte_eal/common/include/rte_soc.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_soc.c

-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library
    2016-10-28  1:04  2% ` [dpdk-dev] [RFC PATCH v2 1/3] lib: add information metrics library Remy Horton
@ 2016-10-28  1:04  2% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-10-28  1:04 UTC (permalink / raw)
  To: dev

This patch adds a library that calculates peak and average data-rate
statistics. For ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 126 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 lib/librte_metrics/rte_metrics.c                   |  22 ++--
 lib/librte_metrics/rte_metrics.h                   |   4 +-
 mk/rte.app.mk                                      |   1 +
 11 files changed, 291 insertions(+), 12 deletions(-)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/config/common_base b/config/common_base
index c23a632..e778c00 100644
--- a/config/common_base
+++ b/config/common_base
@@ -597,3 +597,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the device metrics library
 #
 CONFIG_RTE_LIBRTE_METRICS=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index ca50fa6..91e8ea6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -148,4 +148,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [Device Metrics]     (@ref rte_metrics.h),
+  [Bitrate Statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fe830eb..8765ddd 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -58,6 +58,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_ring \
                           lib/librte_sched \
                           lib/librte_metrics \
+                          lib/librte_bitratestats \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/lib/Makefile b/lib/Makefile
index 5d85dcf..e211bc0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -59,6 +59,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..b725d4e
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..fcdf401
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,126 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+
+struct rte_stats_bitrate_s {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates_s {
+	struct rte_stats_bitrate_s port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+
+struct rte_stats_bitrates_s *rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates_s), 0);
+}
+
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data)
+{
+	const char *names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	bitrate_data = rte_stats_bitrate_create();
+	if (bitrate_data == NULL)
+		rte_exit(EXIT_FAILURE, "Could not allocate bitrate data.\n");
+	return_value = rte_metrics_reg_metrics(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate_s *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	delta = (delta * alpha_percent) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_metrics(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..cd566d6
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+
+/**
+ *  Bitrate statistics data structure
+ */
+struct rte_stats_bitrates_s;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates_s *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates_s *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period this function
+ * is called should be the intended time window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates_s *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..9de6be9
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_16.11 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
index 354ba47..7dce738 100644
--- a/lib/librte_metrics/rte_metrics.c
+++ b/lib/librte_metrics/rte_metrics.c
@@ -44,6 +44,8 @@
 #define RTE_METRICS_MAX_METRICS 256
 
 
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
 /**
  * Internal stats metadata and value entry.
  *
@@ -62,7 +64,7 @@
  * having a separate set metadata table doesn't save any memory.
  */
 struct rte_metrics_meta_s {
-	struct rte_stat_name name;
+	char name[RTE_METRICS_MAX_NAME_LEN];
 	uint64_t value[RTE_MAX_ETHPORTS];
 	uint16_t idx_next_set;
 	uint16_t idx_next_stat;
@@ -100,10 +102,10 @@ rte_metrics_init(void)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return;
 
-	memzone = rte_memzone_lookup("RTE_STATS");
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
 	if (memzone != NULL)
 		return;
-	memzone = rte_memzone_reserve("RTE_STATS",
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
 		sizeof(struct rte_metrics_data_s), rte_socket_id(), 0);
 	if (memzone == NULL)
 		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
@@ -135,7 +137,7 @@ rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
 		return -EINVAL;
 
 	rte_metrics_init();
-	memzone = rte_memzone_lookup("RTE_STATS");
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
 	if (memzone == NULL)
 		return -EIO;
 	stats = memzone->addr;
@@ -150,7 +152,7 @@ rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
 
 	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
 		entry = &stats->metadata[idx_name + stats->cnt_stats];
-		strncpy(entry->name.name, names[idx_name],
+		strncpy(entry->name, names[idx_name],
 			RTE_METRICS_MAX_NAME_LEN);
 		memset(entry->value, 0, sizeof(entry->value));
 		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
@@ -184,7 +186,7 @@ rte_metrics_update_metrics(int port_id,
 	uint16_t cnt_setsize;
 
 	rte_metrics_init();
-	memzone = rte_memzone_lookup("RTE_STATS");
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
 	if (memzone == NULL)
 		return -EIO;
 	stats = memzone->addr;
@@ -211,14 +213,14 @@ rte_metrics_update_metrics(int port_id,
 
 
 int
-rte_metrics_get_names(struct rte_stat_name *names,
+rte_metrics_get_names(struct rte_metric_name *names,
 	uint16_t capacity)
 {
 	struct rte_metrics_data_s *stats;
 	const struct rte_memzone *memzone;
 	uint16_t idx_name;
 
-	memzone = rte_memzone_lookup("RTE_STATS");
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
 	/* If not allocated, fail silently */
 	if (memzone == NULL)
 		return 0;
@@ -229,7 +231,7 @@ rte_metrics_get_names(struct rte_stat_name *names,
 			return -ERANGE;
 		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
 			strncpy(names[idx_name].name,
-				stats->metadata[idx_name].name.name,
+				stats->metadata[idx_name].name,
 				RTE_METRICS_MAX_NAME_LEN);
 	}
 	return stats->cnt_stats;
@@ -246,7 +248,7 @@ rte_metrics_get_values(int port_id,
 	const struct rte_memzone *memzone;
 	uint16_t idx_name;
 
-	memzone = rte_memzone_lookup("RTE_STATS");
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
 	/* If not allocated, fail silently */
 	if (memzone == NULL)
 		return 0;
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
index e8888eb..348db30 100644
--- a/lib/librte_metrics/rte_metrics.h
+++ b/lib/librte_metrics/rte_metrics.h
@@ -52,7 +52,7 @@
 /**
  * Statistic name
  */
-struct rte_stat_name {
+struct rte_metric_name {
 	/** String describing statistic */
 	char name[RTE_METRICS_MAX_NAME_LEN];
 };
@@ -130,7 +130,7 @@ int rte_metrics_reg_metrics(const char **names, uint16_t cnt_names);
  *   - Negative: Failure
  */
 int rte_metrics_get_names(
-	struct rte_stat_name *names,
+	struct rte_metric_name *names,
 	uint16_t capacity);
 
 /**
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 11d0c80..c499c69 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [RFC PATCH v2 1/3] lib: add information metrics library
  @ 2016-10-28  1:04  2% ` Remy Horton
  2016-10-28  1:04  2% ` [dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2016-10-28  1:04 UTC (permalink / raw)
  To: dev

This patch adds a new information metric library that allows other
modules to register named metrics and update their values. It is
intended to be independent of ethdev, rather than mixing ethdev
and non-ethdev information in xstats.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 ++++++
 lib/librte_metrics/rte_metrics.c           | 265 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 200 ++++++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 9 files changed, 539 insertions(+)
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/config/common_base b/config/common_base
index f5d2eff..c23a632 100644
--- a/config/common_base
+++ b/config/common_base
@@ -592,3 +592,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..ca50fa6 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -147,4 +147,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [Device Metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..fe830eb 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -57,6 +57,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_reorder \
                           lib/librte_ring \
                           lib/librte_sched \
+                          lib/librte_metrics \
                           lib/librte_table \
                           lib/librte_timer \
                           lib/librte_vhost
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..5d85dcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..354ba47
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,265 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+
+
+#define RTE_METRICS_MAX_METRICS 256
+
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ * @param name
+ *   Name of metric
+ * @param value
+ *   Current value for metric
+ * @param idx_next_set
+ *   Index of next root element (zero for none)
+ * @param idx_next_metric
+ *   Index of next metric in set (zero for none)
+ *
+ * Only the root of each set needs idx_next_set but since it has to be
+ * assumed that number of sets could equal total number of metrics,
+ * having a separate set metadata table doesn't save any memory.
+ */
+struct rte_metrics_meta_s {
+	struct rte_stat_name name;
+	uint64_t value[RTE_MAX_ETHPORTS];
+	uint16_t idx_next_set;
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * @param idx_last_set
+ *   Index of last metadata entry with valid data. This value is
+ *   not valid if cnt_stats is zero.
+ * @param cnt_stats
+ *   Number of metrics.
+ * @param metadata
+ *   Stat data memory block.
+ *
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	uint16_t idx_last_set;
+	uint16_t cnt_stats;
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+};
+
+
+void
+rte_metrics_init(void)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup("RTE_STATS");
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve("RTE_STATS",
+		sizeof(struct rte_metrics_data_s), rte_socket_id(), 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+}
+
+
+int
+rte_metrics_reg_metric(const char *name)
+{
+	const char *list_names[] = {name};
+
+	return rte_metrics_reg_metrics(list_names, 1);
+}
+
+
+int
+rte_metrics_reg_metrics(const char **names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup("RTE_STATS");
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name.name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	return idx_base;
+}
+
+
+int
+rte_metrics_update_metric(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_metrics(port_id, key, &value, 1);
+}
+
+
+int
+rte_metrics_update_metrics(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	rte_metrics_init();
+	memzone = rte_memzone_lookup("RTE_STATS");
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize)
+		return -ERANGE;
+
+	for (idx_value = 0; idx_value < count; idx_value++) {
+		idx_metric = key + idx_value;
+		stats->metadata[idx_metric].value[port_id] = values[idx_value];
+	}
+	return 0;
+}
+
+
+int
+rte_metrics_get_names(struct rte_stat_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+
+	memzone = rte_memzone_lookup("RTE_STATS");
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats)
+			return -ERANGE;
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name.name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return stats->cnt_stats;
+}
+
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_stat_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+
+	memzone = rte_memzone_lookup("RTE_STATS");
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats)
+			return -ERANGE;
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++) {
+			entry = &stats->metadata[idx_name];
+			values[idx_name].key = idx_name;
+			values[idx_name].value = entry->value[port_id];
+		}
+	}
+	return stats->cnt_stats;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..e8888eb
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,200 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * RTE Statistics module
+ *
+ * Statistic information is populated using callbacks, each of which
+ * is associated with one or more metric names. When queried, the
+ * callback is used to update all metric in the set at once. Currently
+ * only bulk querying of all metric is supported.
+ *
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of statistic name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/**
+ * Statistic name
+ */
+struct rte_stat_name {
+	/** String describing statistic */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Statistic name.
+ */
+struct rte_stat_value {
+	/** Numeric identifier of statistic */
+	uint16_t key;
+	/** Value for statistic */
+	uint64_t value;
+};
+
+
+/**
+ * Initialises statistic module. This only has to be explicitly called if you
+ * intend to use rte_metrics_reg_metric() or rte_metrics_reg_metrics() from a
+ * secondary process.
+ */
+void rte_metrics_init(void);
+
+
+/**
+ * Register a statistic and its associated callback.
+ *
+ * @param name
+ *   Statistic name
+ *
+ * @param callback
+ *   Callback to use when fetching statistic
+ *
+ * @param data
+ *   Data pointer to pass to callback
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metric(const char *name);
+
+/**
+ * Register a set of statistic and their associated callback.
+ *
+ * @param names
+ *   List of statistic names
+ *
+ * @param cnt_names
+ *   Number of statistics in set
+ *
+ * @param callback
+ *   Callback to use when fetching statsitics
+ *
+ * @param data
+ *   Data pointer to pass to callback
+ *
+ * @return
+ *  - Zero or positive: Success
+ *  - Negative: Failure
+ */
+int rte_metrics_reg_metrics(const char **names, uint16_t cnt_names);
+
+/**
+ * Get statistic name-key lookup table.
+ *
+ * @param names
+ *   Array of names to receive key names
+ *
+ * @param capacity
+ *   Space available in names
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_names(
+	struct rte_stat_name *names,
+	uint16_t capacity);
+
+/**
+ * Fetch statistics.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   Array to receive values and their keys
+ *
+ * @param capacity
+ *   Space available in values
+ *
+ * @return
+ *   - Non-negative: Success (number of names)
+ *   - Negative: Failure
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_stat_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a statistic metric
+ *
+ * @param port_id
+ *   Port to update statistics for
+ * @param key
+ *   Id of statistic metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared statistics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metric(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a statistic metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update statistics for
+ * @param key
+ *   Base id of statistics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds statistic set size
+ *   - -EIO if upable to access shared statistics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_metrics(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..a31a80a
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_16.11 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_metric;
+	rte_metrics_reg_metrics;
+	rte_metrics_update_metric;
+	rte_metrics_update_metrics;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index ac50a21..11d0c80 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -97,6 +97,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v6 00/21] Introduce SoC device/driver framework for EAL
  @ 2016-10-27 15:17  2% ` Shreyansh Jain
  2016-10-28 12:26  2%   ` [dpdk-dev] [PATCH v7 " Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-10-27 15:17 UTC (permalink / raw)
  To: dev; +Cc: Shreyansh Jain, thomas.monjalon, viktorin

Introduction:
=============

This patch set is direct derivative of Jan's original series [1],[2].

 - This version is based on master HEAD (ca41215)
 
 - In this, I am merging the series [11] back. It was initially part
   of this set but I had split considering that those changes in PCI
   were good standalone as well. But, 1) not much feedback was avail-
   able and 2) this patchset is a use-case for those patches making
   it easier to review. Just like what Jan had intended in original
   series.

 - SoC support is not enabled by default. It needs the 'enable-soc' toggle
   on command line. This is primarily because this patchset is still
   experimental and we would like to keep it isolated from non-SoC ops.
   Though, it does impact the ABI.

Aim:
====

As of now EAL is primarly focused on PCI initialization/probing.

 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  `- rte_eal_pci_probe(): Driver<=>Device initialization

This patchset introduces SoC framework which would enable SoC drivers and
drivers to be plugged into EAL, very similar to how PCI drivers/devices are
done today.

This is a stripped down version of PCI framework which allows the SoC PMDs
to implement their own routines for detecting devices and linking devices to
drivers.

1) Changes to EAL
 rte_eal_init()
  |- rte_eal_pci_init(): Find PCI devices from sysfs
  |- rte_eal_soc_init(): Calls PMDs->scan_fn
  |- ...
  |- rte_eal_memzone_init()
  |- ...
  |- rte_eal_pci_probe(): Driver<=>Device initialization, PMD->devinit()
  `- rte_eal_soc_probe(): Calls PMDs->match_fn and PMDs->devinit();

2) New device/driver structures:
  - rte_soc_driver (inheriting rte_driver)
  - rte_soc_device (inheriting rte_device)
  - rte_eth_dev and eth_driver embedded rte_soc_device and rte_soc_driver,
    respectively.

3) The SoC PMDs need to:
 - define rte_soc_driver with necessary scan and match callbacks
 - Register themselves using DRIVER_REGISTER_SOC()
 - Implement respective bus scanning in the scan callbacks to add necessary
   devices to SoC device list
 - Implement necessary eth_dev_init/uninint for ethernet instances

4) Design considerations that are same as PCI:
 - SoC initialization is being done through rte_eal_init(), just after PCI
   initialization is done.
 - As in case of PCI, probe is done after rte_eal_pci_probe() to link the
   devices detected with the drivers registered.
 - Device attach/detach functions are available and have been designed on
   the lines of PCI framework.
 - PMDs register using DRIVER_REGISTER_SOC, very similar to
   DRIVER_REGISTER_PCI for PCI devices.
 - Linked list of SoC driver and devices exists independent of the other
   driver/device list, but inheriting rte_driver/rte_driver, these are
   also part of a global list.

5) Design considerations that are different from PCI:
 - Each driver implements its own scan and match function. PCI uses the BDF
   format to read the device from sysfs, but this _may_not_ be a case for a
   SoC ethernet device.
   = This is an important change from initial proposal by Jan in [2].
   Unlike his attempt to use /sys/bus/platform, this patch relies on the
   PMD to detect the devices. This is because SoC may require specific or
   additional info for device detection. Further, SoC may have embedded
   devices/MACs which require initialization which cannot be covered
   through sysfs parsing.
   `-> Point (6) below is a side note to above.
   = PCI based PMDs rely on EAL's capability to detect devices. This
   proposal puts the onus on PMD to detect devices, add to soc_device_list
   and wait for Probe. Matching, of device<=>driver is again PMD's
   callback.

6) Adding default scan and match helpers for PMDs
 - The design warrrants the PMDs implement their own scan of devices
   on bus, and match routines for probe implementation.
   This patch introduces helpers which can be used by PMDs for scan of
   the platform bus and matching devices against the compatible string
   extracted from the scan.
 - Intention is to make it easier to integrate known SoC which expose
   platform bus compliant information (compat, sys/bus/platform...).
 - PMDs which have deviations from this standard model can implement and
   hook their bus scanning and probe match callbacks while registering
   driver.

Patchset Overview:
==================
 - Patches 0001~0004 are from [11] - moving some PCI specific functions
   and definitions to non-PCI area.
 - Patches 0005~0008 introduce the base infrastructure and test case
 - Patch 0009 is for command line support for no-soc, on lines of no-pci
 - Patch 0010 enables EAL to handle SoC type devices
 - Patch 0011 adds support for scan and probe callbacks and updates the test
   framework with relevant test case.
 - Patch 0012~0014 enable device argument, driver specific flags and
   interrupt handling related basic infra. Subsequent patches build up on
   them.
 - Patch 0015~0016 add support for default function which PMDs can use for
   scanning platform bus. These functions are optional and need to be hooked
   to by PMDs.
 - Patch 0017~0019 makes changes to PCI as well as ethdev code to remove
   assumption that eth_driver is a PCI driver.
 - Patch 0020 adds necessary ethdev probe/remove functions for PMDs to use
 - Patch 0021 adds support for SoC driver/devices, along with probe/remove
   functions for Cryptodev devices.

Future/Pending Changes:
=======================
- Device whitelisting/blacklist still relies on command line '-b' and '-c'
  which are internally implemented using OPT_PCI_BLACKLIST/OPT_PCI_WHITELIST.
  This needs to be changed to a generic form - OPT_DEV_*LIST - probably.
- No cryptodriver currently uses SoC framework - probably a example driver
  can be created to demonstrate usage.
- test case for enable-soc command line parameter
- This patch impacts a couple of ABIs (rte_device/driver) and thus require
  a patch for bump of libraries (if need be) and documentation.

 [1] http://dpdk.org/ml/archives/dev/2016-January/030915.html
 [2] http://www.dpdk.org/ml/archives/dev/2016-May/038486.html
 [3] http://dpdk.org/ml/archives/dev/2016-August/045707.html
 [4] http://dpdk.org/ml/archives/dev/2016-May/038948.html
 [5] http://dpdk.org/ml/archives/dev/2016-May/038953.html
 [6] http://dpdk.org/ml/archives/dev/2016-May/038487.html
 [7] http://dpdk.org/ml/archives/dev/2016-May/038488.html
 [8] http://dpdk.org/ml/archives/dev/2016-May/038489.html
 [9] http://dpdk.org/ml/archives/dev/2016-May/038491.html
[10] http://dpdk.org/ml/archives/dev/2016-September/046256.html
[11] http://dpdk.org/ml/archives/dev/2016-October/048915.html

Changes since v5:
 - fix devinit/devuninit name change; it was in wrong patch
 - rebased over HEAD (ca41215)
 - Update to pending section of coverletter

Changes since v4:
 - change name of rte_eal_soc_scan function name to
   rte_eal_soc_scan_platform_bus. This still remains a helper function.
 - Fix comments over scan and match functions

Changes since v3:
 - rebasing over HEAD (fed622df tag: v16.11-rc1)
 - Add support for default scan function; PMD can use this for
   scanning on platform bus.
 - Support for kernel driver bind/unbind, numa and DMA from
   Jan's original patches.
 - SoC is disabled by default. '--enable-soc' command line parameter
   enables it. doc updated.
 - Updated testcase function names and comments
 - Map file addition alphabetically ordered
 - Patch author corrected

Changes since v2:
 - Rebasing over rte_driver/device patchset v9 [10]
 - Added cryptodev support for SoC
 - Default match function for SoC device<=>Driver
 - Some variables renamed to reflect 'drv' rather than 'dr'

Change since v1 [2]:
 - Removed patch 1-5 which were for generalizing some PCI specific routines
   into EAL. These patches are good-to-have but not directly linked to SoC
   and hence would be proposed separately.
 - Removed support for sysfs parsing (patches 6~9)
 - Rebasing over the recent (v8) version of rte_driver/device patchset
 - Rebasing over master (16.07)
 - Changes to various map file to change API intro to 16.11 from 16.07

Jan Viktorin (19):
  eal: generalize PCI kernel driver enum to EAL
  eal: generalize PCI map/unmap resource to EAL
  eal/linux: generalize PCI kernel unbinding driver to EAL
  eal/linux: generalize PCI kernel driver extraction to EAL
  eal: define container macro
  eal/soc: introduce very essential SoC infra definitions
  eal/soc: add SoC PMD register/unregister logic
  eal/soc: implement SoC device list and dump
  eal: introduce command line enable SoC option
  eal/soc: init SoC infra from EAL
  eal/soc: extend and utilize devargs
  eal/soc: add drv_flags
  eal/soc: add intr_handle
  eal/soc: add default scan for Soc devices
  eal/soc: additional features for SoC
  ether: utilize container_of for pci_drv
  ether: verify we copy info from a PCI device
  ether: extract function eth_dev_get_intr_handle
  ether: introduce ethernet dev probe remove

Shreyansh Jain (2):
  eal/soc: implement probing of drivers
  eal/crypto: Support rte_soc_driver/device for cryptodev

 app/test/Makefile                               |   1 +
 app/test/test_soc.c                             | 404 +++++++++++++++++++
 doc/guides/testpmd_app_ug/run_app.rst           |   4 +
 lib/librte_cryptodev/rte_cryptodev.c            | 122 +++++-
 lib/librte_cryptodev/rte_cryptodev.h            |   3 +
 lib/librte_cryptodev/rte_cryptodev_pmd.h        |  18 +-
 lib/librte_cryptodev/rte_cryptodev_version.map  |   2 +
 lib/librte_eal/bsdapp/eal/Makefile              |   1 +
 lib/librte_eal/bsdapp/eal/eal.c                 |  18 +
 lib/librte_eal/bsdapp/eal/eal_pci.c             |   6 +-
 lib/librte_eal/bsdapp/eal/eal_soc.c             |  46 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  11 +
 lib/librte_eal/common/Makefile                  |   2 +-
 lib/librte_eal/common/eal_common_dev.c          |  66 ++-
 lib/librte_eal/common/eal_common_devargs.c      |  17 +
 lib/librte_eal/common/eal_common_options.c      |   5 +
 lib/librte_eal/common/eal_common_pci.c          |  39 --
 lib/librte_eal/common/eal_common_pci_uio.c      |  16 +-
 lib/librte_eal/common/eal_common_soc.c          | 368 +++++++++++++++++
 lib/librte_eal/common/eal_internal_cfg.h        |   1 +
 lib/librte_eal/common/eal_options.h             |   2 +
 lib/librte_eal/common/eal_private.h             |  64 +++
 lib/librte_eal/common/include/rte_common.h      |  18 +
 lib/librte_eal/common/include/rte_dev.h         |  44 ++
 lib/librte_eal/common/include/rte_devargs.h     |   8 +
 lib/librte_eal/common/include/rte_pci.h         |  41 --
 lib/librte_eal/common/include/rte_soc.h         | 322 +++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile            |   2 +
 lib/librte_eal/linuxapp/eal/eal.c               |  63 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  62 +--
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c       |   2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c      |   5 +-
 lib/librte_eal/linuxapp/eal/eal_soc.c           | 515 ++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  11 +
 lib/librte_ether/rte_ethdev.c                   | 166 +++++++-
 lib/librte_ether/rte_ethdev.h                   |  33 +-
 36 files changed, 2343 insertions(+), 165 deletions(-)
 create mode 100644 app/test/test_soc.c
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_soc.c
 create mode 100644 lib/librte_eal/common/eal_common_soc.c
 create mode 100644 lib/librte_eal/common/include/rte_soc.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_soc.c

-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v4] eal: fix lib version for device generalization patches
  2016-10-27 11:32  4%       ` Shreyansh Jain
@ 2016-10-27 12:16  4%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-10-27 12:16 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, ferruh.yigit

2016-10-27 17:02, Shreyansh Jain:
> Even though I have sent the v4, there is another possibility of 
> splitting this log across API and ABI changes.
> Problem is that most of the changes are quite related in terms of impact 
> on ABI and API. (some like rte_device is clear enough, though).
> Any suggestions? Would repetitions be OK in release notes?

In general, API change implies ABI change.
I think we must use the "ABI changes" section for cases where API is not changed.
No need of repeating in both sections.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4] eal: fix lib version for device generalization patches
  2016-10-27 11:29  9%     ` [dpdk-dev] [PATCH v4] eal: fix lib version " Shreyansh Jain
@ 2016-10-27 11:32  4%       ` Shreyansh Jain
  2016-10-27 12:16  4%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-10-27 11:32 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: dev, ferruh.yigit

On Thursday 27 October 2016 04:59 PM, Shreyansh Jain wrote:
> index aa0c09a..db20567 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -201,6 +201,32 @@ API Changes
>  * The ``file_name`` data type of ``struct rte_port_source_params`` and
>    ``struct rte_port_sink_params`` is changed from `char *`` to ``const char *``.
>
> +* **Improved device/driver hierarchy and generalized hotplugging**
> +
> +  Device and driver relationship has been restructured by introducing generic
> +  classes. This paves way for having PCI, VDEV and other device types as
> +  just instantiated objects rather than classes in themselves. Hotplugging too
> +  has been generalized into EAL so that ethernet or crypto devices can use the
> +  common infrastructure.
> +
> +  * removed ``pmd_type`` as way of segregation of devices
> +  * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from
> +    ``rte_pci_driver``. These can now be used by any instantiated object of
> +    ``rte_driver``.
> +  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
> +  * renamed devinit/devuninit handlers to probe/remove to make it more
> +    semantically correct with respect to device<=>driver relationship
> +  * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the
> +    APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``.
> +  * helpers and support macros have been renamed to make them more synonymous
> +    with their device types
> +    (e.g. ``PMD_REGISTER_DRIVER`` => ``RTE_PMD_REGISTER_PCI``)
> +  * Device naming functions have been generalized from ethdev and cryptodev
> +    to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining
> +    unique device name from PCI Domain-BDF description.
> +  * Virtual device registration APIs have been added: ``rte_eal_vdrv_register``
> +    and ``rte_eal_vdrv_unregister``.
> +
>
>  ABI Changes
>  -----------

Even though I have sent the v4, there is another possibility of 
splitting this log across API and ABI changes.
Problem is that most of the changes are quite related in terms of impact 
on ABI and API. (some like rte_device is clear enough, though).
Any suggestions? Would repetitions be OK in release notes?

-
Shreyansh

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4] eal: fix lib version for device generalization patches
  2016-10-27  7:08  7%   ` [dpdk-dev] [PATCH v3] " Shreyansh Jain
  2016-10-27 10:15  0%     ` Thomas Monjalon
@ 2016-10-27 11:29  9%     ` Shreyansh Jain
  2016-10-27 11:32  4%       ` Shreyansh Jain
  1 sibling, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-10-27 11:29 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: dev, Shreyansh Jain, ferruh.yigit

rte_device/driver generalization patches [1] were merged without a change
in the LIBABIVER variable. This patches bumps the macro of affected libs:

- libcryptodev and libetherdev have been bumped
- librte_eal version changed in
  d7e61ad3ae36 ("log: remove deprecated history dump")

Details of ABI/API changes:
- EAL [version already bumped in: d7e61ad3ae36]
  |- type field was removed from rte_driver
  |- rte_pci_device now embeds rte_device
  |- rte_pci_resource renamed to rte_mem_resource
  |- numa_node and devargs of rte_pci_driver is moved to rte_driver
  |- APIs for device hotplug (attach/detach) moved into EAL
  |- API rte_eal_pci_device_name added for PCI device naming
  |- vdev registration API introduced (rte_eal_vdrv_register,
  |  rte_eal_vdrv_unregister

- librte_crypto (v 1=>2)
  |- removed rte_cryptodev_create_unique_device_name API
  |- moved device naming to EAL

- librte_ethdev (v 4=>5)
  |- rte_eth_dev_type is removed
  |- removed dev_type from rte_eth_dev_allocate API
  |- removed API rte_eth_dev_get_device_type
  |- removed API rte_eth_dev_get_addr_by_port
  |- removed API rte_eth_dev_get_port_by_addr
  |- removed rte_cryptodev_create_unique_device_name API
  |- moved device naming to EAL

Also, deprecation notice from 16.07 has been removed and release notes for
16.11 added.

[1] http://dpdk.org/ml/archives/dev/2016-September/047087.html

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
--
v4:
 - fix spelling mistakes and incorrect symbol name in doc
 - reword commit log for EAL modification commit id

v3:
 - add API/ABI change info in commit log
 - fix library version change notification in release note
 - fix erroneous change to librte_eal version in v2
---
 doc/guides/rel_notes/deprecation.rst   | 12 ------------
 doc/guides/rel_notes/release_16_11.rst | 30 ++++++++++++++++++++++++++++--
 lib/librte_cryptodev/Makefile          |  2 +-
 lib/librte_ether/Makefile              |  2 +-
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d5c1490..884a231 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -18,18 +18,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* The ethdev hotplug API is going to be moved to EAL with a notification
-  mechanism added to crypto and ethdev libraries so that hotplug is now
-  available to both of them. This API will be stripped of the device arguments
-  so that it only cares about hotplugging.
-
-* Structures embodying pci and vdev devices are going to be reworked to
-  integrate new common rte_device / rte_driver objects (see
-  http://dpdk.org/ml/archives/dev/2016-January/031390.html).
-  ethdev and crypto libraries will then only handle those objects so that they
-  do not need to care about the kind of devices that are being used, making it
-  easier to add new buses later.
-
 * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index aa0c09a..db20567 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -201,6 +201,32 @@ API Changes
 * The ``file_name`` data type of ``struct rte_port_source_params`` and
   ``struct rte_port_sink_params`` is changed from `char *`` to ``const char *``.
 
+* **Improved device/driver hierarchy and generalized hotplugging**
+
+  Device and driver relationship has been restructured by introducing generic
+  classes. This paves way for having PCI, VDEV and other device types as
+  just instantiated objects rather than classes in themselves. Hotplugging too
+  has been generalized into EAL so that ethernet or crypto devices can use the
+  common infrastructure.
+
+  * removed ``pmd_type`` as way of segregation of devices
+  * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from
+    ``rte_pci_driver``. These can now be used by any instantiated object of
+    ``rte_driver``.
+  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
+  * renamed devinit/devuninit handlers to probe/remove to make it more
+    semantically correct with respect to device<=>driver relationship
+  * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the
+    APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``.
+  * helpers and support macros have been renamed to make them more synonymous
+    with their device types
+    (e.g. ``PMD_REGISTER_DRIVER`` => ``RTE_PMD_REGISTER_PCI``)
+  * Device naming functions have been generalized from ethdev and cryptodev
+    to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining
+    unique device name from PCI Domain-BDF description.
+  * Virtual device registration APIs have been added: ``rte_eal_vdrv_register``
+    and ``rte_eal_vdrv_unregister``.
+
 
 ABI Changes
 -----------
@@ -232,11 +258,11 @@ The libraries prepended with a plus sign were incremented in this version.
 
 .. code-block:: diff
 
-     libethdev.so.4
+   + libethdev.so.5
      librte_acl.so.2
      librte_cfgfile.so.2
      librte_cmdline.so.2
-     librte_cryptodev.so.1
+   + librte_cryptodev.so.2
      librte_distributor.so.1
    + librte_eal.so.3
      librte_hash.so.2
diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cryptodev/Makefile
index 314a046..aebf5d9 100644
--- a/lib/librte_cryptodev/Makefile
+++ b/lib/librte_cryptodev/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_cryptodev.a
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # build flags
 CFLAGS += -O3
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 488b7c8..bc2e5f6 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_ether_version.map
 
-LIBABIVER := 4
+LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
 
-- 
2.7.4

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v3] eal: fix libabi macro for device generalization patches
  2016-10-27 10:15  0%     ` Thomas Monjalon
@ 2016-10-27 11:10  0%       ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-10-27 11:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, ferruh.yigit

Hello Thomas,

On Thursday 27 October 2016 03:45 PM, Thomas Monjalon wrote:
> 2016-10-27 12:38, Shreyansh Jain:
>> rte_device/driver generalization patches [1] were merged without a change
>> in the LIBABIVER macro. This patches bumps the macro of affected libs.
>
> It is not a macro but a Makefile variable.

Yes, I will change that.

>
>> (librte_eal was already bumped; libcryptodev and libetherdev have been
>> bumped).
>
> Please provide the commit id where EAL was bumped.

Ok. Will do.

>
>> Details of ABI/API changes:
>> - EAL (version not bumped)
>
> not bumped -> already bumped

Ok.

>
>>   |- type field was removed from rte_driver
>>   |- rte_pci_device now embeds rte_device
>>   |- rte_pci_resource renamed to rte_mem_resource
>>   |- numa_node and devargs of rte_pci_driver is moved to rte_driver
>>   |- APIs for device hotplug (attach/detach) moved into EAL
>>   |- API rte_eal_pci_device_name added for PCI device naming
>>   |- vdev registration API introduced (rte_eal_vdrv_register,
>>   |  rte_eal_vdrv_unregister
>>
>> - librte_crypto (v 1=>2)
>>   |- removed rte_cryptodev_create_unique_device_name API
>>   |- moved device naming to EAL
>>
>> - librte_ethdev (v 4=>5)
>>   |- rte_eth_dev_type is removed
>>   |- removed dev_type from rte_eth_dev_allocate API
>>   |- removed API rte_eth_dev_get_device_type
>>   |- removed API rte_eth_dev_get_addr_by_port
>>   |- removed API rte_eth_dev_get_port_by_addr
>>   |- removed rte_cryptodev_create_unique_device_name API
>>   |- moved device naming to EAL
>>
>> Also, deprecation notice from 16.07 has been removed and release notes for
>> 16.11 added.
>>
>> [1] http://dpdk.org/ml/archives/dev/2016-September/047087.html
>>
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> [...]
>> --- a/doc/guides/rel_notes/release_16_11.rst
>> +++ b/doc/guides/rel_notes/release_16_11.rst
>> @@ -149,6 +149,32 @@ Resolved Issues
>
> It is the "Resolved Issues" section.
> Please move in the "API Changes" section.

Ok.

>
>>  EAL
>>  ~~~
>>
>> +* **Improved device/driver heirarchy and generalized hotplugging**
>
> typo: hierarchy

Yes.

>
>> +  Device and driver relationship has been restructured by introducing generic
>> +  classes. This paves way for having PCI, VDEV and other device types as
>> +  just instantiated objects rather than classes in themselves. Hotplugging too
>> +  has been generalized into EAL so that ethernet or crypto devices can use the
>> +  common infrastructure.
>> +
>> +  * removed ``pmd_type`` as way of segragation of devices
>> +  * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from
>> +    ``rte_pci_driver``. These can now be used by any instantiated object of
>> +    ``rte_driver``.
>> +  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
>> +  * renamed devinit/devuninit handlers to probe/remove to make it more
>> +    semantically correct with respect to device<=>driver relationship
>> +  * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the
>> +    APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``.
>> +  * helpers and support macros have been renamed to make them more synonymous
>> +    with their device types
>> +    (e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``)
>
> It is RTE_PMD_REGISTER_PCI

It seems my Friday is earlier than usual :(
I was the one who changed it and I completely forgot about it.

>
>> +  * Device naming functions have been generalized from ethdev and cryptodev
>> +    to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining
>> +    unique device name from PCI Domain-BDF description.
>> +  * Virtual device registration APIs have been added: ``rte_eal_vdrv_register``
>> +    and ``rte_eal_vdrv_unregister``.
>
> Thanks
>

I am sending v4 soon.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] eal: fix libabi macro for device generalization patches
  2016-10-27  7:08  7%   ` [dpdk-dev] [PATCH v3] " Shreyansh Jain
@ 2016-10-27 10:15  0%     ` Thomas Monjalon
  2016-10-27 11:10  0%       ` Shreyansh Jain
  2016-10-27 11:29  9%     ` [dpdk-dev] [PATCH v4] eal: fix lib version " Shreyansh Jain
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2016-10-27 10:15 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, ferruh.yigit

2016-10-27 12:38, Shreyansh Jain:
> rte_device/driver generalization patches [1] were merged without a change
> in the LIBABIVER macro. This patches bumps the macro of affected libs.

It is not a macro but a Makefile variable.

> (librte_eal was already bumped; libcryptodev and libetherdev have been
> bumped).

Please provide the commit id where EAL was bumped.

> Details of ABI/API changes:
> - EAL (version not bumped)

not bumped -> already bumped

>   |- type field was removed from rte_driver
>   |- rte_pci_device now embeds rte_device
>   |- rte_pci_resource renamed to rte_mem_resource
>   |- numa_node and devargs of rte_pci_driver is moved to rte_driver
>   |- APIs for device hotplug (attach/detach) moved into EAL
>   |- API rte_eal_pci_device_name added for PCI device naming
>   |- vdev registration API introduced (rte_eal_vdrv_register,
>   |  rte_eal_vdrv_unregister
> 
> - librte_crypto (v 1=>2)
>   |- removed rte_cryptodev_create_unique_device_name API
>   |- moved device naming to EAL
> 
> - librte_ethdev (v 4=>5)
>   |- rte_eth_dev_type is removed
>   |- removed dev_type from rte_eth_dev_allocate API
>   |- removed API rte_eth_dev_get_device_type
>   |- removed API rte_eth_dev_get_addr_by_port
>   |- removed API rte_eth_dev_get_port_by_addr
>   |- removed rte_cryptodev_create_unique_device_name API
>   |- moved device naming to EAL
> 
> Also, deprecation notice from 16.07 has been removed and release notes for
> 16.11 added.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-September/047087.html
> 
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
[...]
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -149,6 +149,32 @@ Resolved Issues

It is the "Resolved Issues" section.
Please move in the "API Changes" section.

>  EAL
>  ~~~
>  
> +* **Improved device/driver heirarchy and generalized hotplugging**

typo: hierarchy

> +  Device and driver relationship has been restructured by introducing generic
> +  classes. This paves way for having PCI, VDEV and other device types as
> +  just instantiated objects rather than classes in themselves. Hotplugging too
> +  has been generalized into EAL so that ethernet or crypto devices can use the
> +  common infrastructure.
> +
> +  * removed ``pmd_type`` as way of segragation of devices
> +  * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from
> +    ``rte_pci_driver``. These can now be used by any instantiated object of
> +    ``rte_driver``.
> +  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
> +  * renamed devinit/devuninit handlers to probe/remove to make it more
> +    semantically correct with respect to device<=>driver relationship
> +  * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the
> +    APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``.
> +  * helpers and support macros have been renamed to make them more synonymous
> +    with their device types
> +    (e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``)

It is RTE_PMD_REGISTER_PCI

> +  * Device naming functions have been generalized from ethdev and cryptodev
> +    to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining
> +    unique device name from PCI Domain-BDF description.
> +  * Virtual device registration APIs have been added: ``rte_eal_vdrv_register``
> +    and ``rte_eal_vdrv_unregister``.

Thanks

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3] eal: fix libabi macro for device generalization patches
  2016-10-26 13:00  4% ` [dpdk-dev] [PATCH v2] " Shreyansh Jain
  2016-10-26 13:12  0%   ` Shreyansh Jain
@ 2016-10-27  7:08  7%   ` Shreyansh Jain
  2016-10-27 10:15  0%     ` Thomas Monjalon
  2016-10-27 11:29  9%     ` [dpdk-dev] [PATCH v4] eal: fix lib version " Shreyansh Jain
  1 sibling, 2 replies; 200+ results
From: Shreyansh Jain @ 2016-10-27  7:08 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: dev, Shreyansh Jain, ferruh.yigit

rte_device/driver generalization patches [1] were merged without a change
in the LIBABIVER macro. This patches bumps the macro of affected libs.
(librte_eal was already bumped; libcryptodev and libetherdev have been
bumped).

Details of ABI/API changes:
- EAL (version not bumped)
  |- type field was removed from rte_driver
  |- rte_pci_device now embeds rte_device
  |- rte_pci_resource renamed to rte_mem_resource
  |- numa_node and devargs of rte_pci_driver is moved to rte_driver
  |- APIs for device hotplug (attach/detach) moved into EAL
  |- API rte_eal_pci_device_name added for PCI device naming
  |- vdev registration API introduced (rte_eal_vdrv_register,
  |  rte_eal_vdrv_unregister

- librte_crypto (v 1=>2)
  |- removed rte_cryptodev_create_unique_device_name API
  |- moved device naming to EAL

- librte_ethdev (v 4=>5)
  |- rte_eth_dev_type is removed
  |- removed dev_type from rte_eth_dev_allocate API
  |- removed API rte_eth_dev_get_device_type
  |- removed API rte_eth_dev_get_addr_by_port
  |- removed API rte_eth_dev_get_port_by_addr
  |- removed rte_cryptodev_create_unique_device_name API
  |- moved device naming to EAL

Also, deprecation notice from 16.07 has been removed and release notes for
16.11 added.

[1] http://dpdk.org/ml/archives/dev/2016-September/047087.html

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
--
v3:
 - add API/ABI change info in commit log
 - fix library version change notification in release note
 - fix erroneous change to librte_eal version in v2
---
 doc/guides/rel_notes/deprecation.rst   | 12 ------------
 doc/guides/rel_notes/release_16_11.rst | 30 ++++++++++++++++++++++++++++--
 lib/librte_cryptodev/Makefile          |  2 +-
 lib/librte_ether/Makefile              |  2 +-
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d5c1490..884a231 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -18,18 +18,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* The ethdev hotplug API is going to be moved to EAL with a notification
-  mechanism added to crypto and ethdev libraries so that hotplug is now
-  available to both of them. This API will be stripped of the device arguments
-  so that it only cares about hotplugging.
-
-* Structures embodying pci and vdev devices are going to be reworked to
-  integrate new common rte_device / rte_driver objects (see
-  http://dpdk.org/ml/archives/dev/2016-January/031390.html).
-  ethdev and crypto libraries will then only handle those objects so that they
-  do not need to care about the kind of devices that are being used, making it
-  easier to add new buses later.
-
 * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index aa0c09a..5a5485b 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -149,6 +149,32 @@ Resolved Issues
 EAL
 ~~~
 
+* **Improved device/driver heirarchy and generalized hotplugging**
+
+  Device and driver relationship has been restructured by introducing generic
+  classes. This paves way for having PCI, VDEV and other device types as
+  just instantiated objects rather than classes in themselves. Hotplugging too
+  has been generalized into EAL so that ethernet or crypto devices can use the
+  common infrastructure.
+
+  * removed ``pmd_type`` as way of segragation of devices
+  * moved ``numa_node`` and ``devargs`` into ``rte_driver`` from
+    ``rte_pci_driver``. These can now be used by any instantiated object of
+    ``rte_driver``.
+  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
+  * renamed devinit/devuninit handlers to probe/remove to make it more
+    semantically correct with respect to device<=>driver relationship
+  * moved hotplugging support to EAL. Hereafter, PCI and vdev can use the
+    APIs ``rte_eal_dev_attach`` and ``rte_eal_dev_detach``.
+  * helpers and support macros have been renamed to make them more synonymous
+    with their device types
+    (e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``)
+  * Device naming functions have been generalized from ethdev and cryptodev
+    to EAL. ``rte_eal_pci_device_name`` has been introduced for obtaining
+    unique device name from PCI Domain-BDF description.
+  * Virtual device registration APIs have been added: ``rte_eal_vdrv_register``
+    and ``rte_eal_vdrv_unregister``.
+
 
 Drivers
 ~~~~~~~
@@ -232,11 +258,11 @@ The libraries prepended with a plus sign were incremented in this version.
 
 .. code-block:: diff
 
-     libethdev.so.4
+   + libethdev.so.5
      librte_acl.so.2
      librte_cfgfile.so.2
      librte_cmdline.so.2
-     librte_cryptodev.so.1
+   + librte_cryptodev.so.2
      librte_distributor.so.1
    + librte_eal.so.3
      librte_hash.so.2
diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cryptodev/Makefile
index 314a046..aebf5d9 100644
--- a/lib/librte_cryptodev/Makefile
+++ b/lib/librte_cryptodev/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_cryptodev.a
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # build flags
 CFLAGS += -O3
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 488b7c8..bc2e5f6 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_ether_version.map
 
-LIBABIVER := 4
+LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
 
-- 
2.7.4

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v2] eal: fix libabi macro for device generalization patches
  2016-10-26 13:00  4% ` [dpdk-dev] [PATCH v2] " Shreyansh Jain
@ 2016-10-26 13:12  0%   ` Shreyansh Jain
  2016-10-27  7:08  7%   ` [dpdk-dev] [PATCH v3] " Shreyansh Jain
  1 sibling, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-10-26 13:12 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: dev

On Wednesday 26 October 2016 06:30 PM, Shreyansh Jain wrote:
> rte_device/driver generalization patches [1] were merged without a change
> in the LIBABIVER macro. This patches bumps the macro of affected libs.
>
> Also, deprecation notice from 16.07 has been removed and release notes for
> 16.11 added.
>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> --
> v2:
>  - Mark bumped libraries in release_16_11.rst file
>  - change code symbol names from text to code layout
>
> ---
>  doc/guides/rel_notes/deprecation.rst   | 12 ------------
>  doc/guides/rel_notes/release_16_11.rst | 21 +++++++++++++++++++--
>  lib/librte_cryptodev/Makefile          |  2 +-
>  lib/librte_eal/bsdapp/eal/Makefile     |  2 +-
>  lib/librte_eal/linuxapp/eal/Makefile   |  2 +-
>  lib/librte_ether/Makefile              |  2 +-
>  6 files changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d5c1490..884a231 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -18,18 +18,6 @@ Deprecation Notices
>    ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
>    segments limit to be transmitted by device for TSO/non-TSO packets.
>
> -* The ethdev hotplug API is going to be moved to EAL with a notification
> -  mechanism added to crypto and ethdev libraries so that hotplug is now
> -  available to both of them. This API will be stripped of the device arguments
> -  so that it only cares about hotplugging.
> -
> -* Structures embodying pci and vdev devices are going to be reworked to
> -  integrate new common rte_device / rte_driver objects (see
> -  http://dpdk.org/ml/archives/dev/2016-January/031390.html).
> -  ethdev and crypto libraries will then only handle those objects so that they
> -  do not need to care about the kind of devices that are being used, making it
> -  easier to add new buses later.
> -
>  * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
>    may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
>    ``nb_segs`` in one operation, because some platforms have an overhead if the
> diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
> index 26cdd62..2d5636c 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -149,6 +149,23 @@ Resolved Issues
>  EAL
>  ~~~
>
> +* **Improved device/driver heirarchy and generalized hotplugging**
> +
> +  Device and driver relationship has been restructured by introducing generic
> +  classes. This paves way for having PCI, VDEV and other device types as
> +  just instantiated objects rather than classes in themselves. Hotplugging too
> +  has been generalized into EAL so that ethernet or crypto devices can use the
> +  common infrastructure.
> +
> +  * removed ``pmd_type`` as way of segragation of devices
> +  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
> +  * renamed devinit/devuninit handlers to probe/remove to make it more
> +    semantically correct with respect to device<=>driver relationship
> +  * moved hotplugging support to EAL
> +  * helpers and support macros have been renamed to make them more synonymous
> +    with their device types
> +    (e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``)
> +
>
>  Drivers
>  ~~~~~~~
> @@ -232,11 +249,11 @@ The libraries prepended with a plus sign were incremented in this version.
>
>  .. code-block:: diff
>
> -     libethdev.so.4
> +   + libethdev.so.4

Just noticed:
Should the '4' here reflect the current LIBABIVER number?
If so, I will send this patch again.

>       librte_acl.so.2
>       librte_cfgfile.so.2
>       librte_cmdline.so.2
> -     librte_cryptodev.so.1
> +   + librte_cryptodev.so.1
>       librte_distributor.so.1
>     + librte_eal.so.3
>       librte_hash.so.2
> diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cryptodev/Makefile
> index 314a046..aebf5d9 100644
> --- a/lib/librte_cryptodev/Makefile
> +++ b/lib/librte_cryptodev/Makefile
> @@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  LIB = librte_cryptodev.a
>
>  # library version
> -LIBABIVER := 1
> +LIBABIVER := 2
>
>  # build flags
>  CFLAGS += -O3
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
> index a15b762..122798c 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -48,7 +48,7 @@ LDLIBS += -lgcc_s
>
>  EXPORT_MAP := rte_eal_version.map
>
> -LIBABIVER := 3
> +LIBABIVER := 4
>
>  # specific to bsdapp exec-env
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index 4e206f0..4ad7c85 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -37,7 +37,7 @@ ARCH_DIR ?= $(RTE_ARCH)
>  EXPORT_MAP := rte_eal_version.map
>  VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
>
> -LIBABIVER := 3
> +LIBABIVER := 4
>
>  VPATH += $(RTE_SDK)/lib/librte_eal/common
>
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index 488b7c8..bc2e5f6 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
>
>  EXPORT_MAP := rte_ether_version.map
>
> -LIBABIVER := 4
> +LIBABIVER := 5
>
>  SRCS-y += rte_ethdev.c
>
>

Sorry for repeated versions.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] eal: fix libabi macro for device generalization patches
  2016-10-26 12:38  4% [dpdk-dev] [PATCH] eal: fix libabi macro for device generalization patches Shreyansh Jain
@ 2016-10-26 13:00  4% ` Shreyansh Jain
  2016-10-26 13:12  0%   ` Shreyansh Jain
  2016-10-27  7:08  7%   ` [dpdk-dev] [PATCH v3] " Shreyansh Jain
  0 siblings, 2 replies; 200+ results
From: Shreyansh Jain @ 2016-10-26 13:00 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: dev, Shreyansh Jain

rte_device/driver generalization patches [1] were merged without a change
in the LIBABIVER macro. This patches bumps the macro of affected libs.

Also, deprecation notice from 16.07 has been removed and release notes for
16.11 added.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
--
v2:
 - Mark bumped libraries in release_16_11.rst file
 - change code symbol names from text to code layout

---
 doc/guides/rel_notes/deprecation.rst   | 12 ------------
 doc/guides/rel_notes/release_16_11.rst | 21 +++++++++++++++++++--
 lib/librte_cryptodev/Makefile          |  2 +-
 lib/librte_eal/bsdapp/eal/Makefile     |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile   |  2 +-
 lib/librte_ether/Makefile              |  2 +-
 6 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d5c1490..884a231 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -18,18 +18,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* The ethdev hotplug API is going to be moved to EAL with a notification
-  mechanism added to crypto and ethdev libraries so that hotplug is now
-  available to both of them. This API will be stripped of the device arguments
-  so that it only cares about hotplugging.
-
-* Structures embodying pci and vdev devices are going to be reworked to
-  integrate new common rte_device / rte_driver objects (see
-  http://dpdk.org/ml/archives/dev/2016-January/031390.html).
-  ethdev and crypto libraries will then only handle those objects so that they
-  do not need to care about the kind of devices that are being used, making it
-  easier to add new buses later.
-
 * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 26cdd62..2d5636c 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -149,6 +149,23 @@ Resolved Issues
 EAL
 ~~~
 
+* **Improved device/driver heirarchy and generalized hotplugging**
+
+  Device and driver relationship has been restructured by introducing generic
+  classes. This paves way for having PCI, VDEV and other device types as
+  just instantiated objects rather than classes in themselves. Hotplugging too
+  has been generalized into EAL so that ethernet or crypto devices can use the
+  common infrastructure.
+
+  * removed ``pmd_type`` as way of segragation of devices
+  * added ``rte_device`` class and all PCI and VDEV devices inherit from it
+  * renamed devinit/devuninit handlers to probe/remove to make it more
+    semantically correct with respect to device<=>driver relationship
+  * moved hotplugging support to EAL
+  * helpers and support macros have been renamed to make them more synonymous
+    with their device types
+    (e.g. ``PMD_REGISTER_DRIVER`` => ``DRIVER_REGISTER_PCI``)
+
 
 Drivers
 ~~~~~~~
@@ -232,11 +249,11 @@ The libraries prepended with a plus sign were incremented in this version.
 
 .. code-block:: diff
 
-     libethdev.so.4
+   + libethdev.so.4
      librte_acl.so.2
      librte_cfgfile.so.2
      librte_cmdline.so.2
-     librte_cryptodev.so.1
+   + librte_cryptodev.so.1
      librte_distributor.so.1
    + librte_eal.so.3
      librte_hash.so.2
diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cryptodev/Makefile
index 314a046..aebf5d9 100644
--- a/lib/librte_cryptodev/Makefile
+++ b/lib/librte_cryptodev/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_cryptodev.a
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # build flags
 CFLAGS += -O3
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index a15b762..122798c 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -48,7 +48,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := rte_eal_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 4e206f0..4ad7c85 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -37,7 +37,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 488b7c8..bc2e5f6 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_ether_version.map
 
-LIBABIVER := 4
+LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
 
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] eal: fix libabi macro for device generalization patches
@ 2016-10-26 12:38  4% Shreyansh Jain
  2016-10-26 13:00  4% ` [dpdk-dev] [PATCH v2] " Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2016-10-26 12:38 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: dev, Shreyansh Jain

rte_device/driver generalization patches [1] were merged without a change
in the LIBABIVER macro. This patches bumps the macro of affected libs.

Also, deprecation notice from 16.07 has been removed and release notes for
16.11 added.

[1] http://dpdk.org/ml/archives/dev/2016-September/047087.html

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst   | 12 ------------
 doc/guides/rel_notes/release_16_11.rst | 16 ++++++++++++++++
 lib/librte_cryptodev/Makefile          |  2 +-
 lib/librte_eal/bsdapp/eal/Makefile     |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile   |  2 +-
 lib/librte_ether/Makefile              |  2 +-
 6 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d5c1490..884a231 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -18,18 +18,6 @@ Deprecation Notices
   ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
   segments limit to be transmitted by device for TSO/non-TSO packets.
 
-* The ethdev hotplug API is going to be moved to EAL with a notification
-  mechanism added to crypto and ethdev libraries so that hotplug is now
-  available to both of them. This API will be stripped of the device arguments
-  so that it only cares about hotplugging.
-
-* Structures embodying pci and vdev devices are going to be reworked to
-  integrate new common rte_device / rte_driver objects (see
-  http://dpdk.org/ml/archives/dev/2016-January/031390.html).
-  ethdev and crypto libraries will then only handle those objects so that they
-  do not need to care about the kind of devices that are being used, making it
-  easier to add new buses later.
-
 * ABI changes are planned for 16.11 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 26cdd62..c3f3bd9 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -149,6 +149,22 @@ Resolved Issues
 EAL
 ~~~
 
+* **Improved device/driver heirarchy and generalized hotplugging**
+
+  Device and driver relationship has been restructured by introducing generic
+  classes. This paves way for having PCI, VDEV and other device types as
+  just instantiated objects rather than classes in themselves. Hotplugging too
+  has been generalized into EAL so that ethernet or cryptodevices can use the
+  common infrastructure.
+
+  * removed pmd_type as way of segragation of devices
+  * added rte_device class and all PCI and VDEV devices inherit from it
+  * renamed devinit/devuninit handlers to probe/remove to make it more
+    semantically correct with respect to device<=>driver relationship
+  * moved hotplugging support to EAL
+  * helpers and support macros have been renamed to make them more synonymous
+    with their device types (e.g. PMD_REGISTER_DRIVER => DRIVER_REGISTER_PCI)
+
 
 Drivers
 ~~~~~~~
diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cryptodev/Makefile
index 314a046..aebf5d9 100644
--- a/lib/librte_cryptodev/Makefile
+++ b/lib/librte_cryptodev/Makefile
@@ -34,7 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_cryptodev.a
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # build flags
 CFLAGS += -O3
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index a15b762..122798c 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -48,7 +48,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := rte_eal_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 4e206f0..4ad7c85 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -37,7 +37,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 488b7c8..bc2e5f6 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -41,7 +41,7 @@ CFLAGS += $(WERROR_FLAGS)
 
 EXPORT_MAP := rte_ether_version.map
 
-LIBABIVER := 4
+LIBABIVER := 5
 
 SRCS-y += rte_ethdev.c
 
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance
    2016-10-24  9:23  4% ` Wu, Jingjing
@ 2016-10-25  2:53  4% ` Xing, Beilei
  2016-11-11  1:26  4% ` Zhang, Helin
  2 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2016-10-25  2:53 UTC (permalink / raw)
  To: Yang, Qiming, dev; +Cc: Yang, Qiming


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qiming Yang
> Sent: Sunday, October 9, 2016 11:17 AM
> To: dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app
> enhance
> 
> This patch adds a notice that the ABI change for ethtool app to get the NIC
> firmware version in the 17.02 release.
> 
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 845d2aa..60bd7ed 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -62,3 +62,7 @@ Deprecation Notices
>  * API will change for ``rte_port_source_params`` and
> ``rte_port_sink_params``
>    structures. The member ``file_name`` data type will be changed from
>    ``char *`` to ``const char *``. This change targets release 16.11.
> +
> +* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
> +  will be extended with a new member ``fw_version`` in order to store
> +  the NIC firmware version.
> --
> 2.7.4

Acked-by: Beilei Xing<beilei.xing@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance
  @ 2016-10-24  9:23  4% ` Wu, Jingjing
  2016-10-25  2:53  4% ` Xing, Beilei
  2016-11-11  1:26  4% ` Zhang, Helin
  2 siblings, 0 replies; 200+ results
From: Wu, Jingjing @ 2016-10-24  9:23 UTC (permalink / raw)
  To: Yang, Qiming, dev; +Cc: Yang, Qiming



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qiming Yang
> Sent: Sunday, October 9, 2016 11:17 AM
> To: dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance
> 
> This patch adds a notice that the ABI change for ethtool app to
> get the NIC firmware version in the 17.02 release.
> 
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 845d2aa..60bd7ed 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -62,3 +62,7 @@ Deprecation Notices
>  * API will change for ``rte_port_source_params`` and ``rte_port_sink_params``
>    structures. The member ``file_name`` data type will be changed from
>    ``char *`` to ``const char *``. This change targets release 16.11.
> +
> +* In 17.02 ABI change is planned: the ``rte_eth_dev_info`` structure
> +  will be extended with a new member ``fw_version`` in order to store
> +  the NIC firmware version.
> --
> 2.7.4

Acked-by: Jingjing Wu <jingjing.wu@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization
  @ 2016-10-19 11:13  0%               ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2016-10-19 11:13 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon
  Cc: dev, viktorin, David Marchand, Hemant Agrawal

Hi Ferruh,

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Tuesday, October 18, 2016 2:53 PM
> To: Shreyansh Jain <shreyansh.jain@nxp.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; viktorin@rehivetech.com; David Marchand
> <david.marchand@6wind.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
> Subject: Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device
> generalization
> 
> On 10/17/2016 6:29 PM, Shreyansh Jain wrote:
> > Hi Ferruh,
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> >> Sent: Monday, October 17, 2016 7:13 PM
> >> To: Shreyansh Jain <shreyansh.jain@nxp.com>; Thomas Monjalon
> >> <thomas.monjalon@6wind.com>
> >> Cc: dev@dpdk.org; viktorin@rehivetech.com; David Marchand
> >> <david.marchand@6wind.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
> >> Subject: Re: [dpdk-dev] [PATCH v11 00/24] Introducing
> rte_driver/rte_device
> >> generalization
> >>
> >> On 10/5/2016 12:57 PM, Shreyansh Jain wrote:
> >>> Hi Thomas,
> >>>
> >>> On Tuesday 04 October 2016 01:12 PM, Thomas Monjalon wrote:
> >>>> 2016-10-04 12:21, Shreyansh Jain:
> >>>>> Hi Thomas,
> >>>>>
> >>>>> On Monday 03 October 2016 07:58 PM, Thomas Monjalon wrote:
> >>>>>> Applied, thanks everybody for the great (re)work!
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>> [...]
> >>> [...]
> >>>>>
> >>>>> It can be merged with changes for:
> >>>>>   - drv_name
> >>>>>   - EAL_ before _REGISTER_ macros
> >>>>>   - eth_driver => rte_driver naming
> >>>>
> >>>> Good.
> >>>> Could you make it this week, please?
> >>>>
> >>>
> >>> Certainly. At least some of those I can send within this week :)
> >>>
> >>
> >>
> >> I caught while running ABI validation script today, I think this patch
> >> should increase LIBABIVER of:
> >> - lib/librte_cryptodev
> >> - lib/librte_eal
> >> - lib/librte_ether
> >
> > Should I be referring to [1] for understanding how/when to change the
> LIBABIVER?
> >
> > [1] http://dpdk.org/doc/guides/contributing/versioning.html
> 
> Yes, this is the document.
> 
> Briefly, if library becomes incompatible with existing applications,
> LIBABIVER needs to be increased to indicate this.
> 
> Increasing LIBABIVER changes dynamic library name and so_name, and this
> cause existing application do not work with this new library. Not
> increasing the LIBABIVER, app may start running but can have segfault or
> can generate wrong values. So increasing LIBABIVER is to inform user and
> prevent surprises.

Thanks for explanation. I understand.
I will send across another patch for this.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet
  @ 2016-10-19 10:14  3%     ` Pattan, Reshma
  0 siblings, 0 replies; 200+ results
From: Pattan, Reshma @ 2016-10-19 10:14 UTC (permalink / raw)
  To: Olivier Matz, Oleg Kuporosov; +Cc: Richardson, Bruce, Ananyev, Konstantin, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, October 18, 2016 4:44 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the
> packet
> 
> 
> 
> On 10/13/2016 04:35 PM, Oleg Kuporosov wrote:
> > The hard requirement of financial services industry is accurate
> > timestamping aligned with the packet itself. This patch is to satisfy
> > this requirement:
> >
> > - include uint64_t timestamp field into rte_mbuf with minimal impact to
> >   throughput/latency. Keep it just simple uint64_t in ns (more than 580
> >   years) would be enough for immediate needs while using full
> >   struct timespec with twice bigger size would have much stronger
> >   performance impact as missed cacheline0.
> >
> > - it is possible as there is 6-bytes gap in 1st cacheline (fast path)
> >   and moving uint16_t vlan_tci_outer field to 2nd cacheline.
> >
> > - such move will only impact for pretty rare usable VLAN RX stripping
> >   mode for outer TCI (it used only for one NIC i40e from the whole set and
> >   allows to keep minimal performance impact for RX/TX timestamps.
> 
> This argument is difficult to accept. One can say you are adding a field for a
> pretty rare case used by only one NIC :)
> 
> Honestly, I'm not able to judge whether timestamp is more important than
> vlan_tci_outer. As room is tight in the first cache line, your patch submission is
> the occasion to raise the question: how to decide what should be in the first part
> of the mbuf? There are also some other candidates for moving: m->seqn is only
> used in librte_reorder and it is not set in the RX part of a driver.
> 
> About the timestamp, it would be valuable to have other opinions, not only
> about the placement of the field in the structure, but also to check that this API
> is also usable for other NICs.
> 
> Have you measured the impact of having the timestamp in the second part of
> the mbuf?
> 
> Changing the mbuf structure should happen as rarely as possible, and we have to
> make sure we take the correct decisions. I think we will discuss this at dpdk
> userland 2016.
> 
> 

I just read this mail chain, to make every one aware again, I am emphasizing on the point that I am also adding new "time_arraived" field 
to mbuf struct as part of  below 17.02 Road map item. 

" Extended Stats (Latency and Bit Rate Statistics): Enhance the Extended NIC Stats (Xstats) implementation to support the collection and reporting of latency and bit rate measurements. Latency statistics will include min, max and average latency, and jitter. Bit rate statistics will include peak and average bit rate aggregated over a user-defined time period. This will be implemented for IXGBE and I40E."

 Adding new field  " time_arrived" to the 2nd cache line of rte_mbuf struct to mark the packet arrival time for latency measurements. 
Adding this new filed to second cache line will not break the ABI. I implemented a new library to mark the time as part of rte_eth_rx callbacks
and use that time stamp inside rte_eth_tx callback for measuring the latency. Below is the latest v3 RFC patch for reference. 
http://dpdk.org/dev/patchwork/patch/16655/

Comments are welcome.

Thanks,
Reshma

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization
  2016-10-18 13:04  0%           ` Neil Horman
@ 2016-10-18 13:47  3%             ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2016-10-18 13:47 UTC (permalink / raw)
  To: Neil Horman, Ferruh Yigit, Shreyansh Jain
  Cc: dev, viktorin, David Marchand, hemant.agrawal

2016-10-18 09:04, Neil Horman:
> On Mon, Oct 17, 2016 at 02:43:12PM +0100, Ferruh Yigit wrote:
> > I caught while running ABI validation script today, I think this patch
> > should increase LIBABIVER of:
> > - lib/librte_cryptodev
> > - lib/librte_eal
> > - lib/librte_ether
> > 
> It should, and it in fact should wait a release if it hadn't been documented as
> a comming change.

It was announced in release 16.04:
	http://dpdk.org/commit/c7970af

While bumping LIBABIVER, the deprecation notices must be removed,
and the API/ABI changes noted in the release notes, please.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether
@ 2016-10-18 13:38  4% Bernard Iremonger
  2016-11-04 13:39  4% ` Mcnamara, John
  2016-11-10 10:26  4% ` Pattan, Reshma
  0 siblings, 2 replies; 200+ results
From: Bernard Iremonger @ 2016-10-18 13:38 UTC (permalink / raw)
  To: dev, john.mcnamara; +Cc: Bernard Iremonger

In 17.02 five rte_eth_dev_set_vf_*** functions will be removed
from librte_ether, renamed and moved to the ixgbe PMD.

Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1d274d8..20e11ac 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -53,3 +53,39 @@ Deprecation Notices
 * librte_ether: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
+
+* librte_ether: for 17.02 it is planned to deprecate the following five functions:
+
+  ``rte_eth_dev_set_vf_rxmode``
+
+  ``rte_eth_dev_set_vf_rx``
+
+  ``rte_eth_dev_set_vf_tx``
+
+  ``rte_eth_dev_set_vf_vlan_filter``
+
+  ``rte_eth_set_vf_rate_limit``
+
+  The following fields will be removed from ``struct eth_dev_ops``:
+
+  ``eth_set_vf_rx_mode_t``
+
+  ``eth_set_vf_rx_t``
+
+  ``eth_set_vf_tx_t``
+
+  ``eth_set_vf_vlan_filter_t``
+
+  ``eth_set_vf_rate_limit_t``
+
+  The functions will be renamed to the following, and moved to the ``ixgbe`` PMD.
+
+  ``rte_pmd_ixgbe_set_vf_rxmode``
+
+  ``rte_pmd_ixgbe_set_vf_rx``
+
+  ``rte_pmd_ixgbe_set_vf_tx``
+
+  ``rte_pmd_ixgbe_set_vf_vlan_filter``
+
+  ``rte_pmd_ixgbe_set_vf_rate_limit``
-- 
2.10.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization
    @ 2016-10-18 13:04  0%           ` Neil Horman
  2016-10-18 13:47  3%             ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Neil Horman @ 2016-10-18 13:04 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Shreyansh Jain, Thomas Monjalon, dev, viktorin, David Marchand,
	hemant.agrawal

On Mon, Oct 17, 2016 at 02:43:12PM +0100, Ferruh Yigit wrote:
> On 10/5/2016 12:57 PM, Shreyansh Jain wrote:
> > Hi Thomas,
> > 
> > On Tuesday 04 October 2016 01:12 PM, Thomas Monjalon wrote:
> >> 2016-10-04 12:21, Shreyansh Jain:
> >>> Hi Thomas,
> >>>
> >>> On Monday 03 October 2016 07:58 PM, Thomas Monjalon wrote:
> >>>> Applied, thanks everybody for the great (re)work!
> >>>
> >>> Thanks!
> >>>
> > [...]
> > [...]
> >>>
> >>> It can be merged with changes for:
> >>>   - drv_name
> >>>   - EAL_ before _REGISTER_ macros
> >>>   - eth_driver => rte_driver naming
> >>
> >> Good.
> >> Could you make it this week, please?
> >>
> > 
> > Certainly. At least some of those I can send within this week :)
> > 
> 
> 
> I caught while running ABI validation script today, I think this patch
> should increase LIBABIVER of:
> - lib/librte_cryptodev
> - lib/librte_eal
> - lib/librte_ether
> 
It should, and it in fact should wait a release if it hadn't been documented as
a comming change.
Neil

> Thanks,
> ferruh
> 

^ permalink raw reply	[relevance 0%]

Results 11201-11400 of ~18000   |  | reverse | sort options + mbox downloads above
-- links below jump to the message on this page --
2016-06-21 12:02     [dpdk-dev] [PATCH v4 00/17] prepare for rte_device / rte_driver Shreyansh Jain
2016-10-03 14:28     ` [dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization Thomas Monjalon
2016-10-04  6:51       ` Shreyansh Jain
2016-10-04  7:42         ` Thomas Monjalon
2016-10-05 11:57           ` Shreyansh Jain
2016-10-17 13:43             ` Ferruh Yigit
2016-10-17 17:29               ` Shreyansh Jain
2016-10-18  9:23                 ` Ferruh Yigit
2016-10-19 11:13  0%               ` Shreyansh Jain
2016-10-18 13:04  0%           ` Neil Horman
2016-10-18 13:47  3%             ` Thomas Monjalon
2016-08-17 12:34     [dpdk-dev] Best Practices for PMD Verification before Upstream Requests Shepard Siegel
2016-08-22 13:07     ` Thomas Monjalon
2016-11-02 12:21  0%   ` Shepard Siegel
2016-08-19 19:32     [dpdk-dev] [RFC v2] Generic flow director/filtering/classification API Adrien Mazarguil
2016-11-16 16:23     ` [dpdk-dev] [PATCH 00/22] Generic flow API (rte_flow) Adrien Mazarguil
2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API Adrien Mazarguil
2016-11-18  6:36  0%     ` Xing, Beilei
2016-11-18 10:28  3%       ` Adrien Mazarguil
2016-11-30 17:47  0%     ` Kevin Traynor
2016-12-01  8:36  2%       ` Adrien Mazarguil
2016-12-02 21:06  0%         ` Kevin Traynor
2016-12-06 18:11  0%           ` Chandran, Sugesh
2016-12-08 17:07  3%           ` Adrien Mazarguil
2016-12-14 11:48  0%             ` Kevin Traynor
2016-12-14 13:54  0%               ` Adrien Mazarguil
2016-12-08  9:00  0%     ` Xing, Beilei
2016-11-16 16:23  2%   ` [dpdk-dev] [PATCH 02/22] cmdline: add support for dynamic tokens Adrien Mazarguil
2016-12-16 16:24       ` [dpdk-dev] [PATCH v2 00/25] Generic flow API (rte_flow) Adrien Mazarguil
2016-12-16 16:24  2%     ` [dpdk-dev] [PATCH v2 01/25] ethdev: introduce generic flow API Adrien Mazarguil
2016-12-16 16:24  1%     ` [dpdk-dev] [PATCH v2 02/25] doc: add rte_flow prog guide Adrien Mazarguil
2016-12-16 16:25  2%     ` [dpdk-dev] [PATCH v2 04/25] cmdline: add support for dynamic tokens Adrien Mazarguil
2016-12-19 17:48         ` [dpdk-dev] [PATCH v3 00/25] Generic flow API (rte_flow) Adrien Mazarguil
2016-12-19 17:48  2%       ` [dpdk-dev] [PATCH v3 01/25] ethdev: introduce generic flow API Adrien Mazarguil
2016-12-19 17:48  1%       ` [dpdk-dev] [PATCH v3 02/25] doc: add rte_flow prog guide Adrien Mazarguil
2016-12-19 17:48  2%       ` [dpdk-dev] [PATCH v3 04/25] cmdline: add support for dynamic tokens Adrien Mazarguil
2016-12-20 18:42           ` [dpdk-dev] [PATCH v4 00/25] Generic flow API (rte_flow) Adrien Mazarguil
2016-12-20 18:42  2%         ` [dpdk-dev] [PATCH v4 01/25] ethdev: introduce generic flow API Adrien Mazarguil
2016-12-20 18:42  1%         ` [dpdk-dev] [PATCH v4 02/25] doc: add rte_flow prog guide Adrien Mazarguil
2016-12-20 18:42  2%         ` [dpdk-dev] [PATCH v4 04/25] cmdline: add support for dynamic tokens Adrien Mazarguil
2016-12-21 14:51             ` [dpdk-dev] [PATCH v5 00/26] Generic flow API (rte_flow) Adrien Mazarguil
2016-12-21 14:51  2%           ` [dpdk-dev] [PATCH v5 01/26] ethdev: introduce generic flow API Adrien Mazarguil
2016-12-21 14:51  1%           ` [dpdk-dev] [PATCH v5 02/26] doc: add rte_flow prog guide Adrien Mazarguil
2016-12-21 14:51  2%           ` [dpdk-dev] [PATCH v5 04/26] cmdline: add support for dynamic tokens Adrien Mazarguil
2016-08-26 10:08     [dpdk-dev] [PATCH] ethdev: fix statistics description Wei Dai
2016-10-04  9:34     ` Thomas Monjalon
2016-11-02  8:28  3%   ` Dai, Wei
2016-11-02  9:07  3%     ` Mcnamara, John
2016-11-03  2:00  0%       ` Remy Horton
2016-11-03  9:07  0%         ` Morten Brørup
2016-11-02  9:21  0%     ` Morten Brørup
2016-11-08 13:33  0%     ` Tahhan, Maryam
2016-09-01  2:16     [dpdk-dev] [RFC] igb_uio: deprecate iomem and ioport mapping Jianfeng Tan
2016-09-22  5:44     ` [dpdk-dev] [PATCH] doc: remove iomem and ioport handling in igb_uio Jianfeng Tan
2016-11-11  2:12  3%   ` Remy Horton
2016-09-23 11:22     [dpdk-dev] [PATCH] doc: announce ABI changes in filtering support Your Name
2016-11-02 15:12 14% ` Stroe, Laura
2016-11-03 11:42  7%   ` Mcnamara, John
2016-10-09  3:16     [dpdk-dev] [PATCH] doc: announce ABI change for ethtool app enhance Qiming Yang
2016-10-24  9:23  4% ` Wu, Jingjing
2016-10-25  2:53  4% ` Xing, Beilei
2016-11-11  1:26  4% ` Zhang, Helin
2016-11-13 13:57  4%   ` Thomas Monjalon
2016-10-13 14:35     [dpdk-dev] [PATCH 0/3] Improvements in packet timestamps support Oleg Kuporosov
2016-10-13 14:35     ` [dpdk-dev] [PATCH 1/3] mbuf: embedding timestamp into the packet Oleg Kuporosov
2016-10-18 15:43       ` Olivier Matz
2016-10-19 10:14  3%     ` Pattan, Reshma
2016-10-18 13:38  4% [dpdk-dev] [PATCH v1] doc: announce API and ABI change for librte_ether Bernard Iremonger
2016-11-04 13:39  4% ` Mcnamara, John
2016-11-10 10:36  4%   ` Ferruh Yigit
2016-11-13 13:46  4%     ` Thomas Monjalon
2016-11-10 10:26  4% ` Pattan, Reshma
2016-10-24 11:59     [dpdk-dev] [PATCH v5 00/21] Introduce SoC device/driver framework for EAL Shreyansh Jain
2016-10-27 15:17  2% ` [dpdk-dev] [PATCH v6 " Shreyansh Jain
2016-10-28 12:26  2%   ` [dpdk-dev] [PATCH v7 " Shreyansh Jain
2016-10-28 12:35  0%     ` Shreyansh Jain
2016-10-24 15:49     [dpdk-dev] mbuf changes Morten Brørup
2016-10-24 16:11     ` Wiles, Keith
2016-10-24 16:25       ` Bruce Richardson
2016-10-25  9:39         ` Adrien Mazarguil
2016-10-25 10:11           ` Morten Brørup
2016-10-25 11:04             ` Adrien Mazarguil
2016-10-25 11:13               ` Bruce Richardson
2016-10-25 12:16                 ` Morten Brørup
2016-10-25 12:20                   ` Bruce Richardson
2016-10-25 12:33                     ` Morten Brørup
2016-10-25 12:45                       ` Bruce Richardson
2016-10-25 12:48                         ` Olivier Matz
2016-10-28 13:34  3%                       ` Pattan, Reshma
2016-10-28 14:11  0%                         ` Morten Brørup
2016-10-28 15:25  0%                           ` Pattan, Reshma
2016-10-28 16:50  0%                           ` Adrien Mazarguil
2016-10-28 17:00  0%                             ` Richardson, Bruce
2016-10-28 20:27  0%                               ` Morten Brørup
2016-10-26 12:38  4% [dpdk-dev] [PATCH] eal: fix libabi macro for device generalization patches Shreyansh Jain
2016-10-26 13:00  4% ` [dpdk-dev] [PATCH v2] " Shreyansh Jain
2016-10-26 13:12  0%   ` Shreyansh Jain
2016-10-27  7:08  7%   ` [dpdk-dev] [PATCH v3] " Shreyansh Jain
2016-10-27 10:15  0%     ` Thomas Monjalon
2016-10-27 11:10  0%       ` Shreyansh Jain
2016-10-27 11:29  9%     ` [dpdk-dev] [PATCH v4] eal: fix lib version " Shreyansh Jain
2016-10-27 11:32  4%       ` Shreyansh Jain
2016-10-27 12:16  4%         ` Thomas Monjalon
2016-10-28  1:04     [dpdk-dev] [PATCH v2 0/3] expanded statistic reporting Remy Horton
2016-10-28  1:04  2% ` [dpdk-dev] [RFC PATCH v2 1/3] lib: add information metrics library Remy Horton
2016-10-28  1:04  2% ` [dpdk-dev] [RFC PATCH v2 2/3] lib: add bitrate statistics library Remy Horton
     [not found]     <DF2A19295B96364286FEB7F3DDA27A46661A797A@SHSMSX101.ccr.corp.intel.com>
     [not found]     ` <6A0DE07E22DDAD4C9103DF62FEBC09093934068B@shsmsx102.ccr.corp.intel.com>
2016-11-02 15:21  3%   ` [dpdk-dev] dpdk16.11 RC2 package ipv4 reassembly example can't work Adrien Mazarguil
2016-11-04  6:36  0%     ` Lu, Wenzhuo
2016-11-04 10:20  0%       ` Adrien Mazarguil
2016-11-04  3:36     [dpdk-dev] [PATCH v3 0/3] Expanded statistics reporting Remy Horton
2016-11-04  3:36  2% ` [dpdk-dev] [PATCH v3 1/3] lib: add information metrics library Remy Horton
2016-11-04  3:36  3% ` [dpdk-dev] [PATCH v3 2/3] lib: add bitrate statistics library Remy Horton
2016-11-06 18:21 13% [dpdk-dev] [PATCH] ethdev: rename library for consistency Thomas Monjalon
2016-11-06 19:54  0% ` Thomas Monjalon
2016-11-07 13:14     [dpdk-dev] [PATCH v4] latencystats: added new library for latency stats Reshma Pattan
2016-11-15 13:37  1% ` [dpdk-dev] [PATCH v5] " Reshma Pattan
2016-11-09 15:04  2% [dpdk-dev] [PATCH] net: introduce big and little endian types Nelio Laranjeiro
2016-12-05 10:09  0% ` Ananyev, Konstantin
2016-12-05 12:06  0%   ` Nélio Laranjeiro
2016-12-06 11:23  0%     ` Ananyev, Konstantin
2016-12-06 11:55  0%       ` Bruce Richardson
2016-12-06 12:41  0%         ` Ananyev, Konstantin
2016-12-06 13:34  0%           ` Bruce Richardson
2016-12-06 14:45  3%             ` Ananyev, Konstantin
2016-12-06 14:56  4%               ` Wiles, Keith
2016-12-06 15:34                     ` Morten Brørup
2016-12-06 16:28                       ` Nélio Laranjeiro
2016-12-06 16:31  3%                     ` Wiles, Keith
2016-12-06 16:36  4%                       ` Richardson, Bruce
2016-12-06 13:14  0%         ` Nélio Laranjeiro
2016-12-06 13:30  0%           ` Bruce Richardson
2016-12-06 14:06  0%     ` Wiles, Keith
2016-12-08  9:30  3% ` Nélio Laranjeiro
2016-12-08 13:59  3%   ` Wiles, Keith
2016-12-08 15:07  0%   ` Neil Horman
2016-12-08 15:10  0%     ` Ananyev, Konstantin
2016-11-09 16:12 15% [dpdk-dev] [PATCH] doc: postpone ABI changes for mbuf Olivier Matz
2016-11-09 22:16  4% ` Thomas Monjalon
2016-11-09 22:31 21% [dpdk-dev] [PATCH] doc: postpone ABI changes for Tx prepare Thomas Monjalon
2016-11-10 10:16  4% ` Mcnamara, John
2016-11-10 10:26  4% ` Kulasek, TomaszX
2016-11-10 23:33  4%   ` Thomas Monjalon
2016-11-10 11:15  4% ` Ananyev, Konstantin
2016-11-10  7:26     [dpdk-dev] Clarification for eth_driver changes Shreyansh Jain
2016-11-10  8:03     ` Thomas Monjalon
2016-11-10  8:42       ` Shreyansh Jain
2016-11-10  8:58         ` Thomas Monjalon
2016-11-10  9:20           ` Jianbo Liu
2016-11-10 10:51  3%         ` Stephen Hemminger
2016-11-10 11:07  0%           ` Thomas Monjalon
2016-11-10 11:09  0%           ` Shreyansh Jain
2016-11-10 11:17  9% [dpdk-dev] [PATCH] doc: announce API and ABI changes for librte_eal Shreyansh Jain
2016-11-10 15:51  4% ` David Marchand
2016-11-11 13:05  4%   ` Ferruh Yigit
2016-11-11 15:02  4%     ` Pattan, Reshma
2016-11-13  9:02  4%       ` Thomas Monjalon
2016-11-14 12:31  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 17.02 John McNamara
2016-11-15  7:15     [dpdk-dev] [PATCH v4 0/3] Expanded statistics reporting Remy Horton
2016-11-15  7:15  2% ` [dpdk-dev] [PATCH v4 1/3] lib: add information metrics library Remy Horton
2016-11-15  7:15  3% ` [dpdk-dev] [PATCH v4 2/3] lib: add bitrate statistics library Remy Horton
2016-11-17  5:29     [dpdk-dev] [RFC PATCH 0/6] Restructure EAL device model for bus support Shreyansh Jain
2016-11-17  5:30     ` [dpdk-dev] [RFC PATCH 6/6] eal: removing eth_driver Shreyansh Jain
2016-11-17 12:53  4%   ` Jan Blunck
2016-11-18 13:05  3%     ` Shreyansh Jain
2016-11-17  9:42     [dpdk-dev] [PATCH 1/5] ethdev: add firmware version get Qiming Yang
2016-12-06  7:16     ` [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw " Qiming Yang
2016-12-06  7:16       ` [dpdk-dev] [PATCH v2 1/5] ethdev: add firmware " Qiming Yang
2016-12-08 11:07  3%     ` Ferruh Yigit
2016-12-12  1:28  4%       ` Yang, Qiming
2016-12-08  8:34       ` [dpdk-dev] [PATCH v2 0/5] example/ethtool: add bus info and fw " Remy Horton
2016-12-22 11:07         ` Thomas Monjalon
2016-12-22 14:36  5%       ` Ferruh Yigit
2016-12-22 14:47  3%         ` Thomas Monjalon
2016-12-22 15:05  0%           ` Ferruh Yigit
2016-12-22 15:31  0%             ` Thomas Monjalon
2016-12-23 12:48  0%               ` Ferruh Yigit
2017-01-05  3:04  3%               ` Zhang, Helin
2016-12-27 12:30       ` [dpdk-dev] [PATCH v3 0/4] new API 'rte_eth_dev_fw_info_get' Qiming Yang
2016-12-27 12:30         ` [dpdk-dev] [PATCH v3 2/4] net/e1000: add firmware version get Qiming Yang
2017-01-03 15:02           ` Ferruh Yigit
2017-01-04  3:14             ` Yang, Qiming
2017-01-04  8:47  4%           ` Ferruh Yigit
2017-01-04 12:03         ` [dpdk-dev] [PATCH v4 0/5] new API 'rte_eth_dev_fw_version_get' Qiming Yang
2017-01-04 12:03  5%       ` [dpdk-dev] [PATCH v4 1/5] ethdev: add firmware version get Qiming Yang
2017-01-08  4:11           ` [dpdk-dev] [PATCH v5 0/5] new API 'rte_eth_dev_fw_version_get' Qiming Yang
2017-01-08  4:11  5%         ` [dpdk-dev] [PATCH v5 1/5] ethdev: add firmware version get Qiming Yang
2017-01-08  6:38  0%           ` Andrew Rybchenko
2017-01-10  9:00  5%         ` [dpdk-dev] [DPDK " Qiming Yang
2017-01-10  9:08             ` [dpdk-dev] [PATCH v6 0/5] new API 'rte_eth_dev_fw_version_get' Qiming Yang
2017-01-10  9:08  5%           ` [dpdk-dev] [PATCH v6 1/5] ethdev: add firmware version get Qiming Yang
2017-01-11  6:41                 ` [dpdk-dev] [PATCH v7 0/5] new API 'rte_eth_dev_fw_version_get' Qiming Yang
2017-01-11  6:41  5%               ` [dpdk-dev] [PATCH v7 1/5] ethdev: add firmware version get Qiming Yang
2017-01-12  6:31                   ` [dpdk-dev] [PATCH v8 0/5] new API 'rte_eth_dev_fw_version_get' Qiming Yang
2017-01-12  6:31  5%                 ` [dpdk-dev] [PATCH v8 1/5] ethdev: add firmware version get Qiming Yang
2016-11-18  5:44     [dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model Jerin Jacob
2016-12-06  3:52     ` [dpdk-dev] [PATCH v2 0/6] libeventdev API and northbound implementation Jerin Jacob
2016-12-06  3:52       ` [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model Jerin Jacob
2016-12-07 10:57  3%     ` Van Haaren, Harry
2016-12-08  1:24  3%       ` Jerin Jacob
2016-12-08 11:02  4%         ` Van Haaren, Harry
2016-12-14 13:13  3%           ` Jerin Jacob
2016-12-14 15:15  0%             ` Bruce Richardson
2016-11-18  8:00     [dpdk-dev] [PATCH v5 0/4] Expanded statistics reporting Remy Horton
2016-11-18  8:00  2% ` [dpdk-dev] [PATCH v5 1/4] lib: add information metrics library Remy Horton
2016-11-18  8:00  3% ` [dpdk-dev] [PATCH v5 2/4] lib: add bitrate statistics library Remy Horton
2016-11-20  8:00     [dpdk-dev] [PATCH] eal: postpone vdev initialization Jerin Jacob
2016-12-03 20:55     ` [dpdk-dev] [PATCH v2 0/2] " Jerin Jacob
2016-12-03 20:55       ` [dpdk-dev] [PATCH v2 2/2] eal: rename dev init API for consistency Jerin Jacob
2016-12-05 10:12         ` Shreyansh Jain
2016-12-05 10:24  3%       ` Jerin Jacob
2016-12-05 14:03  0%         ` Shreyansh Jain
2016-12-02 11:53     [dpdk-dev] [PATCH 00/24] net/i40e: Consistent filter API Beilei Xing
2016-12-02 11:53     ` [dpdk-dev] [PATCH 10/24] ethdev: parse ethertype filter Beilei Xing
2016-12-20 18:12       ` Ferruh Yigit
2016-12-21  3:54         ` Xing, Beilei
2016-12-23  8:43  3%       ` Adrien Mazarguil
2016-12-27  6:36  0%         ` Xing, Beilei
2016-12-03 15:14  3% [dpdk-dev] Intent to upstream Atomic Rules net/ark "Arkville" in DPDK 17.05 Shepard Siegel
2016-12-05 14:10  0% ` Ferruh Yigit
2016-12-04 11:33     [dpdk-dev] [PATCH] Add crypto PMD optimized for ARMv8 zbigniew.bodek
2016-12-06 20:27     ` [dpdk-dev] [PATCH v2 02/12] lib: add cryptodev type for the upcoming ARMv8 PMD Thomas Monjalon
2016-12-07 19:04       ` Zbigniew Bodek
2016-12-07 20:09         ` Thomas Monjalon
2016-12-09 12:06  3%       ` Declan Doherty
2016-12-07 18:04     [dpdk-dev] [RFC] pci: remove unused UNBIND support Stephen Hemminger
2016-12-08 10:53  3% ` David Marchand
2016-12-21 15:15  0%   ` Thomas Monjalon
2016-12-08  2:27  5% [dpdk-dev] [RFC] ethdev: expand size of eth_dev_name in next release Stephen Hemminger
2016-12-08 15:04  4% ` Thomas Monjalon
2016-12-08 15:51     [dpdk-dev] [PATCH v1 0/4] app: make python apps python2/3 compliant John McNamara
2016-12-21 15:03  4% ` [dpdk-dev] [PATCH v4 3/3] doc: add required python versions to docs John McNamara
2016-12-09 11:17  3% [dpdk-dev] [PATCH v1 0/5] net/ixgbe: move set VF functions Bernard Iremonger
2016-12-09 11:27  3% Bernard Iremonger
2016-12-09 11:54  0% ` Ferruh Yigit
2016-12-09 12:00  0%   ` Iremonger, Bernard
2016-12-09 17:25  3% ` [dpdk-dev] [PATCH v2 0/9] " Bernard Iremonger
2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 " Bernard Iremonger
2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 0/7] " Bernard Iremonger
2016-12-13 13:36  0%       ` Ferruh Yigit
2016-12-13 13:46  0%         ` Iremonger, Bernard
2016-12-13 11:40  3%     ` [dpdk-dev] [PATCH v4 6/7] librte_ether: remove the set VF API's Bernard Iremonger
2016-12-13 11:40  4%     ` [dpdk-dev] [PATCH v4 7/7] doc: update release notes Bernard Iremonger
2016-12-12 13:50  3%   ` [dpdk-dev] [PATCH v3 8/9] librte_ether: remove the set VF API's Bernard Iremonger
2016-12-09 17:26     ` [dpdk-dev] [PATCH v2 " Bernard Iremonger
2016-12-09 18:00  3%   ` Ferruh Yigit
2016-12-09 17:26  4% ` [dpdk-dev] [PATCH v2 9/9] doc: remove deprecation notice Bernard Iremonger
2016-12-13 10:03  4% [dpdk-dev] [PATCH] doc: fix required tools list layout Baruch Siach
2016-12-15 15:09  0% ` Mcnamara, John
2016-12-18 19:11  0%   ` Baruch Siach
2016-12-18 20:50  3%     ` Mcnamara, John
     [not found]     <415214732.17903310.1481728244157.JavaMail.zimbra@ulg.ac.be>
2016-12-14 15:13  3% ` [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst tom.barbette
2016-12-14 16:52  0%   ` Bruce Richardson
2016-12-17 10:43  0%     ` tom.barbette
2016-12-19 10:25  0%       ` Bruce Richardson
2016-12-14 23:40  3% [dpdk-dev] KNI broken again with 4.9 kernel Stephen Hemminger
2016-12-15 11:53  0% ` [dpdk-dev] KNI Questions Ferruh Yigit
2016-12-15 12:01  0% ` [dpdk-dev] KNI broken again with 4.9 kernel Mcnamara, John
2016-12-15 12:55  0%   ` Jay Rolette
2016-12-14 23:59  3% [dpdk-dev] [PATCH 0/2] support for Hyper-V VMBUS Stephen Hemminger
2016-12-15 21:59  4% [dpdk-dev] [PATCH 0/3] buildtools/devtools/usertools Thomas Monjalon
2016-12-15 21:59 32% ` [dpdk-dev] [PATCH 2/3] scripts: move to devtools Thomas Monjalon
2016-12-15 21:59  2% ` [dpdk-dev] [PATCH 3/3] tools: move to usertools Thomas Monjalon
2016-12-16 12:48     [dpdk-dev] [PATCH 0/3] New API to free consumed buffers in TX ring Billy McFall
2016-12-16 12:48     ` [dpdk-dev] [PATCH 1/3] ethdev: " Billy McFall
2016-12-20 11:27       ` Adrien Mazarguil
2016-12-20 12:17         ` Ananyev, Konstantin
2016-12-20 12:58           ` Adrien Mazarguil
2016-12-20 14:15             ` Billy McFall
2016-12-23  9:45  4%           ` Adrien Mazarguil
2016-12-16 14:38     [dpdk-dev] [PATCH v4 00/29] Support VFD and DPDK PF + kernel VF on i40e Ferruh Yigit
2016-12-16 19:02     ` [dpdk-dev] [PATCH v5 " Ferruh Yigit
2016-12-16 19:02       ` [dpdk-dev] [PATCH v5 29/29] net/i40e: set/clear VF stats from PF Ferruh Yigit
2016-12-20 13:24         ` Ferruh Yigit
2016-12-20 13:40           ` Iremonger, Bernard
2016-12-21  0:56  3%         ` Lu, Wenzhuo
2016-12-22 16:38  0%           ` Iremonger, Bernard
2016-12-19 19:28 11% [dpdk-dev] [PATCH v2] doc: fix required tools list layout Baruch Siach
2016-12-19 21:59     [dpdk-dev] [RFC v2 00/13] Generalize rte_eth_dev model Stephen Hemminger
2016-12-19 21:59 16% ` [dpdk-dev] [PATCH 06/13] ethdev: make dev_info generic (not just PCI) Stephen Hemminger
2016-12-20 11:20  0%   ` Jan Blunck
2016-12-22 11:53     [dpdk-dev] [PATCH v2] ethdev: cleanup device ops struct whitespace Ferruh Yigit
2016-12-22 13:10     ` [dpdk-dev] [PATCH v3] " Ferruh Yigit
2016-12-22 15:10  3%   ` Jan Blunck
2016-12-22 15:16  3%     ` Ferruh Yigit
2016-12-22 15:28  3%       ` Thomas Monjalon
2017-01-05 10:44  4% [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev Bernard Iremonger
2017-01-05 13:31  4% ` Thomas Monjalon
2017-01-05 14:40  4%   ` Iremonger, Bernard
2017-01-05 15:25  4% ` [dpdk-dev] [PATCH v2] " Bernard Iremonger
2017-01-11 15:05  3% [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Bruce Richardson
2017-01-11 15:05  2% ` [dpdk-dev] [RFC PATCH 01/11] ring: add new typed ring header file Bruce Richardson
2017-01-11 15:05  1% ` [dpdk-dev] [RFC PATCH 05/11] ring: add user-specified typing to typed rings Bruce Richardson
2017-01-11 15:05 12% ` [dpdk-dev] [RFC PATCH 07/11] ring: allow multiple typed rings in the same unit Bruce Richardson
2017-01-11 15:05  6% ` [dpdk-dev] [RFC PATCH 09/11] ring: make existing rings reuse the typed ring definitions Bruce Richardson
2017-01-13 14:23  3% ` [dpdk-dev] [RFC PATCH 00/11] generalise rte_ring to allow different datatypes Olivier Matz
2017-01-11 16:03     [dpdk-dev] [PATCH v6 0/4] Expanded statistics reporting Remy Horton
2017-01-11 16:03  2% ` [dpdk-dev] [PATCH v6 1/4] lib: add information metrics library Remy Horton
2017-01-11 16:03  3% ` [dpdk-dev] [PATCH v6 2/4] lib: add bitrate statistics library Remy Horton
2017-01-13 13:06  6% [dpdk-dev] [PATCH v1] doc: add guidelines on stable and lts releases John McNamara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).