DPDK patches and discussions
 help / color / mirror / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download: 
* [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated
  2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-26 16:09  7%     ` Andrew Rybchenko
  2018-03-26 16:09  6%     ` [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

Size of memory chunk required to populate mempool objects depends
on how objects are stored in the memory. Different mempool drivers
may have different requirements and a new operation allows to
calculate memory size in accordance with driver requirements and
advertise requirements on minimum memory chunk size and alignment
in a generic way.

Bump ABI version since the patch breaks it.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - clarify min_chunk_size meaning
 - rebase on top of patch series which fixes library version in meson
   build

RFCv2 -> v1:
 - move default calc_mem_size callback to rte_mempool_ops_default.c
 - add ABI changes to release notes
 - name default callback consistently: rte_mempool_op_<callback>_default()
 - bump ABI version since it is the first patch which breaks ABI
 - describe default callback behaviour in details
 - avoid introduction of internal function to cope with deprecation
   (keep it to deprecation patch)
 - move cache-line or page boundary chunk alignment to default callback
 - highlight that min_chunk_size and align parameters are output only

 doc/guides/rel_notes/deprecation.rst         |  3 +-
 doc/guides/rel_notes/release_18_05.rst       |  7 ++-
 lib/librte_mempool/Makefile                  |  3 +-
 lib/librte_mempool/meson.build               |  5 +-
 lib/librte_mempool/rte_mempool.c             | 43 +++++++-------
 lib/librte_mempool/rte_mempool.h             | 86 +++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c         | 18 ++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 38 ++++++++++++
 lib/librte_mempool/rte_mempool_version.map   |  7 +++
 9 files changed, 182 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6594585..e02d4ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,8 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize required memory chunk calculation,
-    customize objects population and allocate contiguous
+  - addition of new ops to customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index f2525bb..59583ea 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -80,6 +80,11 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Changed rte_mempool_ops structure.**
+
+  A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
+  to allow to customize required memory size calculation.
+
 
 Removed Items
 -------------
@@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.3
-     librte_mempool.so.3
+   + librte_mempool.so.4
    + librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..072740f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
 
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 712720f..9e3b527 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 3
-sources = files('rte_mempool.c', 'rte_mempool_ops.c')
+version = 4
+sources = files('rte_mempool.c', 'rte_mempool_ops.c',
+		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d8e3720..dd2d0fe 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -561,10 +561,10 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
-	size_t size, total_elt_sz, align, pg_sz, pg_shift;
+	ssize_t mem_size;
+	size_t align, pg_sz, pg_shift;
 	rte_iova_t iova;
 	unsigned mz_id, n;
-	unsigned int mp_flags;
 	int ret;
 
 	ret = mempool_ops_alloc_once(mp);
@@ -575,29 +575,23 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_flags;
-
 	if (rte_eal_has_hugepages()) {
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
-		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
-		align = pg_sz;
 	}
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		size_t min_chunk_size;
+
+		mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+				&min_chunk_size, &align);
+		if (mem_size < 0) {
+			ret = mem_size;
+			goto fail;
+		}
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -606,7 +600,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
+		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 			mp->socket_id, mz_flags, align);
 		/* not enough memory, retry with the biggest zone we have */
 		if (mz == NULL)
@@ -617,6 +611,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
+		if (mz->len < min_chunk_size) {
+			rte_memzone_free(mz);
+			ret = -ENOMEM;
+			goto fail;
+		}
+
 		if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG)
 			iova = RTE_BAD_IOVA;
 		else
@@ -649,13 +649,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 static size_t
 get_anon_size(const struct rte_mempool *mp)
 {
-	size_t size, total_elt_sz, pg_sz, pg_shift;
+	size_t size, pg_sz, pg_shift;
+	size_t min_chunk_size;
+	size_t align;
 
 	pg_sz = getpagesize();
 	pg_shift = rte_bsf32(pg_sz);
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+	size = rte_mempool_ops_calc_mem_size(mp, mp->size, pg_shift,
+					     &min_chunk_size, &align);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e531a15..191255d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -400,6 +400,62 @@ typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
 typedef int (*rte_mempool_ops_register_memory_area_t)
 (const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
 
+/**
+ * Calculate memory size required to store given number of objects.
+ *
+ * If mempool objects are not required to be IOVA-contiguous
+ * (the flag MEMPOOL_F_NO_IOVA_CONTIG is set), min_chunk_size defines
+ * virtually contiguous chunk size. Otherwise, if mempool objects must
+ * be IOVA-contiguous (the flag MEMPOOL_F_NO_IOVA_CONTIG is clear),
+ * min_chunk_size defines IOVA-contiguous chunk size.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+		uint32_t obj_num,  uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just a product of total
+ * object size including header and trailer and number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+		uint32_t obj_num, uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -416,6 +472,11 @@ struct rte_mempool_ops {
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
+	/**
+	 * Optional callback to calculate memory size required to
+	 * store specified number of objects.
+	 */
+	rte_mempool_calc_mem_size_t calc_mem_size;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -565,6 +626,29 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
 				char *vaddr, rte_iova_t iova, size_t len);
 
 /**
+ * @internal wrapper for mempool_ops calc_mem_size callback.
+ * API to calculate size of memory required to store specified number of
+ * object.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				      uint32_t obj_num, uint32_t pg_shift,
+				      size_t *min_chunk_size, size_t *align);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
@@ -1534,7 +1618,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * of objects. Assume that the memory buffer will be aligned at page
  * boundary.
  *
- * Note that if object size is bigger then page size, then it assumes
+ * Note that if object size is bigger than page size, then it assumes
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 0732255..26908cc 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_count = h->get_count;
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
+	ops->calc_mem_size = h->calc_mem_size;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -123,6 +124,23 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
 	return ops->register_memory_area(mp, vaddr, iova, len);
 }
 
+/* wrapper to notify new memory area to external mempool */
+ssize_t
+rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				uint32_t obj_num, uint32_t pg_shift,
+				size_t *min_chunk_size, size_t *align)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->calc_mem_size == NULL)
+		return rte_mempool_op_calc_mem_size_default(mp, obj_num,
+				pg_shift, min_chunk_size, align);
+
+	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
new file mode 100644
index 0000000..57fe79b
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 6WIND S.A.
+ * Copyright(c) 2018 Solarflare Communications Inc.
+ */
+
+#include <rte_mempool.h>
+
+ssize_t
+rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+				     uint32_t obj_num, uint32_t pg_shift,
+				     size_t *min_chunk_size, size_t *align)
+{
+	unsigned int mp_flags;
+	int ret;
+	size_t total_elt_sz;
+	size_t mem_size;
+
+	/* Get mempool capabilities */
+	mp_flags = 0;
+	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
+					 mp->flags | mp_flags);
+
+	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
+		*min_chunk_size = mem_size;
+	else
+		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+
+	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
+
+	return mem_size;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 62b76f9..cb38189 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -51,3 +51,10 @@ DPDK_17.11 {
 	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
+
+DPDK_18.05 {
+	global:
+
+	rte_mempool_op_calc_mem_size_default;
+
+} DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions
  2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
                       ` (2 preceding siblings ...)
  2018-03-26 16:09  6%     ` [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
@ 2018-03-26 16:09  4%     ` Andrew Rybchenko
  3 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Thomas Monjalon

Move rte_mempool_xmem_size() code to internal helper function
since it is required in two places: deprecated rte_mempool_xmem_size()
and non-deprecated rte_mempool_op_calc_mem_size_default().

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - deprecate rte_mempool_populate_iova_tab()
 - add -Wno-deprecated-declarations to fix build errors because of
   rte_mempool_populate_iova_tab() deprecation
 - add @deprecated to deprecated functions description

RFCv2 -> v1:
 - advertise deprecation in release notes
 - factor out default memory size calculation into non-deprecated
   internal function to avoid usage of deprecated function internally
 - remove test for deprecated functions to address build issue because
   of usage of deprecated functions (it is easy to allow usage of
   deprecated function in Makefile, but very complicated in meson)

 doc/guides/rel_notes/deprecation.rst         |  7 -------
 doc/guides/rel_notes/release_18_05.rst       | 11 ++++++++++
 lib/librte_mempool/Makefile                  |  3 +++
 lib/librte_mempool/meson.build               | 12 +++++++++++
 lib/librte_mempool/rte_mempool.c             | 19 ++++++++++++++---
 lib/librte_mempool/rte_mempool.h             | 30 +++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops_default.c |  4 ++--
 test/test/test_mempool.c                     | 31 ----------------------------
 8 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4deed9a..473330d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -60,13 +60,6 @@ Deprecation Notices
   - ``rte_eal_mbuf_default_mempool_ops``
 
 * mempool: several API and ABI changes are planned in v18.05.
-  The following functions, introduced for Xen, which is not supported
-  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
-  Therefore they will be deprecated in v18.05 and removed in v18.08:
-
-  - ``rte_mempool_xmem_create``
-  - ``rte_mempool_xmem_size``
-  - ``rte_mempool_xmem_usage``
 
   The following changes are planned:
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index c50f26c..6a8db54 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -74,6 +74,17 @@ API Changes
   Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
   used to achieve it without specific knowledge in the generic code.
 
+* **Deprecated mempool xmem functions.**
+
+  The following functions, introduced for Xen, which is not supported
+  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
+  Therefore they were deprecated in v18.05 and will be removed in v18.08:
+
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+  - ``rte_mempool_populate_iova_tab``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 072740f..2c46fdd 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -7,6 +7,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_mempool.a
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+CFLAGS += -Wno-deprecated-declarations
 LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 9e3b527..22e912a 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,6 +1,18 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+extra_flags = []
+
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+extra_flags += '-Wno-deprecated-declarations'
+
+foreach flag: extra_flags
+	if cc.has_argument(flag)
+		cflags += flag
+	endif
+endforeach
+
 version = 4
 sources = files('rte_mempool.c', 'rte_mempool_ops.c',
 		'rte_mempool_ops_default.c')
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 40eedde..8c3b0b1 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Calculate maximum amount of memory required to store given number of objects.
+ * Internal function to calculate required memory chunk size shared
+ * by default implementation of the corresponding callback and
+ * deprecated external function.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
+rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+				 uint32_t pg_shift)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 
@@ -228,6 +230,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 }
 
 /*
+ * Calculate maximum amount of memory required to store given number of objects.
+ */
+size_t
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
+		      __rte_unused unsigned int flags)
+{
+	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
+						pg_shift);
+}
+
+/*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0b83d5e..9107f5a 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -427,6 +427,28 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal Helper function to calculate memory size required to store
+ * specified number of objects in assumption that the memory buffer will
+ * be aligned at page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * @param elt_num
+ *   Number of elements.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by rte_mempool_calc_obj_size().
+ * @param pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+size_t rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+		uint32_t pg_shift);
+
+/**
  * Function to be called for each populated object.
  *
  * @param[in] mp
@@ -855,6 +877,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 		   int socket_id, unsigned flags);
 
 /**
+ * @deprecated
  * Create a new mempool named *name* in memory.
  *
  * The pool contains n elements of elt_size. Its size is set to n.
@@ -912,6 +935,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. See rte_mempool_create() for details.
  */
+__rte_deprecated
 struct rte_mempool *
 rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 		unsigned cache_size, unsigned private_data_size,
@@ -1008,6 +1032,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 	void *opaque);
 
 /**
+ * @deprecated
  * Add physical memory for objects in the pool at init
  *
  * Add a virtually contiguous memory chunk in the pool where objects can
@@ -1033,6 +1058,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
  *   On error, the chunks are not added in the memory list of the
  *   mempool and a negative errno is returned.
  */
+__rte_deprecated
 int rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr,
 	const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift,
 	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);
@@ -1652,6 +1678,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 	struct rte_mempool_objsz *sz);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate the maximum amount of memory required to store given number
@@ -1674,10 +1701,12 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * @return
  *   Required memory size aligned at page boundary.
  */
+__rte_deprecated
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
 	uint32_t pg_shift, unsigned int flags);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate how much memory would be actually required with the given
@@ -1705,6 +1734,7 @@ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
  *   buffer is too small, return a negative value whose absolute value
  *   is the actual number of elements that can be stored in that buffer.
  */
+__rte_deprecated
 ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
 	uint32_t pg_shift, unsigned int flags);
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 3defc15..fd63ca1 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -16,8 +16,8 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags);
+	mem_size = rte_mempool_calc_mem_size_helper(obj_num, total_elt_sz,
+						    pg_shift);
 
 	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 63f921e..8d29af2 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -444,34 +444,6 @@ test_mempool_same_name_twice_creation(void)
 	return 0;
 }
 
-/*
- * Basic test for mempool_xmem functions.
- */
-static int
-test_mempool_xmem_misc(void)
-{
-	uint32_t elt_num, total_size;
-	size_t sz;
-	ssize_t usz;
-
-	elt_num = MAX_KEEP;
-	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
-	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
-
-	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
-		MEMPOOL_PG_SHIFT_MAX, 0);
-
-	if (sz != (size_t)usz)  {
-		printf("failure @ %s: rte_mempool_xmem_usage(%u, %u) "
-			"returns: %#zx, while expected: %#zx;\n",
-			__func__, elt_num, total_size, sz, (size_t)usz);
-		return -1;
-	}
-
-	return 0;
-}
-
 static void
 walk_cb(struct rte_mempool *mp, void *userdata __rte_unused)
 {
@@ -596,9 +568,6 @@ test_mempool(void)
 	if (test_mempool_same_name_twice_creation() < 0)
 		goto err;
 
-	if (test_mempool_xmem_misc() < 0)
-		goto err;
-
 	/* test the stack handler */
 	if (test_mempool_basic(mp_stack, 1) < 0)
 		goto err;
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities
  2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-26 16:09  7%     ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-26 16:09  6%     ` [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
@ 2018-03-26 16:09  6%     ` Andrew Rybchenko
  2018-03-26 16:09  4%     ` [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions Andrew Rybchenko
  3 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Santosh Shukla, Jerin Jacob

The callback was introduced to let generic code to know octeontx
mempool driver requirements to use single physically contiguous
memory chunk to store all objects and align object address to
total object size. Now these requirements are met using a new
callbacks to calculate required memory chunk size and to populate
objects using provided memory chunk.

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and likely to
be insufficient to describe mempool driver features. If required
in the future, API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - fix typo
 - rebase on top of patch which renames MEMPOOL_F_NO_PHYS_CONTIG

RFCv2 -> v1:
 - squash mempool/octeontx patches to add calc_mem_size and populate
   callbacks to this one in order to avoid breakages in the middle of
   patchset
 - advertise API changes in release notes

 doc/guides/rel_notes/deprecation.rst            |  1 -
 doc/guides/rel_notes/release_18_05.rst          | 11 +++++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 +++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.c                | 44 ++----------------
 lib/librte_mempool/rte_mempool.h                | 52 +---------------------
 lib/librte_mempool/rte_mempool_ops.c            | 14 ------
 lib/librte_mempool/rte_mempool_ops_default.c    | 15 +------
 lib/librte_mempool/rte_mempool_version.map      |  1 -
 8 files changed, 68 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index c06fc67..4deed9a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -70,7 +70,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index abaefe5..c50f26c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -66,6 +66,14 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Removed mempool capability flags and related functions.**
+
+  Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and
+  ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool
+  driver to customize generic mempool library behaviour.
+  Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
+  used to achieve it without specific knowledge in the generic code.
+
 
 ABI Changes
 -----------
@@ -86,6 +94,9 @@ ABI Changes
   to allow to customize required memory size calculation.
   A new callback ``populate`` has been added to ``rte_mempool_ops``
   to allow to customize objects population.
+  Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
+  since its features are covered by ``calc_mem_size`` and ``populate``
+  callbacks.
 
 
 Removed Items
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index d143d05..64ed528 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
 	return octeontx_fpa_bufpool_free_count(pool);
 }
 
-static int
-octeontx_fpavf_get_capabilities(const struct rte_mempool *mp,
-				unsigned int *flags)
+static ssize_t
+octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp,
+			     uint32_t obj_num, uint32_t pg_shift,
+			     size_t *min_chunk_size, size_t *align)
 {
-	RTE_SET_USED(mp);
-	*flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
-			MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS);
-	return 0;
+	ssize_t mem_size;
+
+	/*
+	 * Simply need space for one more object to be able to
+	 * fulfil alignment requirements.
+	 */
+	mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1,
+							pg_shift,
+							min_chunk_size, align);
+	if (mem_size >= 0) {
+		/*
+		 * Memory area which contains objects must be physically
+		 * contiguous.
+		 */
+		*min_chunk_size = mem_size;
+	}
+
+	return mem_size;
 }
 
 static int
@@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp,
 	return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool);
 }
 
+static int
+octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs,
+			void *vaddr, rte_iova_t iova, size_t len,
+			rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+
+	if (iova == RTE_BAD_IOVA)
+		return -EINVAL;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	/* align object start address to a multiple of total_elt_sz */
+	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
+
+	if (len < off)
+		return -EINVAL;
+
+	vaddr = (char *)vaddr + off;
+	iova += off;
+	len -= off;
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
+					       obj_cb, obj_cb_arg);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.name = "octeontx_fpavf",
 	.alloc = octeontx_fpavf_alloc,
@@ -157,8 +199,9 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.enqueue = octeontx_fpavf_enqueue,
 	.dequeue = octeontx_fpavf_dequeue,
 	.get_count = octeontx_fpavf_get_count,
-	.get_capabilities = octeontx_fpavf_get_capabilities,
 	.register_memory_area = octeontx_fpavf_register_memory_area,
+	.calc_mem_size = octeontx_fpavf_calc_mem_size,
+	.populate = octeontx_fpavf_populate,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d917dc7..40eedde 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -208,15 +208,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      __rte_unused unsigned int flags)
 {
 	size_t obj_per_page, pg_num, pg_sz;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	if (total_elt_sz == 0)
 		return 0;
@@ -240,18 +234,12 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 ssize_t
 rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags)
+	uint32_t pg_shift, __rte_unused unsigned int flags)
 {
 	uint32_t elt_cnt = 0;
 	rte_iova_t start, end;
 	uint32_t iova_idx;
 	size_t pg_sz = (size_t)1 << pg_shift;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	/* if iova is NULL, assume contiguous memory */
 	if (iova == NULL) {
@@ -345,8 +333,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque)
 {
-	unsigned total_elt_sz;
-	unsigned int mp_capa_flags;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -365,27 +351,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-
-	/* Get mempool capabilities */
-	mp_capa_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_capa_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_capa_flags;
-
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -397,10 +362,7 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp_capa_flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
-		/* align object start address to a multiple of total_elt_sz */
-		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
-	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 754261e..0b83d5e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -246,24 +246,6 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
-/**
- * This capability flag is advertised by a mempool handler, if the whole
- * memory area containing the objects must be physically contiguous.
- * Note: This flag should not be passed by application.
- */
-#define MEMPOOL_F_CAPA_PHYS_CONTIG 0x0040
-/**
- * This capability flag is advertised by a mempool handler. Used for a case
- * where mempool driver wants object start address(vaddr) aligned to block
- * size(/ total element size).
- *
- * Note:
- * - This flag should not be passed by application.
- *   Flag used for mempool driver only.
- * - Mempool driver must also set MEMPOOL_F_CAPA_PHYS_CONTIG flag along with
- *   MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS.
- */
-#define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -389,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Get the mempool capabilities.
- */
-typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
-		unsigned int *flags);
-
-/**
  * Notify new memory area to mempool.
  */
 typedef int (*rte_mempool_ops_register_memory_area_t)
@@ -440,13 +416,7 @@ typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
- * If mempool driver requires object addresses to be block size aligned
- * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
- * reserved to be able to meet the requirement.
- *
- * Minimum size of memory chunk is either all required space, if
- * capabilities say that whole memory area must be physically contiguous
- * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * Minimum size of memory chunk is a maximum of the page size and total
  * element size.
  *
  * Required memory chunk alignment is a maximum of page size and cache
@@ -522,10 +492,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Get the mempool capabilities
-	 */
-	rte_mempool_get_capabilities_t get_capabilities;
-	/**
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
@@ -651,22 +617,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops get_capabilities callback.
- *
- * @param mp [in]
- *   Pointer to the memory pool.
- * @param flags [out]
- *   Pointer to the mempool flags.
- * @return
- *   - 0: Success; The mempool driver has advertised his pool capabilities in
- *   flags param.
- *   - -ENOTSUP - doesn't support get_capabilities ops (valid case).
- *   - Otherwise, pool create fails.
- */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags);
-/**
  * @internal wrapper for mempool_ops register_memory_area callback.
  * API to notify the mempool handler when a new memory area is added to pool.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 1a7f39f..6ac669a 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 	return ops->get_count(mp);
 }
 
-/* wrapper to get external mempool capabilities. */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->get_capabilities, -ENOTSUP);
-	return ops->get_capabilities(mp, flags);
-}
-
 /* wrapper to notify new memory area to external mempool */
 int
 rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57295f7..3defc15 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -11,26 +11,15 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 				     uint32_t obj_num, uint32_t pg_shift,
 				     size_t *min_chunk_size, size_t *align)
 {
-	unsigned int mp_flags;
-	int ret;
 	size_t total_elt_sz;
 	size_t mem_size;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
 	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags | mp_flags);
+					 mp->flags);
 
-	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
-		*min_chunk_size = mem_size;
-	else
-		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
 	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 41a0b09..637f73f 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_get_capabilities;
 	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver
                       ` (2 preceding siblings ...)
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
@ 2018-03-26 16:09  2%   ` Andrew Rybchenko
  2018-03-26 16:09  7%     ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
                       ` (3 more replies)
  3 siblings, 4 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev
  Cc: Olivier MATZ, Thomas Monjalon, Anatoly Burakov, Santosh Shukla,
	Jerin Jacob, Hemant Agrawal, Shreyansh Jain

The patch series should be applied on top of [7].

The initial patch series [1] is split into two to simplify processing.
The second series relies on this one and will add bucket mempool driver
and related ops.

The patch series has generic enhancements suggested by Olivier.
Basically it adds driver callbacks to calculate required memory size and
to populate objects using provided memory area. It allows to remove
so-called capability flags used before to tell generic code how to
allocate and slice allocated memory into mempool objects.
Clean up which removes get_capabilities and register_memory_area is
not strictly required, but I think right thing to do.
Existing mempool drivers are updated.

rte_mempool_populate_iova_tab() is also deprecated in v2 as agreed in [2].
Unfortunately it requires addition of -Wno-deprecated-declarations flag
to librte_mempool since the function is used by deprecated earlier
rte_mempool_populate_phys_tab(). If the later may be removed in the
release, we can avoid addition of the flag to allow usage of deprecated
functions.

One open question remains from previous review [3].

The patch series interfere with memory hotplug for DPDK [4] ([5] to be
precise). So, rebase may be required.

A new patch is added to the series to rename MEMPOOL_F_NO_PHYS_CONTIG
as MEMPOOL_F_NO_IOVA_CONTIG as agreed in [6].
MEMPOOL_F_CAPA_PHYS_CONTIG is not renamed since it removed in this
patchset.

It breaks ABI since changes rte_mempool_ops. Also it removes
rte_mempool_ops_register_memory_area() and
rte_mempool_ops_get_capabilities() since corresponding callbacks are
removed.

Internal global functions are not listed in map file since it is not
a part of external API.

[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2018-March/093186.html
[3] https://dpdk.org/ml/archives/dev/2018-March/093329.html
[4] https://dpdk.org/ml/archives/dev/2018-March/092070.html
[5] https://dpdk.org/ml/archives/dev/2018-March/092088.html
[6] https://dpdk.org/ml/archives/dev/2018-March/093345.html
[7] https://dpdk.org/ml/archives/dev/2018-March/093196.html

v2 -> v3:
  - fix build error in mempool/dpaa: prepare to remove register memory area op

v1 -> v2:
  - deprecate rte_mempool_populate_iova_tab()
  - add patch to fix memory leak if no objects are populated
  - add patch to rename MEMPOOL_F_NO_PHYS_CONTIG
  - minor fixes (typos, blank line at the end of file)
  - highlight meaning of min_chunk_size (when it is virtual or
    physical contiguous)
  - make sure that mempool is initialized in rte_mempool_populate_anon()
  - move patch to ensure that mempool is initialized earlier in the series

RFCv2 -> v1:
  - split the series in two
  - squash octeontx patches which implement calc_mem_size and populate
    callbacks into the patch which removes get_capabilities since it is
    the easiest way to untangle the tangle of tightly related library
    functions and flags advertised by the driver
  - consistently name default callbacks
  - move default callbacks to dedicated file
  - see detailed description in patches

RFCv1 -> RFCv2:
  - add driver ops to calculate required memory size and populate
    mempool objects, remove extra flags which were required before
    to control it
  - transition of octeontx and dpaa drivers to the new callbacks
  - change info API to get information from driver required to
    API user to know contiguous block size
  - remove get_capabilities (not required any more and may be
    substituted with more in info get API)
  - remove register_memory_area since it is substituted with
    populate callback which can do more
  - use SPDX tags
  - avoid all objects affinity to single lcore
  - fix bucket get_count
  - deprecate XMEM API
  - avoid introduction of a new function to flush cache
  - fix NO_CACHE_ALIGN case in bucket mempool

Andrew Rybchenko (9):
  mempool: fix memhdr leak when no objects are populated
  mempool: rename flag to control IOVA-contiguous objects
  mempool: add op to calculate memory size to be allocated
  mempool: add op to populate objects using provided memory
  mempool: remove callback to get capabilities
  mempool: deprecate xmem functions
  mempool/octeontx: prepare to remove register memory area op
  mempool/dpaa: prepare to remove register memory area op
  mempool: remove callback to register memory area

Artem V. Andreev (2):
  mempool: ensure the mempool is initialized before populating
  mempool: support flushing the default cache of the mempool

 doc/guides/rel_notes/deprecation.rst            |  12 +-
 doc/guides/rel_notes/release_18_05.rst          |  33 ++-
 drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
 drivers/net/thunderx/nicvf_ethdev.c             |   2 +-
 lib/librte_mempool/Makefile                     |   6 +-
 lib/librte_mempool/meson.build                  |  17 +-
 lib/librte_mempool/rte_mempool.c                | 179 ++++++++-------
 lib/librte_mempool/rte_mempool.h                | 280 +++++++++++++++++-------
 lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
 lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
 lib/librte_mempool/rte_mempool_version.map      |  10 +-
 test/test/test_mempool.c                        |  31 ---
 13 files changed, 485 insertions(+), 250 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory
  2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-26 16:09  7%     ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-26 16:09  6%     ` Andrew Rybchenko
  2018-03-26 16:09  6%     ` [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
  2018-03-26 16:09  4%     ` [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions Andrew Rybchenko
  3 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback allows to customize how objects are stored in the
memory chunk. Default implementation of the callback which simply
puts objects one by one is available.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - fix memory leak if off is bigger than len

RFCv2 -> v1:
 - advertise ABI changes in release notes
 - use consistent name for default callback:
   rte_mempool_op_<callback>_default()
 - add opaque data pointer to populated object callback
 - move default callback to dedicated file

 doc/guides/rel_notes/deprecation.rst         |  2 +-
 doc/guides/rel_notes/release_18_05.rst       |  2 +
 lib/librte_mempool/rte_mempool.c             | 23 ++++---
 lib/librte_mempool/rte_mempool.h             | 90 ++++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops.c         | 21 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 24 ++++++++
 lib/librte_mempool/rte_mempool_version.map   |  1 +
 7 files changed, 149 insertions(+), 14 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e02d4ca..c06fc67 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,7 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize objects population and allocate contiguous
+  - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 59583ea..abaefe5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,8 @@ ABI Changes
 
   A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
   to allow to customize required memory size calculation.
+  A new callback ``populate`` has been added to ``rte_mempool_ops``
+  to allow to customize objects population.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index dd2d0fe..d917dc7 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size)
 }
 
 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
+mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
+		 void *obj, rte_iova_t iova)
 {
 	struct rte_mempool_objhdr *hdr;
 	struct rte_mempool_objtlr *tlr __rte_unused;
@@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 	tlr = __mempool_get_trailer(obj);
 	tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-
-	/* enqueue in ring */
-	rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }
 
 /* call obj_cb() for each mempool element */
@@ -407,17 +405,16 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
 
-	while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
-		off += mp->header_size;
-		if (iova == RTE_BAD_IOVA)
-			mempool_add_elem(mp, (char *)vaddr + off,
-				RTE_BAD_IOVA);
-		else
-			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
-		off += mp->elt_size + mp->trailer_size;
-		i++;
+	if (off > len) {
+		ret = -EINVAL;
+		goto fail;
 	}
 
+	i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
+		(char *)vaddr + off,
+		(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
+		len - off, mempool_add_elem, NULL);
+
 	/* not enough room to store one object */
 	if (i == 0) {
 		ret = -EINVAL;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 191255d..754261e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -456,6 +456,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		uint32_t obj_num, uint32_t pg_shift,
 		size_t *min_chunk_size, size_t *align);
 
+/**
+ * Function to be called for each populated object.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] opaque
+ *   An opaque pointer passed to iterator.
+ * @param[in] vaddr
+ *   Object virtual address.
+ * @param[in] iova
+ *   Input/output virtual address of the object or RTE_BAD_IOVA.
+ */
+typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp,
+		void *opaque, void *vaddr, rte_iova_t iova);
+
+/**
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * Populated objects should be enqueued to the pool, e.g. using
+ * rte_mempool_ops_enqueue_bulk().
+ *
+ * If the given IO address is unknown (iova = RTE_BAD_IOVA),
+ * the chunk doesn't need to be physically contiguous (only virtually),
+ * and allocated objects may span two pages.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+typedef int (*rte_mempool_populate_t)(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
+/**
+ * Default way to populate memory pool object using provided memory
+ * chunk: just slice objects one by one.
+ */
+int rte_mempool_op_populate_default(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -477,6 +534,11 @@ struct rte_mempool_ops {
 	 * store specified number of objects.
 	 */
 	rte_mempool_calc_mem_size_t calc_mem_size;
+	/**
+	 * Optional callback to populate mempool objects using
+	 * provided memory chunk.
+	 */
+	rte_mempool_populate_t populate;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -649,6 +711,34 @@ ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				      size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal wrapper for mempool_ops populate callback.
+ *
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+			     void *vaddr, rte_iova_t iova, size_t len,
+			     rte_mempool_populate_obj_cb_t *obj_cb,
+			     void *obj_cb_arg);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 26908cc..1a7f39f 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
+	ops->populate = h->populate;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -141,6 +142,26 @@ rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
 }
 
+/* wrapper to populate memory pool objects using provided memory chunk */
+int
+rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+				void *vaddr, rte_iova_t iova, size_t len,
+				rte_mempool_populate_obj_cb_t *obj_cb,
+				void *obj_cb_arg)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->populate == NULL)
+		return rte_mempool_op_populate_default(mp, max_objs, vaddr,
+						       iova, len, obj_cb,
+						       obj_cb_arg);
+
+	return ops->populate(mp, max_objs, vaddr, iova, len, obj_cb,
+			     obj_cb_arg);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57fe79b..57295f7 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -36,3 +36,27 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	return mem_size;
 }
+
+int
+rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+	unsigned int i;
+	void *obj;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+		off += mp->header_size;
+		obj = (char *)vaddr + off;
+		obj_cb(mp, obj_cb_arg, obj,
+		       (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
+		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
+		off += mp->elt_size + mp->trailer_size;
+	}
+
+	return i;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index cb38189..41a0b09 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -56,5 +56,6 @@ DPDK_18.05 {
 	global:
 
 	rte_mempool_op_calc_mem_size_default;
+	rte_mempool_op_populate_default;
 
 } DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-25 18:17  0%     ` Vladimir Medvedkin
@ 2018-03-26  9:50  0%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-03-26  9:50 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev

On Sun, Mar 25, 2018 at 09:17:20PM +0300, Vladimir Medvedkin wrote:
> Hi,
> 
> 2018-03-14 14:09 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:
> 
> > On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > > RIB is an alternative to current LPM library.
> > > It solves the following problems
> > >  - Increases the speed of control plane operations against lpm such as
> > >    adding/deleting routes
> > >  - Adds abstraction from dataplane algorithms, so it is possible to add
> > >    different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
> > >    in addition to current dir24_8
> > >  - It is possible to keep user defined application specific additional
> > >    information in struct rte_rib_node which represents route entry.
> > >    It can be next hop/set of next hops (i.e. active and feasible),
> > >    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
> > >    plenty of additional control plane information.
> > >  - For dir24_8 implementation it is possible to remove
> > rte_lpm_tbl_entry.depth
> > >    field that helps to save 6 bits.
> > >  - Also new dir24_8 implementation supports different next_hop sizes
> > >    (1/2/4/8 bytes per next hop)
> > >  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary
> > operator.
> > >    Instead it returns special default value if there is no route.
> > >
> > > Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> > > ---
> > >  config/common_base                 |   6 +
> > >  doc/api/doxy-api.conf              |   1 +
> > >  lib/Makefile                       |   2 +
> > >  lib/librte_rib/Makefile            |  22 ++
> > >  lib/librte_rib/rte_dir24_8.c       | 482 ++++++++++++++++++++++++++++++
> > +++
> > >  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
> > >  lib/librte_rib/rte_rib.c           | 526 ++++++++++++++++++++++++++++++
> > +++++++
> > >  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
> > >  lib/librte_rib/rte_rib_version.map |  18 ++
> > >  mk/rte.app.mk                      |   1 +
> > >  10 files changed, 1496 insertions(+)
> > >  create mode 100644 lib/librte_rib/Makefile
> > >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> > >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> > >  create mode 100644 lib/librte_rib/rte_rib.c
> > >  create mode 100644 lib/librte_rib/rte_rib.h
> > >  create mode 100644 lib/librte_rib/rte_rib_version.map
> > >
> >
> > First pass review comments. For now just reviewed the main public header
> > file rte_rib.h. Later reviews will cover the other files as best I can.
> >
> > /Bruce
> >
> > <snip>
> > > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> > > new file mode 100644
> > > index 0000000..6eac8fb
> > > --- /dev/null
> > > +++ b/lib/librte_rib/rte_rib.h
> > > @@ -0,0 +1,322 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > > + */
> > > +
> > > +#ifndef _RTE_RIB_H_
> > > +#define _RTE_RIB_H_
> > > +
> > > +/**
> > > + * @file
> > > + * Compressed trie implementation for Longest Prefix Match
> > > + */
> > > +
> > > +/** @internal Macro to enable/disable run-time checks. */
> > > +#if defined(RTE_LIBRTE_RIB_DEBUG)
> > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {    \
> > > +     if (cond)                                       \
> > > +             return retval;                          \
> > > +} while (0)
> > > +#else
> > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> > > +#endif
> >
> > use RTE_ASSERT?
> >
> it was done just like it was done in the LPM lib. But if you think it
> should be RTE_ASSERT so be it.
> 
> 
> >
> > > +
> > > +#define RTE_RIB_VALID_NODE   1
> >
> > should there be an INVALID_NODE macro?
> >
> No
> 
> 
> >
> > > +#define RTE_RIB_GET_NXT_ALL  0
> > > +#define RTE_RIB_GET_NXT_COVER        1
> > > +
> > > +#define RTE_RIB_INVALID_ROUTE        0
> > > +#define RTE_RIB_VALID_ROUTE  1
> > > +
> > > +/** Max number of characters in RIB name. */
> > > +#define RTE_RIB_NAMESIZE     64
> > > +
> > > +/** Maximum depth value possible for IPv4 RIB. */
> > > +#define RTE_RIB_MAXDEPTH     32
> >
> > I think we should have IPv4 in the name here. Will it not be extended to
> > support IPv6 in future?
> >
> I think there should be a separate implementation of the library for ipv6
> 
I can understand the need for a separate LPM implementation, but should
they both not be under the same rib library?

> 
> >
> > > +
> > > +/**
> > > + * Macro to check if prefix1 {key1/depth1}
> > > + * is covered by prefix2 {key2/depth2}
> > > + */
> > > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)
> >      \
> > > +     ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> > > +             && (depth1 > depth2))
> > Neat check!
> >
> > Any particular reason for using UINT64_MAX here rather than UINT32_MAX?
> 
> in case when depth2 = 0 UINT32_MAX shifted left by 32 bit will remain
> UINT32_MAX because shift count will be masked to 5 bits.
> 
> I think you can avoid the casting and have a slightly shorter mask by
> > changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to
> > "~(UINT32_MAX >> depth2)"
> > I'd also suggest for readability putting the second check first, and,
> > for maintainability, using an inline function rather than a macro.
> >
>  Agree, it looks clearer
> 
> 
> > > +
> > > +/** @internal Macro to get next node in tree*/
> > > +#define RTE_RIB_GET_NXT_NODE(node, key)
> >       \
> > > +     ((key & (1 << (31 - node->depth))) ? node->right : node->left)
> > > +/** @internal Macro to check if node is right child*/
> > > +#define RTE_RIB_IS_RIGHT_NODE(node)  (node->parent->right == node)
> >
> > Again, consider inline fns rather than macros.
> >
> Ok
> 
> For the latter macro, rather than doing additional pointer derefs to
> > parent, can you also get if it's a right node by using:
> > "(node->key & (1 << (32 - node->depth)))"?
> >
> No, it is not possible. Decision whether node be left or right is made
> using parent and child common depth.
> Consider case with 10.0.0.0/8 and 10.128.0.0/24. In this way common depth
> will be /8 and 10.128.0.0/24 will be right child.
> 
> 
> > > +
> > > +
> > > +struct rte_rib_node {
> > > +     struct rte_rib_node *left;
> > > +     struct rte_rib_node *right;
> > > +     struct rte_rib_node *parent;
> > > +     uint32_t        key;
> > > +     uint8_t         depth;
> > > +     uint8_t         flag;
> > > +     uint64_t        nh;
> > > +     uint64_t        ext[0];
> > > +};
> > > +
> > > +struct rte_rib;
> > > +
> > > +/** Type of FIB struct*/
> > > +enum rte_rib_type {
> > > +     RTE_RIB_DIR24_8_1B,
> > > +     RTE_RIB_DIR24_8_2B,
> > > +     RTE_RIB_DIR24_8_4B,
> > > +     RTE_RIB_DIR24_8_8B,
> > > +     RTE_RIB_TYPE_MAX
> > > +};
> >
> > If the plan is to support multiple underlying fib types and algorithms
> > under the rib library, would it not be better to separate out the
> > algorithm part from the data storage part? So have the type just be
> > DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
> >
> Yes, we were talk about it in IRC, agree. Now I pass next hop size in
> union rte_rib_fib_conf inside rte_rib_conf
> 
> 
> >
> > > +
> > > +enum rte_rib_op {
> > > +     RTE_RIB_ADD,
> > > +     RTE_RIB_DEL
> > > +};
> > > +
> > > +/** RIB nodes allocation type */
> > > +enum rte_rib_alloc_type {
> > > +     RTE_RIB_MALLOC,
> > > +     RTE_RIB_MEMPOOL,
> > > +     RTE_RIB_ALLOC_MAX
> > > +};
> >
> > Not sure you need this any more. Malloc allocations and mempool
> > allocations are now pretty much the same thing.
> >
> Actually I think to remove malloc. On performance tests with
> adding/deleting huge amount of routes malloc is slower. Maybe because of
> fragmentation.
> What do you think?
> 
Yes, definitely mempool allocations are the way to go!

> 
> > > +
> > > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> > > +     uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
> >
> > Do you anticipate more ops in future than just add and delete? If not,
> > why not just split this function into two and drop the op struct.
> >
> It is difficult question. I'm not ready to make decision at the moment.
> 
> 
> >
> > > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> > > +     uint64_t *next_hops, const unsigned n);
> > > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib
> > *rib);
> > > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> > > +     struct rte_rib_node *node);
> > > +
> > > +struct rte_rib {
> > > +     char name[RTE_RIB_NAMESIZE];
> > > +     /*pointer to rib trie*/
> > > +     struct rte_rib_node     *trie;
> > > +     /*pointer to dataplane struct*/
> > > +     void    *fib;
> > > +     /*prefix modification*/
> > > +     rte_rib_modify_fn_t     modify;
> > > +     /* Bulk lookup fn*/
> > > +     rte_rib_tree_lookup_fn_t        lookup;
> > > +     /*alloc trie element*/
> > > +     rte_rib_alloc_node_fn_t alloc_node;
> > > +     /*free trie element*/
> > > +     rte_rib_free_node_fn_t  free_node;
> > > +     struct rte_mempool      *node_pool;
> > > +     uint32_t                cur_nodes;
> > > +     uint32_t                cur_routes;
> > > +     int                     max_nodes;
> > > +     int                     node_sz;
> > > +     enum rte_rib_type       type;
> > > +     enum rte_rib_alloc_type alloc_type;
> > > +};
> > > +
> > > +/** RIB configuration structure */
> > > +struct rte_rib_conf {
> > > +     enum rte_rib_type       type;
> > > +     enum rte_rib_alloc_type alloc_type;
> > > +     int     max_nodes;
> > > +     size_t  node_sz;
> > > +     uint64_t def_nh;
> > > +};
> > > +
> > > +/**
> > > + * Lookup an IP into the RIB structure
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  IP to be looked up in the RIB
> > > + * @return
> > > + *  pointer to struct rte_rib_node on success,
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> > > +
> > > +/**
> > > + * Lookup less specific route into the RIB structure
> > > + *
> > > + * @param ent
> > > + *  Pointer to struct rte_rib_node that represents target route
> > > + * @return
> > > + *  pointer to struct rte_rib_node that represents
> > > + *  less specific route on success,
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> > > +
> > > +/**
> > > + * Lookup prefix into the RIB structure
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net to be looked up in the RIB
> > > + * @param depth
> > > + *  prefix length
> > > + * @return
> > > + *  pointer to struct rte_rib_node on success,
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t
> > depth);
> >
> > Can you explain the difference between this and regular lookup, and how
> > they would be used. I don't think the names convey the differences
> > sufficiently, and so we should look to rename one or both to be clearer.
> >
> Regular lookup (rte_rib_tree_lookup) will lookup for most specific node for
> passed key.
> rte_rib_tree_lookup_exact will lookup node contained key and depth equal to
> passed in args. It used to find exact route.
> 
So if there is no node exactly matching the parameters, it the lookup_exact
returns failure? E.g. if you request a /24 node, it won't return a /8 node
that would cover the /24?

> 
> >
> > > +
> > > +/**
> > > + * Retrieve next more specific prefix from the RIB
> > s/more/most/
> >
> 
> > > + * that is covered by key/depth supernet
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net address of supernet prefix that covers returned more specific
> > prefixes
> > > + * @param depth
> > > + *  supernet prefix length
> > > + * @param cur
> > > + *   pointer to the last returned prefix to get next prefix
> > > + *   or
> > > + *   NULL to get first more specific prefix
> > > + * @param flag
> > > + *  -RTE_RIB_GET_NXT_ALL
> > > + *   get all prefixes from subtrie
> >
> > By all prefixes do you mean more specific, i.e. the final prefix?
> >
> What do you mean the final prefix?
> 
The most specific one, or the longest prefix.

> 
> > > + *  -RTE_RIB_GET_NXT_COVER
> > > + *   get only first more specific prefix even if it have more specifics
> > > + * @return
> > > + *  pointer to the next more specific prefix
> > > + *  or
> > > + *  NULL if there is no prefixes left
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> > > +     struct rte_rib_node *cur, int flag);
> > > +
> > > +/**
> > > + * Remove prefix from the RIB
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net to be removed from the RIB
> > > + * @param depth
> > > + *  prefix length
> > > + */
> > > +void
> > > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > > +
> > > +/**
> > > + * Insert prefix into the RIB
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net to be inserted to the RIB
> > > + * @param depth
> > > + *  prefix length
> > > + * @return
> > > + *  pointer to new rte_rib_node on success
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > > +
> > > +/**
> > > + * Create RIB
> > > + *
> > > + * @param name
> > > + *  RIB name
> > > + * @param socket_id
> > > + *  NUMA socket ID for RIB table memory allocation
> > > + * @param conf
> > > + *  Structure containing the configuration
> > > + * @return
> > > + *  Handle to RIB object on success
> > > + *  NULL otherwise with rte_errno set to an appropriate values.
> > > + */
> > > +struct rte_rib *
> > > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf
> > *conf);
> > > +
> > > +/**
> > > + * Find an existing RIB object and return a pointer to it.
> > > + *
> > > + * @param name
> > > + *  Name of the rib object as passed to rte_rib_create()
> > > + * @return
> > > + *  Pointer to rib object or NULL if object not found with rte_errno
> > > + *  set appropriately. Possible rte_errno values include:
> > > + *   - ENOENT - required entry not available to return.
> > > + */
> > > +struct rte_rib *
> > > +rte_rib_find_existing(const char *name);
> > > +
> > > +/**
> > > + * Free an RIB object.
> > > + *
> > > + * @param rib
> > > + *   RIB object handle
> > > + * @return
> > > + *   None
> > > + */
> > > +void
> > > +rte_rib_free(struct rte_rib *rib);
> > > +
> > > +/**
> > > + * Add a rule to the RIB.
> > > + *
> > > + * @param rib
> > > + *   RIB object handle
> > > + * @param ip
> > > + *   IP of the rule to be added to the RIB
> > > + * @param depth
> > > + *   Depth of the rule to be added to the RIB
> > > + * @param next_hop
> > > + *   Next hop of the rule to be added to the RIB
> > > + * @return
> > > + *   0 on success, negative value otherwise
> > > + */
> > > +int
> > > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t
> > next_hop);
> > > +
> > > +/**
> > > + * Delete a rule from the RIB.
> > > + *
> > > + * @param rib
> > > + *   RIB object handle
> > > + * @param ip
> > > + *   IP of the rule to be deleted from the RIB
> > > + * @param depth
> > > + *   Depth of the rule to be deleted from the RIB
> > > + * @return
> > > + *   0 on success, negative value otherwise
> > > + */
> > > +int
> > > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> > > +
> > > +/**
> > > + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> > > + * macro, so the address of the function should not be used.
> > > + *
> > > + * @param RIB
> > > + *   RIB object handle
> > > + * @param ips
> > > + *   Array of IPs to be looked up in the FIB
> > > + * @param next_hops
> > > + *   Next hop of the most specific rule found for IP.
> > > + *   This is an array of eight byte values.
> > > + *   If the lookup for the given IP failed, then corresponding element
> > would
> > > + *   contain default value, see description of then next parameter.
> > > + * @param n
> > > + *   Number of elements in ips (and next_hops) array to lookup. This
> > should be a
> > > + *   compile time constant, and divisible by 8 for best performance.
> > > + * @param defv
> > > + *   Default value to populate into corresponding element of hop[]
> > array,
> > > + *   if lookup would fail.
> > > + *  @return
> > > + *   -EINVAL for incorrect arguments, otherwise 0
> > > + */
> > > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)      \
> > > +     rib->lookup(rib->fib, ips, next_hops, n)
> >
> > My main thought here is whether this needs to be a function at all?
> > Given that it takes a full burst of addresses in a single go, how much
> > performance would actually be lost by making this a regular function in
> > the C file?
> > IF we do convert this to a regular function, then a lot of the structure
> > definitions above - most importantly, the rib structure itself - can
> > probably be moved to a private header file and not exposed to
> > applications at all. This will make ABI compatibility a *lot* easier, as
> > the structures can be changed without affecting the public ABI.
> >
> I didn't quite understand what you mean.
> 
Sorry, by "needs to be a function" in first line read "needs to be a
macro". Basically, the point is to not inline anything that doesn't need
it. If a function works on a burst of packets, it probably will be fine
being a regular function than a macro or inline function.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-14 11:09  4%   ` Bruce Richardson
@ 2018-03-25 18:17  0%     ` Vladimir Medvedkin
  2018-03-26  9:50  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Vladimir Medvedkin @ 2018-03-25 18:17 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Hi,

2018-03-14 14:09 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:

> On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > RIB is an alternative to current LPM library.
> > It solves the following problems
> >  - Increases the speed of control plane operations against lpm such as
> >    adding/deleting routes
> >  - Adds abstraction from dataplane algorithms, so it is possible to add
> >    different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
> >    in addition to current dir24_8
> >  - It is possible to keep user defined application specific additional
> >    information in struct rte_rib_node which represents route entry.
> >    It can be next hop/set of next hops (i.e. active and feasible),
> >    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
> >    plenty of additional control plane information.
> >  - For dir24_8 implementation it is possible to remove
> rte_lpm_tbl_entry.depth
> >    field that helps to save 6 bits.
> >  - Also new dir24_8 implementation supports different next_hop sizes
> >    (1/2/4/8 bytes per next hop)
> >  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary
> operator.
> >    Instead it returns special default value if there is no route.
> >
> > Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> > ---
> >  config/common_base                 |   6 +
> >  doc/api/doxy-api.conf              |   1 +
> >  lib/Makefile                       |   2 +
> >  lib/librte_rib/Makefile            |  22 ++
> >  lib/librte_rib/rte_dir24_8.c       | 482 ++++++++++++++++++++++++++++++
> +++
> >  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
> >  lib/librte_rib/rte_rib.c           | 526 ++++++++++++++++++++++++++++++
> +++++++
> >  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
> >  lib/librte_rib/rte_rib_version.map |  18 ++
> >  mk/rte.app.mk                      |   1 +
> >  10 files changed, 1496 insertions(+)
> >  create mode 100644 lib/librte_rib/Makefile
> >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> >  create mode 100644 lib/librte_rib/rte_rib.c
> >  create mode 100644 lib/librte_rib/rte_rib.h
> >  create mode 100644 lib/librte_rib/rte_rib_version.map
> >
>
> First pass review comments. For now just reviewed the main public header
> file rte_rib.h. Later reviews will cover the other files as best I can.
>
> /Bruce
>
> <snip>
> > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> > new file mode 100644
> > index 0000000..6eac8fb
> > --- /dev/null
> > +++ b/lib/librte_rib/rte_rib.h
> > @@ -0,0 +1,322 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > + */
> > +
> > +#ifndef _RTE_RIB_H_
> > +#define _RTE_RIB_H_
> > +
> > +/**
> > + * @file
> > + * Compressed trie implementation for Longest Prefix Match
> > + */
> > +
> > +/** @internal Macro to enable/disable run-time checks. */
> > +#if defined(RTE_LIBRTE_RIB_DEBUG)
> > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {    \
> > +     if (cond)                                       \
> > +             return retval;                          \
> > +} while (0)
> > +#else
> > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> > +#endif
>
> use RTE_ASSERT?
>
it was done just like it was done in the LPM lib. But if you think it
should be RTE_ASSERT so be it.


>
> > +
> > +#define RTE_RIB_VALID_NODE   1
>
> should there be an INVALID_NODE macro?
>
No


>
> > +#define RTE_RIB_GET_NXT_ALL  0
> > +#define RTE_RIB_GET_NXT_COVER        1
> > +
> > +#define RTE_RIB_INVALID_ROUTE        0
> > +#define RTE_RIB_VALID_ROUTE  1
> > +
> > +/** Max number of characters in RIB name. */
> > +#define RTE_RIB_NAMESIZE     64
> > +
> > +/** Maximum depth value possible for IPv4 RIB. */
> > +#define RTE_RIB_MAXDEPTH     32
>
> I think we should have IPv4 in the name here. Will it not be extended to
> support IPv6 in future?
>
I think there should be a separate implementation of the library for ipv6


>
> > +
> > +/**
> > + * Macro to check if prefix1 {key1/depth1}
> > + * is covered by prefix2 {key2/depth2}
> > + */
> > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)
>      \
> > +     ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> > +             && (depth1 > depth2))
> Neat check!
>
> Any particular reason for using UINT64_MAX here rather than UINT32_MAX?

in case when depth2 = 0 UINT32_MAX shifted left by 32 bit will remain
UINT32_MAX because shift count will be masked to 5 bits.

I think you can avoid the casting and have a slightly shorter mask by
> changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to
> "~(UINT32_MAX >> depth2)"
> I'd also suggest for readability putting the second check first, and,
> for maintainability, using an inline function rather than a macro.
>
 Agree, it looks clearer


> > +
> > +/** @internal Macro to get next node in tree*/
> > +#define RTE_RIB_GET_NXT_NODE(node, key)
>       \
> > +     ((key & (1 << (31 - node->depth))) ? node->right : node->left)
> > +/** @internal Macro to check if node is right child*/
> > +#define RTE_RIB_IS_RIGHT_NODE(node)  (node->parent->right == node)
>
> Again, consider inline fns rather than macros.
>
Ok

For the latter macro, rather than doing additional pointer derefs to
> parent, can you also get if it's a right node by using:
> "(node->key & (1 << (32 - node->depth)))"?
>
No, it is not possible. Decision whether node be left or right is made
using parent and child common depth.
Consider case with 10.0.0.0/8 and 10.128.0.0/24. In this way common depth
will be /8 and 10.128.0.0/24 will be right child.


> > +
> > +
> > +struct rte_rib_node {
> > +     struct rte_rib_node *left;
> > +     struct rte_rib_node *right;
> > +     struct rte_rib_node *parent;
> > +     uint32_t        key;
> > +     uint8_t         depth;
> > +     uint8_t         flag;
> > +     uint64_t        nh;
> > +     uint64_t        ext[0];
> > +};
> > +
> > +struct rte_rib;
> > +
> > +/** Type of FIB struct*/
> > +enum rte_rib_type {
> > +     RTE_RIB_DIR24_8_1B,
> > +     RTE_RIB_DIR24_8_2B,
> > +     RTE_RIB_DIR24_8_4B,
> > +     RTE_RIB_DIR24_8_8B,
> > +     RTE_RIB_TYPE_MAX
> > +};
>
> If the plan is to support multiple underlying fib types and algorithms
> under the rib library, would it not be better to separate out the
> algorithm part from the data storage part? So have the type just be
> DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
>
Yes, we were talk about it in IRC, agree. Now I pass next hop size in
union rte_rib_fib_conf inside rte_rib_conf


>
> > +
> > +enum rte_rib_op {
> > +     RTE_RIB_ADD,
> > +     RTE_RIB_DEL
> > +};
> > +
> > +/** RIB nodes allocation type */
> > +enum rte_rib_alloc_type {
> > +     RTE_RIB_MALLOC,
> > +     RTE_RIB_MEMPOOL,
> > +     RTE_RIB_ALLOC_MAX
> > +};
>
> Not sure you need this any more. Malloc allocations and mempool
> allocations are now pretty much the same thing.
>
Actually I think to remove malloc. On performance tests with
adding/deleting huge amount of routes malloc is slower. Maybe because of
fragmentation.
What do you think?


> > +
> > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> > +     uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
>
> Do you anticipate more ops in future than just add and delete? If not,
> why not just split this function into two and drop the op struct.
>
It is difficult question. I'm not ready to make decision at the moment.


>
> > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> > +     uint64_t *next_hops, const unsigned n);
> > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib
> *rib);
> > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> > +     struct rte_rib_node *node);
> > +
> > +struct rte_rib {
> > +     char name[RTE_RIB_NAMESIZE];
> > +     /*pointer to rib trie*/
> > +     struct rte_rib_node     *trie;
> > +     /*pointer to dataplane struct*/
> > +     void    *fib;
> > +     /*prefix modification*/
> > +     rte_rib_modify_fn_t     modify;
> > +     /* Bulk lookup fn*/
> > +     rte_rib_tree_lookup_fn_t        lookup;
> > +     /*alloc trie element*/
> > +     rte_rib_alloc_node_fn_t alloc_node;
> > +     /*free trie element*/
> > +     rte_rib_free_node_fn_t  free_node;
> > +     struct rte_mempool      *node_pool;
> > +     uint32_t                cur_nodes;
> > +     uint32_t                cur_routes;
> > +     int                     max_nodes;
> > +     int                     node_sz;
> > +     enum rte_rib_type       type;
> > +     enum rte_rib_alloc_type alloc_type;
> > +};
> > +
> > +/** RIB configuration structure */
> > +struct rte_rib_conf {
> > +     enum rte_rib_type       type;
> > +     enum rte_rib_alloc_type alloc_type;
> > +     int     max_nodes;
> > +     size_t  node_sz;
> > +     uint64_t def_nh;
> > +};
> > +
> > +/**
> > + * Lookup an IP into the RIB structure
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  IP to be looked up in the RIB
> > + * @return
> > + *  pointer to struct rte_rib_node on success,
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> > +
> > +/**
> > + * Lookup less specific route into the RIB structure
> > + *
> > + * @param ent
> > + *  Pointer to struct rte_rib_node that represents target route
> > + * @return
> > + *  pointer to struct rte_rib_node that represents
> > + *  less specific route on success,
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> > +
> > +/**
> > + * Lookup prefix into the RIB structure
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net to be looked up in the RIB
> > + * @param depth
> > + *  prefix length
> > + * @return
> > + *  pointer to struct rte_rib_node on success,
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t
> depth);
>
> Can you explain the difference between this and regular lookup, and how
> they would be used. I don't think the names convey the differences
> sufficiently, and so we should look to rename one or both to be clearer.
>
Regular lookup (rte_rib_tree_lookup) will lookup for most specific node for
passed key.
rte_rib_tree_lookup_exact will lookup node contained key and depth equal to
passed in args. It used to find exact route.


>
> > +
> > +/**
> > + * Retrieve next more specific prefix from the RIB
> s/more/most/
>

> > + * that is covered by key/depth supernet
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net address of supernet prefix that covers returned more specific
> prefixes
> > + * @param depth
> > + *  supernet prefix length
> > + * @param cur
> > + *   pointer to the last returned prefix to get next prefix
> > + *   or
> > + *   NULL to get first more specific prefix
> > + * @param flag
> > + *  -RTE_RIB_GET_NXT_ALL
> > + *   get all prefixes from subtrie
>
> By all prefixes do you mean more specific, i.e. the final prefix?
>
What do you mean the final prefix?


> > + *  -RTE_RIB_GET_NXT_COVER
> > + *   get only first more specific prefix even if it have more specifics
> > + * @return
> > + *  pointer to the next more specific prefix
> > + *  or
> > + *  NULL if there is no prefixes left
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> > +     struct rte_rib_node *cur, int flag);
> > +
> > +/**
> > + * Remove prefix from the RIB
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net to be removed from the RIB
> > + * @param depth
> > + *  prefix length
> > + */
> > +void
> > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > +
> > +/**
> > + * Insert prefix into the RIB
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net to be inserted to the RIB
> > + * @param depth
> > + *  prefix length
> > + * @return
> > + *  pointer to new rte_rib_node on success
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > +
> > +/**
> > + * Create RIB
> > + *
> > + * @param name
> > + *  RIB name
> > + * @param socket_id
> > + *  NUMA socket ID for RIB table memory allocation
> > + * @param conf
> > + *  Structure containing the configuration
> > + * @return
> > + *  Handle to RIB object on success
> > + *  NULL otherwise with rte_errno set to an appropriate values.
> > + */
> > +struct rte_rib *
> > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf
> *conf);
> > +
> > +/**
> > + * Find an existing RIB object and return a pointer to it.
> > + *
> > + * @param name
> > + *  Name of the rib object as passed to rte_rib_create()
> > + * @return
> > + *  Pointer to rib object or NULL if object not found with rte_errno
> > + *  set appropriately. Possible rte_errno values include:
> > + *   - ENOENT - required entry not available to return.
> > + */
> > +struct rte_rib *
> > +rte_rib_find_existing(const char *name);
> > +
> > +/**
> > + * Free an RIB object.
> > + *
> > + * @param rib
> > + *   RIB object handle
> > + * @return
> > + *   None
> > + */
> > +void
> > +rte_rib_free(struct rte_rib *rib);
> > +
> > +/**
> > + * Add a rule to the RIB.
> > + *
> > + * @param rib
> > + *   RIB object handle
> > + * @param ip
> > + *   IP of the rule to be added to the RIB
> > + * @param depth
> > + *   Depth of the rule to be added to the RIB
> > + * @param next_hop
> > + *   Next hop of the rule to be added to the RIB
> > + * @return
> > + *   0 on success, negative value otherwise
> > + */
> > +int
> > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t
> next_hop);
> > +
> > +/**
> > + * Delete a rule from the RIB.
> > + *
> > + * @param rib
> > + *   RIB object handle
> > + * @param ip
> > + *   IP of the rule to be deleted from the RIB
> > + * @param depth
> > + *   Depth of the rule to be deleted from the RIB
> > + * @return
> > + *   0 on success, negative value otherwise
> > + */
> > +int
> > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> > +
> > +/**
> > + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> > + * macro, so the address of the function should not be used.
> > + *
> > + * @param RIB
> > + *   RIB object handle
> > + * @param ips
> > + *   Array of IPs to be looked up in the FIB
> > + * @param next_hops
> > + *   Next hop of the most specific rule found for IP.
> > + *   This is an array of eight byte values.
> > + *   If the lookup for the given IP failed, then corresponding element
> would
> > + *   contain default value, see description of then next parameter.
> > + * @param n
> > + *   Number of elements in ips (and next_hops) array to lookup. This
> should be a
> > + *   compile time constant, and divisible by 8 for best performance.
> > + * @param defv
> > + *   Default value to populate into corresponding element of hop[]
> array,
> > + *   if lookup would fail.
> > + *  @return
> > + *   -EINVAL for incorrect arguments, otherwise 0
> > + */
> > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)      \
> > +     rib->lookup(rib->fib, ips, next_hops, n)
>
> My main thought here is whether this needs to be a function at all?
> Given that it takes a full burst of addresses in a single go, how much
> performance would actually be lost by making this a regular function in
> the C file?
> IF we do convert this to a regular function, then a lot of the structure
> definitions above - most importantly, the rib structure itself - can
> probably be moved to a private header file and not exposed to
> applications at all. This will make ABI compatibility a *lot* easier, as
> the structures can be changed without affecting the public ABI.
>
I didn't quite understand what you mean.


> /Bruce
>
> > +
> > +#endif /* _RTE_RIB_H_ */
>



-- 
Regards,
Vladimir

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
  2018-03-25 16:20  7%     ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-25 16:20  6%     ` [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
@ 2018-03-25 16:20  6%     ` Andrew Rybchenko
  2018-03-25 16:20  4%     ` [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions Andrew Rybchenko
  2018-03-25 16:20  8%     ` [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Santosh Shukla, Jerin Jacob

The callback was introduced to let generic code to know octeontx
mempool driver requirements to use single physically contiguous
memory chunk to store all objects and align object address to
total object size. Now these requirements are met using a new
callbacks to calculate required memory chunk size and to populate
objects using provided memory chunk.

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and likely to
be insufficient to describe mempool driver features. If required
in the future, API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - fix typo
 - rebase on top of patch which renames MEMPOOL_F_NO_PHYS_CONTIG

RFCv2 -> v1:
 - squash mempool/octeontx patches to add calc_mem_size and populate
   callbacks to this one in order to avoid breakages in the middle of
   patchset
 - advertise API changes in release notes

 doc/guides/rel_notes/deprecation.rst            |  1 -
 doc/guides/rel_notes/release_18_05.rst          | 11 +++++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 +++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.c                | 44 ++----------------
 lib/librte_mempool/rte_mempool.h                | 52 +---------------------
 lib/librte_mempool/rte_mempool_ops.c            | 14 ------
 lib/librte_mempool/rte_mempool_ops_default.c    | 15 +------
 lib/librte_mempool/rte_mempool_version.map      |  1 -
 8 files changed, 68 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index c06fc67..4deed9a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -70,7 +70,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index abaefe5..c50f26c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -66,6 +66,14 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Removed mempool capability flags and related functions.**
+
+  Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and
+  ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool
+  driver to customize generic mempool library behaviour.
+  Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
+  used to achieve it without specific knowledge in the generic code.
+
 
 ABI Changes
 -----------
@@ -86,6 +94,9 @@ ABI Changes
   to allow to customize required memory size calculation.
   A new callback ``populate`` has been added to ``rte_mempool_ops``
   to allow to customize objects population.
+  Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
+  since its features are covered by ``calc_mem_size`` and ``populate``
+  callbacks.
 
 
 Removed Items
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index d143d05..64ed528 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
 	return octeontx_fpa_bufpool_free_count(pool);
 }
 
-static int
-octeontx_fpavf_get_capabilities(const struct rte_mempool *mp,
-				unsigned int *flags)
+static ssize_t
+octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp,
+			     uint32_t obj_num, uint32_t pg_shift,
+			     size_t *min_chunk_size, size_t *align)
 {
-	RTE_SET_USED(mp);
-	*flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
-			MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS);
-	return 0;
+	ssize_t mem_size;
+
+	/*
+	 * Simply need space for one more object to be able to
+	 * fulfil alignment requirements.
+	 */
+	mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1,
+							pg_shift,
+							min_chunk_size, align);
+	if (mem_size >= 0) {
+		/*
+		 * Memory area which contains objects must be physically
+		 * contiguous.
+		 */
+		*min_chunk_size = mem_size;
+	}
+
+	return mem_size;
 }
 
 static int
@@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp,
 	return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool);
 }
 
+static int
+octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs,
+			void *vaddr, rte_iova_t iova, size_t len,
+			rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+
+	if (iova == RTE_BAD_IOVA)
+		return -EINVAL;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	/* align object start address to a multiple of total_elt_sz */
+	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
+
+	if (len < off)
+		return -EINVAL;
+
+	vaddr = (char *)vaddr + off;
+	iova += off;
+	len -= off;
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
+					       obj_cb, obj_cb_arg);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.name = "octeontx_fpavf",
 	.alloc = octeontx_fpavf_alloc,
@@ -157,8 +199,9 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.enqueue = octeontx_fpavf_enqueue,
 	.dequeue = octeontx_fpavf_dequeue,
 	.get_count = octeontx_fpavf_get_count,
-	.get_capabilities = octeontx_fpavf_get_capabilities,
 	.register_memory_area = octeontx_fpavf_register_memory_area,
+	.calc_mem_size = octeontx_fpavf_calc_mem_size,
+	.populate = octeontx_fpavf_populate,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d917dc7..40eedde 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -208,15 +208,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      __rte_unused unsigned int flags)
 {
 	size_t obj_per_page, pg_num, pg_sz;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	if (total_elt_sz == 0)
 		return 0;
@@ -240,18 +234,12 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 ssize_t
 rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags)
+	uint32_t pg_shift, __rte_unused unsigned int flags)
 {
 	uint32_t elt_cnt = 0;
 	rte_iova_t start, end;
 	uint32_t iova_idx;
 	size_t pg_sz = (size_t)1 << pg_shift;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	/* if iova is NULL, assume contiguous memory */
 	if (iova == NULL) {
@@ -345,8 +333,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque)
 {
-	unsigned total_elt_sz;
-	unsigned int mp_capa_flags;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -365,27 +351,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-
-	/* Get mempool capabilities */
-	mp_capa_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_capa_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_capa_flags;
-
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -397,10 +362,7 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp_capa_flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
-		/* align object start address to a multiple of total_elt_sz */
-		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
-	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 754261e..0b83d5e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -246,24 +246,6 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
-/**
- * This capability flag is advertised by a mempool handler, if the whole
- * memory area containing the objects must be physically contiguous.
- * Note: This flag should not be passed by application.
- */
-#define MEMPOOL_F_CAPA_PHYS_CONTIG 0x0040
-/**
- * This capability flag is advertised by a mempool handler. Used for a case
- * where mempool driver wants object start address(vaddr) aligned to block
- * size(/ total element size).
- *
- * Note:
- * - This flag should not be passed by application.
- *   Flag used for mempool driver only.
- * - Mempool driver must also set MEMPOOL_F_CAPA_PHYS_CONTIG flag along with
- *   MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS.
- */
-#define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -389,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Get the mempool capabilities.
- */
-typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
-		unsigned int *flags);
-
-/**
  * Notify new memory area to mempool.
  */
 typedef int (*rte_mempool_ops_register_memory_area_t)
@@ -440,13 +416,7 @@ typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
- * If mempool driver requires object addresses to be block size aligned
- * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
- * reserved to be able to meet the requirement.
- *
- * Minimum size of memory chunk is either all required space, if
- * capabilities say that whole memory area must be physically contiguous
- * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * Minimum size of memory chunk is a maximum of the page size and total
  * element size.
  *
  * Required memory chunk alignment is a maximum of page size and cache
@@ -522,10 +492,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Get the mempool capabilities
-	 */
-	rte_mempool_get_capabilities_t get_capabilities;
-	/**
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
@@ -651,22 +617,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops get_capabilities callback.
- *
- * @param mp [in]
- *   Pointer to the memory pool.
- * @param flags [out]
- *   Pointer to the mempool flags.
- * @return
- *   - 0: Success; The mempool driver has advertised his pool capabilities in
- *   flags param.
- *   - -ENOTSUP - doesn't support get_capabilities ops (valid case).
- *   - Otherwise, pool create fails.
- */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags);
-/**
  * @internal wrapper for mempool_ops register_memory_area callback.
  * API to notify the mempool handler when a new memory area is added to pool.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 1a7f39f..6ac669a 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 	return ops->get_count(mp);
 }
 
-/* wrapper to get external mempool capabilities. */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->get_capabilities, -ENOTSUP);
-	return ops->get_capabilities(mp, flags);
-}
-
 /* wrapper to notify new memory area to external mempool */
 int
 rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57295f7..3defc15 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -11,26 +11,15 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 				     uint32_t obj_num, uint32_t pg_shift,
 				     size_t *min_chunk_size, size_t *align)
 {
-	unsigned int mp_flags;
-	int ret;
 	size_t total_elt_sz;
 	size_t mem_size;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
 	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags | mp_flags);
+					 mp->flags);
 
-	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
-		*min_chunk_size = mem_size;
-	else
-		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
 	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 41a0b09..637f73f 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_get_capabilities;
 	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
                       ` (3 preceding siblings ...)
  2018-03-25 16:20  4%     ` [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions Andrew Rybchenko
@ 2018-03-25 16:20  8%     ` Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback is not required any more since there is a new callback
to populate objects using provided memory area which provides
the same information.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - none

RFCv2 -> v1:
 - advertise ABI changes in release notes

 doc/guides/rel_notes/deprecation.rst       |  1 -
 doc/guides/rel_notes/release_18_05.rst     |  2 ++
 lib/librte_mempool/rte_mempool.c           |  5 -----
 lib/librte_mempool/rte_mempool.h           | 31 ------------------------------
 lib/librte_mempool/rte_mempool_ops.c       | 14 --------------
 lib/librte_mempool/rte_mempool_version.map |  1 -
 6 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 473330d..5301259 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -63,7 +63,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 6a8db54..016c4ed 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -108,6 +108,8 @@ ABI Changes
   Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
   since its features are covered by ``calc_mem_size`` and ``populate``
   callbacks.
+  Callback ``register_memory_area`` has been removed from ``rte_mempool_ops``
+  since the new callback ``populate`` may be used instead of it.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 8c3b0b1..c58bcc6 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -355,11 +355,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (ret != 0)
 		return ret;
 
-	/* Notify memory area to mempool */
-	ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len);
-	if (ret != -ENOTSUP && ret < 0)
-		return ret;
-
 	/* mempool is already populated */
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 9107f5a..314f909 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -371,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Notify new memory area to mempool.
- */
-typedef int (*rte_mempool_ops_register_memory_area_t)
-(const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * Calculate memory size required to store given number of objects.
  *
  * If mempool objects are not required to be IOVA-contiguous
@@ -514,10 +508,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Notify new memory area to mempool
-	 */
-	rte_mempool_ops_register_memory_area_t register_memory_area;
-	/**
 	 * Optional callback to calculate memory size required to
 	 * store specified number of objects.
 	 */
@@ -639,27 +629,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops register_memory_area callback.
- * API to notify the mempool handler when a new memory area is added to pool.
- *
- * @param mp
- *   Pointer to the memory pool.
- * @param vaddr
- *   Pointer to the buffer virtual address.
- * @param iova
- *   Pointer to the buffer IO address.
- * @param len
- *   Pool size.
- * @return
- *   - 0: Success;
- *   - -ENOTSUP - doesn't support register_memory_area ops (valid error case).
- *   - Otherwise, rte_mempool_populate_phys fails thus pool create fails.
- */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
-				char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * @internal wrapper for mempool_ops calc_mem_size callback.
  * API to calculate size of memory required to store specified number of
  * object.
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 6ac669a..ea9be1e 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
 
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 }
 
 /* wrapper to notify new memory area to external mempool */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
-					rte_iova_t iova, size_t len)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->register_memory_area, -ENOTSUP);
-	return ops->register_memory_area(mp, vaddr, iova, len);
-}
-
-/* wrapper to notify new memory area to external mempool */
 ssize_t
 rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				uint32_t obj_num, uint32_t pg_shift,
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 637f73f..cf375db 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
 
-- 
2.7.4

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
                       ` (2 preceding siblings ...)
  2018-03-25 16:20  6%     ` [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
@ 2018-03-25 16:20  4%     ` Andrew Rybchenko
  2018-03-25 16:20  8%     ` [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Thomas Monjalon

Move rte_mempool_xmem_size() code to internal helper function
since it is required in two places: deprecated rte_mempool_xmem_size()
and non-deprecated rte_mempool_op_calc_mem_size_default().

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - deprecate rte_mempool_populate_iova_tab()
 - add -Wno-deprecated-declarations to fix build errors because of
   rte_mempool_populate_iova_tab() deprecation
 - add @deprecated to deprecated functions description

RFCv2 -> v1:
 - advertise deprecation in release notes
 - factor out default memory size calculation into non-deprecated
   internal function to avoid usage of deprecated function internally
 - remove test for deprecated functions to address build issue because
   of usage of deprecated functions (it is easy to allow usage of
   deprecated function in Makefile, but very complicated in meson)

 doc/guides/rel_notes/deprecation.rst         |  7 -------
 doc/guides/rel_notes/release_18_05.rst       | 11 ++++++++++
 lib/librte_mempool/Makefile                  |  3 +++
 lib/librte_mempool/meson.build               | 12 +++++++++++
 lib/librte_mempool/rte_mempool.c             | 19 ++++++++++++++---
 lib/librte_mempool/rte_mempool.h             | 30 +++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops_default.c |  4 ++--
 test/test/test_mempool.c                     | 31 ----------------------------
 8 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4deed9a..473330d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -60,13 +60,6 @@ Deprecation Notices
   - ``rte_eal_mbuf_default_mempool_ops``
 
 * mempool: several API and ABI changes are planned in v18.05.
-  The following functions, introduced for Xen, which is not supported
-  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
-  Therefore they will be deprecated in v18.05 and removed in v18.08:
-
-  - ``rte_mempool_xmem_create``
-  - ``rte_mempool_xmem_size``
-  - ``rte_mempool_xmem_usage``
 
   The following changes are planned:
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index c50f26c..6a8db54 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -74,6 +74,17 @@ API Changes
   Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
   used to achieve it without specific knowledge in the generic code.
 
+* **Deprecated mempool xmem functions.**
+
+  The following functions, introduced for Xen, which is not supported
+  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
+  Therefore they were deprecated in v18.05 and will be removed in v18.08:
+
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+  - ``rte_mempool_populate_iova_tab``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 072740f..2c46fdd 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -7,6 +7,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_mempool.a
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+CFLAGS += -Wno-deprecated-declarations
 LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 9e3b527..22e912a 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,6 +1,18 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+extra_flags = []
+
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+extra_flags += '-Wno-deprecated-declarations'
+
+foreach flag: extra_flags
+	if cc.has_argument(flag)
+		cflags += flag
+	endif
+endforeach
+
 version = 4
 sources = files('rte_mempool.c', 'rte_mempool_ops.c',
 		'rte_mempool_ops_default.c')
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 40eedde..8c3b0b1 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Calculate maximum amount of memory required to store given number of objects.
+ * Internal function to calculate required memory chunk size shared
+ * by default implementation of the corresponding callback and
+ * deprecated external function.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
+rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+				 uint32_t pg_shift)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 
@@ -228,6 +230,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 }
 
 /*
+ * Calculate maximum amount of memory required to store given number of objects.
+ */
+size_t
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
+		      __rte_unused unsigned int flags)
+{
+	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
+						pg_shift);
+}
+
+/*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0b83d5e..9107f5a 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -427,6 +427,28 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal Helper function to calculate memory size required to store
+ * specified number of objects in assumption that the memory buffer will
+ * be aligned at page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * @param elt_num
+ *   Number of elements.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by rte_mempool_calc_obj_size().
+ * @param pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+size_t rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+		uint32_t pg_shift);
+
+/**
  * Function to be called for each populated object.
  *
  * @param[in] mp
@@ -855,6 +877,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 		   int socket_id, unsigned flags);
 
 /**
+ * @deprecated
  * Create a new mempool named *name* in memory.
  *
  * The pool contains n elements of elt_size. Its size is set to n.
@@ -912,6 +935,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. See rte_mempool_create() for details.
  */
+__rte_deprecated
 struct rte_mempool *
 rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 		unsigned cache_size, unsigned private_data_size,
@@ -1008,6 +1032,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 	void *opaque);
 
 /**
+ * @deprecated
  * Add physical memory for objects in the pool at init
  *
  * Add a virtually contiguous memory chunk in the pool where objects can
@@ -1033,6 +1058,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
  *   On error, the chunks are not added in the memory list of the
  *   mempool and a negative errno is returned.
  */
+__rte_deprecated
 int rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr,
 	const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift,
 	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);
@@ -1652,6 +1678,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 	struct rte_mempool_objsz *sz);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate the maximum amount of memory required to store given number
@@ -1674,10 +1701,12 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * @return
  *   Required memory size aligned at page boundary.
  */
+__rte_deprecated
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
 	uint32_t pg_shift, unsigned int flags);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate how much memory would be actually required with the given
@@ -1705,6 +1734,7 @@ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
  *   buffer is too small, return a negative value whose absolute value
  *   is the actual number of elements that can be stored in that buffer.
  */
+__rte_deprecated
 ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
 	uint32_t pg_shift, unsigned int flags);
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 3defc15..fd63ca1 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -16,8 +16,8 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags);
+	mem_size = rte_mempool_calc_mem_size_helper(obj_num, total_elt_sz,
+						    pg_shift);
 
 	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 63f921e..8d29af2 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -444,34 +444,6 @@ test_mempool_same_name_twice_creation(void)
 	return 0;
 }
 
-/*
- * Basic test for mempool_xmem functions.
- */
-static int
-test_mempool_xmem_misc(void)
-{
-	uint32_t elt_num, total_size;
-	size_t sz;
-	ssize_t usz;
-
-	elt_num = MAX_KEEP;
-	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
-	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
-
-	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
-		MEMPOOL_PG_SHIFT_MAX, 0);
-
-	if (sz != (size_t)usz)  {
-		printf("failure @ %s: rte_mempool_xmem_usage(%u, %u) "
-			"returns: %#zx, while expected: %#zx;\n",
-			__func__, elt_num, total_size, sz, (size_t)usz);
-		return -1;
-	}
-
-	return 0;
-}
-
 static void
 walk_cb(struct rte_mempool *mp, void *userdata __rte_unused)
 {
@@ -596,9 +568,6 @@ test_mempool(void)
 	if (test_mempool_same_name_twice_creation() < 0)
 		goto err;
 
-	if (test_mempool_xmem_misc() < 0)
-		goto err;
-
 	/* test the stack handler */
 	if (test_mempool_basic(mp_stack, 1) < 0)
 		goto err;
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
  2018-03-25 16:20  7%     ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-25 16:20  6%     ` Andrew Rybchenko
  2018-03-25 16:20  6%     ` [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback allows to customize how objects are stored in the
memory chunk. Default implementation of the callback which simply
puts objects one by one is available.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - fix memory leak if off is bigger than len

RFCv2 -> v1:
 - advertise ABI changes in release notes
 - use consistent name for default callback:
   rte_mempool_op_<callback>_default()
 - add opaque data pointer to populated object callback
 - move default callback to dedicated file

 doc/guides/rel_notes/deprecation.rst         |  2 +-
 doc/guides/rel_notes/release_18_05.rst       |  2 +
 lib/librte_mempool/rte_mempool.c             | 23 ++++---
 lib/librte_mempool/rte_mempool.h             | 90 ++++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops.c         | 21 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 24 ++++++++
 lib/librte_mempool/rte_mempool_version.map   |  1 +
 7 files changed, 149 insertions(+), 14 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e02d4ca..c06fc67 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,7 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize objects population and allocate contiguous
+  - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 59583ea..abaefe5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,8 @@ ABI Changes
 
   A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
   to allow to customize required memory size calculation.
+  A new callback ``populate`` has been added to ``rte_mempool_ops``
+  to allow to customize objects population.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index dd2d0fe..d917dc7 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size)
 }
 
 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
+mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
+		 void *obj, rte_iova_t iova)
 {
 	struct rte_mempool_objhdr *hdr;
 	struct rte_mempool_objtlr *tlr __rte_unused;
@@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 	tlr = __mempool_get_trailer(obj);
 	tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-
-	/* enqueue in ring */
-	rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }
 
 /* call obj_cb() for each mempool element */
@@ -407,17 +405,16 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
 
-	while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
-		off += mp->header_size;
-		if (iova == RTE_BAD_IOVA)
-			mempool_add_elem(mp, (char *)vaddr + off,
-				RTE_BAD_IOVA);
-		else
-			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
-		off += mp->elt_size + mp->trailer_size;
-		i++;
+	if (off > len) {
+		ret = -EINVAL;
+		goto fail;
 	}
 
+	i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
+		(char *)vaddr + off,
+		(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
+		len - off, mempool_add_elem, NULL);
+
 	/* not enough room to store one object */
 	if (i == 0) {
 		ret = -EINVAL;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 191255d..754261e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -456,6 +456,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		uint32_t obj_num, uint32_t pg_shift,
 		size_t *min_chunk_size, size_t *align);
 
+/**
+ * Function to be called for each populated object.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] opaque
+ *   An opaque pointer passed to iterator.
+ * @param[in] vaddr
+ *   Object virtual address.
+ * @param[in] iova
+ *   Input/output virtual address of the object or RTE_BAD_IOVA.
+ */
+typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp,
+		void *opaque, void *vaddr, rte_iova_t iova);
+
+/**
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * Populated objects should be enqueued to the pool, e.g. using
+ * rte_mempool_ops_enqueue_bulk().
+ *
+ * If the given IO address is unknown (iova = RTE_BAD_IOVA),
+ * the chunk doesn't need to be physically contiguous (only virtually),
+ * and allocated objects may span two pages.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+typedef int (*rte_mempool_populate_t)(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
+/**
+ * Default way to populate memory pool object using provided memory
+ * chunk: just slice objects one by one.
+ */
+int rte_mempool_op_populate_default(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -477,6 +534,11 @@ struct rte_mempool_ops {
 	 * store specified number of objects.
 	 */
 	rte_mempool_calc_mem_size_t calc_mem_size;
+	/**
+	 * Optional callback to populate mempool objects using
+	 * provided memory chunk.
+	 */
+	rte_mempool_populate_t populate;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -649,6 +711,34 @@ ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				      size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal wrapper for mempool_ops populate callback.
+ *
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+			     void *vaddr, rte_iova_t iova, size_t len,
+			     rte_mempool_populate_obj_cb_t *obj_cb,
+			     void *obj_cb_arg);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 26908cc..1a7f39f 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
+	ops->populate = h->populate;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -141,6 +142,26 @@ rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
 }
 
+/* wrapper to populate memory pool objects using provided memory chunk */
+int
+rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+				void *vaddr, rte_iova_t iova, size_t len,
+				rte_mempool_populate_obj_cb_t *obj_cb,
+				void *obj_cb_arg)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->populate == NULL)
+		return rte_mempool_op_populate_default(mp, max_objs, vaddr,
+						       iova, len, obj_cb,
+						       obj_cb_arg);
+
+	return ops->populate(mp, max_objs, vaddr, iova, len, obj_cb,
+			     obj_cb_arg);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57fe79b..57295f7 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -36,3 +36,27 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	return mem_size;
 }
+
+int
+rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+	unsigned int i;
+	void *obj;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+		off += mp->header_size;
+		obj = (char *)vaddr + off;
+		obj_cb(mp, obj_cb_arg, obj,
+		       (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
+		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
+		off += mp->elt_size + mp->trailer_size;
+	}
+
+	return i;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index cb38189..41a0b09 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -56,5 +56,6 @@ DPDK_18.05 {
 	global:
 
 	rte_mempool_op_calc_mem_size_default;
+	rte_mempool_op_populate_default;
 
 } DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
@ 2018-03-25 16:20  7%     ` Andrew Rybchenko
  2018-03-25 16:20  6%     ` [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

Size of memory chunk required to populate mempool objects depends
on how objects are stored in the memory. Different mempool drivers
may have different requirements and a new operation allows to
calculate memory size in accordance with driver requirements and
advertise requirements on minimum memory chunk size and alignment
in a generic way.

Bump ABI version since the patch breaks it.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - clarify min_chunk_size meaning
 - rebase on top of patch series which fixes library version in meson
   build

RFCv2 -> v1:
 - move default calc_mem_size callback to rte_mempool_ops_default.c
 - add ABI changes to release notes
 - name default callback consistently: rte_mempool_op_<callback>_default()
 - bump ABI version since it is the first patch which breaks ABI
 - describe default callback behaviour in details
 - avoid introduction of internal function to cope with deprecation
   (keep it to deprecation patch)
 - move cache-line or page boundary chunk alignment to default callback
 - highlight that min_chunk_size and align parameters are output only

 doc/guides/rel_notes/deprecation.rst         |  3 +-
 doc/guides/rel_notes/release_18_05.rst       |  7 ++-
 lib/librte_mempool/Makefile                  |  3 +-
 lib/librte_mempool/meson.build               |  5 +-
 lib/librte_mempool/rte_mempool.c             | 43 +++++++-------
 lib/librte_mempool/rte_mempool.h             | 86 +++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c         | 18 ++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 38 ++++++++++++
 lib/librte_mempool/rte_mempool_version.map   |  7 +++
 9 files changed, 182 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6594585..e02d4ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,8 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize required memory chunk calculation,
-    customize objects population and allocate contiguous
+  - addition of new ops to customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index f2525bb..59583ea 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -80,6 +80,11 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Changed rte_mempool_ops structure.**
+
+  A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
+  to allow to customize required memory size calculation.
+
 
 Removed Items
 -------------
@@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.3
-     librte_mempool.so.3
+   + librte_mempool.so.4
    + librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..072740f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
 
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 712720f..9e3b527 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 3
-sources = files('rte_mempool.c', 'rte_mempool_ops.c')
+version = 4
+sources = files('rte_mempool.c', 'rte_mempool_ops.c',
+		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d8e3720..dd2d0fe 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -561,10 +561,10 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
-	size_t size, total_elt_sz, align, pg_sz, pg_shift;
+	ssize_t mem_size;
+	size_t align, pg_sz, pg_shift;
 	rte_iova_t iova;
 	unsigned mz_id, n;
-	unsigned int mp_flags;
 	int ret;
 
 	ret = mempool_ops_alloc_once(mp);
@@ -575,29 +575,23 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_flags;
-
 	if (rte_eal_has_hugepages()) {
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
-		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
-		align = pg_sz;
 	}
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		size_t min_chunk_size;
+
+		mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+				&min_chunk_size, &align);
+		if (mem_size < 0) {
+			ret = mem_size;
+			goto fail;
+		}
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -606,7 +600,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
+		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 			mp->socket_id, mz_flags, align);
 		/* not enough memory, retry with the biggest zone we have */
 		if (mz == NULL)
@@ -617,6 +611,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
+		if (mz->len < min_chunk_size) {
+			rte_memzone_free(mz);
+			ret = -ENOMEM;
+			goto fail;
+		}
+
 		if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG)
 			iova = RTE_BAD_IOVA;
 		else
@@ -649,13 +649,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 static size_t
 get_anon_size(const struct rte_mempool *mp)
 {
-	size_t size, total_elt_sz, pg_sz, pg_shift;
+	size_t size, pg_sz, pg_shift;
+	size_t min_chunk_size;
+	size_t align;
 
 	pg_sz = getpagesize();
 	pg_shift = rte_bsf32(pg_sz);
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+	size = rte_mempool_ops_calc_mem_size(mp, mp->size, pg_shift,
+					     &min_chunk_size, &align);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e531a15..191255d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -400,6 +400,62 @@ typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
 typedef int (*rte_mempool_ops_register_memory_area_t)
 (const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
 
+/**
+ * Calculate memory size required to store given number of objects.
+ *
+ * If mempool objects are not required to be IOVA-contiguous
+ * (the flag MEMPOOL_F_NO_IOVA_CONTIG is set), min_chunk_size defines
+ * virtually contiguous chunk size. Otherwise, if mempool objects must
+ * be IOVA-contiguous (the flag MEMPOOL_F_NO_IOVA_CONTIG is clear),
+ * min_chunk_size defines IOVA-contiguous chunk size.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+		uint32_t obj_num,  uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just a product of total
+ * object size including header and trailer and number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+		uint32_t obj_num, uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -416,6 +472,11 @@ struct rte_mempool_ops {
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
+	/**
+	 * Optional callback to calculate memory size required to
+	 * store specified number of objects.
+	 */
+	rte_mempool_calc_mem_size_t calc_mem_size;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -565,6 +626,29 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
 				char *vaddr, rte_iova_t iova, size_t len);
 
 /**
+ * @internal wrapper for mempool_ops calc_mem_size callback.
+ * API to calculate size of memory required to store specified number of
+ * object.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				      uint32_t obj_num, uint32_t pg_shift,
+				      size_t *min_chunk_size, size_t *align);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
@@ -1534,7 +1618,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * of objects. Assume that the memory buffer will be aligned at page
  * boundary.
  *
- * Note that if object size is bigger then page size, then it assumes
+ * Note that if object size is bigger than page size, then it assumes
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 0732255..26908cc 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_count = h->get_count;
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
+	ops->calc_mem_size = h->calc_mem_size;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -123,6 +124,23 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
 	return ops->register_memory_area(mp, vaddr, iova, len);
 }
 
+/* wrapper to notify new memory area to external mempool */
+ssize_t
+rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				uint32_t obj_num, uint32_t pg_shift,
+				size_t *min_chunk_size, size_t *align)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->calc_mem_size == NULL)
+		return rte_mempool_op_calc_mem_size_default(mp, obj_num,
+				pg_shift, min_chunk_size, align);
+
+	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
new file mode 100644
index 0000000..57fe79b
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 6WIND S.A.
+ * Copyright(c) 2018 Solarflare Communications Inc.
+ */
+
+#include <rte_mempool.h>
+
+ssize_t
+rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+				     uint32_t obj_num, uint32_t pg_shift,
+				     size_t *min_chunk_size, size_t *align)
+{
+	unsigned int mp_flags;
+	int ret;
+	size_t total_elt_sz;
+	size_t mem_size;
+
+	/* Get mempool capabilities */
+	mp_flags = 0;
+	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
+					 mp->flags | mp_flags);
+
+	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
+		*min_chunk_size = mem_size;
+	else
+		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+
+	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
+
+	return mem_size;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 62b76f9..cb38189 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -51,3 +51,10 @@ DPDK_17.11 {
 	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
+
+DPDK_18.05 {
+	global:
+
+	rte_mempool_op_calc_mem_size_default;
+
+} DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH v2 00/11] mempool: prepare to add bucket driver
    2018-01-31 16:44  0%   ` Olivier Matz
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-25 16:20  2%   ` Andrew Rybchenko
  2018-03-25 16:20  7%     ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
                       ` (4 more replies)
  2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  3 siblings, 5 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev
  Cc: Olivier MATZ, Thomas Monjalon, Anatoly Burakov, Santosh Shukla,
	Jerin Jacob, Hemant Agrawal, Shreyansh Jain

The patch series should be applied on top of [7].

The initial patch series [1] is split into two to simplify processing.
The second series relies on this one and will add bucket mempool driver
and related ops.

The patch series has generic enhancements suggested by Olivier.
Basically it adds driver callbacks to calculate required memory size and
to populate objects using provided memory area. It allows to remove
so-called capability flags used before to tell generic code how to
allocate and slice allocated memory into mempool objects.
Clean up which removes get_capabilities and register_memory_area is
not strictly required, but I think right thing to do.
Existing mempool drivers are updated.

rte_mempool_populate_iova_tab() is also deprecated in v2 as agreed in [2].
Unfortunately it requires addition of -Wno-deprecated-declarations flag
to librte_mempool since the function is used by deprecated earlier
rte_mempool_populate_phys_tab(). If the later may be removed in the
release, we can avoid addition of the flag to allow usage of deprecated
functions.

One open question remains from previous review [3].

The patch series interfere with memory hotplug for DPDK [4] ([5] to be
precise). So, rebase may be required.

A new patch is added to the series to rename MEMPOOL_F_NO_PHYS_CONTIG
as MEMPOOL_F_NO_IOVA_CONTIG as agreed in [6].
MEMPOOL_F_CAPA_PHYS_CONTIG is not renamed since it removed in this
patchset.

It breaks ABI since changes rte_mempool_ops. Also it removes
rte_mempool_ops_register_memory_area() and
rte_mempool_ops_get_capabilities() since corresponding callbacks are
removed.

Internal global functions are not listed in map file since it is not
a part of external API.

[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2018-March/093186.html
[3] https://dpdk.org/ml/archives/dev/2018-March/093329.html
[4] https://dpdk.org/ml/archives/dev/2018-March/092070.html
[5] https://dpdk.org/ml/archives/dev/2018-March/092088.html
[6] https://dpdk.org/ml/archives/dev/2018-March/093345.html
[7] https://dpdk.org/ml/archives/dev/2018-March/093196.html

v1 -> v2:
  - deprecate rte_mempool_populate_iova_tab()
  - add patch to fix memory leak if no objects are populated
  - add patch to rename MEMPOOL_F_NO_PHYS_CONTIG
  - minor fixes (typos, blank line at the end of file)
  - highlight meaning of min_chunk_size (when it is virtual or
    physical contiguous)
  - make sure that mempool is initialized in rte_mempool_populate_anon()
  - move patch to ensure that mempool is initialized earlier in the series

RFCv2 -> v1:
  - split the series in two
  - squash octeontx patches which implement calc_mem_size and populate
    callbacks into the patch which removes get_capabilities since it is
    the easiest way to untangle the tangle of tightly related library
    functions and flags advertised by the driver
  - consistently name default callbacks
  - move default callbacks to dedicated file
  - see detailed description in patches

RFCv1 -> RFCv2:
  - add driver ops to calculate required memory size and populate
    mempool objects, remove extra flags which were required before
    to control it
  - transition of octeontx and dpaa drivers to the new callbacks
  - change info API to get information from driver required to
    API user to know contiguous block size
  - remove get_capabilities (not required any more and may be
    substituted with more in info get API)
  - remove register_memory_area since it is substituted with
    populate callback which can do more
  - use SPDX tags
  - avoid all objects affinity to single lcore
  - fix bucket get_count
  - deprecate XMEM API
  - avoid introduction of a new function to flush cache
  - fix NO_CACHE_ALIGN case in bucket mempool


Andrew Rybchenko (9):
  mempool: fix memhdr leak when no objects are populated
  mempool: rename flag to control IOVA-contiguous objects
  mempool: add op to calculate memory size to be allocated
  mempool: add op to populate objects using provided memory
  mempool: remove callback to get capabilities
  mempool: deprecate xmem functions
  mempool/octeontx: prepare to remove register memory area op
  mempool/dpaa: prepare to remove register memory area op
  mempool: remove callback to register memory area

Artem V. Andreev (2):
  mempool: ensure the mempool is initialized before populating
  mempool: support flushing the default cache of the mempool

 doc/guides/rel_notes/deprecation.rst            |  12 +-
 doc/guides/rel_notes/release_18_05.rst          |  33 ++-
 drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
 drivers/net/thunderx/nicvf_ethdev.c             |   2 +-
 lib/librte_mempool/Makefile                     |   6 +-
 lib/librte_mempool/meson.build                  |  17 +-
 lib/librte_mempool/rte_mempool.c                | 179 ++++++++-------
 lib/librte_mempool/rte_mempool.h                | 280 +++++++++++++++++-------
 lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
 lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
 lib/librte_mempool/rte_mempool_version.map      |  10 +-
 test/test/test_mempool.c                        |  31 ---
 13 files changed, 485 insertions(+), 250 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 1/7] crypto/virtio: add virtio related fundamental functions
  @ 2018-03-25  8:33  2% ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-03-25  8:33 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Since there does not have the common virtio library, we have to put
these files here. They are basically the same with virtio net related files
with some minor changes.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
 config/common_base                  |  20 ++
 drivers/crypto/virtio/virtio_logs.h |  47 ++++
 drivers/crypto/virtio/virtio_pci.c  | 460 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h  | 253 ++++++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h | 137 +++++++++++
 drivers/crypto/virtio/virtqueue.c   |  43 ++++
 drivers/crypto/virtio/virtqueue.h   | 176 ++++++++++++++
 7 files changed, 1136 insertions(+)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/config/common_base b/config/common_base
index ad03cf4..19d0cdd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -482,6 +482,26 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n
 CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 
 #
+# Compile PMD for virtio crypto devices
+#
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP=n
+#
+# Number of maximum virtio crypto devices
+#
+CONFIG_RTE_MAX_VIRTIO_CRYPTO=32
+#
+# Number of sessions to create in the session memory pool
+# on a single virtio crypto device.
+#
+CONFIG_RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS=1024
+
+#
 # Compile PMD for AESNI backed device
 #
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..20582a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
+#define PMD_INIT_FUNC_TRACE() do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION
+#define PMD_SESSION_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() session: " fmt "\n", __func__, ## args)
+#else
+#define PMD_SESSION_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): driver " fmt "\n", __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..7aa5cdd
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	PMD_INIT_LOG(DEBUG, "queue %u addresses:", vq->vq_queue_index);
+	PMD_INIT_LOG(DEBUG, "\t desc_addr: %" PRIx64, desc_addr);
+	PMD_INIT_LOG(DEBUG, "\t aval_addr: %" PRIx64, avail_addr);
+	PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
+	PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		PMD_INIT_LOG(ERR, "offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		PMD_INIT_LOG(ERR,
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		PMD_INIT_LOG(DEBUG, "failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			PMD_INIT_LOG(ERR,
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			PMD_INIT_LOG(DEBUG,
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		PMD_INIT_LOG(DEBUG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		PMD_INIT_LOG(INFO, "no modern virtio pci device found.");
+		return -1;
+	}
+
+	PMD_INIT_LOG(INFO, "found modern virtio pci device.");
+
+	PMD_INIT_LOG(DEBUG, "common cfg mapped at: %p", hw->common_cfg);
+	PMD_INIT_LOG(DEBUG, "device cfg mapped at: %p", hw->dev_cfg);
+	PMD_INIT_LOG(DEBUG, "isr cfg mapped at: %p", hw->isr);
+	PMD_INIT_LOG(DEBUG, "notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return -1:
+ *   if there is error mapping with VFIO/UIO.
+ *   if port map error when driver type is KDRV_NONE.
+ *   if whitelisted but driver type is KDRV_UNKNOWN.
+ * Return 1 if kernel driver is managing the device.
+ * Return 0 on success.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try if we can succeed reading virtio pci caps, which exists
+	 * only on modern pci device. If failed, we fallback to legacy
+	 * virtio handling.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		PMD_INIT_LOG(INFO, "modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..cd316a6
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+	const struct rte_cryptodev_capabilities *virtio_dev_capabilities;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure stores
+ * some infos that may vary in the multiple process model locally.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other size, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..1bd0e89
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,176 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< memzone to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notificaiton is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	PMD_INIT_LOG(DEBUG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 2/4] net/nfp: update PMD for using new CPP interface
    2018-03-23 17:35  1% ` [dpdk-dev] [PATCH 1/4] net/nfp: add NFP CPP support Alejandro Lucero
@ 2018-03-23 17:35  6% ` Alejandro Lucero
  1 sibling, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-03-23 17:35 UTC (permalink / raw)
  To: dev

PF PMD support was based on NSPU interface. This patch changes the
PMD for using the new CPP user space interface which gives more
flexibility for adding new functionalities, and specifically at this
point, for properly selecting the right firmware file which requires
to know about the card to work with.

This change just changes initialization with the datapath being unaffected.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/net/nfp/Makefile      |  17 ++-
 drivers/net/nfp/nfp_net.c     | 342 +++++++++++++++++++++++++++++-------------
 drivers/net/nfp/nfp_net_pmd.h |  16 +-
 3 files changed, 264 insertions(+), 111 deletions(-)

diff --git a/drivers/net/nfp/Makefile b/drivers/net/nfp/Makefile
index aa3b68a..ab4e0a7 100644
--- a/drivers/net/nfp/Makefile
+++ b/drivers/net/nfp/Makefile
@@ -20,11 +20,24 @@ EXPORT_MAP := rte_pmd_nfp_version.map
 
 LIBABIVER := 1
 
+VPATH += $(SRCDIR)/nfpcore
+
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_cppcore.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_cpp_pcie_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_mutex.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_resource.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_crc.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_mip.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nffw.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_hwinfo.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_rtsym.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nsp.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nsp_cmds.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nsp_eth.c
+
 #
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_net.c
-SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nfpu.c
-SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nspu.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index e5bfde6..0657a23 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014, 2015 Netronome Systems, Inc.
+ * Copyright (c) 2014-2018 Netronome Systems, Inc.
  * All rights reserved.
  *
  * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
@@ -55,7 +55,13 @@
 #include <rte_alarm.h>
 #include <rte_spinlock.h>
 
-#include "nfp_nfpu.h"
+#include "nfpcore/nfp_cpp.h"
+#include "nfpcore/nfp_nffw.h"
+#include "nfpcore/nfp_hwinfo.h"
+#include "nfpcore/nfp_mip.h"
+#include "nfpcore/nfp_rtsym.h"
+#include "nfpcore/nfp_nsp.h"
+
 #include "nfp_net_pmd.h"
 #include "nfp_net_logs.h"
 #include "nfp_net_ctrl.h"
@@ -104,12 +110,8 @@ static int nfp_net_rss_reta_write(struct rte_eth_dev *dev,
 static int nfp_net_rss_hash_write(struct rte_eth_dev *dev,
 			struct rte_eth_rss_conf *rss_conf);
 
-/*
- * The offset of the queue controller queues in the PCIe Target. These
- * happen to be at the same offset on the NFP6000 and the NFP3200 so
- * we use a single macro here.
- */
-#define NFP_PCIE_QUEUE(_q)	(0x800 * ((_q) & 0xff))
+/* The offset of the queue controller queues in the PCIe Target */
+#define NFP_PCIE_QUEUE(_q) (0x80000 + (NFP_QCP_QUEUE_ADDR_SZ * ((_q) & 0xff)))
 
 /* Maximum value which can be added to a queue with one transaction */
 #define NFP_QCP_MAX_ADD	0x7f
@@ -625,47 +627,29 @@ enum nfp_qcp_ptr {
 #define ETH_ADDR_LEN	6
 
 static void
-nfp_eth_copy_mac_reverse(uint8_t *dst, const uint8_t *src)
+nfp_eth_copy_mac(uint8_t *dst, const uint8_t *src)
 {
 	int i;
 
 	for (i = 0; i < ETH_ADDR_LEN; i++)
-		dst[ETH_ADDR_LEN - i - 1] = src[i];
+		dst[i] = src[i];
 }
 
 static int
 nfp_net_pf_read_mac(struct nfp_net_hw *hw, int port)
 {
-	union eth_table_entry *entry;
-	int idx, i;
-
-	idx = port;
-	entry = hw->eth_table;
-
-	/* Reading NFP ethernet table obtained before */
-	for (i = 0; i < NSP_ETH_MAX_COUNT; i++) {
-		if (!(entry->port & NSP_ETH_PORT_LANES_MASK)) {
-			/* port not in use */
-			entry++;
-			continue;
-		}
-		if (idx == 0)
-			break;
-		idx--;
-		entry++;
-	}
-
-	if (i == NSP_ETH_MAX_COUNT)
-		return -EINVAL;
+	struct nfp_eth_table *nfp_eth_table;
 
+	nfp_eth_table = nfp_eth_read_ports(hw->cpp);
 	/*
 	 * hw points to port0 private data. We need hw now pointing to
 	 * right port.
 	 */
 	hw += port;
-	nfp_eth_copy_mac_reverse((uint8_t *)&hw->mac_addr,
-				 (uint8_t *)&entry->mac_addr);
+	nfp_eth_copy_mac((uint8_t *)&hw->mac_addr,
+			 (uint8_t *)&nfp_eth_table->ports[port].mac_addr);
 
+	free(nfp_eth_table);
 	return 0;
 }
 
@@ -831,7 +815,7 @@ enum nfp_qcp_ptr {
 
 	if (hw->is_pf)
 		/* Configure the physical port up */
-		nfp_nsp_eth_config(hw->nspu_desc, hw->pf_port_idx, 1);
+		nfp_eth_set_configured(hw->cpp, hw->pf_port_idx, 1);
 
 	hw->ctrl = new_ctrl;
 
@@ -882,7 +866,7 @@ enum nfp_qcp_ptr {
 
 	if (hw->is_pf)
 		/* Configure the physical port down */
-		nfp_nsp_eth_config(hw->nspu_desc, hw->pf_port_idx, 0);
+		nfp_eth_set_configured(hw->cpp, hw->pf_port_idx, 0);
 }
 
 /* Reset and stop device. The device can not be restarted. */
@@ -2734,10 +2718,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	uint64_t tx_bar_off = 0, rx_bar_off = 0;
 	uint32_t start_q;
 	int stride = 4;
-
-	nspu_desc_t *nspu_desc = NULL;
-	uint64_t bar_offset;
 	int port = 0;
+	int err;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -2758,7 +2740,6 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
 		/* This points to the specific port private data */
 		hw = &hwport0[port];
-		hw->pf_port_idx = port;
 	} else {
 		hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 		hwport0 = 0;
@@ -2792,19 +2773,14 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	}
 
 	if (hw->is_pf && port == 0) {
-		nspu_desc = hw->nspu_desc;
-
-		if (nfp_nsp_map_ctrl_bar(nspu_desc, &bar_offset) != 0) {
-			/*
-			 * A firmware should be there after PF probe so this
-			 * should not happen.
-			 */
-			RTE_LOG(ERR, PMD, "PF BAR symbol resolution failed\n");
-			return -ENODEV;
+		hw->ctrl_bar = nfp_rtsym_map(hw->sym_tbl, "_pf0_net_bar0",
+					     hw->total_ports * 32768,
+					     &hw->ctrl_area);
+		if (!hw->ctrl_bar) {
+			printf("nfp_rtsym_map fails for _pf0_net_ctrl_bar\n");
+			return -EIO;
 		}
 
-		/* vNIC PF control BAR is a subset of PF PCI device BAR */
-		hw->ctrl_bar += bar_offset;
 		PMD_INIT_LOG(DEBUG, "ctrl bar: %p\n", hw->ctrl_bar);
 	}
 
@@ -2828,13 +2804,14 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	case PCI_DEVICE_ID_NFP6000_PF_NIC:
 	case PCI_DEVICE_ID_NFP6000_VF_NIC:
 		start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_TXQ);
-		tx_bar_off = NFP_PCIE_QUEUE(start_q);
+		tx_bar_off = start_q * NFP_QCP_QUEUE_ADDR_SZ;
 		start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_RXQ);
-		rx_bar_off = NFP_PCIE_QUEUE(start_q);
+		rx_bar_off = start_q * NFP_QCP_QUEUE_ADDR_SZ;
 		break;
 	default:
 		RTE_LOG(ERR, PMD, "nfp_net: no device ID matching\n");
-		return -ENODEV;
+		err = -ENODEV;
+		goto dev_err_ctrl_map;
 	}
 
 	PMD_INIT_LOG(DEBUG, "tx_bar_off: 0x%" PRIx64 "\n", tx_bar_off);
@@ -2842,17 +2819,19 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
 	if (hw->is_pf && port == 0) {
 		/* configure access to tx/rx vNIC BARs */
-		nfp_nsp_map_queues_bar(nspu_desc, &bar_offset);
-		PMD_INIT_LOG(DEBUG, "tx/rx bar_offset: %" PRIx64 "\n",
-				    bar_offset);
-		hwport0->hw_queues = (uint8_t *)pci_dev->mem_resource[0].addr;
-
-		/* vNIC PF tx/rx BARs are a subset of PF PCI device */
-		hwport0->hw_queues += bar_offset;
+		hwport0->hw_queues = nfp_cpp_map_area(hw->cpp, 0, 0,
+						      NFP_PCIE_QUEUE(0),
+						      NFP_QCP_QUEUE_AREA_SZ,
+						      &hw->hwqueues_area);
+
+		if (!hwport0->hw_queues) {
+			printf("nfp_rtsym_map fails for net.qc\n");
+			err = -EIO;
+			goto dev_err_ctrl_map;
+		}
 
-		/* Lets seize the chance to read eth table from hw */
-		if (nfp_nsp_eth_read_table(nspu_desc, &hw->eth_table))
-			return -ENODEV;
+		PMD_INIT_LOG(DEBUG, "tx/rx bar address: 0x%p\n",
+				    hwport0->hw_queues);
 	}
 
 	if (hw->is_pf) {
@@ -2912,7 +2891,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	eth_dev->data->mac_addrs = rte_zmalloc("mac_addr", ETHER_ADDR_LEN, 0);
 	if (eth_dev->data->mac_addrs == NULL) {
 		PMD_INIT_LOG(ERR, "Failed to space for MAC address");
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto dev_err_queues_map;
 	}
 
 	if (hw->is_pf) {
@@ -2923,6 +2903,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	}
 
 	if (!is_valid_assigned_ether_addr((struct ether_addr *)&hw->mac_addr)) {
+		PMD_INIT_LOG(INFO, "Using random mac address for port %d\n",
+				   port);
 		/* Using random mac addresses for VFs */
 		eth_random_addr(&hw->mac_addr[0]);
 		nfp_net_write_mac(hw, (uint8_t *)&hw->mac_addr);
@@ -2951,11 +2933,19 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	nfp_net_stats_reset(eth_dev);
 
 	return 0;
+
+dev_err_queues_map:
+		nfp_cpp_area_free(hw->hwqueues_area);
+dev_err_ctrl_map:
+		nfp_cpp_area_free(hw->ctrl_area);
+
+	return err;
 }
 
 static int
 nfp_pf_create_dev(struct rte_pci_device *dev, int port, int ports,
-		  nfpu_desc_t *nfpu_desc, void **priv)
+		  struct nfp_cpp *cpp, struct nfp_hwinfo *hwinfo,
+		  int phys_port, struct nfp_rtsym_table *sym_tbl, void **priv)
 {
 	struct rte_eth_dev *eth_dev;
 	struct nfp_net_hw *hw;
@@ -2993,12 +2983,16 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	 * Then dev_private is adjusted per port.
 	 */
 	hw = (struct nfp_net_hw *)(eth_dev->data->dev_private) + port;
-	hw->nspu_desc = nfpu_desc->nspu;
-	hw->nfpu_desc = nfpu_desc;
+	hw->cpp = cpp;
+	hw->hwinfo = hwinfo;
+	hw->sym_tbl = sym_tbl;
+	hw->pf_port_idx = phys_port;
 	hw->is_pf = 1;
 	if (ports > 1)
 		hw->pf_multiport_enabled = 1;
 
+	hw->total_ports = ports;
+
 	eth_dev->device = &dev->device;
 	rte_eth_copy_pci_info(eth_dev, dev);
 
@@ -3012,55 +3006,191 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	return ret;
 }
 
+#define DEFAULT_FW_PATH       "/lib/firmware/netronome"
+
+static int
+nfp_fw_upload(struct rte_pci_device *dev, struct nfp_nsp *nsp, char *card)
+{
+	struct nfp_cpp *cpp = nsp->cpp;
+	int fw_f;
+	char *fw_buf;
+	char fw_name[100];
+	char serial[100];
+	struct stat file_stat;
+	off_t fsize, bytes;
+
+	/* Looking for firmware file in order of priority */
+
+	/* First try to find a firmware image specific for this device */
+	sprintf(serial, "serial-%02x-%02x-%02x-%02x-%02x-%02x-%02hhx-%02hhx",
+		cpp->serial[0], cpp->serial[1], cpp->serial[2], cpp->serial[3],
+		cpp->serial[4], cpp->serial[5], cpp->interface >> 8,
+		cpp->interface & 0xff);
+
+	sprintf(fw_name, "%s/%s.nffw", DEFAULT_FW_PATH, serial);
+
+	RTE_LOG(DEBUG, PMD, "Trying with fw file: %s\n", fw_name);
+	fw_f = open(fw_name, O_RDONLY);
+	if (fw_f > 0)
+		goto read_fw;
+
+	/* Then try the PCI name */
+	sprintf(fw_name, "%s/pci-%s.nffw", DEFAULT_FW_PATH, dev->device.name);
+
+	RTE_LOG(DEBUG, PMD, "Trying with fw file: %s\n", fw_name);
+	fw_f = open(fw_name, O_RDONLY);
+	if (fw_f > 0)
+		goto read_fw;
+
+	/* Finally try the card type and media */
+	sprintf(fw_name, "%s/%s", DEFAULT_FW_PATH, card);
+	RTE_LOG(DEBUG, PMD, "Trying with fw file: %s\n", fw_name);
+	fw_f = open(fw_name, O_RDONLY);
+	if (fw_f < 0) {
+		RTE_LOG(INFO, PMD, "Firmware file %s not found.", fw_name);
+		return -ENOENT;
+	}
+
+read_fw:
+	if (fstat(fw_f, &file_stat) < 0) {
+		RTE_LOG(INFO, PMD, "Firmware file %s size is unknown", fw_name);
+		close(fw_f);
+		return -ENOENT;
+	}
+
+	fsize = file_stat.st_size;
+	RTE_LOG(INFO, PMD, "Firmware file found at %s with size: %" PRIu64 "\n",
+			    fw_name, (uint64_t)fsize);
+
+	fw_buf = malloc((size_t)fsize);
+	if (!fw_buf) {
+		RTE_LOG(INFO, PMD, "malloc failed for fw buffer");
+		close(fw_f);
+		return -ENOMEM;
+	}
+	memset(fw_buf, 0, fsize);
+
+	bytes = read(fw_f, fw_buf, fsize);
+	if (bytes != fsize) {
+		RTE_LOG(INFO, PMD, "Reading fw to buffer failed.\n"
+				   "Just %" PRIu64 " of %" PRIu64 " bytes read",
+				   (uint64_t)bytes, (uint64_t)fsize);
+		free(fw_buf);
+		close(fw_f);
+		return -EIO;
+	}
+
+	RTE_LOG(INFO, PMD, "Uploading the firmware ...");
+	nfp_nsp_load_fw(nsp, fw_buf, bytes);
+	RTE_LOG(INFO, PMD, "Done");
+
+	free(fw_buf);
+	close(fw_f);
+
+	return 0;
+}
+
+static int
+nfp_fw_setup(struct rte_pci_device *dev, struct nfp_cpp *cpp,
+	     struct nfp_eth_table *nfp_eth_table, struct nfp_hwinfo *hwinfo)
+{
+	struct nfp_nsp *nsp;
+	const char *nfp_fw_model;
+	char card_desc[100];
+	int err = 0;
+
+	nfp_fw_model = nfp_hwinfo_lookup(hwinfo, "assembly.partno");
+
+	if (nfp_fw_model) {
+		RTE_LOG(INFO, PMD, "firmware model found: %s\n", nfp_fw_model);
+	} else {
+		RTE_LOG(ERR, PMD, "firmware model NOT found\n");
+		return -EIO;
+	}
+
+	if (nfp_eth_table->count == 0 || nfp_eth_table->count > 8) {
+		RTE_LOG(ERR, PMD, "NFP ethernet table reports wrong ports: %u\n",
+		       nfp_eth_table->count);
+		return -EIO;
+	}
+
+	RTE_LOG(INFO, PMD, "NFP ethernet port table reports %u ports\n",
+			   nfp_eth_table->count);
+
+	RTE_LOG(INFO, PMD, "Port speed: %u\n", nfp_eth_table->ports[0].speed);
+
+	sprintf(card_desc, "nic_%s_%dx%d.nffw", nfp_fw_model,
+		nfp_eth_table->count, nfp_eth_table->ports[0].speed / 1000);
+
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp) {
+		RTE_LOG(ERR, PMD, "NFP error when obtaining NSP handle\n");
+		return -EIO;
+	}
+
+	nfp_nsp_device_soft_reset(nsp);
+	err = nfp_fw_upload(dev, nsp, card_desc);
+
+	nfp_nsp_close(nsp);
+	return err;
+}
+
 static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 			    struct rte_pci_device *dev)
 {
-	nfpu_desc_t *nfpu_desc;
-	nspu_desc_t *nspu_desc;
-	uint64_t offset_symbol;
-	uint8_t *bar_offset;
-	int major, minor;
+	struct nfp_cpp *cpp;
+	struct nfp_hwinfo *hwinfo;
+	struct nfp_rtsym_table *sym_tbl;
+	struct nfp_eth_table *nfp_eth_table = NULL;
 	int total_ports;
 	void *priv = 0;
 	int ret = -ENODEV;
+	int err;
 	int i;
 
 	if (!dev)
 		return ret;
 
-	nfpu_desc = rte_malloc("nfp nfpu", sizeof(nfpu_desc_t), 0);
-	if (!nfpu_desc)
-		return -ENOMEM;
-
-	if (nfpu_open(dev, nfpu_desc, 0) < 0) {
-		RTE_LOG(ERR, PMD,
-			"nfpu_open failed\n");
-		goto nfpu_error;
+	cpp = nfp_cpp_from_device_name(dev->device.name);
+	if (!cpp) {
+		RTE_LOG(ERR, PMD, "A CPP handle can not be obtained");
+		ret = -EIO;
+		goto error;
 	}
 
-	nspu_desc = nfpu_desc->nspu;
+	hwinfo = nfp_hwinfo_read(cpp);
+	if (!hwinfo) {
+		RTE_LOG(ERR, PMD, "Error reading hwinfo table");
+		return -EIO;
+	}
 
+	nfp_eth_table = nfp_eth_read_ports(cpp);
+	if (!nfp_eth_table) {
+		RTE_LOG(ERR, PMD, "Error reading NFP ethernet table\n");
+		return -EIO;
+	}
 
-	/* Check NSP ABI version */
-	if (nfp_nsp_get_abi_version(nspu_desc, &major, &minor) < 0) {
-		RTE_LOG(INFO, PMD, "NFP NSP not present\n");
+	if (nfp_fw_setup(dev, cpp, nfp_eth_table, hwinfo)) {
+		RTE_LOG(INFO, PMD, "Error when uploading firmware\n");
+		ret = -EIO;
 		goto error;
 	}
-	PMD_INIT_LOG(INFO, "nspu ABI version: %d.%d\n", major, minor);
 
-	if ((major == 0) && (minor < 20)) {
-		RTE_LOG(INFO, PMD, "NFP NSP ABI version too old. Required 0.20 or higher\n");
+	/* Now the symbol table should be there */
+	sym_tbl = nfp_rtsym_table_read(cpp);
+	if (!sym_tbl) {
+		RTE_LOG(ERR, PMD, "Something is wrong with the firmware"
+				" symbol table");
+		ret = -EIO;
 		goto error;
 	}
 
-	ret = nfp_nsp_fw_setup(nspu_desc, "nfd_cfg_pf0_num_ports",
-			       &offset_symbol);
-	if (ret)
+	total_ports = nfp_rtsym_read_le(sym_tbl, "nfd_cfg_pf0_num_ports", &err);
+	if (total_ports != (int)nfp_eth_table->count) {
+		RTE_LOG(ERR, PMD, "Inconsistent number of ports\n");
+		ret = -EIO;
 		goto error;
-
-	bar_offset = (uint8_t *)dev->mem_resource[0].addr;
-	bar_offset += offset_symbol;
-	total_ports = (uint32_t)*bar_offset;
+	}
 	PMD_INIT_LOG(INFO, "Total pf ports: %d\n", total_ports);
 
 	if (total_ports <= 0 || total_ports > 8) {
@@ -3070,18 +3200,15 @@ static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	}
 
 	for (i = 0; i < total_ports; i++) {
-		ret = nfp_pf_create_dev(dev, i, total_ports, nfpu_desc, &priv);
+		ret = nfp_pf_create_dev(dev, i, total_ports, cpp, hwinfo,
+					nfp_eth_table->ports[i].index,
+					sym_tbl, &priv);
 		if (ret)
-			goto error;
+			break;
 	}
 
-	return 0;
-
 error:
-	nfpu_close(nfpu_desc);
-nfpu_error:
-	rte_free(nfpu_desc);
-
+	free(nfp_eth_table);
 	return ret;
 }
 
@@ -3129,8 +3256,19 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)
 	if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
 	    (pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
 		port = get_pf_port_number(eth_dev->data->name);
+		/*
+		 * hotplug is not possible with multiport PF although freeing
+		 * data structures can be done for first port.
+		 */
+		if (port != 0)
+			return -ENOTSUP;
 		hwport0 = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 		hw = &hwport0[port];
+		nfp_cpp_area_free(hw->ctrl_area);
+		nfp_cpp_area_free(hw->hwqueues_area);
+		free(hw->hwinfo);
+		free(hw->sym_tbl);
+		nfp_cpp_free(hw->cpp);
 	} else {
 		hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 	}
diff --git a/drivers/net/nfp/nfp_net_pmd.h b/drivers/net/nfp/nfp_net_pmd.h
index 1ae0ea6..097c871 100644
--- a/drivers/net/nfp/nfp_net_pmd.h
+++ b/drivers/net/nfp/nfp_net_pmd.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014, 2015 Netronome Systems, Inc.
+ * Copyright (c) 2014-2018 Netronome Systems, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -63,6 +63,7 @@
 #define NFP_NET_CRTL_BAR        0
 #define NFP_NET_TX_BAR          2
 #define NFP_NET_RX_BAR          2
+#define NFP_QCP_QUEUE_AREA_SZ			0x80000
 
 /* Macros for accessing the Queue Controller Peripheral 'CSRs' */
 #define NFP_QCP_QUEUE_OFF(_x)                 ((_x) * 0x800)
@@ -430,20 +431,21 @@ struct nfp_net_hw {
 	/* Records starting point for counters */
 	struct rte_eth_stats eth_stats_base;
 
-#ifdef NFP_NET_LIBNFP
 	struct nfp_cpp *cpp;
 	struct nfp_cpp_area *ctrl_area;
-	struct nfp_cpp_area *tx_area;
-	struct nfp_cpp_area *rx_area;
+	struct nfp_cpp_area *hwqueues_area;
 	struct nfp_cpp_area *msix_area;
-#endif
+
 	uint8_t *hw_queues;
 	uint8_t is_pf;
 	uint8_t pf_port_idx;
 	uint8_t pf_multiport_enabled;
+	uint8_t total_ports;
+
 	union eth_table_entry *eth_table;
-	nspu_desc_t *nspu_desc;
-	nfpu_desc_t *nfpu_desc;
+
+	struct nfp_hwinfo *hwinfo;
+	struct nfp_rtsym_table *sym_tbl;
 };
 
 struct nfp_net_adapter {
-- 
1.9.1

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH 1/4] net/nfp: add NFP CPP support
  @ 2018-03-23 17:35  1% ` Alejandro Lucero
  2018-03-23 17:35  6% ` [dpdk-dev] [PATCH 2/4] net/nfp: update PMD for using new CPP interface Alejandro Lucero
  1 sibling, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-03-23 17:35 UTC (permalink / raw)
  To: dev

CPP refers to the internal NFP Command Push Pull bus. This patch allows
to create CPP commands from user space allowing to access any single
part of the chip.

This CPP interface is the base for having other functionalities like
mutexes when accessing specific chip components, chip resources management,
firmware upload or using the NSP, an embedded arm processor which can
perform tasks on demand.

NSP was the previous only way for doing things in the chip by the PMD,
where a NSPU interface was used for commands like firmware upload or
port link configuration. CPP interface supersedes NSPU, but it is still
possible to use NSP through CPP.

CPP interface adds a great flexibility for doing things like extended
stats, firmware debugging or selecting properly the firmware file to
upload.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h    | 748 +++++++++++++++++
 drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h |  62 ++
 drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h    | 620 ++++++++++++++
 drivers/net/nfp/nfpcore/nfp6000/nfp6000.h         |  68 ++
 drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h         |  54 ++
 drivers/net/nfp/nfpcore/nfp_cpp.h                 | 803 ++++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c        | 962 ++++++++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_cppcore.c             | 901 ++++++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_crc.c                 |  75 ++
 drivers/net/nfp/nfpcore/nfp_crc.h                 |  45 +
 drivers/net/nfp/nfpcore/nfp_hwinfo.c              | 226 +++++
 drivers/net/nfp/nfpcore/nfp_hwinfo.h              | 111 +++
 drivers/net/nfp/nfpcore/nfp_mip.c                 | 180 ++++
 drivers/net/nfp/nfpcore/nfp_mip.h                 |  47 ++
 drivers/net/nfp/nfpcore/nfp_mutex.c               | 450 ++++++++++
 drivers/net/nfp/nfpcore/nfp_nffw.c                | 261 ++++++
 drivers/net/nfp/nfpcore/nfp_nffw.h                | 112 +++
 drivers/net/nfp/nfpcore/nfp_nsp.c                 | 453 ++++++++++
 drivers/net/nfp/nfpcore/nfp_nsp.h                 | 330 ++++++++
 drivers/net/nfp/nfpcore/nfp_nsp_cmds.c            | 135 +++
 drivers/net/nfp/nfpcore/nfp_nsp_eth.c             | 691 ++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_resource.c            | 291 +++++++
 drivers/net/nfp/nfpcore/nfp_resource.h            |  78 ++
 drivers/net/nfp/nfpcore/nfp_rtsym.c               | 353 ++++++++
 drivers/net/nfp/nfpcore/nfp_rtsym.h               |  87 ++
 drivers/net/nfp/nfpcore/nfp_target.h              | 605 ++++++++++++++
 26 files changed, 8748 insertions(+)
 create mode 100644 drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp6000/nfp6000.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_cpp.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_cppcore.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_crc.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_crc.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_hwinfo.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_hwinfo.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_mip.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_mip.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_mutex.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nffw.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nffw.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp_eth.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_resource.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_resource.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_rtsym.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_rtsym.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_target.h

diff --git a/drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h b/drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h
new file mode 100644
index 0000000..fbeec57
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h
@@ -0,0 +1,748 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_CPPAT_H__
+#define __NFP_CPPAT_H__
+
+#include "nfp_platform.h"
+#include "nfp_resid.h"
+
+/* This file contains helpers for creating CPP commands
+ *
+ * All magic NFP-6xxx IMB 'mode' numbers here are from:
+ * Databook (1 August 2013)
+ * - System Overview and Connectivity
+ * -- Internal Connectivity
+ * --- Distributed Switch Fabric - Command Push/Pull (DSF-CPP) Bus
+ * ---- CPP addressing
+ * ----- Table 3.6. CPP Address Translation Mode Commands
+ */
+
+#define _NIC_NFP6000_MU_LOCALITY_DIRECT 2
+
+static inline int
+_nfp6000_decode_basic(uint64_t addr, int *dest_island, int cpp_tgt, int mode,
+		      int addr40, int isld1, int isld0);
+
+static uint64_t
+_nic_mask64(int msb, int lsb, int at0)
+{
+	uint64_t v;
+	int w = msb - lsb + 1;
+
+	if (w == 64)
+		return ~(uint64_t)0;
+
+	if ((lsb + w) > 64)
+		return 0;
+
+	v = (UINT64_C(1) << w) - 1;
+
+	if (at0)
+		return v;
+
+	return v << lsb;
+}
+
+/* For VQDR, we may not modify the Channel bits, which might overlap
+ * with the Index bit. When it does, we need to ensure that isld0 == isld1.
+ */
+static inline int
+_nfp6000_encode_basic(uint64_t *addr, int dest_island, int cpp_tgt, int mode,
+		      int addr40, int isld1, int isld0)
+{
+	uint64_t _u64;
+	int iid_lsb, idx_lsb;
+	int i, v = 0;
+	int isld[2];
+
+	isld[0] = isld0;
+	isld[1] = isld1;
+
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_MU:
+		/* This function doesn't handle MU */
+		return NFP_ERRNO(EINVAL);
+	case NFP6000_CPPTGT_CTXPB:
+		/* This function doesn't handle CTXPB */
+		return NFP_ERRNO(EINVAL);
+	default:
+		break;
+	}
+
+	switch (mode) {
+	case 0:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			/*
+			 * In this specific mode we'd rather not modify the
+			 * address but we can verify if the existing contents
+			 * will point to a valid island.
+			 */
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1,
+						  isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			/*
+			 * If dest_island is invalid, the current address won't
+			 * go where expected.
+			 */
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		iid_lsb = (addr40) ? 34 : 26;
+
+		/* <39:34> or <31:26> */
+		_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+		*addr &= ~_u64;
+		*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+		return 0;
+	case 1:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1, isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			/*
+			 * If dest_island is invalid, the current address won't
+			 * go where expected.
+			 */
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		idx_lsb = (addr40) ? 39 : 31;
+		if (dest_island == isld0) {
+			/* Only need to clear the Index bit */
+			*addr &= ~_nic_mask64(idx_lsb, idx_lsb, 0);
+			return 0;
+		}
+
+		if (dest_island == isld1) {
+			/* Only need to set the Index bit */
+			*addr |= (UINT64_C(1) << idx_lsb);
+			return 0;
+		}
+
+		return NFP_ERRNO(ENODEV);
+	case 2:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			/* iid<0> = addr<30> = channel<0> */
+			/* channel<1> = addr<31> = Index */
+
+			/*
+			 * Special case where we allow channel bits to be set
+			 * before hand and with them select an island.
+			 * So we need to confirm that it's at least plausible.
+			 */
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1, isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			/*
+			 * If dest_island is invalid, the current address won't
+			 * go where expected.
+			 */
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		/*
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 **/
+		isld[0] &= ~1;
+		isld[1] &= ~1;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 1;
+
+		/*
+		 * Try each option, take first one that fits. Not sure if we
+		 * would want to do some smarter searching and prefer 0 or non-0
+		 * island IDs.
+		 */
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 2; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+
+		return NFP_ERRNO(ENODEV);
+	case 3:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			/*
+			 * iid<0> = addr<29> = data
+			 * iid<1> = addr<30> = channel<0>
+			 * channel<1> = addr<31> = Index
+			 */
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1, isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		isld[0] &= ~3;
+		isld[1] &= ~3;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 2;
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 4; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+		return NFP_ERRNO(ENODEV);
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_decode_basic(uint64_t addr, int *dest_island, int cpp_tgt, int mode,
+		      int addr40, int isld1, int isld0)
+{
+	int iid_lsb, idx_lsb;
+
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_MU:
+		/* This function doesn't handle MU */
+		return NFP_ERRNO(EINVAL);
+	case NFP6000_CPPTGT_CTXPB:
+		/* This function doesn't handle CTXPB */
+		return NFP_ERRNO(EINVAL);
+	default:
+		break;
+	}
+
+	switch (mode) {
+	case 0:
+		/*
+		 * For VQDR, in this mode for 32-bit addressing it would be
+		 * islands 0, 16, 32 and 48 depending on channel and upper
+		 * address bits. Since those are not all valid islands, most
+		 * decode cases would result in bad island IDs, but we do them
+		 * anyway since this is decoding an address that is already
+		 * assumed to be used as-is to get to sram.
+		 */
+		iid_lsb = (addr40) ? 34 : 26;
+		*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+		return 0;
+	case 1:
+		/*
+		 * For VQDR 32-bit, this would decode as:
+		 *	Channel 0: island#0
+		 *	Channel 1: island#0
+		 *	Channel 2: island#1
+		 *	Channel 3: island#1
+		 *
+		 * That would be valid as long as both islands have VQDR.
+		 * Let's allow this.
+		 */
+
+		idx_lsb = (addr40) ? 39 : 31;
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1;
+		else
+			*dest_island = isld0;
+
+		return 0;
+	case 2:
+		/*
+		 * For VQDR 32-bit:
+		 *	Channel 0: (island#0 | 0)
+		 *	Channel 1: (island#0 | 1)
+		 *	Channel 2: (island#1 | 0)
+		 *	Channel 3: (island#1 | 1)
+		 *
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 */
+		isld0 &= ~1;
+		isld1 &= ~1;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 1;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 1);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 1);
+
+		return 0;
+	case 3:
+		/*
+		 * In this mode the data address starts to affect the island ID
+		 * so rather not allow it. In some really specific case one
+		 * could use this to send the upper half of the VQDR channel to
+		 * another MU, but this is getting very specific. However, as
+		 * above for mode 0, this is the decoder and the caller should
+		 * validate the resulting IID. This blindly does what the
+		 * silicon would do.
+		 */
+
+		isld0 &= ~3;
+		isld1 &= ~3;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 2;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 3);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 3);
+
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_mu_locality_lsb(int mode, int addr40)
+{
+	switch (mode) {
+	case 0:
+	case 1:
+	case 2:
+	case 3:
+		return (addr40) ? 38 : 30;
+	default:
+		break;
+	}
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_encode_mu(uint64_t *addr, int dest_island, int mode, int addr40,
+		   int isld1, int isld0)
+{
+	uint64_t _u64;
+	int iid_lsb, idx_lsb, locality_lsb;
+	int i, v;
+	int isld[2];
+	int da;
+
+	isld[0] = isld0;
+	isld[1] = isld1;
+	locality_lsb = _nfp6000_cppat_mu_locality_lsb(mode, addr40);
+
+	if (((*addr >> locality_lsb) & 3) == _NIC_NFP6000_MU_LOCALITY_DIRECT)
+		da = 1;
+	else
+		da = 0;
+
+	switch (mode) {
+	case 0:
+		iid_lsb = (addr40) ? 32 : 24;
+		_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+		*addr &= ~_u64;
+		*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+		return 0;
+	case 1:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+			*addr &= ~_u64;
+			*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+			return 0;
+		}
+
+		idx_lsb = (addr40) ? 37 : 29;
+		if (dest_island == isld0) {
+			*addr &= ~_nic_mask64(idx_lsb, idx_lsb, 0);
+			return 0;
+		}
+
+		if (dest_island == isld1) {
+			*addr |= (UINT64_C(1) << idx_lsb);
+			return 0;
+		}
+
+		return NFP_ERRNO(ENODEV);
+	case 2:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+			*addr &= ~_u64;
+			*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+			return 0;
+		}
+
+		/*
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 */
+		isld[0] &= ~1;
+		isld[1] &= ~1;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 1;
+
+		/*
+		 * Try each option, take first one that fits. Not sure if we
+		 * would want to do some smarter searching and prefer 0 or
+		 * non-0 island IDs.
+		 */
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 2; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+		return NFP_ERRNO(ENODEV);
+	case 3:
+		/*
+		 * Only the EMU will use 40 bit addressing. Silently set the
+		 * direct locality bit for everyone else. The SDK toolchain
+		 * uses dest_island <= 0 to test for atypical address encodings
+		 * to support access to local-island CTM with a 32-but address
+		 * (high-locality is effectively ignored and just used for
+		 * routing to island #0).
+		 */
+		if (dest_island > 0 &&
+		    (dest_island < 24 || dest_island > 26)) {
+			*addr |= ((uint64_t)_NIC_NFP6000_MU_LOCALITY_DIRECT)
+				 << locality_lsb;
+			da = 1;
+		}
+
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+			*addr &= ~_u64;
+			*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+			return 0;
+		}
+
+		isld[0] &= ~3;
+		isld[1] &= ~3;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 2;
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 4; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+
+		return NFP_ERRNO(ENODEV);
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_decode_mu(uint64_t addr, int *dest_island, int mode, int addr40,
+		   int isld1, int isld0)
+{
+	int iid_lsb, idx_lsb, locality_lsb;
+	int da;
+
+	locality_lsb = _nfp6000_cppat_mu_locality_lsb(mode, addr40);
+
+	if (((addr >> locality_lsb) & 3) == _NIC_NFP6000_MU_LOCALITY_DIRECT)
+		da = 1;
+	else
+		da = 0;
+
+	switch (mode) {
+	case 0:
+		iid_lsb = (addr40) ? 32 : 24;
+		*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+		return 0;
+	case 1:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+			return 0;
+		}
+
+		idx_lsb = (addr40) ? 37 : 29;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1;
+		else
+			*dest_island = isld0;
+
+		return 0;
+	case 2:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+			return 0;
+		}
+		/*
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 */
+		isld0 &= ~1;
+		isld1 &= ~1;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 1;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 1);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 1);
+
+		return 0;
+	case 3:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+			return 0;
+		}
+
+		isld0 &= ~3;
+		isld1 &= ~3;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 2;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 3);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 3);
+
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_addr_encode(uint64_t *addr, int dest_island, int cpp_tgt,
+			   int mode, int addr40, int isld1, int isld0)
+{
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_NBI:
+	case NFP6000_CPPTGT_VQDR:
+	case NFP6000_CPPTGT_ILA:
+	case NFP6000_CPPTGT_PCIE:
+	case NFP6000_CPPTGT_ARM:
+	case NFP6000_CPPTGT_CRYPTO:
+	case NFP6000_CPPTGT_CLS:
+		return _nfp6000_encode_basic(addr, dest_island, cpp_tgt, mode,
+					     addr40, isld1, isld0);
+
+	case NFP6000_CPPTGT_MU:
+		return _nfp6000_encode_mu(addr, dest_island, mode, addr40,
+					  isld1, isld0);
+
+	case NFP6000_CPPTGT_CTXPB:
+		if (mode != 1 || addr40 != 0)
+			return NFP_ERRNO(EINVAL);
+
+		*addr &= ~_nic_mask64(29, 24, 0);
+		*addr |= (((uint64_t)dest_island) << 24) &
+			  _nic_mask64(29, 24, 0);
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_addr_decode(uint64_t addr, int *dest_island, int cpp_tgt,
+			   int mode, int addr40, int isld1, int isld0)
+{
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_NBI:
+	case NFP6000_CPPTGT_VQDR:
+	case NFP6000_CPPTGT_ILA:
+	case NFP6000_CPPTGT_PCIE:
+	case NFP6000_CPPTGT_ARM:
+	case NFP6000_CPPTGT_CRYPTO:
+	case NFP6000_CPPTGT_CLS:
+		return _nfp6000_decode_basic(addr, dest_island, cpp_tgt, mode,
+					     addr40, isld1, isld0);
+
+	case NFP6000_CPPTGT_MU:
+		return _nfp6000_decode_mu(addr, dest_island, mode, addr40,
+					  isld1, isld0);
+
+	case NFP6000_CPPTGT_CTXPB:
+		if (mode != 1 || addr40 != 0)
+			return -EINVAL;
+		*dest_island = (int)(addr >> 24) & 0x3F;
+		return 0;
+	default:
+		break;
+	}
+
+	return -EINVAL;
+}
+
+static inline int
+_nfp6000_cppat_addr_iid_clear(uint64_t *addr, int cpp_tgt, int mode, int addr40)
+{
+	int iid_lsb, locality_lsb, da;
+
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_NBI:
+	case NFP6000_CPPTGT_VQDR:
+	case NFP6000_CPPTGT_ILA:
+	case NFP6000_CPPTGT_PCIE:
+	case NFP6000_CPPTGT_ARM:
+	case NFP6000_CPPTGT_CRYPTO:
+	case NFP6000_CPPTGT_CLS:
+		switch (mode) {
+		case 0:
+			iid_lsb = (addr40) ? 34 : 26;
+			*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+			return 0;
+		case 1:
+			iid_lsb = (addr40) ? 39 : 31;
+			*addr &= ~_nic_mask64(iid_lsb, iid_lsb, 0);
+			return 0;
+		case 2:
+			iid_lsb = (addr40) ? 38 : 30;
+			*addr &= ~_nic_mask64(iid_lsb + 1, iid_lsb, 0);
+			return 0;
+		case 3:
+			iid_lsb = (addr40) ? 37 : 29;
+			*addr &= ~_nic_mask64(iid_lsb + 2, iid_lsb, 0);
+			return 0;
+		default:
+			break;
+		}
+	case NFP6000_CPPTGT_MU:
+		locality_lsb = _nfp6000_cppat_mu_locality_lsb(mode, addr40);
+		da = (((*addr >> locality_lsb) & 3) ==
+		      _NIC_NFP6000_MU_LOCALITY_DIRECT);
+		switch (mode) {
+		case 0:
+			iid_lsb = (addr40) ? 32 : 24;
+			*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+			return 0;
+		case 1:
+			if (da) {
+				iid_lsb = (addr40) ? 32 : 24;
+				*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+				return 0;
+			}
+			iid_lsb = (addr40) ? 37 : 29;
+			*addr &= ~_nic_mask64(iid_lsb, iid_lsb, 0);
+			return 0;
+		case 2:
+			if (da) {
+				iid_lsb = (addr40) ? 32 : 24;
+				*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+				return 0;
+			}
+
+			iid_lsb = (addr40) ? 36 : 28;
+			*addr &= ~_nic_mask64(iid_lsb + 1, iid_lsb, 0);
+			return 0;
+		case 3:
+			if (da) {
+				iid_lsb = (addr40) ? 32 : 24;
+				*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+				return 0;
+			}
+
+			iid_lsb = (addr40) ? 35 : 27;
+			*addr &= ~_nic_mask64(iid_lsb + 2, iid_lsb, 0);
+			return 0;
+		default:
+			break;
+		}
+	case NFP6000_CPPTGT_CTXPB:
+		if (mode != 1 || addr40 != 0)
+			return 0;
+		*addr &= ~(UINT64_C(0x3F) << 24);
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+#endif /* __NFP_CPPAT_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h b/drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h
new file mode 100644
index 0000000..bee2e94
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_PLATFORM_H__
+#define __NFP_PLATFORM_H__
+
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+#include <ctype.h>
+#include <inttypes.h>
+#include <sys/cdefs.h>
+#include <sys/stat.h>
+#include <limits.h>
+#include <errno.h>
+
+#ifndef BIT_ULL
+#define BIT(x) (1 << (x))
+#define BIT_ULL(x) (1ULL << (x))
+#endif
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#define NFP_ERRNO(err) (errno = (err), -1)
+#define NFP_ERRNO_RET(err, ret) (errno = (err), (ret))
+#define NFP_NOERR(errv) (errno)
+#define NFP_ERRPTR(err) (errno = (err), NULL)
+#define NFP_PTRERR(errv) (errno)
+
+#endif /* __NFP_PLATFORM_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h b/drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h
new file mode 100644
index 0000000..629b27e
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h
@@ -0,0 +1,620 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_RESID_H__
+#define __NFP_RESID_H__
+
+#if (!defined(_NFP_RESID_NO_C_FUNC) && \
+	(defined(__NFP_TOOL_NFCC) || defined(__NFP_TOOL_NFAS)))
+#define _NFP_RESID_NO_C_FUNC
+#endif
+
+#ifndef _NFP_RESID_NO_C_FUNC
+#include "nfp_platform.h"
+#endif
+
+/*
+ * NFP Chip Architectures
+ *
+ * These are semi-arbitrary values to indicate an NFP architecture.
+ * They serve as a software view of a group of chip families, not necessarily a
+ * direct mapping to actual hardware design.
+ */
+#define NFP_CHIP_ARCH_YD	1
+#define NFP_CHIP_ARCH_TH	2
+
+/*
+ * NFP Chip Families.
+ *
+ * These are not enums, because they need to be microcode compatible.
+ * They are also not maskable.
+ *
+ * Note: The NFP-4xxx family is handled as NFP-6xxx in most software
+ * components.
+ *
+ */
+#define NFP_CHIP_FAMILY_NFP6000 0x6000	/* ARCH_TH */
+
+/* NFP Microengine/Flow Processing Core Versions */
+#define NFP_CHIP_ME_VERSION_2_7 0x0207
+#define NFP_CHIP_ME_VERSION_2_8 0x0208
+#define NFP_CHIP_ME_VERSION_2_9 0x0209
+
+/* NFP Chip Base Revisions. Minor stepping can just be added to these */
+#define NFP_CHIP_REVISION_A0 0x00
+#define NFP_CHIP_REVISION_B0 0x10
+#define NFP_CHIP_REVISION_C0 0x20
+#define NFP_CHIP_REVISION_PF 0xff /* Maximum possible revision */
+
+/* CPP Targets for each chip architecture */
+#define NFP6000_CPPTGT_NBI 1
+#define NFP6000_CPPTGT_VQDR 2
+#define NFP6000_CPPTGT_ILA 6
+#define NFP6000_CPPTGT_MU 7
+#define NFP6000_CPPTGT_PCIE 9
+#define NFP6000_CPPTGT_ARM 10
+#define NFP6000_CPPTGT_CRYPTO 12
+#define NFP6000_CPPTGT_CTXPB 14
+#define NFP6000_CPPTGT_CLS 15
+
+/*
+ * Wildcard indicating a CPP read or write action
+ *
+ * The action used will be either read or write depending on whether a read or
+ * write instruction/call is performed on the NFP_CPP_ID.  It is recomended that
+ * the RW action is used even if all actions to be performed on a NFP_CPP_ID are
+ * known to be only reads or writes. Doing so will in many cases save NFP CPP
+ * internal software resources.
+ */
+#define NFP_CPP_ACTION_RW 32
+
+#define NFP_CPP_TARGET_ID_MASK 0x1f
+
+/*
+ *  NFP_CPP_ID - pack target, token, and action into a CPP ID.
+ *
+ * Create a 32-bit CPP identifier representing the access to be made.
+ * These identifiers are used as parameters to other NFP CPP functions. Some
+ * CPP devices may allow wildcard identifiers to be specified.
+ *
+ * @param[in]	target	NFP CPP target id
+ * @param[in]	action	NFP CPP action id
+ * @param[in]	token	NFP CPP token id
+ * @return		NFP CPP ID
+ */
+#define NFP_CPP_ID(target, action, token)                   \
+	((((target) & 0x7f) << 24) | (((token) & 0xff) << 16) | \
+	 (((action) & 0xff) << 8))
+
+#define NFP_CPP_ISLAND_ID(target, action, token, island)    \
+	((((target) & 0x7f) << 24) | (((token) & 0xff) << 16) | \
+	 (((action) & 0xff) << 8) | (((island) & 0xff) << 0))
+
+#ifndef _NFP_RESID_NO_C_FUNC
+
+/**
+ * Return the NFP CPP target of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP target
+ */
+static inline uint8_t
+NFP_CPP_ID_TARGET_of(uint32_t id)
+{
+	return (id >> 24) & NFP_CPP_TARGET_ID_MASK;
+}
+
+/*
+ * Return the NFP CPP token of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP token
+ */
+static inline uint8_t
+NFP_CPP_ID_TOKEN_of(uint32_t id)
+{
+	return (id >> 16) & 0xff;
+}
+
+/*
+ * Return the NFP CPP action of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP action
+ */
+static inline uint8_t
+NFP_CPP_ID_ACTION_of(uint32_t id)
+{
+	return (id >> 8) & 0xff;
+}
+
+/*
+ * Return the NFP CPP action of a NFP CPP ID
+ * @param[in]   id      NFP CPP ID
+ * @return      NFP CPP action
+ */
+static inline uint8_t
+NFP_CPP_ID_ISLAND_of(uint32_t id)
+{
+	return (id) & 0xff;
+}
+
+#endif /* _NFP_RESID_NO_C_FUNC */
+
+/*
+ *  Check if @p chip_family is an ARCH_TH chip.
+ * @param chip_family One of NFP_CHIP_FAMILY_*
+ */
+#define NFP_FAMILY_IS_ARCH_TH(chip_family) \
+	((int)(chip_family) == (int)NFP_CHIP_FAMILY_NFP6000)
+
+/*
+ *  Get the NFP_CHIP_ARCH_* of @p chip_family.
+ * @param chip_family One of NFP_CHIP_FAMILY_*
+ */
+#define NFP_FAMILY_ARCH(x) \
+	(__extension__ ({ \
+		typeof(x) _x = (x); \
+		(NFP_FAMILY_IS_ARCH_TH(_x) ? NFP_CHIP_ARCH_TH : \
+		NFP_FAMILY_IS_ARCH_YD(_x) ? NFP_CHIP_ARCH_YD : -1) \
+	}))
+
+/*
+ *  Check if @p chip_family is an NFP-6xxx chip.
+ * @param chip_family One of NFP_CHIP_FAMILY_*
+ */
+#define NFP_FAMILY_IS_NFP6000(chip_family) \
+	((int)(chip_family) == (int)NFP_CHIP_FAMILY_NFP6000)
+
+/*
+ *  Make microengine ID for NFP-6xxx.
+ * @param island_id   Island ID.
+ * @param menum       ME number, 0 based, within island.
+ *
+ * NOTE: menum should really be unsigned - MSC compiler throws error (not
+ * warning) if a clause is always true i.e. menum >= 0 if cluster_num is type
+ * unsigned int hence the cast of the menum to an int in that particular clause
+ */
+#define NFP6000_MEID(a, b)                       \
+	(__extension__ ({ \
+		typeof(a) _a = (a); \
+		typeof(b) _b = (b); \
+		(((((int)(_a) & 0x3F) == (int)(_a)) &&   \
+		(((int)(_b) >= 0) && ((int)(_b) < 12))) ?    \
+		(int)(((_a) << 4) | ((_b) + 4)) : -1) \
+	}))
+
+/*
+ *  Do a general sanity check on the ME ID.
+ * The check is on the highest possible island ID for the chip family and the
+ * microengine number must  be a master ID.
+ * @param meid      ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_IS_VALID(meid) \
+	(__extension__ ({ \
+		typeof(meid) _a = (meid); \
+		((((_a) >> 4) < 64) && (((_a) >> 4) >= 0) && \
+		 (((_a) & 0xF) >= 4)) \
+	}))
+
+/*
+ *  Extract island ID from ME ID.
+ * @param meid   ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_ISLAND_of(meid) (((meid) >> 4) & 0x3F)
+
+/*
+ * Extract microengine number (0 based) from ME ID.
+ * @param meid   ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_MENUM_of(meid) (((meid) & 0xF) - 4)
+
+/*
+ * Extract microengine group number (0 based) from ME ID.
+ * The group is two code-sharing microengines, so group  0 refers to MEs 0,1,
+ * group 1 refers to MEs 2,3 etc.
+ * @param meid   ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_MEGRP_of(meid) (NFP6000_MEID_MENUM_of(meid) >> 1)
+
+#ifndef _NFP_RESID_NO_C_FUNC
+
+/*
+ *  Convert a string to an ME ID.
+ *
+ * @param s       A string of format iX.meY
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME ID part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return     ME ID on success, -1 on error.
+ */
+int nfp6000_idstr2meid(const char *s, const char **endptr);
+
+/*
+ *  Extract island ID from string.
+ *
+ * Example:
+ * char *c;
+ * int val = nfp6000_idstr2island("i32.me5", &c);
+ * // val == 32, c == "me5"
+ * val = nfp6000_idstr2island("i32", &c);
+ * // val == 32, c == ""
+ *
+ * @param s       A string of format "iX.anything" or "iX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the island part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the island ID, -1 on error.
+ */
+int nfp6000_idstr2island(const char *s, const char **endptr);
+
+/*
+ *  Extract microengine number from string.
+ *
+ * Example:
+ * char *c;
+ * int menum = nfp6000_idstr2menum("me5.anything", &c);
+ * // menum == 5, c == "anything"
+ * menum = nfp6000_idstr2menum("me5", &c);
+ * // menum == 5, c == ""
+ *
+ * @param s       A string of format "meX.anything" or "meX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME number part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the ME number, -1 on error.
+ */
+int nfp6000_idstr2menum(const char *s, const char **endptr);
+
+/*
+ * Extract context number from string.
+ *
+ * Example:
+ * char *c;
+ * int val = nfp6000_idstr2ctxnum("ctx5.anything", &c);
+ * // val == 5, c == "anything"
+ * val = nfp6000_idstr2ctxnum("ctx5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "ctxN.anything" or "ctxN"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the context number part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the context number, -1 on error.
+ */
+int nfp6000_idstr2ctxnum(const char *s, const char **endptr);
+
+/*
+ * Extract microengine group number from string.
+ *
+ * Example:
+ * char *c;
+ * int val = nfp6000_idstr2megrp("tg2.anything", &c);
+ * // val == 2, c == "anything"
+ * val = nfp6000_idstr2megrp("tg5", &c);
+ * // val == 2, c == ""
+ *
+ * @param s       A string of format "tgX.anything" or "tgX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME group part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the ME group number, -1 on error.
+ */
+int nfp6000_idstr2megrp(const char *s, const char **endptr);
+
+/*
+ * Create ME ID string of format "iX[.meY]".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param meid   Microengine ID.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_meid2str(char *s, int meid);
+
+/*
+ * Create ME ID string of format "name[.meY]" or "iX[.meY]".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param meid   Microengine ID.
+ * @return       Pointer to "s" on success, NULL on error.
+ *
+ * Similar to nfp6000_meid2str() except use an alias instead of "iX"
+ * if one exists for the island.
+ */
+const char *nfp6000_meid2altstr(char *s, int meid);
+
+/*
+ * Create string of format "iX".
+ *
+ * @param s         Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                  The resulting string is output here.
+ * @param island_id Island ID.
+ * @return          Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_island2str(char *s, int island_id);
+
+/*
+ * Create string of format "name", an island alias.
+ *
+ * @param s         Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                  The resulting string is output here.
+ * @param island_id Island ID.
+ * @return          Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_island2altstr(char *s, int island_id);
+
+/*
+ * Create string of format "meY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param menum  Microengine number within island.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_menum2str(char *s, int menum);
+
+/*
+ * Create string of format "ctxY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param ctxnum Context number within microengine.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_ctxnum2str(char *s, int ctxnum);
+
+/*
+ * Create string of format "tgY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param megrp  Microengine group number within cluster.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_megrp2str(char *s, int megrp);
+
+/*
+ * Convert a string to an ME ID.
+ *
+ * @param chip_family Chip family ID
+ * @param s           A string of format iX.meY (or clX.meY)
+ * @param endptr      If non-NULL, *endptr will point to the trailing
+ *                    string after the ME ID part of the string, which
+ *                    is either an empty string or the first character
+ *                    after the separating period.
+ * @return            ME ID on success, -1 on error.
+ */
+int nfp_idstr2meid(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract island ID from string.
+ *
+ * Example:
+ * char *c;
+ * int val = nfp_idstr2island(chip, "i32.me5", &c);
+ * // val == 32, c == "me5"
+ * val = nfp_idstr2island(chip, "i32", &c);
+ * // val == 32, c == ""
+ *
+ * @param chip_family Chip family ID
+ * @param s           A string of format "iX.anything" or "iX"
+ * @param endptr      If non-NULL, *endptr will point to the trailing
+ *                    striong after the ME ID part of the string, which
+ *                    is either an empty string or the first character
+ *                    after the separating period.
+ * @return            The island ID on succes, -1 on error.
+ */
+int nfp_idstr2island(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract microengine number from string.
+ *
+ * Example:
+ * char *c;
+ * int menum = nfp_idstr2menum("me5.anything", &c);
+ * // menum == 5, c == "anything"
+ * menum = nfp_idstr2menum("me5", &c);
+ * // menum == 5, c == ""
+ *
+ * @param chip_family Chip family ID
+ * @param s           A string of format "meX.anything" or "meX"
+ * @param endptr      If non-NULL, *endptr will point to the trailing
+ *                    striong after the ME ID part of the string, which
+ *                    is either an empty string or the first character
+ *                    after the separating period.
+ * @return            The ME number on succes, -1 on error.
+ */
+int nfp_idstr2menum(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract context number from string.
+ *
+ * Example:
+ * char *c;
+ * int val = nfp_idstr2ctxnum("ctx5.anything", &c);
+ * // val == 5, c == "anything"
+ * val = nfp_idstr2ctxnum("ctx5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "ctxN.anything" or "ctxN"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the context number part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the context number, -1 on error.
+ */
+int nfp_idstr2ctxnum(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract microengine group number from string.
+ *
+ * Example:
+ * char *c;
+ * int val = nfp_idstr2megrp("tg2.anything", &c);
+ * // val == 2, c == "anything"
+ * val = nfp_idstr2megrp("tg5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "tgX.anything" or "tgX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME group part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the ME group number, -1 on error.
+ */
+int nfp_idstr2megrp(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Create ME ID string of format "iX[.meY]".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param meid        Microengine ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_meid2str(int chip_family, char *s, int meid);
+
+/*
+ * Create ME ID string of format "name[.meY]" or "iX[.meY]".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param meid        Microengine ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ *
+ * Similar to nfp_meid2str() except use an alias instead of "iX"
+ * if one exists for the island.
+ */
+const char *nfp_meid2altstr(int chip_family, char *s, int meid);
+
+/*
+ * Create string of format "iX".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param island_id   Island ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_island2str(int chip_family, char *s, int island_id);
+
+/*
+ * Create string of format "name", an island alias.
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param island_id   Island ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_island2altstr(int chip_family, char *s, int island_id);
+
+/*
+ * Create string of format "meY".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param menum       Microengine number within island.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_menum2str(int chip_family, char *s, int menum);
+
+/*
+ * Create string of format "ctxY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param ctxnum Context number within microengine.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_ctxnum2str(int chip_family, char *s, int ctxnum);
+
+/*
+ * Create string of format "tgY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param megrp  Microengine group number within cluster.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_megrp2str(int chip_family, char *s, int megrp);
+
+/*
+ * Convert a two character string to revision number.
+ *
+ * Revision integer is 0x00 for A0, 0x11 for B1 etc.
+ *
+ * @param s     Two character string.
+ * @return      Revision number, -1 on error
+ */
+int nfp_idstr2rev(const char *s);
+
+/*
+ * Create string from revision number.
+ *
+ * String will be upper case.
+ *
+ * @param s     Pointer to char buffer with size of at least 3
+ *              for 2 characters and string terminator.
+ * @param rev   Revision number.
+ * @return      Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_rev2str(char *s, int rev);
+
+/*
+ * Get the NFP CPP address from a string
+ *
+ * String is in the format [island@]target[:[action:[token:]]address]
+ *
+ * @param chip_family Chip family ID
+ * @param tid           Pointer to string to parse
+ * @param cpp_idp       Pointer to CPP ID
+ * @param cpp_addrp     Pointer to CPP address
+ * @return              0 on success, or -1 and errno
+ */
+int nfp_str2cpp(int chip_family,
+		const char *tid,
+		uint32_t *cpp_idp,
+		uint64_t *cpp_addrp);
+
+
+#endif /* _NFP_RESID_NO_C_FUNC */
+
+#endif /* __NFP_RESID_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp6000/nfp6000.h b/drivers/net/nfp/nfpcore/nfp6000/nfp6000.h
new file mode 100644
index 0000000..dc0a359
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp6000/nfp6000.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_NFP6000_H__
+#define __NFP_NFP6000_H__
+
+/* CPP Target IDs */
+#define NFP_CPP_TARGET_INVALID          0
+#define NFP_CPP_TARGET_NBI              1
+#define NFP_CPP_TARGET_QDR              2
+#define NFP_CPP_TARGET_ILA              6
+#define NFP_CPP_TARGET_MU               7
+#define NFP_CPP_TARGET_PCIE             9
+#define NFP_CPP_TARGET_ARM              10
+#define NFP_CPP_TARGET_CRYPTO           12
+#define NFP_CPP_TARGET_ISLAND_XPB       14	/* Shared with CAP */
+#define NFP_CPP_TARGET_ISLAND_CAP       14	/* Shared with XPB */
+#define NFP_CPP_TARGET_CT_XPB           14
+#define NFP_CPP_TARGET_LOCAL_SCRATCH    15
+#define NFP_CPP_TARGET_CLS              NFP_CPP_TARGET_LOCAL_SCRATCH
+
+#define NFP_ISL_EMEM0                   24
+
+#define NFP_MU_ADDR_ACCESS_TYPE_MASK    3ULL
+#define NFP_MU_ADDR_ACCESS_TYPE_DIRECT  2ULL
+
+static inline int
+nfp_cppat_mu_locality_lsb(int mode, int addr40)
+{
+	switch (mode) {
+	case 0 ... 3:
+		return addr40 ? 38 : 30;
+	default:
+		return -EINVAL;
+	}
+}
+
+#endif /* NFP_NFP6000_H */
diff --git a/drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h b/drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h
new file mode 100644
index 0000000..2a89519
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_XPB_H__
+#define __NFP_XPB_H__
+
+/*
+ * For use with NFP6000 Databook "XPB Addressing" section
+ */
+#define NFP_XPB_OVERLAY(island)  (((island) & 0x3f) << 24)
+
+#define NFP_XPB_ISLAND(island)   (NFP_XPB_OVERLAY(island) + 0x60000)
+
+#define NFP_XPB_ISLAND_of(offset) (((offset) >> 24) & 0x3F)
+
+/*
+ * For use with NFP6000 Databook "XPB Island and Device IDs" chapter
+ */
+#define NFP_XPB_DEVICE(island, slave, device) \
+				(NFP_XPB_OVERLAY(island) | \
+				 (((slave) & 3) << 22) | \
+				 (((device) & 0x3f) << 16))
+
+#endif /* NFP_XPB_H */
diff --git a/drivers/net/nfp/nfpcore/nfp_cpp.h b/drivers/net/nfp/nfpcore/nfp_cpp.h
new file mode 100644
index 0000000..e1d8fda
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_cpp.h
@@ -0,0 +1,803 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#ifndef __NFP_CPP_H__
+#define __NFP_CPP_H__
+
+#include "nfp-common/nfp_platform.h"
+#include "nfp-common/nfp_resid.h"
+
+struct nfp_cpp_mutex;
+
+/*
+ * NFP CPP handle
+ */
+struct nfp_cpp {
+	uint32_t model;
+	uint32_t interface;
+	uint8_t *serial;
+	int serial_len;
+	void *priv;
+
+	/* Mutex cache */
+	struct nfp_cpp_mutex *mutex_cache;
+	const struct nfp_cpp_operations *op;
+
+	/*
+	 * NFP-6xxx originating island IMB CPP Address Translation. CPP Target
+	 * ID is index into array. Values are obtained at runtime from local
+	 * island XPB CSRs.
+	 */
+	uint32_t imb_cat_table[16];
+};
+
+/*
+ * NFP CPP device area handle
+ */
+struct nfp_cpp_area {
+	struct nfp_cpp *cpp;
+	char *name;
+	unsigned long long offset;
+	unsigned long size;
+	/* Here follows the 'priv' part of nfp_cpp_area. */
+};
+
+/*
+ * NFP CPP operations structure
+ */
+struct nfp_cpp_operations {
+	/* Size of priv area in struct nfp_cpp_area */
+	size_t area_priv_size;
+
+	/* Instance an NFP CPP */
+	int (*init)(struct nfp_cpp *cpp, const char *devname);
+
+	/*
+	 * Free the bus.
+	 * Called only once, during nfp_cpp_unregister()
+	 */
+	void (*free)(struct nfp_cpp *cpp);
+
+	/*
+	 * Initialize a new NFP CPP area
+	 * NOTE: This is _not_ serialized
+	 */
+	int (*area_init)(struct nfp_cpp_area *area,
+			 uint32_t dest,
+			 unsigned long long address,
+			 unsigned long size);
+	/*
+	 * Clean up a NFP CPP area before it is freed
+	 * NOTE: This is _not_ serialized
+	 */
+	void (*area_cleanup)(struct nfp_cpp_area *area);
+
+	/*
+	 * Acquire resources for a NFP CPP area
+	 * Serialized
+	 */
+	int (*area_acquire)(struct nfp_cpp_area *area);
+	/*
+	 * Release resources for a NFP CPP area
+	 * Serialized
+	 */
+	void (*area_release)(struct nfp_cpp_area *area);
+	/*
+	 * Return a void IO pointer to a NFP CPP area
+	 * NOTE: This is _not_ serialized
+	 */
+
+	void *(*area_iomem)(struct nfp_cpp_area *area);
+
+	void *(*area_mapped)(struct nfp_cpp_area *area);
+	/*
+	 * Perform a read from a NFP CPP area
+	 * Serialized
+	 */
+	int (*area_read)(struct nfp_cpp_area *area,
+			 void *kernel_vaddr,
+			 unsigned long offset,
+			 unsigned int length);
+	/*
+	 * Perform a write to a NFP CPP area
+	 * Serialized
+	 */
+	int (*area_write)(struct nfp_cpp_area *area,
+			  const void *kernel_vaddr,
+			  unsigned long offset,
+			  unsigned int length);
+};
+
+/*
+ * This should be the only external function the transport
+ * module supplies
+ */
+const struct nfp_cpp_operations *nfp_cpp_transport_operations(void);
+
+/*
+ * Set the model id
+ *
+ * @param   cpp     NFP CPP operations structure
+ * @param   model   Model ID
+ */
+void nfp_cpp_model_set(struct nfp_cpp *cpp, uint32_t model);
+
+/*
+ * Set the private instance owned data of a nfp_cpp struct
+ *
+ * @param   cpp     NFP CPP operations structure
+ * @param   interface Interface ID
+ */
+void nfp_cpp_interface_set(struct nfp_cpp *cpp, uint32_t interface);
+
+/*
+ * Set the private instance owned data of a nfp_cpp struct
+ *
+ * @param   cpp     NFP CPP operations structure
+ * @param   serial  NFP serial byte array
+ * @param   len     Length of the serial byte array
+ */
+int nfp_cpp_serial_set(struct nfp_cpp *cpp, const uint8_t *serial,
+		       size_t serial_len);
+
+/*
+ * Set the private data of the nfp_cpp instance
+ *
+ * @param   cpp NFP CPP operations structure
+ * @return      Opaque device pointer
+ */
+void nfp_cpp_priv_set(struct nfp_cpp *cpp, void *priv);
+
+/*
+ * Return the private data of the nfp_cpp instance
+ *
+ * @param   cpp NFP CPP operations structure
+ * @return      Opaque device pointer
+ */
+void *nfp_cpp_priv(struct nfp_cpp *cpp);
+
+/*
+ * Get the privately allocated portion of a NFP CPP area handle
+ *
+ * @param   cpp_area    NFP CPP area handle
+ * @return          Pointer to the private area, or NULL on failure
+ */
+void *nfp_cpp_area_priv(struct nfp_cpp_area *cpp_area);
+
+uint32_t __nfp_cpp_model_autodetect(struct nfp_cpp *cpp);
+
+/*
+ * NFP CPP core interface for CPP clients.
+ */
+
+/*
+ * Open a NFP CPP handle to a CPP device
+ *
+ * @param[in]	id	0-based ID for the CPP interface to use
+ *
+ * @return NFP CPP handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp *nfp_cpp_from_device_name(const char *devname);
+
+/*
+ * Free a NFP CPP handle
+ *
+ * @param[in]	cpp	NFP CPP handle
+ */
+void nfp_cpp_free(struct nfp_cpp *cpp);
+
+#define NFP_CPP_MODEL_INVALID   0xffffffff
+
+/*
+ * NFP_CPP_MODEL_CHIP_of - retrieve the chip ID from the model ID
+ *
+ * The chip ID is a 16-bit BCD+A-F encoding for the chip type.
+ *
+ * @param[in]   model   NFP CPP model id
+ * @return      NFP CPP chip id
+ */
+#define NFP_CPP_MODEL_CHIP_of(model)        (((model) >> 16) & 0xffff)
+
+/*
+ * NFP_CPP_MODEL_IS_6000 - Check for the NFP6000 family of devices
+ *
+ * NOTE: The NFP4000 series is considered as a NFP6000 series variant.
+ *
+ * @param[in]	model	NFP CPP model id
+ * @return		true if model is in the NFP6000 family, false otherwise.
+ */
+#define NFP_CPP_MODEL_IS_6000(model)		     \
+		((NFP_CPP_MODEL_CHIP_of(model) >= 0x4000) && \
+		(NFP_CPP_MODEL_CHIP_of(model) < 0x7000))
+
+/*
+ * nfp_cpp_model - Retrieve the Model ID of the NFP
+ *
+ * @param[in]	cpp	NFP CPP handle
+ * @return		NFP CPP Model ID
+ */
+uint32_t nfp_cpp_model(struct nfp_cpp *cpp);
+
+/*
+ * NFP Interface types - logical interface for this CPP connection 4 bits are
+ * reserved for interface type.
+ */
+#define NFP_CPP_INTERFACE_TYPE_INVALID		0x0
+#define NFP_CPP_INTERFACE_TYPE_PCI		0x1
+#define NFP_CPP_INTERFACE_TYPE_ARM		0x2
+#define NFP_CPP_INTERFACE_TYPE_RPC		0x3
+#define NFP_CPP_INTERFACE_TYPE_ILA		0x4
+
+/*
+ * Construct a 16-bit NFP Interface ID
+ *
+ * Interface IDs consists of 4 bits of interface type, 4 bits of unit
+ * identifier, and 8 bits of channel identifier.
+ *
+ * The NFP Interface ID is used in the implementation of NFP CPP API mutexes,
+ * which use the MU Atomic CompareAndWrite operation - hence the limit to 16
+ * bits to be able to use the NFP Interface ID as a lock owner.
+ *
+ * @param[in]	type	NFP Interface Type
+ * @param[in]	unit	Unit identifier for the interface type
+ * @param[in]	channel	Channel identifier for the interface unit
+ * @return		Interface ID
+ */
+#define NFP_CPP_INTERFACE(type, unit, channel)	\
+	((((type) & 0xf) << 12) | \
+	 (((unit) & 0xf) <<  8) | \
+	 (((channel) & 0xff) << 0))
+
+/*
+ * Get the interface type of a NFP Interface ID
+ * @param[in]	interface	NFP Interface ID
+ * @return			NFP Interface ID's type
+ */
+#define NFP_CPP_INTERFACE_TYPE_of(interface)	(((interface) >> 12) & 0xf)
+
+/*
+ * Get the interface unit of a NFP Interface ID
+ * @param[in]	interface	NFP Interface ID
+ * @return			NFP Interface ID's unit
+ */
+#define NFP_CPP_INTERFACE_UNIT_of(interface)	(((interface) >>  8) & 0xf)
+
+/*
+ * Get the interface channel of a NFP Interface ID
+ * @param[in]	interface	NFP Interface ID
+ * @return			NFP Interface ID's channel
+ */
+#define NFP_CPP_INTERFACE_CHANNEL_of(interface)	(((interface) >>  0) & 0xff)
+
+/*
+ * Retrieve the Interface ID of the NFP
+ * @param[in]	cpp	NFP CPP handle
+ * @return		NFP CPP Interface ID
+ */
+uint16_t nfp_cpp_interface(struct nfp_cpp *cpp);
+
+/*
+ * Retrieve the NFP Serial Number (unique per NFP)
+ * @param[in]	cpp	NFP CPP handle
+ * @param[out]	serial	Pointer to reference the serial number array
+ *
+ * @return	size of the NFP6000 serial number, in bytes
+ */
+int nfp_cpp_serial(struct nfp_cpp *cpp, const uint8_t **serial);
+
+/*
+ * Allocate a NFP CPP area handle, as an offset into a CPP ID
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	size	Size of the area to reserve
+ *
+ * @return NFP CPP handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp_area *nfp_cpp_area_alloc(struct nfp_cpp *cpp, uint32_t cpp_id,
+					unsigned long long address,
+					unsigned long size);
+
+/*
+ * Allocate a NFP CPP area handle, as an offset into a CPP ID, by a named owner
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	name	Name of owner of the area
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	size	Size of the area to reserve
+ *
+ * @return NFP CPP handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp_area *nfp_cpp_area_alloc_with_name(struct nfp_cpp *cpp,
+						  uint32_t cpp_id,
+						  const char *name,
+						  unsigned long long address,
+						  unsigned long size);
+
+/*
+ * Free an allocated NFP CPP area handle
+ * @param[in]	area	NFP CPP area handle
+ */
+void nfp_cpp_area_free(struct nfp_cpp_area *area);
+
+/*
+ * Acquire the resources needed to access the NFP CPP area handle
+ *
+ * @param[in]	area	NFP CPP area handle
+ *
+ * @return 0 on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_acquire(struct nfp_cpp_area *area);
+
+/*
+ * Release the resources needed to access the NFP CPP area handle
+ *
+ * @param[in]	area	NFP CPP area handle
+ */
+void nfp_cpp_area_release(struct nfp_cpp_area *area);
+
+/*
+ * Allocate, then acquire the resources needed to access the NFP CPP area handle
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	size	Size of the area to reserve
+ *
+ * @return NFP CPP handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp_area *nfp_cpp_area_alloc_acquire(struct nfp_cpp *cpp,
+						uint32_t cpp_id,
+						unsigned long long address,
+						unsigned long size);
+
+/*
+ * Release the resources, then free the NFP CPP area handle
+ * @param[in]	area	NFP CPP area handle
+ */
+void nfp_cpp_area_release_free(struct nfp_cpp_area *area);
+
+uint8_t *nfp_cpp_map_area(struct nfp_cpp *cpp, int domain, int target,
+			   uint64_t addr, unsigned long size,
+			   struct nfp_cpp_area **area);
+/*
+ * Return an IO pointer to the beginning of the NFP CPP area handle. The area
+ * must be acquired with 'nfp_cpp_area_acquire()' before calling this operation.
+ *
+ * @param[in]	area	NFP CPP area handle
+ *
+ * @return Pointer to IO memory, or NULL on failure (and set errno accordingly).
+ */
+void *nfp_cpp_area_mapped(struct nfp_cpp_area *area);
+
+/*
+ * Read from a NFP CPP area handle into a buffer. The area must be acquired with
+ * 'nfp_cpp_area_acquire()' before calling this operation.
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the area
+ * @param[in]	buffer	Location of buffer to receive the data
+ * @param[in]	length	Length of the data to read
+ *
+ * @return bytes read on success, -1 on failure (and set errno accordingly).
+ *
+ */
+int nfp_cpp_area_read(struct nfp_cpp_area *area, unsigned long offset,
+		      void *buffer, size_t length);
+
+/*
+ * Write to a NFP CPP area handle from a buffer. The area must be acquired with
+ * 'nfp_cpp_area_acquire()' before calling this operation.
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the area
+ * @param[in]	buffer	Location of buffer that holds the data
+ * @param[in]	length	Length of the data to read
+ *
+ * @return bytes written on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_write(struct nfp_cpp_area *area, unsigned long offset,
+		       const void *buffer, size_t length);
+
+/*
+ * nfp_cpp_area_iomem() - get IOMEM region for CPP area
+ * @area:       CPP area handle
+ *
+ * Returns an iomem pointer for use with readl()/writel() style operations.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ *
+ * Return: pointer to the area, or NULL
+ */
+void *nfp_cpp_area_iomem(struct nfp_cpp_area *area);
+
+/*
+ * Verify that IO can be performed on an offset in an area
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the area
+ * @param[in]	size	Size of region to validate
+ *
+ * @return 0 on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_check_range(struct nfp_cpp_area *area,
+			     unsigned long long offset, unsigned long size);
+
+/*
+ * Get the NFP CPP handle that is the parent of a NFP CPP area handle
+ *
+ * @param	cpp_area	NFP CPP area handle
+ * @return			NFP CPP handle
+ */
+struct nfp_cpp *nfp_cpp_area_cpp(struct nfp_cpp_area *cpp_area);
+
+/*
+ * Get the name passed during allocation of the NFP CPP area handle
+ *
+ * @param	cpp_area	NFP CPP area handle
+ * @return			Pointer to the area's name
+ */
+const char *nfp_cpp_area_name(struct nfp_cpp_area *cpp_area);
+
+/*
+ * Read a block of data from a NFP CPP ID
+ *
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	kernel_vaddr	Buffer to copy read data to
+ * @param[in]	length	Size of the area to reserve
+ *
+ * @return bytes read on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_read(struct nfp_cpp *cpp, uint32_t cpp_id,
+		 unsigned long long address, void *kernel_vaddr, size_t length);
+
+/*
+ * Write a block of data to a NFP CPP ID
+ *
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	kernel_vaddr	Buffer to copy write data from
+ * @param[in]	length	Size of the area to reserve
+ *
+ * @return bytes written on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_write(struct nfp_cpp *cpp, uint32_t cpp_id,
+		  unsigned long long address, const void *kernel_vaddr,
+		  size_t length);
+
+
+
+/*
+ * Fill a NFP CPP area handle and offset with a value
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the NFP CPP ID address space
+ * @param[in]	value	32-bit value to fill area with
+ * @param[in]	length	Size of the area to reserve
+ *
+ * @return bytes written on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_fill(struct nfp_cpp_area *area, unsigned long offset,
+		      uint32_t value, size_t length);
+
+/*
+ * Read a single 32-bit value from a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		output value
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 32-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_readl(struct nfp_cpp_area *area, unsigned long offset,
+		       uint32_t *value);
+
+/*
+ * Write a single 32-bit value to a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		value to write
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 32-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_writel(struct nfp_cpp_area *area, unsigned long offset,
+			uint32_t value);
+
+/*
+ * Read a single 64-bit value from a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		output value
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 64-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_readq(struct nfp_cpp_area *area, unsigned long offset,
+		       uint64_t *value);
+
+/*
+ * Write a single 64-bit value to a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		value to write
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 64-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_writeq(struct nfp_cpp_area *area, unsigned long offset,
+			uint64_t value);
+
+/*
+ * Write a single 32-bit value on the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt	XPB target and address
+ * @param value         value to write
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_writel(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t value);
+
+/*
+ * Read a single 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt	XPB target and address
+ * @param value         output value
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_readl(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t *value);
+
+/*
+ * Modify bits of a 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to alter
+ * @param value         value to modify
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_writelm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+		    uint32_t value);
+
+/*
+ * Modify bits of a 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to alter
+ * @param value         value to monitor for
+ * @param timeout_us    maximum number of us to wait (-1 for forever)
+ *
+ * @return >= 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_waitlm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+		   uint32_t value, int timeout_us);
+
+/*
+ * Read a 32-bit word from a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         output value
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_readl(struct nfp_cpp *cpp, uint32_t cpp_id,
+		  unsigned long long address, uint32_t *value);
+
+/*
+ * Write a 32-bit value to a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         value to write
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ *
+ */
+int nfp_cpp_writel(struct nfp_cpp *cpp, uint32_t cpp_id,
+		   unsigned long long address, uint32_t value);
+
+/*
+ * Read a 64-bit work from a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         output value
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_readq(struct nfp_cpp *cpp, uint32_t cpp_id,
+		  unsigned long long address, uint64_t *value);
+
+/*
+ * Write a 64-bit value to a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         value to write
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_writeq(struct nfp_cpp *cpp, uint32_t cpp_id,
+		   unsigned long long address, uint64_t value);
+
+/*
+ * Initialize a mutex location
+
+ * The CPP target:address must point to a 64-bit aligned location, and will
+ * initialize 64 bits of data at the location.
+ *
+ * This creates the initial mutex state, as locked by this nfp_cpp_interface().
+ *
+ * This function should only be called when setting up the initial lock state
+ * upon boot-up of the system.
+ *
+ * @param cpp		NFP CPP handle
+ * @param target	NFP CPP target ID
+ * @param address	Offset into the address space of the NFP CPP target ID
+ * @param key_id	Unique 32-bit value for this mutex
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_mutex_init(struct nfp_cpp *cpp, int target,
+		       unsigned long long address, uint32_t key_id);
+
+/*
+ * Create a mutex handle from an address controlled by a MU Atomic engine
+ *
+ * The CPP target:address must point to a 64-bit aligned location, and reserve
+ * 64 bits of data at the location for use by the handle.
+ *
+ * Only target/address pairs that point to entities that support the MU Atomic
+ * Engine's CmpAndSwap32 command are supported.
+ *
+ * @param cpp		NFP CPP handle
+ * @param target	NFP CPP target ID
+ * @param address	Offset into the address space of the NFP CPP target ID
+ * @param key_id	32-bit unique key (must match the key at this location)
+ *
+ * @return		A non-NULL struct nfp_cpp_mutex * on success, NULL on
+ *                      failure.
+ */
+struct nfp_cpp_mutex *nfp_cpp_mutex_alloc(struct nfp_cpp *cpp, int target,
+					  unsigned long long address,
+					  uint32_t key_id);
+
+/*
+ * Get the NFP CPP handle the mutex was created with
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          NFP CPP handle
+ */
+struct nfp_cpp *nfp_cpp_mutex_cpp(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex key
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Mutex key
+ */
+uint32_t nfp_cpp_mutex_key(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex owner
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Interface ID of the mutex owner
+ *
+ * NOTE: This is for debug purposes ONLY - the owner may change at any time,
+ * unless it has been locked by this NFP CPP handle.
+ */
+uint16_t nfp_cpp_mutex_owner(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex target
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Mutex CPP target (ie NFP_CPP_TARGET_MU)
+ */
+int nfp_cpp_mutex_target(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex address
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Mutex CPP address
+ */
+uint64_t nfp_cpp_mutex_address(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Free a mutex handle - does not alter the lock state
+ *
+ * @param mutex		NFP CPP Mutex handle
+ */
+void nfp_cpp_mutex_free(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex		NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Unlock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex		NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_mutex_unlock(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Attempt to lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex		NFP CPP Mutex handle
+ * @return		0 if the lock succeeded, -1 on failure (and errno set
+ *			appropriately).
+ */
+int nfp_cpp_mutex_trylock(struct nfp_cpp_mutex *mutex);
+
+#endif /* !__NFP_CPP_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
new file mode 100644
index 0000000..817e70f
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -0,0 +1,962 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * nfp_cpp_pcie_ops.c
+ * Authors: Vinayak Tammineedi <vinayak.tammineedi@netronome.com>
+ *
+ * Multiplexes the NFP BARs between NFP internal resources and
+ * implements the PCIe specific interface for generic CPP bus access.
+ *
+ * The BARs are managed and allocated if they are available.
+ * The generic CPP bus abstraction builds upon this BAR interface.
+ */
+
+#include <assert.h>
+#include <stdio.h>
+#include <execinfo.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <fcntl.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <dirent.h>
+#include <libgen.h>
+
+#include <sys/mman.h>
+#include <sys/file.h>
+#include <sys/stat.h>
+
+#include "nfp_cpp.h"
+#include "nfp_target.h"
+#include "nfp6000/nfp6000.h"
+
+#define NFP_PCIE_BAR(_pf)	(0x30000 + ((_pf) & 7) * 0xc0)
+
+#define NFP_PCIE_BAR_PCIE2CPP_ACTION_BASEADDRESS(_x)  (((_x) & 0x1f) << 16)
+#define NFP_PCIE_BAR_PCIE2CPP_BASEADDRESS(_x)         (((_x) & 0xffff) << 0)
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT(_x)        (((_x) & 0x3) << 27)
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_32BIT    0
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_64BIT    1
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_0BYTE    3
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE(_x)             (((_x) & 0x7) << 29)
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_OF(_x)          (((_x) >> 29) & 0x7)
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_FIXED         0
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_BULK          1
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_TARGET        2
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_GENERAL       3
+#define NFP_PCIE_BAR_PCIE2CPP_TARGET_BASEADDRESS(_x)  (((_x) & 0xf) << 23)
+#define NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(_x)   (((_x) & 0x3) << 21)
+
+/*
+ * Minimal size of the PCIe cfg memory we depend on being mapped,
+ * queue controller and DMA controller don't have to be covered.
+ */
+#define NFP_PCI_MIN_MAP_SIZE				0x080000
+
+#define NFP_PCIE_P2C_FIXED_SIZE(bar)               (1 << (bar)->bitsize)
+#define NFP_PCIE_P2C_BULK_SIZE(bar)                (1 << (bar)->bitsize)
+#define NFP_PCIE_P2C_GENERAL_TARGET_OFFSET(bar, x) ((x) << ((bar)->bitsize - 2))
+#define NFP_PCIE_P2C_GENERAL_TOKEN_OFFSET(bar, x) ((x) << ((bar)->bitsize - 4))
+#define NFP_PCIE_P2C_GENERAL_SIZE(bar)             (1 << ((bar)->bitsize - 4))
+
+#define NFP_PCIE_CFG_BAR_PCIETOCPPEXPBAR(bar, slot) \
+	(NFP_PCIE_BAR(0) + ((bar) * 8 + (slot)) * 4)
+
+#define NFP_PCIE_CPP_BAR_PCIETOCPPEXPBAR(bar, slot) \
+	(((bar) * 8 + (slot)) * 4)
+
+/*
+ * Define to enable a bit more verbose debug output.
+ * Set to 1 to enable a bit more verbose debug output.
+ */
+struct nfp_pcie_user;
+struct nfp6000_area_priv;
+
+/*
+ * struct nfp_bar - describes BAR configuration and usage
+ * @nfp:	backlink to owner
+ * @barcfg:	cached contents of BAR config CSR
+ * @base:	the BAR's base CPP offset
+ * @mask:       mask for the BAR aperture (read only)
+ * @bitsize:	bitsize of BAR aperture (read only)
+ * @index:	index of the BAR
+ * @lock:	lock to specify if bar is in use
+ * @refcnt:	number of current users
+ * @iomem:	mapped IO memory
+ */
+#define NFP_BAR_MAX 7
+struct nfp_bar {
+	struct nfp_pcie_user *nfp;
+	uint32_t barcfg;
+	uint64_t base;		/* CPP address base */
+	uint64_t mask;		/* Bit mask of the bar */
+	uint32_t bitsize;	/* Bit size of the bar */
+	int index;
+	int lock;
+
+	char *csr;
+	char *iomem;
+};
+
+#define BUSDEV_SZ	13
+struct nfp_pcie_user {
+	struct nfp_bar bar[NFP_BAR_MAX];
+
+	int device;
+	int lock;
+	char busdev[BUSDEV_SZ];
+	int barsz;
+	char *cfg;
+};
+
+static uint32_t
+nfp_bar_maptype(struct nfp_bar *bar)
+{
+	return NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_OF(bar->barcfg);
+}
+
+#define TARGET_WIDTH_32    4
+#define TARGET_WIDTH_64    8
+
+static int
+nfp_compute_bar(const struct nfp_bar *bar, uint32_t *bar_config,
+		uint64_t *bar_base, int tgt, int act, int tok,
+		uint64_t offset, size_t size, int width)
+{
+	uint32_t bitsize;
+	uint32_t newcfg;
+	uint64_t mask;
+
+	if (tgt >= 16)
+		return -EINVAL;
+
+	switch (width) {
+	case 8:
+		newcfg =
+		    NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT
+		    (NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_64BIT);
+		break;
+	case 4:
+		newcfg =
+		    NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT
+		    (NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_32BIT);
+		break;
+	case 0:
+		newcfg =
+		    NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT
+		    (NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_0BYTE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (act != NFP_CPP_ACTION_RW && act != 0) {
+		/* Fixed CPP mapping with specific action */
+		mask = ~(NFP_PCIE_P2C_FIXED_SIZE(bar) - 1);
+
+		newcfg |=
+		    NFP_PCIE_BAR_PCIE2CPP_MAPTYPE
+		    (NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_FIXED);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TARGET_BASEADDRESS(tgt);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_ACTION_BASEADDRESS(act);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
+
+		if ((offset & mask) != ((offset + size - 1) & mask)) {
+			printf("BAR%d: Won't use for Fixed mapping\n",
+				bar->index);
+			printf("\t<%#llx,%#llx>, action=%d\n",
+				(unsigned long long)offset,
+				(unsigned long long)(offset + size), act);
+			printf("\tBAR too small (0x%llx).\n",
+				(unsigned long long)mask);
+			return -EINVAL;
+		}
+		offset &= mask;
+
+#ifdef DEBUG
+		printf("BAR%d: Created Fixed mapping\n", bar->index);
+		printf("\t%d:%d:%d:0x%#llx-0x%#llx>\n", tgt, act, tok,
+			(unsigned long long)offset,
+			(unsigned long long)(offset + mask));
+#endif
+
+		bitsize = 40 - 16;
+	} else {
+		mask = ~(NFP_PCIE_P2C_BULK_SIZE(bar) - 1);
+
+		/* Bulk mapping */
+		newcfg |=
+		    NFP_PCIE_BAR_PCIE2CPP_MAPTYPE
+		    (NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_BULK);
+
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TARGET_BASEADDRESS(tgt);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
+
+		if ((offset & mask) != ((offset + size - 1) & mask)) {
+			printf("BAR%d: Won't use for bulk mapping\n",
+				bar->index);
+			printf("\t<%#llx,%#llx>\n", (unsigned long long)offset,
+				(unsigned long long)(offset + size));
+			printf("\ttarget=%d, token=%d\n", tgt, tok);
+			printf("\tBAR too small (%#llx) - (%#llx != %#llx).\n",
+				(unsigned long long)mask,
+				(unsigned long long)(offset & mask),
+				(unsigned long long)(offset + size - 1) & mask);
+
+			return -EINVAL;
+		}
+
+		offset &= mask;
+
+#ifdef DEBUG
+		printf("BAR%d: Created bulk mapping %d:x:%d:%#llx-%#llx\n",
+			bar->index, tgt, tok, (unsigned long long)offset,
+			(unsigned long long)(offset + ~mask));
+#endif
+
+		bitsize = 40 - 21;
+	}
+
+	if (bar->bitsize < bitsize) {
+		printf("BAR%d: Too small for %d:%d:%d\n", bar->index, tgt, tok,
+			act);
+		return -EINVAL;
+	}
+
+	newcfg |= offset >> bitsize;
+
+	if (bar_base)
+		*bar_base = offset;
+
+	if (bar_config)
+		*bar_config = newcfg;
+
+	return 0;
+}
+
+static int
+nfp_bar_write(struct nfp_pcie_user *nfp, struct nfp_bar *bar,
+		  uint32_t newcfg)
+{
+	int base, slot;
+
+	base = bar->index >> 3;
+	slot = bar->index & 7;
+
+	if (!nfp->cfg)
+		return (-ENOMEM);
+
+	bar->csr = nfp->cfg +
+		   NFP_PCIE_CFG_BAR_PCIETOCPPEXPBAR(base, slot);
+
+	*(uint32_t *)(bar->csr) = newcfg;
+
+	bar->barcfg = newcfg;
+#ifdef DEBUG
+	printf("BAR%d: updated to 0x%08x\n", bar->index, newcfg);
+#endif
+
+	return 0;
+}
+
+static int
+nfp_reconfigure_bar(struct nfp_pcie_user *nfp, struct nfp_bar *bar, int tgt,
+		int act, int tok, uint64_t offset, size_t size, int width)
+{
+	uint64_t newbase;
+	uint32_t newcfg;
+	int err;
+
+	err = nfp_compute_bar(bar, &newcfg, &newbase, tgt, act, tok, offset,
+			      size, width);
+	if (err)
+		return err;
+
+	bar->base = newbase;
+
+	return nfp_bar_write(nfp, bar, newcfg);
+}
+
+/*
+ * Map all PCI bars. We assume that the BAR with the PCIe config block is
+ * already mapped.
+ *
+ * BAR0.0: Reserved for General Mapping (for MSI-X access to PCIe SRAM)
+ */
+static int
+nfp_enable_bars(struct nfp_pcie_user *nfp)
+{
+	struct nfp_bar *bar;
+	int x;
+
+	for (x = ARRAY_SIZE(nfp->bar); x > 0; x--) {
+		bar = &nfp->bar[x - 1];
+		bar->barcfg = 0;
+		bar->nfp = nfp;
+		bar->index = x;
+		bar->mask = (1 << (nfp->barsz - 3)) - 1;
+		bar->bitsize = nfp->barsz - 3;
+		bar->base = 0;
+		bar->iomem = NULL;
+		bar->lock = 0;
+		bar->csr = nfp->cfg +
+			   NFP_PCIE_CFG_BAR_PCIETOCPPEXPBAR(bar->index >> 3,
+							   bar->index & 7);
+		bar->iomem =
+		    (char *)mmap(0, 1 << bar->bitsize, PROT_READ | PROT_WRITE,
+				 MAP_SHARED, nfp->device,
+				 bar->index << bar->bitsize);
+
+		if (bar->iomem == MAP_FAILED)
+			return (-ENOMEM);
+	}
+	return 0;
+}
+
+static struct nfp_bar *
+nfp_alloc_bar(struct nfp_pcie_user *nfp)
+{
+	struct nfp_bar *bar;
+	int x;
+
+	for (x = ARRAY_SIZE(nfp->bar); x > 0; x--) {
+		bar = &nfp->bar[x - 1];
+		if (!bar->lock) {
+			bar->lock = 1;
+			return bar;
+		}
+	}
+	return NULL;
+}
+
+static void
+nfp_disable_bars(struct nfp_pcie_user *nfp)
+{
+	struct nfp_bar *bar;
+	int x;
+
+	for (x = ARRAY_SIZE(nfp->bar); x > 0; x--) {
+		bar = &nfp->bar[x - 1];
+		if (bar->iomem) {
+			munmap(bar->iomem, 1 << (nfp->barsz - 3));
+			bar->iomem = NULL;
+			bar->lock = 0;
+		}
+	}
+}
+
+/*
+ * Generic CPP bus access interface.
+ */
+
+struct nfp6000_area_priv {
+	struct nfp_bar *bar;
+	uint32_t bar_offset;
+
+	uint32_t target;
+	uint32_t action;
+	uint32_t token;
+	uint64_t offset;
+	struct {
+		int read;
+		int write;
+		int bar;
+	} width;
+	size_t size;
+	char *iomem;
+};
+
+static int
+nfp6000_area_init(struct nfp_cpp_area *area, uint32_t dest,
+		  unsigned long long address, unsigned long size)
+{
+	struct nfp_pcie_user *nfp = nfp_cpp_priv(nfp_cpp_area_cpp(area));
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+	uint32_t target = NFP_CPP_ID_TARGET_of(dest);
+	uint32_t action = NFP_CPP_ID_ACTION_of(dest);
+	uint32_t token = NFP_CPP_ID_TOKEN_of(dest);
+	int pp, ret = 0;
+
+	pp = nfp6000_target_pushpull(NFP_CPP_ID(target, action, token),
+				     address);
+	if (pp < 0)
+		return pp;
+
+	priv->width.read = PUSH_WIDTH(pp);
+	priv->width.write = PULL_WIDTH(pp);
+
+	if (priv->width.read > 0 &&
+	    priv->width.write > 0 && priv->width.read != priv->width.write)
+		return -EINVAL;
+
+	if (priv->width.read > 0)
+		priv->width.bar = priv->width.read;
+	else
+		priv->width.bar = priv->width.write;
+
+	priv->bar = nfp_alloc_bar(nfp);
+	if (priv->bar == NULL)
+		return -ENOMEM;
+
+	priv->target = target;
+	priv->action = action;
+	priv->token = token;
+	priv->offset = address;
+	priv->size = size;
+
+	ret = nfp_reconfigure_bar(nfp, priv->bar, priv->target, priv->action,
+				  priv->token, priv->offset, priv->size,
+				  priv->width.bar);
+
+	return ret;
+}
+
+static int
+nfp6000_area_acquire(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+
+	/* Calculate offset into BAR. */
+	if (nfp_bar_maptype(priv->bar) ==
+	    NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_GENERAL) {
+		priv->bar_offset = priv->offset &
+			(NFP_PCIE_P2C_GENERAL_SIZE(priv->bar) - 1);
+		priv->bar_offset +=
+			NFP_PCIE_P2C_GENERAL_TARGET_OFFSET(priv->bar,
+							   priv->target);
+		priv->bar_offset +=
+		    NFP_PCIE_P2C_GENERAL_TOKEN_OFFSET(priv->bar, priv->token);
+	} else {
+		priv->bar_offset = priv->offset & priv->bar->mask;
+	}
+
+	/* Must have been too big. Sub-allocate. */
+	if (!priv->bar->iomem)
+		return (-ENOMEM);
+
+	priv->iomem = priv->bar->iomem + priv->bar_offset;
+
+	return 0;
+}
+
+static void *
+nfp6000_area_mapped(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *area_priv = nfp_cpp_area_priv(area);
+
+	if (!area_priv->iomem)
+		return NULL;
+
+	return area_priv->iomem;
+}
+
+static void
+nfp6000_area_release(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+	priv->bar->lock = 0;
+	priv->bar = NULL;
+	priv->iomem = NULL;
+}
+
+static void *
+nfp6000_area_iomem(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+	return priv->iomem;
+}
+
+static int
+nfp6000_area_read(struct nfp_cpp_area *area, void *kernel_vaddr,
+		  unsigned long offset, unsigned int length)
+{
+	uint64_t *wrptr64 = kernel_vaddr;
+	const volatile uint64_t *rdptr64;
+	struct nfp6000_area_priv *priv;
+	uint32_t *wrptr32 = kernel_vaddr;
+	const volatile uint32_t *rdptr32;
+	int width;
+	unsigned int n;
+	bool is_64;
+
+	priv = nfp_cpp_area_priv(area);
+	rdptr64 = (uint64_t *)(priv->iomem + offset);
+	rdptr32 = (uint32_t *)(priv->iomem + offset);
+
+	if (offset + length > priv->size)
+		return -EFAULT;
+
+	width = priv->width.read;
+
+	if (width <= 0)
+		return -EINVAL;
+
+	/* Unaligned? Translate to an explicit access */
+	if ((priv->offset + offset) & (width - 1)) {
+		printf("aread_read unaligned!!!\n");
+		return -EINVAL;
+	}
+
+	is_64 = width == TARGET_WIDTH_64;
+
+	/* MU reads via a PCIe2CPP BAR supports 32bit (and other) lengths */
+	if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
+	    priv->action == NFP_CPP_ACTION_RW) {
+		is_64 = false;
+	}
+
+	if (is_64) {
+		if (offset % sizeof(uint64_t) != 0 ||
+		    length % sizeof(uint64_t) != 0)
+			return -EINVAL;
+	} else {
+		if (offset % sizeof(uint32_t) != 0 ||
+		    length % sizeof(uint32_t) != 0)
+			return -EINVAL;
+	}
+
+	if (!priv->bar)
+		return -EFAULT;
+
+	if (is_64)
+		for (n = 0; n < length; n += sizeof(uint64_t)) {
+			*wrptr64 = *rdptr64;
+			wrptr64++;
+			rdptr64++;
+		}
+	else
+		for (n = 0; n < length; n += sizeof(uint32_t)) {
+			*wrptr32 = *rdptr32;
+			wrptr32++;
+			rdptr32++;
+		}
+
+	return n;
+}
+
+static int
+nfp6000_area_write(struct nfp_cpp_area *area, const void *kernel_vaddr,
+		   unsigned long offset, unsigned int length)
+{
+	const uint64_t *rdptr64 = kernel_vaddr;
+	uint64_t *wrptr64;
+	const uint32_t *rdptr32 = kernel_vaddr;
+	struct nfp6000_area_priv *priv;
+	uint32_t *wrptr32;
+	int width;
+	unsigned int n;
+	bool is_64;
+
+	priv = nfp_cpp_area_priv(area);
+	wrptr64 = (uint64_t *)(priv->iomem + offset);
+	wrptr32 = (uint32_t *)(priv->iomem + offset);
+
+	if (offset + length > priv->size)
+		return -EFAULT;
+
+	width = priv->width.write;
+
+	if (width <= 0)
+		return -EINVAL;
+
+	/* Unaligned? Translate to an explicit access */
+	if ((priv->offset + offset) & (width - 1))
+		return -EINVAL;
+
+	is_64 = width == TARGET_WIDTH_64;
+
+	/* MU writes via a PCIe2CPP BAR supports 32bit (and other) lengths */
+	if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
+	    priv->action == NFP_CPP_ACTION_RW)
+		is_64 = false;
+
+	if (is_64) {
+		if (offset % sizeof(uint64_t) != 0 ||
+		    length % sizeof(uint64_t) != 0)
+			return -EINVAL;
+	} else {
+		if (offset % sizeof(uint32_t) != 0 ||
+		    length % sizeof(uint32_t) != 0)
+			return -EINVAL;
+	}
+
+	if (!priv->bar)
+		return -EFAULT;
+
+	if (is_64)
+		for (n = 0; n < length; n += sizeof(uint64_t)) {
+			*wrptr64 = *rdptr64;
+			wrptr64++;
+			rdptr64++;
+		}
+	else
+		for (n = 0; n < length; n += sizeof(uint32_t)) {
+			*wrptr32 = *rdptr32;
+			wrptr32++;
+			rdptr32++;
+		}
+
+	return n;
+}
+
+#define PCI_DEVICES "/sys/bus/pci/devices"
+
+static int
+nfp_acquire_process_lock(struct nfp_pcie_user *desc)
+{
+	int rc;
+	struct flock lock;
+	char lockname[30];
+
+	memset(&lock, 0, sizeof(lock));
+
+	snprintf(lockname, sizeof(lockname), "/var/lock/nfp_%s", desc->busdev);
+	desc->lock = open(lockname, O_RDWR | O_CREAT, 0666);
+	if (desc->lock < 0)
+		return desc->lock;
+
+	lock.l_type = F_WRLCK;
+	lock.l_whence = SEEK_SET;
+	rc = -1;
+	while (rc != 0) {
+		rc = fcntl(desc->lock, F_SETLKW, &lock);
+		if (rc < 0) {
+			if (errno != EAGAIN && errno != EACCES) {
+				close(desc->lock);
+				return rc;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int
+nfp6000_set_model(struct nfp_pcie_user *desc, struct nfp_cpp *cpp)
+{
+	char tmp_str[80];
+	uint32_t tmp;
+	int fp;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = open(tmp_str, O_RDONLY);
+	if (!fp)
+		return -1;
+
+	lseek(fp, 0x2e, SEEK_SET);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("Error reading config file for model\n");
+		return -1;
+	}
+
+	tmp = tmp << 16;
+
+	if (close(fp) == -1)
+		return -1;
+
+	nfp_cpp_model_set(cpp, tmp);
+
+	return 0;
+}
+
+static int
+nfp6000_set_interface(struct nfp_pcie_user *desc, struct nfp_cpp *cpp)
+{
+	char tmp_str[80];
+	uint16_t tmp;
+	int fp;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = open(tmp_str, O_RDONLY);
+	if (!fp)
+		return -1;
+
+	lseek(fp, 0x154, SEEK_SET);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for interface\n");
+		return -1;
+	}
+
+	if (close(fp) == -1)
+		return -1;
+
+	nfp_cpp_interface_set(cpp, tmp);
+
+	return 0;
+}
+
+#define PCI_CFG_SPACE_SIZE	256
+#define PCI_CFG_SPACE_EXP_SIZE	4096
+#define PCI_EXT_CAP_ID(header)		(int)(header & 0x0000ffff)
+#define PCI_EXT_CAP_NEXT(header)	((header >> 20) & 0xffc)
+#define PCI_EXT_CAP_ID_DSN	0x03
+static int
+nfp_pci_find_next_ext_capability(int fp, int cap)
+{
+	uint32_t header;
+	int ttl;
+	int pos = PCI_CFG_SPACE_SIZE;
+
+	/* minimum 8 bytes per capability */
+	ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;
+
+	lseek(fp, pos, SEEK_SET);
+	if (read(fp, &header, sizeof(header)) != sizeof(header)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	/*
+	 * If we have no capabilities, this is indicated by cap ID,
+	 * cap version and next pointer all being 0.
+	 */
+	if (header == 0)
+		return 0;
+
+	while (ttl-- > 0) {
+		if (PCI_EXT_CAP_ID(header) == cap)
+			return pos;
+
+		pos = PCI_EXT_CAP_NEXT(header);
+		if (pos < PCI_CFG_SPACE_SIZE)
+			break;
+
+		lseek(fp, pos, SEEK_SET);
+		if (read(fp, &header, sizeof(header)) != sizeof(header)) {
+			printf("error reading config file for serial\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int
+nfp6000_set_serial(struct nfp_pcie_user *desc, struct nfp_cpp *cpp)
+{
+	char tmp_str[80];
+	uint16_t tmp;
+	uint8_t serial[6];
+	int serial_len = 6;
+	int fp, pos;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = open(tmp_str, O_RDONLY);
+	if (!fp)
+		return -1;
+
+	pos = nfp_pci_find_next_ext_capability(fp, PCI_EXT_CAP_ID_DSN);
+	if (pos <= 0) {
+		printf("PCI_EXT_CAP_ID_DSN not found. Using default offset\n");
+		lseek(fp, 0x156, SEEK_SET);
+	} else {
+		lseek(fp, pos + 6, SEEK_SET);
+	}
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	serial[4] = (uint8_t)((tmp >> 8) & 0xff);
+	serial[5] = (uint8_t)(tmp & 0xff);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	serial[2] = (uint8_t)((tmp >> 8) & 0xff);
+	serial[3] = (uint8_t)(tmp & 0xff);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	serial[0] = (uint8_t)((tmp >> 8) & 0xff);
+	serial[1] = (uint8_t)(tmp & 0xff);
+
+	if (close(fp) == -1)
+		return -1;
+
+	nfp_cpp_serial_set(cpp, serial, serial_len);
+
+	return 0;
+}
+
+static int
+nfp6000_set_barsz(struct nfp_pcie_user *desc)
+{
+	char tmp_str[80];
+	unsigned long start, end, flags, tmp;
+	int i;
+	FILE *fp;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/resource", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = fopen(tmp_str, "r");
+	if (!fp)
+		return -1;
+
+	if (fscanf(fp, "0x%lx 0x%lx 0x%lx", &start, &end, &flags) == 0) {
+		printf("error reading resource file for bar size\n");
+		return -1;
+	}
+
+	if (fclose(fp) == -1)
+		return -1;
+
+	tmp = (end - start) + 1;
+	i = 0;
+	while (tmp >>= 1)
+		i++;
+	desc->barsz = i;
+	return 0;
+}
+
+static int
+nfp6000_init(struct nfp_cpp *cpp, const char *devname)
+{
+	char link[120];
+	char tmp_str[80];
+	ssize_t size;
+	int ret = 0;
+	uint32_t model;
+	struct nfp_pcie_user *desc;
+
+	desc = malloc(sizeof(*desc));
+	if (!desc)
+		return -1;
+
+
+	memset(desc->busdev, 0, BUSDEV_SZ);
+	strncpy(desc->busdev, devname, strlen(devname));
+
+	ret = nfp_acquire_process_lock(desc);
+	if (ret)
+		return -1;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/driver", PCI_DEVICES,
+		 desc->busdev);
+
+	size = readlink(tmp_str, link, sizeof(link));
+
+	if (size == -1)
+		tmp_str[0] = '\0';
+
+	if (size == sizeof(link))
+		tmp_str[0] = '\0';
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/resource0", PCI_DEVICES,
+		 desc->busdev);
+
+	desc->device = open(tmp_str, O_RDWR);
+	if (desc->device == -1)
+		return -1;
+
+	if (nfp6000_set_model(desc, cpp) < 0)
+		return -1;
+	if (nfp6000_set_interface(desc, cpp) < 0)
+		return -1;
+	if (nfp6000_set_serial(desc, cpp) < 0)
+		return -1;
+	if (nfp6000_set_barsz(desc) < 0)
+		return -1;
+
+	desc->cfg = (char *)mmap(0, 1 << (desc->barsz - 3),
+				 PROT_READ | PROT_WRITE,
+				 MAP_SHARED, desc->device, 0);
+
+	if (desc->cfg == MAP_FAILED)
+		return -1;
+
+	nfp_enable_bars(desc);
+
+	nfp_cpp_priv_set(cpp, desc);
+
+	model = __nfp_cpp_model_autodetect(cpp);
+	nfp_cpp_model_set(cpp, model);
+
+	return ret;
+}
+
+static void
+nfp6000_free(struct nfp_cpp *cpp)
+{
+	struct nfp_pcie_user *desc = nfp_cpp_priv(cpp);
+	int x;
+
+	/* Unmap may cause if there are any pending transaxctions */
+	nfp_disable_bars(desc);
+	munmap(desc->cfg, 1 << (desc->barsz - 3));
+
+	for (x = ARRAY_SIZE(desc->bar); x > 0; x--) {
+		if (desc->bar[x - 1].iomem)
+			munmap(desc->bar[x - 1].iomem, 1 << (desc->barsz - 3));
+	}
+	close(desc->lock);
+	close(desc->device);
+	free(desc);
+}
+
+static const struct nfp_cpp_operations nfp6000_pcie_ops = {
+	.init = nfp6000_init,
+	.free = nfp6000_free,
+
+	.area_priv_size = sizeof(struct nfp6000_area_priv),
+	.area_init = nfp6000_area_init,
+	.area_acquire = nfp6000_area_acquire,
+	.area_release = nfp6000_area_release,
+	.area_mapped = nfp6000_area_mapped,
+	.area_read = nfp6000_area_read,
+	.area_write = nfp6000_area_write,
+	.area_iomem = nfp6000_area_iomem,
+};
+
+const struct
+nfp_cpp_operations *nfp_cpp_transport_operations(void)
+{
+	return &nfp6000_pcie_ops;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_cppcore.c b/drivers/net/nfp/nfpcore/nfp_cppcore.c
new file mode 100644
index 0000000..f110b8f
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_cppcore.c
@@ -0,0 +1,901 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sys/types.h>
+
+#include <rte_byteorder.h>
+
+#include "nfp_cpp.h"
+#include "nfp_target.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp6000/nfp_xpb.h"
+#include "nfp_nffw.h"
+
+#define NFP_PL_DEVICE_ID                        0x00000004
+#define NFP_PL_DEVICE_ID_MASK                   0xff
+
+#define NFP6000_ARM_GCSR_SOFTMODEL0             0x00400144
+
+void
+nfp_cpp_priv_set(struct nfp_cpp *cpp, void *priv)
+{
+	cpp->priv = priv;
+}
+
+void *
+nfp_cpp_priv(struct nfp_cpp *cpp)
+{
+	return cpp->priv;
+}
+
+void
+nfp_cpp_model_set(struct nfp_cpp *cpp, uint32_t model)
+{
+	cpp->model = model;
+}
+
+uint32_t
+nfp_cpp_model(struct nfp_cpp *cpp)
+{
+	if (!cpp)
+		return NFP_CPP_MODEL_INVALID;
+
+	if (cpp->model == 0)
+		cpp->model = __nfp_cpp_model_autodetect(cpp);
+
+	return cpp->model;
+}
+
+void
+nfp_cpp_interface_set(struct nfp_cpp *cpp, uint32_t interface)
+{
+	cpp->interface = interface;
+}
+
+int
+nfp_cpp_serial(struct nfp_cpp *cpp, const uint8_t **serial)
+{
+	*serial = cpp->serial;
+	return cpp->serial_len;
+}
+
+int
+nfp_cpp_serial_set(struct nfp_cpp *cpp, const uint8_t *serial,
+		   size_t serial_len)
+{
+	if (cpp->serial_len)
+		free(cpp->serial);
+
+	cpp->serial = malloc(serial_len);
+	if (!cpp->serial)
+		return -1;
+
+	memcpy(cpp->serial, serial, serial_len);
+	cpp->serial_len = serial_len;
+
+	return 0;
+}
+
+uint16_t
+nfp_cpp_interface(struct nfp_cpp *cpp)
+{
+	if (!cpp)
+		return NFP_CPP_INTERFACE(NFP_CPP_INTERFACE_TYPE_INVALID, 0, 0);
+
+	return cpp->interface;
+}
+
+void *
+nfp_cpp_area_priv(struct nfp_cpp_area *cpp_area)
+{
+	return &cpp_area[1];
+}
+
+struct nfp_cpp *
+nfp_cpp_area_cpp(struct nfp_cpp_area *cpp_area)
+{
+	return cpp_area->cpp;
+}
+
+const char *
+nfp_cpp_area_name(struct nfp_cpp_area *cpp_area)
+{
+	return cpp_area->name;
+}
+
+/*
+ * nfp_cpp_area_alloc - allocate a new CPP area
+ * @cpp:    CPP handle
+ * @dest:   CPP id
+ * @address:    start address on CPP target
+ * @size:   size of area in bytes
+ *
+ * Allocate and initialize a CPP area structure.  The area must later
+ * be locked down with an 'acquire' before it can be safely accessed.
+ *
+ * NOTE: @address and @size must be 32-bit aligned values.
+ */
+struct nfp_cpp_area *
+nfp_cpp_area_alloc_with_name(struct nfp_cpp *cpp, uint32_t dest,
+			      const char *name, unsigned long long address,
+			      unsigned long size)
+{
+	struct nfp_cpp_area *area;
+	uint64_t tmp64 = (uint64_t)address;
+	int tmp, err;
+
+	if (!cpp)
+		return NULL;
+
+	/* CPP bus uses only a 40-bit address */
+	if ((address + size) > (1ULL << 40))
+		return NFP_ERRPTR(EFAULT);
+
+	/* Remap from cpp_island to cpp_target */
+	err = nfp_target_cpp(dest, tmp64, &dest, &tmp64, cpp->imb_cat_table);
+	if (err < 0)
+		return NULL;
+
+	address = (unsigned long long)tmp64;
+
+	if (!name)
+		name = "";
+
+	area = calloc(1, sizeof(*area) + cpp->op->area_priv_size +
+		      strlen(name) + 1);
+	if (!area)
+		return NULL;
+
+	area->cpp = cpp;
+	area->name = ((char *)area) + sizeof(*area) + cpp->op->area_priv_size;
+	memcpy(area->name, name, strlen(name) + 1);
+
+	/*
+	 * Preserve errno around the call to area_init, since most
+	 * implementations will blindly call nfp_target_action_width()for both
+	 * read or write modes, and that will set errno to EINVAL.
+	 */
+	tmp = errno;
+
+	err = cpp->op->area_init(area, dest, address, size);
+	if (err < 0) {
+		free(area);
+		return NULL;
+	}
+
+	/* Restore errno */
+	errno = tmp;
+
+	area->offset = address;
+	area->size = size;
+
+	return area;
+}
+
+struct nfp_cpp_area *
+nfp_cpp_area_alloc(struct nfp_cpp *cpp, uint32_t dest,
+		    unsigned long long address, unsigned long size)
+{
+	return nfp_cpp_area_alloc_with_name(cpp, dest, NULL, address, size);
+}
+
+/*
+ * nfp_cpp_area_alloc_acquire - allocate a new CPP area and lock it down
+ *
+ * @cpp:    CPP handle
+ * @dest:   CPP id
+ * @address:    start address on CPP target
+ * @size:   size of area
+ *
+ * Allocate and initilizae a CPP area structure, and lock it down so
+ * that it can be accessed directly.
+ *
+ * NOTE: @address and @size must be 32-bit aligned values.
+ *
+ * NOTE: The area must also be 'released' when the structure is freed.
+ */
+struct nfp_cpp_area *
+nfp_cpp_area_alloc_acquire(struct nfp_cpp *cpp, uint32_t destination,
+			    unsigned long long address, unsigned long size)
+{
+	struct nfp_cpp_area *area;
+
+	area = nfp_cpp_area_alloc(cpp, destination, address, size);
+	if (!area)
+		return NULL;
+
+	if (nfp_cpp_area_acquire(area)) {
+		nfp_cpp_area_free(area);
+		return NULL;
+	}
+
+	return area;
+}
+
+/*
+ * nfp_cpp_area_free - free up the CPP area
+ * area:    CPP area handle
+ *
+ * Frees up memory resources held by the CPP area.
+ */
+void
+nfp_cpp_area_free(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_cleanup)
+		area->cpp->op->area_cleanup(area);
+	free(area);
+}
+
+/*
+ * nfp_cpp_area_release_free - release CPP area and free it
+ * area:    CPP area handle
+ *
+ * Releases CPP area and frees up memory resources held by the it.
+ */
+void
+nfp_cpp_area_release_free(struct nfp_cpp_area *area)
+{
+	nfp_cpp_area_release(area);
+	nfp_cpp_area_free(area);
+}
+
+/*
+ * nfp_cpp_area_acquire - lock down a CPP area for access
+ * @area:   CPP area handle
+ *
+ * Locks down the CPP area for a potential long term activity.  Area
+ * must always be locked down before being accessed.
+ */
+int
+nfp_cpp_area_acquire(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_acquire) {
+		int err = area->cpp->op->area_acquire(area);
+
+		if (err < 0)
+			return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * nfp_cpp_area_release - release a locked down CPP area
+ * @area:   CPP area handle
+ *
+ * Releases a previously locked down CPP area.
+ */
+void
+nfp_cpp_area_release(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_release)
+		area->cpp->op->area_release(area);
+}
+
+/*
+ * nfp_cpp_area_iomem() - get IOMEM region for CPP area
+ *
+ * @area:       CPP area handle
+ *
+ * Returns an iomem pointer for use with readl()/writel() style operations.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ *
+ * Return: pointer to the area, or NULL
+ */
+void *
+nfp_cpp_area_iomem(struct nfp_cpp_area *area)
+{
+	void *iomem = NULL;
+
+	if (area->cpp->op->area_iomem)
+		iomem = area->cpp->op->area_iomem(area);
+
+	return iomem;
+}
+
+/*
+ * nfp_cpp_area_read - read data from CPP area
+ *
+ * @area:       CPP area handle
+ * @offset:     offset into CPP area
+ * @kernel_vaddr:   kernel address to put data into
+ * @length:     number of bytes to read
+ *
+ * Read data from indicated CPP region.
+ *
+ * NOTE: @offset and @length must be 32-bit aligned values.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ */
+int
+nfp_cpp_area_read(struct nfp_cpp_area *area, unsigned long offset,
+		  void *kernel_vaddr, size_t length)
+{
+	if ((offset + length) > area->size)
+		return NFP_ERRNO(EFAULT);
+
+	return area->cpp->op->area_read(area, kernel_vaddr, offset, length);
+}
+
+/*
+ * nfp_cpp_area_write - write data to CPP area
+ *
+ * @area:       CPP area handle
+ * @offset:     offset into CPP area
+ * @kernel_vaddr:   kernel address to read data from
+ * @length:     number of bytes to write
+ *
+ * Write data to indicated CPP region.
+ *
+ * NOTE: @offset and @length must be 32-bit aligned values.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ */
+int
+nfp_cpp_area_write(struct nfp_cpp_area *area, unsigned long offset,
+		   const void *kernel_vaddr, size_t length)
+{
+	if ((offset + length) > area->size)
+		return NFP_ERRNO(EFAULT);
+
+	return area->cpp->op->area_write(area, kernel_vaddr, offset, length);
+}
+
+void *
+nfp_cpp_area_mapped(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_mapped)
+		return area->cpp->op->area_mapped(area);
+	return NULL;
+}
+
+/*
+ * nfp_cpp_area_check_range - check if address range fits in CPP area
+ *
+ * @area:   CPP area handle
+ * @offset: offset into CPP area
+ * @length: size of address range in bytes
+ *
+ * Check if address range fits within CPP area.  Return 0 if area fits
+ * or -1 on error.
+ */
+int
+nfp_cpp_area_check_range(struct nfp_cpp_area *area, unsigned long long offset,
+			 unsigned long length)
+{
+	if (((offset + length) > area->size))
+		return NFP_ERRNO(EFAULT);
+
+	return 0;
+}
+
+/*
+ * Return the correct CPP address, and fixup xpb_addr as needed,
+ * based upon NFP model.
+ */
+static uint32_t
+nfp_xpb_to_cpp(struct nfp_cpp *cpp, uint32_t *xpb_addr)
+{
+	uint32_t xpb;
+	int island;
+
+	if (!NFP_CPP_MODEL_IS_6000(cpp->model))
+		return 0;
+
+	xpb = NFP_CPP_ID(14, NFP_CPP_ACTION_RW, 0);
+
+	/*
+	 * Ensure that non-local XPB accesses go out through the
+	 * global XPBM bus.
+	 */
+	island = ((*xpb_addr) >> 24) & 0x3f;
+
+	if (!island)
+		return xpb;
+
+	if (island == 1) {
+		/*
+		 * Accesses to the ARM Island overlay uses Island 0
+		 * Global Bit
+		 */
+		(*xpb_addr) &= ~0x7f000000;
+		if (*xpb_addr < 0x60000)
+			*xpb_addr |= (1 << 30);
+		else
+			/* And only non-ARM interfaces use island id = 1 */
+			if (NFP_CPP_INTERFACE_TYPE_of(nfp_cpp_interface(cpp)) !=
+			    NFP_CPP_INTERFACE_TYPE_ARM)
+				*xpb_addr |= (1 << 24);
+	} else {
+		(*xpb_addr) |= (1 << 30);
+	}
+
+	return xpb;
+}
+
+int
+nfp_cpp_area_readl(struct nfp_cpp_area *area, unsigned long offset,
+		   uint32_t *value)
+{
+	int sz;
+	uint32_t tmp;
+
+	sz = nfp_cpp_area_read(area, offset, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_32(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_area_writel(struct nfp_cpp_area *area, unsigned long offset,
+		    uint32_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_32(value);
+	sz = nfp_cpp_area_write(area, offset, &value, sizeof(value));
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_area_readq(struct nfp_cpp_area *area, unsigned long offset,
+		   uint64_t *value)
+{
+	int sz;
+	uint64_t tmp;
+
+	sz = nfp_cpp_area_read(area, offset, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_64(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_area_writeq(struct nfp_cpp_area *area, unsigned long offset,
+		    uint64_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_64(value);
+	sz = nfp_cpp_area_write(area, offset, &value, sizeof(value));
+
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_readl(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	      uint32_t *value)
+{
+	int sz;
+	uint32_t tmp;
+
+	sz = nfp_cpp_read(cpp, cpp_id, address, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_32(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_writel(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	       uint32_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_32(value);
+	sz = nfp_cpp_write(cpp, cpp_id, address, &value, sizeof(value));
+
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_readq(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	      uint64_t *value)
+{
+	int sz;
+	uint64_t tmp;
+
+	sz = nfp_cpp_read(cpp, cpp_id, address, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_64(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_writeq(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	       uint64_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_64(value);
+	sz = nfp_cpp_write(cpp, cpp_id, address, &value, sizeof(value));
+
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_xpb_writel(struct nfp_cpp *cpp, uint32_t xpb_addr, uint32_t value)
+{
+	uint32_t cpp_dest;
+
+	cpp_dest = nfp_xpb_to_cpp(cpp, &xpb_addr);
+
+	return nfp_cpp_writel(cpp, cpp_dest, xpb_addr, value);
+}
+
+int
+nfp_xpb_readl(struct nfp_cpp *cpp, uint32_t xpb_addr, uint32_t *value)
+{
+	uint32_t cpp_dest;
+
+	cpp_dest = nfp_xpb_to_cpp(cpp, &xpb_addr);
+
+	return nfp_cpp_readl(cpp, cpp_dest, xpb_addr, value);
+}
+
+static struct nfp_cpp *
+nfp_cpp_alloc(const char *devname)
+{
+	const struct nfp_cpp_operations *ops;
+	struct nfp_cpp *cpp;
+	int err;
+
+	ops = nfp_cpp_transport_operations();
+
+	if (!ops || !ops->init)
+		return NFP_ERRPTR(EINVAL);
+
+	cpp = calloc(1, sizeof(*cpp));
+	if (!cpp)
+		return NULL;
+
+	cpp->op = ops;
+
+	if (cpp->op->init) {
+		err = cpp->op->init(cpp, devname);
+		if (err < 0) {
+			free(cpp);
+			return NULL;
+		}
+	}
+
+	if (NFP_CPP_MODEL_IS_6000(nfp_cpp_model(cpp))) {
+		uint32_t xpbaddr;
+		size_t tgt;
+
+		for (tgt = 0; tgt < ARRAY_SIZE(cpp->imb_cat_table); tgt++) {
+			/* Hardcoded XPB IMB Base, island 0 */
+			xpbaddr = 0x000a0000 + (tgt * 4);
+			err = nfp_xpb_readl(cpp, xpbaddr,
+				(uint32_t *)&cpp->imb_cat_table[tgt]);
+			if (err < 0) {
+				free(cpp);
+				return NULL;
+			}
+		}
+	}
+
+	return cpp;
+}
+
+/*
+ * nfp_cpp_free - free the CPP handle
+ * @cpp:    CPP handle
+ */
+void
+nfp_cpp_free(struct nfp_cpp *cpp)
+{
+	if (cpp->op && cpp->op->free)
+		cpp->op->free(cpp);
+
+	if (cpp->serial_len)
+		free(cpp->serial);
+
+	free(cpp);
+}
+
+struct nfp_cpp *
+nfp_cpp_from_device_name(const char *devname)
+{
+	return nfp_cpp_alloc(devname);
+}
+
+/*
+ * Modify bits of a 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to alter
+ * @param value         value to modify
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_xpb_writelm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+		uint32_t value)
+{
+	int err;
+	uint32_t tmp;
+
+	err = nfp_xpb_readl(cpp, xpb_tgt, &tmp);
+	if (err < 0)
+		return err;
+
+	tmp &= ~mask;
+	tmp |= (mask & value);
+	return nfp_xpb_writel(cpp, xpb_tgt, tmp);
+}
+
+/*
+ * Modify bits of a 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to alter
+ * @param value         value to monitor for
+ * @param timeout_us    maximum number of us to wait (-1 for forever)
+ *
+ * @return >= 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_xpb_waitlm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+	       uint32_t value, int timeout_us)
+{
+	uint32_t tmp;
+	int err;
+
+	do {
+		err = nfp_xpb_readl(cpp, xpb_tgt, &tmp);
+		if (err < 0)
+			goto exit;
+
+		if ((tmp & mask) == (value & mask)) {
+			if (timeout_us < 0)
+				timeout_us = 0;
+			break;
+		}
+
+		if (timeout_us < 0)
+			continue;
+
+		timeout_us -= 100;
+		usleep(100);
+	} while (timeout_us >= 0);
+
+	if (timeout_us < 0)
+		err = NFP_ERRNO(ETIMEDOUT);
+	else
+		err = timeout_us;
+
+exit:
+	return err;
+}
+
+/*
+ * nfp_cpp_read - read from CPP target
+ * @cpp:        CPP handle
+ * @destination:    CPP id
+ * @address:        offset into CPP target
+ * @kernel_vaddr:   kernel buffer for result
+ * @length:     number of bytes to read
+ */
+int
+nfp_cpp_read(struct nfp_cpp *cpp, uint32_t destination,
+	     unsigned long long address, void *kernel_vaddr, size_t length)
+{
+	struct nfp_cpp_area *area;
+	int err;
+
+	area = nfp_cpp_area_alloc_acquire(cpp, destination, address, length);
+	if (!area) {
+		printf("Area allocation/acquire failed\n");
+		return -1;
+	}
+
+	err = nfp_cpp_area_read(area, 0, kernel_vaddr, length);
+
+	nfp_cpp_area_release_free(area);
+	return err;
+}
+
+/*
+ * nfp_cpp_write - write to CPP target
+ * @cpp:        CPP handle
+ * @destination:    CPP id
+ * @address:        offset into CPP target
+ * @kernel_vaddr:   kernel buffer to read from
+ * @length:     number of bytes to write
+ */
+int
+nfp_cpp_write(struct nfp_cpp *cpp, uint32_t destination,
+	      unsigned long long address, const void *kernel_vaddr,
+	      size_t length)
+{
+	struct nfp_cpp_area *area;
+	int err;
+
+	area = nfp_cpp_area_alloc_acquire(cpp, destination, address, length);
+	if (!area)
+		return -1;
+
+	err = nfp_cpp_area_write(area, 0, kernel_vaddr, length);
+
+	nfp_cpp_area_release_free(area);
+	return err;
+}
+
+/*
+ * nfp_cpp_area_fill - fill a CPP area with a value
+ * @area:       CPP area
+ * @offset:     offset into CPP area
+ * @value:      value to fill with
+ * @length:     length of area to fill
+ */
+int
+nfp_cpp_area_fill(struct nfp_cpp_area *area, unsigned long offset,
+		  uint32_t value, size_t length)
+{
+	int err;
+	size_t i;
+	uint64_t value64;
+
+	value = rte_cpu_to_le_32(value);
+	value64 = ((uint64_t)value << 32) | value;
+
+	if ((offset + length) > area->size)
+		return NFP_ERRNO(EINVAL);
+
+	if ((area->offset + offset) & 3)
+		return NFP_ERRNO(EINVAL);
+
+	if (((area->offset + offset) & 7) == 4 && length >= 4) {
+		err = nfp_cpp_area_write(area, offset, &value, sizeof(value));
+		if (err < 0)
+			return err;
+		if (err != sizeof(value))
+			return NFP_ERRNO(ENOSPC);
+		offset += sizeof(value);
+		length -= sizeof(value);
+	}
+
+	for (i = 0; (i + sizeof(value)) < length; i += sizeof(value64)) {
+		err =
+		    nfp_cpp_area_write(area, offset + i, &value64,
+				       sizeof(value64));
+		if (err < 0)
+			return err;
+		if (err != sizeof(value64))
+			return NFP_ERRNO(ENOSPC);
+	}
+
+	if ((i + sizeof(value)) <= length) {
+		err =
+		    nfp_cpp_area_write(area, offset + i, &value, sizeof(value));
+		if (err < 0)
+			return err;
+		if (err != sizeof(value))
+			return NFP_ERRNO(ENOSPC);
+		i += sizeof(value);
+	}
+
+	return (int)i;
+}
+
+static inline uint8_t
+__nfp_bytemask_of(int width, uint64_t addr)
+{
+	uint8_t byte_mask;
+
+	if (width == 8)
+		byte_mask = 0xff;
+	else if (width == 4)
+		byte_mask = 0x0f << (addr & 4);
+	else if (width == 2)
+		byte_mask = 0x03 << (addr & 6);
+	else if (width == 1)
+		byte_mask = 0x01 << (addr & 7);
+	else
+		byte_mask = 0;
+
+	return byte_mask;
+}
+
+/*
+ * NOTE: This code should not use nfp_xpb_* functions,
+ * as those are model-specific
+ */
+uint32_t
+__nfp_cpp_model_autodetect(struct nfp_cpp *cpp)
+{
+	uint32_t arm_id = NFP_CPP_ID(NFP_CPP_TARGET_ARM, 0, 0);
+	uint32_t model = 0;
+
+	nfp_cpp_readl(cpp, arm_id, NFP6000_ARM_GCSR_SOFTMODEL0, &model);
+
+	if (NFP_CPP_MODEL_IS_6000(model)) {
+		uint32_t tmp;
+
+		nfp_cpp_model_set(cpp, model);
+
+		/* The PL's PluDeviceID revision code is authoratative */
+		model &= ~0xff;
+		nfp_xpb_readl(cpp, NFP_XPB_DEVICE(1, 1, 16) +
+				   NFP_PL_DEVICE_ID, &tmp);
+		model |= (NFP_PL_DEVICE_ID_MASK & tmp) - 0x10;
+	}
+
+	return model;
+}
+
+/*
+ * nfp_cpp_map_area() - Helper function to map an area
+ * @cpp:    NFP CPP handler
+ * @domain: CPP domain
+ * @target: CPP target
+ * @addr:   CPP address
+ * @size:   Size of the area
+ * @area:   Area handle (output)
+ *
+ * Map an area of IOMEM access.  To undo the effect of this function call
+ * @nfp_cpp_area_release_free(*area).
+ *
+ * Return: Pointer to memory mapped area or ERR_PTR
+ */
+uint8_t *
+nfp_cpp_map_area(struct nfp_cpp *cpp, int domain, int target, uint64_t addr,
+		 unsigned long size, struct nfp_cpp_area **area)
+{
+	uint8_t *res;
+	uint32_t dest;
+
+	dest = NFP_CPP_ISLAND_ID(target, NFP_CPP_ACTION_RW, 0, domain);
+
+	*area = nfp_cpp_area_alloc_acquire(cpp, dest, addr, size);
+	if (!*area)
+		goto err_eio;
+
+	res = nfp_cpp_area_iomem(*area);
+	if (!res)
+		goto err_release_free;
+
+	return res;
+
+err_release_free:
+	nfp_cpp_area_release_free(*area);
+err_eio:
+	return NULL;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_crc.c b/drivers/net/nfp/nfpcore/nfp_crc.c
new file mode 100644
index 0000000..a3c0e92
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_crc.c
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "nfp_crc.h"
+
+static inline uint32_t
+nfp_crc32_be_generic(uint32_t crc, unsigned char const *p, size_t len,
+		 uint32_t polynomial)
+{
+	int i;
+	while (len--) {
+		crc ^= *p++ << 24;
+		for (i = 0; i < 8; i++)
+			crc = (crc << 1) ^ ((crc & 0x80000000) ? polynomial :
+					  0);
+	}
+	return crc;
+}
+
+static inline uint32_t
+nfp_crc32_be(uint32_t crc, unsigned char const *p, size_t len)
+{
+	return nfp_crc32_be_generic(crc, p, len, CRCPOLY_BE);
+}
+
+static uint32_t
+nfp_crc32_posix_end(uint32_t crc, size_t total_len)
+{
+	/* Extend with the length of the string. */
+	while (total_len != 0) {
+		uint8_t c = total_len & 0xff;
+
+		crc = nfp_crc32_be(crc, &c, 1);
+		total_len >>= 8;
+	}
+
+	return ~crc;
+}
+
+uint32_t
+nfp_crc32_posix(const void *buff, size_t len)
+{
+	return nfp_crc32_posix_end(nfp_crc32_be(0, buff, len), len);
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_crc.h b/drivers/net/nfp/nfpcore/nfp_crc.h
new file mode 100644
index 0000000..f8a112b
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_crc.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_CRC_H__
+#define __NFP_CRC_H__
+
+/*
+ * There are multiple 16-bit CRC polynomials in common use, but this is
+ * *the* standard CRC-32 polynomial, first popularized by Ethernet.
+ * x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x^1+x^0
+ */
+#define CRCPOLY_LE 0xedb88320
+#define CRCPOLY_BE 0x04c11db7
+
+uint32_t nfp_crc32_posix(const void *buff, size_t len);
+
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_hwinfo.c b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
new file mode 100644
index 0000000..b8d6400
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* Parse the hwinfo table that the ARM firmware builds in the ARM scratch SRAM
+ * after chip reset.
+ *
+ * Examples of the fields:
+ *   me.count = 40
+ *   me.mask = 0x7f_ffff_ffff
+ *
+ *   me.count is the total number of MEs on the system.
+ *   me.mask is the bitmask of MEs that are available for application usage.
+ *
+ *   (ie, in this example, ME 39 has been reserved by boardconfig.)
+ */
+
+#include <stdio.h>
+#include <time.h>
+#include <zlib.h>
+
+#include "nfp_cpp.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp_resource.h"
+#include "nfp_hwinfo.h"
+#include "nfp_crc.h"
+
+static int
+nfp_hwinfo_is_updating(struct nfp_hwinfo *hwinfo)
+{
+	return hwinfo->version & NFP_HWINFO_VERSION_UPDATING;
+}
+
+static int
+nfp_hwinfo_db_walk(struct nfp_hwinfo *hwinfo, uint32_t size)
+{
+	const char *key, *val, *end = hwinfo->data + size;
+
+	for (key = hwinfo->data; *key && key < end;
+	     key = val + strlen(val) + 1) {
+		val = key + strlen(key) + 1;
+		if (val >= end) {
+			printf("Bad HWINFO - overflowing key\n");
+			return -EINVAL;
+		}
+
+		if (val + strlen(val) + 1 > end) {
+			printf("Bad HWINFO - overflowing value\n");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int
+nfp_hwinfo_db_validate(struct nfp_hwinfo *db, uint32_t len)
+{
+	uint32_t size, new_crc, *crc;
+
+	size = db->size;
+	if (size > len) {
+		printf("Unsupported hwinfo size %u > %u\n", size, len);
+		return -EINVAL;
+	}
+
+	size -= sizeof(uint32_t);
+	new_crc = nfp_crc32_posix((char *)db, size);
+	crc = (uint32_t *)(db->start + size);
+	if (new_crc != *crc) {
+		printf("Corrupt hwinfo table (CRC mismatch)\n");
+		printf("\tcalculated 0x%x, expected 0x%x\n", new_crc, *crc);
+		return -EINVAL;
+	}
+
+	return nfp_hwinfo_db_walk(db, size);
+}
+
+static struct nfp_hwinfo *
+nfp_hwinfo_try_fetch(struct nfp_cpp *cpp, size_t *cpp_size)
+{
+	struct nfp_hwinfo *header;
+	void *res;
+	uint64_t cpp_addr;
+	uint32_t cpp_id;
+	int err;
+	uint8_t *db;
+
+	res = nfp_resource_acquire(cpp, NFP_RESOURCE_NFP_HWINFO);
+	if (res) {
+		cpp_id = nfp_resource_cpp_id(res);
+		cpp_addr = nfp_resource_address(res);
+		*cpp_size = nfp_resource_size(res);
+
+		nfp_resource_release(res);
+
+		if (*cpp_size < HWINFO_SIZE_MIN)
+			return NULL;
+	} else {
+		return NULL;
+	}
+
+	db = malloc(*cpp_size + 1);
+	if (!db)
+		return NULL;
+
+	err = nfp_cpp_read(cpp, cpp_id, cpp_addr, db, *cpp_size);
+	if (err != (int)*cpp_size)
+		goto exit_free;
+
+	header = (void *)db;
+	printf("NFP HWINFO header: %08x\n", *(uint32_t *)header);
+	if (nfp_hwinfo_is_updating(header))
+		goto exit_free;
+
+	if (header->version != NFP_HWINFO_VERSION_2) {
+		printf("Unknown HWInfo version: 0x%08x\n",
+			header->version);
+		goto exit_free;
+	}
+
+	/* NULL-terminate for safety */
+	db[*cpp_size] = '\0';
+
+	return (void *)db;
+exit_free:
+	free(db);
+	return NULL;
+}
+
+static struct nfp_hwinfo *
+nfp_hwinfo_fetch(struct nfp_cpp *cpp, size_t *hwdb_size)
+{
+	struct timespec wait;
+	struct nfp_hwinfo *db;
+	int count;
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 10000000;
+	count = 0;
+
+	for (;;) {
+		db = nfp_hwinfo_try_fetch(cpp, hwdb_size);
+		if (db)
+			return db;
+
+		nanosleep(&wait, NULL);
+		if (count++ > 200) {
+			printf("NFP access error\n");
+			return NULL;
+		}
+	}
+}
+
+struct nfp_hwinfo *
+nfp_hwinfo_read(struct nfp_cpp *cpp)
+{
+	struct nfp_hwinfo *db;
+	size_t hwdb_size = 0;
+	int err;
+
+	db = nfp_hwinfo_fetch(cpp, &hwdb_size);
+	if (!db)
+		return NULL;
+
+	err = nfp_hwinfo_db_validate(db, hwdb_size);
+	if (err) {
+		free(db);
+		return NULL;
+	}
+	return db;
+}
+
+/*
+ * nfp_hwinfo_lookup() - Find a value in the HWInfo table by name
+ * @hwinfo:	NFP HWinfo table
+ * @lookup:	HWInfo name to search for
+ *
+ * Return: Value of the HWInfo name, or NULL
+ */
+const char *
+nfp_hwinfo_lookup(struct nfp_hwinfo *hwinfo, const char *lookup)
+{
+	const char *key, *val, *end;
+
+	if (!hwinfo || !lookup)
+		return NULL;
+
+	end = hwinfo->data + hwinfo->size - sizeof(uint32_t);
+
+	for (key = hwinfo->data; *key && key < end;
+	     key = val + strlen(val) + 1) {
+		val = key + strlen(key) + 1;
+
+		if (strcmp(key, lookup) == 0)
+			return val;
+	}
+
+	return NULL;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_hwinfo.h b/drivers/net/nfp/nfpcore/nfp_hwinfo.h
new file mode 100644
index 0000000..e9a9b49
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_hwinfo.h
@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_HWINFO_H__
+#define __NFP_HWINFO_H__
+
+#include <inttypes.h>
+
+#define HWINFO_SIZE_MIN	0x100
+
+/*
+ * The Hardware Info Table defines the properties of the system.
+ *
+ * HWInfo v1 Table (fixed size)
+ *
+ * 0x0000: uint32_t version	        Hardware Info Table version (1.0)
+ * 0x0004: uint32_t size	        Total size of the table, including the
+ *					CRC32 (IEEE 802.3)
+ * 0x0008: uint32_t jumptab	        Offset of key/value table
+ * 0x000c: uint32_t keys	        Total number of keys in the key/value
+ *					table
+ * NNNNNN:				Key/value jump table and string data
+ * (size - 4): uint32_t crc32	CRC32 (same as IEEE 802.3, POSIX csum, etc)
+ *				CRC32("",0) = ~0, CRC32("a",1) = 0x48C279FE
+ *
+ * HWInfo v2 Table (variable size)
+ *
+ * 0x0000: uint32_t version	        Hardware Info Table version (2.0)
+ * 0x0004: uint32_t size	        Current size of the data area, excluding
+ *					CRC32
+ * 0x0008: uint32_t limit	        Maximum size of the table
+ * 0x000c: uint32_t reserved	        Unused, set to zero
+ * NNNNNN:			Key/value data
+ * (size - 4): uint32_t crc32	CRC32 (same as IEEE 802.3, POSIX csum, etc)
+ *				CRC32("",0) = ~0, CRC32("a",1) = 0x48C279FE
+ *
+ * If the HWInfo table is in the process of being updated, the low bit of
+ * version will be set.
+ *
+ * HWInfo v1 Key/Value Table
+ * -------------------------
+ *
+ *  The key/value table is a set of offsets to ASCIIZ strings which have
+ *  been strcmp(3) sorted (yes, please use bsearch(3) on the table).
+ *
+ *  All keys are guaranteed to be unique.
+ *
+ * N+0:	uint32_t key_1		Offset to the first key
+ * N+4:	uint32_t val_1		Offset to the first value
+ * N+8: uint32_t key_2		Offset to the second key
+ * N+c: uint32_t val_2		Offset to the second value
+ * ...
+ *
+ * HWInfo v2 Key/Value Table
+ * -------------------------
+ *
+ * Packed UTF8Z strings, ie 'key1\000value1\000key2\000value2\000'
+ *
+ * Unsorted.
+ */
+
+#define NFP_HWINFO_VERSION_1 ('H' << 24 | 'I' << 16 | 1 << 8 | 0 << 1 | 0)
+#define NFP_HWINFO_VERSION_2 ('H' << 24 | 'I' << 16 | 2 << 8 | 0 << 1 | 0)
+#define NFP_HWINFO_VERSION_UPDATING	BIT(0)
+
+struct nfp_hwinfo {
+	uint8_t start[0];
+
+	uint32_t version;
+	uint32_t size;
+
+	/* v2 specific fields */
+	uint32_t limit;
+	uint32_t resv;
+
+	char data[];
+};
+
+struct nfp_hwinfo *nfp_hwinfo_read(struct nfp_cpp *cpp);
+
+const char *nfp_hwinfo_lookup(struct nfp_hwinfo *hwinfo, const char *lookup);
+
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_mip.c b/drivers/net/nfp/nfpcore/nfp_mip.c
new file mode 100644
index 0000000..6e14d06
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_mip.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <rte_byteorder.h>
+
+#include "nfp_cpp.h"
+#include "nfp_mip.h"
+#include "nfp_nffw.h"
+
+#define NFP_MIP_SIGNATURE	rte_cpu_to_le_32(0x0050494d)  /* "MIP\0" */
+#define NFP_MIP_VERSION		rte_cpu_to_le_32(1)
+#define NFP_MIP_MAX_OFFSET	(256 * 1024)
+
+struct nfp_mip {
+	uint32_t signature;
+	uint32_t mip_version;
+	uint32_t mip_size;
+	uint32_t first_entry;
+
+	uint32_t version;
+	uint32_t buildnum;
+	uint32_t buildtime;
+	uint32_t loadtime;
+
+	uint32_t symtab_addr;
+	uint32_t symtab_size;
+	uint32_t strtab_addr;
+	uint32_t strtab_size;
+
+	char name[16];
+	char toolchain[32];
+};
+
+/* Read memory and check if it could be a valid MIP */
+static int
+nfp_mip_try_read(struct nfp_cpp *cpp, uint32_t cpp_id, uint64_t addr,
+		 struct nfp_mip *mip)
+{
+	int ret;
+
+	ret = nfp_cpp_read(cpp, cpp_id, addr, mip, sizeof(*mip));
+	if (ret != sizeof(*mip)) {
+		printf("Failed to read MIP data (%d, %zu)\n",
+			ret, sizeof(*mip));
+		return -EIO;
+	}
+	if (mip->signature != NFP_MIP_SIGNATURE) {
+		printf("Incorrect MIP signature (0x%08x)\n",
+			 rte_le_to_cpu_32(mip->signature));
+		return -EINVAL;
+	}
+	if (mip->mip_version != NFP_MIP_VERSION) {
+		printf("Unsupported MIP version (%d)\n",
+			 rte_le_to_cpu_32(mip->mip_version));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Try to locate MIP using the resource table */
+static int
+nfp_mip_read_resource(struct nfp_cpp *cpp, struct nfp_mip *mip)
+{
+	struct nfp_nffw_info *nffw_info;
+	uint32_t cpp_id;
+	uint64_t addr;
+	int err;
+
+	nffw_info = nfp_nffw_info_open(cpp);
+	if (!nffw_info)
+		return -ENODEV;
+
+	err = nfp_nffw_info_mip_first(nffw_info, &cpp_id, &addr);
+	if (err)
+		goto exit_close_nffw;
+
+	err = nfp_mip_try_read(cpp, cpp_id, addr, mip);
+exit_close_nffw:
+	nfp_nffw_info_close(nffw_info);
+	return err;
+}
+
+/*
+ * nfp_mip_open() - Get device MIP structure
+ * @cpp:	NFP CPP Handle
+ *
+ * Copy MIP structure from NFP device and return it.  The returned
+ * structure is handled internally by the library and should be
+ * freed by calling nfp_mip_close().
+ *
+ * Return: pointer to mip, NULL on failure.
+ */
+struct nfp_mip *
+nfp_mip_open(struct nfp_cpp *cpp)
+{
+	struct nfp_mip *mip;
+	int err;
+
+	mip = malloc(sizeof(*mip));
+	if (!mip)
+		return NULL;
+
+	err = nfp_mip_read_resource(cpp, mip);
+	if (err) {
+		free(mip);
+		return NULL;
+	}
+
+	mip->name[sizeof(mip->name) - 1] = 0;
+
+	return mip;
+}
+
+void
+nfp_mip_close(struct nfp_mip *mip)
+{
+	free(mip);
+}
+
+const char *
+nfp_mip_name(const struct nfp_mip *mip)
+{
+	return mip->name;
+}
+
+/*
+ * nfp_mip_symtab() - Get the address and size of the MIP symbol table
+ * @mip:	MIP handle
+ * @addr:	Location for NFP DDR address of MIP symbol table
+ * @size:	Location for size of MIP symbol table
+ */
+void
+nfp_mip_symtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size)
+{
+	*addr = rte_le_to_cpu_32(mip->symtab_addr);
+	*size = rte_le_to_cpu_32(mip->symtab_size);
+}
+
+/*
+ * nfp_mip_strtab() - Get the address and size of the MIP symbol name table
+ * @mip:	MIP handle
+ * @addr:	Location for NFP DDR address of MIP symbol name table
+ * @size:	Location for size of MIP symbol name table
+ */
+void
+nfp_mip_strtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size)
+{
+	*addr = rte_le_to_cpu_32(mip->strtab_addr);
+	*size = rte_le_to_cpu_32(mip->strtab_size);
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_mip.h b/drivers/net/nfp/nfpcore/nfp_mip.h
new file mode 100644
index 0000000..48d8ef3
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_mip.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_MIP_H__
+#define __NFP_MIP_H__
+
+#include "nfp_nffw.h"
+
+struct nfp_mip;
+
+struct nfp_mip *nfp_mip_open(struct nfp_cpp *cpp);
+void nfp_mip_close(struct nfp_mip *mip);
+
+const char *nfp_mip_name(const struct nfp_mip *mip);
+void nfp_mip_symtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size);
+void nfp_mip_strtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size);
+int nfp_nffw_info_mip_first(struct nfp_nffw_info *state, uint32_t *cpp_id,
+			    uint64_t *off);
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_mutex.c b/drivers/net/nfp/nfpcore/nfp_mutex.c
new file mode 100644
index 0000000..76dc143
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_mutex.c
@@ -0,0 +1,450 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <errno.h>
+
+#include <malloc.h>
+#include <time.h>
+#include <sched.h>
+
+#include "nfp_cpp.h"
+#include "nfp6000/nfp6000.h"
+
+#define MUTEX_LOCKED(interface)  ((((uint32_t)(interface)) << 16) | 0x000f)
+#define MUTEX_UNLOCK(interface)  (0                               | 0x0000)
+
+#define MUTEX_IS_LOCKED(value)   (((value) & 0xffff) == 0x000f)
+#define MUTEX_IS_UNLOCKED(value) (((value) & 0xffff) == 0x0000)
+#define MUTEX_INTERFACE(value)   (((value) >> 16) & 0xffff)
+
+/*
+ * If you need more than 65536 recursive locks, please
+ * rethink your code.
+ */
+#define MUTEX_DEPTH_MAX         0xffff
+
+struct nfp_cpp_mutex {
+	struct nfp_cpp *cpp;
+	uint8_t target;
+	uint16_t depth;
+	unsigned long long address;
+	uint32_t key;
+	unsigned int usage;
+	struct nfp_cpp_mutex *prev, *next;
+};
+
+static int
+_nfp_cpp_mutex_validate(uint32_t model, int *target, unsigned long long address)
+{
+	/* Address must be 64-bit aligned */
+	if (address & 7)
+		return NFP_ERRNO(EINVAL);
+
+	if (NFP_CPP_MODEL_IS_6000(model)) {
+		if (*target != NFP_CPP_TARGET_MU)
+			return NFP_ERRNO(EINVAL);
+	} else {
+		return NFP_ERRNO(EINVAL);
+	}
+
+	return 0;
+}
+
+/*
+ * Initialize a mutex location
+ *
+ * The CPP target:address must point to a 64-bit aligned location, and
+ * will initialize 64 bits of data at the location.
+ *
+ * This creates the initial mutex state, as locked by this
+ * nfp_cpp_interface().
+ *
+ * This function should only be called when setting up
+ * the initial lock state upon boot-up of the system.
+ *
+ * @param mutex     NFP CPP Mutex handle
+ * @param target    NFP CPP target ID (ie NFP_CPP_TARGET_CLS or
+ *		    NFP_CPP_TARGET_MU)
+ * @param address   Offset into the address space of the NFP CPP target ID
+ * @param key       Unique 32-bit value for this mutex
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_cpp_mutex_init(struct nfp_cpp *cpp, int target, unsigned long long address,
+		   uint32_t key)
+{
+	uint32_t model = nfp_cpp_model(cpp);
+	uint32_t muw = NFP_CPP_ID(target, 4, 0);	/* atomic_write */
+	int err;
+
+	err = _nfp_cpp_mutex_validate(model, &target, address);
+	if (err < 0)
+		return err;
+
+	err = nfp_cpp_writel(cpp, muw, address + 4, key);
+	if (err < 0)
+		return err;
+
+	err =
+	    nfp_cpp_writel(cpp, muw, address + 0,
+			   MUTEX_LOCKED(nfp_cpp_interface(cpp)));
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+/*
+ * Create a mutex handle from an address controlled by a MU Atomic engine
+ *
+ * The CPP target:address must point to a 64-bit aligned location, and
+ * reserve 64 bits of data at the location for use by the handle.
+ *
+ * Only target/address pairs that point to entities that support the
+ * MU Atomic Engine are supported.
+ *
+ * @param cpp       NFP CPP handle
+ * @param target    NFP CPP target ID (ie NFP_CPP_TARGET_CLS or
+ *		    NFP_CPP_TARGET_MU)
+ * @param address   Offset into the address space of the NFP CPP target ID
+ * @param key       32-bit unique key (must match the key at this location)
+ *
+ * @return      A non-NULL struct nfp_cpp_mutex * on success, NULL on failure.
+ */
+struct nfp_cpp_mutex *
+nfp_cpp_mutex_alloc(struct nfp_cpp *cpp, int target,
+		     unsigned long long address, uint32_t key)
+{
+	uint32_t model = nfp_cpp_model(cpp);
+	struct nfp_cpp_mutex *mutex;
+	uint32_t mur = NFP_CPP_ID(target, 3, 0);	/* atomic_read */
+	int err;
+	uint32_t tmp;
+
+	/* Look for cached mutex */
+	for (mutex = cpp->mutex_cache; mutex; mutex = mutex->next) {
+		if (mutex->target == target && mutex->address == address)
+			break;
+	}
+
+	if (mutex) {
+		if (mutex->key == key) {
+			mutex->usage++;
+			return mutex;
+		}
+
+		/* If the key doesn't match... */
+		return NFP_ERRPTR(EEXIST);
+	}
+
+	err = _nfp_cpp_mutex_validate(model, &target, address);
+	if (err < 0)
+		return NULL;
+
+	err = nfp_cpp_readl(cpp, mur, address + 4, &tmp);
+	if (err < 0)
+		return NULL;
+
+	if (tmp != key)
+		return NFP_ERRPTR(EEXIST);
+
+	mutex = calloc(sizeof(*mutex), 1);
+	if (!mutex)
+		return NFP_ERRPTR(ENOMEM);
+
+	mutex->cpp = cpp;
+	mutex->target = target;
+	mutex->address = address;
+	mutex->key = key;
+	mutex->depth = 0;
+	mutex->usage = 1;
+
+	/* Add mutex to the cache */
+	if (cpp->mutex_cache) {
+		cpp->mutex_cache->prev = mutex;
+		mutex->next = cpp->mutex_cache;
+		cpp->mutex_cache = mutex;
+	} else {
+		cpp->mutex_cache = mutex;
+	}
+
+	return mutex;
+}
+
+struct nfp_cpp *
+nfp_cpp_mutex_cpp(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->cpp;
+}
+
+uint32_t
+nfp_cpp_mutex_key(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->key;
+}
+
+uint16_t
+nfp_cpp_mutex_owner(struct nfp_cpp_mutex *mutex)
+{
+	uint32_t mur = NFP_CPP_ID(mutex->target, 3, 0);	/* atomic_read */
+	uint32_t value, key;
+	int err;
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address, &value);
+	if (err < 0)
+		return err;
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address + 4, &key);
+	if (err < 0)
+		return err;
+
+	if (key != mutex->key)
+		return NFP_ERRNO(EPERM);
+
+	if (!MUTEX_IS_LOCKED(value))
+		return 0;
+
+	return MUTEX_INTERFACE(value);
+}
+
+int
+nfp_cpp_mutex_target(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->target;
+}
+
+uint64_t
+nfp_cpp_mutex_address(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->address;
+}
+
+/*
+ * Free a mutex handle - does not alter the lock state
+ *
+ * @param mutex     NFP CPP Mutex handle
+ */
+void
+nfp_cpp_mutex_free(struct nfp_cpp_mutex *mutex)
+{
+	mutex->usage--;
+	if (mutex->usage > 0)
+		return;
+
+	/* Remove mutex from the cache */
+	if (mutex->next)
+		mutex->next->prev = mutex->prev;
+	if (mutex->prev)
+		mutex->prev->next = mutex->next;
+
+	/* If mutex->cpp == NULL, something broke */
+	if (mutex->cpp && mutex == mutex->cpp->mutex_cache)
+		mutex->cpp->mutex_cache = mutex->next;
+
+	free(mutex);
+}
+
+/*
+ * Lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex     NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
+{
+	int err;
+	time_t warn_at = time(NULL) + 15;
+
+	while ((err = nfp_cpp_mutex_trylock(mutex)) != 0) {
+		/* If errno != EBUSY, then the lock was damaged */
+		if (err < 0 && errno != EBUSY)
+			return err;
+		if (time(NULL) >= warn_at) {
+			printf("Warning: waiting for NFP mutex\n");
+			printf("\tusage:%hd\n", mutex->usage);
+			printf("\tdepth:%hd]\n", mutex->depth);
+			printf("\ttarget:%d\n", mutex->target);
+			printf("\taddr:%llx\n", mutex->address);
+			printf("\tkey:%08x]\n", mutex->key);
+			warn_at = time(NULL) + 60;
+		}
+		sched_yield();
+	}
+	return 0;
+}
+
+/*
+ * Unlock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex     NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_cpp_mutex_unlock(struct nfp_cpp_mutex *mutex)
+{
+	uint32_t muw = NFP_CPP_ID(mutex->target, 4, 0);	/* atomic_write */
+	uint32_t mur = NFP_CPP_ID(mutex->target, 3, 0);	/* atomic_read */
+	struct nfp_cpp *cpp = mutex->cpp;
+	uint32_t key, value;
+	uint16_t interface = nfp_cpp_interface(cpp);
+	int err;
+
+	if (mutex->depth > 1) {
+		mutex->depth--;
+		return 0;
+	}
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address, &value);
+	if (err < 0)
+		goto exit;
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address + 4, &key);
+	if (err < 0)
+		goto exit;
+
+	if (key != mutex->key) {
+		err = NFP_ERRNO(EPERM);
+		goto exit;
+	}
+
+	if (value != MUTEX_LOCKED(interface)) {
+		err = NFP_ERRNO(EACCES);
+		goto exit;
+	}
+
+	err = nfp_cpp_writel(cpp, muw, mutex->address, MUTEX_UNLOCK(interface));
+	if (err < 0)
+		goto exit;
+
+	mutex->depth = 0;
+
+exit:
+	return err;
+}
+
+/*
+ * Attempt to lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * Valid lock states:
+ *
+ *      0x....0000      - Unlocked
+ *      0x....000f      - Locked
+ *
+ * @param mutex     NFP CPP Mutex handle
+ * @return      0 if the lock succeeded, -1 on failure (and errno set
+ *		appropriately).
+ */
+int
+nfp_cpp_mutex_trylock(struct nfp_cpp_mutex *mutex)
+{
+	uint32_t mur = NFP_CPP_ID(mutex->target, 3, 0);	/* atomic_read */
+	uint32_t muw = NFP_CPP_ID(mutex->target, 4, 0);	/* atomic_write */
+	uint32_t mus = NFP_CPP_ID(mutex->target, 5, 3);	/* test_set_imm */
+	uint32_t key, value, tmp;
+	struct nfp_cpp *cpp = mutex->cpp;
+	int err;
+
+	if (mutex->depth > 0) {
+		if (mutex->depth == MUTEX_DEPTH_MAX)
+			return NFP_ERRNO(E2BIG);
+
+		mutex->depth++;
+		return 0;
+	}
+
+	/* Verify that the lock marker is not damaged */
+	err = nfp_cpp_readl(cpp, mur, mutex->address + 4, &key);
+	if (err < 0)
+		goto exit;
+
+	if (key != mutex->key) {
+		err = NFP_ERRNO(EPERM);
+		goto exit;
+	}
+
+	/*
+	 * Compare against the unlocked state, and if true,
+	 * write the interface id into the top 16 bits, and
+	 * mark as locked.
+	 */
+	value = MUTEX_LOCKED(nfp_cpp_interface(cpp));
+
+	/*
+	 * We use test_set_imm here, as it implies a read
+	 * of the current state, and sets the bits in the
+	 * bytemask of the command to 1s. Since the mutex
+	 * is guaranteed to be 64-bit aligned, the bytemask
+	 * of this 32-bit command is ensured to be 8'b00001111,
+	 * which implies that the lower 4 bits will be set to
+	 * ones regardless of the initial state.
+	 *
+	 * Since this is a 'Readback' operation, with no Pull
+	 * data, we can treat this as a normal Push (read)
+	 * atomic, which returns the original value.
+	 */
+	err = nfp_cpp_readl(cpp, mus, mutex->address, &tmp);
+	if (err < 0)
+		goto exit;
+
+	/* Was it unlocked? */
+	if (MUTEX_IS_UNLOCKED(tmp)) {
+		/*
+		 * The read value can only be 0x....0000 in the unlocked state.
+		 * If there was another contending for this lock, then
+		 * the lock state would be 0x....000f
+		 *
+		 * Write our owner ID into the lock
+		 * While not strictly necessary, this helps with
+		 * debug and bookkeeping.
+		 */
+		err = nfp_cpp_writel(cpp, muw, mutex->address, value);
+		if (err < 0)
+			goto exit;
+
+		mutex->depth = 1;
+		goto exit;
+	}
+
+	/* Already locked by us? Success! */
+	if (tmp == value) {
+		mutex->depth = 1;
+		goto exit;
+	}
+
+	err = NFP_ERRNO(MUTEX_IS_LOCKED(tmp) ? EBUSY : EINVAL);
+
+exit:
+	return err;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_nffw.c b/drivers/net/nfp/nfpcore/nfp_nffw.c
new file mode 100644
index 0000000..b8e5331
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nffw.c
@@ -0,0 +1,261 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "nfp_cpp.h"
+#include "nfp_nffw.h"
+#include "nfp_mip.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp_resource.h"
+
+/*
+ * flg_info_version = flags[0]<27:16>
+ * This is a small version counter intended only to detect if the current
+ * implementation can read the current struct. Struct changes should be very
+ * rare and as such a 12-bit counter should cover large spans of time. By the
+ * time it wraps around, we don't expect to have 4096 versions of this struct
+ * to be in use at the same time.
+ */
+static uint32_t
+nffw_res_info_version_get(const struct nfp_nffw_info_data *res)
+{
+	return (res->flags[0] >> 16) & 0xfff;
+}
+
+/* flg_init = flags[0]<0> */
+static uint32_t
+nffw_res_flg_init_get(const struct nfp_nffw_info_data *res)
+{
+	return (res->flags[0] >> 0) & 1;
+}
+
+/* loaded = loaded__mu_da__mip_off_hi<31:31> */
+static uint32_t
+nffw_fwinfo_loaded_get(const struct nffw_fwinfo *fi)
+{
+	return (fi->loaded__mu_da__mip_off_hi >> 31) & 1;
+}
+
+/* mip_cppid = mip_cppid */
+static uint32_t
+nffw_fwinfo_mip_cppid_get(const struct nffw_fwinfo *fi)
+{
+	return fi->mip_cppid;
+}
+
+/* loaded = loaded__mu_da__mip_off_hi<8:8> */
+static uint32_t
+nffw_fwinfo_mip_mu_da_get(const struct nffw_fwinfo *fi)
+{
+	return (fi->loaded__mu_da__mip_off_hi >> 8) & 1;
+}
+
+/* mip_offset = (loaded__mu_da__mip_off_hi<7:0> << 8) | mip_offset_lo */
+static uint64_t
+nffw_fwinfo_mip_offset_get(const struct nffw_fwinfo *fi)
+{
+	uint64_t mip_off_hi = fi->loaded__mu_da__mip_off_hi;
+
+	return (mip_off_hi & 0xFF) << 32 | fi->mip_offset_lo;
+}
+
+#define NFP_IMB_TGTADDRESSMODECFG_MODE_of(_x)		(((_x) >> 13) & 0x7)
+#define NFP_IMB_TGTADDRESSMODECFG_ADDRMODE		BIT(12)
+#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_32_BIT	0
+#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_40_BIT	BIT(12)
+
+static int
+nfp_mip_mu_locality_lsb(struct nfp_cpp *cpp)
+{
+	unsigned int mode, addr40;
+	uint32_t xpbaddr, imbcppat;
+	int err;
+
+	/* Hardcoded XPB IMB Base, island 0 */
+	xpbaddr = 0x000a0000 + NFP_CPP_TARGET_MU * 4;
+	err = nfp_xpb_readl(cpp, xpbaddr, &imbcppat);
+	if (err < 0)
+		return err;
+
+	mode = NFP_IMB_TGTADDRESSMODECFG_MODE_of(imbcppat);
+	addr40 = !!(imbcppat & NFP_IMB_TGTADDRESSMODECFG_ADDRMODE);
+
+	return nfp_cppat_mu_locality_lsb(mode, addr40);
+}
+
+static unsigned int
+nffw_res_fwinfos(struct nfp_nffw_info_data *fwinf, struct nffw_fwinfo **arr)
+{
+	/*
+	 * For the this code, version 0 is most likely to be version 1 in this
+	 * case. Since the kernel driver does not take responsibility for
+	 * initialising the nfp.nffw resource, any previous code (CA firmware or
+	 * userspace) that left the version 0 and did set the init flag is going
+	 * to be version 1.
+	 */
+	switch (nffw_res_info_version_get(fwinf)) {
+	case 0:
+	case 1:
+		*arr = &fwinf->info.v1.fwinfo[0];
+		return NFFW_FWINFO_CNT_V1;
+	case 2:
+		*arr = &fwinf->info.v2.fwinfo[0];
+		return NFFW_FWINFO_CNT_V2;
+	default:
+		*arr = NULL;
+		return 0;
+	}
+}
+
+/*
+ * nfp_nffw_info_open() - Acquire the lock on the NFFW table
+ * @cpp:	NFP CPP handle
+ *
+ * Return: 0, or -ERRNO
+ */
+struct nfp_nffw_info *
+nfp_nffw_info_open(struct nfp_cpp *cpp)
+{
+	struct nfp_nffw_info_data *fwinf;
+	struct nfp_nffw_info *state;
+	uint32_t info_ver;
+	int err;
+
+	state = malloc(sizeof(*state));
+	if (!state)
+		return NULL;
+
+	memset(state, 0, sizeof(*state));
+
+	state->res = nfp_resource_acquire(cpp, NFP_RESOURCE_NFP_NFFW);
+	if (!state->res)
+		goto err_free;
+
+	fwinf = &state->fwinf;
+
+	if (sizeof(*fwinf) > nfp_resource_size(state->res))
+		goto err_release;
+
+	err = nfp_cpp_read(cpp, nfp_resource_cpp_id(state->res),
+			   nfp_resource_address(state->res),
+			   fwinf, sizeof(*fwinf));
+	if (err < (int)sizeof(*fwinf))
+		goto err_release;
+
+	if (!nffw_res_flg_init_get(fwinf))
+		goto err_release;
+
+	info_ver = nffw_res_info_version_get(fwinf);
+	if (info_ver > NFFW_INFO_VERSION_CURRENT)
+		goto err_release;
+
+	state->cpp = cpp;
+	return state;
+
+err_release:
+	nfp_resource_release(state->res);
+err_free:
+	free(state);
+	return NULL;
+}
+
+/*
+ * nfp_nffw_info_release() - Release the lock on the NFFW table
+ * @state:	NFP FW info state
+ *
+ * Return: 0, or -ERRNO
+ */
+void
+nfp_nffw_info_close(struct nfp_nffw_info *state)
+{
+	nfp_resource_release(state->res);
+	free(state);
+}
+
+/*
+ * nfp_nffw_info_fwid_first() - Return the first firmware ID in the NFFW
+ * @state:	NFP FW info state
+ *
+ * Return: First NFFW firmware info, NULL on failure
+ */
+static struct nffw_fwinfo *
+nfp_nffw_info_fwid_first(struct nfp_nffw_info *state)
+{
+	struct nffw_fwinfo *fwinfo;
+	unsigned int cnt, i;
+
+	cnt = nffw_res_fwinfos(&state->fwinf, &fwinfo);
+	if (!cnt)
+		return NULL;
+
+	for (i = 0; i < cnt; i++)
+		if (nffw_fwinfo_loaded_get(&fwinfo[i]))
+			return &fwinfo[i];
+
+	return NULL;
+}
+
+/*
+ * nfp_nffw_info_mip_first() - Retrieve the location of the first FW's MIP
+ * @state:	NFP FW info state
+ * @cpp_id:	Pointer to the CPP ID of the MIP
+ * @off:	Pointer to the CPP Address of the MIP
+ *
+ * Return: 0, or -ERRNO
+ */
+int
+nfp_nffw_info_mip_first(struct nfp_nffw_info *state, uint32_t *cpp_id,
+			uint64_t *off)
+{
+	struct nffw_fwinfo *fwinfo;
+
+	fwinfo = nfp_nffw_info_fwid_first(state);
+	if (!fwinfo)
+		return -EINVAL;
+
+	*cpp_id = nffw_fwinfo_mip_cppid_get(fwinfo);
+	*off = nffw_fwinfo_mip_offset_get(fwinfo);
+
+	if (nffw_fwinfo_mip_mu_da_get(fwinfo)) {
+		int locality_off;
+
+		if (NFP_CPP_ID_TARGET_of(*cpp_id) != NFP_CPP_TARGET_MU)
+			return 0;
+
+		locality_off = nfp_mip_mu_locality_lsb(state->cpp);
+		if (locality_off < 0)
+			return locality_off;
+
+		*off &= ~(NFP_MU_ADDR_ACCESS_TYPE_MASK << locality_off);
+		*off |= NFP_MU_ADDR_ACCESS_TYPE_DIRECT << locality_off;
+	}
+
+	return 0;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_nffw.h b/drivers/net/nfp/nfpcore/nfp_nffw.h
new file mode 100644
index 0000000..926ec67
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nffw.h
@@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_NFFW_H__
+#define __NFP_NFFW_H__
+
+#include "nfp-common/nfp_platform.h"
+#include "nfp_cpp.h"
+
+/*
+ * Init-CSR owner IDs for firmware map to firmware IDs which start at 4.
+ * Lower IDs are reserved for target and loader IDs.
+ */
+#define NFFW_FWID_EXT   3	/* For active MEs that we didn't load. */
+#define NFFW_FWID_BASE  4
+
+#define NFFW_FWID_ALL   255
+
+/* Init-CSR owner IDs for firmware map to firmware IDs which start at 4.
+ * Lower IDs are reserved for target and loader IDs.
+ */
+#define NFFW_FWID_EXT   3 /* For active MEs that we didn't load. */
+#define NFFW_FWID_BASE  4
+
+#define NFFW_FWID_ALL   255
+
+/**
+ * NFFW_INFO_VERSION history:
+ * 0: This was never actually used (before versioning), but it refers to
+ *    the previous struct which had FWINFO_CNT = MEINFO_CNT = 120 that later
+ *    changed to 200.
+ * 1: First versioned struct, with
+ *     FWINFO_CNT = 120
+ *     MEINFO_CNT = 120
+ * 2:  FWINFO_CNT = 200
+ *     MEINFO_CNT = 200
+ */
+#define NFFW_INFO_VERSION_CURRENT 2
+
+/* Enough for all current chip families */
+#define NFFW_MEINFO_CNT_V1 120
+#define NFFW_FWINFO_CNT_V1 120
+#define NFFW_MEINFO_CNT_V2 200
+#define NFFW_FWINFO_CNT_V2 200
+
+struct nffw_meinfo {
+	uint32_t ctxmask__fwid__meid;
+};
+
+struct nffw_fwinfo {
+	uint32_t loaded__mu_da__mip_off_hi;
+	uint32_t mip_cppid; /* 0 means no MIP */
+	uint32_t mip_offset_lo;
+};
+
+struct nfp_nffw_info_v1 {
+	struct nffw_meinfo meinfo[NFFW_MEINFO_CNT_V1];
+	struct nffw_fwinfo fwinfo[NFFW_FWINFO_CNT_V1];
+};
+
+struct nfp_nffw_info_v2 {
+	struct nffw_meinfo meinfo[NFFW_MEINFO_CNT_V2];
+	struct nffw_fwinfo fwinfo[NFFW_FWINFO_CNT_V2];
+};
+
+struct nfp_nffw_info_data {
+	uint32_t flags[2];
+	union {
+		struct nfp_nffw_info_v1 v1;
+		struct nfp_nffw_info_v2 v2;
+	} info;
+};
+
+struct nfp_nffw_info {
+	struct nfp_cpp *cpp;
+	struct nfp_resource *res;
+
+	struct nfp_nffw_info_data fwinf;
+};
+
+struct nfp_nffw_info *nfp_nffw_info_open(struct nfp_cpp *cpp);
+void nfp_nffw_info_close(struct nfp_nffw_info *state);
+
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp.c b/drivers/net/nfp/nfpcore/nfp_nsp.c
new file mode 100644
index 0000000..6254f9a
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp.c
@@ -0,0 +1,453 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define NFP_SUBSYS "nfp_nsp"
+
+#include <stdio.h>
+#include <time.h>
+
+#include <rte_common.h>
+
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+#include "nfp_resource.h"
+
+int
+nfp_nsp_config_modified(struct nfp_nsp *state)
+{
+	return state->modified;
+}
+
+void
+nfp_nsp_config_set_modified(struct nfp_nsp *state, int modified)
+{
+	state->modified = modified;
+}
+
+void *
+nfp_nsp_config_entries(struct nfp_nsp *state)
+{
+	return state->entries;
+}
+
+unsigned int
+nfp_nsp_config_idx(struct nfp_nsp *state)
+{
+	return state->idx;
+}
+
+void
+nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries, unsigned int idx)
+{
+	state->entries = entries;
+	state->idx = idx;
+}
+
+void
+nfp_nsp_config_clear_state(struct nfp_nsp *state)
+{
+	state->entries = NULL;
+	state->idx = 0;
+}
+
+static void
+nfp_nsp_print_extended_error(uint32_t ret_val)
+{
+	int i;
+
+	if (!ret_val)
+		return;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nsp_errors); i++)
+		if (ret_val == (uint32_t)nsp_errors[i].code)
+			printf("err msg: %s\n", nsp_errors[i].msg);
+}
+
+static int
+nfp_nsp_check(struct nfp_nsp *state)
+{
+	struct nfp_cpp *cpp = state->cpp;
+	uint64_t nsp_status, reg;
+	uint32_t nsp_cpp;
+	int err;
+
+	nsp_cpp = nfp_resource_cpp_id(state->res);
+	nsp_status = nfp_resource_address(state->res) + NSP_STATUS;
+
+	err = nfp_cpp_readq(cpp, nsp_cpp, nsp_status, &reg);
+	if (err < 0)
+		return err;
+
+	if (FIELD_GET(NSP_STATUS_MAGIC, reg) != NSP_MAGIC) {
+		printf("Cannot detect NFP Service Processor\n");
+		return -ENODEV;
+	}
+
+	state->ver.major = FIELD_GET(NSP_STATUS_MAJOR, reg);
+	state->ver.minor = FIELD_GET(NSP_STATUS_MINOR, reg);
+
+	if (state->ver.major != NSP_MAJOR || state->ver.minor < NSP_MINOR) {
+		printf("Unsupported ABI %hu.%hu\n", state->ver.major,
+						    state->ver.minor);
+		return -EINVAL;
+	}
+
+	if (reg & NSP_STATUS_BUSY) {
+		printf("Service processor busy!\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/*
+ * nfp_nsp_open() - Prepare for communication and lock the NSP resource.
+ * @cpp:	NFP CPP Handle
+ */
+struct nfp_nsp *
+nfp_nsp_open(struct nfp_cpp *cpp)
+{
+	struct nfp_resource *res;
+	struct nfp_nsp *state;
+	int err;
+
+	res = nfp_resource_acquire(cpp, NFP_RESOURCE_NSP);
+	if (!res)
+		return NULL;
+
+	state = malloc(sizeof(*state));
+	if (!state) {
+		nfp_resource_release(res);
+		return NULL;
+	}
+	memset(state, 0, sizeof(*state));
+	state->cpp = cpp;
+	state->res = res;
+
+	err = nfp_nsp_check(state);
+	if (err) {
+		nfp_nsp_close(state);
+		return NULL;
+	}
+
+	return state;
+}
+
+/*
+ * nfp_nsp_close() - Clean up and unlock the NSP resource.
+ * @state:	NFP SP state
+ */
+void
+nfp_nsp_close(struct nfp_nsp *state)
+{
+	nfp_resource_release(state->res);
+	free(state);
+}
+
+uint16_t
+nfp_nsp_get_abi_ver_major(struct nfp_nsp *state)
+{
+	return state->ver.major;
+}
+
+uint16_t
+nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state)
+{
+	return state->ver.minor;
+}
+
+static int
+nfp_nsp_wait_reg(struct nfp_cpp *cpp, uint64_t *reg, uint32_t nsp_cpp,
+		 uint64_t addr, uint64_t mask, uint64_t val)
+{
+	struct timespec wait;
+	int count;
+	int err;
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 25000000;
+	count = 0;
+
+	for (;;) {
+		err = nfp_cpp_readq(cpp, nsp_cpp, addr, reg);
+		if (err < 0)
+			return err;
+
+		if ((*reg & mask) == val)
+			return 0;
+
+		nanosleep(&wait, 0);
+		if (count++ > 1000)
+			return -ETIMEDOUT;
+	}
+}
+
+/*
+ * nfp_nsp_command() - Execute a command on the NFP Service Processor
+ * @state:	NFP SP state
+ * @code:	NFP SP Command Code
+ * @option:	NFP SP Command Argument
+ * @buff_cpp:	NFP SP Buffer CPP Address info
+ * @buff_addr:	NFP SP Buffer Host address
+ *
+ * Return: 0 for success with no result
+ *
+ *	 positive value for NSP completion with a result code
+ *
+ *	-EAGAIN if the NSP is not yet present
+ *	-ENODEV if the NSP is not a supported model
+ *	-EBUSY if the NSP is stuck
+ *	-EINTR if interrupted while waiting for completion
+ *	-ETIMEDOUT if the NSP took longer than 30 seconds to complete
+ */
+static int
+nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
+		uint32_t buff_cpp, uint64_t buff_addr)
+{
+	uint64_t reg, ret_val, nsp_base, nsp_buffer, nsp_status, nsp_command;
+	struct nfp_cpp *cpp = state->cpp;
+	uint32_t nsp_cpp;
+	int err;
+
+	nsp_cpp = nfp_resource_cpp_id(state->res);
+	nsp_base = nfp_resource_address(state->res);
+	nsp_status = nsp_base + NSP_STATUS;
+	nsp_command = nsp_base + NSP_COMMAND;
+	nsp_buffer = nsp_base + NSP_BUFFER;
+
+	err = nfp_nsp_check(state);
+	if (err)
+		return err;
+
+	if (!FIELD_FIT(NSP_BUFFER_CPP, buff_cpp >> 8) ||
+	    !FIELD_FIT(NSP_BUFFER_ADDRESS, buff_addr)) {
+		printf("Host buffer out of reach %08x %" PRIx64 "\n",
+			buff_cpp, buff_addr);
+		return -EINVAL;
+	}
+
+	err = nfp_cpp_writeq(cpp, nsp_cpp, nsp_buffer,
+			     FIELD_PREP(NSP_BUFFER_CPP, buff_cpp >> 8) |
+			     FIELD_PREP(NSP_BUFFER_ADDRESS, buff_addr));
+	if (err < 0)
+		return err;
+
+	err = nfp_cpp_writeq(cpp, nsp_cpp, nsp_command,
+			     FIELD_PREP(NSP_COMMAND_OPTION, option) |
+			     FIELD_PREP(NSP_COMMAND_CODE, code) |
+			     FIELD_PREP(NSP_COMMAND_START, 1));
+	if (err < 0)
+		return err;
+
+	/* Wait for NSP_COMMAND_START to go to 0 */
+	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_command,
+			       NSP_COMMAND_START, 0);
+	if (err) {
+		printf("Error %d waiting for code 0x%04x to start\n",
+			err, code);
+		return err;
+	}
+
+	/* Wait for NSP_STATUS_BUSY to go to 0 */
+	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_status, NSP_STATUS_BUSY,
+			       0);
+	if (err) {
+		printf("Error %d waiting for code 0x%04x to complete\n",
+			err, code);
+		return err;
+	}
+
+	err = nfp_cpp_readq(cpp, nsp_cpp, nsp_command, &ret_val);
+	if (err < 0)
+		return err;
+	ret_val = FIELD_GET(NSP_COMMAND_OPTION, ret_val);
+
+	err = FIELD_GET(NSP_STATUS_RESULT, reg);
+	if (err) {
+		printf("Result (error) code set: %d (%d) command: %d\n",
+			 -err, (int)ret_val, code);
+		nfp_nsp_print_extended_error(ret_val);
+		return -err;
+	}
+
+	return ret_val;
+}
+
+#define SZ_1M 0x00100000
+
+static int
+nfp_nsp_command_buf(struct nfp_nsp *nsp, uint16_t code, uint32_t option,
+		    const void *in_buf, unsigned int in_size, void *out_buf,
+		    unsigned int out_size)
+{
+	struct nfp_cpp *cpp = nsp->cpp;
+	unsigned int max_size;
+	uint64_t reg, cpp_buf;
+	int ret, err;
+	uint32_t cpp_id;
+
+	if (nsp->ver.minor < 13) {
+		printf("NSP: Code 0x%04x with buffer not supported\n", code);
+		printf("\t(ABI %hu.%hu)\n", nsp->ver.major, nsp->ver.minor);
+		return -EOPNOTSUPP;
+	}
+
+	err = nfp_cpp_readq(cpp, nfp_resource_cpp_id(nsp->res),
+			    nfp_resource_address(nsp->res) +
+			    NSP_DFLT_BUFFER_CONFIG,
+			    &reg);
+	if (err < 0)
+		return err;
+
+	max_size = RTE_MAX(in_size, out_size);
+	if (FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M < max_size) {
+		printf("NSP: default buffer too small for command 0x%04x\n",
+		       code);
+		printf("\t(%llu < %u)\n",
+		       FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M,
+		       max_size);
+		return -EINVAL;
+	}
+
+	err = nfp_cpp_readq(cpp, nfp_resource_cpp_id(nsp->res),
+			    nfp_resource_address(nsp->res) +
+			    NSP_DFLT_BUFFER,
+			    &reg);
+	if (err < 0)
+		return err;
+
+	cpp_id = FIELD_GET(NSP_BUFFER_CPP, reg) << 8;
+	cpp_buf = FIELD_GET(NSP_BUFFER_ADDRESS, reg);
+
+	if (in_buf && in_size) {
+		err = nfp_cpp_write(cpp, cpp_id, cpp_buf, in_buf, in_size);
+		if (err < 0)
+			return err;
+	}
+	/* Zero out remaining part of the buffer */
+	if (out_buf && out_size && out_size > in_size) {
+		memset(out_buf, 0, out_size - in_size);
+		err = nfp_cpp_write(cpp, cpp_id, cpp_buf + in_size, out_buf,
+				    out_size - in_size);
+		if (err < 0)
+			return err;
+	}
+
+	ret = nfp_nsp_command(nsp, code, option, cpp_id, cpp_buf);
+	if (ret < 0)
+		return ret;
+
+	if (out_buf && out_size) {
+		err = nfp_cpp_read(cpp, cpp_id, cpp_buf, out_buf, out_size);
+		if (err < 0)
+			return err;
+	}
+
+	return ret;
+}
+
+int
+nfp_nsp_wait(struct nfp_nsp *state)
+{
+	struct timespec wait;
+	int count;
+	int err;
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 25000000;
+	count = 0;
+
+	for (;;) {
+		err = nfp_nsp_command(state, SPCODE_NOOP, 0, 0, 0);
+		if (err != -EAGAIN)
+			break;
+
+		nanosleep(&wait, 0);
+
+		if (count++ > 1000) {
+			err = -ETIMEDOUT;
+			break;
+		}
+	}
+	if (err)
+		printf("NSP failed to respond %d\n", err);
+
+	return err;
+}
+
+int
+nfp_nsp_device_soft_reset(struct nfp_nsp *state)
+{
+	return nfp_nsp_command(state, SPCODE_SOFT_RESET, 0, 0, 0);
+}
+
+int
+nfp_nsp_mac_reinit(struct nfp_nsp *state)
+{
+	return nfp_nsp_command(state, SPCODE_MAC_INIT, 0, 0, 0);
+}
+
+int
+nfp_nsp_load_fw(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_FW_LOAD, size, buf, size,
+				   NULL, 0);
+}
+
+int
+nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_ETH_RESCAN, size, NULL, 0,
+				   buf, size);
+}
+
+int
+nfp_nsp_write_eth_table(struct nfp_nsp *state, const void *buf,
+			unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_ETH_CONTROL, size, buf, size,
+				   NULL, 0);
+}
+
+int
+nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_NSP_IDENTIFY, size, NULL, 0,
+				   buf, size);
+}
+
+int
+nfp_nsp_read_sensors(struct nfp_nsp *state, unsigned int sensor_mask, void *buf,
+		     unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_NSP_SENSORS, sensor_mask, NULL,
+				   0, buf, size);
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp.h b/drivers/net/nfp/nfpcore/nfp_nsp.h
new file mode 100644
index 0000000..37c861a
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp.h
@@ -0,0 +1,330 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NSP_NSP_H
+#define NSP_NSP_H 1
+
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+
+#define GENMASK_ULL(h, l) \
+	(((~0ULL) - (1ULL << (l)) + 1) & \
+	 (~0ULL >> (64 - 1 - (h))))
+
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+
+#define FIELD_GET(_mask, _reg)	\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		(typeof(_x))(((_reg) & (_x)) >> __bf_shf(_x));	\
+	}))
+
+#define FIELD_FIT(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		!((((typeof(_x))_val) << __bf_shf(_x)) & ~(_x)); \
+	}))
+
+#define FIELD_PREP(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		((typeof(_x))(_val) << __bf_shf(_x)) & (_x);	\
+	}))
+
+/* Offsets relative to the CSR base */
+#define NSP_STATUS		0x00
+#define   NSP_STATUS_MAGIC	GENMASK_ULL(63, 48)
+#define   NSP_STATUS_MAJOR	GENMASK_ULL(47, 44)
+#define   NSP_STATUS_MINOR	GENMASK_ULL(43, 32)
+#define   NSP_STATUS_CODE	GENMASK_ULL(31, 16)
+#define   NSP_STATUS_RESULT	GENMASK_ULL(15, 8)
+#define   NSP_STATUS_BUSY	BIT_ULL(0)
+
+#define NSP_COMMAND		0x08
+#define   NSP_COMMAND_OPTION	GENMASK_ULL(63, 32)
+#define   NSP_COMMAND_CODE	GENMASK_ULL(31, 16)
+#define   NSP_COMMAND_START	BIT_ULL(0)
+
+/* CPP address to retrieve the data from */
+#define NSP_BUFFER		0x10
+#define   NSP_BUFFER_CPP	GENMASK_ULL(63, 40)
+#define   NSP_BUFFER_PCIE	GENMASK_ULL(39, 38)
+#define   NSP_BUFFER_ADDRESS	GENMASK_ULL(37, 0)
+
+#define NSP_DFLT_BUFFER		0x18
+
+#define NSP_DFLT_BUFFER_CONFIG	0x20
+#define   NSP_DFLT_BUFFER_SIZE_MB	GENMASK_ULL(7, 0)
+
+#define NSP_MAGIC		0xab10
+#define NSP_MAJOR		0
+#define NSP_MINOR		8
+
+#define NSP_CODE_MAJOR		GENMASK(15, 12)
+#define NSP_CODE_MINOR		GENMASK(11, 0)
+
+enum nfp_nsp_cmd {
+	SPCODE_NOOP		= 0, /* No operation */
+	SPCODE_SOFT_RESET	= 1, /* Soft reset the NFP */
+	SPCODE_FW_DEFAULT	= 2, /* Load default (UNDI) FW */
+	SPCODE_PHY_INIT		= 3, /* Initialize the PHY */
+	SPCODE_MAC_INIT		= 4, /* Initialize the MAC */
+	SPCODE_PHY_RXADAPT	= 5, /* Re-run PHY RX Adaptation */
+	SPCODE_FW_LOAD		= 6, /* Load fw from buffer, len in option */
+	SPCODE_ETH_RESCAN	= 7, /* Rescan ETHs, write ETH_TABLE to buf */
+	SPCODE_ETH_CONTROL	= 8, /* Update media config from buffer */
+	SPCODE_NSP_SENSORS	= 12, /* Read NSP sensor(s) */
+	SPCODE_NSP_IDENTIFY	= 13, /* Read NSP version */
+};
+
+static const struct {
+	int code;
+	const char *msg;
+} nsp_errors[] = {
+	{ 6010, "could not map to phy for port" },
+	{ 6011, "not an allowed rate/lanes for port" },
+	{ 6012, "not an allowed rate/lanes for port" },
+	{ 6013, "high/low error, change other port first" },
+	{ 6014, "config not found in flash" },
+};
+
+struct nfp_nsp {
+	struct nfp_cpp *cpp;
+	struct nfp_resource *res;
+	struct {
+		uint16_t major;
+		uint16_t minor;
+	} ver;
+
+	/* Eth table config state */
+	int modified;
+	unsigned int idx;
+	void *entries;
+};
+
+struct nfp_nsp *nfp_nsp_open(struct nfp_cpp *cpp);
+void nfp_nsp_close(struct nfp_nsp *state);
+uint16_t nfp_nsp_get_abi_ver_major(struct nfp_nsp *state);
+uint16_t nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state);
+int nfp_nsp_wait(struct nfp_nsp *state);
+int nfp_nsp_device_soft_reset(struct nfp_nsp *state);
+int nfp_nsp_load_fw(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_mac_reinit(struct nfp_nsp *state);
+int nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_read_sensors(struct nfp_nsp *state, unsigned int sensor_mask,
+			 void *buf, unsigned int size);
+
+static inline int nfp_nsp_has_mac_reinit(struct nfp_nsp *state)
+{
+	return nfp_nsp_get_abi_ver_minor(state) > 20;
+}
+
+enum nfp_eth_interface {
+	NFP_INTERFACE_NONE	= 0,
+	NFP_INTERFACE_SFP	= 1,
+	NFP_INTERFACE_SFPP	= 10,
+	NFP_INTERFACE_SFP28	= 28,
+	NFP_INTERFACE_QSFP	= 40,
+	NFP_INTERFACE_CXP	= 100,
+	NFP_INTERFACE_QSFP28	= 112,
+};
+
+enum nfp_eth_media {
+	NFP_MEDIA_DAC_PASSIVE = 0,
+	NFP_MEDIA_DAC_ACTIVE,
+	NFP_MEDIA_FIBRE,
+};
+
+enum nfp_eth_aneg {
+	NFP_ANEG_AUTO = 0,
+	NFP_ANEG_SEARCH,
+	NFP_ANEG_25G_CONSORTIUM,
+	NFP_ANEG_25G_IEEE,
+	NFP_ANEG_DISABLED,
+};
+
+enum nfp_eth_fec {
+	NFP_FEC_AUTO_BIT = 0,
+	NFP_FEC_BASER_BIT,
+	NFP_FEC_REED_SOLOMON_BIT,
+	NFP_FEC_DISABLED_BIT,
+};
+
+#define NFP_FEC_AUTO		BIT(NFP_FEC_AUTO_BIT)
+#define NFP_FEC_BASER		BIT(NFP_FEC_BASER_BIT)
+#define NFP_FEC_REED_SOLOMON	BIT(NFP_FEC_REED_SOLOMON_BIT)
+#define NFP_FEC_DISABLED	BIT(NFP_FEC_DISABLED_BIT)
+
+#define ETH_ALEN	6
+
+/**
+ * struct nfp_eth_table - ETH table information
+ * @count:	number of table entries
+ * @max_index:	max of @index fields of all @ports
+ * @ports:	table of ports
+ *
+ * @eth_index:	port index according to legacy ethX numbering
+ * @index:	chip-wide first channel index
+ * @nbi:	NBI index
+ * @base:	first channel index (within NBI)
+ * @lanes:	number of channels
+ * @speed:	interface speed (in Mbps)
+ * @interface:	interface (module) plugged in
+ * @media:	media type of the @interface
+ * @fec:	forward error correction mode
+ * @aneg:	auto negotiation mode
+ * @mac_addr:	interface MAC address
+ * @label_port:	port id
+ * @label_subport:  id of interface within port (for split ports)
+ * @enabled:	is enabled?
+ * @tx_enabled:	is TX enabled?
+ * @rx_enabled:	is RX enabled?
+ * @override_changed: is media reconfig pending?
+ *
+ * @port_type:	one of %PORT_* defines for ethtool
+ * @port_lanes:	total number of lanes on the port (sum of lanes of all subports)
+ * @is_split:	is interface part of a split port
+ * @fec_modes_supported:	bitmap of FEC modes supported
+ */
+struct nfp_eth_table {
+	unsigned int count;
+	unsigned int max_index;
+	struct nfp_eth_table_port {
+		unsigned int eth_index;
+		unsigned int index;
+		unsigned int nbi;
+		unsigned int base;
+		unsigned int lanes;
+		unsigned int speed;
+
+		unsigned int interface;
+		enum nfp_eth_media media;
+
+		enum nfp_eth_fec fec;
+		enum nfp_eth_aneg aneg;
+
+		uint8_t mac_addr[ETH_ALEN];
+
+		uint8_t label_port;
+		uint8_t label_subport;
+
+		int enabled;
+		int tx_enabled;
+		int rx_enabled;
+
+		int override_changed;
+
+		/* Computed fields */
+		uint8_t port_type;
+
+		unsigned int port_lanes;
+
+		int is_split;
+
+		unsigned int fec_modes_supported;
+	} ports[0];
+};
+
+struct nfp_eth_table *nfp_eth_read_ports(struct nfp_cpp *cpp);
+
+int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, int enable);
+int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx,
+			   int configed);
+int
+nfp_eth_set_fec(struct nfp_cpp *cpp, unsigned int idx, enum nfp_eth_fec mode);
+
+int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_write_eth_table(struct nfp_nsp *state, const void *buf,
+			    unsigned int size);
+void nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries,
+			      unsigned int idx);
+void nfp_nsp_config_clear_state(struct nfp_nsp *state);
+void nfp_nsp_config_set_modified(struct nfp_nsp *state, int modified);
+void *nfp_nsp_config_entries(struct nfp_nsp *state);
+int nfp_nsp_config_modified(struct nfp_nsp *state);
+unsigned int nfp_nsp_config_idx(struct nfp_nsp *state);
+
+static inline int nfp_eth_can_support_fec(struct nfp_eth_table_port *eth_port)
+{
+	return !!eth_port->fec_modes_supported;
+}
+
+static inline unsigned int
+nfp_eth_supported_fec_modes(struct nfp_eth_table_port *eth_port)
+{
+	return eth_port->fec_modes_supported;
+}
+
+struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx);
+int nfp_eth_config_commit_end(struct nfp_nsp *nsp);
+void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp);
+
+int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode);
+int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed);
+int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes);
+
+/**
+ * struct nfp_nsp_identify - NSP static information
+ * @version:      opaque version string
+ * @flags:        version flags
+ * @br_primary:   branch id of primary bootloader
+ * @br_secondary: branch id of secondary bootloader
+ * @br_nsp:       branch id of NSP
+ * @primary:      version of primarary bootloader
+ * @secondary:    version id of secondary bootloader
+ * @nsp:          version id of NSP
+ * @sensor_mask:  mask of present sensors available on NIC
+ */
+struct nfp_nsp_identify {
+	char version[40];
+	uint8_t flags;
+	uint8_t br_primary;
+	uint8_t br_secondary;
+	uint8_t br_nsp;
+	uint16_t primary;
+	uint16_t secondary;
+	uint16_t nsp;
+	uint64_t sensor_mask;
+};
+
+struct nfp_nsp_identify *__nfp_nsp_identify(struct nfp_nsp *nsp);
+
+enum nfp_nsp_sensor_id {
+	NFP_SENSOR_CHIP_TEMPERATURE,
+	NFP_SENSOR_ASSEMBLY_POWER,
+	NFP_SENSOR_ASSEMBLY_12V_POWER,
+	NFP_SENSOR_ASSEMBLY_3V3_POWER,
+};
+
+int nfp_hwmon_read_sensor(struct nfp_cpp *cpp, enum nfp_nsp_sensor_id id,
+			  long *val);
+
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
new file mode 100644
index 0000000..d9d6013
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <rte_byteorder.h>
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+#include "nfp_nffw.h"
+
+struct nsp_identify {
+	uint8_t version[40];
+	uint8_t flags;
+	uint8_t br_primary;
+	uint8_t br_secondary;
+	uint8_t br_nsp;
+	uint16_t primary;
+	uint16_t secondary;
+	uint16_t nsp;
+	uint8_t reserved[6];
+	uint64_t sensor_mask;
+};
+
+struct nfp_nsp_identify *
+__nfp_nsp_identify(struct nfp_nsp *nsp)
+{
+	struct nfp_nsp_identify *nspi = NULL;
+	struct nsp_identify *ni;
+	int ret;
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 15)
+		return NULL;
+
+	ni = malloc(sizeof(*ni));
+	if (!ni)
+		return NULL;
+
+	memset(ni, 0, sizeof(*ni));
+	ret = nfp_nsp_read_identify(nsp, ni, sizeof(*ni));
+	if (ret < 0) {
+		printf("reading bsp version failed %d\n",
+			ret);
+		goto exit_free;
+	}
+
+	nspi = malloc(sizeof(*nspi));
+	if (!nspi)
+		goto exit_free;
+
+	memset(nspi, 0, sizeof(*nspi));
+	memcpy(nspi->version, ni->version, sizeof(nspi->version));
+	nspi->version[sizeof(nspi->version) - 1] = '\0';
+	nspi->flags = ni->flags;
+	nspi->br_primary = ni->br_primary;
+	nspi->br_secondary = ni->br_secondary;
+	nspi->br_nsp = ni->br_nsp;
+	nspi->primary = rte_le_to_cpu_16(ni->primary);
+	nspi->secondary = rte_le_to_cpu_16(ni->secondary);
+	nspi->nsp = rte_le_to_cpu_16(ni->nsp);
+	nspi->sensor_mask = rte_le_to_cpu_64(ni->sensor_mask);
+
+exit_free:
+	free(ni);
+	return nspi;
+}
+
+struct nfp_sensors {
+	uint32_t chip_temp;
+	uint32_t assembly_power;
+	uint32_t assembly_12v_power;
+	uint32_t assembly_3v3_power;
+};
+
+int
+nfp_hwmon_read_sensor(struct nfp_cpp *cpp, enum nfp_nsp_sensor_id id, long *val)
+{
+	struct nfp_sensors s;
+	struct nfp_nsp *nsp;
+	int ret;
+
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp)
+		return -EIO;
+
+	ret = nfp_nsp_read_sensors(nsp, BIT(id), &s, sizeof(s));
+	nfp_nsp_close(nsp);
+
+	if (ret < 0)
+		return ret;
+
+	switch (id) {
+	case NFP_SENSOR_CHIP_TEMPERATURE:
+		*val = rte_le_to_cpu_32(s.chip_temp);
+		break;
+	case NFP_SENSOR_ASSEMBLY_POWER:
+		*val = rte_le_to_cpu_32(s.assembly_power);
+		break;
+	case NFP_SENSOR_ASSEMBLY_12V_POWER:
+		*val = rte_le_to_cpu_32(s.assembly_12v_power);
+		break;
+	case NFP_SENSOR_ASSEMBLY_3V3_POWER:
+		*val = rte_le_to_cpu_32(s.assembly_3v3_power);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
new file mode 100644
index 0000000..409578c
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
@@ -0,0 +1,691 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+#include "nfp6000/nfp6000.h"
+
+#define GENMASK_ULL(h, l) \
+	(((~0ULL) - (1ULL << (l)) + 1) & \
+	 (~0ULL >> (64 - 1 - (h))))
+
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+
+#define FIELD_GET(_mask, _reg)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		(typeof(_x))(((_reg) & (_x)) >> __bf_shf(_x));	\
+	}))
+
+#define FIELD_FIT(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		!((((typeof(_x))_val) << __bf_shf(_x)) & ~(_x)); \
+	}))
+
+#define FIELD_PREP(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		((typeof(_x))(_val) << __bf_shf(_x)) & (_x);	\
+	}))
+
+#define NSP_ETH_NBI_PORT_COUNT		24
+#define NSP_ETH_MAX_COUNT		(2 * NSP_ETH_NBI_PORT_COUNT)
+#define NSP_ETH_TABLE_SIZE		(NSP_ETH_MAX_COUNT *		\
+					 sizeof(union eth_table_entry))
+
+#define NSP_ETH_PORT_LANES		GENMASK_ULL(3, 0)
+#define NSP_ETH_PORT_INDEX		GENMASK_ULL(15, 8)
+#define NSP_ETH_PORT_LABEL		GENMASK_ULL(53, 48)
+#define NSP_ETH_PORT_PHYLABEL		GENMASK_ULL(59, 54)
+#define NSP_ETH_PORT_FEC_SUPP_BASER	BIT_ULL(60)
+#define NSP_ETH_PORT_FEC_SUPP_RS	BIT_ULL(61)
+
+#define NSP_ETH_PORT_LANES_MASK		rte_cpu_to_le_64(NSP_ETH_PORT_LANES)
+
+#define NSP_ETH_STATE_CONFIGURED	BIT_ULL(0)
+#define NSP_ETH_STATE_ENABLED		BIT_ULL(1)
+#define NSP_ETH_STATE_TX_ENABLED	BIT_ULL(2)
+#define NSP_ETH_STATE_RX_ENABLED	BIT_ULL(3)
+#define NSP_ETH_STATE_RATE		GENMASK_ULL(11, 8)
+#define NSP_ETH_STATE_INTERFACE		GENMASK_ULL(19, 12)
+#define NSP_ETH_STATE_MEDIA		GENMASK_ULL(21, 20)
+#define NSP_ETH_STATE_OVRD_CHNG		BIT_ULL(22)
+#define NSP_ETH_STATE_ANEG		GENMASK_ULL(25, 23)
+#define NSP_ETH_STATE_FEC		GENMASK_ULL(27, 26)
+
+#define NSP_ETH_CTRL_CONFIGURED		BIT_ULL(0)
+#define NSP_ETH_CTRL_ENABLED		BIT_ULL(1)
+#define NSP_ETH_CTRL_TX_ENABLED		BIT_ULL(2)
+#define NSP_ETH_CTRL_RX_ENABLED		BIT_ULL(3)
+#define NSP_ETH_CTRL_SET_RATE		BIT_ULL(4)
+#define NSP_ETH_CTRL_SET_LANES		BIT_ULL(5)
+#define NSP_ETH_CTRL_SET_ANEG		BIT_ULL(6)
+#define NSP_ETH_CTRL_SET_FEC		BIT_ULL(7)
+
+/* Which connector port. */
+#define PORT_TP			0x00
+#define PORT_AUI		0x01
+#define PORT_MII		0x02
+#define PORT_FIBRE		0x03
+#define PORT_BNC		0x04
+#define PORT_DA			0x05
+#define PORT_NONE		0xef
+#define PORT_OTHER		0xff
+
+#define SPEED_10		10
+#define SPEED_100		100
+#define SPEED_1000		1000
+#define SPEED_2500		2500
+#define SPEED_5000		5000
+#define SPEED_10000		10000
+#define SPEED_14000		14000
+#define SPEED_20000		20000
+#define SPEED_25000		25000
+#define SPEED_40000		40000
+#define SPEED_50000		50000
+#define SPEED_56000		56000
+#define SPEED_100000		100000
+
+enum nfp_eth_raw {
+	NSP_ETH_RAW_PORT = 0,
+	NSP_ETH_RAW_STATE,
+	NSP_ETH_RAW_MAC,
+	NSP_ETH_RAW_CONTROL,
+
+	NSP_ETH_NUM_RAW
+};
+
+enum nfp_eth_rate {
+	RATE_INVALID = 0,
+	RATE_10M,
+	RATE_100M,
+	RATE_1G,
+	RATE_10G,
+	RATE_25G,
+};
+
+union eth_table_entry {
+	struct {
+		uint64_t port;
+		uint64_t state;
+		uint8_t mac_addr[6];
+		uint8_t resv[2];
+		uint64_t control;
+	};
+	uint64_t raw[NSP_ETH_NUM_RAW];
+};
+
+static const struct {
+	enum nfp_eth_rate rate;
+	unsigned int speed;
+} nsp_eth_rate_tbl[] = {
+	{ RATE_INVALID,	0, },
+	{ RATE_10M,	SPEED_10, },
+	{ RATE_100M,	SPEED_100, },
+	{ RATE_1G,	SPEED_1000, },
+	{ RATE_10G,	SPEED_10000, },
+	{ RATE_25G,	SPEED_25000, },
+};
+
+static unsigned int
+nfp_eth_rate2speed(enum nfp_eth_rate rate)
+{
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nsp_eth_rate_tbl); i++)
+		if (nsp_eth_rate_tbl[i].rate == rate)
+			return nsp_eth_rate_tbl[i].speed;
+
+	return 0;
+}
+
+static unsigned int
+nfp_eth_speed2rate(unsigned int speed)
+{
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nsp_eth_rate_tbl); i++)
+		if (nsp_eth_rate_tbl[i].speed == speed)
+			return nsp_eth_rate_tbl[i].rate;
+
+	return RATE_INVALID;
+}
+
+static void
+nfp_eth_copy_mac_reverse(uint8_t *dst, const uint8_t *src)
+{
+	int i;
+
+	for (i = 0; i < (int)ETH_ALEN; i++)
+		dst[ETH_ALEN - i - 1] = src[i];
+}
+
+static void
+nfp_eth_port_translate(struct nfp_nsp *nsp, const union eth_table_entry *src,
+		       unsigned int index, struct nfp_eth_table_port *dst)
+{
+	unsigned int rate;
+	unsigned int fec;
+	uint64_t port, state;
+
+	port = rte_le_to_cpu_64(src->port);
+	state = rte_le_to_cpu_64(src->state);
+
+	dst->eth_index = FIELD_GET(NSP_ETH_PORT_INDEX, port);
+	dst->index = index;
+	dst->nbi = index / NSP_ETH_NBI_PORT_COUNT;
+	dst->base = index % NSP_ETH_NBI_PORT_COUNT;
+	dst->lanes = FIELD_GET(NSP_ETH_PORT_LANES, port);
+
+	dst->enabled = FIELD_GET(NSP_ETH_STATE_ENABLED, state);
+	dst->tx_enabled = FIELD_GET(NSP_ETH_STATE_TX_ENABLED, state);
+	dst->rx_enabled = FIELD_GET(NSP_ETH_STATE_RX_ENABLED, state);
+
+	rate = nfp_eth_rate2speed(FIELD_GET(NSP_ETH_STATE_RATE, state));
+	dst->speed = dst->lanes * rate;
+
+	dst->interface = FIELD_GET(NSP_ETH_STATE_INTERFACE, state);
+	dst->media = FIELD_GET(NSP_ETH_STATE_MEDIA, state);
+
+	nfp_eth_copy_mac_reverse(dst->mac_addr, src->mac_addr);
+
+	dst->label_port = FIELD_GET(NSP_ETH_PORT_PHYLABEL, port);
+	dst->label_subport = FIELD_GET(NSP_ETH_PORT_LABEL, port);
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 17)
+		return;
+
+	dst->override_changed = FIELD_GET(NSP_ETH_STATE_OVRD_CHNG, state);
+	dst->aneg = FIELD_GET(NSP_ETH_STATE_ANEG, state);
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 22)
+		return;
+
+	fec = FIELD_GET(NSP_ETH_PORT_FEC_SUPP_BASER, port);
+	dst->fec_modes_supported |= fec << NFP_FEC_BASER_BIT;
+	fec = FIELD_GET(NSP_ETH_PORT_FEC_SUPP_RS, port);
+	dst->fec_modes_supported |= fec << NFP_FEC_REED_SOLOMON_BIT;
+	if (dst->fec_modes_supported)
+		dst->fec_modes_supported |= NFP_FEC_AUTO | NFP_FEC_DISABLED;
+
+	dst->fec = 1 << FIELD_GET(NSP_ETH_STATE_FEC, state);
+}
+
+static void
+nfp_eth_calc_port_geometry(struct nfp_eth_table *table)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < table->count; i++) {
+		table->max_index = RTE_MAX(table->max_index,
+					   table->ports[i].index);
+
+		for (j = 0; j < table->count; j++) {
+			if (table->ports[i].label_port !=
+			    table->ports[j].label_port)
+				continue;
+			table->ports[i].port_lanes += table->ports[j].lanes;
+
+			if (i == j)
+				continue;
+			if (table->ports[i].label_subport ==
+			    table->ports[j].label_subport)
+				printf("Port %d subport %d is a duplicate\n",
+					 table->ports[i].label_port,
+					 table->ports[i].label_subport);
+
+			table->ports[i].is_split = 1;
+		}
+	}
+}
+
+static void
+nfp_eth_calc_port_type(struct nfp_eth_table_port *entry)
+{
+	if (entry->interface == NFP_INTERFACE_NONE) {
+		entry->port_type = PORT_NONE;
+		return;
+	}
+
+	if (entry->media == NFP_MEDIA_FIBRE)
+		entry->port_type = PORT_FIBRE;
+	else
+		entry->port_type = PORT_DA;
+}
+
+static struct nfp_eth_table *
+__nfp_eth_read_ports(struct nfp_nsp *nsp)
+{
+	union eth_table_entry *entries;
+	struct nfp_eth_table *table;
+	uint32_t table_sz;
+	int i, j, ret, cnt = 0;
+
+	entries = malloc(NSP_ETH_TABLE_SIZE);
+	if (!entries)
+		return NULL;
+
+	memset(entries, 0, NSP_ETH_TABLE_SIZE);
+	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
+	if (ret < 0) {
+		printf("reading port table failed %d\n", ret);
+		goto err;
+	}
+
+	for (i = 0; i < NSP_ETH_MAX_COUNT; i++)
+		if (entries[i].port & NSP_ETH_PORT_LANES_MASK)
+			cnt++;
+
+	/* Some versions of flash will give us 0 instead of port count. For
+	 * those that give a port count, verify it against the value calculated
+	 * above.
+	 */
+	if (ret && ret != cnt) {
+		printf("table entry count (%d) unmatch entries present (%d)\n",
+		       ret, cnt);
+		goto err;
+	}
+
+	table_sz = sizeof(*table) + sizeof(struct nfp_eth_table_port) * cnt;
+	table = malloc(table_sz);
+	if (!table)
+		goto err;
+
+	memset(table, 0, table_sz);
+	table->count = cnt;
+	for (i = 0, j = 0; i < NSP_ETH_MAX_COUNT; i++)
+		if (entries[i].port & NSP_ETH_PORT_LANES_MASK)
+			nfp_eth_port_translate(nsp, &entries[i], i,
+					       &table->ports[j++]);
+
+	nfp_eth_calc_port_geometry(table);
+	for (i = 0; i < (int)table->count; i++)
+		nfp_eth_calc_port_type(&table->ports[i]);
+
+	free(entries);
+
+	return table;
+
+err:
+	free(entries);
+	return NULL;
+}
+
+/*
+ * nfp_eth_read_ports() - retrieve port information
+ * @cpp:	NFP CPP handle
+ *
+ * Read the port information from the device.  Returned structure should
+ * be freed with kfree() once no longer needed.
+ *
+ * Return: populated ETH table or NULL on error.
+ */
+struct nfp_eth_table *
+nfp_eth_read_ports(struct nfp_cpp *cpp)
+{
+	struct nfp_eth_table *ret;
+	struct nfp_nsp *nsp;
+
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp)
+		return NULL;
+
+	ret = __nfp_eth_read_ports(nsp);
+	nfp_nsp_close(nsp);
+
+	return ret;
+}
+
+struct nfp_nsp *
+nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
+{
+	union eth_table_entry *entries;
+	struct nfp_nsp *nsp;
+	int ret;
+
+	entries = malloc(NSP_ETH_TABLE_SIZE);
+	if (!entries)
+		return NULL;
+
+	memset(entries, 0, NSP_ETH_TABLE_SIZE);
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp) {
+		free(entries);
+		return nsp;
+	}
+
+	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
+	if (ret < 0) {
+		printf("reading port table failed %d\n", ret);
+		goto err;
+	}
+
+	if (!(entries[idx].port & NSP_ETH_PORT_LANES_MASK)) {
+		printf("trying to set port state on disabled port %d\n", idx);
+		goto err;
+	}
+
+	nfp_nsp_config_set_state(nsp, entries, idx);
+	return nsp;
+
+err:
+	nfp_nsp_close(nsp);
+	free(entries);
+	return NULL;
+}
+
+void
+nfp_eth_config_cleanup_end(struct nfp_nsp *nsp)
+{
+	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+
+	nfp_nsp_config_set_modified(nsp, 0);
+	nfp_nsp_config_clear_state(nsp);
+	nfp_nsp_close(nsp);
+	free(entries);
+}
+
+/*
+ * nfp_eth_config_commit_end() - perform recorded configuration changes
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ *
+ * Perform the configuration which was requested with __nfp_eth_set_*()
+ * helpers and recorded in @nsp state.  If device was already configured
+ * as requested or no __nfp_eth_set_*() operations were made no NSP command
+ * will be performed.
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_config_commit_end(struct nfp_nsp *nsp)
+{
+	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+	int ret = 1;
+
+	if (nfp_nsp_config_modified(nsp)) {
+		ret = nfp_nsp_write_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
+		ret = ret < 0 ? ret : 0;
+	}
+
+	nfp_eth_config_cleanup_end(nsp);
+
+	return ret;
+}
+
+/*
+ * nfp_eth_set_mod_enable() - set PHY module enable control bit
+ * @cpp:	NFP CPP handle
+ * @idx:	NFP chip-wide port index
+ * @enable:	Desired state
+ *
+ * Enable or disable PHY module (this usually means setting the TX lanes
+ * disable bits).
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, int enable)
+{
+	union eth_table_entry *entries;
+	struct nfp_nsp *nsp;
+	uint64_t reg;
+
+	nsp = nfp_eth_config_start(cpp, idx);
+	if (!nsp)
+		return -1;
+
+	entries = nfp_nsp_config_entries(nsp);
+
+	/* Check if we are already in requested state */
+	reg = rte_le_to_cpu_64(entries[idx].state);
+	if (enable != (int)FIELD_GET(NSP_ETH_CTRL_ENABLED, reg)) {
+		reg = rte_le_to_cpu_64(entries[idx].control);
+		reg &= ~NSP_ETH_CTRL_ENABLED;
+		reg |= FIELD_PREP(NSP_ETH_CTRL_ENABLED, enable);
+		entries[idx].control = rte_cpu_to_le_64(reg);
+
+		nfp_nsp_config_set_modified(nsp, 1);
+	}
+
+	return nfp_eth_config_commit_end(nsp);
+}
+
+/*
+ * nfp_eth_set_configured() - set PHY module configured control bit
+ * @cpp:	NFP CPP handle
+ * @idx:	NFP chip-wide port index
+ * @configed:	Desired state
+ *
+ * Set the ifup/ifdown state on the PHY.
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, int configed)
+{
+	union eth_table_entry *entries;
+	struct nfp_nsp *nsp;
+	uint64_t reg;
+
+	nsp = nfp_eth_config_start(cpp, idx);
+	if (!nsp)
+		return -EIO;
+
+	/*
+	 * Older ABI versions did support this feature, however this has only
+	 * been reliable since ABI 20.
+	 */
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 20) {
+		nfp_eth_config_cleanup_end(nsp);
+		return -EOPNOTSUPP;
+	}
+
+	entries = nfp_nsp_config_entries(nsp);
+
+	/* Check if we are already in requested state */
+	reg = rte_le_to_cpu_64(entries[idx].state);
+	if (configed != (int)FIELD_GET(NSP_ETH_STATE_CONFIGURED, reg)) {
+		reg = rte_le_to_cpu_64(entries[idx].control);
+		reg &= ~NSP_ETH_CTRL_CONFIGURED;
+		reg |= FIELD_PREP(NSP_ETH_CTRL_CONFIGURED, configed);
+		entries[idx].control = rte_cpu_to_le_64(reg);
+
+		nfp_nsp_config_set_modified(nsp, 1);
+	}
+
+	return nfp_eth_config_commit_end(nsp);
+}
+
+static int
+nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
+		       const uint64_t mask, const unsigned int shift,
+		       unsigned int val, const uint64_t ctrl_bit)
+{
+	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+	unsigned int idx = nfp_nsp_config_idx(nsp);
+	uint64_t reg;
+
+	/*
+	 * Note: set features were added in ABI 0.14 but the error
+	 *	 codes were initially not populated correctly.
+	 */
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 17) {
+		printf("set operations not supported, please update flash\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* Check if we are already in requested state */
+	reg = rte_le_to_cpu_64(entries[idx].raw[raw_idx]);
+	if (val == (reg & mask) >> shift)
+		return 0;
+
+	reg &= ~mask;
+	reg |= (val << shift) & mask;
+	entries[idx].raw[raw_idx] = rte_cpu_to_le_64(reg);
+
+	entries[idx].control |= rte_cpu_to_le_64(ctrl_bit);
+
+	nfp_nsp_config_set_modified(nsp, 1);
+
+	return 0;
+}
+
+#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit)	\
+	(__extension__ ({ \
+		typeof(mask) _x = (mask); \
+		nfp_eth_set_bit_config(nsp, raw_idx, _x, __bf_shf(_x), \
+				       val, ctrl_bit);			\
+	}))
+
+/*
+ * __nfp_eth_set_aneg() - set PHY autonegotiation control bit
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @mode:	Desired autonegotiation mode
+ *
+ * Allow/disallow PHY module to advertise/perform autonegotiation.
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+int
+__nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode)
+{
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
+				      NSP_ETH_STATE_ANEG, mode,
+				      NSP_ETH_CTRL_SET_ANEG);
+}
+
+/*
+ * __nfp_eth_set_fec() - set PHY forward error correction control bit
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @mode:	Desired fec mode
+ *
+ * Set the PHY module forward error correction mode.
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+static int
+__nfp_eth_set_fec(struct nfp_nsp *nsp, enum nfp_eth_fec mode)
+{
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
+				      NSP_ETH_STATE_FEC, mode,
+				      NSP_ETH_CTRL_SET_FEC);
+}
+
+/*
+ * nfp_eth_set_fec() - set PHY forward error correction control mode
+ * @cpp:	NFP CPP handle
+ * @idx:	NFP chip-wide port index
+ * @mode:	Desired fec mode
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_set_fec(struct nfp_cpp *cpp, unsigned int idx, enum nfp_eth_fec mode)
+{
+	struct nfp_nsp *nsp;
+	int err;
+
+	nsp = nfp_eth_config_start(cpp, idx);
+	if (!nsp)
+		return -EIO;
+
+	err = __nfp_eth_set_fec(nsp, mode);
+	if (err) {
+		nfp_eth_config_cleanup_end(nsp);
+		return err;
+	}
+
+	return nfp_eth_config_commit_end(nsp);
+}
+
+/*
+ * __nfp_eth_set_speed() - set interface speed/rate
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @speed:	Desired speed (per lane)
+ *
+ * Set lane speed.  Provided @speed value should be subport speed divided
+ * by number of lanes this subport is spanning (i.e. 10000 for 40G, 25000 for
+ * 50G, etc.)
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+int
+__nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
+{
+	enum nfp_eth_rate rate;
+
+	rate = nfp_eth_speed2rate(speed);
+	if (rate == RATE_INVALID) {
+		printf("could not find matching lane rate for speed %u\n",
+			 speed);
+		return -EINVAL;
+	}
+
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
+				      NSP_ETH_STATE_RATE, rate,
+				      NSP_ETH_CTRL_SET_RATE);
+}
+
+/*
+ * __nfp_eth_set_split() - set interface lane split
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @lanes:	Desired lanes per port
+ *
+ * Set number of lanes in the port.
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+int
+__nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes)
+{
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
+				      lanes, NSP_ETH_CTRL_SET_LANES);
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_resource.c b/drivers/net/nfp/nfpcore/nfp_resource.c
new file mode 100644
index 0000000..e4f1efe
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_resource.c
@@ -0,0 +1,291 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <time.h>
+#include <zlib.h>
+#include <endian.h>
+
+#include "nfp_cpp.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp_resource.h"
+#include "nfp_crc.h"
+
+#define NFP_RESOURCE_TBL_TARGET		NFP_CPP_TARGET_MU
+#define NFP_RESOURCE_TBL_BASE		0x8100000000ULL
+
+/* NFP Resource Table self-identifier */
+#define NFP_RESOURCE_TBL_NAME		"nfp.res"
+#define NFP_RESOURCE_TBL_KEY		0x00000000 /* Special key for entry 0 */
+
+#define NFP_RESOURCE_ENTRY_NAME_SZ	8
+
+/*
+ * struct nfp_resource_entry - Resource table entry
+ * @owner:		NFP CPP Lock, interface owner
+ * @key:		NFP CPP Lock, posix_crc32(name, 8)
+ * @region:		Memory region descriptor
+ * @name:		ASCII, zero padded name
+ * @reserved
+ * @cpp_action:		CPP Action
+ * @cpp_token:		CPP Token
+ * @cpp_target:		CPP Target ID
+ * @page_offset:	256-byte page offset into target's CPP address
+ * @page_size:		size, in 256-byte pages
+ */
+struct nfp_resource_entry {
+	struct nfp_resource_entry_mutex {
+		uint32_t owner;
+		uint32_t key;
+	} mutex;
+	struct nfp_resource_entry_region {
+		uint8_t  name[NFP_RESOURCE_ENTRY_NAME_SZ];
+		uint8_t  reserved[5];
+		uint8_t  cpp_action;
+		uint8_t  cpp_token;
+		uint8_t  cpp_target;
+		uint32_t page_offset;
+		uint32_t page_size;
+	} region;
+};
+
+#define NFP_RESOURCE_TBL_SIZE		4096
+#define NFP_RESOURCE_TBL_ENTRIES	(int)(NFP_RESOURCE_TBL_SIZE /	\
+					 sizeof(struct nfp_resource_entry))
+
+struct nfp_resource {
+	char name[NFP_RESOURCE_ENTRY_NAME_SZ + 1];
+	uint32_t cpp_id;
+	uint64_t addr;
+	uint64_t size;
+	struct nfp_cpp_mutex *mutex;
+};
+
+static int
+nfp_cpp_resource_find(struct nfp_cpp *cpp, struct nfp_resource *res)
+{
+	char name_pad[NFP_RESOURCE_ENTRY_NAME_SZ] = {};
+	struct nfp_resource_entry entry;
+	uint32_t cpp_id, key;
+	int ret, i;
+
+	cpp_id = NFP_CPP_ID(NFP_RESOURCE_TBL_TARGET, 3, 0);  /* Atomic read */
+
+	memset(name_pad, 0, NFP_RESOURCE_ENTRY_NAME_SZ);
+	strncpy(name_pad, res->name, sizeof(name_pad));
+
+	/* Search for a matching entry */
+	if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) {
+		printf("Grabbing device lock not supported\n");
+		return -EOPNOTSUPP;
+	}
+	key = nfp_crc32_posix(name_pad, sizeof(name_pad));
+
+	for (i = 0; i < NFP_RESOURCE_TBL_ENTRIES; i++) {
+		uint64_t addr = NFP_RESOURCE_TBL_BASE +
+			sizeof(struct nfp_resource_entry) * i;
+
+		ret = nfp_cpp_read(cpp, cpp_id, addr, &entry, sizeof(entry));
+		if (ret != sizeof(entry))
+			return -EIO;
+
+		if (entry.mutex.key != key)
+			continue;
+
+		/* Found key! */
+		res->mutex =
+			nfp_cpp_mutex_alloc(cpp,
+					    NFP_RESOURCE_TBL_TARGET, addr, key);
+		res->cpp_id = NFP_CPP_ID(entry.region.cpp_target,
+					 entry.region.cpp_action,
+					 entry.region.cpp_token);
+		res->addr = ((uint64_t)entry.region.page_offset) << 8;
+		res->size = (uint64_t)entry.region.page_size << 8;
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+static int
+nfp_resource_try_acquire(struct nfp_cpp *cpp, struct nfp_resource *res,
+			 struct nfp_cpp_mutex *dev_mutex)
+{
+	int err;
+
+	if (nfp_cpp_mutex_lock(dev_mutex))
+		return -EINVAL;
+
+	err = nfp_cpp_resource_find(cpp, res);
+	if (err)
+		goto err_unlock_dev;
+
+	err = nfp_cpp_mutex_trylock(res->mutex);
+	if (err)
+		goto err_res_mutex_free;
+
+	nfp_cpp_mutex_unlock(dev_mutex);
+
+	return 0;
+
+err_res_mutex_free:
+	nfp_cpp_mutex_free(res->mutex);
+err_unlock_dev:
+	nfp_cpp_mutex_unlock(dev_mutex);
+
+	return err;
+}
+
+/*
+ * nfp_resource_acquire() - Acquire a resource handle
+ * @cpp:	NFP CPP handle
+ * @name:	Name of the resource
+ *
+ * NOTE: This function locks the acquired resource
+ *
+ * Return: NFP Resource handle, or ERR_PTR()
+ */
+struct nfp_resource *
+nfp_resource_acquire(struct nfp_cpp *cpp, const char *name)
+{
+	struct nfp_cpp_mutex *dev_mutex;
+	struct nfp_resource *res;
+	int err;
+	struct timespec wait;
+	int count;
+
+	res = malloc(sizeof(*res));
+	if (!res)
+		return NULL;
+
+	memset(res, 0, sizeof(*res));
+
+	strncpy(res->name, name, NFP_RESOURCE_ENTRY_NAME_SZ);
+
+	dev_mutex = nfp_cpp_mutex_alloc(cpp, NFP_RESOURCE_TBL_TARGET,
+					NFP_RESOURCE_TBL_BASE,
+					NFP_RESOURCE_TBL_KEY);
+	if (!dev_mutex) {
+		free(res);
+		return NULL;
+	}
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 1000000;
+	count = 0;
+
+	for (;;) {
+		err = nfp_resource_try_acquire(cpp, res, dev_mutex);
+		if (!err)
+			break;
+		if (err != -EBUSY)
+			goto err_free;
+
+		if (count++ > 1000) {
+			printf("Error: resource %s timed out\n", name);
+			err = -EBUSY;
+			goto err_free;
+		}
+
+		nanosleep(&wait, NULL);
+	}
+
+	nfp_cpp_mutex_free(dev_mutex);
+
+	return res;
+
+err_free:
+	nfp_cpp_mutex_free(dev_mutex);
+	free(res);
+	return NULL;
+}
+
+/*
+ * nfp_resource_release() - Release a NFP Resource handle
+ * @res:	NFP Resource handle
+ *
+ * NOTE: This function implictly unlocks the resource handle
+ */
+void
+nfp_resource_release(struct nfp_resource *res)
+{
+	nfp_cpp_mutex_unlock(res->mutex);
+	nfp_cpp_mutex_free(res->mutex);
+	free(res);
+}
+
+/*
+ * nfp_resource_cpp_id() - Return the cpp_id of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: NFP CPP ID
+ */
+uint32_t
+nfp_resource_cpp_id(const struct nfp_resource *res)
+{
+	return res->cpp_id;
+}
+
+/*
+ * nfp_resource_name() - Return the name of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: const char pointer to the name of the resource
+ */
+const char
+*nfp_resource_name(const struct nfp_resource *res)
+{
+	return res->name;
+}
+
+/*
+ * nfp_resource_address() - Return the address of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: Address of the resource
+ */
+uint64_t
+nfp_resource_address(const struct nfp_resource *res)
+{
+	return res->addr;
+}
+
+/*
+ * nfp_resource_size() - Return the size in bytes of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: Size of the resource in bytes
+ */
+uint64_t
+nfp_resource_size(const struct nfp_resource *res)
+{
+	return res->size;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_resource.h b/drivers/net/nfp/nfpcore/nfp_resource.h
new file mode 100644
index 0000000..2f71d31
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_resource.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NFP_RESOURCE_H
+#define NFP_RESOURCE_H
+
+#include "nfp_cpp.h"
+
+#define NFP_RESOURCE_NFP_NFFW           "nfp.nffw"
+#define NFP_RESOURCE_NFP_HWINFO         "nfp.info"
+#define NFP_RESOURCE_NSP		"nfp.sp"
+
+/**
+ * Opaque handle to a NFP Resource
+ */
+struct nfp_resource;
+
+struct nfp_resource *nfp_resource_acquire(struct nfp_cpp *cpp,
+					  const char *name);
+
+/**
+ * Release a NFP Resource, and free the handle
+ * @param[in]   res     NFP Resource handle
+ */
+void nfp_resource_release(struct nfp_resource *res);
+
+/**
+ * Return the CPP ID of a NFP Resource
+ * @param[in]   res     NFP Resource handle
+ * @return      CPP ID of the NFP Resource
+ */
+uint32_t nfp_resource_cpp_id(const struct nfp_resource *res);
+
+/**
+ * Return the name of a NFP Resource
+ * @param[in]   res     NFP Resource handle
+ * @return      Name of the NFP Resource
+ */
+const char *nfp_resource_name(const struct nfp_resource *res);
+
+/**
+ * Return the target address of a NFP Resource
+ * @param[in]   res     NFP Resource handle
+ * @return      Address of the NFP Resource
+ */
+uint64_t nfp_resource_address(const struct nfp_resource *res);
+
+uint64_t nfp_resource_size(const struct nfp_resource *res);
+
+#endif /* NFP_RESOURCE_H */
diff --git a/drivers/net/nfp/nfpcore/nfp_rtsym.c b/drivers/net/nfp/nfpcore/nfp_rtsym.c
new file mode 100644
index 0000000..1622395
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_rtsym.c
@@ -0,0 +1,353 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * nfp_rtsym.c
+ * Interface for accessing run-time symbol table
+ */
+
+#include <stdio.h>
+#include <rte_byteorder.h>
+#include "nfp_cpp.h"
+#include "nfp_mip.h"
+#include "nfp_rtsym.h"
+#include "nfp6000/nfp6000.h"
+
+/* These need to match the linker */
+#define SYM_TGT_LMEM		0
+#define SYM_TGT_EMU_CACHE	0x17
+
+struct nfp_rtsym_entry {
+	uint8_t	type;
+	uint8_t	target;
+	uint8_t	island;
+	uint8_t	addr_hi;
+	uint32_t addr_lo;
+	uint16_t name;
+	uint8_t	menum;
+	uint8_t	size_hi;
+	uint32_t size_lo;
+};
+
+struct nfp_rtsym_table {
+	struct nfp_cpp *cpp;
+	int num;
+	char *strtab;
+	struct nfp_rtsym symtab[];
+};
+
+static int
+nfp_meid(uint8_t island_id, uint8_t menum)
+{
+	return (island_id & 0x3F) == island_id && menum < 12 ?
+		(island_id << 4) | (menum + 4) : -1;
+}
+
+static void
+nfp_rtsym_sw_entry_init(struct nfp_rtsym_table *cache, uint32_t strtab_size,
+			struct nfp_rtsym *sw, struct nfp_rtsym_entry *fw)
+{
+	sw->type = fw->type;
+	sw->name = cache->strtab + rte_le_to_cpu_16(fw->name) % strtab_size;
+	sw->addr = ((uint64_t)fw->addr_hi << 32) |
+		   rte_le_to_cpu_32(fw->addr_lo);
+	sw->size = ((uint64_t)fw->size_hi << 32) |
+		   rte_le_to_cpu_32(fw->size_lo);
+
+#ifdef DEBUG
+	printf("rtsym_entry_init\n");
+	printf("\tname=%s, addr=%" PRIx64 ", size=%" PRIu64 ",target=%d\n",
+		sw->name, sw->addr, sw->size, sw->target);
+#endif
+	switch (fw->target) {
+	case SYM_TGT_LMEM:
+		sw->target = NFP_RTSYM_TARGET_LMEM;
+		break;
+	case SYM_TGT_EMU_CACHE:
+		sw->target = NFP_RTSYM_TARGET_EMU_CACHE;
+		break;
+	default:
+		sw->target = fw->target;
+		break;
+	}
+
+	if (fw->menum != 0xff)
+		sw->domain = nfp_meid(fw->island, fw->menum);
+	else if (fw->island != 0xff)
+		sw->domain = fw->island;
+	else
+		sw->domain = -1;
+}
+
+struct nfp_rtsym_table *
+nfp_rtsym_table_read(struct nfp_cpp *cpp)
+{
+	struct nfp_rtsym_table *rtbl;
+	struct nfp_mip *mip;
+
+	mip = nfp_mip_open(cpp);
+	rtbl = __nfp_rtsym_table_read(cpp, mip);
+	nfp_mip_close(mip);
+
+	return rtbl;
+}
+
+/*
+ * This looks more complex than it should be. But we need to get the type for
+ * the ~ right in round_down (it needs to be as wide as the result!), and we
+ * want to evaluate the macro arguments just once each.
+ */
+#define __round_mask(x, y) ((__typeof__(x))((y) - 1))
+
+#define round_up(x, y) \
+	(__extension__ ({ \
+		typeof(x) _x = (x); \
+		((((_x) - 1) | __round_mask(_x, y)) + 1); \
+	}))
+
+#define round_down(x, y) \
+	(__extension__ ({ \
+		typeof(x) _x = (x); \
+		((_x) & ~__round_mask(_x, y)); \
+	}))
+
+struct nfp_rtsym_table *
+__nfp_rtsym_table_read(struct nfp_cpp *cpp, const struct nfp_mip *mip)
+{
+	uint32_t strtab_addr, symtab_addr, strtab_size, symtab_size;
+	struct nfp_rtsym_entry *rtsymtab;
+	struct nfp_rtsym_table *cache;
+	const uint32_t dram =
+		NFP_CPP_ID(NFP_CPP_TARGET_MU, NFP_CPP_ACTION_RW, 0) |
+		NFP_ISL_EMEM0;
+	int err, n, size;
+
+	if (!mip)
+		return NULL;
+
+	nfp_mip_strtab(mip, &strtab_addr, &strtab_size);
+	nfp_mip_symtab(mip, &symtab_addr, &symtab_size);
+
+	if (!symtab_size || !strtab_size || symtab_size % sizeof(*rtsymtab))
+		return NULL;
+
+	/* Align to 64 bits */
+	symtab_size = round_up(symtab_size, 8);
+	strtab_size = round_up(strtab_size, 8);
+
+	rtsymtab = malloc(symtab_size);
+	if (!rtsymtab)
+		return NULL;
+
+	size = sizeof(*cache);
+	size += symtab_size / sizeof(*rtsymtab) * sizeof(struct nfp_rtsym);
+	size +=	strtab_size + 1;
+	cache = malloc(size);
+	if (!cache)
+		goto exit_free_rtsym_raw;
+
+	cache->cpp = cpp;
+	cache->num = symtab_size / sizeof(*rtsymtab);
+	cache->strtab = (void *)&cache->symtab[cache->num];
+
+	err = nfp_cpp_read(cpp, dram, symtab_addr, rtsymtab, symtab_size);
+	if (err != (int)symtab_size)
+		goto exit_free_cache;
+
+	err = nfp_cpp_read(cpp, dram, strtab_addr, cache->strtab, strtab_size);
+	if (err != (int)strtab_size)
+		goto exit_free_cache;
+	cache->strtab[strtab_size] = '\0';
+
+	for (n = 0; n < cache->num; n++)
+		nfp_rtsym_sw_entry_init(cache, strtab_size,
+					&cache->symtab[n], &rtsymtab[n]);
+
+	free(rtsymtab);
+
+	return cache;
+
+exit_free_cache:
+	free(cache);
+exit_free_rtsym_raw:
+	free(rtsymtab);
+	return NULL;
+}
+
+/*
+ * nfp_rtsym_count() - Get the number of RTSYM descriptors
+ * @rtbl:	NFP RTsym table
+ *
+ * Return: Number of RTSYM descriptors
+ */
+int
+nfp_rtsym_count(struct nfp_rtsym_table *rtbl)
+{
+	if (!rtbl)
+		return -EINVAL;
+
+	return rtbl->num;
+}
+
+/*
+ * nfp_rtsym_get() - Get the Nth RTSYM descriptor
+ * @rtbl:	NFP RTsym table
+ * @idx:	Index (0-based) of the RTSYM descriptor
+ *
+ * Return: const pointer to a struct nfp_rtsym descriptor, or NULL
+ */
+const struct nfp_rtsym *
+nfp_rtsym_get(struct nfp_rtsym_table *rtbl, int idx)
+{
+	if (!rtbl)
+		return NULL;
+
+	if (idx >= rtbl->num)
+		return NULL;
+
+	return &rtbl->symtab[idx];
+}
+
+/*
+ * nfp_rtsym_lookup() - Return the RTSYM descriptor for a symbol name
+ * @rtbl:	NFP RTsym table
+ * @name:	Symbol name
+ *
+ * Return: const pointer to a struct nfp_rtsym descriptor, or NULL
+ */
+const struct nfp_rtsym *
+nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name)
+{
+	int n;
+
+	if (!rtbl)
+		return NULL;
+
+	for (n = 0; n < rtbl->num; n++)
+		if (strcmp(name, rtbl->symtab[n].name) == 0)
+			return &rtbl->symtab[n];
+
+	return NULL;
+}
+
+/*
+ * nfp_rtsym_read_le() - Read a simple unsigned scalar value from symbol
+ * @rtbl:	NFP RTsym table
+ * @name:	Symbol name
+ * @error:	Poniter to error code (optional)
+ *
+ * Lookup a symbol, map, read it and return it's value. Value of the symbol
+ * will be interpreted as a simple little-endian unsigned value. Symbol can
+ * be 4 or 8 bytes in size.
+ *
+ * Return: value read, on error sets the error and returns ~0ULL.
+ */
+uint64_t
+nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name, int *error)
+{
+	const struct nfp_rtsym *sym;
+	uint32_t val32, id;
+	uint64_t val;
+	int err;
+
+	sym = nfp_rtsym_lookup(rtbl, name);
+	if (!sym) {
+		err = -ENOENT;
+		goto exit;
+	}
+
+	id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
+
+#ifdef DEBUG
+	printf("Reading symbol %s with size %" PRIu64 " at %" PRIx64 "\n",
+		name, sym->size, sym->addr);
+#endif
+	switch (sym->size) {
+	case 4:
+		err = nfp_cpp_readl(rtbl->cpp, id, sym->addr, &val32);
+		val = val32;
+		break;
+	case 8:
+		err = nfp_cpp_readq(rtbl->cpp, id, sym->addr, &val);
+		break;
+	default:
+		printf("rtsym '%s' unsupported size: %" PRId64 "\n",
+			name, sym->size);
+		err = -EINVAL;
+		break;
+	}
+
+	if (err)
+		err = -EIO;
+exit:
+	if (error)
+		*error = err;
+
+	if (err)
+		return ~0ULL;
+
+	return val;
+}
+
+uint8_t *
+nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
+	      unsigned int min_size, struct nfp_cpp_area **area)
+{
+	const struct nfp_rtsym *sym;
+	uint8_t *mem;
+
+#ifdef DEBUG
+	printf("mapping symbol %s\n", name);
+#endif
+	sym = nfp_rtsym_lookup(rtbl, name);
+	if (!sym) {
+		printf("symbol lookup fails for %s\n", name);
+		return NULL;
+	}
+
+	if (sym->size < min_size) {
+		printf("Symbol %s too small (%" PRIu64 " < %u)\n", name,
+			sym->size, min_size);
+		return NULL;
+	}
+
+	mem = nfp_cpp_map_area(rtbl->cpp, sym->domain, sym->target, sym->addr,
+			       sym->size, area);
+	if (!mem) {
+		printf("Failed to map symbol %s\n", name);
+		return NULL;
+	}
+#ifdef DEBUG
+	printf("symbol %s with address %p\n", name, mem);
+#endif
+
+	return mem;
+}
diff --git a/drivers/net/nfp/nfpcore/nfp_rtsym.h b/drivers/net/nfp/nfpcore/nfp_rtsym.h
new file mode 100644
index 0000000..7941ecc
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_rtsym.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_RTSYM_H__
+#define __NFP_RTSYM_H__
+
+#define NFP_RTSYM_TYPE_NONE             0
+#define NFP_RTSYM_TYPE_OBJECT           1
+#define NFP_RTSYM_TYPE_FUNCTION         2
+#define NFP_RTSYM_TYPE_ABS              3
+
+#define NFP_RTSYM_TARGET_NONE           0
+#define NFP_RTSYM_TARGET_LMEM           -1
+#define NFP_RTSYM_TARGET_EMU_CACHE      -7
+
+/*
+ * Structure describing a run-time NFP symbol.
+ *
+ * The memory target of the symbol is generally the CPP target number and can be
+ * used directly by the nfp_cpp API calls.  However, in some cases (i.e., for
+ * local memory or control store) the target is encoded using a negative number.
+ *
+ * When the target type can not be used to fully describe the location of a
+ * symbol the domain field is used to further specify the location (i.e., the
+ * specific ME or island number).
+ *
+ * For ME target resources, 'domain' is an MEID.
+ * For Island target resources, 'domain' is an island ID, with the one exception
+ * of "sram" symbols for backward compatibility, which are viewed as global.
+ */
+struct nfp_rtsym {
+	const char *name;
+	uint64_t addr;
+	uint64_t size;
+	int type;
+	int target;
+	int domain;
+};
+
+struct nfp_rtsym_table;
+
+struct nfp_rtsym_table *nfp_rtsym_table_read(struct nfp_cpp *cpp);
+
+struct nfp_rtsym_table *
+__nfp_rtsym_table_read(struct nfp_cpp *cpp, const struct nfp_mip *mip);
+
+int nfp_rtsym_count(struct nfp_rtsym_table *rtbl);
+
+const struct nfp_rtsym *nfp_rtsym_get(struct nfp_rtsym_table *rtbl, int idx);
+
+const struct nfp_rtsym *
+nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name);
+
+uint64_t nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
+			   int *error);
+uint8_t *
+nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
+	      unsigned int min_size, struct nfp_cpp_area **area);
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_target.h b/drivers/net/nfp/nfpcore/nfp_target.h
new file mode 100644
index 0000000..3b8c1db
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_target.h
@@ -0,0 +1,605 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NFP_TARGET_H
+#define NFP_TARGET_H
+
+#include "nfp-common/nfp_resid.h"
+#include "nfp-common/nfp_cppat.h"
+#include "nfp-common/nfp_platform.h"
+#include "nfp_cpp.h"
+
+#define P32 1
+#define P64 2
+
+#define PUSHPULL(_pull, _push) (((_pull) << 4) | ((_push) << 0))
+
+#ifndef NFP_ERRNO
+#include <errno.h>
+#define NFP_ERRNO(x)    (errno = (x), -1)
+#endif
+
+static inline int
+pushpull_width(int pp)
+{
+	pp &= 0xf;
+
+	if (pp == 0)
+		return NFP_ERRNO(EINVAL);
+	return (2 << pp);
+}
+
+#define PUSH_WIDTH(_pushpull)      pushpull_width((_pushpull) >> 0)
+#define PULL_WIDTH(_pushpull)      pushpull_width((_pushpull) >> 4)
+
+static inline int
+target_rw(uint32_t cpp_id, int pp, int start, int len)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < start || island > (start + len)))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0):
+		return PUSHPULL(0, pp);
+	case NFP_CPP_ID(0, 1, 0):
+		return PUSHPULL(pp, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(pp, pp);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_dma(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiDma */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* WriteNbiDma */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_stats(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiStats */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* WriteNbiStats */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_tm(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiTM */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0):  /* WriteNbiTM */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_ppc(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiPreclassifier */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* WriteNbiPreclassifier */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi(uint32_t cpp_id, uint64_t address)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+	uint64_t rel_addr = address & 0x3fFFFF;
+
+	if (island && (island < 8 || island > 9))
+		return NFP_ERRNO(EINVAL);
+
+	if (rel_addr < (1 << 20))
+		return nfp6000_nbi_dma(cpp_id);
+	if (rel_addr < (2 << 20))
+		return nfp6000_nbi_stats(cpp_id);
+	if (rel_addr < (3 << 20))
+		return nfp6000_nbi_tm(cpp_id);
+	return nfp6000_nbi_ppc(cpp_id);
+}
+
+/*
+ * This structure ONLY includes items that can be done with a read or write of
+ * 32-bit or 64-bit words. All others are not listed.
+ */
+static inline int
+nfp6000_mu_common(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0): /* read_be/write_be */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 1): /* read_le/write_le */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 2): /* {read/write}_swap_be */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 3): /* {read/write}_swap_le */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, 0, 0): /* read_be */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 0, 1): /* read_le */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 0, 2): /* read_swap_be */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 0, 3): /* read_swap_le */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* write_be */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 1, 1): /* write_le */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 1, 2): /* write_swap_be */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 1, 3): /* write_swap_le */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 3, 0): /* atomic_read */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 2): /* mask_compare_write */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 0): /* atomic_write */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 2): /* atomic_write_imm */
+		return PUSHPULL(0, 0);
+	case NFP_CPP_ID(0, 4, 3): /* swap_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 5, 0): /* set */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 5, 3): /* test_set_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 6, 0): /* clr */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 3): /* test_clr_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 7, 0): /* add */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 7, 3): /* test_add_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 8, 0): /* addsat */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 8, 3): /* test_subsat_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 0): /* sub */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 9, 3): /* test_sub_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 10, 0): /* subsat */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 10, 3): /* test_subsat_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 13, 0): /* microq128_get */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 13, 1): /* microq128_pop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 13, 2): /* microq128_put */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 15, 0): /* xor */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 15, 3): /* test_xor_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 0): /* read32_be */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 1): /* read32_le */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 2): /* read32_swap_be */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 3): /* read32_swap_le */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 31, 0): /* write32_be */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 31, 1): /* write32_le */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 31, 2): /* write32_swap_be */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 31, 3): /* write32_swap_le */
+		return PUSHPULL(P32, 0);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_mu_ctm(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 16, 1): /* packet_read_packet_status */
+		return PUSHPULL(0, P32);
+	default:
+		return nfp6000_mu_common(cpp_id);
+	}
+}
+
+static inline int
+nfp6000_mu_emu(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 18, 0): /* read_queue */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 18, 1): /* read_queue_ring */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 18, 2): /* write_queue */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 18, 3): /* write_queue_ring */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 20, 2): /* journal */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 21, 0): /* get */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 21, 1): /* get_eop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 21, 2): /* get_freely */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 22, 0): /* pop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 22, 1): /* pop_eop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 22, 2): /* pop_freely */
+		return PUSHPULL(0, P32);
+	default:
+		return nfp6000_mu_common(cpp_id);
+	}
+}
+
+static inline int
+nfp6000_mu_imu(uint32_t cpp_id)
+{
+	return nfp6000_mu_common(cpp_id);
+}
+
+static inline int
+nfp6000_mu(uint32_t cpp_id, uint64_t address)
+{
+	int pp;
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island == 0) {
+		if (address < 0x2000000000ULL)
+			pp = nfp6000_mu_ctm(cpp_id);
+		else if (address < 0x8000000000ULL)
+			pp = nfp6000_mu_emu(cpp_id);
+		else if (address < 0x9800000000ULL)
+			pp = nfp6000_mu_ctm(cpp_id);
+		else if (address < 0x9C00000000ULL)
+			pp = nfp6000_mu_emu(cpp_id);
+		else if (address < 0xA000000000ULL)
+			pp = nfp6000_mu_imu(cpp_id);
+		else
+			pp = nfp6000_mu_ctm(cpp_id);
+	} else if (island >= 24 && island <= 27) {
+		pp = nfp6000_mu_emu(cpp_id);
+	} else if (island >= 28 && island <= 31) {
+		pp = nfp6000_mu_imu(cpp_id);
+	} else if (island == 1 ||
+		   (island >= 4 && island <= 7) ||
+		   (island >= 12 && island <= 13) ||
+		   (island >= 32 && island <= 47) ||
+		   (island >= 48 && island <= 51)) {
+		pp = nfp6000_mu_ctm(cpp_id);
+	} else {
+		pp = NFP_ERRNO(EINVAL);
+	}
+
+	return pp;
+}
+
+static inline int
+nfp6000_ila(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 48 || island > 51))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 1): /* read_check_error */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 0): /* read_int */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 0): /* write_int */
+		return PUSHPULL(P32, 0);
+	default:
+		return target_rw(cpp_id, P32, 48, 4);
+	}
+}
+
+static inline int
+nfp6000_pci(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 4 || island > 7))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 2, 0):
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 0):
+		return PUSHPULL(P32, 0);
+	default:
+		return target_rw(cpp_id, P32, 4, 4);
+	}
+}
+
+static inline int
+nfp6000_crypto(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 12 || island > 15))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 2, 0):
+		return PUSHPULL(P64, 0);
+	default:
+		return target_rw(cpp_id, P64, 12, 4);
+	}
+}
+
+static inline int
+nfp6000_cap_xpb(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 1 || island > 63))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 1): /* RingGet */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 0, 2): /* Interthread Signal */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 1, 1): /* RingPut */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 1, 2): /* CTNNWr */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 2, 0): /* ReflectRd, signal none */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 1): /* ReflectRd, signal self */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 2): /* ReflectRd, signal remote */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 3): /* ReflectRd, signal both */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 0): /* ReflectWr, signal none */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 3, 1): /* ReflectWr, signal self */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 3, 2): /* ReflectWr, signal remote */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 3, 3): /* ReflectWr, signal both */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 1):
+		return PUSHPULL(P32, P32);
+	default:
+		return target_rw(cpp_id, P32, 1, 63);
+	}
+}
+
+static inline int
+nfp6000_cls(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 1 || island > 63))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 3): /* xor */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 2, 0): /* set */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 2, 1): /* clr */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 0): /* add */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 1): /* add64 */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 0): /* sub */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 1): /* sub64 */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 2): /* subsat */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 8, 2): /* hash_mask */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 8, 3): /* hash_clear */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 9, 0): /* ring_get */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 1): /* ring_pop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 2): /* ring_get_freely */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 3): /* ring_pop_freely */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 10, 0): /* ring_put */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 10, 2): /* ring_journal */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 14, 0): /* reflect_write_sig_local */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 15, 1):  /* reflect_read_sig_local */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 17, 2): /* statistic */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 24, 0): /* ring_read */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 24, 1): /* ring_write */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 25, 0): /* ring_workq_add_thread */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 25, 1): /* ring_workq_add_work */
+		return PUSHPULL(P32, 0);
+	default:
+		return target_rw(cpp_id, P32, 0, 64);
+	}
+}
+
+static inline int
+nfp6000_target_pushpull(uint32_t cpp_id, uint64_t address)
+{
+	switch (NFP_CPP_ID_TARGET_of(cpp_id)) {
+	case NFP6000_CPPTGT_NBI:
+		return nfp6000_nbi(cpp_id, address);
+	case NFP6000_CPPTGT_VQDR:
+		return target_rw(cpp_id, P32, 24, 4);
+	case NFP6000_CPPTGT_ILA:
+		return nfp6000_ila(cpp_id);
+	case NFP6000_CPPTGT_MU:
+		return nfp6000_mu(cpp_id, address);
+	case NFP6000_CPPTGT_PCIE:
+		return nfp6000_pci(cpp_id);
+	case NFP6000_CPPTGT_ARM:
+		if (address < 0x10000)
+			return target_rw(cpp_id, P64, 1, 1);
+		else
+			return target_rw(cpp_id, P32, 1, 1);
+	case NFP6000_CPPTGT_CRYPTO:
+		return nfp6000_crypto(cpp_id);
+	case NFP6000_CPPTGT_CTXPB:
+		return nfp6000_cap_xpb(cpp_id);
+	case NFP6000_CPPTGT_CLS:
+		return nfp6000_cls(cpp_id);
+	case 0:
+		return target_rw(cpp_id, P32, 4, 4);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp_target_pushpull_width(int pp, int write_not_read)
+{
+	if (pp < 0)
+		return pp;
+
+	if (write_not_read)
+		return PULL_WIDTH(pp);
+	else
+		return PUSH_WIDTH(pp);
+}
+
+static inline int
+nfp6000_target_action_width(uint32_t cpp_id, uint64_t address,
+			    int write_not_read)
+{
+	int pp;
+
+	pp = nfp6000_target_pushpull(cpp_id, address);
+
+	return nfp_target_pushpull_width(pp, write_not_read);
+}
+
+static inline int
+nfp_target_action_width(uint32_t model, uint32_t cpp_id, uint64_t address,
+			int write_not_read)
+{
+	if (NFP_CPP_MODEL_IS_6000(model)) {
+		return nfp6000_target_action_width(cpp_id, address,
+						   write_not_read);
+	} else {
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp_target_cpp(uint32_t cpp_island_id, uint64_t cpp_island_address,
+	       uint32_t *cpp_target_id, uint64_t *cpp_target_address,
+	       const uint32_t *imb_table)
+{
+	int err;
+	int island = NFP_CPP_ID_ISLAND_of(cpp_island_id);
+	int target = NFP_CPP_ID_TARGET_of(cpp_island_id);
+	uint32_t imb;
+
+	if (target < 0 || target >= 16)
+		return NFP_ERRNO(EINVAL);
+
+	if (island == 0) {
+		/* Already translated */
+		*cpp_target_id = cpp_island_id;
+		*cpp_target_address = cpp_island_address;
+		return 0;
+	}
+
+	if (!imb_table) {
+		/* CPP + Island only allowed on systems with IMB tables */
+		return NFP_ERRNO(EINVAL);
+	}
+
+	imb = imb_table[target];
+
+	*cpp_target_address = cpp_island_address;
+	err = _nfp6000_cppat_addr_encode(cpp_target_address, island, target,
+					 ((imb >> 13) & 7),
+					 ((imb >> 12) & 1),
+					 ((imb >> 6) & 0x3f),
+					 ((imb >> 0) & 0x3f));
+	if (err == 0) {
+		*cpp_target_id =
+		    NFP_CPP_ID(target, NFP_CPP_ID_ACTION_of(cpp_island_id),
+			       NFP_CPP_ID_TOKEN_of(cpp_island_id));
+	}
+
+	return err;
+}
+
+#endif /* NFP_TARGET_H */
-- 
1.9.1

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build
  2018-03-23 12:58  3% [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes Adrien Mazarguil
@ 2018-03-23 12:58  4% ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-03-23 12:58 UTC (permalink / raw)
  To: dev; +Cc: Kirill Rybalchenko

Must remain synchronized with its Makefile counterpart.

Fixes: a4b0b30723b2 ("ethdev: remove versioning of ethdev filter control function")
Cc: Kirill Rybalchenko <kirill.rybalchenko@intel.com>

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_ether/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ether/meson.build b/lib/librte_ether/meson.build
index 7fed86056..12bdb6b61 100644
--- a/lib/librte_ether/meson.build
+++ b/lib/librte_ether/meson.build
@@ -2,7 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 name = 'ethdev'
-version = 8
+version = 9
 allow_experimental_apis = true
 sources = files('ethdev_profile.c',
 	'rte_ethdev.c',
-- 
2.11.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes
@ 2018-03-23 12:58  3% Adrien Mazarguil
  2018-03-23 12:58  4% ` [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-03-23 12:58 UTC (permalink / raw)
  To: dev

This series contains several fixes for rte_flow and its implementation in
mlx4 and testpmd. Upcoming work on the flow API depends on it.

Adrien Mazarguil (9):
  net/mlx4: fix RSS resource leak in case of error
  net/mlx4: fix ignored RSS hash types
  app/testpmd: fix flow completion for RSS queues
  app/testpmd: fix lack of flow action configuration
  app/testpmd: fix RSS flow action configuration
  app/testpmd: fix missing RSS fields in flow action
  ethdev: fix shallow copy of flow API RSS action
  ethdev: fix missing boolean values in flow command
  ethdev: fix ABI version in meson build

 app/test-pmd/cmdline_flow.c                 | 255 ++++++++++++++++++++---
 app/test-pmd/config.c                       | 161 +++++++++-----
 app/test-pmd/testpmd.h                      |  13 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   8 +
 drivers/net/mlx4/mlx4_flow.c                |  17 +-
 lib/librte_ether/meson.build                |   2 +-
 lib/librte_ether/rte_flow.c                 | 145 +++++++++----
 7 files changed, 477 insertions(+), 124 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6] eal: provide API for querying valid socket id's
  2018-03-22 12:36  5%       ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
@ 2018-03-22 17:07  0%         ` gowrishankar muthukrishnan
  0 siblings, 0 replies; 200+ results
From: gowrishankar muthukrishnan @ 2018-03-22 17:07 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: Bruce Richardson, thomas, chaozhu

On Thursday 22 March 2018 06:06 PM, Anatoly Burakov wrote:
> During lcore scan, find all socket ID's and store them, and
> provide public API to query valid socket id's. This will break
> the ABI, so bump ABI version.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>

Thanks,
Gowrishankar

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v6] eal: provide API for querying valid socket id's
  2018-03-22 10:58  4%     ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
  2018-03-22 11:45  3%       ` Burakov, Anatoly
@ 2018-03-22 12:36  5%       ` Anatoly Burakov
  2018-03-22 17:07  0%         ` gowrishankar muthukrishnan
  1 sibling, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-03-22 12:36 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, thomas, chaozhu, gowrishankar.m

During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v6:
    - Fixed meson ABI version header
    
    v5:
    - Move API to experimental
    - Store list of valid socket id's instead of simply
      recording the biggest one
    
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 75 ++++++++++++++++++++++++++-----
 lib/librte_eal/common/include/rte_eal.h   |  3 ++
 lib/librte_eal/common/include/rte_lcore.h | 30 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/meson.build                |  2 +-
 lib/librte_eal/rte_eal_version.map        |  2 +
 7 files changed, 101 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..50d9f82 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <dirent.h>
 
+#include <rte_errno.h>
 #include <rte_log.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -16,6 +17,19 @@
 #include "eal_private.h"
 #include "eal_thread.h"
 
+static int
+socket_id_cmp(const void *a, const void *b)
+{
+	const int *lcore_id_a = a;
+	const int *lcore_id_b = b;
+
+	if (*lcore_id_a < *lcore_id_b)
+		return -1;
+	if (*lcore_id_a > *lcore_id_b)
+		return 1;
+	return 0;
+}
+
 /*
  * Parse /sys/devices/system/cpu to get the number of physical and logical
  * processors on the machine. The function will fill the cpu_info
@@ -28,6 +42,8 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, prev_socket_id;
+	int lcore_to_socket_id[RTE_MAX_LCORE];
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +55,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		lcore_to_socket_id[lcore_id] = socket_id;
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +83,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +97,38 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	/* sort all socket id's in ascending order */
+	qsort(lcore_to_socket_id, RTE_DIM(lcore_to_socket_id),
+			sizeof(lcore_to_socket_id[0]), socket_id_cmp);
+
+	prev_socket_id = -1;
+	config->numa_node_count = 0;
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		socket_id = lcore_to_socket_id[lcore_id];
+		if (socket_id != prev_socket_id)
+			config->numa_nodes[config->numa_node_count++] =
+					socket_id;
+		prev_socket_id = socket_id;
+	}
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int __rte_experimental
+rte_num_socket_ids(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
+
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	if (idx >= config->numa_node_count) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return config->numa_nodes[idx];
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 93ca4cc..6109472 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,9 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
+	uint32_t numa_nodes[RTE_MAX_NUMA_NODES];
+	/**< List of detected numa nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 0472220..c6511a9 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -132,6 +132,36 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets detected on the system.
+ *
+ * Note that number of nodes may not be correspondent to their physical id's:
+ * for example, a system may report two socket id's, but the actual socket id's
+ * may be 0 and 8.
+ *
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ */
+unsigned int __rte_experimental
+rte_num_socket_ids(void);
+
+/**
+ * Return socket id with a particular index.
+ *
+ * This will return socket id at a particular position in list of all detected
+ * physical socket id's. For example, on a machine with sockets [0, 8], passing
+ * 1 as a parameter will return 8.
+ *
+ * @param idx
+ *   index of physical socket id to return
+ *
+ * @return
+ *   - physical socket id as recognized by EAL
+ *   - -1 on error, with errno set to EINVAL
+ */
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index d9ba385..8f0ce1b 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -43,7 +43,7 @@ else
 	error('unsupported system type @0@'.format(hostmachine.system()))
 endif
 
-version = 6  # the version of the EAL API
+version = 7  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 cflags += '-D_GNU_SOURCE'
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d88437..6a4c355 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -229,6 +229,8 @@ EXPERIMENTAL {
 	rte_mp_request;
 	rte_mp_request_async;
 	rte_mp_reply;
+	rte_num_socket_ids;
+	rte_socket_id_by_idx;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's
  2018-03-22 10:58  4%     ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
@ 2018-03-22 11:45  3%       ` Burakov, Anatoly
  2018-03-22 12:36  5%       ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-22 11:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, thomas, chaozhu, gowrishankar.m

On 22-Mar-18 10:58 AM, Anatoly Burakov wrote:
> During lcore scan, find all socket ID's and store them, and
> provide public API to query valid socket id's. This will break
> the ABI, so bump ABI version.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Left out meson ABI version, will respin.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's
  2018-02-05 16:37  8%   ` [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
                       ` (2 preceding siblings ...)
  2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
@ 2018-03-22 10:58  4%     ` Anatoly Burakov
  2018-03-22 11:45  3%       ` Burakov, Anatoly
  2018-03-22 12:36  5%       ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
  3 siblings, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-03-22 10:58 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, thomas, chaozhu, gowrishankar.m

During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v5:
    - Move API to experimental
    - Store list of valid socket id's instead of simply
      recording the biggest one
    
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 75 ++++++++++++++++++++++++++-----
 lib/librte_eal/common/include/rte_eal.h   |  3 ++
 lib/librte_eal/common/include/rte_lcore.h | 30 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/rte_eal_version.map        |  2 +
 6 files changed, 100 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..50d9f82 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <dirent.h>
 
+#include <rte_errno.h>
 #include <rte_log.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -16,6 +17,19 @@
 #include "eal_private.h"
 #include "eal_thread.h"
 
+static int
+socket_id_cmp(const void *a, const void *b)
+{
+	const int *lcore_id_a = a;
+	const int *lcore_id_b = b;
+
+	if (*lcore_id_a < *lcore_id_b)
+		return -1;
+	if (*lcore_id_a > *lcore_id_b)
+		return 1;
+	return 0;
+}
+
 /*
  * Parse /sys/devices/system/cpu to get the number of physical and logical
  * processors on the machine. The function will fill the cpu_info
@@ -28,6 +42,8 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, prev_socket_id;
+	int lcore_to_socket_id[RTE_MAX_LCORE];
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +55,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		lcore_to_socket_id[lcore_id] = socket_id;
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +83,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +97,38 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	/* sort all socket id's in ascending order */
+	qsort(lcore_to_socket_id, RTE_DIM(lcore_to_socket_id),
+			sizeof(lcore_to_socket_id[0]), socket_id_cmp);
+
+	prev_socket_id = -1;
+	config->numa_node_count = 0;
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		socket_id = lcore_to_socket_id[lcore_id];
+		if (socket_id != prev_socket_id)
+			config->numa_nodes[config->numa_node_count++] =
+					socket_id;
+		prev_socket_id = socket_id;
+	}
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int __rte_experimental
+rte_num_socket_ids(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
+
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	if (idx >= config->numa_node_count) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return config->numa_nodes[idx];
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 93ca4cc..6109472 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,9 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
+	uint32_t numa_nodes[RTE_MAX_NUMA_NODES];
+	/**< List of detected numa nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 0472220..c6511a9 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -132,6 +132,36 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets detected on the system.
+ *
+ * Note that number of nodes may not be correspondent to their physical id's:
+ * for example, a system may report two socket id's, but the actual socket id's
+ * may be 0 and 8.
+ *
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ */
+unsigned int __rte_experimental
+rte_num_socket_ids(void);
+
+/**
+ * Return socket id with a particular index.
+ *
+ * This will return socket id at a particular position in list of all detected
+ * physical socket id's. For example, on a machine with sockets [0, 8], passing
+ * 1 as a parameter will return 8.
+ *
+ * @param idx
+ *   index of physical socket id to return
+ *
+ * @return
+ *   - physical socket id as recognized by EAL
+ *   - -1 on error, with errno set to EINVAL
+ */
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d88437..6a4c355 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -229,6 +229,8 @@ EXPERIMENTAL {
 	rte_mp_request;
 	rte_mp_request_async;
 	rte_mp_reply;
+	rte_num_socket_ids;
+	rte_socket_id_by_idx;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 2/5] vhost: support selective datapath
  2018-03-22  7:55  3%       ` Wang, Zhihong
@ 2018-03-22  8:31  0%         ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2018-03-22  8:31 UTC (permalink / raw)
  To: Wang, Zhihong, dev
  Cc: Tan, Jianfeng, Bie, Tiwei, yliu, Liang, Cunming, Wang, Xiao W, Daly, Dan



On 03/22/2018 08:55 AM, Wang, Zhihong wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Thursday, March 22, 2018 5:06 AM
>> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
>> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Bie, Tiwei
>> <tiwei.bie@intel.com>; yliu@fridaylinux.org; Liang, Cunming
>> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Daly,
>> Dan <dan.daly@intel.com>
>> Subject: Re: [PATCH v3 2/5] vhost: support selective datapath
>>
>>
>>
>> On 02/27/2018 11:13 AM, Zhihong Wang wrote:
>>> This patch introduces support for selective datapath in DPDK vhost-user lib
>>> to enable various types of virtio-compatible devices to do data transfer
>>> with virtio driver directly to enable acceleration. The default datapath is
>>> the existing software implementation, more options will be available when
>>> new engines are registered.
>>>
>>> An engine is a group of virtio-compatible devices under a single address.
>>> The engine driver includes:
>>>
>>>    1. A set of engine ops is defined in rte_vdpa_eng_ops to perform engine
>>>       init, uninit, and attributes reporting.
>>>
>>>    2. A set of device ops is defined in rte_vdpa_dev_ops for virtio devices
>>>       in the engine to do device specific operations:
>>>
>>>        a. dev_conf: Called to configure the actual device when the virtio
>>>           device becomes ready.
>>>
>>>        b. dev_close: Called to close the actual device when the virtio device
>>>           is stopped.
>>>
>>>        c. vring_state_set: Called to change the state of the vring in the
>>>           actual device when vring state changes.
>>>
>>>        d. feature_set: Called to set the negotiated features to device.
>>>
>>>        e. migration_done: Called to allow the device to response to RARP
>>>           sending.
>>>
>>>        f. get_vfio_group_fd: Called to get the VFIO group fd of the device.
>>>
>>>        g. get_vfio_device_fd: Called to get the VFIO device fd of the device.
>>>
>>>        h. get_notify_area: Called to get the notify area info of the queue.
>>>
>>> Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
>>> ---
>>> Changes in v2:
>>>
>>>    1. Add VFIO related vDPA device ops.
>>>
>>>    lib/librte_vhost/Makefile              |   4 +-
>>>    lib/librte_vhost/rte_vdpa.h            | 126
>> +++++++++++++++++++++++++++++++++
>>>    lib/librte_vhost/rte_vhost_version.map |   8 +++
>>>    lib/librte_vhost/vdpa.c                | 124
>> ++++++++++++++++++++++++++++++++
>>>    4 files changed, 260 insertions(+), 2 deletions(-)
>>>    create mode 100644 lib/librte_vhost/rte_vdpa.h
>>>    create mode 100644 lib/librte_vhost/vdpa.c
>>>
>>> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
>>> index 5d6c6abae..37044ac03 100644
>>> --- a/lib/librte_vhost/Makefile
>>> +++ b/lib/librte_vhost/Makefile
>>> @@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -
>> lrte_ethdev -lrte_net
>>>
>>>    # all source are stored in SRCS-y
>>>    SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c
>> \
>>> -					vhost_user.c virtio_net.c
>>> +					vhost_user.c virtio_net.c vdpa.c
>>>
>>>    # install includes
>>> -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
>>> +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
>> rte_vdpa.h
>>>
>>>    include $(RTE_SDK)/mk/rte.lib.mk
>>> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
>>> new file mode 100644
>>> index 000000000..23fb471be
>>> --- /dev/null
>>> +++ b/lib/librte_vhost/rte_vdpa.h
>>> @@ -0,0 +1,126 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(c) 2018 Intel Corporation
>>> + */
>>> +
>>> +#ifndef _RTE_VDPA_H_
>>> +#define _RTE_VDPA_H_
>>> +
>>> +/**
>>> + * @file
>>> + *
>>> + * Device specific vhost lib
>>> + */
>>> +
>>> +#include <rte_pci.h>
>>> +#include "rte_vhost.h"
>>> +
>>> +#define MAX_VDPA_ENGINE_NUM 128
>>> +#define MAX_VDPA_NAME_LEN 128
>>> +
>>> +struct rte_vdpa_eng_addr {
>>> +	union {
>>> +		uint8_t __dummy[64];
>>> +		struct rte_pci_addr pci_addr;
>> I think we should not only support PCI, but any type of buses.
>> At least in the API.
> 
> Exactly, so we defined a 64 bytes union so any bus types can be added
> without breaking the ABI.

Oh right, I missed that. That's good to me.

> But there is one place that may be impacted is the is_same_eng() function.
> Maybe comparing all the bytes in __dummy[64] is a better way. What do you
> think?

I think that for now that's fine to keep comparing PCI addresses as we 
don't have other bus supported than PCI. My concern was about the API,
and I mistakenly didn't see you already took care of it.

Thanks!
Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 2/5] vhost: support selective datapath
  @ 2018-03-22  7:55  3%       ` Wang, Zhihong
  2018-03-22  8:31  0%         ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Wang, Zhihong @ 2018-03-22  7:55 UTC (permalink / raw)
  To: Maxime Coquelin, dev
  Cc: Tan, Jianfeng, Bie, Tiwei, yliu, Liang, Cunming, Wang, Xiao W, Daly, Dan



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Thursday, March 22, 2018 5:06 AM
> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; yliu@fridaylinux.org; Liang, Cunming
> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Daly,
> Dan <dan.daly@intel.com>
> Subject: Re: [PATCH v3 2/5] vhost: support selective datapath
> 
> 
> 
> On 02/27/2018 11:13 AM, Zhihong Wang wrote:
> > This patch introduces support for selective datapath in DPDK vhost-user lib
> > to enable various types of virtio-compatible devices to do data transfer
> > with virtio driver directly to enable acceleration. The default datapath is
> > the existing software implementation, more options will be available when
> > new engines are registered.
> >
> > An engine is a group of virtio-compatible devices under a single address.
> > The engine driver includes:
> >
> >   1. A set of engine ops is defined in rte_vdpa_eng_ops to perform engine
> >      init, uninit, and attributes reporting.
> >
> >   2. A set of device ops is defined in rte_vdpa_dev_ops for virtio devices
> >      in the engine to do device specific operations:
> >
> >       a. dev_conf: Called to configure the actual device when the virtio
> >          device becomes ready.
> >
> >       b. dev_close: Called to close the actual device when the virtio device
> >          is stopped.
> >
> >       c. vring_state_set: Called to change the state of the vring in the
> >          actual device when vring state changes.
> >
> >       d. feature_set: Called to set the negotiated features to device.
> >
> >       e. migration_done: Called to allow the device to response to RARP
> >          sending.
> >
> >       f. get_vfio_group_fd: Called to get the VFIO group fd of the device.
> >
> >       g. get_vfio_device_fd: Called to get the VFIO device fd of the device.
> >
> >       h. get_notify_area: Called to get the notify area info of the queue.
> >
> > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > ---
> > Changes in v2:
> >
> >   1. Add VFIO related vDPA device ops.
> >
> >   lib/librte_vhost/Makefile              |   4 +-
> >   lib/librte_vhost/rte_vdpa.h            | 126
> +++++++++++++++++++++++++++++++++
> >   lib/librte_vhost/rte_vhost_version.map |   8 +++
> >   lib/librte_vhost/vdpa.c                | 124
> ++++++++++++++++++++++++++++++++
> >   4 files changed, 260 insertions(+), 2 deletions(-)
> >   create mode 100644 lib/librte_vhost/rte_vdpa.h
> >   create mode 100644 lib/librte_vhost/vdpa.c
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 5d6c6abae..37044ac03 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -
> lrte_ethdev -lrte_net
> >
> >   # all source are stored in SRCS-y
> >   SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c
> \
> > -					vhost_user.c virtio_net.c
> > +					vhost_user.c virtio_net.c vdpa.c
> >
> >   # install includes
> > -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> rte_vdpa.h
> >
> >   include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> > new file mode 100644
> > index 000000000..23fb471be
> > --- /dev/null
> > +++ b/lib/librte_vhost/rte_vdpa.h
> > @@ -0,0 +1,126 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +#ifndef _RTE_VDPA_H_
> > +#define _RTE_VDPA_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * Device specific vhost lib
> > + */
> > +
> > +#include <rte_pci.h>
> > +#include "rte_vhost.h"
> > +
> > +#define MAX_VDPA_ENGINE_NUM 128
> > +#define MAX_VDPA_NAME_LEN 128
> > +
> > +struct rte_vdpa_eng_addr {
> > +	union {
> > +		uint8_t __dummy[64];
> > +		struct rte_pci_addr pci_addr;
> I think we should not only support PCI, but any type of buses.
> At least in the API.

Exactly, so we defined a 64 bytes union so any bus types can be added
without breaking the ABI.

But there is one place that may be impacted is the is_same_eng() function.
Maybe comparing all the bytes in __dummy[64] is a better way. What do you
think?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/4] ethdev: add per-PMD tuning of RxTx parmeters
  2018-03-07 12:08  3% [dpdk-dev] [RFC PATCH v1 0/4] ethdev: add per-PMD tuning of RxTx parmeters Remy Horton
@ 2018-03-21 14:27  3% ` Remy Horton
  0 siblings, 0 replies; 200+ results
From: Remy Horton @ 2018-03-21 14:27 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing,
	Shreyansh Jain, Thomas Monjalon

The optimal values of several transmission & reception related parameters,
such as burst sizes, descriptor ring sizes, and number of queues, varies
between different network interface devices. This patchset allows individual
PMDs to specify their preferred parameter values, and if so indicated by an
application, for them to be used automatically by the ethdev layer.

rte_eth_dev_configure() has been changed so that specifying zero for both
nb_rx_q AND nb_tx_q causes it to use driver preferred values, and if these
are not available, falls back to EAL defaults. Setting one (but not both)
to zero does not cause the use of defaults, as having one of them zeroed is
a valid setup.

This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
that subsequent patchsets will cover other PMDs. A deprecation notice
covering the API/ABI change is in place.


Changes in v2:
* Rebased to 
* Removed fallback values from rte_eth_dev_info_get()
* Added fallback values to rte_rte_[rt]x_queue_setup()
* Added fallback values to rte_eth_dev_configure()
* Corrected comment
* Removed deprecation notice
* Split RX and Tx into seperate structures
* Changed parameter names


Remy Horton (4):
  ethdev: add support for PMD-tuned Tx/Rx parameters
  net/e1000: add TxRx tuning parameters
  net/i40e: add TxRx tuning parameters
  testpmd: make use of per-PMD TxRx parameters

 app/test-pmd/testpmd.c               |  5 ++--
 doc/guides/rel_notes/deprecation.rst | 13 -----------
 drivers/net/e1000/em_ethdev.c        |  6 +++++
 drivers/net/i40e/i40e_ethdev.c       | 33 ++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.c        | 44 ++++++++++++++++++++++++++++--------
 lib/librte_ether/rte_ethdev.h        | 23 +++++++++++++++++++
 6 files changed, 97 insertions(+), 27 deletions(-)

-- 
2.9.5

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters
  2018-03-21 10:02  3%                             ` Ferruh Yigit
@ 2018-03-21 10:45  0%                               ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-03-21 10:45 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Bruce Richardson, Horton, Remy, Ananyev, Konstantin, dev, Lu,
	Wenzhuo, Wu, Jingjing, Zhang, Qi Z, Xing, Beilei,
	Thomas Monjalon

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Wednesday, March 21, 2018 3:33 PM
> To: Shreyansh Jain <shreyansh.jain@nxp.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>; Horton, Remy
> <remy.horton@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Qi
> Z <qi.z.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-
> tuned Tx/Rx parameters
> 
> On 3/21/2018 6:51 AM, Shreyansh Jain wrote:
> > On Tue, Mar 20, 2018 at 8:24 PM, Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
> >> On 3/16/2018 1:54 PM, Shreyansh Jain wrote:
> >>> On Thu, Mar 15, 2018 at 8:27 PM, Ferruh Yigit
> <ferruh.yigit@intel.com> wrote:
> >
> > [...]
> >

[...]

> >>>
> >>> C) Unlike the original proposal, this would add two separate members
> >>> to rte_eth_dev_info - one each for Rx and Tx. They both are still
> >>> expected to be populated through the info_get() implementation but
> not
> >>> by lib_eal.
> >>> IMO, doesn't matter.
> >>
> >> There won't be new members, which ones are you talking about?
> >
> > original proposal: (ignore change of names, please)
> >
> >  rte_eth_dev_preferred_info {
> >      rx_burst_size
> >      tx_burst_size
> >      rx_ring_size
> >      tx_ring_size
> >      ...
> >   }
> >
> > And this is what I think last few comments intended:
> >
> >  rte_eth_rxpreferred {
> >    ...
> >    rx_burst_size
> >    rx_ring_size
> >    ...
> >  }
> >
> >  rte_eth_txpreferred {
> >    ...
> >    tx_burst_size
> >    tx_ring_size
> >    ...
> >  }
> >
> > both the above added rte_eth_dev_info{}
> >
> > This is what I meant when I stated "...this would add two separate
> > members to rte_eth_dev_info - one each for Rx and Tx..."
> 
> Got it. I don't have any strong opinion on adding single struct or two
> (one for
> Rx and one for Tx).
> Since these will be public structs, do you think will there be any
> difference
> from ABI stability point of view?

No. It was just an observation. To me, it doesn't matter which approach is selected.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-03-21  4:59  0%       ` gowrishankar muthukrishnan
@ 2018-03-21 10:24  0%         ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-21 10:24 UTC (permalink / raw)
  To: gowrishankar muthukrishnan; +Cc: dev, Bruce Richardson, Chao Zhu

On 21-Mar-18 4:59 AM, gowrishankar muthukrishnan wrote:
> On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      v3:
>>      - Added ABI compatibility
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 
>> +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
>> b/lib/librte_eal/bsdapp/eal/Makefile
>> index dd455e6..ed1d17b 100644
>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>>
>>   EXPORT_MAP := ../../rte_eal_version.map
>>
>> -LIBABIVER := 6
>> +LIBABIVER := 7
>>
>>   # specific to bsdapp exec-env
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
>> diff --git a/lib/librte_eal/common/eal_common_lcore.c 
>> b/lib/librte_eal/common/eal_common_lcore.c
>> index 7724fa4..827ddeb 100644
>> --- a/lib/librte_eal/common/eal_common_lcore.c
>> +++ b/lib/librte_eal/common/eal_common_lcore.c
>> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>>       struct rte_config *config = rte_eal_get_configuration();
>>       unsigned lcore_id;
>>       unsigned count = 0;
>> +    unsigned int socket_id, max_socket_id = 0;
>>
>>       /*
>>        * Parse the maximum set of logical cores, detect the subset of 
>> running
>> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>>           /* init cpuset for per lcore config */
>>           CPU_ZERO(&lcore_config[lcore_id].cpuset);
>>
>> +        /* find socket first */
>> +        socket_id = eal_cpu_socket_id(lcore_id);
>> +        if (socket_id >= RTE_MAX_NUMA_NODES) {
>> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> +            socket_id = 0;
>> +#else
>> +            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than 
>> RTE_MAX_NUMA_NODES (%d)\n",
>> +                    socket_id, RTE_MAX_NUMA_NODES);
>> +            return -1;
>> +#endif
>> +        }
>> +        max_socket_id = RTE_MAX(max_socket_id, socket_id);
>> +
>>           /* in 1:1 mapping, record related cpu detected state */
>>           lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>>           if (lcore_config[lcore_id].detected == 0) {
>> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>>           config->lcore_role[lcore_id] = ROLE_RTE;
>>           lcore_config[lcore_id].core_role = ROLE_RTE;
>>           lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
>> -        lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
>> -        if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
>> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> -            lcore_config[lcore_id].socket_id = 0;
>> -#else
>> -            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
>> -                "RTE_MAX_NUMA_NODES (%d)\n",
>> -                lcore_config[lcore_id].socket_id,
>> -                RTE_MAX_NUMA_NODES);
>> -            return -1;
>> -#endif
>> -        }
>> +        lcore_config[lcore_id].socket_id = socket_id;
>>           RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>>                   "core %u on socket %u\n",
>>                   lcore_id, lcore_config[lcore_id].core_id,
>> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>>           RTE_MAX_LCORE);
>>       RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>>
>> +    config->numa_node_count = max_socket_id + 1;
> 
> In some IBM servers, socket ID number does not seem to be in sequence. 
> For an instance, 0 and 8 for a 2 node server.
> 
> In this case, numa_node_count would mislead users if wrongly understood 
> by its variable name IMO (see below)
>> +    RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", 
>> config->numa_node_count);
> 
> For an instance, reading above message would tell 'EAL detected 8 nodes' 
> in my server, but actually there are only two nodes.
> 
> Could its name better be 'numa_node_id_max' ?. Also, we store in actual 
> count of numa nodes in _count variable.
> 
> Also, there could be a case when there is no local memory available to a 
> numa node too.
> 
> Thanks,
> Gowrishankar

The point of this patchset is to (pre)allocate memory only on existing 
sockets.

If we don't know how many sockets there are, we are forced to 
preallocate VA space per each *possible* NUMA node - that is, reserve 
e.g. 8x128G of memory, 6 of which will go unused on a 2-socket system. 
We can't know if there is no memory on socket in advance, but we can at 
least avoid preallocating VA space for sockets that don't exist in the 
first place.

How about we store all possible socket id's instead? e.g. something like:

static int numa_node_ids[MAX_NUMA_NODES];
<...>
int rte_eal_cpu_init() {
	int sockets[RTE_MAX_LCORE];
	<...>
	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
		core_to_socket[lcore_id] = socket;
	}
	<...>
	qsort(sockets);
	<...>
	// store all unique sockets in numa_node_ids in ascending order
}
<...>

on a 2 socket system we then get:

rte_num_sockets() => return 2
rte_get_socket_id(int idx) => return numa_node_ids[idx]

Would that be suitable?

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters
  @ 2018-03-21 10:02  3%                             ` Ferruh Yigit
  2018-03-21 10:45  0%                               ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-21 10:02 UTC (permalink / raw)
  To: Shreyansh Jain
  Cc: Bruce Richardson, Horton, Remy, Ananyev, Konstantin, dev, Lu,
	Wenzhuo, Wu, Jingjing, Zhang, Qi Z, Xing, Beilei,
	Thomas Monjalon

On 3/21/2018 6:51 AM, Shreyansh Jain wrote:
> On Tue, Mar 20, 2018 at 8:24 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> On 3/16/2018 1:54 PM, Shreyansh Jain wrote:
>>> On Thu, Mar 15, 2018 at 8:27 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
> [...]
> 
>>>> Hi Remy, Shreyansh,
>>>>
>>>> What do you think about using a variable name consistent with existing
>>>> "default_[rt]xconf" in dev_info?
>>>
>>> It just turned out to be much more complex than I initially thought :)
>>> Is this what the above conversation merging at (for Rx, as example):
>>>
>>> 1. 'default_rx_size_conf' is added in rte_eth_dev_info (and this
>>> includes I/O  params like burst size, besides configure time nb_queue,
>>> nb_desc etc). Driver would return these values filled in when
>>> info_get() is called.
>>>
>>> 2a. If an application needs the defaults, it would perform info_get()
>>> and get the values. then, use the values in configuration APIs
>>> (rx_queue_setup for nb_rx_desc, eth_dev_dev_configure for
>>> nb_rx_queues).
>>> For rx_burst calls, it would use the burst_size fields obtained from info_get().
>>> This is good enough for configuration and datapath (rx_burst).
>>>
>>> OR, another case
>>>
>>> 2b. Application wants to use default vaules provided by driver without
>>> calling info_get. In which case, it would call
>>> rx_queue_setup(nb_rx_desc=0..) or eth_dev_configure(nb_rx_queue=0,
>>> nb_tx_queue=0). The implementation would query the value from
>>> 'default_rx_size_conf' through info_get() and use those values.
>>> Though, in this case, rte_eth_rx_burst(burst=0) might not work for
>>> picking up the default within rte_ethdev.h.
>>
>> In Bruce's suggestion where ethdev keep defaults is changed.
>> Initial suggestion was rte_eth_dev_info_get() filling default data, now defaults
>> will be defined in functions like rte_eth_rx_queue_setup().
>>
>> This is a little different from filling defaults in rte_eth_dev_info_get():
>> - Application can know where the defaults are coming from because dev_info
>> fields are only modified by PMD. Application still prefer to use ethdev defaults.
>>
>> - The default values in ethdev library provided in function related to that
>> data, instead of separate rte_eth_dev_info_get() function.
> 
> It seems we both are on same page (almost) - just that I couldn't
> articulate my comments properly earlier, maybe.
> 
> rte_eth_dev_info_get is only a method to get defaults set by PMDs.
> dev_info_get is not setting defaults by itself. I get this.
> 
>>
>>
>> What application can do:
>> -  Application can call rte_eth_dev_info_get() and can learn if PMD provided
>> defaults or not.
>> - If PMD doesn't provided any default values application can prefer to use
>> application defined values. This may be an option for the application looking
>> for most optimized values.
>> - Although PMD doesn't provide any defaults, application still can use defaults
>> provided by ethdev by providing '0' as arguments.
> 
> Yes, agree - and only comment I added previously in this case is that
> this is not applicable for burst APIs. So, optimal [rt]x burst size
> cannot be 'defaulted' to EAL layer. Other values like ring size, queue
> count can be delegated to EAL for overwriting if passed as '0'.

Yes you are right.

> 
>>
>>
>> So how related ethdev functions will be updated:
>> if argument != 0
>>   use argument
>> else
>>   dev_info_get()
>>     if dev_info->argument != 0
>>       use dev_info->argument
>>     else
>>       use function_prov
> 
> Perfect, but only for eth_dev_configure and eth_[rt]x_queue_setup functions -
> and that is OK with me.
> 
>>
>>>
>>> :Four observations:
>>> A). For burst size (or any other I/O time value added in future),
>>> values would have to be explicitly used by application - always. If
>>> value reported by info_get() is '0' (see (B) below), application to
>>> use its own judgement. No default override by lib_eal.
>>> IMO, This is good enough assumption.
>>
>> This is no more true after Bruce's comment.
>> If application provides any values it will overwrite everything else,
>> application has the final word.
>> But application may prefer to use provided default values.
> 
> I am not sure what has changed with Bruce's comment - but I agree with
> what you are stating.
> 
>>
>>>
>>> B). '0' as an indicator for 'no-default-value-available-from-driver'
>>> is still an open point. It is good enough for current proposed
>>> parameters, but may be a valid numerical value in future.
>>> IMO, this can be ignored for now.
>>
>> Agree that we can ignore it for now.
>>
>>>
>>> C) Unlike the original proposal, this would add two separate members
>>> to rte_eth_dev_info - one each for Rx and Tx. They both are still
>>> expected to be populated through the info_get() implementation but not
>>> by lib_eal.
>>> IMO, doesn't matter.
>>
>> There won't be new members, which ones are you talking about?
> 
> original proposal: (ignore change of names, please)
> 
>  rte_eth_dev_preferred_info {
>      rx_burst_size
>      tx_burst_size
>      rx_ring_size
>      tx_ring_size
>      ...
>   }
> 
> And this is what I think last few comments intended:
> 
>  rte_eth_rxpreferred {
>    ...
>    rx_burst_size
>    rx_ring_size
>    ...
>  }
> 
>  rte_eth_txpreferred {
>    ...
>    tx_burst_size
>    tx_ring_size
>    ...
>  }
> 
> both the above added rte_eth_dev_info{}
> 
> This is what I meant when I stated "...this would add two separate
> members to rte_eth_dev_info - one each for Rx and Tx..."

Got it. I don't have any strong opinion on adding single struct or two (one for
Rx and one for Tx).
Since these will be public structs, do you think will there be any difference
from ABI stability point of view?

> 
> [...]
> 
> -
> Shreyansh
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  2018-03-21  2:28  4%   ` Tan, Jianfeng
@ 2018-03-21  9:55  3%     ` Pattan, Reshma
  0 siblings, 0 replies; 200+ results
From: Pattan, Reshma @ 2018-03-21  9:55 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

Hi,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Wednesday, March 21, 2018 2:28 AM
> To: Pattan, Reshma <reshma.pattan@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH] pdump: change to use generic multi-process channel
> 

Hi,

> > 1) I feel ABI breakage has to be addressed  first for change in
> rte_pdump_init() .
> > 2)ABI notice for removal of the rte_pdump_set_socket_dir()  and then
> remove it completely .
> 
> This patch itself does not break any ABI. It just puts parameters of
> rte_pdump_init() not used. And make rte_pdump_set_socket_dir() as a
> dummy function.
> 

So, for current release you just mark parameters unused and functions set to dummy, in future release you announce 
ABI breakage by removing them completely? If that is agreed plan I don't have any issues.

> > 3)Need to do cleanup of the code app/dpdk-pdump.
> 
> Yes, I understand it's a normal process to announce deprecation firstly, and
> then do the change.
> 
> But here is the thing, with generic mp introduced, we will not be compatible
> with DPDK versions.
> So we want to unify the use of generic mp channel in this release for vfio,
> pdump, vdev, memory rework.
> And in fact, ABI/API changes could be delayed to later releases.

So, you want to remove unnecessary socket  related code from dpdk-pdump in future release itself?  Kind of making sense. 
But dpdk-pdump  tool has socket path related command line options which user still can pass on, isn't it kind of confusion we creating w.r.t
Internal design and usage? 

> 
> > 4)After all the changes we need to make sure dpdk-pdump works fine
> > without breaking the functionality, validation team should be able to help.
> 
> I have done a simple test of pdump. Can you suggest where can I get the
> comprehensive test cases?
> 

Ok, if you have verified and observed packets are been captured successfully, that is good enough.

Thanks,
Reshma

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-21  8:21  0%         ` Thomas Monjalon
@ 2018-03-21  8:47  0%           ` Arnon Warshavsky
  0 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-03-21  8:47 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

I don't know what needs to be modified.

> I think the first step is to clearly identified the different kind
> of changes by splitting your patch.
> Then we will decide how to integrate them.
>
>
> Ok.Will split the commit with no abi handling and continue from there

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 23:04  0%       ` Arnon Warshavsky
@ 2018-03-21  8:21  0%         ` Thomas Monjalon
  2018-03-21  8:47  0%           ` Arnon Warshavsky
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-21  8:21 UTC (permalink / raw)
  To: Arnon Warshavsky
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

21/03/2018 00:04, Arnon Warshavsky:
> >
> > You are talking about API, and I agree the old applications can keep
> > considering the functions as void.
> > But I was talking about ABI, meaning: can we use an old application
> > without recompiling and update only the DPDK (in .so file)?
> >
> >
> > You are right of course. Once again I mixed the two..
> I will modify accordingly

I don't know what needs to be modified.
I think the first step is to clearly identified the different kind
of changes by splitting your patch.
Then we will decide how to integrate them.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
  2018-03-08 12:12  3%       ` Bruce Richardson
@ 2018-03-21  4:59  0%       ` gowrishankar muthukrishnan
  2018-03-21 10:24  0%         ` Burakov, Anatoly
  1 sibling, 1 reply; 200+ results
From: gowrishankar muthukrishnan @ 2018-03-21  4:59 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Bruce Richardson, Chao Zhu

On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it. This will
> break the ABI, so bump ABI version.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
>      v4:
>      - Remove backwards ABI compatibility, bump ABI instead
>      
>      v3:
>      - Added ABI compatibility
>      
>      v2:
>      - checkpatch changes
>      - check socket before deciding if the core is not to be used
>
>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>   6 files changed, 44 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
> index dd455e6..ed1d17b 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>
>   EXPORT_MAP := ../../rte_eal_version.map
>
> -LIBABIVER := 6
> +LIBABIVER := 7
>
>   # specific to bsdapp exec-env
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
> diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
> index 7724fa4..827ddeb 100644
> --- a/lib/librte_eal/common/eal_common_lcore.c
> +++ b/lib/librte_eal/common/eal_common_lcore.c
> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>   	struct rte_config *config = rte_eal_get_configuration();
>   	unsigned lcore_id;
>   	unsigned count = 0;
> +	unsigned int socket_id, max_socket_id = 0;
>
>   	/*
>   	 * Parse the maximum set of logical cores, detect the subset of running
> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>   		/* init cpuset for per lcore config */
>   		CPU_ZERO(&lcore_config[lcore_id].cpuset);
>
> +		/* find socket first */
> +		socket_id = eal_cpu_socket_id(lcore_id);
> +		if (socket_id >= RTE_MAX_NUMA_NODES) {
> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
> +			socket_id = 0;
> +#else
> +			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
> +					socket_id, RTE_MAX_NUMA_NODES);
> +			return -1;
> +#endif
> +		}
> +		max_socket_id = RTE_MAX(max_socket_id, socket_id);
> +
>   		/* in 1:1 mapping, record related cpu detected state */
>   		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>   		if (lcore_config[lcore_id].detected == 0) {
> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>   		config->lcore_role[lcore_id] = ROLE_RTE;
>   		lcore_config[lcore_id].core_role = ROLE_RTE;
>   		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
> -		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
> -		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
> -			lcore_config[lcore_id].socket_id = 0;
> -#else
> -			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
> -				"RTE_MAX_NUMA_NODES (%d)\n",
> -				lcore_config[lcore_id].socket_id,
> -				RTE_MAX_NUMA_NODES);
> -			return -1;
> -#endif
> -		}
> +		lcore_config[lcore_id].socket_id = socket_id;
>   		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>   				"core %u on socket %u\n",
>   				lcore_id, lcore_config[lcore_id].core_id,
> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>   		RTE_MAX_LCORE);
>   	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>
> +	config->numa_node_count = max_socket_id + 1;

In some IBM servers, socket ID number does not seem to be in sequence. 
For an instance, 0 and 8 for a 2 node server.

In this case, numa_node_count would mislead users if wrongly understood 
by its variable name IMO (see below)
> +	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);

For an instance, reading above message would tell 'EAL detected 8 nodes' 
in my server, but actually there are only two nodes.

Could its name better be 'numa_node_id_max' ?. Also, we store in actual 
count of numa nodes in _count variable.

Also, there could be a case when there is no local memory available to a 
numa node too.

Thanks,
Gowrishankar
> +
>   	return 0;
>   }
> +
> +unsigned int
> +rte_num_sockets(void)
> +{
> +	const struct rte_config *config = rte_eal_get_configuration();
> +	return config->numa_node_count;
> +}
> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index 08c6637..63fcc2e 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -57,6 +57,7 @@ enum rte_proc_type_t {
>   struct rte_config {
>   	uint32_t master_lcore;       /**< Id of the master lcore */
>   	uint32_t lcore_count;        /**< Number of available logical cores. */
> +	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
>   	uint32_t service_lcore_count;/**< Number of available service cores. */
>   	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
>
> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> index d84bcff..ddf4c64 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -120,6 +120,14 @@ rte_lcore_index(int lcore_id)
>   unsigned rte_socket_id(void);
>
>   /**
> + * Return number of physical sockets on the system.
> + * @return
> + *   the number of physical sockets as recognized by EAL
> + *
> + */
> +unsigned int rte_num_sockets(void);
> +
> +/**
>    * Get the ID of the physical socket of the specified lcore
>    *
>    * @param lcore_id
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index 7e5bbe8..b9c7727 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
>   EXPORT_MAP := ../../rte_eal_version.map
>   VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
>
> -LIBABIVER := 6
> +LIBABIVER := 7
>
>   VPATH += $(RTE_SDK)/lib/librte_eal/common
>
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 4146907..fc83e74 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -211,6 +211,13 @@ DPDK_18.02 {
>
>   }  DPDK_17.11;
>
> +DPDK_18.05 {
> +	global:
> +
> +	rte_num_sockets;
> +
> +} DPDK_18.02;
> +
>   EXPERIMENTAL {
>   	global:
>
> @@ -255,4 +262,4 @@ EXPERIMENTAL {
>   	rte_service_set_stats_enable;
>   	rte_service_start_with_defaults;
>
> -} DPDK_18.02;
> +} DPDK_18.05;

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  2018-03-20 16:37  4% ` Pattan, Reshma
@ 2018-03-21  2:28  4%   ` Tan, Jianfeng
  2018-03-21  9:55  3%     ` Pattan, Reshma
  0 siblings, 1 reply; 200+ results
From: Tan, Jianfeng @ 2018-03-21  2:28 UTC (permalink / raw)
  To: Pattan, Reshma, dev

Thank you for the comments!

> -----Original Message-----
> From: Pattan, Reshma
> Sent: Wednesday, March 21, 2018 12:38 AM
> To: Tan, Jianfeng; dev@dpdk.org
> Subject: RE: [PATCH] pdump: change to use generic multi-process channel
> 
> Hi,
> 
> > -----Original Message-----
> > From: Tan, Jianfeng
> > Sent: Sunday, March 4, 2018 3:04 PM
> > To: dev@dpdk.org
> > Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Pattan, Reshma
> > <reshma.pattan@intel.com>
> > Subject: [PATCH] pdump: change to use generic multi-process channel
> >
> > The original code replies on the private channel for primary and secondary
> > communication. Change to use the generic multi-process channel.
> >
> > Note with this change, dpdk-pdump will be not compatible with old version
> > DPDK applications.
> >
> 
> Is this the correct time to make this change? As I see the rte_mp APIs are still
> in experimental stage?
> If you wish to change them,
> 
> 1) I feel ABI breakage has to be addressed  first for change in rte_pdump_init() . 
> 2)ABI notice for removal of the rte_pdump_set_socket_dir()  and then remove it completely .

This patch itself does not break any ABI. It just puts parameters of rte_pdump_init() not used. And make rte_pdump_set_socket_dir() as a dummy function.

> 3)Need to do cleanup of the code app/dpdk-pdump.

Yes, I understand it's a normal process to announce deprecation firstly, and then do the change.

But here is the thing, with generic mp introduced, we will not be compatible with DPDK versions.
So we want to unify the use of generic mp channel in this release for vfio, pdump, vdev, memory rework.
And in fact, ABI/API changes could be delayed to later releases.

> 4)After all the changes we need to make sure dpdk-pdump works fine
> without breaking the functionality, validation team should be able to help.

I have done a simple test of pdump. Can you suggest where can I get the comprehensive test cases?

> 5)Replace strcpy  with snprintf.

Will do.

Thanks,
Jianfeng

> 
> Thanks,
> Reshma
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 22:49  3%     ` Thomas Monjalon
@ 2018-03-20 23:04  0%       ` Arnon Warshavsky
  2018-03-21  8:21  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Arnon Warshavsky @ 2018-03-20 23:04 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

>
> You are talking about API, and I agree the old applications can keep
> considering the functions as void.
> But I was talking about ABI, meaning: can we use an old application
> without recompiling and update only the DPDK (in .so file)?
>
>
> You are right of course. Once again I mixed the two..
I will modify accordingly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 22:42  0%   ` Arnon Warshavsky
@ 2018-03-20 22:49  3%     ` Thomas Monjalon
  2018-03-20 23:04  0%       ` Arnon Warshavsky
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-20 22:49 UTC (permalink / raw)
  To: Arnon Warshavsky
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

20/03/2018 23:42, Arnon Warshavsky:
> > Thanks for working on this important topic.
> 
> With pleasure :)
> 
> > My feeling is that we could replace most of them by a log + return.
> > I did not think you would add a new macro. Why you chose this way?
> 
> This was meant to keep the code shorter, and imply to the reader that this
> return is actually meant to be fatal
> >
> > >   I would like to define a device health state that can be monitored from
> > >   the side,and this will be an independant patch.
> >
> > You mean when a device become unusable?
> 
> Yes. Obviously not a simple issue, but essential for refraining from panic
> in the interrupt/data-path context,
> while allowing to detect and execute on the slow/management path.
> 
> > > - Some previously panicing void functions where changed to return a
> > > value, with callers modified accordingly.
> >
> > If the function is exposed to the application, I think it is an ABI change
> > and should follow the deprecation process.
> 
> In this case I thought there would be no actual change for the user as the
> transition is from returning void to int,
> and existing calls should continue to behave as before (except for not
> crashing)

You are talking about API, and I agree the old applications can keep
considering the functions as void.
But I was talking about ABI, meaning: can we use an old application
without recompiling and update only the DPDK (in .so file)?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 22:04  3% ` Thomas Monjalon
@ 2018-03-20 22:42  0%   ` Arnon Warshavsky
  2018-03-20 22:49  3%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Arnon Warshavsky @ 2018-03-20 22:42 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

> Thanks for working on this important topic.
> With pleasure :)
>
 ...

> My feeling is that we could replace most of them by a log + return.
> I did not think you would add a new macro. Why you chose this way?
> This was meant to keep the code shorter, and imply to the reader that this
> return is actually meant to be fatal
>
...
> >   I would like to define a device health state that can be monitored from
> >   the side,and this will be an independant patch.
>
> You mean when a device become unusable?
>
Yes. Obviously not a simple issue, but essential for refraining from panic
in the interrupt/data-path context,
while allowing to detect and execute on the slow/management path.

>
> ..
> > - Some previously panicing void functions where changed to return a
> value,
> >   with callers modified accordingly.
>
> If the function is exposed to the application, I think it is an ABI change
> and should follow the deprecation process.
>
In this case I thought there would be no actual change for the user as the
transition is from returning void to int,
and existing calls should continue to behave as before (except for not
crashing)
Following references to these functions, they all seem to comply to this
rule, but I guess that's what reviews are for :)


>
>
> /Arnon

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  @ 2018-03-20 22:04  3% ` Thomas Monjalon
  2018-03-20 22:42  0%   ` Arnon Warshavsky
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-20 22:04 UTC (permalink / raw)
  To: Arnon Warshavsky
  Cc: anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit, dev

Hi,

20/03/2018 22:28, Arnon Warshavsky:
> The purpose of this patch is to cleanup the library code
> from paths that end up aborting the process,
> and move to checking error values, in order to allow the running process
> perform an orderly teardown or other mitigation of the event.

Thanks for working on this important topic.

> This patch modifies the majority of rte_panic calls under lib and drivers,
> and replaces them with a new variation of rte_panic macro
> that does not abort and returns an error value
> that can be propagated up the call stack.

My feeling is that we could replace most of them by a log + return.
I did not think you would add a new macro. Why you chose this way?

> - Focus was given to the dpdk initialization path
> - Some of the panic calls within drivers were left in place where
>   the call is from within an interrupt or calls that are on the data path,
>   where there is no simple applicative route to propagate
>   the error to temination.
>   These should be handled by the driver maintainers.

Yes, better to let driver maintainers decide if you are not sure.

>   I would like to define a device health state that can be monitored from
>   the side,and this will be an independant patch.

You mean when a device become unusable?

> - No change took place in example and test files

Yes, panic/exit is allowed in applications.

> - No change took place for debug assertions calling panic

Yes, debug assert is a special case.

> - Some previously panicing void functions where changed to return a value,
>   with callers modified accordingly.

If the function is exposed to the application, I think it is an ABI change
and should follow the deprecation process.

> An additional independant patch to devtools/checkpatches.sh
> will be submitted in order to prevent new additions of calls to rte_panic
> under lib and drivers.

Yes please! +1 for an automatic check.

> Keep calm and don't panic.

Sure :)

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce ethdev CRC strip flag deprecation
  @ 2018-03-20 17:23  3%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-20 17:23 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Neil Horman, John McNamara, Marko Kovacevic, dev

On 3/20/2018 11:35 AM, Thomas Monjalon wrote:
> 20/03/2018 12:26, Ferruh Yigit:
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> +* ethdev: Make CRC stript default behavior without any flag required and add a
> 
> s/stript/stripping/
> 
>> +  new offload flag to let application request for keeping CRC if PMD reports
>> +  capability for it.
>> +  ``DEV_RX_OFFLOAD_CRC_STRIP`` flag will be removed.
>> +  ``DEV_RX_OFFLOAD_KEETP_CRC`` will be added.
> 
> s/KEETP/KEEP/
> 
> I think we should introduce the new flag without removing the old one
> for one release.
> Setting both flags would be an error.
> Setting no flag would mean stripping.
> So the CRC_STRIP flag would be just ignored by PMDs.
> 
> Opinions?

Introducing KEEP_CRC in this release is OK, since no PMD announces this
capability, it should be safe.

Since many PMD's offloading patches are not applied yet, deciding on "Setting no
flag would mean stripping" helps to get this correct at first place for them.
Only this is ABI break, but I am not aware of any PMD that implements not
stripping CRC case so I hope this is acceptable.

If there is no objection, I will update notice to only remove flag in next release.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  @ 2018-03-20 16:37  4% ` Pattan, Reshma
  2018-03-21  2:28  4%   ` Tan, Jianfeng
  0 siblings, 1 reply; 200+ results
From: Pattan, Reshma @ 2018-03-20 16:37 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

Hi,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Sunday, March 4, 2018 3:04 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Pattan, Reshma
> <reshma.pattan@intel.com>
> Subject: [PATCH] pdump: change to use generic multi-process channel
> 
> The original code replies on the private channel for primary and secondary
> communication. Change to use the generic multi-process channel.
> 
> Note with this change, dpdk-pdump will be not compatible with old version
> DPDK applications.
> 

Is this the correct time to make this change? As I see the rte_mp APIs are still in experimental stage?
If you wish to change them,

1) I feel ABI breakage has to be addressed  first for change in  rte_pdump_init() .
2)ABI notice for removal of the rte_pdump_set_socket_dir()  and then remove it completely . 
3)Need to do cleanup of the code app/dpdk-pdump. 
4)After all the changes we need to make sure dpdk-pdump works fine without breaking the functionality, validation team should be able to help.
5)Replace strcpy  with snprintf.

Thanks,
Reshma

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 19:06  0%           ` Neil Horman
@ 2018-03-20 15:51  0%             ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-20 15:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On 3/9/2018 7:06 PM, Neil Horman wrote:
> On Fri, Mar 09, 2018 at 03:45:49PM +0000, Ferruh Yigit wrote:
>> On 3/9/2018 3:16 PM, Neil Horman wrote:
>>> On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
>>>> On 3/9/2018 12:36 PM, Neil Horman wrote:
>>>>> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
>>>>>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
>>>>>> used as named opaque type.
>>>>>>
>>>>>> So the functions that are adding callbacks can return objects in this
>>>>>> type instead of void pointer.
>>>>>>
>>>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>>> ---
>>>>>> v2:
>>>>>> * keep using struct * in parameters, instead add callback functions
>>>>>> return struct rte_eth_rxtx_callback pointer.
>>>>>>
>>>>>> v4:
>>>>>> * Remove deprecation notice. LIBABIVER already increased in this release
>>>>>> ---
>>>>>>  doc/guides/rel_notes/deprecation.rst |  7 -------
>>>>>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
>>>>>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
>>>>>>  3 files changed, 11 insertions(+), 15 deletions(-)
>>>>>>
>>>>> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
>>>>> internal data structure, then it shouldn't be used as part of the prototype for
>>>>> an exported function, as the structure will then no longer be a internal data
>>>>> structure, but rather part of the public ABI.
>>>>
>>>> "struct rte_eth_rxtx_callback" is internal data structure. And application
>>>> should not access elements of this structure.
>>>>
>>>> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
>>>> can use it as opaque type.
>>>>
>>>> It is possible that both "add" and "remove" APIs use "void *" and API itself can
>>>> cast it. But the inconsistency was "add" related APIs return "void *" and
>>>> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
>>>>
>>>> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
>>>> "void *", because named opaque type documents intention/usage better.
>>>>
>>>> Thanks,
>>>> ferruh
>>>>
>>> I get what you're saying about rte_eth_rxtx_callback being an internals
>>> structure (or its intent is to be an internal structure), but it doesn't seem to
>>> hold up to the header file layout.  rte_eth_rxtx_callback is defined in
>>> rte_ethdev_core.h which according to the makefile, is listed as a symlinked
>>> file, and therefore available for external applications to include.  This
>>> negates the intended opaque nature of the struct.  I think before you do this,
>>> you want to rectify that.
>>
>> Intention is to make "struct rte_eth_rxtx_callback" internal, but as you said it
>> is available to applications. This is same for all data structures in
>> rte_ethdev_core.h
>>
> Well...yes.  Thats what I said
> 
>> Unfortunately it can't be actual internal because of inline functions in public
>> header uses them. And we can't change inline functions because of performance
>> concerns.
>>
> I'm sorry, thats not ok with me.  Just declaring a data structure to be
> internal-only without enforcing that is asking for applications to mangle
> internal data, and theres no reason it can't be fixed (and done without
> sacrificing performance).

Currently that is what blocking us, mainly rte_eth_[rt]x_burst() inline
functions. Is there a way to fix it without loosing performance?

> 
>> Since we can't make the structure real internal, we can't really prevent
>> applications to access the internals, this same if you use "void *".
>>
> Just typedef a void pointer to some rte_ethdev_cb_handle_t type and pass that
> back and forth instead.  That at least hides the fact that you are using a non
> opaque structure from user applications without some intentional casting.  You
> can further lock the call down by declaring the handles const so that no one
> tries to dereference or modify them without generating a warning.

Even handle won't work if user really wants to update it. This may highlight the
intention but I believe moving struct to ethdev_core.h already highlights the
intention.

Related to the const qualifier I was thinking if remove functions needs to
update the struct, but it doesn't seems the case, I will add them.

> 
> Neil
> 
>>>
>>> Neil
>>>
>>>>>
>>>>> Neil
>>>>>
>>>>
>>>>
>>
>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-19 17:03  0%       ` Olivier Matz
@ 2018-03-20 14:41  0%         ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-03-20 14:41 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Andrew Rybchenko, dev

On Mon, Mar 19, 2018 at 06:03:52PM +0100, Olivier Matz wrote:
> On Sat, Mar 10, 2018 at 03:39:34PM +0000, Andrew Rybchenko wrote:
> > Size of memory chunk required to populate mempool objects depends
> > on how objects are stored in the memory. Different mempool drivers
> > may have different requirements and a new operation allows to
> > calculate memory size in accordance with driver requirements and
> > advertise requirements on minimum memory chunk size and alignment
> > in a generic way.
> > 
> > Bump ABI version since the patch breaks it.
> > 
> > Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> Looks good to me. Just see below for few minor comments.
> 
> > ---
> > RFCv2 -> v1:
> >  - move default calc_mem_size callback to rte_mempool_ops_default.c
> >  - add ABI changes to release notes
> >  - name default callback consistently: rte_mempool_op_<callback>_default()
> >  - bump ABI version since it is the first patch which breaks ABI
> >  - describe default callback behaviour in details
> >  - avoid introduction of internal function to cope with depration
> 
> typo (depration)
> 
> >    (keep it to deprecation patch)
> >  - move cache-line or page boundary chunk alignment to default callback
> >  - highlight that min_chunk_size and align parameters are output only
> 
> [...]
> 
> > --- a/lib/librte_mempool/Makefile
> > +++ b/lib/librte_mempool/Makefile
> > @@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
> >  
> >  EXPORT_MAP := rte_mempool_version.map
> >  
> > -LIBABIVER := 3
> > +LIBABIVER := 4
> >  
> >  # all source are stored in SRCS-y
> >  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
> >  # install includes
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
> >  
> > diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
> > index 7a4f3da..9e3b527 100644
> > --- a/lib/librte_mempool/meson.build
> > +++ b/lib/librte_mempool/meson.build
> > @@ -1,7 +1,8 @@
> >  # SPDX-License-Identifier: BSD-3-Clause
> >  # Copyright(c) 2017 Intel Corporation
> >  
> > -version = 2
> > -sources = files('rte_mempool.c', 'rte_mempool_ops.c')
> > +version = 4
> > +sources = files('rte_mempool.c', 'rte_mempool_ops.c',
> > +		'rte_mempool_ops_default.c')
> >  headers = files('rte_mempool.h')
> >  deps += ['ring']
> 
> It's strange to see that meson does not have the same
> .so version than the legacy build system.
> 
> +CC Bruce in case he wants to fix this issue separately.
>
The so version drift occurred during the development of the next-build
tree, sadly. While initially all version were correct, as the patches
flowed into mainline I wasn't able to keep up with all the version changed.
:-(
Since nobody is actually using meson for packaging (yet), I'm not sure this
is critical, so I don't mind whether it's fixed in a separate patch or not.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 13/41] eal: replace memseg with memseg lists
  2018-03-19 17:39  3%   ` Olivier Matz
@ 2018-03-20  9:47  4%     ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-20  9:47 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Thomas Monjalon, Yuanhan Liu, Maxime Coquelin, Tiwei Bie,
	keith.wiles, jianfeng.tan, andras.kovacs, laszlo.vadkeri,
	benjamin.walker, bruce.richardson, konstantin.ananyev,
	kuralamudhan.ramakrishnan, louise.m.daly, nelio.laranjeiro,
	yskoh, pepperjo, jerin.jacob, hemant.agrawal

On 19-Mar-18 5:39 PM, Olivier Matz wrote:
> On Sat, Mar 03, 2018 at 01:46:01PM +0000, Anatoly Burakov wrote:
> 
> [...]
> 
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -61,7 +61,20 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>>   CONFIG_RTE_LIBRTE_EAL=y
>>   CONFIG_RTE_MAX_LCORE=128
>>   CONFIG_RTE_MAX_NUMA_NODES=8
>> -CONFIG_RTE_MAX_MEMSEG=256
>> +CONFIG_RTE_MAX_MEMSEG_LISTS=32
>> +# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
>> +# or RTE_MAX_MEM_PER_LIST gigabytes worth of memory, whichever is the smallest
>> +CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
>> +CONFIG_RTE_MAX_MEM_PER_LIST=32
>> +# a "type" is a combination of page size and NUMA node. total number of memseg
>> +# lists per type will be limited to either RTE_MAX_MEMSEG_PER_TYPE pages (split
>> +# over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or RTE_MAX_MEM_PER_TYPE
>> +# gigabytes of memory (split over multiple lists of RTE_MAX_MEM_PER_LIST),
>> +# whichever is the smallest
>> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
>> +CONFIG_RTE_MAX_MEM_PER_TYPE=128
>> +# legacy mem mode only
>> +CONFIG_RTE_MAX_LEGACY_MEMSEG=256
> 
> Would it be possible to suffix CONFIG_RTE_MAX_MEM_PER_LIST and
> CONFIG_RTE_MAX_MEM_PER_TYPE with _GB? It's not that obvious that is it
> gigabytes.

Sure, will add this.

> 
> What is the impact of changing one of these values on the ABI?

Some of them will change the ABI, some won't. MAX_MEMSEG_LISTS will 
change the ABI because it's in the rte_eal_memconfig, but other values 
are not and are only used during init (and LEGACY_MEMSEG is already 
removed in GitHub code).

> And what would be the impact on performance?

Depending on what you mean by performance. Generally, no impact on 
performance will be noticeable because we're not really doing anything 
differently - a page is a page, no matter how or when it is mapped. 
These changes might also speed up some lookup operations on memseg lists 
themselves.

> The underlying question is: shall we increase these values to avoid changing them later?
> 

I do plan to increase the MAX_MEMSEG_LISTS value to at least 64.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 13/41] eal: replace memseg with memseg lists
  @ 2018-03-19 17:39  3%   ` Olivier Matz
  2018-03-20  9:47  4%     ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-03-19 17:39 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Thomas Monjalon, Yuanhan Liu, Maxime Coquelin, Tiwei Bie,
	keith.wiles, jianfeng.tan, andras.kovacs, laszlo.vadkeri,
	benjamin.walker, bruce.richardson, konstantin.ananyev,
	kuralamudhan.ramakrishnan, louise.m.daly, nelio.laranjeiro,
	yskoh, pepperjo, jerin.jacob, hemant.agrawal

On Sat, Mar 03, 2018 at 01:46:01PM +0000, Anatoly Burakov wrote:

[...]

> --- a/config/common_base
> +++ b/config/common_base
> @@ -61,7 +61,20 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>  CONFIG_RTE_LIBRTE_EAL=y
>  CONFIG_RTE_MAX_LCORE=128
>  CONFIG_RTE_MAX_NUMA_NODES=8
> -CONFIG_RTE_MAX_MEMSEG=256
> +CONFIG_RTE_MAX_MEMSEG_LISTS=32
> +# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
> +# or RTE_MAX_MEM_PER_LIST gigabytes worth of memory, whichever is the smallest
> +CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
> +CONFIG_RTE_MAX_MEM_PER_LIST=32
> +# a "type" is a combination of page size and NUMA node. total number of memseg
> +# lists per type will be limited to either RTE_MAX_MEMSEG_PER_TYPE pages (split
> +# over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or RTE_MAX_MEM_PER_TYPE
> +# gigabytes of memory (split over multiple lists of RTE_MAX_MEM_PER_LIST),
> +# whichever is the smallest
> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
> +CONFIG_RTE_MAX_MEM_PER_TYPE=128
> +# legacy mem mode only
> +CONFIG_RTE_MAX_LEGACY_MEMSEG=256

Would it be possible to suffix CONFIG_RTE_MAX_MEM_PER_LIST and
CONFIG_RTE_MAX_MEM_PER_TYPE with _GB? It's not that obvious that is it
gigabytes.

What is the impact of changing one of these values on the ABI? And what
would be the impact on performance? The underlying question is: shall we
increase these values to avoid changing them later?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-10 15:39  7%     ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-11 12:51  0%       ` santosh
@ 2018-03-19 17:03  0%       ` Olivier Matz
  2018-03-20 14:41  0%         ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Olivier Matz @ 2018-03-19 17:03 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Bruce Richardson

On Sat, Mar 10, 2018 at 03:39:34PM +0000, Andrew Rybchenko wrote:
> Size of memory chunk required to populate mempool objects depends
> on how objects are stored in the memory. Different mempool drivers
> may have different requirements and a new operation allows to
> calculate memory size in accordance with driver requirements and
> advertise requirements on minimum memory chunk size and alignment
> in a generic way.
> 
> Bump ABI version since the patch breaks it.
> 
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>

Looks good to me. Just see below for few minor comments.

> ---
> RFCv2 -> v1:
>  - move default calc_mem_size callback to rte_mempool_ops_default.c
>  - add ABI changes to release notes
>  - name default callback consistently: rte_mempool_op_<callback>_default()
>  - bump ABI version since it is the first patch which breaks ABI
>  - describe default callback behaviour in details
>  - avoid introduction of internal function to cope with depration

typo (depration)

>    (keep it to deprecation patch)
>  - move cache-line or page boundary chunk alignment to default callback
>  - highlight that min_chunk_size and align parameters are output only

[...]

> --- a/lib/librte_mempool/Makefile
> +++ b/lib/librte_mempool/Makefile
> @@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
>  
>  EXPORT_MAP := rte_mempool_version.map
>  
> -LIBABIVER := 3
> +LIBABIVER := 4
>  
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>  
> diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
> index 7a4f3da..9e3b527 100644
> --- a/lib/librte_mempool/meson.build
> +++ b/lib/librte_mempool/meson.build
> @@ -1,7 +1,8 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2017 Intel Corporation
>  
> -version = 2
> -sources = files('rte_mempool.c', 'rte_mempool_ops.c')
> +version = 4
> +sources = files('rte_mempool.c', 'rte_mempool_ops.c',
> +		'rte_mempool_ops_default.c')
>  headers = files('rte_mempool.h')
>  deps += ['ring']

It's strange to see that meson does not have the same
.so version than the legacy build system.

+CC Bruce in case he wants to fix this issue separately.

[...]

> --- a/lib/librte_mempool/rte_mempool_version.map
> +++ b/lib/librte_mempool/rte_mempool_version.map
> @@ -51,3 +51,11 @@ DPDK_17.11 {
>  	rte_mempool_populate_iova_tab;
>  
>  } DPDK_16.07;
> +
> +DPDK_18.05 {
> +	global:
> +
> +	rte_mempool_op_calc_mem_size_default;
> +
> +} DPDK_17.11;
> +

Another minor comment. When applying the patch with git am:

Applying: mempool: add op to calculate memory size to be allocated
.git/rebase-apply/patch:399: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                       ` (4 preceding siblings ...)
  2018-03-10 15:39  8%     ` [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area Andrew Rybchenko
@ 2018-03-19 17:03  0%     ` Olivier Matz
  5 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-03-19 17:03 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, Santosh Shukla, Jerin Jacob, Hemant Agrawal, Shreyansh Jain

Hi Andrew,

Thank you for this nice rework.
Globally, the patchset looks good to me. I'm sending some comments
as reply to specific patches.

On Sat, Mar 10, 2018 at 03:39:33PM +0000, Andrew Rybchenko wrote:
> The initial patch series [1] is split into two to simplify processing.
> The second series relies on this one and will add bucket mempool driver
> and related ops.
> 
> The patch series has generic enhancements suggested by Olivier.
> Basically it adds driver callbacks to calculate required memory size and
> to populate objects using provided memory area. It allows to remove
> so-called capability flags used before to tell generic code how to
> allocate and slice allocated memory into mempool objects.
> Clean up which removes get_capabilities and register_memory_area is
> not strictly required, but I think right thing to do.
> Existing mempool drivers are updated.
> 
> I've kept rte_mempool_populate_iova_tab() intact since it seems to
> be not directly related XMEM API functions.

The function rte_mempool_populate_iova_tab() (actually, it was
rte_mempool_populate_phys_tab()) was introduced to support XMEM
API. In my opinion, it can also be deprecated.

> It breaks ABI since changes rte_mempool_ops. Also it removes
> rte_mempool_ops_register_memory_area() and
> rte_mempool_ops_get_capabilities() since corresponding callbacks are
> removed.
> 
> Internal global functions are not listed in map file since it is not
> a part of external API.
> 
> [1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
> 
> RFCv1 -> RFCv2:
>   - add driver ops to calculate required memory size and populate
>     mempool objects, remove extra flags which were required before
>     to control it
>   - transition of octeontx and dpaa drivers to the new callbacks
>   - change info API to get information from driver required to
>     API user to know contiguous block size
>   - remove get_capabilities (not required any more and may be
>     substituted with more in info get API)
>   - remove register_memory_area since it is substituted with
>     populate callback which can do more
>   - use SPDX tags
>   - avoid all objects affinity to single lcore
>   - fix bucket get_count
>   - deprecate XMEM API
>   - avoid introduction of a new function to flush cache
>   - fix NO_CACHE_ALIGN case in bucket mempool
> 
> RFCv2 -> v1:
>   - split the series in two
>   - squash octeontx patches which implement calc_mem_size and populate
>     callbacks into the patch which removes get_capabilities since it is
>     the easiest way to untangle the tangle of tightly related library
>     functions and flags advertised by the driver
>   - consistently name default callbacks
>   - move default callbacks to dedicated file
>   - see detailed description in patches
> 
> Andrew Rybchenko (7):
>   mempool: add op to calculate memory size to be allocated
>   mempool: add op to populate objects using provided memory
>   mempool: remove callback to get capabilities
>   mempool: deprecate xmem functions
>   mempool/octeontx: prepare to remove register memory area op
>   mempool/dpaa: prepare to remove register memory area op
>   mempool: remove callback to register memory area
> 
> Artem V. Andreev (2):
>   mempool: ensure the mempool is initialized before populating
>   mempool: support flushing the default cache of the mempool
> 
>  doc/guides/rel_notes/deprecation.rst            |  12 +-
>  doc/guides/rel_notes/release_18_05.rst          |  32 ++-
>  drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
>  drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
>  lib/librte_mempool/Makefile                     |   3 +-
>  lib/librte_mempool/meson.build                  |   5 +-
>  lib/librte_mempool/rte_mempool.c                | 159 +++++++--------
>  lib/librte_mempool/rte_mempool.h                | 260 +++++++++++++++++-------
>  lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
>  lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
>  lib/librte_mempool/rte_mempool_version.map      |  11 +-
>  test/test/test_mempool.c                        |  31 ---
>  12 files changed, 437 insertions(+), 241 deletions(-)
>  create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c
> 
> -- 
> 2.7.4
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-15 15:08  0%           ` Zhang, Qi Z
@ 2018-03-15 15:38  0%             ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-03-15 15:38 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Thursday, March 15, 2018 3:09 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, March 15, 2018 9:17 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> >
> > Hi Qi,
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z
> > > Sent: Thursday, March 15, 2018 3:14 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > > thomas@monjalon.net
> > > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > > setup
> > >
> > > Hi Konstantin:
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Wednesday, March 14, 2018 8:32 PM
> > > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > > > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang,
> > > > Qi Z <qi.z.zhang@intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > > > setup
> > > >
> > > > Hi Qi,
> > > >
> > > > >
> > > > > The patch let etherdev driver expose the capability flag through
> > > > > rte_eth_dev_info_get when it support deferred queue configuraiton,
> > > > > then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> > > > > continue to setup the queue or just return fail when device
> > > > > already started.
> > > > >
> > > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > > ---
> > > > >  doc/guides/nics/features.rst  |  8 ++++++++
> > > > > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > > > > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> > > > >  3 files changed, 37 insertions(+), 12 deletions(-)
> > > > >
> > > > > diff --git a/doc/guides/nics/features.rst
> > > > > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > > > > --- a/doc/guides/nics/features.rst
> > > > > +++ b/doc/guides/nics/features.rst
> > > > > @@ -892,7 +892,15 @@ Documentation describes performance
> > values.
> > > > >
> > > > >  See ``dpdk.org/doc/perf/*``.
> > > > >
> > > > > +.. _nic_features_queue_deferred_setup_capabilities:
> > > > >
> > > > > +Queue deferred setup capabilities
> > > > > +---------------------------------
> > > > > +
> > > > > +Supports queue setup / release after device started.
> > > > > +
> > > > > +* **[provides] rte_eth_dev_info**:
> > > > >
> > > >
> > ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFE
> > > > RRED_
> > > > > TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELE
> > > > > ASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > > > > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> > > > >
> > > > >  .. _nic_features_other:
> > > > >
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> > > > uint16_t rx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > -	if (dev->data->dev_started) {
> > > > > -		RTE_PMD_DEBUG_TRACE(
> > > > > -		    "port %d must be stopped to allow configuration\n",
> > port_id);
> > > > > -		return -EBUSY;
> > > > > -	}
> > > > > -
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > > -ENOTSUP);
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
> > > > -ENOTSUP);
> > > > >
> > > > > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t
> > port_id,
> > > > uint16_t rx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > +	if (dev->data->dev_started &&
> > > > > +		!(dev_info.deferred_queue_config_capa &
> > > > > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > > > > +		return -EINVAL;
> > > > > +
> > > >
> > > > I think now you have to check here that the queue is stopped.
> > > > Otherwise you might attempt to reconfigure running queue.
> > >
> > > I'm not sure if it's necessary to let application use different API sequence
> > for a deferred configure and deferred re-configure.
> > > Can we just call dev_ops->rx_queue_stop before rx_queue_release here
> >
> > I don't follow you here.
> > Let say now inside queue_start() we do check:
> >
> > if (dev->data->rx_queue_state[rx_queue_id] !=
> > RTE_ETH_QUEUE_STATE_STOPPED)
> >
> > Right now it is not possible to call queue_setup() without dev_stop() before
> > it - that's why we have check if (dev->data->dev_started) in queue_setup()
> > right now.
> > Though with your patch it not the case anymore - user is able to call
> > queue_setup() without stopping the whole device.
> > But he still has to stop the queue.
> 
> >
> > >
> > > >
> > > >
> > > > >  	rxq = dev->data->rx_queues;
> > > > >  	if (rxq[rx_queue_id]) {
> > > > >
> > 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > > > >  					-ENOTSUP);
> > > >
> > > > I don't think it is *that* straightforward.
> > > > rx_queue_setup() parameters can imply different rx function (and
> > > > related dev
> > > > icesettings) that are already setuped by previous
> > queue_setup()/dev_start.
> > > > So I think you need to do one of 2 things:
> > > > 1. rework ethdev layer to introduce a separate rx function (and
> > > > related
> > > > settings) for each queue.
> > > > 2. at rx_queue_setup() if it is invoked after dev_start - check that
> > > > given queue settings wouldn't contradict with current device
> > > > settings  (rx function, etc.).
> > > > If they do - return an error.
> > > Yes, I think what we have is option 2 here, the
> > > dev_ops->rx_queue_setup will return fail if conflict with previous
> > > setting
> >
> > Hmm and what makes you think that?
> > As I know it is not the case  right now.
> > Let say I do:
> >     ....
> >    rx_queue_setup(port=0,queue=0, mp=mb_size_2048);
> >    dev_start(port=0);
> >    ...
> >    rx_queue_setup(port=0,queue=1,mp=mb_size_1024);
> >
> >  If current rx function doesn't support multi-segs then second
> > rx_queue_setup() should fail.
> >  Though I don't think that would happen with the current implementation.
> 
> Why you think that would not happen? dev_ops->rx_queue_setup can fail, right?
> I mean it's the responsibility of low level driver (i40e) to check the conflict with current implementation.

Yes it is responsibility if the PMD because only it knows its own logic of rx/tx function selection.
But I don't see such changes in i40e in your patch series.
Probably I missed them?

> >
> > Same story for TX offloads, though it probably not that critical, as for most
> > Intel PMDs HW TX offloads will become per port in 18.05.
> >
> > As I can see you do have either of these options implemented right  now -
> > that's the problem.
> >
> > > I'm also thinking about option 1, the idea is to move per queue rx/tx
> > function into driver layer, so it will not break existing API.
> > >
> > > 1. driver can expose the capability like per_queue_rx or per_queue_tx
> > > 2. application can enable this capability by dev_config with
> > > rte_eth_conf 3, if per_queue_rx is not enable, nothing change, so we
> > > are at option 2 4. if per_queue_rx is enabled, driver will set
> > > rx_pkt_burst with a hook function which redirect to an function ptr in
> > > a per queue rx function tables ( I guess performance is impacted
> > > somehow, but this is the cost if you want different offload for
> > > different queue)
> >
> > I don't think we need to overcomplicate things here.
> > It should be transparent to the user - user just calls queue_setup() - based on
> > its input parameters PMD selects a function that fits best.
> > Pretty much what we have right now, just possibly have an array of functions
> > (one per queue).
> 
> If we don't introduce a new capability or something like, but just take per queue functions as default way,
> does that mean, we need to change all drivers to adapt this?
> Or do you mean below?
> 
> If (dev->rx_pkt_burst)
> 	/* default way */
> else
> 	/* per queue function */

For me either way seems ok.
Second one probably a bit easier, as no changes from PMDs are required.
But again - might be even rte_ethdev layer can fill queue's rx_pkt_burst[] array
for the drivers that don't support it - just by copying dev->rx_pkt_burst into it.
Konstantin 

> 
> Regards
> Qi
> 
> >
> > >
> > > >
> > > > From my perspective - 1) is a better choice though it required more
> > > > work, and possibly ABI breakage.
> > > > I did some work in that direction as RFC:
> > > > http://dpdk.org/dev/patchwork/patch/31866/
> > >
> > > I will learn this, thanks for the heads up.
> > > >
> > > > 2) might be also possible, but looks a bit clumsy as
> > > > rx_queue_setup() might now fail even with valid parameters - all
> > > > depends on previous queue configurations.
> > > >
> > > > Same story applies for TX.
> > > >
> > > >
> > > > > +		if (dev->data->dev_started &&
> > > > > +			!(dev_info.deferred_queue_config_capa &
> > > > > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > > > > +			return -EINVAL;
> > > > >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > > > >  		rxq[rx_queue_id] = NULL;
> > > > >  	}
> > > > > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> > > > uint16_t tx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > -	if (dev->data->dev_started) {
> > > > > -		RTE_PMD_DEBUG_TRACE(
> > > > > -		    "port %d must be stopped to allow configuration\n",
> > port_id);
> > > > > -		return -EBUSY;
> > > > > -	}
> > > > > -
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > > -ENOTSUP);
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup,
> > > > -ENOTSUP);
> > > > >
> > > > > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t
> > port_id,
> > > > uint16_t tx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > +	if (dev->data->dev_started &&
> > > > > +		!(dev_info.deferred_queue_config_capa &
> > > > > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > > > > +		return -EINVAL;
> > > > > +
> > > > >  	txq = dev->data->tx_queues;
> > > > >  	if (txq[tx_queue_id]) {
> > > > >
> > 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > > > >  					-ENOTSUP);
> > > > > +		if (dev->data->dev_started &&
> > > > > +			!(dev_info.deferred_queue_config_capa &
> > > > > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > > > > +			return -EINVAL;
> > > > >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > > > >  		txq[tx_queue_id] = NULL;
> > > > >  	}
> > > > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > > > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.h
> > > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> > > > >   */
> > > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > > >
> > > > > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001 /**<
> > Deferred
> > > > setup rx
> > > > > +queue */ #define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> > /**<
> > > > Deferred
> > > > > +setup tx queue */ #define DEV_DEFERRED_RX_QUEUE_RELEASE
> > > > 0x00000004
> > > > > +/**< Deferred release rx queue */ #define
> > > > > +DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008 /**< Deferred
> > release
> > > > tx
> > > > > +queue */
> > > > > +
> > > >
> > > > I don't think we do need flags for both setup a and release.
> > > > If runtime setup is supported - surely dynamic release should be
> > > > supported too.
> > > > Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.
> > >
> > > Agree
> > >
> > > Thanks
> > > Qi
> > >
> > > >
> > > > Konstantin
> > > >
> > > > >  /*
> > > > >   * If new Tx offload capabilities are defined, they also must be
> > > > >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > > > > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> > > > >  	/** Configured number of rx/tx queues */
> > > > >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> > > > >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > > > > +	uint64_t deferred_queue_config_capa;
> > > > > +	/**< queues can be setup/release after dev_start
> > > > > +(DEV_DEFERRED_). */
> > > > >  };
> > > > >
> > > > >  /**
> > > > > --
> > > > > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-15 13:16  0%         ` Ananyev, Konstantin
@ 2018-03-15 15:08  0%           ` Zhang, Qi Z
  2018-03-15 15:38  0%             ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Zhang, Qi Z @ 2018-03-15 15:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, March 15, 2018 9:17 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> Hi Qi,
> 
> > -----Original Message-----
> > From: Zhang, Qi Z
> > Sent: Thursday, March 15, 2018 3:14 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > thomas@monjalon.net
> > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > setup
> >
> > Hi Konstantin:
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Wednesday, March 14, 2018 8:32 PM
> > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang,
> > > Qi Z <qi.z.zhang@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > > setup
> > >
> > > Hi Qi,
> > >
> > > >
> > > > The patch let etherdev driver expose the capability flag through
> > > > rte_eth_dev_info_get when it support deferred queue configuraiton,
> > > > then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> > > > continue to setup the queue or just return fail when device
> > > > already started.
> > > >
> > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > ---
> > > >  doc/guides/nics/features.rst  |  8 ++++++++
> > > > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > > > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> > > >  3 files changed, 37 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/doc/guides/nics/features.rst
> > > > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > > > --- a/doc/guides/nics/features.rst
> > > > +++ b/doc/guides/nics/features.rst
> > > > @@ -892,7 +892,15 @@ Documentation describes performance
> values.
> > > >
> > > >  See ``dpdk.org/doc/perf/*``.
> > > >
> > > > +.. _nic_features_queue_deferred_setup_capabilities:
> > > >
> > > > +Queue deferred setup capabilities
> > > > +---------------------------------
> > > > +
> > > > +Supports queue setup / release after device started.
> > > > +
> > > > +* **[provides] rte_eth_dev_info**:
> > > >
> > >
> ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFE
> > > RRED_
> > > > TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELE
> > > > ASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > > > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> > > >
> > > >  .. _nic_features_other:
> > > >
> > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> > > uint16_t rx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > -	if (dev->data->dev_started) {
> > > > -		RTE_PMD_DEBUG_TRACE(
> > > > -		    "port %d must be stopped to allow configuration\n",
> port_id);
> > > > -		return -EBUSY;
> > > > -	}
> > > > -
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > -ENOTSUP);
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
> > > -ENOTSUP);
> > > >
> > > > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t
> port_id,
> > > uint16_t rx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > +	if (dev->data->dev_started &&
> > > > +		!(dev_info.deferred_queue_config_capa &
> > > > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > > > +		return -EINVAL;
> > > > +
> > >
> > > I think now you have to check here that the queue is stopped.
> > > Otherwise you might attempt to reconfigure running queue.
> >
> > I'm not sure if it's necessary to let application use different API sequence
> for a deferred configure and deferred re-configure.
> > Can we just call dev_ops->rx_queue_stop before rx_queue_release here
> 
> I don't follow you here.
> Let say now inside queue_start() we do check:
> 
> if (dev->data->rx_queue_state[rx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED)
> 
> Right now it is not possible to call queue_setup() without dev_stop() before
> it - that's why we have check if (dev->data->dev_started) in queue_setup()
> right now.
> Though with your patch it not the case anymore - user is able to call
> queue_setup() without stopping the whole device.
> But he still has to stop the queue.

> 
> >
> > >
> > >
> > > >  	rxq = dev->data->rx_queues;
> > > >  	if (rxq[rx_queue_id]) {
> > > >
> 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > > >  					-ENOTSUP);
> > >
> > > I don't think it is *that* straightforward.
> > > rx_queue_setup() parameters can imply different rx function (and
> > > related dev
> > > icesettings) that are already setuped by previous
> queue_setup()/dev_start.
> > > So I think you need to do one of 2 things:
> > > 1. rework ethdev layer to introduce a separate rx function (and
> > > related
> > > settings) for each queue.
> > > 2. at rx_queue_setup() if it is invoked after dev_start - check that
> > > given queue settings wouldn't contradict with current device
> > > settings  (rx function, etc.).
> > > If they do - return an error.
> > Yes, I think what we have is option 2 here, the
> > dev_ops->rx_queue_setup will return fail if conflict with previous
> > setting
> 
> Hmm and what makes you think that?
> As I know it is not the case  right now.
> Let say I do:
>     ....
>    rx_queue_setup(port=0,queue=0, mp=mb_size_2048);
>    dev_start(port=0);
>    ...
>    rx_queue_setup(port=0,queue=1,mp=mb_size_1024);
> 
>  If current rx function doesn't support multi-segs then second
> rx_queue_setup() should fail.
>  Though I don't think that would happen with the current implementation.

Why you think that would not happen? dev_ops->rx_queue_setup can fail, right?
I mean it's the responsibility of low level driver (i40e) to check the conflict with current implementation.
> 
> Same story for TX offloads, though it probably not that critical, as for most
> Intel PMDs HW TX offloads will become per port in 18.05.
> 
> As I can see you do have either of these options implemented right  now -
> that's the problem.
> 
> > I'm also thinking about option 1, the idea is to move per queue rx/tx
> function into driver layer, so it will not break existing API.
> >
> > 1. driver can expose the capability like per_queue_rx or per_queue_tx
> > 2. application can enable this capability by dev_config with
> > rte_eth_conf 3, if per_queue_rx is not enable, nothing change, so we
> > are at option 2 4. if per_queue_rx is enabled, driver will set
> > rx_pkt_burst with a hook function which redirect to an function ptr in
> > a per queue rx function tables ( I guess performance is impacted
> > somehow, but this is the cost if you want different offload for
> > different queue)
> 
> I don't think we need to overcomplicate things here.
> It should be transparent to the user - user just calls queue_setup() - based on
> its input parameters PMD selects a function that fits best.
> Pretty much what we have right now, just possibly have an array of functions
> (one per queue).

If we don't introduce a new capability or something like, but just take per queue functions as default way, 
does that mean, we need to change all drivers to adapt this?
Or do you mean below?

If (dev->rx_pkt_burst)
	/* default way */
else
	/* per queue function */

Regards
Qi

> 
> >
> > >
> > > From my perspective - 1) is a better choice though it required more
> > > work, and possibly ABI breakage.
> > > I did some work in that direction as RFC:
> > > http://dpdk.org/dev/patchwork/patch/31866/
> >
> > I will learn this, thanks for the heads up.
> > >
> > > 2) might be also possible, but looks a bit clumsy as
> > > rx_queue_setup() might now fail even with valid parameters - all
> > > depends on previous queue configurations.
> > >
> > > Same story applies for TX.
> > >
> > >
> > > > +		if (dev->data->dev_started &&
> > > > +			!(dev_info.deferred_queue_config_capa &
> > > > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > > > +			return -EINVAL;
> > > >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > > >  		rxq[rx_queue_id] = NULL;
> > > >  	}
> > > > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> > > uint16_t tx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > -	if (dev->data->dev_started) {
> > > > -		RTE_PMD_DEBUG_TRACE(
> > > > -		    "port %d must be stopped to allow configuration\n",
> port_id);
> > > > -		return -EBUSY;
> > > > -	}
> > > > -
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > -ENOTSUP);
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup,
> > > -ENOTSUP);
> > > >
> > > > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t
> port_id,
> > > uint16_t tx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > +	if (dev->data->dev_started &&
> > > > +		!(dev_info.deferred_queue_config_capa &
> > > > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > > > +		return -EINVAL;
> > > > +
> > > >  	txq = dev->data->tx_queues;
> > > >  	if (txq[tx_queue_id]) {
> > > >
> 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > > >  					-ENOTSUP);
> > > > +		if (dev->data->dev_started &&
> > > > +			!(dev_info.deferred_queue_config_capa &
> > > > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > > > +			return -EINVAL;
> > > >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > > >  		txq[tx_queue_id] = NULL;
> > > >  	}
> > > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > > > --- a/lib/librte_ether/rte_ethdev.h
> > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> > > >   */
> > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > >
> > > > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001 /**<
> Deferred
> > > setup rx
> > > > +queue */ #define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> /**<
> > > Deferred
> > > > +setup tx queue */ #define DEV_DEFERRED_RX_QUEUE_RELEASE
> > > 0x00000004
> > > > +/**< Deferred release rx queue */ #define
> > > > +DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008 /**< Deferred
> release
> > > tx
> > > > +queue */
> > > > +
> > >
> > > I don't think we do need flags for both setup a and release.
> > > If runtime setup is supported - surely dynamic release should be
> > > supported too.
> > > Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.
> >
> > Agree
> >
> > Thanks
> > Qi
> >
> > >
> > > Konstantin
> > >
> > > >  /*
> > > >   * If new Tx offload capabilities are defined, they also must be
> > > >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > > > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> > > >  	/** Configured number of rx/tx queues */
> > > >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> > > >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > > > +	uint64_t deferred_queue_config_capa;
> > > > +	/**< queues can be setup/release after dev_start
> > > > +(DEV_DEFERRED_). */
> > > >  };
> > > >
> > > >  /**
> > > > --
> > > > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-15  3:13  0%       ` Zhang, Qi Z
@ 2018-03-15 13:16  0%         ` Ananyev, Konstantin
  2018-03-15 15:08  0%           ` Zhang, Qi Z
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-03-15 13:16 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo

Hi Qi,

> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Thursday, March 15, 2018 3:14 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> Hi Konstantin:
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Wednesday, March 14, 2018 8:32 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> >
> > Hi Qi,
> >
> > >
> > > The patch let etherdev driver expose the capability flag through
> > > rte_eth_dev_info_get when it support deferred queue configuraiton,
> > > then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> > > continue to setup the queue or just return fail when device already
> > > started.
> > >
> > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > ---
> > >  doc/guides/nics/features.rst  |  8 ++++++++
> > > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> > >  3 files changed, 37 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/doc/guides/nics/features.rst
> > > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > > --- a/doc/guides/nics/features.rst
> > > +++ b/doc/guides/nics/features.rst
> > > @@ -892,7 +892,15 @@ Documentation describes performance values.
> > >
> > >  See ``dpdk.org/doc/perf/*``.
> > >
> > > +.. _nic_features_queue_deferred_setup_capabilities:
> > >
> > > +Queue deferred setup capabilities
> > > +---------------------------------
> > > +
> > > +Supports queue setup / release after device started.
> > > +
> > > +* **[provides] rte_eth_dev_info**:
> > >
> > ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFE
> > RRED_
> > > TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELE
> > > ASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> > >
> > >  .. _nic_features_other:
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> > uint16_t rx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > -	if (dev->data->dev_started) {
> > > -		RTE_PMD_DEBUG_TRACE(
> > > -		    "port %d must be stopped to allow configuration\n", port_id);
> > > -		return -EBUSY;
> > > -	}
> > > -
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > -ENOTSUP);
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
> > -ENOTSUP);
> > >
> > > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> > uint16_t rx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > +	if (dev->data->dev_started &&
> > > +		!(dev_info.deferred_queue_config_capa &
> > > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > > +		return -EINVAL;
> > > +
> >
> > I think now you have to check here that the queue is stopped.
> > Otherwise you might attempt to reconfigure running queue.
> 
> I'm not sure if it's necessary to let application use different API sequence for a deferred configure and deferred re-configure.
> Can we just call dev_ops->rx_queue_stop before rx_queue_release here

I don't follow you here.
Let say now inside queue_start() we do check:

if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED)

Right now it is not possible to call queue_setup() without dev_stop() before it -
that's why we have check if (dev->data->dev_started) in queue_setup() right now.
Though with your patch it not the case anymore - user is able to call queue_setup()
without stopping the whole device.
But he still has to stop the queue. 

> 
> >
> >
> > >  	rxq = dev->data->rx_queues;
> > >  	if (rxq[rx_queue_id]) {
> > >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > >  					-ENOTSUP);
> >
> > I don't think it is *that* straightforward.
> > rx_queue_setup() parameters can imply different rx function (and related dev
> > icesettings) that are already setuped by previous queue_setup()/dev_start.
> > So I think you need to do one of 2 things:
> > 1. rework ethdev layer to introduce a separate rx function (and related
> > settings) for each queue.
> > 2. at rx_queue_setup() if it is invoked after dev_start - check that given
> > queue settings wouldn't contradict with current device settings  (rx function,
> > etc.).
> > If they do - return an error.
> Yes, I think what we have is option 2 here, the dev_ops->rx_queue_setup will return fail if conflict with previous setting

Hmm and what makes you think that?
As I know it is not the case  right now.
Let say I do:
    ....
   rx_queue_setup(port=0,queue=0, mp=mb_size_2048);
   dev_start(port=0);
   ...
   rx_queue_setup(port=0,queue=1,mp=mb_size_1024);
   
 If current rx function doesn't support multi-segs then second rx_queue_setup() should fail.
 Though I don't think that would happen with the current implementation. 

Same story for TX offloads, though it probably not that critical, as for most Intel PMDs HW TX offloads will become per port in 18.05.

As I can see you do have either of these options implemented right  now - that's the problem.

> I'm also thinking about option 1, the idea is to move per queue rx/tx function into driver layer, so it will not break existing API.
> 
> 1. driver can expose the capability like per_queue_rx or per_queue_tx
> 2. application can enable this capability by dev_config with rte_eth_conf
> 3, if per_queue_rx is not enable, nothing change, so we are at option 2
> 4. if per_queue_rx is enabled, driver will set rx_pkt_burst with a hook function which redirect to an function ptr in a per queue rx function
> tables ( I guess performance is impacted somehow, but this is the cost if you want different offload for different queue)

I don't think we need to overcomplicate things here.
It should be transparent to the user - user just calls queue_setup() - based on its input parameters
PMD selects a function that fits best.
Pretty much what we have right now, just possibly have an array of functions (one per queue).

> 
> >
> > From my perspective - 1) is a better choice though it required more work,
> > and possibly ABI breakage.
> > I did some work in that direction as RFC:
> > http://dpdk.org/dev/patchwork/patch/31866/
> 
> I will learn this, thanks for the heads up.
> >
> > 2) might be also possible, but looks a bit clumsy as rx_queue_setup() might
> > now fail even with valid parameters - all depends on previous queue
> > configurations.
> >
> > Same story applies for TX.
> >
> >
> > > +		if (dev->data->dev_started &&
> > > +			!(dev_info.deferred_queue_config_capa &
> > > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > > +			return -EINVAL;
> > >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > >  		rxq[rx_queue_id] = NULL;
> > >  	}
> > > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> > uint16_t tx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > -	if (dev->data->dev_started) {
> > > -		RTE_PMD_DEBUG_TRACE(
> > > -		    "port %d must be stopped to allow configuration\n", port_id);
> > > -		return -EBUSY;
> > > -	}
> > > -
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > -ENOTSUP);
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup,
> > -ENOTSUP);
> > >
> > > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> > uint16_t tx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > +	if (dev->data->dev_started &&
> > > +		!(dev_info.deferred_queue_config_capa &
> > > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > > +		return -EINVAL;
> > > +
> > >  	txq = dev->data->tx_queues;
> > >  	if (txq[tx_queue_id]) {
> > >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > >  					-ENOTSUP);
> > > +		if (dev->data->dev_started &&
> > > +			!(dev_info.deferred_queue_config_capa &
> > > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > > +			return -EINVAL;
> > >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > >  		txq[tx_queue_id] = NULL;
> > >  	}
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> > >   */
> > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > >
> > > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001 /**< Deferred
> > setup rx
> > > +queue */ #define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002 /**<
> > Deferred
> > > +setup tx queue */ #define DEV_DEFERRED_RX_QUEUE_RELEASE
> > 0x00000004
> > > +/**< Deferred release rx queue */ #define
> > > +DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008 /**< Deferred release
> > tx
> > > +queue */
> > > +
> >
> > I don't think we do need flags for both setup a and release.
> > If runtime setup is supported - surely dynamic release should be supported
> > too.
> > Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.
> 
> Agree
> 
> Thanks
> Qi
> 
> >
> > Konstantin
> >
> > >  /*
> > >   * If new Tx offload capabilities are defined, they also must be
> > >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> > >  	/** Configured number of rx/tx queues */
> > >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> > >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > > +	uint64_t deferred_queue_config_capa;
> > > +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
> > >  };
> > >
> > >  /**
> > > --
> > > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-14 12:31  0%     ` Ananyev, Konstantin
@ 2018-03-15  3:13  0%       ` Zhang, Qi Z
  2018-03-15 13:16  0%         ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Zhang, Qi Z @ 2018-03-15  3:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo

Hi Konstantin:

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, March 14, 2018 8:32 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> Hi Qi,
> 
> >
> > The patch let etherdev driver expose the capability flag through
> > rte_eth_dev_info_get when it support deferred queue configuraiton,
> > then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> > continue to setup the queue or just return fail when device already
> > started.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> >  doc/guides/nics/features.rst  |  8 ++++++++
> > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> >  3 files changed, 37 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/nics/features.rst
> > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > --- a/doc/guides/nics/features.rst
> > +++ b/doc/guides/nics/features.rst
> > @@ -892,7 +892,15 @@ Documentation describes performance values.
> >
> >  See ``dpdk.org/doc/perf/*``.
> >
> > +.. _nic_features_queue_deferred_setup_capabilities:
> >
> > +Queue deferred setup capabilities
> > +---------------------------------
> > +
> > +Supports queue setup / release after device started.
> > +
> > +* **[provides] rte_eth_dev_info**:
> >
> ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFE
> RRED_
> > TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELE
> > ASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> >
> >  .. _nic_features_other:
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c
> > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > -	if (dev->data->dev_started) {
> > -		RTE_PMD_DEBUG_TRACE(
> > -		    "port %d must be stopped to allow configuration\n", port_id);
> > -		return -EBUSY;
> > -	}
> > -
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> -ENOTSUP);
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
> -ENOTSUP);
> >
> > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.deferred_queue_config_capa &
> > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > +		return -EINVAL;
> > +
> 
> I think now you have to check here that the queue is stopped.
> Otherwise you might attempt to reconfigure running queue.

I'm not sure if it's necessary to let application use different API sequence for a deferred configure and deferred re-configure.
Can we just call dev_ops->rx_queue_stop before rx_queue_release here

> 
> 
> >  	rxq = dev->data->rx_queues;
> >  	if (rxq[rx_queue_id]) {
> >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> >  					-ENOTSUP);
> 
> I don't think it is *that* straightforward.
> rx_queue_setup() parameters can imply different rx function (and related dev
> icesettings) that are already setuped by previous queue_setup()/dev_start.
> So I think you need to do one of 2 things:
> 1. rework ethdev layer to introduce a separate rx function (and related
> settings) for each queue.
> 2. at rx_queue_setup() if it is invoked after dev_start - check that given
> queue settings wouldn't contradict with current device settings  (rx function,
> etc.).
> If they do - return an error.
Yes, I think what we have is option 2 here, the dev_ops->rx_queue_setup will return fail if conflict with previous setting
I'm also thinking about option 1, the idea is to move per queue rx/tx function into driver layer, so it will not break existing API.

1. driver can expose the capability like per_queue_rx or per_queue_tx
2. application can enable this capability by dev_config with rte_eth_conf
3, if per_queue_rx is not enable, nothing change, so we are at option 2
4. if per_queue_rx is enabled, driver will set rx_pkt_burst with a hook function which redirect to an function ptr in a per queue rx function tables ( I guess performance is impacted somehow, but this is the cost if you want different offload for different queue)

> 
> From my perspective - 1) is a better choice though it required more work,
> and possibly ABI breakage.
> I did some work in that direction as RFC:
> http://dpdk.org/dev/patchwork/patch/31866/

I will learn this, thanks for the heads up.
> 
> 2) might be also possible, but looks a bit clumsy as rx_queue_setup() might
> now fail even with valid parameters - all depends on previous queue
> configurations.
> 
> Same story applies for TX.
> 
> 
> > +		if (dev->data->dev_started &&
> > +			!(dev_info.deferred_queue_config_capa &
> > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > +			return -EINVAL;
> >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> >  		rxq[rx_queue_id] = NULL;
> >  	}
> > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> uint16_t tx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > -	if (dev->data->dev_started) {
> > -		RTE_PMD_DEBUG_TRACE(
> > -		    "port %d must be stopped to allow configuration\n", port_id);
> > -		return -EBUSY;
> > -	}
> > -
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> -ENOTSUP);
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup,
> -ENOTSUP);
> >
> > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> uint16_t tx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.deferred_queue_config_capa &
> > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > +		return -EINVAL;
> > +
> >  	txq = dev->data->tx_queues;
> >  	if (txq[tx_queue_id]) {
> >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> >  					-ENOTSUP);
> > +		if (dev->data->dev_started &&
> > +			!(dev_info.deferred_queue_config_capa &
> > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > +			return -EINVAL;
> >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> >  		txq[tx_queue_id] = NULL;
> >  	}
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> >   */
> >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> >
> > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001 /**< Deferred
> setup rx
> > +queue */ #define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002 /**<
> Deferred
> > +setup tx queue */ #define DEV_DEFERRED_RX_QUEUE_RELEASE
> 0x00000004
> > +/**< Deferred release rx queue */ #define
> > +DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008 /**< Deferred release
> tx
> > +queue */
> > +
> 
> I don't think we do need flags for both setup a and release.
> If runtime setup is supported - surely dynamic release should be supported
> too.
> Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.

Agree

Thanks
Qi

> 
> Konstantin
> 
> >  /*
> >   * If new Tx offload capabilities are defined, they also must be
> >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> >  	/** Configured number of rx/tx queues */
> >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > +	uint64_t deferred_queue_config_capa;
> > +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
> >  };
> >
> >  /**
> > --
> > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  @ 2018-03-14 12:31  0%     ` Ananyev, Konstantin
  2018-03-15  3:13  0%       ` Zhang, Qi Z
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-03-14 12:31 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas
  Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo, Zhang, Qi Z

Hi Qi,

> 
> The patch let etherdev driver expose the capability flag through
> rte_eth_dev_info_get when it support deferred queue configuraiton,
> then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> continue to setup the queue or just return fail when device already
> started.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>  doc/guides/nics/features.rst  |  8 ++++++++
>  lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
>  lib/librte_ether/rte_ethdev.h | 11 +++++++++++
>  3 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 1b4fb979f..36ad21a1f 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -892,7 +892,15 @@ Documentation describes performance values.
> 
>  See ``dpdk.org/doc/perf/*``.
> 
> +.. _nic_features_queue_deferred_setup_capabilities:
> 
> +Queue deferred setup capabilities
> +---------------------------------
> +
> +Supports queue setup / release after device started.
> +
> +* **[provides] rte_eth_dev_info**:
> ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELE
> ASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> +* **[related]  API**: ``rte_eth_dev_info_get()``.
> 
>  .. _nic_features_other:
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index a6ce2a5ba..6c906c4df 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		return -EINVAL;
>  	}
> 
> -	if (dev->data->dev_started) {
> -		RTE_PMD_DEBUG_TRACE(
> -		    "port %d must be stopped to allow configuration\n", port_id);
> -		return -EBUSY;
> -	}
> -
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
> 
> @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		return -EINVAL;
>  	}
> 
> +	if (dev->data->dev_started &&
> +		!(dev_info.deferred_queue_config_capa &
> +			DEV_DEFERRED_RX_QUEUE_SETUP))
> +		return -EINVAL;
> +

I think now you have to check here that the queue is stopped.
Otherwise you might attempt to reconfigure running queue.


>  	rxq = dev->data->rx_queues;
>  	if (rxq[rx_queue_id]) {
>  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
>  					-ENOTSUP);

I don't think it is *that* straightforward.
rx_queue_setup() parameters can imply different rx function (and related dev icesettings)
that are already setuped by previous queue_setup()/dev_start.
So I think you need to do one of 2 things:
1. rework ethdev layer to introduce a separate rx function (and related settings) for each queue.
2. at rx_queue_setup() if it is invoked after dev_start - check that given queue settings wouldn't
contradict with current device settings  (rx function, etc.).
If they do - return an error.

>From my perspective - 1) is a better choice though it required more work, and possibly ABI breakage.
I did some work in that direction as RFC:
http://dpdk.org/dev/patchwork/patch/31866/

2) might be also possible, but looks a bit clumsy as rx_queue_setup() might now fail even with
valid parameters - all depends on previous queue configurations.

Same story applies for TX.


> +		if (dev->data->dev_started &&
> +			!(dev_info.deferred_queue_config_capa &
> +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> +			return -EINVAL;
>  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>  		rxq[rx_queue_id] = NULL;
>  	}
> @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>  		return -EINVAL;
>  	}
> 
> -	if (dev->data->dev_started) {
> -		RTE_PMD_DEBUG_TRACE(
> -		    "port %d must be stopped to allow configuration\n", port_id);
> -		return -EBUSY;
> -	}
> -
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP);
> 
> @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>  		return -EINVAL;
>  	}
> 
> +	if (dev->data->dev_started &&
> +		!(dev_info.deferred_queue_config_capa &
> +			DEV_DEFERRED_TX_QUEUE_SETUP))
> +		return -EINVAL;
> +
>  	txq = dev->data->tx_queues;
>  	if (txq[tx_queue_id]) {
>  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
>  					-ENOTSUP);
> +		if (dev->data->dev_started &&
> +			!(dev_info.deferred_queue_config_capa &
> +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> +			return -EINVAL;
>  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
>  		txq[tx_queue_id] = NULL;
>  	}
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 036153306..410e58c50 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -981,6 +981,15 @@ struct rte_eth_conf {
>   */
>  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> 
> +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001
> +/**< Deferred setup rx queue */
> +#define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> +/**< Deferred setup tx queue */
> +#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x00000004
> +/**< Deferred release rx queue */
> +#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008
> +/**< Deferred release tx queue */
> +

I don't think we do need flags for both setup a and release.
If runtime setup is supported - surely dynamic release should be supported too.
Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.

Konstantin

>  /*
>   * If new Tx offload capabilities are defined, they also must be
>   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
>  	/** Configured number of rx/tx queues */
>  	uint16_t nb_rx_queues; /**< Number of RX queues. */
>  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> +	uint64_t deferred_queue_config_capa;
> +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
>  };
> 
>  /**
> --
> 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  @ 2018-03-14 11:09  4%   ` Bruce Richardson
  2018-03-25 18:17  0%     ` Vladimir Medvedkin
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-03-14 11:09 UTC (permalink / raw)
  To: Medvedkin Vladimir; +Cc: dev

On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> RIB is an alternative to current LPM library.
> It solves the following problems
>  - Increases the speed of control plane operations against lpm such as
>    adding/deleting routes
>  - Adds abstraction from dataplane algorithms, so it is possible to add
>    different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
>    in addition to current dir24_8
>  - It is possible to keep user defined application specific additional
>    information in struct rte_rib_node which represents route entry.
>    It can be next hop/set of next hops (i.e. active and feasible),
>    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
>    plenty of additional control plane information.
>  - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
>    field that helps to save 6 bits.
>  - Also new dir24_8 implementation supports different next_hop sizes
>    (1/2/4/8 bytes per next hop)
>  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary operator.
>    Instead it returns special default value if there is no route.
> 
> Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> ---
>  config/common_base                 |   6 +
>  doc/api/doxy-api.conf              |   1 +
>  lib/Makefile                       |   2 +
>  lib/librte_rib/Makefile            |  22 ++
>  lib/librte_rib/rte_dir24_8.c       | 482 +++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
>  lib/librte_rib/rte_rib.c           | 526 +++++++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
>  lib/librte_rib/rte_rib_version.map |  18 ++
>  mk/rte.app.mk                      |   1 +
>  10 files changed, 1496 insertions(+)
>  create mode 100644 lib/librte_rib/Makefile
>  create mode 100644 lib/librte_rib/rte_dir24_8.c
>  create mode 100644 lib/librte_rib/rte_dir24_8.h
>  create mode 100644 lib/librte_rib/rte_rib.c
>  create mode 100644 lib/librte_rib/rte_rib.h
>  create mode 100644 lib/librte_rib/rte_rib_version.map
> 

First pass review comments. For now just reviewed the main public header
file rte_rib.h. Later reviews will cover the other files as best I can.

/Bruce

<snip>
> diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> new file mode 100644
> index 0000000..6eac8fb
> --- /dev/null
> +++ b/lib/librte_rib/rte_rib.h
> @@ -0,0 +1,322 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> + */
> +
> +#ifndef _RTE_RIB_H_
> +#define _RTE_RIB_H_
> +
> +/**
> + * @file
> + * Compressed trie implementation for Longest Prefix Match
> + */
> +
> +/** @internal Macro to enable/disable run-time checks. */
> +#if defined(RTE_LIBRTE_RIB_DEBUG)
> +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {	\
> +	if (cond)					\
> +		return retval;				\
> +} while (0)
> +#else
> +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> +#endif

use RTE_ASSERT?

> +
> +#define RTE_RIB_VALID_NODE	1

should there be an INVALID_NODE macro?

> +#define RTE_RIB_GET_NXT_ALL	0
> +#define RTE_RIB_GET_NXT_COVER	1
> +
> +#define RTE_RIB_INVALID_ROUTE	0
> +#define RTE_RIB_VALID_ROUTE	1
> +
> +/** Max number of characters in RIB name. */
> +#define RTE_RIB_NAMESIZE	64
> +
> +/** Maximum depth value possible for IPv4 RIB. */
> +#define RTE_RIB_MAXDEPTH	32

I think we should have IPv4 in the name here. Will it not be extended to
support IPv6 in future?

> +
> +/**
> + * Macro to check if prefix1 {key1/depth1}
> + * is covered by prefix2 {key2/depth2}
> + */
> +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)			\
> +	((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> +		&& (depth1 > depth2))
Neat check!

Any particular reason for using UINT64_MAX here rather than UINT32_MAX?
I think you can avoid the casting and have a slightly shorter mask by
changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to 
"~(UINT32_MAX >> depth2)"
I'd also suggest for readability putting the second check first, and,
for maintainability, using an inline function rather than a macro.

> +
> +/** @internal Macro to get next node in tree*/
> +#define RTE_RIB_GET_NXT_NODE(node, key)					\
> +	((key & (1 << (31 - node->depth))) ? node->right : node->left)
> +/** @internal Macro to check if node is right child*/
> +#define RTE_RIB_IS_RIGHT_NODE(node)	(node->parent->right == node)

Again, consider inline fns rather than macros.
For the latter macro, rather than doing additional pointer derefs to
parent, can you also get if it's a right node by using:
"(node->key & (1 << (32 - node->depth)))"? 

> +
> +
> +struct rte_rib_node {
> +	struct rte_rib_node *left;
> +	struct rte_rib_node *right;
> +	struct rte_rib_node *parent;
> +	uint32_t	key;
> +	uint8_t		depth;
> +	uint8_t		flag;
> +	uint64_t	nh;
> +	uint64_t	ext[0];
> +};
> +
> +struct rte_rib;
> +
> +/** Type of FIB struct*/
> +enum rte_rib_type {
> +	RTE_RIB_DIR24_8_1B,
> +	RTE_RIB_DIR24_8_2B,
> +	RTE_RIB_DIR24_8_4B,
> +	RTE_RIB_DIR24_8_8B,
> +	RTE_RIB_TYPE_MAX
> +};

If the plan is to support multiple underlying fib types and algorithms
under the rib library, would it not be better to separate out the
algorithm part from the data storage part? So have the type just be
DIR_24_8, and have the 1, 2, 4 or 8 specified separately.

> +
> +enum rte_rib_op {
> +	RTE_RIB_ADD,
> +	RTE_RIB_DEL
> +};
> +
> +/** RIB nodes allocation type */
> +enum rte_rib_alloc_type {
> +	RTE_RIB_MALLOC,
> +	RTE_RIB_MEMPOOL,
> +	RTE_RIB_ALLOC_MAX
> +};

Not sure you need this any more. Malloc allocations and mempool
allocations are now pretty much the same thing.

> +
> +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> +	uint8_t depth, uint64_t next_hop, enum rte_rib_op op);

Do you anticipate more ops in future than just add and delete? If not,
why not just split this function into two and drop the op struct.

> +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> +	uint64_t *next_hops, const unsigned n);
> +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib *rib);
> +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> +	struct rte_rib_node *node);
> +
> +struct rte_rib {
> +	char name[RTE_RIB_NAMESIZE];
> +	/*pointer to rib trie*/
> +	struct rte_rib_node	*trie;
> +	/*pointer to dataplane struct*/
> +	void	*fib;
> +	/*prefix modification*/
> +	rte_rib_modify_fn_t	modify;
> +	/* Bulk lookup fn*/
> +	rte_rib_tree_lookup_fn_t	lookup;
> +	/*alloc trie element*/
> +	rte_rib_alloc_node_fn_t	alloc_node;
> +	/*free trie element*/
> +	rte_rib_free_node_fn_t	free_node;
> +	struct rte_mempool	*node_pool;
> +	uint32_t		cur_nodes;
> +	uint32_t		cur_routes;
> +	int			max_nodes;
> +	int			node_sz;
> +	enum rte_rib_type	type;
> +	enum rte_rib_alloc_type	alloc_type;
> +};
> +
> +/** RIB configuration structure */
> +struct rte_rib_conf {
> +	enum rte_rib_type	type;
> +	enum rte_rib_alloc_type	alloc_type;
> +	int	max_nodes;
> +	size_t	node_sz;
> +	uint64_t def_nh;
> +};
> +
> +/**
> + * Lookup an IP into the RIB structure
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  IP to be looked up in the RIB
> + * @return
> + *  pointer to struct rte_rib_node on success,
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> +
> +/**
> + * Lookup less specific route into the RIB structure
> + *
> + * @param ent
> + *  Pointer to struct rte_rib_node that represents target route
> + * @return
> + *  pointer to struct rte_rib_node that represents
> + *  less specific route on success,
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> +
> +/**
> + * Lookup prefix into the RIB structure
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net to be looked up in the RIB
> + * @param depth
> + *  prefix length
> + * @return
> + *  pointer to struct rte_rib_node on success,
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t depth);

Can you explain the difference between this and regular lookup, and how
they would be used. I don't think the names convey the differences
sufficiently, and so we should look to rename one or both to be clearer.

> +
> +/**
> + * Retrieve next more specific prefix from the RIB
s/more/most/

> + * that is covered by key/depth supernet
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net address of supernet prefix that covers returned more specific prefixes
> + * @param depth
> + *  supernet prefix length
> + * @param cur
> + *   pointer to the last returned prefix to get next prefix
> + *   or
> + *   NULL to get first more specific prefix
> + * @param flag
> + *  -RTE_RIB_GET_NXT_ALL
> + *   get all prefixes from subtrie

By all prefixes do you mean more specific, i.e. the final prefix?

> + *  -RTE_RIB_GET_NXT_COVER
> + *   get only first more specific prefix even if it have more specifics
> + * @return
> + *  pointer to the next more specific prefix
> + *  or
> + *  NULL if there is no prefixes left
> + */
> +struct rte_rib_node *
> +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> +	struct rte_rib_node *cur, int flag);
> +
> +/**
> + * Remove prefix from the RIB
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net to be removed from the RIB
> + * @param depth
> + *  prefix length
> + */
> +void
> +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> +
> +/**
> + * Insert prefix into the RIB
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net to be inserted to the RIB
> + * @param depth
> + *  prefix length
> + * @return
> + *  pointer to new rte_rib_node on success
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> +
> +/**
> + * Create RIB
> + *
> + * @param name
> + *  RIB name
> + * @param socket_id
> + *  NUMA socket ID for RIB table memory allocation
> + * @param conf
> + *  Structure containing the configuration
> + * @return
> + *  Handle to RIB object on success
> + *  NULL otherwise with rte_errno set to an appropriate values.
> + */
> +struct rte_rib *
> +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf *conf);
> +
> +/**
> + * Find an existing RIB object and return a pointer to it.
> + *
> + * @param name
> + *  Name of the rib object as passed to rte_rib_create()
> + * @return
> + *  Pointer to rib object or NULL if object not found with rte_errno
> + *  set appropriately. Possible rte_errno values include:
> + *   - ENOENT - required entry not available to return.
> + */
> +struct rte_rib *
> +rte_rib_find_existing(const char *name);
> +
> +/**
> + * Free an RIB object.
> + *
> + * @param rib
> + *   RIB object handle
> + * @return
> + *   None
> + */
> +void
> +rte_rib_free(struct rte_rib *rib);
> +
> +/**
> + * Add a rule to the RIB.
> + *
> + * @param rib
> + *   RIB object handle
> + * @param ip
> + *   IP of the rule to be added to the RIB
> + * @param depth
> + *   Depth of the rule to be added to the RIB
> + * @param next_hop
> + *   Next hop of the rule to be added to the RIB
> + * @return
> + *   0 on success, negative value otherwise
> + */
> +int
> +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t next_hop);
> +
> +/**
> + * Delete a rule from the RIB.
> + *
> + * @param rib
> + *   RIB object handle
> + * @param ip
> + *   IP of the rule to be deleted from the RIB
> + * @param depth
> + *   Depth of the rule to be deleted from the RIB
> + * @return
> + *   0 on success, negative value otherwise
> + */
> +int
> +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> +
> +/**
> + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> + * macro, so the address of the function should not be used.
> + *
> + * @param RIB
> + *   RIB object handle
> + * @param ips
> + *   Array of IPs to be looked up in the FIB
> + * @param next_hops
> + *   Next hop of the most specific rule found for IP.
> + *   This is an array of eight byte values.
> + *   If the lookup for the given IP failed, then corresponding element would
> + *   contain default value, see description of then next parameter.
> + * @param n
> + *   Number of elements in ips (and next_hops) array to lookup. This should be a
> + *   compile time constant, and divisible by 8 for best performance.
> + * @param defv
> + *   Default value to populate into corresponding element of hop[] array,
> + *   if lookup would fail.
> + *  @return
> + *   -EINVAL for incorrect arguments, otherwise 0
> + */
> +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)	\
> +	rib->lookup(rib->fib, ips, next_hops, n)

My main thought here is whether this needs to be a function at all?
Given that it takes a full burst of addresses in a single go, how much
performance would actually be lost by making this a regular function in
the C file?
IF we do convert this to a regular function, then a lot of the structure
definitions above - most importantly, the rib structure itself - can
probably be moved to a private header file and not exposed to
applications at all. This will make ABI compatibility a *lot* easier, as
the structures can be changed without affecting the public ABI.

/Bruce

> +
> +#endif /* _RTE_RIB_H_ */

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [RFC] Switch device offload with DPDK
@ 2018-03-12 15:55  1% Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-03-12 15:55 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Doherty, Declan, Qi Zhang, Alejandro Lucero

Hi All,

(I tried to CC key people part of related discussions but likely missed
many, feel free to add them back)

The following RFC formalizes what has been discussed so far regarding switch
offload control using rte_flow through port/VF representors. It doesn't
bring new ideas besides the "transfer" attribute used internally to
implement port representors themselves and reuses some from past and current
threads or RFCs (see related anchors [2][3][4][5][7][8][9][10]).

It is meant to be converted without much hassle as proper DPDK documentation
hence the verbosity, however it all boils down to this:

- Port representors shall exist as additional ethdevs [2] created by PMDs
  according to devargs [4] discussed elsewhere; out of scope for this RFC.

- Besides enabling basic management of these ethdevs (e.g. MAC
  configuration), they should support rte_flow rules.

- Flow rules can target a different DPDK port ID [7] in order for matching
  traffic to be automatically forwarded without going through software.

- Forwarding may involve transformation such as VXLAN encap/decap [8][9].

---

===============================
Switch device offload with DPDK
===============================

Rationale
=========

Network adapters with multiple physical ports and/or SR-IOV capabilities
usually support the offload of traffic steering rules between their virtual
functions (VFs), physical functions (PFs) and ports.

Like for standard Ethernet switches, this involves a combination of
automatic MAC learning and manual configuration. For most purposes it is
managed by the host system and fully transparent to users and applications.

On the other hand, applications typically found on hypervisors that process
layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
according on their own criteria.

Without a standard software interface to manage traffic steering rules
between VFs, PFs and the various physical ports of a given device,
applications cannot take advantage of these offloads; software processing is
mandatory even for traffic which ends up re-injected into the device it
originates from.

This document describes how such steering rules can be configured through
the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
(PF/VF steering) using a single physical port for clarity, however the same
logic applies to any number of ports without necessarily involving SR-IOV.

Port representors
=================

In many cases, traffic steering rules cannot be determined in advance;
applications usually have to process a bit of traffic in software before
thinking about offloading specific flows to hardware.

Applications therefore need the ability to receive and inject traffic to
various device endpoints (other VFs, PFs or physical ports) before
connecting them together. Device drivers must provide means to hook the
"other end" of these endpoints and to refer them when configuring flow
rules.

This role is left to so-called "port representors" (also known as "VF
representors" in the specific context of VFs), which are to DPDK what the
Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
which can be thought as a software "patch panel" front-end for applications.

- DPDK port representors are implemented as additional virtual Ethernet
  device (**ethdev**) instances [2]_, spawned on a needed basis through
  configuration parameters [3]_ [4]_ by the driver of the underlying
  device.

- As virtual devices, they may be more limited than their physical
  counterparts, for instance by exposing only a subset of device
  configuration callbacks and/or by not necessarily having Rx/Tx capability.

- Among other things, they can be used to assign MAC addresses to the
  resource they represent.

- Applications can tell port representors apart by checking their device
  information structure which contains dedicated fields [5]_ describing
  parent/child device or group relationship (exact API remains to be
  defined).

.. [1] `Ethernet switch device driver model (switchdev)
       <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_

.. [2] `[RFC 0/5] Port Representor for control and monitoring of VF devices
       <http://dpdk.org/ml/archives/dev/2017-December/084639.html>`_

.. [3] `[PATCH v4 0/5] lib: add Port Representors
       <http://dpdk.org/ml/archives/dev/2018-January/086598.html>`_

.. [4] `doc: document the new devargs syntax
       <http://dpdk.org/ml/archives/dev/2018-January/087416.html>`_

.. [5] `doc: announce ABI change to support VF representors
       <http://dpdk.org/ml/archives/dev/2018-February/090958.html>`_

Basic SR-IOV
============

"Basic" in the sense that it is not managed by applications, which
nonetheless expect traffic to flow between the various endpoints and the
outside as if everything was linked by an Ethernet hub.

The following diagram pictures a setup involving a device with one PF, two
VFs and one shared physical port::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+----------'                 `----------+--' `--+----------'
       |                                       |       |
 .-----+-----.                                 |       |
 | port_id 3 |                                 |       |
 `-----+-----'                                 |       |
       |                                       |       |
     .-+--.                                .---+--. .--+---.
     | PF |                                | VF 1 | | VF 2 |
     `-+--'                                `---+--' `--+---'
       |                                       |       |
       `---------.     .-----------------------'       |
                 |     |     .-------------------------'
                 |     |     |
              .--+-----+-----+--.
              | interconnection |
              `--------+--------'
                       |
                  .----+-----.
                  | physical |
                  |  port 0  |
                  `----------'

- A DPDK application running on the hypervisor owns the PF device, which is
  arbitrarily assigned port index 3.

- Both VFs are assigned to VMs and used by unknown applications; they may be
  DPDK-based or anything else.

- Interconnection is not necessarily done through a true Ethernet switch and
  may not even exist as a separate entity. The role of this block is to show
  that something brings PF, VFs and physical ports together and enables
  communication between them, with a number of built-in restrictions.

Subsequent sections in this document describe means for DPDK applications
running on the hypervisor to freely assign specific flows between PF, VFs
and physical ports based on traffic properties, by managing this
interconnection.

Controlled SR-IOV
=================

Initialization
--------------

When a DPDK application gets assigned a PF device and is deliberately not
started in `basic SR-IOV`_ mode, any traffic coming from physical ports is
received by PF according to default rules, while VFs remain isolated::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+----------'                 `----------+--' `--+----------'
       |                                       |       |
 .-----+-----.                                 |       |
 | port_id 3 |                                 |       |
 `-----+-----'                                 |       |
       |                                       |       |
     .-+--.                                .---+--. .--+---.
     | PF |                                | VF 1 | | VF 2 |
     `-+--'                                `------' `------'
       |
       `-----.
             |
          .--+----------------------.
          | managed interconnection |
          `------------+------------'
                       |
                  .----+-----.
                  | physical |
                  |  port 0  |
                  `----------'

In this mode, interconnection must be configured by the application to
enable VF communication, for instance by explicitly directing traffic with a
given destination MAC address to VF 1 and allowing that with the same source
MAC address to come out of it.

For this to work, hypervisor applications need a way to refer to either VF 1
or VF 2 in addition to the PF. This is addressed by `VF representors`_.

VF representors
---------------

VF representors are virtual but standard DPDK network devices (albeit with
limited capabilities) created by PMDs when managing a PF device.

Since they represent VF instances used by other applications, configuring
them (e.g. assigning a MAC address or setting up promiscuous mode) affects
interconnection accordingly. If supported, they may also be used as two-way
communication ports with VFs (assuming **switchdev** topology)::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       |             |             |           |       |
 .-----+-----. .-----+-----. .-----+-----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
       |             |             |           |       |
       |             |   .---------'           |       |
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--+-------+---+---+---+--.
          | managed interconnection |
          `------------+------------'
                       |
                  .----+-----.
                  | physical |
                  |  port 0  |
                  `----------'

- VF representors are assigned arbitrary port indices 4 and 5 in the
  hypervisor application and are respectively associated with VF 1 and VF 2.

- They can't be dissociated; even if VF 1 and VF 2 were not connected,
  representors could still be used for configuration.

- In this context, port index 3 can be thought as a representor for physical
  port 0.

As previously described, the "interconnection" block represents a logical
concept. Interconnection occurs when hardware configuration enables traffic
flows from one place to another (e.g. physical port 0 to VF 1) according to
some criteria.

This is discussed in more detail in `traffic steering`_.

Traffic steering
----------------

In the following diagram, each meaningful traffic origin or endpoint as seen
by the hypervisor application is tagged with a unique letter from A to F::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       |             |             |           |       |
 .----(A)----. .----(B)----. .----(C)----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
       |             |             |           |       |
       |             |   .---------'           |       |
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--+-------+---+---+---+--.
          | managed interconnection |
          `------------+------------'
                       |
                  .---(F)----.
                  | physical |
                  |  port 0  |
                  `----------'

- **A**: PF device.
- **B**: port representor for VF 1.
- **C**: port representor for VF 2.
- **D**: VF 1 proper.
- **E**: VF 2 proper.
- **F**: physical port.

Although uncommon, some devices do not enforce a one to one mapping between
PF and physical ports. For instance, by default all ports of **mlx4**
adapters are available to all their PF/VF instances, in which case
additional ports appear next to **F** in the above diagram.

Assuming no interconnection is provided by default in this mode, setting up
a `basic SR-IOV`_ configuration involving physical port 0 could be broken
down as:

PF:

- **A to F**: let everything through.
- **F to A**: PF MAC as destination.

VF 1:

- **A to D**, **E to D** and **F to D**: VF 1 MAC as destination.
- **D to A**: VF 1 MAC as source and PF MAC as destination.
- **D to E**: VF 1 MAC as source and VF 2 MAC as destination.
- **D to F**: VF 1 MAC as source.

VF 2:

- **A to E**, **D to E** and **F to E**: VF 2 MAC as destination.
- **E to A**: VF 2 MAC as source and PF MAC as destination.
- **E to D**: VF 2 MAC as source and VF 1 MAC as destination.
- **E to F**: VF 2 MAC as source.

Devices may additionally support advanced matching criteria such as
IPv4/IPv6 addresses or TCP/UDP ports.

The combination of matching criteria with target endpoints fits well with
**rte_flow** [6]_, which expresses flow rules as combinations of patterns
and actions.

Enhancing **rte_flow** with the ability to make flow rules match and target
these endpoints provides a standard interface to manage their
interconnection without introducing new concepts and whole new API to
implement them. This is described in `flow API (rte_flow)`_.

.. [6] `Generic flow API (rte_flow)
       <http://dpdk.org/doc/guides/prog_guide/rte_flow.html>`_

Flow API (rte_flow)
===================

Extensions
----------

Compared to creating a brand new dedicated interface, **rte_flow** was
deemed flexible enough to manage representor traffic only with minor
extensions:

- Using physical ports, PF, VF or port representors as targets.

- Affecting traffic that is not necessarily addressed to the DPDK port ID a
  flow rule is associated with (e.g. forcing VF traffic redirection to PF).

For advanced uses:

- Rule-based packet counters.

- The ability to combine several identical actions for traffic duplication
  (e.g. VF representor in addition to a physical port).

- Dedicated actions for traffic encapsulation / decapsulation before
  reaching a endpoint.

The extensions described in the following sections follow up on Qi Zhang's
original RFC [7]_.

.. [7] `rte_flow extension for vSwitch acceleration
       <http://dpdk.org/ml/archives/dev/2017-December/084598.html>`_

Traffic direction
-----------------

>From an application standpoint, "ingress" and "egress" flow rule attributes
apply to the DPDK port ID they are associated with. They select a traffic
direction for matching patterns, but have no impact on actions.

When matching traffic coming from or going to a different place than the
immediate port ID a flow rule is associated with, these attributes keep
their meaning while applying to the chosen origin, as highlighted by the
following diagram::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       | ^           | ^           | ^         |       |
       | | ingress   | | ingress   | | ingress |       |
       | | egress    | | egress    | | egress  |       |
       | v           | v           | v         |       |
 .----(A)----. .----(B)----. .----(C)----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
       |             |             |         ^ |       | ^
       |             |             |  egress | |       | | egress
       |             |             | ingress | |       | | ingress
       |             |   .---------'         v |       | v
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--+-------+---+---+---+--.
          | managed interconnection |
          `------------+------------'
                     ^ |
             ingress | |
              egress | |
                     v |
                  .---(F)----.
                  | physical |
                  |  port 0  |
                  `----------'

Ingress and egress are defined as relative to the application creating the
flow rule.

For instance, matching traffic sent by VM 2 would be done through an ingress
flow rule on VF 2 (**E**). Likewise for incoming traffic on physical port
(**F**). This also applies to **C** and **A** respectively.

Transferring traffic
--------------------

Without port representors
~~~~~~~~~~~~~~~~~~~~~~~~~

`Traffic direction`_ describes how an application could match traffic coming
from or going to a specific place reachable from a DPDK port ID. This makes
sense when the traffic in question is normally seen (i.e. sent or received)
by the application creating the flow rule (e.g. as in "redirect all traffic
coming from VF 1 to local queue 6").

However this does not force such traffic to take a specific route. Creating
a flow rule on **A** matching traffic coming from **D** is only meaningful
if it can be received by **A** in the first place, otherwise doing so simply
has no effect.

A new flow rule attribute named "transfer" is necessary for that. Combining
it with "ingress" or "egress" and a specific origin requests a flow rule to
be applied at the lowest level::

          ingress only           :       ingress + transfer
                                 :
 .-------------. .-------------. : .-------------. .-------------.
 | hypervisor  | |    VM 1     | : | hypervisor  | |    VM 1     |
 | application | | application | : | application | | application |
 `------+------' `--+----------' : `------+------' `--+----------'
        |           | | traffic  :        |           | | traffic
  .----(A)----.     | v          :  .----(A)----.     | v
  | port_id 3 |     |            :  | port_id 3 |     |
  `-----+-----'     |            :  `-----+-----'     |
        |           |            :        | ^         |
        |           |            :        | | traffic |
      .-+--.    .---+--.         :      .-+--.    .---+--.
      | PF |    | VF 1 |         :      | PF |    | VF 1 |
      `-+--'    `--(D)-'         :      `-+--'    `--(D)-'
        |           | | traffic  :        | ^         | | traffic
        |           | v          :        | | traffic | v
     .--+-----------+--.         :     .--+-----------+--.
     | interconnection |         :     | interconnection |
     `--------+--------'         :     `--------+--------'
              | | traffic        :              |
              | v                :              |
         .---(F)----.            :         .---(F)----.
         | physical |            :         | physical |
         |  port 0  |            :         |  port 0  |
         `----------'            :         `----------'

With "ingress" only, traffic is matched on **A** thus still goes to physical
port **F** by default::

 testpmd> flow create 3 ingress pattern vf id is 1 / end
              actions queue index 6 / end

With "ingress + transfer", traffic is matched on **D** and is therefore
successfully assigned to queue 6 on **A**::

 testpmd> flow create 3 ingress transfer pattern vf id is 1 / end
              actions queue index 6 / end

With port representors
~~~~~~~~~~~~~~~~~~~~~~

When port representors exist, implicit flow rules with the "transfer"
attribute (described in `without port representors`_) are be assumed to
exist between them and their represented resources. These may be immutable.

In this case, traffic is received by default through the representor and
neither the "transfer" attribute nor traffic origin in flow rule patterns
are necessary. They simply have to be created on the representor port
directly and may target a different representor as described in `PORT_ID
action`_.

Implicit traffic flow with port representor::

    .-------------.   .-------------.
    | hypervisor  |   |    VM 1     |
    | application |   | application |
    `--+-------+--'   `----------+--'
       |       | ^               | | traffic
       |       | | traffic       | v
       |       `-----.           |
       |             |           |
 .----(A)----. .----(B)----.     |
 | port_id 3 | | port_id 4 |     |
 `-----+-----' `-----+-----'     |
       |             |           |
     .-+--.    .-----+-----. .---+--.
     | PF |    | VF 1 rep. | | VF 1 |
     `-+--'    `-----+-----' `--(D)-'
       |             |           |
    .--|-------------|-----------|--.
    |  |             |           |  |
    |  |             `-----------'  |
    |  |              <-- traffic   |
    `--|----------------------------'
       |
  .---(F)----.
  | physical |
  |  port 0  |
  `----------'

Pattern items and actions
-------------------------

``PORT`` pattern item
~~~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) a physical
port of the underlying device.

Using this pattern item without specifying a port index matches the physical
port associated with the current DPDK port ID by default. As described in
`traffic steering`_, specifying it should be rarely needed.

- Matches **F** in `traffic steering`_.

``PORT`` action
~~~~~~~~~~~~~~~

Directs matching traffic to a given physical port index.

- Targets **F** in `traffic steering`_.

``PORT_ID`` pattern item
~~~~~~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) a given DPDK
port ID.

Normally only supported if the port ID in question is known by the
underlying PMD and related to the device the flow rule is created against.

This must not be confused with the `PORT pattern item`_ which refers to the
physical port of a device. ``PORT_ID`` refers to a ``struct rte_eth_dev``
object on the application side (also known as "port representor" depending
on the kind of underlying device).

- Matches **A**, **B** or **C** in `traffic steering`_.

``PORT_ID`` action
~~~~~~~~~~~~~~~~~~

Directs matching traffic to a given DPDK port ID.

Same restrictions as `PORT_ID pattern item`_.

- Targets **A**, **B** or **C** in `traffic steering`_.

``PF`` pattern item
~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) the physical
function of the current device.

If supported, should work even if the physical function is not managed by
the application and thus not associated with a DPDK port ID. Its behavior is
otherwise similar to `PORT_ID pattern item`_ using PF port ID.

- Matches **A** in `traffic steering`_.

``PF`` action
~~~~~~~~~~~~~

Directs matching traffic to the physical function of the current device.

Same restrictions as `PF pattern item`_.

- Targets **A** in `traffic steering`_.

``VF`` pattern item
~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) a given
virtual function of the current device.

If supported, should work even if the virtual function is not managed by
the application and thus not associated with a DPDK port ID. Its behavior is
otherwise similar to `PORT_ID pattern item`_ using VF port ID.

Note this pattern item does not match VF representors traffic which, as
separate entities, should be addressed through their own port IDs.

- Matches **D** or **E** in `traffic steering`_.

``VF`` action
~~~~~~~~~~~~~

Directs matching traffic to a given virtual function of the current device.

Same restrictions as `VF pattern item`_.

- Targets **D** or **E** in `traffic steering`_.

``*_ENCAP`` actions
~~~~~~~~~~~~~~~~~~~

These actions are named according to the protocol they encapsulate traffic
with (e.g. ``VXLAN_ENCAP``) and using specific parameters (e.g. VNI for
VXLAN).

While they modify traffic and can be used multiple times (order matters),
unlike `PORT_ID action`_ and friends, they have no impact on steering.

As described in `actions order and repetition`_ this means they are useless
if used alone in an action list, the resulting traffic gets dropped unless
combined with either ``PASSTHRU`` or other endpoint-targeting actions.

All these are under discussion in the context of adding support for tunnel
endpoint (TEP) [8]_ [9]_.

.. [8] `[RFC] tunnel endpoint hw acceleration enablement
       <http://dpdk.org/ml/archives/dev/2017-December/084676.html>`_

.. [9] `ethdev: Additions to rte_flows to support vTEP encap/decap offload
       <http://dpdk.org/ml/archives/dev/2018-March/092378.html>`_

``*_DECAP`` actions
~~~~~~~~~~~~~~~~~~~

They perform the reverse of `*_ENCAP actions`_ by popping protocol headers
from traffic instead of pushing them. They can be used multiple times as
well.

Note that using these actions on non-matching traffic results in undefined
behavior. It is recommended to match the protocol headers to decapsulate on
the pattern side of a flow rule in order to use these actions or otherwise
make sure only matching traffic goes through.

Actions order and repetition
----------------------------

Flow rules are currently restricted to at most a single action of each
supported type, performed in an unpredictable order (or all at once). To
repeat actions in a predictable fashion, applications have to make rules
pass-through and use priority levels.

It's now clear that PMD support for chaining multiple non-terminating flow
rules of varying priority levels is prohibitively difficult to implement
compared to simply allowing multiple identical actions performed in a
defined order by a single flow rule.

- This change is required to support protocol encapsulation offloads and the
  ability to perform them multiple times (e.g. VLAN then VXLAN).

- It makes the ``DUP`` action redundant since multiple ``QUEUE`` actions can
  be combined for duplication.

- The (non-)terminating property of actions must be discarded. Instead, flow
  rules themselves must be considered terminating by default (i.e. dropping
  traffic if there is no specific target) unless a ``PASSTHRU`` action is
  also specified.

This change was announced [10]_ on the DPDK mailing list.

.. [10] `doc: announce API change for flow actions
	<http://dpdk.org/ml/archives/dev/2018-February/090989.html>`_

Examples
--------

This section provides practical examples based on the established Testpmd
flow command syntax [11]_, in the context described in `traffic steering`_::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       |             |             |           |       |
 .----(A)----. .----(B)----. .----(C)----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
       |             |             |           |       |
       |             |   .---------'           |       |
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--|-------|---|---|---|--.
          |  |       |   `---|---'  |
          |  |       `-------'      |
          |  `---------.            |
          `------------|------------'
                       |
                  .---(F)----.
                  | physical |
                  |  port 0  |
                  `----------'

By default, PF (**A**) can communicate with the physical port it is
associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
and restricted to communicate with the hypervisor application through their
respective representors (**B** and **C**) if supported.

Examples in subsequent sections apply to hypervisor applications only and
are based on port representors **A**, **B** and **C**.

.. [11] `Flow syntax
	<http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`_

Associating VF 1 with physical port 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assign all port traffic (**F**) to VF 1 (**D**) indiscriminately through
their representors::

 flow create 3 ingress pattern / end actions port_id id 4 / end
 flow create 4 ingress pattern / end actions port_id id 3 / end

More practical example with MAC address restrictions::

 flow create 3 ingress
     pattern eth dst is {VF 1 MAC} / end
     actions port_id id 4 / end
 flow create 4 ingress
     pattern eth src is {VF 1 MAC} / end
     actions port_id id 3 / end

Sharing broadcasts
~~~~~~~~~~~~~~~~~~

>From outside to PF and VFs::

 flow create 3 ingress
     pattern eth dst is ff:ff:ff:ff:ff:ff / end
     actions port_id id 3 / port_id id 4 / port_id id 5 / end

Note ``port_id id 3`` is necessary otherwise only VFs would receive matching
traffic.

>From PF to outside and VFs::

 flow create 3 egress
     pattern eth dst is ff:ff:ff:ff:ff:ff / end
     actions port / port_id id 4 / port_id id 5 / end

>From VFs to outside and PF::

 flow create 4 ingress
     pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 1 MAC} / end
     actions port_id id 3 / port_id id 5 / end
 flow create 5 ingress
     pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 2 MAC} / end
     actions port_id id 4 / port_id id 4 / end

Similar ``33:33:*`` rules based on known MAC addresses should be added for
IPv6 traffic.

Encapsulating VF 2 traffic in VXLAN
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assuming pass-through flow rules are supported::

 flow create 5 ingress
     pattern eth / end
     actions vxlan_encap vni 42 / passthru / end
 flow create 5 egress
     pattern vxlan vni is 42 / end
     actions vxlan_decap / passthru / end

Here ``passthru`` is needed since as described in `actions order and
repetition`_, flow rules are otherwise terminating; if supported, a rule
without a target endpoint will drop traffic.

Without pass-through support, ingress encapsulation on the destination
endpoint might not be supported and action list must provide one::

 flow create 5 ingress
      pattern eth src is {VF 2 MAC} / end
      actions vxlan_encap vni 42 / port_id id 3 / end
 flow create 3 ingress
      pattern vxlan vni is 42 / end
      actions vxlan_decap / port_id id 5 / end

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
  2018-03-12 14:30  3%     ` Eads, Gage
@ 2018-03-12 14:38  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-03-12 14:38 UTC (permalink / raw)
  To: Eads, Gage
  Cc: dev, Van Haaren, Harry, hemant.agrawal, Richardson, Bruce,
	santosh.shukla, nipun.gupta

-----Original Message-----
> Date: Mon, 12 Mar 2018 14:30:49 +0000
> From: "Eads, Gage" <gage.eads@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>, "Van Haaren, Harry"
>  <harry.van.haaren@intel.com>, "hemant.agrawal@nxp.com"
>  <hemant.agrawal@nxp.com>, "Richardson, Bruce"
>  <bruce.richardson@intel.com>, "santosh.shukla@caviumnetworks.com"
>  <santosh.shukla@caviumnetworks.com>, "nipun.gupta@nxp.com"
>  <nipun.gupta@nxp.com>
> Subject: RE: [PATCH v2 1/2] eventdev: add device stop flush callback
> 
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Monday, March 12, 2018 1:25 AM
> > To: Eads, Gage <gage.eads@intel.com>
> > Cc: dev@dpdk.org; Van Haaren, Harry <harry.van.haaren@intel.com>;
> > hemant.agrawal@nxp.com; Richardson, Bruce <bruce.richardson@intel.com>;
> > santosh.shukla@caviumnetworks.com; nipun.gupta@nxp.com
> > Subject: Re: [PATCH v2 1/2] eventdev: add device stop flush callback
> > 
> > -----Original Message-----
> > > When an event device is stopped, it drains all event queues. These
> > > events may contain pointers, so to prevent memory leaks eventdev now
> > > supports a user-provided flush callback that is called during the queue drain
> > process.
> > > This callback is stored in process memory, so the callback must be
> > > registered by any process that may call rte_event_dev_stop().
> > >
> > > This commit also clarifies the behavior of rte_event_dev_stop().
> > >
> > > This follows this mailing list discussion:
> > > http://dpdk.org/ml/archives/dev/2018-January/087484.html
> > >
> > > Signed-off-by: Gage Eads <gage.eads@intel.com>
> > > ---
> > > v2: allow a NULL callback pointer to unregister the callback
> > >
> > >  lib/librte_eventdev/rte_eventdev.c           | 17 +++++++++
> > >  lib/librte_eventdev/rte_eventdev.h           | 55
> > +++++++++++++++++++++++++++-
> > >  lib/librte_eventdev/rte_eventdev_version.map |  6 +++
> > >  3 files changed, 76 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_eventdev/rte_eventdev.c
> > > b/lib/librte_eventdev/rte_eventdev.c
> > > index 851a119..1aacb7b 100644
> > > --- a/lib/librte_eventdev/rte_eventdev.c
> > > +++ b/lib/librte_eventdev/rte_eventdev.c
> > > @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id)
> > >  	return 0;
> > >  }
> > >
> > > +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
> > > +		void *arg);
> > > +/**< Callback function called during rte_event_dev_stop(), invoked
> > > +once per
> > > + * flushed event.
> > > + */
> > > +
> > >  #define RTE_EVENTDEV_NAME_MAX_LEN	(64)
> > >  /**< @internal Max length of name of event PMD */
> > >
> > > @@ -1176,6 +1194,11 @@ struct rte_eventdev {
> > >  	event_dequeue_burst_t dequeue_burst;
> > >  	/**< Pointer to PMD dequeue burst function. */
> > >
> > > +	eventdev_stop_flush_t dev_stop_flush;
> > > +	/**< Optional, user-provided event flush function */
> > > +	void *dev_stop_flush_arg;
> > > +	/**< User-provided argument for event flush function */
> > > +
> > 
> > I think, we can move this additions to the internal rte_eventdev_data structure.
> > Reasons are
> > 1) Changes to "struct rte_eventdev" would call for ABI change
> > 2) We can keep "struct rte_eventdev" only for fast path functions, slow path
> > functions can have additional redirection.
> > 
> 
> Good points -- I hadn't considered the ABI impact of modifying rte_eventdev. rte_eventdev_data is in shared memory, though, so it's not multi-process friendly for function pointers. How about putting it in rte_eventdev_ops?

Yes. Make sense to move to rte_eventdev_ops. But need to take care
updating the those function pointers in secondary process.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
  2018-03-12  6:25  3%   ` Jerin Jacob
@ 2018-03-12 14:30  3%     ` Eads, Gage
  2018-03-12 14:38  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Eads, Gage @ 2018-03-12 14:30 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Van Haaren, Harry, hemant.agrawal, Richardson, Bruce,
	santosh.shukla, nipun.gupta



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Monday, March 12, 2018 1:25 AM
> To: Eads, Gage <gage.eads@intel.com>
> Cc: dev@dpdk.org; Van Haaren, Harry <harry.van.haaren@intel.com>;
> hemant.agrawal@nxp.com; Richardson, Bruce <bruce.richardson@intel.com>;
> santosh.shukla@caviumnetworks.com; nipun.gupta@nxp.com
> Subject: Re: [PATCH v2 1/2] eventdev: add device stop flush callback
> 
> -----Original Message-----
> > When an event device is stopped, it drains all event queues. These
> > events may contain pointers, so to prevent memory leaks eventdev now
> > supports a user-provided flush callback that is called during the queue drain
> process.
> > This callback is stored in process memory, so the callback must be
> > registered by any process that may call rte_event_dev_stop().
> >
> > This commit also clarifies the behavior of rte_event_dev_stop().
> >
> > This follows this mailing list discussion:
> > http://dpdk.org/ml/archives/dev/2018-January/087484.html
> >
> > Signed-off-by: Gage Eads <gage.eads@intel.com>
> > ---
> > v2: allow a NULL callback pointer to unregister the callback
> >
> >  lib/librte_eventdev/rte_eventdev.c           | 17 +++++++++
> >  lib/librte_eventdev/rte_eventdev.h           | 55
> +++++++++++++++++++++++++++-
> >  lib/librte_eventdev/rte_eventdev_version.map |  6 +++
> >  3 files changed, 76 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eventdev/rte_eventdev.c
> > b/lib/librte_eventdev/rte_eventdev.c
> > index 851a119..1aacb7b 100644
> > --- a/lib/librte_eventdev/rte_eventdev.c
> > +++ b/lib/librte_eventdev/rte_eventdev.c
> > @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id)
> >  	return 0;
> >  }
> >
> > +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
> > +		void *arg);
> > +/**< Callback function called during rte_event_dev_stop(), invoked
> > +once per
> > + * flushed event.
> > + */
> > +
> >  #define RTE_EVENTDEV_NAME_MAX_LEN	(64)
> >  /**< @internal Max length of name of event PMD */
> >
> > @@ -1176,6 +1194,11 @@ struct rte_eventdev {
> >  	event_dequeue_burst_t dequeue_burst;
> >  	/**< Pointer to PMD dequeue burst function. */
> >
> > +	eventdev_stop_flush_t dev_stop_flush;
> > +	/**< Optional, user-provided event flush function */
> > +	void *dev_stop_flush_arg;
> > +	/**< User-provided argument for event flush function */
> > +
> 
> I think, we can move this additions to the internal rte_eventdev_data structure.
> Reasons are
> 1) Changes to "struct rte_eventdev" would call for ABI change
> 2) We can keep "struct rte_eventdev" only for fast path functions, slow path
> functions can have additional redirection.
> 

Good points -- I hadn't considered the ABI impact of modifying rte_eventdev. rte_eventdev_data is in shared memory, though, so it's not multi-process friendly for function pointers. How about putting it in rte_eventdev_ops?

> >  	struct rte_eventdev_data *data;
> >  	/**< Pointer to device data */
> >  	const struct rte_eventdev_ops *dev_ops; @@ -1822,6 +1845,34 @@
> > rte_event_dev_xstats_reset(uint8_t dev_id,
> >   */
> >  int rte_event_dev_selftest(uint8_t dev_id);
> >
> > +/**
> > + * Registers a callback function to be invoked during
> > +rte_event_dev_stop() for
> > + * each flushed event. This function can be used to properly dispose
> > +of queued
> > + * events, for example events containing memory pointers.
> > + *
> > + * The callback function is only registered for the calling process.
> > +The
> > + * callback function must be registered in every process that can
> > +call
> > + * rte_event_dev_stop().
> > + *
> > + * To unregister a callback, call this function with a NULL callback pointer.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param callback
> > + *   Callback function invoked once per flushed event.
> > + * @param userdata
> > + *   Argument supplied to callback.
> > + *
> > + * @return
> > + *  - 0 on success.
> > + *  - -EINVAL if *dev_id* is invalid
> > + *
> > + * @see rte_event_dev_stop()
> > + */
> > +int
> > +rte_event_dev_stop_flush_callback_register(uint8_t dev_id,
> > +		eventdev_stop_flush_t callback, void *userdata);
> > +
> IMO, It would be better if we place this function near to rte_event_dev_stop().
> 
> Other than above minor changes, It looks good to me.

Ok, will address in v3.

Thanks,
Gage

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 3/6] net/mlx5: add a function to rdma-core glue
  @ 2018-03-12  9:13  3%   ` Nélio Laranjeiro
  0 siblings, 0 replies; 200+ results
From: Nélio Laranjeiro @ 2018-03-12  9:13 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: wenzhuo.lu, jingjing.wu, adrien.mazarguil, olivier.matz, dev

On Fri, Mar 09, 2018 at 05:25:29PM -0800, Yongseok Koh wrote:
> mlx5dv_create_wq() is added.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_glue.c | 9 +++++++++
>  drivers/net/mlx5/mlx5_glue.h | 4 ++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
> index 1c4396ada..e33fc76b5 100644
> --- a/drivers/net/mlx5/mlx5_glue.c
> +++ b/drivers/net/mlx5/mlx5_glue.c
> @@ -287,6 +287,14 @@ mlx5_glue_dv_create_cq(struct ibv_context *context,
>  	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
>  }
>  
> +static struct ibv_wq *
> +mlx5_glue_dv_create_wq(struct ibv_context *context,
> +		       struct ibv_wq_init_attr *wq_attr,
> +		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
> +{
> +	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
> +}
> +
>  static int
>  mlx5_glue_dv_query_device(struct ibv_context *ctx,
>  			  struct mlx5dv_context *attrs_out)
> @@ -347,6 +355,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
>  	.port_state_str = mlx5_glue_port_state_str,
>  	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
>  	.dv_create_cq = mlx5_glue_dv_create_cq,
> +	.dv_create_wq = mlx5_glue_dv_create_wq,
>  	.dv_query_device = mlx5_glue_dv_query_device,
>  	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
>  	.dv_init_obj = mlx5_glue_dv_init_obj,
> diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
> index b5efee3b6..21a713961 100644
> --- a/drivers/net/mlx5/mlx5_glue.h
> +++ b/drivers/net/mlx5/mlx5_glue.h
> @@ -100,6 +100,10 @@ struct mlx5_glue {
>  		(struct ibv_context *context,
>  		 struct ibv_cq_init_attr_ex *cq_attr,
>  		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
> +	struct ibv_wq *(*dv_create_wq)
> +		(struct ibv_context *context,
> +		 struct ibv_wq_init_attr *wq_attr,
> +		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
>  	int (*dv_query_device)(struct ibv_context *ctx_in,
>  			       struct mlx5dv_context *attrs_out);
>  	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
> -- 
> 2.11.0
 
You missed to change the GLUE ABI version, it must be updated.

Regards,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-11 12:51  0%       ` santosh
@ 2018-03-12  6:53  0%         ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-12  6:53 UTC (permalink / raw)
  To: santosh, dev; +Cc: Olivier MATZ

On 03/11/2018 03:51 PM, santosh wrote:
> Hi Andrew,
>
>
> On Saturday 10 March 2018 09:09 PM, Andrew Rybchenko wrote:
>> Size of memory chunk required to populate mempool objects depends
>> on how objects are stored in the memory. Different mempool drivers
>> may have different requirements and a new operation allows to
>> calculate memory size in accordance with driver requirements and
>> advertise requirements on minimum memory chunk size and alignment
>> in a generic way.
>>
>> Bump ABI version since the patch breaks it.
>>
>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>> RFCv2 -> v1:
>>   - move default calc_mem_size callback to rte_mempool_ops_default.c
>>   - add ABI changes to release notes
>>   - name default callback consistently: rte_mempool_op_<callback>_default()
>>   - bump ABI version since it is the first patch which breaks ABI
>>   - describe default callback behaviour in details
>>   - avoid introduction of internal function to cope with depration
>>     (keep it to deprecation patch)
>>   - move cache-line or page boundary chunk alignment to default callback
>>   - highlight that min_chunk_size and align parameters are output only
>>
> [...]
>
>> diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
>> new file mode 100644
>> index 0000000..57fe79b
>> --- /dev/null
>> +++ b/lib/librte_mempool/rte_mempool_ops_default.c
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2016 Intel Corporation.
>> + * Copyright(c) 2016 6WIND S.A.
>> + * Copyright(c) 2018 Solarflare Communications Inc.
>> + */
>> +
>> +#include <rte_mempool.h>
>> +
>> +ssize_t
>> +rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
>> +				     uint32_t obj_num, uint32_t pg_shift,
>> +				     size_t *min_chunk_size, size_t *align)
>> +{
>> +	unsigned int mp_flags;
>> +	int ret;
>> +	size_t total_elt_sz;
>> +	size_t mem_size;
>> +
>> +	/* Get mempool capabilities */
>> +	mp_flags = 0;
>> +	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
>> +	if ((ret < 0) && (ret != -ENOTSUP))
>> +		return ret;
>> +
>> +	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>> +
>> +	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
>> +					 mp->flags | mp_flags);
>> +
> Looks ok to me except a nit:
> (mp->flags | mp_flags) style expression is to differentiate that
> mp_flags holds driver specific flag like BLK_ALIGN and mp->flags
> has appl specific flags.. is it so? If not then why not simply
> do like:
> mp->flags |= mp_flags.

In fact it does not mater a lot since the code is removed in the patch 3.
Here it is required just for consistency. Also, mp argument is a const
which will not allow to change its members.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
  @ 2018-03-12  6:25  3%   ` Jerin Jacob
  2018-03-12 14:30  3%     ` Eads, Gage
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-03-12  6:25 UTC (permalink / raw)
  To: Gage Eads
  Cc: dev, harry.van.haaren, hemant.agrawal, bruce.richardson,
	santosh.shukla, nipun.gupta

-----Original Message-----
> When an event device is stopped, it drains all event queues. These events
> may contain pointers, so to prevent memory leaks eventdev now supports a
> user-provided flush callback that is called during the queue drain process.
> This callback is stored in process memory, so the callback must be
> registered by any process that may call rte_event_dev_stop().
> 
> This commit also clarifies the behavior of rte_event_dev_stop().
> 
> This follows this mailing list discussion:
> http://dpdk.org/ml/archives/dev/2018-January/087484.html
> 
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
> v2: allow a NULL callback pointer to unregister the callback
> 
>  lib/librte_eventdev/rte_eventdev.c           | 17 +++++++++
>  lib/librte_eventdev/rte_eventdev.h           | 55 +++++++++++++++++++++++++++-
>  lib/librte_eventdev/rte_eventdev_version.map |  6 +++
>  3 files changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
> index 851a119..1aacb7b 100644
> --- a/lib/librte_eventdev/rte_eventdev.c
> +++ b/lib/librte_eventdev/rte_eventdev.c
> @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id)
>  	return 0;
>  }
>  
> +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
> +		void *arg);
> +/**< Callback function called during rte_event_dev_stop(), invoked once per
> + * flushed event.
> + */
> +
>  #define RTE_EVENTDEV_NAME_MAX_LEN	(64)
>  /**< @internal Max length of name of event PMD */
>  
> @@ -1176,6 +1194,11 @@ struct rte_eventdev {
>  	event_dequeue_burst_t dequeue_burst;
>  	/**< Pointer to PMD dequeue burst function. */
>  
> +	eventdev_stop_flush_t dev_stop_flush;
> +	/**< Optional, user-provided event flush function */
> +	void *dev_stop_flush_arg;
> +	/**< User-provided argument for event flush function */
> +

I think, we can move this additions to the internal rte_eventdev_data structure. Reasons are
1) Changes to "struct rte_eventdev" would call for ABI change
2) We can keep "struct rte_eventdev" only for fast path functions,
slow path functions can have additional redirection.

>  	struct rte_eventdev_data *data;
>  	/**< Pointer to device data */
>  	const struct rte_eventdev_ops *dev_ops;
> @@ -1822,6 +1845,34 @@ rte_event_dev_xstats_reset(uint8_t dev_id,
>   */
>  int rte_event_dev_selftest(uint8_t dev_id);
>  
> +/**
> + * Registers a callback function to be invoked during rte_event_dev_stop() for
> + * each flushed event. This function can be used to properly dispose of queued
> + * events, for example events containing memory pointers.
> + *
> + * The callback function is only registered for the calling process. The
> + * callback function must be registered in every process that can call
> + * rte_event_dev_stop().
> + *
> + * To unregister a callback, call this function with a NULL callback pointer.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param callback
> + *   Callback function invoked once per flushed event.
> + * @param userdata
> + *   Argument supplied to callback.
> + *
> + * @return
> + *  - 0 on success.
> + *  - -EINVAL if *dev_id* is invalid
> + *
> + * @see rte_event_dev_stop()
> + */
> +int
> +rte_event_dev_stop_flush_callback_register(uint8_t dev_id,
> +		eventdev_stop_flush_t callback, void *userdata);
> +
IMO, It would be better if we place this function near to rte_event_dev_stop().

Other than above minor changes, It looks good to me.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-10 15:39  7%     ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-11 12:51  0%       ` santosh
  2018-03-12  6:53  0%         ` Andrew Rybchenko
  2018-03-19 17:03  0%       ` Olivier Matz
  1 sibling, 1 reply; 200+ results
From: santosh @ 2018-03-11 12:51 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Olivier MATZ

Hi Andrew,


On Saturday 10 March 2018 09:09 PM, Andrew Rybchenko wrote:
> Size of memory chunk required to populate mempool objects depends
> on how objects are stored in the memory. Different mempool drivers
> may have different requirements and a new operation allows to
> calculate memory size in accordance with driver requirements and
> advertise requirements on minimum memory chunk size and alignment
> in a generic way.
>
> Bump ABI version since the patch breaks it.
>
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
> RFCv2 -> v1:
>  - move default calc_mem_size callback to rte_mempool_ops_default.c
>  - add ABI changes to release notes
>  - name default callback consistently: rte_mempool_op_<callback>_default()
>  - bump ABI version since it is the first patch which breaks ABI
>  - describe default callback behaviour in details
>  - avoid introduction of internal function to cope with depration
>    (keep it to deprecation patch)
>  - move cache-line or page boundary chunk alignment to default callback
>  - highlight that min_chunk_size and align parameters are output only
>
[...]

> diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> new file mode 100644
> index 0000000..57fe79b
> --- /dev/null
> +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2016 Intel Corporation.
> + * Copyright(c) 2016 6WIND S.A.
> + * Copyright(c) 2018 Solarflare Communications Inc.
> + */
> +
> +#include <rte_mempool.h>
> +
> +ssize_t
> +rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
> +				     uint32_t obj_num, uint32_t pg_shift,
> +				     size_t *min_chunk_size, size_t *align)
> +{
> +	unsigned int mp_flags;
> +	int ret;
> +	size_t total_elt_sz;
> +	size_t mem_size;
> +
> +	/* Get mempool capabilities */
> +	mp_flags = 0;
> +	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
> +	if ((ret < 0) && (ret != -ENOTSUP))
> +		return ret;
> +
> +	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> +
> +	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
> +					 mp->flags | mp_flags);
> +

Looks ok to me except a nit:
(mp->flags | mp_flags) style expression is to differentiate that
mp_flags holds driver specific flag like BLK_ALIGN and mp->flags
has appl specific flags.. is it so? If not then why not simply
do like:
mp->flags |= mp_flags.

Thanks.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                       ` (3 preceding siblings ...)
  2018-03-10 15:39  5%     ` [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions Andrew Rybchenko
@ 2018-03-10 15:39  8%     ` Andrew Rybchenko
  2018-03-19 17:03  0%     ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Olivier Matz
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback is not required any more since there is a new callback
to populate objects using provided memory area which provides
the same information.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - advertise ABI changes in release notes

 doc/guides/rel_notes/deprecation.rst       |  1 -
 doc/guides/rel_notes/release_18_05.rst     |  2 ++
 lib/librte_mempool/rte_mempool.c           |  5 -----
 lib/librte_mempool/rte_mempool.h           | 31 ------------------------------
 lib/librte_mempool/rte_mempool_ops.c       | 14 --------------
 lib/librte_mempool/rte_mempool_version.map |  1 -
 6 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 473330d..5301259 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -63,7 +63,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 0244f91..9d40db1 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -107,6 +107,8 @@ ABI Changes
   Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
   since its features are covered by ``calc_mem_size`` and ``populate``
   callbacks.
+  Callback ``register_memory_area`` has been removed from ``rte_mempool_ops``
+  since the new callback ``populate`` may be used instead of it.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b57ba2a..844d907 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -344,11 +344,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 		mp->flags |= MEMPOOL_F_POOL_CREATED;
 	}
 
-	/* Notify memory area to mempool */
-	ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len);
-	if (ret != -ENOTSUP && ret < 0)
-		return ret;
-
 	/* mempool is already populated */
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index ebfc95c..5f63f86 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -370,12 +370,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Notify new memory area to mempool.
- */
-typedef int (*rte_mempool_ops_register_memory_area_t)
-(const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * Calculate memory size required to store given number of objects.
  *
  * @param[in] mp
@@ -507,10 +501,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Notify new memory area to mempool
-	 */
-	rte_mempool_ops_register_memory_area_t register_memory_area;
-	/**
 	 * Optional callback to calculate memory size required to
 	 * store specified number of objects.
 	 */
@@ -632,27 +622,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops register_memory_area callback.
- * API to notify the mempool handler when a new memory area is added to pool.
- *
- * @param mp
- *   Pointer to the memory pool.
- * @param vaddr
- *   Pointer to the buffer virtual address.
- * @param iova
- *   Pointer to the buffer IO address.
- * @param len
- *   Pool size.
- * @return
- *   - 0: Success;
- *   - -ENOTSUP - doesn't support register_memory_area ops (valid error case).
- *   - Otherwise, rte_mempool_populate_phys fails thus pool create fails.
- */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
-				char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * @internal wrapper for mempool_ops calc_mem_size callback.
  * API to calculate size of memory required to store specified number of
  * object.
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 6ac669a..ea9be1e 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
 
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 }
 
 /* wrapper to notify new memory area to external mempool */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
-					rte_iova_t iova, size_t len)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->register_memory_area, -ENOTSUP);
-	return ops->register_memory_area(mp, vaddr, iova, len);
-}
-
-/* wrapper to notify new memory area to external mempool */
 ssize_t
 rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				uint32_t obj_num, uint32_t pg_shift,
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 42ca4df..f539a5a 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
 
-- 
2.7.4

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                       ` (2 preceding siblings ...)
  2018-03-10 15:39  6%     ` [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities Andrew Rybchenko
@ 2018-03-10 15:39  5%     ` Andrew Rybchenko
  2018-03-10 15:39  8%     ` [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area Andrew Rybchenko
  2018-03-19 17:03  0%     ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Olivier Matz
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

Move rte_mempool_xmem_size() code to internal helper function
since it is required in two places: deprecated rte_mempool_xmem_size()
and non-deprecated rte_mempool_op_calc_mem_size_deafult().

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - advertise deprecation in release notes
 - factor out default memory size calculation into non-deprecated
   internal function to avoid usage of deprecated function internally
 - remove test for deprecated functions to address build issue because
   of usage of deprecated functions (it is easy to allow usage of
   deprecated function in Makefile, but very complicated in meson)

 doc/guides/rel_notes/deprecation.rst         |  7 -------
 doc/guides/rel_notes/release_18_05.rst       | 10 +++++++++
 lib/librte_mempool/rte_mempool.c             | 19 ++++++++++++++---
 lib/librte_mempool/rte_mempool.h             | 25 ++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops_default.c |  4 ++--
 test/test/test_mempool.c                     | 31 ----------------------------
 6 files changed, 53 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4deed9a..473330d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -60,13 +60,6 @@ Deprecation Notices
   - ``rte_eal_mbuf_default_mempool_ops``
 
 * mempool: several API and ABI changes are planned in v18.05.
-  The following functions, introduced for Xen, which is not supported
-  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
-  Therefore they will be deprecated in v18.05 and removed in v18.08:
-
-  - ``rte_mempool_xmem_create``
-  - ``rte_mempool_xmem_size``
-  - ``rte_mempool_xmem_usage``
 
   The following changes are planned:
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index c50f26c..0244f91 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -74,6 +74,16 @@ API Changes
   Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
   used to achieve it without specific knowledge in the generic code.
 
+* **Deprecated mempool xmem functions.**
+
+  The following functions, introduced for Xen, which is not supported
+  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
+  Therefore they were deprecated in v18.05 and will be removed in v18.08:
+
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index fdcda45..b57ba2a 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Calculate maximum amount of memory required to store given number of objects.
+ * Internal function to calculate required memory chunk size shared
+ * by default implementation of the corresponding callback and
+ * deprecated external function.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
+rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+				 uint32_t pg_shift)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 
@@ -228,6 +230,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 }
 
 /*
+ * Calculate maximum amount of memory required to store given number of objects.
+ */
+size_t
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
+		      __rte_unused unsigned int flags)
+{
+	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
+						pg_shift);
+}
+
+/*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index cd3b229..ebfc95c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -420,6 +420,28 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal Helper function to calculate memory size required to store
+ * specified number of objects in assumption that the memory buffer will
+ * be aligned at page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * @param elt_num
+ *   Number of elements.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by rte_mempool_calc_obj_size().
+ * @param pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+size_t rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+		uint32_t pg_shift);
+
+/**
  * Function to be called for each populated object.
  *
  * @param[in] mp
@@ -905,6 +927,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. See rte_mempool_create() for details.
  */
+__rte_deprecated
 struct rte_mempool *
 rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 		unsigned cache_size, unsigned private_data_size,
@@ -1667,6 +1690,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * @return
  *   Required memory size aligned at page boundary.
  */
+__rte_deprecated
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
 	uint32_t pg_shift, unsigned int flags);
 
@@ -1698,6 +1722,7 @@ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
  *   buffer is too small, return a negative value whose absolute value
  *   is the actual number of elements that can be stored in that buffer.
  */
+__rte_deprecated
 ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
 	uint32_t pg_shift, unsigned int flags);
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 3defc15..fd63ca1 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -16,8 +16,8 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags);
+	mem_size = rte_mempool_calc_mem_size_helper(obj_num, total_elt_sz,
+						    pg_shift);
 
 	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 63f921e..8d29af2 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -444,34 +444,6 @@ test_mempool_same_name_twice_creation(void)
 	return 0;
 }
 
-/*
- * Basic test for mempool_xmem functions.
- */
-static int
-test_mempool_xmem_misc(void)
-{
-	uint32_t elt_num, total_size;
-	size_t sz;
-	ssize_t usz;
-
-	elt_num = MAX_KEEP;
-	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
-	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
-
-	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
-		MEMPOOL_PG_SHIFT_MAX, 0);
-
-	if (sz != (size_t)usz)  {
-		printf("failure @ %s: rte_mempool_xmem_usage(%u, %u) "
-			"returns: %#zx, while expected: %#zx;\n",
-			__func__, elt_num, total_size, sz, (size_t)usz);
-		return -1;
-	}
-
-	return 0;
-}
-
 static void
 walk_cb(struct rte_mempool *mp, void *userdata __rte_unused)
 {
@@ -596,9 +568,6 @@ test_mempool(void)
 	if (test_mempool_same_name_twice_creation() < 0)
 		goto err;
 
-	if (test_mempool_xmem_misc() < 0)
-		goto err;
-
 	/* test the stack handler */
 	if (test_mempool_basic(mp_stack, 1) < 0)
 		goto err;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-10 15:39  7%     ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-10 15:39  6%     ` [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory Andrew Rybchenko
@ 2018-03-10 15:39  6%     ` Andrew Rybchenko
  2018-03-10 15:39  5%     ` [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions Andrew Rybchenko
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback was introduced to let generic code to know octeontx
mempool driver requirements to use single physically contiguous
memory chunk to store all objects and align object address to
total object size. Now these requirements are met using a new
callbacks to calculate required memory chunk size and to populate
objects using provided memory chunk.

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and likely to
be insufficient to describe mempool driver features. If required
in the future, API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - squash mempool/octeontx patches to add calc_mem_size and populate
   callbacks to this one in order to avoid breakages in the middle of
   patchset
 - advertise API changes in release notes

 doc/guides/rel_notes/deprecation.rst            |  1 -
 doc/guides/rel_notes/release_18_05.rst          | 11 +++++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 +++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.c                | 44 ++----------------
 lib/librte_mempool/rte_mempool.h                | 52 +---------------------
 lib/librte_mempool/rte_mempool_ops.c            | 14 ------
 lib/librte_mempool/rte_mempool_ops_default.c    | 15 +------
 lib/librte_mempool/rte_mempool_version.map      |  1 -
 8 files changed, 68 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index c06fc67..4deed9a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -70,7 +70,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index abaefe5..c50f26c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -66,6 +66,14 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Removed mempool capability flags and related functions.**
+
+  Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and
+  ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool
+  driver to customize generic mempool library behaviour.
+  Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
+  used to achieve it without specific knowledge in the generic code.
+
 
 ABI Changes
 -----------
@@ -86,6 +94,9 @@ ABI Changes
   to allow to customize required memory size calculation.
   A new callback ``populate`` has been added to ``rte_mempool_ops``
   to allow to customize objects population.
+  Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
+  since its features are covered by ``calc_mem_size`` and ``populate``
+  callbacks.
 
 
 Removed Items
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index d143d05..f2c4f6a 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
 	return octeontx_fpa_bufpool_free_count(pool);
 }
 
-static int
-octeontx_fpavf_get_capabilities(const struct rte_mempool *mp,
-				unsigned int *flags)
+static ssize_t
+octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp,
+			     uint32_t obj_num, uint32_t pg_shift,
+			     size_t *min_chunk_size, size_t *align)
 {
-	RTE_SET_USED(mp);
-	*flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
-			MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS);
-	return 0;
+	ssize_t mem_size;
+
+	/*
+	 * Simply need space for one more object to be able to
+	 * fullfil alignment requirements.
+	 */
+	mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1,
+							pg_shift,
+							min_chunk_size, align);
+	if (mem_size >= 0) {
+		/*
+		 * Memory area which contains objects must be physically
+		 * contiguous.
+		 */
+		*min_chunk_size = mem_size;
+	}
+
+	return mem_size;
 }
 
 static int
@@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp,
 	return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool);
 }
 
+static int
+octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs,
+			void *vaddr, rte_iova_t iova, size_t len,
+			rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+
+	if (iova == RTE_BAD_IOVA)
+		return -EINVAL;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	/* align object start address to a multiple of total_elt_sz */
+	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
+
+	if (len < off)
+		return -EINVAL;
+
+	vaddr = (char *)vaddr + off;
+	iova += off;
+	len -= off;
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
+					       obj_cb, obj_cb_arg);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.name = "octeontx_fpavf",
 	.alloc = octeontx_fpavf_alloc,
@@ -157,8 +199,9 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.enqueue = octeontx_fpavf_enqueue,
 	.dequeue = octeontx_fpavf_dequeue,
 	.get_count = octeontx_fpavf_get_count,
-	.get_capabilities = octeontx_fpavf_get_capabilities,
 	.register_memory_area = octeontx_fpavf_register_memory_area,
+	.calc_mem_size = octeontx_fpavf_calc_mem_size,
+	.populate = octeontx_fpavf_populate,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index ed0e982..fdcda45 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -208,15 +208,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      __rte_unused unsigned int flags)
 {
 	size_t obj_per_page, pg_num, pg_sz;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	if (total_elt_sz == 0)
 		return 0;
@@ -240,18 +234,12 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 ssize_t
 rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags)
+	uint32_t pg_shift, __rte_unused unsigned int flags)
 {
 	uint32_t elt_cnt = 0;
 	rte_iova_t start, end;
 	uint32_t iova_idx;
 	size_t pg_sz = (size_t)1 << pg_shift;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	/* if iova is NULL, assume contiguous memory */
 	if (iova == NULL) {
@@ -330,8 +318,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque)
 {
-	unsigned total_elt_sz;
-	unsigned int mp_capa_flags;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -354,27 +340,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-
-	/* Get mempool capabilities */
-	mp_capa_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_capa_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_capa_flags;
-
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -386,10 +351,7 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp_capa_flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
-		/* align object start address to a multiple of total_elt_sz */
-		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
-	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 49083bd..cd3b229 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -245,24 +245,6 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET         0x0008 /**< Default get is "single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
-/**
- * This capability flag is advertised by a mempool handler, if the whole
- * memory area containing the objects must be physically contiguous.
- * Note: This flag should not be passed by application.
- */
-#define MEMPOOL_F_CAPA_PHYS_CONTIG 0x0040
-/**
- * This capability flag is advertised by a mempool handler. Used for a case
- * where mempool driver wants object start address(vaddr) aligned to block
- * size(/ total element size).
- *
- * Note:
- * - This flag should not be passed by application.
- *   Flag used for mempool driver only.
- * - Mempool driver must also set MEMPOOL_F_CAPA_PHYS_CONTIG flag along with
- *   MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS.
- */
-#define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -388,12 +370,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Get the mempool capabilities.
- */
-typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
-		unsigned int *flags);
-
-/**
  * Notify new memory area to mempool.
  */
 typedef int (*rte_mempool_ops_register_memory_area_t)
@@ -433,13 +409,7 @@ typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
- * If mempool driver requires object addresses to be block size aligned
- * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
- * reserved to be able to meet the requirement.
- *
- * Minimum size of memory chunk is either all required space, if
- * capabilities say that whole memory area must be physically contiguous
- * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * Minimum size of memory chunk is a maximum of the page size and total
  * element size.
  *
  * Required memory chunk alignment is a maximum of page size and cache
@@ -515,10 +485,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Get the mempool capabilities
-	 */
-	rte_mempool_get_capabilities_t get_capabilities;
-	/**
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
@@ -644,22 +610,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops get_capabilities callback.
- *
- * @param mp [in]
- *   Pointer to the memory pool.
- * @param flags [out]
- *   Pointer to the mempool flags.
- * @return
- *   - 0: Success; The mempool driver has advertised his pool capabilities in
- *   flags param.
- *   - -ENOTSUP - doesn't support get_capabilities ops (valid case).
- *   - Otherwise, pool create fails.
- */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags);
-/**
  * @internal wrapper for mempool_ops register_memory_area callback.
  * API to notify the mempool handler when a new memory area is added to pool.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 1a7f39f..6ac669a 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 	return ops->get_count(mp);
 }
 
-/* wrapper to get external mempool capabilities. */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->get_capabilities, -ENOTSUP);
-	return ops->get_capabilities(mp, flags);
-}
-
 /* wrapper to notify new memory area to external mempool */
 int
 rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57295f7..3defc15 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -11,26 +11,15 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 				     uint32_t obj_num, uint32_t pg_shift,
 				     size_t *min_chunk_size, size_t *align)
 {
-	unsigned int mp_flags;
-	int ret;
 	size_t total_elt_sz;
 	size_t mem_size;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
 	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags | mp_flags);
+					 mp->flags);
 
-	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
-		*min_chunk_size = mem_size;
-	else
-		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
 	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 90e79ec..42ca4df 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_get_capabilities;
 	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-10 15:39  7%     ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-10 15:39  6%     ` Andrew Rybchenko
  2018-03-10 15:39  6%     ` [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities Andrew Rybchenko
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback allows to customize how objects are stored in the
memory chunk. Default implementation of the callback which simply
puts objects one by one is available.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - advertise ABI changes in release notes
 - use consistent name for default callback:
   rte_mempool_op_<callback>_default()
 - add opaque data pointer to populated object callback
 - move default callback to dedicated file

 doc/guides/rel_notes/deprecation.rst         |  2 +-
 doc/guides/rel_notes/release_18_05.rst       |  2 +
 lib/librte_mempool/rte_mempool.c             | 23 +++----
 lib/librte_mempool/rte_mempool.h             | 90 ++++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops.c         | 21 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 24 ++++++++
 lib/librte_mempool/rte_mempool_version.map   |  1 +
 7 files changed, 148 insertions(+), 15 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e02d4ca..c06fc67 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,7 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize objects population and allocate contiguous
+  - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 59583ea..abaefe5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,8 @@ ABI Changes
 
   A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
   to allow to customize required memory size calculation.
+  A new callback ``populate`` has been added to ``rte_mempool_ops``
+  to allow to customize objects population.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 3bfb36e..ed0e982 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size)
 }
 
 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
+mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
+		 void *obj, rte_iova_t iova)
 {
 	struct rte_mempool_objhdr *hdr;
 	struct rte_mempool_objtlr *tlr __rte_unused;
@@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 	tlr = __mempool_get_trailer(obj);
 	tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-
-	/* enqueue in ring */
-	rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }
 
 /* call obj_cb() for each mempool element */
@@ -396,16 +394,13 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
 
-	while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
-		off += mp->header_size;
-		if (iova == RTE_BAD_IOVA)
-			mempool_add_elem(mp, (char *)vaddr + off,
-				RTE_BAD_IOVA);
-		else
-			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
-		off += mp->elt_size + mp->trailer_size;
-		i++;
-	}
+	if (off > len)
+		return -EINVAL;
+
+	i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
+		(char *)vaddr + off,
+		(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
+		len - off, mempool_add_elem, NULL);
 
 	/* not enough room to store one object */
 	if (i == 0)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0151f6c..49083bd 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -449,6 +449,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		uint32_t obj_num, uint32_t pg_shift,
 		size_t *min_chunk_size, size_t *align);
 
+/**
+ * Function to be called for each populated object.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] opaque
+ *   An opaque pointer passed to iterator.
+ * @param[in] vaddr
+ *   Object virtual address.
+ * @param[in] iova
+ *   Input/output virtual address of the object or RTE_BAD_IOVA.
+ */
+typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp,
+		void *opaque, void *vaddr, rte_iova_t iova);
+
+/**
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * Populated objects should be enqueued to the pool, e.g. using
+ * rte_mempool_ops_enqueue_bulk().
+ *
+ * If the given IO address is unknown (iova = RTE_BAD_IOVA),
+ * the chunk doesn't need to be physically contiguous (only virtually),
+ * and allocated objects may span two pages.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+typedef int (*rte_mempool_populate_t)(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
+/**
+ * Default way to populate memory pool object using provided memory
+ * chunk: just slice objects one by one.
+ */
+int rte_mempool_op_populate_default(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -470,6 +527,11 @@ struct rte_mempool_ops {
 	 * store specified number of objects.
 	 */
 	rte_mempool_calc_mem_size_t calc_mem_size;
+	/**
+	 * Optional callback to populate mempool objects using
+	 * provided memory chunk.
+	 */
+	rte_mempool_populate_t populate;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -642,6 +704,34 @@ ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				      size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal wrapper for mempool_ops populate callback.
+ *
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+			     void *vaddr, rte_iova_t iova, size_t len,
+			     rte_mempool_populate_obj_cb_t *obj_cb,
+			     void *obj_cb_arg);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 26908cc..1a7f39f 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
+	ops->populate = h->populate;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -141,6 +142,26 @@ rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
 }
 
+/* wrapper to populate memory pool objects using provided memory chunk */
+int
+rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+				void *vaddr, rte_iova_t iova, size_t len,
+				rte_mempool_populate_obj_cb_t *obj_cb,
+				void *obj_cb_arg)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->populate == NULL)
+		return rte_mempool_op_populate_default(mp, max_objs, vaddr,
+						       iova, len, obj_cb,
+						       obj_cb_arg);
+
+	return ops->populate(mp, max_objs, vaddr, iova, len, obj_cb,
+			     obj_cb_arg);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57fe79b..57295f7 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -36,3 +36,27 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	return mem_size;
 }
+
+int
+rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+	unsigned int i;
+	void *obj;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+		off += mp->header_size;
+		obj = (char *)vaddr + off;
+		obj_cb(mp, obj_cb_arg, obj,
+		       (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
+		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
+		off += mp->elt_size + mp->trailer_size;
+	}
+
+	return i;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index e2a054b..90e79ec 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -56,6 +56,7 @@ DPDK_18.05 {
 	global:
 
 	rte_mempool_op_calc_mem_size_default;
+	rte_mempool_op_populate_default;
 
 } DPDK_17.11;
 
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-10 15:39  7%     ` Andrew Rybchenko
  2018-03-11 12:51  0%       ` santosh
  2018-03-19 17:03  0%       ` Olivier Matz
  2018-03-10 15:39  6%     ` [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory Andrew Rybchenko
                       ` (4 subsequent siblings)
  5 siblings, 2 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

Size of memory chunk required to populate mempool objects depends
on how objects are stored in the memory. Different mempool drivers
may have different requirements and a new operation allows to
calculate memory size in accordance with driver requirements and
advertise requirements on minimum memory chunk size and alignment
in a generic way.

Bump ABI version since the patch breaks it.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - move default calc_mem_size callback to rte_mempool_ops_default.c
 - add ABI changes to release notes
 - name default callback consistently: rte_mempool_op_<callback>_default()
 - bump ABI version since it is the first patch which breaks ABI
 - describe default callback behaviour in details
 - avoid introduction of internal function to cope with depration
   (keep it to deprecation patch)
 - move cache-line or page boundary chunk alignment to default callback
 - highlight that min_chunk_size and align parameters are output only

 doc/guides/rel_notes/deprecation.rst         |  3 +-
 doc/guides/rel_notes/release_18_05.rst       |  7 ++-
 lib/librte_mempool/Makefile                  |  3 +-
 lib/librte_mempool/meson.build               |  5 +-
 lib/librte_mempool/rte_mempool.c             | 43 +++++++--------
 lib/librte_mempool/rte_mempool.h             | 80 +++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c         | 18 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 38 +++++++++++++
 lib/librte_mempool/rte_mempool_version.map   |  8 +++
 9 files changed, 177 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6594585..e02d4ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,8 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize required memory chunk calculation,
-    customize objects population and allocate contiguous
+  - addition of new ops to customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index f2525bb..59583ea 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -80,6 +80,11 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Changed rte_mempool_ops structure.**
+
+  A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
+  to allow to customize required memory size calculation.
+
 
 Removed Items
 -------------
@@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.3
-     librte_mempool.so.3
+   + librte_mempool.so.4
    + librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..072740f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
 
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 7a4f3da..9e3b527 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 2
-sources = files('rte_mempool.c', 'rte_mempool_ops.c')
+version = 4
+sources = files('rte_mempool.c', 'rte_mempool_ops.c',
+		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f7f4b..3bfb36e 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -544,39 +544,33 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
-	size_t size, total_elt_sz, align, pg_sz, pg_shift;
+	ssize_t mem_size;
+	size_t align, pg_sz, pg_shift;
 	rte_iova_t iova;
 	unsigned mz_id, n;
-	unsigned int mp_flags;
 	int ret;
 
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_flags;
-
 	if (rte_eal_has_hugepages()) {
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
-		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
-		align = pg_sz;
 	}
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		size_t min_chunk_size;
+
+		mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+				&min_chunk_size, &align);
+		if (mem_size < 0) {
+			ret = mem_size;
+			goto fail;
+		}
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -585,7 +579,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
+		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 			mp->socket_id, mz_flags, align);
 		/* not enough memory, retry with the biggest zone we have */
 		if (mz == NULL)
@@ -596,6 +590,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
+		if (mz->len < min_chunk_size) {
+			rte_memzone_free(mz);
+			ret = -ENOMEM;
+			goto fail;
+		}
+
 		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
 			iova = RTE_BAD_IOVA;
 		else
@@ -628,13 +628,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 static size_t
 get_anon_size(const struct rte_mempool *mp)
 {
-	size_t size, total_elt_sz, pg_sz, pg_shift;
+	size_t size, pg_sz, pg_shift;
+	size_t min_chunk_size;
+	size_t align;
 
 	pg_sz = getpagesize();
 	pg_shift = rte_bsf32(pg_sz);
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+	size = rte_mempool_ops_calc_mem_size(mp, mp->size, pg_shift,
+					     &min_chunk_size, &align);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7..0151f6c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -399,6 +399,56 @@ typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
 typedef int (*rte_mempool_ops_register_memory_area_t)
 (const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
 
+/**
+ * Calculate memory size required to store given number of objects.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location with required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+		uint32_t obj_num,  uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just a product of total
+ * object size including header and trailer and number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+		uint32_t obj_num, uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -415,6 +465,11 @@ struct rte_mempool_ops {
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
+	/**
+	 * Optional callback to calculate memory size required to
+	 * store specified number of objects.
+	 */
+	rte_mempool_calc_mem_size_t calc_mem_size;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -564,6 +619,29 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
 				char *vaddr, rte_iova_t iova, size_t len);
 
 /**
+ * @internal wrapper for mempool_ops calc_mem_size callback.
+ * API to calculate size of memory required to store specified number of
+ * object.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location with required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				      uint32_t obj_num, uint32_t pg_shift,
+				      size_t *min_chunk_size, size_t *align);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
@@ -1533,7 +1611,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * of objects. Assume that the memory buffer will be aligned at page
  * boundary.
  *
- * Note that if object size is bigger then page size, then it assumes
+ * Note that if object size is bigger than page size, then it assumes
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 0732255..26908cc 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_count = h->get_count;
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
+	ops->calc_mem_size = h->calc_mem_size;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -123,6 +124,23 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
 	return ops->register_memory_area(mp, vaddr, iova, len);
 }
 
+/* wrapper to notify new memory area to external mempool */
+ssize_t
+rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				uint32_t obj_num, uint32_t pg_shift,
+				size_t *min_chunk_size, size_t *align)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->calc_mem_size == NULL)
+		return rte_mempool_op_calc_mem_size_default(mp, obj_num,
+				pg_shift, min_chunk_size, align);
+
+	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
new file mode 100644
index 0000000..57fe79b
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 6WIND S.A.
+ * Copyright(c) 2018 Solarflare Communications Inc.
+ */
+
+#include <rte_mempool.h>
+
+ssize_t
+rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+				     uint32_t obj_num, uint32_t pg_shift,
+				     size_t *min_chunk_size, size_t *align)
+{
+	unsigned int mp_flags;
+	int ret;
+	size_t total_elt_sz;
+	size_t mem_size;
+
+	/* Get mempool capabilities */
+	mp_flags = 0;
+	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
+					 mp->flags | mp_flags);
+
+	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
+		*min_chunk_size = mem_size;
+	else
+		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+
+	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
+
+	return mem_size;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 62b76f9..e2a054b 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -51,3 +51,11 @@ DPDK_17.11 {
 	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
+
+DPDK_18.05 {
+	global:
+
+	rte_mempool_op_calc_mem_size_default;
+
+} DPDK_17.11;
+
-- 
2.7.4

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver
    2018-01-31 16:44  0%   ` Olivier Matz
@ 2018-03-10 15:39  3%   ` Andrew Rybchenko
  2018-03-10 15:39  7%     ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
                       ` (5 more replies)
  2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
  2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  3 siblings, 6 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev
  Cc: Olivier MATZ, Santosh Shukla, Jerin Jacob, Hemant Agrawal,
	Shreyansh Jain

The initial patch series [1] is split into two to simplify processing.
The second series relies on this one and will add bucket mempool driver
and related ops.

The patch series has generic enhancements suggested by Olivier.
Basically it adds driver callbacks to calculate required memory size and
to populate objects using provided memory area. It allows to remove
so-called capability flags used before to tell generic code how to
allocate and slice allocated memory into mempool objects.
Clean up which removes get_capabilities and register_memory_area is
not strictly required, but I think right thing to do.
Existing mempool drivers are updated.

I've kept rte_mempool_populate_iova_tab() intact since it seems to
be not directly related XMEM API functions.

It breaks ABI since changes rte_mempool_ops. Also it removes
rte_mempool_ops_register_memory_area() and
rte_mempool_ops_get_capabilities() since corresponding callbacks are
removed.

Internal global functions are not listed in map file since it is not
a part of external API.

[1] http://dpdk.org/ml/archives/dev/2018-January/088698.html

RFCv1 -> RFCv2:
  - add driver ops to calculate required memory size and populate
    mempool objects, remove extra flags which were required before
    to control it
  - transition of octeontx and dpaa drivers to the new callbacks
  - change info API to get information from driver required to
    API user to know contiguous block size
  - remove get_capabilities (not required any more and may be
    substituted with more in info get API)
  - remove register_memory_area since it is substituted with
    populate callback which can do more
  - use SPDX tags
  - avoid all objects affinity to single lcore
  - fix bucket get_count
  - deprecate XMEM API
  - avoid introduction of a new function to flush cache
  - fix NO_CACHE_ALIGN case in bucket mempool

RFCv2 -> v1:
  - split the series in two
  - squash octeontx patches which implement calc_mem_size and populate
    callbacks into the patch which removes get_capabilities since it is
    the easiest way to untangle the tangle of tightly related library
    functions and flags advertised by the driver
  - consistently name default callbacks
  - move default callbacks to dedicated file
  - see detailed description in patches

Andrew Rybchenko (7):
  mempool: add op to calculate memory size to be allocated
  mempool: add op to populate objects using provided memory
  mempool: remove callback to get capabilities
  mempool: deprecate xmem functions
  mempool/octeontx: prepare to remove register memory area op
  mempool/dpaa: prepare to remove register memory area op
  mempool: remove callback to register memory area

Artem V. Andreev (2):
  mempool: ensure the mempool is initialized before populating
  mempool: support flushing the default cache of the mempool

 doc/guides/rel_notes/deprecation.rst            |  12 +-
 doc/guides/rel_notes/release_18_05.rst          |  32 ++-
 drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
 lib/librte_mempool/Makefile                     |   3 +-
 lib/librte_mempool/meson.build                  |   5 +-
 lib/librte_mempool/rte_mempool.c                | 159 +++++++--------
 lib/librte_mempool/rte_mempool.h                | 260 +++++++++++++++++-------
 lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
 lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
 lib/librte_mempool/rte_mempool_version.map      |  11 +-
 test/test/test_mempool.c                        |  31 ---
 12 files changed, 437 insertions(+), 241 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 15:45  0%         ` Ferruh Yigit
@ 2018-03-09 19:06  0%           ` Neil Horman
  2018-03-20 15:51  0%             ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-09 19:06 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On Fri, Mar 09, 2018 at 03:45:49PM +0000, Ferruh Yigit wrote:
> On 3/9/2018 3:16 PM, Neil Horman wrote:
> > On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
> >> On 3/9/2018 12:36 PM, Neil Horman wrote:
> >>> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
> >>>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
> >>>> used as named opaque type.
> >>>>
> >>>> So the functions that are adding callbacks can return objects in this
> >>>> type instead of void pointer.
> >>>>
> >>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>> ---
> >>>> v2:
> >>>> * keep using struct * in parameters, instead add callback functions
> >>>> return struct rte_eth_rxtx_callback pointer.
> >>>>
> >>>> v4:
> >>>> * Remove deprecation notice. LIBABIVER already increased in this release
> >>>> ---
> >>>>  doc/guides/rel_notes/deprecation.rst |  7 -------
> >>>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
> >>>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
> >>>>  3 files changed, 11 insertions(+), 15 deletions(-)
> >>>>
> >>> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
> >>> internal data structure, then it shouldn't be used as part of the prototype for
> >>> an exported function, as the structure will then no longer be a internal data
> >>> structure, but rather part of the public ABI.
> >>
> >> "struct rte_eth_rxtx_callback" is internal data structure. And application
> >> should not access elements of this structure.
> >>
> >> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
> >> can use it as opaque type.
> >>
> >> It is possible that both "add" and "remove" APIs use "void *" and API itself can
> >> cast it. But the inconsistency was "add" related APIs return "void *" and
> >> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
> >>
> >> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
> >> "void *", because named opaque type documents intention/usage better.
> >>
> >> Thanks,
> >> ferruh
> >>
> > I get what you're saying about rte_eth_rxtx_callback being an internals
> > structure (or its intent is to be an internal structure), but it doesn't seem to
> > hold up to the header file layout.  rte_eth_rxtx_callback is defined in
> > rte_ethdev_core.h which according to the makefile, is listed as a symlinked
> > file, and therefore available for external applications to include.  This
> > negates the intended opaque nature of the struct.  I think before you do this,
> > you want to rectify that.
> 
> Intention is to make "struct rte_eth_rxtx_callback" internal, but as you said it
> is available to applications. This is same for all data structures in
> rte_ethdev_core.h
> 
Well...yes.  Thats what I said

> Unfortunately it can't be actual internal because of inline functions in public
> header uses them. And we can't change inline functions because of performance
> concerns.
> 
I'm sorry, thats not ok with me.  Just declaring a data structure to be
internal-only without enforcing that is asking for applications to mangle
internal data, and theres no reason it can't be fixed (and done without
sacrificing performance).

> Since we can't make the structure real internal, we can't really prevent
> applications to access the internals, this same if you use "void *".
> 
Just typedef a void pointer to some rte_ethdev_cb_handle_t type and pass that
back and forth instead.  That at least hides the fact that you are using a non
opaque structure from user applications without some intentional casting.  You
can further lock the call down by declaring the handles const so that no one
tries to dereference or modify them without generating a warning.

Neil

> > 
> > Neil
> > 
> >>>
> >>> Neil
> >>>
> >>
> >>
> 
> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework
  @ 2018-03-09 16:42  2% ` Konstantin Ananyev
  0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space dpdk based applications.
It supports basic set of features from eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds dependency on libelf.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 config/common_linuxapp             |   1 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  48 ++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  37 +++
 lib/librte_bpf/bpf_load.c          | 380 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/rte_bpf.h           | 158 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 mk/rte.app.mk                      |   2 +
 12 files changed, 1182 insertions(+)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ad03cf433..2205b684f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -823,3 +823,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..7b4a0ce7d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -10,6 +10,7 @@ CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
+CONFIG_RTE_LIBRTE_BPF=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..4727d2251
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+
+	if (rc != 0)
+		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..f1c1d3be3
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_LOG(ERR, USER1, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_LOG(ERR, USER1,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..f09417088
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..6ced9c640
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,380 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm = (uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_LOG(ERR, USER1, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_LOG(ERR, USER1,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_LOG(ERR, USER1, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_LOG(ERR, USER1,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_LOG(INFO, USER1, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..7c1267cbd
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_LOG(ERR, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..efee35ad4
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF exeution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF exeution context.
+ * @param fname
+ *  Pathname for a ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compield code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3eb41d176..fb41c77d2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-03-08 14:38  0%         ` Burakov, Anatoly
@ 2018-03-09 16:32  0%           ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-03-09 16:32 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Thu, Mar 08, 2018 at 02:38:37PM +0000, Burakov, Anatoly wrote:
> On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> > On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
> > > During lcore scan, find maximum socket ID and store it. This will
> > > break the ABI, so bump ABI version.
> > > 
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > > 
> > > Notes:
> > >      v4:
> > >      - Remove backwards ABI compatibility, bump ABI instead
> > >      v3:
> > >      - Added ABI compatibility
> > >      v2:
> > >      - checkpatch changes
> > >      - check socket before deciding if the core is not to be used
> > > 
> > >   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
> > >   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
> > >   lib/librte_eal/common/include/rte_eal.h   |  1 +
> > >   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
> > >   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
> > >   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
> > >   6 files changed, 44 insertions(+), 15 deletions(-)
> > > 
> > Breaking the ABI is the best way to implement this change, and given the
> > deprecation was previously announced I'm ok with that.
> > 
> > Question: we are ok assuming that the socket numbers are sequential, or
> > nearly so, and knowing the maximum socket number seen is a good
> > approximation of the actual physical sockets? I know in terms of cores
> > on a system, the core id's often jump - are there systems where the
> > socket numbers do too?
> > 
> > /Bruce
> > 
> 
> I am not aware of any system that would jump sockets like that. I'm open to
> corrections, however :)
> 
> -- 
In the absense of any corrections, I think this is fine to have.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 15:16  0%       ` Neil Horman
@ 2018-03-09 15:45  0%         ` Ferruh Yigit
  2018-03-09 19:06  0%           ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-09 15:45 UTC (permalink / raw)
  To: Neil Horman; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On 3/9/2018 3:16 PM, Neil Horman wrote:
> On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
>> On 3/9/2018 12:36 PM, Neil Horman wrote:
>>> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
>>>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
>>>> used as named opaque type.
>>>>
>>>> So the functions that are adding callbacks can return objects in this
>>>> type instead of void pointer.
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
>>>> ---
>>>> v2:
>>>> * keep using struct * in parameters, instead add callback functions
>>>> return struct rte_eth_rxtx_callback pointer.
>>>>
>>>> v4:
>>>> * Remove deprecation notice. LIBABIVER already increased in this release
>>>> ---
>>>>  doc/guides/rel_notes/deprecation.rst |  7 -------
>>>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
>>>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
>>>>  3 files changed, 11 insertions(+), 15 deletions(-)
>>>>
>>> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
>>> internal data structure, then it shouldn't be used as part of the prototype for
>>> an exported function, as the structure will then no longer be a internal data
>>> structure, but rather part of the public ABI.
>>
>> "struct rte_eth_rxtx_callback" is internal data structure. And application
>> should not access elements of this structure.
>>
>> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
>> can use it as opaque type.
>>
>> It is possible that both "add" and "remove" APIs use "void *" and API itself can
>> cast it. But the inconsistency was "add" related APIs return "void *" and
>> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
>>
>> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
>> "void *", because named opaque type documents intention/usage better.
>>
>> Thanks,
>> ferruh
>>
> I get what you're saying about rte_eth_rxtx_callback being an internals
> structure (or its intent is to be an internal structure), but it doesn't seem to
> hold up to the header file layout.  rte_eth_rxtx_callback is defined in
> rte_ethdev_core.h which according to the makefile, is listed as a symlinked
> file, and therefore available for external applications to include.  This
> negates the intended opaque nature of the struct.  I think before you do this,
> you want to rectify that.

Intention is to make "struct rte_eth_rxtx_callback" internal, but as you said it
is available to applications. This is same for all data structures in
rte_ethdev_core.h

Unfortunately it can't be actual internal because of inline functions in public
header uses them. And we can't change inline functions because of performance
concerns.

Since we can't make the structure real internal, we can't really prevent
applications to access the internals, this same if you use "void *".

> 
> Neil
> 
>>>
>>> Neil
>>>
>>
>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 13:00  0%     ` Ferruh Yigit
@ 2018-03-09 15:16  0%       ` Neil Horman
  2018-03-09 15:45  0%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-09 15:16 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
> On 3/9/2018 12:36 PM, Neil Horman wrote:
> > On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
> >> "struct rte_eth_rxtx_callback" is defined as internal data structure and
> >> used as named opaque type.
> >>
> >> So the functions that are adding callbacks can return objects in this
> >> type instead of void pointer.
> >>
> >> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> >> ---
> >> v2:
> >> * keep using struct * in parameters, instead add callback functions
> >> return struct rte_eth_rxtx_callback pointer.
> >>
> >> v4:
> >> * Remove deprecation notice. LIBABIVER already increased in this release
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst |  7 -------
> >>  lib/librte_ether/rte_ethdev.c        |  6 +++---
> >>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
> >>  3 files changed, 11 insertions(+), 15 deletions(-)
> >>
> > This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
> > internal data structure, then it shouldn't be used as part of the prototype for
> > an exported function, as the structure will then no longer be a internal data
> > structure, but rather part of the public ABI.
> 
> "struct rte_eth_rxtx_callback" is internal data structure. And application
> should not access elements of this structure.
> 
> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
> can use it as opaque type.
> 
> It is possible that both "add" and "remove" APIs use "void *" and API itself can
> cast it. But the inconsistency was "add" related APIs return "void *" and
> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
> 
> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
> "void *", because named opaque type documents intention/usage better.
> 
> Thanks,
> ferruh
> 
I get what you're saying about rte_eth_rxtx_callback being an internals
structure (or its intent is to be an internal structure), but it doesn't seem to
hold up to the header file layout.  rte_eth_rxtx_callback is defined in
rte_ethdev_core.h which according to the makefile, is listed as a symlinked
file, and therefore available for external applications to include.  This
negates the intended opaque nature of the struct.  I think before you do this,
you want to rectify that.

Neil

> > 
> > Neil
> > 
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
       [not found]       ` <20180309123651.GB19004@hmswarspite.think-freely.org>
@ 2018-03-09 13:00  0%     ` Ferruh Yigit
  2018-03-09 15:16  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-09 13:00 UTC (permalink / raw)
  To: Neil Horman; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On 3/9/2018 12:36 PM, Neil Horman wrote:
> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
>> used as named opaque type.
>>
>> So the functions that are adding callbacks can return objects in this
>> type instead of void pointer.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
>> ---
>> v2:
>> * keep using struct * in parameters, instead add callback functions
>> return struct rte_eth_rxtx_callback pointer.
>>
>> v4:
>> * Remove deprecation notice. LIBABIVER already increased in this release
>> ---
>>  doc/guides/rel_notes/deprecation.rst |  7 -------
>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
>>  3 files changed, 11 insertions(+), 15 deletions(-)
>>
> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
> internal data structure, then it shouldn't be used as part of the prototype for
> an exported function, as the structure will then no longer be a internal data
> structure, but rather part of the public ABI.

"struct rte_eth_rxtx_callback" is internal data structure. And application
should not access elements of this structure.

"struct rte_eth_rxtx_callback;" is defined in the public header, so applications
can use it as opaque type.

It is possible that both "add" and "remove" APIs use "void *" and API itself can
cast it. But the inconsistency was "add" related APIs return "void *" and
"remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.

While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
"void *", because named opaque type documents intention/usage better.

Thanks,
ferruh

> 
> Neil
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 21:34  4%             ` Thomas Monjalon
@ 2018-03-09  0:18  4%               ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2018-03-09  0:18 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On Thu, Mar 08, 2018 at 10:34:14PM +0100, Thomas Monjalon wrote:
> 08/03/2018 20:40, Neil Horman:
> > On Thu, Mar 08, 2018 at 05:04:01PM +0100, Thomas Monjalon wrote:
> > > 08/03/2018 16:35, Neil Horman:
> > > > On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > > > > 08/03/2018 12:43, Ferruh Yigit:
> > > > > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > > > > 07/03/2018 18:44, Ferruh Yigit:
> > > > > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > > > > >> config and process which has similar targets?
> > > > > > > 
> > > > > > > They are different targets.
> > > > > > > Experimental API is always enabled but may be avoided by applications.
> > > > > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > > > > old ABI compatibility. It is almost never used because it is preferred
> > > > > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > > > > period after notice.
> > > > > > 
> > > > > > OK, I see.
> > > > > > 
> > > > > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > > > > attention to this config option will get and ABI/API break.
> > > > > 
> > > > > Yes I think you are right, it can be disabled by default.
> > > > > 
> > > > I would agree, there seems to be overlap here, and the experimental tagging can
> > > > cover what the NEXT_API flag is meant to do.  It can be removed I think.
> > > 
> > > It is not NEXT_API but NEXT_ABI.
> > Sorry, typo, though I'm sure you got that, since the former doesn't exist,
> > right?
> > > Why do you think it overlaps experimental API tagging?
> > 
> > I assert that because the compat lib has macros to map common symbols to version
> > specific ones.  That is to say, if you change a data structure, you can setup
> > the API calls that use said structure such that version 1 or the symbol maps to
> > an internal function that uses the old structure, while version 2 maps to an
> > internal function that uses the new symbol
> > 
> > That is to say, if you're planning on introducing ABI changes, the experimental
> > API tagging can be used to implement what the NEXT_ABI macro does.
> 
> It is a different usage.
> Experimental API tagging is for new functions.
> rte_compat is used to avoid breaking the ABI when changing old code.
> NEXT_ABI has been used in the past to disable an ABI breakage, which was
> not possible to mitigate with rte_compat because impacting too many functions.
> 
Thats not entirely true.  It _is_ used to manage ABI changes when backwards
compatibiilty needs to be preserved. It _can_be_ used for experimental abi
management.  That is to say, if you want to modify an existing ABI symbol, you
can do so by writing a new function, and then exporting the new function as the
old symbol with the @EXPERIMENTAL version.  Not saying we have to do that, but
we certainly can, and can eliminate NEXT_ABI in the process.

> I am not saying that I like NEXT_ABI, but it could be useful exceptionnally.
> 
Well, if the consensus is that it should be kept, its no skin off my nose, but
the discussion was around removing NEXT_ABI, and I was copied, so I thought I'd
add my $0.02

Neil

> 
> 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 2/7] eventtimer: add common code
  @ 2018-03-08 21:54  2%   ` Erik Gabriel Carrillo
  0 siblings, 0 replies; 200+ results
From: Erik Gabriel Carrillo @ 2018-03-08 21:54 UTC (permalink / raw)
  To: pbhagavatula; +Cc: dev, jerin.jacob, nipun.gupta, hemant.agrawal

This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 config/common_base                                |   1 +
 drivers/event/sw/sw_evdev.c                       |  18 +
 lib/librte_eventdev/Makefile                      |   2 +
 lib/librte_eventdev/rte_event_timer_adapter.c     | 459 ++++++++++++++++++++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 150 +++++++
 lib/librte_eventdev/rte_eventdev.h                |   3 +
 lib/librte_eventdev/rte_eventdev_pmd.h            |  35 ++
 lib/librte_eventdev/rte_eventdev_version.map      |  20 +
 8 files changed, 688 insertions(+)
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter.c
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter_pmd.h

diff --git a/config/common_base b/config/common_base
index ad03cf4..286df74 100644
--- a/config/common_base
+++ b/config/common_base
@@ -546,6 +546,7 @@ CONFIG_RTE_LIBRTE_EVENTDEV=y
 CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
 CONFIG_RTE_EVENT_MAX_DEVS=16
 CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
+CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 
 #
 # Compile PMD for skeleton event device
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 6672fd8..0847547 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -464,6 +464,22 @@ sw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
 	return 0;
 }
 
+static int
+sw_timer_adapter_caps_get(const struct rte_eventdev *dev,
+			  uint64_t flags,
+			  uint32_t *caps,
+			  const struct rte_event_timer_adapter_ops **ops)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(flags);
+	*caps = 0;
+
+	/* Use default SW ops */
+	*ops = NULL;
+
+	return 0;
+}
+
 static void
 sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
 {
@@ -791,6 +807,8 @@ sw_probe(struct rte_vdev_device *vdev)
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
+			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
+
 			.xstats_get = sw_xstats_get,
 			.xstats_get_names = sw_xstats_get_names,
 			.xstats_get_by_name = sw_xstats_get_by_name,
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 549b182..8b16e3f 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -20,6 +20,7 @@ LDLIBS += -lrte_eal -lrte_ring -lrte_ethdev -lrte_hash
 SRCS-y += rte_eventdev.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
+SRCS-y += rte_event_timer_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -29,6 +30,7 @@ SYMLINK-y-include += rte_eventdev_pmd_vdev.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
+SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
new file mode 100644
index 0000000..711d6b9
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -0,0 +1,459 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#include <string.h>
+
+#include <rte_memzone.h>
+#include <rte_memory.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+
+#include "rte_eventdev.h"
+#include "rte_eventdev_pmd.h"
+#include "rte_event_timer_adapter.h"
+#include "rte_event_timer_adapter_pmd.h"
+
+#define DATA_MZ_NAME_MAX_LEN 64
+#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"
+
+static int evtim_logtype;
+
+static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
+
+static inline int
+adapter_valid(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter != NULL && adapter->allocated == 1;
+}
+
+#define EVTIM_LOG(level, logtype, ...) \
+	rte_log(RTE_LOG_ ## level, logtype, \
+		RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__,) \
+			"\n", __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#define EVTIM_LOG_ERR(...) EVTIM_LOG(ERR, evtim_logtype, __VA_ARGS__)
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+#define EVTIM_LOG_DBG(...) \
+	EVTIM_LOG(DEBUG, evtim_logtype, __VA_ARGS__)
+#else
+#define EVTIM_LOG_DBG(...) (void)0
+#endif
+
+#define ADAPTER_VALID_OR_ERR_RET(adapter, retval) do { \
+	if (!adapter_valid(adapter))		       \
+		return retval;			       \
+} while (0)
+
+#define FUNC_PTR_OR_ERR_RET(func, errval) do { \
+	if ((func) == NULL)		       \
+		return errval;		       \
+} while (0)
+
+#define FUNC_PTR_OR_NULL_RET_WITH_ERRNO(func, errval) do { \
+	if ((func) == NULL) {				   \
+		rte_errno = errval;			   \
+		return NULL;				   \
+	}						   \
+} while (0)
+
+static int
+default_port_conf_cb(uint16_t id, uint8_t event_dev_id, uint8_t *event_port_id,
+		     void *conf_arg)
+{
+	struct rte_event_timer_adapter *adapter;
+	struct rte_eventdev *dev;
+	struct rte_event_dev_config dev_conf;
+	struct rte_event_port_conf *port_conf, def_port_conf = {0};
+	int started;
+	uint8_t port_id;
+	uint8_t dev_id;
+	int ret;
+
+	RTE_SET_USED(event_dev_id);
+
+	adapter = &adapters[id];
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+	dev_id = dev->data->dev_id;
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to configure event dev %u\n", dev_id);
+		if (started)
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+
+		return ret;
+	}
+
+	if (conf_arg != NULL)
+		port_conf = conf_arg;
+	else {
+		port_conf = &def_port_conf;
+		ret = rte_event_port_default_conf_get(dev_id, port_id,
+						      port_conf);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rte_event_port_setup(dev_id, port_id, port_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to setup event port %u on event dev %u\n",
+			      port_id, dev_id);
+		return ret;
+	}
+
+	*event_port_id = port_id;
+
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+
+	return ret;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create(const struct rte_event_timer_adapter_conf *conf)
+{
+	return rte_event_timer_adapter_create_ext(conf, default_port_conf_cb,
+						  NULL);
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create_ext(
+		const struct rte_event_timer_adapter_conf *conf,
+		rte_event_timer_adapter_port_conf_cb_t conf_cb,
+		void *conf_arg)
+{
+	uint16_t adapter_id;
+	struct rte_event_timer_adapter *adapter;
+	const struct rte_memzone *mz;
+	char mz_name[DATA_MZ_NAME_MAX_LEN];
+	int n, ret;
+	struct rte_eventdev *dev;
+
+	if (conf == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check eventdev ID */
+	if (!rte_event_pmd_is_valid_dev(conf->event_dev_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	dev = &rte_eventdevs[conf->event_dev_id];
+
+	adapter_id = conf->timer_adapter_id;
+
+	/* Check that adapter_id is in range */
+	if (adapter_id >= RTE_EVENT_TIMER_ADAPTER_NUM_MAX) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check adapter ID not already allocated */
+	adapter = &adapters[adapter_id];
+	if (adapter->allocated) {
+		rte_errno = EEXIST;
+		return NULL;
+	}
+
+	/* Create shared data area. */
+	n = snprintf(mz_name, sizeof(mz_name), DATA_MZ_NAME_FORMAT, adapter_id);
+	if (n >= (int)sizeof(mz_name)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	mz = rte_memzone_reserve(mz_name,
+				 sizeof(struct rte_event_timer_adapter_data),
+				 conf->socket_id, 0);
+	if (mz == NULL)
+		/* rte_errno set by rte_memzone_reserve */
+		return NULL;
+
+	adapter->data = mz->addr;
+	memset(adapter->data, 0, sizeof(struct rte_event_timer_adapter_data));
+
+	adapter->data->mz = mz;
+	adapter->data->event_dev_id = conf->event_dev_id;
+	adapter->data->id = adapter_id;
+	adapter->data->socket_id = conf->socket_id;
+	adapter->data->conf = *conf;  /* copy conf structure */
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = ret;
+		goto free_memzone;
+	}
+
+	if (!(adapter->data->caps &
+	      RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT)) {
+		FUNC_PTR_OR_NULL_RET_WITH_ERRNO(conf_cb, -EINVAL);
+		ret = conf_cb(adapter->data->id, adapter->data->event_dev_id,
+			      &adapter->data->event_port_id, conf_arg);
+		if (ret < 0) {
+			rte_errno = ret;
+			goto free_memzone;
+		}
+	}
+
+	/* Allow driver to do some setup */
+	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
+	ret = adapter->ops->init(adapter);
+	if (ret < 0) {
+		rte_errno = ret;
+		goto free_memzone;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+
+free_memzone:
+	rte_memzone_free(adapter->data->mz);
+	return NULL;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->ops->get_info)
+		/* let driver set values it knows */
+		adapter->ops->get_info(adapter, adapter_info);
+
+	/* Set common values */
+	adapter_info->conf = adapter->data->conf;
+	adapter_info->event_dev_port_id = adapter->data->event_port_id;
+	adapter_info->caps = adapter->data->caps;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->start, -EINVAL);
+
+	ret = adapter->ops->start(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 1;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stop, -EINVAL);
+
+	if (adapter->data->started == 0) {
+		EVTIM_LOG_ERR("event timer adapter %hu already stopped",
+			      adapter->data->id);
+		return 0;
+	}
+
+	ret = adapter->ops->stop(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 0;
+
+	return 0;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_lookup(uint16_t adapter_id)
+{
+	char name[DATA_MZ_NAME_MAX_LEN];
+	const struct rte_memzone *mz;
+	struct rte_event_timer_adapter_data *data;
+	struct rte_event_timer_adapter *adapter;
+	int ret;
+	struct rte_eventdev *dev;
+
+	if (adapters[adapter_id].allocated)
+		return &adapters[adapter_id]; /* Adapter is already loaded */
+
+	snprintf(name, DATA_MZ_NAME_MAX_LEN, DATA_MZ_NAME_FORMAT, adapter_id);
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	data = mz->addr;
+
+	adapter = &adapters[data->id];
+	adapter->data = data;
+
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_free(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->uninit, -EINVAL);
+
+	if (adapter->data->started == 1) {
+		EVTIM_LOG_ERR("event timer adapter %hu must be stopped "
+			      "before freeing", adapter->data->id);
+		return -EBUSY;
+	}
+
+	/* free impl priv data */
+	ret = adapter->ops->uninit(adapter);
+	if (ret < 0)
+		return ret;
+
+	/* free shared data area */
+	ret = rte_memzone_free(adapter->data->mz);
+	if (ret < 0)
+		return ret;
+
+	adapter->data = NULL;
+	adapter->allocated = 0;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_service_id_get(struct rte_event_timer_adapter *adapter,
+				       uint32_t *service_id)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->data->service_inited && service_id != NULL)
+		*service_id = adapter->data->service_id;
+
+	return adapter->data->service_inited ? 0 : -ESRCH;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_get(struct rte_event_timer_adapter *adapter,
+				  struct rte_event_timer_adapter_stats *stats)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_get, -EINVAL);
+	if (stats == NULL)
+		return -EINVAL;
+
+	return adapter->ops->stats_get(adapter, stats);
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_reset, -EINVAL);
+	return adapter->ops->stats_reset(adapter);
+}
+
+void __rte_experimental
+rte_event_timer_init(struct rte_event_timer *evtim)
+{
+	evtim->ev.op = RTE_EVENT_OP_NEW;
+	evtim->ev.event_type = RTE_EVENT_TYPE_TIMER;
+	evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+}
+
+int __rte_experimental
+rte_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
+			  struct rte_event_timer **evtims,
+			  uint16_t nb_evtims)
+{
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->arm_burst, -EINVAL);
+#endif
+
+	return adapter->arm_burst(adapter, evtims, nb_evtims);
+}
+
+int __rte_experimental
+rte_event_timer_arm_tmo_tick_burst(
+			const struct rte_event_timer_adapter *adapter,
+			struct rte_event_timer **evtims,
+			const uint64_t timeout_ticks,
+			const uint16_t nb_evtims)
+{
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->arm_tmo_tick_burst, -EINVAL);
+#endif
+
+	return adapter->arm_tmo_tick_burst(adapter, evtims, timeout_ticks,
+					   nb_evtims);
+}
+
+int __rte_experimental
+rte_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
+			     struct rte_event_timer **evtims,
+			     uint16_t nb_evtims)
+{
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->cancel_burst, -EINVAL);
+#endif
+
+	return adapter->cancel_burst(adapter, evtims, nb_evtims);
+}
+
+RTE_INIT(event_timer_adapter_init_log);
+static void
+event_timer_adapter_init_log(void)
+{
+	evtim_logtype = rte_log_register("lib.eventdev.adapter.timer");
+	if (evtim_logtype >= 0)
+		rte_log_set_level(evtim_logtype, RTE_LOG_NOTICE);
+}
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
new file mode 100644
index 0000000..db044c8
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#ifndef __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+#define __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+
+/**
+ * @file
+ * RTE Event Timer Adapter API (PMD Side)
+ *
+ * @note
+ * This file provides implementation helpers for internal use by PMDs.  They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_event_timer_adapter.h"
+
+/*
+ * Definitions of functions exported by an event timer adapter implementation
+ * through *rte_event_timer_adapter_ops* structure supplied in the
+ * *rte_event_timer_adapter* structure associated with an event timer adapter.
+ */
+
+typedef int (*rte_event_timer_adapter_init_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation setup */
+typedef int (*rte_event_timer_adapter_uninit_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation teardown */
+typedef int (*rte_event_timer_adapter_start_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Start running event timer adapter */
+typedef int (*rte_event_timer_adapter_stop_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Stop running event timer adapter */
+typedef void (*rte_event_timer_adapter_get_info_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info);
+/**< @internal Get contextual information for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_get_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats);
+/**< @internal Get statistics for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_reset_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Reset statistics for event timer adapter */
+typedef int (*rte_event_timer_arm_burst_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **tims,
+		uint16_t nb_tims);
+/**< @internal Enable event timers to enqueue timer events upon expiry */
+typedef int (*rte_event_timer_arm_tmo_tick_burst_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **tims,
+		uint64_t timeout_tick,
+		uint16_t nb_tims);
+/**< @internal Enable event timers with common expiration time */
+typedef int (*rte_event_timer_cancel_burst_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **tims,
+		uint16_t nb_tims);
+/**< @internal Prevent event timers from enqueuing timer events */
+
+/**
+ * @internal Structure containing the functions exported by an event timer
+ * adapter implementation.
+ */
+struct rte_event_timer_adapter_ops {
+	rte_event_timer_adapter_init_t		init;  /**< Set up adapter */
+	rte_event_timer_adapter_uninit_t	uninit;/**< Tear down adapter */
+	rte_event_timer_adapter_start_t		start; /**< Start adapter */
+	rte_event_timer_adapter_stop_t		stop;  /**< Stop adapter */
+	rte_event_timer_adapter_get_info_t	get_info;
+	/**< Get info from driver */
+	rte_event_timer_adapter_stats_get_t	stats_get;
+	/**< Get adapter statistics */
+	rte_event_timer_adapter_stats_reset_t	stats_reset;
+	/**< Reset adapter statistics */
+	rte_event_timer_arm_burst_t		arm_burst;
+	/**< Arm one or more event timers */
+	rte_event_timer_arm_tmo_tick_burst_t	arm_tmo_tick_burst;
+	/**< Arm event timers with same expiration time */
+	rte_event_timer_cancel_burst_t		cancel_burst;
+	/**< Cancel one or more event timers */
+};
+
+/**
+ * @internal Adapter data; structure to be placed in shared memory to be
+ * accessible by various processes in a multi-process configuration.
+ */
+struct rte_event_timer_adapter_data {
+	uint8_t id;
+	/**< Event timer adapter ID */
+	uint8_t event_dev_id;
+	/**< Event device ID */
+	uint32_t socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t event_port_id;
+	/**< Optional: event port ID used when the inbuilt port is absent */
+	const struct rte_memzone *mz;
+	/**< Event timer adapter memzone pointer */
+	struct rte_event_timer_adapter_conf conf;
+	/**< Configuration used to configure the adapter. */
+	uint32_t caps;
+	/**< Adapter capabilities */
+	void *adapter_priv;
+	/**< Timer adapter private data*/
+	uint8_t service_inited;
+	/**< Service initialization state */
+	uint32_t service_id;
+	/**< Service ID*/
+
+	RTE_STD_C11
+	uint8_t started : 1;
+	/**< Flag to indicate adapter started. */
+} __rte_cache_aligned;
+
+/**
+ * @internal Data structure associated with each event timer adapter.
+ */
+struct rte_event_timer_adapter {
+	rte_event_timer_arm_burst_t arm_burst;
+	/**< Pointer to driver arm_burst function. */
+	rte_event_timer_arm_tmo_tick_burst_t arm_tmo_tick_burst;
+	/**< Pointer to driver arm_tmo_tick_burst function. */
+	rte_event_timer_cancel_burst_t cancel_burst;
+	/**< Pointer to driver cancel function. */
+	struct rte_event_timer_adapter_data *data;
+	/**< Pointer to shared adapter data */
+	const struct rte_event_timer_adapter_ops *ops;
+	/**< Functions exported by adapter driver */
+
+	RTE_STD_C11
+	uint8_t allocated : 1;
+	/**< Flag to indicate that this adapter has been allocated */
+} __rte_cache_aligned;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __RTE_EVENT_TIMER_ADAPTER_PMD_H__ */
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index f9ad71e..888bcf1 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1046,6 +1046,9 @@ struct rte_event {
  * @see struct rte_event_eth_rx_adapter_queue_conf::rx_queue_flags
  */
 
+#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 1)
+/**< This flag is set when the timer mechanism is in HW. */
+
 /**
  * Retrieve the event device's ethdev Rx adapter capabilities for the
  * specified ethernet port
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index 31343b5..0e37f1c 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -26,6 +26,7 @@ extern "C" {
 #include <rte_malloc.h>
 
 #include "rte_eventdev.h"
+#include "rte_event_timer_adapter_pmd.h"
 
 /* Logging Macros */
 #define RTE_EDEV_LOG_ERR(...) \
@@ -449,6 +450,37 @@ typedef int (*eventdev_eth_rx_adapter_caps_get_t)
 struct rte_event_eth_rx_adapter_queue_conf *queue_conf;
 
 /**
+ * Retrieve the event device's timer adapter capabilities, as well as the ops
+ * structure that an event timer adapter should call through to enter the
+ * driver
+ *
+ * @param dev
+ *   Event device pointer
+ *
+ * @param flags
+ *   Flags that can be used to determine how to select an event timer
+ *   adapter ops structure
+ *
+ * @param[out] caps
+ *   A pointer to memory filled with Rx event adapter capabilities.
+ *
+ * @param[out] ops
+ *   A pointer to the ops pointer to set with the address of the desired ops
+ *   structure
+ *
+ * @return
+ *   - 0: Success, driver provides Rx event adapter capabilities for the
+ *	ethernet device.
+ *   - <0: Error code returned by the driver function.
+ *
+ */
+typedef int (*eventdev_timer_adapter_caps_get_t)(
+				const struct rte_eventdev *dev,
+				uint64_t flags,
+				uint32_t *caps,
+				const struct rte_event_timer_adapter_ops **ops);
+
+/**
  * Add ethernet Rx queues to event device. This callback is invoked if
  * the caps returned from rte_eventdev_eth_rx_adapter_caps_get(, eth_port_id)
  * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set.
@@ -640,6 +672,9 @@ struct rte_eventdev_ops {
 	eventdev_eth_rx_adapter_stats_reset eth_rx_adapter_stats_reset;
 	/**< Reset ethernet Rx stats */
 
+	eventdev_timer_adapter_caps_get_t timer_adapter_caps_get;
+	/**< Get timer adapter capabilities */
+
 	eventdev_selftest dev_selftest;
 	/**< Start eventdev Selftest */
 };
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 2aef470..345b0b1 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -74,3 +74,23 @@ DPDK_18.02 {
 
 	rte_event_dev_selftest;
 } DPDK_17.11;
+
+EXPERIMENTAL {
+	global:
+
+	rte_event_timer_adapter_create;
+	rte_event_timer_adapter_create_ext;
+	rte_event_timer_adapter_free;
+	rte_event_timer_adapter_get_info;
+	rte_event_timer_adapter_lookup;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_start;
+	rte_event_timer_adapter_stats_get;
+	rte_event_timer_adapter_stats_reset;
+	rte_event_timer_adapter_stop;
+	rte_event_timer_init;
+	rte_event_timer_arm_burst;
+	rte_event_timer_arm_tmo_tick_burst;
+	rte_event_timer_cancel_burst;
+} DPDK_18.02;
-- 
2.6.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 19:40  3%           ` Neil Horman
@ 2018-03-08 21:34  4%             ` Thomas Monjalon
  2018-03-09  0:18  4%               ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08 21:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

08/03/2018 20:40, Neil Horman:
> On Thu, Mar 08, 2018 at 05:04:01PM +0100, Thomas Monjalon wrote:
> > 08/03/2018 16:35, Neil Horman:
> > > On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > > > 08/03/2018 12:43, Ferruh Yigit:
> > > > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > > > 07/03/2018 18:44, Ferruh Yigit:
> > > > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > > > >> config and process which has similar targets?
> > > > > > 
> > > > > > They are different targets.
> > > > > > Experimental API is always enabled but may be avoided by applications.
> > > > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > > > old ABI compatibility. It is almost never used because it is preferred
> > > > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > > > period after notice.
> > > > > 
> > > > > OK, I see.
> > > > > 
> > > > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > > > attention to this config option will get and ABI/API break.
> > > > 
> > > > Yes I think you are right, it can be disabled by default.
> > > > 
> > > I would agree, there seems to be overlap here, and the experimental tagging can
> > > cover what the NEXT_API flag is meant to do.  It can be removed I think.
> > 
> > It is not NEXT_API but NEXT_ABI.
> Sorry, typo, though I'm sure you got that, since the former doesn't exist,
> right?
> > Why do you think it overlaps experimental API tagging?
> 
> I assert that because the compat lib has macros to map common symbols to version
> specific ones.  That is to say, if you change a data structure, you can setup
> the API calls that use said structure such that version 1 or the symbol maps to
> an internal function that uses the old structure, while version 2 maps to an
> internal function that uses the new symbol
> 
> That is to say, if you're planning on introducing ABI changes, the experimental
> API tagging can be used to implement what the NEXT_ABI macro does.

It is a different usage.
Experimental API tagging is for new functions.
rte_compat is used to avoid breaking the ABI when changing old code.
NEXT_ABI has been used in the past to disable an ABI breakage, which was
not possible to mitigate with rte_compat because impacting too many functions.

I am not saying that I like NEXT_ABI, but it could be useful exceptionnally.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 16:04  0%         ` Thomas Monjalon
@ 2018-03-08 19:40  3%           ` Neil Horman
  2018-03-08 21:34  4%             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-08 19:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On Thu, Mar 08, 2018 at 05:04:01PM +0100, Thomas Monjalon wrote:
> 08/03/2018 16:35, Neil Horman:
> > On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > > 08/03/2018 12:43, Ferruh Yigit:
> > > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > > 07/03/2018 18:44, Ferruh Yigit:
> > > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > > >> config and process which has similar targets?
> > > > > 
> > > > > They are different targets.
> > > > > Experimental API is always enabled but may be avoided by applications.
> > > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > > old ABI compatibility. It is almost never used because it is preferred
> > > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > > period after notice.
> > > > 
> > > > OK, I see.
> > > > 
> > > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > > attention to this config option will get and ABI/API break.
> > > 
> > > Yes I think you are right, it can be disabled by default.
> > > 
> > I would agree, there seems to be overlap here, and the experimental tagging can
> > cover what the NEXT_API flag is meant to do.  It can be removed I think.
> 
> It is not NEXT_API but NEXT_ABI.
Sorry, typo, though I'm sure you got that, since the former doesn't exist,
right?
> Why do you think it overlaps experimental API tagging?

I assert that because the compat lib has macros to map common symbols to version
specific ones.  That is to say, if you change a data structure, you can setup
the API calls that use said structure such that version 1 or the symbol maps to
an internal function that uses the old structure, while version 2 maps to an
internal function that uses the new symbol

That is to say, if you're planning on introducing ABI changes, the experimental
API tagging can be used to implement what the NEXT_ABI macro does.

Neil

> 
> 
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 15:35  0%       ` Neil Horman
@ 2018-03-08 16:04  0%         ` Thomas Monjalon
  2018-03-08 19:40  3%           ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08 16:04 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

08/03/2018 16:35, Neil Horman:
> On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > 08/03/2018 12:43, Ferruh Yigit:
> > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > 07/03/2018 18:44, Ferruh Yigit:
> > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > >> config and process which has similar targets?
> > > > 
> > > > They are different targets.
> > > > Experimental API is always enabled but may be avoided by applications.
> > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > old ABI compatibility. It is almost never used because it is preferred
> > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > period after notice.
> > > 
> > > OK, I see.
> > > 
> > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > attention to this config option will get and ABI/API break.
> > 
> > Yes I think you are right, it can be disabled by default.
> > 
> I would agree, there seems to be overlap here, and the experimental tagging can
> cover what the NEXT_API flag is meant to do.  It can be removed I think.

It is not NEXT_API but NEXT_ABI.
Why do you think it overlaps experimental API tagging?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 15:17  0%     ` Thomas Monjalon
@ 2018-03-08 15:35  0%       ` Neil Horman
  2018-03-08 16:04  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-08 15:35 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> 08/03/2018 12:43, Ferruh Yigit:
> > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > 07/03/2018 18:44, Ferruh Yigit:
> > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > >> config and process which has similar targets?
> > > 
> > > They are different targets.
> > > Experimental API is always enabled but may be avoided by applications.
> > > Next ABI can be used to break ABI without notice and disabled to keep
> > > old ABI compatibility. It is almost never used because it is preferred
> > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > period after notice.
> > 
> > OK, I see.
> > 
> > Shouldn't we disable it by default at least? Otherwise who is not paying
> > attention to this config option will get and ABI/API break.
> 
> Yes I think you are right, it can be disabled by default.
> 
I would agree, there seems to be overlap here, and the experimental tagging can
cover what the NEXT_API flag is meant to do.  It can be removed I think.
Neil

> 
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 11:43  3%   ` Ferruh Yigit
@ 2018-03-08 15:17  0%     ` Thomas Monjalon
  2018-03-08 15:35  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08 15:17 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

08/03/2018 12:43, Ferruh Yigit:
> On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > 07/03/2018 18:44, Ferruh Yigit:
> >> After experimental API process defined do we still need RTE_NEXT_ABI
> >> config and process which has similar targets?
> > 
> > They are different targets.
> > Experimental API is always enabled but may be avoided by applications.
> > Next ABI can be used to break ABI without notice and disabled to keep
> > old ABI compatibility. It is almost never used because it is preferred
> > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > period after notice.
> 
> OK, I see.
> 
> Shouldn't we disable it by default at least? Otherwise who is not paying
> attention to this config option will get and ABI/API break.

Yes I think you are right, it can be disabled by default.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-03-08 12:12  3%       ` Bruce Richardson
@ 2018-03-08 14:38  0%         ` Burakov, Anatoly
  2018-03-09 16:32  0%           ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-03-08 14:38 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      
>>      v3:
>>      - Added ABI compatibility
>>      
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
> Breaking the ABI is the best way to implement this change, and given the
> deprecation was previously announced I'm ok with that.
> 
> Question: we are ok assuming that the socket numbers are sequential, or
> nearly so, and knowing the maximum socket number seen is a good
> approximation of the actual physical sockets? I know in terms of cores
> on a system, the core id's often jump - are there systems where the
> socket numbers do too?
> 
> /Bruce
> 

I am not aware of any system that would jump sockets like that. I'm open 
to corrections, however :)

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
@ 2018-03-08 12:12  3%       ` Bruce Richardson
  2018-03-08 14:38  0%         ` Burakov, Anatoly
  2018-03-21  4:59  0%       ` gowrishankar muthukrishnan
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-03-08 12:12 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it. This will
> break the ABI, so bump ABI version.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>     v4:
>     - Remove backwards ABI compatibility, bump ABI instead
>     
>     v3:
>     - Added ABI compatibility
>     
>     v2:
>     - checkpatch changes
>     - check socket before deciding if the core is not to be used
> 
>  lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>  lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>  lib/librte_eal/common/include/rte_eal.h   |  1 +
>  lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>  lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>  lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>  6 files changed, 44 insertions(+), 15 deletions(-)
> 
Breaking the ABI is the best way to implement this change, and given the
deprecation was previously announced I'm ok with that.

Question: we are ok assuming that the socket numbers are sequential, or
nearly so, and knowing the maximum socket number seen is a good
approximation of the actual physical sockets? I know in terms of cores
on a system, the core id's often jump - are there systems where the
socket numbers do too?

/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08  8:05  5% ` Thomas Monjalon
@ 2018-03-08 11:43  3%   ` Ferruh Yigit
  2018-03-08 15:17  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-08 11:43 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> 07/03/2018 18:44, Ferruh Yigit:
>> After experimental API process defined do we still need RTE_NEXT_ABI
>> config and process which has similar targets?
> 
> They are different targets.
> Experimental API is always enabled but may be avoided by applications.
> Next ABI can be used to break ABI without notice and disabled to keep
> old ABI compatibility. It is almost never used because it is preferred
> to keep ABI compatibility with rte_compat macros, or wait a deprecation
> period after notice.

OK, I see.

Shouldn't we disable it by default at least? Otherwise who is not paying
attention to this config option will get and ABI/API break.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-07 17:44 23% [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
  2018-03-07 18:06  0% ` Luca Boccassi
@ 2018-03-08  8:05  5% ` Thomas Monjalon
  2018-03-08 11:43  3%   ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08  8:05 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

07/03/2018 18:44, Ferruh Yigit:
> After experimental API process defined do we still need RTE_NEXT_ABI
> config and process which has similar targets?

They are different targets.
Experimental API is always enabled but may be avoided by applications.
Next ABI can be used to break ABI without notice and disabled to keep
old ABI compatibility. It is almost never used because it is preferred
to keep ABI compatibility with rte_compat macros, or wait a deprecation
period after notice.

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [RFC PATCH 1/5] bpf: add BPF loading and execution framework
  @ 2018-03-08  1:29  2% ` Konstantin Ananyev
  0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-03-08  1:29 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space dpdk based applications.

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 config/common_linuxapp             |   1 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  48 ++++
 lib/librte_bpf/bpf_exec.c          | 453 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  37 +++
 lib/librte_bpf/bpf_load.c          | 344 ++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/rte_bpf.h           | 154 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 mk/rte.app.mk                      |   2 +
 12 files changed, 1143 insertions(+)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ad03cf433..2205b684f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -823,3 +823,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..7b4a0ce7d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -10,6 +10,7 @@ CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
+CONFIG_RTE_LIBRTE_BPF=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..4727d2251
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+
+	if (rc != 0)
+		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..4bad0cc9e
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,453 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_LOG(ERR, USER1, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_LOG(ERR, USER1,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
+
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..f09417088
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..84c6b9417
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,344 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+static uint32_t
+bpf_find_func(const char *sn, const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == RTE_BPF_XTYPE_FUNC &&
+				strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_LOG(ERR, USER1, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	uint32_t i, idx, fidx, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+		if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+			return -EINVAL;
+
+		idx = ofs / sizeof(ins[0]);
+		if (ins[idx].code != (BPF_JMP | BPF_CALL))
+			return -EINVAL;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		fidx = bpf_find_func(sn, prm->xsym, prm->nb_xsym);
+		if (fidx == UINT32_MAX)
+			return -EINVAL;
+
+		ins[idx].imm = fidx;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_LOG(ERR, USER1, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_LOG(ERR, USER1,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_LOG(INFO, USER1, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p;\n",
+		__func__, fname, sname, bpf);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..7c1267cbd
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_LOG(ERR, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..45f622818
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	uint64_t (*func)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);
+	/**< value */
+};
+
+/**
+ * Possible BPF program types.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF exeution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF exeution context.
+ * @param fname
+ *  Pathname for a ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compield code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3eb41d176..fb41c77d2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-07 17:44 23% [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
@ 2018-03-07 18:06  0% ` Luca Boccassi
  2018-03-08  8:05  5% ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Luca Boccassi @ 2018-03-07 18:06 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon, Neil Horman, John McNamara,
	Marko Kovacevic
  Cc: dev, Christian Ehrhardt

On Wed, 2018-03-07 at 17:44 +0000, Ferruh Yigit wrote:
> After experimental API process defined do we still need RTE_NEXT_ABI
> config and process which has similar targets?
> 
> Are distros disable experimental APIs when delivering DPDK? And is
> there
> any config required to control this, as RTE_NEXT_ABI intended to do?

I tried to tinker with not exporting experimental APIs - but the
problem is intra-project dependencies, iow: librte_foo has a
foo_experimental API that librte_bar uses, so if librte_foo
foo_experimental symbol is not available everything breaks down. I need
to spend a bit more on this problem but -ENOTIME

> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>  config/common_base                     |  5 -----
>  devtools/test-build.sh                 |  2 --
>  devtools/validate-abi.sh               |  1 -
>  doc/guides/contributing/versioning.rst | 10 ----------
>  mk/rte.lib.mk                          |  5 -----
>  pkg/dpdk.spec                          |  1 -
>  6 files changed, 24 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index ad03cf433..6b867f6a9 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -41,11 +41,6 @@ CONFIG_RTE_ARCH_STRICT_ALIGN=n
>  CONFIG_RTE_BUILD_SHARED_LIB=n
>  
>  #
> -# Use newest code breaking previous ABI
> -#
> -CONFIG_RTE_NEXT_ABI=y
> -
> -#
>  # Major ABI to overwrite library specific LIBABIVER
>  #
>  CONFIG_RTE_MAJOR_ABI=
> diff --git a/devtools/test-build.sh b/devtools/test-build.sh
> index 3362edcc5..22b4e1a98 100755
> --- a/devtools/test-build.sh
> +++ b/devtools/test-build.sh
> @@ -154,8 +154,6 @@ config () # <directory> <target> <options>
>  		# Built-in options (lowercase)
>  		! echo $3 | grep -q '+default' || \
>  		sed -ri 's,(RTE_MACHINE=")native,\1default,'
> $1/.config
> -		echo $3 | grep -q '+next' || \
> -		sed -ri           's,(NEXT_ABI=)y,\1n,' $1/.config
>  		! echo $3 | grep -q '+shared' || \
>  		sed -ri         's,(SHARED_LIB=)n,\1y,' $1/.config
>  		! echo $3 | grep -q '+debug' || ( \
> diff --git a/devtools/validate-abi.sh b/devtools/validate-abi.sh
> index 138436d93..a64edf92f 100755
> --- a/devtools/validate-abi.sh
> +++ b/devtools/validate-abi.sh
> @@ -105,7 +105,6 @@ set_log_file() {
>  fixup_config() {
>  	local conf=config/defconfig_$target
>  	cmd sed -i -e"$ a\CONFIG_RTE_BUILD_SHARED_LIB=y" $conf
> -	cmd sed -i -e"$ a\CONFIG_RTE_NEXT_ABI=n" $conf
>  	cmd sed -i -e"$ a\CONFIG_RTE_EAL_IGB_UIO=n" $conf
>  	cmd sed -i -e"$ a\CONFIG_RTE_LIBRTE_KNI=n" $conf
>  	cmd sed -i -e"$ a\CONFIG_RTE_KNI_KMOD=n" $conf
> diff --git a/doc/guides/contributing/versioning.rst
> b/doc/guides/contributing/versioning.rst
> index c495294db..59ff0e8b7 100644
> --- a/doc/guides/contributing/versioning.rst
> +++ b/doc/guides/contributing/versioning.rst
> @@ -91,19 +91,9 @@ being provided. The requirements for doing so are:
>       interest" be sought for each deprecation, for example: from NIC
> vendors,
>       CPU vendors, end-users, etc.
>  
> -#. The changes (including an alternative map file) must be gated
> with
> -   the ``RTE_NEXT_ABI`` option, and provided with a deprecation
> notice at the
> -   same time.
> -   It will become the default ABI in the next release.
> -
>  #. A full deprecation cycle, as explained above, must be made to
> offer
>     downstream consumers sufficient warning of the change.
>  
> -#. At the beginning of the next release cycle, every
> ``RTE_NEXT_ABI``
> -   conditions will be removed, the ``LIBABIVER`` variable in the
> makefile(s)
> -   where the ABI is changed will be incremented, and the map files
> will
> -   be updated.
> -
>  Note that the above process for ABI deprecation should not be
> undertaken
>  lightly. ABI stability is extremely important for downstream
> consumers of the
>  DPDK, especially when distributed in shared object form. Every
> effort should
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index c696a2174..8ac26face 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -20,11 +20,6 @@ endif
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> -ifeq ($(CONFIG_RTE_MAJOR_ABI),)
> -ifeq ($(CONFIG_RTE_NEXT_ABI),y)
> -LIB := $(LIB).1
> -endif
> -endif
>  CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
>  endif
>  endif
> diff --git a/pkg/dpdk.spec b/pkg/dpdk.spec
> index 4d3b5745c..d118f0463 100644
> --- a/pkg/dpdk.spec
> +++ b/pkg/dpdk.spec
> @@ -84,7 +84,6 @@ make O=%{target} T=%{config} config
>  sed -ri 's,(RTE_MACHINE=).*,\1%{machine},' %{target}/.config
>  sed -ri 's,(RTE_APP_TEST=).*,\1n,'         %{target}/.config
>  sed -ri 's,(RTE_BUILD_SHARED_LIB=).*,\1y,' %{target}/.config
> -sed -ri 's,(RTE_NEXT_ABI=).*,\1n,'         %{target}/.config
>  sed -ri 's,(LIBRTE_VHOST=).*,\1y,'         %{target}/.config
>  sed -ri 's,(LIBRTE_PMD_PCAP=).*,\1y,'      %{target}/.config
>  make O=%{target} %{?_smp_mflags}

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function
  2018-03-07 17:17  0%   ` Ferruh Yigit
@ 2018-03-07 17:47  0%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-07 17:47 UTC (permalink / raw)
  To: Kirill Rybalchenko, dev; +Cc: andrey.chilikin, thomas

On 3/7/2018 5:17 PM, Ferruh Yigit wrote:
> On 2/27/2018 2:18 PM, Kirill Rybalchenko wrote:
>> In 18.02 release the ABI of ethdev component was changed.
>> To keep compatibility with previous versions of the library
>> the versioning of rte_eth_dev_filter_ctrl function was implemented.
>> As soon as deprecation note was issued in 18.02 release, there is
>> no need to keep compatibility with previous versions.
>> Remove the versioning of rte_eth_dev_filter_ctrl function.
>>
>> v2:
>> Modify map file, increment library version,
>> remove deprecation notice
>>
>> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> 
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
@ 2018-03-07 17:44 23% Ferruh Yigit
  2018-03-07 18:06  0% ` Luca Boccassi
  2018-03-08  8:05  5% ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2018-03-07 17:44 UTC (permalink / raw)
  To: Thomas Monjalon, Neil Horman, John McNamara, Marko Kovacevic
  Cc: dev, Ferruh Yigit, Luca Boccassi, Christian Ehrhardt

After experimental API process defined do we still need RTE_NEXT_ABI
config and process which has similar targets?

Are distros disable experimental APIs when delivering DPDK? And is there
any config required to control this, as RTE_NEXT_ABI intended to do?

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 config/common_base                     |  5 -----
 devtools/test-build.sh                 |  2 --
 devtools/validate-abi.sh               |  1 -
 doc/guides/contributing/versioning.rst | 10 ----------
 mk/rte.lib.mk                          |  5 -----
 pkg/dpdk.spec                          |  1 -
 6 files changed, 24 deletions(-)

diff --git a/config/common_base b/config/common_base
index ad03cf433..6b867f6a9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -41,11 +41,6 @@ CONFIG_RTE_ARCH_STRICT_ALIGN=n
 CONFIG_RTE_BUILD_SHARED_LIB=n
 
 #
-# Use newest code breaking previous ABI
-#
-CONFIG_RTE_NEXT_ABI=y
-
-#
 # Major ABI to overwrite library specific LIBABIVER
 #
 CONFIG_RTE_MAJOR_ABI=
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 3362edcc5..22b4e1a98 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -154,8 +154,6 @@ config () # <directory> <target> <options>
 		# Built-in options (lowercase)
 		! echo $3 | grep -q '+default' || \
 		sed -ri 's,(RTE_MACHINE=")native,\1default,' $1/.config
-		echo $3 | grep -q '+next' || \
-		sed -ri           's,(NEXT_ABI=)y,\1n,' $1/.config
 		! echo $3 | grep -q '+shared' || \
 		sed -ri         's,(SHARED_LIB=)n,\1y,' $1/.config
 		! echo $3 | grep -q '+debug' || ( \
diff --git a/devtools/validate-abi.sh b/devtools/validate-abi.sh
index 138436d93..a64edf92f 100755
--- a/devtools/validate-abi.sh
+++ b/devtools/validate-abi.sh
@@ -105,7 +105,6 @@ set_log_file() {
 fixup_config() {
 	local conf=config/defconfig_$target
 	cmd sed -i -e"$ a\CONFIG_RTE_BUILD_SHARED_LIB=y" $conf
-	cmd sed -i -e"$ a\CONFIG_RTE_NEXT_ABI=n" $conf
 	cmd sed -i -e"$ a\CONFIG_RTE_EAL_IGB_UIO=n" $conf
 	cmd sed -i -e"$ a\CONFIG_RTE_LIBRTE_KNI=n" $conf
 	cmd sed -i -e"$ a\CONFIG_RTE_KNI_KMOD=n" $conf
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index c495294db..59ff0e8b7 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -91,19 +91,9 @@ being provided. The requirements for doing so are:
      interest" be sought for each deprecation, for example: from NIC vendors,
      CPU vendors, end-users, etc.
 
-#. The changes (including an alternative map file) must be gated with
-   the ``RTE_NEXT_ABI`` option, and provided with a deprecation notice at the
-   same time.
-   It will become the default ABI in the next release.
-
 #. A full deprecation cycle, as explained above, must be made to offer
    downstream consumers sufficient warning of the change.
 
-#. At the beginning of the next release cycle, every ``RTE_NEXT_ABI``
-   conditions will be removed, the ``LIBABIVER`` variable in the makefile(s)
-   where the ABI is changed will be incremented, and the map files will
-   be updated.
-
 Note that the above process for ABI deprecation should not be undertaken
 lightly. ABI stability is extremely important for downstream consumers of the
 DPDK, especially when distributed in shared object form. Every effort should
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index c696a2174..8ac26face 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -20,11 +20,6 @@ endif
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
 ifeq ($(EXTLIB_BUILD),n)
-ifeq ($(CONFIG_RTE_MAJOR_ABI),)
-ifeq ($(CONFIG_RTE_NEXT_ABI),y)
-LIB := $(LIB).1
-endif
-endif
 CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
 endif
 endif
diff --git a/pkg/dpdk.spec b/pkg/dpdk.spec
index 4d3b5745c..d118f0463 100644
--- a/pkg/dpdk.spec
+++ b/pkg/dpdk.spec
@@ -84,7 +84,6 @@ make O=%{target} T=%{config} config
 sed -ri 's,(RTE_MACHINE=).*,\1%{machine},' %{target}/.config
 sed -ri 's,(RTE_APP_TEST=).*,\1n,'         %{target}/.config
 sed -ri 's,(RTE_BUILD_SHARED_LIB=).*,\1y,' %{target}/.config
-sed -ri 's,(RTE_NEXT_ABI=).*,\1n,'         %{target}/.config
 sed -ri 's,(LIBRTE_VHOST=).*,\1y,'         %{target}/.config
 sed -ri 's,(LIBRTE_PMD_PCAP=).*,\1y,'      %{target}/.config
 make O=%{target} %{?_smp_mflags}
-- 
2.13.6

^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function
  2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
@ 2018-03-07 17:17  0%   ` Ferruh Yigit
  2018-03-07 17:47  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-07 17:17 UTC (permalink / raw)
  To: Kirill Rybalchenko, dev; +Cc: andrey.chilikin, thomas

On 2/27/2018 2:18 PM, Kirill Rybalchenko wrote:
> In 18.02 release the ABI of ethdev component was changed.
> To keep compatibility with previous versions of the library
> the versioning of rte_eth_dev_filter_ctrl function was implemented.
> As soon as deprecation note was issued in 18.02 release, there is
> no need to keep compatibility with previous versions.
> Remove the versioning of rte_eth_dev_filter_ctrl function.
> 
> v2:
> Modify map file, increment library version,
> remove deprecation notice
> 
> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC PATCH v1 0/4] ethdev: add per-PMD tuning of RxTx parmeters
@ 2018-03-07 12:08  3% Remy Horton
  2018-03-21 14:27  3% ` [dpdk-dev] [PATCH v2 " Remy Horton
  0 siblings, 1 reply; 200+ results
From: Remy Horton @ 2018-03-07 12:08 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing, Shreyansh Jain,
	Thomas Monjalon

The optimal values of several transmission & reception related parameters,
such as burst sizes, descriptor ring sizes, and number of queues, varies
between different network interface devices. This patchset allows individual
PMDs to specify their preferred parameter values, and if so indicated by an
application, for them to be used automatically by the ethdev layer.

This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
that subsequent patchsets will cover other PMDs. A deprecation notice
covering the API/ABI change is in place.


Remy Horton (4):
  ethdev: add support for PMD-tuned Tx/Rx parameters
  net/e1000: add TxRx tuning parameters
  net/i40e: add TxRx tuning parameters
  testpmd: make use of per-PMD TxRx parameters

 app/test-pmd/testpmd.c         |  5 +++--
 drivers/net/e1000/em_ethdev.c  |  8 ++++++++
 drivers/net/i40e/i40e_ethdev.c | 35 ++++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.c  | 18 ++++++++++++++++++
 lib/librte_ether/rte_ethdev.h  | 15 +++++++++++++++
 5 files changed, 76 insertions(+), 5 deletions(-)

-- 
2.9.5

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-07  9:59  0%     ` Thomas Monjalon
@ 2018-03-07 11:29  0%       ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-07 11:29 UTC (permalink / raw)
  To: Thomas Monjalon, Arnon Warshavsky; +Cc: bruce.richardson, dev

On 07-Mar-18 9:59 AM, Thomas Monjalon wrote:
> 07/03/2018 10:05, Burakov, Anatoly:
>> On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> 06/03/2018 19:28, Arnon Warshavsky:
>>>> The use case addressed here is dpdk environment init
>>>> aborting the process due to panic,
>>>> preventing the calling process from running its own tear-down actions.
>>>
>>> Thank you for working on this long standing issue.
>>>
>>>> A preferred, though ABI breaking solution would be
>>>> to have the environment init always return a value
>>>> rather than abort upon distress.
>>>
>>> Yes, it is the preferred solution.
>>> We should not use exit (panic & co) inside a library.
>>> It is important enough to break the API.
>>
>> +1, panic exists mostly for historical reasons AFAIK. it's a pity i
>> didn't think of it at the time of submitting the memory hotplug RFC,
>> because i now hit the same issue with the v1 - we might panic while
>> holding a lock, and didn't realize that it was an API break to change
>> this behavior.
>>
>> Can this really go into current release without deprecation notices?
> 
> If such an exception is done, it must be approved by the technical board.
> We need to check few criterias:
> 	- which functions need to be changed
> 	- how the application is impacted
> 	- what is the urgency
> 
> If a panic is removed and the application is not already checking some
> error code, the execution will continue without considering the error.
> 
> Some rte_panic could be probably removed without any impact on applications.
> Some rte_panic could wait for 18.08 with a notice in 18.05.
> If some rte_panic cannot wait, it must be discussed specifically.
> 

Can we add a compile warning for adding new rte_panic's into code? It's 
a nice tool while debugging, but it probably shouldn't be in any new 
production code.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-07  9:05  0%   ` Burakov, Anatoly
@ 2018-03-07  9:59  0%     ` Thomas Monjalon
  2018-03-07 11:29  0%       ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-07  9:59 UTC (permalink / raw)
  To: Burakov, Anatoly, Arnon Warshavsky; +Cc: bruce.richardson, dev

07/03/2018 10:05, Burakov, Anatoly:
> On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
> > Hi,
> > 
> > 06/03/2018 19:28, Arnon Warshavsky:
> >> The use case addressed here is dpdk environment init
> >> aborting the process due to panic,
> >> preventing the calling process from running its own tear-down actions.
> > 
> > Thank you for working on this long standing issue.
> > 
> >> A preferred, though ABI breaking solution would be
> >> to have the environment init always return a value
> >> rather than abort upon distress.
> > 
> > Yes, it is the preferred solution.
> > We should not use exit (panic & co) inside a library.
> > It is important enough to break the API.
> 
> +1, panic exists mostly for historical reasons AFAIK. it's a pity i 
> didn't think of it at the time of submitting the memory hotplug RFC, 
> because i now hit the same issue with the v1 - we might panic while 
> holding a lock, and didn't realize that it was an API break to change 
> this behavior.
> 
> Can this really go into current release without deprecation notices?

If such an exception is done, it must be approved by the technical board.
We need to check few criterias:
	- which functions need to be changed
	- how the application is impacted
	- what is the urgency

If a panic is removed and the application is not already checking some
error code, the execution will continue without considering the error.

Some rte_panic could be probably removed without any impact on applications.
Some rte_panic could wait for 18.08 with a notice in 18.05.
If some rte_panic cannot wait, it must be discussed specifically.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-07  8:32  0% ` Thomas Monjalon
@ 2018-03-07  9:05  0%   ` Burakov, Anatoly
  2018-03-07  9:59  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-03-07  9:05 UTC (permalink / raw)
  To: Thomas Monjalon, Arnon Warshavsky; +Cc: bruce.richardson, dev

On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
> Hi,
> 
> 06/03/2018 19:28, Arnon Warshavsky:
>> The use case addressed here is dpdk environment init
>> aborting the process due to panic,
>> preventing the calling process from running its own tear-down actions.
> 
> Thank you for working on this long standing issue.
> 
>> A preferred, though ABI breaking solution would be
>> to have the environment init always return a value
>> rather than abort upon distress.
> 
> Yes, it is the preferred solution.
> We should not use exit (panic & co) inside a library.
> It is important enough to break the API.

+1, panic exists mostly for historical reasons AFAIK. it's a pity i 
didn't think of it at the time of submitting the memory hotplug RFC, 
because i now hit the same issue with the v1 - we might panic while 
holding a lock, and didn't realize that it was an API break to change 
this behavior.

Can this really go into current release without deprecation notices?

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-06 18:28  3% [dpdk-dev] [PATCH] eal: register rte_panic user callback Arnon Warshavsky
@ 2018-03-07  8:32  0% ` Thomas Monjalon
  2018-03-07  9:05  0%   ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-07  8:32 UTC (permalink / raw)
  To: Arnon Warshavsky; +Cc: bruce.richardson, dev

Hi,

06/03/2018 19:28, Arnon Warshavsky:
> The use case addressed here is dpdk environment init
> aborting the process due to panic,
> preventing the calling process from running its own tear-down actions.

Thank you for working on this long standing issue.

> A preferred, though ABI breaking solution would be
> to have the environment init always return a value
> rather than abort upon distress.

Yes, it is the preferred solution.
We should not use exit (panic & co) inside a library.
It is important enough to break the API.
I would be in favor of accepting such breakage in 18.05.

> This patch defines a couple of callback registration functions,
> one for panic and one for exit
> in case one wishes to distinguish between these events.
> Once a callback is set and panic takes place,
> it will be called prior to calling abort.
> 
> Maiden voyage patch for Qwilt and myself.

Are you OK to visit the other side of the solution?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] eal: register rte_panic user callback
@ 2018-03-06 18:28  3% Arnon Warshavsky
  2018-03-07  8:32  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Arnon Warshavsky @ 2018-03-06 18:28 UTC (permalink / raw)
  To: thomas, bruce.richardson; +Cc: dev, Arnon Warshavsky

The use case addressed here is dpdk environment init
aborting the process due to panic,
preventing the calling process from running its own tear-down actions.
A preferred, though ABI breaking solution would be
to have the environment init always return a value
rather than abort upon distress.

This patch defines a couple of callback registration functions,
one for panic and one for exit
in case one wishes to distinguish between these events.
Once a callback is set and panic takes place,
it will be called prior to calling abort.

Maiden voyage patch for Qwilt and myself.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 lib/librte_eal/bsdapp/eal/eal_debug.c     | 37 ++++++++++++++++++++++++++++++
 lib/librte_eal/common/include/rte_debug.h | 24 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_debug.c   | 38 +++++++++++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map        |  2 ++
 4 files changed, 101 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_debug.c b/lib/librte_eal/bsdapp/eal/eal_debug.c
index 5d92500..010859d 100644
--- a/lib/librte_eal/bsdapp/eal/eal_debug.c
+++ b/lib/librte_eal/bsdapp/eal/eal_debug.c
@@ -18,6 +18,39 @@
 
 #define BACKTRACE_SIZE 256
 
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_exit()
+ */
+static rte_user_abort_callback_t *exit_user_callback;
+
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_panic()
+ */
+static rte_user_abort_callback_t *panic_user_callback;
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	panic_user_callback = cb;
+}
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	exit_user_callback = cb;
+}
+
+
 /* dump the stack of the calling core */
 void rte_dump_stack(void)
 {
@@ -59,6 +92,8 @@ void __rte_panic(const char *funcname, const char *format, ...)
 	va_end(ap);
 	rte_dump_stack();
 	rte_dump_registers();
+	if (panic_user_callback)
+		(*panic_user_callback)();
 	abort();
 }
 
@@ -78,6 +113,8 @@ rte_exit(int exit_code, const char *format, ...)
 	va_start(ap, format);
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
+	if (exit_user_callback)
+		(*exit_user_callback)();
 
 #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
diff --git a/lib/librte_eal/common/include/rte_debug.h b/lib/librte_eal/common/include/rte_debug.h
index 272df49..7e3d0a2 100644
--- a/lib/librte_eal/common/include/rte_debug.h
+++ b/lib/librte_eal/common/include/rte_debug.h
@@ -16,11 +16,35 @@
 
 #include "rte_log.h"
 #include "rte_branch_prediction.h"
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+
+/*
+ * Definition of user function pointer type to be called during
+ * the execution of rte_panic
+ */
+
+typedef void  (*rte_user_abort_callback_t)(void);
+/**< @internal Ethernet device configuration. */
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb);
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb);
+
 /**
  * Dump the stack of the calling core to the console.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_debug.c b/lib/librte_eal/linuxapp/eal/eal_debug.c
index 5d92500..b1748b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_debug.c
+++ b/lib/librte_eal/linuxapp/eal/eal_debug.c
@@ -16,8 +16,42 @@
 #include <rte_common.h>
 #include <rte_eal.h>
 
+
 #define BACKTRACE_SIZE 256
 
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_exit()
+ */
+static rte_user_abort_callback_t *exit_user_callback;
+
+/*
+ * user function pointers that when assigned, gets to be called
+ * during ret_panic()
+ */
+static rte_user_abort_callback_t *panic_user_callback;
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	panic_user_callback = cb;
+}
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregisteration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	exit_user_callback = cb;
+}
+
+
 /* dump the stack of the calling core */
 void rte_dump_stack(void)
 {
@@ -59,6 +93,8 @@ void __rte_panic(const char *funcname, const char *format, ...)
 	va_end(ap);
 	rte_dump_stack();
 	rte_dump_registers();
+	if (panic_user_callback)
+		(*panic_user_callback)();
 	abort();
 }
 
@@ -78,6 +114,8 @@ rte_exit(int exit_code, const char *format, ...)
 	va_start(ap, format);
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
+	if (exit_user_callback)
+		(*exit_user_callback)();
 
 #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d123602..7b8f55d 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,11 +221,13 @@ EXPERIMENTAL {
 	rte_eal_hotplug_add;
 	rte_eal_hotplug_remove;
 	rte_eal_mbuf_user_pool_ops;
+	rte_exit_user_callback_register;
 	rte_mp_action_register;
 	rte_mp_action_unregister;
 	rte_mp_sendmsg;
 	rte_mp_request;
 	rte_mp_reply;
+	rte_panic_user_callback_register;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function
  2018-02-27 10:29  3% [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function Kirill Rybalchenko
  2018-02-27 11:01  0% ` Ferruh Yigit
@ 2018-02-27 14:18  7% ` Kirill Rybalchenko
  2018-03-07 17:17  0%   ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Kirill Rybalchenko @ 2018-02-27 14:18 UTC (permalink / raw)
  To: dev; +Cc: kirill.rybalchenko, andrey.chilikin, thomas, ferruh.yigit

In 18.02 release the ABI of ethdev component was changed.
To keep compatibility with previous versions of the library
the versioning of rte_eth_dev_filter_ctrl function was implemented.
As soon as deprecation note was issued in 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.

v2:
Modify map file, increment library version,
remove deprecation notice

Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
---
 doc/guides/rel_notes/deprecation.rst    |   6 --
 doc/guides/rel_notes/release_18_05.rst  |   2 +-
 lib/librte_ether/Makefile               |   2 +-
 lib/librte_ether/rte_ethdev.c           | 155 +-------------------------------
 lib/librte_ether/rte_ethdev_version.map |   1 -
 5 files changed, 4 insertions(+), 162 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 74c18ed..6594585 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -149,12 +149,6 @@ Deprecation Notices
   as parameter. For consistency functions adding callback will return
   ``struct rte_eth_rxtx_callback \*`` instead of ``void \*``.
 
-* ethdev: The size of variables ``flow_types_mask`` in
-  ``rte_eth_fdir_info structure``, ``sym_hash_enable_mask`` and
-  ``valid_bit_mask`` in ``rte_eth_hash_global_conf`` structure
-  will be increased from 32 to 64 bits to fulfill hardware requirements.
-  This change will break existing ABI as size of the structures will increase.
-
 * ethdev: ``rte_eth_dev_get_sec_ctx()`` fix port id storage
   ``rte_eth_dev_get_sec_ctx()`` is using ``uint8_t`` for ``port_id``,
   which should be ``uint16_t``.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..22da411 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -128,7 +128,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_cryptodev.so.4
      librte_distributor.so.1
      librte_eal.so.6
-     librte_ethdev.so.8
+   + librte_ethdev.so.9
      librte_eventdev.so.3
      librte_flow_classify.so.1
      librte_gro.so.1
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 3ca5782..c2f2f7d 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -16,7 +16,7 @@ LDLIBS += -lrte_mbuf
 
 EXPORT_MAP := rte_ethdev_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..78b8376 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,6 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-#include <rte_compat.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -3490,153 +3489,8 @@ rte_eth_dev_filter_supported(uint16_t port_id,
 }
 
 int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg)
-{
-	struct rte_eth_fdir_info_v22 {
-		enum rte_fdir_mode mode;
-		struct rte_eth_fdir_masks mask;
-		struct rte_eth_fdir_flex_conf flex_conf;
-		uint32_t guarant_spc;
-		uint32_t best_spc;
-		uint32_t flow_types_mask[1];
-		uint32_t max_flexpayload;
-		uint32_t flex_payload_unit;
-		uint32_t max_flex_payload_segment_num;
-		uint16_t flex_payload_limit;
-		uint32_t flex_bitmask_unit;
-		uint32_t max_flex_bitmask_num;
-	};
-
-	struct rte_eth_hash_global_conf_v22 {
-		enum rte_eth_hash_function hash_func;
-		uint32_t sym_hash_enable_mask[1];
-		uint32_t valid_bit_mask[1];
-	};
-
-	struct rte_eth_hash_filter_info_v22 {
-		enum rte_eth_hash_filter_info_type info_type;
-		union {
-			uint8_t enable;
-			struct rte_eth_hash_global_conf_v22 global_conf;
-			struct rte_eth_input_set_conf input_set_conf;
-		} info;
-	};
-
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
-	if (filter_op == RTE_ETH_FILTER_INFO) {
-		int retval;
-		struct rte_eth_fdir_info_v22 *fdir_info_v22;
-		struct rte_eth_fdir_info fdir_info;
-
-		fdir_info_v22 = (struct rte_eth_fdir_info_v22 *)arg;
-
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&fdir_info);
-		fdir_info_v22->mode = fdir_info.mode;
-		fdir_info_v22->mask = fdir_info.mask;
-		fdir_info_v22->flex_conf = fdir_info.flex_conf;
-		fdir_info_v22->guarant_spc = fdir_info.guarant_spc;
-		fdir_info_v22->best_spc = fdir_info.best_spc;
-		fdir_info_v22->flow_types_mask[0] =
-			(uint32_t)fdir_info.flow_types_mask[0];
-		fdir_info_v22->max_flexpayload = fdir_info.max_flexpayload;
-		fdir_info_v22->flex_payload_unit = fdir_info.flex_payload_unit;
-		fdir_info_v22->max_flex_payload_segment_num =
-			fdir_info.max_flex_payload_segment_num;
-		fdir_info_v22->flex_payload_limit =
-			fdir_info.flex_payload_limit;
-		fdir_info_v22->flex_bitmask_unit = fdir_info.flex_bitmask_unit;
-		fdir_info_v22->max_flex_bitmask_num =
-			fdir_info.max_flex_bitmask_num;
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_GET) {
-		int retval;
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_info_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_info_v22->info_type;
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&f_info);
-
-		switch (f_info_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info_v22->info.enable = f_info.info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info_v22->info.global_conf.hash_func =
-				f_info.info.global_conf.hash_func;
-			f_info_v22->info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.sym_hash_enable_mask[0];
-			f_info_v22->info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info_v22->info.input_set_conf =
-				f_info.info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_SET) {
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_v22->info_type;
-		switch (f_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info.info.enable = f_v22->info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info.info.global_conf.hash_func =
-				f_v22->info.global_conf.hash_func;
-			f_info.info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.sym_hash_enable_mask[0];
-			f_info.info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info.info.input_set_conf =
-				f_v22->info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    (void *)&f_info);
-	} else
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    arg);
-}
-VERSION_SYMBOL(rte_eth_dev_filter_ctrl, _v22, 2.2);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg)
+rte_eth_dev_filter_ctrl(uint16_t port_id, enum rte_filter_type filter_type,
+			enum rte_filter_op filter_op, void *arg)
 {
 	struct rte_eth_dev *dev;
 
@@ -3647,11 +3501,6 @@ rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
 	return eth_err(port_id, (*dev->dev_ops->filter_ctrl)(dev, filter_type,
 							     filter_op, arg));
 }
-BIND_DEFAULT_SYMBOL(rte_eth_dev_filter_ctrl, _v1802, 18.02);
-MAP_STATIC_SYMBOL(int rte_eth_dev_filter_ctrl(uint16_t port_id,
-		  enum rte_filter_type filter_type,
-		  enum rte_filter_op filter_op, void *arg),
-		  rte_eth_dev_filter_ctrl_v1802);
 
 void *
 rte_eth_add_rx_callback(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 87f02fb..34df6c8 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -16,7 +16,6 @@ DPDK_2.2 {
 	rte_eth_dev_count;
 	rte_eth_dev_default_mac_addr_set;
 	rte_eth_dev_detach;
-	rte_eth_dev_filter_ctrl;
 	rte_eth_dev_filter_supported;
 	rte_eth_dev_flow_ctrl_get;
 	rte_eth_dev_flow_ctrl_set;
-- 
2.5.5

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function
  2018-02-27 11:01  0% ` Ferruh Yigit
@ 2018-02-27 13:45  3%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-27 13:45 UTC (permalink / raw)
  To: Ferruh Yigit, Kirill Rybalchenko; +Cc: dev, andrey.chilikin

27/02/2018 12:01, Ferruh Yigit:
> On 2/27/2018 10:29 AM, Kirill Rybalchenko wrote:
> > In 18.02 release the ABI of ethdev component was changed.
> > To keep compatibility with previous versions of the library
> > the versioning of rte_eth_dev_filter_ctrl function was implemented.
> > As soon as deprecation note was issued in 18.02 release, there is
> > no need to keep compatibility with previous versions.
> > Remove the versioning of rte_eth_dev_filter_ctrl function.
> > 
> > Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> > ---
> >  lib/librte_ether/rte_ethdev.c | 155 +-----------------------------------------
> 
> Hi Kirill,
> 
> You need to update .map file and removed deprecation notice in this patch.

And bump the ABI version in Makefile and release notes.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function
  2018-02-27 10:29  3% [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function Kirill Rybalchenko
@ 2018-02-27 11:01  0% ` Ferruh Yigit
  2018-02-27 13:45  3%   ` Thomas Monjalon
  2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-02-27 11:01 UTC (permalink / raw)
  To: Kirill Rybalchenko, dev; +Cc: andrey.chilikin, thomas

On 2/27/2018 10:29 AM, Kirill Rybalchenko wrote:
> In 18.02 release the ABI of ethdev component was changed.
> To keep compatibility with previous versions of the library
> the versioning of rte_eth_dev_filter_ctrl function was implemented.
> As soon as deprecation note was issued in 18.02 release, there is
> no need to keep compatibility with previous versions.
> Remove the versioning of rte_eth_dev_filter_ctrl function.
> 
> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c | 155 +-----------------------------------------

Hi Kirill,

You need to update .map file and removed deprecation notice in this patch.

Thanks,
ferruh

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function
@ 2018-02-27 10:29  3% Kirill Rybalchenko
  2018-02-27 11:01  0% ` Ferruh Yigit
  2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
  0 siblings, 2 replies; 200+ results
From: Kirill Rybalchenko @ 2018-02-27 10:29 UTC (permalink / raw)
  To: dev; +Cc: kirill.rybalchenko, andrey.chilikin, thomas, ferruh.yigit

In 18.02 release the ABI of ethdev component was changed.
To keep compatibility with previous versions of the library
the versioning of rte_eth_dev_filter_ctrl function was implemented.
As soon as deprecation note was issued in 18.02 release, there is
no need to keep compatibility with previous versions.
Remove the versioning of rte_eth_dev_filter_ctrl function.

Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
---
 lib/librte_ether/rte_ethdev.c | 155 +-----------------------------------------
 1 file changed, 2 insertions(+), 153 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..78b8376 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,6 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-#include <rte_compat.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -3490,153 +3489,8 @@ rte_eth_dev_filter_supported(uint16_t port_id,
 }
 
 int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg)
-{
-	struct rte_eth_fdir_info_v22 {
-		enum rte_fdir_mode mode;
-		struct rte_eth_fdir_masks mask;
-		struct rte_eth_fdir_flex_conf flex_conf;
-		uint32_t guarant_spc;
-		uint32_t best_spc;
-		uint32_t flow_types_mask[1];
-		uint32_t max_flexpayload;
-		uint32_t flex_payload_unit;
-		uint32_t max_flex_payload_segment_num;
-		uint16_t flex_payload_limit;
-		uint32_t flex_bitmask_unit;
-		uint32_t max_flex_bitmask_num;
-	};
-
-	struct rte_eth_hash_global_conf_v22 {
-		enum rte_eth_hash_function hash_func;
-		uint32_t sym_hash_enable_mask[1];
-		uint32_t valid_bit_mask[1];
-	};
-
-	struct rte_eth_hash_filter_info_v22 {
-		enum rte_eth_hash_filter_info_type info_type;
-		union {
-			uint8_t enable;
-			struct rte_eth_hash_global_conf_v22 global_conf;
-			struct rte_eth_input_set_conf input_set_conf;
-		} info;
-	};
-
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
-	if (filter_op == RTE_ETH_FILTER_INFO) {
-		int retval;
-		struct rte_eth_fdir_info_v22 *fdir_info_v22;
-		struct rte_eth_fdir_info fdir_info;
-
-		fdir_info_v22 = (struct rte_eth_fdir_info_v22 *)arg;
-
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&fdir_info);
-		fdir_info_v22->mode = fdir_info.mode;
-		fdir_info_v22->mask = fdir_info.mask;
-		fdir_info_v22->flex_conf = fdir_info.flex_conf;
-		fdir_info_v22->guarant_spc = fdir_info.guarant_spc;
-		fdir_info_v22->best_spc = fdir_info.best_spc;
-		fdir_info_v22->flow_types_mask[0] =
-			(uint32_t)fdir_info.flow_types_mask[0];
-		fdir_info_v22->max_flexpayload = fdir_info.max_flexpayload;
-		fdir_info_v22->flex_payload_unit = fdir_info.flex_payload_unit;
-		fdir_info_v22->max_flex_payload_segment_num =
-			fdir_info.max_flex_payload_segment_num;
-		fdir_info_v22->flex_payload_limit =
-			fdir_info.flex_payload_limit;
-		fdir_info_v22->flex_bitmask_unit = fdir_info.flex_bitmask_unit;
-		fdir_info_v22->max_flex_bitmask_num =
-			fdir_info.max_flex_bitmask_num;
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_GET) {
-		int retval;
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_info_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_info_v22->info_type;
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&f_info);
-
-		switch (f_info_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info_v22->info.enable = f_info.info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info_v22->info.global_conf.hash_func =
-				f_info.info.global_conf.hash_func;
-			f_info_v22->info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.sym_hash_enable_mask[0];
-			f_info_v22->info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info_v22->info.input_set_conf =
-				f_info.info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_SET) {
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_v22->info_type;
-		switch (f_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info.info.enable = f_v22->info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info.info.global_conf.hash_func =
-				f_v22->info.global_conf.hash_func;
-			f_info.info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.sym_hash_enable_mask[0];
-			f_info.info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info.info.input_set_conf =
-				f_v22->info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    (void *)&f_info);
-	} else
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    arg);
-}
-VERSION_SYMBOL(rte_eth_dev_filter_ctrl, _v22, 2.2);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg)
+rte_eth_dev_filter_ctrl(uint16_t port_id, enum rte_filter_type filter_type,
+			enum rte_filter_op filter_op, void *arg)
 {
 	struct rte_eth_dev *dev;
 
@@ -3647,11 +3501,6 @@ rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
 	return eth_err(port_id, (*dev->dev_ops->filter_ctrl)(dev, filter_type,
 							     filter_op, arg));
 }
-BIND_DEFAULT_SYMBOL(rte_eth_dev_filter_ctrl, _v1802, 18.02);
-MAP_STATIC_SYMBOL(int rte_eth_dev_filter_ctrl(uint16_t port_id,
-		  enum rte_filter_type filter_type,
-		  enum rte_filter_op filter_op, void *arg),
-		  rte_eth_dev_filter_ctrl_v1802);
 
 void *
 rte_eth_add_rx_callback(uint16_t port_id, uint16_t queue_id,
-- 
2.5.5

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 1/7] crypto/virtio: add virtio related fundamental functions
  @ 2018-02-24 13:14  2% ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-02-24 13:14 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Since there are not have the common virtio library, we have to put
these files here. They are basically the same with virtio net related files
with some minor changes.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
 config/common_base                  |  20 ++
 drivers/crypto/virtio/virtio_logs.h |  47 ++++
 drivers/crypto/virtio/virtio_pci.c  | 460 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h  | 252 ++++++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h | 137 +++++++++++
 drivers/crypto/virtio/virtqueue.c   |  43 ++++
 drivers/crypto/virtio/virtqueue.h   | 176 ++++++++++++++
 7 files changed, 1135 insertions(+)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/config/common_base b/config/common_base
index ad03cf4..19d0cdd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -482,6 +482,26 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n
 CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 
 #
+# Compile PMD for virtio crypto devices
+#
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP=n
+#
+# Number of maximum virtio crypto devices
+#
+CONFIG_RTE_MAX_VIRTIO_CRYPTO=32
+#
+# Number of sessions to create in the session memory pool
+# on a single virtio crypto device.
+#
+CONFIG_RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS=1024
+
+#
 # Compile PMD for AESNI backed device
 #
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..20582a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
+#define PMD_INIT_FUNC_TRACE() do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION
+#define PMD_SESSION_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() session: " fmt "\n", __func__, ## args)
+#else
+#define PMD_SESSION_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): driver " fmt "\n", __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..7aa5cdd
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	PMD_INIT_LOG(DEBUG, "queue %u addresses:", vq->vq_queue_index);
+	PMD_INIT_LOG(DEBUG, "\t desc_addr: %" PRIx64, desc_addr);
+	PMD_INIT_LOG(DEBUG, "\t aval_addr: %" PRIx64, avail_addr);
+	PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
+	PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		PMD_INIT_LOG(ERR, "offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		PMD_INIT_LOG(ERR,
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		PMD_INIT_LOG(DEBUG, "failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			PMD_INIT_LOG(ERR,
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			PMD_INIT_LOG(DEBUG,
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		PMD_INIT_LOG(DEBUG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		PMD_INIT_LOG(INFO, "no modern virtio pci device found.");
+		return -1;
+	}
+
+	PMD_INIT_LOG(INFO, "found modern virtio pci device.");
+
+	PMD_INIT_LOG(DEBUG, "common cfg mapped at: %p", hw->common_cfg);
+	PMD_INIT_LOG(DEBUG, "device cfg mapped at: %p", hw->dev_cfg);
+	PMD_INIT_LOG(DEBUG, "isr cfg mapped at: %p", hw->isr);
+	PMD_INIT_LOG(DEBUG, "notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return -1:
+ *   if there is error mapping with VFIO/UIO.
+ *   if port map error when driver type is KDRV_NONE.
+ *   if whitelisted but driver type is KDRV_UNKNOWN.
+ * Return 1 if kernel driver is managing the device.
+ * Return 0 on success.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try if we can succeed reading virtio pci caps, which exists
+	 * only on modern pci device. If failed, we fallback to legacy
+	 * virtio handling.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		PMD_INIT_LOG(INFO, "modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..a469ea3
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,252 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure stores
+ * some infos that may vary in the multiple process model locally.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other size, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..1bd0e89
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,176 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< memzone to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notificaiton is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	PMD_INIT_LOG(DEBUG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH] doc: fixing grammar
@ 2018-02-22 12:15  8% Alejandro Lucero
  0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-02-22 12:15 UTC (permalink / raw)
  To: dev; +Cc: stable

My english is far worse than those from the marketing team.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 doc/guides/nics/nfp.rst | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index 99a3b76..67e574e 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -34,14 +34,14 @@ NFP poll mode driver library
 Netronome's sixth generation of flow processors pack 216 programmable
 cores and over 100 hardware accelerators that uniquely combine packet,
 flow, security and content processing in a single device that scales
-up to 400 Gbps.
+up to 400-Gb/s.
 
 This document explains how to use DPDK with the Netronome Poll Mode
 Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
 (NFP-6xxx) and Netronome's Flow Processor 4xxx (NFP-4xxx).
 
 NFP is a SRIOV capable device and the PMD driver supports the physical
-function (PF) and virtual functions (VFs).
+function (PF) and the virtual functions (VFs).
 
 Dependencies
 ------------
@@ -49,17 +49,18 @@ Dependencies
 Before using the Netronome's DPDK PMD some NFP configuration,
 which is not related to DPDK, is required. The system requires
 installation of **Netronome's BSP (Board Support Package)** along
-with some specific NFP firmware application. Netronome's NSP ABI
+with a specific NFP firmware application. Netronome's NSP ABI
 version should be 0.20 or higher.
 
 If you have a NFP device you should already have the code and
-documentation for doing all this configuration. Contact
+documentation for this configuration. Contact
 **support@netronome.com** to obtain the latest available firmware.
 
-The NFP Linux netdev kernel driver for VFs is part of vanilla kernel
-since kernel version 4.5, and support for the PF since kernel version
-4.11. Support for older kernels can be obtained on Github at
-**https://github.com/Netronome/nfp-drv-kmods** along with build
+The NFP Linux netdev kernel driver for VFs has been a part of the
+vanilla kernel since kernel version 4.5, and support for the PF
+since kernel version 4.11. Support for older kernels can be obtained
+on Github at
+**https://github.com/Netronome/nfp-drv-kmods** along with the build
 instructions.
 
 NFP PMD needs to be used along with UIO ``igb_uio`` or VFIO (``vfio-pci``)
@@ -70,15 +71,15 @@ Building the software
 
 Netronome's PMD code is provided in the **drivers/net/nfp** directory.
 Although NFP PMD has Netronome´s BSP dependencies, it is possible to
-compile it along with other DPDK PMDs even if no BSP was installed before.
+compile it along with other DPDK PMDs even if no BSP was installed previously.
 Of course, a DPDK app will require such a BSP installed for using the
 NFP PMD, along with a specific NFP firmware application.
 
-Default PMD configuration is at **common_linuxapp configuration** file:
+Default PMD configuration is at the **common_linuxapp configuration** file:
 
 - **CONFIG_RTE_LIBRTE_NFP_PMD=y**
 
-Once DPDK is built all the DPDK apps and examples include support for
+Once the DPDK is built all the DPDK apps and examples include support for
 the NFP PMD.
 
 
@@ -91,18 +92,18 @@ for details.
 Using the PF
 ------------
 
-NFP PMD has support for using the NFP PF as another DPDK port, but it does not
+NFP PMD supports using the NFP PF as another DPDK port, but it does not
 have any functionality for controlling VFs. In fact, it is not possible to use
 the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
-bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK version will
+bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
 have a PMD able to work with the PF and VFs at the same time and with the PF
 implementing VF management along with other PF-only functionalities/offloads.
 
-The PMD PF has extra work to do which will delay the DPDK app initialization
-like checking if a firmware is already available in the device, uploading the
-firmware if necessary, and configure the Link state properly when starting or
-stopping a PF port. Note that firmware upload is not always necessary which is
-the main delay for NFP PF PMD initialization.
+The PMD PF has extra work to do which will delay the DPDK app initialization.
+This additional effort could be checking if a firmware is already available in
+the device, uploading the firmware if necessary or configuring the Link state
+properly when starting or stopping a PF port. Note that firmware upload is not
+always necessary which is the main delay for NFP PF PMD initialization.
 
 Depending on the Netronome product installed in the system, firmware files
 should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
@@ -114,14 +115,14 @@ PF multiport support
 --------------------
 
 Some NFP cards support several physical ports with just one single PCI device.
-DPDK core is designed with the 1:1 relationship between PCI devices and DPDK
+The DPDK core is designed with a 1:1 relationship between PCI devices and DPDK
 ports, so NFP PMD PF support requires handling the multiport case specifically.
 During NFP PF initialization, the PMD will extract the information about the
 number of PF ports from the firmware and will create as many DPDK ports as
 needed.
 
 Because the unusual relationship between a single PCI device and several DPDK
-ports, there are some limitations when using more than one PF DPDK ports: there
+ports, there are some limitations when using more than one PF DPDK port: there
 is no support for RX interrupts and it is not possible either to use those PF
 ports with the device hotplug functionality.
 
@@ -136,7 +137,7 @@ System configuration
    get the drivers from the above Github repository and follow the instructions
    for building and installing it.
 
-   Virtual Functions need to be enabled before they can be used with the PMD.
+   VFs need to be enabled before they can be used with the PMD.
    Before enabling the VFs it is useful to obtain information about the
    current NFP PCI device detected by the system:
 
-- 
1.9.1

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH v4] lib/librte_meter: add meter configuration profile
  @ 2018-02-19 21:12  3%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-19 21:12 UTC (permalink / raw)
  To: Jasvinder Singh, cristian.dumitrescu; +Cc: dev

08/01/2018 16:43, Jasvinder Singh:
> From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> 
> This patch adds support for meter configuration profiles.
> Benefits: simplified configuration procedure, improved performance.
> 
> Q1: What is the configuration profile and why does it make sense?
> A1: The configuration profile represents the set of configuration
>     parameters for a given meter object, such as the rates and sizes for
>     the token buckets. The configuration profile concept makes sense when
>     many meter objects share the same configuration, which is the typical
>     usage model: thousands of traffic flows are each individually metered
>     according to just a few service levels (i.e. profiles).
> 
> Q2: How is the configuration profile improving the performance?
> A2: The performance improvement is achieved by reducing the memory
>     footprint of a meter object, which results in better cache utilization
>     for the typical case when large arrays of meter objects are used. The
>     internal data structures stored for each meter object contain:
>        a) Constant fields: Low level translation of the configuration
>           parameters that does not change post-configuration. This is
>           really duplicated for all meters that use the same
>           configuration. This is the configuration profile data that is
>           moved away from the meter object. Current size (implementation
>           dependent): srTCM = 32 bytes, trTCM = 32 bytes.
>        b) Variable fields: Time stamps and running counters that change
>           during the on-going traffic metering process. Current size
>           (implementation dependant): srTCM = 24 bytes, trTCM = 32 bytes.
>           Therefore, by moving the constant fields to a separate profile
>           data structure shared by all the meters with the same
>           configuration, the size of the meter object is reduced by ~50%.
> 
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

Applied for 18.05 (was postponed to preserve 18.02 ABI), thanks.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05
  2018-02-15 13:04  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05 John McNamara
@ 2018-02-16 22:54  0% ` Carrillo, Erik G
  0 siblings, 0 replies; 200+ results
From: Carrillo, Erik G @ 2018-02-16 22:54 UTC (permalink / raw)
  To: Mcnamara, John, dev; +Cc: Mcnamara, John

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of John McNamara
> Sent: Thursday, February 15, 2018 7:04 AM
> To: dev@dpdk.org
> Cc: Mcnamara, John <john.mcnamara@intel.com>
> Subject: [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05
> 
> Add template release notes for DPDK 18.05 with inline comments and
> explanations of the various sections.
> 
> Signed-off-by: John McNamara <john.mcnamara@intel.com>
> ---
>  doc/guides/rel_notes/release_18_05.rst | 187
> +++++++++++++++++++++++++++++++++
>  1 file changed, 187 insertions(+)
>  create mode 100644 doc/guides/rel_notes/release_18_05.rst
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst
> b/doc/guides/rel_notes/release_18_05.rst
> new file mode 100644
> index 0000000..85f4dc5
> --- /dev/null
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -0,0 +1,187 @@
> +DPDK Release 18.05
> +==================
> +
> +.. **Read this first.**
> +
> +   The text in the sections below explains how to update the release notes.
> +
> +   Use proper spelling, capitalization and punctuation in all sections.
> +
> +   Variable and config names should be quoted as fixed width text:
> +   ``LIKE_THIS``.
> +
> +   Build the docs and view the output file to ensure the changes are correct::
> +
> +      make doc-guides-html
> +
> +      xdg-open build/doc/html/guides/rel_notes/release_18_05.html
> +
> +
> +New Features
> +------------
> +
> +.. This section should contain new features added in this release. Sample
> +   format:
> +
> +   * **Add a title in the past tense with a full stop.**
> +
> +     Add a short 1-2 sentence description in the past tense. The description
> +     should be enough to allow someone scanning the release notes to
> +     understand the new feature.
> +
> +     If the feature adds a lot of sub-features you can use a bullet list like
> +     this:
> +
> +     * Added feature foo to do something.
> +     * Enhanced feature bar to do something else.
> +
> +     Refer to the previous release notes for examples.
> +
> +     This section is a comment. Do not overwrite or remove it.
> +     Also, make sure to start the actual text at the margin.
> +
> =========================================================
> +
> +
> +API Changes
> +-----------
> +
> +.. This section should contain API changes. Sample format:
> +
> +   * Add a short 1-2 sentence description of the API change. Use fixed width
> +     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
> +     tense.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +
> =========================================================
> +
> +
> +ABI Changes
> +-----------
> +
> +.. This section should contain ABI changes. Sample format:
> +
> +   * Add a short 1-2 sentence description of the ABI change that was
> announced
> +     in the previous releases and made in this release. Use fixed width quotes
> +     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +
> =========================================================
> +
> +
> +Removed Items
> +-------------
> +
> +.. This section should contain removed items in this release. Sample format:
> +
> +   * Add a short 1-2 sentence description of the removed item in the past
> +     tense.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +
> =========================================================
> +
> +
> +Known Issues
> +------------
> +
> +.. This section should contain new known issues in this release. Sample
> format:
> +
> +   * **Add title in present tense with full stop.**
> +
> +     Add a short 1-2 sentence description of the known issue in the present
> +     tense. Add information on any known workarounds.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +
> =========================================================
> +
> +
> +Shared Library Versions
> +-----------------------
> +
> +.. Update any library version updated in this release and prepend with a ``+``
> +   sign, like this:
> +
> +     librte_acl.so.2
> +   + librte_cfgfile.so.2
> +     librte_cmdline.so.2
> +
> +   This section is a comment. Do not overwrite or remove it.
> +
> =========================================================
> +
> +
> +The libraries prepended with a plus sign were incremented in this version.
> +
> +.. code-block:: diff
> +
> +     librte_acl.so.2
> +     librte_bbdev.so.1
> +     librte_bitratestats.so.2
> +     librte_bus_dpaa.so.1
> +     librte_bus_fslmc.so.1
> +     librte_bus_pci.so.1
> +     librte_bus_vdev.so.1
> +     librte_cfgfile.so.2
> +     librte_cmdline.so.2
> +     librte_cryptodev.so.4
> +     librte_distributor.so.1
> +     librte_eal.so.6
> +     librte_ethdev.so.8
> +     librte_eventdev.so.3
> +     librte_flow_classify.so.1
> +     librte_gro.so.1
> +     librte_gso.so.1
> +     librte_hash.so.2
> +     librte_ip_frag.so.1
> +     librte_jobstats.so.1
> +     librte_kni.so.2
> +     librte_kvargs.so.1
> +     librte_latencystats.so.1
> +     librte_lpm.so.2
> +     librte_mbuf.so.3
> +     librte_mempool.so.3
> +     librte_meter.so.1
> +     librte_metrics.so.1
> +     librte_net.so.1
> +     librte_pci.so.1
> +     librte_pdump.so.2
> +     librte_pipeline.so.3
> +     librte_pmd_bnxt.so.2
> +     librte_pmd_bond.so.2
> +     librte_pmd_i40e.so.2
> +     librte_pmd_ixgbe.so.2
> +     librte_pmd_ring.so.2
> +     librte_pmd_softnic.so.1
> +     librte_pmd_vhost.so.2
> +     librte_port.so.3
> +     librte_power.so.1
> +     librte_rawdev.so.1
> +     librte_reorder.so.1
> +     librte_ring.so.1
> +     librte_sched.so.1
> +     librte_security.so.1
> +     librte_table.so.3
> +     librte_timer.so.1
> +     librte_vhost.so.3
> +
> +
> +Tested Platforms
> +----------------
> +
> +.. This section should contain a list of platforms that were tested with this
> +   release.
> +
> +   The format is:
> +
> +   * <vendor> platform with <vendor> <type of devices> combinations
> +
> +     * List of CPU
> +     * List of OS
> +     * List of devices
> +     * Other relevant details...
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +
> =========================================================
> --
> 2.7.5

Acked-by:  Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] net/tap: add CRC stripping capability
  2018-02-15 21:55  3%   ` Stephen Hemminger
@ 2018-02-16 13:00  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-16 13:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ophir Munk, dev, Pascal Mazon, Olga Shern

15/02/2018 22:55, Stephen Hemminger:
> On Tue, 13 Feb 2018 17:35:20 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 13/02/2018 09:14, Ophir Munk:
> > > CRC stripping is executed in the kernel outside of TAP PMD scope.
> > > There is no prevention that the TAP PMD will report on Rx CRC
> > > stripping capability.
> > > In the corrupted code, TAP PMD did not report on this capability.
> > > The fix enables TAP PMD to report that Rx CRC stripping is supported.
> > > 
> > > Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD")
> > > Cc: stable@dpdk.org
> > > 
> > > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>  
> > 
> > Applied, thanks
> > 
> 
> The whole CRC strip flag notion is backwards. It really should of been
> a bit set if driver allows preserving CRC.
> 
> Since changing the ABI is not possible right now;
> the ethdev core ought to log a warning whenever driver is registered
> without CRC_STRIP flag.
> 
> Or is lack of CRC_STRIP in offload flags implying that driver can
> do strip and not stripping?

I agree we should change the API.
Let's open a new thread to discuss it with a wider audience.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] net/tap: add CRC stripping capability
  @ 2018-02-15 21:55  3%   ` Stephen Hemminger
  2018-02-16 13:00  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-02-15 21:55 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Ophir Munk, dev, Pascal Mazon, Olga Shern, stable

On Tue, 13 Feb 2018 17:35:20 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> 13/02/2018 09:14, Ophir Munk:
> > CRC stripping is executed in the kernel outside of TAP PMD scope.
> > There is no prevention that the TAP PMD will report on Rx CRC
> > stripping capability.
> > In the corrupted code, TAP PMD did not report on this capability.
> > The fix enables TAP PMD to report that Rx CRC stripping is supported.
> > 
> > Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>  
> 
> Applied, thanks
> 

The whole CRC strip flag notion is backwards. It really should of been
a bit set if driver allows preserving CRC.

Since changing the ABI is not possible right now;
the ethdev core ought to log a warning whenever driver is registered
without CRC_STRIP flag.

Or is lack of CRC_STRIP in offload flags implying that driver can
do strip and not stripping?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05
@ 2018-02-15 13:04  6% John McNamara
  2018-02-16 22:54  0% ` Carrillo, Erik G
  0 siblings, 1 reply; 200+ results
From: John McNamara @ 2018-02-15 13:04 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Add template release notes for DPDK 18.05 with inline
comments and explanations of the various sections.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_18_05.rst | 187 +++++++++++++++++++++++++++++++++
 1 file changed, 187 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_18_05.rst

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
new file mode 100644
index 0000000..85f4dc5
--- /dev/null
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -0,0 +1,187 @@
+DPDK Release 18.05
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      xdg-open build/doc/html/guides/rel_notes/release_18_05.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample
+   format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to
+     understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like
+     this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
+     tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change that was announced
+     in the previous releases and made in this release. Use fixed width quotes
+     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item in the past
+     tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+``
+   sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. Do not overwrite or remove it.
+   =========================================================
+
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_bbdev.so.1
+     librte_bitratestats.so.2
+     librte_bus_dpaa.so.1
+     librte_bus_fslmc.so.1
+     librte_bus_pci.so.1
+     librte_bus_vdev.so.1
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_cryptodev.so.4
+     librte_distributor.so.1
+     librte_eal.so.6
+     librte_ethdev.so.8
+     librte_eventdev.so.3
+     librte_flow_classify.so.1
+     librte_gro.so.1
+     librte_gso.so.1
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_latencystats.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.3
+     librte_mempool.so.3
+     librte_meter.so.1
+     librte_metrics.so.1
+     librte_net.so.1
+     librte_pci.so.1
+     librte_pdump.so.2
+     librte_pipeline.so.3
+     librte_pmd_bnxt.so.2
+     librte_pmd_bond.so.2
+     librte_pmd_i40e.so.2
+     librte_pmd_ixgbe.so.2
+     librte_pmd_ring.so.2
+     librte_pmd_softnic.so.1
+     librte_pmd_vhost.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_rawdev.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_security.so.1
+     librte_table.so.3
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this
+   release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
-- 
2.7.5

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v6] checkpatches.sh: Add checks for ABI symbol addition
                     ` (2 preceding siblings ...)
  2018-02-09 15:21  6% ` [dpdk-dev] [PATCH v5] " Neil Horman
@ 2018-02-14 19:19  6% ` Neil Horman
  3 siblings, 0 replies; 200+ results
From: Neil Horman @ 2018-02-14 19:19 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, thomas, john.mcnamara, bruce.richardson,
	Ferruh Yigit, Stephen Hemminger

Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental.  The addition of these markers is
dependent on the manual addition of exported symbols to the EXPERIMENTAL
section of the corresponding libraries version map file.  The consensus
on review is that, in addition to mandating the addition of symbols to
the EXPERIMENTAL version in the map, we need a mechanism to enforce our
documented process of mandating that addition when they are introduced.
To that end, I am proposing this change.  It is an addition to the
checkpatches script, which scan incoming patches for additions and
removals of symbols to the map file, and warns the user appropriately

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: thomas@monjalon.net
CC: john.mcnamara@intel.com
CC: bruce.richardson@intel.com
CC: Ferruh Yigit <ferruh.yigit@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>

---
Change notes

v2)
 * Cleaned up and documented awk script (shemminger)
 * fixed sort/uniq usage (shemminger)
 * moved checking to new script (tmonjalon)
 * added maintainer entry (tmonjalon)
 * added license (tmonjalon)

v3)
 * Changed symbol check script name (tmonjalon)
 * Trapped exit to clean temp file (tmonjalon)
 * Honored verbose command (tmonjalon)
 * Cleaned left over debug bits (tmonjalon)
 * Updated location in MAINTAINERS file (tmonjalon)

v4)
 * Updated maintainers file (tmonjalon)

v5)
 * undo V4 (tmojalon)

v6)
 * Cleaning up more nits (tmonjalon)
 * Combining some lines (tmonjalon)
 * Fixing error print condition (tmonjalon)
 * Redirect stdin to a file to allow rewinding for
   Multiple passes on tools (nhorman)
---
 MAINTAINERS                     |   1 +
 devtools/check-symbol-change.sh | 146 ++++++++++++++++++++++++++++++++++++++++
 devtools/checkpatches.sh        |  46 +++++++++++--
 3 files changed, 188 insertions(+), 5 deletions(-)
 create mode 100755 devtools/check-symbol-change.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index a646ca3e1..f83b9ab33 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -87,6 +87,7 @@ M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: devtools/validate-abi.sh
+F: devtools/check-symbol-change.sh
 F: buildtools/check-experimental-syms.sh
 
 Driver information
diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
new file mode 100755
index 000000000..22b17e6f2
--- /dev/null
+++ b/devtools/check-symbol-change.sh
@@ -0,0 +1,146 @@
+#!/bin/sh
+
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
+
+build_map_changes()
+{
+	local fname=$1
+	local mapdb=$2
+
+	cat $fname | filterdiff -i *.map | awk '
+		# Initialize our variables
+		BEGIN {map="";sym="";ar="";sec=""; in_sec=0}
+
+		# Anything that starts with + or -, followed by an a
+		# and ends in the string .map is the name of our map file
+		# This may appear multiple times in a patch if multiple
+		# map files are altered, and all section/symbol names
+		# appearing between a triggering of this rule and the
+		# next trigger of this rule are associated with this file
+		/[-+] a\/.*\.map/ {map=$2}
+
+		# Triggering this rule, which starts a line with a + and ends it
+		# with a { identifies a versioned section.  The section name is
+		# the rest of the line with the + and { symbols remvoed.
+		# Triggering this rule sets in_sec to 1, which actives the
+		# symbol rule below
+		/+.*{/ {gsub("+","");sec=$1; in_sec=1}
+
+		# This rule idenfies the end of a section, and disables the
+		# symbol rule
+		/.*}/ {in_sec=0}
+
+		# This rule matches on a + followed by any characters except a :
+		# (which denotes a global vs local segment), and ends with a ;.
+		# The semicolon is removed and the symbol is printed with its
+		# association file name and version section, along with an
+		# indicator that the symbol is a new addition.  Note this rule
+		# only works if we have found a version section in the rule
+		# above (hence the in_sec check).  Otherwise we flag it as an
+		# unknown section
+		/^+[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " add"
+			} else {
+				print map " " sym " unknown add"
+			}
+		}
+
+		# This is the same rule as above, but the rule matches on a
+		# leading - rather than a +, denoting that the symbol is being
+		# removed.
+		/^-[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " del"
+			} else {
+				print map " " sym " unknown del"
+			}
+		}' > ./$mapdb
+
+		sort -u $mapdb > ./$mapdb.2
+		mv -f $mapdb.2 $mapdb
+
+}
+
+check_for_rule_violations()
+{
+	local mapdb=$1
+	local mname
+	local symname
+	local secname
+	local ar
+	local ret=0
+
+	while read mname symname secname ar
+	do
+		if [ "$ar" == "add" ]
+		then
+
+			if [ "$secname" == "unknown" ]
+			then
+				# Just inform the user of this occurrence, but
+				# don't flag it as an error
+				echo -n "INFO: symbol $syname is added but "
+				echo -n "patch has insuficient context "
+				echo -n "to determine the section name "
+				echo -n "please ensure the version is "
+				echo "EXPERIMENTAL"
+				continue
+			fi
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Symbols that are getting added in a section
+				# other ithan the experimental section
+				# to be moving from an already supported
+				# section or its a violation
+				grep -q \
+				"$mname $symname [^EXPERIMENTAL] del" $mapdb
+				if [ $? -ne 0 ]
+				then
+					echo -n "ERROR: symbol $symname "
+					echo -n "is added in a section "
+					echo -n "other than the EXPERIMENTAL "
+					echo "section of the version map"
+					ret=1
+				fi
+			fi
+		else
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Just inform users that non-experimenal
+				# symbols need to go through a deprecation
+				# process
+				echo -n "INFO: symbol $symname is being "
+				echo -n "removed, ensure that it has "
+				echo "gone through the deprecation process"
+			fi
+		fi
+	done < $mapdb
+
+	return $ret
+}
+
+trap clean_and_exit_on_sig EXIT
+
+mapfile=`mktemp mapdb.XXXXXX`
+patch=$1
+exit_code=1
+
+clean_and_exit_on_sig()
+{
+	rm -f $mapfile
+	exit $exit_code
+}
+
+build_map_changes $patch $mapfile
+check_for_rule_violations $mapfile
+exit_code=$?
+
+rm -f $mapfile
+
+exit $exit_code
+
+
diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 7676a6b50..fa36b0d98 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -35,6 +35,10 @@
 # - DPDK_CHECKPATCH_LINE_LENGTH
 . $(dirname $(readlink -e $0))/load-devel-config
 
+trap "rm -f $TMPINPUT" SIGINT
+
+VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh
+
 length=${DPDK_CHECKPATCH_LINE_LENGTH:-80}
 
 # override default Linux options
@@ -61,6 +65,7 @@ print_usage () {
 	END_OF_HELP
 }
 
+
 number=0
 quiet=false
 verbose=false
@@ -86,19 +91,50 @@ total=0
 status=0
 
 check () { # <patch> <commit> <title>
+	local ret=0
+	TMPINPUT=`mktemp checkpatches.XXXXXX`
+
 	total=$(($total + 1))
 	! $verbose || printf '\n### %s\n\n' "$3"
 	if [ -n "$1" ] ; then
 		report=$($DPDK_CHECKPATCH_PATH $options "$1" 2>/dev/null)
 	elif [ -n "$2" ] ; then
-		report=$(git format-patch --find-renames --no-stat --stdout -1 $commit |
+		git format-patch --find-renames --no-stat --stdout -1 $commit > ./$TMPINPUT
+		report=$(cat ./$TMPINPUT |
 			$DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
 	else
-		report=$($DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
+		cat > ./$TMPINPUT
+		report=$(cat ./$TMPINPUT | $DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
+	fi
+	if [ $? -ne 0 ]
+	then
+		$verbose || printf '\n### %s\n\n' "$3"
+		printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+		ret=1
+	fi
+
+	! $verbose || printf '\nChecking API additions/removals:\n'
+
+	if [ -n "$1" ] ; then
+		report=$($VALIDATE_NEW_API "$1")
+	elif [ -n "$2" ] ; then
+		report=$( cat ./$TMPINPUT | 
+			$VALIDATE_NEW_API -)
+	else
+		report=$(cat ./$TMPINPUT | $VALIDATE_NEW_API -)
+	fi
+
+	if [ $? -ne 0 ]
+	then
+		printf '%s\n' "$report"
+		ret=1
+	fi
+
+	rm -f ./$TMPINPUT
+	if [ $ret -eq 0 ]
+	then
+		return 0
 	fi
-	[ $? -ne 0 ] || return 0
-	$verbose || printf '\n### %s\n\n' "$3"
-	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
 	status=$(($status + 1))
 }
 
-- 
2.14.3

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [dpdk-announce] DPDK 18.02 released
@ 2018-02-14 19:11  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 19:11 UTC (permalink / raw)
  To: announce

A new major release is available:
	http://fast.dpdk.org/rel/dpdk-18.02.tar.xz

Special attention was paid to not break the ABI in this release.
It means 18.02 could replace 17.11 without rebuilding the applications.
However it is advised to keep using 17.11 LTS for long term deployments.

Some highlights:
	- new license header (SPDX tag)
	- bbdev (Wireless Base Band) device class
	- rawdev device class
	- ethdev probe notifications and port ownership
	- Hyper-V platform driver
	- AVF (Adaptive Virtual Function) ethdev driver
	- IPsec offload in DPAA
	- DPAA eventdev driver
	- OPDL (Ordered Packet Distribution Library) eventdev driver
	- experimental tags and automatic check
	- meson build system (beta)

More details in the release notes:
	http://dpdk.org/doc/guides/rel_notes/release_18_02.html

The statistics are similar to previous release:
	1315 patches from 145 authors
	2316 files changed, 100569 insertions(+), 77209 deletions(-)

There are 46 new contributors
(including authors, reviewers and testers):
Thanks to Aleksey Baulin, Amr Mokhtar, Andrea Grandi, Andrew Jackson,
Anoob Joseph, Avi Kivity, Bao-Long Tran, Bharat Mota, Cheryl Houser,
Ciara Power, David Coyle, Dustin Lundquist, Erik Gabriel Carrillo,
George Wilkie, Georgios Katsikas, Gong Deli, Hyong Youb Kim,
Jerry Lilijun, Jun Yang, Junjie Chen, Kefu Chai, Kevin Laatz,
Laszlo Ersek, Liang Ma, Mallesh Koujalagi, Martin Klozik,
Matthew Smith, Michael McConville, Natalie Samsonov, Nikhil Agarwal,
Peter Mccarthy, Prashant Bhole, Rafal Kozik, Rosen Xu, Roy Franz,
Sharmila Podury, Stefan Hajnoczi, Sunil Kumar Kori, Thomas Speier,
Tomasz Jozwiak, Vijay Srivastava, Wisam Jaddo, Xin Long, Yang Zhang,
Yanglong Wu and Zhike Wang.

Below is the number of patches per company (accuracy not perfect):
    463     Intel (57)
    213     Mellanox (11)
    132     NXP (7)
    131     Cavium (9)
    102     6WIND (8)
     83     Solarflare (6)
     27     Broadcom (2)
     24     RedHat (5)
     21     Semihalf (3)
     20     Microsoft (2)
     17     Cisco (3)
     16     OKTET Labs (2)
      9     AT&T (4)
      6     Marvell (1)
      5     Netronome (1)
      5     IBM (2)
      4     ZTE (1)
      4     Linaro (1)
      4     HXT Semiconductor (1)
      4     ARM (2)

The new features for 18.05 must be submitted before the next month,
in order to be reviewed and integrated during March.
The next release is expected to happen at the beginning of May.

Thanks everyone

PS: Like last year, this release is done during Valentine's day.
It is an opportunity to stop working and offer a day to your Valentine!

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] doc: ethdev ABI change deprecation notice
  2018-02-14  0:14  4%         ` Thomas Monjalon
@ 2018-02-14 17:18  4%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 17:18 UTC (permalink / raw)
  To: Kirill Rybalchenko
  Cc: dev, Olivier Matz, Ferruh Yigit, Neil Horman, andrey.chilikin,
	adrien.mazarguil

14/02/2018 01:14, Thomas Monjalon:
> > > >> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> > > >>
> > > >> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
> > > >> ---
> > > >> +* ethdev: announce ABI change
> > > >> +  The size of variables flow_types_mask in rte_eth_fdir_info structure,
> > > >> +  sym_hash_enable_mask and valid_bit_mask in rte_eth_hash_global_conf structure
> > > >> +  will be increased from 32 to 64 bits to fulfill hardware requirements.
> > > >> +  This change will break existing ABI as size of the structures will increase.
> > > >> +
> > > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > > 
> > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > 
> > Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
> 
> I would prefer you drop the legacy code to keep only rte_flow.

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  2018-02-14 15:54  4%   ` Jerin Jacob
@ 2018-02-14 16:50  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 16:50 UTC (permalink / raw)
  To: shahafs
  Cc: dev, Jerin Jacob, Boccassi, Luca, nhorman, remy.horton,
	mohammad.abdul.awal, declan.doherty, ferruh.yigit

> > > This is following the RFC being discussed and targets 18.05
> > > 
> > > http://dpdk.org/ml/archives/dev/2018-January/085716.html
> > > 
> > > Cc: declan.doherty@intel.com
> > > Cc: mohammad.abdul.awal@intel.com
> > > Cc: ferruh.yigit@intel.com
> > > Cc: remy.horton@intel.com
> > > 
> > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > ---
> > > +* ethdev: A work is being planned for 18.05 to expose VF port
> > > representors
> > > +  as a mean to perform control and data path operation on the
> > > different VFs.
> > > +  As VF representor is an ethdev port, new fields are needed in
> > > order to map
> > > +  between the VF representor and the VF or the parent PF. Those new
> > > fields
> > > +  are to be included in ``rte_eth_dev_info`` struct.
> > 
> > Acked-by: Luca Boccassi <luca.boccassi@intl.att.com>
> > Acked-by: Alex Zelezniak <alexz@att.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure
  2018-02-13 12:10  4%       ` Jerin Jacob
@ 2018-02-14 16:28  4%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 16:28 UTC (permalink / raw)
  To: Xueming(Steven) Li
  Cc: dev, Jerin Jacob, Ferruh Yigit, Shahaf Shuler, Neil Horman

> > >> Update deprecation notice for the new rss_level field of rte_eth_rss_conf.
> > >>
> > >> Link: http://www.dpdk.org/dev/patchwork/patch/31891
> > >>
> > >> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > >> ---
> > >> +* ethdev: A new rss level field planned in 18.05.
> > >> +  The new API add rss_level field to ``rte_eth_rss_conf`` to enable a
> > >> +choice
> > >> +  of RSS hash calculation on outer or inner header of tunneled packet.
> > > 
> > > Acked-By: Shahaf Shuler <shahafs@mellanox.com>
> > 
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules
  2018-02-14 15:55  0% ` [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Nélio Laranjeiro
@ 2018-02-14 16:06  0%   ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-02-14 16:06 UTC (permalink / raw)
  To: Nélio Laranjeiro, Adrien Mazarguil
  Cc: Neil Horman, Ferruh Yigit, dev, Doherty, Declan, Shahaf Shuler,
	John Daley (johndale),
	Xueming(Steven) Li, Thomas Monjalon

On 02/14/2018 06:55 PM, Nélio Laranjeiro wrote:
> On Wed, Feb 14, 2018 at 04:37:26PM +0100, Adrien Mazarguil wrote:
>> Series of API/ABI change announcements for rte_flow to enable proper
>> encap/decap support and address various design issues that can't be
>> addressed without ABI impact.
>>
>> Adrien Mazarguil (4):
>>    doc: announce API change for flow actions
>>    doc: announce API change for flow RSS action
>>    doc: announce API change for flow RSS/RAW actions
>>    doc: announce API change for flow VLAN pattern item
>>
>>   doc/guides/rel_notes/deprecation.rst | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> -- 
>> 2.11.0
> For the series,
>
> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

For the series,

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  2018-02-14 15:27  4% ` Boccassi, Luca
@ 2018-02-14 15:54  4%   ` Jerin Jacob
  2018-02-14 16:50  4%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-02-14 15:54 UTC (permalink / raw)
  To: Boccassi, Luca
  Cc: shahafs, thomas, nhorman, remy.horton, mohammad.abdul.awal,
	declan.doherty, ferruh.yigit, dev

-----Original Message-----
> Date: Wed, 14 Feb 2018 15:27:50 +0000
> From: "Boccassi, Luca" <luca.boccassi@intl.att.com>
> To: "shahafs@mellanox.com" <shahafs@mellanox.com>, "thomas@monjalon.net"
>  <thomas@monjalon.net>, "nhorman@tuxdriver.com" <nhorman@tuxdriver.com>
> CC: "remy.horton@intel.com" <remy.horton@intel.com>,
>  "mohammad.abdul.awal@intel.com" <mohammad.abdul.awal@intel.com>,
>  "declan.doherty@intel.com" <declan.doherty@intel.com>,
>  "ferruh.yigit@intel.com" <ferruh.yigit@intel.com>, "dev@dpdk.org"
>  <dev@dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF
>  representors
> 
> On Wed, 2018-02-14 at 14:32 +0200, Shahaf Shuler wrote:
> > This is following the RFC being discussed and targets 18.05
> > 
> > http://dpdk.org/ml/archives/dev/2018-January/085716.html
> > 
> > Cc: declan.doherty@intel.com
> > Cc: mohammad.abdul.awal@intel.com
> > Cc: ferruh.yigit@intel.com
> > Cc: remy.horton@intel.com
> > 
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index d59ad5988..f6151de63 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -59,3 +59,9 @@ Deprecation Notices
> >    be added between the producer and consumer structures. The size of
> > the
> >    structure and the offset of the fields will remain the same on
> >    platforms with 64B cache line, but will change on other platforms.
> > +
> > +* ethdev: A work is being planned for 18.05 to expose VF port
> > representors
> > +  as a mean to perform control and data path operation on the
> > different VFs.
> > +  As VF representor is an ethdev port, new fields are needed in
> > order to map
> > +  between the VF representor and the VF or the parent PF. Those new
> > fields
> > +  are to be included in ``rte_eth_dev_info`` struct.
> 
> Acked-by: Luca Boccassi <luca.boccassi@intl.att.com>
> Acked-by: Alex Zelezniak <alexz@att.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

> 
> Acking on behalf of my colleague Alex as well, who replied privately.
> 
> -- 
> Kind regards,
> Luca Boccassi

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules
  2018-02-14 15:37  4% [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Adrien Mazarguil
  2018-02-14 15:37  3% ` [dpdk-dev] [PATCH v1 3/4] doc: announce API change for flow RSS/RAW actions Adrien Mazarguil
@ 2018-02-14 15:55  0% ` Nélio Laranjeiro
  2018-02-14 16:06  0%   ` Andrew Rybchenko
  1 sibling, 1 reply; 200+ results
From: Nélio Laranjeiro @ 2018-02-14 15:55 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Neil Horman, Ferruh Yigit, dev, Doherty, Declan, Shahaf Shuler,
	John Daley (johndale),
	Xueming(Steven) Li, Thomas Monjalon

On Wed, Feb 14, 2018 at 04:37:26PM +0100, Adrien Mazarguil wrote:
> Series of API/ABI change announcements for rte_flow to enable proper
> encap/decap support and address various design issues that can't be
> addressed without ABI impact.
> 
> Adrien Mazarguil (4):
>   doc: announce API change for flow actions
>   doc: announce API change for flow RSS action
>   doc: announce API change for flow RSS/RAW actions
>   doc: announce API change for flow VLAN pattern item
> 
>  doc/guides/rel_notes/deprecation.rst | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> -- 
> 2.11.0

For the series,

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1 3/4] doc: announce API change for flow RSS/RAW actions
  2018-02-14 15:37  4% [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Adrien Mazarguil
@ 2018-02-14 15:37  3% ` Adrien Mazarguil
  2018-02-14 15:55  0% ` [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Nélio Laranjeiro
  1 sibling, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-14 15:37 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ferruh Yigit, dev, Doherty, Declan, Shahaf Shuler,
	John Daley (johndale),
	Nelio Laranjeiro, Xueming(Steven) Li, Thomas Monjalon

C99-style flexible arrays were a bad idea for this API. This announces a
minor API/ABI change to remove them.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 40b76b391..77390ce9f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,3 +72,8 @@ Deprecation Notices
   rte_eth_rss_conf`` due to its limitations. All parameters, including the
   currently missing hash function to use will be made part of ``struct
   rte_flow_action_rss`` directly.
+
+* rte_flow: C99-style flexible arrays in ``struct rte_flow_action_rss`` and
+  ``struct rte_flow_item_raw`` will be replaced by standard pointers to the
+  same data. They proved difficult to use in the field (e.g. no possibility
+  of static initialization) and are unsuitable for C++ applications.
-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules
@ 2018-02-14 15:37  4% Adrien Mazarguil
  2018-02-14 15:37  3% ` [dpdk-dev] [PATCH v1 3/4] doc: announce API change for flow RSS/RAW actions Adrien Mazarguil
  2018-02-14 15:55  0% ` [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Nélio Laranjeiro
  0 siblings, 2 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-14 15:37 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ferruh Yigit, dev, Doherty, Declan, Shahaf Shuler,
	John Daley (johndale),
	Nelio Laranjeiro, Xueming(Steven) Li, Thomas Monjalon

Series of API/ABI change announcements for rte_flow to enable proper
encap/decap support and address various design issues that can't be
addressed without ABI impact.

Adrien Mazarguil (4):
  doc: announce API change for flow actions
  doc: announce API change for flow RSS action
  doc: announce API change for flow RSS/RAW actions
  doc: announce API change for flow VLAN pattern item

 doc/guides/rel_notes/deprecation.rst | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

-- 
2.11.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  2018-02-14 12:32  4% [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors Shahaf Shuler
                   ` (2 preceding siblings ...)
  2018-02-14 14:50  4% ` Remy Horton
@ 2018-02-14 15:27  4% ` Boccassi, Luca
  2018-02-14 15:54  4%   ` Jerin Jacob
  3 siblings, 1 reply; 200+ results
From: Boccassi, Luca @ 2018-02-14 15:27 UTC (permalink / raw)
  To: shahafs, thomas, nhorman
  Cc: remy.horton, mohammad.abdul.awal, declan.doherty, ferruh.yigit, dev

On Wed, 2018-02-14 at 14:32 +0200, Shahaf Shuler wrote:
> This is following the RFC being discussed and targets 18.05
> 
> http://dpdk.org/ml/archives/dev/2018-January/085716.html
> 
> Cc: declan.doherty@intel.com
> Cc: mohammad.abdul.awal@intel.com
> Cc: ferruh.yigit@intel.com
> Cc: remy.horton@intel.com
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index d59ad5988..f6151de63 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -59,3 +59,9 @@ Deprecation Notices
>    be added between the producer and consumer structures. The size of
> the
>    structure and the offset of the fields will remain the same on
>    platforms with 64B cache line, but will change on other platforms.
> +
> +* ethdev: A work is being planned for 18.05 to expose VF port
> representors
> +  as a mean to perform control and data path operation on the
> different VFs.
> +  As VF representor is an ethdev port, new fields are needed in
> order to map
> +  between the VF representor and the VF or the parent PF. Those new
> fields
> +  are to be included in ``rte_eth_dev_info`` struct.

Acked-by: Luca Boccassi <luca.boccassi@intl.att.com>
Acked-by: Alex Zelezniak <alexz@att.com>

Acking on behalf of my colleague Alex as well, who replied privately.

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool
  2018-02-01 12:53  4%     ` Hemant Agrawal
@ 2018-02-14 15:23  4%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 15:23 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Hemant Agrawal, Jerin Jacob, Olivier Matz

> >>> An API/ABI changes are planned for 18.05 [1]:
> >>>
> >>>   * Allow to customize how mempool objects are stored in memory.
> >>>   * Deprecate mempool XMEM API.
> >>>   * Add mempool driver ops to get information from mempool driver and
> >>>     dequeue contiguous blocks of objects if driver supports it.
> >>>
> >>> [1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
> >>>
> >>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> >>
> >> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> > 
> > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > 
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  2018-02-14 12:32  4% [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors Shahaf Shuler
  2018-02-14 13:50  4% ` Thomas Monjalon
  2018-02-14 13:54  4% ` Doherty, Declan
@ 2018-02-14 14:50  4% ` Remy Horton
  2018-02-14 15:27  4% ` Boccassi, Luca
  3 siblings, 0 replies; 200+ results
From: Remy Horton @ 2018-02-14 14:50 UTC (permalink / raw)
  To: Shahaf Shuler, nhorman, thomas
  Cc: dev, declan.doherty, mohammad.abdul.awal, ferruh.yigit


On 14/02/2018 12:32, Shahaf Shuler wrote:
[..]
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Remy Horton <remy.horton@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory hotplug changes
  2018-02-07 10:11  0%     ` Jerin Jacob
@ 2018-02-14 14:48  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 14:48 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Jerin Jacob, Bruce Richardson, Neil Horman, John McNamara,
	Marko Kovacevic

07/02/2018 11:11, Jerin Jacob:
> -----Original Message-----
> > Date: Mon, 5 Feb 2018 11:47:42 +0000
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > To: Anatoly Burakov <anatoly.burakov@intel.com>
> > CC: dev@dpdk.org, Neil Horman <nhorman@tuxdriver.com>, John McNamara
> >  <john.mcnamara@intel.com>, Marko Kovacevic <marko.kovacevic@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory
> >  hotplug changes
> > User-Agent: Mutt/1.9.1 (2017-09-22)
> > 
> > On Thu, Jan 18, 2018 at 10:32:28AM +0000, Anatoly Burakov wrote:
> > > Due to coming changes outlined in memory hotplug RFC, there will
> > > be several API/ABI changes.
> > > 
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Applied

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal
  2018-02-14  0:04  4%         ` Thomas Monjalon
@ 2018-02-14 14:25  4%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 14:25 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Bruce Richardson, Jerin Jacob, Mcnamara, John, Neil Horman,
	Kovacevic, Marko

14/02/2018 01:04, Thomas Monjalon:
> > > > > There will be a new function added in v18.05 that will return number of
> > > > > detected sockets, which will change the ABI.
> > > > > 
> > > > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > > > ---
> > > > > +* eal: new ``numa_node_count`` member will be added to ``rte_config``
> > > > > +structure in v18.05.
> > > > 
> > > > Acked-by: John McNamara <john.mcnamara@intel.com>
> > > 
> > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > >
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  2018-02-14 12:32  4% [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors Shahaf Shuler
  2018-02-14 13:50  4% ` Thomas Monjalon
@ 2018-02-14 13:54  4% ` Doherty, Declan
  2018-02-14 14:50  4% ` Remy Horton
  2018-02-14 15:27  4% ` Boccassi, Luca
  3 siblings, 0 replies; 200+ results
From: Doherty, Declan @ 2018-02-14 13:54 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, mohammad.abdul.awal, ferruh.yigit, remy.horton, nhorman, thomas

On 14/02/2018 12:32 PM, Shahaf Shuler wrote:
> This is following the RFC being discussed and targets 18.05
>
> http://dpdk.org/ml/archives/dev/2018-January/085716.html
>
> Cc: declan.doherty@intel.com
> Cc: mohammad.abdul.awal@intel.com
> Cc: ferruh.yigit@intel.com
> Cc: remy.horton@intel.com
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>   doc/guides/rel_notes/deprecation.rst | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d59ad5988..f6151de63 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -59,3 +59,9 @@ Deprecation Notices
>     be added between the producer and consumer structures. The size of the
>     structure and the offset of the fields will remain the same on
>     platforms with 64B cache line, but will change on other platforms.
> +
> +* ethdev: A work is being planned for 18.05 to expose VF port representors
> +  as a mean to perform control and data path operation on the different VFs.
> +  As VF representor is an ethdev port, new fields are needed in order to map
> +  between the VF representor and the VF or the parent PF. Those new fields
> +  are to be included in ``rte_eth_dev_info`` struct.
>
Acked-by: Declan Doherty <declan.doherty@intel.com>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: update release notes for 18.02
  2018-02-14 12:21  6% [dpdk-dev] [PATCH v1] doc: update release notes for 18.02 John McNamara
@ 2018-02-14 13:50  6% ` John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2018-02-14 13:50 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 14767 bytes --]

Fix grammar, spelling and formatting of DPDK 18.02 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_18_02.rst | 199 ++++++++++++---------------------
 1 file changed, 69 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
index 04202ba..bc08118 100644
--- a/doc/guides/rel_notes/release_18_02.rst
+++ b/doc/guides/rel_notes/release_18_02.rst
@@ -41,7 +41,7 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
-* **Add function to allow releasing internal EAL resources on exit**
+* **Added function to allow releasing internal EAL resources on exit.**
 
   During ``rte_eal_init()`` EAL allocates memory from hugepages to enable its
   core libraries to perform their tasks. The ``rte_eal_cleanup()`` function
@@ -50,32 +50,12 @@ New Features
   exiting. Not calling this function could result in leaking hugepages, leading
   to failure during initialization of secondary processes.
 
-* **Added the ixgbe ethernet driver to support RSS with flow API.**
+* **Added igb, ixgbe and i40e ethernet driver to support RSS with flow API.**
 
-  Rte_flow actually defined to include RSS, but till now, RSS is out of
-  rte_flow. This patch is to support igb and ixgbe NIC with existing RSS
-  configuration using rte_flow API.
+  Added support for igb, ixgbe and i40e NICs with existing RSS configuration
+  using the ``rte_flow`` API.
 
-* **Add MAC loopback support for i40e.**
-
-  Add MAC loopback support for i40e in order to support test task asked by
-  users. According to the device configuration, it will setup TX->RX loopback
-  link or not.
-
-* **Add the support of run time determination of number of queues per i40e VF**
-
-  The number of queue per VF is determined by its host PF. If the PCI address
-  of an i40e PF is aaaa:bb.cc, the number of queues per VF can be configured
-  with EAL parameter like -w aaaa:bb.cc,queue-num-per-vf=n. The value n can be
-  1, 2, 4, 8 or 16. If no such parameter is configured, the number of queues
-  per VF is 4 by default.
-
-* **Added the i40e ethernet driver to support RSS with flow API.**
-
-  Rte_flow actually defined to include RSS, but till now, RSS is out of
-  rte_flow. This patch is to support i40e NIC with existing RSS
-  configuration using rte_flow API.It also enable queue region configuration
-  using flow API for i40e.
+  Also enabled queue region configuration using the ``rte_flow`` API for i40e.
 
 * **Updated i40e driver to support PPPoE/PPPoL2TP.**
 
@@ -83,6 +63,20 @@ New Features
   profiles which can be programmed by dynamic device personalization (DDP)
   process.
 
+* **Added MAC loopback support for i40e.**
+
+  Added MAC loopback support for i40e in order to support test tasks requested
+  by users. It will setup ``Tx -> Rx`` loopback link according to the device
+  configuration.
+
+* **Added support of run time determination of number of queues per i40e VF.**
+
+  The number of queue per VF is determined by its host PF. If the PCI address
+  of an i40e PF is ``aaaa:bb.cc``, the number of queues per VF can be
+  configured with EAL parameter like ``-w aaaa:bb.cc,queue-num-per-vf=n``. The
+  value n can be 1, 2, 4, 8 or 16. If no such parameter is configured, the
+  number of queues per VF is 4 by default.
+
 * **Updated mlx5 driver.**
 
   Updated the mlx5 driver including the following changes:
@@ -117,16 +111,10 @@ New Features
   * Added tunneled packets classification.
   * Added inner checksum offload.
 
-* **Added the igb ethernet driver to support RSS with flow API.**
-
-  Rte_flow actually defined to include RSS, but till now, RSS is out of
-  rte_flow. This patch is to support igb NIC with existing RSS configuration
-  using rte_flow API.
-
-* **Add AVF (Adaptive Virtual Function) net PMD.**
+* **Added AVF (Adaptive Virtual Function) net PMD.**
 
-  A new net PMD has been added, which supports Intel® Ethernet Adaptive
-  Virtual Function (AVF) with features list below:
+  Added a new net PMD called AVF (Adaptive Virtual Function), which supports
+  Intel® Ethernet Adaptive Virtual Function (AVF) with features such as:
 
   * Basic Rx/Tx burst
   * SSE vectorized Rx/Tx burst
@@ -140,17 +128,22 @@ New Features
   * Rx/Tx descriptor status
   * Link status update/event
 
-* **Add feature supports for live migration from vhost-net to vhost-user.**
+* **Added feature supports for live migration from vhost-net to vhost-user.**
 
-  To make live migration from vhost-net to vhost-user possible, added
-  feature supports for vhost-user. The features include:
+  Added feature supports for vhost-user to make live migration from vhost-net
+  to vhost-user possible. The features include:
 
-  * VIRTIO_F_ANY_LAYOUT
-  * VIRTIO_F_EVENT_IDX
-  * VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_HOST_ECN
-  * VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_HOST_UFO
-  * VIRTIO_NET_F_GSO
+  * ``VIRTIO_F_ANY_LAYOUT``
+  * ``VIRTIO_F_EVENT_IDX``
+  * ``VIRTIO_NET_F_GUEST_ECN``, ``VIRTIO_NET_F_HOST_ECN``
+  * ``VIRTIO_NET_F_GUEST_UFO``, ``VIRTIO_NET_F_HOST_UFO``
+  * ``VIRTIO_NET_F_GSO``
 
+  Also added ``VIRTIO_NET_F_GUEST_ANNOUNCE`` feature support in virtio pmd.
+  In a scenario where the vhost backend doesn't have the ability to generate
+  RARP packets, the VM running virtio pmd can still be live migrated if
+  ``VIRTIO_NET_F_GUEST_ANNOUNCE`` feature is negotiated.
+    
 * **Updated the AESNI-MB PMD.**
 
   The AESNI-MB PMD has been updated with additional support for:
@@ -160,62 +153,65 @@ New Features
 * **Updated the DPAA_SEC crypto driver to support rte_security.**
 
   Updated the ``dpaa_sec`` crypto PMD to support ``rte_security`` lookaside
-  protocol offload for IPSec.
+  protocol offload for IPsec.
 
 * **Added Wireless Base Band Device (bbdev) abstraction.**
 
   The Wireless Baseband Device library is an acceleration abstraction
   framework for 3gpp Layer 1 processing functions that provides a common
-  programming interface for seamless opeartion on integrated or discrete
+  programming interface for seamless operation on integrated or discrete
   hardware accelerators or using optimized software libraries for signal
   processing.
+
   The current release only supports 3GPP CRC, Turbo Coding and Rate
   Matching operations, as specified in 3GPP TS 36.212.
 
   See the :doc:`../prog_guide/bbdev` programmer's guide for more details.
 
-* **Added New eventdev OPDL PMD**
+* **Added New eventdev Ordered Packet Distribution Library (OPDL) PMD.**
 
   The OPDL (Ordered Packet Distribution Library) eventdev is a specific
   implementation of the eventdev API. It is particularly suited to packet
   processing workloads that have high throughput and low latency requirements.
   All packets follow the same path through the device. The order in which
-  packets  follow is determinted by the order in which queues are set up.
+  packets follow is determined by the order in which queues are set up.
   Events are left on the ring until they are transmitted. As a result packets
   do not go out of order.
 
-  With this change, application can use OPDL PMD by eventdev api.
+  With this change, applications can use the OPDL PMD via the eventdev api.
 
-* **Added New pipeline use case for dpdk-test-eventdev application**
+* **Added new pipeline use case for dpdk-test-eventdev application.**
 
+  Added a new "pipeline" use case for the ``dpdk-test-eventdev`` application.
   The pipeline case can be used to simulate various stages in a real world
   application from packet receive to transmit while maintaining the packet
-  ordering also measure the performance of the event device across the stages
-  of the pipeline.
+  ordering. It can also be used to measure the performance of the event device
+  across the stages of the pipeline.
 
-  The pipeline use case has been made generic to work will all the event
+  The pipeline use case has been made generic to work with all the event
   devices based on the capabilities.
 
-* **Updated Eventdev Sample application to support event devices based on capability**
+* **Updated Eventdev sample application to support event devices based on capability.**
 
-  Updated Eventdev pipeline sample application to support various types of pipelines
-  based on the capabilities of the attached event and ethernet devices. Also,
-  renamed the application from SW PMD specific ``eventdev_pipeline_sw_pmd``
-  to PMD agnostic ``eventdev_pipeline``.
+  Updated the Eventdev pipeline sample application to support various types of
+  pipelines based on the capabilities of the attached event and ethernet
+  devices. Also, renamed the application from software PMD specific
+  ``eventdev_pipeline_sw_pmd`` to the more generic ``eventdev_pipeline``.
 
 * **Added Rawdev, a generic device support library.**
 
-  Rawdev library provides support for integrating any generic device type with
-  DPDK framework. Generic devices are those which do not have a pre-defined
+  The Rawdev library provides support for integrating any generic device type with
+  the DPDK framework. Generic devices are those which do not have a pre-defined
   type within DPDK, for example, ethernet, crypto, event etc.
+
   A set of northbound APIs have been defined which encompass a generic set of
   operations by allowing applications to interact with device using opaque
-  structures/buffers. Also, southbound APIs provide APIs for integrating device
+  structures/buffers. Also, southbound APIs provide a means of integrating devices
   either as as part of a physical bus (PCI, FSLMC etc) or through ``vdev``.
 
   See the :doc:`../prog_guide/rawdev` programmer's guide for more details.
 
-* **Added new multi-process communication channel**
+* **Added new multi-process communication channel.**
 
   Added a generic channel in EAL for multi-process (primary/secondary) communication.
   Consumers of this channel need to register an action with an action name to response
@@ -227,14 +223,14 @@ New Features
   * ``rte_mp_request`` is for sending a request message and will block until
     it gets a reply message which is sent from the peer by ``rte_mp_reply``.
 
-* **Add GRO support for VxLAN-tunneled packets.**
+* **Added GRO support for VxLAN-tunneled packets.**
 
-  Add GRO support for VxLAN-tunneled packets. Supported VxLAN packets
+  Added GRO support for VxLAN-tunneled packets. Supported VxLAN packets
   must contain an outer IPv4 header and inner TCP/IPv4 headers. VxLAN
   GRO doesn't check if input packets have correct checksums and doesn't
   update checksums for output packets. Additionally, it assumes the
-  packets are complete (i.e., MF==0 && frag_off==0), when IP
-  fragmentation is possible (i.e., DF==0).
+  packets are complete (i.e., ``MF==0 && frag_off==0``), when IP
+  fragmentation is possible (i.e., ``DF==0``).
 
 * **Increased default Rx and Tx ring size in sample applications.**
 
@@ -243,75 +239,19 @@ New Features
   general case. The user should experiment with various Rx and Tx ring sizes
   for their specific application to get best performance.
 
-* **Added new DPDK build system using the tools "meson" and "ninja" [EXPERIMENTAL]**
+* **Added new DPDK build system using the tools "meson" and "ninja" [EXPERIMENTAL].**
 
-  Added in support for building DPDK using ``meson`` and ``ninja``, which gives
+  Added support for building DPDK using ``meson`` and ``ninja``, which gives
   additional features, such as automatic build-time configuration, over the
   current build system using ``make``. For instructions on how to do a DPDK build
   using the new system, see the instructions in ``doc/build-sdk-meson.txt``.
 
-.. note::
-
-    This new build system support is incomplete at this point and is added
-    as experimental in this release. The existing build system using ``make``
-    is unaffected by these changes, and can continue to be used for this
-    and subsequent releases until such time as it's deprecation is announced.
-
-
-API Changes
------------
-
-.. This section should contain API changes. Sample format:
-
-   * Add a short 1-2 sentence description of the API change. Use fixed width
-     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
-     tense.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
+  .. note::
 
-ABI Changes
------------
-
-.. This section should contain ABI changes. Sample format:
-
-   * Add a short 1-2 sentence description of the ABI change that was announced
-     in the previous releases and made in this release. Use fixed width quotes
-     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-Removed Items
--------------
-
-.. This section should contain removed items in this release. Sample format:
-
-   * Add a short 1-2 sentence description of the removed item in the past
-     tense.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue in the present
-     tense. Add information on any known workarounds.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
+      This new build system support is incomplete at this point and is added
+      as experimental in this release. The existing build system using ``make``
+      is unaffected by these changes, and can continue to be used for this
+      and subsequent releases until such time as it's deprecation is announced.
 
 
 Shared Library Versions
@@ -428,10 +368,10 @@ Tested Platforms
      * Red Hat Enterprise Linux Server release 7.3
      * SUSE Enterprise Linux 12
      * Wind River Linux 8
-     * Ubantu 14.04
+     * Ubuntu 14.04
      * Ubuntu 16.04
      * Ubuntu 16.10
-     * Ubantu 17.10
+     * Ubuntu 17.10
 
    * NICs:
 
@@ -476,4 +416,3 @@ Tested Platforms
        * Firmware version: 1.63, 0x80000dda
        * Device id (pf/vf): 8086:1521 / 8086:1520
        * Driver version: 5.3.0-k (igb)
-
-- 
2.7.5

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  2018-02-14 12:32  4% [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors Shahaf Shuler
@ 2018-02-14 13:50  4% ` Thomas Monjalon
  2018-02-14 13:54  4% ` Doherty, Declan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 13:50 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: nhorman, dev, declan.doherty, mohammad.abdul.awal, ferruh.yigit,
	remy.horton

14/02/2018 13:32, Shahaf Shuler:
> This is following the RFC being discussed and targets 18.05
> 
> http://dpdk.org/ml/archives/dev/2018-January/085716.html
> 
> Cc: declan.doherty@intel.com
> Cc: mohammad.abdul.awal@intel.com
> Cc: ferruh.yigit@intel.com
> Cc: remy.horton@intel.com
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
@ 2018-02-14 12:32  4% Shahaf Shuler
  2018-02-14 13:50  4% ` Thomas Monjalon
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Shahaf Shuler @ 2018-02-14 12:32 UTC (permalink / raw)
  To: nhorman, thomas
  Cc: dev, declan.doherty, mohammad.abdul.awal, ferruh.yigit, remy.horton

This is following the RFC being discussed and targets 18.05

http://dpdk.org/ml/archives/dev/2018-January/085716.html

Cc: declan.doherty@intel.com
Cc: mohammad.abdul.awal@intel.com
Cc: ferruh.yigit@intel.com
Cc: remy.horton@intel.com

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
 doc/guides/rel_notes/deprecation.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad5988..f6151de63 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -59,3 +59,9 @@ Deprecation Notices
   be added between the producer and consumer structures. The size of the
   structure and the offset of the fields will remain the same on
   platforms with 64B cache line, but will change on other platforms.
+
+* ethdev: A work is being planned for 18.05 to expose VF port representors
+  as a mean to perform control and data path operation on the different VFs.
+  As VF representor is an ethdev port, new fields are needed in order to map
+  between the VF representor and the VF or the parent PF. Those new fields
+  are to be included in ``rte_eth_dev_info`` struct.
-- 
2.12.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1] doc: update release notes for 18.02
@ 2018-02-14 12:21  6% John McNamara
  2018-02-14 13:50  6% ` [dpdk-dev] [PATCH v2] " John McNamara
  0 siblings, 1 reply; 200+ results
From: John McNamara @ 2018-02-14 12:21 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 14409 bytes --]

Fix grammar, spelling and formatting of DPDK 18.02 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_18_02.rst | 194 +++++++++++----------------------
 1 file changed, 64 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
index 04202ba..fa41207 100644
--- a/doc/guides/rel_notes/release_18_02.rst
+++ b/doc/guides/rel_notes/release_18_02.rst
@@ -41,7 +41,7 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
-* **Add function to allow releasing internal EAL resources on exit**
+* **Added function to allow releasing internal EAL resources on exit.**
 
   During ``rte_eal_init()`` EAL allocates memory from hugepages to enable its
   core libraries to perform their tasks. The ``rte_eal_cleanup()`` function
@@ -50,32 +50,12 @@ New Features
   exiting. Not calling this function could result in leaking hugepages, leading
   to failure during initialization of secondary processes.
 
-* **Added the ixgbe ethernet driver to support RSS with flow API.**
+* **Added igb, ixgbe and i40e ethernet driver to support RSS with flow API.**
 
-  Rte_flow actually defined to include RSS, but till now, RSS is out of
-  rte_flow. This patch is to support igb and ixgbe NIC with existing RSS
-  configuration using rte_flow API.
+  Added support for igb, ixgbe and i40e NICs with existing RSS configuration
+  using the ``rte_flow`` API.
 
-* **Add MAC loopback support for i40e.**
-
-  Add MAC loopback support for i40e in order to support test task asked by
-  users. According to the device configuration, it will setup TX->RX loopback
-  link or not.
-
-* **Add the support of run time determination of number of queues per i40e VF**
-
-  The number of queue per VF is determined by its host PF. If the PCI address
-  of an i40e PF is aaaa:bb.cc, the number of queues per VF can be configured
-  with EAL parameter like -w aaaa:bb.cc,queue-num-per-vf=n. The value n can be
-  1, 2, 4, 8 or 16. If no such parameter is configured, the number of queues
-  per VF is 4 by default.
-
-* **Added the i40e ethernet driver to support RSS with flow API.**
-
-  Rte_flow actually defined to include RSS, but till now, RSS is out of
-  rte_flow. This patch is to support i40e NIC with existing RSS
-  configuration using rte_flow API.It also enable queue region configuration
-  using flow API for i40e.
+  Also enabled queue region configuration using the ``rte_flow`` API for i40e.
 
 * **Updated i40e driver to support PPPoE/PPPoL2TP.**
 
@@ -83,6 +63,20 @@ New Features
   profiles which can be programmed by dynamic device personalization (DDP)
   process.
 
+* **Added MAC loopback support for i40e.**
+
+  Added MAC loopback support for i40e in order to support test tasks requested
+  by users. It will setup ``Tx -> Rx`` loopback link according to the device
+  configuration.
+
+* **Added support of run time determination of number of queues per i40e VF.**
+
+  The number of queue per VF is determined by its host PF. If the PCI address
+  of an i40e PF is ``aaaa:bb.cc``, the number of queues per VF can be
+  configured with EAL parameter like ``-w aaaa:bb.cc,queue-num-per-vf=n``. The
+  value n can be 1, 2, 4, 8 or 16. If no such parameter is configured, the
+  number of queues per VF is 4 by default.
+
 * **Updated mlx5 driver.**
 
   Updated the mlx5 driver including the following changes:
@@ -117,16 +111,10 @@ New Features
   * Added tunneled packets classification.
   * Added inner checksum offload.
 
-* **Added the igb ethernet driver to support RSS with flow API.**
-
-  Rte_flow actually defined to include RSS, but till now, RSS is out of
-  rte_flow. This patch is to support igb NIC with existing RSS configuration
-  using rte_flow API.
-
-* **Add AVF (Adaptive Virtual Function) net PMD.**
+* **Added AVF (Adaptive Virtual Function) net PMD.**
 
-  A new net PMD has been added, which supports Intel® Ethernet Adaptive
-  Virtual Function (AVF) with features list below:
+  Added a new net PMD called AVF (Adaptive Virtual Function), which supports
+  Intel® Ethernet Adaptive Virtual Function (AVF) with features such as:
 
   * Basic Rx/Tx burst
   * SSE vectorized Rx/Tx burst
@@ -140,16 +128,16 @@ New Features
   * Rx/Tx descriptor status
   * Link status update/event
 
-* **Add feature supports for live migration from vhost-net to vhost-user.**
+* **Added feature supports for live migration from vhost-net to vhost-user.**
 
-  To make live migration from vhost-net to vhost-user possible, added
-  feature supports for vhost-user. The features include:
+  Added feature supports for vhost-user to make live migration from vhost-net
+  to vhost-user possible. The features include:
 
-  * VIRTIO_F_ANY_LAYOUT
-  * VIRTIO_F_EVENT_IDX
-  * VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_HOST_ECN
-  * VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_HOST_UFO
-  * VIRTIO_NET_F_GSO
+  * ``VIRTIO_F_ANY_LAYOUT``
+  * ``VIRTIO_F_EVENT_IDX``
+  * ``VIRTIO_NET_F_GUEST_ECN``, ``VIRTIO_NET_F_HOST_ECN``
+  * ``VIRTIO_NET_F_GUEST_UFO``, ``VIRTIO_NET_F_HOST_UFO``
+  * ``VIRTIO_NET_F_GSO``
 
 * **Updated the AESNI-MB PMD.**
 
@@ -160,62 +148,65 @@ New Features
 * **Updated the DPAA_SEC crypto driver to support rte_security.**
 
   Updated the ``dpaa_sec`` crypto PMD to support ``rte_security`` lookaside
-  protocol offload for IPSec.
+  protocol offload for IPsec.
 
 * **Added Wireless Base Band Device (bbdev) abstraction.**
 
   The Wireless Baseband Device library is an acceleration abstraction
   framework for 3gpp Layer 1 processing functions that provides a common
-  programming interface for seamless opeartion on integrated or discrete
+  programming interface for seamless operation on integrated or discrete
   hardware accelerators or using optimized software libraries for signal
   processing.
+
   The current release only supports 3GPP CRC, Turbo Coding and Rate
   Matching operations, as specified in 3GPP TS 36.212.
 
   See the :doc:`../prog_guide/bbdev` programmer's guide for more details.
 
-* **Added New eventdev OPDL PMD**
+* **Added New eventdev Ordered Packet Distribution Library (OPDL) PMD.**
 
   The OPDL (Ordered Packet Distribution Library) eventdev is a specific
   implementation of the eventdev API. It is particularly suited to packet
   processing workloads that have high throughput and low latency requirements.
   All packets follow the same path through the device. The order in which
-  packets  follow is determinted by the order in which queues are set up.
+  packets follow is determined by the order in which queues are set up.
   Events are left on the ring until they are transmitted. As a result packets
   do not go out of order.
 
-  With this change, application can use OPDL PMD by eventdev api.
+  With this change, applications can use the OPDL PMD via the eventdev api.
 
-* **Added New pipeline use case for dpdk-test-eventdev application**
+* **Added new pipeline use case for dpdk-test-eventdev application.**
 
+  Added a new "pipeline" use case for the ``dpdk-test-eventdev`` application.
   The pipeline case can be used to simulate various stages in a real world
   application from packet receive to transmit while maintaining the packet
-  ordering also measure the performance of the event device across the stages
-  of the pipeline.
+  ordering. It can also be used to measure the performance of the event device
+  across the stages of the pipeline.
 
-  The pipeline use case has been made generic to work will all the event
+  The pipeline use case has been made generic to work with all the event
   devices based on the capabilities.
 
-* **Updated Eventdev Sample application to support event devices based on capability**
+* **Updated Eventdev sample application to support event devices based on capability.**
 
-  Updated Eventdev pipeline sample application to support various types of pipelines
-  based on the capabilities of the attached event and ethernet devices. Also,
-  renamed the application from SW PMD specific ``eventdev_pipeline_sw_pmd``
-  to PMD agnostic ``eventdev_pipeline``.
+  Updated the Eventdev pipeline sample application to support various types of
+  pipelines based on the capabilities of the attached event and ethernet
+  devices. Also, renamed the application from software PMD specific
+  ``eventdev_pipeline_sw_pmd`` to the more generic ``eventdev_pipeline``.
 
 * **Added Rawdev, a generic device support library.**
 
-  Rawdev library provides support for integrating any generic device type with
-  DPDK framework. Generic devices are those which do not have a pre-defined
+  The Rawdev library provides support for integrating any generic device type with
+  the DPDK framework. Generic devices are those which do not have a pre-defined
   type within DPDK, for example, ethernet, crypto, event etc.
+
   A set of northbound APIs have been defined which encompass a generic set of
   operations by allowing applications to interact with device using opaque
-  structures/buffers. Also, southbound APIs provide APIs for integrating device
+  structures/buffers. Also, southbound APIs provide a means of integrating devices
   either as as part of a physical bus (PCI, FSLMC etc) or through ``vdev``.
 
   See the :doc:`../prog_guide/rawdev` programmer's guide for more details.
 
-* **Added new multi-process communication channel**
+* **Added new multi-process communication channel.**
 
   Added a generic channel in EAL for multi-process (primary/secondary) communication.
   Consumers of this channel need to register an action with an action name to response
@@ -227,14 +218,14 @@ New Features
   * ``rte_mp_request`` is for sending a request message and will block until
     it gets a reply message which is sent from the peer by ``rte_mp_reply``.
 
-* **Add GRO support for VxLAN-tunneled packets.**
+* **Added GRO support for VxLAN-tunneled packets.**
 
-  Add GRO support for VxLAN-tunneled packets. Supported VxLAN packets
+  Added GRO support for VxLAN-tunneled packets. Supported VxLAN packets
   must contain an outer IPv4 header and inner TCP/IPv4 headers. VxLAN
   GRO doesn't check if input packets have correct checksums and doesn't
   update checksums for output packets. Additionally, it assumes the
-  packets are complete (i.e., MF==0 && frag_off==0), when IP
-  fragmentation is possible (i.e., DF==0).
+  packets are complete (i.e., ``MF==0 && frag_off==0``), when IP
+  fragmentation is possible (i.e., ``DF==0``).
 
 * **Increased default Rx and Tx ring size in sample applications.**
 
@@ -243,75 +234,19 @@ New Features
   general case. The user should experiment with various Rx and Tx ring sizes
   for their specific application to get best performance.
 
-* **Added new DPDK build system using the tools "meson" and "ninja" [EXPERIMENTAL]**
+* **Added new DPDK build system using the tools "meson" and "ninja" [EXPERIMENTAL].**
 
-  Added in support for building DPDK using ``meson`` and ``ninja``, which gives
+  Added support for building DPDK using ``meson`` and ``ninja``, which gives
   additional features, such as automatic build-time configuration, over the
   current build system using ``make``. For instructions on how to do a DPDK build
   using the new system, see the instructions in ``doc/build-sdk-meson.txt``.
 
-.. note::
-
-    This new build system support is incomplete at this point and is added
-    as experimental in this release. The existing build system using ``make``
-    is unaffected by these changes, and can continue to be used for this
-    and subsequent releases until such time as it's deprecation is announced.
-
-
-API Changes
------------
-
-.. This section should contain API changes. Sample format:
-
-   * Add a short 1-2 sentence description of the API change. Use fixed width
-     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
-     tense.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
+  .. note::
 
-ABI Changes
------------
-
-.. This section should contain ABI changes. Sample format:
-
-   * Add a short 1-2 sentence description of the ABI change that was announced
-     in the previous releases and made in this release. Use fixed width quotes
-     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-Removed Items
--------------
-
-.. This section should contain removed items in this release. Sample format:
-
-   * Add a short 1-2 sentence description of the removed item in the past
-     tense.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue in the present
-     tense. Add information on any known workarounds.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
+      This new build system support is incomplete at this point and is added
+      as experimental in this release. The existing build system using ``make``
+      is unaffected by these changes, and can continue to be used for this
+      and subsequent releases until such time as it's deprecation is announced.
 
 
 Shared Library Versions
@@ -428,10 +363,10 @@ Tested Platforms
      * Red Hat Enterprise Linux Server release 7.3
      * SUSE Enterprise Linux 12
      * Wind River Linux 8
-     * Ubantu 14.04
+     * Ubuntu 14.04
      * Ubuntu 16.04
      * Ubuntu 16.10
-     * Ubantu 17.10
+     * Ubuntu 17.10
 
    * NICs:
 
@@ -476,4 +411,3 @@ Tested Platforms
        * Firmware version: 1.63, 0x80000dda
        * Device id (pf/vf): 8086:1521 / 8086:1520
        * Driver version: 5.3.0-k (igb)
-
-- 
2.7.5

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role function
  2018-02-14  0:09  0% ` Thomas Monjalon
@ 2018-02-14 10:59  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 10:59 UTC (permalink / raw)
  To: Erik Gabriel Carrillo; +Cc: dev, nhorman, pbhagavatula, aconole

14/02/2018 01:09, Thomas Monjalon:
> 12/01/2018 21:45, Erik Gabriel Carrillo:
> > This an API/ABI change notice for DPDK 18.05 announcing a change in
> > the meaning of the return values of the rte_lcore_has_role() function.
> > 
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> > ---
> > +* eal: The semantics of the return value for the ``rte_lcore_has_role`` function
> > +  are planned to change in v18.05. The function currently returns 0 and <0 for
> > +  success and failure, respectively.  This will change to 1 and 0 for true and
> > +  false, respectively, to make use of the function more intuitive.
> 
> It will introduce some subtle bugs in applications.
> We must clearly advertise this API change in the release notes.
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] doc: ethdev ABI change deprecation notice
  2018-02-13 13:21  4%       ` Olivier Matz
@ 2018-02-14  0:14  4%         ` Thomas Monjalon
  2018-02-14 17:18  4%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-02-14  0:14 UTC (permalink / raw)
  To: Kirill Rybalchenko
  Cc: dev, Olivier Matz, Ferruh Yigit, Neil Horman, andrey.chilikin,
	adrien.mazarguil

> > >> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> > >>
> > >> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
> > >> ---
> > >> +* ethdev: announce ABI change
> > >> +  The size of variables flow_types_mask in rte_eth_fdir_info structure,
> > >> +  sym_hash_enable_mask and valid_bit_mask in rte_eth_hash_global_conf structure
> > >> +  will be increased from 32 to 64 bits to fulfill hardware requirements.
> > >> +  This change will break existing ABI as size of the structures will increase.
> > >> +
> > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>

I would prefer you drop the legacy code to keep only rte_flow.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role function
    2018-02-13 14:37  0% ` Ferruh Yigit
@ 2018-02-14  0:09  0% ` Thomas Monjalon
  2018-02-14 10:59  0%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-02-14  0:09 UTC (permalink / raw)
  To: Erik Gabriel Carrillo; +Cc: dev, nhorman, pbhagavatula, aconole

12/01/2018 21:45, Erik Gabriel Carrillo:
> This an API/ABI change notice for DPDK 18.05 announcing a change in
> the meaning of the return values of the rte_lcore_has_role() function.
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> ---
> +* eal: The semantics of the return value for the ``rte_lcore_has_role`` function
> +  are planned to change in v18.05. The function currently returns 0 and <0 for
> +  success and failure, respectively.  This will change to 1 and 0 for true and
> +  false, respectively, to make use of the function more intuitive.

It will introduce some subtle bugs in applications.
We must clearly advertise this API change in the release notes.

Acked-by: Thomas Monjalon <thomas@monjalon.net>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal
  2018-02-09 14:42  4%       ` Bruce Richardson
@ 2018-02-14  0:04  4%         ` Thomas Monjalon
  2018-02-14 14:25  4%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-02-14  0:04 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Bruce Richardson, Jerin Jacob, Mcnamara, John, Neil Horman,
	Kovacevic, Marko

> > > > There will be a new function added in v18.05 that will return number of
> > > > detected sockets, which will change the ABI.
> > > > 
> > > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > > ---
> > > > +* eal: new ``numa_node_count`` member will be added to ``rte_config``
> > > > +structure in v18.05.
> > > 
> > > Acked-by: John McNamara <john.mcnamara@intel.com>
> > 
> > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v5] checkpatches.sh: Add checks for ABI symbol addition
  2018-02-09 15:21  6% ` [dpdk-dev] [PATCH v5] " Neil Horman
@ 2018-02-13 22:57  4%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-13 22:57 UTC (permalink / raw)
  To: Neil Horman
  Cc: dev, john.mcnamara, bruce.richardson, Ferruh Yigit, Stephen Hemminger

Hi,

I wanted to push this patch in 18.02, but when looking more closely,
I see few things to improve.
As it is a tool, there is no harm to wait one more week and push it
early in 18.05.

09/02/2018 16:21, Neil Horman:
>  check () { # <patch> <commit> <title>
> +	local reta
>  	total=$(($total + 1))
>  	! $verbose || printf '\n### %s\n\n' "$3"
>  	if [ -n "$1" ] ; then
> @@ -96,9 +100,26 @@ check () { # <patch> <commit> <title>
>  	else
>  		report=$($DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
>  	fi
> -	[ $? -ne 0 ] || return 0

You are removing the return, so the report will be always printed.
You must print the report only in case of error.

> +	reta=$?
> +
>  	$verbose || printf '\n### %s\n\n' "$3"
>  	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
> +
> +	! $verbose || echo
> +	! $verbose || echo "Checking API additions/removals:"

You can use printf to combine these lines.

> +
> +	if [ -n "$1" ] ; then
> +		report=$($VALIDATE_NEW_API $1)

Beware of spaces in file names: use quoted "$1".

> +	elif [ -n "$2" ] ; then
> +		report=$(git format-patch \
> +			 --find-renames --no-stat --stdout -1 $commit |
> +			$VALIDATE_NEW_API -)
> +	else
> +		report=$($VALIDATE_NEW_API -)

So your script supports "-" for stdin? Nice

> +	fi
> +	[ $? -ne 0 -o $reta -ne 0 ] || return 0

Suggestion of more explicit variable naming:
$reta -> style_result
$? -> symbol_result

> +	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'

Wrong copy/paste: the sed is useless for the API report.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] eal: fix rte_errno values for IPC API
  @ 2018-02-13 15:39  3%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-02-13 15:39 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: Thomas Monjalon, Burakov, Anatoly, dev, Tan, Jianfeng

On Tue, Feb 13, 2018 at 02:16:08PM +0000, Van Haaren, Harry wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Tuesday, February 13, 2018 1:51 PM
> > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH] eal: fix rte_errno values for IPC API
> > 
> > > > rte_errno values should not be negative.
> > > >
> > > > Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
> > > > Fixes: 783b6e54971d ("eal: add synchronous multi-process communication")
> > > > Cc: jianfeng.tan@intel.com
> > > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > >
> > > Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
> > >
> > > Thanks for fixing this.
> > 
> > Applied, thanks
> > 
> > There are a lot of similar issues:
> > 
> > git grep -l 'rte_errno = -E' | sed 's,[^/]*$,,' | sort -u
> > 
> > 	drivers/event/opdl/
> > 	drivers/event/sw/
> <snip>
> > 	lib/librte_eventdev/
> 
> 
> I just checked the eventdev.h port_link() docs, which indicate negative return values.
> Perhaps the header is wrong too - but the PMDs adhere to the library header in this case.
> 
> Is there a requirement for rte_errno to be positive?
> It looks to be declared as per-lcore signed int in rte_errno.h +20
>
I think I wrote that part of the documentation, and it never crossed my
mind that people would set rte_errno to negative values, given how
errno from system calls are always positive. However, I think this
omission should be rectified, and we should enforce having rte_errno
values as positive.

> Either-way, if we want to change the PMDs, we should change the Eventdev APIs,
> which means API breakage, and application changes to handle changed return values.
> 
> Sound like more work than it is worth it to me?

I would view it as restoring sanity (or balance to the force if you
prefer! :-) ), so I'd definitely be ok with an ABI break to do that.

/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role function
  2018-02-13 14:43  0%   ` Van Haaren, Harry
@ 2018-02-13 14:47  0%     ` Pavan Nikhilesh
  0 siblings, 0 replies; 200+ results
From: Pavan Nikhilesh @ 2018-02-13 14:47 UTC (permalink / raw)
  To: Van Haaren, Harry, aconole, thomas, nhorman, Carrillo, Erik G,
	Yigit, Ferruh
  Cc: dev

On Tue, Feb 13, 2018 at 02:43:39PM +0000, Van Haaren, Harry wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> > Sent: Tuesday, February 13, 2018 2:38 PM
> > To: Carrillo, Erik G <erik.g.carrillo@intel.com>; nhorman@tuxdriver.com
> > Cc: dev@dpdk.org; pbhagavatula@caviumnetworks.com; aconole@redhat.com;
> > thomas@monjalon.net
> > Subject: Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role
> > function
> >
> > On 1/12/2018 8:45 PM, Erik Gabriel Carrillo wrote:
> > > This an API/ABI change notice for DPDK 18.05 announcing a change in
> > > the meaning of the return values of the rte_lcore_has_role() function.
> > >
> > > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> >
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>
>
> Ah yes, lets make the return be 1 if the correct RTE_ROLE is probed - makes sense.
>
> @Pavan, as original author of code, do you have an Ack for this? :)
>
>
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role function
  2018-02-13 14:37  0% ` Ferruh Yigit
@ 2018-02-13 14:43  0%   ` Van Haaren, Harry
  2018-02-13 14:47  0%     ` Pavan Nikhilesh
  0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-02-13 14:43 UTC (permalink / raw)
  To: pbhagavatula
  Cc: dev, aconole, thomas, nhorman, Carrillo, Erik G, Yigit, Ferruh

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Tuesday, February 13, 2018 2:38 PM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>; nhorman@tuxdriver.com
> Cc: dev@dpdk.org; pbhagavatula@caviumnetworks.com; aconole@redhat.com;
> thomas@monjalon.net
> Subject: Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role
> function
> 
> On 1/12/2018 8:45 PM, Erik Gabriel Carrillo wrote:
> > This an API/ABI change notice for DPDK 18.05 announcing a change in
> > the meaning of the return values of the rte_lcore_has_role() function.
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>


Ah yes, lets make the return be 1 if the correct RTE_ROLE is probed - makes sense.

@Pavan, as original author of code, do you have an Ack for this? :)


Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role function
  @ 2018-02-13 14:37  0% ` Ferruh Yigit
  2018-02-13 14:43  0%   ` Van Haaren, Harry
  2018-02-14  0:09  0% ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-02-13 14:37 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, nhorman; +Cc: dev, pbhagavatula, aconole, thomas

On 1/12/2018 8:45 PM, Erik Gabriel Carrillo wrote:
> This an API/ABI change notice for DPDK 18.05 announcing a change in
> the meaning of the return values of the rte_lcore_has_role() function.
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] doc: ethdev ABI change deprecation notice
  2018-02-13 12:09  4%     ` Ferruh Yigit
@ 2018-02-13 13:21  4%       ` Olivier Matz
  2018-02-14  0:14  4%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-02-13 13:21 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Neil Horman, Kirill Rybalchenko, dev, andrey.chilikin, thomas

On Tue, Feb 13, 2018 at 12:09:19PM +0000, Ferruh Yigit wrote:
> On 1/12/2018 2:38 PM, Neil Horman wrote:
> > On Fri, Jan 12, 2018 at 10:29:46AM +0000, Kirill Rybalchenko wrote:
> >> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> >>
> >> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> >> index 13e8543..aaf306a 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -45,6 +45,12 @@ Deprecation Notices
> >>    Target release for removal of the legacy API will be defined once most
> >>    PMDs have switched to rte_flow.
> >>  
> >> +* ethdev: announce ABI change
> >> +  The size of variables flow_types_mask in rte_eth_fdir_info structure,
> >> +  sym_hash_enable_mask and valid_bit_mask in rte_eth_hash_global_conf structure
> >> +  will be increased from 32 to 64 bits to fulfill hardware requirements.
> >> +  This change will break existing ABI as size of the structures will increase.
> >> +
> >>  * i40e: The default flexible payload configuration which extracts the first 16
> >>    bytes of the payload for RSS will be deprecated starting from 18.02. If
> >>    required the previous behavior can be configured using existing flow
> >> -- 
> >> 2.5.5
> >>
> >>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure
  2018-02-13 11:27  4%     ` Ferruh Yigit
@ 2018-02-13 12:10  4%       ` Jerin Jacob
  2018-02-14 16:28  4%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-02-13 12:10 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Shahaf Shuler, Xueming(Steven) Li, Thomas Monjalon, Neil Horman, dev

-----Original Message-----
> Date: Tue, 13 Feb 2018 11:27:34 +0000
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> To: Shahaf Shuler <shahafs@mellanox.com>, "Xueming(Steven) Li"
>  <xuemingl@mellanox.com>, Thomas Monjalon <thomas@monjalon.net>, Neil
>  Horman <nhorman@tuxdriver.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS
>  configuration structure
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.6.0
> 
> On 2/13/2018 6:52 AM, Shahaf Shuler wrote:
> > Tuesday, February 6, 2018 9:39 AM, Xueming Li:
> >> Subject: [PATCH v2] doc: announce ABI change for RSS configuration
> >> structure
> >>
> >> Update deprecation notice for the new rss_level field of rte_eth_rss_conf.
> >>
> >> Link: http://www.dpdk.org/dev/patchwork/patch/31891
> >>
> >> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index d59ad5988..4bfce3bd7 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -59,3 +59,7 @@ Deprecation Notices
> >>    be added between the producer and consumer structures. The size of the
> >>    structure and the offset of the fields will remain the same on
> >>    platforms with 64B cache line, but will change on other platforms.
> >> +
> >> +* ethdev: A new rss level field planned in 18.05.
> >> +  The new API add rss_level field to ``rte_eth_rss_conf`` to enable a
> >> +choice
> >> +  of RSS hash calculation on outer or inner header of tunneled packet.
> > 
> > Acked-By: Shahaf Shuler <shahafs@mellanox.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] doc: ethdev ABI change deprecation notice
  @ 2018-02-13 12:09  4%     ` Ferruh Yigit
  2018-02-13 13:21  4%       ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-02-13 12:09 UTC (permalink / raw)
  To: Neil Horman, Kirill Rybalchenko; +Cc: dev, andrey.chilikin, thomas

On 1/12/2018 2:38 PM, Neil Horman wrote:
> On Fri, Jan 12, 2018 at 10:29:46AM +0000, Kirill Rybalchenko wrote:
>> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
>>
>> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index 13e8543..aaf306a 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -45,6 +45,12 @@ Deprecation Notices
>>    Target release for removal of the legacy API will be defined once most
>>    PMDs have switched to rte_flow.
>>  
>> +* ethdev: announce ABI change
>> +  The size of variables flow_types_mask in rte_eth_fdir_info structure,
>> +  sym_hash_enable_mask and valid_bit_mask in rte_eth_hash_global_conf structure
>> +  will be increased from 32 to 64 bits to fulfill hardware requirements.
>> +  This change will break existing ABI as size of the structures will increase.
>> +
>>  * i40e: The default flexible payload configuration which extracts the first 16
>>    bytes of the payload for RSS will be deprecated starting from 18.02. If
>>    required the previous behavior can be configured using existing flow
>> -- 
>> 2.5.5
>>
>>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices
  2018-01-30 12:14  7% ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices Pablo de Lara
  2018-01-30 12:14  4%   ` [dpdk-dev] [PATCH v2 1/3] doc: announce ABI change for crypto info struct Pablo de Lara
@ 2018-02-13 11:45  4%   ` De Lara Guarch, Pablo
  1 sibling, 0 replies; 200+ results
From: De Lara Guarch, Pablo @ 2018-02-13 11:45 UTC (permalink / raw)
  To: akhil.goyal, hemant.agrawal, Doherty, Declan, jerin.jacob, Trahe,
	Fiona, Griffin, John, Jain, Deepak K, jck, tdu, dima, nsamsono,
	jianbo.liu
  Cc: dev



> -----Original Message-----
> From: De Lara Guarch, Pablo
> Sent: Tuesday, January 30, 2018 12:14 PM
> To: akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty, Declan
> <declan.doherty@intel.com>; jerin.jacob@caviumnetworks.com; Trahe,
> Fiona <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>; Jain,
> Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
> jianbo.liu@arm.com
> Cc: dev@dpdk.org; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: [PATCH v2 0/3] Cryptodev API/ABI deprecation notices
> 
> v2:
> - Added an extra deprecation announcement
> - Bonded the other two deprecation notices with the new one in a
>   patchset
> 
> Pablo de Lara (3):
>   doc: announce ABI change for crypto info struct
>   doc: announce deprecation for attach/detach crypto session
>   doc: announce deprecation in crypto queue pair start/stop
> 
>  doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
> lib/librte_cryptodev/rte_cryptodev.h |  4 ++++
>  2 files changed, 19 insertions(+)
> 
> --
> 2.14.3

Deferring this to 18.05, so we could discuss a replacement for pci_dev structure
in the cryptodev info structure, also needed for ethdev.

Pablo

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: remove eal API for default mempool ops name
  2018-02-02 14:01  0%     ` Olivier Matz
@ 2018-02-13 11:28  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-02-13 11:28 UTC (permalink / raw)
  To: Olivier Matz, Hemant Agrawal
  Cc: thomas, pbhagavatula, nipun.gupta, jerin.jacob, santosh.shukla, dev

On 2/2/2018 2:01 PM, Olivier Matz wrote:
> On Fri, Feb 02, 2018 at 02:01:42PM +0530, Hemant Agrawal wrote:
>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> ---
>> v2: fix checkpatch errors
>>
>>  doc/guides/rel_notes/deprecation.rst | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index d59ad59..c7d8f25 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -8,6 +8,15 @@ API and ABI deprecation notices are to be posted here.
>>  Deprecation Notices
>>  -------------------
>>  
>> +* eal: a new set of mbuf mempool ops name APIs for user, platform and best
>> +  mempool names have been defined in ``rte_mbuf`` in v18.02. The uses of
>> +  ``rte_eal_mbuf_default_mempool_ops`` shall be replaced by
>> +  ``rte_mbuf_best_mempool_ops``.
>> +  The following function is now redundant and it is target to be deprecated in
>> +  18.05:
>> +
>> +  - ``rte_eal_mbuf_default_mempool_ops``
>> +
>>  * eal: several API and ABI changes are planned for ``rte_devargs`` in v18.02.
>>    The format of device command line parameters will change. The bus will need
>>    to be explicitly stated in the device declaration. The enum ``rte_devtype``
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure
  2018-02-13  6:52  4%   ` Shahaf Shuler
@ 2018-02-13 11:27  4%     ` Ferruh Yigit
  2018-02-13 12:10  4%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-02-13 11:27 UTC (permalink / raw)
  To: Shahaf Shuler, Xueming(Steven) Li, Thomas Monjalon, Neil Horman; +Cc: dev

On 2/13/2018 6:52 AM, Shahaf Shuler wrote:
> Tuesday, February 6, 2018 9:39 AM, Xueming Li:
>> Subject: [PATCH v2] doc: announce ABI change for RSS configuration
>> structure
>>
>> Update deprecation notice for the new rss_level field of rte_eth_rss_conf.
>>
>> Link: http://www.dpdk.org/dev/patchwork/patch/31891
>>
>> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index d59ad5988..4bfce3bd7 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -59,3 +59,7 @@ Deprecation Notices
>>    be added between the producer and consumer structures. The size of the
>>    structure and the offset of the fields will remain the same on
>>    platforms with 64B cache line, but will change on other platforms.
>> +
>> +* ethdev: A new rss level field planned in 18.05.
>> +  The new API add rss_level field to ``rte_eth_rss_conf`` to enable a
>> +choice
>> +  of RSS hash calculation on outer or inner header of tunneled packet.
> 
> Acked-By: Shahaf Shuler <shahafs@mellanox.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure
  2018-02-06  7:38  4% ` [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure Xueming Li
@ 2018-02-13  6:52  4%   ` Shahaf Shuler
  2018-02-13 11:27  4%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Shahaf Shuler @ 2018-02-13  6:52 UTC (permalink / raw)
  To: Xueming(Steven) Li, Thomas Monjalon, Neil Horman; +Cc: Xueming(Steven) Li, dev

Tuesday, February 6, 2018 9:39 AM, Xueming Li:
> Subject: [PATCH v2] doc: announce ABI change for RSS configuration
> structure
> 
> Update deprecation notice for the new rss_level field of rte_eth_rss_conf.
> 
> Link: http://www.dpdk.org/dev/patchwork/patch/31891
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index d59ad5988..4bfce3bd7 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -59,3 +59,7 @@ Deprecation Notices
>    be added between the producer and consumer structures. The size of the
>    structure and the offset of the fields will remain the same on
>    platforms with 64B cache line, but will change on other platforms.
> +
> +* ethdev: A new rss level field planned in 18.05.
> +  The new API add rss_level field to ``rte_eth_rss_conf`` to enable a
> +choice
> +  of RSS hash calculation on outer or inner header of tunneled packet.

Acked-By: Shahaf Shuler <shahafs@mellanox.com>

> --
> 2.13.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory hotplug changes
    2018-02-05 11:47  0%   ` Bruce Richardson
  2018-02-12 15:58  0%   ` Jonas Pfefferle
@ 2018-02-13  0:24  0%   ` Yongseok Koh
  2 siblings, 0 replies; 200+ results
From: Yongseok Koh @ 2018-02-13  0:24 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Neil Horman, John McNamara, Marko Kovacevic


> On Jan 18, 2018, at 2:32 AM, Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> 
> Due to coming changes outlined in memory hotplug RFC, there will
> be several API/ABI changes.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
Acked-by: Yongseok Koh <yskoh@mellanox.com>
 
Thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal
    @ 2018-02-12 16:00  4%   ` Jonas Pfefferle
  1 sibling, 0 replies; 200+ results
From: Jonas Pfefferle @ 2018-02-12 16:00 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: Neil Horman, John McNamara, Marko Kovacevic

On Tue, 16 Jan 2018 17:53:40 +0000
  Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> There will be a new function added in v18.05 that will return
> number of detected sockets, which will change the ABI.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/deprecation.rst | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst 
>b/doc/guides/rel_notes/deprecation.rst
> index 13e8543..9662150 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,8 @@ API and ABI deprecation notices are to be posted 
>here.
> Deprecation Notices
> -------------------
> 
> +* eal: new ``numa_node_count`` member will be added to 
>``rte_config`` structure
> +  in v18.05.
> * eal: several API and ABI changes are planned for ``rte_devargs`` 
>in v18.02.
>   The format of device command line parameters will change. The bus 
>will need
>   to be explicitly stated in the device declaration. The enum 
>``rte_devtype``
> -- 
> 2.7.4

Acked-by: Jonas Pfefferle <pepperjo@japf.ch>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory hotplug changes
    2018-02-05 11:47  0%   ` Bruce Richardson
@ 2018-02-12 15:58  0%   ` Jonas Pfefferle
  2018-02-13  0:24  0%   ` Yongseok Koh
  2 siblings, 0 replies; 200+ results
From: Jonas Pfefferle @ 2018-02-12 15:58 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: Neil Horman, John McNamara, Marko Kovacevic

On Thu, 18 Jan 2018 10:32:28 +0000
  Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> Due to coming changes outlined in memory hotplug RFC, there will
> be several API/ABI changes.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>    Patch outlining future changes:
>    http://dpdk.org/dev/patchwork/patch/32467/
>    
>    v2: added rte_mem_config and rte_memzone changes to the 
>announcement
> 
> doc/guides/rel_notes/deprecation.rst | 9 +++++++++
> 1 file changed, 9 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst 
>b/doc/guides/rel_notes/deprecation.rst
> index 13e8543..93cbeea 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,15 @@ API and ABI deprecation notices are to be posted 
>here.
> Deprecation Notices
> -------------------
> 
> +* eal: due to internal data layoyut reorganization, there will be 
>changes to
> +  several structures and functions as a result of coming changes to 
>support
> +  memory hotplug in v18.05.
> +  ``rte_eal_get_physmem_layout`` will be deprecated and removed in 
>subsequent
> +  releases.
> +  ``rte_mem_config`` contents will change due to switch to memseg 
>lists.
> +  ``rte_memzone`` member ``memseg_id`` will no longer serve any 
>useful purpose
> +  and will be removed.
> +
> * eal: several API and ABI changes are planned for ``rte_devargs`` 
>in v18.02.
>   The format of device command line parameters will change. The bus 
>will need
>   to be explicitly stated in the device declaration. The enum 
>``rte_devtype``
> -- 
> 2.7.4

Acked-by: Jonas Pfefferle <pepperjo@japf.ch>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v5] checkpatches.sh: Add checks for ABI symbol addition
    2018-01-31 17:27  6% ` [dpdk-dev] [PATCH v3] " Neil Horman
  2018-02-05 17:29  6% ` [dpdk-dev] [PATCH v4] " Neil Horman
@ 2018-02-09 15:21  6% ` Neil Horman
  2018-02-13 22:57  4%   ` Thomas Monjalon
  2018-02-14 19:19  6% ` [dpdk-dev] [PATCH v6] " Neil Horman
  3 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-02-09 15:21 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, thomas, john.mcnamara, bruce.richardson,
	Ferruh Yigit, Stephen Hemminger

Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental.  The addition of these markers is
dependent on the manual addition of exported symbols to the EXPERIMENTAL
section of the corresponding libraries version map file.  The consensus
on review is that, in addition to mandating the addition of symbols to
the EXPERIMENTAL version in the map, we need a mechanism to enforce our
documented process of mandating that addition when they are introduced.
To that end, I am proposing this change.  It is an addition to the
checkpatches script, which scan incoming patches for additions and
removals of symbols to the map file, and warns the user appropriately

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: thomas@monjalon.net
CC: john.mcnamara@intel.com
CC: bruce.richardson@intel.com
CC: Ferruh Yigit <ferruh.yigit@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>

---
Change notes

v2)
 * Cleaned up and documented awk script (shemminger)
 * fixed sort/uniq usage (shemminger)
 * moved checking to new script (tmonjalon)
 * added maintainer entry (tmonjalon)
 * added license (tmonjalon)

v3)
 * Changed symbol check script name (tmonjalon)
 * Trapped exit to clean temp file (tmonjalon)
 * Honored verbose command (tmonjalon)
 * Cleaned left over debug bits (tmonjalon)
 * Updated location in MAINTAINERS file (tmonjalon)

v4)
 * Updated maintainers file (tmonjalon)

v5)
 * undo V4 (tmojalon)
---
 MAINTAINERS                     |   1 +
 devtools/check-symbol-change.sh | 146 ++++++++++++++++++++++++++++++++++++++++
 devtools/checkpatches.sh        |  23 ++++++-
 3 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100755 devtools/check-symbol-change.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index acd056134..d9d2abff8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -86,6 +86,7 @@ M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: devtools/validate-abi.sh
+F: devtools/check-symbol-change.sh
 F: buildtools/check-experimental-syms.sh
 
 Driver information
diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
new file mode 100755
index 000000000..22b17e6f2
--- /dev/null
+++ b/devtools/check-symbol-change.sh
@@ -0,0 +1,146 @@
+#!/bin/sh
+
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
+
+build_map_changes()
+{
+	local fname=$1
+	local mapdb=$2
+
+	cat $fname | filterdiff -i *.map | awk '
+		# Initialize our variables
+		BEGIN {map="";sym="";ar="";sec=""; in_sec=0}
+
+		# Anything that starts with + or -, followed by an a
+		# and ends in the string .map is the name of our map file
+		# This may appear multiple times in a patch if multiple
+		# map files are altered, and all section/symbol names
+		# appearing between a triggering of this rule and the
+		# next trigger of this rule are associated with this file
+		/[-+] a\/.*\.map/ {map=$2}
+
+		# Triggering this rule, which starts a line with a + and ends it
+		# with a { identifies a versioned section.  The section name is
+		# the rest of the line with the + and { symbols remvoed.
+		# Triggering this rule sets in_sec to 1, which actives the
+		# symbol rule below
+		/+.*{/ {gsub("+","");sec=$1; in_sec=1}
+
+		# This rule idenfies the end of a section, and disables the
+		# symbol rule
+		/.*}/ {in_sec=0}
+
+		# This rule matches on a + followed by any characters except a :
+		# (which denotes a global vs local segment), and ends with a ;.
+		# The semicolon is removed and the symbol is printed with its
+		# association file name and version section, along with an
+		# indicator that the symbol is a new addition.  Note this rule
+		# only works if we have found a version section in the rule
+		# above (hence the in_sec check).  Otherwise we flag it as an
+		# unknown section
+		/^+[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " add"
+			} else {
+				print map " " sym " unknown add"
+			}
+		}
+
+		# This is the same rule as above, but the rule matches on a
+		# leading - rather than a +, denoting that the symbol is being
+		# removed.
+		/^-[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " del"
+			} else {
+				print map " " sym " unknown del"
+			}
+		}' > ./$mapdb
+
+		sort -u $mapdb > ./$mapdb.2
+		mv -f $mapdb.2 $mapdb
+
+}
+
+check_for_rule_violations()
+{
+	local mapdb=$1
+	local mname
+	local symname
+	local secname
+	local ar
+	local ret=0
+
+	while read mname symname secname ar
+	do
+		if [ "$ar" == "add" ]
+		then
+
+			if [ "$secname" == "unknown" ]
+			then
+				# Just inform the user of this occurrence, but
+				# don't flag it as an error
+				echo -n "INFO: symbol $syname is added but "
+				echo -n "patch has insuficient context "
+				echo -n "to determine the section name "
+				echo -n "please ensure the version is "
+				echo "EXPERIMENTAL"
+				continue
+			fi
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Symbols that are getting added in a section
+				# other ithan the experimental section
+				# to be moving from an already supported
+				# section or its a violation
+				grep -q \
+				"$mname $symname [^EXPERIMENTAL] del" $mapdb
+				if [ $? -ne 0 ]
+				then
+					echo -n "ERROR: symbol $symname "
+					echo -n "is added in a section "
+					echo -n "other than the EXPERIMENTAL "
+					echo "section of the version map"
+					ret=1
+				fi
+			fi
+		else
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Just inform users that non-experimenal
+				# symbols need to go through a deprecation
+				# process
+				echo -n "INFO: symbol $symname is being "
+				echo -n "removed, ensure that it has "
+				echo "gone through the deprecation process"
+			fi
+		fi
+	done < $mapdb
+
+	return $ret
+}
+
+trap clean_and_exit_on_sig EXIT
+
+mapfile=`mktemp mapdb.XXXXXX`
+patch=$1
+exit_code=1
+
+clean_and_exit_on_sig()
+{
+	rm -f $mapfile
+	exit $exit_code
+}
+
+build_map_changes $patch $mapfile
+check_for_rule_violations $mapfile
+exit_code=$?
+
+rm -f $mapfile
+
+exit $exit_code
+
+
diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 7676a6b50..0b2b5f039 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -35,6 +35,8 @@
 # - DPDK_CHECKPATCH_LINE_LENGTH
 . $(dirname $(readlink -e $0))/load-devel-config
 
+VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh
+
 length=${DPDK_CHECKPATCH_LINE_LENGTH:-80}
 
 # override default Linux options
@@ -61,6 +63,7 @@ print_usage () {
 	END_OF_HELP
 }
 
+
 number=0
 quiet=false
 verbose=false
@@ -86,6 +89,7 @@ total=0
 status=0
 
 check () { # <patch> <commit> <title>
+	local reta
 	total=$(($total + 1))
 	! $verbose || printf '\n### %s\n\n' "$3"
 	if [ -n "$1" ] ; then
@@ -96,9 +100,26 @@ check () { # <patch> <commit> <title>
 	else
 		report=$($DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
 	fi
-	[ $? -ne 0 ] || return 0
+	reta=$?
+
 	$verbose || printf '\n### %s\n\n' "$3"
 	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+
+	! $verbose || echo
+	! $verbose || echo "Checking API additions/removals:"
+
+	if [ -n "$1" ] ; then
+		report=$($VALIDATE_NEW_API $1)
+	elif [ -n "$2" ] ; then
+		report=$(git format-patch \
+			 --find-renames --no-stat --stdout -1 $commit |
+			$VALIDATE_NEW_API -)
+	else
+		report=$($VALIDATE_NEW_API -)
+	fi
+	[ $? -ne 0 -o $reta -ne 0 ] || return 0
+	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+
 	status=$(($status + 1))
 }
 
-- 
2.14.3

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal
  2018-02-07 10:10  4%     ` Jerin Jacob
@ 2018-02-09 14:42  4%       ` Bruce Richardson
  2018-02-14  0:04  4%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-02-09 14:42 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Mcnamara, John, Burakov, Anatoly, dev, Neil Horman, Kovacevic, Marko

On Wed, Feb 07, 2018 at 03:40:20PM +0530, Jerin Jacob wrote:
> -----Original Message-----
> > Date: Tue, 23 Jan 2018 10:39:58 +0000
> > From: "Mcnamara, John" <john.mcnamara@intel.com>
> > To: "Burakov, Anatoly" <anatoly.burakov@intel.com>, "dev@dpdk.org"
> >  <dev@dpdk.org>
> > CC: Neil Horman <nhorman@tuxdriver.com>, "Kovacevic, Marko"
> >  <marko.kovacevic@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH] doc: add ABI change notice for
> >  numa_node_count in eal
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Burakov, Anatoly
> > > Sent: Tuesday, January 16, 2018 5:54 PM
> > > To: dev@dpdk.org
> > > Cc: Neil Horman <nhorman@tuxdriver.com>; Mcnamara, John
> > > <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>
> > > Subject: [PATCH] doc: add ABI change notice for numa_node_count in eal
> > > 
> > > There will be a new function added in v18.05 that will return number of
> > > detected sockets, which will change the ABI.
> > > 
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > >  doc/guides/rel_notes/deprecation.rst | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index 13e8543..9662150 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -8,6 +8,8 @@ API and ABI deprecation notices are to be posted here.
> > >  Deprecation Notices
> > >  -------------------
> > > 
> > > +* eal: new ``numa_node_count`` member will be added to ``rte_config``
> > > +structure in v18.05.
> > >  * eal: several API and ABI changes are planned for ``rte_devargs`` in  v18.02.
> > 
> > In general it is best to leave a blank line between the bullet points. But that
> > doesn't affect the rendering so:
> > 
> > Acked-by: John McNamara <john.mcnamara@intel.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory hotplug changes
  2018-02-05 11:47  0%   ` Bruce Richardson
@ 2018-02-07 10:11  0%     ` Jerin Jacob
  2018-02-14 14:48  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-02-07 10:11 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Anatoly Burakov, dev, Neil Horman, John McNamara, Marko Kovacevic

-----Original Message-----
> Date: Mon, 5 Feb 2018 11:47:42 +0000
> From: Bruce Richardson <bruce.richardson@intel.com>
> To: Anatoly Burakov <anatoly.burakov@intel.com>
> CC: dev@dpdk.org, Neil Horman <nhorman@tuxdriver.com>, John McNamara
>  <john.mcnamara@intel.com>, Marko Kovacevic <marko.kovacevic@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory
>  hotplug changes
> User-Agent: Mutt/1.9.1 (2017-09-22)
> 
> On Thu, Jan 18, 2018 at 10:32:28AM +0000, Anatoly Burakov wrote:
> > Due to coming changes outlined in memory hotplug RFC, there will
> > be several API/ABI changes.
> > 
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal
  @ 2018-02-07 10:10  4%     ` Jerin Jacob
  2018-02-09 14:42  4%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-02-07 10:10 UTC (permalink / raw)
  To: Mcnamara, John; +Cc: Burakov, Anatoly, dev, Neil Horman, Kovacevic, Marko

-----Original Message-----
> Date: Tue, 23 Jan 2018 10:39:58 +0000
> From: "Mcnamara, John" <john.mcnamara@intel.com>
> To: "Burakov, Anatoly" <anatoly.burakov@intel.com>, "dev@dpdk.org"
>  <dev@dpdk.org>
> CC: Neil Horman <nhorman@tuxdriver.com>, "Kovacevic, Marko"
>  <marko.kovacevic@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: add ABI change notice for
>  numa_node_count in eal
> 
> 
> 
> > -----Original Message-----
> > From: Burakov, Anatoly
> > Sent: Tuesday, January 16, 2018 5:54 PM
> > To: dev@dpdk.org
> > Cc: Neil Horman <nhorman@tuxdriver.com>; Mcnamara, John
> > <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>
> > Subject: [PATCH] doc: add ABI change notice for numa_node_count in eal
> > 
> > There will be a new function added in v18.05 that will return number of
> > detected sockets, which will change the ABI.
> > 
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index 13e8543..9662150 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -8,6 +8,8 @@ API and ABI deprecation notices are to be posted here.
> >  Deprecation Notices
> >  -------------------
> > 
> > +* eal: new ``numa_node_count`` member will be added to ``rte_config``
> > +structure in v18.05.
> >  * eal: several API and ABI changes are planned for ``rte_devargs`` in  v18.02.
> 
> In general it is best to leave a blank line between the bullet points. But that
> doesn't affect the rendering so:
> 
> Acked-by: John McNamara <john.mcnamara@intel.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>


> 
> 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-02-05 16:37  8%   ` [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
  2018-02-05 17:39  3%     ` Burakov, Anatoly
  2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] Add " Anatoly Burakov
@ 2018-02-07  9:58  5%     ` Anatoly Burakov
  2018-03-08 12:12  3%       ` Bruce Richardson
  2018-03-21  4:59  0%       ` gowrishankar muthukrishnan
  2018-03-22 10:58  4%     ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
  3 siblings, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-02-07  9:58 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

During lcore scan, find maximum socket ID and store it. This will
break the ABI, so bump ABI version.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
 lib/librte_eal/common/include/rte_eal.h   |  1 +
 lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/rte_eal_version.map        |  9 +++++++-
 6 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..827ddeb 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, max_socket_id = 0;
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		max_socket_id = RTE_MAX(max_socket_id, socket_id);
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	config->numa_node_count = max_socket_id + 1;
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int
+rte_num_sockets(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 08c6637..63fcc2e 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,7 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index d84bcff..ddf4c64 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -120,6 +120,14 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets on the system.
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ *
+ */
+unsigned int rte_num_sockets(void);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 4146907..fc83e74 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -211,6 +211,13 @@ DPDK_18.02 {
 
 }  DPDK_17.11;
 
+DPDK_18.05 {
+	global:
+
+	rte_num_sockets;
+
+} DPDK_18.02;
+
 EXPERIMENTAL {
 	global:
 
@@ -255,4 +262,4 @@ EXPERIMENTAL {
 	rte_service_set_stats_enable;
 	rte_service_start_with_defaults;
 
-} DPDK_18.02;
+} DPDK_18.05;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH 18.05 v4] Add function to return number of detected sockets
  2018-02-05 16:37  8%   ` [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
  2018-02-05 17:39  3%     ` Burakov, Anatoly
@ 2018-02-07  9:58  5%     ` Anatoly Burakov
  2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
  2018-03-22 10:58  4%     ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
  3 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-02-07  9:58 UTC (permalink / raw)
  To: dev

This patch is for 18.05 and implements changes referenced
in the deprecation notice[1]. (not yet merged as of this
writing)

This patchset breaks the EAL ABI and bumps its version. This
is arguably OK as memory changes will change a lot in EAL and
thus likely break ABI anyway. However, two other alternative
implementations are possible:

1) local static variable recording number of detected
   sockets. This is arguably less clean approach, but will not
   break the ABI and will have relatively little impact on the
   codebase.

2) keeping ABI compatibility, as shown in v3 of this patch [2].

My preference would be to keep this one.

[1] http://dpdk.org/dev/patchwork/patch/33853/
[2] http://dpdk.org/dev/patchwork/patch/34994/

Anatoly Burakov (1):
  eal: add function to return number of detected sockets

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
 lib/librte_eal/common/include/rte_eal.h   |  1 +
 lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/rte_eal_version.map        |  9 +++++++-
 6 files changed, 44 insertions(+), 15 deletions(-)

-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v1] doc: update deprecation notice of rte_devargs
@ 2018-02-07  9:26 11% Gaetan Rivet
  0 siblings, 0 replies; 200+ results
From: Gaetan Rivet @ 2018-02-07  9:26 UTC (permalink / raw)
  To: dev; +Cc: Gaetan Rivet

The declaration and identification of devices will change in v18.05.

Remove the precedent deprecation notice

Add new one reflecting the planned changes more accurately,
updated for v18.05.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad5988..07312f59a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,18 +8,23 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: several API and ABI changes are planned for ``rte_devargs`` in v18.02.
-  The format of device command line parameters will change. The bus will need
-  to be explicitly stated in the device declaration. The enum ``rte_devtype``
-  was used to identify a bus and will disappear.
-  The structure ``rte_devargs`` will change.
-  The ``rte_devargs_list`` will be made private.
-  The following functions are deprecated starting from 17.08 and will either be
-  modified or removed in 18.02:
+* eal: both declaring and identifying devices will be streamlined in v18.05.
+  New functions will appear to query a specific port from buses, classes of
+  device and device drivers. Device declaration will be made coherent with the
+  new scheme of device identification.
+  As such, ``rte_devargs`` device representation will change.
 
-  - ``rte_eal_devargs_add``
-  - ``rte_eal_devargs_type_count``
-  - ``rte_eal_parse_devargs_str``, replaced by ``rte_eal_devargs_parse``
+  - removal of ``name`` and ``args`` fields.
+  - The enum ``rte_devtype`` was used to identify a bus and will disappear.
+  - The ``rte_devargs_list`` will be made private.
+  - Functions previously deprecated will change or disappear:
+
+    + ``rte_eal_devargs_add``
+    + ``rte_eal_devargs_type_count``
+    + ``rte_eal_parse_devargs_str``, replaced by ``rte_eal_devargs_parse``
+    + ``rte_eal_devargs_parse`` will change its format and use.
+    + all ``rte_devargs`` related functions will be renamed, changing the
+      ``rte_eal_devargs_`` prefix to ``rte_devargs_``.
 
 * pci: Several exposed functions are misnamed.
   The following functions are deprecated starting from v17.11 and are replaced:
-- 
2.11.0

^ permalink raw reply	[relevance 11%]

* Re: [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration
  2018-02-02 16:46  3% ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
  2018-02-02 16:46  3%   ` [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
  2018-02-02 16:52  0%   ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Nélio Laranjeiro
@ 2018-02-06 11:31  0%   ` Shahaf Shuler
  2 siblings, 0 replies; 200+ results
From: Shahaf Shuler @ 2018-02-06 11:31 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Nélio Laranjeiro, dev, Marcelo Ricardo Leitner

Friday, February 2, 2018 6:46 PM, Adrien Mazarguil:
> The decision to deliver mlx4/mlx5 rdma-core glue plug-ins separately instead
> of generating them at run time due to security concerns [1] led to a few
> issues:
> 
> - They must be present on the file system before running DPDK.
> - Their location must be known to the dynamic linker.
> - Their names overlap and ABI compatibility is not guaranteed, which may
>   lead to crashes.
> 
> This series addresses the above by adding version information to plug-ins
> and taking CONFIG_RTE_EAL_PMD_PATH into account to locate them on the
> file system.

Series applied to next-net-mlx, with the following diff in patch 3/4:
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index cc9db9977..cc800493b 100644                                 
--- a/drivers/net/mlx4/Makefile                                   
+++ b/drivers/net/mlx4/Makefile                                   
@@ -35,7 +35,7 @@ include $(RTE_SDK)/mk/rte.vars.mk               
 LIB = librte_pmd_mlx4.a                                          
 LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)                  
 LIB_GLUE_BASE = librte_pmd_mlx4_glue.so                          
-LIB_GLUE_VERSION = 18.02.1                                       
+LIB_GLUE_VERSION = 18.02.0                                       
                                                                  
 # Sources.                                                       
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c                     
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 4086f2039..3bc9736c9 100644                                 
--- a/drivers/net/mlx5/Makefile                                   
+++ b/drivers/net/mlx5/Makefile                                   
@@ -35,7 +35,7 @@ include $(RTE_SDK)/mk/rte.vars.mk               
 LIB = librte_pmd_mlx5.a                                          
 LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)                  
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so                          
-LIB_GLUE_VERSION = 18.02.1                                       
+LIB_GLUE_VERSION = 18.02.0             

Thanks. 


> 
> [1]
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdpd
> k.org%2Fml%2Farchives%2Fdev%2F2018-
> January%2F089617.html&data=02%7C01%7Cshahafs%40mellanox.com%7C6d
> 6d87b37a574c15f41808d56a5c7eae%7Ca652971c7d2e4d9ba6a4d149256f461b
> %7C0%7C0%7C636531867854846685&sdata=9Bc7lEnU%2Fq4E5PxgOkEvgwDN
> zc46%2BZ1B5boHyxg1Cuo%3D&reserved=0
> 
> v2 changes:
> 
> - Fixed extra "\n" in glue file name generation (although it didn't break
>   functionality).
> 
> Adrien Mazarguil (4):
>   net/mlx: add debug checks to glue structure
>   net/mlx: fix missing includes for rdma-core glue
>   net/mlx: version rdma-core glue libraries
>   net/mlx: make rdma-core glue path configurable
> 
>  doc/guides/nics/mlx4.rst     | 17 ++++++++++++
>  doc/guides/nics/mlx5.rst     | 14 ++++++++++
>  drivers/net/mlx4/Makefile    |  8 ++++--
>  drivers/net/mlx4/mlx4.c      | 57
> ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx4/mlx4_glue.c |  4 +++
>  drivers/net/mlx4/mlx4_glue.h |  9 +++++++
>  drivers/net/mlx5/Makefile    |  8 ++++--
>  drivers/net/mlx5/mlx5.c      | 57
> ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_glue.c |  1 +
>  drivers/net/mlx5/mlx5_glue.h |  7 +++++
>  10 files changed, 176 insertions(+), 6 deletions(-)
> 
> --
> 2.11.0

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 17:06  0%                           ` Marcelo Ricardo Leitner
@ 2018-02-06 11:06  4%                             ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-06 11:06 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Thomas Monjalon, Van Haaren, Harry, dev, Shahaf Shuler, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 03:06:19PM -0200, Marcelo Ricardo Leitner wrote:
> On Mon, Feb 05, 2018 at 04:54:55PM +0100, Adrien Mazarguil wrote:
> > On Mon, Feb 05, 2018 at 01:29:42PM -0200, Marcelo Ricardo Leitner wrote:
> > > On Mon, Feb 05, 2018 at 03:59:18PM +0100, Adrien Mazarguil wrote:
> > > > On Mon, Feb 05, 2018 at 12:37:34PM -0200, Marcelo Ricardo Leitner wrote:
> > > > > On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> > > > > > 05/02/2018 14:44, Adrien Mazarguil:
> ...
> > > > > > > Using a weak one like CRC32 for a shorter name poses a risk of
> > > > > > > collision. Moreover the next time someone decides to update all version
> > > > > > > notices or modify a comment will impact that hash. We'd need to isolate the
> > > > > > > symbol definition itself, ignore parameter names in function prototypes and
> > > > > > > only then we may get a somewhat meaningful hash describing a given ABI.
> > > > > 
> > > > > That's what I meant with stricter. Yes it would catch such
> > > > > situations, but you tell me on how much we want to protect/restrict
> > > > > here.  Do you see a reason for building only the dpdk/pmd side and not
> > > > > the glue library at a time?
> > > > 
> > > > No, they're always built together. We're only adding this versioning to
> > > > avoid issues when users somehow end up with several DPDK versions installed
> > > > on their system, or with leftovers of previous releases lying around. That's
> > > > all we need to solve here. dlopen()'ing the proper file takes care of that,
> > > > the symbol version number check afterward is performed just in case.
> > > 
> > > Interesting. These leftovers probably wouldn't be there if it wasn't
> > > versioned in the first place. :-)
> > 
> > Seriously, we can't assume users will do everything using neat packages and
> > may run an unfortunate "make install" from the DPDK source tree without
> > noticing they wrecked their system. Someone will have to mop the ensuing but
> > preventable bug reports.
> > 
> > > > > > > Given the added complexity, is there really a problem with simple version
> > > > > > > numbers we increment every time something gets modified? (Note this is
> > > > > > > already how our .map files work, they're not generated automatically)
> > > > > > 
> > > > > > Our map files show the major version where a symbol was introduced.
> > > > > > It is simple because no symbol can be introduced in a minor version.
> > > > > > 
> > > > > > > How about keeping things as is?
> > > > > 
> > > > > I don't really see the need of unique filenames. The next patch is
> > > > > already leveraging RTE_EAL_PMD_PATH, which if versioned should be
> > > > > enough for this, no?
> > > > 
> > > > As you said, "if" versioned. As an undocumented empty string by default,
> > > > there's no way to be sure. Leaving the PMD version its internal but
> > > > (unfortunately) exposed bits will certainly prevent mistakes.
> > > > 
> > > > > > You are using 18.02.1 while it is introduced in 18.02.0.
> > > > > > If you don't want to correlate the .so version number with DPDK version
> > > > > > number, maybe that 1, 2, 3 would be a simpler choice (less confusing).
> > > > > 
> > > > > +1
> > > > 
> > > > Then are you fine with the "18.02.0" suffix?
> > > 
> > > Not really, sorry. It was more for the "1, 2, 3" sequence or tying it
> > > to dpdk version.
> > > 
> > > With the latest replies, I don't think the reasoning is enough to
> > > justify these extra checks, but I won't oppose to including it.
> > 
> > 18.02.0 makes it tied to the current release number, so I guess we agree.
> 
> It makes them equal, but not tied. If nobody patches it, when 18.02.1
> is out, the glue lib will still be 18.02.0.

Well this must be understood as "this plug-in implements 18.02.0's mlx4 glue
ABI", which remains true (and compatible) with subsequent DPDK releases as
long as the glue code is not updated.

Note this is no different from a single-digit suffix, which wouldn't be
updated either if the ABI isn't. Again, these initial digits are needed
because otherwise there is already a confusion with stable branches that
implement different ABIs and are therefore incompatible:

 librte_pmd_mlx4_glue.so.17.02.1
 librte_pmd_mlx4_glue.so.17.11.1
 librte_pmd_mlx4_glue.so.18.02.0

With a single digit, all of them would be named "librte_pmd_mlx4_glue.so.1",
rendering versioning basically useless.

> > The idea for now is this part remains tied to the DPDK release.
> > 
> > If a new ABI version is needed in a subsequent commit, the initial part gets
> > bumped to the current WIP DPDK release (say, 42.02.0). If subsequent
> > intermediate commits break the glue ABI, a fourth digit is added
> > (e.g. 42.02.0.1).
> 
> I'll defer this to other project developers. This is more about a
> project standard than anything here. I could even argue that this glue
> should be named after the pmd lib, such as
>    ./usr/local/lib/librte_pmd_mlx4_glue.so.1.1
> The fact of not providing the _glue.so symlink is enough to avoid
> others from linking against it. But it's more of a project standard
> than a technical decision, I guess, weather this lib is seen as a
> plugin or as a (private) library.

I think you nailed it, I call it a "plug-in" because dlopen() is manually
performed on it, however it's in fact a private library whose API is not
exposed and no application is supposed to use directly.

For this reason, while up to package maintainers, my suggestion is to not
install it in a public location like "/usr/local/lib" but configure
RTE_EAL_PMD_PATH to some DPDK-specific path, e.g. "/usr/share/dpdk/pmd",
which is possible since patch 4/4 of this series.

> Considering the versioning used for the PMD libs, such easy versioning
> is my preferred choice, FWIW.

Problem remains that the DPDK projects manages its own backports/stable
releases system instead of relying on package maintainers for that, so
properly versioning things from the beginning to avoid collisions is really
always a concern. Had backports not been a requirement in the first place,
I agree a single digit would have been enough.

My suggestion of using 18.02.0 (instead of 18.02.1) stands. It addresses
Thomas' concern by properly matching the DPDK release the ABI was last
updated for and mine for the backports issues mentioned above. Let's go with
that and move on.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/8] vhost: avoid enum fields in VhostUserMsg
  2018-02-05 12:16  3% ` [dpdk-dev] [PATCH 2/8] vhost: avoid enum fields in VhostUserMsg Stefan Hajnoczi
@ 2018-02-06  9:47  0%   ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2018-02-06  9:47 UTC (permalink / raw)
  To: Stefan Hajnoczi, dev; +Cc: Yuanhan Liu



On 02/05/2018 01:16 PM, Stefan Hajnoczi wrote:
> The VhostUserMsg struct binary representation must match the vhost-user
> protocol specification since this struct is read from and written to the
> socket.
> 
> The VhostUserMsg.request union contains enum fields.  Enum binary
> representation is implementation-defined according to the C standard and
> it is unportable to make assumptions about the representation:
> 
>    6.7.2.2 Enumeration specifiers
>    ...
>    Each enumerated type shall be compatible with char, a signed integer
>    type, or an unsigned integer type. The choice of type is
>    implementation-defined, but shall be capable of representing the
>    values of all the members of the enumeration.
> 
> Additionally, librte_vhost relies on the enum type being unsigned when
> validating untrusted inputs:
> 
>    if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
> 
> If msg.request.master is signed then negative values pass this check!
> 
> Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
> portability, the actual enum constants still affect the final type.  For
> example, if we add a negative constant then its type changes to signed
> int:
> 
>    typedef enum VhostUserRequest {
>        ...
>        VHOST_USER_INVALID = -1,
>    };
> 
> This is very fragile and it's unlikely that anyone changing the code
> would remember this.  A security hole can be introduced accidentally.
> 
> This patch switches VhostUserMsg.request fields to uint32_t to avoid the
> portability and potential security issues.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   lib/librte_vhost/vhost_user.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index d4bd604b9..0fafbe6e0 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -81,8 +81,8 @@ typedef struct VhostUserLog {
>   
>   typedef struct VhostUserMsg {
>   	union {
> -		VhostUserRequest master;
> -		VhostUserSlaveRequest slave;
> +		uint32_t master; /* a VhostUserRequest value */
> +		uint32_t slave;  /* a VhostUserSlaveRequest value*/
>   	} request;
>   
>   #define VHOST_USER_VERSION_MASK     0x3
> 

Maybe we could simplify to:

typedef struct VhostUserMsg {
  	uint32_t request; /* a VhostUserRequest or VhostUserSlaveRequest value */
...

Also, it seems QEMU's vhost-user master implementation uses an enum for
the request in its VhostUserMsg struct. Should it be fixed too?

Thanks,
Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets
  2018-02-06  9:28  0%         ` Burakov, Anatoly
@ 2018-02-06  9:47  0%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-06  9:47 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

06/02/2018 10:28, Burakov, Anatoly:
> On 05-Feb-18 10:45 PM, Thomas Monjalon wrote:
> > 05/02/2018 18:39, Burakov, Anatoly:
> >> On 05-Feb-18 4:37 PM, Anatoly Burakov wrote:
> >>> During lcore scan, find maximum socket ID and store it.
> >>>
> >>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >>> ---
> >>>
> >>> Notes:
> >>>       v3:
> >>>       - Added ABI backwards compatibility
> >>>       
> >>>       v2:
> >>>       - checkpatch changes
> >>>       - check socket before deciding if the core is not to be used
> >>>
> >>>    lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
> >>>    lib/librte_eal/common/include/rte_eal.h   | 25 +++++++++++++++++++++
> >>>    lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
> >>>    lib/librte_eal/linuxapp/eal/eal.c         | 27 +++++++++++++++++++++-
> >>>    lib/librte_eal/rte_eal_version.map        |  9 +++++++-
> >>>    5 files changed, 92 insertions(+), 14 deletions(-)
> >>>
> >>
> >> This patch does not break ABI, but does it in a very ugly way. Is it
> >> worth it?
> > 
> > I think we agreed to not get this patch in 18.02.
> > Did you change your mind?
> > 
> 
> Sorry, how do i mark this patch as for 18.05? Is it a patch header?

So your answer is "yes, it is for 18.05" :)

Next time, you could add 18.05 near "PATCH v3", or say it in annotations.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets
  2018-02-05 22:45  0%       ` Thomas Monjalon
@ 2018-02-06  9:28  0%         ` Burakov, Anatoly
  2018-02-06  9:47  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-02-06  9:28 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 05-Feb-18 10:45 PM, Thomas Monjalon wrote:
> 05/02/2018 18:39, Burakov, Anatoly:
>> On 05-Feb-18 4:37 PM, Anatoly Burakov wrote:
>>> During lcore scan, find maximum socket ID and store it.
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> ---
>>>
>>> Notes:
>>>       v3:
>>>       - Added ABI backwards compatibility
>>>       
>>>       v2:
>>>       - checkpatch changes
>>>       - check socket before deciding if the core is not to be used
>>>
>>>    lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>>>    lib/librte_eal/common/include/rte_eal.h   | 25 +++++++++++++++++++++
>>>    lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>>    lib/librte_eal/linuxapp/eal/eal.c         | 27 +++++++++++++++++++++-
>>>    lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>>    5 files changed, 92 insertions(+), 14 deletions(-)
>>>
>>
>> This patch does not break ABI, but does it in a very ugly way. Is it
>> worth it?
> 
> I think we agreed to not get this patch in 18.02.
> Did you change your mind?
> 

Sorry, how do i mark this patch as for 18.05? Is it a patch header?

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure
  2018-02-04  7:24  4% [dpdk-dev] [PATCH] doc: annouce ABI change for RSS configuraiton structure Xueming Li
@ 2018-02-06  7:38  4% ` Xueming Li
  2018-02-13  6:52  4%   ` Shahaf Shuler
  0 siblings, 1 reply; 200+ results
From: Xueming Li @ 2018-02-06  7:38 UTC (permalink / raw)
  To: Thomas Monjalon, Neil Horman; +Cc: Xueming Li, dev, Shahaf Shuler

Update deprecation notice for the new rss_level field of
rte_eth_rss_conf.

Link: http://www.dpdk.org/dev/patchwork/patch/31891

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad5988..4bfce3bd7 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -59,3 +59,7 @@ Deprecation Notices
   be added between the producer and consumer structures. The size of the
   structure and the offset of the fields will remain the same on
   platforms with 64B cache line, but will change on other platforms.
+
+* ethdev: A new rss level field planned in 18.05.
+  The new API add rss_level field to ``rte_eth_rss_conf`` to enable a choice
+  of RSS hash calculation on outer or inner header of tunneled packet.
-- 
2.13.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets
  2018-02-05 17:39  3%     ` Burakov, Anatoly
@ 2018-02-05 22:45  0%       ` Thomas Monjalon
  2018-02-06  9:28  0%         ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-02-05 22:45 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

05/02/2018 18:39, Burakov, Anatoly:
> On 05-Feb-18 4:37 PM, Anatoly Burakov wrote:
> > During lcore scan, find maximum socket ID and store it.
> > 
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> > 
> > Notes:
> >      v3:
> >      - Added ABI backwards compatibility
> >      
> >      v2:
> >      - checkpatch changes
> >      - check socket before deciding if the core is not to be used
> > 
> >   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
> >   lib/librte_eal/common/include/rte_eal.h   | 25 +++++++++++++++++++++
> >   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
> >   lib/librte_eal/linuxapp/eal/eal.c         | 27 +++++++++++++++++++++-
> >   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
> >   5 files changed, 92 insertions(+), 14 deletions(-)
> > 
> 
> This patch does not break ABI, but does it in a very ugly way. Is it 
> worth it?

I think we agreed to not get this patch in 18.02.
Did you change your mind?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] checkpatches.sh: Add checks for ABI symbol addition
  2018-02-05 17:29  6% ` [dpdk-dev] [PATCH v4] " Neil Horman
@ 2018-02-05 17:57  4%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-05 17:57 UTC (permalink / raw)
  To: Neil Horman
  Cc: dev, john.mcnamara, bruce.richardson, Ferruh Yigit, Stephen Hemminger

05/02/2018 18:29, Neil Horman:
>  Driver information
> +M: Neil Horman <nhorman@tuxdriver.com>
>  F: buildtools/pmdinfogen/
>  F: usertools/dpdk-pmdinfo.py
>  F: doc/guides/tools/pmdinfo.rst

This change deserves a separate patch announcing that you are
volunteer to maintain pmdinfo.

It is really not related to the rest of this patch,
but thank you for vounteering :)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets
  2018-02-05 16:37  8%   ` [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
@ 2018-02-05 17:39  3%     ` Burakov, Anatoly
  2018-02-05 22:45  0%       ` Thomas Monjalon
  2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] Add " Anatoly Burakov
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-02-05 17:39 UTC (permalink / raw)
  To: dev

On 05-Feb-18 4:37 PM, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>      v3:
>      - Added ABI backwards compatibility
>      
>      v2:
>      - checkpatch changes
>      - check socket before deciding if the core is not to be used
> 
>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>   lib/librte_eal/common/include/rte_eal.h   | 25 +++++++++++++++++++++
>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>   lib/librte_eal/linuxapp/eal/eal.c         | 27 +++++++++++++++++++++-
>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>   5 files changed, 92 insertions(+), 14 deletions(-)
> 

This patch does not break ABI, but does it in a very ugly way. Is it 
worth it?

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4] checkpatches.sh: Add checks for ABI symbol addition
    2018-01-31 17:27  6% ` [dpdk-dev] [PATCH v3] " Neil Horman
@ 2018-02-05 17:29  6% ` Neil Horman
  2018-02-05 17:57  4%   ` Thomas Monjalon
  2018-02-09 15:21  6% ` [dpdk-dev] [PATCH v5] " Neil Horman
  2018-02-14 19:19  6% ` [dpdk-dev] [PATCH v6] " Neil Horman
  3 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-02-05 17:29 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, thomas, john.mcnamara, bruce.richardson,
	Ferruh Yigit, Stephen Hemminger

Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental.  The addition of these markers is
dependent on the manual addition of exported symbols to the EXPERIMENTAL
section of the corresponding libraries version map file.  The consensus
on review is that, in addition to mandating the addition of symbols to
the EXPERIMENTAL version in the map, we need a mechanism to enforce our
documented process of mandating that addition when they are introduced.
To that end, I am proposing this change.  It is an addition to the
checkpatches script, which scan incoming patches for additions and
removals of symbols to the map file, and warns the user appropriately

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: thomas@monjalon.net
CC: john.mcnamara@intel.com
CC: bruce.richardson@intel.com
CC: Ferruh Yigit <ferruh.yigit@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>

---
Change notes

v2)
 * Cleaned up and documented awk script (shemminger)
 * fixed sort/uniq usage (shemminger)
 * moved checking to new script (tmonjalon)
 * added maintainer entry (tmonjalon)
 * added license (tmonjalon)

v3)
 * Changed symbol check script name (tmonjalon)
 * Trapped exit to clean temp file (tmonjalon)
 * Honored verbose command (tmonjalon)
 * Cleaned left over debug bits (tmonjalon)
 * Updated location in MAINTAINERS file (tmonjalon)

v4)
 * Updated maintainers file (tmonjalon)
---
 MAINTAINERS                     |   2 +
 devtools/check-symbol-change.sh | 146 ++++++++++++++++++++++++++++++++++++++++
 devtools/checkpatches.sh        |  23 ++++++-
 3 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100755 devtools/check-symbol-change.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index acd056134..d1ef43479 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -86,9 +86,11 @@ M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: devtools/validate-abi.sh
+F: devtools/check-symbol-change.sh
 F: buildtools/check-experimental-syms.sh
 
 Driver information
+M: Neil Horman <nhorman@tuxdriver.com>
 F: buildtools/pmdinfogen/
 F: usertools/dpdk-pmdinfo.py
 F: doc/guides/tools/pmdinfo.rst
diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
new file mode 100755
index 000000000..22b17e6f2
--- /dev/null
+++ b/devtools/check-symbol-change.sh
@@ -0,0 +1,146 @@
+#!/bin/sh
+
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
+
+build_map_changes()
+{
+	local fname=$1
+	local mapdb=$2
+
+	cat $fname | filterdiff -i *.map | awk '
+		# Initialize our variables
+		BEGIN {map="";sym="";ar="";sec=""; in_sec=0}
+
+		# Anything that starts with + or -, followed by an a
+		# and ends in the string .map is the name of our map file
+		# This may appear multiple times in a patch if multiple
+		# map files are altered, and all section/symbol names
+		# appearing between a triggering of this rule and the
+		# next trigger of this rule are associated with this file
+		/[-+] a\/.*\.map/ {map=$2}
+
+		# Triggering this rule, which starts a line with a + and ends it
+		# with a { identifies a versioned section.  The section name is
+		# the rest of the line with the + and { symbols remvoed.
+		# Triggering this rule sets in_sec to 1, which actives the
+		# symbol rule below
+		/+.*{/ {gsub("+","");sec=$1; in_sec=1}
+
+		# This rule idenfies the end of a section, and disables the
+		# symbol rule
+		/.*}/ {in_sec=0}
+
+		# This rule matches on a + followed by any characters except a :
+		# (which denotes a global vs local segment), and ends with a ;.
+		# The semicolon is removed and the symbol is printed with its
+		# association file name and version section, along with an
+		# indicator that the symbol is a new addition.  Note this rule
+		# only works if we have found a version section in the rule
+		# above (hence the in_sec check).  Otherwise we flag it as an
+		# unknown section
+		/^+[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " add"
+			} else {
+				print map " " sym " unknown add"
+			}
+		}
+
+		# This is the same rule as above, but the rule matches on a
+		# leading - rather than a +, denoting that the symbol is being
+		# removed.
+		/^-[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " del"
+			} else {
+				print map " " sym " unknown del"
+			}
+		}' > ./$mapdb
+
+		sort -u $mapdb > ./$mapdb.2
+		mv -f $mapdb.2 $mapdb
+
+}
+
+check_for_rule_violations()
+{
+	local mapdb=$1
+	local mname
+	local symname
+	local secname
+	local ar
+	local ret=0
+
+	while read mname symname secname ar
+	do
+		if [ "$ar" == "add" ]
+		then
+
+			if [ "$secname" == "unknown" ]
+			then
+				# Just inform the user of this occurrence, but
+				# don't flag it as an error
+				echo -n "INFO: symbol $syname is added but "
+				echo -n "patch has insuficient context "
+				echo -n "to determine the section name "
+				echo -n "please ensure the version is "
+				echo "EXPERIMENTAL"
+				continue
+			fi
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Symbols that are getting added in a section
+				# other ithan the experimental section
+				# to be moving from an already supported
+				# section or its a violation
+				grep -q \
+				"$mname $symname [^EXPERIMENTAL] del" $mapdb
+				if [ $? -ne 0 ]
+				then
+					echo -n "ERROR: symbol $symname "
+					echo -n "is added in a section "
+					echo -n "other than the EXPERIMENTAL "
+					echo "section of the version map"
+					ret=1
+				fi
+			fi
+		else
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Just inform users that non-experimenal
+				# symbols need to go through a deprecation
+				# process
+				echo -n "INFO: symbol $symname is being "
+				echo -n "removed, ensure that it has "
+				echo "gone through the deprecation process"
+			fi
+		fi
+	done < $mapdb
+
+	return $ret
+}
+
+trap clean_and_exit_on_sig EXIT
+
+mapfile=`mktemp mapdb.XXXXXX`
+patch=$1
+exit_code=1
+
+clean_and_exit_on_sig()
+{
+	rm -f $mapfile
+	exit $exit_code
+}
+
+build_map_changes $patch $mapfile
+check_for_rule_violations $mapfile
+exit_code=$?
+
+rm -f $mapfile
+
+exit $exit_code
+
+
diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 7676a6b50..0b2b5f039 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -35,6 +35,8 @@
 # - DPDK_CHECKPATCH_LINE_LENGTH
 . $(dirname $(readlink -e $0))/load-devel-config
 
+VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh
+
 length=${DPDK_CHECKPATCH_LINE_LENGTH:-80}
 
 # override default Linux options
@@ -61,6 +63,7 @@ print_usage () {
 	END_OF_HELP
 }
 
+
 number=0
 quiet=false
 verbose=false
@@ -86,6 +89,7 @@ total=0
 status=0
 
 check () { # <patch> <commit> <title>
+	local reta
 	total=$(($total + 1))
 	! $verbose || printf '\n### %s\n\n' "$3"
 	if [ -n "$1" ] ; then
@@ -96,9 +100,26 @@ check () { # <patch> <commit> <title>
 	else
 		report=$($DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
 	fi
-	[ $? -ne 0 ] || return 0
+	reta=$?
+
 	$verbose || printf '\n### %s\n\n' "$3"
 	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+
+	! $verbose || echo
+	! $verbose || echo "Checking API additions/removals:"
+
+	if [ -n "$1" ] ; then
+		report=$($VALIDATE_NEW_API $1)
+	elif [ -n "$2" ] ; then
+		report=$(git format-patch \
+			 --find-renames --no-stat --stdout -1 $commit |
+			$VALIDATE_NEW_API -)
+	else
+		report=$($VALIDATE_NEW_API -)
+	fi
+	[ $? -ne 0 -o $reta -ne 0 ] || return 0
+	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+
 	status=$(($status + 1))
 }
 
-- 
2.14.3

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 15:54  4%                         ` Adrien Mazarguil
@ 2018-02-05 17:06  0%                           ` Marcelo Ricardo Leitner
  2018-02-06 11:06  4%                             ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Marcelo Ricardo Leitner @ 2018-02-05 17:06 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, Van Haaren, Harry, dev, Shahaf Shuler, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 04:54:55PM +0100, Adrien Mazarguil wrote:
> On Mon, Feb 05, 2018 at 01:29:42PM -0200, Marcelo Ricardo Leitner wrote:
> > On Mon, Feb 05, 2018 at 03:59:18PM +0100, Adrien Mazarguil wrote:
> > > On Mon, Feb 05, 2018 at 12:37:34PM -0200, Marcelo Ricardo Leitner wrote:
> > > > On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> > > > > 05/02/2018 14:44, Adrien Mazarguil:
...
> > > > > > Using a weak one like CRC32 for a shorter name poses a risk of
> > > > > > collision. Moreover the next time someone decides to update all version
> > > > > > notices or modify a comment will impact that hash. We'd need to isolate the
> > > > > > symbol definition itself, ignore parameter names in function prototypes and
> > > > > > only then we may get a somewhat meaningful hash describing a given ABI.
> > > > 
> > > > That's what I meant with stricter. Yes it would catch such
> > > > situations, but you tell me on how much we want to protect/restrict
> > > > here.  Do you see a reason for building only the dpdk/pmd side and not
> > > > the glue library at a time?
> > > 
> > > No, they're always built together. We're only adding this versioning to
> > > avoid issues when users somehow end up with several DPDK versions installed
> > > on their system, or with leftovers of previous releases lying around. That's
> > > all we need to solve here. dlopen()'ing the proper file takes care of that,
> > > the symbol version number check afterward is performed just in case.
> > 
> > Interesting. These leftovers probably wouldn't be there if it wasn't
> > versioned in the first place. :-)
> 
> Seriously, we can't assume users will do everything using neat packages and
> may run an unfortunate "make install" from the DPDK source tree without
> noticing they wrecked their system. Someone will have to mop the ensuing but
> preventable bug reports.
> 
> > > > > > Given the added complexity, is there really a problem with simple version
> > > > > > numbers we increment every time something gets modified? (Note this is
> > > > > > already how our .map files work, they're not generated automatically)
> > > > > 
> > > > > Our map files show the major version where a symbol was introduced.
> > > > > It is simple because no symbol can be introduced in a minor version.
> > > > > 
> > > > > > How about keeping things as is?
> > > > 
> > > > I don't really see the need of unique filenames. The next patch is
> > > > already leveraging RTE_EAL_PMD_PATH, which if versioned should be
> > > > enough for this, no?
> > > 
> > > As you said, "if" versioned. As an undocumented empty string by default,
> > > there's no way to be sure. Leaving the PMD version its internal but
> > > (unfortunately) exposed bits will certainly prevent mistakes.
> > > 
> > > > > You are using 18.02.1 while it is introduced in 18.02.0.
> > > > > If you don't want to correlate the .so version number with DPDK version
> > > > > number, maybe that 1, 2, 3 would be a simpler choice (less confusing).
> > > > 
> > > > +1
> > > 
> > > Then are you fine with the "18.02.0" suffix?
> > 
> > Not really, sorry. It was more for the "1, 2, 3" sequence or tying it
> > to dpdk version.
> > 
> > With the latest replies, I don't think the reasoning is enough to
> > justify these extra checks, but I won't oppose to including it.
> 
> 18.02.0 makes it tied to the current release number, so I guess we agree.

It makes them equal, but not tied. If nobody patches it, when 18.02.1
is out, the glue lib will still be 18.02.0.

> The idea for now is this part remains tied to the DPDK release.
> 
> If a new ABI version is needed in a subsequent commit, the initial part gets
> bumped to the current WIP DPDK release (say, 42.02.0). If subsequent
> intermediate commits break the glue ABI, a fourth digit is added
> (e.g. 42.02.0.1).

I'll defer this to other project developers. This is more about a
project standard than anything here. I could even argue that this glue
should be named after the pmd lib, such as
   ./usr/local/lib/librte_pmd_mlx4_glue.so.1.1
The fact of not providing the _glue.so symlink is enough to avoid
others from linking against it. But it's more of a project standard
than a technical decision, I guess, weather this lib is seen as a
plugin or as a (private) library.

Considering the versioning used for the PMD libs, such easy versioning
is my preferred choice, FWIW.

> 
> This role is currently held by the third digit but since there's a confusion
> with DPDK revisions, it won't be used internally by the PMD. Hopefully this
> fourth digit will remain unused (otherwise I can add as many digits as
> necessary to make it acceptable, I'll then re-consider the SHA1 idea :)

hehe :-)

  Marcelo

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets
       [not found]     ` <cover.1517848624.git.anatoly.burakov@intel.com>
@ 2018-02-05 16:37  8%   ` Anatoly Burakov
  2018-02-05 17:39  3%     ` Burakov, Anatoly
                       ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Anatoly Burakov @ 2018-02-05 16:37 UTC (permalink / raw)
  To: dev

During lcore scan, find maximum socket ID and store it.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v3:
    - Added ABI backwards compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
 lib/librte_eal/common/include/rte_eal.h   | 25 +++++++++++++++++++++
 lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
 lib/librte_eal/linuxapp/eal/eal.c         | 27 +++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map        |  9 +++++++-
 5 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..827ddeb 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, max_socket_id = 0;
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		max_socket_id = RTE_MAX(max_socket_id, socket_id);
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	config->numa_node_count = max_socket_id + 1;
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int
+rte_num_sockets(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 08c6637..bbf54e2 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,29 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
+	uint32_t service_lcore_count;/**< Number of available service cores. */
+	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
+
+	/** Primary or secondary configuration */
+	enum rte_proc_type_t process_type;
+
+	/** PA or VA mapping mode */
+	enum rte_iova_mode iova_mode;
+
+	/**
+	 * Pointer to memory configuration, which may be shared across multiple
+	 * DPDK instances
+	 */
+	struct rte_mem_config *mem_config;
+} __attribute__((__packed__));
+
+/**
+ * The global RTE configuration structure - 18.02 ABI version.
+ */
+struct rte_config_v1802 {
+	uint32_t master_lcore;       /**< Id of the master lcore */
+	uint32_t lcore_count;        /**< Number of available logical cores. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
@@ -80,6 +103,8 @@ struct rte_config {
  *   A pointer to the global configuration structure.
  */
 struct rte_config *rte_eal_get_configuration(void);
+struct rte_config_v1802 *rte_eal_get_configuration_v1802(void);
+struct rte_config *rte_eal_get_configuration_v1805(void);
 
 /**
  * Get a lcore's role.
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index d84bcff..ddf4c64 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -120,6 +120,14 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets on the system.
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ *
+ */
+unsigned int rte_num_sockets(void);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 451fdaf..757f404 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -67,6 +67,9 @@ static rte_usage_hook_t	rte_application_usage_hook = NULL;
 /* early configuration structure, when memory config is not mmapped */
 static struct rte_mem_config early_mem_config;
 
+/* compatibility structure to return to old ABI calls */
+static struct rte_config_v1802 v1802_config;
+
 /* define fd variable here, because file needs to be kept open for the
  * duration of the program, as we hold a write lock on it in the primary proc */
 static int mem_cfg_fd = -1;
@@ -103,11 +106,33 @@ rte_eal_mbuf_default_mempool_ops(void)
 }
 
 /* Return a pointer to the configuration structure */
+struct rte_config_v1802 *
+rte_eal_get_configuration_v1802(void)
+{
+	/* copy everything to old config so that it's up to date */
+	v1802_config.iova_mode = rte_config.iova_mode;
+	v1802_config.lcore_count = rte_config.lcore_count;
+	memcpy(v1802_config.lcore_role, rte_config.lcore_role,
+			sizeof(rte_config.lcore_role));
+	v1802_config.master_lcore = rte_config.master_lcore;
+	v1802_config.mem_config = rte_config.mem_config;
+	v1802_config.process_type = rte_config.process_type;
+	v1802_config.service_lcore_count = rte_config.service_lcore_count;
+
+	return &v1802_config;
+}
+VERSION_SYMBOL(rte_eal_get_configuration, _v1802, 18.02);
+
+/* Return a pointer to the configuration structure */
 struct rte_config *
-rte_eal_get_configuration(void)
+rte_eal_get_configuration_v1805(void)
 {
 	return &rte_config;
 }
+VERSION_SYMBOL(rte_eal_get_configuration, _v1805, 18.05);
+BIND_DEFAULT_SYMBOL(rte_eal_get_configuration, _v1805, 18.05);
+MAP_STATIC_SYMBOL(struct rte_config *rte_eal_get_configuration(void),
+		rte_eal_get_configuration_v1805);
 
 enum rte_iova_mode
 rte_eal_iova_mode(void)
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 4146907..fc83e74 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -211,6 +211,13 @@ DPDK_18.02 {
 
 }  DPDK_17.11;
 
+DPDK_18.05 {
+	global:
+
+	rte_num_sockets;
+
+} DPDK_18.02;
+
 EXPERIMENTAL {
 	global:
 
@@ -255,4 +262,4 @@ EXPERIMENTAL {
 	rte_service_set_stats_enable;
 	rte_service_start_with_defaults;
 
-} DPDK_18.02;
+} DPDK_18.05;
-- 
2.7.4

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 15:29  0%                       ` Marcelo Ricardo Leitner
@ 2018-02-05 15:54  4%                         ` Adrien Mazarguil
  2018-02-05 17:06  0%                           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-02-05 15:54 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Thomas Monjalon, Van Haaren, Harry, dev, Shahaf Shuler, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 01:29:42PM -0200, Marcelo Ricardo Leitner wrote:
> On Mon, Feb 05, 2018 at 03:59:18PM +0100, Adrien Mazarguil wrote:
> > On Mon, Feb 05, 2018 at 12:37:34PM -0200, Marcelo Ricardo Leitner wrote:
> > > On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> > > > 05/02/2018 14:44, Adrien Mazarguil:
> > > > > On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> > > > > > On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > > > > > > Sent: Monday, February 5, 2018 12:14 PM
> > > > > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > > > > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > > > > > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > > > > > > libraries
> > > > > > > > 
> > > > > > > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > > > > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > > > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > > > > >
> > > > > > > > > > >  # Library name.
> > > > > > > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > > > > > > >
> > > > > > > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > > > > > > Ideally, you should retrieve it from rte_version.h.
> > > > > > > > >
> > > > > > > > > Keep in mind this only needs to be updated when the glue API gets
> > > > > > > > modified,
> > > > > > > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > > > > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > > > > > > >
> > > > > > > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > > > > > > different sets of Verbs calls and thus a different version, hence the
> > > > > > > > added
> > > > > > > > > "18.02" as a starting point. The last digit may have to be modified
> > > > > > > > possibly
> > > > > > > > > several times between official DPDK releases while work is being done on
> > > > > > > > the
> > > > > > > > > PMD (i.e. per commit).
> > > > > > > > >
> > > > > > > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > > > > > > really think it should, doing so will make things more complex in the
> > > > > > > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > > > > > > 
> > > > > > > > What about appending date +%s output to it? It would be stricter and
> > > > > > > > automated.
> > > > > > > 
> > > > > > > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > > > > > > generally not recommended.
> > > > > > 
> > > > > > Then the sha1sum of mlx4_glue.h.
> > > > > > With this the size check I mentioned on the other patch would become
> > > > > > redundant and unnecessary.
> > > > > 
> > > > > Using a strong hash algorithm to version a library/symbol, while possible,
> > > > > seems a bit overkill and results in ugliness:
> > > > > 
> > > > >  librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3
> > > 
> > > Ugh yes, but it wouldn't need to be that visible. A pointer on
> > > mlx*_glue and a define on PMD would be enough already. As in, an
> > > extended check to the versioning.
> > 
> > I thought you suggested this as a replacement. I'm not sure we need or want
> > to go this far. The current string comparison is really not worse than
> > standard symbol versioning, which doesn't check symbol properties besides
> > whether they are functions or other objects. We could have used C++ with
> > automatically mangled symbol names for that, however that again would make
> > things way more complex than necessary.
> > 
> > > > > Using a weak one like CRC32 for a shorter name poses a risk of
> > > > > collision. Moreover the next time someone decides to update all version
> > > > > notices or modify a comment will impact that hash. We'd need to isolate the
> > > > > symbol definition itself, ignore parameter names in function prototypes and
> > > > > only then we may get a somewhat meaningful hash describing a given ABI.
> > > 
> > > That's what I meant with stricter. Yes it would catch such
> > > situations, but you tell me on how much we want to protect/restrict
> > > here.  Do you see a reason for building only the dpdk/pmd side and not
> > > the glue library at a time?
> > 
> > No, they're always built together. We're only adding this versioning to
> > avoid issues when users somehow end up with several DPDK versions installed
> > on their system, or with leftovers of previous releases lying around. That's
> > all we need to solve here. dlopen()'ing the proper file takes care of that,
> > the symbol version number check afterward is performed just in case.
> 
> Interesting. These leftovers probably wouldn't be there if it wasn't
> versioned in the first place. :-)

Seriously, we can't assume users will do everything using neat packages and
may run an unfortunate "make install" from the DPDK source tree without
noticing they wrecked their system. Someone will have to mop the ensuing but
preventable bug reports.

> > > > > Given the added complexity, is there really a problem with simple version
> > > > > numbers we increment every time something gets modified? (Note this is
> > > > > already how our .map files work, they're not generated automatically)
> > > > 
> > > > Our map files show the major version where a symbol was introduced.
> > > > It is simple because no symbol can be introduced in a minor version.
> > > > 
> > > > > How about keeping things as is?
> > > 
> > > I don't really see the need of unique filenames. The next patch is
> > > already leveraging RTE_EAL_PMD_PATH, which if versioned should be
> > > enough for this, no?
> > 
> > As you said, "if" versioned. As an undocumented empty string by default,
> > there's no way to be sure. Leaving the PMD version its internal but
> > (unfortunately) exposed bits will certainly prevent mistakes.
> > 
> > > > You are using 18.02.1 while it is introduced in 18.02.0.
> > > > If you don't want to correlate the .so version number with DPDK version
> > > > number, maybe that 1, 2, 3 would be a simpler choice (less confusing).
> > > 
> > > +1
> > 
> > Then are you fine with the "18.02.0" suffix?
> 
> Not really, sorry. It was more for the "1, 2, 3" sequence or tying it
> to dpdk version.
> 
> With the latest replies, I don't think the reasoning is enough to
> justify these extra checks, but I won't oppose to including it.

18.02.0 makes it tied to the current release number, so I guess we agree.
The idea for now is this part remains tied to the DPDK release.

If a new ABI version is needed in a subsequent commit, the initial part gets
bumped to the current WIP DPDK release (say, 42.02.0). If subsequent
intermediate commits break the glue ABI, a fourth digit is added
(e.g. 42.02.0.1).

This role is currently held by the third digit but since there's a confusion
with DPDK revisions, it won't be used internally by the PMD. Hopefully this
fourth digit will remain unused (otherwise I can add as many digits as
necessary to make it acceptable, I'll then re-consider the SHA1 idea :)

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 14:59  0%                     ` Adrien Mazarguil
@ 2018-02-05 15:29  0%                       ` Marcelo Ricardo Leitner
  2018-02-05 15:54  4%                         ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Marcelo Ricardo Leitner @ 2018-02-05 15:29 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, Van Haaren, Harry, dev, Shahaf Shuler, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 03:59:18PM +0100, Adrien Mazarguil wrote:
> On Mon, Feb 05, 2018 at 12:37:34PM -0200, Marcelo Ricardo Leitner wrote:
> > On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> > > 05/02/2018 14:44, Adrien Mazarguil:
> > > > On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> > > > > On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > > > > > Sent: Monday, February 5, 2018 12:14 PM
> > > > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > > > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > > > > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > > > > > libraries
> > > > > > > 
> > > > > > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > > > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > > > >
> > > > > > > > > >  # Library name.
> > > > > > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > > > > > >
> > > > > > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > > > > > Ideally, you should retrieve it from rte_version.h.
> > > > > > > >
> > > > > > > > Keep in mind this only needs to be updated when the glue API gets
> > > > > > > modified,
> > > > > > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > > > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > > > > > >
> > > > > > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > > > > > different sets of Verbs calls and thus a different version, hence the
> > > > > > > added
> > > > > > > > "18.02" as a starting point. The last digit may have to be modified
> > > > > > > possibly
> > > > > > > > several times between official DPDK releases while work is being done on
> > > > > > > the
> > > > > > > > PMD (i.e. per commit).
> > > > > > > >
> > > > > > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > > > > > really think it should, doing so will make things more complex in the
> > > > > > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > > > > > 
> > > > > > > What about appending date +%s output to it? It would be stricter and
> > > > > > > automated.
> > > > > > 
> > > > > > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > > > > > generally not recommended.
> > > > > 
> > > > > Then the sha1sum of mlx4_glue.h.
> > > > > With this the size check I mentioned on the other patch would become
> > > > > redundant and unnecessary.
> > > > 
> > > > Using a strong hash algorithm to version a library/symbol, while possible,
> > > > seems a bit overkill and results in ugliness:
> > > > 
> > > >  librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3
> > 
> > Ugh yes, but it wouldn't need to be that visible. A pointer on
> > mlx*_glue and a define on PMD would be enough already. As in, an
> > extended check to the versioning.
> 
> I thought you suggested this as a replacement. I'm not sure we need or want
> to go this far. The current string comparison is really not worse than
> standard symbol versioning, which doesn't check symbol properties besides
> whether they are functions or other objects. We could have used C++ with
> automatically mangled symbol names for that, however that again would make
> things way more complex than necessary.
> 
> > > > Using a weak one like CRC32 for a shorter name poses a risk of
> > > > collision. Moreover the next time someone decides to update all version
> > > > notices or modify a comment will impact that hash. We'd need to isolate the
> > > > symbol definition itself, ignore parameter names in function prototypes and
> > > > only then we may get a somewhat meaningful hash describing a given ABI.
> > 
> > That's what I meant with stricter. Yes it would catch such
> > situations, but you tell me on how much we want to protect/restrict
> > here.  Do you see a reason for building only the dpdk/pmd side and not
> > the glue library at a time?
> 
> No, they're always built together. We're only adding this versioning to
> avoid issues when users somehow end up with several DPDK versions installed
> on their system, or with leftovers of previous releases lying around. That's
> all we need to solve here. dlopen()'ing the proper file takes care of that,
> the symbol version number check afterward is performed just in case.

Interesting. These leftovers probably wouldn't be there if it wasn't
versioned in the first place. :-)

> 
> > > > Given the added complexity, is there really a problem with simple version
> > > > numbers we increment every time something gets modified? (Note this is
> > > > already how our .map files work, they're not generated automatically)
> > > 
> > > Our map files show the major version where a symbol was introduced.
> > > It is simple because no symbol can be introduced in a minor version.
> > > 
> > > > How about keeping things as is?
> > 
> > I don't really see the need of unique filenames. The next patch is
> > already leveraging RTE_EAL_PMD_PATH, which if versioned should be
> > enough for this, no?
> 
> As you said, "if" versioned. As an undocumented empty string by default,
> there's no way to be sure. Leaving the PMD version its internal but
> (unfortunately) exposed bits will certainly prevent mistakes.
> 
> > > You are using 18.02.1 while it is introduced in 18.02.0.
> > > If you don't want to correlate the .so version number with DPDK version
> > > number, maybe that 1, 2, 3 would be a simpler choice (less confusing).
> > 
> > +1
> 
> Then are you fine with the "18.02.0" suffix?

Not really, sorry. It was more for the "1, 2, 3" sequence or tying it
to dpdk version.

With the latest replies, I don't think the reasoning is enough to
justify these extra checks, but I won't oppose to including it.

  Marcelo

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 14:37  0%                   ` Marcelo Ricardo Leitner
@ 2018-02-05 14:59  0%                     ` Adrien Mazarguil
  2018-02-05 15:29  0%                       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-02-05 14:59 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Thomas Monjalon, Van Haaren, Harry, dev, Shahaf Shuler, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 12:37:34PM -0200, Marcelo Ricardo Leitner wrote:
> On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> > 05/02/2018 14:44, Adrien Mazarguil:
> > > On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> > > > On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > > > > Sent: Monday, February 5, 2018 12:14 PM
> > > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > > > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > > > > libraries
> > > > > > 
> > > > > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > > >
> > > > > > > > >  # Library name.
> > > > > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > > > > >
> > > > > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > > > > Ideally, you should retrieve it from rte_version.h.
> > > > > > >
> > > > > > > Keep in mind this only needs to be updated when the glue API gets
> > > > > > modified,
> > > > > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > > > > >
> > > > > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > > > > different sets of Verbs calls and thus a different version, hence the
> > > > > > added
> > > > > > > "18.02" as a starting point. The last digit may have to be modified
> > > > > > possibly
> > > > > > > several times between official DPDK releases while work is being done on
> > > > > > the
> > > > > > > PMD (i.e. per commit).
> > > > > > >
> > > > > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > > > > really think it should, doing so will make things more complex in the
> > > > > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > > > > 
> > > > > > What about appending date +%s output to it? It would be stricter and
> > > > > > automated.
> > > > > 
> > > > > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > > > > generally not recommended.
> > > > 
> > > > Then the sha1sum of mlx4_glue.h.
> > > > With this the size check I mentioned on the other patch would become
> > > > redundant and unnecessary.
> > > 
> > > Using a strong hash algorithm to version a library/symbol, while possible,
> > > seems a bit overkill and results in ugliness:
> > > 
> > >  librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3
> 
> Ugh yes, but it wouldn't need to be that visible. A pointer on
> mlx*_glue and a define on PMD would be enough already. As in, an
> extended check to the versioning.

I thought you suggested this as a replacement. I'm not sure we need or want
to go this far. The current string comparison is really not worse than
standard symbol versioning, which doesn't check symbol properties besides
whether they are functions or other objects. We could have used C++ with
automatically mangled symbol names for that, however that again would make
things way more complex than necessary.

> > > Using a weak one like CRC32 for a shorter name poses a risk of
> > > collision. Moreover the next time someone decides to update all version
> > > notices or modify a comment will impact that hash. We'd need to isolate the
> > > symbol definition itself, ignore parameter names in function prototypes and
> > > only then we may get a somewhat meaningful hash describing a given ABI.
> 
> That's what I meant with stricter. Yes it would catch such
> situations, but you tell me on how much we want to protect/restrict
> here.  Do you see a reason for building only the dpdk/pmd side and not
> the glue library at a time?

No, they're always built together. We're only adding this versioning to
avoid issues when users somehow end up with several DPDK versions installed
on their system, or with leftovers of previous releases lying around. That's
all we need to solve here. dlopen()'ing the proper file takes care of that,
the symbol version number check afterward is performed just in case.

> > > Given the added complexity, is there really a problem with simple version
> > > numbers we increment every time something gets modified? (Note this is
> > > already how our .map files work, they're not generated automatically)
> > 
> > Our map files show the major version where a symbol was introduced.
> > It is simple because no symbol can be introduced in a minor version.
> > 
> > > How about keeping things as is?
> 
> I don't really see the need of unique filenames. The next patch is
> already leveraging RTE_EAL_PMD_PATH, which if versioned should be
> enough for this, no?

As you said, "if" versioned. As an undocumented empty string by default,
there's no way to be sure. Leaving the PMD version its internal but
(unfortunately) exposed bits will certainly prevent mistakes.

> > You are using 18.02.1 while it is introduced in 18.02.0.
> > If you don't want to correlate the .so version number with DPDK version
> > number, maybe that 1, 2, 3 would be a simpler choice (less confusing).
> 
> +1

Then are you fine with the "18.02.0" suffix?

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 14:16  0%                 ` Thomas Monjalon
  2018-02-05 14:33  0%                   ` Adrien Mazarguil
@ 2018-02-05 14:37  0%                   ` Marcelo Ricardo Leitner
  2018-02-05 14:59  0%                     ` Adrien Mazarguil
  1 sibling, 1 reply; 200+ results
From: Marcelo Ricardo Leitner @ 2018-02-05 14:37 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Adrien Mazarguil, Van Haaren, Harry, dev, Shahaf Shuler,
	Nelio Laranjeiro

On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> 05/02/2018 14:44, Adrien Mazarguil:
> > On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> > > On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > > > Sent: Monday, February 5, 2018 12:14 PM
> > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > > > libraries
> > > > > 
> > > > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > >
> > > > > > > >  # Library name.
> > > > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > > > >
> > > > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > > > Ideally, you should retrieve it from rte_version.h.
> > > > > >
> > > > > > Keep in mind this only needs to be updated when the glue API gets
> > > > > modified,
> > > > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > > > >
> > > > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > > > different sets of Verbs calls and thus a different version, hence the
> > > > > added
> > > > > > "18.02" as a starting point. The last digit may have to be modified
> > > > > possibly
> > > > > > several times between official DPDK releases while work is being done on
> > > > > the
> > > > > > PMD (i.e. per commit).
> > > > > >
> > > > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > > > really think it should, doing so will make things more complex in the
> > > > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > > > 
> > > > > What about appending date +%s output to it? It would be stricter and
> > > > > automated.
> > > > 
> > > > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > > > generally not recommended.
> > > 
> > > Then the sha1sum of mlx4_glue.h.
> > > With this the size check I mentioned on the other patch would become
> > > redundant and unnecessary.
> > 
> > Using a strong hash algorithm to version a library/symbol, while possible,
> > seems a bit overkill and results in ugliness:
> > 
> >  librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3

Ugh yes, but it wouldn't need to be that visible. A pointer on
mlx*_glue and a define on PMD would be enough already. As in, an
extended check to the versioning.

> > 
> > Using a weak one like CRC32 for a shorter name poses a risk of
> > collision. Moreover the next time someone decides to update all version
> > notices or modify a comment will impact that hash. We'd need to isolate the
> > symbol definition itself, ignore parameter names in function prototypes and
> > only then we may get a somewhat meaningful hash describing a given ABI.

That's what I meant with stricter. Yes it would catch such
situations, but you tell me on how much we want to protect/restrict
here.  Do you see a reason for building only the dpdk/pmd side and not
the glue library at a time?

> > 
> > Given the added complexity, is there really a problem with simple version
> > numbers we increment every time something gets modified? (Note this is
> > already how our .map files work, they're not generated automatically)
> 
> Our map files show the major version where a symbol was introduced.
> It is simple because no symbol can be introduced in a minor version.
> 
> > How about keeping things as is?

I don't really see the need of unique filenames. The next patch is
already leveraging RTE_EAL_PMD_PATH, which if versioned should be
enough for this, no?

> 
> You are using 18.02.1 while it is introduced in 18.02.0.
> If you don't want to correlate the .so version number with DPDK version
> number, maybe that 1, 2, 3 would be a simpler choice (less confusing).

+1

  Marcelo

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 14:16  0%                 ` Thomas Monjalon
@ 2018-02-05 14:33  0%                   ` Adrien Mazarguil
  2018-02-05 14:37  0%                   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-05 14:33 UTC (permalink / raw)
  To: Thomas Monjalon, Shahaf Shuler
  Cc: Marcelo Ricardo Leitner, Van Haaren, Harry, dev, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 03:16:21PM +0100, Thomas Monjalon wrote:
> 05/02/2018 14:44, Adrien Mazarguil:
> > On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> > > On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > > > Sent: Monday, February 5, 2018 12:14 PM
> > > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > > > libraries
> > > > > 
> > > > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > > >
> > > > > > > >  # Library name.
> > > > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > > > >
> > > > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > > > Ideally, you should retrieve it from rte_version.h.
> > > > > >
> > > > > > Keep in mind this only needs to be updated when the glue API gets
> > > > > modified,
> > > > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > > > >
> > > > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > > > different sets of Verbs calls and thus a different version, hence the
> > > > > added
> > > > > > "18.02" as a starting point. The last digit may have to be modified
> > > > > possibly
> > > > > > several times between official DPDK releases while work is being done on
> > > > > the
> > > > > > PMD (i.e. per commit).
> > > > > >
> > > > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > > > really think it should, doing so will make things more complex in the
> > > > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > > > 
> > > > > What about appending date +%s output to it? It would be stricter and
> > > > > automated.
> > > > 
> > > > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > > > generally not recommended.
> > > 
> > > Then the sha1sum of mlx4_glue.h.
> > > With this the size check I mentioned on the other patch would become
> > > redundant and unnecessary.
> > 
> > Using a strong hash algorithm to version a library/symbol, while possible,
> > seems a bit overkill and results in ugliness:
> > 
> >  librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3
> > 
> > Using a weak one like CRC32 for a shorter name poses a risk of
> > collision. Moreover the next time someone decides to update all version
> > notices or modify a comment will impact that hash. We'd need to isolate the
> > symbol definition itself, ignore parameter names in function prototypes and
> > only then we may get a somewhat meaningful hash describing a given ABI.
> > 
> > Given the added complexity, is there really a problem with simple version
> > numbers we increment every time something gets modified? (Note this is
> > already how our .map files work, they're not generated automatically)
> 
> Our map files show the major version where a symbol was introduced.
> It is simple because no symbol can be introduced in a minor version.
> 
> > How about keeping things as is?
> 
> You are using 18.02.1 while it is introduced in 18.02.0.
> If you don't want to correlate the .so version number with DPDK version
> number, maybe that 1, 2, 3 would be a simpler choice (less confusing).

I don't really care as long as there's no confusion with their backported
counterparts (namely 17.11 and 17.02). I understand the possible confusion
for someone who'd grep the sources though.

If 18.02.0 is OK in everyone's opinion, let's use that. It satisfies the
uniqueness requirement. We'll add a digit or find some other versioning
scheme later if necessary.

Shahaf, can you make a minor adjustment while applying this series?

Both drivers/net/mlx4/Makefile and drivers/net/mlx5/Makefile need to be
modified as follows in patch 3/4:

 -LIB_GLUE_VERSION = 18.02.1
 +LIB_GLUE_VERSION = 18.02.0

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-05 13:44  3%               ` Adrien Mazarguil
@ 2018-02-05 14:16  0%                 ` Thomas Monjalon
  2018-02-05 14:33  0%                   ` Adrien Mazarguil
  2018-02-05 14:37  0%                   ` Marcelo Ricardo Leitner
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2018-02-05 14:16 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Marcelo Ricardo Leitner, Van Haaren, Harry, dev, Shahaf Shuler,
	Nelio Laranjeiro

05/02/2018 14:44, Adrien Mazarguil:
> On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> > On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > > Sent: Monday, February 5, 2018 12:14 PM
> > > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > > libraries
> > > > 
> > > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > > >
> > > > > > >  # Library name.
> > > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > > >
> > > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > > Ideally, you should retrieve it from rte_version.h.
> > > > >
> > > > > Keep in mind this only needs to be updated when the glue API gets
> > > > modified,
> > > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > > >
> > > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > > different sets of Verbs calls and thus a different version, hence the
> > > > added
> > > > > "18.02" as a starting point. The last digit may have to be modified
> > > > possibly
> > > > > several times between official DPDK releases while work is being done on
> > > > the
> > > > > PMD (i.e. per commit).
> > > > >
> > > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > > really think it should, doing so will make things more complex in the
> > > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > > 
> > > > What about appending date +%s output to it? It would be stricter and
> > > > automated.
> > > 
> > > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > > generally not recommended.
> > 
> > Then the sha1sum of mlx4_glue.h.
> > With this the size check I mentioned on the other patch would become
> > redundant and unnecessary.
> 
> Using a strong hash algorithm to version a library/symbol, while possible,
> seems a bit overkill and results in ugliness:
> 
>  librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3
> 
> Using a weak one like CRC32 for a shorter name poses a risk of
> collision. Moreover the next time someone decides to update all version
> notices or modify a comment will impact that hash. We'd need to isolate the
> symbol definition itself, ignore parameter names in function prototypes and
> only then we may get a somewhat meaningful hash describing a given ABI.
> 
> Given the added complexity, is there really a problem with simple version
> numbers we increment every time something gets modified? (Note this is
> already how our .map files work, they're not generated automatically)

Our map files show the major version where a symbol was introduced.
It is simple because no symbol can be introduced in a minor version.

> How about keeping things as is?

You are using 18.02.1 while it is introduced in 18.02.0.
If you don't want to correlate the .so version number with DPDK version
number, maybe that 1, 2, 3 would be a simpler choice (less confusing).

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  @ 2018-02-05 13:44  3%               ` Adrien Mazarguil
  2018-02-05 14:16  0%                 ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-02-05 13:44 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Van Haaren, Harry, Thomas Monjalon, dev, Shahaf Shuler, Nelio Laranjeiro

On Mon, Feb 05, 2018 at 10:58:06AM -0200, Marcelo Ricardo Leitner wrote:
> On Mon, Feb 05, 2018 at 12:24:23PM +0000, Van Haaren, Harry wrote:
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marcelo Ricardo Leitner
> > > Sent: Monday, February 5, 2018 12:14 PM
> > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org; Shahaf Shuler
> > > <shahafs@mellanox.com>; Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue
> > > libraries
> > > 
> > > On Mon, Feb 05, 2018 at 12:24:02PM +0100, Adrien Mazarguil wrote:
> > > > On Sun, Feb 04, 2018 at 03:29:38PM +0100, Thomas Monjalon wrote:
> > > > > 02/02/2018 17:46, Adrien Mazarguil:
> > > > > > --- a/drivers/net/mlx4/Makefile
> > > > > > +++ b/drivers/net/mlx4/Makefile
> > > > > > @@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > > > >
> > > > > >  # Library name.
> > > > > >  LIB = librte_pmd_mlx4.a
> > > > > > -LIB_GLUE = librte_pmd_mlx4_glue.so
> > > > > > +LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> > > > > > +LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
> > > > > > +LIB_GLUE_VERSION = 18.02.1
> > > > >
> > > > > You should use the version number of the release, i.e. 18.02.0
> > > > > Ideally, you should retrieve it from rte_version.h.
> > > >
> > > > Keep in mind this only needs to be updated when the glue API gets
> > > modified,
> > > > and this "18.02.1" string may remain unmodified for subsequent DPDK
> > > > releases, probably as long as the PMD doesn't use any new rdma-core calls.
> > > >
> > > > We've already backported this patch to 17.02 and 17.11, both requiring
> > > > different sets of Verbs calls and thus a different version, hence the
> > > added
> > > > "18.02" as a starting point. The last digit may have to be modified
> > > possibly
> > > > several times between official DPDK releases while work is being done on
> > > the
> > > > PMD (i.e. per commit).
> > > >
> > > > In short it's not meant to follow DPDK's public versioning scheme. If you
> > > > really think it should, doing so will make things more complex in the
> > > > Makefile, which will have to parse rte_version.h. What's your opinion?
> > > 
> > > What about appending date +%s output to it? It would be stricter and
> > > automated.
> > 
> > Adding current timestamp or date into a build breaks reproducibility of builds, so is
> > generally not recommended.
> 
> Then the sha1sum of mlx4_glue.h.
> With this the size check I mentioned on the other patch would become
> redundant and unnecessary.

Using a strong hash algorithm to version a library/symbol, while possible,
seems a bit overkill and results in ugliness:

 librte_pmd_mlx4.so.c4ca4eaf2fe975ead83453458f4f56db49e724f3

Using a weak one like CRC32 for a shorter name poses a risk of
collision. Moreover the next time someone decides to update all version
notices or modify a comment will impact that hash. We'd need to isolate the
symbol definition itself, ignore parameter names in function prototypes and
only then we may get a somewhat meaningful hash describing a given ABI.

Given the added complexity, is there really a problem with simple version
numbers we increment every time something gets modified? (Note this is
already how our .map files work, they're not generated automatically)

How about keeping things as is?

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 2/8] vhost: avoid enum fields in VhostUserMsg
  @ 2018-02-05 12:16  3% ` Stefan Hajnoczi
  2018-02-06  9:47  0%   ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Stefan Hajnoczi @ 2018-02-05 12:16 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Yuanhan Liu, Stefan Hajnoczi

The VhostUserMsg struct binary representation must match the vhost-user
protocol specification since this struct is read from and written to the
socket.

The VhostUserMsg.request union contains enum fields.  Enum binary
representation is implementation-defined according to the C standard and
it is unportable to make assumptions about the representation:

  6.7.2.2 Enumeration specifiers
  ...
  Each enumerated type shall be compatible with char, a signed integer
  type, or an unsigned integer type. The choice of type is
  implementation-defined, but shall be capable of representing the
  values of all the members of the enumeration.

Additionally, librte_vhost relies on the enum type being unsigned when
validating untrusted inputs:

  if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {

If msg.request.master is signed then negative values pass this check!

Even if we assume gcc on x86_64 (SysV amd64 ABI) and don't care about
portability, the actual enum constants still affect the final type.  For
example, if we add a negative constant then its type changes to signed
int:

  typedef enum VhostUserRequest {
      ...
      VHOST_USER_INVALID = -1,
  };

This is very fragile and it's unlikely that anyone changing the code
would remember this.  A security hole can be introduced accidentally.

This patch switches VhostUserMsg.request fields to uint32_t to avoid the
portability and potential security issues.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost_user.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index d4bd604b9..0fafbe6e0 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -81,8 +81,8 @@ typedef struct VhostUserLog {
 
 typedef struct VhostUserMsg {
 	union {
-		VhostUserRequest master;
-		VhostUserSlaveRequest slave;
+		uint32_t master; /* a VhostUserRequest value */
+		uint32_t slave;  /* a VhostUserSlaveRequest value*/
 	} request;
 
 #define VHOST_USER_VERSION_MASK     0x3
-- 
2.14.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory hotplug changes
  @ 2018-02-05 11:47  0%   ` Bruce Richardson
  2018-02-07 10:11  0%     ` Jerin Jacob
  2018-02-12 15:58  0%   ` Jonas Pfefferle
  2018-02-13  0:24  0%   ` Yongseok Koh
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-02-05 11:47 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Neil Horman, John McNamara, Marko Kovacevic

On Thu, Jan 18, 2018 at 10:32:28AM +0000, Anatoly Burakov wrote:
> Due to coming changes outlined in memory hotplug RFC, there will
> be several API/ABI changes.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] checkpatches.sh: Add checks for ABI symbol addition
  2018-01-31 17:27  6% ` [dpdk-dev] [PATCH v3] " Neil Horman
@ 2018-02-04 14:44  7%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-04 14:44 UTC (permalink / raw)
  To: Neil Horman
  Cc: dev, john.mcnamara, bruce.richardson, Ferruh Yigit, Stephen Hemminger

31/01/2018 18:27, Neil Horman:
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -42,6 +42,7 @@ F: doc/
>  
>  Developers and Maintainers Tools
>  M: Thomas Monjalon <thomas@monjalon.net>
> +M: Neil Horman <nhorman@tuxdriver.com>
>  F: MAINTAINERS
>  F: devtools/check-dup-includes.sh
>  F: devtools/check-maintainers.sh

You don't need to add your name in this general section.
The new file is now in "ABI versioning" section that you already maintain.

Talking about maintenance, you are welcome to put your name
into "Driver information" section :)

> @@ -86,6 +87,7 @@ M: Neil Horman <nhorman@tuxdriver.com>
>  F: lib/librte_compat/
>  F: doc/guides/rel_notes/deprecation.rst
>  F: devtools/validate-abi.sh
> +F: devtools/check-symbol-change.sh
>  F: buildtools/check-experimental-syms.sh

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH] doc: annouce ABI change for RSS configuraiton structure
@ 2018-02-04  7:24  4% Xueming Li
  2018-02-06  7:38  4% ` [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure Xueming Li
  0 siblings, 1 reply; 200+ results
From: Xueming Li @ 2018-02-04  7:24 UTC (permalink / raw)
  To: Thomas Monjalon, Neil Horman; +Cc: Xueming Li, dev, Shahaf Shuler

Update deprecation notice for the new level field of rte_eth_rss_conf.

Link: http://www.dpdk.org/dev/patchwork/patch/31891

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad5988..cdb7f6ba2 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -59,3 +59,7 @@ Deprecation Notices
   be added between the producer and consumer structures. The size of the
   structure and the offset of the fields will remain the same on
   platforms with 64B cache line, but will change on other platforms.
+  
+* ethdev: A new rss level field planned in 18.05.
+  The new API add level field to ``rte_eth_rss_conf`` to enable a choice
+  of RSS hash calculation on outer or inner header of tunneled packet.
-- 
2.13.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration
  2018-02-02 16:46  3% ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
  2018-02-02 16:46  3%   ` [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
@ 2018-02-02 16:52  0%   ` Nélio Laranjeiro
  2018-02-06 11:31  0%   ` Shahaf Shuler
  2 siblings, 0 replies; 200+ results
From: Nélio Laranjeiro @ 2018-02-02 16:52 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Shahaf Shuler, dev, Marcelo Ricardo Leitner

On Fri, Feb 02, 2018 at 05:46:10PM +0100, Adrien Mazarguil wrote:
> The decision to deliver mlx4/mlx5 rdma-core glue plug-ins separately instead
> of generating them at run time due to security concerns [1] led to a few
> issues:
> 
> - They must be present on the file system before running DPDK.
> - Their location must be known to the dynamic linker.
> - Their names overlap and ABI compatibility is not guaranteed, which may
>   lead to crashes.
> 
> This series addresses the above by adding version information to plug-ins
> and taking CONFIG_RTE_EAL_PMD_PATH into account to locate them on the file
> system.
> 
> [1] http://dpdk.org/ml/archives/dev/2018-January/089617.html
> 
> v2 changes:
> 
> - Fixed extra "\n" in glue file name generation (although it didn't break
>   functionality).
> 
> Adrien Mazarguil (4):
>   net/mlx: add debug checks to glue structure
>   net/mlx: fix missing includes for rdma-core glue
>   net/mlx: version rdma-core glue libraries
>   net/mlx: make rdma-core glue path configurable
> 
>  doc/guides/nics/mlx4.rst     | 17 ++++++++++++
>  doc/guides/nics/mlx5.rst     | 14 ++++++++++
>  drivers/net/mlx4/Makefile    |  8 ++++--
>  drivers/net/mlx4/mlx4.c      | 57 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx4/mlx4_glue.c |  4 +++
>  drivers/net/mlx4/mlx4_glue.h |  9 +++++++
>  drivers/net/mlx5/Makefile    |  8 ++++--
>  drivers/net/mlx5/mlx5.c      | 57 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_glue.c |  1 +
>  drivers/net/mlx5/mlx5_glue.h |  7 +++++
>  10 files changed, 176 insertions(+), 6 deletions(-)
> 
> -- 
> 2.11.0

For the series,

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries
  2018-02-02 16:46  3% ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
@ 2018-02-02 16:46  3%   ` Adrien Mazarguil
    2018-02-02 16:52  0%   ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Nélio Laranjeiro
  2018-02-06 11:31  0%   ` Shahaf Shuler
  2 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-02-02 16:46 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, dev, Marcelo Ricardo Leitner

When built as separate objects, these libraries do not have unique names.
Since they do not maintain a stable ABI, loading an incompatible library
may result in a crash (e.g. in case multiple versions are installed).

This patch addresses the above by versioning glue libraries, both on the
file system (version suffix) and by comparing a dedicated version field
member in glue structures.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    | 8 ++++++--
 drivers/net/mlx4/mlx4.c      | 5 +++++
 drivers/net/mlx4/mlx4_glue.c | 1 +
 drivers/net/mlx4/mlx4_glue.h | 6 ++++++
 drivers/net/mlx5/Makefile    | 8 ++++++--
 drivers/net/mlx5/mlx5.c      | 5 +++++
 drivers/net/mlx5/mlx5_glue.c | 1 +
 drivers/net/mlx5/mlx5_glue.h | 6 ++++++
 8 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index c004ac71c..cc9db9977 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx4.a
-LIB_GLUE = librte_pmd_mlx4_glue.so
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
+LIB_GLUE_VERSION = 18.02.1
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
@@ -64,6 +66,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 ifeq ($(CONFIG_RTE_LIBRTE_MLX4_DLOPEN_DEPS),y)
 CFLAGS += -DMLX4_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX4_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
 CFLAGS_mlx4_glue.o += -fPIC
 LDLIBS += -ldl
 else
@@ -131,6 +134,7 @@ $(LIB): $(LIB_GLUE)
 
 $(LIB_GLUE): mlx4_glue.o
 	$Q $(LD) $(LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
 		-s -shared -o $@ $< -libverbs -lmlx4
 
 mlx4_glue.o: mlx4_autoconf.h
@@ -139,6 +143,6 @@ endif
 
 clean_mlx4: FORCE
 	$Q rm -f -- mlx4_autoconf.h mlx4_autoconf.h.new
-	$Q rm -f -- mlx4_glue.o $(LIB_GLUE)
+	$Q rm -f -- mlx4_glue.o $(LIB_GLUE_BASE)*
 
 clean: clean_mlx4
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 201d39b6e..61a852fb9 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -808,6 +808,11 @@ rte_mlx4_pmd_init(void)
 			assert(((const void *const *)mlx4_glue)[i]);
 	}
 #endif
+	if (strcmp(mlx4_glue->version, MLX4_GLUE_VERSION)) {
+		ERROR("rdma-core glue \"%s\" mismatch: \"%s\" is required",
+		      mlx4_glue->version, MLX4_GLUE_VERSION);
+		return;
+	}
 	mlx4_glue->fork_init();
 	rte_pci_register(&mlx4_driver);
 }
diff --git a/drivers/net/mlx4/mlx4_glue.c b/drivers/net/mlx4/mlx4_glue.c
index 47ae7ad0f..3b79d320e 100644
--- a/drivers/net/mlx4/mlx4_glue.c
+++ b/drivers/net/mlx4/mlx4_glue.c
@@ -240,6 +240,7 @@ mlx4_glue_dv_set_context_attr(struct ibv_context *context,
 }
 
 const struct mlx4_glue *mlx4_glue = &(const struct mlx4_glue){
+	.version = MLX4_GLUE_VERSION,
 	.fork_init = mlx4_glue_fork_init,
 	.get_async_event = mlx4_glue_get_async_event,
 	.ack_async_event = mlx4_glue_ack_async_event,
diff --git a/drivers/net/mlx4/mlx4_glue.h b/drivers/net/mlx4/mlx4_glue.h
index de251c622..368f906bf 100644
--- a/drivers/net/mlx4/mlx4_glue.h
+++ b/drivers/net/mlx4/mlx4_glue.h
@@ -19,7 +19,13 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#ifndef MLX4_GLUE_VERSION
+#define MLX4_GLUE_VERSION ""
+#endif
+
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx4_glue {
+	const char *version;
 	int (*fork_init)(void);
 	int (*get_async_event)(struct ibv_context *context,
 			       struct ibv_async_event *event);
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 4b20d718b..4086f2039 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx5.a
-LIB_GLUE = librte_pmd_mlx5_glue.so
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
+LIB_GLUE_VERSION = 18.02.1
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
@@ -74,6 +76,7 @@ CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
 CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
 CFLAGS_mlx5_glue.o += -fPIC
 LDLIBS += -ldl
 else
@@ -180,6 +183,7 @@ $(LIB): $(LIB_GLUE)
 
 $(LIB_GLUE): mlx5_glue.o
 	$Q $(LD) $(LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
 		-s -shared -o $@ $< -libverbs -lmlx5
 
 mlx5_glue.o: mlx5_autoconf.h
@@ -188,6 +192,6 @@ endif
 
 clean_mlx5: FORCE
 	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
-	$Q rm -f -- mlx5_glue.o $(LIB_GLUE)
+	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
 
 clean: clean_mlx5
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 050cfac0d..341230d2b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1151,6 +1151,11 @@ rte_mlx5_pmd_init(void)
 			assert(((const void *const *)mlx5_glue)[i]);
 	}
 #endif
+	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
+		ERROR("rdma-core glue \"%s\" mismatch: \"%s\" is required",
+		      mlx5_glue->version, MLX5_GLUE_VERSION);
+		return;
+	}
 	mlx5_glue->fork_init();
 	rte_pci_register(&mlx5_driver);
 }
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index 8f500be6e..1c4396ada 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -308,6 +308,7 @@ mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
 }
 
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
+	.version = MLX5_GLUE_VERSION,
 	.fork_init = mlx5_glue_fork_init,
 	.alloc_pd = mlx5_glue_alloc_pd,
 	.dealloc_pd = mlx5_glue_dealloc_pd,
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 7fed302ba..b5efee3b6 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -19,6 +19,10 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#ifndef MLX5_GLUE_VERSION
+#define MLX5_GLUE_VERSION ""
+#endif
+
 #ifndef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT
 struct ibv_counter_set;
 struct ibv_counter_set_data;
@@ -27,7 +31,9 @@ struct ibv_counter_set_init_attr;
 struct ibv_query_counter_set_attr;
 #endif
 
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
+	const char *version;
 	int (*fork_init)(void);
 	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
 	int (*dealloc_pd)(struct ibv_pd *pd);
-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration
  2018-02-02 15:16  3% [dpdk-dev] [PATCH v1 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
  2018-02-02 15:16  3% ` [dpdk-dev] [PATCH v1 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
@ 2018-02-02 16:46  3% ` Adrien Mazarguil
  2018-02-02 16:46  3%   ` [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
                     ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-02 16:46 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, dev, Marcelo Ricardo Leitner

The decision to deliver mlx4/mlx5 rdma-core glue plug-ins separately instead
of generating them at run time due to security concerns [1] led to a few
issues:

- They must be present on the file system before running DPDK.
- Their location must be known to the dynamic linker.
- Their names overlap and ABI compatibility is not guaranteed, which may
  lead to crashes.

This series addresses the above by adding version information to plug-ins
and taking CONFIG_RTE_EAL_PMD_PATH into account to locate them on the file
system.

[1] http://dpdk.org/ml/archives/dev/2018-January/089617.html

v2 changes:

- Fixed extra "\n" in glue file name generation (although it didn't break
  functionality).

Adrien Mazarguil (4):
  net/mlx: add debug checks to glue structure
  net/mlx: fix missing includes for rdma-core glue
  net/mlx: version rdma-core glue libraries
  net/mlx: make rdma-core glue path configurable

 doc/guides/nics/mlx4.rst     | 17 ++++++++++++
 doc/guides/nics/mlx5.rst     | 14 ++++++++++
 drivers/net/mlx4/Makefile    |  8 ++++--
 drivers/net/mlx4/mlx4.c      | 57 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx4/mlx4_glue.c |  4 +++
 drivers/net/mlx4/mlx4_glue.h |  9 +++++++
 drivers/net/mlx5/Makefile    |  8 ++++--
 drivers/net/mlx5/mlx5.c      | 57 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_glue.c |  1 +
 drivers/net/mlx5/mlx5_glue.h |  7 +++++
 10 files changed, 176 insertions(+), 6 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 3/4] net/mlx: version rdma-core glue libraries
  2018-02-02 15:16  3% [dpdk-dev] [PATCH v1 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
@ 2018-02-02 15:16  3% ` Adrien Mazarguil
  2018-02-02 16:46  3% ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
  1 sibling, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-02 15:16 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, dev, Marcelo Ricardo Leitner

When built as separate objects, these libraries do not have unique names.
Since they do not maintain a stable ABI, loading an incompatible library
may result in a crash (e.g. in case multiple versions are installed).

This patch addresses the above by versioning glue libraries, both on the
file system (version suffix) and by comparing a dedicated version field
member in glue structures.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    | 8 ++++++--
 drivers/net/mlx4/mlx4.c      | 5 +++++
 drivers/net/mlx4/mlx4_glue.c | 1 +
 drivers/net/mlx4/mlx4_glue.h | 6 ++++++
 drivers/net/mlx5/Makefile    | 8 ++++++--
 drivers/net/mlx5/mlx5.c      | 5 +++++
 drivers/net/mlx5/mlx5_glue.c | 1 +
 drivers/net/mlx5/mlx5_glue.h | 6 ++++++
 8 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index c004ac71c..cc9db9977 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx4.a
-LIB_GLUE = librte_pmd_mlx4_glue.so
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx4_glue.so
+LIB_GLUE_VERSION = 18.02.1
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
@@ -64,6 +66,7 @@ CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 ifeq ($(CONFIG_RTE_LIBRTE_MLX4_DLOPEN_DEPS),y)
 CFLAGS += -DMLX4_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX4_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
 CFLAGS_mlx4_glue.o += -fPIC
 LDLIBS += -ldl
 else
@@ -131,6 +134,7 @@ $(LIB): $(LIB_GLUE)
 
 $(LIB_GLUE): mlx4_glue.o
 	$Q $(LD) $(LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
 		-s -shared -o $@ $< -libverbs -lmlx4
 
 mlx4_glue.o: mlx4_autoconf.h
@@ -139,6 +143,6 @@ endif
 
 clean_mlx4: FORCE
 	$Q rm -f -- mlx4_autoconf.h mlx4_autoconf.h.new
-	$Q rm -f -- mlx4_glue.o $(LIB_GLUE)
+	$Q rm -f -- mlx4_glue.o $(LIB_GLUE_BASE)*
 
 clean: clean_mlx4
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 201d39b6e..61a852fb9 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -808,6 +808,11 @@ rte_mlx4_pmd_init(void)
 			assert(((const void *const *)mlx4_glue)[i]);
 	}
 #endif
+	if (strcmp(mlx4_glue->version, MLX4_GLUE_VERSION)) {
+		ERROR("rdma-core glue \"%s\" mismatch: \"%s\" is required",
+		      mlx4_glue->version, MLX4_GLUE_VERSION);
+		return;
+	}
 	mlx4_glue->fork_init();
 	rte_pci_register(&mlx4_driver);
 }
diff --git a/drivers/net/mlx4/mlx4_glue.c b/drivers/net/mlx4/mlx4_glue.c
index 47ae7ad0f..3b79d320e 100644
--- a/drivers/net/mlx4/mlx4_glue.c
+++ b/drivers/net/mlx4/mlx4_glue.c
@@ -240,6 +240,7 @@ mlx4_glue_dv_set_context_attr(struct ibv_context *context,
 }
 
 const struct mlx4_glue *mlx4_glue = &(const struct mlx4_glue){
+	.version = MLX4_GLUE_VERSION,
 	.fork_init = mlx4_glue_fork_init,
 	.get_async_event = mlx4_glue_get_async_event,
 	.ack_async_event = mlx4_glue_ack_async_event,
diff --git a/drivers/net/mlx4/mlx4_glue.h b/drivers/net/mlx4/mlx4_glue.h
index de251c622..368f906bf 100644
--- a/drivers/net/mlx4/mlx4_glue.h
+++ b/drivers/net/mlx4/mlx4_glue.h
@@ -19,7 +19,13 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#ifndef MLX4_GLUE_VERSION
+#define MLX4_GLUE_VERSION ""
+#endif
+
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx4_glue {
+	const char *version;
 	int (*fork_init)(void);
 	int (*get_async_event)(struct ibv_context *context,
 			       struct ibv_async_event *event);
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 4b20d718b..4086f2039 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -33,7 +33,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 # Library name.
 LIB = librte_pmd_mlx5.a
-LIB_GLUE = librte_pmd_mlx5_glue.so
+LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
+LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
+LIB_GLUE_VERSION = 18.02.1
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
@@ -74,6 +76,7 @@ CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
 ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS),y)
 CFLAGS += -DMLX5_GLUE='"$(LIB_GLUE)"'
+CFLAGS += -DMLX5_GLUE_VERSION='"$(LIB_GLUE_VERSION)"'
 CFLAGS_mlx5_glue.o += -fPIC
 LDLIBS += -ldl
 else
@@ -180,6 +183,7 @@ $(LIB): $(LIB_GLUE)
 
 $(LIB_GLUE): mlx5_glue.o
 	$Q $(LD) $(LDFLAGS) $(EXTRA_LDFLAGS) \
+		-Wl,-h,$(LIB_GLUE) \
 		-s -shared -o $@ $< -libverbs -lmlx5
 
 mlx5_glue.o: mlx5_autoconf.h
@@ -188,6 +192,6 @@ endif
 
 clean_mlx5: FORCE
 	$Q rm -f -- mlx5_autoconf.h mlx5_autoconf.h.new
-	$Q rm -f -- mlx5_glue.o $(LIB_GLUE)
+	$Q rm -f -- mlx5_glue.o $(LIB_GLUE_BASE)*
 
 clean: clean_mlx5
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 050cfac0d..341230d2b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1151,6 +1151,11 @@ rte_mlx5_pmd_init(void)
 			assert(((const void *const *)mlx5_glue)[i]);
 	}
 #endif
+	if (strcmp(mlx5_glue->version, MLX5_GLUE_VERSION)) {
+		ERROR("rdma-core glue \"%s\" mismatch: \"%s\" is required",
+		      mlx5_glue->version, MLX5_GLUE_VERSION);
+		return;
+	}
 	mlx5_glue->fork_init();
 	rte_pci_register(&mlx5_driver);
 }
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index 8f500be6e..1c4396ada 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -308,6 +308,7 @@ mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
 }
 
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
+	.version = MLX5_GLUE_VERSION,
 	.fork_init = mlx5_glue_fork_init,
 	.alloc_pd = mlx5_glue_alloc_pd,
 	.dealloc_pd = mlx5_glue_dealloc_pd,
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index 7fed302ba..b5efee3b6 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -19,6 +19,10 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#ifndef MLX5_GLUE_VERSION
+#define MLX5_GLUE_VERSION ""
+#endif
+
 #ifndef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT
 struct ibv_counter_set;
 struct ibv_counter_set_data;
@@ -27,7 +31,9 @@ struct ibv_counter_set_init_attr;
 struct ibv_query_counter_set_attr;
 #endif
 
+/* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
+	const char *version;
 	int (*fork_init)(void);
 	struct ibv_pd *(*alloc_pd)(struct ibv_context *context);
 	int (*dealloc_pd)(struct ibv_pd *pd);
-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 0/4] net/mlx: enhance rdma-core glue configuration
@ 2018-02-02 15:16  3% Adrien Mazarguil
  2018-02-02 15:16  3% ` [dpdk-dev] [PATCH v1 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
  2018-02-02 16:46  3% ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
  0 siblings, 2 replies; 200+ results
From: Adrien Mazarguil @ 2018-02-02 15:16 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Nelio Laranjeiro, dev, Marcelo Ricardo Leitner

The decision to deliver mlx4/mlx5 rdma-core glue plug-ins separately instead
of generating them at run time due to security concerns [1] led to a few
issues:

- They must be present on the file system before running DPDK.
- Their location must be known to the dynamic linker.
- Their names overlap and ABI compatibility is not guaranteed, which may
  lead to crashes.

This series addresses the above by adding version information to plug-ins
and taking CONFIG_RTE_EAL_PMD_PATH into account to locate them on the file
system.

[1] http://dpdk.org/ml/archives/dev/2018-January/089617.html

Adrien Mazarguil (4):
  net/mlx: add debug checks to glue structure
  net/mlx: fix missing includes for rdma-core glue
  net/mlx: version rdma-core glue libraries
  net/mlx: make rdma-core glue path configurable

 doc/guides/nics/mlx4.rst     | 17 ++++++++++++
 doc/guides/nics/mlx5.rst     | 14 ++++++++++
 drivers/net/mlx4/Makefile    |  8 ++++--
 drivers/net/mlx4/mlx4.c      | 57 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx4/mlx4_glue.c |  4 +++
 drivers/net/mlx4/mlx4_glue.h |  9 +++++++
 drivers/net/mlx5/Makefile    |  8 ++++--
 drivers/net/mlx5/mlx5.c      | 57 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_glue.c |  1 +
 drivers/net/mlx5/mlx5_glue.h |  7 +++++
 10 files changed, 176 insertions(+), 6 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] doc: remove eal API for default mempool ops name
  2018-02-02  8:31 10%   ` [dpdk-dev] [PATCH v2] " Hemant Agrawal
@ 2018-02-02 14:01  0%     ` Olivier Matz
  2018-02-13 11:28  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-02-02 14:01 UTC (permalink / raw)
  To: Hemant Agrawal
  Cc: thomas, pbhagavatula, nipun.gupta, jerin.jacob, santosh.shukla, dev

On Fri, Feb 02, 2018 at 02:01:42PM +0530, Hemant Agrawal wrote:
> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
> v2: fix checkpatch errors
> 
>  doc/guides/rel_notes/deprecation.rst | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d59ad59..c7d8f25 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,15 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>  
> +* eal: a new set of mbuf mempool ops name APIs for user, platform and best
> +  mempool names have been defined in ``rte_mbuf`` in v18.02. The uses of
> +  ``rte_eal_mbuf_default_mempool_ops`` shall be replaced by
> +  ``rte_mbuf_best_mempool_ops``.
> +  The following function is now redundant and it is target to be deprecated in
> +  18.05:
> +
> +  - ``rte_eal_mbuf_default_mempool_ops``
> +
>  * eal: several API and ABI changes are planned for ``rte_devargs`` in v18.02.
>    The format of device command line parameters will change. The bus will need
>    to be explicitly stated in the device declaration. The enum ``rte_devtype``

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
  2018-02-02  9:07  4%             ` De Lara Guarch, Pablo
@ 2018-02-02 10:52  4%               ` Verma, Shally
  0 siblings, 0 replies; 200+ results
From: Verma, Shally @ 2018-02-02 10:52 UTC (permalink / raw)
  To: De Lara Guarch, Pablo, Akhil Goyal, Trahe, Fiona, hemant.agrawal,
	Doherty, Declan, Griffin, John, Jain, Deepak K, jck, tdu, dima,
	nsamsono, jianbo.liu, Jacob,  Jerin, Athreya, Narayana Prasad,
	Murthy, Nidadavolu
  Cc: dev



>-----Original Message-----
>From: De Lara Guarch, Pablo [mailto:pablo.de.lara.guarch@intel.com]
>Sent: 02 February 2018 14:38
>To: Verma, Shally <Shally.Verma@cavium.com>; Akhil Goyal <akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>; Jain, Deepak K
><deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
>jianbo.liu@arm.com; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
><NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu <Nidadavolu.Murthy@cavium.com>
>Cc: dev@dpdk.org
>Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
>
>Hi Shally,
>
>> -----Original Message-----
>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>> Sent: Tuesday, January 30, 2018 11:54 AM
>> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Akhil Goyal
>> <akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>> hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>;
>> Griffin, John <john.griffin@intel.com>; Jain, Deepak K
>> <deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com;
>> dima@marvell.com; nsamsono@marvell.com; jianbo.liu@arm.com; Jacob,
>> Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
>> <NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
>> <Nidadavolu.Murthy@cavium.com>
>> Cc: dev@dpdk.org
>> Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info
>> struct
>>
>>
>>
>> >-----Original Message-----
>> >From: De Lara Guarch, Pablo [mailto:pablo.de.lara.guarch@intel.com]
>> >Sent: 30 January 2018 16:51
>> >To: Verma, Shally <Shally.Verma@cavium.com>; Akhil Goyal
>> ><akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>> >hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>;
>> >Griffin, John <john.griffin@intel.com>; Jain, Deepak K
>> ><deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com;
>> >dima@marvell.com; nsamsono@marvell.com; jianbo.liu@arm.com; Jacob,
>> >Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
>> ><NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
>> ><Nidadavolu.Murthy@cavium.com>
>> >Cc: dev@dpdk.org
>> >Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for crypto
>> >info struct
>> >
>> >Hi Shally/Ahkil,
>> >
>> >> -----Original Message-----
>> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Verma, Shally
>> >> Sent: Tuesday, January 30, 2018 7:56 AM
>> >> To: Akhil Goyal <akhil.goyal@nxp.com>; De Lara Guarch, Pablo
>> >> <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
>> >> <fiona.trahe@intel.com>; hemant.agrawal@nxp.com; Doherty, Declan
>> >> <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>;
>> >> Jain, Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
>> >> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
>> >> jianbo.liu@arm.com; Jacob, Jerin
>> >> <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
>> >> <NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
>> >> <Nidadavolu.Murthy@cavium.com>
>> >> Cc: dev@dpdk.org
>> >> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto
>> >> info struct
>> >>
>> >> I do see current cryptodev unit testcase (inside \test dir) uses
>> >> info.sym.max_nb_sessions param for session mempool_create. So, such
>> >> testcases change are also in proposal?
>> >
>> >Yes, for these tests, we can just define a macro in the tests, instead of
>> using the info structure.
>>
>> [Shally] Ok, then you mean applications will choose any random number
>> during mempool_create and not dependent on device max_nb_sessions?
>
>Yes, actually for the unit tests, even one session is enough.
>
>>
>> >>
>> >> Another point, we recently submitted an RFC patch on lib/cryptodev
>> >> with asymmetric crypto support
>> >> (https://dpdk.org/dev/patchwork/patch/34308/) which is awaiting
>> >> review and these fields have role to play there.
>> >> So, could this change be please viewed in conjunction with asym RFC?
>> >
>> >Do you need it for asymmetric? Anyway, this would remove the
>> symmetric function and structures, not applicable for you.
>>
>> [Shally] I would say addition of asym in lib/cryptodev is not entirely
>> standalone, specifically for PMDs that can support both.
>> My key concern are max_nb_sessions_per_qp and related
>> qp_attach_sym/asym APIs which enable management of queue distribution
>> among sym and asym in current proposal, specifically, for PMDs that can
>> support both but have dedicated qp for each. Right now proposal is open
>> for feedback and would prefer to be covered before sym related changes
>> could be applied.
>
>Actually, I have been thinking about this. Given the time we have until 18.02 is out,
>and that this is not urgent to be applied (this is just code cleanup),
>I am postponing this until next release.
>
[Shally] Ok. Thanks for acknowledging this.

>My other reason is that the info structure has a rte_pci_device pointer which should be removed.
>However, I believe it is better to leave it for next release and discuss it with other libraries which has this, like ethdev.
>
>Thanks,
>Pablo


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
  2018-01-30 11:53  4%           ` Verma, Shally
@ 2018-02-02  9:07  4%             ` De Lara Guarch, Pablo
  2018-02-02 10:52  4%               ` Verma, Shally
  0 siblings, 1 reply; 200+ results
From: De Lara Guarch, Pablo @ 2018-02-02  9:07 UTC (permalink / raw)
  To: Verma, Shally, Akhil Goyal, Trahe, Fiona, hemant.agrawal,
	Doherty, Declan, Griffin, John, Jain, Deepak K, jck, tdu, dima,
	nsamsono, jianbo.liu, Jacob,  Jerin, Athreya, Narayana Prasad,
	Murthy, Nidadavolu
  Cc: dev

Hi Shally,

> -----Original Message-----
> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
> Sent: Tuesday, January 30, 2018 11:54 AM
> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Akhil Goyal
> <akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
> hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>;
> Griffin, John <john.griffin@intel.com>; Jain, Deepak K
> <deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com;
> dima@marvell.com; nsamsono@marvell.com; jianbo.liu@arm.com; Jacob,
> Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
> <NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
> <Nidadavolu.Murthy@cavium.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info
> struct
> 
> 
> 
> >-----Original Message-----
> >From: De Lara Guarch, Pablo [mailto:pablo.de.lara.guarch@intel.com]
> >Sent: 30 January 2018 16:51
> >To: Verma, Shally <Shally.Verma@cavium.com>; Akhil Goyal
> ><akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
> >hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>;
> >Griffin, John <john.griffin@intel.com>; Jain, Deepak K
> ><deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com;
> >dima@marvell.com; nsamsono@marvell.com; jianbo.liu@arm.com; Jacob,
> >Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
> ><NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
> ><Nidadavolu.Murthy@cavium.com>
> >Cc: dev@dpdk.org
> >Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for crypto
> >info struct
> >
> >Hi Shally/Ahkil,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Verma, Shally
> >> Sent: Tuesday, January 30, 2018 7:56 AM
> >> To: Akhil Goyal <akhil.goyal@nxp.com>; De Lara Guarch, Pablo
> >> <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> >> <fiona.trahe@intel.com>; hemant.agrawal@nxp.com; Doherty, Declan
> >> <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>;
> >> Jain, Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
> >> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
> >> jianbo.liu@arm.com; Jacob, Jerin
> >> <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
> >> <NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
> >> <Nidadavolu.Murthy@cavium.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto
> >> info struct
> >>
> >> I do see current cryptodev unit testcase (inside \test dir) uses
> >> info.sym.max_nb_sessions param for session mempool_create. So, such
> >> testcases change are also in proposal?
> >
> >Yes, for these tests, we can just define a macro in the tests, instead of
> using the info structure.
> 
> [Shally] Ok, then you mean applications will choose any random number
> during mempool_create and not dependent on device max_nb_sessions?

Yes, actually for the unit tests, even one session is enough.

> 
> >>
> >> Another point, we recently submitted an RFC patch on lib/cryptodev
> >> with asymmetric crypto support
> >> (https://dpdk.org/dev/patchwork/patch/34308/) which is awaiting
> >> review and these fields have role to play there.
> >> So, could this change be please viewed in conjunction with asym RFC?
> >
> >Do you need it for asymmetric? Anyway, this would remove the
> symmetric function and structures, not applicable for you.
> 
> [Shally] I would say addition of asym in lib/cryptodev is not entirely
> standalone, specifically for PMDs that can support both.
> My key concern are max_nb_sessions_per_qp and related
> qp_attach_sym/asym APIs which enable management of queue distribution
> among sym and asym in current proposal, specifically, for PMDs that can
> support both but have dedicated qp for each. Right now proposal is open
> for feedback and would prefer to be covered before sym related changes
> could be applied.

Actually, I have been thinking about this. Given the time we have until 18.02 is out,
and that this is not urgent to be applied (this is just code cleanup),
I am postponing this until next release. 

My other reason is that the info structure has a rte_pci_device pointer which should be removed.
However, I believe it is better to leave it for next release and discuss it with other libraries which has this, like ethdev.

Thanks,
Pablo


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: remove eal API for default mempool ops name
  2018-02-02  8:03 10% ` [dpdk-dev] [PATCH] doc: remove eal API for default mempool ops name Hemant Agrawal
@ 2018-02-02  8:31 10%   ` Hemant Agrawal
  2018-02-02 14:01  0%     ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Hemant Agrawal @ 2018-02-02  8:31 UTC (permalink / raw)
  To: olivier.matz, thomas, pbhagavatula
  Cc: nipun.gupta, jerin.jacob, santosh.shukla, dev, Hemant Agrawal

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v2: fix checkpatch errors

 doc/guides/rel_notes/deprecation.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad59..c7d8f25 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,15 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* eal: a new set of mbuf mempool ops name APIs for user, platform and best
+  mempool names have been defined in ``rte_mbuf`` in v18.02. The uses of
+  ``rte_eal_mbuf_default_mempool_ops`` shall be replaced by
+  ``rte_mbuf_best_mempool_ops``.
+  The following function is now redundant and it is target to be deprecated in
+  18.05:
+
+  - ``rte_eal_mbuf_default_mempool_ops``
+
 * eal: several API and ABI changes are planned for ``rte_devargs`` in v18.02.
   The format of device command line parameters will change. The bus will need
   to be explicitly stated in the device declaration. The enum ``rte_devtype``
-- 
2.7.4

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH] doc: remove eal API for default mempool ops name
    @ 2018-02-02  8:03 10% ` Hemant Agrawal
  2018-02-02  8:31 10%   ` [dpdk-dev] [PATCH v2] " Hemant Agrawal
  1 sibling, 1 reply; 200+ results
From: Hemant Agrawal @ 2018-02-02  8:03 UTC (permalink / raw)
  To: olivier.matz, thomas, pbhagavatula
  Cc: nipun.gupta, jerin.jacob, santosh.shukla, dev, Hemant Agrawal

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad59..a2b391c 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,15 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* eal: a new set of mbuf mempool ops name APIs for user, platform and best
+  mempool names have been defined in ``rte_mbuf`` in v18.02. The uses of
+  ``rte_eal_mbuf_default_mempool_ops`` shall be replaced by
+  ``rte_mbuf_best_mempool_ops``.
+  The following function is now redundant and it is target to be deprecated in 
+  18.05:
+
+  - ``rte_eal_mbuf_default_mempool_ops``
+  
 * eal: several API and ABI changes are planned for ``rte_devargs`` in v18.02.
   The format of device command line parameters will change. The bus will need
   to be explicitly stated in the device declaration. The enum ``rte_devtype``
-- 
2.7.4

^ permalink raw reply	[relevance 10%]

* Re: [dpdk-dev] [PATCH 1/2] Revert "eal: fix default mempool ops"
  2018-02-01 20:40  3%   ` Pavan Nikhilesh
@ 2018-02-02  5:43  0%     ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2018-02-02  5:43 UTC (permalink / raw)
  To: Pavan Nikhilesh, olivier.matz, thomas, jerin.jacob; +Cc: dev

HI Pavan,
> 	Currently, best_mempool_ops is broken because when
> rte_mbuf_user_mempool_ops is invoked it is expected to returns 'NULL' through
> internal_config.user_mbuf_pool_ops_name. IMO it is best to create a named
> memzone ('mbuf_user_pool_ops') at the end of eal_init and copy mbuf-pool-ops
> passed to eal.
> 
> `rte_eal_mbuf_default_mempool_ops` is not expected to return 'NULL' would
> doing so break the ABI?.
> 
> ---
> /**
>  * Get default pool ops name for mbuf
>  *
>  * @return
>  *   returns default pool ops name.
>  */
> const char *
> rte_eal_mbuf_default_mempool_ops(void);
> ---
> 
> IMO creating named mempool at the end of eal_init and changing
> `rte_mbuf_user_mempool_ops` as below would be a better solution.
> 
> rte_mbuf_user_mempool_ops(void)
> {
> ...
>         mz = rte_memzone_lookup("mbuf_user_pool_ops");
>         if (mz == NULL)
>                 return NULL;
> 	...
> }
> 
> Thoughts?


[Hemant]  It seems reasonable. We can also deprecate the eal default mempool ops API . I will be sending patch shortly.
 
Unfortunately all NXP platforms are broken at the moment, so we need to get it fixed fast.

Hemant

> 
> Pavan.
> 
> On Thu, Feb 01, 2018 at 07:56:47PM +0000, Hemant Agrawal wrote:
> > Hi Pavan,
> > 	Your patch was breaking the design of the best_mempool_ops and the
> whole purpose of selection was getting lost.
> > I guess you were trying to fix  test_mempool.  I have sent another
> > patch, which fixes that and start using the best mempool ops API instead of
> default mempool ops API.
> >
> > Regards,
> > Hemant
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hemant Agrawal
> > > Sent: Friday, February 02, 2018 1:17 AM
> > > To: olivier.matz@6wind.com; pbhagavatula@caviumnetworks.com
> > > Cc: thomas@monjalon.net; dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH 1/2] Revert "eal: fix default mempool ops"
> > >
> > > This reverts commit fe06cb6c54fe5ada299ebba40a382bee37c919f2.
> > > ---

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] Revert "eal: fix default mempool ops"
  @ 2018-02-01 20:40  3%   ` Pavan Nikhilesh
  2018-02-02  5:43  0%     ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Pavan Nikhilesh @ 2018-02-01 20:40 UTC (permalink / raw)
  To: Hemant Agrawal, olivier.matz, thomas, jerin.jacob; +Cc: dev

Hi Hemanth,

	Currently, best_mempool_ops is broken because when
rte_mbuf_user_mempool_ops is invoked it is expected to returns 'NULL' through
internal_config.user_mbuf_pool_ops_name. IMO it is best to create a named
memzone ('mbuf_user_pool_ops') at the end of eal_init and copy mbuf-pool-ops
passed to eal.

`rte_eal_mbuf_default_mempool_ops` is not expected to return 'NULL' would doing
so break the ABI?.

---
/**
 * Get default pool ops name for mbuf
 *
 * @return
 *   returns default pool ops name.
 */
const char *
rte_eal_mbuf_default_mempool_ops(void);
---

IMO creating named mempool at the end of eal_init and changing
`rte_mbuf_user_mempool_ops` as below would be a better solution.

rte_mbuf_user_mempool_ops(void)
{
...
        mz = rte_memzone_lookup("mbuf_user_pool_ops");
        if (mz == NULL)
                return NULL;
	...
}

Thoughts?

Pavan.

On Thu, Feb 01, 2018 at 07:56:47PM +0000, Hemant Agrawal wrote:
> Hi Pavan,
> 	Your patch was breaking the design of the best_mempool_ops and the whole purpose of selection was getting lost.
> I guess you were trying to fix  test_mempool.  I have sent another patch, which fixes that and start using the best mempool ops API
> instead of default mempool ops API.
>
> Regards,
> Hemant
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hemant Agrawal
> > Sent: Friday, February 02, 2018 1:17 AM
> > To: olivier.matz@6wind.com; pbhagavatula@caviumnetworks.com
> > Cc: thomas@monjalon.net; dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH 1/2] Revert "eal: fix default mempool ops"
> >
> > This reverts commit fe06cb6c54fe5ada299ebba40a382bee37c919f2.
> > ---

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool
  2018-02-01  6:40  4%   ` Jerin Jacob
@ 2018-02-01 12:53  4%     ` Hemant Agrawal
  2018-02-14 15:23  4%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Hemant Agrawal @ 2018-02-01 12:53 UTC (permalink / raw)
  To: Jerin Jacob, Olivier Matz; +Cc: Andrew Rybchenko, dev

On 2/1/2018 12:10 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Wed, 31 Jan 2018 17:46:51 +0100
>> From: Olivier Matz <olivier.matz@6wind.com>
>> To: Andrew Rybchenko <arybchenko@solarflare.com>
>> CC: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool
>> User-Agent: NeoMutt/20170113 (1.7.2)
>>
>> On Tue, Jan 23, 2018 at 01:23:04PM +0000, Andrew Rybchenko wrote:
>>> An API/ABI changes are planned for 18.05 [1]:
>>>
>>>   * Allow to customize how mempool objects are stored in memory.
>>>   * Deprecate mempool XMEM API.
>>>   * Add mempool driver ops to get information from mempool driver and
>>>     dequeue contiguous blocks of objects if driver supports it.
>>>
>>> [1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
>>>
>>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>
>> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> 
> 
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] relicense various bits of the dpdk
  2018-02-01 12:19  8% ` [dpdk-dev] [PATCH v2] " Neil Horman
@ 2018-02-01 12:49  0%   ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2018-02-01 12:49 UTC (permalink / raw)
  To: Neil Horman, dev; +Cc: Thomas Monjalon

On 2/1/2018 5:49 PM, Neil Horman wrote:
> Received a note the other day from the Linux Foundation governance board
> for DPDK indicating that several files I have copyright on need to be
> relicensed to be compliant with the DPDK licensing guidelines.  I have
> some concerns with some parts of the request, but am not opposed to
> other parts.  So, for those pieces that we are in consensus on, I'm
> proposing that we change their license from BSD 2 clause to 3 clause.
> I'm also updating the files to use the SPDX licensing scheme
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Hemant Agrawal <hemant.agrawal@nxp.com>
> CC: Thomas Monjalon <thomas@monjalon.net>
> 
> ---
> Change notes
> V2) Cleaned up formatting (tmonjalon)
> ---
>   devtools/validate-abi.sh       | 32 ++++----------------------------
>   lib/librte_compat/rte_compat.h | 31 +++----------------------------
>   2 files changed, 7 insertions(+), 56 deletions(-)
> 

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] relicense various bits of the dpdk
  @ 2018-02-01 12:19  8% ` Neil Horman
  2018-02-01 12:49  0%   ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-02-01 12:19 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman, Hemant Agrawal, Thomas Monjalon

Received a note the other day from the Linux Foundation governance board
for DPDK indicating that several files I have copyright on need to be
relicensed to be compliant with the DPDK licensing guidelines.  I have
some concerns with some parts of the request, but am not opposed to
other parts.  So, for those pieces that we are in consensus on, I'm
proposing that we change their license from BSD 2 clause to 3 clause.
I'm also updating the files to use the SPDX licensing scheme

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Hemant Agrawal <hemant.agrawal@nxp.com>
CC: Thomas Monjalon <thomas@monjalon.net>

---
Change notes
V2) Cleaned up formatting (tmonjalon)
---
 devtools/validate-abi.sh       | 32 ++++----------------------------
 lib/librte_compat/rte_compat.h | 31 +++----------------------------
 2 files changed, 7 insertions(+), 56 deletions(-)

diff --git a/devtools/validate-abi.sh b/devtools/validate-abi.sh
index 8caf43e83..ee64b08fa 100755
--- a/devtools/validate-abi.sh
+++ b/devtools/validate-abi.sh
@@ -1,32 +1,8 @@
 #!/usr/bin/env bash
-#   BSD LICENSE
-#
-#   Copyright(c) 2015 Neil Horman. All rights reserved.
-#   Copyright(c) 2017 6WIND S.A.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-#     * Redistributions of source code must retain the above copyright
-#       notice, this list of conditions and the following disclaimer.
-#     * Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in
-#       the documentation and/or other materials provided with the
-#       distribution.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+# SPDX-License-Identifier:	BSD-3-Clause
+# Copyright(c) 2015 Neil Horman. All rights reserved.
+# Copyright(c) 2017 6WIND S.A.
+# All rights reserved
 
 set -e
 
diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
index d6e79f3fc..2cdc37214 100644
--- a/lib/librte_compat/rte_compat.h
+++ b/lib/librte_compat/rte_compat.h
@@ -1,31 +1,6 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2015 Neil Horman <nhorman@tuxdriver.com>.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+/* SPDX-License-Identifier: BSD-3-Clause 
+ * Copyright(c) 2015 Neil Horman <nhorman@tuxdriver.com>.
+ * All rights reserved.
  */
 
 #ifndef _RTE_COMPAT_H_
-- 
2.14.3

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool
  2018-01-31 16:46  4% ` Olivier Matz
@ 2018-02-01  6:40  4%   ` Jerin Jacob
  2018-02-01 12:53  4%     ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-02-01  6:40 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Andrew Rybchenko, dev

-----Original Message-----
> Date: Wed, 31 Jan 2018 17:46:51 +0100
> From: Olivier Matz <olivier.matz@6wind.com>
> To: Andrew Rybchenko <arybchenko@solarflare.com>
> CC: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool
> User-Agent: NeoMutt/20170113 (1.7.2)
> 
> On Tue, Jan 23, 2018 at 01:23:04PM +0000, Andrew Rybchenko wrote:
> > An API/ABI changes are planned for 18.05 [1]:
> > 
> >  * Allow to customize how mempool objects are stored in memory.
> >  * Deprecate mempool XMEM API.
> >  * Add mempool driver ops to get information from mempool driver and
> >    dequeue contiguous blocks of objects if driver supports it.
> > 
> > [1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
> > 
> > Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] net/octeontx: register fpa as platform HW mempool
  @ 2018-01-31 19:51  4% ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-01-31 19:51 UTC (permalink / raw)
  To: Pavan Nikhilesh, jerin.jacob, santosh.shukla, olivier.matz,
	hemant.agrawal
  Cc: dev, Neil Horman

On 1/22/2018 3:45 PM, Pavan Nikhilesh wrote:
> Register octeontx-fpavf as platform HW mempool when net/octeontx pmd is
> used.
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
> ---
> 
>  This patch depends on http://dpdk.org/dev/patchwork/patch/34239 patchset.

This patch was waiting dependent patch, which seems merged now.

But now because of "__rte_experimental" tag in
rte_mbuf_set_platform_mempool_ops() that this patch uses getting build errors [1].

Need to add a special note to pmd makefile to allow experimental API usage:
CFLAGS += -DALLOW_EXPERIMENTAL_API



[1]
...dpdk/drivers/net/octeontx/octeontx_ethdev.c:1330:2: error:
'rte_mbuf_set_platform_mempool_ops' is deprecated: Symbol is not yet part of
stable ABI [-Werror,-Wdeprecate
d-declarations]


        rte_mbuf_set_platform_mempool_ops("octeontx_fpavf");
        ^

...dpdk/x86_64-native-linuxapp-clang/include/rte_mbuf_pool_ops.h:37:5: note:
'rte_mbuf_set_platform_mempool_ops' has been explicitly marked deprecated here
int __rte_experimental

    ^


...dpdk/x86_64-native-linuxapp-clang/include/rte_compat.h:107:16: note: expanded
from macro '__rte_experimental'

__attribute__((deprecated("Symbol is not yet part of stable ABI"), \
               ^

1 error generated.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3] checkpatches.sh: Add checks for ABI symbol addition
  @ 2018-01-31 17:27  6% ` Neil Horman
  2018-02-04 14:44  7%   ` Thomas Monjalon
  2018-02-05 17:29  6% ` [dpdk-dev] [PATCH v4] " Neil Horman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-01-31 17:27 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, thomas, john.mcnamara, bruce.richardson,
	Ferruh Yigit, Stephen Hemminger

Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental.  The addition of these markers is
dependent on the manual addition of exported symbols to the EXPERIMENTAL
section of the corresponding libraries version map file.  The consensus
on review is that, in addition to mandating the addition of symbols to
the EXPERIMENTAL version in the map, we need a mechanism to enforce our
documented process of mandating that addition when they are introduced.
To that end, I am proposing this change.  It is an addition to the
checkpatches script, which scan incoming patches for additions and
removals of symbols to the map file, and warns the user appropriately

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: thomas@monjalon.net
CC: john.mcnamara@intel.com
CC: bruce.richardson@intel.com
CC: Ferruh Yigit <ferruh.yigit@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>

---
Change notes

v2)
 * Cleaned up and documented awk script (shemminger)
 * fixed sort/uniq usage (shemminger)
 * moved checking to new script (tmonjalon)
 * added maintainer entry (tmonjalon)
 * added license (tmonjalon)

v3)
 * Changed symbol check script name (tmonjalon)
 * Trapped exit to clean temp file (tmonjalon)
 * Honored verbose command (tmonjalon)
 * Cleaned left over debug bits (tmonjalon)
 * Updated location in MAINTAINERS file (tmonjalon)
---
 MAINTAINERS                     |   2 +
 devtools/check-symbol-change.sh | 146 ++++++++++++++++++++++++++++++++++++++++
 devtools/checkpatches.sh        |  23 ++++++-
 3 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100755 devtools/check-symbol-change.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index acd056134..417115f97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -42,6 +42,7 @@ F: doc/
 
 Developers and Maintainers Tools
 M: Thomas Monjalon <thomas@monjalon.net>
+M: Neil Horman <nhorman@tuxdriver.com>
 F: MAINTAINERS
 F: devtools/check-dup-includes.sh
 F: devtools/check-maintainers.sh
@@ -86,6 +87,7 @@ M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: devtools/validate-abi.sh
+F: devtools/check-symbol-change.sh
 F: buildtools/check-experimental-syms.sh
 
 Driver information
diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
new file mode 100755
index 000000000..22b17e6f2
--- /dev/null
+++ b/devtools/check-symbol-change.sh
@@ -0,0 +1,146 @@
+#!/bin/sh
+
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
+
+build_map_changes()
+{
+	local fname=$1
+	local mapdb=$2
+
+	cat $fname | filterdiff -i *.map | awk '
+		# Initialize our variables
+		BEGIN {map="";sym="";ar="";sec=""; in_sec=0}
+
+		# Anything that starts with + or -, followed by an a
+		# and ends in the string .map is the name of our map file
+		# This may appear multiple times in a patch if multiple
+		# map files are altered, and all section/symbol names
+		# appearing between a triggering of this rule and the
+		# next trigger of this rule are associated with this file
+		/[-+] a\/.*\.map/ {map=$2}
+
+		# Triggering this rule, which starts a line with a + and ends it
+		# with a { identifies a versioned section.  The section name is
+		# the rest of the line with the + and { symbols remvoed.
+		# Triggering this rule sets in_sec to 1, which actives the
+		# symbol rule below
+		/+.*{/ {gsub("+","");sec=$1; in_sec=1}
+
+		# This rule idenfies the end of a section, and disables the
+		# symbol rule
+		/.*}/ {in_sec=0}
+
+		# This rule matches on a + followed by any characters except a :
+		# (which denotes a global vs local segment), and ends with a ;.
+		# The semicolon is removed and the symbol is printed with its
+		# association file name and version section, along with an
+		# indicator that the symbol is a new addition.  Note this rule
+		# only works if we have found a version section in the rule
+		# above (hence the in_sec check).  Otherwise we flag it as an
+		# unknown section
+		/^+[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " add"
+			} else {
+				print map " " sym " unknown add"
+			}
+		}
+
+		# This is the same rule as above, but the rule matches on a
+		# leading - rather than a +, denoting that the symbol is being
+		# removed.
+		/^-[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " del"
+			} else {
+				print map " " sym " unknown del"
+			}
+		}' > ./$mapdb
+
+		sort -u $mapdb > ./$mapdb.2
+		mv -f $mapdb.2 $mapdb
+
+}
+
+check_for_rule_violations()
+{
+	local mapdb=$1
+	local mname
+	local symname
+	local secname
+	local ar
+	local ret=0
+
+	while read mname symname secname ar
+	do
+		if [ "$ar" == "add" ]
+		then
+
+			if [ "$secname" == "unknown" ]
+			then
+				# Just inform the user of this occurrence, but
+				# don't flag it as an error
+				echo -n "INFO: symbol $syname is added but "
+				echo -n "patch has insuficient context "
+				echo -n "to determine the section name "
+				echo -n "please ensure the version is "
+				echo "EXPERIMENTAL"
+				continue
+			fi
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Symbols that are getting added in a section
+				# other ithan the experimental section
+				# to be moving from an already supported
+				# section or its a violation
+				grep -q \
+				"$mname $symname [^EXPERIMENTAL] del" $mapdb
+				if [ $? -ne 0 ]
+				then
+					echo -n "ERROR: symbol $symname "
+					echo -n "is added in a section "
+					echo -n "other than the EXPERIMENTAL "
+					echo "section of the version map"
+					ret=1
+				fi
+			fi
+		else
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Just inform users that non-experimenal
+				# symbols need to go through a deprecation
+				# process
+				echo -n "INFO: symbol $symname is being "
+				echo -n "removed, ensure that it has "
+				echo "gone through the deprecation process"
+			fi
+		fi
+	done < $mapdb
+
+	return $ret
+}
+
+trap clean_and_exit_on_sig EXIT
+
+mapfile=`mktemp mapdb.XXXXXX`
+patch=$1
+exit_code=1
+
+clean_and_exit_on_sig()
+{
+	rm -f $mapfile
+	exit $exit_code
+}
+
+build_map_changes $patch $mapfile
+check_for_rule_violations $mapfile
+exit_code=$?
+
+rm -f $mapfile
+
+exit $exit_code
+
+
diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 7676a6b50..0b2b5f039 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -35,6 +35,8 @@
 # - DPDK_CHECKPATCH_LINE_LENGTH
 . $(dirname $(readlink -e $0))/load-devel-config
 
+VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh
+
 length=${DPDK_CHECKPATCH_LINE_LENGTH:-80}
 
 # override default Linux options
@@ -61,6 +63,7 @@ print_usage () {
 	END_OF_HELP
 }
 
+
 number=0
 quiet=false
 verbose=false
@@ -86,6 +89,7 @@ total=0
 status=0
 
 check () { # <patch> <commit> <title>
+	local reta
 	total=$(($total + 1))
 	! $verbose || printf '\n### %s\n\n' "$3"
 	if [ -n "$1" ] ; then
@@ -96,9 +100,26 @@ check () { # <patch> <commit> <title>
 	else
 		report=$($DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
 	fi
-	[ $? -ne 0 ] || return 0
+	reta=$?
+
 	$verbose || printf '\n### %s\n\n' "$3"
 	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+
+	! $verbose || echo
+	! $verbose || echo "Checking API additions/removals:"
+
+	if [ -n "$1" ] ; then
+		report=$($VALIDATE_NEW_API $1)
+	elif [ -n "$2" ] ; then
+		report=$(git format-patch \
+			 --find-renames --no-stat --stdout -1 $commit |
+			$VALIDATE_NEW_API -)
+	else
+		report=$($VALIDATE_NEW_API -)
+	fi
+	[ $? -ne 0 -o $reta -ne 0 ] || return 0
+	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+
 	status=$(($status + 1))
 }
 
-- 
2.14.3

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool
  @ 2018-01-31 16:46  4% ` Olivier Matz
  2018-02-01  6:40  4%   ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-01-31 16:46 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev

On Tue, Jan 23, 2018 at 01:23:04PM +0000, Andrew Rybchenko wrote:
> An API/ABI changes are planned for 18.05 [1]:
> 
>  * Allow to customize how mempool objects are stored in memory.
>  * Deprecate mempool XMEM API.
>  * Add mempool driver ops to get information from mempool driver and
>    dequeue contiguous blocks of objects if driver supports it.
> 
> [1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
> 
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC v2 00/17] mempool: add bucket mempool driver
  @ 2018-01-31 16:44  0%   ` Olivier Matz
  2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-01-31 16:44 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, Santosh Shukla, Jerin Jacob, Hemant Agrawal, Shreyansh Jain

Hi,

On Tue, Jan 23, 2018 at 01:15:55PM +0000, Andrew Rybchenko wrote:
> The patch series starts from generic enhancements suggested by Olivier.
> Basically it adds driver callbacks to calculate required memory size and
> to populate objects using provided memory area. It allows to remove
> so-called capability flags used before to tell generic code how to
> allocate and slice allocated memory into mempool objects.
> Clean up which removes get_capabilities and register_memory_area is
> not strictly required, but I think right thing to do.
> Existing mempool drivers are updated.
> 
> I've kept rte_mempool_populate_iova_tab() intact since it seems to
> be not directly related XMEM API functions.
> 
> The patch series adds bucket mempool driver which allows to allocate
> (both physically and virtually) contiguous blocks of objects and adds
> mempool API to do it. It is still capable to provide separate objects,
> but it is definitely more heavy-weight than ring/stack drivers.
> The driver will be used by the future Solarflare driver enhancements
> which allow to utilize physical contiguous blocks in the NIC
> hardware/firmware.
> 
> The target usecase is dequeue in blocks and enqueue separate objects
> back (which are collected in buckets to be dequeued). So, the memory
> pool with bucket driver is created by an application and provided to
> networking PMD receive queue. The choice of bucket driver is done using
> rte_eth_dev_pool_ops_supported(). A PMD that relies upon contiguous
> block allocation should report the bucket driver as the only supported
> and preferred one.
> 
> Introduction of the contiguous block dequeue operation is proven by
> performance measurements using autotest with minor enhancements:
>  - in the original test bulks are powers of two, which is unacceptable
>    for us, so they are changed to multiple of contig_block_size;
>  - the test code is duplicated to support plain dequeue and
>    dequeue_contig_blocks;
>  - all the extra test variations (with/without cache etc) are eliminated;
>  - a fake read from the dequeued buffer is added (in both cases) to
>    simulate mbufs access.
> 
> start performance test for bucket (without cache)
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   111935488
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   115290931
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=   353055539
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=   353330790
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   224657407
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   230411468
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=   706700902
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=   703673139
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   425236887
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   437295512
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=  1343409356
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=  1336567397
> start performance test for bucket (without cache + contiguous dequeue)
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   122945536
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   126458265
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=   374262988
> mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=   377316966
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   244842496
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   251618917
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=   751226060
> mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=   756233010
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   462068120
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   476997221
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=  1432171313
> mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=  1438829771
> 
> The number of objects in the contiguous block is a function of bucket
> memory size (.config option) and total element size. In the future
> additional API with possibility to pass parameters on mempool allocation
> may be added.
> 
> It breaks ABI since changes rte_mempool_ops. Also it removes
> rte_mempool_ops_register_memory_area() and
> rte_mempool_ops_get_capabilities() since corresponding callbacks are
> removed.
> 
> The target DPDK release is 18.05.
> 
> v2:
>   - add driver ops to calculate required memory size and populate
>     mempool objects, remove extra flags which were required before
>     to control it
>   - transition of octeontx and dpaa drivers to the new callbacks
>   - change info API to get information from driver required to
>     API user to know contiguous block size
>   - remove get_capabilities (not required any more and may be
>     substituted with more in info get API)
>   - remove register_memory_area since it is substituted with
>     populate callback which can do more
>   - use SPDX tags
>   - avoid all objects affinity to single lcore
>   - fix bucket get_count
>   - deprecate XMEM API
>   - avoid introduction of a new function to flush cache
>   - fix NO_CACHE_ALIGN case in bucket mempool
> 
> Andrew Rybchenko (10):
>   mempool: fix phys contig check if populate default skipped
>   mempool: add op to calculate memory size to be allocated
>   mempool/octeontx: add callback to calculate memory size
>   mempool: add op to populate objects using provided memory
>   mempool/octeontx: implement callback to populate objects
>   mempool: remove callback to get capabilities
>   mempool: deprecate xmem functions
>   mempool/octeontx: prepare to remove register memory area op
>   mempool/dpaa: convert to use populate driver op
>   mempool: remove callback to register memory area
> 
> Artem V. Andreev (7):
>   mempool: ensure the mempool is initialized before populating
>   mempool/bucket: implement bucket mempool manager
>   mempool: support flushing the default cache of the mempool
>   mempool: implement abstract mempool info API
>   mempool: support block dequeue operation
>   mempool/bucket: implement block dequeue operation
>   mempool/bucket: do not allow one lcore to grab all buckets
> 
>  MAINTAINERS                                        |   9 +
>  config/common_base                                 |   2 +
>  drivers/mempool/Makefile                           |   1 +
>  drivers/mempool/bucket/Makefile                    |  27 +
>  drivers/mempool/bucket/rte_mempool_bucket.c        | 626 +++++++++++++++++++++
>  .../mempool/bucket/rte_mempool_bucket_version.map  |   4 +
>  drivers/mempool/dpaa/dpaa_mempool.c                |  13 +-
>  drivers/mempool/octeontx/rte_mempool_octeontx.c    |  63 ++-
>  lib/librte_mempool/rte_mempool.c                   | 192 ++++---
>  lib/librte_mempool/rte_mempool.h                   | 366 +++++++++---
>  lib/librte_mempool/rte_mempool_ops.c               |  48 +-
>  lib/librte_mempool/rte_mempool_version.map         |  11 +-
>  mk/rte.app.mk                                      |   1 +
>  13 files changed, 1184 insertions(+), 179 deletions(-)
>  create mode 100644 drivers/mempool/bucket/Makefile
>  create mode 100644 drivers/mempool/bucket/rte_mempool_bucket.c
>  create mode 100644 drivers/mempool/bucket/rte_mempool_bucket_version.map

Globally, the RFC looks fine to me. Thanks for this good work.

I didn't review the mempool/bucket part like I did last time. About the
changes to the mempool API, I think it's a good enhancement: it makes
things more flexible and removes complexity in the common code. Some
points may still need some discussions, for instance how the PMDs and
applications take advantage of block dequeue operations and get_info().

I have some specific comments that are sent directly as replies to the
patches.

Since it changes dpaa and octeontx, having feedback from people from NXP
and Cavium Networks would be good.

Thanks,
Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC v2, 1/2] cryptodev: add support to set session private data
  @ 2018-01-31 13:40  0%     ` De Lara Guarch, Pablo
  0 siblings, 0 replies; 200+ results
From: De Lara Guarch, Pablo @ 2018-01-31 13:40 UTC (permalink / raw)
  To: Gujjar, Abhinandan S, Doherty, Declan, akhil.goyal,
	Jerin.JacobKollanukkaran
  Cc: dev, Vangati, Narender, Rao, Nikhil

Hi Abhinandan,

> -----Original Message-----
> From: Gujjar, Abhinandan S
> Sent: Thursday, January 25, 2018 3:38 PM
> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Doherty,
> Declan <declan.doherty@intel.com>; akhil.goyal@nxp.com;
> Jerin.JacobKollanukkaran@cavium.com
> Cc: dev@dpdk.org; Vangati, Narender <narender.vangati@intel.com>; Rao,
> Nikhil <nikhil.rao@intel.com>
> Subject: RE: [RFC v2, 1/2] cryptodev: add support to set session private
> data
> 
> Hi Pablo & Declan,
> 
> > -----Original Message-----
> > From: De Lara Guarch, Pablo
> > Sent: Thursday, January 25, 2018 1:17 AM
> > To: Gujjar, Abhinandan S <abhinandan.gujjar@intel.com>; Doherty,
> > Declan <declan.doherty@intel.com>; akhil.goyal@nxp.com;
> > Jerin.JacobKollanukkaran@cavium.com
> > Cc: dev@dpdk.org; Vangati, Narender <narender.vangati@intel.com>;
> Rao,
> > Nikhil <nikhil.rao@intel.com>
> > Subject: RE: [RFC v2, 1/2] cryptodev: add support to set session
> > private data
> >
> >
> >
> > > -----Original Message-----
> > > From: Gujjar, Abhinandan S
> > > Sent: Tuesday, January 23, 2018 8:54 AM
> > > To: Doherty, Declan <declan.doherty@intel.com>;
> akhil.goyal@nxp.com;
> > > De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> > > Jerin.JacobKollanukkaran@cavium.com
> > > Cc: dev@dpdk.org; Vangati, Narender <narender.vangati@intel.com>;
> > > Gujjar, Abhinandan S <abhinandan.gujjar@intel.com>; Rao, Nikhil
> > > <nikhil.rao@intel.com>
> > > Subject: [RFC v2, 1/2] cryptodev: add support to set session private
> > > data
> > >
> > > Update rte_crypto_op to indicate private data offset.
> > >
> > > The application may want to store private data along with the
> > > rte_cryptodev that is transparent to the rte_cryptodev layer.
> > > For e.g., If an eventdev based application is submitting a
> > > rte_cryptodev_sym_session operation and wants to indicate event
> > > information required to construct a new event that will be enqueued
> > > to eventdev after completion of the rte_cryptodev_sym_session
> operation.
> > > This patch provides a mechanism for the application to associate
> > > this information with the rte_cryptodev_sym_session session.
> > > The application can set the private data using
> > > rte_cryptodev_sym_session_set_private_data() and retrieve it using
> > > rte_cryptodev_sym_session_get_private_data().
> >
> > Hi Abhinandan,
> >
> > >
> > > Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
> > > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > > ---
> > > Notes:
> > >         V2:
> > > 	1. Removed enum rte_crypto_op_private_data_type
> > > 	2. Corrected formatting
> > >
> > >  lib/librte_cryptodev/rte_crypto.h    |  8 ++++++--
> > >  lib/librte_cryptodev/rte_cryptodev.h | 32
> > > ++++++++++++++++++++++++++++++++
> > >  2 files changed, 38 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_cryptodev/rte_crypto.h
> > > b/lib/librte_cryptodev/rte_crypto.h
> > > index 95cf861..14c87c8 100644
> > > --- a/lib/librte_cryptodev/rte_crypto.h
> > > +++ b/lib/librte_cryptodev/rte_crypto.h
> > > @@ -84,8 +84,12 @@ struct rte_crypto_op {
> > >  	 */
> > >  	uint8_t sess_type;
> > >  	/**< operation session type */
> > > -
> > > -	uint8_t reserved[5];
> > > +	uint16_t private_data_offset;
> > > +	/**< Offset to indicate start of private data (if any). The private
> > > +	 * data may be used by the application to store information which
> > > +	 * should remain untouched in the library/driver
> >
> > Is this the offset for the private data after the crypto operation?
> Yes. This is private date is meant for sessionless case.
> > From your title, it looks like it is for the session private data, but
> > then, this shouldn't be here.
> Agree.
> > If it is for the crypto operation, I suggest you to separate it in another
> patch.
> > Also, you should indicate where the offset starts from. For the IV,
> > the offset is counted from the start of the rte_crypto_op, so I think
> > it should be the same, to keep consistency.
> Sure. I will make a separate patch for this changes. Add some more
> information to make it clear.
> >
> > For the session private data, we see two options:
> >
> > 1 - Add a  "valid" private data field in the rte_cryptodev_sym_session
> > structure, so when it is set, it indicates that the session contains
> > private data (a single bit would be enough, 1 to indicate there is, and 0 to
> indicate there is not).
> > This would go into the beginning of the structure, so this would
> > require an ABI deprecation notice.
> > This also assumes that the private data starts just after the session
> > header
> >
> > 2 -  Do not add an extra "valid" private data field in
> > rte_cryptodev_sym_session structure, and add a small header in the
> private data, which contains the "valid"
> > bit.
> > Then, when calling sym_session_get_private_data, this bit should be
> checked.
> > Note that the object that holds the session structure needs to be big
> > enough to hold this value.
> > If the object has only space for the sess_private_data array, then the
> > session has no private data.
> > Therefore, this approach might be less performant, but with no ABI
> > deprecation required.
> I am with option 2 with slight changes as below:
> rte_cryptodev_sym_session_create() will have a flag as below indicating
> private data exits or not.
> {
> - memset(sess, 0, (sizeof(void *) * nb_drivers));
> +memset(sess, 0, (sizeof(void *) * nb_drivers ) +
> +sizeof(private_data_flag));
> }
> and in
> rte_cryptodev_get_header_session_size(void)
> {
>   /*
>    * Header contains pointers to the private data
>    * of all registered drivers
>    */
>   -return (sizeof(void *) * nb_drivers);
>   +return ((sizeof(void *) * nb_drivers) + sizeof(private_data_flag)); } With
> this, a flag indicating private data exists or not will always have valid value.

Sure, this should work. Go ahead and send a v3 with this change, separating the changes
made in the session from the changes made in the crypto operation (so you will have 3 patches in total).

Pablo

> 
> >
> > I would recommend you to send a deprecation notice for option 1, then
> > check the performance of both option, and if needed, make the change
> > in the structure, in 18.05.
> >
> > Regards,
> > Pablo

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/3] doc: announce ABI change for crypto info struct
  2018-01-30 12:14  7% ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices Pablo de Lara
@ 2018-01-30 12:14  4%   ` Pablo de Lara
  2018-02-13 11:45  4%   ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices De Lara Guarch, Pablo
  1 sibling, 0 replies; 200+ results
From: Pablo de Lara @ 2018-01-30 12:14 UTC (permalink / raw)
  To: akhil.goyal, hemant.agrawal, declan.doherty, jerin.jacob,
	fiona.trahe, john.griffin, deepak.k.jain, jck, tdu, dima,
	nsamsono, jianbo.liu
  Cc: dev, Pablo de Lara

Since the API changes made in 17.08, the session mempool
is not created anymore in each crypto device.
Therefore, there is no need to have, in the cryptodev info
structure, the maximum number of sessions supported per device
and per queue pair.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d59ad5988..5588ba7c1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -59,3 +59,8 @@ Deprecation Notices
   be added between the producer and consumer structures. The size of the
   structure and the offset of the fields will remain the same on
   platforms with 64B cache line, but will change on other platforms.
+
+* cryptodev: The structure ``sym``, including its fields ``max_nb_sessions``
+  and ``max_nb_sessions_per_qp``, in structure ``rte_cryptodev_info``,
+  will be removed in 18.05, as these fields are not relevant anymore
+  since the session mempool is not internal in the crypto device anymore.
-- 
2.14.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices
      2018-01-30 11:37  4% ` Akhil Goyal
@ 2018-01-30 12:14  7% ` Pablo de Lara
  2018-01-30 12:14  4%   ` [dpdk-dev] [PATCH v2 1/3] doc: announce ABI change for crypto info struct Pablo de Lara
  2018-02-13 11:45  4%   ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices De Lara Guarch, Pablo
  2 siblings, 2 replies; 200+ results
From: Pablo de Lara @ 2018-01-30 12:14 UTC (permalink / raw)
  To: akhil.goyal, hemant.agrawal, declan.doherty, jerin.jacob,
	fiona.trahe, john.griffin, deepak.k.jain, jck, tdu, dima,
	nsamsono, jianbo.liu
  Cc: dev, Pablo de Lara

v2:
- Added an extra deprecation announcement
- Bonded the other two deprecation notices with the new one in a
  patchset

Pablo de Lara (3):
  doc: announce ABI change for crypto info struct
  doc: announce deprecation for attach/detach crypto session
  doc: announce deprecation in crypto queue pair start/stop

 doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
 lib/librte_cryptodev/rte_cryptodev.h |  4 ++++
 2 files changed, 19 insertions(+)

-- 
2.14.3

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
  2018-01-30 11:21  4%         ` De Lara Guarch, Pablo
@ 2018-01-30 11:53  4%           ` Verma, Shally
  2018-02-02  9:07  4%             ` De Lara Guarch, Pablo
  0 siblings, 1 reply; 200+ results
From: Verma, Shally @ 2018-01-30 11:53 UTC (permalink / raw)
  To: De Lara Guarch, Pablo, Akhil Goyal, Trahe, Fiona, hemant.agrawal,
	Doherty, Declan, Griffin, John, Jain, Deepak K, jck, tdu, dima,
	nsamsono, jianbo.liu, Jacob,  Jerin, Athreya, Narayana Prasad,
	Murthy, Nidadavolu
  Cc: dev



>-----Original Message-----
>From: De Lara Guarch, Pablo [mailto:pablo.de.lara.guarch@intel.com]
>Sent: 30 January 2018 16:51
>To: Verma, Shally <Shally.Verma@cavium.com>; Akhil Goyal <akhil.goyal@nxp.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>; Jain, Deepak K
><deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
>jianbo.liu@arm.com; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
><NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu <Nidadavolu.Murthy@cavium.com>
>Cc: dev@dpdk.org
>Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
>
>Hi Shally/Ahkil,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Verma, Shally
>> Sent: Tuesday, January 30, 2018 7:56 AM
>> To: Akhil Goyal <akhil.goyal@nxp.com>; De Lara Guarch, Pablo
>> <pablo.de.lara.guarch@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>;
>> hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>;
>> Griffin, John <john.griffin@intel.com>; Jain, Deepak K
>> <deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com;
>> dima@marvell.com; nsamsono@marvell.com; jianbo.liu@arm.com; Jacob,
>> Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
>> <NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
>> <Nidadavolu.Murthy@cavium.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info
>> struct
>>
>> I do see current cryptodev unit testcase (inside \test dir) uses
>> info.sym.max_nb_sessions param for session mempool_create. So, such
>> testcases change are also in proposal?
>
>Yes, for these tests, we can just define a macro in the tests, instead of using the info structure.

[Shally] Ok, then you mean applications will choose any random number during mempool_create and not dependent on device max_nb_sessions?

>>
>> Another point, we recently submitted an RFC patch on lib/cryptodev with
>> asymmetric crypto support
>> (https://dpdk.org/dev/patchwork/patch/34308/) which is awaiting review
>> and these fields have role to play there.
>> So, could this change be please viewed in conjunction with asym RFC?
>
>Do you need it for asymmetric? Anyway, this would remove the symmetric function and structures, not applicable for you.

[Shally] I would say addition of asym in lib/cryptodev is not entirely standalone, specifically for PMDs that can support both. 
My key concern are max_nb_sessions_per_qp and related qp_attach_sym/asym APIs which enable management of queue distribution among sym and asym in current proposal, specifically, for PMDs that can support both but have dedicated qp for each. Right now proposal is open for feedback and would prefer to be covered before sym related changes could be applied.

>>
>> Thanks
>> Shally
>>
>> > -----Original Message-----
>> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Akhil Goyal
>> > Sent: 29 January 2018 14:57
>> > To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe,
>> > Fiona <fiona.trahe@intel.com>; hemant.agrawal@nxp.com; Doherty,
>> Declan
>> > <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>;
>> > Jain, Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
>> > tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
>> > jianbo.liu@arm.com; Jacob, Jerin
>> <Jerin.JacobKollanukkaran@cavium.com>
>> > Cc: dev@dpdk.org
>> > Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto
>> > info struct
>> >
>> > Hi Pablo/Fiona,
>> >
>> > On 1/26/2018 4:38 PM, De Lara Guarch, Pablo wrote:
>> > >
>> > >
>> > >> -----Original Message-----
>> > >> From: Trahe, Fiona
>> > >> Sent: Friday, January 26, 2018 10:45 AM
>> > >> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
>> > >> akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty, Declan
>> > >> <declan.doherty@intel.com>; jerin.jacob@intel.com; Griffin, John
>> > >> <john.griffin@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>;
>> > >> jck@semihalf.com; tdu@semihalf.com; dima@marvell.com;
>> > >> nsamsono@marvell.com; jianbo.liu@arm.com
>> > >> Cc: dev@dpdk.org; Trahe, Fiona <fiona.trahe@intel.com>
>> > >> Subject: RE: [PATCH] doc: announce ABI change for crypto info
>> > >> struct
>> > >>
>> > >> Hi Pablo,
>> > >>
>> > >>> -----Original Message-----
>> > >>> From: De Lara Guarch, Pablo
>> > >>> Sent: Friday, January 26, 2018 9:04 AM
>> > >>> To: akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty,
>> Declan
>> > >>> <declan.doherty@intel.com>; jerin.jacob@intel.com; Trahe, Fiona
>> > >>> <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>;
>> > >>> Jain, Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
>> > >>> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
>> > >>> jianbo.liu@arm.com
>> > >>> Cc: dev@dpdk.org; De Lara Guarch, Pablo
>> > >>> <pablo.de.lara.guarch@intel.com>
>> > >>> Subject: [PATCH] doc: announce ABI change for crypto info struct
>> > >>>
>> > >>> Since the API changes made in 17.08, the session mempool is not
>> > >>> created anymore in each crypto device.
>> > >>> Therefore, there is no need to have, in the cryptodev info
>> > >>> structure, the maximum number of sessions supported per device
>> and
>> > >>> per queue pair.
>> > >>>
>> > >>> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
>> > >>> ---
>> > >>>   doc/guides/rel_notes/deprecation.rst | 5 +++++
>> > >>>   1 file changed, 5 insertions(+)
>> > >>>
>> > >>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> > >>> b/doc/guides/rel_notes/deprecation.rst
>> > >>> index d59ad5988..5588ba7c1 100644
>> > >>> --- a/doc/guides/rel_notes/deprecation.rst
>> > >>> +++ b/doc/guides/rel_notes/deprecation.rst
>> > >>> @@ -59,3 +59,8 @@ Deprecation Notices
>> > >>>     be added between the producer and consumer structures. The
>> > >>> size of
>> > >> the
>> > >>>     structure and the offset of the fields will remain the same on
>> > >>>     platforms with 64B cache line, but will change on other platforms.
>> > >>> +
>> > >>> +* cryptodev: The structure ``sym``, including its fields
>> > >>> +``max_nb_sessions``
>> > >>> +  and ``max_nb_sessions_per_qp``, in structure
>> > >>> +``rte_cryptodev_info``,
>> > >>> +  will be removed in 18.05, as these fields are not relevant
>> > >>> +anymore
>> > >>> +  since the session mempool is not internal in the crypto device
>> > >> anymore.
>> > >>> --
>> > >> [Fiona] max_nb_sessions must be also removed from struct
>> > >> rte_cryptodev_pmd_init_params
>> > >
>> > > Good point. Since this structure is internal, I guess we don't need
>> > > a
>> > deprecation notice for it,
>> > > but I will remove it in the patch for 18.05.
>> > >
>> > >> Regards deprecation of max_nb_sessions from both structs:
>> > >> Acked-by: Fiona Trahe <fiona.trahe@intel.com>
>> > >>
>> > >> If removing the max_nb_sessions_per_qp, then the following
>> > >> functions should also be deprecated.
>> > >> rte_cryptodev_queue_pair_attach_sym_session
>> > >> rte_cryptodev_queue_pair_detach_sym_session
>> > >> These and the max_nb_session_per_qp were added here at request of
>> > NXP:
>> > >> http://dpdk.org/ml/archives/dev/2017-March/060740.html
>> > >
>> > > Akhil, do you agree on this change?
>> > >
>> >
>> > We recently did some changes in the driver to take care of the
>> > dependency for limit on max_nb_sessions_per_qp, but it is not removed
>> > completely. We will need to look into this. But sending the
>> > deprecation notice at this moment is fine. If something comes up, will
>> > let you know later.
>
>Looks good to me. Could you ack this if you agree with it?
>
>Thanks,
>Pablo
>
>> >
>> > -Akhil


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
    @ 2018-01-30 11:37  4% ` Akhil Goyal
  2018-01-30 12:14  7% ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices Pablo de Lara
  2 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2018-01-30 11:37 UTC (permalink / raw)
  To: Pablo de Lara, hemant.agrawal, declan.doherty, jerin.jacob,
	fiona.trahe, john.griffin, deepak.k.jain, jck, tdu, dima,
	nsamsono, jianbo.liu
  Cc: dev

On 1/26/2018 2:33 PM, Pablo de Lara wrote:
> Since the API changes made in 17.08, the session mempool
> is not created anymore in each crypto device.
> Therefore, there is no need to have, in the cryptodev info
> structure, the maximum number of sessions supported per device
> and per queue pair.
> 
> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> ---
>   doc/guides/rel_notes/deprecation.rst | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d59ad5988..5588ba7c1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -59,3 +59,8 @@ Deprecation Notices
>     be added between the producer and consumer structures. The size of the
>     structure and the offset of the fields will remain the same on
>     platforms with 64B cache line, but will change on other platforms.
> +
> +* cryptodev: The structure ``sym``, including its fields ``max_nb_sessions``
> +  and ``max_nb_sessions_per_qp``, in structure ``rte_cryptodev_info``,
> +  will be removed in 18.05, as these fields are not relevant anymore
> +  since the session mempool is not internal in the crypto device anymore.
> 
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
  2018-01-30  7:55  4%       ` Verma, Shally
@ 2018-01-30 11:21  4%         ` De Lara Guarch, Pablo
  2018-01-30 11:53  4%           ` Verma, Shally
  0 siblings, 1 reply; 200+ results
From: De Lara Guarch, Pablo @ 2018-01-30 11:21 UTC (permalink / raw)
  To: Verma, Shally, Akhil Goyal, Trahe, Fiona, hemant.agrawal,
	Doherty, Declan, Griffin, John, Jain, Deepak K, jck, tdu, dima,
	nsamsono, jianbo.liu, Jacob,  Jerin, Athreya, Narayana Prasad,
	Murthy, Nidadavolu
  Cc: dev

Hi Shally/Ahkil,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Verma, Shally
> Sent: Tuesday, January 30, 2018 7:56 AM
> To: Akhil Goyal <akhil.goyal@nxp.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>;
> hemant.agrawal@nxp.com; Doherty, Declan <declan.doherty@intel.com>;
> Griffin, John <john.griffin@intel.com>; Jain, Deepak K
> <deepak.k.jain@intel.com>; jck@semihalf.com; tdu@semihalf.com;
> dima@marvell.com; nsamsono@marvell.com; jianbo.liu@arm.com; Jacob,
> Jerin <Jerin.JacobKollanukkaran@cavium.com>; Athreya, Narayana Prasad
> <NarayanaPrasad.Athreya@cavium.com>; Murthy, Nidadavolu
> <Nidadavolu.Murthy@cavium.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info
> struct
> 
> I do see current cryptodev unit testcase (inside \test dir) uses
> info.sym.max_nb_sessions param for session mempool_create. So, such
> testcases change are also in proposal?

Yes, for these tests, we can just define a macro in the tests, instead of using the info structure.
> 
> Another point, we recently submitted an RFC patch on lib/cryptodev with
> asymmetric crypto support
> (https://dpdk.org/dev/patchwork/patch/34308/) which is awaiting review
> and these fields have role to play there.
> So, could this change be please viewed in conjunction with asym RFC?

Do you need it for asymmetric? Anyway, this would remove the symmetric function and structures, not applicable for you.
> 
> Thanks
> Shally
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Akhil Goyal
> > Sent: 29 January 2018 14:57
> > To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe,
> > Fiona <fiona.trahe@intel.com>; hemant.agrawal@nxp.com; Doherty,
> Declan
> > <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>;
> > Jain, Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
> > tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
> > jianbo.liu@arm.com; Jacob, Jerin
> <Jerin.JacobKollanukkaran@cavium.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto
> > info struct
> >
> > Hi Pablo/Fiona,
> >
> > On 1/26/2018 4:38 PM, De Lara Guarch, Pablo wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Trahe, Fiona
> > >> Sent: Friday, January 26, 2018 10:45 AM
> > >> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> > >> akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty, Declan
> > >> <declan.doherty@intel.com>; jerin.jacob@intel.com; Griffin, John
> > >> <john.griffin@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>;
> > >> jck@semihalf.com; tdu@semihalf.com; dima@marvell.com;
> > >> nsamsono@marvell.com; jianbo.liu@arm.com
> > >> Cc: dev@dpdk.org; Trahe, Fiona <fiona.trahe@intel.com>
> > >> Subject: RE: [PATCH] doc: announce ABI change for crypto info
> > >> struct
> > >>
> > >> Hi Pablo,
> > >>
> > >>> -----Original Message-----
> > >>> From: De Lara Guarch, Pablo
> > >>> Sent: Friday, January 26, 2018 9:04 AM
> > >>> To: akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty,
> Declan
> > >>> <declan.doherty@intel.com>; jerin.jacob@intel.com; Trahe, Fiona
> > >>> <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>;
> > >>> Jain, Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
> > >>> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
> > >>> jianbo.liu@arm.com
> > >>> Cc: dev@dpdk.org; De Lara Guarch, Pablo
> > >>> <pablo.de.lara.guarch@intel.com>
> > >>> Subject: [PATCH] doc: announce ABI change for crypto info struct
> > >>>
> > >>> Since the API changes made in 17.08, the session mempool is not
> > >>> created anymore in each crypto device.
> > >>> Therefore, there is no need to have, in the cryptodev info
> > >>> structure, the maximum number of sessions supported per device
> and
> > >>> per queue pair.
> > >>>
> > >>> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> > >>> ---
> > >>>   doc/guides/rel_notes/deprecation.rst | 5 +++++
> > >>>   1 file changed, 5 insertions(+)
> > >>>
> > >>> diff --git a/doc/guides/rel_notes/deprecation.rst
> > >>> b/doc/guides/rel_notes/deprecation.rst
> > >>> index d59ad5988..5588ba7c1 100644
> > >>> --- a/doc/guides/rel_notes/deprecation.rst
> > >>> +++ b/doc/guides/rel_notes/deprecation.rst
> > >>> @@ -59,3 +59,8 @@ Deprecation Notices
> > >>>     be added between the producer and consumer structures. The
> > >>> size of
> > >> the
> > >>>     structure and the offset of the fields will remain the same on
> > >>>     platforms with 64B cache line, but will change on other platforms.
> > >>> +
> > >>> +* cryptodev: The structure ``sym``, including its fields
> > >>> +``max_nb_sessions``
> > >>> +  and ``max_nb_sessions_per_qp``, in structure
> > >>> +``rte_cryptodev_info``,
> > >>> +  will be removed in 18.05, as these fields are not relevant
> > >>> +anymore
> > >>> +  since the session mempool is not internal in the crypto device
> > >> anymore.
> > >>> --
> > >> [Fiona] max_nb_sessions must be also removed from struct
> > >> rte_cryptodev_pmd_init_params
> > >
> > > Good point. Since this structure is internal, I guess we don't need
> > > a
> > deprecation notice for it,
> > > but I will remove it in the patch for 18.05.
> > >
> > >> Regards deprecation of max_nb_sessions from both structs:
> > >> Acked-by: Fiona Trahe <fiona.trahe@intel.com>
> > >>
> > >> If removing the max_nb_sessions_per_qp, then the following
> > >> functions should also be deprecated.
> > >> rte_cryptodev_queue_pair_attach_sym_session
> > >> rte_cryptodev_queue_pair_detach_sym_session
> > >> These and the max_nb_session_per_qp were added here at request of
> > NXP:
> > >> http://dpdk.org/ml/archives/dev/2017-March/060740.html
> > >
> > > Akhil, do you agree on this change?
> > >
> >
> > We recently did some changes in the driver to take care of the
> > dependency for limit on max_nb_sessions_per_qp, but it is not removed
> > completely. We will need to look into this. But sending the
> > deprecation notice at this moment is fine. If something comes up, will
> > let you know later.

Looks good to me. Could you ack this if you agree with it?

Thanks,
Pablo

> >
> > -Akhil


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct
  @ 2018-01-30  7:55  4%       ` Verma, Shally
  2018-01-30 11:21  4%         ` De Lara Guarch, Pablo
  0 siblings, 1 reply; 200+ results
From: Verma, Shally @ 2018-01-30  7:55 UTC (permalink / raw)
  To: Akhil Goyal, De Lara Guarch, Pablo, Trahe, Fiona, hemant.agrawal,
	Doherty, Declan, Griffin, John, Jain, Deepak K, jck, tdu, dima,
	nsamsono, jianbo.liu, Jacob,  Jerin, Athreya, Narayana Prasad,
	Murthy, Nidadavolu
  Cc: dev

I do see current cryptodev unit testcase (inside \test dir) uses info.sym.max_nb_sessions param for session mempool_create. So, such testcases change are also in proposal?

Another point, we recently submitted an RFC patch on lib/cryptodev with asymmetric crypto support (https://dpdk.org/dev/patchwork/patch/34308/) which is awaiting review and these fields have role to play there. 
So, could this change be please viewed in conjunction with asym RFC?

Thanks
Shally

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Akhil Goyal
> Sent: 29 January 2018 14:57
> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; hemant.agrawal@nxp.com; Doherty, Declan
> <declan.doherty@intel.com>; Griffin, John <john.griffin@intel.com>; Jain,
> Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
> jianbo.liu@arm.com; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for crypto info
> struct
> 
> Hi Pablo/Fiona,
> 
> On 1/26/2018 4:38 PM, De Lara Guarch, Pablo wrote:
> >
> >
> >> -----Original Message-----
> >> From: Trahe, Fiona
> >> Sent: Friday, January 26, 2018 10:45 AM
> >> To: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> >> akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty, Declan
> >> <declan.doherty@intel.com>; jerin.jacob@intel.com; Griffin, John
> >> <john.griffin@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>;
> >> jck@semihalf.com; tdu@semihalf.com; dima@marvell.com;
> >> nsamsono@marvell.com; jianbo.liu@arm.com
> >> Cc: dev@dpdk.org; Trahe, Fiona <fiona.trahe@intel.com>
> >> Subject: RE: [PATCH] doc: announce ABI change for crypto info struct
> >>
> >> Hi Pablo,
> >>
> >>> -----Original Message-----
> >>> From: De Lara Guarch, Pablo
> >>> Sent: Friday, January 26, 2018 9:04 AM
> >>> To: akhil.goyal@nxp.com; hemant.agrawal@nxp.com; Doherty, Declan
> >>> <declan.doherty@intel.com>; jerin.jacob@intel.com; Trahe, Fiona
> >>> <fiona.trahe@intel.com>; Griffin, John <john.griffin@intel.com>; Jain,
> >>> Deepak K <deepak.k.jain@intel.com>; jck@semihalf.com;
> >>> tdu@semihalf.com; dima@marvell.com; nsamsono@marvell.com;
> >>> jianbo.liu@arm.com
> >>> Cc: dev@dpdk.org; De Lara Guarch, Pablo
> >>> <pablo.de.lara.guarch@intel.com>
> >>> Subject: [PATCH] doc: announce ABI change for crypto info struct
> >>>
> >>> Since the API changes made in 17.08, the session mempool is not
> >>> created anymore in each crypto device.
> >>> Therefore, there is no need to have, in the cryptodev info structure,
> >>> the maximum number of sessions supported per device and per queue
> >>> pair.
> >>>
> >>> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> >>> ---
> >>>   doc/guides/rel_notes/deprecation.rst | 5 +++++
> >>>   1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/doc/guides/rel_notes/deprecation.rst
> >>> b/doc/guides/rel_notes/deprecation.rst
> >>> index d59ad5988..5588ba7c1 100644
> >>> --- a/doc/guides/rel_notes/deprecation.rst
> >>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>> @@ -59,3 +59,8 @@ Deprecation Notices
> >>>     be added between the producer and consumer structures. The size of
> >> the
> >>>     structure and the offset of the fields will remain the same on
> >>>     platforms with 64B cache line, but will change on other platforms.
> >>> +
> >>> +* cryptodev: The structure ``sym``, including its fields
> >>> +``max_nb_sessions``
> >>> +  and ``max_nb_sessions_per_qp``, in structure
> >>> +``rte_cryptodev_info``,
> >>> +  will be removed in 18.05, as these fields are not relevant anymore
> >>> +  since the session mempool is not internal in the crypto device
> >> anymore.
> >>> --
> >> [Fiona] max_nb_sessions must be also removed from struct
> >> rte_cryptodev_pmd_init_params
> >
> > Good point. Since this structure is internal, I guess we don't need a
> deprecation notice for it,
> > but I will remove it in the patch for 18.05.
> >
> >> Regards deprecation of max_nb_sessions from both structs:
> >> Acked-by: Fiona Trahe <fiona.trahe@intel.com>
> >>
> >> If removing the max_nb_sessions_per_qp, then the following functions
> >> should also be deprecated.
> >> rte_cryptodev_queue_pair_attach_sym_session
> >> rte_cryptodev_queue_pair_detach_sym_session
> >> These and the max_nb_session_per_qp were added here at request of
> NXP:
> >> http://dpdk.org/ml/archives/dev/2017-March/060740.html
> >
> > Akhil, do you agree on this change?
> >
> 
> We recently did some changes in the driver to take care of the
> dependency for limit on max_nb_sessions_per_qp, but it is not removed
> completely. We will need to look into this. But sending the deprecation
> notice at this moment is fine. If something comes up, will let you know
> later.
> 
> -Akhil


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [[PATCH v5] 5/5] doc: Add ABI __experimental tag documentation
  @ 2018-01-29 21:42  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-01-29 21:42 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev, Mcnamara, John, Richardson, Bruce

23/01/2018 11:35, Mcnamara, John:
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Monday, January 22, 2018 1:48 AM
> > To: dev@dpdk.org
> > Cc: Neil Horman <nhorman@tuxdriver.com>; Thomas Monjalon
> > <thomas@monjalon.net>; Mcnamara, John <john.mcnamara@intel.com>;
> > Richardson, Bruce <bruce.richardson@intel.com>
> > Subject: [[PATCH v5] 5/5] doc: Add ABI __experimental tag documentation
> > 
> > Document the need to add the __experimental tag to appropriate functions
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Thomas Monjalon <thomas@monjalon.net>
> > CC: "Mcnamara, John" <john.mcnamara@intel.com>
> > CC: Bruce Richardson <bruce.richardson@intel.com>
> > ...
> > +Note that marking an API as experimental is a multi step process.  To
> > +mark an API as experimental, the symbols which are desired to be
> > +exported must be placed in an EXPERIMENTAL version block in the
> > +corresponding libraries' version map script. Secondly, the
> > +corresponding definitions of those exported functions, and their
> > +forward declarations (in the development header files), must be marked
> > +with the __rte_experimental tag (see rte_compat.h).  The DPDK build
> > +makefiles perform a check to ensure that the map file and the C code
> > +reflect the same list of symbols.  This check can be circumvented by
> > defining ALLOW_EXPERIMENTAL_API during compilation in the corresponding
> > library Makefile.
> > +
> > +In addition to tagging the code with __rte_experimental, the doxygen
> > +markup must also contain the EXPERIMENTAL string, and the MAINTAINER
> > +file should note that the library contains EXPERIMENTAL APIs.
> > +
> >  ABI versions, once released, are available until such time as their
> > deprecation has been noted in the Release Notes for at least one major
> > release  cycle. For example consider the case where the ABI for DPDK 2.0
> > has been
> > --
> > 2.14.3
> 
> Thanks for the update, and this work in general.
> 
> The rendered docs would probably look a better better with __rte_experimental
> and ALLOW_EXPERIMENTAL_API is fixed width backticks ``var`` but that is only
> a "nice to have" so no need for a respin.

Backticks added on apply.

Also changed the last sentence from
	the MAINTAINER file should note that the library contains EXPERIMENTAL APIs.
to
	the MAINTAINERS file should note the EXPERIMENTAL libraries.
Indeed, the practice is to note only new libraries as experimental in
the MAINTAINERS files.

^ permalink raw reply	[relevance 4%]

Results 10001-10200 of ~18000   |  | reverse | sort options + mbox downloads above
-- links below jump to the message on this page --
2017-11-24 16:06     [dpdk-dev] [RFC PATCH 0/6] mempool: add bucket mempool driver Andrew Rybchenko
2018-01-23 13:15     ` [dpdk-dev] [RFC v2 00/17] " Andrew Rybchenko
2018-01-31 16:44  0%   ` Olivier Matz
2018-03-10 15:39  3%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
2018-03-10 15:39  7%     ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
2018-03-11 12:51  0%       ` santosh
2018-03-12  6:53  0%         ` Andrew Rybchenko
2018-03-19 17:03  0%       ` Olivier Matz
2018-03-20 14:41  0%         ` Bruce Richardson
2018-03-10 15:39  6%     ` [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory Andrew Rybchenko
2018-03-10 15:39  6%     ` [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities Andrew Rybchenko
2018-03-10 15:39  5%     ` [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions Andrew Rybchenko
2018-03-10 15:39  8%     ` [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area Andrew Rybchenko
2018-03-19 17:03  0%     ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Olivier Matz
2018-03-25 16:20  2%   ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
2018-03-25 16:20  7%     ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
2018-03-25 16:20  6%     ` [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
2018-03-25 16:20  6%     ` [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
2018-03-25 16:20  4%     ` [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions Andrew Rybchenko
2018-03-25 16:20  8%     ` [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area Andrew Rybchenko
2018-03-26 16:09  2%   ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
2018-03-26 16:09  7%     ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
2018-03-26 16:09  6%     ` [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
2018-03-26 16:09  6%     ` [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
2018-03-26 16:09  4%     ` [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions Andrew Rybchenko
2017-12-01 18:56     [dpdk-dev] [PATCH 0/4] dpdk: enhance EXPERIMENTAL api tagging Neil Horman
2018-01-22  1:48     ` [dpdk-dev] [[PATCH v5] 5/5] doc: Add ABI __experimental tag documentation Neil Horman
2018-01-23 10:35       ` Mcnamara, John
2018-01-29 21:42  4%     ` Thomas Monjalon
2017-12-04 15:55     [dpdk-dev] [PATCH] relicense various bits of the dpdk Neil Horman
2018-02-01 12:19  8% ` [dpdk-dev] [PATCH v2] " Neil Horman
2018-02-01 12:49  0%   ` Hemant Agrawal
2017-12-22 12:41     [dpdk-dev] [PATCH v2] eal: add function to return number of detected sockets Anatoly Burakov
2018-01-16 17:53     ` [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal Anatoly Burakov
2018-01-23 10:39       ` Mcnamara, John
2018-02-07 10:10  4%     ` Jerin Jacob
2018-02-09 14:42  4%       ` Bruce Richardson
2018-02-14  0:04  4%         ` Thomas Monjalon
2018-02-14 14:25  4%           ` Thomas Monjalon
2018-02-12 16:00  4%   ` Jonas Pfefferle
     [not found]     ` <cover.1517848624.git.anatoly.burakov@intel.com>
2018-02-05 16:37  8%   ` [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
2018-02-05 17:39  3%     ` Burakov, Anatoly
2018-02-05 22:45  0%       ` Thomas Monjalon
2018-02-06  9:28  0%         ` Burakov, Anatoly
2018-02-06  9:47  0%           ` Thomas Monjalon
2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] Add " Anatoly Burakov
2018-02-07  9:58  5%     ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
2018-03-08 12:12  3%       ` Bruce Richardson
2018-03-08 14:38  0%         ` Burakov, Anatoly
2018-03-09 16:32  0%           ` Bruce Richardson
2018-03-21  4:59  0%       ` gowrishankar muthukrishnan
2018-03-21 10:24  0%         ` Burakov, Anatoly
2018-03-22 10:58  4%     ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
2018-03-22 11:45  3%       ` Burakov, Anatoly
2018-03-22 12:36  5%       ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
2018-03-22 17:07  0%         ` gowrishankar muthukrishnan
2018-01-08 10:00     [dpdk-dev] [PATCH v3] lib/librte_meter: add meter configuration profile Jasvinder Singh
2018-01-08 15:43     ` [dpdk-dev] [PATCH v4] " Jasvinder Singh
2018-02-19 21:12  3%   ` Thomas Monjalon
2018-01-11  0:20     [dpdk-dev] [PATCH v6 00/23] eventtimer: introduce event timer adapter Erik Gabriel Carrillo
2018-03-08 21:53     ` [dpdk-dev] [PATCH v7 0/7] " Erik Gabriel Carrillo
2018-03-08 21:54  2%   ` [dpdk-dev] [PATCH v7 2/7] eventtimer: add common code Erik Gabriel Carrillo
2018-01-12 10:27     [dpdk-dev] [PATCH v2] doc: ethdev ABI change deprecation notice Kirill Rybalchenko
2018-01-12 10:29     ` [dpdk-dev] [PATCH v3] " Kirill Rybalchenko
2018-01-12 14:38       ` Neil Horman
2018-02-13 12:09  4%     ` Ferruh Yigit
2018-02-13 13:21  4%       ` Olivier Matz
2018-02-14  0:14  4%         ` Thomas Monjalon
2018-02-14 17:18  4%           ` Thomas Monjalon
2018-01-12 20:45     [dpdk-dev] [PATCH 1/1] doc: announce API change to lcore role function Erik Gabriel Carrillo
2018-02-13 14:37  0% ` Ferruh Yigit
2018-02-13 14:43  0%   ` Van Haaren, Harry
2018-02-13 14:47  0%     ` Pavan Nikhilesh
2018-02-14  0:09  0% ` Thomas Monjalon
2018-02-14 10:59  0%   ` Thomas Monjalon
2018-01-15 19:05     [dpdk-dev] [PATCH] checkpatches.sh: Add checks for ABI symbol addition Neil Horman
2018-01-31 17:27  6% ` [dpdk-dev] [PATCH v3] " Neil Horman
2018-02-04 14:44  7%   ` Thomas Monjalon
2018-02-05 17:29  6% ` [dpdk-dev] [PATCH v4] " Neil Horman
2018-02-05 17:57  4%   ` Thomas Monjalon
2018-02-09 15:21  6% ` [dpdk-dev] [PATCH v5] " Neil Horman
2018-02-13 22:57  4%   ` Thomas Monjalon
2018-02-14 19:19  6% ` [dpdk-dev] [PATCH v6] " Neil Horman
2018-01-17 17:17     [dpdk-dev] [PATCH] doc: add deprecation notice for physmem layout function Anatoly Burakov
2018-01-18 10:32     ` [dpdk-dev] [PATCH v2] doc: add deprecation notice for memory hotplug changes Anatoly Burakov
2018-02-05 11:47  0%   ` Bruce Richardson
2018-02-07 10:11  0%     ` Jerin Jacob
2018-02-14 14:48  0%       ` Thomas Monjalon
2018-02-12 15:58  0%   ` Jonas Pfefferle
2018-02-13  0:24  0%   ` Yongseok Koh
2018-01-17 21:57     [dpdk-dev] [PATCH v3 2/6] ethdev: return named opaque type instead of void pointer Ferruh Yigit
2018-03-09 11:25     ` [dpdk-dev] [PATCH v4] " Ferruh Yigit
     [not found]       ` <20180309123651.GB19004@hmswarspite.think-freely.org>
2018-03-09 13:00  0%     ` Ferruh Yigit
2018-03-09 15:16  0%       ` Neil Horman
2018-03-09 15:45  0%         ` Ferruh Yigit
2018-03-09 19:06  0%           ` Neil Horman
2018-03-20 15:51  0%             ` Ferruh Yigit
2018-01-22 15:45     [dpdk-dev] [PATCH] net/octeontx: register fpa as platform HW mempool Pavan Nikhilesh
2018-01-31 19:51  4% ` Ferruh Yigit
2018-01-23  8:54     [dpdk-dev] [RFC v2, 1/2] cryptodev: add support to set session private data Abhinandan Gujjar
2018-01-24 19:46     ` De Lara Guarch, Pablo
2018-01-25 15:37       ` Gujjar, Abhinandan S
2018-01-31 13:40  0%     ` De Lara Guarch, Pablo
2018-01-23 13:23     [dpdk-dev] [PATCH] doc: announce API/ABI changes for mempool Andrew Rybchenko
2018-01-31 16:46  4% ` Olivier Matz
2018-02-01  6:40  4%   ` Jerin Jacob
2018-02-01 12:53  4%     ` Hemant Agrawal
2018-02-14 15:23  4%       ` Thomas Monjalon
2018-01-26  9:03     [dpdk-dev] [PATCH] doc: announce ABI change for crypto info struct Pablo de Lara
2018-01-26 10:44     ` Trahe, Fiona
2018-01-26 11:08       ` De Lara Guarch, Pablo
2018-01-29  9:26         ` Akhil Goyal
2018-01-30  7:55  4%       ` Verma, Shally
2018-01-30 11:21  4%         ` De Lara Guarch, Pablo
2018-01-30 11:53  4%           ` Verma, Shally
2018-02-02  9:07  4%             ` De Lara Guarch, Pablo
2018-02-02 10:52  4%               ` Verma, Shally
2018-01-30 11:37  4% ` Akhil Goyal
2018-01-30 12:14  7% ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices Pablo de Lara
2018-01-30 12:14  4%   ` [dpdk-dev] [PATCH v2 1/3] doc: announce ABI change for crypto info struct Pablo de Lara
2018-02-13 11:45  4%   ` [dpdk-dev] [PATCH v2 0/3] Cryptodev API/ABI deprecation notices De Lara Guarch, Pablo
2018-02-01 19:47     [dpdk-dev] [PATCH 1/2] Revert "eal: fix default mempool ops" Hemant Agrawal
2018-02-01 19:56     ` Hemant Agrawal
2018-02-01 20:40  3%   ` Pavan Nikhilesh
2018-02-02  5:43  0%     ` Hemant Agrawal
2018-02-02  8:03 10% ` [dpdk-dev] [PATCH] doc: remove eal API for default mempool ops name Hemant Agrawal
2018-02-02  8:31 10%   ` [dpdk-dev] [PATCH v2] " Hemant Agrawal
2018-02-02 14:01  0%     ` Olivier Matz
2018-02-13 11:28  0%       ` Ferruh Yigit
2018-02-02 15:16  3% [dpdk-dev] [PATCH v1 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
2018-02-02 15:16  3% ` [dpdk-dev] [PATCH v1 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
2018-02-02 16:46  3% ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Adrien Mazarguil
2018-02-02 16:46  3%   ` [dpdk-dev] [PATCH v2 3/4] net/mlx: version rdma-core glue libraries Adrien Mazarguil
2018-02-04 14:29         ` Thomas Monjalon
2018-02-05 11:24           ` Adrien Mazarguil
2018-02-05 12:13             ` Marcelo Ricardo Leitner
2018-02-05 12:24               ` Van Haaren, Harry
2018-02-05 12:58                 ` Marcelo Ricardo Leitner
2018-02-05 13:44  3%               ` Adrien Mazarguil
2018-02-05 14:16  0%                 ` Thomas Monjalon
2018-02-05 14:33  0%                   ` Adrien Mazarguil
2018-02-05 14:37  0%                   ` Marcelo Ricardo Leitner
2018-02-05 14:59  0%                     ` Adrien Mazarguil
2018-02-05 15:29  0%                       ` Marcelo Ricardo Leitner
2018-02-05 15:54  4%                         ` Adrien Mazarguil
2018-02-05 17:06  0%                           ` Marcelo Ricardo Leitner
2018-02-06 11:06  4%                             ` Adrien Mazarguil
2018-02-02 16:52  0%   ` [dpdk-dev] [PATCH v2 0/4] net/mlx: enhance rdma-core glue configuration Nélio Laranjeiro
2018-02-06 11:31  0%   ` Shahaf Shuler
2018-02-02 23:28     [dpdk-dev] [PATCH 0/7] vhost: support selective datapath Zhihong Wang
2018-03-19 10:12     ` [dpdk-dev] [PATCH v3 0/5] " Zhihong Wang
2018-03-19 10:12       ` [dpdk-dev] [PATCH v3 2/5] " Zhihong Wang
2018-03-21 21:05         ` Maxime Coquelin
2018-03-22  7:55  3%       ` Wang, Zhihong
2018-03-22  8:31  0%         ` Maxime Coquelin
2018-02-04  7:24  4% [dpdk-dev] [PATCH] doc: annouce ABI change for RSS configuraiton structure Xueming Li
2018-02-06  7:38  4% ` [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure Xueming Li
2018-02-13  6:52  4%   ` Shahaf Shuler
2018-02-13 11:27  4%     ` Ferruh Yigit
2018-02-13 12:10  4%       ` Jerin Jacob
2018-02-14 16:28  4%         ` Thomas Monjalon
2018-02-05 12:16     [dpdk-dev] [PATCH 0/8] vhost: input validation enhancements Stefan Hajnoczi
2018-02-05 12:16  3% ` [dpdk-dev] [PATCH 2/8] vhost: avoid enum fields in VhostUserMsg Stefan Hajnoczi
2018-02-06  9:47  0%   ` Maxime Coquelin
2018-02-07  9:26 11% [dpdk-dev] [PATCH v1] doc: update deprecation notice of rte_devargs Gaetan Rivet
2018-02-10 13:15     [dpdk-dev] [PATCH] eal: fix rte_errno values for IPC API Anatoly Burakov
2018-02-11  1:09     ` Tan, Jianfeng
2018-02-13 13:50       ` Thomas Monjalon
2018-02-13 14:16         ` Van Haaren, Harry
2018-02-13 15:39  3%       ` Bruce Richardson
2018-02-12  4:53     [dpdk-dev] [PATCH 0/4] deferred queue setup Qi Zhang
2018-03-02  4:13     ` [dpdk-dev] [PATCH v2 " Qi Zhang
2018-03-02  4:13       ` [dpdk-dev] [PATCH v2 1/4] ether: support " Qi Zhang
2018-03-14 12:31  0%     ` Ananyev, Konstantin
2018-03-15  3:13  0%       ` Zhang, Qi Z
2018-03-15 13:16  0%         ` Ananyev, Konstantin
2018-03-15 15:08  0%           ` Zhang, Qi Z
2018-03-15 15:38  0%             ` Ananyev, Konstantin
2018-02-13  8:14     [dpdk-dev] [PATCH v2] net/tap: add CRC stripping capability Ophir Munk
2018-02-13 16:35     ` Thomas Monjalon
2018-02-15 21:55  3%   ` Stephen Hemminger
2018-02-16 13:00  0%     ` Thomas Monjalon
2018-02-14 12:21  6% [dpdk-dev] [PATCH v1] doc: update release notes for 18.02 John McNamara
2018-02-14 13:50  6% ` [dpdk-dev] [PATCH v2] " John McNamara
2018-02-14 12:32  4% [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors Shahaf Shuler
2018-02-14 13:50  4% ` Thomas Monjalon
2018-02-14 13:54  4% ` Doherty, Declan
2018-02-14 14:50  4% ` Remy Horton
2018-02-14 15:27  4% ` Boccassi, Luca
2018-02-14 15:54  4%   ` Jerin Jacob
2018-02-14 16:50  4%     ` Thomas Monjalon
2018-02-14 15:37  4% [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Adrien Mazarguil
2018-02-14 15:37  3% ` [dpdk-dev] [PATCH v1 3/4] doc: announce API change for flow RSS/RAW actions Adrien Mazarguil
2018-02-14 15:55  0% ` [dpdk-dev] [PATCH v1 0/4] doc: announce API changes for flow rules Nélio Laranjeiro
2018-02-14 16:06  0%   ` Andrew Rybchenko
2018-02-14 19:11  3% [dpdk-dev] [dpdk-announce] DPDK 18.02 released Thomas Monjalon
2018-02-15 13:04  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05 John McNamara
2018-02-16 22:54  0% ` Carrillo, Erik G
2018-02-21 21:44     [dpdk-dev] [PATCH v2 0/2] lib/rib: Add Routing Information Base library Medvedkin Vladimir
2018-02-21 21:44     ` [dpdk-dev] [PATCH v2 1/2] Add RIB library Medvedkin Vladimir
2018-03-14 11:09  4%   ` Bruce Richardson
2018-03-25 18:17  0%     ` Vladimir Medvedkin
2018-03-26  9:50  0%       ` Bruce Richardson
2018-02-22 12:15  8% [dpdk-dev] [PATCH] doc: fixing grammar Alejandro Lucero
2018-02-24 13:14     [dpdk-dev] [PATCH v2 0/7] crypto: add virtio poll mode driver Jay Zhou
2018-02-24 13:14  2% ` [dpdk-dev] [PATCH v2 1/7] crypto/virtio: add virtio related fundamental functions Jay Zhou
2018-02-27 10:29  3% [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function Kirill Rybalchenko
2018-02-27 11:01  0% ` Ferruh Yigit
2018-02-27 13:45  3%   ` Thomas Monjalon
2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
2018-03-07 17:17  0%   ` Ferruh Yigit
2018-03-07 17:47  0%     ` Ferruh Yigit
2018-03-03 13:45     [dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK Anatoly Burakov
2018-03-03 13:46     ` [dpdk-dev] [PATCH 13/41] eal: replace memseg with memseg lists Anatoly Burakov
2018-03-19 17:39  3%   ` Olivier Matz
2018-03-20  9:47  4%     ` Burakov, Anatoly
2018-03-04 15:04     [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel Jianfeng Tan
2018-03-20 16:37  4% ` Pattan, Reshma
2018-03-21  2:28  4%   ` Tan, Jianfeng
2018-03-21  9:55  3%     ` Pattan, Reshma
2018-03-05 23:01     [dpdk-dev] [PATCH 1/2] eventdev: add device stop flush callback Gage Eads
2018-03-08 23:10     ` [dpdk-dev] [PATCH v2 " Gage Eads
2018-03-12  6:25  3%   ` Jerin Jacob
2018-03-12 14:30  3%     ` Eads, Gage
2018-03-12 14:38  0%       ` Jerin Jacob
2018-03-06 18:28  3% [dpdk-dev] [PATCH] eal: register rte_panic user callback Arnon Warshavsky
2018-03-07  8:32  0% ` Thomas Monjalon
2018-03-07  9:05  0%   ` Burakov, Anatoly
2018-03-07  9:59  0%     ` Thomas Monjalon
2018-03-07 11:29  0%       ` Burakov, Anatoly
2018-03-07 12:08  3% [dpdk-dev] [RFC PATCH v1 0/4] ethdev: add per-PMD tuning of RxTx parmeters Remy Horton
2018-03-21 14:27  3% ` [dpdk-dev] [PATCH v2 " Remy Horton
2018-03-07 12:08     [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters Remy Horton
2018-03-14 14:43     ` Ferruh Yigit
2018-03-14 15:48       ` Remy Horton
2018-03-14 16:42         ` Ferruh Yigit
2018-03-14 17:23           ` Shreyansh Jain
2018-03-14 17:52             ` Ferruh Yigit
2018-03-14 18:53               ` Ananyev, Konstantin
2018-03-14 21:02                 ` Ferruh Yigit
2018-03-14 21:36                   ` Bruce Richardson
2018-03-15 13:57                     ` Ferruh Yigit
2018-03-15 14:39                       ` Bruce Richardson
2018-03-15 14:57                         ` Ferruh Yigit
2018-03-16 13:54                           ` Shreyansh Jain
2018-03-20 14:54                             ` Ferruh Yigit
2018-03-21  6:51                               ` Shreyansh Jain
2018-03-21 10:02  3%                             ` Ferruh Yigit
2018-03-21 10:45  0%                               ` Shreyansh Jain
2018-03-07 17:44 23% [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
2018-03-07 18:06  0% ` Luca Boccassi
2018-03-08  8:05  5% ` Thomas Monjalon
2018-03-08 11:43  3%   ` Ferruh Yigit
2018-03-08 15:17  0%     ` Thomas Monjalon
2018-03-08 15:35  0%       ` Neil Horman
2018-03-08 16:04  0%         ` Thomas Monjalon
2018-03-08 19:40  3%           ` Neil Horman
2018-03-08 21:34  4%             ` Thomas Monjalon
2018-03-09  0:18  4%               ` Neil Horman
2018-03-08  1:29     [dpdk-dev] [RFC PATCH 0/5] add framework to load and execute BPF code Konstantin Ananyev
2018-03-08  1:29  2% ` [dpdk-dev] [RFC PATCH 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-03-09 16:42     [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
2018-03-09 16:42  2% ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-03-10  1:25     [dpdk-dev] [PATCH v1 0/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-03-10  1:25     ` [dpdk-dev] [PATCH v1 3/6] net/mlx5: add a function to rdma-core glue Yongseok Koh
2018-03-12  9:13  3%   ` Nélio Laranjeiro
2018-03-12 15:55  1% [dpdk-dev] [RFC] Switch device offload with DPDK Adrien Mazarguil
2018-03-20 11:26     [dpdk-dev] [PATCH] doc: announce ethdev CRC strip flag deprecation Ferruh Yigit
2018-03-20 11:35     ` Thomas Monjalon
2018-03-20 17:23  3%   ` Ferruh Yigit
2018-03-20 21:28     [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value Arnon Warshavsky
2018-03-20 22:04  3% ` Thomas Monjalon
2018-03-20 22:42  0%   ` Arnon Warshavsky
2018-03-20 22:49  3%     ` Thomas Monjalon
2018-03-20 23:04  0%       ` Arnon Warshavsky
2018-03-21  8:21  0%         ` Thomas Monjalon
2018-03-21  8:47  0%           ` Arnon Warshavsky
2018-03-23 12:58  3% [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes Adrien Mazarguil
2018-03-23 12:58  4% ` [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build Adrien Mazarguil
2018-03-23 17:35     [dpdk-dev] [PATCH 0/4] NFP PF support based on new CPP interface Alejandro Lucero
2018-03-23 17:35  1% ` [dpdk-dev] [PATCH 1/4] net/nfp: add NFP CPP support Alejandro Lucero
2018-03-23 17:35  6% ` [dpdk-dev] [PATCH 2/4] net/nfp: update PMD for using new CPP interface Alejandro Lucero
2018-03-25  8:33     [dpdk-dev] [PATCH v3 0/7] crypto: add virtio poll mode driver Jay Zhou
2018-03-25  8:33  2% ` [dpdk-dev] [PATCH v3 1/7] crypto/virtio: add virtio related fundamental functions Jay Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).