DPDK patches and discussions
* [dpdk-dev] [PATCH v5 0/9] Dynamic memzones
                       ` (2 preceding siblings ...)
  2015-06-25 14:05  4%   ` [dpdk-dev] [PATCH v4 0/9] Dynamic memzone Sergio Gonzalez Monroy
@ 2015-06-26 11:32  4%   ` Sergio Gonzalez Monroy
  2015-06-26 11:32  1%     ` [dpdk-dev] [PATCH v5 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
  3 siblings, 1 reply; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-26 11:32 UTC (permalink / raw)
  To: dev

The current implementation allows reserving/creating memzones but not the opposite
(unreserving/freeing them). This affects mempools and other memzone-based objects.

From my point of view, implementing free functionality for memzones would look
like malloc over memsegs.
Thus, this approach moves malloc inside the EAL (which in turn removes a circular
dependency), where malloc heaps are composed of memsegs.
We keep both the malloc and memzone APIs as they are, but memzones allocate their
memory by calling malloc_heap_alloc.
Some extra functionality is required in malloc to allow for boundary-constrained
memory requests.
In summary, malloc is currently based on memzones; with this approach,
memzones are based on malloc.
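
For illustration, the boundary-constrained placement this series adds to malloc
can be sketched as a self-contained function in the spirit of the patch's
elem_start_pt(): the name place_at_end and its signature are hypothetical, and
element header/trailer overhead is left out for clarity.

```c
#include <stdint.h>
#include <stddef.h>

/* Place `size` bytes at the END of a free element
 * [elem_start, elem_start + elem_len), with the data start aligned down
 * to `align`, such that the block does not cross a `bound`-sized
 * boundary. `align` and `bound` must be powers of two (`bound` may be 0
 * for "no boundary"). Returns the chosen data start address, or 0 if
 * the request cannot be satisfied. (Hypothetical stand-alone model;
 * element header/trailer overhead is omitted.) */
uintptr_t
place_at_end(uintptr_t elem_start, size_t elem_len, size_t size,
		size_t align, size_t bound)
{
	const uintptr_t bmask = ~((uintptr_t)bound - 1);
	uintptr_t end_pt = elem_start + elem_len;
	uintptr_t data_start = (end_pt - size) & ~((uintptr_t)align - 1);

	/* would [data_start, end_pt) straddle a bound-sized boundary? */
	if (bound != 0 && (data_start & bmask) != ((end_pt - 1) & bmask)) {
		/* retry with the block placed just below the boundary */
		end_pt &= bmask;
		data_start = (end_pt - size) & ~((uintptr_t)align - 1);
		if ((data_start & bmask) != ((end_pt - 1) & bmask))
			return 0; /* size > bound: can never fit */
	}
	/* if the start falls before the element, it does not fit */
	return data_start >= elem_start ? data_start : 0;
}
```

With no boundary the block simply lands at the aligned end of the element;
with a boundary, the first attempt is retried below the nearest boundary
before giving up.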

v5:
 - Fix rte_memzone_free
 - Improve rte_memzone_free unit test

v4:
 - Rebase and fix couple of merge issues

v3:
 - Create dummy librte_malloc
 - Add deprecation notice
 - Rework some of the code
 - Doc update
 - checkpatch

v2:
 - New rte_memzone_free
 - Support memzone len = 0
 - Add all available memsegs to malloc heap at init
 - Update memzone/malloc unit tests

Sergio Gonzalez Monroy (9):
  eal: move librte_malloc to eal/common
  eal: memzone allocated by malloc
  app/test: update malloc/memzone unit tests
  config: remove CONFIG_RTE_MALLOC_MEMZONE_SIZE
  eal: remove free_memseg and references to it
  eal: new rte_memzone_free
  app/test: update unit test with rte_memzone_free
  doc: announce ABI change of librte_malloc
  doc: update malloc documentation

 MAINTAINERS                                       |   9 +-
 app/test/test_malloc.c                            |  86 ----
 app/test/test_memzone.c                           | 454 ++++------------------
 config/common_bsdapp                              |   8 +-
 config/common_linuxapp                            |   8 +-
 doc/guides/prog_guide/env_abstraction_layer.rst   | 220 ++++++++++-
 doc/guides/prog_guide/img/malloc_heap.png         | Bin 81329 -> 80952 bytes
 doc/guides/prog_guide/index.rst                   |   1 -
 doc/guides/prog_guide/malloc_lib.rst              | 233 -----------
 doc/guides/prog_guide/overview.rst                |  11 +-
 doc/guides/rel_notes/abi.rst                      |   1 +
 drivers/net/af_packet/Makefile                    |   1 -
 drivers/net/bonding/Makefile                      |   1 -
 drivers/net/e1000/Makefile                        |   2 +-
 drivers/net/enic/Makefile                         |   2 +-
 drivers/net/fm10k/Makefile                        |   2 +-
 drivers/net/i40e/Makefile                         |   2 +-
 drivers/net/ixgbe/Makefile                        |   2 +-
 drivers/net/mlx4/Makefile                         |   1 -
 drivers/net/null/Makefile                         |   1 -
 drivers/net/pcap/Makefile                         |   1 -
 drivers/net/virtio/Makefile                       |   2 +-
 drivers/net/vmxnet3/Makefile                      |   2 +-
 drivers/net/xenvirt/Makefile                      |   2 +-
 lib/Makefile                                      |   2 +-
 lib/librte_acl/Makefile                           |   2 +-
 lib/librte_eal/bsdapp/eal/Makefile                |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map     |  19 +
 lib/librte_eal/common/Makefile                    |   1 +
 lib/librte_eal/common/eal_common_memzone.c        | 340 +++++++---------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   5 +-
 lib/librte_eal/common/include/rte_malloc.h        | 342 ++++++++++++++++
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/include/rte_memzone.h       |  11 +
 lib/librte_eal/common/malloc_elem.c               | 344 ++++++++++++++++
 lib/librte_eal/common/malloc_elem.h               | 192 +++++++++
 lib/librte_eal/common/malloc_heap.c               | 206 ++++++++++
 lib/librte_eal/common/malloc_heap.h               |  70 ++++
 lib/librte_eal/common/rte_malloc.c                | 259 ++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile              |   4 +-
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c         |  17 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map   |  19 +
 lib/librte_hash/Makefile                          |   2 +-
 lib/librte_lpm/Makefile                           |   2 +-
 lib/librte_malloc/Makefile                        |   6 +-
 lib/librte_malloc/malloc_elem.c                   | 320 ---------------
 lib/librte_malloc/malloc_elem.h                   | 190 ---------
 lib/librte_malloc/malloc_heap.c                   | 208 ----------
 lib/librte_malloc/malloc_heap.h                   |  70 ----
 lib/librte_malloc/rte_malloc.c                    | 228 +----------
 lib/librte_malloc/rte_malloc.h                    | 342 ----------------
 lib/librte_malloc/rte_malloc_version.map          |  16 -
 lib/librte_mempool/Makefile                       |   2 -
 lib/librte_port/Makefile                          |   1 -
 lib/librte_ring/Makefile                          |   3 +-
 lib/librte_table/Makefile                         |   1 -
 56 files changed, 1929 insertions(+), 2354 deletions(-)
 delete mode 100644 doc/guides/prog_guide/malloc_lib.rst
 create mode 100644 lib/librte_eal/common/include/rte_malloc.h
 create mode 100644 lib/librte_eal/common/malloc_elem.c
 create mode 100644 lib/librte_eal/common/malloc_elem.h
 create mode 100644 lib/librte_eal/common/malloc_heap.c
 create mode 100644 lib/librte_eal/common/malloc_heap.h
 create mode 100644 lib/librte_eal/common/rte_malloc.c
 delete mode 100644 lib/librte_malloc/malloc_elem.c
 delete mode 100644 lib/librte_malloc/malloc_elem.h
 delete mode 100644 lib/librte_malloc/malloc_heap.c
 delete mode 100644 lib/librte_malloc/malloc_heap.h
 delete mode 100644 lib/librte_malloc/rte_malloc.h

-- 
1.9.3


* [dpdk-dev] [PATCH v5 2/9] eal: memzone allocated by malloc
  2015-06-26 11:32  4%   ` [dpdk-dev] [PATCH v5 0/9] Dynamic memzones Sergio Gonzalez Monroy
@ 2015-06-26 11:32  1%     ` Sergio Gonzalez Monroy
  0 siblings, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-26 11:32 UTC (permalink / raw)
  To: dev

In the current memory hierarchy, memsegs are groups of physically
contiguous hugepages, memzones are slices of memsegs and malloc further
slices memzones into smaller memory chunks.

This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones call malloc internally for memory allocation while
maintaining their ABI.

This makes it possible to free memzones, and therefore any other structure
based on memzones, e.g. mempools.
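
The layering can be modelled with a toy sketch (not the actual DPDK code):
the memzone layer keeps only a name-to-address descriptor table and gets its
memory from a malloc-style heap, with libc malloc()/free() standing in for
malloc_heap_alloc(). Because every zone is then a plain heap allocation,
freeing a zone is just a heap free plus clearing its descriptor.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ZONES 8
#define ZONE_NAMESIZE 32

struct zone { char name[ZONE_NAMESIZE]; void *addr; size_t len; };
static struct zone zones[MAX_ZONES];

/* reserve a named zone: memory comes from the heap, not from slicing
 * a dedicated segment */
void *
zone_reserve(const char *name, size_t len)
{
	for (int i = 0; i < MAX_ZONES; i++) {
		if (zones[i].addr == NULL) {
			void *p = malloc(len); /* stand-in for malloc_heap_alloc() */
			if (p == NULL)
				return NULL;
			snprintf(zones[i].name, ZONE_NAMESIZE, "%s", name);
			zones[i].addr = p;
			zones[i].len = len;
			return p;
		}
	}
	return NULL; /* descriptor table full */
}

/* free a named zone: possible precisely because the zone is a heap element */
int
zone_free(const char *name)
{
	for (int i = 0; i < MAX_ZONES; i++) {
		if (zones[i].addr != NULL && strcmp(zones[i].name, name) == 0) {
			free(zones[i].addr);
			memset(&zones[i], 0, sizeof(zones[i]));
			return 0;
		}
	}
	return -1; /* no such zone */
}
```

In the old scheme the zone's memory was carved irreversibly out of
free_memseg, so there was nothing analogous to the free() call above.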

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c        | 274 ++++++----------------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   2 +-
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/malloc_elem.c               |  68 ++++--
 lib/librte_eal/common/malloc_elem.h               |  14 +-
 lib/librte_eal/common/malloc_heap.c               | 140 ++++++-----
 lib/librte_eal/common/malloc_heap.h               |   6 +-
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 8 files changed, 197 insertions(+), 317 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index aee184a..943012b 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -50,15 +50,15 @@
 #include <rte_string_fns.h>
 #include <rte_common.h>
 
+#include "malloc_heap.h"
+#include "malloc_elem.h"
 #include "eal_private.h"
 
-/* internal copy of free memory segments */
-static struct rte_memseg *free_memseg = NULL;
-
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
 	const struct rte_mem_config *mcfg;
+	const struct rte_memzone *mz;
 	unsigned i = 0;
 
 	/* get pointer to global configuration */
@@ -68,8 +68,9 @@ memzone_lookup_thread_unsafe(const char *name)
 	 * the algorithm is not optimal (linear), but there are few
 	 * zones and this function should be called at init only
 	 */
-	for (i = 0; i < RTE_MAX_MEMZONE && mcfg->memzone[i].addr != NULL; i++) {
-		if (!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
+	for (i = 0; i < RTE_MAX_MEMZONE; i++) {
+		mz = &mcfg->memzone[i];
+		if (mz->addr != NULL && !strncmp(name, mz->name, RTE_MEMZONE_NAMESIZE))
 			return &mcfg->memzone[i];
 	}
 
@@ -88,39 +89,45 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 			len, socket_id, flags, RTE_CACHE_LINE_SIZE);
 }
 
-/*
- * Helper function for memzone_reserve_aligned_thread_unsafe().
- * Calculate address offset from the start of the segment.
- * Align offset in that way that it satisfy istart alignmnet and
- * buffer of the  requested length would not cross specified boundary.
- */
-static inline phys_addr_t
-align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align,
-	size_t bound)
+/* Find the heap with the greatest free block size */
+static void
+find_heap_max_free_elem(int *s, size_t *len, unsigned align)
 {
-	phys_addr_t addr_offset, bmask, end, start;
-	size_t step;
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-	step = RTE_MAX(align, bound);
-	bmask = ~((phys_addr_t)bound - 1);
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* calculate offset to closest alignment */
-	start = RTE_ALIGN_CEIL(ms->phys_addr, align);
-	addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size > *len) {
+			*len = stats.greatest_free_size;
+			*s = i;
+		}
+	}
+	*len -= (MALLOC_ELEM_OVERHEAD + align);
+}
 
-	while (addr_offset + len < ms->len) {
+/* Find a heap that can allocate the requested size */
+static void
+find_heap_suitable(int *s, size_t len, unsigned align)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-		/* check, do we meet boundary condition */
-		end = start + len - (len != 0);
-		if ((start & bmask) == (end & bmask))
-			break;
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-		/* calculate next offset */
-		start = RTE_ALIGN_CEIL(start + 1, step);
-		addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size >= len + MALLOC_ELEM_OVERHEAD + align) {
+			*s = i;
+			break;
+		}
 	}
-
-	return addr_offset;
 }
 
 static const struct rte_memzone *
@@ -128,13 +135,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
 	struct rte_mem_config *mcfg;
-	unsigned i = 0;
-	int memseg_idx = -1;
-	uint64_t addr_offset, seg_offset = 0;
 	size_t requested_len;
-	size_t memseg_len = 0;
-	phys_addr_t memseg_physaddr;
-	void *memseg_addr;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -166,7 +167,6 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (align < RTE_CACHE_LINE_SIZE)
 		align = RTE_CACHE_LINE_SIZE;
 
-
 	/* align length on cache boundary. Check for overflow before doing so */
 	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
 		rte_errno = EINVAL; /* requested size too big */
@@ -180,129 +180,50 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	requested_len = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE,  len);
 
 	/* check that boundary condition is valid */
-	if (bound != 0 &&
-			(requested_len > bound || !rte_is_power_of_2(bound))) {
+	if (bound != 0 && (requested_len > bound || !rte_is_power_of_2(bound))) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	/* find the smallest segment matching requirements */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		/* last segment */
-		if (free_memseg[i].addr == NULL)
-			break;
+	if (len == 0) {
+		if (bound != 0)
+			requested_len = bound;
+		else
+			requested_len = 0;
+	}
 
-		/* empty segment, skip it */
-		if (free_memseg[i].len == 0)
-			continue;
-
-		/* bad socket ID */
-		if (socket_id != SOCKET_ID_ANY &&
-		    free_memseg[i].socket_id != SOCKET_ID_ANY &&
-		    socket_id != free_memseg[i].socket_id)
-			continue;
-
-		/*
-		 * calculate offset to closest alignment that
-		 * meets boundary conditions.
-		 */
-		addr_offset = align_phys_boundary(free_memseg + i,
-			requested_len, align, bound);
-
-		/* check len */
-		if ((requested_len + addr_offset) > free_memseg[i].len)
-			continue;
-
-		/* check flags for hugepage sizes */
-		if ((flags & RTE_MEMZONE_2MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_1G)
-			continue;
-		if ((flags & RTE_MEMZONE_1GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_2M)
-			continue;
-		if ((flags & RTE_MEMZONE_16MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16G)
-			continue;
-		if ((flags & RTE_MEMZONE_16GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16M)
-			continue;
-
-		/* this segment is the best until now */
-		if (memseg_idx == -1) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-		/* find the biggest contiguous zone */
-		else if (len == 0) {
-			if (free_memseg[i].len > memseg_len) {
-				memseg_idx = i;
-				memseg_len = free_memseg[i].len;
-				seg_offset = addr_offset;
-			}
-		}
-		/*
-		 * find the smallest (we already checked that current
-		 * zone length is > len
-		 */
-		else if (free_memseg[i].len + align < memseg_len ||
-				(free_memseg[i].len <= memseg_len + align &&
-				addr_offset < seg_offset)) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
+	if (socket_id == SOCKET_ID_ANY) {
+		if (requested_len == 0)
+			find_heap_max_free_elem(&socket_id, &requested_len, align);
+		else
+			find_heap_suitable(&socket_id, requested_len, align);
+
+		if (socket_id == SOCKET_ID_ANY) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
 	}
 
-	/* no segment found */
-	if (memseg_idx == -1) {
-		/*
-		 * If RTE_MEMZONE_SIZE_HINT_ONLY flag is specified,
-		 * try allocating again without the size parameter otherwise -fail.
-		 */
-		if ((flags & RTE_MEMZONE_SIZE_HINT_ONLY)  &&
-		    ((flags & RTE_MEMZONE_1GB) || (flags & RTE_MEMZONE_2MB)
-		|| (flags & RTE_MEMZONE_16MB) || (flags & RTE_MEMZONE_16GB)))
-			return memzone_reserve_aligned_thread_unsafe(name,
-				len, socket_id, 0, align, bound);
-
+	/* allocate memory on heap */
+	void *mz_addr = malloc_heap_alloc(&mcfg->malloc_heaps[socket_id], NULL,
+			requested_len, flags, align, bound);
+	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
 
-	/* save aligned physical and virtual addresses */
-	memseg_physaddr = free_memseg[memseg_idx].phys_addr + seg_offset;
-	memseg_addr = RTE_PTR_ADD(free_memseg[memseg_idx].addr,
-			(uintptr_t) seg_offset);
-
-	/* if we are looking for a biggest memzone */
-	if (len == 0) {
-		if (bound == 0)
-			requested_len = memseg_len - seg_offset;
-		else
-			requested_len = RTE_ALIGN_CEIL(memseg_physaddr + 1,
-				bound) - memseg_physaddr;
-	}
-
-	/* set length to correct value */
-	len = (size_t)seg_offset + requested_len;
-
-	/* update our internal state */
-	free_memseg[memseg_idx].len -= len;
-	free_memseg[memseg_idx].phys_addr += len;
-	free_memseg[memseg_idx].addr =
-		(char *)free_memseg[memseg_idx].addr + len;
+	const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
 
 	/* fill the zone in config */
 	struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
-	mz->phys_addr = memseg_physaddr;
-	mz->addr = memseg_addr;
-	mz->len = requested_len;
-	mz->hugepage_sz = free_memseg[memseg_idx].hugepage_sz;
-	mz->socket_id = free_memseg[memseg_idx].socket_id;
+	mz->phys_addr = rte_malloc_virt2phy(mz_addr);
+	mz->addr = mz_addr;
+	mz->len = (requested_len == 0 ? elem->size : requested_len);
+	mz->hugepage_sz = elem->ms->hugepage_sz;
+	mz->socket_id = elem->ms->socket_id;
 	mz->flags = 0;
-	mz->memseg_id = memseg_idx;
+	mz->memseg_id = elem->ms - rte_eal_get_configuration()->mem_config->memseg;
 
 	return mz;
 }
@@ -419,45 +340,6 @@ rte_memzone_dump(FILE *f)
 }
 
 /*
- * called by init: modify the free memseg list to have cache-aligned
- * addresses and cache-aligned lengths
- */
-static int
-memseg_sanitize(struct rte_memseg *memseg)
-{
-	unsigned phys_align;
-	unsigned virt_align;
-	unsigned off;
-
-	phys_align = memseg->phys_addr & RTE_CACHE_LINE_MASK;
-	virt_align = (unsigned long)memseg->addr & RTE_CACHE_LINE_MASK;
-
-	/*
-	 * sanity check: phys_addr and addr must have the same
-	 * alignment
-	 */
-	if (phys_align != virt_align)
-		return -1;
-
-	/* memseg is really too small, don't bother with it */
-	if (memseg->len < (2 * RTE_CACHE_LINE_SIZE)) {
-		memseg->len = 0;
-		return 0;
-	}
-
-	/* align start address */
-	off = (RTE_CACHE_LINE_SIZE - phys_align) & RTE_CACHE_LINE_MASK;
-	memseg->phys_addr += off;
-	memseg->addr = (char *)memseg->addr + off;
-	memseg->len -= off;
-
-	/* align end address */
-	memseg->len &= ~((uint64_t)RTE_CACHE_LINE_MASK);
-
-	return 0;
-}
-
-/*
  * Init the memzone subsystem
  */
 int
@@ -465,14 +347,10 @@ rte_eal_memzone_init(void)
 {
 	struct rte_mem_config *mcfg;
 	const struct rte_memseg *memseg;
-	unsigned i = 0;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* mirror the runtime memsegs from config */
-	free_memseg = mcfg->free_memseg;
-
 	/* secondary processes don't need to initialise anything */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return 0;
@@ -485,33 +363,13 @@ rte_eal_memzone_init(void)
 
 	rte_rwlock_write_lock(&mcfg->mlock);
 
-	/* fill in uninitialized free_memsegs */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (memseg[i].addr == NULL)
-			break;
-		if (free_memseg[i].addr != NULL)
-			continue;
-		memcpy(&free_memseg[i], &memseg[i], sizeof(struct rte_memseg));
-	}
-
-	/* make all zones cache-aligned */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (free_memseg[i].addr == NULL)
-			break;
-		if (memseg_sanitize(&free_memseg[i]) < 0) {
-			RTE_LOG(ERR, EAL, "%s(): Sanity check failed\n", __func__);
-			rte_rwlock_write_unlock(&mcfg->mlock);
-			return -1;
-		}
-	}
-
 	/* delete all zones */
 	mcfg->memzone_idx = 0;
 	memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
 
 	rte_rwlock_write_unlock(&mcfg->mlock);
 
-	return 0;
+	return rte_eal_malloc_heap_init();
 }
 
 /* Walk all reserved memory zones */
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 34f5abc..055212a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -73,7 +73,7 @@ struct rte_mem_config {
 	struct rte_memseg memseg[RTE_MAX_MEMSEG];    /**< Physmem descriptors. */
 	struct rte_memzone memzone[RTE_MAX_MEMZONE]; /**< Memzone descriptors. */
 
-	/* Runtime Physmem descriptors. */
+	/* Runtime Physmem descriptors - NOT USED */
 	struct rte_memseg free_memseg[RTE_MAX_MEMSEG];
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index 716216f..b270356 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -40,7 +40,7 @@
 #include <rte_memory.h>
 
 /* Number of free lists per heap, grouped by size. */
-#define RTE_HEAP_NUM_FREELISTS  5
+#define RTE_HEAP_NUM_FREELISTS  13
 
 /**
  * Structure to hold malloc heap
@@ -48,7 +48,6 @@
 struct malloc_heap {
 	rte_spinlock_t lock;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
-	unsigned mz_count;
 	unsigned alloc_count;
 	size_t total_size;
 } __rte_cache_aligned;
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index a5e1248..b54ee33 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -37,7 +37,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
 #include <rte_per_lcore.h>
@@ -56,10 +55,10 @@
  */
 void
 malloc_elem_init(struct malloc_elem *elem,
-		struct malloc_heap *heap, const struct rte_memzone *mz, size_t size)
+		struct malloc_heap *heap, const struct rte_memseg *ms, size_t size)
 {
 	elem->heap = heap;
-	elem->mz = mz;
+	elem->ms = ms;
 	elem->prev = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
@@ -70,12 +69,12 @@ malloc_elem_init(struct malloc_elem *elem,
 }
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
 {
-	malloc_elem_init(elem, prev->heap, prev->mz, 0);
+	malloc_elem_init(elem, prev->heap, prev->ms, 0);
 	elem->prev = prev;
 	elem->state = ELEM_BUSY; /* mark busy so its never merged */
 }
@@ -86,12 +85,24 @@ malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
  * fit, return NULL.
  */
 static void *
-elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
+elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	const uintptr_t end_pt = (uintptr_t)elem +
+	const size_t bmask = ~(bound - 1);
+	uintptr_t end_pt = (uintptr_t)elem +
 			elem->size - MALLOC_ELEM_TRAILER_LEN;
-	const uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
-	const uintptr_t new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
+	uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+	uintptr_t new_elem_start;
+
+	/* check boundary */
+	if ((new_data_start & bmask) != ((end_pt - 1) & bmask)) {
+		end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
+		new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+		if (((end_pt - 1) & bmask) != (new_data_start & bmask))
+			return NULL;
+	}
+
+	new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
 
 	/* if the new start point is before the exist start, it won't fit */
 	return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
@@ -102,9 +113,10 @@ elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
  * alignment request from the current element
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,	unsigned align,
+		size_t bound)
 {
-	return elem_start_pt(elem, size, align) != NULL;
+	return elem_start_pt(elem, size, align, bound) != NULL;
 }
 
 /*
@@ -115,10 +127,10 @@ static void
 split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 {
 	struct malloc_elem *next_elem = RTE_PTR_ADD(elem, elem->size);
-	const unsigned old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
-	const unsigned new_elem_size = elem->size - old_elem_size;
+	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
+	const size_t new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->mz, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->ms, new_elem_size);
 	split_pt->prev = elem;
 	next_elem->prev = split_pt;
 	elem->size = old_elem_size;
@@ -168,8 +180,9 @@ malloc_elem_free_list_index(size_t size)
 void
 malloc_elem_free_list_insert(struct malloc_elem *elem)
 {
-	size_t idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
+	size_t idx;
 
+	idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
 	elem->state = ELEM_FREE;
 	LIST_INSERT_HEAD(&elem->heap->free_head[idx], elem, free_list);
 }
@@ -190,12 +203,26 @@ elem_free_list_remove(struct malloc_elem *elem)
  * is not done here, as it's done there previously.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	struct malloc_elem *new_elem = elem_start_pt(elem, size, align);
-	const unsigned old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
+	const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	const size_t trailer_size = elem->size - old_elem_size - size -
+		MALLOC_ELEM_OVERHEAD;
+
+	elem_free_list_remove(elem);
 
-	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE){
+	if (trailer_size > MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+		/* split it, too much free space after elem */
+		struct malloc_elem *new_free_elem =
+				RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);
+
+		split_elem(elem, new_free_elem);
+		malloc_elem_free_list_insert(new_free_elem);
+	}
+
+	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
 		/* don't split it, pad the element instead */
 		elem->state = ELEM_BUSY;
 		elem->pad = old_elem_size;
@@ -208,8 +235,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 			new_elem->size = elem->size - elem->pad;
 			set_header(new_elem);
 		}
-		/* remove element from free list */
-		elem_free_list_remove(elem);
 
 		return new_elem;
 	}
@@ -219,7 +244,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 	 * Re-insert original element, in case its new size makes it
 	 * belong on a different list.
 	 */
-	elem_free_list_remove(elem);
 	split_elem(elem, new_elem);
 	new_elem->state = ELEM_BUSY;
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index 9790b1a..e05d2ea 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -47,9 +47,9 @@ enum elem_state {
 
 struct malloc_elem {
 	struct malloc_heap *heap;
-	struct malloc_elem *volatile prev;      /* points to prev elem in memzone */
+	struct malloc_elem *volatile prev;      /* points to prev elem in memseg */
 	LIST_ENTRY(malloc_elem) free_list;      /* list of free elements in heap */
-	const struct rte_memzone *mz;
+	const struct rte_memseg *ms;
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
@@ -136,11 +136,11 @@ malloc_elem_from_data(const void *data)
 void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
-		const struct rte_memzone *mz,
+		const struct rte_memseg *ms,
 		size_t size);
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem,
@@ -151,14 +151,16 @@ malloc_elem_mkend(struct malloc_elem *elem,
  * of the requested size and with the requested alignment
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * reserve a block of data in an existing malloc_elem. If the malloc_elem
  * is much larger than the data block requested, we split the element in two.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_alloc(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * free a malloc_elem block by adding it to the free list. If the
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8861d27..f5fff96 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -39,7 +39,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_launch.h>
@@ -54,123 +53,104 @@
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
-/* since the memzone size starts with a digit, it will appear unquoted in
- * rte_config.h, so quote it so it can be passed to rte_str_to_size */
-#define MALLOC_MEMZONE_SIZE RTE_STR(RTE_MALLOC_MEMZONE_SIZE)
-
-/*
- * returns the configuration setting for the memzone size as a size_t value
- */
-static inline size_t
-get_malloc_memzone_size(void)
+static unsigned
+check_hugepage_sz(unsigned flags, size_t hugepage_sz)
 {
-	return rte_str_to_size(MALLOC_MEMZONE_SIZE);
+	unsigned ret = 1;
+
+	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16MB) && hugepage_sz == RTE_PGSIZE_16G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16GB) && hugepage_sz == RTE_PGSIZE_16M)
+		ret = 0;
+
+	return ret;
 }
 
 /*
- * reserve an extra memory zone and make it available for use by a particular
- * heap. This reserves the zone and sets a dummy malloc_elem header at the end
+ * Expand the heap with a memseg.
+ * This reserves the zone and sets a dummy malloc_elem header at the end
  * to prevent overflow. The rest of the zone is added to free list as a single
  * large free block
  */
-static int
-malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
+static void
+malloc_heap_add_memseg(struct malloc_heap *heap, struct rte_memseg *ms)
 {
-	const unsigned mz_flags = 0;
-	const size_t block_size = get_malloc_memzone_size();
-	/* ensure the data we want to allocate will fit in the memzone */
-	const size_t min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
-	const struct rte_memzone *mz = NULL;
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	unsigned numa_socket = heap - mcfg->malloc_heaps;
-
-	size_t mz_size = min_size;
-	if (mz_size < block_size)
-		mz_size = block_size;
-
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	snprintf(mz_name, sizeof(mz_name), "MALLOC_S%u_HEAP_%u",
-		     numa_socket, heap->mz_count++);
-
-	/* try getting a block. if we fail and we don't need as big a block
-	 * as given in the config, we can shrink our request and try again
-	 */
-	do {
-		mz = rte_memzone_reserve(mz_name, mz_size, numa_socket,
-					 mz_flags);
-		if (mz == NULL)
-			mz_size /= 2;
-	} while (mz == NULL && mz_size > min_size);
-	if (mz == NULL)
-		return -1;
-
 	/* allocate the memory block headers, one at end, one at start */
-	struct malloc_elem *start_elem = (struct malloc_elem *)mz->addr;
-	struct malloc_elem *end_elem = RTE_PTR_ADD(mz->addr,
-			mz_size - MALLOC_ELEM_OVERHEAD);
+	struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
+	struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
+			ms->len - MALLOC_ELEM_OVERHEAD);
 	end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
+	const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
 
-	const unsigned elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
-	malloc_elem_init(start_elem, heap, mz, elem_size);
+	malloc_elem_init(start_elem, heap, ms, elem_size);
 	malloc_elem_mkend(end_elem, start_elem);
 	malloc_elem_free_list_insert(start_elem);
 
-	/* increase heap total size by size of new memzone */
-	heap->total_size+=mz_size - MALLOC_ELEM_OVERHEAD;
-	return 0;
+	heap->total_size += elem_size;
 }
 
 /*
  * Iterates through the freelist for a heap to find a free element
  * which can store data of the required size and with the requested alignment.
+ * If size is 0, find the biggest available elem.
  * Returns null on failure, or pointer to element on success.
  */
 static struct malloc_elem *
-find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
+find_suitable_element(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
 	size_t idx;
-	struct malloc_elem *elem;
+	struct malloc_elem *elem, *alt_elem = NULL;
 
 	for (idx = malloc_elem_free_list_index(size);
-		idx < RTE_HEAP_NUM_FREELISTS; idx++)
-	{
+			idx < RTE_HEAP_NUM_FREELISTS; idx++) {
 		for (elem = LIST_FIRST(&heap->free_head[idx]);
-			!!elem; elem = LIST_NEXT(elem, free_list))
-		{
-			if (malloc_elem_can_hold(elem, size, align))
-				return elem;
+				!!elem; elem = LIST_NEXT(elem, free_list)) {
+			if (malloc_elem_can_hold(elem, size, align, bound)) {
+				if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
+					return elem;
+				alt_elem = elem;
+			}
 		}
 	}
+
+	if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		return alt_elem;
+
 	return NULL;
 }
 
 /*
- * Main function called by malloc to allocate a block of memory from the
- * heap. It locks the free list, scans it, and adds a new memzone if the
- * scan fails. Once the new memzone is added, it re-scans and should return
+ * Main function to allocate a block of memory from the heap.
+ * It locks the free list, scans it, and adds a new memseg if the
+ * scan fails. Once the new memseg is added, it re-scans and should return
  * the new element after releasing the lock.
  */
 void *
 malloc_heap_alloc(struct malloc_heap *heap,
-		const char *type __attribute__((unused)), size_t size, unsigned align)
+		const char *type __attribute__((unused)), size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
+	struct malloc_elem *elem;
+
 	size = RTE_CACHE_LINE_ROUNDUP(size);
 	align = RTE_CACHE_LINE_ROUNDUP(align);
+
 	rte_spinlock_lock(&heap->lock);
-	struct malloc_elem *elem = find_suitable_element(heap, size, align);
-	if (elem == NULL){
-		if ((malloc_heap_add_memzone(heap, size, align)) == 0)
-			elem = find_suitable_element(heap, size, align);
-	}
 
-	if (elem != NULL){
-		elem = malloc_elem_alloc(elem, size, align);
+	elem = find_suitable_element(heap, size, flags, align, bound);
+	if (elem != NULL) {
+		elem = malloc_elem_alloc(elem, size, align, bound);
 		/* increase heap's count of allocated elements */
 		heap->alloc_count++;
 	}
 	rte_spinlock_unlock(&heap->lock);
-	return elem == NULL ? NULL : (void *)(&elem[1]);
 
+	return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
 /*
@@ -206,3 +186,21 @@ malloc_heap_get_stats(const struct malloc_heap *heap,
 	socket_stats->alloc_count = heap->alloc_count;
 	return 0;
 }
+
+int
+rte_eal_malloc_heap_init(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned ms_cnt;
+	struct rte_memseg *ms;
+
+	if (mcfg == NULL)
+		return -1;
+
+	for (ms = &mcfg->memseg[0], ms_cnt = 0;
+			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
+			ms_cnt++, ms++)
+		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+
+	return 0;
+}
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a47136d..3ccbef0 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -53,15 +53,15 @@ malloc_get_numa_socket(void)
 }
 
 void *
-malloc_heap_alloc(struct malloc_heap *heap, const char *type,
-		size_t size, unsigned align);
+malloc_heap_alloc(struct malloc_heap *heap,	const char *type, size_t size,
+		unsigned flags, size_t align, size_t bound);
 
 int
 malloc_heap_get_stats(const struct malloc_heap *heap,
 		struct rte_malloc_socket_stats *socket_stats);
 
 int
-rte_eal_heap_memzone_init(void);
+rte_eal_malloc_heap_init(void);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index c313a57..54c2bd8 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -39,7 +39,6 @@
 
 #include <rte_memcpy.h>
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_branch_prediction.h>
@@ -87,7 +86,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 		return NULL;
 
 	ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
-				size, align == 0 ? 1 : align);
+				size, 0, align == 0 ? 1 : align, 0);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
@@ -98,7 +97,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 			continue;
 
 		ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
-					size, align == 0 ? 1 : align);
+					size, 0, align == 0 ? 1 : align, 0);
 		if (ret != NULL)
 			return ret;
 	}
@@ -256,5 +255,5 @@ rte_malloc_virt2phy(const void *addr)
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->mz->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->mz->addr);
+	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
-- 
1.9.3

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v4 1/4] ethdev: rename rte_eth_vmdq_mirror_conf
  @ 2015-06-26  7:03  5%       ` Wu, Jingjing
  0 siblings, 0 replies; 200+ results
From: Wu, Jingjing @ 2015-06-26  7:03 UTC (permalink / raw)
  To: 'nhorman@tuxdriver.com'; +Cc: dev

Hi, Neil

I have an ABI concern about this patch.
This patch just renames the struct rte_eth_vmdq_mirror_conf to rte_eth_mirror_conf; its size and elements do not change. As I understand it, that should not break the ABI, and I have also tested it.
But when I use the script ./scripts/validate-abi.sh to check. A low severity problem is reported in symbol "rte_eth_mirror_rule_set"
 - Change: "Base type of 2nd parameter mirror_conf has been changed from struct rte_eth_vmdq_mirror_conf to struct rte_eth_mirror_conf."
 - Effect: "Replacement of parameter base type may indicate a change in its semantic meaning."

So, I'm not sure whether this patch meets the ABI policy.

Additionally, about validate-abi.sh: does it mean we need to fix all the problems it reports, or can we decide case by case? Is a Low Severity problem acceptable?

Look forward to your reply.

Thanks

Jingjing

> -----Original Message-----
> From: Wu, Jingjing
> Sent: Wednesday, June 10, 2015 2:25 PM
> To: dev@dpdk.org
> Cc: Wu, Jingjing; Liu, Jijiang; Jiajia, SunX; Zhang, Helin
> Subject: [PATCH v4 1/4] ethdev: rename rte_eth_vmdq_mirror_conf
> 
> rename rte_eth_vmdq_mirror_conf to rte_eth_mirror_conf and move the
> maximum rule id check from ethdev level to driver
> 
> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> ---
>  app/test-pmd/cmdline.c           | 22 +++++++++++-----------
>  drivers/net/ixgbe/ixgbe_ethdev.c | 11 +++++++----
> drivers/net/ixgbe/ixgbe_ethdev.h |  4 +++-
>  lib/librte_ether/rte_ethdev.c    | 18 ++----------------
>  lib/librte_ether/rte_ethdev.h    | 19 ++++++++++---------
>  5 files changed, 33 insertions(+), 41 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> f01db2a..d693bde 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -6604,11 +6604,11 @@ cmd_set_mirror_mask_parsed(void
> *parsed_result,  {
>  	int ret,nb_item,i;
>  	struct cmd_set_mirror_mask_result *res = parsed_result;
> -	struct rte_eth_vmdq_mirror_conf mr_conf;
> +	struct rte_eth_mirror_conf mr_conf;
> 
> -	memset(&mr_conf,0,sizeof(struct rte_eth_vmdq_mirror_conf));
> +	memset(&mr_conf, 0, sizeof(struct rte_eth_mirror_conf));
> 
> -	unsigned int vlan_list[ETH_VMDQ_MAX_VLAN_FILTERS];
> +	unsigned int vlan_list[ETH_MIRROR_MAX_VLANS];
> 
>  	mr_conf.dst_pool = res->dstpool_id;
> 
> @@ -6618,11 +6618,11 @@ cmd_set_mirror_mask_parsed(void
> *parsed_result,
>  	} else if(!strcmp(res->what, "vlan-mirror")) {
>  		mr_conf.rule_type_mask = ETH_VMDQ_VLAN_MIRROR;
>  		nb_item = parse_item_list(res->value, "core",
> -
> 	ETH_VMDQ_MAX_VLAN_FILTERS,vlan_list,1);
> +					ETH_MIRROR_MAX_VLANS, vlan_list,
> 1);
>  		if (nb_item <= 0)
>  			return;
> 
> -		for(i=0; i < nb_item; i++) {
> +		for (i = 0; i < nb_item; i++) {
>  			if (vlan_list[i] > ETHER_MAX_VLAN_ID) {
>  				printf("Invalid vlan_id: must be < 4096\n");
>  				return;
> @@ -6634,10 +6634,10 @@ cmd_set_mirror_mask_parsed(void
> *parsed_result,
>  	}
> 
>  	if(!strcmp(res->on, "on"))
> -		ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
> +		ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
>  						res->rule_id, 1);
>  	else
> -		ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
> +		ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
>  						res->rule_id, 0);
>  	if(ret < 0)
>  		printf("mirror rule add error: (%s)\n", strerror(-ret)); @@ -
> 6711,9 +6711,9 @@ cmd_set_mirror_link_parsed(void *parsed_result,  {
>  	int ret;
>  	struct cmd_set_mirror_link_result *res = parsed_result;
> -	struct rte_eth_vmdq_mirror_conf mr_conf;
> +	struct rte_eth_mirror_conf mr_conf;
> 
> -	memset(&mr_conf,0,sizeof(struct rte_eth_vmdq_mirror_conf));
> +	memset(&mr_conf, 0, sizeof(struct rte_eth_mirror_conf));
>  	if(!strcmp(res->what, "uplink-mirror")) {
>  		mr_conf.rule_type_mask = ETH_VMDQ_UPLINK_MIRROR;
>  	}else if(!strcmp(res->what, "downlink-mirror")) @@ -6722,10
> +6722,10 @@ cmd_set_mirror_link_parsed(void *parsed_result,
>  	mr_conf.dst_pool = res->dstpool_id;
> 
>  	if(!strcmp(res->on, "on"))
> -		ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
> +		ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
>  						res->rule_id, 1);
>  	else
> -		ret = rte_eth_mirror_rule_set(res->port_id,&mr_conf,
> +		ret = rte_eth_mirror_rule_set(res->port_id, &mr_conf,
>  						res->rule_id, 0);
> 
>  	/* check the return value and print it if is < 0 */ diff --git
> a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 0d9f9b2..9e767fa 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -209,7 +209,7 @@ static int ixgbe_set_pool_tx(struct rte_eth_dev
> *dev,uint16_t pool,uint8_t on);  static int ixgbe_set_pool_vlan_filter(struct
> rte_eth_dev *dev, uint16_t vlan,
>  		uint64_t pool_mask,uint8_t vlan_on);
>  static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
> -		struct rte_eth_vmdq_mirror_conf *mirror_conf,
> +		struct rte_eth_mirror_conf *mirror_conf,
>  		uint8_t rule_id, uint8_t on);
>  static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
>  		uint8_t	rule_id);
> @@ -3388,7 +3388,7 @@ ixgbe_set_pool_vlan_filter(struct rte_eth_dev
> *dev, uint16_t vlan,
> 
>  static int
>  ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
> -			struct rte_eth_vmdq_mirror_conf *mirror_conf,
> +			struct rte_eth_mirror_conf *mirror_conf,
>  			uint8_t rule_id, uint8_t on)
>  {
>  	uint32_t mr_ctl,vlvf;
> @@ -3412,7 +3412,10 @@ ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
>  		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> 
>  	if (ixgbe_vmdq_mode_check(hw) < 0)
> -		return (-ENOTSUP);
> +		return -ENOTSUP;
> +
> +	if (rule_id >= IXGBE_MAX_MIRROR_RULES)
> +		return -EINVAL;
> 
>  	/* Check if vlan mask is valid */
>  	if ((mirror_conf->rule_type_mask & ETH_VMDQ_VLAN_MIRROR)
> && (on)) { @@ -3526,7 +3529,7 @@ ixgbe_mirror_rule_reset(struct
> rte_eth_dev *dev, uint8_t rule_id)
>  		return (-ENOTSUP);
> 
>  	memset(&mr_info->mr_conf[rule_id], 0,
> -		sizeof(struct rte_eth_vmdq_mirror_conf));
> +		sizeof(struct rte_eth_mirror_conf));
> 
>  	/* clear PFVMCTL register */
>  	IXGBE_WRITE_REG(hw, IXGBE_MRCTL(rule_id), mr_ctl); diff --git
> a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
> index 19237b8..755b674 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> @@ -177,8 +177,10 @@ struct ixgbe_uta_info {
>  	uint32_t uta_shadow[IXGBE_MAX_UTA];
>  };
> 
> +#define IXGBE_MAX_MIRROR_RULES 4  /* Maximum nb. of mirror rules. */
> +
>  struct ixgbe_mirror_info {
> -	struct rte_eth_vmdq_mirror_conf
> mr_conf[ETH_VMDQ_NUM_MIRROR_RULE];
> +	struct rte_eth_mirror_conf mr_conf[IXGBE_MAX_MIRROR_RULES];
>  	/**< store PF mirror rules configuration*/  };
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 024fe8b..43c7295 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -3034,7 +3034,7 @@ int rte_eth_set_vf_rate_limit(uint8_t port_id,
> uint16_t vf, uint16_t tx_rate,
> 
>  int
>  rte_eth_mirror_rule_set(uint8_t port_id,
> -			struct rte_eth_vmdq_mirror_conf *mirror_conf,
> +			struct rte_eth_mirror_conf *mirror_conf,
>  			uint8_t rule_id, uint8_t on)
>  {
>  	struct rte_eth_dev *dev = &rte_eth_devices[port_id]; @@ -3051,7
> +3051,7 @@ rte_eth_mirror_rule_set(uint8_t port_id,
> 
>  	if (mirror_conf->dst_pool >= ETH_64_POOLS) {
>  		PMD_DEBUG_TRACE("Invalid dst pool, pool id must"
> -			"be 0-%d\n",ETH_64_POOLS - 1);
> +			"be 0-%d\n", ETH_64_POOLS - 1);
>  		return -EINVAL;
>  	}
> 
> @@ -3062,13 +3062,6 @@ rte_eth_mirror_rule_set(uint8_t port_id,
>  		return -EINVAL;
>  	}
> 
> -	if(rule_id >= ETH_VMDQ_NUM_MIRROR_RULE)
> -	{
> -		PMD_DEBUG_TRACE("Invalid rule_id, rule_id must be 0-
> %d\n",
> -			ETH_VMDQ_NUM_MIRROR_RULE - 1);
> -		return -EINVAL;
> -	}
> -
>  	dev = &rte_eth_devices[port_id];
>  	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mirror_rule_set, -
> ENOTSUP);
> 
> @@ -3085,13 +3078,6 @@ rte_eth_mirror_rule_reset(uint8_t port_id,
> uint8_t rule_id)
>  		return -ENODEV;
>  	}
> 
> -	if(rule_id >= ETH_VMDQ_NUM_MIRROR_RULE)
> -	{
> -		PMD_DEBUG_TRACE("Invalid rule_id, rule_id must be 0-
> %d\n",
> -			ETH_VMDQ_NUM_MIRROR_RULE-1);
> -		return -EINVAL;
> -	}
> -
>  	dev = &rte_eth_devices[port_id];
>  	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mirror_rule_reset, -
> ENOTSUP);
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16dbe00..ae22fea 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -467,8 +467,8 @@ struct rte_eth_rss_conf {
>  #define ETH_VMDQ_ACCEPT_BROADCAST   0x0008 /**< accept broadcast
> packets. */
>  #define ETH_VMDQ_ACCEPT_MULTICAST   0x0010 /**< multicast
> promiscuous. */
> 
> -/* Definitions used for VMDQ mirror rules setting */
> -#define ETH_VMDQ_NUM_MIRROR_RULE     4 /**< Maximum nb. of mirror
> rules. . */
> +/** Maximum nb. of vlan per mirror rule */
> +#define ETH_MIRROR_MAX_VLANS       64
> 
>  #define ETH_VMDQ_POOL_MIRROR    0x0001 /**< Virtual Pool Mirroring. */
>  #define ETH_VMDQ_UPLINK_MIRROR  0x0002 /**< Uplink Port Mirroring.
> */ @@ -480,18 +480,19 @@ struct rte_eth_rss_conf {
>   */
>  struct rte_eth_vlan_mirror {
>  	uint64_t vlan_mask; /**< mask for valid VLAN ID. */
> -	uint16_t vlan_id[ETH_VMDQ_MAX_VLAN_FILTERS];
> -	/** VLAN ID list for vlan mirror. */
> +	/** VLAN ID list for vlan mirroring. */
> +	uint16_t vlan_id[ETH_MIRROR_MAX_VLANS];
>  };
> 
>  /**
>   * A structure used to configure traffic mirror of an Ethernet port.
>   */
> -struct rte_eth_vmdq_mirror_conf {
> +struct rte_eth_mirror_conf {
>  	uint8_t rule_type_mask; /**< Mirroring rule type mask we want to
> set */
> -	uint8_t dst_pool; /**< Destination pool for this mirror rule. */
> +	uint8_t dst_pool;  /**< Destination pool for this mirror rule. */
>  	uint64_t pool_mask; /**< Bitmap of pool for pool mirroring */
> -	struct rte_eth_vlan_mirror vlan; /**< VLAN ID setting for VLAN
> mirroring */
> +	/** VLAN ID setting for VLAN mirroring. */
> +	struct rte_eth_vlan_mirror vlan;
>  };
> 
>  /**
> @@ -1211,7 +1212,7 @@ typedef int (*eth_set_vf_rate_limit_t)(struct
> rte_eth_dev *dev,  /**< @internal Set VF TX rate */
> 
>  typedef int (*eth_mirror_rule_set_t)(struct rte_eth_dev *dev,
> -				  struct rte_eth_vmdq_mirror_conf
> *mirror_conf,
> +				  struct rte_eth_mirror_conf *mirror_conf,
>  				  uint8_t rule_id,
>  				  uint8_t on);
>  /**< @internal Add a traffic mirroring rule on an Ethernet device */ @@ -
> 3168,7 +3169,7 @@ rte_eth_dev_set_vf_vlan_filter(uint8_t port, uint16_t
> vlan_id,
>   *   - (-EINVAL) if the mr_conf information is not correct.
>   */
>  int rte_eth_mirror_rule_set(uint8_t port_id,
> -			struct rte_eth_vmdq_mirror_conf *mirror_conf,
> +			struct rte_eth_mirror_conf *mirror_conf,
>  			uint8_t rule_id,
>  			uint8_t on);
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type
  2015-06-16  3:43  3% ` [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type Jingjing Wu
  2015-06-26  2:26  0%   ` Xu, HuilongX
@ 2015-06-26  3:14  0%   ` Zhang, Helin
  1 sibling, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-26  3:14 UTC (permalink / raw)
  To: Wu, Jingjing, dev



> -----Original Message-----
> From: Wu, Jingjing
> Sent: Tuesday, June 16, 2015 11:44 AM
> To: dev@dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Xu, HuilongX
> Subject: [PATCH v2 0/4] extend flow director to support L2_paylod type
> 
> This patch set extends flow director to support L2_paylod type in i40e driver.
> 
> v2 change:
>  - remove the flow director VF filtering from this patch to avoid breaking ABI.
> 
> Jingjing Wu (4):
>   ethdev: add struct rte_eth_l2_flow to support l2_payload flow type
>   i40e: extend flow diretcor to support l2_payload flow type
>   testpmd: extend commands
>   doc: extend commands in testpmd
> 
>  app/test-pmd/cmdline.c                      | 48
> +++++++++++++++++++++++++++--
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  5 ++-
>  drivers/net/i40e/i40e_fdir.c                | 24 +++++++++++++--
>  lib/librte_ether/rte_eth_ctrl.h             |  8 +++++
>  4 files changed, 78 insertions(+), 7 deletions(-)
Acked-by: Helin Zhang <helin.zhang@intel.com>


> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type
  2015-06-16  3:43  3% ` [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type Jingjing Wu
@ 2015-06-26  2:26  0%   ` Xu, HuilongX
  2015-06-26  3:14  0%   ` Zhang, Helin
  1 sibling, 0 replies; 200+ results
From: Xu, HuilongX @ 2015-06-26  2:26 UTC (permalink / raw)
  To: Wu, Jingjing, dev

Tested-by: Huilong Xu <huilongx.xu@intel.com>
OS: 3.11.10-301.fc20.x86_64
GCC: gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC)
Package: d2c08067240baf27f2447bf4981b9ab58ce74d35 + l2payload patch
NIC: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1583]
Test case Summary: 1 case passed
Test sets:
1. build DPDK with the x86_64 gcc target
2. set up hugepages and bind the Fortville NIC to igb_uio
3. ./testpmd -c ffff -n 4 -- -I --portmask=0x3 --disable-rss --rxq=2 --txq=2 --nbcores=8 --pkt-filter-mode=perfect
4. exec testpmd cmdline
    a) set verbose 1
    b) set fwd rxonly
    c) flow_director_filter 0 add flow l2_payload ether 0x0806 flexbytes () fwd queue 1 fd 1
    d) start
5. send an ARP packet to port 0
6. testpmd prints that port 0 queue 1 received the packet

  
> -----Original Message-----
> From: Wu, Jingjing
> Sent: Tuesday, June 16, 2015 11:44 AM
> To: dev@dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Xu, HuilongX
> Subject: [PATCH v2 0/4] extend flow director to support L2_paylod type
> 
> This patch set extends flow director to support L2_paylod type in i40e
> driver.
> 
> v2 change:
>  - remove the flow director VF filtering from this patch to avoid breaking
> ABI.
> 
> Jingjing Wu (4):
>   ethdev: add struct rte_eth_l2_flow to support l2_payload flow type
>   i40e: extend flow diretcor to support l2_payload flow type
>   testpmd: extend commands
>   doc: extend commands in testpmd
> 
>  app/test-pmd/cmdline.c                      | 48
> +++++++++++++++++++++++++++--
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  5 ++-
>  drivers/net/i40e/i40e_fdir.c                | 24 +++++++++++++--
>  lib/librte_ether/rte_eth_ctrl.h             |  8 +++++
>  4 files changed, 78 insertions(+), 7 deletions(-)
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 10/11] doc: announce ABI change of librte_hash
  2015-06-25 22:05  4% ` [dpdk-dev] [PATCH v2 00/11] Cuckoo hash Pablo de Lara
@ 2015-06-25 22:05 14%   ` Pablo de Lara
  0 siblings, 0 replies; 200+ results
From: Pablo de Lara @ 2015-06-25 22:05 UTC (permalink / raw)
  To: dev

The rte_hash structure is now private as of version 2.1, and two of the
macros in rte_hash.h are now deprecated, so this patch adds a notice of
these changes.

Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
---
 doc/guides/rel_notes/abi.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..fae09fd 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,5 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The structure rte_hash in the librte_hash library has been changed and made private in release 2.1, as applications should never have accessed its internal data (the library should have been marked as internal).
+* The macros #RTE_HASH_BUCKET_ENTRIES_MAX and #RTE_HASH_KEY_LENGTH_MAX are deprecated and will be removed in version 2.2.
-- 
2.4.2

^ permalink raw reply	[relevance 14%]

* [dpdk-dev] [PATCH v2 00/11] Cuckoo hash
    @ 2015-06-25 22:05  4% ` Pablo de Lara
  2015-06-25 22:05 14%   ` [dpdk-dev] [PATCH v2 10/11] doc: announce ABI change of librte_hash Pablo de Lara
  1 sibling, 1 reply; 200+ results
From: Pablo de Lara @ 2015-06-25 22:05 UTC (permalink / raw)
  To: dev

This patch set replaces the existing hash library with a more efficient
and more functional implementation, using the Cuckoo hash method to deal
with collisions. This method is based on using two different hash
functions, so that each entry has two possible locations in the hash
table. If a bucket is full, a new entry can push one of the items in
that bucket to its alternative location, making space for itself.

Advantages
~~~~~~~~~~
- Offers the option to store more entries when the target bucket is full
  (unlike the previous implementation)
- Memory efficient: storing those entries does not require requesting new
  memory, as they are stored in the same table
- Constant worst-case lookup time: in the worst case, it always takes the
  same time to look up an entry, as there are only two possible locations
  where an entry can be
- Storing data: the user can store data in the hash table, unlike in the
  previous implementation, while still being able to use the old API

This implementation typically offers over 90% utilization.
Note that the API has been extended, but the old API remains. The main
ABI changes are that the rte_hash structure is now private and that two
macros are deprecated.

Changes in v2:

- Fixed issue where table could not store maximum number of entries
- Fixed issue where lookup burst could not be more than 32 (instead of 64)
- Remove unnecessary macros and add other useful ones
- Added missing library dependencies
- Used directly rte_hash_secondary instead of rte_hash_alt
- Renamed rte_hash.c to rte_cuckoo_hash.c to ease the view of the new library
- Renamed test_hash_perf.c temporarily to ease the view of the improved unit test
- Moved rte_hash, rte_bucket and rte_hash_key structures to rte_cuckoo_hash.c to
  make them private
- Corrected copyright dates
- Added an optimized function to compare keys that are multiple of 16 bytes
- Improved the way to use primary/secondary signatures. Now both are stored in
  the bucket, so there is no need to calculate them if required.
  Also, there is no need to use the MSB of a signature to differentiate between
  an empty entry and signature 0, since we are storing both signatures,
  which cannot both be 0.
- Removed rte_hash_rehash, as it was a very expensive operation.
  Therefore, the add function now returns -ENOSPC if a key cannot be added
  because of an eviction loop.
- Prefetched new slot for new key in add function to improve performance.
- Made doxygen comments more clear.
- Removed unnecessary rte_hash_del_key_data and rte_hash_del_key_with_data,
  as we can use the lookup functions if we want to get the data before deleting.
- Removed some unnecessary includes in rte_hash.h
- Removed some unnecessary variables in rte_cuckoo_hash.c
- Removed some unnecessary checks before creating a new hash table 
- Added documentation (in release notes and programmers guide)
- Added new unit tests and replaced the performance one for hash tables

Pablo de Lara (11):
  eal: add const in prefetch functions
  hash: move rte_hash structure to C file and make it internal
  test/hash: enhance hash unit tests
  test/hash: rename new hash perf unit test back to original name
  hash: replace existing hash library with cuckoo hash implementation
  hash: add new lookup_bulk_with_hash function
  hash: add new function rte_hash_reset
  hash: add new functionality to store data in hash table
  MAINTAINERS: claim responsability for hash library
  doc: announce ABI change of librte_hash
  doc: update hash documentation

 MAINTAINERS                                        |    1 +
 app/test/test_hash.c                               |  189 +--
 app/test/test_hash_perf.c                          |  906 ++++++-------
 doc/guides/prog_guide/hash_lib.rst                 |   77 +-
 doc/guides/rel_notes/abi.rst                       |    2 +
 .../common/include/arch/ppc_64/rte_prefetch.h      |    6 +-
 .../common/include/arch/x86/rte_prefetch.h         |   14 +-
 .../common/include/generic/rte_prefetch.h          |    8 +-
 lib/librte_hash/Makefile                           |    8 +-
 lib/librte_hash/rte_cuckoo_hash.c                  | 1394 ++++++++++++++++++++
 lib/librte_hash/rte_hash.c                         |  471 -------
 lib/librte_hash/rte_hash.h                         |  274 +++-
 lib/librte_hash/rte_hash_version.map               |   15 +
 13 files changed, 2191 insertions(+), 1174 deletions(-)
 create mode 100644 lib/librte_hash/rte_cuckoo_hash.c
 delete mode 100644 lib/librte_hash/rte_hash.c

-- 
2.4.2

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCHv3 3/3] ABI: Add some documentation
  @ 2015-06-25 14:35 28%   ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-06-25 14:35 UTC (permalink / raw)
  To: dev

People have been asking for ways to use the ABI macros, so here are some docs
to clarify their use.  Included are:

* An overview of what ABI is
* Details of the ABI deprecation process
* Details of the versioning macros
* Examples of their use
* Details of how to use the ABI validator

Thanks to John Mcnamara, who duplicated much of this effort at Intel while I was
working on it.  Much of the introductory material was gathered and cleaned up by
him.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.mcnamara@intel.com
CC: thomas.monjalon@6wind.com

Change notes:

v2)
     * Fixed RST indentations and spelling errors
     * Rebased to upstream to fix index.rst conflict

v3)
     * Fixed in tact -> intact
     * Added docs to address static linking
     * Removed duplicate documentation from release notes
---
 doc/guides/guidelines/index.rst      |   1 +
 doc/guides/guidelines/versioning.rst | 484 +++++++++++++++++++++++++++++++++++
 doc/guides/rel_notes/abi.rst         |  30 +--
 3 files changed, 487 insertions(+), 28 deletions(-)
 create mode 100644 doc/guides/guidelines/versioning.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index 0ee9ab3..bfb9fa3 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -7,3 +7,4 @@ Guidelines
 
     coding_style
     design
+    versioning
diff --git a/doc/guides/guidelines/versioning.rst b/doc/guides/guidelines/versioning.rst
new file mode 100644
index 0000000..da9eca0
--- /dev/null
+++ b/doc/guides/guidelines/versioning.rst
@@ -0,0 +1,484 @@
+Managing ABI updates
+====================
+
+Description
+-----------
+
+This document details some methods for handling ABI management in the DPDK.
+Note that this document is not exhaustive: C library versioning is flexible,
+allowing multiple methods to achieve various goals, but this document will
+provide the user with some introductory methods.
+
+General Guidelines
+------------------
+
+#. Whenever possible, ABI should be preserved
+#. The addition of symbols is generally not problematic
+#. The modification of symbols can generally be managed with versioning
+#. The removal of symbols generally is an ABI break and requires bumping of the
+   LIBABIVER macro
+
+What is an ABI
+--------------
+
+An ABI (Application Binary Interface) is the set of runtime interfaces exposed
+by a library. It is similar to an API (Application Programming Interface) but
+is the result of compilation.  It is also effectively cloned when applications
+link to dynamic libraries.  That is to say when an application is compiled to
+link against dynamic libraries, it is assumed that the ABI remains constant
+between the time the application is compiled/linked, and the time that it runs.
+Therefore, in the case of dynamic linking, it is critical that an ABI is
+preserved, or (when modified), done in such a way that the application is unable
+to behave improperly or in an unexpected fashion.
+
+The DPDK ABI policy
+-------------------
+
+ABI versions are set at the time of major release labeling, and the ABI may
+change multiple times, without warning, between the last release label and the
+HEAD label of the git tree.
+
+ABI versions, once released, are available until such time as their
+deprecation has been noted in the Release Notes for at least one major release
+cycle. For example consider the case where the ABI for DPDK 2.0 has been
+shipped and then a decision is made to modify it during the development of
+DPDK 2.1. The decision will be recorded in the Release Notes for the DPDK 2.1
+release and the modification will be made available in the DPDK 2.2 release.
+
+ABI versions may be deprecated in whole or in part as needed by a given
+update.
+
+Some ABI changes may be too significant to reasonably maintain multiple
+versions. In those cases ABI's may be updated without backward compatibility
+being provided. The requirements for doing so are:
+
+#. At least 3 acknowledgments of the need to do so must be made on the
+   dpdk.org mailing list.
+
+#. A full deprecation cycle, as explained above, must be made to offer
+   downstream consumers sufficient warning of the change.
+
+#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
+   incorporated must be incremented in parallel with the ABI changes
+   themselves.
+
+Note that the above process for ABI deprecation should not be undertaken
+lightly. ABI stability is extremely important for downstream consumers of the
+DPDK, especially when distributed in shared object form. Every effort should
+be made to preserve the ABI whenever possible. The ABI should only be changed
+for significant reasons, such as performance enhancements. ABI breakage due to
+changes such as reorganizing public structure fields for aesthetic or
+readability purposes should be avoided.
+
+Examples of Deprecation Notices
+-------------------------------
+
+The following are some examples of ABI deprecation notices which would be
+added to the Release Notes:
+
+* The Macro ``#RTE_FOO`` is deprecated and will be removed with version 2.0,
+  to be replaced with the inline function ``rte_foo()``.
+
+* The function ``rte_mbuf_grok()`` has been updated to include a new parameter
+  in version 2.0. Backwards compatibility will be maintained for this function
+  until the release of version 2.1.
+
+* The members of ``struct rte_foo`` have been reorganized in release 2.0 for
+  performance reasons. Existing binary applications will have backwards
+  compatibility in release 2.0, while newly built binaries will need to
+  reference the new structure variant ``struct rte_foo2``. Compatibility will
+  be removed in release 2.2, and all applications will require updating and
+  rebuilding to the new structure at that time, which will be renamed to the
+  original ``struct rte_foo``.
+
+* Significant ABI changes are planned for the ``librte_dostuff`` library. The
+  upcoming release 2.0 will not contain these changes, but release 2.1 will,
+  and no backwards compatibility is planned due to the extensive nature of
+  these changes. Binaries using this library built prior to version 2.1 will
+  require updating and recompilation.
+
+Versioning Macros
+-----------------
+
+When a symbol is exported from a library to provide an API, it also provides a
+calling convention (ABI) that is embodied in its name, return type and
+arguments. Occasionally that function may need to change to accommodate new
+functionality or behavior. When that occurs, it is desirable to allow for
+backward compatibility for a time with older binaries that are dynamically
+linked to the DPDK.
+
+To support backward compatibility the ``lib/librte_compat/rte_compat.h``
+header file provides macros to use when updating exported functions. These
+macros are used in conjunction with the ``rte_<library>_version.map`` file for
+a given library to allow multiple versions of a symbol to exist in a shared
+library so that older binaries need not be immediately recompiled.
+
+The macros exported are:
+
+* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
+  versioned symbol ``b@DPDK_n`` to the internal function ``b_e``.
+
+* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
+  unversioned symbol ``b`` to the internal function ``b_e``.
+
+* ``BIND_DEFAULT_SYMBOL(b, e, n)``: Creates a symbol version entry instructing
+  the linker to bind references to symbol ``b`` to the internal symbol
+  ``b_e``, making version node ``DPDK_n`` the default.
+
+* ``MAP_STATIC_SYMBOL(f, p)``: Declares the prototype ``f``, and maps it to
+  the fully qualified function ``p``, so that if a symbol becomes versioned,
+  it can still be mapped back to the public symbol name.
+
+Examples of ABI Macro use
+-------------------------
+
+Updating a public API
+~~~~~~~~~~~~~~~~~~~~~
+
+Assume we have a function as follows:
+
+.. code-block:: c
+
+ /*
+  * Create an acl context object for apps to 
+  * manipulate
+  */
+ struct rte_acl_ctx *
+ rte_acl_create(const struct rte_acl_param *param)
+ {
+        ...
+ }
+
+
+Assume that ``struct rte_acl_ctx`` is a private structure, and that a developer
+wishes to enhance the ACL API so that a debugging flag can be enabled on a
+per-context basis.  This requires an addition to the structure (which, being
+private, is safe), but it also requires modifying the code as follows:
+
+.. code-block:: c
+
+ /*
+  * Create an acl context object for apps to 
+  * manipulate
+  */
+ struct rte_acl_ctx *
+ rte_acl_create(const struct rte_acl_param *param, int debug)
+ {
+        ...
+ }
+
+
+Note also that, being a public function, the header file prototype must also be
+changed, as must all the call sites, to reflect the new ABI footprint.  We will
+maintain previous ABI versions that are accessible only to previously compiled
+binaries.
+
+The addition of a parameter to the function is ABI breaking as the function is
+public, and existing applications may use it in its current form.  However, the
+compatibility macros in DPDK allow a developer to use symbol versioning so that
+multiple functions can be mapped to the same public symbol based on when an
+application was linked to it.  To see how this is done, we start with the
+requisite library's version map file.  Initially the version map file for the
+acl library looks like this:
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+This file needs to be modified as follows:
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+   DPDK_2.1 {
+        global:
+        rte_acl_create;
+
+   } DPDK_2.0;
+
+The addition of the new block tells the linker that a new version node is
+available (``DPDK_2.1``), which contains the symbol ``rte_acl_create``, and
+inherits the symbols from the ``DPDK_2.0`` node.  This list is directly
+translated into a list of exported symbols when DPDK is compiled as a shared
+library.
+
+Next, we need to specify in the code which function maps to the
+``rte_acl_create`` symbol at which versions.  First, at the site of the initial
+symbol definition, we need to update the function so that it is uniquely named,
+and not in conflict with the public symbol name:
+
+.. code-block:: c
+
+  struct rte_acl_ctx *
+ -rte_acl_create(const struct rte_acl_param *param)
+ +rte_acl_create_v20(const struct rte_acl_param *param)
+ {
+        size_t sz;
+        struct rte_acl_ctx *ctx;
+        ...
+
+Note that the base name of the symbol was kept intact, as this is conducive to
+the macros used for versioning symbols.  That is our next step: mapping this new
+symbol name to the initial symbol name at version node 2.0.  Immediately after
+the function, we add this line of code:
+
+.. code-block:: c
+
+   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
+
+Remember to also include the ``rte_compat.h`` header in the requisite C file
+where these changes are being made.  The above macro instructs the linker to
+create a new symbol ``rte_acl_create@DPDK_2.0``, which matches the symbol
+created in older builds, but now points to the above newly named function.  We
+have now mapped the original ``rte_acl_create`` symbol to the original function
+(but with a new name).
+
+Next, we need to create the 2.1 version of the symbol.  We create a new
+function name, with a different suffix, and implement it appropriately:
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   rte_acl_create_v21(const struct rte_acl_param *param, int debug)
+   {
+        struct rte_acl_ctx *ctx = rte_acl_create_v20(param);
+
+        ctx->debug = debug;
+
+        return ctx;
+   }
+
+This code serves as our new API call.  It is the same as our old call, but adds
+the new parameter in place.  Next we need to map this function to the symbol
+``rte_acl_create@DPDK_2.1``.  To do this, we modify the public prototype of the
+call in the header file, adding the macro there to inform all including
+applications that, on re-link, the default ``rte_acl_create`` symbol should
+point to this function.  Note that we could do this by simply naming the
+function above ``rte_acl_create``, and the linker would choose the most recent
+version tag to apply in the version script, but we can also do this in the
+header file:
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   -rte_acl_create(const struct rte_acl_param *param);
+   +rte_acl_create(const struct rte_acl_param *param, int debug);
+   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
+
+The BIND_DEFAULT_SYMBOL macro explicitly tells applications that include this
+header to link to the ``rte_acl_create_v21`` function and apply the ``DPDK_2.1``
+version node to it.  This method is more explicit and flexible than just
+re-implementing the exact symbol name, and allows for other features (such as
+linking to the old symbol version by default, when the new ABI is to be opt-in
+for a period).
+
+There is one last thing we need to do.  Note that we've taken what was a public
+symbol, and duplicated it into two uniquely and differently named symbols.
+We've then mapped each of those back to the public symbol ``rte_acl_create``
+with different version tags.  This only applies to dynamic linking, as static
+linking has no notion of versioning.  That leaves this code in a position of no
+longer having a symbol simply named ``rte_acl_create``, and a static build will
+fail on that missing symbol.
+
+To correct this, we can simply map a function of our choosing back to the public
+symbol in the static build with the ``MAP_STATIC_SYMBOL`` macro.  Generally the
+assumption is that the most recent version of the symbol is the one you want to
+map.  So, back in the C file where, immediately after ``rte_acl_create_v21`` is
+defined, we add this
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   rte_acl_create_v21(const struct rte_acl_param *param, int debug)
+   {
+        ...
+   }
+   MAP_STATIC_SYMBOL(struct rte_acl_ctx * rte_acl_create(const struct rte_acl_param *param, int debug), rte_acl_create_v21);
+
+That tells the compiler that, when building a static library, any calls to the
+symbol ``rte_acl_create`` should be linked to ``rte_acl_create_v21``.
+
+That's it; on the next shared library rebuild, there will be two versions of
+``rte_acl_create``: an old DPDK_2.0 version, used by previously built
+applications, and a new DPDK_2.1 version, used by newly built applications.
+
+
+Deprecating part of a public API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let's assume that you've done the above update, and after a few releases have
+passed you decide you would like to retire the old version of the function.
+After having gone through the ABI deprecation announcement process, removal is
+easy.  Start by removing the symbol from the requisite version map file:
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+ -      rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+   DPDK_2.1 {
+        global:
+        rte_acl_create;
+   } DPDK_2.0;
+
+
+Next, remove the corresponding versioned export:
+
+.. code-block:: c
+
+ -VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
+
+
+Note that the internal function definition could also be removed, but it is
+used in our example by the newer version _v21, so we leave it in place.  This
+is a coding style choice.
+
+Lastly, we need to bump the LIBABIVER number for this library in the Makefile to
+indicate to applications doing dynamic linking that this is a later, and
+possibly incompatible library version:
+
+.. code-block:: c
+
+   -LIBABIVER := 1
+   +LIBABIVER := 2
+
+Deprecating an entire ABI version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While removing a symbol from an ABI may be useful, it is often more practical
+to remove an entire version node at once.  If a version node completely
+specifies an API, then removing part of it typically makes it incomplete.  In
+those cases it is better to remove the entire node.
+
+To do this, start by modifying the version map file, such that all symbols from
+the node to be removed are merged into the next node in the map.
+
+In the case of our map above, it would transform to look as follows:
+
+.. code-block:: none
+
+   DPDK_2.1 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+Then any uses of BIND_DEFAULT_SYMBOL that pointed to the old node should be
+updated to point to the new version node in any header files for all affected
+symbols:
+
+.. code-block:: c
+
+ -BIND_DEFAULT_SYMBOL(rte_acl_create, _v20, 2.0);
+ +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
+
+Lastly, any VERSION_SYMBOL macros that point to the old version node should be
+removed, taking care to keep, where needed, old code in place to support newer
+versions of the symbol.
+
+Running the ABI Validator
+-------------------------
+
+The ``scripts`` directory in the DPDK source tree contains a utility program,
+``validate-abi.sh``, for validating the DPDK ABI based on the Linux `ABI
+Compliance Checker
+<http://ispras.linuxbase.org/index.php/ABI_compliance_checker>`_.
+
+This has a dependency on the ``abi-compliance-checker`` and ``abi-dumper``
+utilities, which can be installed via a package manager. For example::
+
+   sudo yum install abi-compliance-checker
+   sudo yum install abi-dumper
+
+The syntax of the ``validate-abi.sh`` utility is::
+
+   ./scripts/validate-abi.sh <TAG1> <TAG2> <TARGET>
+
+Where ``TAG1`` and ``TAG2`` are valid git tags on the local repo and ``TARGET``
+is the usual DPDK compilation target.
+
+For example, to test the currently committed HEAD against a previous release
+tag, we could add a temporary tag and run the utility as follows::
+
+   git tag MY_TEMP_TAG
+   ./scripts/validate-abi.sh v2.0.0 MY_TEMP_TAG x86_64-native-linuxapp-gcc
+
+After the validation script completes (it can take a while since it needs to
+compile both tags) it will create compatibility reports in the
+``./compat_report`` directory. Listed incompatibilities can be found as
+follows::
+
+  grep -lr Incompatible compat_reports/
diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..4086198 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -1,33 +1,7 @@
 ABI policy
 ==========
-ABI versions are set at the time of major release labeling, and ABI may change
-multiple times between the last labeling and the HEAD label of the git tree
-without warning.
-
-ABI versions, once released are available until such time as their
-deprecation has been noted here for at least one major release cycle, after it
-has been tagged.  E.g. the ABI for DPDK 2.0 is shipped, and then the decision to
-remove it is made during the development of DPDK 2.1.  The decision will be
-recorded here, shipped with the DPDK 2.1 release, and actually removed when DPDK
-2.2 ships.
-
-ABI versions may be deprecated in whole, or in part as needed by a given update.
-
-Some ABI changes may be too significant to reasonably maintain multiple
-versions of.  In those events ABI's may be updated without backward
-compatibility provided.  The requirements for doing so are:
-
-#. At least 3 acknowledgments of the need on the dpdk.org
-#. A full deprecation cycle must be made to offer downstream consumers sufficient warning of the change.  E.g. if dpdk 2.0 is under development when the change is proposed, a deprecation notice must be added to this file, and released with dpdk 2.0.  Then the change may be incorporated for dpdk 2.1
-#. The LIBABIVER variable in the makefile(s) where the ABI changes are incorporated must be incremented in parallel with the ABI changes themselves
-
-Note that the above process for ABI deprecation should not be undertaken
-lightly.  ABI stability is extremely important for downstream consumers of the
-DPDK, especially when distributed in shared object form.  Every effort should be
-made to preserve ABI whenever possible.  For instance, reorganizing public
-structure field for aesthetic or readability purposes should be avoided as it will
-cause ABI breakage.  Only significant (e.g. performance) reasons should be seen
-as cause to alter ABI.
+See the guidelines document for details of the ABI policy.  ABI deprecation
+notices are to be posted here.
 
 Examples of Deprecation Notices
 -------------------------------
-- 
2.1.0


* [dpdk-dev] [PATCH v4 2/9] eal: memzone allocated by malloc
  2015-06-25 14:05  4%   ` [dpdk-dev] [PATCH v4 0/9] Dynamic memzone Sergio Gonzalez Monroy
@ 2015-06-25 14:05  1%     ` Sergio Gonzalez Monroy
  2015-06-25 14:05 14%     ` [dpdk-dev] [PATCH v4 8/9] doc: announce ABI change of librte_malloc Sergio Gonzalez Monroy
  1 sibling, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-25 14:05 UTC (permalink / raw)
  To: dev

In the current memory hierarchy, memsegs are groups of physically
contiguous hugepages, memzones are slices of memsegs, and malloc further
slices memzones into smaller memory chunks.

This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones now call malloc internally for memory allocation while
maintaining their ABI.

This makes it possible to free memzones, and therefore any other
structure based on memzones, i.e. mempools.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c        | 274 ++++++----------------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   2 +-
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/malloc_elem.c               |  68 ++++--
 lib/librte_eal/common/malloc_elem.h               |  14 +-
 lib/librte_eal/common/malloc_heap.c               | 140 ++++++-----
 lib/librte_eal/common/malloc_heap.h               |   6 +-
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 8 files changed, 197 insertions(+), 317 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index aee184a..943012b 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -50,15 +50,15 @@
 #include <rte_string_fns.h>
 #include <rte_common.h>
 
+#include "malloc_heap.h"
+#include "malloc_elem.h"
 #include "eal_private.h"
 
-/* internal copy of free memory segments */
-static struct rte_memseg *free_memseg = NULL;
-
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
 	const struct rte_mem_config *mcfg;
+	const struct rte_memzone *mz;
 	unsigned i = 0;
 
 	/* get pointer to global configuration */
@@ -68,8 +68,9 @@ memzone_lookup_thread_unsafe(const char *name)
 	 * the algorithm is not optimal (linear), but there are few
 	 * zones and this function should be called at init only
 	 */
-	for (i = 0; i < RTE_MAX_MEMZONE && mcfg->memzone[i].addr != NULL; i++) {
-		if (!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
+	for (i = 0; i < RTE_MAX_MEMZONE; i++) {
+		mz = &mcfg->memzone[i];
+		if (mz->addr != NULL && !strncmp(name, mz->name, RTE_MEMZONE_NAMESIZE))
 			return &mcfg->memzone[i];
 	}
 
@@ -88,39 +89,45 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 			len, socket_id, flags, RTE_CACHE_LINE_SIZE);
 }
 
-/*
- * Helper function for memzone_reserve_aligned_thread_unsafe().
- * Calculate address offset from the start of the segment.
- * Align offset in that way that it satisfy istart alignmnet and
- * buffer of the  requested length would not cross specified boundary.
- */
-static inline phys_addr_t
-align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align,
-	size_t bound)
+/* Find the heap with the greatest free block size */
+static void
+find_heap_max_free_elem(int *s, size_t *len, unsigned align)
 {
-	phys_addr_t addr_offset, bmask, end, start;
-	size_t step;
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-	step = RTE_MAX(align, bound);
-	bmask = ~((phys_addr_t)bound - 1);
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* calculate offset to closest alignment */
-	start = RTE_ALIGN_CEIL(ms->phys_addr, align);
-	addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size > *len) {
+			*len = stats.greatest_free_size;
+			*s = i;
+		}
+	}
+	*len -= (MALLOC_ELEM_OVERHEAD + align);
+}
 
-	while (addr_offset + len < ms->len) {
+/* Find a heap that can allocate the requested size */
+static void
+find_heap_suitable(int *s, size_t len, unsigned align)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-		/* check, do we meet boundary condition */
-		end = start + len - (len != 0);
-		if ((start & bmask) == (end & bmask))
-			break;
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-		/* calculate next offset */
-		start = RTE_ALIGN_CEIL(start + 1, step);
-		addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size >= len + MALLOC_ELEM_OVERHEAD + align) {
+			*s = i;
+			break;
+		}
 	}
-
-	return addr_offset;
 }
 
 static const struct rte_memzone *
@@ -128,13 +135,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
 	struct rte_mem_config *mcfg;
-	unsigned i = 0;
-	int memseg_idx = -1;
-	uint64_t addr_offset, seg_offset = 0;
 	size_t requested_len;
-	size_t memseg_len = 0;
-	phys_addr_t memseg_physaddr;
-	void *memseg_addr;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -166,7 +167,6 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (align < RTE_CACHE_LINE_SIZE)
 		align = RTE_CACHE_LINE_SIZE;
 
-
 	/* align length on cache boundary. Check for overflow before doing so */
 	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
 		rte_errno = EINVAL; /* requested size too big */
@@ -180,129 +180,50 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	requested_len = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE,  len);
 
 	/* check that boundary condition is valid */
-	if (bound != 0 &&
-			(requested_len > bound || !rte_is_power_of_2(bound))) {
+	if (bound != 0 && (requested_len > bound || !rte_is_power_of_2(bound))) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	/* find the smallest segment matching requirements */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		/* last segment */
-		if (free_memseg[i].addr == NULL)
-			break;
+	if (len == 0) {
+		if (bound != 0)
+			requested_len = bound;
+		else
+			requested_len = 0;
+	}
 
-		/* empty segment, skip it */
-		if (free_memseg[i].len == 0)
-			continue;
-
-		/* bad socket ID */
-		if (socket_id != SOCKET_ID_ANY &&
-		    free_memseg[i].socket_id != SOCKET_ID_ANY &&
-		    socket_id != free_memseg[i].socket_id)
-			continue;
-
-		/*
-		 * calculate offset to closest alignment that
-		 * meets boundary conditions.
-		 */
-		addr_offset = align_phys_boundary(free_memseg + i,
-			requested_len, align, bound);
-
-		/* check len */
-		if ((requested_len + addr_offset) > free_memseg[i].len)
-			continue;
-
-		/* check flags for hugepage sizes */
-		if ((flags & RTE_MEMZONE_2MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_1G)
-			continue;
-		if ((flags & RTE_MEMZONE_1GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_2M)
-			continue;
-		if ((flags & RTE_MEMZONE_16MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16G)
-			continue;
-		if ((flags & RTE_MEMZONE_16GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16M)
-			continue;
-
-		/* this segment is the best until now */
-		if (memseg_idx == -1) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-		/* find the biggest contiguous zone */
-		else if (len == 0) {
-			if (free_memseg[i].len > memseg_len) {
-				memseg_idx = i;
-				memseg_len = free_memseg[i].len;
-				seg_offset = addr_offset;
-			}
-		}
-		/*
-		 * find the smallest (we already checked that current
-		 * zone length is > len
-		 */
-		else if (free_memseg[i].len + align < memseg_len ||
-				(free_memseg[i].len <= memseg_len + align &&
-				addr_offset < seg_offset)) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
+	if (socket_id == SOCKET_ID_ANY) {
+		if (requested_len == 0)
+			find_heap_max_free_elem(&socket_id, &requested_len, align);
+		else
+			find_heap_suitable(&socket_id, requested_len, align);
+
+		if (socket_id == SOCKET_ID_ANY) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
 	}
 
-	/* no segment found */
-	if (memseg_idx == -1) {
-		/*
-		 * If RTE_MEMZONE_SIZE_HINT_ONLY flag is specified,
-		 * try allocating again without the size parameter otherwise -fail.
-		 */
-		if ((flags & RTE_MEMZONE_SIZE_HINT_ONLY)  &&
-		    ((flags & RTE_MEMZONE_1GB) || (flags & RTE_MEMZONE_2MB)
-		|| (flags & RTE_MEMZONE_16MB) || (flags & RTE_MEMZONE_16GB)))
-			return memzone_reserve_aligned_thread_unsafe(name,
-				len, socket_id, 0, align, bound);
-
+	/* allocate memory on heap */
+	void *mz_addr = malloc_heap_alloc(&mcfg->malloc_heaps[socket_id], NULL,
+			requested_len, flags, align, bound);
+	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
 
-	/* save aligned physical and virtual addresses */
-	memseg_physaddr = free_memseg[memseg_idx].phys_addr + seg_offset;
-	memseg_addr = RTE_PTR_ADD(free_memseg[memseg_idx].addr,
-			(uintptr_t) seg_offset);
-
-	/* if we are looking for a biggest memzone */
-	if (len == 0) {
-		if (bound == 0)
-			requested_len = memseg_len - seg_offset;
-		else
-			requested_len = RTE_ALIGN_CEIL(memseg_physaddr + 1,
-				bound) - memseg_physaddr;
-	}
-
-	/* set length to correct value */
-	len = (size_t)seg_offset + requested_len;
-
-	/* update our internal state */
-	free_memseg[memseg_idx].len -= len;
-	free_memseg[memseg_idx].phys_addr += len;
-	free_memseg[memseg_idx].addr =
-		(char *)free_memseg[memseg_idx].addr + len;
+	const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
 
 	/* fill the zone in config */
 	struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
-	mz->phys_addr = memseg_physaddr;
-	mz->addr = memseg_addr;
-	mz->len = requested_len;
-	mz->hugepage_sz = free_memseg[memseg_idx].hugepage_sz;
-	mz->socket_id = free_memseg[memseg_idx].socket_id;
+	mz->phys_addr = rte_malloc_virt2phy(mz_addr);
+	mz->addr = mz_addr;
+	mz->len = (requested_len == 0 ? elem->size : requested_len);
+	mz->hugepage_sz = elem->ms->hugepage_sz;
+	mz->socket_id = elem->ms->socket_id;
 	mz->flags = 0;
-	mz->memseg_id = memseg_idx;
+	mz->memseg_id = elem->ms - rte_eal_get_configuration()->mem_config->memseg;
 
 	return mz;
 }
@@ -419,45 +340,6 @@ rte_memzone_dump(FILE *f)
 }
 
 /*
- * called by init: modify the free memseg list to have cache-aligned
- * addresses and cache-aligned lengths
- */
-static int
-memseg_sanitize(struct rte_memseg *memseg)
-{
-	unsigned phys_align;
-	unsigned virt_align;
-	unsigned off;
-
-	phys_align = memseg->phys_addr & RTE_CACHE_LINE_MASK;
-	virt_align = (unsigned long)memseg->addr & RTE_CACHE_LINE_MASK;
-
-	/*
-	 * sanity check: phys_addr and addr must have the same
-	 * alignment
-	 */
-	if (phys_align != virt_align)
-		return -1;
-
-	/* memseg is really too small, don't bother with it */
-	if (memseg->len < (2 * RTE_CACHE_LINE_SIZE)) {
-		memseg->len = 0;
-		return 0;
-	}
-
-	/* align start address */
-	off = (RTE_CACHE_LINE_SIZE - phys_align) & RTE_CACHE_LINE_MASK;
-	memseg->phys_addr += off;
-	memseg->addr = (char *)memseg->addr + off;
-	memseg->len -= off;
-
-	/* align end address */
-	memseg->len &= ~((uint64_t)RTE_CACHE_LINE_MASK);
-
-	return 0;
-}
-
-/*
  * Init the memzone subsystem
  */
 int
@@ -465,14 +347,10 @@ rte_eal_memzone_init(void)
 {
 	struct rte_mem_config *mcfg;
 	const struct rte_memseg *memseg;
-	unsigned i = 0;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* mirror the runtime memsegs from config */
-	free_memseg = mcfg->free_memseg;
-
 	/* secondary processes don't need to initialise anything */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return 0;
@@ -485,33 +363,13 @@ rte_eal_memzone_init(void)
 
 	rte_rwlock_write_lock(&mcfg->mlock);
 
-	/* fill in uninitialized free_memsegs */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (memseg[i].addr == NULL)
-			break;
-		if (free_memseg[i].addr != NULL)
-			continue;
-		memcpy(&free_memseg[i], &memseg[i], sizeof(struct rte_memseg));
-	}
-
-	/* make all zones cache-aligned */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (free_memseg[i].addr == NULL)
-			break;
-		if (memseg_sanitize(&free_memseg[i]) < 0) {
-			RTE_LOG(ERR, EAL, "%s(): Sanity check failed\n", __func__);
-			rte_rwlock_write_unlock(&mcfg->mlock);
-			return -1;
-		}
-	}
-
 	/* delete all zones */
 	mcfg->memzone_idx = 0;
 	memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
 
 	rte_rwlock_write_unlock(&mcfg->mlock);
 
-	return 0;
+	return rte_eal_malloc_heap_init();
 }
 
 /* Walk all reserved memory zones */
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 34f5abc..055212a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -73,7 +73,7 @@ struct rte_mem_config {
 	struct rte_memseg memseg[RTE_MAX_MEMSEG];    /**< Physmem descriptors. */
 	struct rte_memzone memzone[RTE_MAX_MEMZONE]; /**< Memzone descriptors. */
 
-	/* Runtime Physmem descriptors. */
+	/* Runtime Physmem descriptors - NOT USED */
 	struct rte_memseg free_memseg[RTE_MAX_MEMSEG];
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index 716216f..b270356 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -40,7 +40,7 @@
 #include <rte_memory.h>
 
 /* Number of free lists per heap, grouped by size. */
-#define RTE_HEAP_NUM_FREELISTS  5
+#define RTE_HEAP_NUM_FREELISTS  13
 
 /**
  * Structure to hold malloc heap
@@ -48,7 +48,6 @@
 struct malloc_heap {
 	rte_spinlock_t lock;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
-	unsigned mz_count;
 	unsigned alloc_count;
 	size_t total_size;
 } __rte_cache_aligned;
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index a5e1248..b54ee33 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -37,7 +37,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
 #include <rte_per_lcore.h>
@@ -56,10 +55,10 @@
  */
 void
 malloc_elem_init(struct malloc_elem *elem,
-		struct malloc_heap *heap, const struct rte_memzone *mz, size_t size)
+		struct malloc_heap *heap, const struct rte_memseg *ms, size_t size)
 {
 	elem->heap = heap;
-	elem->mz = mz;
+	elem->ms = ms;
 	elem->prev = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
@@ -70,12 +69,12 @@ malloc_elem_init(struct malloc_elem *elem,
 }
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
 {
-	malloc_elem_init(elem, prev->heap, prev->mz, 0);
+	malloc_elem_init(elem, prev->heap, prev->ms, 0);
 	elem->prev = prev;
 	elem->state = ELEM_BUSY; /* mark busy so its never merged */
 }
@@ -86,12 +85,24 @@ malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
  * fit, return NULL.
  */
 static void *
-elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
+elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	const uintptr_t end_pt = (uintptr_t)elem +
+	const size_t bmask = ~(bound - 1);
+	uintptr_t end_pt = (uintptr_t)elem +
 			elem->size - MALLOC_ELEM_TRAILER_LEN;
-	const uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
-	const uintptr_t new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
+	uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+	uintptr_t new_elem_start;
+
+	/* check boundary */
+	if ((new_data_start & bmask) != ((end_pt - 1) & bmask)) {
+		end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
+		new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+		if (((end_pt - 1) & bmask) != (new_data_start & bmask))
+			return NULL;
+	}
+
+	new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
 
 	/* if the new start point is before the exist start, it won't fit */
 	return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
@@ -102,9 +113,10 @@ elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
  * alignment request from the current element
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,	unsigned align,
+		size_t bound)
 {
-	return elem_start_pt(elem, size, align) != NULL;
+	return elem_start_pt(elem, size, align, bound) != NULL;
 }
 
 /*
@@ -115,10 +127,10 @@ static void
 split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 {
 	struct malloc_elem *next_elem = RTE_PTR_ADD(elem, elem->size);
-	const unsigned old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
-	const unsigned new_elem_size = elem->size - old_elem_size;
+	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
+	const size_t new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->mz, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->ms, new_elem_size);
 	split_pt->prev = elem;
 	next_elem->prev = split_pt;
 	elem->size = old_elem_size;
@@ -168,8 +180,9 @@ malloc_elem_free_list_index(size_t size)
 void
 malloc_elem_free_list_insert(struct malloc_elem *elem)
 {
-	size_t idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
+	size_t idx;
 
+	idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
 	elem->state = ELEM_FREE;
 	LIST_INSERT_HEAD(&elem->heap->free_head[idx], elem, free_list);
 }
@@ -190,12 +203,26 @@ elem_free_list_remove(struct malloc_elem *elem)
  * is not done here, as it's done there previously.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	struct malloc_elem *new_elem = elem_start_pt(elem, size, align);
-	const unsigned old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
+	const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	const size_t trailer_size = elem->size - old_elem_size - size -
+		MALLOC_ELEM_OVERHEAD;
+
+	elem_free_list_remove(elem);
 
-	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE){
+	if (trailer_size > MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+		/* split it, too much free space after elem */
+		struct malloc_elem *new_free_elem =
+				RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);
+
+		split_elem(elem, new_free_elem);
+		malloc_elem_free_list_insert(new_free_elem);
+	}
+
+	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
 		/* don't split it, pad the element instead */
 		elem->state = ELEM_BUSY;
 		elem->pad = old_elem_size;
@@ -208,8 +235,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 			new_elem->size = elem->size - elem->pad;
 			set_header(new_elem);
 		}
-		/* remove element from free list */
-		elem_free_list_remove(elem);
 
 		return new_elem;
 	}
@@ -219,7 +244,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 	 * Re-insert original element, in case its new size makes it
 	 * belong on a different list.
 	 */
-	elem_free_list_remove(elem);
 	split_elem(elem, new_elem);
 	new_elem->state = ELEM_BUSY;
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index 9790b1a..e05d2ea 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -47,9 +47,9 @@ enum elem_state {
 
 struct malloc_elem {
 	struct malloc_heap *heap;
-	struct malloc_elem *volatile prev;      /* points to prev elem in memzone */
+	struct malloc_elem *volatile prev;      /* points to prev elem in memseg */
 	LIST_ENTRY(malloc_elem) free_list;      /* list of free elements in heap */
-	const struct rte_memzone *mz;
+	const struct rte_memseg *ms;
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
@@ -136,11 +136,11 @@ malloc_elem_from_data(const void *data)
 void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
-		const struct rte_memzone *mz,
+		const struct rte_memseg *ms,
 		size_t size);
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem,
@@ -151,14 +151,16 @@ malloc_elem_mkend(struct malloc_elem *elem,
  * of the requested size and with the requested alignment
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * reserve a block of data in an existing malloc_elem. If the malloc_elem
  * is much larger than the data block requested, we split the element in two.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_alloc(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * free a malloc_elem block by adding it to the free list. If the
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8861d27..f5fff96 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -39,7 +39,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_launch.h>
@@ -54,123 +53,104 @@
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
-/* since the memzone size starts with a digit, it will appear unquoted in
- * rte_config.h, so quote it so it can be passed to rte_str_to_size */
-#define MALLOC_MEMZONE_SIZE RTE_STR(RTE_MALLOC_MEMZONE_SIZE)
-
-/*
- * returns the configuration setting for the memzone size as a size_t value
- */
-static inline size_t
-get_malloc_memzone_size(void)
+static unsigned
+check_hugepage_sz(unsigned flags, size_t hugepage_sz)
 {
-	return rte_str_to_size(MALLOC_MEMZONE_SIZE);
+	unsigned ret = 1;
+
+	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16MB) && hugepage_sz == RTE_PGSIZE_16G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16GB) && hugepage_sz == RTE_PGSIZE_16M)
+		ret = 0;
+
+	return ret;
 }
 
 /*
- * reserve an extra memory zone and make it available for use by a particular
- * heap. This reserves the zone and sets a dummy malloc_elem header at the end
+ * Expand the heap with a memseg.
+ * This reserves the zone and sets a dummy malloc_elem header at the end
  * to prevent overflow. The rest of the zone is added to free list as a single
  * large free block
  */
-static int
-malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
+static void
+malloc_heap_add_memseg(struct malloc_heap *heap, struct rte_memseg *ms)
 {
-	const unsigned mz_flags = 0;
-	const size_t block_size = get_malloc_memzone_size();
-	/* ensure the data we want to allocate will fit in the memzone */
-	const size_t min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
-	const struct rte_memzone *mz = NULL;
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	unsigned numa_socket = heap - mcfg->malloc_heaps;
-
-	size_t mz_size = min_size;
-	if (mz_size < block_size)
-		mz_size = block_size;
-
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	snprintf(mz_name, sizeof(mz_name), "MALLOC_S%u_HEAP_%u",
-		     numa_socket, heap->mz_count++);
-
-	/* try getting a block. if we fail and we don't need as big a block
-	 * as given in the config, we can shrink our request and try again
-	 */
-	do {
-		mz = rte_memzone_reserve(mz_name, mz_size, numa_socket,
-					 mz_flags);
-		if (mz == NULL)
-			mz_size /= 2;
-	} while (mz == NULL && mz_size > min_size);
-	if (mz == NULL)
-		return -1;
-
 	/* allocate the memory block headers, one at end, one at start */
-	struct malloc_elem *start_elem = (struct malloc_elem *)mz->addr;
-	struct malloc_elem *end_elem = RTE_PTR_ADD(mz->addr,
-			mz_size - MALLOC_ELEM_OVERHEAD);
+	struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
+	struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
+			ms->len - MALLOC_ELEM_OVERHEAD);
 	end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
+	const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
 
-	const unsigned elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
-	malloc_elem_init(start_elem, heap, mz, elem_size);
+	malloc_elem_init(start_elem, heap, ms, elem_size);
 	malloc_elem_mkend(end_elem, start_elem);
 	malloc_elem_free_list_insert(start_elem);
 
-	/* increase heap total size by size of new memzone */
-	heap->total_size+=mz_size - MALLOC_ELEM_OVERHEAD;
-	return 0;
+	heap->total_size += elem_size;
 }
 
 /*
  * Iterates through the freelist for a heap to find a free element
  * which can store data of the required size and with the requested alignment.
+ * If size is 0, find the biggest available elem.
  * Returns null on failure, or pointer to element on success.
  */
 static struct malloc_elem *
-find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
+find_suitable_element(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
 	size_t idx;
-	struct malloc_elem *elem;
+	struct malloc_elem *elem, *alt_elem = NULL;
 
 	for (idx = malloc_elem_free_list_index(size);
-		idx < RTE_HEAP_NUM_FREELISTS; idx++)
-	{
+			idx < RTE_HEAP_NUM_FREELISTS; idx++) {
 		for (elem = LIST_FIRST(&heap->free_head[idx]);
-			!!elem; elem = LIST_NEXT(elem, free_list))
-		{
-			if (malloc_elem_can_hold(elem, size, align))
-				return elem;
+				!!elem; elem = LIST_NEXT(elem, free_list)) {
+			if (malloc_elem_can_hold(elem, size, align, bound)) {
+				if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
+					return elem;
+				alt_elem = elem;
+			}
 		}
 	}
+
+	if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		return alt_elem;
+
 	return NULL;
 }
 
 /*
- * Main function called by malloc to allocate a block of memory from the
- * heap. It locks the free list, scans it, and adds a new memzone if the
- * scan fails. Once the new memzone is added, it re-scans and should return
+ * Main function to allocate a block of memory from the heap.
+ * It locks the free list, scans it, and adds a new memseg if the
+ * scan fails. Once the new memseg is added, it re-scans and should return
  * the new element after releasing the lock.
  */
 void *
 malloc_heap_alloc(struct malloc_heap *heap,
-		const char *type __attribute__((unused)), size_t size, unsigned align)
+		const char *type __attribute__((unused)), size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
+	struct malloc_elem *elem;
+
 	size = RTE_CACHE_LINE_ROUNDUP(size);
 	align = RTE_CACHE_LINE_ROUNDUP(align);
+
 	rte_spinlock_lock(&heap->lock);
-	struct malloc_elem *elem = find_suitable_element(heap, size, align);
-	if (elem == NULL){
-		if ((malloc_heap_add_memzone(heap, size, align)) == 0)
-			elem = find_suitable_element(heap, size, align);
-	}
 
-	if (elem != NULL){
-		elem = malloc_elem_alloc(elem, size, align);
+	elem = find_suitable_element(heap, size, flags, align, bound);
+	if (elem != NULL) {
+		elem = malloc_elem_alloc(elem, size, align, bound);
 		/* increase heap's count of allocated elements */
 		heap->alloc_count++;
 	}
 	rte_spinlock_unlock(&heap->lock);
-	return elem == NULL ? NULL : (void *)(&elem[1]);
 
+	return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
 /*
@@ -206,3 +186,21 @@ malloc_heap_get_stats(const struct malloc_heap *heap,
 	socket_stats->alloc_count = heap->alloc_count;
 	return 0;
 }
+
+int
+rte_eal_malloc_heap_init(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned ms_cnt;
+	struct rte_memseg *ms;
+
+	if (mcfg == NULL)
+		return -1;
+
+	for (ms = &mcfg->memseg[0], ms_cnt = 0;
+			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
+			ms_cnt++, ms++)
+		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+
+	return 0;
+}
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a47136d..3ccbef0 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -53,15 +53,15 @@ malloc_get_numa_socket(void)
 }
 
 void *
-malloc_heap_alloc(struct malloc_heap *heap, const char *type,
-		size_t size, unsigned align);
+malloc_heap_alloc(struct malloc_heap *heap,	const char *type, size_t size,
+		unsigned flags, size_t align, size_t bound);
 
 int
 malloc_heap_get_stats(const struct malloc_heap *heap,
 		struct rte_malloc_socket_stats *socket_stats);
 
 int
-rte_eal_heap_memzone_init(void);
+rte_eal_malloc_heap_init(void);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index c313a57..54c2bd8 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -39,7 +39,6 @@
 
 #include <rte_memcpy.h>
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_branch_prediction.h>
@@ -87,7 +86,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 		return NULL;
 
 	ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
-				size, align == 0 ? 1 : align);
+				size, 0, align == 0 ? 1 : align, 0);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
@@ -98,7 +97,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 			continue;
 
 		ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
-					size, align == 0 ? 1 : align);
+					size, 0, align == 0 ? 1 : align, 0);
 		if (ret != NULL)
 			return ret;
 	}
@@ -256,5 +255,5 @@ rte_malloc_virt2phy(const void *addr)
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->mz->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->mz->addr);
+	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
-- 
1.9.3

^ permalink raw reply	[relevance 1%]
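[Editor's note: the hugepage-size filtering that the patch adds to
find_suitable_element() can be restated as a standalone sketch. The flag and
page-size constants below are assumed for illustration and may not match the
exact DPDK values; check_hugepage_sz() here is a simplified copy, not the
patch code itself.]

```c
#include <stdint.h>

/* A request flagged for a specific hugepage size must not be served
 * from a memseg backed by a different size; an unflagged request
 * accepts any backing page size.  Constants are illustrative. */
#define RTE_PGSIZE_2M   (1ULL << 21)
#define RTE_PGSIZE_1G   (1ULL << 30)
#define RTE_MEMZONE_2MB 0x00000001
#define RTE_MEMZONE_1GB 0x00000002

static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
		return 0;	/* asked for 2M, memseg is backed by 1G */
	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
		return 0;	/* asked for 1G, memseg is backed by 2M */
	return 1;
}
```

When this check fails but RTE_MEMZONE_SIZE_HINT_ONLY is set, the free-list
scan in the patch still falls back to the mismatched element (alt_elem).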

* [dpdk-dev] [PATCH v4 8/9] doc: announce ABI change of librte_malloc
  2015-06-25 14:05  4%   ` [dpdk-dev] [PATCH v4 0/9] Dynamic memzone Sergio Gonzalez Monroy
  2015-06-25 14:05  1%     ` [dpdk-dev] [PATCH v4 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
@ 2015-06-25 14:05 14%     ` Sergio Gonzalez Monroy
  1 sibling, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-25 14:05 UTC (permalink / raw)
  To: dev

Announce the creation of dummy malloc library for 2.1 and removal of
such library, now integrated in librte_eal, for 2.2 release.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..2aaf900 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* librte_malloc library has been integrated into librte_eal. The 2.1 release creates a dummy/empty malloc library to fulfill binaries with dynamic linking dependencies on librte_malloc.so. Such dummy library will not be created from release 2.2 so binaries will need to be rebuilt.
-- 
1.9.3

^ permalink raw reply	[relevance 14%]

* [dpdk-dev] [PATCH v4 0/9] Dynamic memzone
    2015-06-06 10:32  1%   ` [dpdk-dev] [PATCH v2 2/7] eal: memzone allocated by malloc Sergio Gonzalez Monroy
  2015-06-19 17:21  4%   ` [dpdk-dev] [PATCH v3 0/9] Dynamic memzone Sergio Gonzalez Monroy
@ 2015-06-25 14:05  4%   ` Sergio Gonzalez Monroy
  2015-06-25 14:05  1%     ` [dpdk-dev] [PATCH v4 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
  2015-06-25 14:05 14%     ` [dpdk-dev] [PATCH v4 8/9] doc: announce ABI change of librte_malloc Sergio Gonzalez Monroy
  2015-06-26 11:32  4%   ` [dpdk-dev] [PATCH v5 0/9] Dynamic memzones Sergio Gonzalez Monroy
  3 siblings, 2 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-25 14:05 UTC (permalink / raw)
  To: dev

The current implementation allows reserving/creating memzones but not the opposite
(unreserve/free). This affects mempools and other memzone-based objects.

From my point of view, implementing free functionality for memzones would look
like malloc over memsegs.
Thus, this approach moves malloc inside the EAL (which in turn removes a circular
dependency), where malloc heaps are composed of memsegs.
We keep both the malloc and memzone APIs as they are, but memzones allocate their
memory by calling malloc_heap_alloc.
Some extra functionality is required in malloc to allow for boundary-constrained
memory requests.
In summary, malloc is currently based on memzones; with this approach,
memzones are based on malloc.
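[Editor's note: the boundary-constrained requests mentioned above can be
sketched in isolation. This is a simplified illustration of the check added
to elem_start_pt(), not the patch code itself: header/trailer overhead is
omitted, bounded_start() is a made-up name, and align/bound are assumed to
be powers of two (a bound of 0 disables the constraint).]

```c
#include <stdint.h>
#include <stddef.h>

/* Pick the highest 'align'-aligned start for a 'size'-byte block that
 * ends at the top of the free region [elem, elem + elem_size), retrying
 * just below the nearest 'bound' boundary if the block would cross one.
 * Returns 0 when the request cannot be satisfied. */
static uintptr_t
bounded_start(uintptr_t elem, size_t elem_size, size_t size,
		size_t align, size_t bound)
{
	const uintptr_t bmask = ~((uintptr_t)bound - 1);
	const uintptr_t amask = ~((uintptr_t)align - 1);
	uintptr_t end_pt = elem + elem_size;
	uintptr_t start = (end_pt - size) & amask;

	/* does [start, end_pt) cross a 'bound' boundary? */
	if ((start & bmask) != ((end_pt - 1) & bmask)) {
		/* retry with the block ending at the boundary below */
		end_pt &= bmask;
		start = (end_pt - size) & amask;
		if ((start & bmask) != ((end_pt - 1) & bmask))
			return 0;
	}
	return (start < elem) ? 0 : start;
}
```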

v4:
 - Rebase and fix a couple of merge issues

v3:
 - Create dummy librte_malloc
 - Add deprecation notice
 - Rework some of the code
 - Doc update
 - checkpatch

v2:
 - New rte_memzone_free
 - Support memzone len = 0
 - Add all available memsegs to malloc heap at init
 - Update memzone/malloc unit tests

Sergio Gonzalez Monroy (9):
  eal: move librte_malloc to eal/common
  eal: memzone allocated by malloc
  app/test: update malloc/memzone unit tests
  config: remove CONFIG_RTE_MALLOC_MEMZONE_SIZE
  eal: remove free_memseg and references to it
  eal: new rte_memzone_free
  app/test: update unit test with rte_memzone_free
  doc: announce ABI change of librte_malloc
  doc: update malloc documentation

 MAINTAINERS                                       |   9 +-
 app/test/test_malloc.c                            |  86 -----
 app/test/test_memzone.c                           | 441 +++-------------------
 config/common_bsdapp                              |   8 +-
 config/common_linuxapp                            |   8 +-
 doc/guides/prog_guide/env_abstraction_layer.rst   | 220 ++++++++++-
 doc/guides/prog_guide/img/malloc_heap.png         | Bin 81329 -> 80952 bytes
 doc/guides/prog_guide/index.rst                   |   1 -
 doc/guides/prog_guide/malloc_lib.rst              | 233 ------------
 doc/guides/prog_guide/overview.rst                |  11 +-
 doc/guides/rel_notes/abi.rst                      |   1 +
 drivers/net/af_packet/Makefile                    |   1 -
 drivers/net/bonding/Makefile                      |   1 -
 drivers/net/e1000/Makefile                        |   2 +-
 drivers/net/enic/Makefile                         |   2 +-
 drivers/net/fm10k/Makefile                        |   2 +-
 drivers/net/i40e/Makefile                         |   2 +-
 drivers/net/ixgbe/Makefile                        |   2 +-
 drivers/net/mlx4/Makefile                         |   1 -
 drivers/net/null/Makefile                         |   1 -
 drivers/net/pcap/Makefile                         |   1 -
 drivers/net/virtio/Makefile                       |   2 +-
 drivers/net/vmxnet3/Makefile                      |   2 +-
 drivers/net/xenvirt/Makefile                      |   2 +-
 lib/Makefile                                      |   2 +-
 lib/librte_acl/Makefile                           |   2 +-
 lib/librte_eal/bsdapp/eal/Makefile                |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map     |  19 +
 lib/librte_eal/common/Makefile                    |   1 +
 lib/librte_eal/common/eal_common_memzone.c        | 329 ++++++----------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   5 +-
 lib/librte_eal/common/include/rte_malloc.h        | 342 +++++++++++++++++
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/include/rte_memzone.h       |  11 +
 lib/librte_eal/common/malloc_elem.c               | 344 +++++++++++++++++
 lib/librte_eal/common/malloc_elem.h               | 192 ++++++++++
 lib/librte_eal/common/malloc_heap.c               | 206 ++++++++++
 lib/librte_eal/common/malloc_heap.h               |  70 ++++
 lib/librte_eal/common/rte_malloc.c                | 259 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile              |   4 +-
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c         |  17 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map   |  19 +
 lib/librte_hash/Makefile                          |   2 +-
 lib/librte_lpm/Makefile                           |   2 +-
 lib/librte_malloc/Makefile                        |   6 +-
 lib/librte_malloc/malloc_elem.c                   | 320 ----------------
 lib/librte_malloc/malloc_elem.h                   | 190 ----------
 lib/librte_malloc/malloc_heap.c                   | 208 ----------
 lib/librte_malloc/malloc_heap.h                   |  70 ----
 lib/librte_malloc/rte_malloc.c                    | 228 +----------
 lib/librte_malloc/rte_malloc.h                    | 342 -----------------
 lib/librte_malloc/rte_malloc_version.map          |  16 -
 lib/librte_mempool/Makefile                       |   2 -
 lib/librte_port/Makefile                          |   1 -
 lib/librte_ring/Makefile                          |   3 +-
 lib/librte_table/Makefile                         |   1 -
 56 files changed, 1897 insertions(+), 2362 deletions(-)
 delete mode 100644 doc/guides/prog_guide/malloc_lib.rst
 create mode 100644 lib/librte_eal/common/include/rte_malloc.h
 create mode 100644 lib/librte_eal/common/malloc_elem.c
 create mode 100644 lib/librte_eal/common/malloc_elem.h
 create mode 100644 lib/librte_eal/common/malloc_heap.c
 create mode 100644 lib/librte_eal/common/malloc_heap.h
 create mode 100644 lib/librte_eal/common/rte_malloc.c
 delete mode 100644 lib/librte_malloc/malloc_elem.c
 delete mode 100644 lib/librte_malloc/malloc_elem.h
 delete mode 100644 lib/librte_malloc/malloc_heap.c
 delete mode 100644 lib/librte_malloc/malloc_heap.h
 delete mode 100644 lib/librte_malloc/rte_malloc.h

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-25 11:35  9%       ` Neil Horman
@ 2015-06-25 13:22  7%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-25 13:22 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

2015-06-25 07:35, Neil Horman:
> On Wed, Jun 24, 2015 at 11:09:29PM +0200, Thomas Monjalon wrote:
> > 2015-06-24 14:34, Neil Horman:
> > > +Some ABI changes may be too significant to reasonably maintain multiple
> > > +versions. In those cases ABI's may be updated without backward compatibility
> > > +being provided. The requirements for doing so are:
> > > +
> > > +#. At least 3 acknowledgments of the need to do so must be made on the
> > > +   dpdk.org mailing list.
> > > +
> > > +#. A full deprecation cycle, as explained above, must be made to offer
> > > +   downstream consumers sufficient warning of the change.
> > > +
> > > +#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
> > > +   incorporated must be incremented in parallel with the ABI changes
> > > +   themselves.
> > 
> > The proposal was to provide the old and the new ABI in the same source code
> > during the deprecation cycle. The old ABI would be the default and people
> > can build the new one by enabling the NEXT_ABI config option.
> > So the migration to the new ABI is smoother.
> 
> Yes....I'm not sure what you're saying here.  The ABI doesn't 'Change' until the
> old ABI is removed (i.e. old applications are forced to adopt a new ABI), and so
> LIBABIVER has to be updated in parallel with that removal

I'm referring to previous threads suggesting a NEXT_ABI build option to be able
to build the old (default) ABI or the next one.
So the LIBABIVER and .map file would depend of enabling NEXT_ABI or not:
	http://dpdk.org/ml/archives/dev/2015-June/019147.html
	http://dpdk.org/ml/archives/dev/2015-June/019784.html
	http://dpdk.org/ml/archives/dev/2015-June/019810.html

> > [...]
> > > +The macros exported are:
> > > +
> > > +* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
> > > +  unversioned symbol ``b`` to the internal function ``b_e``.
> > 
> > The definition is the same as BASE_SYMBOL.
> > 
> No, they're different.  VERSION_SYMBOL is defined as:
> VERSION_SYMBOL(b, e, n) __asm__(".symver " RTE_STR(b) RTE_STR(e) ", " RTE_STR(b) "@DPDK_" RTE_STR(n))
> 
> while BASE_SYMBOL is
> #define BASE_SYMBOL(b, e) __asm__(".symver " RTE_STR(b) RTE_STR(e) ", " RTE_STR(b)"@")

Yes. I mean the comments are the same, so don't reflect the difference.

> > [...]
> > > +   DPDK_2.0 {
> > > +        global:
> > > +
> > > +        rte_acl_add_rules;
> > > +        rte_acl_build;
> > > +        rte_acl_classify;
> > > +        rte_acl_classify_alg;
> > > +        rte_acl_classify_scalar;
> > > +        rte_acl_create;
> > 
> > So it's declared twice, right?
> > I think it should be explicit.
> > 
> Yes, it's listed once for each version node, so 2 declarations.  I thought that
> was made explicit by the use of the code block.  What else would you like to
> see?

I think you should say it explicitly in the comment below the block.

> > > +        rte_acl_dump;
> > > +        rte_acl_find_existing;
> > > +        rte_acl_free;
> > > +        rte_acl_ipv4vlan_add_rules;
> > > +        rte_acl_ipv4vlan_build;
> > > +        rte_acl_list_dump;
> > > +        rte_acl_reset;
> > > +        rte_acl_reset_rules;
> > > +        rte_acl_set_ctx_classify;
> > > +
> > > +        local: *;
> > > +   };
> > > +
> > > +   DPDK_2.1 {
> > > +        global:
> > > +        rte_acl_create;
> > > +
> > > +   } DPDK_2.0;

> > [...]
> > > +the macros used for versioning symbols.  That is our next step, mapping this new
> > > +symbol name to the initial symbol name at version node 2.0.  Immediately after
> > > +the function, we add this line of code
> > > +
> > > +.. code-block:: c
> > > +
> > > +   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
> > 
> > Can it be declared before the function?
> > 
> Strictly speaking yes, though it's a bit odd from a stylistic point to declare
> versioned aliases for a symbol prior to defining the symbol itself (its like a
> forward declaration)

It allows to declare it near the function header.

> > When do we need to use BASE_SYMBOL?
> > 
> For our purposes you currently don't, because there are no unversioned symbols
> in DPDK (since we use a map file).  I've just included it here for completeness
> in the header file should it ever be needed in the future.

If it can be useful, please integrate a note to explain when it should be used.

> > [...]
> > > +This code serves as our new API call.  Its the same as our old call, but adds
> > > +the new parameter in place.  Next we need to map this function to the symbol
> > > +``rte_acl_create@DPDK_2.1``.  To do this, we modify the public prototype of the call
> > > +in the header file, adding the macro there to inform all including applications,
> > > +that on re-link, the default rte_acl_create symbol should point to this
> > > +function.  Note that we could do this by simply naming the function above
> > > +rte_acl_create, and the linker would chose the most recent version tag to apply
> > > +in the version script, but we can also do this in the header file
> > > +
> > > +.. code-block:: c
> > > +
> > > +   struct rte_acl_ctx *
> > > +   -rte_acl_create(const struct rte_acl_param *param);
> > > +   +rte_acl_create(const struct rte_acl_param *param, int debug);
> > > +   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
> > 
> > Will it work with static library?
> > 
> hmm, this example in particular?  No, I didn't think of that.  To work with a
> static build, you still need to define the unversioned symbol.  Thats easy
> enough to do though, by either defining rte_acl_create as a public api and
> calling the appropriate versioned function, or by creating a macro to point to
> the right version via an alias.  I can fix that easily enough.

Yes please, static libraries are really important in DPDK.
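[Editor's note: for static builds, one way to realise the alias idea
discussed here is a plain GCC/Clang alias attribute binding the unversioned
name to the newest implementation, since no linker version script is
involved. The function names below are illustrative only, not the real
DPDK API.]

```c
/* Hypothetical sketch: the newest implementation of the call ... */
int acl_create_v21(int param, int debug)
{
	return param + debug;
}

/* ... and the unversioned name a statically linked application
 * resolves, bound to it via an alias (GCC/Clang, ELF targets). */
int acl_create(int param, int debug)
	__attribute__((alias("acl_create_v21")));
```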

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-25  7:19  4%     ` Zhang, Helin
  2015-06-25  7:42  4%       ` Gonzalez Monroy, Sergio
@ 2015-06-25 12:25  4%       ` Neil Horman
  1 sibling, 0 replies; 200+ results
From: Neil Horman @ 2015-06-25 12:25 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Thu, Jun 25, 2015 at 07:19:49AM +0000, Zhang, Helin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> > Sent: Thursday, June 25, 2015 2:35 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
> > 
> > People have been asking for ways to use the ABI macros, here's some docs to
> > clarify their use.  Included is:
> > 
> > * An overview of what ABI is
> > * Details of the ABI deprecation process
> > * Details of the versioning macros
> > * Examples of their use
> > * Details of how to use the ABI validator
> > 
> > Thanks to John Mcnamara, who duplicated much of this effort at Intel while I was
> > working on it.  Much of the introductory material was gathered and cleaned up
> > by him
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: john.mcnamara@intel.com
> > CC: thomas.monjalon@6wind.com
> > 
> > Change notes:
> > 
> > v2)
> >      * Fixed RST indentations and spelling errors
> >      * Rebased to upstream to fix index.rst conflict
> > ---
> >  doc/guides/guidelines/index.rst      |   1 +
> >  doc/guides/guidelines/versioning.rst | 456
> > +++++++++++++++++++++++++++++++++++
> >  2 files changed, 457 insertions(+)
> >  create mode 100644 doc/guides/guidelines/versioning.rst
> > 
> > diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
> > index 0ee9ab3..bfb9fa3 100644
> > --- a/doc/guides/guidelines/index.rst
> > +++ b/doc/guides/guidelines/index.rst
> > @@ -7,3 +7,4 @@ Guidelines
> > 
> >      coding_style
> >      design
> > +    versioning
> > diff --git a/doc/guides/guidelines/versioning.rst
> > b/doc/guides/guidelines/versioning.rst
> > new file mode 100644
> > index 0000000..2aef526
> > --- /dev/null
> > +++ b/doc/guides/guidelines/versioning.rst
> > @@ -0,0 +1,456 @@
> > +Managing ABI updates
> > +====================
> > +
> > +Description
> > +-----------
> > +
> > +This document details some methods for handling ABI management in the
> > DPDK.
> > +Note this document is not exhaustive, in that C library versioning is
> > +flexible allowing multiple methods to achieve various goals, but it
> > +will provide the user with some introductory methods
> > +
> > +General Guidelines
> > +------------------
> > +
> > +#. Whenever possible, ABI should be preserved #. The addition of
> > +symbols is generally not problematic #. The modification of symbols can
> > +generally be managed with versioning #. The removal of symbols
> > +generally is an ABI break and requires bumping of the
> > +   LIBABIVER macro
> > +
> > +What is an ABI
> > +--------------
> > +
> > +An ABI (Application Binary Interface) is the set of runtime interfaces
> > +exposed by a library. It is similar to an API (Application Programming
> > +Interface) but is the result of compilation.  It is also effectively
> > +cloned when applications link to dynamic libraries.  That is to say
> > +when an application is compiled to link against dynamic libraries, it
> > +is assumed that the ABI remains constant between the time the application is
> > +compiled/linked, and the time that it runs.
> > +Therefore, in the case of dynamic linking, it is critical that an ABI
> > +is preserved, or (when modified), done in such a way that the
> > +application is unable to behave improperly or in an unexpected fashion.
> > +
> > +The DPDK ABI policy
> > +-------------------
> > +
> > +ABI versions are set at the time of major release labeling, and the ABI
> > +may change multiple times, without warning, between the last release
> > +label and the HEAD label of the git tree.
> > +
> > +ABI versions, once released, are available until such time as their
> > +deprecation has been noted in the Release Notes for at least one major
> > +release cycle. For example consider the case where the ABI for DPDK 2.0
> > +has been shipped and then a decision is made to modify it during the
> > +development of DPDK 2.1. The decision will be recorded in the Release
> > +Notes for the DPDK 2.1 release and the modification will be made available in
> > +the DPDK 2.2 release.
> > +
> > +ABI versions may be deprecated in whole or in part as needed by a given
> > +update.
> > +
> > +Some ABI changes may be too significant to reasonably maintain multiple
> > +versions. In those cases ABI's may be updated without backward
> > +compatibility being provided. The requirements for doing so are:
> > +
> > +#. At least 3 acknowledgments of the need to do so must be made on the
> > +   dpdk.org mailing list.
> > +
> > +#. A full deprecation cycle, as explained above, must be made to offer
> > +   downstream consumers sufficient warning of the change.
> > +
> > +#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
> > +   incorporated must be incremented in parallel with the ABI changes
> > +   themselves.
> > +
> > +Note that the above process for ABI deprecation should not be
> > +undertaken lightly. ABI stability is extremely important for downstream
> > +consumers of the DPDK, especially when distributed in shared object
> > +form. Every effort should be made to preserve the ABI whenever
> > +possible. The ABI should only be changed for significant reasons, such
> > +as performance enhancements. ABI breakage due to changes such as
> > +reorganizing public structure fields for aesthetic or readability purposes should
> > +be avoided.
> > +
> > +Examples of Deprecation Notices
> > +-------------------------------
> > +
> > +The following are some examples of ABI deprecation notices which would
> > +be added to the Release Notes:
> > +
> > +* The Macro ``#RTE_FOO`` is deprecated and will be removed with version 2.0,
> > +  to be replaced with the inline function ``rte_foo()``.
> > +
> > +* The function ``rte_mbuf_grok()`` has been updated to include a new parameter
> > +  in version 2.0. Backwards compatibility will be maintained for this function
> > +  until the release of version 2.1
> > +
> > +* The members of ``struct rte_foo`` have been reorganized in release 2.0 for
> > +  performance reasons. Existing binary applications will have backwards
> > +  compatibility in release 2.0, while newly built binaries will need to
> > +  reference the new structure variant ``struct rte_foo2``. Compatibility will
> > +  be removed in release 2.2, and all applications will require updating and
> > +  rebuilding to the new structure at that time, which will be renamed to the
> > +  original ``struct rte_foo``.
> > +
> > +* Significant ABI changes are planned for the ``librte_dostuff`` library. The
> > +  upcoming release 2.0 will not contain these changes, but release 2.1 will,
> > +  and no backwards compatibility is planned due to the extensive nature of
> > +  these changes. Binaries using this library built prior to version 2.1 will
> > +  require updating and recompilation.
> > +
> > +Versioning Macros
> > +-----------------
> > +
> > +When a symbol is exported from a library to provide an API, it also
> > +provides a calling convention (ABI) that is embodied in its name,
> > +return type and arguments. Occasionally that function may need to
> > +change to accommodate new functionality or behavior. When that occurs,
> > +it is desirable to allow for backward compatibility for a time with
> > +older binaries that are dynamically linked to the DPDK.
> > +
> > +To support backward compatibility the
> > +``lib/librte_compat/rte_compat.h``
> > +header file provides macros to use when updating exported functions.
> > +These macros are used in conjunction with the
> > +``rte_<library>_version.map`` file for a given library to allow
> > +multiple versions of a symbol to exist in a shared library so that older binaries
> > +need not be immediately recompiled.
> > +
> > +The macros exported are:
> > +
> > +* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
> > +  unversioned symbol ``b`` to the internal function ``b_e``.
> > +
> > +
> > +* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
> > +  unversioned symbol ``b`` to the internal function ``b_e``.
> > +
> > +* ``BIND_DEFAULT_SYMBOL(b, e, n)``: Creates a symbol version entry instructing
> > +  the linker to bind references to symbol ``b`` to the internal symbol
> > +  ``b_e``.
> > +
> > +
> > +Examples of ABI Macro use
> > +-------------------------
> > +
> > +Updating a public API
> > +~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Assume we have a function as follows
> > +
> > +.. code-block:: c
> > +
> > + /*
> > +  * Create an acl context object for apps to
> > +  * manipulate
> > +  */
> > + struct rte_acl_ctx *
> > + rte_acl_create(const struct rte_acl_param *param) {
> > +        ...
> > + }
> > +
> > +
> > +Assume that struct rte_acl_ctx is a private structure, and that a
> > +developer wishes to enhance the acl api so that a debugging flag can be
> > +enabled on a per-context basis.  This requires an addition to the
> > +structure (which, being private, is safe), but it also requires
> > +modifying the code as follows
> > +
> > +.. code-block:: c
> > +
> > + /*
> > +  * Create an acl context object for apps to
> > +  * manipulate
> > +  */
> > + struct rte_acl_ctx *
> > + rte_acl_create(const struct rte_acl_param *param, int debug) {
> > +        ...
> > + }
> > +
> > +
> > +Note also that, being a public function, the header file prototype must
> > +also be changed, as must all the call sites, to reflect the new ABI
> > +footprint.  We will maintain previous ABI versions that are accessible
> > +only to previously compiled binaries
> > +
> > +The addition of a parameter to the function is ABI breaking as the
> > +function is public, and existing application may use it in its current
> > +form.  However, the compatibility macros in DPDK allow a developer to
> > +use symbol versioning so that multiple functions can be mapped to the
> > +same public symbol based on when an application was linked to it.  To
> > +see how this is done, we start with the requisite libraries version map
> > +file.  Initially the version map file for the acl library looks like
> > +this
> > +
> > +.. code-block:: none
> > +
> > +   DPDK_2.0 {
> > +        global:
> > +
> > +        rte_acl_add_rules;
> > +        rte_acl_build;
> > +        rte_acl_classify;
> > +        rte_acl_classify_alg;
> > +        rte_acl_classify_scalar;
> > +        rte_acl_create;
> > +        rte_acl_dump;
> > +        rte_acl_find_existing;
> > +        rte_acl_free;
> > +        rte_acl_ipv4vlan_add_rules;
> > +        rte_acl_ipv4vlan_build;
> > +        rte_acl_list_dump;
> > +        rte_acl_reset;
> > +        rte_acl_reset_rules;
> > +        rte_acl_set_ctx_classify;
> > +
> > +        local: *;
> > +   };
> > +
> > +This file needs to be modified as follows
> > +
> > +.. code-block:: none
> > +
> > +   DPDK_2.0 {
> > +        global:
> > +
> > +        rte_acl_add_rules;
> > +        rte_acl_build;
> > +        rte_acl_classify;
> > +        rte_acl_classify_alg;
> > +        rte_acl_classify_scalar;
> > +        rte_acl_create;
> > +        rte_acl_dump;
> > +        rte_acl_find_existing;
> > +        rte_acl_free;
> > +        rte_acl_ipv4vlan_add_rules;
> > +        rte_acl_ipv4vlan_build;
> > +        rte_acl_list_dump;
> > +        rte_acl_reset;
> > +        rte_acl_reset_rules;
> > +        rte_acl_set_ctx_classify;
> > +
> > +        local: *;
> > +   };
> > +
> > +   DPDK_2.1 {
> > +        global:
> > +        rte_acl_create;
> One question, does it need a line of "local: *;", like it did in
> librte_ether/rte_ether_version.map?
> 
It shouldn't.  local just specifies that any symbol not already declared global
be unpublished in the ELF file (i.e. not global).  You can declare it again in
the next version node, but since the 2.1 node inherits from the 2.0 node, it's
implied.
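To illustrate, here is a minimal version-map sketch for a hypothetical library
(names are illustrative, not real DPDK symbols) showing the catch-all living
only in the base node:

```
DPDK_2.0 {
        global:
        rte_foo_create;

        local: *;        /* catch-all only in the base node */
};

DPDK_2.1 {
        global:
        rte_foo_create;  /* re-exported at the new version */
} DPDK_2.0;              /* inherits from 2.0; no local: needed */
```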

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-24 21:09  9%     ` Thomas Monjalon
@ 2015-06-25 11:35  9%       ` Neil Horman
  2015-06-25 13:22  7%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-06-25 11:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Wed, Jun 24, 2015 at 11:09:29PM +0200, Thomas Monjalon wrote:
> 2015-06-24 14:34, Neil Horman:
> > +Some ABI changes may be too significant to reasonably maintain multiple
> > +versions. In those cases ABI's may be updated without backward compatibility
> > +being provided. The requirements for doing so are:
> > +
> > +#. At least 3 acknowledgments of the need to do so must be made on the
> > +   dpdk.org mailing list.
> > +
> > +#. A full deprecation cycle, as explained above, must be made to offer
> > +   downstream consumers sufficient warning of the change.
> > +
> > +#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
> > +   incorporated must be incremented in parallel with the ABI changes
> > +   themselves.
> 
> The proposal was to provide the old and the new ABI in the same source code
> during the deprecation cycle. The old ABI would be the default and people
> can build the new one by enabling the NEXT_ABI config option.
> So the migration to the new ABI is smoother.
> 
Yes... I'm not sure what you're saying here.  The ABI doesn't 'change' until the
old ABI is removed (i.e. old applications are forced to adopt a new ABI), and so
LIBABIVER has to be updated in parallel with that removal.

> 
> [...]
> > +The macros exported are:
> > +
> > +* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
> > +  unversioned symbol ``b`` to the internal function ``b_e``.
> 
> The definition is the same as BASE_SYMBOL.
> 
No, they're different.  VERSION_SYMBOL is defined as:
VERSION_SYMBOL(b, e, n) __asm__(".symver " RTE_STR(b) RTE_STR(e) ", " RTE_STR(b) "@DPDK_" RTE_STR(n))

while BASE_SYMBOL is
#define BASE_SYMBOL(b, e) __asm__(".symver " RTE_STR(b) RTE_STR(e) ", " RTE_STR(b)"@")

> > +* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
> > +  unversioned symbol ``b`` to the internal function ``b_e``.
> 
> 
> [...]
> > +   DPDK_2.0 {
> > +        global:
> > +
> > +        rte_acl_add_rules;
> > +        rte_acl_build;
> > +        rte_acl_classify;
> > +        rte_acl_classify_alg;
> > +        rte_acl_classify_scalar;
> > +        rte_acl_create;
> 
> So it's declared twice, right?
> I think it should be explicit.
> 
Yes, it's listed once for each version node, so 2 declarations.  I thought that
was made explicit by the use of the code block.  What else would you like to
see?

> > +        rte_acl_dump;
> > +        rte_acl_find_existing;
> > +        rte_acl_free;
> > +        rte_acl_ipv4vlan_add_rules;
> > +        rte_acl_ipv4vlan_build;
> > +        rte_acl_list_dump;
> > +        rte_acl_reset;
> > +        rte_acl_reset_rules;
> > +        rte_acl_set_ctx_classify;
> > +
> > +        local: *;
> > +   };
> > +
> > +   DPDK_2.1 {
> > +        global:
> > +        rte_acl_create;
> > +
> > +   } DPDK_2.0;
> 
> [...]
> > +Note that the base name of the symbol was kept in tact, as this is condusive to
> 
> s/in tact/intact/?
> 
Hmm, that's odd, aspell explicitly changed that.  Though you're right, it should be
intact.  I'll fix it.

> [...]
> > +the macros used for versioning symbols.  That is our next step, mapping this new
> > +symbol name to the initial symbol name at version node 2.0.  Immediately after
> > +the function, we add this line of code
> > +
> > +.. code-block:: c
> > +
> > +   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
> 
> Can it be declared before the function?
> 
Strictly speaking yes, though it's a bit odd from a stylistic point to declare
versioned aliases for a symbol prior to defining the symbol itself (it's like a
forward declaration).

> [...]
> > +Remembering to also add the rte_compat.h header to the requisite c file where
> > +these changes are being made.  The above macro instructs the linker to create a
> > +new symbol ``rte_acl_create@DPDK_2.0``, which matches the symbol created in older
> > +builds, but now points to the above newly named function.  We have now mapped
> > +the original rte_acl_create symbol to the original function (but with a new
> > +name)
> 
> Could we use VERSION_SYMBOL(rte_acl_create, , 2.0);
> when introducing the function in DPDK 2.0 (before any ABI breakage)?
> It could help to generate the .map file.
> 
I've honestly not tried.  I think it's possible, but the example you give above I
don't think will work, because it will result in an error indicating
rte_acl_create is declared twice.  You would have to rename rte_acl_create to
something unique prior to versioning it.

> When do we need to use BASE_SYMBOL?
> 
For our purposes you currently don't, because there are no unversioned symbols
in DPDK (since we use a map file).  I've just included it here for completeness
in the header file should it ever be needed in the future.

> [...]
> > +This code serves as our new API call.  Its the same as our old call, but adds
> > +the new parameter in place.  Next we need to map this function to the symbol
> > +``rte_acl_create@DPDK_2.1``.  To do this, we modify the public prototype of the call
> > +in the header file, adding the macro there to inform all including applications,
> > +that on re-link, the default rte_acl_create symbol should point to this
> > +function.  Note that we could do this by simply naming the function above
> > +rte_acl_create, and the linker would chose the most recent version tag to apply
> > +in the version script, but we can also do this in the header file
> > +
> > +.. code-block:: c
> > +
> > +   struct rte_acl_ctx *
> > +   -rte_acl_create(const struct rte_acl_param *param);
> > +   +rte_acl_create(const struct rte_acl_param *param, int debug);
> > +   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
> 
> Will it work with static library?
> 
Hmm, this example in particular?  No, I didn't think of that.  To work with a
static build, you still need to define the unversioned symbol.  That's easy
enough to do though, by either defining rte_acl_create as a public API and
calling the appropriate versioned function, or by creating a macro to point to
the right version via an alias.  I can fix that easily enough.

> > +Next remove the corresponding versioned export
> > +.. code-block:: c
> > +
> > + -VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
> > +
> > +
> > +Note that the internal function definition could also be removed, but its used
> > +in our example by the newer version _v21, so we leave it in place.  This is a
> > +coding style choice.
> > +
> > +Lastly, we need to bump the LIBABIVER number for this library in the Makefile to
> > +indicate to applications doing dynamic linking that this is a later, and
> > +possibly incompatible library version:
> > +
> > +.. code-block:: c
> > +
> > +   -LIBABIVER := 1
> > +   +LIBABIVER := 2
> 
> Very well explained, thanks.
> 
> [...]
> > +        rte_acl_add_rules;
> > +        rte_acl_build;
> > +        rte_acl_classify;
> > +        rte_acl_classify_alg;
> > +        rte_acl_classify_scalar;
> > +        rte_acl_dump;
> > +        rte_acl_create
> 
> Not in alphabetical order.
> 
No, none of them are, but that can be adjusted, though I'd like to do that
separately from this documentation.

> 
> As you copy a part of abi.rst, it should be removed from the original doc.
> 
Sure
> Thanks Neil
> 


* Re: [dpdk-dev] [PATCH v3 2/7] mbuf: use the reserved 16 bits for double vlan
  @ 2015-06-25  8:31  3%       ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-25  8:31 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

Hi Neil

> -----Original Message-----
> From: Zhang, Helin
> Sent: Thursday, June 11, 2015 3:04 PM
> To: dev@dpdk.org
> Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; Richardson, Bruce;
> olivier.matz@6wind.com; Zhang, Helin
> Subject: [PATCH v3 2/7] mbuf: use the reserved 16 bits for double vlan
> 
> Use the reserved 16 bits in rte_mbuf structure for the outer vlan, also add QinQ
> offloading flags for both RX and TX sides.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> v2 changes:
> * Fixed a typo.
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index ab6de67..84fe181 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -101,11 +101,17 @@ extern "C" {
>  #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with
> IPv6 header. */
>  #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match.
> */
>  #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if
> FDIR match. */
> +#define PKT_RX_QINQ_PKT      (1ULL << 15)  /**< RX packet with double
> VLAN stripped. */
>  /* add new RX flags here */
> 
>  /* add new TX flags here */
> 
>  /**
> + * Second VLAN insertion (QinQ) flag.
> + */
> +#define PKT_TX_QINQ_PKT    (1ULL << 49)   /**< TX packet with double VLAN inserted. */
> +
> +/**
>   * TCP segmentation offload. To enable this offload feature for a
>   * packet to be transmitted on hardware supporting TSO:
>   *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
> @@ -279,7 +285,7 @@ struct rte_mbuf {
>  	uint16_t data_len;        /**< Amount of data in segment buffer. */
>  	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
>  	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
> -	uint16_t reserved;
> +	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
Do you think this is an ABI break or not? It just uses the reserved 16 bits, which
were intended for the second VLAN tag. Thanks in advance!
I did not see any "Incompatible" reported by validate_abi.sh.

Regards,
Helin

>  	union {
>  		uint32_t rss;     /**< RSS hash result if RSS enabled */
>  		struct {
> @@ -777,6 +783,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
>  	m->pkt_len = 0;
>  	m->tx_offload = 0;
>  	m->vlan_tci = 0;
> +	m->vlan_tci_outer = 0;
>  	m->nb_segs = 1;
>  	m->port = 0xff;
> 
> @@ -849,6 +856,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
>  	mi->data_len = m->data_len;
>  	mi->port = m->port;
>  	mi->vlan_tci = m->vlan_tci;
> +	mi->vlan_tci_outer = m->vlan_tci_outer;
>  	mi->tx_offload = m->tx_offload;
>  	mi->hash = m->hash;
> 
> --
> 1.9.3


* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-25  7:42  4%       ` Gonzalez Monroy, Sergio
@ 2015-06-25  8:00  4%         ` Gonzalez Monroy, Sergio
  0 siblings, 0 replies; 200+ results
From: Gonzalez Monroy, Sergio @ 2015-06-25  8:00 UTC (permalink / raw)
  To: Zhang, Helin, Neil Horman, dev

On 25/06/2015 08:42, Gonzalez Monroy, Sergio wrote:
> On 25/06/2015 08:19, Zhang, Helin wrote:
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
>>> Sent: Thursday, June 25, 2015 2:35 AM
>>> To: dev@dpdk.org
>>> Subject: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
>>>
>>> People have been asking for ways to use the ABI macros, heres some 
>>> docs to
>>> clarify their use.  Included is:
>>>
>>> * An overview of what ABI is
>>> * Details of the ABI deprecation process
>>> * Details of the versioning macros
>>> * Examples of their use
>>> * Details of how to use the ABI validator
>>>
>>> Thanks to John Mcnamara, who duplicated much of this effort at Intel 
>>> while I was
>>> working on it.  Much of the introductory material was gathered and 
>>> cleaned up
>>> by him
>>>
>>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>>> CC: john.mcnamara@intel.com
>>> CC: thomas.monjalon@6wind.com
>>>
>>> Change notes:
>>>
>>> v2)
>>>       * Fixed RST indentations and spelling errors
>>>       * Rebased to upstream to fix index.rst conflict
>>> [...]
>>> +   DPDK_2.1 {
>>> +        global:
>>> +        rte_acl_create;
>> One question, does it need a line of "local: *;", like it did in
>> librte_ether/rte_ether_version.map?
> No, it does not. You only need to specify 'local' in the default/base node,
> which in this case/example is DPDK_2.0.
>
> Sergio
Just to be clear, as I think I may have misused the term 'default' here,
it is recommended to specify 'local: *;' in only one node (specifying it in
more than one could confuse the linker), and it doesn't really matter
which one.

Quoting http://www.akkadia.org/drepper/symbol-versioning:
"It makes no sense at all to associate versions with symbols which are 
not exported. Therefore the `local:' sections of all but the base 
version are empty and the `local:' section of the base version simply 
contains `*'. This will match all symbols which are not explicitly 
mentioned in any `global:' list."
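To make the point concrete, a minimal two-node map following that advice might look as below (a hypothetical ``libfoo`` with invented symbol names, just to illustrate where ``local: *;`` goes; not taken from the patch):

```none
/* Base node: carries the full symbol list and the only local: section. */
DPDK_2.0 {
     global:

     foo_create;
     foo_free;

     local: *;
};

/* Later nodes list only new/changed symbols and have no local: section. */
DPDK_2.1 {
     global:
     foo_create;
} DPDK_2.0;
```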

Sergio

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-25  7:19  4%     ` Zhang, Helin
@ 2015-06-25  7:42  4%       ` Gonzalez Monroy, Sergio
  2015-06-25  8:00  4%         ` Gonzalez Monroy, Sergio
  2015-06-25 12:25  4%       ` Neil Horman
  1 sibling, 1 reply; 200+ results
From: Gonzalez Monroy, Sergio @ 2015-06-25  7:42 UTC (permalink / raw)
  To: Zhang, Helin, Neil Horman, dev

On 25/06/2015 08:19, Zhang, Helin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
>> Sent: Thursday, June 25, 2015 2:35 AM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
>>
>> People have been asking for ways to use the ABI macros, here's some docs to
>> clarify their use.  Included is:
>>
>> * An overview of what ABI is
>> * Details of the ABI deprecation process
>> * Details of the versioning macros
>> * Examples of their use
>> * Details of how to use the ABI validator
>>
>> Thanks to John Mcnamara, who duplicated much of this effort at Intel while I was
>> working on it.  Much of the introductory material was gathered and cleaned up
>> by him.
>>
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> CC: john.mcnamara@intel.com
>> CC: thomas.monjalon@6wind.com
>>
>> Change notes:
>>
>> v2)
>>       * Fixed RST indentations and spelling errors
>>       * Rebased to upstream to fix index.rst conflict
>> ---
>>   doc/guides/guidelines/index.rst      |   1 +
>>   doc/guides/guidelines/versioning.rst | 456
>> +++++++++++++++++++++++++++++++++++
>>   2 files changed, 457 insertions(+)
>>   create mode 100644 doc/guides/guidelines/versioning.rst
>>
>> diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
>> index 0ee9ab3..bfb9fa3 100644
>> --- a/doc/guides/guidelines/index.rst
>> +++ b/doc/guides/guidelines/index.rst
>> @@ -7,3 +7,4 @@ Guidelines
>>
>>       coding_style
>>       design
>> +    versioning
>> diff --git a/doc/guides/guidelines/versioning.rst
>> b/doc/guides/guidelines/versioning.rst
>> new file mode 100644
>> index 0000000..2aef526
>> --- /dev/null
>> +++ b/doc/guides/guidelines/versioning.rst
>> @@ -0,0 +1,456 @@
>> +Managing ABI updates
>> +====================
>> +
>> +Description
>> +-----------
>> +
>> +This document details some methods for handling ABI management in the
>> DPDK.
>> +Note this document is not exhaustive, in that C library versioning is
>> +flexible allowing multiple methods to achieve various goals, but it
>> +will provide the user with some introductory methods.
>> +
>> +General Guidelines
>> +------------------
>> +
>> +#. Whenever possible, ABI should be preserved
>> +#. The addition of symbols is generally not problematic
>> +#. The modification of symbols can generally be managed with versioning
>> +#. The removal of symbols generally is an ABI break and requires bumping of the
>> +   LIBABIVER macro
>> +
>> +What is an ABI
>> +--------------
>> +
>> +An ABI (Application Binary Interface) is the set of runtime interfaces
>> +exposed by a library. It is similar to an API (Application Programming
>> +Interface) but is the result of compilation.  It is also effectively
>> +cloned when applications link to dynamic libraries.  That is to say
>> +when an application is compiled to link against dynamic libraries, it
>> +is assumed that the ABI remains constant between the time the application is
>> compiled/linked, and the time that it runs.
>> +Therefore, in the case of dynamic linking, it is critical that an ABI
>> +is preserved, or (when modified), done in such a way that the
>> +application is unable to behave improperly or in an unexpected fashion.
>> +
>> +The DPDK ABI policy
>> +-------------------
>> +
>> +ABI versions are set at the time of major release labeling, and the ABI
>> +may change multiple times, without warning, between the last release
>> +label and the HEAD label of the git tree.
>> +
>> +ABI versions, once released, are available until such time as their
>> +deprecation has been noted in the Release Notes for at least one major
>> +release cycle. For example consider the case where the ABI for DPDK 2.0
>> +has been shipped and then a decision is made to modify it during the
>> +development of DPDK 2.1. The decision will be recorded in the Release
>> +Notes for the DPDK 2.1 release and the modification will be made available in
>> the DPDK 2.2 release.
>> +
>> +ABI versions may be deprecated in whole or in part as needed by a given
>> +update.
>> +
>> +Some ABI changes may be too significant to reasonably maintain multiple
>> +versions. In those cases ABIs may be updated without backward
>> +compatibility being provided. The requirements for doing so are:
>> +
>> +#. At least 3 acknowledgments of the need to do so must be made on the
>> +   dpdk.org mailing list.
>> +
>> +#. A full deprecation cycle, as explained above, must be made to offer
>> +   downstream consumers sufficient warning of the change.
>> +
>> +#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
>> +   incorporated must be incremented in parallel with the ABI changes
>> +   themselves.
>> +
>> +Note that the above process for ABI deprecation should not be
>> +undertaken lightly. ABI stability is extremely important for downstream
>> +consumers of the DPDK, especially when distributed in shared object
>> +form. Every effort should be made to preserve the ABI whenever
>> +possible. The ABI should only be changed for significant reasons, such
>> +as performance enhancements. ABI breakage due to changes such as
>> +reorganizing public structure fields for aesthetic or readability purposes should
>> be avoided.
>> +
>> +Examples of Deprecation Notices
>> +-------------------------------
>> +
>> +The following are some examples of ABI deprecation notices which would
>> +be added to the Release Notes:
>> +
>> +* The Macro ``#RTE_FOO`` is deprecated and will be removed with version 2.0,
>> +  to be replaced with the inline function ``rte_foo()``.
>> +
>> +* The function ``rte_mbuf_grok()`` has been updated to include a new parameter
>> +  in version 2.0. Backwards compatibility will be maintained for this function
>> +  until the release of version 2.1.
>> +
>> +* The members of ``struct rte_foo`` have been reorganized in release 2.0 for
>> +  performance reasons. Existing binary applications will have backwards
>> +  compatibility in release 2.0, while newly built binaries will need to
>> +  reference the new structure variant ``struct rte_foo2``. Compatibility will
>> +  be removed in release 2.2, and all applications will require updating and
>> +  rebuilding to the new structure at that time, which will be renamed to the
>> +  original ``struct rte_foo``.
>> +
>> +* Significant ABI changes are planned for the ``librte_dostuff`` library. The
>> +  upcoming release 2.0 will not contain these changes, but release 2.1 will,
>> +  and no backwards compatibility is planned due to the extensive nature of
>> +  these changes. Binaries using this library built prior to version 2.1 will
>> +  require updating and recompilation.
>> +
>> +Versioning Macros
>> +-----------------
>> +
>> +When a symbol is exported from a library to provide an API, it also
>> +provides a calling convention (ABI) that is embodied in its name,
>> +return type and arguments. Occasionally that function may need to
>> +change to accommodate new functionality or behavior. When that occurs,
>> +it is desirable to allow for backward compatibility for a time with
>> +older binaries that are dynamically linked to the DPDK.
>> +
>> +To support backward compatibility the
>> +``lib/librte_compat/rte_compat.h``
>> +header file provides macros to use when updating exported functions.
>> +These macros are used in conjunction with the
>> +``rte_<library>_version.map`` file for a given library to allow
>> +multiple versions of a symbol to exist in a shared library so that older binaries
>> need not be immediately recompiled.
>> +
>> +The macros exported are:
>> +
>> +* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry
>> +binding
>> +  unversioned symbol ``b`` to the internal function ``b_e``.
>> +
>> +
>> +* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
>> +  unversioned symbol ``b`` to the internal function ``b_e``.
>> +
>> +* ``BIND_DEFAULT_SYMBOL(b, e, n)``: Creates a symbol version entry
>> +instructing
>> +  the linker to bind references to symbol ``b`` to the internal symbol
>> +  ``b_e``.
>> +
>> +
>> +Examples of ABI Macro use
>> +-------------------------
>> +
>> +Updating a public API
>> +~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Assume we have a function as follows
>> +
>> +.. code-block:: c
>> +
>> + /*
>> +  * Create an acl context object for apps to
>> +  * manipulate
>> +  */
>> + struct rte_acl_ctx *
>> + rte_acl_create(const struct rte_acl_param *param) {
>> +        ...
>> + }
>> +
>> +
>> +Assume that struct rte_acl_ctx is a private structure, and that a
>> +developer wishes to enhance the acl api so that a debugging flag can be
>> +enabled on a per-context basis.  This requires an addition to the
>> +structure (which, being private, is safe), but it also requires
>> +modifying the code as follows
>> +
>> +.. code-block:: c
>> +
>> + /*
>> +  * Create an acl context object for apps to
>> +  * manipulate
>> +  */
>> + struct rte_acl_ctx *
>> + rte_acl_create(const struct rte_acl_param *param, int debug) {
>> +        ...
>> + }
>> +
>> +
>> +Note also that, being a public function, the header file prototype must
>> +also be changed, as must all the call sites, to reflect the new ABI
>> +footprint.  We will maintain previous ABI versions that are accessible
>> +only to previously compiled binaries.
>> +
>> +The addition of a parameter to the function is ABI breaking as the
>> +function is public, and existing applications may use it in its current
>> +form.  However, the compatibility macros in DPDK allow a developer to
>> +use symbol versioning so that multiple functions can be mapped to the
>> +same public symbol based on when an application was linked to it.  To
>> +see how this is done, we start with the requisite libraries version map
>> +file.  Initially the version map file for the acl library looks like
>> +this
>> +
>> +.. code-block:: none
>> +
>> +   DPDK_2.0 {
>> +        global:
>> +
>> +        rte_acl_add_rules;
>> +        rte_acl_build;
>> +        rte_acl_classify;
>> +        rte_acl_classify_alg;
>> +        rte_acl_classify_scalar;
>> +        rte_acl_create;
>> +        rte_acl_dump;
>> +        rte_acl_find_existing;
>> +        rte_acl_free;
>> +        rte_acl_ipv4vlan_add_rules;
>> +        rte_acl_ipv4vlan_build;
>> +        rte_acl_list_dump;
>> +        rte_acl_reset;
>> +        rte_acl_reset_rules;
>> +        rte_acl_set_ctx_classify;
>> +
>> +        local: *;
>> +   };
>> +
>> +This file needs to be modified as follows
>> +
>> +.. code-block:: none
>> +
>> +   DPDK_2.0 {
>> +        global:
>> +
>> +        rte_acl_add_rules;
>> +        rte_acl_build;
>> +        rte_acl_classify;
>> +        rte_acl_classify_alg;
>> +        rte_acl_classify_scalar;
>> +        rte_acl_create;
>> +        rte_acl_dump;
>> +        rte_acl_find_existing;
>> +        rte_acl_free;
>> +        rte_acl_ipv4vlan_add_rules;
>> +        rte_acl_ipv4vlan_build;
>> +        rte_acl_list_dump;
>> +        rte_acl_reset;
>> +        rte_acl_reset_rules;
>> +        rte_acl_set_ctx_classify;
>> +
>> +        local: *;
>> +   };
>> +
>> +   DPDK_2.1 {
>> +        global:
>> +        rte_acl_create;
> One question, does it need a line of "local: *;", like it did in
> librte_ether/rte_ether_version.map?
No, it does not. You only need to specify 'local' in the default/base node,
which in this case/example is DPDK_2.0.

Sergio
>> +
>> +   } DPDK_2.0;
>> +
>>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-24 18:34 14%   ` [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation Neil Horman
  2015-06-24 21:09  9%     ` Thomas Monjalon
@ 2015-06-25  7:19  4%     ` Zhang, Helin
  2015-06-25  7:42  4%       ` Gonzalez Monroy, Sergio
  2015-06-25 12:25  4%       ` Neil Horman
  1 sibling, 2 replies; 200+ results
From: Zhang, Helin @ 2015-06-25  7:19 UTC (permalink / raw)
  To: Neil Horman, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Thursday, June 25, 2015 2:35 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
> 
> People have been asking for ways to use the ABI macros, here's some docs to
> clarify their use.  Included is:
> 
> * An overview of what ABI is
> * Details of the ABI deprecation process
> * Details of the versioning macros
> * Examples of their use
> * Details of how to use the ABI validator
> 
> Thanks to John Mcnamara, who duplicated much of this effort at Intel while I was
> working on it.  Much of the introductory material was gathered and cleaned up
> by him.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: john.mcnamara@intel.com
> CC: thomas.monjalon@6wind.com
> 
> Change notes:
> 
> v2)
>      * Fixed RST indentations and spelling errors
>      * Rebased to upstream to fix index.rst conflict
> ---
>  doc/guides/guidelines/index.rst      |   1 +
>  doc/guides/guidelines/versioning.rst | 456
> +++++++++++++++++++++++++++++++++++
>  2 files changed, 457 insertions(+)
>  create mode 100644 doc/guides/guidelines/versioning.rst
> 
> diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
> index 0ee9ab3..bfb9fa3 100644
> --- a/doc/guides/guidelines/index.rst
> +++ b/doc/guides/guidelines/index.rst
> @@ -7,3 +7,4 @@ Guidelines
> 
>      coding_style
>      design
> +    versioning
> diff --git a/doc/guides/guidelines/versioning.rst
> b/doc/guides/guidelines/versioning.rst
> new file mode 100644
> index 0000000..2aef526
> --- /dev/null
> +++ b/doc/guides/guidelines/versioning.rst
> @@ -0,0 +1,456 @@
> +Managing ABI updates
> +====================
> +
> +Description
> +-----------
> +
> +This document details some methods for handling ABI management in the
> DPDK.
> +Note this document is not exhaustive, in that C library versioning is
> +flexible allowing multiple methods to achieve various goals, but it
> +will provide the user with some introductory methods.
> +
> +General Guidelines
> +------------------
> +
> +#. Whenever possible, ABI should be preserved
> +#. The addition of symbols is generally not problematic
> +#. The modification of symbols can generally be managed with versioning
> +#. The removal of symbols generally is an ABI break and requires bumping of the
> +   LIBABIVER macro
> +
> +What is an ABI
> +--------------
> +
> +An ABI (Application Binary Interface) is the set of runtime interfaces
> +exposed by a library. It is similar to an API (Application Programming
> +Interface) but is the result of compilation.  It is also effectively
> +cloned when applications link to dynamic libraries.  That is to say
> +when an application is compiled to link against dynamic libraries, it
> +is assumed that the ABI remains constant between the time the application is
> compiled/linked, and the time that it runs.
> +Therefore, in the case of dynamic linking, it is critical that an ABI
> +is preserved, or (when modified), done in such a way that the
> +application is unable to behave improperly or in an unexpected fashion.
> +
> +The DPDK ABI policy
> +-------------------
> +
> +ABI versions are set at the time of major release labeling, and the ABI
> +may change multiple times, without warning, between the last release
> +label and the HEAD label of the git tree.
> +
> +ABI versions, once released, are available until such time as their
> +deprecation has been noted in the Release Notes for at least one major
> +release cycle. For example consider the case where the ABI for DPDK 2.0
> +has been shipped and then a decision is made to modify it during the
> +development of DPDK 2.1. The decision will be recorded in the Release
> +Notes for the DPDK 2.1 release and the modification will be made available in
> the DPDK 2.2 release.
> +
> +ABI versions may be deprecated in whole or in part as needed by a given
> +update.
> +
> +Some ABI changes may be too significant to reasonably maintain multiple
> +versions. In those cases ABIs may be updated without backward
> +compatibility being provided. The requirements for doing so are:
> +
> +#. At least 3 acknowledgments of the need to do so must be made on the
> +   dpdk.org mailing list.
> +
> +#. A full deprecation cycle, as explained above, must be made to offer
> +   downstream consumers sufficient warning of the change.
> +
> +#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
> +   incorporated must be incremented in parallel with the ABI changes
> +   themselves.
> +
> +Note that the above process for ABI deprecation should not be
> +undertaken lightly. ABI stability is extremely important for downstream
> +consumers of the DPDK, especially when distributed in shared object
> +form. Every effort should be made to preserve the ABI whenever
> +possible. The ABI should only be changed for significant reasons, such
> +as performance enhancements. ABI breakage due to changes such as
> +reorganizing public structure fields for aesthetic or readability purposes should
> be avoided.
> +
> +Examples of Deprecation Notices
> +-------------------------------
> +
> +The following are some examples of ABI deprecation notices which would
> +be added to the Release Notes:
> +
> +* The Macro ``#RTE_FOO`` is deprecated and will be removed with version 2.0,
> +  to be replaced with the inline function ``rte_foo()``.
> +
> +* The function ``rte_mbuf_grok()`` has been updated to include a new parameter
> +  in version 2.0. Backwards compatibility will be maintained for this function
> +  until the release of version 2.1.
> +
> +* The members of ``struct rte_foo`` have been reorganized in release 2.0 for
> +  performance reasons. Existing binary applications will have backwards
> +  compatibility in release 2.0, while newly built binaries will need to
> +  reference the new structure variant ``struct rte_foo2``. Compatibility will
> +  be removed in release 2.2, and all applications will require updating and
> +  rebuilding to the new structure at that time, which will be renamed to the
> +  original ``struct rte_foo``.
> +
> +* Significant ABI changes are planned for the ``librte_dostuff`` library. The
> +  upcoming release 2.0 will not contain these changes, but release 2.1 will,
> +  and no backwards compatibility is planned due to the extensive nature of
> +  these changes. Binaries using this library built prior to version 2.1 will
> +  require updating and recompilation.
> +
> +Versioning Macros
> +-----------------
> +
> +When a symbol is exported from a library to provide an API, it also
> +provides a calling convention (ABI) that is embodied in its name,
> +return type and arguments. Occasionally that function may need to
> +change to accommodate new functionality or behavior. When that occurs,
> +it is desirable to allow for backward compatibility for a time with
> +older binaries that are dynamically linked to the DPDK.
> +
> +To support backward compatibility the
> +``lib/librte_compat/rte_compat.h``
> +header file provides macros to use when updating exported functions.
> +These macros are used in conjunction with the
> +``rte_<library>_version.map`` file for a given library to allow
> +multiple versions of a symbol to exist in a shared library so that older binaries
> need not be immediately recompiled.
> +
> +The macros exported are:
> +
> +* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry
> +binding
> +  unversioned symbol ``b`` to the internal function ``b_e``.
> +
> +
> +* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
> +  unversioned symbol ``b`` to the internal function ``b_e``.
> +
> +* ``BIND_DEFAULT_SYMBOL(b, e, n)``: Creates a symbol version entry
> +instructing
> +  the linker to bind references to symbol ``b`` to the internal symbol
> +  ``b_e``.
> +
> +
> +Examples of ABI Macro use
> +-------------------------
> +
> +Updating a public API
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +Assume we have a function as follows
> +
> +.. code-block:: c
> +
> + /*
> +  * Create an acl context object for apps to
> +  * manipulate
> +  */
> + struct rte_acl_ctx *
> + rte_acl_create(const struct rte_acl_param *param) {
> +        ...
> + }
> +
> +
> +Assume that struct rte_acl_ctx is a private structure, and that a
> +developer wishes to enhance the acl api so that a debugging flag can be
> +enabled on a per-context basis.  This requires an addition to the
> +structure (which, being private, is safe), but it also requires
> +modifying the code as follows
> +
> +.. code-block:: c
> +
> + /*
> +  * Create an acl context object for apps to
> +  * manipulate
> +  */
> + struct rte_acl_ctx *
> + rte_acl_create(const struct rte_acl_param *param, int debug) {
> +        ...
> + }
> +
> +
> +Note also that, being a public function, the header file prototype must
> +also be changed, as must all the call sites, to reflect the new ABI
> +footprint.  We will maintain previous ABI versions that are accessible
> +only to previously compiled binaries.
> +
> +The addition of a parameter to the function is ABI breaking as the
> +function is public, and existing applications may use it in its current
> +form.  However, the compatibility macros in DPDK allow a developer to
> +use symbol versioning so that multiple functions can be mapped to the
> +same public symbol based on when an application was linked to it.  To
> +see how this is done, we start with the requisite libraries version map
> +file.  Initially the version map file for the acl library looks like
> +this
> +
> +.. code-block:: none
> +
> +   DPDK_2.0 {
> +        global:
> +
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_create;
> +        rte_acl_dump;
> +        rte_acl_find_existing;
> +        rte_acl_free;
> +        rte_acl_ipv4vlan_add_rules;
> +        rte_acl_ipv4vlan_build;
> +        rte_acl_list_dump;
> +        rte_acl_reset;
> +        rte_acl_reset_rules;
> +        rte_acl_set_ctx_classify;
> +
> +        local: *;
> +   };
> +
> +This file needs to be modified as follows
> +
> +.. code-block:: none
> +
> +   DPDK_2.0 {
> +        global:
> +
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_create;
> +        rte_acl_dump;
> +        rte_acl_find_existing;
> +        rte_acl_free;
> +        rte_acl_ipv4vlan_add_rules;
> +        rte_acl_ipv4vlan_build;
> +        rte_acl_list_dump;
> +        rte_acl_reset;
> +        rte_acl_reset_rules;
> +        rte_acl_set_ctx_classify;
> +
> +        local: *;
> +   };
> +
> +   DPDK_2.1 {
> +        global:
> +        rte_acl_create;
One question, does it need a line of "local: *;", like it did in
librte_ether/rte_ether_version.map?

> +
> +   } DPDK_2.0;
> +
> +The addition of the new block tells the linker that a new version node
> +is available (DPDK_2.1), which contains the symbol rte_acl_create, and
> +inherits the symbols from the DPDK_2.0 node.  This list is directly
> +translated into a list of exported symbols when DPDK is compiled as a
> +shared library.
> +
> +Next, we need to specify in the code which functions map to the
> +rte_acl_create symbol at which versions.  First, at the site of the
> +initial symbol definition, we need to update the function so that it is
> +uniquely named, and not in conflict with the public symbol name.
> +
> +.. code-block:: c
> +
> +  struct rte_acl_ctx *
> + -rte_acl_create(const struct rte_acl_param *param)
> + +rte_acl_create_v20(const struct rte_acl_param *param)
> + {
> +        size_t sz;
> +        struct rte_acl_ctx *ctx;
> +        ...
> +
> +Note that the base name of the symbol was kept intact, as this is
> +conducive to the macros used for versioning symbols.  That is our next
> +step, mapping this new symbol name to the initial symbol name at
> +version node 2.0.  Immediately after the function, we add this line of
> +code
> +
> +.. code-block:: c
> +
> +   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
> +
> +Remembering to also add the rte_compat.h header to the requisite c file
> +where these changes are being made.  The above macro instructs the
> +linker to create a new symbol ``rte_acl_create@DPDK_2.0``, which
> +matches the symbol created in older builds, but now points to the above
> +newly named function.  We have now mapped the original rte_acl_create
> +symbol to the original function (but with a new
> +name)
> +
> +Next, we need to create the 2.1 version of the symbol.  We create a new
> +function name, with a different suffix, and  implement it appropriately
> +
> +.. code-block:: c
> +
> +   struct rte_acl_ctx *
> +   rte_acl_create_v21(const struct rte_acl_param *param, int debug)
> +   {
> +        struct rte_acl_ctx *ctx = rte_acl_create_v20(param);
> +
> +        ctx->debug = debug;
> +
> +        return ctx;
> +   }
> +
> +This code serves as our new API call.  It's the same as our old call,
> +but adds the new parameter in place.  Next we need to map this function
> +to the symbol ``rte_acl_create@DPDK_2.1``.  To do this, we modify the
> +public prototype of the call in the header file, adding the macro there
> +to inform all including applications that, on re-link, the default
> +rte_acl_create symbol should point to this function.  Note that we
> +could do this by simply naming the function above rte_acl_create, and
> +the linker would choose the most recent version tag to apply in the
> +version script, but we can also do this in the header file
> +
> +.. code-block:: c
> +
> +   struct rte_acl_ctx *
> +   -rte_acl_create(const struct rte_acl_param *param);
> +   +rte_acl_create(const struct rte_acl_param *param, int debug);
> +   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
> +
> +The BIND_DEFAULT_SYMBOL macro explicitly tells applications that
> +include this header, to link to the rte_acl_create_v21 function and
> +apply the DPDK_2.1 version node to it.  This method is more explicit
> +and flexible than just re-implementing the exact symbol name, and
> +allows for other features (such as linking to the old symbol version by
> +default, when the new ABI is to be opt-in for a period).
> +
> +That's it, on the next shared library rebuild, there will be two
> +versions of rte_acl_create, an old DPDK_2.0 version, used by previously
> +built applications, and a new DPDK_2.1 version, used by future built
> applications.
> +
> +
> +Deprecating part of a public API
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Let's assume that you've done the above update, and after a few releases
> +have passed you decide you would like to retire the old version of the function.
> +After having gone through the ABI deprecation announcement process,
> +removal is easy.  Start by removing the symbol from the requisite version map
> file:
> +
> +.. code-block:: none
> +
> +   DPDK_2.0 {
> +        global:
> +
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_dump;
> + -      rte_acl_create;
> +        rte_acl_find_existing;
> +        rte_acl_free;
> +        rte_acl_ipv4vlan_add_rules;
> +        rte_acl_ipv4vlan_build;
> +        rte_acl_list_dump;
> +        rte_acl_reset;
> +        rte_acl_reset_rules;
> +        rte_acl_set_ctx_classify;
> +
> +        local: *;
> +   };
> +
> +   DPDK_2.1 {
> +        global:
> +        rte_acl_create;
> +   } DPDK_2.0;
> +
> +
> +Next remove the corresponding versioned export
> +
> +.. code-block:: c
> +
> + -VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
> +
> +
> +Note that the internal function definition could also be removed, but
> +it's used in our example by the newer version _v21, so we leave it in
> +place.  This is a coding style choice.
> +
> +Lastly, we need to bump the LIBABIVER number for this library in the
> +Makefile to indicate to applications doing dynamic linking that this is
> +a later, and possibly incompatible library version:
> +
> +.. code-block:: c
> +
> +   -LIBABIVER := 1
> +   +LIBABIVER := 2
> +
> +Deprecating an entire ABI version
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +While removing a symbol from an ABI may be useful, it is often more
> +practical to remove an entire version node at once.  If a version node
> +completely specifies an API, then removing part of it typically makes
> +it incomplete.  In those cases it is better to remove the entire node
> +
> +To do this, start by modifying the version map file, such that all
> +symbols from the node to be removed are merged into the next node in
> +the map
> +
> +In the case of our map above, it would transform to look as follows
> +
> +.. code-block:: none
> +
> +   DPDK_2.1 {
> +        global:
> +
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_dump;
> +        rte_acl_create;
> +        rte_acl_find_existing;
> +        rte_acl_free;
> +        rte_acl_ipv4vlan_add_rules;
> +        rte_acl_ipv4vlan_build;
> +        rte_acl_list_dump;
> +        rte_acl_reset;
> +        rte_acl_reset_rules;
> +        rte_acl_set_ctx_classify;
> +
> +        local: *;
> +   };
> +
> +Then any uses of BIND_DEFAULT_SYMBOL that pointed to the old node
> +should be updated to point to the new version node in any header files
> +for all affected symbols.
> +
> +.. code-block:: c
> +
> + -BIND_DEFAULT_SYMBOL(rte_acl_create, _v20, 2.0);
> + +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
> +
> +Lastly, any VERSION_SYMBOL macros that point to the old version node
> +should be removed, taking care to keep, where needed, old code in place to
> +support newer versions of the symbol.
> +
> +Running the ABI Validator
> +-------------------------
> +
> +The ``scripts`` directory in the DPDK source tree contains a utility
> +program, ``validate-abi.sh``, for validating the DPDK ABI based on the
> +Linux `ABI Compliance Checker
> +<http://ispras.linuxbase.org/index.php/ABI_compliance_checker>`_.
> +
> +This has a dependency on the ``abi-compliance-checker`` and ``abi-dumper``
> +utilities which can be installed via a package manager. For example::
> +
> +   sudo yum install abi-compliance-checker
> +   sudo yum install abi-dumper
> +
> +The syntax of the ``validate-abi.sh`` utility is::
> +
> +   ./scripts/validate-abi.sh <TAG1> <TAG2> <TARGET>
> +
> +Where ``TAG1`` and ``TAG2`` are valid git tags on the local repo and
> +target is the usual DPDK compilation target.
> +
> +For example to test the current committed HEAD against a previous
> +release tag we could add a temporary tag and run the utility as follows::
> +
> +   git tag MY_TEMP_TAG
> +   ./scripts/validate-abi.sh v2.0.0 MY_TEMP_TAG x86_64-native-linuxapp-gcc
> +
> +After the validation script completes (it can take a while since it
> +needs to compile both tags) it will create compatibility reports in the
> +``./compat_reports`` directory. Listed incompatibilities can be found as
> +follows::
> +
> +  grep -lr Incompatible compat_reports/
> --
> 2.1.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  2015-06-24 18:34 14%   ` [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation Neil Horman
@ 2015-06-24 21:09  9%     ` Thomas Monjalon
  2015-06-25 11:35  9%       ` Neil Horman
  2015-06-25  7:19  4%     ` Zhang, Helin
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-24 21:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

2015-06-24 14:34, Neil Horman:
> +Some ABI changes may be too significant to reasonably maintain multiple
> +versions. In those cases ABI's may be updated without backward compatibility
> +being provided. The requirements for doing so are:
> +
> +#. At least 3 acknowledgments of the need to do so must be made on the
> +   dpdk.org mailing list.
> +
> +#. A full deprecation cycle, as explained above, must be made to offer
> +   downstream consumers sufficient warning of the change.
> +
> +#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
> +   incorporated must be incremented in parallel with the ABI changes
> +   themselves.

The proposal was to provide the old and the new ABI in the same source code
during the deprecation cycle. The old ABI would be the default and people
can build the new one by enabling the NEXT_ABI config option.
So the migration to the new ABI is smoother.


[...]
> +The macros exported are:
> +
> +* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
> +  unversioned symbol ``b`` to the internal function ``b_e``.

The definition is the same as BASE_SYMBOL.

> +* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
> +  unversioned symbol ``b`` to the internal function ``b_e``.


[...]
> +   DPDK_2.0 {
> +        global:
> +
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_create;

So it's declared twice, right?
I think it should be explicit.

> +        rte_acl_dump;
> +        rte_acl_find_existing;
> +        rte_acl_free;
> +        rte_acl_ipv4vlan_add_rules;
> +        rte_acl_ipv4vlan_build;
> +        rte_acl_list_dump;
> +        rte_acl_reset;
> +        rte_acl_reset_rules;
> +        rte_acl_set_ctx_classify;
> +
> +        local: *;
> +   };
> +
> +   DPDK_2.1 {
> +        global:
> +        rte_acl_create;
> +
> +   } DPDK_2.0;

[...]
> +Note that the base name of the symbol was kept in tact, as this is condusive to

s/in tact/intact/?

[...]
> +the macros used for versioning symbols.  That is our next step, mapping this new
> +symbol name to the initial symbol name at version node 2.0.  Immediately after
> +the function, we add this line of code
> +
> +.. code-block:: c
> +
> +   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);

Can it be declared before the function?

[...]
> +Remembering to also add the rte_compat.h header to the requisite c file where
> +these changes are being made.  The above macro instructs the linker to create a
> +new symbol ``rte_acl_create@DPDK_2.0``, which matches the symbol created in older
> +builds, but now points to the above newly named function.  We have now mapped
> +the original rte_acl_create symbol to the original function (but with a new
> +name)

Could we use VERSION_SYMBOL(rte_acl_create, , 2.0);
when introducing the function in DPDK 2.0 (before any ABI breakage)?
It could help to generate the .map file.

When do we need to use BASE_SYMBOL?

[...]
> +This code serves as our new API call.  Its the same as our old call, but adds
> +the new parameter in place.  Next we need to map this function to the symbol
> +``rte_acl_create@DPDK_2.1``.  To do this, we modify the public prototype of the call
> +in the header file, adding the macro there to inform all including applications,
> +that on re-link, the default rte_acl_create symbol should point to this
> +function.  Note that we could do this by simply naming the function above
> +rte_acl_create, and the linker would chose the most recent version tag to apply
> +in the version script, but we can also do this in the header file
> +
> +.. code-block:: c
> +
> +   struct rte_acl_ctx *
> +   -rte_acl_create(const struct rte_acl_param *param);
> +   +rte_acl_create(const struct rte_acl_param *param, int debug);
> +   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);

Will it work with static library?

> +Next remove the corresponding versioned export
> +.. code-block:: c
> +
> + -VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
> +
> +
> +Note that the internal function definition could also be removed, but its used
> +in our example by the newer version _v21, so we leave it in place.  This is a
> +coding style choice.
> +
> +Lastly, we need to bump the LIBABIVER number for this library in the Makefile to
> +indicate to applications doing dynamic linking that this is a later, and
> +possibly incompatible library version:
> +
> +.. code-block:: c
> +
> +   -LIBABIVER := 1
> +   +LIBABIVER := 2

Very well explained, thanks.

[...]
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_dump;
> +        rte_acl_create

Not in alphabetical order.


As you copy a part of abi.rst, it should be removed from the original doc.

Thanks Neil

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation
  @ 2015-06-24 18:34 14%   ` Neil Horman
  2015-06-24 21:09  9%     ` Thomas Monjalon
  2015-06-25  7:19  4%     ` Zhang, Helin
  0 siblings, 2 replies; 200+ results
From: Neil Horman @ 2015-06-24 18:34 UTC (permalink / raw)
  To: dev

People have been asking for ways to use the ABI macros, here's some docs to
clarify their use.  Included is:

* An overview of what ABI is
* Details of the ABI deprecation process
* Details of the versioning macros
* Examples of their use
* Details of how to use the ABI validator

Thanks to John Mcnamara, who duplicated much of this effort at Intel while I was
working on it.  Much of the introductory material was gathered and cleaned up by
him.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.mcnamara@intel.com
CC: thomas.monjalon@6wind.com

Change notes:

v2)
     * Fixed RST indentations and spelling errors
     * Rebased to upstream to fix index.rst conflict
---
 doc/guides/guidelines/index.rst      |   1 +
 doc/guides/guidelines/versioning.rst | 456 +++++++++++++++++++++++++++++++++++
 2 files changed, 457 insertions(+)
 create mode 100644 doc/guides/guidelines/versioning.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index 0ee9ab3..bfb9fa3 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -7,3 +7,4 @@ Guidelines
 
     coding_style
     design
+    versioning
diff --git a/doc/guides/guidelines/versioning.rst b/doc/guides/guidelines/versioning.rst
new file mode 100644
index 0000000..2aef526
--- /dev/null
+++ b/doc/guides/guidelines/versioning.rst
@@ -0,0 +1,456 @@
+Managing ABI updates
+====================
+
+Description
+-----------
+
+This document details some methods for handling ABI management in the DPDK.
+Note this document is not exhaustive: C library versioning is flexible,
+allowing multiple methods to achieve various goals, but it will provide the
+user with some introductory methods.
+
+General Guidelines
+------------------
+
+#. Whenever possible, ABI should be preserved
+#. The addition of symbols is generally not problematic
+#. The modification of symbols can generally be managed with versioning
+#. The removal of symbols generally is an ABI break and requires bumping of the
+   LIBABIVER macro
+
+What is an ABI
+--------------
+
+An ABI (Application Binary Interface) is the set of runtime interfaces exposed
+by a library. It is similar to an API (Application Programming Interface) but
+is the result of compilation.  It is also effectively cloned when applications
+link to dynamic libraries.  That is to say when an application is compiled to
+link against dynamic libraries, it is assumed that the ABI remains constant
+between the time the application is compiled/linked, and the time that it runs.
+Therefore, in the case of dynamic linking, it is critical that an ABI is
+preserved, or (when modified), done in such a way that the application is unable
+to behave improperly or in an unexpected fashion.
+
+The DPDK ABI policy
+-------------------
+
+ABI versions are set at the time of major release labeling, and the ABI may
+change multiple times, without warning, between the last release label and the
+HEAD label of the git tree.
+
+ABI versions, once released, are available until such time as their
+deprecation has been noted in the Release Notes for at least one major release
+cycle. For example consider the case where the ABI for DPDK 2.0 has been
+shipped and then a decision is made to modify it during the development of
+DPDK 2.1. The decision will be recorded in the Release Notes for the DPDK 2.1
+release and the modification will be made available in the DPDK 2.2 release.
+
+ABI versions may be deprecated in whole or in part as needed by a given
+update.
+
+Some ABI changes may be too significant to reasonably maintain multiple
+versions. In those cases ABI's may be updated without backward compatibility
+being provided. The requirements for doing so are:
+
+#. At least 3 acknowledgments of the need to do so must be made on the
+   dpdk.org mailing list.
+
+#. A full deprecation cycle, as explained above, must be made to offer
+   downstream consumers sufficient warning of the change.
+
+#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
+   incorporated must be incremented in parallel with the ABI changes
+   themselves.
+
+Note that the above process for ABI deprecation should not be undertaken
+lightly. ABI stability is extremely important for downstream consumers of the
+DPDK, especially when distributed in shared object form. Every effort should
+be made to preserve the ABI whenever possible. The ABI should only be changed
+for significant reasons, such as performance enhancements. ABI breakage due to
+changes such as reorganizing public structure fields for aesthetic or
+readability purposes should be avoided.
+
+Examples of Deprecation Notices
+-------------------------------
+
+The following are some examples of ABI deprecation notices which would be
+added to the Release Notes:
+
+* The Macro ``#RTE_FOO`` is deprecated and will be removed with version 2.0,
+  to be replaced with the inline function ``rte_foo()``.
+
+* The function ``rte_mbuf_grok()`` has been updated to include a new parameter
+  in version 2.0. Backwards compatibility will be maintained for this function
+  until the release of version 2.1.
+
+* The members of ``struct rte_foo`` have been reorganized in release 2.0 for
+  performance reasons. Existing binary applications will have backwards
+  compatibility in release 2.0, while newly built binaries will need to
+  reference the new structure variant ``struct rte_foo2``. Compatibility will
+  be removed in release 2.2, and all applications will require updating and
+  rebuilding to the new structure at that time, which will be renamed to the
+  original ``struct rte_foo``.
+
+* Significant ABI changes are planned for the ``librte_dostuff`` library. The
+  upcoming release 2.0 will not contain these changes, but release 2.1 will,
+  and no backwards compatibility is planned due to the extensive nature of
+  these changes. Binaries using this library built prior to version 2.1 will
+  require updating and recompilation.
+
+Versioning Macros
+-----------------
+
+When a symbol is exported from a library to provide an API, it also provides a
+calling convention (ABI) that is embodied in its name, return type and
+arguments. Occasionally that function may need to change to accommodate new
+functionality or behavior. When that occurs, it is desirable to allow for
+backward compatibility for a time with older binaries that are dynamically
+linked to the DPDK.
+
+To support backward compatibility the ``lib/librte_compat/rte_compat.h``
+header file provides macros to use when updating exported functions. These
+macros are used in conjunction with the ``rte_<library>_version.map`` file for
+a given library to allow multiple versions of a symbol to exist in a shared
+library so that older binaries need not be immediately recompiled.
+
+The macros exported are:
+
+* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
+  unversioned symbol ``b`` to the internal function ``b_e``.
+
+
+* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
+  unversioned symbol ``b`` to the internal function ``b_e``.
+
+* ``BIND_DEFAULT_SYMBOL(b, e, n)``: Creates a symbol version entry instructing
+  the linker to bind references to symbol ``b`` to the internal symbol
+  ``b_e``.
+
+
+Examples of ABI Macro use
+-------------------------
+
+Updating a public API
+~~~~~~~~~~~~~~~~~~~~~
+
+Assume we have a function as follows
+
+.. code-block:: c
+
+ /*
+  * Create an acl context object for apps to 
+  * manipulate
+  */
+ struct rte_acl_ctx *
+ rte_acl_create(const struct rte_acl_param *param)
+ {
+        ...
+ }
+
+
+Assume that struct rte_acl_ctx is a private structure, and that a developer
+wishes to enhance the acl api so that a debugging flag can be enabled on a
+per-context basis.  This requires an addition to the structure (which, being
+private, is safe), but it also requires modifying the code as follows
+
+.. code-block:: c
+
+ /*
+  * Create an acl context object for apps to 
+  * manipulate
+  */
+ struct rte_acl_ctx *
+ rte_acl_create(const struct rte_acl_param *param, int debug)
+ {
+        ...
+ }
+
+
+Note also that, being a public function, the header file prototype must also be
+changed, as must all the call sites, to reflect the new ABI footprint.  We will
+maintain previous ABI versions that are accessible only to previously compiled
+binaries.
+
+The addition of a parameter to the function is ABI breaking as the function is
+public, and existing applications may use it in its current form.  However, the
+compatibility macros in DPDK allow a developer to use symbol versioning so that
+multiple functions can be mapped to the same public symbol based on when an
+application was linked to it.  To see how this is done, we start with the
+requisite library's version map file.  Initially the version map file for the
+acl library looks like this
+
+.. code-block:: none 
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+This file needs to be modified as follows
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+   DPDK_2.1 {
+        global:
+        rte_acl_create;
+
+   } DPDK_2.0;
+
+The addition of the new block tells the linker that a new version node is
+available (DPDK_2.1), which contains the symbol rte_acl_create, and inherits the
+symbols from the DPDK_2.0 node.  This list is directly translated into a list of
+exported symbols when DPDK is compiled as a shared library
+
+Next, we need to specify in the code which functions map to the rte_acl_create
+symbol at which versions.  First, at the site of the initial symbol definition,
+we need to update the function so that it is uniquely named, and not in conflict
+with the public symbol name
+
+.. code-block:: c
+
+  struct rte_acl_ctx *
+ -rte_acl_create(const struct rte_acl_param *param)
+ +rte_acl_create_v20(const struct rte_acl_param *param)
+ {
+        size_t sz;
+        struct rte_acl_ctx *ctx;
+        ...
+
+Note that the base name of the symbol was kept in tact, as this is condusive to
+the macros used for versioning symbols.  That is our next step, mapping this new
+symbol name to the initial symbol name at version node 2.0.  Immediately after
+the function, we add this line of code
+
+.. code-block:: c
+
+   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
+
+Remembering to also add the rte_compat.h header to the requisite c file where
+these changes are being made.  The above macro instructs the linker to create a
+new symbol ``rte_acl_create@DPDK_2.0``, which matches the symbol created in older
+builds, but now points to the above newly named function.  We have now mapped
+the original rte_acl_create symbol to the original function (but with a new
+name)
+
+Next, we need to create the 2.1 version of the symbol.  We create a new function
+name, with a different suffix, and  implement it appropriately
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   rte_acl_create_v21(const struct rte_acl_param *param, int debug)
+   {
+        struct rte_acl_ctx *ctx = rte_acl_create_v20(param);
+
+        ctx->debug = debug;
+
+        return ctx;
+   }
+
+This code serves as our new API call.  It's the same as our old call, but adds
+the new parameter in place.  Next we need to map this function to the symbol
+``rte_acl_create@DPDK_2.1``.  To do this, we modify the public prototype of the call
+in the header file, adding the macro there to inform all including applications,
+that on re-link, the default rte_acl_create symbol should point to this
+function.  Note that we could do this by simply naming the function above
+rte_acl_create, and the linker would choose the most recent version tag to apply
+in the version script, but we can also do this in the header file
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   -rte_acl_create(const struct rte_acl_param *param);
+   +rte_acl_create(const struct rte_acl_param *param, int debug);
+   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
+
+The BIND_DEFAULT_SYMBOL macro explicitly tells applications that include this
+header to link to the rte_acl_create_v21 function and apply the DPDK_2.1
+version node to it.  This method is more explicit and flexible than just
+re-implementing the exact symbol name, and allows for other features (such as
+linking to the old symbol version by default, when the new ABI is to be opt-in
+for a period).
+
+That's it: on the next shared library rebuild, there will be two versions of
+rte_acl_create, an old DPDK_2.0 version, used by previously built applications,
+and a new DPDK_2.1 version, used by newly built applications.
+
+
+Deprecating part of a public API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let's assume that you've done the above update, and after a few releases have
+passed you decide you would like to retire the old version of the function.
+After having gone through the ABI deprecation announcement process, removal is
+easy.  Start by removing the symbol from the requisite version map file:
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_dump;
+ -      rte_acl_create
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+   };
+
+   DPDK_2.1 {
+        global:
+        rte_acl_create;
+   } DPDK_2.0;
+
+
+Next remove the corresponding versioned export:
+
+.. code-block:: c
+
+ -VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
+
+
+Note that the internal function definition could also be removed, but it's used
+in our example by the newer version _v21, so we leave it in place.  This is a
+coding style choice.
+
+Lastly, we need to bump the LIBABIVER number for this library in the Makefile to
+indicate to applications doing dynamic linking that this is a later, and
+possibly incompatible library version:
+
+.. code-block:: c
+
+   -LIBABIVER := 1
+   +LIBABIVER := 2
+
+Deprecating an entire ABI version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While removing a symbol from an ABI may be useful, it is often more practical
+to remove an entire version node at once.  If a version node completely
+specifies an API, then removing part of it typically makes it incomplete.  In
+those cases it is better to remove the entire node.
+
+To do this, start by modifying the version map file, such that all symbols from
+the node to be removed are merged into the next node in the map
+
+In the case of our map above, it would transform to look as follows
+
+.. code-block:: none
+
+   DPDK_2.1 {              
+        global:
+              
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_dump;
+        rte_acl_create
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+              
+        local: *;
+   };
+
+Then any uses of BIND_DEFAULT_SYMBOL that pointed to the old node should be
+updated to point to the new version node in any header files for all affected
+symbols.
+
+.. code-block:: c
+
+ -BIND_DEFAULT_SYMBOL(rte_acl_create, _v20, 2.0);
+ +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
+
+Lastly, any VERSION_SYMBOL macros that point to the old version node should be
+removed, taking care to keep, where needed, old code in place to support newer
+versions of the symbol.
+
+Running the ABI Validator
+-------------------------
+
+The ``scripts`` directory in the DPDK source tree contains a utility program,
+``validate-abi.sh``, for validating the DPDK ABI based on the Linux `ABI
+Compliance Checker
+<http://ispras.linuxbase.org/index.php/ABI_compliance_checker>`_.
+
+This has a dependency on the ``abi-compliance-checker`` and ``abi-dumper``
+utilities, which can be installed via a package manager. For example::
+
+   sudo yum install abi-compliance-checker
+   sudo yum install abi-dumper
+
+The syntax of the ``validate-abi.sh`` utility is::
+
+   ./scripts/validate-abi.sh <TAG1> <TAG2> <TARGET>
+
+Where ``TAG1`` and ``TAG2`` are valid git tags on the local repo and ``TARGET``
+is the usual DPDK compilation target.
+
+For example, to test the current committed HEAD against a previous release tag
+we could add a temporary tag and run the utility as follows::
+
+   git tag MY_TEMP_TAG
+   ./scripts/validate-abi.sh v2.0.0 MY_TEMP_TAG x86_64-native-linuxapp-gcc
+
+After the validation script completes (it can take a while since it needs to
+compile both tags) it will create compatibility reports in the
+``./compat_reports`` directory. Listed incompatibilities can be found as
+follows::
+
+  grep -lr Incompatible compat_reports/
-- 
2.1.0

^ permalink raw reply	[relevance 14%]

* Re: [dpdk-dev] [PATCH 2/2] ABI: Add some documentation
  2015-06-23 19:33 14% ` [dpdk-dev] [PATCH 2/2] ABI: Add some documentation Neil Horman
@ 2015-06-24 11:21  4%   ` Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2015-06-24 11:21 UTC (permalink / raw)
  To: Neil Horman, dev

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, June 23, 2015 8:34 PM
> To: dev@dpdk.org
> Cc: Neil Horman; Mcnamara, John; thomas.monjalon@6wind.com
> Subject: [PATCH 2/2] ABI: Add some documentation
> 
> People have been asking for ways to use the ABI macros, heres some docs to
> clarify their use.  Included is:

Hi,

Thanks for this.

There are a few minor comments on the RST structure below.

Also, there is a conflict in the doc/guides/guidelines/index.rst file with an addition that just got merged. It just needs a rebase.


> +This file needs to be modified as follows
> +
> +.. code-block:: none
> +
> +   DPDK_2.0 {
> +        global:
> +
> +        rte_acl_add_rules;
> +        rte_acl_build;
> +        rte_acl_classify;
> +        rte_acl_classify_alg;
> +        rte_acl_classify_scalar;
> +        rte_acl_create;
> +        rte_acl_dump;
> +        rte_acl_find_existing;
> +        rte_acl_free;
> +        rte_acl_ipv4vlan_add_rules;
> +        rte_acl_ipv4vlan_build;
> +        rte_acl_list_dump;
> +        rte_acl_reset;
> +        rte_acl_reset_rules;
> +        rte_acl_set_ctx_classify;
> +
> +        local: *;
> + };
> +
> + DPDK_2.1 {
> +        global:
> +        rte_acl_create;
> +
> + } DPDK_2.0;

The last 7 lines of this verbatim section should be indented to the same level as the rest of the section. In general the code blocks should be indented at least 3 spaces to keep the various RST converters happy. That applies in a few places.


> +Note that the base name of the symbol was kept in tact, as this is
> +condusive to the macros used for versioning symbols.  That is our next
> +step, mapping this new symbol name to the initial symbol name at
> +version node 2.0.  Immediately after the function, we add this line of
> +code
> +
> +.. code-block:: c
> +
> +   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);VERSION_SYMBOL(rte_acl_create, _v20, 2.0);

There is a duplicate macro here.


> +
> +Remembering to also add the rte_compat.h header to the requisite c file
> +where these changes are being made.  The above macro instructs the
> +linker to create a new symbol rte_acl_create@DPDK_2.0, which matches

Could you enclose the symbol in RST backquotes ``rte_acl_create@DPDK_2.0`` since some of the renderers treat this as an email address! There is another one a few paragraphs down.


> +the symbol created in older builds, but now points to the above newly
> +named function.  We have now mapped the origional rte_acl_create symbol
> +to the origional function 

There are a few minor typos here and there.


> + };
> +
> + DPDK_2.1 {
> +        global:
> +        rte_acl_create;
> + } DPDK_2.0;

Same comment as above on indentation.


John.
-- 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 08/12] mempool: allow config override on element alignment
  2015-06-23 20:43  4%     ` Cyril Chemparathy
@ 2015-06-23 21:21  0%       ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-23 21:21 UTC (permalink / raw)
  To: Cyril Chemparathy; +Cc: dev



> -----Original Message-----
> From: Cyril Chemparathy [mailto:cchemparathy@ezchip.com]
> Sent: Tuesday, June 23, 2015 9:43 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 08/12] mempool: allow config override on element alignment
> 
> On Tue, 23 Jun 2015 00:31:06 +0000
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> 
> > > +#define RTE_MEMPOOL_ALIGN_MASK	(RTE_MEMPOOL_ALIGN - 1)
> >
> > I am probably a bit late with my comments, but why not make it a
> > runtime decision then? I know we can't add a new parameter to
> > mempool_xmem_create() without ABI breakage, but we can make some
> > global variable for now, that could be setup at init time or
> > something similar.
> 
> But then, a global variable that is modified by an application _is_ a
> part of the ABI, and a bad one at that.
> 
> I agree with the desire to make it runtime configurable, but I think we
> should do so in the right spirit, with the appropriate interfaces, and
> when we're open to changing the ABI accordingly.

Ok, let's wait till next release then.
Konstantin

> 
> Thanks
> -- Cyril.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 08/12] mempool: allow config override on element alignment
  2015-06-23  0:31  3%   ` Ananyev, Konstantin
@ 2015-06-23 20:43  4%     ` Cyril Chemparathy
  2015-06-23 21:21  0%       ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Cyril Chemparathy @ 2015-06-23 20:43 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Tue, 23 Jun 2015 00:31:06 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> > +#define RTE_MEMPOOL_ALIGN_MASK	(RTE_MEMPOOL_ALIGN - 1)  
> 
> I am probably a bit late with my comments, but why not make it a
> runtime decision then? I know we can't add a new parameter to
> mempool_xmem_create() without ABI breakage, but we can make some
> global variable for now, that could be setup at init time or
> something similar. 

But then, a global variable that is modified by an application _is_ a
part of the ABI, and a bad one at that.

I agree with the desire to make it runtime configurable, but I think we
should do so in the right spirit, with the appropriate interfaces, and
when we're open to changing the ABI accordingly.

Thanks
-- Cyril.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 2/2] ABI: Add some documentation
  @ 2015-06-23 19:33 14% ` Neil Horman
  2015-06-24 11:21  4%   ` Mcnamara, John
      2 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-06-23 19:33 UTC (permalink / raw)
  To: dev

People have been asking for ways to use the ABI macros, here's some docs to
clarify their use.  Included is:

* An overview of what ABI is
* Details of the ABI deprecation process
* Details of the versioning macros
* Examples of their use
* Details of how to use the ABI validator

Thanks to John Mcnamara, who duplicated much of this effort at Intel while I was
working on it.  Much of the introductory material was gathered and cleaned up by
him

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.mcnamara@intel.com
CC: thomas.monjalon@6wind.com
---
 doc/guides/guidelines/index.rst      |   1 +
 doc/guides/guidelines/versioning.rst | 456 +++++++++++++++++++++++++++++++++++
 2 files changed, 457 insertions(+)
 create mode 100644 doc/guides/guidelines/versioning.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..251062e 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    versioning
diff --git a/doc/guides/guidelines/versioning.rst b/doc/guides/guidelines/versioning.rst
new file mode 100644
index 0000000..79af924
--- /dev/null
+++ b/doc/guides/guidelines/versioning.rst
@@ -0,0 +1,456 @@
+Managing ABI updates
+====================
+
+Description
+-----------
+
+This document details some methods for handling ABI management in the DPDK.
+Note this document is not exhaustive, in that C library versioning is flexible
+allowing multiple methods to achieve various goals, but it will provide the user
+with some introductory methods
+
+General Guidelines
+------------------
+
+#. Whenever possible, ABI should be preserved
+#. The addition of symbols is generally not problematic
+#. The modification of symbols can generally be managed with versioning
+#. The removal of symbols generally is an ABI break and requires bumping of the
+   LIBABIVER macro
+
+What is an ABI
+--------------
+
+An ABI (Application Binary Interface) is the set of runtime interfaces exposed
+by a library. It is similar to an API (Application Programming Interface) but
+is the result of compilation.  It is also effectively cloned when applications
+link to dynamic libraries.  That is to say when an application is compiled to
+link against dynamic libraries, it is assumed that the ABI remains constant
+between the time the application is compiled/linked, and the time that it runs.
+Therefore, in the case of dynamic linking, it is critical that an ABI is
+preserved, or (when modified), done in such a way that the application is unable
+to behave improperly or in an unexpected fashion.
+
+The DPDK ABI policy
+-------------------
+
+ABI versions are set at the time of major release labeling, and the ABI may
+change multiple times, without warning, between the last release label and the
+HEAD label of the git tree.
+
+ABI versions, once released, are available until such time as their
+deprecation has been noted in the Release Notes for at least one major release
+cycle. For example consider the case where the ABI for DPDK 2.0 has been
+shipped and then a decision is made to modify it during the development of
+DPDK 2.1. The decision will be recorded in the Release Notes for the DPDK 2.1
+release and the modification will be made available in the DPDK 2.2 release.
+
+ABI versions may be deprecated in whole or in part as needed by a given
+update.
+
+Some ABI changes may be too significant to reasonably maintain multiple
+versions. In those cases ABIs may be updated without backward compatibility
+being provided. The requirements for doing so are:
+
+#. At least 3 acknowledgments of the need to do so must be made on the
+   dpdk.org mailing list.
+
+#. A full deprecation cycle, as explained above, must be made to offer
+   downstream consumers sufficient warning of the change.
+
+#. The ``LIBABIVER`` variable in the makefile(s) where the ABI changes are
+   incorporated must be incremented in parallel with the ABI changes
+   themselves.
+
+Note that the above process for ABI deprecation should not be undertaken
+lightly. ABI stability is extremely important for downstream consumers of the
+DPDK, especially when distributed in shared object form. Every effort should
+be made to preserve the ABI whenever possible. The ABI should only be changed
+for significant reasons, such as performance enhancements. ABI breakage due to
+changes such as reorganizing public structure fields for aesthetic or
+readability purposes should be avoided.
+
+Examples of Deprecation Notices
+-------------------------------
+
+The following are some examples of ABI deprecation notices which would be
+added to the Release Notes:
+
+* The Macro ``#RTE_FOO`` is deprecated and will be removed with version 2.0,
+  to be replaced with the inline function ``rte_foo()``.
+
+* The function ``rte_mbuf_grok()`` has been updated to include a new parameter
+  in version 2.0. Backwards compatibility will be maintained for this function
+  until the release of version 2.1.
+
+* The members of ``struct rte_foo`` have been reorganized in release 2.0 for
+  performance reasons. Existing binary applications will have backwards
+  compatibility in release 2.0, while newly built binaries will need to
+  reference the new structure variant ``struct rte_foo2``. Compatibility will
+  be removed in release 2.2, and all applications will require updating and
+  rebuilding to the new structure at that time, which will be renamed to the
+  original ``struct rte_foo``.
+
+* Significant ABI changes are planned for the ``librte_dostuff`` library. The
+  upcoming release 2.0 will not contain these changes, but release 2.1 will,
+  and no backwards compatibility is planned due to the extensive nature of
+  these changes. Binaries using this library built prior to version 2.1 will
+  require updating and recompilation.
+
+Versioning Macros
+-----------------
+
+When a symbol is exported from a library to provide an API, it also provides a
+calling convention (ABI) that is embodied in its name, return type and
+arguments. Occasionally that function may need to change to accommodate new
+functionality or behavior. When that occurs, it is desirable to allow for
+backward compatibility for a time with older binaries that are dynamically
+linked to the DPDK.
+
+To support backward compatibility the ``lib/librte_compat/rte_compat.h``
+header file provides macros to use when updating exported functions. These
+macros are used in conjunction with the ``rte_<library>_version.map`` file for
+a given library to allow multiple versions of a symbol to exist in a shared
+library so that older binaries need not be immediately recompiled.
+
+The macros exported are:
+
+* ``VERSION_SYMBOL(b, e, n)``: Creates a symbol version table entry binding
+  versioned symbol ``b@DPDK_n`` to the internal function ``b_e``.
+
+* ``BASE_SYMBOL(b, e)``: Creates a symbol version table entry binding
+  unversioned symbol ``b`` to the internal function ``b_e``.
+
+* ``BIND_DEFAULT_SYMBOL(b, e, n)``: Creates a symbol version entry instructing
+  the linker to bind references to symbol ``b`` to the internal symbol
+  ``b_e``.
+
+
+Examples of ABI Macro use
+-------------------------
+
+Updating a public API
+~~~~~~~~~~~~~~~~~~~~~
+
+Assume we have a function as follows:
+
+.. code-block:: c
+
+ /*
+  * Create an acl context object for apps to 
+  * manipulate
+  */
+ struct rte_acl_ctx *
+ rte_acl_create(const struct rte_acl_param *param)
+ {
+        ...
+ }
+
+
+Assume that struct rte_acl_ctx is a private structure, and that a developer
+wishes to enhance the acl api so that a debugging flag can be enabled on a
+per-context basis.  This requires an addition to the structure (which, being
+private, is safe), but it also requires modifying the code as follows:
+
+.. code-block:: c
+
+ /*
+  * Create an acl context object for apps to 
+  * manipulate
+  */
+ struct rte_acl_ctx *
+ rte_acl_create(const struct rte_acl_param *param, int debug)
+ {
+        ...
+ }
+
+
+Note also that, being a public function, the header file prototype must also be
+changed, as must all the call sites, to reflect the new ABI footprint.  We will
+maintain previous ABI versions that are accessible only to previously compiled
+binaries.
+
+The addition of a parameter to the function is ABI breaking as the function is
+public, and existing applications may use it in its current form.  However, the
+compatibility macros in DPDK allow a developer to use symbol versioning so that
+multiple functions can be mapped to the same public symbol based on when an
+application was linked to it.  To see how this is done, we start with the
+requisite library's version map file.  Initially the version map file for the
+acl library looks like this:
+
+.. code-block:: none 
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+ };
+
+This file needs to be modified as follows:
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_create;
+        rte_acl_dump;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+ };
+
+ DPDK_2.1 {
+        global:
+        rte_acl_create;
+
+ } DPDK_2.0;
+
+The addition of the new block tells the linker that a new version node is
+available (DPDK_2.1), which contains the symbol rte_acl_create, and inherits the
+symbols from the DPDK_2.0 node.  This list is directly translated into a list of
+exported symbols when DPDK is compiled as a shared library.
+
+Next, we need to specify in the code which functions map to the rte_acl_create
+symbol at which versions.  First, at the site of the initial symbol definition,
+we need to update the function so that it is uniquely named, and not in conflict
+with the public symbol name:
+
+.. code-block:: c
+
+  struct rte_acl_ctx *
+ -rte_acl_create(const struct rte_acl_param *param)
+ +rte_acl_create_v20(const struct rte_acl_param *param)
+ {
+        size_t sz;
+        struct rte_acl_ctx *ctx;
+        ...
+
+Note that the base name of the symbol was kept intact, as this is conducive to
+the macros used for versioning symbols.  That is our next step: mapping this new
+symbol name to the initial symbol name at version node 2.0.  Immediately after
+the function, we add this line of code:
+
+.. code-block:: c
+
+   VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
+
+Remember to also add the rte_compat.h header to the requisite C file where
+these changes are being made.  The above macro instructs the linker to create a
+new symbol rte_acl_create@DPDK_2.0, which matches the symbol created in older
+builds, but now points to the above newly named function.  We have now mapped
+the original rte_acl_create symbol to the original function (but with a new
+name).
+
+Next, we need to create the 2.1 version of the symbol.  We create a new function
+name, with a different suffix, and implement it appropriately:
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   rte_acl_create_v21(const struct rte_acl_param *param, int debug)
+   {
+        struct rte_acl_ctx *ctx = rte_acl_create_v20(param);
+
+        ctx->debug = debug;
+
+        return ctx;
+   }
+
+This code serves as our new API call.  It is the same as our old call, but adds
+the new parameter in place.  Next we need to map this function to the symbol
+rte_acl_create@DPDK_2.1.  To do this, we modify the public prototype of the call
+in the header file, adding the macro there to inform all including applications
+that, on re-link, the default rte_acl_create symbol should point to this
+function.  Note that we could do this by simply naming the function above
+rte_acl_create, and the linker would choose the most recent version tag to apply
+in the version script, but we can also do this in the header file:
+
+.. code-block:: c
+
+   struct rte_acl_ctx *
+   -rte_acl_create(const struct rte_acl_param *param);
+   +rte_acl_create(const struct rte_acl_param *param, int debug);
+   +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
+
+The BIND_DEFAULT_SYMBOL macro explicitly tells applications that include this
+header to link to the rte_acl_create_v21 function and apply the DPDK_2.1
+version node to it.  This method is more explicit and flexible than just
+reimplementing the exact symbol name, and allows for other features (such as
+linking to the old symbol version by default, when the new ABI is to be opt-in
+for a period).
+
+That's it: on the next shared library rebuild, there will be two versions of
+rte_acl_create, an old DPDK_2.0 version, used by previously built applications,
+and a new DPDK_2.1 version, used by newly built applications.
+
+
+Deprecating part of a public API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let's assume that you've done the above update, and after a few releases have
+passed you decide you would like to retire the old version of the function.
+After having gone through the ABI deprecation announcement process, removal is
+easy.  Start by removing the symbol from the requisite version map file:
+
+.. code-block:: none
+
+   DPDK_2.0 {
+        global:
+
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_dump;
+ -      rte_acl_create;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+
+        local: *;
+ };
+
+ DPDK_2.1 {
+        global:
+        rte_acl_create;
+ } DPDK_2.0;
+
+
+Next remove the corresponding versioned export:
+
+.. code-block:: c
+
+ -VERSION_SYMBOL(rte_acl_create, _v20, 2.0);
+
+
+Note that the internal function definition could also be removed, but it's used
+in our example by the newer version _v21, so we leave it in place.  This is a
+coding style choice.
+
+Lastly, we need to bump the LIBABIVER number for this library in the Makefile to
+indicate to applications doing dynamic linking that this is a later, and
+possibly incompatible library version:
+
+.. code-block:: c
+
+   -LIBABIVER := 1
+   +LIBABIVER := 2
+
+Deprecating an entire ABI version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While removing a symbol from an ABI may be useful, it is often more practical
+to remove an entire version node at once.  If a version node completely
+specifies an API, then removing part of it typically makes it incomplete.  In
+those cases it is better to remove the entire node.
+
+To do this, start by modifying the version map file, such that all symbols from
+the node to be removed are merged into the next node in the map.
+
+In the case of our map above, it would transform to look as follows:
+
+.. code-block:: none
+
+   DPDK_2.1 {              
+        global:
+              
+        rte_acl_add_rules;
+        rte_acl_build;
+        rte_acl_classify;
+        rte_acl_classify_alg;
+        rte_acl_classify_scalar;
+        rte_acl_dump;
+        rte_acl_create;
+        rte_acl_find_existing;
+        rte_acl_free;
+        rte_acl_ipv4vlan_add_rules;
+        rte_acl_ipv4vlan_build;
+        rte_acl_list_dump;
+        rte_acl_reset;
+        rte_acl_reset_rules;
+        rte_acl_set_ctx_classify;
+              
+        local: *;
+ };           
+
+Then any uses of BIND_DEFAULT_SYMBOL that pointed to the old node should be
+updated to point to the new version node in any header files for all affected
+symbols:
+
+.. code-block:: c
+
+ -BIND_DEFAULT_SYMBOL(rte_acl_create, _v20, 2.0);
+ +BIND_DEFAULT_SYMBOL(rte_acl_create, _v21, 2.1);
+
+Lastly, any VERSION_SYMBOL macros that point to the old version node should be
+removed, taking care to keep, where needed, old code in place to support newer
+versions of the symbol.
+
+Running the ABI Validator
+-------------------------
+
+The ``scripts`` directory in the DPDK source tree contains a utility program,
+``validate-abi.sh``, for validating the DPDK ABI based on the Linux `ABI
+Compliance Checker
+<http://ispras.linuxbase.org/index.php/ABI_compliance_checker>`_.
+
+This has a dependency on the ``abi-compliance-checker`` and ``abi-dumper``
+utilities which can be installed via a package manager. For example::
+
+   sudo yum install abi-compliance-checker
+   sudo yum install abi-dumper
+
+The syntax of the ``validate-abi.sh`` utility is::
+
+   ./scripts/validate-abi.sh <TAG1> <TAG2> <TARGET>
+
+Where ``TAG1`` and ``TAG2`` are valid git tags on the local repo and ``TARGET``
+is the usual DPDK compilation target.
+
+For example to test the current committed HEAD against a previous release tag
+we could add a temporary tag and run the utility as follows::
+
+   git tag MY_TEMP_TAG
+   ./scripts/validate-abi.sh v2.0.0 MY_TEMP_TAG x86_64-native-linuxapp-gcc
+
+After the validation script completes (it can take a while since it needs to
+compile both tags) it will create compatibility reports in the
+``compat_reports`` directory. Listed incompatibilities can be found as
+follows::
+
+  grep -lr Incompatible compat_reports/
-- 
2.1.0

^ permalink raw reply	[relevance 14%]

* Re: [dpdk-dev] [PATCH v8 00/18] unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (17 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 18/18] mbuf: remove old packet type bit masks Helin Zhang
@ 2015-06-23 16:13  0%       ` Ananyev, Konstantin
  18 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-23 16:13 UTC (permalink / raw)
  To: Zhang, Helin, dev



> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, June 23, 2015 2:50 AM
> To: dev@dpdk.org
> Cc: Cao, Waterman; Liang, Cunming; Liu, Jijiang; Ananyev, Konstantin; Richardson, Bruce; yongwang@vmware.com;
> olivier.matz@6wind.com; Wu, Jingjing; Zhang, Helin
> Subject: [PATCH v8 00/18] unified packet type
> 
> Currently only 6 bits which are stored in ol_flags are used to indicate the
> packet types. This is not enough, as some NIC hardware can recognize quite
> a lot of packet types, e.g i40e hardware can recognize more than 150 packet
> types. Hiding those packet types hides hardware offload capabilities which
> could be quite useful for improving performance and for end users.
> So a unified packet type is needed to support all possible PMDs. The 16-bit
> packet_type in the mbuf structure can be changed to 32 bits and used for
> this purpose. In addition, all packet types stored in the ol_flags field
> should be deleted entirely, and 6 bits of ol_flags can be saved as a benefit.
> 
> Initially, 32 bits of packet_type can be divided into several sub fields to
> indicate different packet type information of a packet. The initial design
> is to divide those bits into fields for L2 types, L3 types, L4 types, tunnel
> types, inner L2 types, inner L3 types and inner L4 types. All PMDs should
> translate the offloaded packet types into these 7 fields of information, for
> user applications.
> 
> To avoid breaking ABI compatibility, currently all the code changes for
> unified packet type are disabled at compile time by default. Users can enable
> it manually by defining the macro of RTE_NEXT_ABI. The code changes will be
> valid by default in a future release, and the old version will be deleted
> accordingly, after the ABI change process is done.
> 
> Note that this patch set should be integrated after another patch set for
> '[PATCH v3 0/7] support i40e QinQ stripping and insertion', to clearly solve
> the conflict during integration. As both patch sets modified 'struct rte_mbuf',
> and the final layout of the 'struct rte_mbuf' is key to vectorized ixgbe PMD.
> 
> v2 changes:
> * Enlarged the packet_type field from 16 bits to 32 bits.
> * Redefined the packet type sub-fields.
> * Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
> * Used redefined packet types and enlarged packet_type field for all PMDs
>   and corresponding applications.
> * Removed changes in bond and its relevant application, as there is no need
>   at all according to the recent bond changes.
> 
> v3 changes:
> * Put the mbuf layout changes into a single patch.
> * Put vector ixgbe changes right after mbuf changes.
> * Disabled vector ixgbe PMD by default, as mbuf layout changed, and then
>   re-enabled it after vector ixgbe PMD updated.
> * Put the definitions of unified packet type into a single patch.
> * Minor bug fixes and enhancements in l3fwd example.
> 
> v4 changes:
> * Added detailed description of each packet types.
> * Supported unified packet type of fm10k.
> * Added printing logs of packet types of each received packet for rxonly
>   mode in testpmd.
> * Removed several useless code lines which block packet type unification from
>   app/test/packet_burst_generator.c.
> 
> v5 changes:
> * Added more detailed description for each packet types, together with examples.
> * Rolled back the macro definitions of RX packet flags, for ABI compatibility.
> 
> v6 changes:
> * Disabled the code changes for unified packet type by default, to
>   avoid breaking ABI compatibility.
> 
> v7 changes:
> * Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.
> * Integrated with patch set for '[PATCH v3 0/7] support i40e QinQ stripping
>   and insertion', to clearly solve the conflicts during merging.
> 
> v8 changes:
> * Moved the field of 'vlan_tci_outer' in 'struct rte_mbuf' to the end of the 1st
>   cache line, to avoid breaking any vectorized PMD storing, as fields of
>   'packet_type, pkt_len, data_len, vlan_tci, rss' should be in a contiguous 128
>   bits.
> 
> Helin Zhang (18):
>   mbuf: redefine packet_type in rte_mbuf
>   ixgbe: support unified packet type in vectorized PMD
>   mbuf: add definitions of unified packet types
>   e1000: replace bit mask based packet type with unified packet type
>   ixgbe: replace bit mask based packet type with unified packet type
>   i40e: replace bit mask based packet type with unified packet type
>   enic: replace bit mask based packet type with unified packet type
>   vmxnet3: replace bit mask based packet type with unified packet type
>   fm10k: replace bit mask based packet type with unified packet type
>   app/test-pipeline: replace bit mask based packet type with unified
>     packet type
>   app/testpmd: replace bit mask based packet type with unified packet
>     type
>   app/test: Remove useless code
>   examples/ip_fragmentation: replace bit mask based packet type with
>     unified packet type
>   examples/ip_reassembly: replace bit mask based packet type with
>     unified packet type
>   examples/l3fwd-acl: replace bit mask based packet type with unified
>     packet type
>   examples/l3fwd-power: replace bit mask based packet type with unified
>     packet type
>   examples/l3fwd: replace bit mask based packet type with unified packet
>     type
>   mbuf: remove old packet type bit masks
> 
>  app/test-pipeline/pipeline_hash.c                  |  13 +
>  app/test-pmd/csumonly.c                            |  14 +
>  app/test-pmd/rxonly.c                              | 183 +++++++
>  app/test/packet_burst_generator.c                  |   6 +-
>  drivers/net/e1000/igb_rxtx.c                       | 102 ++++
>  drivers/net/enic/enic_main.c                       |  26 +
>  drivers/net/fm10k/fm10k_rxtx.c                     |  27 ++
>  drivers/net/i40e/i40e_rxtx.c                       | 528 +++++++++++++++++++++
>  drivers/net/ixgbe/ixgbe_rxtx.c                     | 163 +++++++
>  drivers/net/ixgbe/ixgbe_rxtx_vec.c                 |  75 ++-
>  drivers/net/vmxnet3/vmxnet3_rxtx.c                 |   8 +
>  examples/ip_fragmentation/main.c                   |   9 +
>  examples/ip_reassembly/main.c                      |   9 +
>  examples/l3fwd-acl/main.c                          |  29 +-
>  examples/l3fwd-power/main.c                        |   8 +
>  examples/l3fwd/main.c                              | 123 ++++-
>  .../linuxapp/eal/include/exec-env/rte_kni_common.h |   6 +
>  lib/librte_mbuf/rte_mbuf.c                         |   4 +
>  lib/librte_mbuf/rte_mbuf.h                         | 517 ++++++++++++++++++++
>  19 files changed, 1837 insertions(+), 13 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.9.3

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port stats and config option
  2015-06-23 14:54  0%       ` Thomas Monjalon
@ 2015-06-23 15:21  0%         ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-23 15:21 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Tuesday, June 23, 2015 3:55 PM
> To: Dumitrescu, Cristian
> Cc: Gajdzica, MaciejX T; dev@dpdk.org; nhorman@tuxdriver.com
> Subject: Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port
> stats and config option
> 
> 2015-06-23 14:30, Dumitrescu, Cristian:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > > 2015-06-19 11:41, Maciej Gajdzica:
> > > >  /** Input port interface defining the input port operation */
> > > >  struct rte_port_in_ops {
> > > >  	rte_port_in_op_create f_create; /**< Create */
> > > >  	rte_port_in_op_free f_free;     /**< Free */
> > > >  	rte_port_in_op_rx f_rx;         /**< Packet RX (packet burst) */
> > > > +	rte_port_in_op_stats_read f_stats;	/**< Stats */
> > > >  };
> > >
> > > Isn't it breaking an ABI?
> >
> > This is simply adding a field at the end of the API structure. This structure is
> instantiated per each port type  and its role is very similar to a driver ops
> structure, for example:
> >
> > 	in file "rte_port_ethdev.h": extern struct rte_port_out_ops
> rte_port_ethdev_writer_ops;
> > 	in file "rte_port_ring.h": extern struct rte_port_out_ops
> rte_port_ring_writer_nodrop_ops;
> >
> > Typically, instances of these structures are only referenced through
> pointers by application code (and other libraries, like librte_pipeline), so code
> that is not aware of this last field in the structure will still continue to work.
> >
> > The only case I see possible when code will break is if somebody would
> create an array of such structures, but I think this is not a realistic scenario.
> Instances of this structure are infrequent: once per port type in librte_port,
> and new instances are only created when the user wants to create new port
> type. Basically, instances of this structure are created in isolation and not in
> bulk (arrays).
> 
> Why wouldn't it be a problem even for single instance?
> If the application allocates one with old sizeof and the lib is trying to write
> in the new field, there can be a problem, no?
> 

The only case when the application is required to create a new instance of this structure is when the application is defining a new port type (unlikely). In this case, the application using the old structure layout is not aware about the statistics functionality, so it will not invoke it, so the library will not attempt to read the f_stats structure field. Since this field is immediately after the old structure layout, the other fields in the structure are not disturbed, so the application works just fine.

The only case when the application using the old structure layout is impacted is when the position of the old structure fields changes, and this can only happen when an array of such structures is created. To my earlier point, this is not realistic, as instances of this structure are created in isolation (single instance, not array of instances) and are accessed through pointers.

> Maybe Neil has an opinion?
> 
> > Due to this, I do not see this as breaking the API. I think this is the most it
> could be done to minimize the effect on the ABI will still adding new
> functionality. Please let me know what you think.
> >
> > >
> > > >  struct rte_port_out_ops {
> > > > -	rte_port_out_op_create f_create;   /**< Create */
> > > > -	rte_port_out_op_free f_free;       /**< Free */
> > > > -	rte_port_out_op_tx f_tx;           /**< Packet TX (single packet) */
> > > > -	rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst)
> > > */
> > > > -	rte_port_out_op_flush f_flush;     /**< Flush */
> > > > +	rte_port_out_op_create f_create;		/**< Create */
> > > > +	rte_port_out_op_free f_free;			/**< Free */
> > > > +	rte_port_out_op_tx f_tx;				/**< Packet
> > > TX (single packet) */
> > > > +	rte_port_out_op_tx_bulk f_tx_bulk;		/**< Packet TX
> > > (packet burst) */
> > > > +	rte_port_out_op_flush f_flush;			/**< Flush */
> > >
> > > What is the goal of this change? Breaking the alignment?
> >
> > Shall we submit a new patch revision to fix the alignment of the
> comments?
> 
> Yes using spaces for alignment would be better.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 0/4] User-space Ethtool
  2015-06-18 12:47  0%     ` Wang, Liang-min
@ 2015-06-23 15:19  0%       ` Wang, Liang-min
  0 siblings, 0 replies; 200+ results
From: Wang, Liang-min @ 2015-06-23 15:19 UTC (permalink / raw)
  To: 'Stephen Hemminger'; +Cc: 'dev@dpdk.org'

Stephen,

From: Wang, Liang-min 
Sent: Thursday, June 18, 2015 8:47 AM
To: Stephen Hemminger
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] [PATCH v7 0/4] User-space Ethtool

>>I agree with having a more complete API, but have some nits to pick.
>>Could the API be more abstract to reduce ABI issues in future?

>Which API? Are you referring to the APIs over ethdev level, or something else?
>More abstract on input/output data structure definition or else? Could you be more specific?

>>I know choosing names is hard, but as a Linux developer ethtool has a very specific meaning to me.
>>This API encompasses things broader than Linux ethtool and has different semantics therefore
>>not sure having something in DPDK with same name is really a good idea.
>>
>>It would be better to call it something else like netdev_?? Or dpnet_??

>Just to clarify the naming suggestion, in this patch, the prefix “ethtool” only appears on example and on this patch description.
>Are you suggesting changing the name over example/l2fwd-ethtool or on this patch description, or may be both?

Have not heard your feedback on last request?



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port stats and config option
  2015-06-23 13:55  3%   ` Thomas Monjalon
  2015-06-23 14:30  3%     ` Dumitrescu, Cristian
@ 2015-06-23 15:16  3%     ` Neil Horman
  1 sibling, 0 replies; 200+ results
From: Neil Horman @ 2015-06-23 15:16 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Tue, Jun 23, 2015 at 03:55:25PM +0200, Thomas Monjalon wrote:
> 2015-06-19 11:41, Maciej Gajdzica:
> >  /** Input port interface defining the input port operation */
> >  struct rte_port_in_ops {
> >  	rte_port_in_op_create f_create; /**< Create */
> >  	rte_port_in_op_free f_free;     /**< Free */
> >  	rte_port_in_op_rx f_rx;         /**< Packet RX (packet burst) */
> > +	rte_port_in_op_stats_read f_stats;	/**< Stats */
> >  };
> 
> Isn't it breaking an ABI?
> 
This is an interesting question.  Strictly speaking, yes it breaks ABI because
we're adding space, and if older applications statically allocate this
structure, it will be smaller than the port library expects, potentially
scribbling over someone elses memory.  That said, I'm not sure this structure is
meant to be accessed outside of the library.  If it isn't, then we can feel
comfortable that no one is going to access the data structure from code that was
compiled out of sync with the defining library.

The implication, if that's true, of course is that we should make this structure
opaque outside of the library with a structure prototype and move its definition
into the library private namespace, but I'm fine with doing that at a later date
if the intention is not to have applications touch this structure.

Regards
Neil

> >  struct rte_port_out_ops {
> > -	rte_port_out_op_create f_create;   /**< Create */
> > -	rte_port_out_op_free f_free;       /**< Free */
> > -	rte_port_out_op_tx f_tx;           /**< Packet TX (single packet) */
> > -	rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst) */
> > -	rte_port_out_op_flush f_flush;     /**< Flush */
> > +	rte_port_out_op_create f_create;		/**< Create */
> > +	rte_port_out_op_free f_free;			/**< Free */
> > +	rte_port_out_op_tx f_tx;				/**< Packet TX (single packet) */
> > +	rte_port_out_op_tx_bulk f_tx_bulk;		/**< Packet TX (packet burst) */
> > +	rte_port_out_op_flush f_flush;			/**< Flush */
> 
> What is the goal of this change? Breaking the alignment?
> 
> > +	rte_port_out_op_stats_read f_stats;     /**< Stats */
> 
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port stats and config option
  2015-06-23 14:30  3%     ` Dumitrescu, Cristian
@ 2015-06-23 14:54  0%       ` Thomas Monjalon
  2015-06-23 15:21  0%         ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-23 14:54 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev

2015-06-23 14:30, Dumitrescu, Cristian:
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > 2015-06-19 11:41, Maciej Gajdzica:
> > >  /** Input port interface defining the input port operation */
> > >  struct rte_port_in_ops {
> > >  	rte_port_in_op_create f_create; /**< Create */
> > >  	rte_port_in_op_free f_free;     /**< Free */
> > >  	rte_port_in_op_rx f_rx;         /**< Packet RX (packet burst) */
> > > +	rte_port_in_op_stats_read f_stats;	/**< Stats */
> > >  };
> > 
> > Isn't it breaking an ABI?
> 
> This is simply adding a field at the end of the structure. This structure is instantiated once per port type, and its role is very similar to a driver ops structure, for example:
> 
> 	in file "rte_port_ethdev.h": extern struct rte_port_out_ops rte_port_ethdev_writer_ops;
> 	in file "rte_port_ring.h": extern struct rte_port_out_ops rte_port_ring_writer_nodrop_ops;
> 
> Typically, instances of these structures are only referenced through pointers by application code (and other libraries, like librte_pipeline), so code that is not aware of this last field in the structure will still continue to work.
> 
> The only case I can see where code would break is if somebody created an array of such structures, but I think this is not a realistic scenario. Instances of this structure are infrequent: one per port type in librte_port, and new instances are only created when the user wants to add a new port type. Basically, instances of this structure are created in isolation and not in bulk (arrays).

Why wouldn't it be a problem even for a single instance?
If the application allocates one with old sizeof and the lib is trying to write
in the new field, there can be a problem, no?

Maybe Neil has an opinion?

> Due to this, I do not see this as breaking the ABI. I think this is the most that can be done to minimize the effect on the ABI while still adding new functionality. Please let me know what you think.
> 
> > 
> > >  struct rte_port_out_ops {
> > > -	rte_port_out_op_create f_create;   /**< Create */
> > > -	rte_port_out_op_free f_free;       /**< Free */
> > > -	rte_port_out_op_tx f_tx;           /**< Packet TX (single packet) */
> > > -	rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst) */
> > > -	rte_port_out_op_flush f_flush;     /**< Flush */
> > > +	rte_port_out_op_create f_create;		/**< Create */
> > > +	rte_port_out_op_free f_free;			/**< Free */
> > > +	rte_port_out_op_tx f_tx;				/**< Packet TX (single packet) */
> > > +	rte_port_out_op_tx_bulk f_tx_bulk;		/**< Packet TX (packet burst) */
> > > +	rte_port_out_op_flush f_flush;			/**< Flush */
> > 
> > What is the goal of this change? Breaking the alignment?
> 
> Shall we submit a new patch revision to fix the alignment of the comments?

Yes using spaces for alignment would be better.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port stats and config option
  2015-06-23 13:55  3%   ` Thomas Monjalon
@ 2015-06-23 14:30  3%     ` Dumitrescu, Cristian
  2015-06-23 14:54  0%       ` Thomas Monjalon
  2015-06-23 15:16  3%     ` Neil Horman
  1 sibling, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2015-06-23 14:30 UTC (permalink / raw)
  To: Thomas Monjalon, Gajdzica, MaciejX T; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, June 23, 2015 2:55 PM
> To: Gajdzica, MaciejX T
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port
> stats and config option
> 
> 2015-06-19 11:41, Maciej Gajdzica:
> >  /** Input port interface defining the input port operation */
> >  struct rte_port_in_ops {
> >  	rte_port_in_op_create f_create; /**< Create */
> >  	rte_port_in_op_free f_free;     /**< Free */
> >  	rte_port_in_op_rx f_rx;         /**< Packet RX (packet burst) */
> > +	rte_port_in_op_stats_read f_stats;	/**< Stats */
> >  };
> 
> Isn't it breaking an ABI?

This is simply adding a field at the end of the structure. This structure is instantiated once per port type, and its role is very similar to a driver ops structure, for example:

	in file "rte_port_ethdev.h": extern struct rte_port_out_ops rte_port_ethdev_writer_ops;
	in file "rte_port_ring.h": extern struct rte_port_out_ops rte_port_ring_writer_nodrop_ops;

Typically, instances of these structures are only referenced through pointers by application code (and other libraries, like librte_pipeline), so code that is not aware of this last field in the structure will still continue to work.

The only case I can see where code would break is if somebody created an array of such structures, but I think this is not a realistic scenario. Instances of this structure are infrequent: one per port type in librte_port, and new instances are only created when the user wants to add a new port type. Basically, instances of this structure are created in isolation and not in bulk (arrays).

Due to this, I do not see this as breaking the ABI. I think this is the most that can be done to minimize the effect on the ABI while still adding new functionality. Please let me know what you think.

> 
> >  struct rte_port_out_ops {
> > -	rte_port_out_op_create f_create;   /**< Create */
> > -	rte_port_out_op_free f_free;       /**< Free */
> > -	rte_port_out_op_tx f_tx;           /**< Packet TX (single packet) */
> > -	rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst) */
> > -	rte_port_out_op_flush f_flush;     /**< Flush */
> > +	rte_port_out_op_create f_create;		/**< Create */
> > +	rte_port_out_op_free f_free;			/**< Free */
> > +	rte_port_out_op_tx f_tx;				/**< Packet TX (single packet) */
> > +	rte_port_out_op_tx_bulk f_tx_bulk;		/**< Packet TX (packet burst) */
> > +	rte_port_out_op_flush f_flush;			/**< Flush */
> 
> What is the goal of this change? Breaking the alignment?

Shall we submit a new patch revision to fix the alignment of the comments?

> 
> > +	rte_port_out_op_stats_read f_stats;     /**< Stats */

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v5 01/13] port: added structures for port stats and config option
  @ 2015-06-23 13:55  3%   ` Thomas Monjalon
  2015-06-23 14:30  3%     ` Dumitrescu, Cristian
  2015-06-23 15:16  3%     ` Neil Horman
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2015-06-23 13:55 UTC (permalink / raw)
  To: Maciej Gajdzica; +Cc: dev

2015-06-19 11:41, Maciej Gajdzica:
>  /** Input port interface defining the input port operation */
>  struct rte_port_in_ops {
>  	rte_port_in_op_create f_create; /**< Create */
>  	rte_port_in_op_free f_free;     /**< Free */
>  	rte_port_in_op_rx f_rx;         /**< Packet RX (packet burst) */
> +	rte_port_in_op_stats_read f_stats;	/**< Stats */
>  };

Isn't it breaking an ABI?

>  struct rte_port_out_ops {
> -	rte_port_out_op_create f_create;   /**< Create */
> -	rte_port_out_op_free f_free;       /**< Free */
> -	rte_port_out_op_tx f_tx;           /**< Packet TX (single packet) */
> -	rte_port_out_op_tx_bulk f_tx_bulk; /**< Packet TX (packet burst) */
> -	rte_port_out_op_flush f_flush;     /**< Flush */
> +	rte_port_out_op_create f_create;		/**< Create */
> +	rte_port_out_op_free f_free;			/**< Free */
> +	rte_port_out_op_tx f_tx;				/**< Packet TX (single packet) */
> +	rte_port_out_op_tx_bulk f_tx_bulk;		/**< Packet TX (packet burst) */
> +	rte_port_out_op_flush f_flush;			/**< Flush */

What is the goal of this change? Breaking the alignment?

> +	rte_port_out_op_stats_read f_stats;     /**< Stats */

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 18/18] mbuf: remove old packet type bit masks
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (16 preceding siblings ...)
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 17/18] examples/l3fwd: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23 16:13  0%       ` [dpdk-dev] [PATCH v8 00/18] unified packet type Ananyev, Konstantin
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

As unified packet types are used instead, those old bit masks and
the relevant macros for packet type indication need to be removed.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 4 ++++
 lib/librte_mbuf/rte_mbuf.h | 4 ++++
 2 files changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.
* Redefined the bit masks for packet RX offload flags.

v5 changes:
* Rolled back the bit masks of RX flags, for ABI compatibility.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index f506517..4320dd4 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -251,14 +251,18 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
 	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
 	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+#ifndef RTE_NEXT_ABI
 	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
 	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
 	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
 	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+#endif /* RTE_NEXT_ABI */
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+#ifndef RTE_NEXT_ABI
 	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
 	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+#endif /* RTE_NEXT_ABI */
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 0ee0c55..74a7f41 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,14 +91,18 @@ extern "C" {
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR     (0ULL << 0)  /**< Hardware processing error. */
 #define PKT_RX_MAC_ERR       (0ULL << 0)  /**< MAC error. */
+#ifndef RTE_NEXT_ABI
 #define PKT_RX_IPV4_HDR      (1ULL << 5)  /**< RX packet with IPv4 header. */
 #define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with extended IPv4 header. */
 #define PKT_RX_IPV6_HDR      (1ULL << 7)  /**< RX packet with IPv6 header. */
 #define PKT_RX_IPV6_HDR_EXT  (1ULL << 8)  /**< RX packet with extended IPv6 header. */
+#endif /* RTE_NEXT_ABI */
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
+#ifndef RTE_NEXT_ABI
 #define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 header.*/
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
+#endif /* RTE_NEXT_ABI */
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
 #define PKT_RX_QINQ_PKT      (1ULL << 15)  /**< RX packet with double VLAN stripped. */
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 17/18] examples/l3fwd: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (15 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 16/18] examples/l3fwd-power: " Helin Zhang
@ 2015-06-23  1:50  3%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 18/18] mbuf: remove old packet type bit masks Helin Zhang
  2015-06-23 16:13  0%       ` [dpdk-dev] [PATCH v8 00/18] unified packet type Ananyev, Konstantin
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd/main.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 120 insertions(+), 3 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v3 changes:
* Minor bug fixes and enhancements.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 7e4bbfd..eff9580 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -948,7 +948,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		/* Handle IPv4 headers.*/
 		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char *) +
 				sizeof(struct ether_hdr));
@@ -979,8 +983,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
-
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 	} else {
+#endif
 		/* Handle IPv6 headers.*/
 		struct ipv6_hdr *ipv6_hdr;
 
@@ -999,8 +1006,13 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
+#ifdef RTE_NEXT_ABI
+	} else
+		/* Free the mbuf that contains non-IPV4/IPV6 packet */
+		rte_pktmbuf_free(m);
+#else
 	}
-
+#endif
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1024,12 +1036,19 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
  * to BAD_PORT value.
  */
 static inline __attribute__((always_inline)) void
+#ifdef RTE_NEXT_ABI
+rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
+#else
 rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t flags)
+#endif
 {
 	uint8_t ihl;
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(ptype)) {
+#else
 	if ((flags & PKT_RX_IPV4_HDR) != 0) {
-
+#endif
 		ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL;
 
 		ipv4_hdr->time_to_live--;
@@ -1059,11 +1078,19 @@ get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	if (pkt->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		if (rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
 				&next_hop) != 0)
 			next_hop = portid;
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (pkt->ol_flags & PKT_RX_IPV6_HDR) {
+#endif
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 		if (rte_lpm6_lookup(qconf->ipv6_lookup_struct,
@@ -1097,12 +1124,52 @@ process_packet(struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	ve = val_eth[dp];
 
 	dst_port[0] = dp;
+#ifdef RTE_NEXT_ABI
+	rfc1812_process(ipv4_hdr, dst_port, pkt->packet_type);
+#else
 	rfc1812_process(ipv4_hdr, dst_port, pkt->ol_flags);
+#endif
 
 	te =  _mm_blend_epi16(te, ve, MASK_ETH);
 	_mm_store_si128((__m128i *)eth_hdr, te);
 }
 
+#ifdef RTE_NEXT_ABI
+/*
+ * Read packet_type and destination IPV4 addresses from 4 mbufs.
+ */
+static inline void
+processx4_step1(struct rte_mbuf *pkt[FWDSTEP],
+		__m128i *dip,
+		uint32_t *ipv4_flag)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ether_hdr *eth_hdr;
+	uint32_t x0, x1, x2, x3;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[0], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x0 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] = pkt[0]->packet_type & RTE_PTYPE_L3_IPV4;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[1], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x1 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[1]->packet_type;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[2], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x2 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[2]->packet_type;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[3], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x3 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[3]->packet_type;
+
+	dip[0] = _mm_set_epi32(x3, x2, x1, x0);
+}
+#else /* RTE_NEXT_ABI */
 /*
  * Read ol_flags and destination IPV4 addresses from 4 mbufs.
  */
@@ -1135,14 +1202,24 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
 
 	dip[0] = _mm_set_epi32(x3, x2, x1, x0);
 }
+#endif /* RTE_NEXT_ABI */
 
 /*
  * Lookup into LPM for destination port.
  * If lookup fails, use incoming port (portid) as destination port.
  */
 static inline void
+#ifdef RTE_NEXT_ABI
+processx4_step2(const struct lcore_conf *qconf,
+		__m128i dip,
+		uint32_t ipv4_flag,
+		uint8_t portid,
+		struct rte_mbuf *pkt[FWDSTEP],
+		uint16_t dprt[FWDSTEP])
+#else
 processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
 	uint8_t portid, struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP])
+#endif /* RTE_NEXT_ABI */
 {
 	rte_xmm_t dst;
 	const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
@@ -1152,7 +1229,11 @@ processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
 	dip = _mm_shuffle_epi8(dip, bswap_mask);
 
 	/* if all 4 packets are IPV4. */
+#ifdef RTE_NEXT_ABI
+	if (likely(ipv4_flag)) {
+#else
 	if (likely(flag != 0)) {
+#endif
 		rte_lpm_lookupx4(qconf->ipv4_lookup_struct, dip, dprt, portid);
 	} else {
 		dst.x = dip;
@@ -1202,6 +1283,16 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 	_mm_store_si128(p[2], te[2]);
 	_mm_store_si128(p[3], te[3]);
 
+#ifdef RTE_NEXT_ABI
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[0] + 1),
+		&dst_port[0], pkt[0]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[1] + 1),
+		&dst_port[1], pkt[1]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[2] + 1),
+		&dst_port[2], pkt[2]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[3] + 1),
+		&dst_port[3], pkt[3]->packet_type);
+#else /* RTE_NEXT_ABI */
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[0] + 1),
 		&dst_port[0], pkt[0]->ol_flags);
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[1] + 1),
@@ -1210,6 +1301,7 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 		&dst_port[2], pkt[2]->ol_flags);
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[3] + 1),
 		&dst_port[3], pkt[3]->ol_flags);
+#endif /* RTE_NEXT_ABI */
 }
 
 /*
@@ -1396,7 +1488,11 @@ main_loop(__attribute__((unused)) void *dummy)
 	uint16_t *lp;
 	uint16_t dst_port[MAX_PKT_BURST];
 	__m128i dip[MAX_PKT_BURST / FWDSTEP];
+#ifdef RTE_NEXT_ABI
+	uint32_t ipv4_flag[MAX_PKT_BURST / FWDSTEP];
+#else
 	uint32_t flag[MAX_PKT_BURST / FWDSTEP];
+#endif
 	uint16_t pnum[MAX_PKT_BURST + 1];
 #endif
 
@@ -1466,6 +1562,18 @@ main_loop(__attribute__((unused)) void *dummy)
 				 */
 				int32_t n = RTE_ALIGN_FLOOR(nb_rx, 4);
 				for (j = 0; j < n ; j+=4) {
+#ifdef RTE_NEXT_ABI
+					uint32_t pkt_type =
+						pkts_burst[j]->packet_type &
+						pkts_burst[j+1]->packet_type &
+						pkts_burst[j+2]->packet_type &
+						pkts_burst[j+3]->packet_type;
+					if (pkt_type & RTE_PTYPE_L3_IPV4) {
+						simple_ipv4_fwd_4pkts(
+						&pkts_burst[j], portid, qconf);
+					} else if (pkt_type &
+						RTE_PTYPE_L3_IPV6) {
+#else /* RTE_NEXT_ABI */
 					uint32_t ol_flag = pkts_burst[j]->ol_flags
 							& pkts_burst[j+1]->ol_flags
 							& pkts_burst[j+2]->ol_flags
@@ -1474,6 +1582,7 @@ main_loop(__attribute__((unused)) void *dummy)
 						simple_ipv4_fwd_4pkts(&pkts_burst[j],
 									portid, qconf);
 					} else if (ol_flag & PKT_RX_IPV6_HDR) {
+#endif /* RTE_NEXT_ABI */
 						simple_ipv6_fwd_4pkts(&pkts_burst[j],
 									portid, qconf);
 					} else {
@@ -1498,13 +1607,21 @@ main_loop(__attribute__((unused)) void *dummy)
 			for (j = 0; j != k; j += FWDSTEP) {
 				processx4_step1(&pkts_burst[j],
 					&dip[j / FWDSTEP],
+#ifdef RTE_NEXT_ABI
+					&ipv4_flag[j / FWDSTEP]);
+#else
 					&flag[j / FWDSTEP]);
+#endif
 			}
 
 			k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
 			for (j = 0; j != k; j += FWDSTEP) {
 				processx4_step2(qconf, dip[j / FWDSTEP],
+#ifdef RTE_NEXT_ABI
+					ipv4_flag[j / FWDSTEP], portid,
+#else
 					flag[j / FWDSTEP], portid,
+#endif
 					&pkts_burst[j], &dst_port[j]);
 			}
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 16/18] examples/l3fwd-power: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (14 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 15/18] examples/l3fwd-acl: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 17/18] examples/l3fwd: " Helin Zhang
                         ` (2 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd-power/main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 6057059..705188f 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -635,7 +635,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		/* Handle IPv4 headers.*/
 		ipv4_hdr =
 			(struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char*)
@@ -670,8 +674,12 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 	}
 	else {
+#endif
 		/* Handle IPv6 headers.*/
 #if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH)
 		struct ipv6_hdr *ipv6_hdr;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 15/18] examples/l3fwd-acl: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (13 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 14/18] examples/ip_reassembly: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 16/18] examples/l3fwd-power: " Helin Zhang
                         ` (3 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd-acl/main.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index a5d4f25..78b6df2 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -645,10 +645,13 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 	struct ipv4_hdr *ipv4_hdr;
 	struct rte_mbuf *pkt = pkts_in[index];
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
 
 	if (type == PKT_RX_IPV4_HDR) {
-
+#endif
 		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
 			unsigned char *) + sizeof(struct ether_hdr));
 
@@ -667,9 +670,11 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 			/* Not a valid IPv4 packet */
 			rte_pktmbuf_free(pkt);
 		}
-
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (type == PKT_RX_IPV6_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
 		acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -687,17 +692,22 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 {
 	struct rte_mbuf *pkt = pkts_in[index];
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
 
 	if (type == PKT_RX_IPV4_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv4[acl->num_ipv4] = MBUF_IPV4_2PROTO(pkt);
 		acl->m_ipv4[(acl->num_ipv4)++] = pkt;
 
-
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (type == PKT_RX_IPV6_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
 		acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -745,10 +755,17 @@ send_one_packet(struct rte_mbuf *m, uint32_t res)
 		/* in the ACL list, drop it */
 #ifdef L3FWDACL_DEBUG
 		if ((res & ACL_DENY_SIGNATURE) != 0) {
+#ifdef RTE_NEXT_ABI
+			if (RTE_ETH_IS_IPV4_HDR(m->packet_type))
+				dump_acl4_rule(m, res);
+			else if (RTE_ETH_IS_IPV6_HDR(m->packet_type))
+				dump_acl6_rule(m, res);
+#else
 			if (m->ol_flags & PKT_RX_IPV4_HDR)
 				dump_acl4_rule(m, res);
 			else
 				dump_acl6_rule(m, res);
+#endif /* RTE_NEXT_ABI */
 		}
 #endif
 		rte_pktmbuf_free(m);
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 14/18] examples/ip_reassembly: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (12 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 15/18] examples/l3fwd-acl: " Helin Zhang
                         ` (4 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/ip_reassembly/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 9ecb6f9..f1c47ad 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -356,7 +356,11 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	dst_port = portid;
 
 	/* if packet is IPv4 */
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & (PKT_RX_IPV4_HDR)) {
+#endif
 		struct ipv4_hdr *ip_hdr;
 		uint32_t ip_dst;
 
@@ -396,9 +400,14 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+		/* if packet is IPv6 */
+#else
 	}
 	/* if packet is IPv6 */
 	else if (m->ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) {
+#endif
 		struct ipv6_extension_fragment *frag_hdr;
 		struct ipv6_hdr *ip_hdr;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (11 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 12/18] app/test: Remove useless code Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 14/18] examples/ip_reassembly: " Helin Zhang
                         ` (5 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/ip_fragmentation/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 0922ba6..b71d05f 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -283,7 +283,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 	len = qconf->tx_mbufs[port_out].len;
 
 	/* if this is an IPv4 packet */
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		struct ipv4_hdr *ip_hdr;
 		uint32_t ip_dst;
 		/* Read the lookup key (i.e. ip_dst) from the input packet */
@@ -317,9 +321,14 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 			if (unlikely (len2 < 0))
 				return;
 		}
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+		/* if this is an IPv6 packet */
+#else
 	}
 	/* if this is an IPv6 packet */
 	else if (m->ol_flags & PKT_RX_IPV6_HDR) {
+#endif
 		struct ipv6_hdr *ip_hdr;
 
 		ipv6 = 1;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 12/18] app/test: Remove useless code
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (10 preceding siblings ...)
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 11/18] app/testpmd: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
                         ` (6 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

Several useless lines of code were added accidentally, blocking packet
type unification. They should be removed entirely.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test/packet_burst_generator.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

v4 changes:
* Removed several useless code lines which block packet type unification.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/app/test/packet_burst_generator.c b/app/test/packet_burst_generator.c
index b46eed7..61e6340 100644
--- a/app/test/packet_burst_generator.c
+++ b/app/test/packet_burst_generator.c
@@ -272,19 +272,21 @@ nomore_mbuf:
 		if (ipv4) {
 			pkt->vlan_tci  = ETHER_TYPE_IPv4;
 			pkt->l3_len = sizeof(struct ipv4_hdr);
-
+#ifndef RTE_NEXT_ABI
 			if (vlan_enabled)
 				pkt->ol_flags = PKT_RX_IPV4_HDR | PKT_RX_VLAN_PKT;
 			else
 				pkt->ol_flags = PKT_RX_IPV4_HDR;
+#endif
 		} else {
 			pkt->vlan_tci  = ETHER_TYPE_IPv6;
 			pkt->l3_len = sizeof(struct ipv6_hdr);
-
+#ifndef RTE_NEXT_ABI
 			if (vlan_enabled)
 				pkt->ol_flags = PKT_RX_IPV6_HDR | PKT_RX_VLAN_PKT;
 			else
 				pkt->ol_flags = PKT_RX_IPV6_HDR;
+#endif
 		}
 
 		pkts_burst[nb_pkt] = pkt;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 11/18] app/testpmd: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (9 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 10/18] app/test-pipeline: " Helin Zhang
@ 2015-06-23  1:50  3%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 12/18] app/test: Remove useless code Helin Zhang
                         ` (7 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a unified 'packet_type' field.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c |  14 ++++
 app/test-pmd/rxonly.c   | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v4 changes:
* Added printing logs of packet types of each received packet in rxonly mode.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 950ea82..fab9600 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -202,8 +202,14 @@ parse_ethernet(struct ether_hdr *eth_hdr, struct testpmd_offload_info *info)
 
 /* Parse a vxlan header */
 static void
+#ifdef RTE_NEXT_ABI
+parse_vxlan(struct udp_hdr *udp_hdr,
+	    struct testpmd_offload_info *info,
+	    uint32_t pkt_type)
+#else
 parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
 	uint64_t mbuf_olflags)
+#endif
 {
 	struct ether_hdr *eth_hdr;
 
@@ -211,8 +217,12 @@ parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
 	 * (rfc7348) or that the rx offload flag is set (i40e only
 	 * currently) */
 	if (udp_hdr->dst_port != _htons(4789) &&
+#ifdef RTE_NEXT_ABI
+		RTE_ETH_IS_TUNNEL_PKT(pkt_type) == 0)
+#else
 		(mbuf_olflags & (PKT_RX_TUNNEL_IPV4_HDR |
 			PKT_RX_TUNNEL_IPV6_HDR)) == 0)
+#endif
 		return;
 
 	info->is_tunnel = 1;
@@ -549,7 +559,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				struct udp_hdr *udp_hdr;
 				udp_hdr = (struct udp_hdr *)((char *)l3_hdr +
 					info.l3_len);
+#ifdef RTE_NEXT_ABI
+				parse_vxlan(udp_hdr, &info, m->packet_type);
+#else
 				parse_vxlan(udp_hdr, &info, m->ol_flags);
+#endif
 			} else if (info.l4_proto == IPPROTO_GRE) {
 				struct simple_gre_hdr *gre_hdr;
 				gre_hdr = (struct simple_gre_hdr *)
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index f6a2f84..5a30347 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -91,7 +91,11 @@ pkt_burst_receive(struct fwd_stream *fs)
 	uint64_t ol_flags;
 	uint16_t nb_rx;
 	uint16_t i, packet_type;
+#ifdef RTE_NEXT_ABI
+	uint16_t is_encapsulation;
+#else
 	uint64_t is_encapsulation;
+#endif
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -135,8 +139,12 @@ pkt_burst_receive(struct fwd_stream *fs)
 		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 
+#ifdef RTE_NEXT_ABI
+		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
+#else
 		is_encapsulation = ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
 				PKT_RX_TUNNEL_IPV6_HDR);
+#endif
 
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
@@ -163,6 +171,177 @@ pkt_burst_receive(struct fwd_stream *fs)
 		if (ol_flags & PKT_RX_QINQ_PKT)
 			printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
 					mb->vlan_tci, mb->vlan_tci_outer);
+#ifdef RTE_NEXT_ABI
+		if (mb->packet_type) {
+			uint32_t ptype;
+
+			/* (outer) L2 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L2_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L2_MAC:
+				printf(" - (outer) L2 type: MAC");
+				break;
+			case RTE_PTYPE_L2_MAC_TIMESYNC:
+				printf(" - (outer) L2 type: MAC Timesync");
+				break;
+			case RTE_PTYPE_L2_ARP:
+				printf(" - (outer) L2 type: ARP");
+				break;
+			case RTE_PTYPE_L2_LLDP:
+				printf(" - (outer) L2 type: LLDP");
+				break;
+			default:
+				printf(" - (outer) L2 type: Unknown");
+				break;
+			}
+
+			/* (outer) L3 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L3_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L3_IPV4:
+				printf(" - (outer) L3 type: IPV4");
+				break;
+			case RTE_PTYPE_L3_IPV4_EXT:
+				printf(" - (outer) L3 type: IPV4_EXT");
+				break;
+			case RTE_PTYPE_L3_IPV6:
+				printf(" - (outer) L3 type: IPV6");
+				break;
+			case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+				printf(" - (outer) L3 type: IPV4_EXT_UNKNOWN");
+				break;
+			case RTE_PTYPE_L3_IPV6_EXT:
+				printf(" - (outer) L3 type: IPV6_EXT");
+				break;
+			case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+				printf(" - (outer) L3 type: IPV6_EXT_UNKNOWN");
+				break;
+			default:
+				printf(" - (outer) L3 type: Unknown");
+				break;
+			}
+
+			/* (outer) L4 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L4_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L4_TCP:
+				printf(" - (outer) L4 type: TCP");
+				break;
+			case RTE_PTYPE_L4_UDP:
+				printf(" - (outer) L4 type: UDP");
+				break;
+			case RTE_PTYPE_L4_FRAG:
+				printf(" - (outer) L4 type: L4_FRAG");
+				break;
+			case RTE_PTYPE_L4_SCTP:
+				printf(" - (outer) L4 type: SCTP");
+				break;
+			case RTE_PTYPE_L4_ICMP:
+				printf(" - (outer) L4 type: ICMP");
+				break;
+			case RTE_PTYPE_L4_NONFRAG:
+				printf(" - (outer) L4 type: L4_NONFRAG");
+				break;
+			default:
+				printf(" - (outer) L4 type: Unknown");
+				break;
+			}
+
+			/* packet tunnel type */
+			ptype = mb->packet_type & RTE_PTYPE_TUNNEL_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_TUNNEL_IP:
+				printf(" - Tunnel type: IP");
+				break;
+			case RTE_PTYPE_TUNNEL_GRE:
+				printf(" - Tunnel type: GRE");
+				break;
+			case RTE_PTYPE_TUNNEL_VXLAN:
+				printf(" - Tunnel type: VXLAN");
+				break;
+			case RTE_PTYPE_TUNNEL_NVGRE:
+				printf(" - Tunnel type: NVGRE");
+				break;
+			case RTE_PTYPE_TUNNEL_GENEVE:
+				printf(" - Tunnel type: GENEVE");
+				break;
+			case RTE_PTYPE_TUNNEL_GRENAT:
+				printf(" - Tunnel type: GRENAT");
+				break;
+			default:
+				printf(" - Tunnel type: Unknown");
+				break;
+			}
+
+			/* inner L2 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_L2_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L2_MAC:
+				printf(" - Inner L2 type: MAC");
+				break;
+			case RTE_PTYPE_INNER_L2_MAC_VLAN:
+				printf(" - Inner L2 type: MAC_VLAN");
+				break;
+			default:
+				printf(" - Inner L2 type: Unknown");
+				break;
+			}
+
+			/* inner L3 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_INNER_L3_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L3_IPV4:
+				printf(" - Inner L3 type: IPV4");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV4_EXT:
+				printf(" - Inner L3 type: IPV4_EXT");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6:
+				printf(" - Inner L3 type: IPV6");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+				printf(" - Inner L3 type: IPV4_EXT_UNKNOWN");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6_EXT:
+				printf(" - Inner L3 type: IPV6_EXT");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+				printf(" - Inner L3 type: IPV6_EXT_UNKNOWN");
+				break;
+			default:
+				printf(" - Inner L3 type: Unknown");
+				break;
+			}
+
+			/* inner L4 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_L4_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L4_TCP:
+				printf(" - Inner L4 type: TCP");
+				break;
+			case RTE_PTYPE_INNER_L4_UDP:
+				printf(" - Inner L4 type: UDP");
+				break;
+			case RTE_PTYPE_INNER_L4_FRAG:
+				printf(" - Inner L4 type: L4_FRAG");
+				break;
+			case RTE_PTYPE_INNER_L4_SCTP:
+				printf(" - Inner L4 type: SCTP");
+				break;
+			case RTE_PTYPE_INNER_L4_ICMP:
+				printf(" - Inner L4 type: ICMP");
+				break;
+			case RTE_PTYPE_INNER_L4_NONFRAG:
+				printf(" - Inner L4 type: L4_NONFRAG");
+				break;
+			default:
+				printf(" - Inner L4 type: Unknown");
+				break;
+			}
+			printf("\n");
+		} else
+			printf("Unknown packet type\n");
+#endif /* RTE_NEXT_ABI */
 		if (is_encapsulation) {
 			struct ipv4_hdr *ipv4_hdr;
 			struct ipv6_hdr *ipv6_hdr;
@@ -176,7 +355,11 @@ pkt_burst_receive(struct fwd_stream *fs)
 			l2_len  = sizeof(struct ether_hdr);
 
 			 /* Do not support ipv4 option field */
+#ifdef RTE_NEXT_ABI
+			if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
+#else
 			if (ol_flags & PKT_RX_TUNNEL_IPV4_HDR) {
+#endif
 				l3_len = sizeof(struct ipv4_hdr);
 				ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 10/18] app/test-pipeline: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (8 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 09/18] fm10k: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 11/18] app/testpmd: " Helin Zhang
                         ` (8 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a unified 'packet_type' field.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pipeline/pipeline_hash.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/app/test-pipeline/pipeline_hash.c b/app/test-pipeline/pipeline_hash.c
index 4598ad4..aa3f9e5 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -459,20 +459,33 @@ app_main_loop_rx_metadata(void) {
 			signature = RTE_MBUF_METADATA_UINT32_PTR(m, 0);
 			key = RTE_MBUF_METADATA_UINT8_PTR(m, 32);
 
+#ifdef RTE_NEXT_ABI
+			if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 			if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 				ip_hdr = (struct ipv4_hdr *)
 					&m_data[sizeof(struct ether_hdr)];
 				ip_dst = ip_hdr->dst_addr;
 
 				k32 = (uint32_t *) key;
 				k32[0] = ip_dst & 0xFFFFFF00;
+#ifdef RTE_NEXT_ABI
+			} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 			} else {
+#endif
 				ipv6_hdr = (struct ipv6_hdr *)
 					&m_data[sizeof(struct ether_hdr)];
 				ipv6_dst = ipv6_hdr->dst_addr;
 
 				memcpy(key, ipv6_dst, 16);
+#ifdef RTE_NEXT_ABI
+			} else
+				continue;
+#else
 			}
+#endif
 
 			*signature = test_hash(key, 0, 0);
 		}
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 09/18] fm10k: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (7 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 08/18] vmxnet3: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 10/18] app/test-pipeline: " Helin Zhang
                         ` (9 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a unified 'packet_type' field.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

v4 changes:
* Supported unified packet type of fm10k from v4.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index f5d1ad0..4b00f5c 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -68,12 +68,37 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 static inline void
 rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d)
 {
+#ifdef RTE_NEXT_ABI
+	static const uint32_t
+		ptype_table[FM10K_RXD_PKTTYPE_MASK >> FM10K_RXD_PKTTYPE_SHIFT]
+			__rte_cache_aligned = {
+		[FM10K_PKTTYPE_OTHER] = RTE_PTYPE_L2_MAC,
+		[FM10K_PKTTYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+		[FM10K_PKTTYPE_IPV4_EX] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[FM10K_PKTTYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+		[FM10K_PKTTYPE_IPV6_EX] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[FM10K_PKTTYPE_IPV4 | FM10K_PKTTYPE_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[FM10K_PKTTYPE_IPV6 | FM10K_PKTTYPE_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[FM10K_PKTTYPE_IPV4 | FM10K_PKTTYPE_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[FM10K_PKTTYPE_IPV6 | FM10K_PKTTYPE_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+	};
+
+	m->packet_type = ptype_table[(d->w.pkt_info & FM10K_RXD_PKTTYPE_MASK)
+						>> FM10K_RXD_PKTTYPE_SHIFT];
+#else /* RTE_NEXT_ABI */
 	uint16_t ptype;
 	static const uint16_t pt_lut[] = { 0,
 		PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT,
 		PKT_RX_IPV6_HDR, PKT_RX_IPV6_HDR_EXT,
 		0, 0, 0
 	};
+#endif /* RTE_NEXT_ABI */
 
 	if (d->w.pkt_info & FM10K_RXD_RSSTYPE_MASK)
 		m->ol_flags |= PKT_RX_RSS_HASH;
@@ -97,9 +122,11 @@ rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d)
 	if (unlikely(d->d.staterr & FM10K_RXD_STATUS_RXE))
 		m->ol_flags |= PKT_RX_RECIP_ERR;
 
+#ifndef RTE_NEXT_ABI
 	ptype = (d->d.data & FM10K_RXD_PKTTYPE_MASK_L3) >>
 						FM10K_RXD_PKTTYPE_SHIFT;
 	m->ol_flags |= pt_lut[(uint8_t)ptype];
+#endif
 }
 
 uint16_t
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 07/18] enic: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (5 preceding siblings ...)
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 06/18] i40e: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 08/18] vmxnet3: " Helin Zhang
                         ` (11 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a unified 'packet_type' field.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/enic/enic_main.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 15313c2..f47e96c 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -423,7 +423,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 		rx_pkt->pkt_len = bytes_written;
 
 		if (ipv4) {
+#ifdef RTE_NEXT_ABI
+			rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 			rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 			if (!csum_not_calc) {
 				if (unlikely(!ipv4_csum_ok))
 					rx_pkt->ol_flags |= PKT_RX_IP_CKSUM_BAD;
@@ -432,7 +436,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 					rx_pkt->ol_flags |= PKT_RX_L4_CKSUM_BAD;
 			}
 		} else if (ipv6)
+#ifdef RTE_NEXT_ABI
+			rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
+#else
 			rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 	} else {
 		/* Header split */
 		if (sop && !eop) {
@@ -445,7 +453,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 				*rx_pkt_bucket = rx_pkt;
 				rx_pkt->pkt_len = bytes_written;
 				if (ipv4) {
+#ifdef RTE_NEXT_ABI
+					rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 					rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 					if (!csum_not_calc) {
 						if (unlikely(!ipv4_csum_ok))
 							rx_pkt->ol_flags |=
@@ -457,13 +469,22 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 							    PKT_RX_L4_CKSUM_BAD;
 					}
 				} else if (ipv6)
+#ifdef RTE_NEXT_ABI
+					rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
+#else
 					rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 			} else {
 				/* Payload */
 				hdr_rx_pkt = *rx_pkt_bucket;
 				hdr_rx_pkt->pkt_len += bytes_written;
 				if (ipv4) {
+#ifdef RTE_NEXT_ABI
+					hdr_rx_pkt->packet_type =
+						RTE_PTYPE_L3_IPV4;
+#else
 					hdr_rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 					if (!csum_not_calc) {
 						if (unlikely(!ipv4_csum_ok))
 							hdr_rx_pkt->ol_flags |=
@@ -475,7 +496,12 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 							    PKT_RX_L4_CKSUM_BAD;
 					}
 				} else if (ipv6)
+#ifdef RTE_NEXT_ABI
+					hdr_rx_pkt->packet_type =
+						RTE_PTYPE_L3_IPV6;
+#else
 					hdr_rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 
 			}
 		}
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 08/18] vmxnet3: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (6 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 07/18] enic: " Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 09/18] fm10k: " Helin Zhang
                         ` (10 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a unified 'packet_type' field.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index a1eac45..25ae2f6 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -649,9 +649,17 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
 
 			if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
+#ifdef RTE_NEXT_ABI
+				rxm->packet_type = RTE_PTYPE_L3_IPV4_EXT;
+#else
 				rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+#endif
 			else
+#ifdef RTE_NEXT_ABI
+				rxm->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 				rxm->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 
 			if (!rcd->cnc) {
 				if (!rcd->ipc)
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 06/18] i40e: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (4 preceding siblings ...)
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 05/18] ixgbe: " Helin Zhang
@ 2015-06-23  1:50  3%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 07/18] enic: " Helin Zhang
                         ` (12 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a unified 'packet_type' field.
To avoid breaking ABI compatibility, all changes are enabled only by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_rxtx.c | 528 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 528 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index b2e1d6d..b951da0 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -176,6 +176,514 @@ i40e_rxd_error_to_pkt_flags(uint64_t qword)
 	return flags;
 }
 
+#ifdef RTE_NEXT_ABI
+/* For each value, the hardware datasheet gives more details */
+static inline uint32_t
+i40e_rxd_pkt_type_mapping(uint8_t ptype)
+{
+	static const uint32_t ptype_table[UINT8_MAX] __rte_cache_aligned = {
+		/* L2 types */
+		/* [0] reserved */
+		[1] = RTE_PTYPE_L2_MAC,
+		[2] = RTE_PTYPE_L2_MAC_TIMESYNC,
+		/* [3] - [5] reserved */
+		[6] = RTE_PTYPE_L2_LLDP,
+		/* [7] - [10] reserved */
+		[11] = RTE_PTYPE_L2_ARP,
+		/* [12] - [21] reserved */
+
+		/* Non tunneled IPv4 */
+		[22] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_FRAG,
+		[23] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_NONFRAG,
+		[24] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_UDP,
+		/* [25] reserved */
+		[26] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_TCP,
+		[27] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_SCTP,
+		[28] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_ICMP,
+
+		/* IPv4 --> IPv4 */
+		[29] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[30] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[31] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [32] reserved */
+		[33] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[34] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[35] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> IPv6 */
+		[36] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[37] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[38] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [39] reserved */
+		[40] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[41] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[42] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN */
+		[43] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> IPv4 */
+		[44] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[45] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[46] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [47] reserved */
+		[48] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[49] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[50] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> IPv6 */
+		[51] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[52] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[53] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [54] reserved */
+		[55] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[56] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[57] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC */
+		[58] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4 */
+		[59] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[60] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[61] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [62] reserved */
+		[63] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[64] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[65] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv6 */
+		[66] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[67] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[68] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [69] reserved */
+		[70] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[71] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[72] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN */
+		[73] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv4 */
+		[74] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[75] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[76] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [77] reserved */
+		[78] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[79] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[80] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv6 */
+		[81] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[82] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[83] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [84] reserved */
+		[85] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[86] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[87] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* Non tunneled IPv6 */
+		[88] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_FRAG,
+		[89] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_NONFRAG,
+		[90] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_UDP,
+		/* [91] reserved */
+		[92] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_TCP,
+		[93] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_SCTP,
+		[94] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_ICMP,
+
+		/* IPv6 --> IPv4 */
+		[95] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[96] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[97] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [98] reserved */
+		[99] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[100] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[101] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> IPv6 */
+		[102] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[103] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[104] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [105] reserved */
+		[106] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[107] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[108] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN */
+		[109] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> IPv4 */
+		[110] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[111] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[112] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [113] reserved */
+		[114] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[115] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[116] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> IPv6 */
+		[117] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[118] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[119] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [120] reserved */
+		[121] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[122] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[123] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC */
+		[124] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC --> IPv4 */
+		[125] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[126] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[127] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [128] reserved */
+		[129] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[130] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[131] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC --> IPv6 */
+		[132] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[133] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[134] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [135] reserved */
+		[136] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[137] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[138] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN */
+		[139] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv4 */
+		[140] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[141] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[142] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [143] reserved */
+		[144] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[145] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[146] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv6 */
+		[147] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[148] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[149] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [150] reserved */
+		[151] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[152] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[153] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* All others reserved */
+	};
+
+	return ptype_table[ptype];
+}
+#else /* RTE_NEXT_ABI */
 /* Translate pkt types to pkt flags */
 static inline uint64_t
 i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
@@ -443,6 +951,7 @@ i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
 
 	return ip_ptype_map[ptype];
 }
+#endif /* RTE_NEXT_ABI */
 
 #define I40E_RX_DESC_EXT_STATUS_FLEXBH_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FD_ID  0x01
@@ -730,11 +1239,18 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			i40e_rxd_to_vlan_tci(mb, &rxdp[j]);
 			pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_NEXT_ABI
+			mb->packet_type =
+				i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+						I40E_RXD_QW1_PTYPE_MASK) >>
+						I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 			pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 
 			mb->packet_type = (uint16_t)((qword1 &
 					I40E_RXD_QW1_PTYPE_MASK) >>
 					I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_NEXT_ABI */
 			if (pkt_flags & PKT_RX_RSS_HASH)
 				mb->hash.rss = rte_le_to_cpu_32(\
 					rxdp[j].wb.qword0.hi_dword.rss);
@@ -971,9 +1487,15 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		i40e_rxd_to_vlan_tci(rxm, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_NEXT_ABI
+		rxm->packet_type =
+			i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+			I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 		rxm->packet_type = (uint16_t)((qword1 & I40E_RXD_QW1_PTYPE_MASK) >>
 				I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_NEXT_ABI */
 		if (pkt_flags & PKT_RX_RSS_HASH)
 			rxm->hash.rss =
 				rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
@@ -1129,10 +1651,16 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		i40e_rxd_to_vlan_tci(first_seg, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_NEXT_ABI
+		first_seg->packet_type =
+			i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+			I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 		first_seg->packet_type = (uint16_t)((qword1 &
 					I40E_RXD_QW1_PTYPE_MASK) >>
 					I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_NEXT_ABI */
 		if (pkt_flags & PKT_RX_RSS_HASH)
 			rxm->hash.rss =
 				rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
-- 
1.9.3


* [dpdk-dev] [PATCH v8 05/18] ixgbe: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (3 preceding siblings ...)
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
@ 2015-06-23  1:50  3%       ` Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 06/18] i40e: " Helin Zhang
                         ` (13 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by the unified packet type.
To avoid breaking ABI compatibility, all the changes are enabled by
RTE_NEXT_ABI, which is disabled by default.
Note that a performance drop of around 2.5% (64B packets) was observed
when doing IO forwarding across 4 ports (1 port per 82599 card) on the
same SNB core.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 163 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index a211096..83a869f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -860,6 +860,110 @@ end_of_tx:
  *  RX functions
  *
  **********************************************************************/
+#ifdef RTE_NEXT_ABI
+#define IXGBE_PACKET_TYPE_IPV4              0X01
+#define IXGBE_PACKET_TYPE_IPV4_TCP          0X11
+#define IXGBE_PACKET_TYPE_IPV4_UDP          0X21
+#define IXGBE_PACKET_TYPE_IPV4_SCTP         0X41
+#define IXGBE_PACKET_TYPE_IPV4_EXT          0X03
+#define IXGBE_PACKET_TYPE_IPV4_EXT_SCTP     0X43
+#define IXGBE_PACKET_TYPE_IPV6              0X04
+#define IXGBE_PACKET_TYPE_IPV6_TCP          0X14
+#define IXGBE_PACKET_TYPE_IPV6_UDP          0X24
+#define IXGBE_PACKET_TYPE_IPV6_EXT          0X0C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_TCP      0X1C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_UDP      0X2C
+#define IXGBE_PACKET_TYPE_IPV4_IPV6         0X05
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_TCP     0X15
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_UDP     0X25
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT     0X0D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IXGBE_PACKET_TYPE_MAX               0X80
+#define IXGBE_PACKET_TYPE_MASK              0X7F
+#define IXGBE_PACKET_TYPE_SHIFT             0X04
+static inline uint32_t
+ixgbe_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+	static const uint32_t
+		ptype_table[IXGBE_PACKET_TYPE_MAX] __rte_cache_aligned = {
+		[IXGBE_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4,
+		[IXGBE_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[IXGBE_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6,
+		[IXGBE_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT,
+		[IXGBE_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+		[IXGBE_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+	};
+	if (unlikely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+		return RTE_PTYPE_UNKNOWN;
+
+	pkt_info = (pkt_info >> IXGBE_PACKET_TYPE_SHIFT) &
+				IXGBE_PACKET_TYPE_MASK;
+
+	return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+ixgbe_rxd_pkt_info_to_pkt_flags(uint16_t pkt_info)
+{
+	static uint64_t ip_rss_types_map[16] __rte_cache_aligned = {
+		0, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH,
+		0, PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH,
+		PKT_RX_RSS_HASH, 0, 0, 0,
+		0, 0, 0,  PKT_RX_FDIR,
+	};
+#ifdef RTE_LIBRTE_IEEE1588
+	static uint64_t ip_pkt_etqf_map[8] = {
+		0, 0, 0, PKT_RX_IEEE1588_PTP,
+		0, 0, 0, 0,
+	};
+
+	if (likely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+		return ip_pkt_etqf_map[(pkt_info >> 4) & 0X07] |
+				ip_rss_types_map[pkt_info & 0XF];
+	else
+		return ip_rss_types_map[pkt_info & 0XF];
+#else
+	return ip_rss_types_map[pkt_info & 0XF];
+#endif
+}
+#else /* RTE_NEXT_ABI */
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
@@ -895,6 +999,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 #endif
 	return pkt_flags | ip_rss_types_map[hl_tp_rs & 0xF];
 }
+#endif /* RTE_NEXT_ABI */
 
 static inline uint64_t
 rx_desc_status_to_pkt_flags(uint32_t rx_status)
@@ -950,7 +1055,13 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
 	uint64_t pkt_flags;
+#ifdef RTE_NEXT_ABI
+	int nb_dd;
+	uint32_t s[LOOK_AHEAD];
+	uint16_t pkt_info[LOOK_AHEAD];
+#else
 	int s[LOOK_AHEAD], nb_dd;
+#endif /* RTE_NEXT_ABI */
 	int i, j, nb_rx = 0;
 
 
@@ -973,6 +1084,12 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 		for (j = LOOK_AHEAD-1; j >= 0; --j)
 			s[j] = rxdp[j].wb.upper.status_error;
 
+#ifdef RTE_NEXT_ABI
+		for (j = LOOK_AHEAD-1; j >= 0; --j)
+			pkt_info[j] = rxdp[j].wb.lower.lo_dword.
+						hs_rss.pkt_info;
+#endif /* RTE_NEXT_ABI */
+
 		/* Compute how many status bits were set */
 		nb_dd = 0;
 		for (j = 0; j < LOOK_AHEAD; ++j)
@@ -989,12 +1106,22 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 			mb->vlan_tci = rte_le_to_cpu_16(rxdp[j].wb.upper.vlan);
 
 			/* convert descriptor fields to rte mbuf flags */
+#ifdef RTE_NEXT_ABI
+			pkt_flags = rx_desc_status_to_pkt_flags(s[j]);
+			pkt_flags |= rx_desc_error_to_pkt_flags(s[j]);
+			pkt_flags |=
+				ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info[j]);
+			mb->ol_flags = pkt_flags;
+			mb->packet_type =
+				ixgbe_rxd_pkt_info_to_pkt_type(pkt_info[j]);
+#else /* RTE_NEXT_ABI */
 			pkt_flags  = rx_desc_hlen_type_rss_to_pkt_flags(
 					rxdp[j].wb.lower.lo_dword.data);
 			/* reuse status field from scan list */
 			pkt_flags |= rx_desc_status_to_pkt_flags(s[j]);
 			pkt_flags |= rx_desc_error_to_pkt_flags(s[j]);
 			mb->ol_flags = pkt_flags;
+#endif /* RTE_NEXT_ABI */
 
 			if (likely(pkt_flags & PKT_RX_RSS_HASH))
 				mb->hash.rss = rxdp[j].wb.lower.hi_dword.rss;
@@ -1211,7 +1338,11 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	union ixgbe_adv_rx_desc rxd;
 	uint64_t dma_addr;
 	uint32_t staterr;
+#ifdef RTE_NEXT_ABI
+	uint32_t pkt_info;
+#else
 	uint32_t hlen_type_rss;
+#endif
 	uint16_t pkt_len;
 	uint16_t rx_id;
 	uint16_t nb_rx;
@@ -1329,6 +1460,19 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		rxm->data_len = pkt_len;
 		rxm->port = rxq->port_id;
 
+#ifdef RTE_NEXT_ABI
+		pkt_info = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.hs_rss.
+								pkt_info);
+		/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
+		rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
+
+		pkt_flags = rx_desc_status_to_pkt_flags(staterr);
+		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
+		pkt_flags = pkt_flags |
+			ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info);
+		rxm->ol_flags = pkt_flags;
+		rxm->packet_type = ixgbe_rxd_pkt_info_to_pkt_type(pkt_info);
+#else /* RTE_NEXT_ABI */
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
 		rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
@@ -1337,6 +1481,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		rxm->ol_flags = pkt_flags;
+#endif /* RTE_NEXT_ABI */
 
 		if (likely(pkt_flags & PKT_RX_RSS_HASH))
 			rxm->hash.rss = rxd.wb.lower.hi_dword.rss;
@@ -1410,6 +1555,23 @@ ixgbe_fill_cluster_head_buf(
 	uint8_t port_id,
 	uint32_t staterr)
 {
+#ifdef RTE_NEXT_ABI
+	uint16_t pkt_info;
+	uint64_t pkt_flags;
+
+	head->port = port_id;
+
+	/* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is
+	 * set in the pkt_flags field.
+	 */
+	head->vlan_tci = rte_le_to_cpu_16(desc->wb.upper.vlan);
+	pkt_info = rte_le_to_cpu_32(desc->wb.lower.lo_dword.hs_rss.pkt_info);
+	pkt_flags = rx_desc_status_to_pkt_flags(staterr);
+	pkt_flags |= rx_desc_error_to_pkt_flags(staterr);
+	pkt_flags |= ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info);
+	head->ol_flags = pkt_flags;
+	head->packet_type = ixgbe_rxd_pkt_info_to_pkt_type(pkt_info);
+#else /* RTE_NEXT_ABI */
 	uint32_t hlen_type_rss;
 	uint64_t pkt_flags;
 
@@ -1425,6 +1587,7 @@ ixgbe_fill_cluster_head_buf(
 	pkt_flags |= rx_desc_status_to_pkt_flags(staterr);
 	pkt_flags |= rx_desc_error_to_pkt_flags(staterr);
 	head->ol_flags = pkt_flags;
+#endif /* RTE_NEXT_ABI */
 
 	if (likely(pkt_flags & PKT_RX_RSS_HASH))
 		head->hash.rss = rte_le_to_cpu_32(desc->wb.lower.hi_dword.rss);
-- 
1.9.3


* [dpdk-dev] [PATCH v8 04/18] e1000: replace bit mask based packet type with unified packet type
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
                         ` (2 preceding siblings ...)
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 03/18] mbuf: add definitions of unified packet types Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 05/18] ixgbe: " Helin Zhang
                         ` (14 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by the unified packet type.
To avoid breaking ABI compatibility, all the changes are enabled by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_rxtx.c | 102 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 43d6703..d1c2ef8 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -590,6 +590,99 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
  *  RX functions
  *
  **********************************************************************/
+#ifdef RTE_NEXT_ABI
+#define IGB_PACKET_TYPE_IPV4              0X01
+#define IGB_PACKET_TYPE_IPV4_TCP          0X11
+#define IGB_PACKET_TYPE_IPV4_UDP          0X21
+#define IGB_PACKET_TYPE_IPV4_SCTP         0X41
+#define IGB_PACKET_TYPE_IPV4_EXT          0X03
+#define IGB_PACKET_TYPE_IPV4_EXT_SCTP     0X43
+#define IGB_PACKET_TYPE_IPV6              0X04
+#define IGB_PACKET_TYPE_IPV6_TCP          0X14
+#define IGB_PACKET_TYPE_IPV6_UDP          0X24
+#define IGB_PACKET_TYPE_IPV6_EXT          0X0C
+#define IGB_PACKET_TYPE_IPV6_EXT_TCP      0X1C
+#define IGB_PACKET_TYPE_IPV6_EXT_UDP      0X2C
+#define IGB_PACKET_TYPE_IPV4_IPV6         0X05
+#define IGB_PACKET_TYPE_IPV4_IPV6_TCP     0X15
+#define IGB_PACKET_TYPE_IPV4_IPV6_UDP     0X25
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT     0X0D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IGB_PACKET_TYPE_MAX               0X80
+#define IGB_PACKET_TYPE_MASK              0X7F
+#define IGB_PACKET_TYPE_SHIFT             0X04
+static inline uint32_t
+igb_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+	static const uint32_t
+		ptype_table[IGB_PACKET_TYPE_MAX] __rte_cache_aligned = {
+		[IGB_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+		[IGB_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[IGB_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+		[IGB_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6,
+		[IGB_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT,
+		[IGB_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+		[IGB_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_UDP] =  RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+		[IGB_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+		[IGB_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+	};
+	if (unlikely(pkt_info & E1000_RXDADV_PKTTYPE_ETQF))
+		return RTE_PTYPE_UNKNOWN;
+
+	pkt_info = (pkt_info >> IGB_PACKET_TYPE_SHIFT) & IGB_PACKET_TYPE_MASK;
+
+	return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
+{
+	uint64_t pkt_flags = ((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH;
+
+#if defined(RTE_LIBRTE_IEEE1588)
+	static uint32_t ip_pkt_etqf_map[8] = {
+		0, 0, 0, PKT_RX_IEEE1588_PTP,
+		0, 0, 0, 0,
+	};
+
+	pkt_flags |= ip_pkt_etqf_map[(hl_tp_rs >> 4) & 0x07];
+#endif
+
+	return pkt_flags;
+}
+#else /* RTE_NEXT_ABI */
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
@@ -617,6 +710,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 #endif
 	return pkt_flags | (((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH);
 }
+#endif /* RTE_NEXT_ABI */
 
 static inline uint64_t
 rx_desc_status_to_pkt_flags(uint32_t rx_status)
@@ -790,6 +884,10 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		rxm->ol_flags = pkt_flags;
+#ifdef RTE_NEXT_ABI
+		rxm->packet_type = igb_rxd_pkt_info_to_pkt_type(rxd.wb.lower.
+						lo_dword.hs_rss.pkt_info);
+#endif
 
 		/*
 		 * Store the mbuf address into the next entry of the array
@@ -1024,6 +1122,10 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		first_seg->ol_flags = pkt_flags;
+#ifdef RTE_NEXT_ABI
+		first_seg->packet_type = igb_rxd_pkt_info_to_pkt_type(rxd.wb.
+					lower.lo_dword.hs_rss.pkt_info);
+#endif
 
 		/* Prefetch data of first segment, if configured to do so. */
 		rte_packet_prefetch((char *)first_seg->buf_addr +
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 03/18] mbuf: add definitions of unified packet types
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
@ 2015-06-23  1:50  3%       ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
                         ` (15 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

Only 6 bit flags in ol_flags are available for indicating packet
types, which is not enough to describe all the possible packet
types that hardware can recognize. For example, i40e hardware can
recognize more than 150 packet types. The unified packet type is
composed of L2 type, L3 type, L4 type, tunnel type, inner L2 type,
inner L3 type and inner L4 type fields, and is stored in the
32-bit 'packet_type' field of 'struct rte_mbuf'.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 487 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 487 insertions(+)

v3 changes:
* Put the definitions of unified packet type into a single patch.

v4 changes:
* Added detailed description of each packet types.

v5 changes:
* Re-worded the commit logs.
* Added more detailed description for all packet types, together with examples.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 0315561..0ee0c55 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -201,6 +201,493 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+#ifdef RTE_NEXT_ABI
+/*
+ * 32 bits are divided into several fields to mark packet types. Note that
+ * each field is indexical.
+ * - Bit 3:0 is for L2 types.
+ * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
+ * - Bit 11:8 is for L4 or outer L4 (for tunneling case) types.
+ * - Bit 15:12 is for tunnel types.
+ * - Bit 19:16 is for inner L2 types.
+ * - Bit 23:20 is for inner L3 types.
+ * - Bit 27:24 is for inner L4 types.
+ * - Bit 31:28 is reserved.
+ *
+ * To be compatible with Vector PMD, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT,
+ * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
+ * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
+ *
+ * Note that L3 types values are selected for checking IPV4/IPV6 header from
+ * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
+ * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
+ *
+ * Note that the packet types of the same packet recognized by different
+ * hardware may be different, as different hardware may have different
+ * capability of packet type recognition.
+ *
+ * examples:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=0x29
+ * | 'version'=6, 'next header'=0x3A
+ * | 'ICMPv6 header'>
+ * will be recognized on i40e hardware as packet type combination of,
+ * RTE_PTYPE_L2_MAC |
+ * RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+ * RTE_PTYPE_TUNNEL_IP |
+ * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_INNER_L4_ICMP.
+ *
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x2F
+ * | 'GRE header'
+ * | 'version'=6, 'next header'=0x11
+ * | 'UDP header'>
+ * will be recognized on i40e hardware as packet type combination of,
+ * RTE_PTYPE_L2_MAC |
+ * RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_TUNNEL_GRENAT |
+ * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_INNER_L4_UDP.
+ */
+#define RTE_PTYPE_UNKNOWN                   0x00000000
+/**
+ * MAC (Media Access Control) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=[0x0800|0x86DD|others]>
+ */
+#define RTE_PTYPE_L2_MAC                    0x00000001
+/**
+ * MAC (Media Access Control) packet type for time sync.
+ *
+ * Packet format:
+ * <'ether type'=0x88F7>
+ */
+#define RTE_PTYPE_L2_MAC_TIMESYNC           0x00000002
+/**
+ * ARP (Address Resolution Protocol) packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0806>
+ */
+#define RTE_PTYPE_L2_ARP                    0x00000003
+/**
+ * LLDP (Link Layer Discovery Protocol) packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x88CC>
+ */
+#define RTE_PTYPE_L2_LLDP                   0x00000004
+/**
+ * Mask of layer 2 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L2_MASK                   0x0000000f
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and does not contain any
+ * header option.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=5>
+ */
+#define RTE_PTYPE_L3_IPV4                   0x00000010
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and contains header
+ * options.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[6-15], 'options'>
+ */
+#define RTE_PTYPE_L3_IPV4_EXT               0x00000030
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and does not contain any
+ * extension header.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x3B>
+ */
+#define RTE_PTYPE_L3_IPV6                   0x00000040
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and may or may not contain
+ * header options.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[5-15], <'options'>>
+ */
+#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x00000090
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and contains extension
+ * headers.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   'extension headers'>
+ */
+#define RTE_PTYPE_L3_IPV6_EXT               0x000000c0
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and may or may not contain
+ * extension headers.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   <'extension headers'>>
+ */
+#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x000000e0
+/**
+ * Mask of layer 3 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L3_MASK                   0x000000f0
+/**
+ * TCP (Transmission Control Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=6, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=6>
+ */
+#define RTE_PTYPE_L4_TCP                    0x00000100
+/**
+ * UDP (User Datagram Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17>
+ */
+#define RTE_PTYPE_L4_UDP                    0x00000200
+/**
+ * Fragmented IP (Internet Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * It refers to those packets of any IP types, which can be recognized as
+ * fragmented. A fragmented packet cannot be recognized as any other L4 types
+ * (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP,
+ * RTE_PTYPE_L4_NONFRAG).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'MF'=1>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=44>
+ */
+#define RTE_PTYPE_L4_FRAG                   0x00000300
+/**
+ * SCTP (Stream Control Transmission Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=132, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=132>
+ */
+#define RTE_PTYPE_L4_SCTP                   0x00000400
+/**
+ * ICMP (Internet Control Message Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=1, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=1>
+ */
+#define RTE_PTYPE_L4_ICMP                   0x00000500
+/**
+ * Non-fragmented IP (Internet Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * It refers to those packets of any IP types, while cannot be recognized as
+ * any of above L4 types (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
+ * RTE_PTYPE_L4_FRAG, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'!=[6|17|44|132|1]>
+ */
+#define RTE_PTYPE_L4_NONFRAG                0x00000600
+/**
+ * Mask of layer 4 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L4_MASK                   0x00000f00
+/**
+ * IP (Internet Protocol) in IP (Internet Protocol) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=[4|41]>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[4|41]>
+ */
+#define RTE_PTYPE_TUNNEL_IP                 0x00001000
+/**
+ * GRE (Generic Routing Encapsulation) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47>
+ */
+#define RTE_PTYPE_TUNNEL_GRE                0x00002000
+/**
+ * VXLAN (Virtual eXtensible Local Area Network) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=4789>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=4789>
+ */
+#define RTE_PTYPE_TUNNEL_VXLAN              0x00003000
+/**
+ * NVGRE (Network Virtualization using Generic Routing Encapsulation) tunneling
+ * packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47
+ * | 'protocol type'=0x6558>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47
+ * | 'protocol type'=0x6558'>
+ */
+#define RTE_PTYPE_TUNNEL_NVGRE              0x00004000
+/**
+ * GENEVE (Generic Network Virtualization Encapsulation) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=6081>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=6081>
+ */
+#define RTE_PTYPE_TUNNEL_GENEVE             0x00005000
+/**
+ * Tunneling packet type of Teredo, VXLAN (Virtual eXtensible Local Area
+ * Network) or GRE (Generic Routing Encapsulation) could be recognized as this
+ * packet type, if they cannot be recognized independently, as limited by
+ * hardware capability.
+ */
+#define RTE_PTYPE_TUNNEL_GRENAT             0x00006000
+/**
+ * Mask of tunneling packet types.
+ */
+#define RTE_PTYPE_TUNNEL_MASK               0x0000f000
+/**
+ * MAC (Media Access Control) packet type.
+ * It is used for inner packet type only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=[0x800|0x86DD]>
+ */
+#define RTE_PTYPE_INNER_L2_MAC              0x00010000
+/**
+ * MAC (Media Access Control) packet type with VLAN (Virtual Local Area
+ * Network) tag.
+ *
+ * Packet format (inner only):
+ * <'ether type'=[0x800|0x86DD], vlan=[1-4095]>
+ */
+#define RTE_PTYPE_INNER_L2_MAC_VLAN         0x00020000
+/**
+ * Mask of inner layer 2 packet types.
+ */
+#define RTE_PTYPE_INNER_L2_MASK             0x000f0000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and does not contain any header option.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=5>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4             0x00100000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and contains header options.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[6-15], 'options'>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x00200000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and does not contain any extension header.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x3B>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6             0x00300000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and may or may not contain header options.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[5-15], <'options'>>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x00400000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and contains extension headers.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   'extension headers'>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x00500000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and may or may not contain extension
+ * headers.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   <'extension headers'>>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x00600000
+/**
+ * Mask of inner layer 3 packet types.
+ */
+#define RTE_PTYPE_INNER_INNER_L3_MASK       0x00f00000
+/**
+ * TCP (Transmission Control Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=6, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=6>
+ */
+#define RTE_PTYPE_INNER_L4_TCP              0x01000000
+/**
+ * UDP (User Datagram Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17>
+ */
+#define RTE_PTYPE_INNER_L4_UDP              0x02000000
+/**
+ * Fragmented IP (Internet Protocol) packet type.
+ * It is used for inner packet only, and may or may not have a layer 4 packet.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'MF'=1>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=44>
+ */
+#define RTE_PTYPE_INNER_L4_FRAG             0x03000000
+/**
+ * SCTP (Stream Control Transmission Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=132, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=132>
+ */
+#define RTE_PTYPE_INNER_L4_SCTP             0x04000000
+/**
+ * ICMP (Internet Control Message Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=1, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=1>
+ */
+#define RTE_PTYPE_INNER_L4_ICMP             0x05000000
+/**
+ * Non-fragmented IP (Internet Protocol) packet type.
+ * It is used for inner packet only, and may or may not have other unknown layer
+ * 4 packet types.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'!=[6|17|44|132|1]>
+ */
+#define RTE_PTYPE_INNER_L4_NONFRAG          0x06000000
+/**
+ * Mask of inner layer 4 packet types.
+ */
+#define RTE_PTYPE_INNER_L4_MASK             0x0f000000
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
+ * determine if it is an IPv4 packet.
+ */
+#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
+
+/**
+ * Check if the (outer) L3 header is IPv6. To avoid comparing IPv6 types one by
+ * one, bit 6 is selected to be used for IPv6 only. Then checking bit 6 can
+ * determine if it is an IPv6 packet.
+ */
+#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
+
+/* Check if it is a tunneling packet */
+#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & RTE_PTYPE_TUNNEL_MASK)
+#endif /* RTE_NEXT_ABI */
+
 /**
  * Get the name of a RX offload flag
  *
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 02/18] ixgbe: support unified packet type in vectorized PMD
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
@ 2015-06-23  1:50  3%       ` Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 03/18] mbuf: add definitions of unified packet types Helin Zhang
                         ` (16 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

To unify the packet type, the packet type bit masks in ol_flags are
replaced. In addition, more packet types (UDP, TCP and SCTP) are
supported in the vectorized ixgbe PMD.
To avoid breaking ABI compatibility, all the changes are enabled
by RTE_NEXT_ABI, which is disabled by default.
Note that a performance drop of around 2% (64B packets) was observed
when doing 4-port (1 port per 82599 card) IO forwarding on the same
SNB core.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 config/common_linuxapp             |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 75 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 3 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v3 changes:
* Put vector ixgbe changes right after mbuf changes.
* Enabled vector ixgbe PMD by default together with changes for updated
  vector PMD.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 617d4a1..5deb55a 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_IXGBE_INC_VECTOR=y
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
 
 #
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index abd10f6..ccea7cd 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -134,6 +134,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
  */
 #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
 
+#ifdef RTE_NEXT_ABI
+#define OLFLAGS_MASK_V  (((uint64_t)PKT_RX_VLAN_PKT << 48) | \
+			((uint64_t)PKT_RX_VLAN_PKT << 32) | \
+			((uint64_t)PKT_RX_VLAN_PKT << 16) | \
+			((uint64_t)PKT_RX_VLAN_PKT))
+#else
 #define OLFLAGS_MASK     ((uint16_t)(PKT_RX_VLAN_PKT | PKT_RX_IPV4_HDR |\
 				     PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\
 				     PKT_RX_IPV6_HDR_EXT))
@@ -142,11 +148,26 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 			  ((uint64_t)OLFLAGS_MASK << 16) | \
 			  ((uint64_t)OLFLAGS_MASK))
 #define PTYPE_SHIFT    (1)
+#endif /* RTE_NEXT_ABI */
+
 #define VTAG_SHIFT     (3)
 
 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
+#ifdef RTE_NEXT_ABI
+	__m128i vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT);
+	vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V;
+#else
 	__m128i ptype0, ptype1, vtag0, vtag1;
 	union {
 		uint16_t e[4];
@@ -166,6 +187,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 
 	ptype1 = _mm_or_si128(ptype1, vtag1);
 	vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V;
+#endif /* RTE_NEXT_ABI */
 
 	rx_pkts[0]->ol_flags = vol.e[0];
 	rx_pkts[1]->ol_flags = vol.e[1];
@@ -196,6 +218,18 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	int pos;
 	uint64_t var;
 	__m128i shuf_msk;
+#ifdef RTE_NEXT_ABI
+	__m128i crc_adjust = _mm_set_epi16(
+				0, 0, 0,    /* ignore non-length fields */
+				-rxq->crc_len, /* sub crc on data_len */
+				0,          /* ignore high-16bits of pkt_len */
+				-rxq->crc_len, /* sub crc on pkt_len */
+				0, 0            /* ignore pkt_type field */
+			);
+	__m128i dd_check, eop_check;
+	__m128i desc_mask = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF,
+					  0xFFFFFFFF, 0xFFFF07F0);
+#else
 	__m128i crc_adjust = _mm_set_epi16(
 				0, 0, 0, 0, /* ignore non-length fields */
 				0,          /* ignore high-16bits of pkt_len */
@@ -204,6 +238,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				0            /* ignore pkt_type field */
 			);
 	__m128i dd_check, eop_check;
+#endif /* RTE_NEXT_ABI */
 
 	if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST))
 		return 0;
@@ -232,6 +267,18 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
 
 	/* mask to shuffle from desc. to mbuf */
+#ifdef RTE_NEXT_ABI
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		1,           /* octet 1, 8 bits pkt_type field */
+		0            /* octet 0, 4 bits offset 4 pkt_type field */
+		);
+#else
 	shuf_msk = _mm_set_epi8(
 		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
 		0xFF, 0xFF,  /* skip high 16 bits vlan_macip, zero out */
@@ -241,18 +288,28 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		13, 12,      /* octet 12~13, 16 bits data_len */
 		0xFF, 0xFF   /* skip pkt_type field */
 		);
+#endif /* RTE_NEXT_ABI */
 
 	/* Cache is empty -> need to scan the buffer rings, but first move
 	 * the next 'n' mbufs into the cache */
 	sw_ring = &rxq->sw_ring[rxq->rx_tail];
 
-	/*
-	 * A. load 4 packet in one loop
+#ifdef RTE_NEXT_ABI
+	/* A. load 4 packet in one loop
+	 * [A*. mask out 4 unused dirty field in desc]
 	 * B. copy 4 mbuf point from swring to rx_pkts
 	 * C. calc the number of DD bits among the 4 packets
 	 * [C*. extract the end-of-packet bit, if requested]
 	 * D. fill info. from desc to mbuf
 	 */
+#else
+	/* A. load 4 packet in one loop
+	 * B. copy 4 mbuf point from swring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info. from desc to mbuf
+	 */
+#endif /* RTE_NEXT_ABI */
 	for (pos = 0, nb_pkts_recd = 0; pos < RTE_IXGBE_VPMD_RX_BURST;
 			pos += RTE_IXGBE_DESCS_PER_LOOP,
 			rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
@@ -289,6 +346,16 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
 
+#ifdef RTE_NEXT_ABI
+		/* A* mask out 0~3 bits RSS type */
+		descs[3] = _mm_and_si128(descs[3], desc_mask);
+		descs[2] = _mm_and_si128(descs[2], desc_mask);
+
+		/* A* mask out 0~3 bits RSS type */
+		descs[1] = _mm_and_si128(descs[1], desc_mask);
+		descs[0] = _mm_and_si128(descs[0], desc_mask);
+#endif /* RTE_NEXT_ABI */
+
 		/* avoid compiler reorder optimization */
 		rte_compiler_barrier();
 
@@ -301,7 +368,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* C.1 4=>2 filter staterr info only */
 		sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);
 
+#ifdef RTE_NEXT_ABI
+		/* set ol_flags with vlan packet type */
+#else
 		/* set ol_flags with packet type and vlan tag */
+#endif /* RTE_NEXT_ABI */
 		desc_to_olflags_v(descs, &rx_pkts[pos]);
 
 		/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
@ 2015-06-23  1:50  4%       ` Helin Zhang
  2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
                         ` (17 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

In order to unify the packet type, the 'packet_type' field in
'struct rte_mbuf' needs to be extended from 16 to 32 bits.
Accordingly, some fields in 'struct rte_mbuf' are re-organized to
support this change for the vector PMD. As 'struct rte_kni_mbuf' for
KNI must map exactly onto 'struct rte_mbuf', it is modified
accordingly. In addition, the vector PMD of ixgbe is disabled by
default, as the layout of 'struct rte_mbuf' changed.
To avoid breaking ABI compatibility, all the changes are enabled
by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 config/common_linuxapp                             |  2 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |  6 +++++
 lib/librte_mbuf/rte_mbuf.h                         | 26 ++++++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Disabled vector ixgbe PMD by default, as mbuf layout changed.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.
* Integrated with changes of QinQ stripping/insertion.

v8 changes:
* Moved the field of 'vlan_tci_outer' in 'struct rte_mbuf' to the end
  of the 1st cache line, to avoid breaking any vectorized PMD storing.

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5deb55a..617d4a1 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_IXGBE_INC_VECTOR=y
+CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
 
 #
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 1e55c2d..e9f38bd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -117,9 +117,15 @@ struct rte_kni_mbuf {
 	uint16_t data_off;      /**< Start address of data in segment buffer. */
 	char pad1[4];
 	uint64_t ol_flags;      /**< Offload features. */
+#ifdef RTE_NEXT_ABI
+	char pad2[4];
+	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
+	uint16_t data_len;      /**< Amount of data in segment buffer. */
+#else
 	char pad2[2];
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
+#endif
 
 	/* fields on second cache line */
 	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a0f3d3b..0315561 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -275,6 +275,28 @@ struct rte_mbuf {
 	/* remaining bytes are set on RX when pulling packet from descriptor */
 	MARKER rx_descriptor_fields1;
 
+#ifdef RTE_NEXT_ABI
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types.
+	 */
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
+#else /* RTE_NEXT_ABI */
 	/**
 	 * The packet type, which is used to indicate ordinary packet and also
 	 * tunneled packet format, i.e. each number is represented a type of
@@ -286,6 +308,7 @@ struct rte_mbuf {
 	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
+#endif /* RTE_NEXT_ABI */
 	union {
 		uint32_t rss;     /**< RSS hash result if RSS enabled */
 		struct {
@@ -306,6 +329,9 @@ struct rte_mbuf {
 	} hash;                   /**< hash information */
 
 	uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
+#ifdef RTE_NEXT_ABI
+	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
+#endif /* RTE_NEXT_ABI */
 
 	/* second cache line - fields only used in slow path or on TX */
 	MARKER cacheline1 __rte_cache_aligned;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 00/18] unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (17 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 18/18] mbuf: remove old packet type bit masks Helin Zhang
@ 2015-06-23  1:50  4%     ` Helin Zhang
  2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
                         ` (18 more replies)
  18 siblings, 19 replies; 200+ results
From: Helin Zhang @ 2015-06-23  1:50 UTC (permalink / raw)
  To: dev

Currently only 6 bits stored in ol_flags are used to indicate the
packet types. This is not enough, as some NIC hardware can recognize quite
a lot of packet types, e.g. i40e hardware can recognize more than 150 packet
types. Hiding those packet types hides hardware offload capabilities which
could be quite useful for improving performance and for end users.
So a unified packet type is needed to support all possible PMDs. The 16-bit
packet_type field in the mbuf structure can be enlarged to 32 bits and used
for this purpose. In addition, all packet types stored in the ol_flags field
can be deleted entirely, saving 6 bits in ol_flags as a benefit.

The 32 bits of packet_type can be divided into several sub-fields to
carry different packet type information about a packet. The initial design
divides those bits into fields for L2 types, L3 types, L4 types, tunnel
types, inner L2 types, inner L3 types and inner L4 types. All PMDs should
translate the offloaded packet types into these 7 fields of information for
user applications.

To avoid breaking ABI compatibility, all the code changes for the unified
packet type are currently disabled at compile time by default. Users can enable
them manually by defining the RTE_NEXT_ABI macro. The changes will be
enabled by default in a future release, and the old version will be deleted
accordingly, once the ABI change process is complete.

Note that this patch set should be integrated after another patch set for
'[PATCH v3 0/7] support i40e QinQ stripping and insertion', to clearly solve
the conflict during integration. As both patch sets modified 'struct rte_mbuf',
and the final layout of the 'struct rte_mbuf' is key to vectorized ixgbe PMD.

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
* Used redefined packet types and enlarged packet_type field for all PMDs
  and corresponding applications.
* Removed changes in bond and its relevant application, as there is no need
  at all according to the recent bond changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Put vector ixgbe changes right after mbuf changes.
* Disabled vector ixgbe PMD by default, as mbuf layout changed, and then
  re-enabled it after vector ixgbe PMD updated.
* Put the definitions of unified packet type into a single patch.
* Minor bug fixes and enhancements in l3fwd example.

v4 changes:
* Added a detailed description of each packet type.
* Supported unified packet type of fm10k.
* Added logging of the packet type of each received packet for rxonly
  mode in testpmd.
* Removed several useless code lines which block packet type unification from
  app/test/packet_burst_generator.c.

v5 changes:
* Added a more detailed description for each packet type, together with examples.
* Rolled back the macro definitions of RX packet flags, for ABI compatibility.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.
* Integrated with patch set for '[PATCH v3 0/7] support i40e QinQ stripping
  and insertion', to clearly solve the conflicts during merging.

v8 changes:
* Moved the 'vlan_tci_outer' field in 'struct rte_mbuf' to the end of the 1st
  cache line, to avoid breaking any vectorized PMD stores, as the fields
  'packet_type, pkt_len, data_len, vlan_tci, rss' should stay in a contiguous 128
  bits.

Helin Zhang (18):
  mbuf: redefine packet_type in rte_mbuf
  ixgbe: support unified packet type in vectorized PMD
  mbuf: add definitions of unified packet types
  e1000: replace bit mask based packet type with unified packet type
  ixgbe: replace bit mask based packet type with unified packet type
  i40e: replace bit mask based packet type with unified packet type
  enic: replace bit mask based packet type with unified packet type
  vmxnet3: replace bit mask based packet type with unified packet type
  fm10k: replace bit mask based packet type with unified packet type
  app/test-pipeline: replace bit mask based packet type with unified
    packet type
  app/testpmd: replace bit mask based packet type with unified packet
    type
  app/test: Remove useless code
  examples/ip_fragmentation: replace bit mask based packet type with
    unified packet type
  examples/ip_reassembly: replace bit mask based packet type with
    unified packet type
  examples/l3fwd-acl: replace bit mask based packet type with unified
    packet type
  examples/l3fwd-power: replace bit mask based packet type with unified
    packet type
  examples/l3fwd: replace bit mask based packet type with unified packet
    type
  mbuf: remove old packet type bit masks

 app/test-pipeline/pipeline_hash.c                  |  13 +
 app/test-pmd/csumonly.c                            |  14 +
 app/test-pmd/rxonly.c                              | 183 +++++++
 app/test/packet_burst_generator.c                  |   6 +-
 drivers/net/e1000/igb_rxtx.c                       | 102 ++++
 drivers/net/enic/enic_main.c                       |  26 +
 drivers/net/fm10k/fm10k_rxtx.c                     |  27 ++
 drivers/net/i40e/i40e_rxtx.c                       | 528 +++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.c                     | 163 +++++++
 drivers/net/ixgbe/ixgbe_rxtx_vec.c                 |  75 ++-
 drivers/net/vmxnet3/vmxnet3_rxtx.c                 |   8 +
 examples/ip_fragmentation/main.c                   |   9 +
 examples/ip_reassembly/main.c                      |   9 +
 examples/l3fwd-acl/main.c                          |  29 +-
 examples/l3fwd-power/main.c                        |   8 +
 examples/l3fwd/main.c                              | 123 ++++-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |   6 +
 lib/librte_mbuf/rte_mbuf.c                         |   4 +
 lib/librte_mbuf/rte_mbuf.h                         | 517 ++++++++++++++++++++
 19 files changed, 1837 insertions(+), 13 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 08/12] mempool: allow config override on element alignment
  @ 2015-06-23  0:31  3%   ` Ananyev, Konstantin
  2015-06-23 20:43  4%     ` Cyril Chemparathy
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-06-23  0:31 UTC (permalink / raw)
  To: Cyril Chemparathy, dev

Hi Cyril,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Cyril Chemparathy
> Sent: Monday, June 22, 2015 7:59 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 08/12] mempool: allow config override on element alignment
> 
> On TILE-Gx and TILE-Mx platforms, the buffers fed into the hardware
> buffer manager require a 128-byte alignment.  With this change, we
> allow configuration based override of the element alignment, and
> default to RTE_CACHE_LINE_SIZE if left unspecified.
> 
> Change-Id: I9cd789d92b0bc9c8f44a633de59bb04d45d927a7
> Signed-off-by: Cyril Chemparathy <cchemparathy@ezchip.com>
> ---
>  lib/librte_mempool/rte_mempool.c | 16 +++++++++-------
>  lib/librte_mempool/rte_mempool.h |  6 ++++++
>  2 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 002d3a8..7656b0f 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -120,10 +120,10 @@ static unsigned optimize_object_size(unsigned obj_size)
>  		nrank = 1;
> 
>  	/* process new object size */
> -	new_obj_size = (obj_size + RTE_CACHE_LINE_MASK) / RTE_CACHE_LINE_SIZE;
> +	new_obj_size = (obj_size + RTE_MEMPOOL_ALIGN_MASK) / RTE_MEMPOOL_ALIGN;
>  	while (get_gcd(new_obj_size, nrank * nchan) != 1)
>  		new_obj_size++;
> -	return new_obj_size * RTE_CACHE_LINE_SIZE;
> +	return new_obj_size * RTE_MEMPOOL_ALIGN;
>  }
> 
>  static void
> @@ -267,7 +267,7 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>  #endif
>  	if ((flags & MEMPOOL_F_NO_CACHE_ALIGN) == 0)
>  		sz->header_size = RTE_ALIGN_CEIL(sz->header_size,
> -			RTE_CACHE_LINE_SIZE);
> +			RTE_MEMPOOL_ALIGN);
> 
>  	/* trailer contains the cookie in debug mode */
>  	sz->trailer_size = 0;
> @@ -281,9 +281,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>  	if ((flags & MEMPOOL_F_NO_CACHE_ALIGN) == 0) {
>  		sz->total_size = sz->header_size + sz->elt_size +
>  			sz->trailer_size;
> -		sz->trailer_size += ((RTE_CACHE_LINE_SIZE -
> -				  (sz->total_size & RTE_CACHE_LINE_MASK)) &
> -				 RTE_CACHE_LINE_MASK);
> +		sz->trailer_size += ((RTE_MEMPOOL_ALIGN -
> +				  (sz->total_size & RTE_MEMPOOL_ALIGN_MASK)) &
> +				 RTE_MEMPOOL_ALIGN_MASK);
>  	}
> 
>  	/*
> @@ -498,7 +498,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>  	 * cache-aligned
>  	 */
>  	private_data_size = (private_data_size +
> -			     RTE_CACHE_LINE_MASK) & (~RTE_CACHE_LINE_MASK);
> +			     RTE_MEMPOOL_ALIGN_MASK) & (~RTE_MEMPOOL_ALIGN_MASK);
> 
>  	if (! rte_eal_has_hugepages()) {
>  		/*
> @@ -525,6 +525,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>  	 * enough to hold mempool header and metadata plus mempool objects.
>  	 */
>  	mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
> +	mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
>  	if (vaddr == NULL)
>  		mempool_size += (size_t)objsz.total_size * n;
> 
> @@ -580,6 +581,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>  	/* calculate address of the first element for continuous mempool. */
>  	obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
>  		private_data_size;
> +	obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);
> 
>  	/* populate address translation fields. */
>  	mp->pg_num = pg_num;
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 380d60b..9321b86 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -142,6 +142,12 @@ struct rte_mempool_objsz {
>  /** Mempool over one chunk of physically continuous memory */
>  #define	MEMPOOL_PG_NUM_DEFAULT	1
> 
> +#ifndef RTE_MEMPOOL_ALIGN
> +#define RTE_MEMPOOL_ALIGN	RTE_CACHE_LINE_SIZE
> +#endif
> +
> +#define RTE_MEMPOOL_ALIGN_MASK	(RTE_MEMPOOL_ALIGN - 1)

I am probably a bit late with my comments, but why not make it a runtime decision then?
I know we can't add a new parameter to mempool_xmem_create() without ABI breakage,
but we could make it a global variable for now, set up at init time or something similar.

> +
>  /**
>   * Mempool object header structure
>   *
> --
> 2.1.2

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
  2015-06-22 20:16  0%   ` Thomas Monjalon
  2015-06-22 20:23  0%     ` Cyril Chemparathy
@ 2015-06-22 20:34  0%     ` Cyril Chemparathy
  1 sibling, 0 replies; 200+ results
From: Cyril Chemparathy @ 2015-06-22 20:34 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Mon, 22 Jun 2015 22:16:59 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-06-05 15:31, Dumitrescu, Cristian:
> > > Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
> > > meta-data fields.
> > > Forcing aligned accesses is not really required, so this is
> > > removing an unneeded constraint.
> > > This issue was met during testing of the new version of the
> > > ip_pipeline application. There is no performance impact.
> > > This change has no ABI impact, as the previous code that uses
> > > aligned accesses continues to run without any issues.
> > > 
> > > Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
> > 
> > Ack-ed by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> 
> Applied, thanks
> 
> Cyril, feel free to fix it if it breaks with Tile arch.

Also, in the code, doesn't the following break when mbuf_priv_size != 0?

> #define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)          \
>         (&((uint8_t *) &(mbuf)[1])[offset])


Thanks
-- Cyril.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
  2015-06-22 20:16  0%   ` Thomas Monjalon
@ 2015-06-22 20:23  0%     ` Cyril Chemparathy
  2015-06-22 20:34  0%     ` Cyril Chemparathy
  1 sibling, 0 replies; 200+ results
From: Cyril Chemparathy @ 2015-06-22 20:23 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Mon, 22 Jun 2015 22:16:59 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-06-05 15:31, Dumitrescu, Cristian:
> > > Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
> > > meta-data fields.
> > > Forcing aligned accesses is not really required, so this is
> > > removing an unneeded constraint.
> > > This issue was met during testing of the new version of the
> > > ip_pipeline application. There is no performance impact.
> > > This change has no ABI impact, as the previous code that uses
> > > aligned accesses continues to run without any issues.
> > > 
> > > Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
> > 
> > Ack-ed by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> 
> Applied, thanks
> 
> Cyril, feel free to fix it if it breaks with Tile arch.

Why define these locally within rte_port.h? Shouldn't these macros
really be in rte_mbuf.h?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
  2015-06-05 15:31  0% ` Dumitrescu, Cristian
@ 2015-06-22 20:16  0%   ` Thomas Monjalon
  2015-06-22 20:23  0%     ` Cyril Chemparathy
  2015-06-22 20:34  0%     ` Cyril Chemparathy
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2015-06-22 20:16 UTC (permalink / raw)
  To: Mrzyglod, DanielX T; +Cc: dev

2015-06-05 15:31, Dumitrescu, Cristian:
> > Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
> > meta-data fields.
> > Forcing aligned accesses is not really required, so this is removing an
> > unneeded constraint.
> > This issue was met during testing of the new version of the ip_pipeline
> > application. There is no performance impact.
> > This change has no ABI impact, as the previous code that uses aligned
> > accesses continues to run without any issues.
> > 
> > Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
> 
> Ack-ed by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

Applied, thanks

Cyril, feel free to fix it if it breaks with Tile arch.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] DPDK v2.0.0 has different rte_eal_pci_probe() behavior
       [not found]           ` <CAO1kT8_C2QJUrNk-fqOQd=WmOkpvNw5jCvxEhfPdHwyCwBuyKA@mail.gmail.com>
@ 2015-06-22  0:32  4%         ` Matthew Hall
  0 siblings, 0 replies; 200+ results
From: Matthew Hall @ 2015-06-22  0:32 UTC (permalink / raw)
  To: <dev@dpdk.org>

On Jun 21, 2015, at 3:54 PM, Tom Barbette <tom.barbette@ulg.ac.be> wrote:
> Application call to rte_eal_pci_probe() is not needed anymore since DPDK 1.8.
> 
> http://dpdk.org/ml/archives/dev/2014-September/005890.html
> 
> You were not wrong before, it is just a change in DPDK. I came across the same problem a few days ago.
> 
> Tom Barbette

So, we have a good practical example below about ABI compatibility.

The prototype and name of the rte_eal_pci_probe() was kept exactly the same, and it compiled fine with no change, but it fails at runtime because it causes a dual-init of all the PCI devices and hits a resource conflict in the process.

Thus it's important to remember you can break compatibility even if the ABI stays the same, if the APIs themselves don't behave the same over time...

Matthew.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 17:02  8%             ` Thomas Monjalon
@ 2015-06-19 17:57  9%               ` Thomas F Herbert
  0 siblings, 0 replies; 200+ results
From: Thomas F Herbert @ 2015-06-19 17:57 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



On 6/19/15 1:02 PM, Thomas Monjalon wrote:
> 2015-06-19 12:13, Thomas F Herbert:
>>
>> On 6/19/15 9:16 AM, Thomas Monjalon wrote:
>>> 2015-06-19 09:02, Neil Horman:
>>>> On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
>>>>> 2015-06-19 06:26, Neil Horman:
>>>>>> On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
>>>>>>> For the 2.1 release, I think we should agree to make patches that change
>>>>>>> the ABI controllable via a compile-time option. I like Olivier's proposal
>>>>>>> on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
>>>>>>> changes instead of a separate option per patch set (see
>>>>>>> http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
>>>>>>> should rework the affected patch sets to use that approach for 2.1.
>>>>>>
>>>>>> This is a bad idea.  Making ABI dependent on compile time options isn't a
>>>>>> maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
>>>>>> work (that is to say you make it impossible to really tell what ABI version you
>>>>>> are building).
>>>>>
>>>>> The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
>>>>> So one ABI version number refers always to the same ABI.
>>>>>
>>>>>> If you have two compile time options that modify the ABI, you
>>>>>> have to burn through 4 possible LIBABIVER version values to accomodate all
>>>>>> possible combinations, and then you need to remember that when you make them
>>>>>> statically applicable.
>>>>>
>>>>> The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
>>>>>
>>>>> Your intent when introducing ABI policy was to allow smooth porting of
>>>>> applications from a DPDK version to another. Right?
>>>>> The adopted solution was to provide backward compatibility during 1 release.
>>>>> But there are cases where it's not possible. So the policy was to notice
>>>>> the future change and wait one release cycle to break the ABI (failing
>>>>> compatibility goals).
>>>>> The compile-time option may provide an alternative DPDK packaging when the
>>>>> ABI backward compatibility cannot be provided (case of mbuf changes).
>>>>> In such case, it's still possible to upgrade DPDK by providing 2 versions of
>>>>> DPDK libs. So the existing apps continue to link with the previous ABI and
>>>>> have the possibility of migrating to the new one.
>>>>> Another advantage of this approach is that we don't have to wait 1 release
>>>>> to integrate the changes.
>>>>> The last advantage is to benefit early of these changes with static libraries.
>>>>
>>>> Hm, ok, thats a bit more reasonable, but it still seems shaky to me.
>>>> Implementing an ABI preview option like this implies the notion that, after a
>>>> release, you have to remove all the ifdefs that you inserted to create the new
>>>> ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
>>>> large, and is predicated on the centralization of work effort (that is to say
>>>> you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
>>>> from the build' patch every release.
>>>
>>> It won't be so huge if we reserve the NEXT_ABI solution to changes which cannot
>>> have easy backward compatibility with the compat macros you introduced.
>>> I feel I can do the job of removing the ifdefs NEXT_ABI after each release.
>>> At the same time, the deprecated API, using the compat macros, will be removed.
>>>
>>>> What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
>>>> the sort that Thomas Herbert proposed a few weeks ago).
>>>
>>> This tree was created after Thomas' request:
>>> 	http://dpdk.org/browse/next/dpdk-next/
>>
>> Thomas, I am sorry if I went quiet for awhile but I was on personal
>> travel with inconsistent access so I almost missed most of this
>> discussion about ABI changes.
>>
>> My understanding of the purpose of the dpdk-next tree is to validate
>> patches by applying and compiling against a "pull" from the main dpdk
>> tree. I think a good way to handle ABI change while effectively using
>> the dpdk-next might be to do as follows:
>>
>> Create a specific branch for the new ABI such as 2.X in the main dpdk
>> tree. Once that 2.X branch is created, dpdk-next would mirror the 2.X
>> branch along with master.
>>
>> Since, dpdk-next would also have the 2.X branch that is in the main dpdk
>> tree, submitted patches could be applied to either the main branch or
>> the new-ABI 2.X branch. Providing that patch submitters make it clear
>> whether a submitted patch is for the new ABI or the old ABI, dpdk-next
>> could continue to validate the patches for either the main branch or the
>> new ABI 2.X branch.
>
> What is the benefit of a new-ABI branch in the -next tree?
I don't think that there is any specific benefit to a new-ABI branch in 
the dpdk-next tree. I was responding to the suggestion above and perhaps 
I misread it. It sounded like what was being proposed was to use the 
dpdk-next tree specifically for pre-integration of the new ABI. I don't 
think this is of any benefit either.

However, if it is decided to integrate new-ABI patches in a branch 
of dpdk rather than in a separate new-ABI tree, then dpdk-next can 
"mirror" that branch along with the master branch so patches can be 
smoke tested whether they are submitted to the master or to the new-ABI 
branch.
>
> The goal of this discussion is to find a consensus on ABI policy to
> smoothly integrate new features without forcing users of shared libraries
> to re-build their application when upgrading DPDK, and let them do the
> transition before the next upgrade.
I understand this and I think it is a good suggestion to have a 
mechanism to ease the transition.
>

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH v3 8/9] doc: announce ABI change of librte_malloc
  2015-06-19 17:21  4%   ` [dpdk-dev] [PATCH v3 0/9] Dynamic memzone Sergio Gonzalez Monroy
  2015-06-19 17:21  1%     ` [dpdk-dev] [PATCH v3 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
@ 2015-06-19 17:21 14%     ` Sergio Gonzalez Monroy
  1 sibling, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-19 17:21 UTC (permalink / raw)
  To: dev

Announce the creation of a dummy malloc library for the 2.1 release, and its
removal in the 2.2 release now that malloc is integrated in librte_eal.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..2aaf900 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* librte_malloc library has been integrated into librte_eal. The 2.1 release creates a dummy/empty malloc library to fulfill binaries with dynamic linking dependencies on librte_malloc.so. Such dummy library will not be created from release 2.2 so binaries will need to be rebuilt.
-- 
1.9.3

^ permalink raw reply	[relevance 14%]

* [dpdk-dev] [PATCH v3 0/9] Dynamic memzone
    2015-06-06 10:32  1%   ` [dpdk-dev] [PATCH v2 2/7] eal: memzone allocated by malloc Sergio Gonzalez Monroy
@ 2015-06-19 17:21  4%   ` Sergio Gonzalez Monroy
  2015-06-19 17:21  1%     ` [dpdk-dev] [PATCH v3 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
  2015-06-19 17:21 14%     ` [dpdk-dev] [PATCH v3 8/9] doc: announce ABI change of librte_malloc Sergio Gonzalez Monroy
  2015-06-25 14:05  4%   ` [dpdk-dev] [PATCH v4 0/9] Dynamic memzone Sergio Gonzalez Monroy
  2015-06-26 11:32  4%   ` [dpdk-dev] [PATCH v5 0/9] Dynamic memzones Sergio Gonzalez Monroy
  3 siblings, 2 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-19 17:21 UTC (permalink / raw)
  To: dev

The current implementation allows reserving/creating memzones but not the opposite
(unreserving/freeing). This affects mempools and other memzone-based objects.

From my point of view, implementing free functionality for memzones would look
like malloc over memsegs.
Thus, this approach moves malloc inside the EAL (which in turn removes a circular
dependency), where malloc heaps are composed of memsegs.
We keep both the malloc and memzone APIs as they are, but memzones allocate their
memory by calling malloc_heap_alloc.
Some extra functionality is required in malloc to allow for boundary-constrained
memory requests.
In summary, malloc is currently based on memzones, and with this approach
memzones are based on malloc.

v3:
 - Create dummy librte_malloc
 - Add deprecation notice
 - Rework some of the code
 - Doc update
 - checkpatch

v2:
 - New rte_memzone_free
 - Support memzone len = 0
 - Add all available memsegs to malloc heap at init
 - Update memzone/malloc unit tests

Sergio Gonzalez Monroy (9):
  eal: move librte_malloc to eal/common
  eal: memzone allocated by malloc
  app/test: update malloc/memzone unit tests
  config: remove CONFIG_RTE_MALLOC_MEMZONE_SIZE
  eal: remove free_memseg and references to it
  eal: new rte_memzone_free
  app/test: update unit test with rte_memzone_free
  doc: announce ABI change of librte_malloc
  doc: update malloc documentation

 MAINTAINERS                                       |   9 +-
 app/test/test_malloc.c                            |  86 -----
 app/test/test_memzone.c                           | 441 +++-------------------
 config/common_bsdapp                              |   8 +-
 config/common_linuxapp                            |   8 +-
 doc/guides/prog_guide/env_abstraction_layer.rst   | 220 ++++++++++-
 doc/guides/prog_guide/img/malloc_heap.png         | Bin 81329 -> 80952 bytes
 doc/guides/prog_guide/index.rst                   |   1 -
 doc/guides/prog_guide/malloc_lib.rst              | 233 ------------
 doc/guides/prog_guide/overview.rst                |  11 +-
 doc/guides/rel_notes/abi.rst                      |   1 +
 drivers/net/af_packet/Makefile                    |   1 -
 drivers/net/bonding/Makefile                      |   1 -
 drivers/net/e1000/Makefile                        |   2 +-
 drivers/net/enic/Makefile                         |   2 +-
 drivers/net/fm10k/Makefile                        |   2 +-
 drivers/net/i40e/Makefile                         |   2 +-
 drivers/net/ixgbe/Makefile                        |   2 +-
 drivers/net/mlx4/Makefile                         |   1 -
 drivers/net/null/Makefile                         |   1 -
 drivers/net/pcap/Makefile                         |   1 -
 drivers/net/virtio/Makefile                       |   2 +-
 drivers/net/vmxnet3/Makefile                      |   2 +-
 drivers/net/xenvirt/Makefile                      |   2 +-
 lib/Makefile                                      |   2 +-
 lib/librte_acl/Makefile                           |   2 +-
 lib/librte_eal/bsdapp/eal/Makefile                |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map     |  19 +
 lib/librte_eal/common/Makefile                    |   1 +
 lib/librte_eal/common/eal_common_memzone.c        | 329 ++++++----------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   5 +-
 lib/librte_eal/common/include/rte_malloc.h        | 342 +++++++++++++++++
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/include/rte_memzone.h       |  11 +
 lib/librte_eal/common/malloc_elem.c               | 344 +++++++++++++++++
 lib/librte_eal/common/malloc_elem.h               | 192 ++++++++++
 lib/librte_eal/common/malloc_heap.c               | 206 ++++++++++
 lib/librte_eal/common/malloc_heap.h               |  70 ++++
 lib/librte_eal/common/rte_malloc.c                | 259 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile              |   4 +-
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c         |  17 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map   |  19 +
 lib/librte_hash/Makefile                          |   2 +-
 lib/librte_lpm/Makefile                           |   2 +-
 lib/librte_malloc/Makefile                        |   6 +-
 lib/librte_malloc/malloc_elem.c                   | 320 ----------------
 lib/librte_malloc/malloc_elem.h                   | 190 ----------
 lib/librte_malloc/malloc_heap.c                   | 209 ----------
 lib/librte_malloc/malloc_heap.h                   |  70 ----
 lib/librte_malloc/rte_malloc.c                    | 228 +----------
 lib/librte_malloc/rte_malloc.h                    | 342 -----------------
 lib/librte_malloc/rte_malloc_version.map          |  16 -
 lib/librte_mempool/Makefile                       |   2 -
 lib/librte_port/Makefile                          |   1 -
 lib/librte_ring/Makefile                          |   3 +-
 lib/librte_table/Makefile                         |   1 -
 56 files changed, 1897 insertions(+), 2363 deletions(-)
 delete mode 100644 doc/guides/prog_guide/malloc_lib.rst
 create mode 100644 lib/librte_eal/common/include/rte_malloc.h
 create mode 100644 lib/librte_eal/common/malloc_elem.c
 create mode 100644 lib/librte_eal/common/malloc_elem.h
 create mode 100644 lib/librte_eal/common/malloc_heap.c
 create mode 100644 lib/librte_eal/common/malloc_heap.h
 create mode 100644 lib/librte_eal/common/rte_malloc.c
 delete mode 100644 lib/librte_malloc/malloc_elem.c
 delete mode 100644 lib/librte_malloc/malloc_elem.h
 delete mode 100644 lib/librte_malloc/malloc_heap.c
 delete mode 100644 lib/librte_malloc/malloc_heap.h
 delete mode 100644 lib/librte_malloc/rte_malloc.h

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 2/9] eal: memzone allocated by malloc
  2015-06-19 17:21  4%   ` [dpdk-dev] [PATCH v3 0/9] Dynamic memzone Sergio Gonzalez Monroy
@ 2015-06-19 17:21  1%     ` Sergio Gonzalez Monroy
  2015-06-19 17:21 14%     ` [dpdk-dev] [PATCH v3 8/9] doc: announce ABI change of librte_malloc Sergio Gonzalez Monroy
  1 sibling, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-19 17:21 UTC (permalink / raw)
  To: dev

In the current memory hierarchy, memsegs are groups of physically
contiguous hugepages, memzones are slices of memsegs, and malloc further
slices memzones into smaller memory chunks.

This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones call malloc internally for memory allocation while
maintaining their ABI.

It then becomes possible to free memzones, and therefore any other structure
based on memzones, i.e. mempools.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c        | 274 ++++++----------------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   2 +-
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/malloc_elem.c               |  68 ++++--
 lib/librte_eal/common/malloc_elem.h               |  14 +-
 lib/librte_eal/common/malloc_heap.c               | 139 ++++++-----
 lib/librte_eal/common/malloc_heap.h               |   6 +-
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 8 files changed, 196 insertions(+), 317 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 888f9e5..943012b 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -50,15 +50,15 @@
 #include <rte_string_fns.h>
 #include <rte_common.h>
 
+#include "malloc_heap.h"
+#include "malloc_elem.h"
 #include "eal_private.h"
 
-/* internal copy of free memory segments */
-static struct rte_memseg *free_memseg = NULL;
-
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
 	const struct rte_mem_config *mcfg;
+	const struct rte_memzone *mz;
 	unsigned i = 0;
 
 	/* get pointer to global configuration */
@@ -68,8 +68,9 @@ memzone_lookup_thread_unsafe(const char *name)
 	 * the algorithm is not optimal (linear), but there are few
 	 * zones and this function should be called at init only
 	 */
-	for (i = 0; i < RTE_MAX_MEMZONE && mcfg->memzone[i].addr != NULL; i++) {
-		if (!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
+	for (i = 0; i < RTE_MAX_MEMZONE; i++) {
+		mz = &mcfg->memzone[i];
+		if (mz->addr != NULL && !strncmp(name, mz->name, RTE_MEMZONE_NAMESIZE))
 			return &mcfg->memzone[i];
 	}
 
@@ -88,39 +89,45 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 			len, socket_id, flags, RTE_CACHE_LINE_SIZE);
 }
 
-/*
- * Helper function for memzone_reserve_aligned_thread_unsafe().
- * Calculate address offset from the start of the segment.
- * Align offset in that way that it satisfy istart alignmnet and
- * buffer of the  requested length would not cross specified boundary.
- */
-static inline phys_addr_t
-align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align,
-	size_t bound)
+/* Find the heap with the greatest free block size */
+static void
+find_heap_max_free_elem(int *s, size_t *len, unsigned align)
 {
-	phys_addr_t addr_offset, bmask, end, start;
-	size_t step;
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-	step = RTE_MAX(align, bound);
-	bmask = ~((phys_addr_t)bound - 1);
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* calculate offset to closest alignment */
-	start = RTE_ALIGN_CEIL(ms->phys_addr, align);
-	addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size > *len) {
+			*len = stats.greatest_free_size;
+			*s = i;
+		}
+	}
+	*len -= (MALLOC_ELEM_OVERHEAD + align);
+}
 
-	while (addr_offset + len < ms->len) {
+/* Find a heap that can allocate the requested size */
+static void
+find_heap_suitable(int *s, size_t len, unsigned align)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-		/* check, do we meet boundary condition */
-		end = start + len - (len != 0);
-		if ((start & bmask) == (end & bmask))
-			break;
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-		/* calculate next offset */
-		start = RTE_ALIGN_CEIL(start + 1, step);
-		addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size >= len + MALLOC_ELEM_OVERHEAD + align) {
+			*s = i;
+			break;
+		}
 	}
-
-	return (addr_offset);
 }
 
 static const struct rte_memzone *
@@ -128,13 +135,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
 	struct rte_mem_config *mcfg;
-	unsigned i = 0;
-	int memseg_idx = -1;
-	uint64_t addr_offset, seg_offset = 0;
 	size_t requested_len;
-	size_t memseg_len = 0;
-	phys_addr_t memseg_physaddr;
-	void *memseg_addr;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -166,7 +167,6 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (align < RTE_CACHE_LINE_SIZE)
 		align = RTE_CACHE_LINE_SIZE;
 
-
 	/* align length on cache boundary. Check for overflow before doing so */
 	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
 		rte_errno = EINVAL; /* requested size too big */
@@ -180,129 +180,50 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	requested_len = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE,  len);
 
 	/* check that boundary condition is valid */
-	if (bound != 0 &&
-			(requested_len > bound || !rte_is_power_of_2(bound))) {
+	if (bound != 0 && (requested_len > bound || !rte_is_power_of_2(bound))) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	/* find the smallest segment matching requirements */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		/* last segment */
-		if (free_memseg[i].addr == NULL)
-			break;
+	if (len == 0) {
+		if (bound != 0)
+			requested_len = bound;
+		else
+			requested_len = 0;
+	}
 
-		/* empty segment, skip it */
-		if (free_memseg[i].len == 0)
-			continue;
-
-		/* bad socket ID */
-		if (socket_id != SOCKET_ID_ANY &&
-		    free_memseg[i].socket_id != SOCKET_ID_ANY &&
-		    socket_id != free_memseg[i].socket_id)
-			continue;
-
-		/*
-		 * calculate offset to closest alignment that
-		 * meets boundary conditions.
-		 */
-		addr_offset = align_phys_boundary(free_memseg + i,
-			requested_len, align, bound);
-
-		/* check len */
-		if ((requested_len + addr_offset) > free_memseg[i].len)
-			continue;
-
-		/* check flags for hugepage sizes */
-		if ((flags & RTE_MEMZONE_2MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_1G)
-			continue;
-		if ((flags & RTE_MEMZONE_1GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_2M)
-			continue;
-		if ((flags & RTE_MEMZONE_16MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16G)
-			continue;
-		if ((flags & RTE_MEMZONE_16GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16M)
-			continue;
-
-		/* this segment is the best until now */
-		if (memseg_idx == -1) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-		/* find the biggest contiguous zone */
-		else if (len == 0) {
-			if (free_memseg[i].len > memseg_len) {
-				memseg_idx = i;
-				memseg_len = free_memseg[i].len;
-				seg_offset = addr_offset;
-			}
-		}
-		/*
-		 * find the smallest (we already checked that current
-		 * zone length is > len
-		 */
-		else if (free_memseg[i].len + align < memseg_len ||
-				(free_memseg[i].len <= memseg_len + align &&
-				addr_offset < seg_offset)) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
+	if (socket_id == SOCKET_ID_ANY) {
+		if (requested_len == 0)
+			find_heap_max_free_elem(&socket_id, &requested_len, align);
+		else
+			find_heap_suitable(&socket_id, requested_len, align);
+
+		if (socket_id == SOCKET_ID_ANY) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
 	}
 
-	/* no segment found */
-	if (memseg_idx == -1) {
-		/*
-		 * If RTE_MEMZONE_SIZE_HINT_ONLY flag is specified,
-		 * try allocating again without the size parameter otherwise -fail.
-		 */
-		if ((flags & RTE_MEMZONE_SIZE_HINT_ONLY)  &&
-		    ((flags & RTE_MEMZONE_1GB) || (flags & RTE_MEMZONE_2MB)
-		|| (flags & RTE_MEMZONE_16MB) || (flags & RTE_MEMZONE_16GB)))
-			return memzone_reserve_aligned_thread_unsafe(name,
-				len, socket_id, 0, align, bound);
-
+	/* allocate memory on heap */
+	void *mz_addr = malloc_heap_alloc(&mcfg->malloc_heaps[socket_id], NULL,
+			requested_len, flags, align, bound);
+	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
 
-	/* save aligned physical and virtual addresses */
-	memseg_physaddr = free_memseg[memseg_idx].phys_addr + seg_offset;
-	memseg_addr = RTE_PTR_ADD(free_memseg[memseg_idx].addr,
-			(uintptr_t) seg_offset);
-
-	/* if we are looking for a biggest memzone */
-	if (len == 0) {
-		if (bound == 0)
-			requested_len = memseg_len - seg_offset;
-		else
-			requested_len = RTE_ALIGN_CEIL(memseg_physaddr + 1,
-				bound) - memseg_physaddr;
-	}
-
-	/* set length to correct value */
-	len = (size_t)seg_offset + requested_len;
-
-	/* update our internal state */
-	free_memseg[memseg_idx].len -= len;
-	free_memseg[memseg_idx].phys_addr += len;
-	free_memseg[memseg_idx].addr =
-		(char *)free_memseg[memseg_idx].addr + len;
+	const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
 
 	/* fill the zone in config */
 	struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
-	mz->phys_addr = memseg_physaddr;
-	mz->addr = memseg_addr;
-	mz->len = requested_len;
-	mz->hugepage_sz = free_memseg[memseg_idx].hugepage_sz;
-	mz->socket_id = free_memseg[memseg_idx].socket_id;
+	mz->phys_addr = rte_malloc_virt2phy(mz_addr);
+	mz->addr = mz_addr;
+	mz->len = (requested_len == 0 ? elem->size : requested_len);
+	mz->hugepage_sz = elem->ms->hugepage_sz;
+	mz->socket_id = elem->ms->socket_id;
 	mz->flags = 0;
-	mz->memseg_id = memseg_idx;
+	mz->memseg_id = elem->ms - rte_eal_get_configuration()->mem_config->memseg;
 
 	return mz;
 }
@@ -419,45 +340,6 @@ rte_memzone_dump(FILE *f)
 }
 
 /*
- * called by init: modify the free memseg list to have cache-aligned
- * addresses and cache-aligned lengths
- */
-static int
-memseg_sanitize(struct rte_memseg *memseg)
-{
-	unsigned phys_align;
-	unsigned virt_align;
-	unsigned off;
-
-	phys_align = memseg->phys_addr & RTE_CACHE_LINE_MASK;
-	virt_align = (unsigned long)memseg->addr & RTE_CACHE_LINE_MASK;
-
-	/*
-	 * sanity check: phys_addr and addr must have the same
-	 * alignment
-	 */
-	if (phys_align != virt_align)
-		return -1;
-
-	/* memseg is really too small, don't bother with it */
-	if (memseg->len < (2 * RTE_CACHE_LINE_SIZE)) {
-		memseg->len = 0;
-		return 0;
-	}
-
-	/* align start address */
-	off = (RTE_CACHE_LINE_SIZE - phys_align) & RTE_CACHE_LINE_MASK;
-	memseg->phys_addr += off;
-	memseg->addr = (char *)memseg->addr + off;
-	memseg->len -= off;
-
-	/* align end address */
-	memseg->len &= ~((uint64_t)RTE_CACHE_LINE_MASK);
-
-	return 0;
-}
-
-/*
  * Init the memzone subsystem
  */
 int
@@ -465,14 +347,10 @@ rte_eal_memzone_init(void)
 {
 	struct rte_mem_config *mcfg;
 	const struct rte_memseg *memseg;
-	unsigned i = 0;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* mirror the runtime memsegs from config */
-	free_memseg = mcfg->free_memseg;
-
 	/* secondary processes don't need to initialise anything */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return 0;
@@ -485,33 +363,13 @@ rte_eal_memzone_init(void)
 
 	rte_rwlock_write_lock(&mcfg->mlock);
 
-	/* fill in uninitialized free_memsegs */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (memseg[i].addr == NULL)
-			break;
-		if (free_memseg[i].addr != NULL)
-			continue;
-		memcpy(&free_memseg[i], &memseg[i], sizeof(struct rte_memseg));
-	}
-
-	/* make all zones cache-aligned */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (free_memseg[i].addr == NULL)
-			break;
-		if (memseg_sanitize(&free_memseg[i]) < 0) {
-			RTE_LOG(ERR, EAL, "%s(): Sanity check failed\n", __func__);
-			rte_rwlock_write_unlock(&mcfg->mlock);
-			return -1;
-		}
-	}
-
 	/* delete all zones */
 	mcfg->memzone_idx = 0;
 	memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
 
 	rte_rwlock_write_unlock(&mcfg->mlock);
 
-	return 0;
+	return rte_eal_malloc_heap_init();
 }
 
 /* Walk all reserved memory zones */
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 34f5abc..055212a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -73,7 +73,7 @@ struct rte_mem_config {
 	struct rte_memseg memseg[RTE_MAX_MEMSEG];    /**< Physmem descriptors. */
 	struct rte_memzone memzone[RTE_MAX_MEMZONE]; /**< Memzone descriptors. */
 
-	/* Runtime Physmem descriptors. */
+	/* Runtime Physmem descriptors - NOT USED */
 	struct rte_memseg free_memseg[RTE_MAX_MEMSEG];
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index 716216f..b270356 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -40,7 +40,7 @@
 #include <rte_memory.h>
 
 /* Number of free lists per heap, grouped by size. */
-#define RTE_HEAP_NUM_FREELISTS  5
+#define RTE_HEAP_NUM_FREELISTS  13
 
 /**
  * Structure to hold malloc heap
@@ -48,7 +48,6 @@
 struct malloc_heap {
 	rte_spinlock_t lock;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
-	unsigned mz_count;
 	unsigned alloc_count;
 	size_t total_size;
 } __rte_cache_aligned;
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index a5e1248..b54ee33 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -37,7 +37,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
 #include <rte_per_lcore.h>
@@ -56,10 +55,10 @@
  */
 void
 malloc_elem_init(struct malloc_elem *elem,
-		struct malloc_heap *heap, const struct rte_memzone *mz, size_t size)
+		struct malloc_heap *heap, const struct rte_memseg *ms, size_t size)
 {
 	elem->heap = heap;
-	elem->mz = mz;
+	elem->ms = ms;
 	elem->prev = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
@@ -70,12 +69,12 @@ malloc_elem_init(struct malloc_elem *elem,
 }
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
 {
-	malloc_elem_init(elem, prev->heap, prev->mz, 0);
+	malloc_elem_init(elem, prev->heap, prev->ms, 0);
 	elem->prev = prev;
 	elem->state = ELEM_BUSY; /* mark busy so its never merged */
 }
@@ -86,12 +85,24 @@ malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
  * fit, return NULL.
  */
 static void *
-elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
+elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	const uintptr_t end_pt = (uintptr_t)elem +
+	const size_t bmask = ~(bound - 1);
+	uintptr_t end_pt = (uintptr_t)elem +
 			elem->size - MALLOC_ELEM_TRAILER_LEN;
-	const uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
-	const uintptr_t new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
+	uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+	uintptr_t new_elem_start;
+
+	/* check boundary */
+	if ((new_data_start & bmask) != ((end_pt - 1) & bmask)) {
+		end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
+		new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+		if (((end_pt - 1) & bmask) != (new_data_start & bmask))
+			return NULL;
+	}
+
+	new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
 
 	/* if the new start point is before the exist start, it won't fit */
 	return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
@@ -102,9 +113,10 @@ elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
  * alignment request from the current element
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,	unsigned align,
+		size_t bound)
 {
-	return elem_start_pt(elem, size, align) != NULL;
+	return elem_start_pt(elem, size, align, bound) != NULL;
 }
 
 /*
@@ -115,10 +127,10 @@ static void
 split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 {
 	struct malloc_elem *next_elem = RTE_PTR_ADD(elem, elem->size);
-	const unsigned old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
-	const unsigned new_elem_size = elem->size - old_elem_size;
+	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
+	const size_t new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->mz, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->ms, new_elem_size);
 	split_pt->prev = elem;
 	next_elem->prev = split_pt;
 	elem->size = old_elem_size;
@@ -168,8 +180,9 @@ malloc_elem_free_list_index(size_t size)
 void
 malloc_elem_free_list_insert(struct malloc_elem *elem)
 {
-	size_t idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
+	size_t idx;
 
+	idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
 	elem->state = ELEM_FREE;
 	LIST_INSERT_HEAD(&elem->heap->free_head[idx], elem, free_list);
 }
@@ -190,12 +203,26 @@ elem_free_list_remove(struct malloc_elem *elem)
  * is not done here, as it's done there previously.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	struct malloc_elem *new_elem = elem_start_pt(elem, size, align);
-	const unsigned old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
+	const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	const size_t trailer_size = elem->size - old_elem_size - size -
+		MALLOC_ELEM_OVERHEAD;
+
+	elem_free_list_remove(elem);
 
-	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE){
+	if (trailer_size > MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+		/* split it, too much free space after elem */
+		struct malloc_elem *new_free_elem =
+				RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);
+
+		split_elem(elem, new_free_elem);
+		malloc_elem_free_list_insert(new_free_elem);
+	}
+
+	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
 		/* don't split it, pad the element instead */
 		elem->state = ELEM_BUSY;
 		elem->pad = old_elem_size;
@@ -208,8 +235,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 			new_elem->size = elem->size - elem->pad;
 			set_header(new_elem);
 		}
-		/* remove element from free list */
-		elem_free_list_remove(elem);
 
 		return new_elem;
 	}
@@ -219,7 +244,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 	 * Re-insert original element, in case its new size makes it
 	 * belong on a different list.
 	 */
-	elem_free_list_remove(elem);
 	split_elem(elem, new_elem);
 	new_elem->state = ELEM_BUSY;
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index 9790b1a..e05d2ea 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -47,9 +47,9 @@ enum elem_state {
 
 struct malloc_elem {
 	struct malloc_heap *heap;
-	struct malloc_elem *volatile prev;      /* points to prev elem in memzone */
+	struct malloc_elem *volatile prev;      /* points to prev elem in memseg */
 	LIST_ENTRY(malloc_elem) free_list;      /* list of free elements in heap */
-	const struct rte_memzone *mz;
+	const struct rte_memseg *ms;
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
@@ -136,11 +136,11 @@ malloc_elem_from_data(const void *data)
 void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
-		const struct rte_memzone *mz,
+		const struct rte_memseg *ms,
 		size_t size);
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem,
@@ -151,14 +151,16 @@ malloc_elem_mkend(struct malloc_elem *elem,
  * of the requested size and with the requested alignment
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * reserve a block of data in an existing malloc_elem. If the malloc_elem
  * is much larger than the data block requested, we split the element in two.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_alloc(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * free a malloc_elem block by adding it to the free list. If the
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index defb903..f5fff96 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -39,7 +39,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_launch.h>
@@ -54,123 +53,104 @@
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
-/* since the memzone size starts with a digit, it will appear unquoted in
- * rte_config.h, so quote it so it can be passed to rte_str_to_size */
-#define MALLOC_MEMZONE_SIZE RTE_STR(RTE_MALLOC_MEMZONE_SIZE)
-
-/*
- * returns the configuration setting for the memzone size as a size_t value
- */
-static inline size_t
-get_malloc_memzone_size(void)
+static unsigned
+check_hugepage_sz(unsigned flags, size_t hugepage_sz)
 {
-	return rte_str_to_size(MALLOC_MEMZONE_SIZE);
+	unsigned ret = 1;
+
+	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16MB) && hugepage_sz == RTE_PGSIZE_16G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16GB) && hugepage_sz == RTE_PGSIZE_16M)
+		ret = 0;
+
+	return ret;
 }
 
 /*
- * reserve an extra memory zone and make it available for use by a particular
- * heap. This reserves the zone and sets a dummy malloc_elem header at the end
+ * Expand the heap with a memseg.
+ * This reserves the zone and sets a dummy malloc_elem header at the end
  * to prevent overflow. The rest of the zone is added to free list as a single
  * large free block
  */
-static int
-malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
+static void
+malloc_heap_add_memseg(struct malloc_heap *heap, struct rte_memseg *ms)
 {
-	const unsigned mz_flags = 0;
-	const size_t block_size = get_malloc_memzone_size();
-	/* ensure the data we want to allocate will fit in the memzone */
-	const size_t min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
-	const struct rte_memzone *mz = NULL;
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	unsigned numa_socket = heap - mcfg->malloc_heaps;
-
-	size_t mz_size = min_size;
-	if (mz_size < block_size)
-		mz_size = block_size;
-
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	snprintf(mz_name, sizeof(mz_name), "MALLOC_S%u_HEAP_%u",
-		     numa_socket, heap->mz_count++);
-
-	/* try getting a block. if we fail and we don't need as big a block
-	 * as given in the config, we can shrink our request and try again
-	 */
-	do {
-		mz = rte_memzone_reserve(mz_name, mz_size, numa_socket,
-					 mz_flags);
-		if (mz == NULL)
-			mz_size /= 2;
-	} while (mz == NULL && mz_size > min_size);
-	if (mz == NULL)
-		return -1;
-
 	/* allocate the memory block headers, one at end, one at start */
-	struct malloc_elem *start_elem = (struct malloc_elem *)mz->addr;
-	struct malloc_elem *end_elem = RTE_PTR_ADD(mz->addr,
-			mz_size - MALLOC_ELEM_OVERHEAD);
+	struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
+	struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
+			ms->len - MALLOC_ELEM_OVERHEAD);
 	end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
+	const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
 
-	const unsigned elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
-	malloc_elem_init(start_elem, heap, mz, elem_size);
+	malloc_elem_init(start_elem, heap, ms, elem_size);
 	malloc_elem_mkend(end_elem, start_elem);
 	malloc_elem_free_list_insert(start_elem);
 
-	/* increase heap total size by size of new memzone */
-	heap->total_size+=mz_size - MALLOC_ELEM_OVERHEAD;
-	return 0;
+	heap->total_size += elem_size;
 }
 
 /*
  * Iterates through the freelist for a heap to find a free element
  * which can store data of the required size and with the requested alignment.
+ * If size is 0, find the biggest available elem.
  * Returns null on failure, or pointer to element on success.
  */
 static struct malloc_elem *
-find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
+find_suitable_element(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
 	size_t idx;
-	struct malloc_elem *elem;
+	struct malloc_elem *elem, *alt_elem = NULL;
 
 	for (idx = malloc_elem_free_list_index(size);
-		idx < RTE_HEAP_NUM_FREELISTS; idx++)
-	{
+			idx < RTE_HEAP_NUM_FREELISTS; idx++) {
 		for (elem = LIST_FIRST(&heap->free_head[idx]);
-			!!elem; elem = LIST_NEXT(elem, free_list))
-		{
-			if (malloc_elem_can_hold(elem, size, align))
-				return elem;
+				!!elem; elem = LIST_NEXT(elem, free_list)) {
+			if (malloc_elem_can_hold(elem, size, align, bound)) {
+				if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
+					return elem;
+				alt_elem = elem;
+			}
 		}
 	}
+
+	if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		return alt_elem;
+
 	return NULL;
 }
 
 /*
- * Main function called by malloc to allocate a block of memory from the
- * heap. It locks the free list, scans it, and adds a new memzone if the
- * scan fails. Once the new memzone is added, it re-scans and should return
+ * Main function to allocate a block of memory from the heap.
+ * It locks the free list, scans it, and adds a new memseg if the
+ * scan fails. Once the new memseg is added, it re-scans and should return
  * the new element after releasing the lock.
  */
 void *
 malloc_heap_alloc(struct malloc_heap *heap,
-		const char *type __attribute__((unused)), size_t size, unsigned align)
+		const char *type __attribute__((unused)), size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
+	struct malloc_elem *elem;
+
 	size = RTE_CACHE_LINE_ROUNDUP(size);
 	align = RTE_CACHE_LINE_ROUNDUP(align);
+
 	rte_spinlock_lock(&heap->lock);
-	struct malloc_elem *elem = find_suitable_element(heap, size, align);
-	if (elem == NULL){
-		if ((malloc_heap_add_memzone(heap, size, align)) == 0)
-			elem = find_suitable_element(heap, size, align);
-	}
 
-	if (elem != NULL){
-		elem = malloc_elem_alloc(elem, size, align);
+	elem = find_suitable_element(heap, size, flags, align, bound);
+	if (elem != NULL) {
+		elem = malloc_elem_alloc(elem, size, align, bound);
 		/* increase heap's count of allocated elements */
 		heap->alloc_count++;
 	}
 	rte_spinlock_unlock(&heap->lock);
-	return elem == NULL ? NULL : (void *)(&elem[1]);
 
+	return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
 /*
@@ -207,3 +187,20 @@ malloc_heap_get_stats(const struct malloc_heap *heap,
 	return 0;
 }
 
+int
+rte_eal_malloc_heap_init(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned ms_cnt;
+	struct rte_memseg *ms;
+
+	if (mcfg == NULL)
+		return -1;
+
+	for (ms = &mcfg->memseg[0], ms_cnt = 0;
+			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
+			ms_cnt++, ms++)
+		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+
+	return 0;
+}
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a47136d..3ccbef0 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -53,15 +53,15 @@ malloc_get_numa_socket(void)
 }
 
 void *
-malloc_heap_alloc(struct malloc_heap *heap, const char *type,
-		size_t size, unsigned align);
+malloc_heap_alloc(struct malloc_heap *heap,	const char *type, size_t size,
+		unsigned flags, size_t align, size_t bound);
 
 int
 malloc_heap_get_stats(const struct malloc_heap *heap,
 		struct rte_malloc_socket_stats *socket_stats);
 
 int
-rte_eal_heap_memzone_init(void);
+rte_eal_malloc_heap_init(void);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index c313a57..54c2bd8 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -39,7 +39,6 @@
 
 #include <rte_memcpy.h>
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_branch_prediction.h>
@@ -87,7 +86,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 		return NULL;
 
 	ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
-				size, align == 0 ? 1 : align);
+				size, 0, align == 0 ? 1 : align, 0);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
@@ -98,7 +97,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 			continue;
 
 		ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
-					size, align == 0 ? 1 : align);
+					size, 0, align == 0 ? 1 : align, 0);
 		if (ret != NULL)
 			return ret;
 	}
@@ -256,5 +255,5 @@ rte_malloc_virt2phy(const void *addr)
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->mz->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->mz->addr);
+	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
-- 
1.9.3


* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 16:13  9%           ` Thomas F Herbert
@ 2015-06-19 17:02  8%             ` Thomas Monjalon
  2015-06-19 17:57  9%               ` Thomas F Herbert
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-19 17:02 UTC (permalink / raw)
  To: Thomas F Herbert; +Cc: dev

2015-06-19 12:13, Thomas F Herbert:
> 
> On 6/19/15 9:16 AM, Thomas Monjalon wrote:
> > 2015-06-19 09:02, Neil Horman:
> >> On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
> >>> 2015-06-19 06:26, Neil Horman:
> >>>> On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> >>>>> For the 2.1 release, I think we should agree to make patches that change
> >>>>> the ABI controllable via a compile-time option. I like Olivier's proposal
> >>>>> on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
> >>>>> changes instead of a separate option per patch set (see
> >>>>> http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
> >>>>> should rework the affected patch sets to use that approach for 2.1.
> >>>>
> >>>> This is a bad idea.  Making ABI dependent on compile time options isn't a
> >>>> maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
> >>>> work (that is to say you make it impossible to really tell what ABI version you
> >>>> are building).
> >>>
> >>> The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
> >>> So one ABI version number refers always to the same ABI.
> >>>
> >>>> If you have two compile time options that modify the ABI, you
> >>>> have to burn through 4 possible LIBABIVER version values to accomodate all
> >>>> possible combinations, and then you need to remember that when you make them
> >>>> statically applicable.
> >>>
> >>> The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
> >>>
> >>> Your intent when introducing ABI policy was to allow smooth porting of
> >>> applications from a DPDK version to another. Right?
> >>> The adopted solution was to provide backward compatibility during 1 release.
> >>> But there are cases where it's not possible. So the policy was to notice
> >>> the future change and wait one release cycle to break the ABI (failing
> >>> compatibility goals).
> >>> The compile-time option may provide an alternative DPDK packaging when the
> >>> ABI backward compatibility cannot be provided (case of mbuf changes).
> >>> In such case, it's still possible to upgrade DPDK by providing 2 versions of
> >>> DPDK libs. So the existing apps continue to link with the previous ABI and
> >>> have the possibility of migrating to the new one.
> >>> Another advantage of this approach is that we don't have to wait 1 release
> >>> to integrate the changes.
> >>> The last advantage is to benefit early of these changes with static libraries.
> >>
> >> Hm, ok, thats a bit more reasonable, but it still seems shaky to me.
> >> Implementing an ABI preview option like this implies the notion that, after a
> >> release, you have to remove all the ifdefs that you inserted to create the new
> >> ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
> >> large, and is predicated on the centralization of work effort (that is to say
> >> you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
> >> from the build' patch every release.
> >
> > It won't be so huge if we reserve the NEXT_ABI solution to changes which cannot
> > have easy backward compatibility with the compat macros you introduced.
> > I feel I can do the job of removing the ifdefs NEXT_ABI after each release.
> > At the same time, the deprecated API, using the compat macros, will be removed.
> >
> >> What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
> >> the sort that Thomas Herbert proposed a few weeks ago).
> >
> > This tree was created after Thomas' request:
> > 	http://dpdk.org/browse/next/dpdk-next/
> 
> Thomas, I am sorry if I went quiet for a while but I was on personal 
> travel with inconsistent access so I almost missed most of this 
> discussion about ABI changes.
> 
> My understanding of the purpose of the dpdk-next tree is to validate 
> patches by applying and compiling against a "pull" from the main dpdk 
> tree. I think a good way to handle ABI change while effectively using 
> the dpdk-next might be to do as follows:
> 
> Create a specific branch for the new ABI such as 2.X in the main dpdk 
> tree. Once that 2.X branch is created, dpdk-next would mirror the 2.X 
> branch along with master.
> 
> Since dpdk-next would also have the 2.X branch that is in the main dpdk 
> tree, submitted patches could be applied to either the main branch or 
> the new-ABI 2.X branch. Providing that patch submitters make it clear 
> whether a submitted patch is for the new ABI or the old ABI, dpdk-next 
> could continue to validate the patches for either the main branch or the 
> new ABI 2.X branch.

What is the benefit of a new-ABI branch in the -next tree?

The goal of this discussion is to find a consensus on ABI policy to
smoothly integrate new features without forcing users of shared libraries
to re-build their application when upgrading DPDK, and let them do the
transition before the next upgrade.

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 13:16  4%         ` Thomas Monjalon
  2015-06-19 15:27  9%           ` Neil Horman
@ 2015-06-19 16:13  9%           ` Thomas F Herbert
  2015-06-19 17:02  8%             ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Thomas F Herbert @ 2015-06-19 16:13 UTC (permalink / raw)
  To: Thomas Monjalon, Neil Horman; +Cc: dev



On 6/19/15 9:16 AM, Thomas Monjalon wrote:
> 2015-06-19 09:02, Neil Horman:
>> On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
>>> 2015-06-19 06:26, Neil Horman:
>>>> On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
>>>>> For the 2.1 release, I think we should agree to make patches that change
>>>>> the ABI controllable via a compile-time option. I like Olivier's proposal
>>>>> on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
>>>>> changes instead of a separate option per patch set (see
>>>>> http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
>>>>> should rework the affected patch sets to use that approach for 2.1.
>>>>
>>>> This is a bad idea.  Making ABI dependent on compile time options isn't a
>>>> maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
>>>> work (that is to say you make it impossible to really tell what ABI version you
>>>> are building).
>>>
>>> The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
>>> So one ABI version number refers always to the same ABI.
>>>
>>>> If you have two compile time options that modify the ABI, you
> >>> have to burn through 4 possible LIBABIVER version values to accommodate all
>>>> possible combinations, and then you need to remember that when you make them
>>>> statically applicable.
>>>
>>> The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
>>>
>>> Your intent when introducing ABI policy was to allow smooth porting of
>>> applications from a DPDK version to another. Right?
>>> The adopted solution was to provide backward compatibility during 1 release.
>>> But there are cases where it's not possible. So the policy was to notice
>>> the future change and wait one release cycle to break the ABI (failing
>>> compatibility goals).
>>> The compile-time option may provide an alternative DPDK packaging when the
>>> ABI backward compatibility cannot be provided (case of mbuf changes).
>>> In such case, it's still possible to upgrade DPDK by providing 2 versions of
>>> DPDK libs. So the existing apps continue to link with the previous ABI and
>>> have the possibility of migrating to the new one.
>>> Another advantage of this approach is that we don't have to wait 1 release
>>> to integrate the changes.
>>> The last advantage is to benefit early of these changes with static libraries.
>>
>> Hm, ok, that's a bit more reasonable, but it still seems shaky to me.
>> Implementing an ABI preview option like this implies the notion that, after a
>> release, you have to remove all the ifdefs that you inserted to create the new
>> ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
>> large, and is predicated on the centralization of work effort (that is to say
>> you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
>> from the build' patch every release.
>
> It won't be so huge if we reserve the NEXT_ABI solution to changes which cannot
> have easy backward compatibility with the compat macros you introduced.
> I feel I can do the job of removing the ifdefs NEXT_ABI after each release.
> At the same time, the deprecated API, using the compat macros, will be removed.
>
>> What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
>> the sort that Thomas Herbert proposed a few weeks ago).
>
> This tree was created after Thomas' request:
> 	http://dpdk.org/browse/next/dpdk-next/

Thomas, I am sorry if I went quiet for a while but I was on personal 
travel with inconsistent access so I almost missed most of this 
discussion about ABI changes.

My understanding of the purpose of the dpdk-next tree is to validate 
patches by applying and compiling against a "pull" from the main dpdk 
tree. I think a good way to handle ABI change while effectively using 
the dpdk-next might be to do as follows:

Create a specific branch for the new ABI such as 2.X in the main dpdk 
tree. Once that 2.X branch is created, dpdk-next would mirror the 2.X 
branch along with master.

Since dpdk-next would also have the 2.X branch that is in the main dpdk 
tree, submitted patches could be applied to either the main branch or 
the new-ABI 2.X branch. Providing that patch submitters make it clear 
whether a submitted patch is for the new ABI or the old ABI, dpdk-next 
could continue to validate the patches for either the main branch or the 
new ABI 2.X branch.

>
> >> Patches that aren't ABI stable can be put on the next-branch/tree in their
> >> final format.  You can declare the branch unstable (thereby reserving your
> >> right to rebase it).  People can use that to preview the next ABI version
> >> (complete with the updated LIBABIVER bump), and when you release dpdk-X,
>> the new ABI for dpdk-X+1 is achieved by simply merging.
>
> Having this tree living would be a nice improvement but it won't provide any
> stable (and enough validated) releases to rely on.
>

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 15:27  9%           ` Neil Horman
@ 2015-06-19 15:51  9%             ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-19 15:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

2015-06-19 11:27, Neil Horman:
> On Fri, Jun 19, 2015 at 03:16:53PM +0200, Thomas Monjalon wrote:
> > 2015-06-19 09:02, Neil Horman:
> > > On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
> > > > 2015-06-19 06:26, Neil Horman:
> > > > > On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> > > > > > For the 2.1 release, I think we should agree to make patches that change
> > > > > > the ABI controllable via a compile-time option. I like Olivier's proposal
> > > > > > on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
> > > > > > changes instead of a separate option per patch set (see
> > > > > > http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
> > > > > > should rework the affected patch sets to use that approach for 2.1.
> > > > > 
> > > > > This is a bad idea.  Making ABI dependent on compile time options isn't a
> > > > > maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
> > > > > work (that is to say you make it impossible to really tell what ABI version you
> > > > > are building).
> > > > 
> > > > The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
> > > > So one ABI version number refers always to the same ABI.
> > > > 
> > > > > If you have two compile time options that modify the ABI, you
> > > > have to burn through 4 possible LIBABIVER version values to accommodate all
> > > > > possible combinations, and then you need to remember that when you make them
> > > > > statically applicable.
> > > > 
> > > > The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
> > > > 
> > > > Your intent when introducing ABI policy was to allow smooth porting of
> > > > applications from a DPDK version to another. Right?
> > > > The adopted solution was to provide backward compatibility during 1 release.
> > > > But there are cases where it's not possible. So the policy was to notice
> > > > the future change and wait one release cycle to break the ABI (failing
> > > > compatibility goals).
> > > > The compile-time option may provide an alternative DPDK packaging when the
> > > > ABI backward compatibility cannot be provided (case of mbuf changes).
> > > > In such case, it's still possible to upgrade DPDK by providing 2 versions of
> > > > DPDK libs. So the existing apps continue to link with the previous ABI and
> > > > have the possibility of migrating to the new one.
> > > > Another advantage of this approach is that we don't have to wait 1 release
> > > > to integrate the changes.
> > > > The last advantage is to benefit early of these changes with static libraries.
> > > 
> > > Hm, ok, that's a bit more reasonable, but it still seems shaky to me.
> > > Implementing an ABI preview option like this implies the notion that, after a
> > > release, you have to remove all the ifdefs that you inserted to create the new
> > > ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
> > > large, and is predicated on the centralization of work effort (that is to say
> > > you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
> > > from the build' patch every release.
> > 
> > It won't be so huge if we reserve the NEXT_ABI solution to changes which cannot
> > have easy backward compatibility with the compat macros you introduced.
> > I feel I can do the job of removing the ifdefs NEXT_ABI after each release.
> > At the same time, the deprecated API, using the compat macros, will be removed.
> > 
> I think that is something you can't really predict, as it's not an issue of how
> stringent we are with its use, but rather a function of how much change
> developers want in a given release.  That is to say, if you only reserve it for
> the most important/urgently needed changes, that's fine, but if you have a
> release in which 50 developers want to make urgent and important changes that
> break ABI, you still have quite a job on your hands to back out the config
> changes.
> 
> Not to mention the fact that backing those changes out is a manual process.
> 
> > > What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
> > > the sort that Thomas Herbert proposed a few weeks ago).
> > 
> > This tree was created after Thomas' request:
> > 	http://dpdk.org/browse/next/dpdk-next/
> > 
> Awesome, Though I'm not sure thats entirely the right place either.  IIRC that
> location was intended to be an early integration site that took unreviewed
> patches.  I think this really calls for a branch from the mainline tree that
> exclusively accepts reviewed ABI changing patches, that can then be merged after
> the next release
> 
> > > Patches that aren't ABI stable can be put on the next-branch/tree in their
> > > final format.  You can declare the branch unstable (thereby reserving your
> > > right to rebase it).  People can use that to preview the next ABI version
> > > (complete with the updated LIBABIVER bump), and when you release dpdk-X,
> > > the new ABI for dpdk-X+1 is achieved by simply merging.
> > 
> > Having this tree living would be a nice improvement but it won't provide any
> > stable (and enough validated) releases to rely on.
> > 
> I'm not sure I follow you entirely here.  If the goal is to find a place to
> accept patches that are ABI altering ahead of the main release, why do you need
> to provide stable/validated releases?  Just base it off the HEAD of the git tree
> during the DPDK release X merge window, any testing done in the base branch
> should roughly apply, save for functional changes made by the ABI patches you
> add in on the branch.

OK, I didn't get you. So you are saying that the changes for the next release
may be prepared in a branch. Yes, it's possible.
But before settling on a practical method to execute the policy, we need to
agree on the policy.

This is my proposal:
- The ABI policy must be better explained: what happens to LIBABIVER and the
.map files when adding a field, or adding or removing a function.
- The file doc/guides/rel_notes/abi.rst must contain the ABI notices, but the
policy must be moved to doc/guides/guidelines/compat.rst
- The case "backward compatibility broken" must be replaced by the usage of
CONFIG_RTE_NEXT_ABI, while describing the cases where it can apply.
- The .map files must be generated in order to make this simpler and allow
the use of CONFIG_RTE_NEXT_ABI.
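The .map files in the last item are GNU linker version scripts, one per library, that pin exported symbols to named ABI nodes; today they are maintained by hand. A sketch of one spanning two ABI nodes (the symbol names here are invented for illustration, not taken from a real library):

```
DPDK_2.0 {
	global:

	rte_example_create;
	rte_example_free;

	local: *;
};

DPDK_2.1 {
	global:

	rte_example_set_key_len;
} DPDK_2.0;
```

Generating these scripts, as proposed, would let a CONFIG_RTE_NEXT_ABI build emit the extra version node automatically instead of maintaining two hand-edited variants.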

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 13:16  4%         ` Thomas Monjalon
@ 2015-06-19 15:27  9%           ` Neil Horman
  2015-06-19 15:51  9%             ` Thomas Monjalon
  2015-06-19 16:13  9%           ` Thomas F Herbert
  1 sibling, 1 reply; 200+ results
From: Neil Horman @ 2015-06-19 15:27 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Fri, Jun 19, 2015 at 03:16:53PM +0200, Thomas Monjalon wrote:
> 2015-06-19 09:02, Neil Horman:
> > On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
> > > 2015-06-19 06:26, Neil Horman:
> > > > On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> > > > > For the 2.1 release, I think we should agree to make patches that change
> > > > > the ABI controllable via a compile-time option. I like Olivier's proposal
> > > > > on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
> > > > > changes instead of a separate option per patch set (see
> > > > > http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
> > > > > should rework the affected patch sets to use that approach for 2.1.
> > > > 
> > > > This is a bad idea.  Making ABI dependent on compile time options isn't a
> > > > maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
> > > > work (that is to say you make it impossible to really tell what ABI version you
> > > > are building).
> > > 
> > > The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
> > > So one ABI version number refers always to the same ABI.
> > > 
> > > > If you have two compile time options that modify the ABI, you
> > > > have to burn through 4 possible LIBABIVER version values to accommodate all
> > > > possible combinations, and then you need to remember that when you make them
> > > > statically applicable.
> > > 
> > > The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
> > > 
> > > Your intent when introducing ABI policy was to allow smooth porting of
> > > applications from a DPDK version to another. Right?
> > > The adopted solution was to provide backward compatibility during 1 release.
> > > But there are cases where it's not possible. So the policy was to notice
> > > the future change and wait one release cycle to break the ABI (failing
> > > compatibility goals).
> > > The compile-time option may provide an alternative DPDK packaging when the
> > > ABI backward compatibility cannot be provided (case of mbuf changes).
> > > In such case, it's still possible to upgrade DPDK by providing 2 versions of
> > > DPDK libs. So the existing apps continue to link with the previous ABI and
> > > have the possibility of migrating to the new one.
> > > Another advantage of this approach is that we don't have to wait 1 release
> > > to integrate the changes.
> > > The last advantage is to benefit early of these changes with static libraries.
> > 
> > Hm, ok, that's a bit more reasonable, but it still seems shaky to me.
> > Implementing an ABI preview option like this implies the notion that, after a
> > release, you have to remove all the ifdefs that you inserted to create the new
> > ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
> > large, and is predicated on the centralization of work effort (that is to say
> > you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
> > from the build' patch every release.
> 
> It won't be so huge if we reserve the NEXT_ABI solution to changes which cannot
> have easy backward compatibility with the compat macros you introduced.
> I feel I can do the job of removing the ifdefs NEXT_ABI after each release.
> At the same time, the deprecated API, using the compat macros, will be removed.
> 
I think that is something you can't really predict, as it's not an issue of how
stringent we are with its use, but rather a function of how much change
developers want in a given release.  That is to say, if you only reserve it for
the most important/urgently needed changes, that's fine, but if you have a
release in which 50 developers want to make urgent and important changes that
break ABI, you still have quite a job on your hands to back out the config
changes.

Not to mention the fact that backing those changes out is a manual process.

> > What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
> > the sort that Thomas Herbert proposed a few weeks ago).
> 
> This tree was created after Thomas' request:
> 	http://dpdk.org/browse/next/dpdk-next/
> 
Awesome, though I'm not sure that's entirely the right place either.  IIRC that
location was intended to be an early integration site that took unreviewed
patches.  I think this really calls for a branch from the mainline tree that
exclusively accepts reviewed ABI-changing patches, which can then be merged after
the next release.

> > Patches that aren't ABI stable can be put on the next-branch/tree in their
> > final format.  You can declare the branch unstable (thereby reserving your
> > right to rebase it).  People can use that to preview the next ABI version
> > (complete with the updated LIBABIVER bump), and when you release dpdk-X,
> > the new ABI for dpdk-X+1 is achieved by simply merging.
> 
> Having this tree living would be a nice improvement but it won't provide any
> stable (and enough validated) releases to rely on.
> 
I'm not sure I follow you entirely here.  If the goal is to find a place to
accept patches that are ABI altering ahead of the main release, why do you need
to provide stable/validated releases?  Just base it off the HEAD of the git tree
during the DPDK release X merge window; any testing done in the base branch
should roughly apply, save for functional changes made by the ABI patches you
add in on the branch.

Neil

> 

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 13:02  9%       ` Neil Horman
@ 2015-06-19 13:16  4%         ` Thomas Monjalon
  2015-06-19 15:27  9%           ` Neil Horman
  2015-06-19 16:13  9%           ` Thomas F Herbert
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2015-06-19 13:16 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

2015-06-19 09:02, Neil Horman:
> On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
> > 2015-06-19 06:26, Neil Horman:
> > > On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> > > > For the 2.1 release, I think we should agree to make patches that change
> > > > the ABI controllable via a compile-time option. I like Olivier's proposal
> > > > on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
> > > > changes instead of a separate option per patch set (see
> > > > http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
> > > > should rework the affected patch sets to use that approach for 2.1.
> > > 
> > > This is a bad idea.  Making ABI dependent on compile time options isn't a
> > > maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
> > > work (that is to say you make it impossible to really tell what ABI version you
> > > are building).
> > 
> > The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
> > So one ABI version number refers always to the same ABI.
> > 
> > > If you have two compile time options that modify the ABI, you
> > have to burn through 4 possible LIBABIVER version values to accommodate all
> > > possible combinations, and then you need to remember that when you make them
> > > statically applicable.
> > 
> > The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
> > 
> > Your intent when introducing ABI policy was to allow smooth porting of
> > applications from a DPDK version to another. Right?
> > The adopted solution was to provide backward compatibility during 1 release.
> > But there are cases where it's not possible. So the policy was to notice
> > the future change and wait one release cycle to break the ABI (failing
> > compatibility goals).
> > The compile-time option may provide an alternative DPDK packaging when the
> > ABI backward compatibility cannot be provided (case of mbuf changes).
> > In such case, it's still possible to upgrade DPDK by providing 2 versions of
> > DPDK libs. So the existing apps continue to link with the previous ABI and
> > have the possibility of migrating to the new one.
> > Another advantage of this approach is that we don't have to wait 1 release
> > to integrate the changes.
> > The last advantage is to benefit early of these changes with static libraries.
> 
> Hm, ok, that's a bit more reasonable, but it still seems shaky to me.
> Implementing an ABI preview option like this implies the notion that, after a
> release, you have to remove all the ifdefs that you inserted to create the new
> ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
> large, and is predicated on the centralization of work effort (that is to say
> you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
> from the build' patch every release.

It won't be so huge if we reserve the NEXT_ABI solution to changes which cannot
have easy backward compatibility with the compat macros you introduced.
I feel I can do the job of removing the NEXT_ABI ifdefs after each release.
At the same time, the deprecated API, using the compat macros, will be removed.

> What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
> the sort that Thomas Herbert proposed a few weeks ago).

This tree was created after Thomas' request:
	http://dpdk.org/browse/next/dpdk-next/

> Patches that aren't ABI stable can be put on the next-branch/tree in their
> final format.  You can declare the branch unstable (thereby reserving your
> right to rebase it).  People can use that to preview the next ABI version
> (complete with the updated LIBABIVER bump), and when you release dpdk-X,
> the new ABI for dpdk-X+1 is achieved by simply merging.

Having this tree living would be a nice improvement but it won't provide any
stable (and enough validated) releases to rely on.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 12:32  9%     ` Thomas Monjalon
@ 2015-06-19 13:02  9%       ` Neil Horman
  2015-06-19 13:16  4%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-06-19 13:02 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Fri, Jun 19, 2015 at 02:32:33PM +0200, Thomas Monjalon wrote:
> 2015-06-19 06:26, Neil Horman:
> > On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> > > For the 2.1 release, I think we should agree to make patches that change
> > > the ABI controllable via a compile-time option. I like Olivier's proposal
> > > on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
> > > changes instead of a separate option per patch set (see
> > > http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
> > > should rework the affected patch sets to use that approach for 2.1.
> > 
> > This is a bad idea.  Making ABI dependent on compile time options isn't a
> > maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
> > work (that is to say you make it impossible to really tell what ABI version you
> > are building).
> 
> The idea was to make LIBABIVER increment dependent of CONFIG_RTE_NEXT_ABI.
> So one ABI version number refers always to the same ABI.
> 
> > If you have two compile time options that modify the ABI, you
> > have to burn through 4 possible LIBABIVER version values to accommodate all
> > possible combinations, and then you need to remember that when you make them
> > statically applicable.
> 
> The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.
> 
> Your intent when introducing ABI policy was to allow smooth porting of
> applications from a DPDK version to another. Right?
> The adopted solution was to provide backward compatibility during 1 release.
> But there are cases where it's not possible. So the policy was to notice
> the future change and wait one release cycle to break the ABI (failing
> compatibility goals).
> The compile-time option may provide an alternative DPDK packaging when the
> ABI backward compatibility cannot be provided (case of mbuf changes).
> In such case, it's still possible to upgrade DPDK by providing 2 versions of
> DPDK libs. So the existing apps continue to link with the previous ABI and
> have the possibility of migrating to the new one.
> Another advantage of this approach is that we don't have to wait 1 release
> to integrate the changes.
> The last advantage is to benefit early of these changes with static libraries.
> 


Hm, ok, that's a bit more reasonable, but it still seems shaky to me.
Implementing an ABI preview option like this implies the notion that, after a
release, you have to remove all the ifdefs that you inserted to create the new
ABI.  That seems like an easy task, but it becomes a pain when the ABI delta is
large, and is predicated on the centralization of work effort (that is to say
you need to identify someone to submit the 'remove the NEXT_ABI config ifdefs
from the build' patch every release).

What might be better would be a dpdk-next branch (or even a dpdk-next tree, of
the sort that Thomas Herbert proposed a few weeks ago).  Patches that aren't ABI
stable can be put on the next-branch/tree in their final format.  You can
declare the branch unstable (thereby reserving your right to rebase it).  People
can use that to preview the next ABI version (complete with the updated LIBABIVER
bump), and when you release dpdk-X, the new ABI for dpdk-X+1 is achieved by
simply merging.

Neil

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-19 10:26  9%   ` Neil Horman
@ 2015-06-19 12:32  9%     ` Thomas Monjalon
  2015-06-19 13:02  9%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-19 12:32 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

2015-06-19 06:26, Neil Horman:
> On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> > For the 2.1 release, I think we should agree to make patches that change
> > the ABI controllable via a compile-time option. I like Olivier's proposal
> > on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these
> > changes instead of a separate option per patch set (see
> > http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we
> > should rework the affected patch sets to use that approach for 2.1.
> 
> This is a bad idea.  Making ABI dependent on compile time options isn't a
> maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
> work (that is to say you make it impossible to really tell what ABI version you
> are building).

The idea was to make the LIBABIVER increment dependent on CONFIG_RTE_NEXT_ABI.
So one ABI version number always refers to the same ABI.

> If you have two compile time options that modify the ABI, you
> have to burn through 4 possible LIBABIVER version values to accommodate all
> possible combinations, and then you need to remember that when you make them
> statically applicable.

The idea is to have only 1 compile-time option: CONFIG_RTE_NEXT_ABI.

Your intent when introducing the ABI policy was to allow smooth porting of
applications from one DPDK version to another. Right?
The adopted solution was to provide backward compatibility during 1 release.
But there are cases where it's not possible. So the policy was to give notice
of the future change and wait one release cycle to break the ABI (failing
the compatibility goal).
The compile-time option may provide an alternative DPDK packaging when
ABI backward compatibility cannot be provided (case of the mbuf changes).
In such a case, it's still possible to upgrade DPDK by providing 2 versions of
the DPDK libs. So the existing apps continue to link with the previous ABI and
have the possibility of migrating to the new one.
Another advantage of this approach is that we don't have to wait 1 release
to integrate the changes.
The last advantage is to benefit early from these changes with static libraries.
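A minimal sketch of making the LIBABIVER increment depend on CONFIG_RTE_NEXT_ABI in a library Makefile (the values are illustrative; actual per-library numbers would differ):

```
# Bump the soname only when the next-ABI code is compiled in,
# so each LIBABIVER value always maps to exactly one ABI.
ifeq ($(CONFIG_RTE_NEXT_ABI),y)
LIBABIVER := 2
else
LIBABIVER := 1
endif
```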

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17 11:06  4%   ` Richardson, Bruce
@ 2015-06-19 11:08  7%     ` Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2015-06-19 11:08 UTC (permalink / raw)
  To: Richardson, Bruce, Neil Horman, Thomas Monjalon; +Cc: dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Richardson, Bruce
> Sent: Wednesday, June 17, 2015 12:07 PM
> To: Neil Horman; Thomas Monjalon
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices -
> statistics - ABI
> Hi Neil,
> 
> on my end, some suggestions:
> 
> 1. the documentation on changing an API function provided in rte_compat.h
> is really good, but I don't think this is present in our documentation in
> the docs folder or on website is it (apologies if it is and I've missed
> it)? This needs to go into programmers guide or some other doc (perhaps
> the new doc that the coding style went into).
> 
> 2. The documentation also needs an example of: this is how you add a new
> function and update the map file, and this is how you a) mark a function
> as deprecated and b) remove it completely. That way we could have one
> guide covering API versioning, how to add, modify and remove functions.
> 
> 3. This doc should also cover how to use the API checker tool, something I
> haven't had the chance to look at yet, but should do in the near future!
> :-)


+1 on all three. We need better documentation on how to work with the ABI in DPDK. A new document in doc/guides/guidelines/ would be good.

John

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-18 16:55  8% ` O'Driscoll, Tim
  2015-06-18 21:13  4%   ` Vincent JARDIN
@ 2015-06-19 10:26  9%   ` Neil Horman
  2015-06-19 12:32  9%     ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Neil Horman @ 2015-06-19 10:26 UTC (permalink / raw)
  To: O'Driscoll, Tim; +Cc: dev

On Thu, Jun 18, 2015 at 04:55:45PM +0000, O'Driscoll, Tim wrote:
> > -----Original Message-----
> > From: announce [mailto:announce-bounces@dpdk.org] On Behalf Of Thomas
> > Monjalon
> > Sent: Wednesday, June 17, 2015 12:30 AM
> > To: announce@dpdk.org
> > Subject: [dpdk-announce] important design choices - statistics - ABI
> > 
> > Hi all,
> > 
> 
> > During the development of the release 2.0, there was an agreement to
> > keep
> > ABI compatibility or to bring new ABI while keeping old one during one
> > release.
> > In case it's not possible to have this transition, the (exceptional)
> > break
> > should be acknowledged by several developers.
> > 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> > There were some interesting discussions but not a lot of participants:
> > 	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus
> > =8461
> > 
> > During the current development cycle for the release 2.1, the ABI
> > question
> > arises many times in different threads.
> > To add the hash key size field, it is proposed to use a struct padding
> > gap:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
> > To support the flow director for VF, there is no proposal yet:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
> > To add the speed capability, it is proposed to break ABI in the release
> > 2.2:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
> > To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
> > To add the interrupt mode, it is proposed to add a build-time option
> > CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
> > binary:
> > 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
> > To add the packet type, there is a proposal to add a build-time option
> > CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
> > We must also better document how to remove a deprecated ABI:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
> > The ABI compatibility is a new constraint and we need to better
> > understand
> > what it means and how to proceed. Even the macros are not yet well
> > documented:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
> > 
> > Thanks for your attention and your participation in these important
> > choices.
> 
> There's been some good discussion on the ABI policy in various responses to this email. I think we now need to reach a conclusion on how we're going to proceed for the 2.1 release. Then, we can have further discussion on the use of versioning or other methods for avoiding the problem in future.
> 
> For the 2.1 release, I think we should agree to make patches that change the ABI controllable via a compile-time option. I like Olivier's proposal on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these changes instead of a separate option per patch set (see http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we should rework the affected patch sets to use that approach for 2.1.
> 
This is a bad idea.  Making ABI dependent on compile time options isn't a
maintainable solution.  It breaks the notion of how LIBABIVER is supposed to
work (that is to say you make it impossible to really tell what ABI version you
are building).  If you have two compile time options that modify the ABI, you
have to burn through 4 possible LIBABIVER version values to accommodate all
possible combinations, and then you need to remember that when you make them
statically applicable.

Neil

> 
> Tim
> 

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH v7 18/18] mbuf: remove old packet type bit masks
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (16 preceding siblings ...)
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 17/18] examples/l3fwd: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

As unified packet types are used instead, those old bit masks and
the relevant macros for packet type indication need to be removed.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 4 ++++
 lib/librte_mbuf/rte_mbuf.h | 4 ++++
 2 files changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.
* Redefined the bit masks for packet RX offload flags.

v5 changes:
* Rolled back the bit masks of RX flags, for ABI compatibility.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index f506517..4320dd4 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -251,14 +251,18 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
 	/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
 	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
+#ifndef RTE_NEXT_ABI
 	case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
 	case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
 	case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
 	case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
+#endif /* RTE_NEXT_ABI */
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+#ifndef RTE_NEXT_ABI
 	case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
 	case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
+#endif /* RTE_NEXT_ABI */
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 5e7cc26..9f32edf 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -91,14 +91,18 @@ extern "C" {
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR     (0ULL << 0)  /**< Hardware processing error. */
 #define PKT_RX_MAC_ERR       (0ULL << 0)  /**< MAC error. */
+#ifndef RTE_NEXT_ABI
 #define PKT_RX_IPV4_HDR      (1ULL << 5)  /**< RX packet with IPv4 header. */
 #define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with extended IPv4 header. */
 #define PKT_RX_IPV6_HDR      (1ULL << 7)  /**< RX packet with IPv6 header. */
 #define PKT_RX_IPV6_HDR_EXT  (1ULL << 8)  /**< RX packet with extended IPv6 header. */
+#endif /* RTE_NEXT_ABI */
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
+#ifndef RTE_NEXT_ABI
 #define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 header.*/
 #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
+#endif /* RTE_NEXT_ABI */
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
 #define PKT_RX_QINQ_PKT      (1ULL << 15)  /**< RX packet with double VLAN stripped. */
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 17/18] examples/l3fwd: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (15 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 16/18] examples/l3fwd-power: " Helin Zhang
@ 2015-06-19  8:14  3%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 18/18] mbuf: remove old packet type bit masks Helin Zhang
  2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd/main.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 120 insertions(+), 3 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v3 changes:
* Minor bug fixes and enhancements.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 7e4bbfd..eff9580 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -948,7 +948,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		/* Handle IPv4 headers.*/
 		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char *) +
 				sizeof(struct ether_hdr));
@@ -979,8 +983,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
-
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 	} else {
+#endif
 		/* Handle IPv6 headers.*/
 		struct ipv6_hdr *ipv6_hdr;
 
@@ -999,8 +1006,13 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
+#ifdef RTE_NEXT_ABI
+	} else
+		/* Free the mbuf that contains non-IPV4/IPV6 packet */
+		rte_pktmbuf_free(m);
+#else
 	}
-
+#endif
 }
 
 #ifdef DO_RFC_1812_CHECKS
@@ -1024,12 +1036,19 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, struct lcore_conf *qcon
  * to BAD_PORT value.
  */
 static inline __attribute__((always_inline)) void
+#ifdef RTE_NEXT_ABI
+rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
+#else
 rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t flags)
+#endif
 {
 	uint8_t ihl;
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(ptype)) {
+#else
 	if ((flags & PKT_RX_IPV4_HDR) != 0) {
-
+#endif
 		ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL;
 
 		ipv4_hdr->time_to_live--;
@@ -1059,11 +1078,19 @@ get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	if (pkt->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		if (rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
 				&next_hop) != 0)
 			next_hop = portid;
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (pkt->ol_flags & PKT_RX_IPV6_HDR) {
+#endif
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 		if (rte_lpm6_lookup(qconf->ipv6_lookup_struct,
@@ -1097,12 +1124,52 @@ process_packet(struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	ve = val_eth[dp];
 
 	dst_port[0] = dp;
+#ifdef RTE_NEXT_ABI
+	rfc1812_process(ipv4_hdr, dst_port, pkt->packet_type);
+#else
 	rfc1812_process(ipv4_hdr, dst_port, pkt->ol_flags);
+#endif
 
 	te =  _mm_blend_epi16(te, ve, MASK_ETH);
 	_mm_store_si128((__m128i *)eth_hdr, te);
 }
 
+#ifdef RTE_NEXT_ABI
+/*
+ * Read packet_type and destination IPV4 addresses from 4 mbufs.
+ */
+static inline void
+processx4_step1(struct rte_mbuf *pkt[FWDSTEP],
+		__m128i *dip,
+		uint32_t *ipv4_flag)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ether_hdr *eth_hdr;
+	uint32_t x0, x1, x2, x3;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[0], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x0 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] = pkt[0]->packet_type & RTE_PTYPE_L3_IPV4;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[1], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x1 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[1]->packet_type;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[2], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x2 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[2]->packet_type;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt[3], struct ether_hdr *);
+	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
+	x3 = ipv4_hdr->dst_addr;
+	ipv4_flag[0] &= pkt[3]->packet_type;
+
+	dip[0] = _mm_set_epi32(x3, x2, x1, x0);
+}
+#else /* RTE_NEXT_ABI */
 /*
  * Read ol_flags and destination IPV4 addresses from 4 mbufs.
  */
@@ -1135,14 +1202,24 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
 
 	dip[0] = _mm_set_epi32(x3, x2, x1, x0);
 }
+#endif /* RTE_NEXT_ABI */
 
 /*
  * Lookup into LPM for destination port.
  * If lookup fails, use incoming port (portid) as destination port.
  */
 static inline void
+#ifdef RTE_NEXT_ABI
+processx4_step2(const struct lcore_conf *qconf,
+		__m128i dip,
+		uint32_t ipv4_flag,
+		uint8_t portid,
+		struct rte_mbuf *pkt[FWDSTEP],
+		uint16_t dprt[FWDSTEP])
+#else
 processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
 	uint8_t portid, struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP])
+#endif /* RTE_NEXT_ABI */
 {
 	rte_xmm_t dst;
 	const  __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
@@ -1152,7 +1229,11 @@ processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
 	dip = _mm_shuffle_epi8(dip, bswap_mask);
 
 	/* if all 4 packets are IPV4. */
+#ifdef RTE_NEXT_ABI
+	if (likely(ipv4_flag)) {
+#else
 	if (likely(flag != 0)) {
+#endif
 		rte_lpm_lookupx4(qconf->ipv4_lookup_struct, dip, dprt, portid);
 	} else {
 		dst.x = dip;
@@ -1202,6 +1283,16 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 	_mm_store_si128(p[2], te[2]);
 	_mm_store_si128(p[3], te[3]);
 
+#ifdef RTE_NEXT_ABI
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[0] + 1),
+		&dst_port[0], pkt[0]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[1] + 1),
+		&dst_port[1], pkt[1]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[2] + 1),
+		&dst_port[2], pkt[2]->packet_type);
+	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[3] + 1),
+		&dst_port[3], pkt[3]->packet_type);
+#else /* RTE_NEXT_ABI */
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[0] + 1),
 		&dst_port[0], pkt[0]->ol_flags);
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[1] + 1),
@@ -1210,6 +1301,7 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 		&dst_port[2], pkt[2]->ol_flags);
 	rfc1812_process((struct ipv4_hdr *)((struct ether_hdr *)p[3] + 1),
 		&dst_port[3], pkt[3]->ol_flags);
+#endif /* RTE_NEXT_ABI */
 }
 
 /*
@@ -1396,7 +1488,11 @@ main_loop(__attribute__((unused)) void *dummy)
 	uint16_t *lp;
 	uint16_t dst_port[MAX_PKT_BURST];
 	__m128i dip[MAX_PKT_BURST / FWDSTEP];
+#ifdef RTE_NEXT_ABI
+	uint32_t ipv4_flag[MAX_PKT_BURST / FWDSTEP];
+#else
 	uint32_t flag[MAX_PKT_BURST / FWDSTEP];
+#endif
 	uint16_t pnum[MAX_PKT_BURST + 1];
 #endif
 
@@ -1466,6 +1562,18 @@ main_loop(__attribute__((unused)) void *dummy)
 				 */
 				int32_t n = RTE_ALIGN_FLOOR(nb_rx, 4);
 				for (j = 0; j < n ; j+=4) {
+#ifdef RTE_NEXT_ABI
+					uint32_t pkt_type =
+						pkts_burst[j]->packet_type &
+						pkts_burst[j+1]->packet_type &
+						pkts_burst[j+2]->packet_type &
+						pkts_burst[j+3]->packet_type;
+					if (pkt_type & RTE_PTYPE_L3_IPV4) {
+						simple_ipv4_fwd_4pkts(
+						&pkts_burst[j], portid, qconf);
+					} else if (pkt_type &
+						RTE_PTYPE_L3_IPV6) {
+#else /* RTE_NEXT_ABI */
 					uint32_t ol_flag = pkts_burst[j]->ol_flags
 							& pkts_burst[j+1]->ol_flags
 							& pkts_burst[j+2]->ol_flags
@@ -1474,6 +1582,7 @@ main_loop(__attribute__((unused)) void *dummy)
 						simple_ipv4_fwd_4pkts(&pkts_burst[j],
 									portid, qconf);
 					} else if (ol_flag & PKT_RX_IPV6_HDR) {
+#endif /* RTE_NEXT_ABI */
 						simple_ipv6_fwd_4pkts(&pkts_burst[j],
 									portid, qconf);
 					} else {
@@ -1498,13 +1607,21 @@ main_loop(__attribute__((unused)) void *dummy)
 			for (j = 0; j != k; j += FWDSTEP) {
 				processx4_step1(&pkts_burst[j],
 					&dip[j / FWDSTEP],
+#ifdef RTE_NEXT_ABI
+					&ipv4_flag[j / FWDSTEP]);
+#else
 					&flag[j / FWDSTEP]);
+#endif
 			}
 
 			k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
 			for (j = 0; j != k; j += FWDSTEP) {
 				processx4_step2(qconf, dip[j / FWDSTEP],
+#ifdef RTE_NEXT_ABI
+					ipv4_flag[j / FWDSTEP], portid,
+#else
 					flag[j / FWDSTEP], portid,
+#endif
 					&pkts_burst[j], &dst_port[j]);
 			}
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 16/18] examples/l3fwd-power: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (14 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 15/18] examples/l3fwd-acl: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 17/18] examples/l3fwd: " Helin Zhang
                       ` (2 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd-power/main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 6057059..705188f 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -635,7 +635,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
 
 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		/* Handle IPv4 headers.*/
 		ipv4_hdr =
 			(struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char*)
@@ -670,8 +674,12 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
 		ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);
 
 		send_single_packet(m, dst_port);
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 	}
 	else {
+#endif
 		/* Handle IPv6 headers.*/
 #if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH)
 		struct ipv6_hdr *ipv6_hdr;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 15/18] examples/l3fwd-acl: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (13 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 14/18] examples/ip_reassembly: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 16/18] examples/l3fwd-power: " Helin Zhang
                       ` (3 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/l3fwd-acl/main.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index a5d4f25..78b6df2 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -645,10 +645,13 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 	struct ipv4_hdr *ipv4_hdr;
 	struct rte_mbuf *pkt = pkts_in[index];
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
 
 	if (type == PKT_RX_IPV4_HDR) {
-
+#endif
 		ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
 			unsigned char *) + sizeof(struct ether_hdr));
 
@@ -667,9 +670,11 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 			/* Not a valid IPv4 packet */
 			rte_pktmbuf_free(pkt);
 		}
-
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (type == PKT_RX_IPV6_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
 		acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -687,17 +692,22 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct acl_search_t *acl,
 {
 	struct rte_mbuf *pkt = pkts_in[index];
 
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
+#else
 	int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
 
 	if (type == PKT_RX_IPV4_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv4[acl->num_ipv4] = MBUF_IPV4_2PROTO(pkt);
 		acl->m_ipv4[(acl->num_ipv4)++] = pkt;
 
-
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
+#else
 	} else if (type == PKT_RX_IPV6_HDR) {
-
+#endif
 		/* Fill acl structure */
 		acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
 		acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -745,10 +755,17 @@ send_one_packet(struct rte_mbuf *m, uint32_t res)
 		/* in the ACL list, drop it */
 #ifdef L3FWDACL_DEBUG
 		if ((res & ACL_DENY_SIGNATURE) != 0) {
+#ifdef RTE_NEXT_ABI
+			if (RTE_ETH_IS_IPV4_HDR(m->packet_type))
+				dump_acl4_rule(m, res);
+			else if (RTE_ETH_IS_IPV6_HDR(m->packet_type))
+				dump_acl6_rule(m, res);
+#else
 			if (m->ol_flags & PKT_RX_IPV4_HDR)
 				dump_acl4_rule(m, res);
 			else
 				dump_acl6_rule(m, res);
+#endif /* RTE_NEXT_ABI */
 		}
 #endif
 		rte_pktmbuf_free(m);
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 14/18] examples/ip_reassembly: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (12 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 15/18] examples/l3fwd-acl: " Helin Zhang
                       ` (4 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/ip_reassembly/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 9ecb6f9..f1c47ad 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -356,7 +356,11 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	dst_port = portid;
 
 	/* if packet is IPv4 */
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & (PKT_RX_IPV4_HDR)) {
+#endif
 		struct ipv4_hdr *ip_hdr;
 		uint32_t ip_dst;
 
@@ -396,9 +400,14 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+		/* if packet is IPv6 */
+#else
 	}
 	/* if packet is IPv6 */
 	else if (m->ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) {
+#endif
 		struct ipv6_extension_fragment *frag_hdr;
 		struct ipv6_hdr *ip_hdr;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (11 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 12/18] app/test: Remove useless code Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 14/18] examples/ip_reassembly: " Helin Zhang
                       ` (5 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 examples/ip_fragmentation/main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 0922ba6..b71d05f 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -283,7 +283,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 	len = qconf->tx_mbufs[port_out].len;
 
 	/* if this is an IPv4 packet */
+#ifdef RTE_NEXT_ABI
+	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 	if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 		struct ipv4_hdr *ip_hdr;
 		uint32_t ip_dst;
 		/* Read the lookup key (i.e. ip_dst) from the input packet */
@@ -317,9 +321,14 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 			if (unlikely (len2 < 0))
 				return;
 		}
+#ifdef RTE_NEXT_ABI
+	} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+		/* if this is an IPv6 packet */
+#else
 	}
 	/* if this is an IPv6 packet */
 	else if (m->ol_flags & PKT_RX_IPV6_HDR) {
+#endif
 		struct ipv6_hdr *ip_hdr;
 
 		ipv6 = 1;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 12/18] app/test: Remove useless code
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (10 preceding siblings ...)
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 11/18] app/testpmd: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
                       ` (6 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

Several useless code lines were added accidentally, which blocks packet
type unification. They should be deleted entirely.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test/packet_burst_generator.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

v4 changes:
* Removed several useless code lines which block packet type unification.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/app/test/packet_burst_generator.c b/app/test/packet_burst_generator.c
index b46eed7..61e6340 100644
--- a/app/test/packet_burst_generator.c
+++ b/app/test/packet_burst_generator.c
@@ -272,19 +272,21 @@ nomore_mbuf:
 		if (ipv4) {
 			pkt->vlan_tci  = ETHER_TYPE_IPv4;
 			pkt->l3_len = sizeof(struct ipv4_hdr);
-
+#ifndef RTE_NEXT_ABI
 			if (vlan_enabled)
 				pkt->ol_flags = PKT_RX_IPV4_HDR | PKT_RX_VLAN_PKT;
 			else
 				pkt->ol_flags = PKT_RX_IPV4_HDR;
+#endif
 		} else {
 			pkt->vlan_tci  = ETHER_TYPE_IPv6;
 			pkt->l3_len = sizeof(struct ipv6_hdr);
-
+#ifndef RTE_NEXT_ABI
 			if (vlan_enabled)
 				pkt->ol_flags = PKT_RX_IPV6_HDR | PKT_RX_VLAN_PKT;
 			else
 				pkt->ol_flags = PKT_RX_IPV6_HDR;
+#endif
 		}
 
 		pkts_burst[nb_pkt] = pkt;
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 11/18] app/testpmd: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (9 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 10/18] app/test-pipeline: " Helin Zhang
@ 2015-06-19  8:14  3%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 12/18] app/test: Remove useless code Helin Zhang
                       ` (7 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, bit masks of packet type for
'ol_flags' are replaced by unified packet type.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
---
 app/test-pmd/csumonly.c |  14 ++++
 app/test-pmd/rxonly.c   | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v4 changes:
* Added printing logs of packet types of each received packet in rxonly mode.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 950ea82..fab9600 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -202,8 +202,14 @@ parse_ethernet(struct ether_hdr *eth_hdr, struct testpmd_offload_info *info)
 
 /* Parse a vxlan header */
 static void
+#ifdef RTE_NEXT_ABI
+parse_vxlan(struct udp_hdr *udp_hdr,
+	    struct testpmd_offload_info *info,
+	    uint32_t pkt_type)
+#else
 parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
 	uint64_t mbuf_olflags)
+#endif
 {
 	struct ether_hdr *eth_hdr;
 
@@ -211,8 +217,12 @@ parse_vxlan(struct udp_hdr *udp_hdr, struct testpmd_offload_info *info,
 	 * (rfc7348) or that the rx offload flag is set (i40e only
 	 * currently) */
 	if (udp_hdr->dst_port != _htons(4789) &&
+#ifdef RTE_NEXT_ABI
+		RTE_ETH_IS_TUNNEL_PKT(pkt_type) == 0)
+#else
 		(mbuf_olflags & (PKT_RX_TUNNEL_IPV4_HDR |
 			PKT_RX_TUNNEL_IPV6_HDR)) == 0)
+#endif
 		return;
 
 	info->is_tunnel = 1;
@@ -549,7 +559,11 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 				struct udp_hdr *udp_hdr;
 				udp_hdr = (struct udp_hdr *)((char *)l3_hdr +
 					info.l3_len);
+#ifdef RTE_NEXT_ABI
+				parse_vxlan(udp_hdr, &info, m->packet_type);
+#else
 				parse_vxlan(udp_hdr, &info, m->ol_flags);
+#endif
 			} else if (info.l4_proto == IPPROTO_GRE) {
 				struct simple_gre_hdr *gre_hdr;
 				gre_hdr = (struct simple_gre_hdr *)
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index f6a2f84..5a30347 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -91,7 +91,11 @@ pkt_burst_receive(struct fwd_stream *fs)
 	uint64_t ol_flags;
 	uint16_t nb_rx;
 	uint16_t i, packet_type;
+#ifdef RTE_NEXT_ABI
+	uint16_t is_encapsulation;
+#else
 	uint64_t is_encapsulation;
+#endif
 
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
@@ -135,8 +139,12 @@ pkt_burst_receive(struct fwd_stream *fs)
 		ol_flags = mb->ol_flags;
 		packet_type = mb->packet_type;
 
+#ifdef RTE_NEXT_ABI
+		is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
+#else
 		is_encapsulation = ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
 				PKT_RX_TUNNEL_IPV6_HDR);
+#endif
 
 		print_ether_addr("  src=", &eth_hdr->s_addr);
 		print_ether_addr(" - dst=", &eth_hdr->d_addr);
@@ -163,6 +171,177 @@ pkt_burst_receive(struct fwd_stream *fs)
 		if (ol_flags & PKT_RX_QINQ_PKT)
 			printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
 					mb->vlan_tci, mb->vlan_tci_outer);
+#ifdef RTE_NEXT_ABI
+		if (mb->packet_type) {
+			uint32_t ptype;
+
+			/* (outer) L2 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L2_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L2_MAC:
+				printf(" - (outer) L2 type: MAC");
+				break;
+			case RTE_PTYPE_L2_MAC_TIMESYNC:
+				printf(" - (outer) L2 type: MAC Timesync");
+				break;
+			case RTE_PTYPE_L2_ARP:
+				printf(" - (outer) L2 type: ARP");
+				break;
+			case RTE_PTYPE_L2_LLDP:
+				printf(" - (outer) L2 type: LLDP");
+				break;
+			default:
+				printf(" - (outer) L2 type: Unknown");
+				break;
+			}
+
+			/* (outer) L3 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L3_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L3_IPV4:
+				printf(" - (outer) L3 type: IPV4");
+				break;
+			case RTE_PTYPE_L3_IPV4_EXT:
+				printf(" - (outer) L3 type: IPV4_EXT");
+				break;
+			case RTE_PTYPE_L3_IPV6:
+				printf(" - (outer) L3 type: IPV6");
+				break;
+			case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
+				printf(" - (outer) L3 type: IPV4_EXT_UNKNOWN");
+				break;
+			case RTE_PTYPE_L3_IPV6_EXT:
+				printf(" - (outer) L3 type: IPV6_EXT");
+				break;
+			case RTE_PTYPE_L3_IPV6_EXT_UNKNOWN:
+				printf(" - (outer) L3 type: IPV6_EXT_UNKNOWN");
+				break;
+			default:
+				printf(" - (outer) L3 type: Unknown");
+				break;
+			}
+
+			/* (outer) L4 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_L4_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_L4_TCP:
+				printf(" - (outer) L4 type: TCP");
+				break;
+			case RTE_PTYPE_L4_UDP:
+				printf(" - (outer) L4 type: UDP");
+				break;
+			case RTE_PTYPE_L4_FRAG:
+				printf(" - (outer) L4 type: L4_FRAG");
+				break;
+			case RTE_PTYPE_L4_SCTP:
+				printf(" - (outer) L4 type: SCTP");
+				break;
+			case RTE_PTYPE_L4_ICMP:
+				printf(" - (outer) L4 type: ICMP");
+				break;
+			case RTE_PTYPE_L4_NONFRAG:
+				printf(" - (outer) L4 type: L4_NONFRAG");
+				break;
+			default:
+				printf(" - (outer) L4 type: Unknown");
+				break;
+			}
+
+			/* packet tunnel type */
+			ptype = mb->packet_type & RTE_PTYPE_TUNNEL_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_TUNNEL_IP:
+				printf(" - Tunnel type: IP");
+				break;
+			case RTE_PTYPE_TUNNEL_GRE:
+				printf(" - Tunnel type: GRE");
+				break;
+			case RTE_PTYPE_TUNNEL_VXLAN:
+				printf(" - Tunnel type: VXLAN");
+				break;
+			case RTE_PTYPE_TUNNEL_NVGRE:
+				printf(" - Tunnel type: NVGRE");
+				break;
+			case RTE_PTYPE_TUNNEL_GENEVE:
+				printf(" - Tunnel type: GENEVE");
+				break;
+			case RTE_PTYPE_TUNNEL_GRENAT:
+				printf(" - Tunnel type: GRENAT");
+				break;
+			default:
+				printf(" - Tunnel type: Unkown");
+				break;
+			}
+
+			/* inner L2 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_L2_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L2_MAC:
+				printf(" - Inner L2 type: MAC");
+				break;
+			case RTE_PTYPE_INNER_L2_MAC_VLAN:
+				printf(" - Inner L2 type: MAC_VLAN");
+				break;
+			default:
+				printf(" - Inner L2 type: Unknown");
+				break;
+			}
+
+			/* inner L3 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_INNER_L3_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L3_IPV4:
+				printf(" - Inner L3 type: IPV4");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV4_EXT:
+				printf(" - Inner L3 type: IPV4_EXT");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6:
+				printf(" - Inner L3 type: IPV6");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN:
+				printf(" - Inner L3 type: IPV4_EXT_UNKNOWN");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6_EXT:
+				printf(" - Inner L3 type: IPV6_EXT");
+				break;
+			case RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN:
+				printf(" - Inner L3 type: IPV6_EXT_UNKOWN");
+				break;
+			default:
+				printf(" - Inner L3 type: Unkown");
+				break;
+			}
+
+			/* inner L4 packet type */
+			ptype = mb->packet_type & RTE_PTYPE_INNER_L4_MASK;
+			switch (ptype) {
+			case RTE_PTYPE_INNER_L4_TCP:
+				printf(" - Inner L4 type: TCP");
+				break;
+			case RTE_PTYPE_INNER_L4_UDP:
+				printf(" - Inner L4 type: UDP");
+				break;
+			case RTE_PTYPE_INNER_L4_FRAG:
+				printf(" - Inner L4 type: L4_FRAG");
+				break;
+			case RTE_PTYPE_INNER_L4_SCTP:
+				printf(" - Inner L4 type: SCTP");
+				break;
+			case RTE_PTYPE_INNER_L4_ICMP:
+				printf(" - Inner L4 type: ICMP");
+				break;
+			case RTE_PTYPE_INNER_L4_NONFRAG:
+				printf(" - Inner L4 type: L4_NONFRAG");
+				break;
+			default:
+				printf(" - Inner L4 type: Unknown");
+				break;
+			}
+			printf("\n");
+		} else
+			printf("Unknown packet type\n");
+#endif /* RTE_NEXT_ABI */
 		if (is_encapsulation) {
 			struct ipv4_hdr *ipv4_hdr;
 			struct ipv6_hdr *ipv6_hdr;
@@ -176,7 +355,11 @@ pkt_burst_receive(struct fwd_stream *fs)
 			l2_len  = sizeof(struct ether_hdr);
 
 			 /* Do not support ipv4 option field */
+#ifdef RTE_NEXT_ABI
+			if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
+#else
 			if (ol_flags & PKT_RX_TUNNEL_IPV4_HDR) {
+#endif
 				l3_len = sizeof(struct ipv4_hdr);
 				ipv4_hdr = (struct ipv4_hdr *) (rte_pktmbuf_mtod(mb,
 						unsigned char *) + l2_len);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 10/18] app/test-pipeline: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (8 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 09/18] fm10k: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 11/18] app/testpmd: " Helin Zhang
                       ` (8 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a single unified packet type field.
To avoid breaking ABI compatibility, all the changes are gated by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pipeline/pipeline_hash.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
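The pipeline_hash.c hunk changes the control flow as well as the test: under RTE_NEXT_ABI a non-IP packet is skipped with `continue` instead of falling into the IPv6 arm of the old bare `else`. A sketch of that logic, with stand-in macros for RTE_ETH_IS_IPV4_HDR()/RTE_ETH_IS_IPV6_HDR() (illustrative constants, not the real encodings):

```c
#include <stdint.h>

/* Stand-ins for RTE_ETH_IS_IPV4_HDR()/RTE_ETH_IS_IPV6_HDR(); the real
 * macros test the RTE_PTYPE_L3_* field of mbuf->packet_type. */
#define PT_L3_MASK  0x000000f0u
#define PT_L3_IPV4  0x00000010u
#define PT_L3_IPV6  0x00000040u
#define IS_IPV4(pt) (((pt) & PT_L3_MASK) == PT_L3_IPV4)
#define IS_IPV6(pt) (((pt) & PT_L3_MASK) == PT_L3_IPV6)

/* Mirror the patched control flow: IPv4 and IPv6 each get a hash key
 * built; anything else is now skipped (returns 0 here) instead of
 * being treated as IPv6 by the old bare "else". */
static int
classify(uint32_t pkt_type)
{
	if (IS_IPV4(pkt_type))
		return 4;
	else if (IS_IPV6(pkt_type))
		return 6;
	return 0;
}
```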

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/app/test-pipeline/pipeline_hash.c b/app/test-pipeline/pipeline_hash.c
index 4598ad4..aa3f9e5 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -459,20 +459,33 @@ app_main_loop_rx_metadata(void) {
 			signature = RTE_MBUF_METADATA_UINT32_PTR(m, 0);
 			key = RTE_MBUF_METADATA_UINT8_PTR(m, 32);
 
+#ifdef RTE_NEXT_ABI
+			if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
+#else
 			if (m->ol_flags & PKT_RX_IPV4_HDR) {
+#endif
 				ip_hdr = (struct ipv4_hdr *)
 					&m_data[sizeof(struct ether_hdr)];
 				ip_dst = ip_hdr->dst_addr;
 
 				k32 = (uint32_t *) key;
 				k32[0] = ip_dst & 0xFFFFFF00;
+#ifdef RTE_NEXT_ABI
+			} else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+#else
 			} else {
+#endif
 				ipv6_hdr = (struct ipv6_hdr *)
 					&m_data[sizeof(struct ether_hdr)];
 				ipv6_dst = ipv6_hdr->dst_addr;
 
 				memcpy(key, ipv6_dst, 16);
+#ifdef RTE_NEXT_ABI
+			} else
+				continue;
+#else
 			}
+#endif
 
 			*signature = test_hash(key, 0, 0);
 		}
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 09/18] fm10k: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (7 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 08/18] vmxnet3: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 10/18] app/test-pipeline: " Helin Zhang
                       ` (9 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a single unified packet type field.
To avoid breaking ABI compatibility, all the changes are gated by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/fm10k/fm10k_rxtx.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
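The fm10k hunk replaces per-packet branching with a static lookup table indexed by the descriptor's packet-type bits. A minimal sketch of that table-driven mapping, with a hypothetical 3-bit hardware field and illustrative output values (not the real FM10K_* or RTE_PTYPE_* constants):

```c
#include <stdint.h>

/* Hypothetical 3-bit hardware ptype field, modeled on the fm10k hunk:
 * a lookup table indexed by the descriptor bits replaces per-packet
 * branching.  Values are illustrative, not real RTE_PTYPE_* flags. */
#define HW_PT_MASK   0x7u
#define HW_PT_SHIFT  0
enum { PT_UNKNOWN = 0, PT_IPV4 = 0x10, PT_IPV4_EXT = 0x30, PT_IPV6 = 0x40 };

static const uint32_t ptype_table[HW_PT_MASK + 1] = {
	[1] = PT_IPV4,
	[2] = PT_IPV4_EXT,
	[3] = PT_IPV6,
	/* remaining indexes stay PT_UNKNOWN (zero-initialized) */
};

/* One mask, one shift, one load -- the whole per-packet cost. */
static uint32_t
desc_to_ptype(uint32_t pkt_info)
{
	return ptype_table[(pkt_info & HW_PT_MASK) >> HW_PT_SHIFT];
}
```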

v4 changes:
* Supported unified packet type of fm10k from v4.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 56df6cd..45005c2 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -68,12 +68,37 @@ static inline void dump_rxd(union fm10k_rx_desc *rxd)
 static inline void
 rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d)
 {
+#ifdef RTE_NEXT_ABI
+	static const uint32_t
+		ptype_table[FM10K_RXD_PKTTYPE_MASK >> FM10K_RXD_PKTTYPE_SHIFT]
+			__rte_cache_aligned = {
+		[FM10K_PKTTYPE_OTHER] = RTE_PTYPE_L2_MAC,
+		[FM10K_PKTTYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+		[FM10K_PKTTYPE_IPV4_EX] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[FM10K_PKTTYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+		[FM10K_PKTTYPE_IPV6_EX] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[FM10K_PKTTYPE_IPV4 | FM10K_PKTTYPE_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[FM10K_PKTTYPE_IPV6 | FM10K_PKTTYPE_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[FM10K_PKTTYPE_IPV4 | FM10K_PKTTYPE_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[FM10K_PKTTYPE_IPV6 | FM10K_PKTTYPE_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+	};
+
+	m->packet_type = ptype_table[(d->w.pkt_info & FM10K_RXD_PKTTYPE_MASK)
+						>> FM10K_RXD_PKTTYPE_SHIFT];
+#else /* RTE_NEXT_ABI */
 	uint16_t ptype;
 	static const uint16_t pt_lut[] = { 0,
 		PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT,
 		PKT_RX_IPV6_HDR, PKT_RX_IPV6_HDR_EXT,
 		0, 0, 0
 	};
+#endif /* RTE_NEXT_ABI */
 
 	if (d->w.pkt_info & FM10K_RXD_RSSTYPE_MASK)
 		m->ol_flags |= PKT_RX_RSS_HASH;
@@ -97,9 +122,11 @@ rx_desc_to_ol_flags(struct rte_mbuf *m, const union fm10k_rx_desc *d)
 	if (unlikely(d->d.staterr & FM10K_RXD_STATUS_RXE))
 		m->ol_flags |= PKT_RX_RECIP_ERR;
 
+#ifndef RTE_NEXT_ABI
 	ptype = (d->d.data & FM10K_RXD_PKTTYPE_MASK_L3) >>
 						FM10K_RXD_PKTTYPE_SHIFT;
 	m->ol_flags |= pt_lut[(uint8_t)ptype];
+#endif
 }
 
 uint16_t
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 08/18] vmxnet3: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (6 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 07/18] enic: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 09/18] fm10k: " Helin Zhang
                       ` (10 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a single unified packet type field.
To avoid breaking ABI compatibility, all the changes are gated by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 8 ++++++++
 1 file changed, 8 insertions(+)
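The vmxnet3 hunk distinguishes plain IPv4 from IPv4-with-options by the IHL field: `(version_ihl & 0xf) << 2` gives the header length in bytes, and anything beyond the 20-byte minimum means options are present. A small sketch of that check (the 20-byte constant stands in for `sizeof(struct ipv4_hdr)`):

```c
#include <stdint.h>

/* IHL is the low nibble of the first IPv4 header byte, counted in
 * 32-bit words; shifting left by 2 converts it to bytes.  A header
 * longer than the 20-byte minimum carries options. */
static int
ipv4_has_options(uint8_t version_ihl)
{
	return ((version_ihl & 0xf) << 2) > 20;
}
```

So 0x45 (version 4, IHL 5 = 20 bytes) maps to RTE_PTYPE_L3_IPV4 in the hunk below, while 0x46 and up map to RTE_PTYPE_L3_IPV4_EXT.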

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index a1eac45..25ae2f6 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -649,9 +649,17 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
 
 			if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
+#ifdef RTE_NEXT_ABI
+				rxm->packet_type = RTE_PTYPE_L3_IPV4_EXT;
+#else
 				rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+#endif
 			else
+#ifdef RTE_NEXT_ABI
+				rxm->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 				rxm->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 
 			if (!rcd->cnc) {
 				if (!rcd->ipc)
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 06/18] i40e: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (4 preceding siblings ...)
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 05/18] ixgbe: " Helin Zhang
@ 2015-06-19  8:14  3%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 07/18] enic: " Helin Zhang
                       ` (12 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet type bit masks in
'ol_flags' are replaced by a single unified packet type field.
To avoid breaking ABI compatibility, all the changes are gated by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_rxtx.c | 528 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 528 insertions(+)
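Each entry of the large i40e table below ORs together one value per protocol layer (outer L2, outer L3, tunnel, inner L2/L3/L4), so a single 8-bit hardware ptype index expands into a full protocol-stack description in one load. A sketch of that composition, with illustrative flag values in place of the real RTE_PTYPE_* encodings, for the index-46 entry (IPv4 outer, GRE/Teredo/VXLAN tunnel, UDP inner):

```c
#include <stdint.h>

/* Illustrative layer flags (not the real RTE_PTYPE_* encodings): each
 * table entry ORs one value per protocol layer, as the i40e hunk does. */
#define L2_MAC        0x00000001u
#define L3_IPV4_ANY   0x00000010u
#define TUN_GRENAT    0x00000100u
#define INNER_L4_UDP  0x00001000u

static const uint32_t ptype_table[256] = {
	/* index 46 in the patch: IPv4 --> GRE/Teredo/VXLAN --> UDP */
	[46] = L2_MAC | L3_IPV4_ANY | TUN_GRENAT | INNER_L4_UDP,
	/* reserved indexes implicitly decode to 0 (unknown) */
};
```

Because the layer fields occupy disjoint bit ranges, a consumer can recover any single layer from a table entry by masking, without consulting the hardware index again.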

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index b2e1d6d..b951da0 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -176,6 +176,514 @@ i40e_rxd_error_to_pkt_flags(uint64_t qword)
 	return flags;
 }
 
+#ifdef RTE_NEXT_ABI
+/* The hardware datasheet describes what each ptype value means in detail */
+static inline uint32_t
+i40e_rxd_pkt_type_mapping(uint8_t ptype)
+{
+	static const uint32_t ptype_table[UINT8_MAX] __rte_cache_aligned = {
+		/* L2 types */
+		/* [0] reserved */
+		[1] = RTE_PTYPE_L2_MAC,
+		[2] = RTE_PTYPE_L2_MAC_TIMESYNC,
+		/* [3] - [5] reserved */
+		[6] = RTE_PTYPE_L2_LLDP,
+		/* [7] - [10] reserved */
+		[11] = RTE_PTYPE_L2_ARP,
+		/* [12] - [21] reserved */
+
+		/* Non tunneled IPv4 */
+		[22] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_FRAG,
+		[23] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_NONFRAG,
+		[24] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_UDP,
+		/* [25] reserved */
+		[26] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_TCP,
+		[27] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_SCTP,
+		[28] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_L4_ICMP,
+
+		/* IPv4 --> IPv4 */
+		[29] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[30] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[31] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [32] reserved */
+		[33] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[34] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[35] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> IPv6 */
+		[36] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[37] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[38] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [39] reserved */
+		[40] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[41] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[42] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN */
+		[43] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> IPv4 */
+		[44] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[45] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[46] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [47] reserved */
+		[48] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[49] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[50] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> IPv6 */
+		[51] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[52] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[53] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [54] reserved */
+		[55] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[56] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[57] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC */
+		[58] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4 */
+		[59] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[60] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[61] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [62] reserved */
+		[63] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[64] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[65] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv6 */
+		[66] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[67] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[68] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [69] reserved */
+		[70] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[71] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[72] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN */
+		[73] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv4 */
+		[74] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[75] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[76] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [77] reserved */
+		[78] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[79] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[80] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv4 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv6 */
+		[81] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[82] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[83] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [84] reserved */
+		[85] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[86] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[87] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* Non tunneled IPv6 */
+		[88] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_FRAG,
+		[89] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_NONFRAG,
+		[90] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_UDP,
+		/* [91] reserved */
+		[92] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_TCP,
+		[93] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_SCTP,
+		[94] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_L4_ICMP,
+
+		/* IPv6 --> IPv4 */
+		[95] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[96] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[97] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [98] reserved */
+		[99] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[100] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[101] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> IPv6 */
+		[102] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[103] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[104] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [105] reserved */
+		[106] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[107] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[108] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN */
+		[109] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> IPv4 */
+		[110] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[111] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[112] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [113] reserved */
+		[114] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[115] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[116] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> IPv6 */
+		[117] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[118] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[119] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [120] reserved */
+		[121] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[122] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[123] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC */
+		[124] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC --> IPv4 */
+		[125] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[126] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[127] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [128] reserved */
+		[129] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[130] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[131] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC --> IPv6 */
+		[132] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[133] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[134] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [135] reserved */
+		[136] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[137] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[138] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN */
+		[139] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv4 */
+		[140] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[141] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[142] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [143] reserved */
+		[144] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[145] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[146] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* IPv6 --> GRE/Teredo/VXLAN --> MAC/VLAN --> IPv6 */
+		[147] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_FRAG,
+		[148] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_NONFRAG,
+		[149] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_UDP,
+		/* [150] reserved */
+		[151] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_TCP,
+		[152] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_SCTP,
+		[153] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_MAC_VLAN |
+			RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+			RTE_PTYPE_INNER_L4_ICMP,
+
+		/* All others reserved */
+	};
+
+	return ptype_table[ptype];
+}
+#else /* RTE_NEXT_ABI */
 /* Translate pkt types to pkt flags */
 static inline uint64_t
 i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
@@ -443,6 +951,7 @@ i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
 
 	return ip_ptype_map[ptype];
 }
+#endif /* RTE_NEXT_ABI */
 
 #define I40E_RX_DESC_EXT_STATUS_FLEXBH_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FD_ID  0x01
@@ -730,11 +1239,18 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			i40e_rxd_to_vlan_tci(mb, &rxdp[j]);
 			pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_NEXT_ABI
+			mb->packet_type =
+				i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+						I40E_RXD_QW1_PTYPE_MASK) >>
+						I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 			pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 
 			mb->packet_type = (uint16_t)((qword1 &
 					I40E_RXD_QW1_PTYPE_MASK) >>
 					I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_NEXT_ABI */
 			if (pkt_flags & PKT_RX_RSS_HASH)
 				mb->hash.rss = rte_le_to_cpu_32(\
 					rxdp[j].wb.qword0.hi_dword.rss);
@@ -971,9 +1487,15 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		i40e_rxd_to_vlan_tci(rxm, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_NEXT_ABI
+		rxm->packet_type =
+			i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+			I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 		rxm->packet_type = (uint16_t)((qword1 & I40E_RXD_QW1_PTYPE_MASK) >>
 				I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_NEXT_ABI */
 		if (pkt_flags & PKT_RX_RSS_HASH)
 			rxm->hash.rss =
 				rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
@@ -1129,10 +1651,16 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		i40e_rxd_to_vlan_tci(first_seg, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
+#ifdef RTE_NEXT_ABI
+		first_seg->packet_type =
+			i40e_rxd_pkt_type_mapping((uint8_t)((qword1 &
+			I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT));
+#else
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
 		first_seg->packet_type = (uint16_t)((qword1 &
 					I40E_RXD_QW1_PTYPE_MASK) >>
 					I40E_RXD_QW1_PTYPE_SHIFT);
+#endif /* RTE_NEXT_ABI */
 		if (pkt_flags & PKT_RX_RSS_HASH)
 			rxm->hash.rss =
 				rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 07/18] enic: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (5 preceding siblings ...)
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 06/18] i40e: " Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 08/18] vmxnet3: " Helin Zhang
                       ` (11 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet-type bit masks in
'ol_flags' are replaced by a unified packet type.
To avoid breaking ABI compatibility, all the changes are enabled by
RTE_NEXT_ABI, which is disabled by default.
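
For an application, the change amounts to moving an L3 check from an
ol_flags bit to the packet_type field. A minimal sketch of the before/after
shape follows; the flag and type constants are local stand-ins, not the
real PKT_RX_IPV4_HDR / RTE_PTYPE_L3_IPV4 definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for PKT_RX_IPV4_HDR and RTE_PTYPE_L3_IPV4. */
#define OLD_PKT_RX_IPV4_HDR  (1ULL << 3)
#define NEW_PTYPE_L3_MASK    0x000000f0u
#define NEW_PTYPE_L3_IPV4    0x00000010u

struct pkt {
	uint64_t ol_flags;
	uint32_t packet_type;
};

/* Old scheme: L3 type encoded as a bit in ol_flags. */
static int is_ipv4_old(const struct pkt *p)
{
	return (p->ol_flags & OLD_PKT_RX_IPV4_HDR) != 0;
}

/* New scheme: L3 type read from the dedicated packet_type field. */
static int is_ipv4_new(const struct pkt *p)
{
	return (p->packet_type & NEW_PTYPE_L3_MASK) == NEW_PTYPE_L3_IPV4;
}
```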

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/enic/enic_main.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 15313c2..f47e96c 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -423,7 +423,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 		rx_pkt->pkt_len = bytes_written;
 
 		if (ipv4) {
+#ifdef RTE_NEXT_ABI
+			rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 			rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 			if (!csum_not_calc) {
 				if (unlikely(!ipv4_csum_ok))
 					rx_pkt->ol_flags |= PKT_RX_IP_CKSUM_BAD;
@@ -432,7 +436,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 					rx_pkt->ol_flags |= PKT_RX_L4_CKSUM_BAD;
 			}
 		} else if (ipv6)
+#ifdef RTE_NEXT_ABI
+			rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
+#else
 			rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 	} else {
 		/* Header split */
 		if (sop && !eop) {
@@ -445,7 +453,11 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 				*rx_pkt_bucket = rx_pkt;
 				rx_pkt->pkt_len = bytes_written;
 				if (ipv4) {
+#ifdef RTE_NEXT_ABI
+					rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
+#else
 					rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 					if (!csum_not_calc) {
 						if (unlikely(!ipv4_csum_ok))
 							rx_pkt->ol_flags |=
@@ -457,13 +469,22 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 							    PKT_RX_L4_CKSUM_BAD;
 					}
 				} else if (ipv6)
+#ifdef RTE_NEXT_ABI
+					rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
+#else
 					rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 			} else {
 				/* Payload */
 				hdr_rx_pkt = *rx_pkt_bucket;
 				hdr_rx_pkt->pkt_len += bytes_written;
 				if (ipv4) {
+#ifdef RTE_NEXT_ABI
+					hdr_rx_pkt->packet_type =
+						RTE_PTYPE_L3_IPV4;
+#else
 					hdr_rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+#endif
 					if (!csum_not_calc) {
 						if (unlikely(!ipv4_csum_ok))
 							hdr_rx_pkt->ol_flags |=
@@ -475,7 +496,12 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
 							    PKT_RX_L4_CKSUM_BAD;
 					}
 				} else if (ipv6)
+#ifdef RTE_NEXT_ABI
+					hdr_rx_pkt->packet_type =
+						RTE_PTYPE_L3_IPV6;
+#else
 					hdr_rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+#endif
 
 			}
 		}
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 00/18] unified packet type
    @ 2015-06-19  8:14  4%   ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
                       ` (18 more replies)
  1 sibling, 19 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

Currently only 6 bits, stored in ol_flags, are used to indicate the
packet types. This is not enough, as some NIC hardware can recognize
quite a lot of packet types, e.g. i40e hardware can recognize more than 150
packet types. Hiding those packet types hides hardware offload capabilities
which could be quite useful for improving performance and for end users.
So a unified packet type is needed to support all possible PMDs. The 16-bit
packet_type field in the mbuf structure can be enlarged to 32 bits and used
for this purpose. In addition, all packet types stored in the ol_flags field
should be removed entirely, saving 6 bits of ol_flags as a benefit.

Initially, 32 bits of packet_type can be divided into several sub fields
to indicate different packet type information of a packet. The initial
design is to divide those bits into fields for L2 types, L3 types, L4 types,
tunnel types, inner L2 types, inner L3 types and inner L4 types. All PMDs
should translate the offloaded packet types into these 7 fields of
information, for user applications.
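
The sub-field split above can be sketched as follows: each of the seven
fields occupies its own slice of the 32-bit packet_type, so an application
can test one layer without touching the others. The mask and type values
here are local assumptions mirroring the RTE_PTYPE_* layout this series
adds to rte_mbuf.h, not the authoritative definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4-bit slices for the seven fields described above. */
#define PTYPE_L2_MASK        0x0000000fu
#define PTYPE_L3_MASK        0x000000f0u
#define PTYPE_L4_MASK        0x00000f00u
#define PTYPE_TUNNEL_MASK    0x0000f000u
#define PTYPE_INNER_L2_MASK  0x000f0000u
#define PTYPE_INNER_L3_MASK  0x00f00000u
#define PTYPE_INNER_L4_MASK  0x0f000000u

/* Example type values within the L3 and L4 slices (assumed). */
#define PTYPE_L3_IPV4        0x00000010u
#define PTYPE_L4_UDP         0x00000200u

/* Read one field of the combined packet type in isolation. */
static inline int
ptype_is_ipv4(uint32_t packet_type)
{
	return (packet_type & PTYPE_L3_MASK) == PTYPE_L3_IPV4;
}

static inline int
ptype_is_udp(uint32_t packet_type)
{
	return (packet_type & PTYPE_L4_MASK) == PTYPE_L4_UDP;
}
```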

To avoid breaking ABI compatibility, currently all the code changes for
the unified packet type are disabled at compile time by default. Users can
enable them manually by defining the macro RTE_NEXT_ABI. The code changes
will be enabled by default in a future release, and the old version will be
deleted accordingly, once the ABI change process is done.

Note that this patch set should be integrated after the patch set for
'[PATCH v3 0/7] support i40e QinQ stripping and insertion', to cleanly
resolve the conflicts during integration, as both patch sets modify
'struct rte_mbuf' and the final layout of 'struct rte_mbuf' is key to the
vectorized ixgbe PMD.

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.
* Used redefined packet types and enlarged packet_type field for all PMDs
  and corresponding applications.
* Removed changes in bond and its relevant application, as there is no need
  at all according to the recent bond changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Put vector ixgbe changes right after mbuf changes.
* Disabled vector ixgbe PMD by default, as mbuf layout changed, and then
  re-enabled it after vector ixgbe PMD updated.
* Put the definitions of unified packet type into a single patch.
* Minor bug fixes and enhancements in l3fwd example.

v4 changes:
* Added detailed description of each packet types.
* Supported unified packet type of fm10k.
* Added printing logs of packet types of each received packet for rxonly
  mode in testpmd.
* Removed several useless code lines which block packet type unification from
  app/test/packet_burst_generator.c.

v5 changes:
* Added more detailed description for each packet types, together with examples.
* Rolled back the macro definitions of RX packet flags, for ABI compitability.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.
* Integrated with patch set for '[PATCH v3 0/7] support i40e QinQ stripping
  and insertion', to clearly solve the conflicts during merging.

Helin Zhang (18):
  mbuf: redefine packet_type in rte_mbuf
  ixgbe: support unified packet type in vectorized PMD
  mbuf: add definitions of unified packet types
  e1000: replace bit mask based packet type with unified packet type
  ixgbe: replace bit mask based packet type with unified packet type
  i40e: replace bit mask based packet type with unified packet type
  enic: replace bit mask based packet type with unified packet type
  vmxnet3: replace bit mask based packet type with unified packet type
  fm10k: replace bit mask based packet type with unified packet type
  app/test-pipeline: replace bit mask based packet type with unified
    packet type
  app/testpmd: replace bit mask based packet type with unified packet
    type
  app/test: Remove useless code
  examples/ip_fragmentation: replace bit mask based packet type with
    unified packet type
  examples/ip_reassembly: replace bit mask based packet type with
    unified packet type
  examples/l3fwd-acl: replace bit mask based packet type with unified
    packet type
  examples/l3fwd-power: replace bit mask based packet type with unified
    packet type
  examples/l3fwd: replace bit mask based packet type with unified packet
    type
  mbuf: remove old packet type bit masks

 app/test-pipeline/pipeline_hash.c                  |  13 +
 app/test-pmd/csumonly.c                            |  14 +
 app/test-pmd/rxonly.c                              | 183 +++++++
 app/test/packet_burst_generator.c                  |   6 +-
 drivers/net/e1000/igb_rxtx.c                       | 102 ++++
 drivers/net/enic/enic_main.c                       |  26 +
 drivers/net/fm10k/fm10k_rxtx.c                     |  27 ++
 drivers/net/i40e/i40e_rxtx.c                       | 528 +++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.c                     | 163 +++++++
 drivers/net/ixgbe/ixgbe_rxtx_vec.c                 |  75 ++-
 drivers/net/vmxnet3/vmxnet3_rxtx.c                 |   8 +
 examples/ip_fragmentation/main.c                   |   9 +
 examples/ip_reassembly/main.c                      |   9 +
 examples/l3fwd-acl/main.c                          |  29 +-
 examples/l3fwd-power/main.c                        |   8 +
 examples/l3fwd/main.c                              | 123 ++++-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |   6 +
 lib/librte_mbuf/rte_mbuf.c                         |   4 +
 lib/librte_mbuf/rte_mbuf.h                         | 514 ++++++++++++++++++++
 19 files changed, 1834 insertions(+), 13 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 05/18] ixgbe: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (3 preceding siblings ...)
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
@ 2015-06-19  8:14  3%     ` Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 06/18] i40e: " Helin Zhang
                       ` (13 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet-type bit masks in
'ol_flags' are replaced by a unified packet type.
To avoid breaking ABI compatibility, all the changes are enabled by
RTE_NEXT_ABI, which is disabled by default.
Note that a performance drop of around 2.5% (64B packets) was observed when
doing 4-port (1 port per 82599 card) IO forwarding on the same SNB core.
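
The core of the patch is a flat lookup table: the hardware packet-type
index is extracted from the descriptor's pkt_info word by shift+mask, then
mapped to a software ptype. A minimal sketch of that technique, with
illustrative shift/mask/table values rather than the real ixgbe ones:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed descriptor layout: ptype index in bits [10:4] of pkt_info. */
#define PKT_TYPE_SHIFT 4
#define PKT_TYPE_MASK  0x7f
#define PKT_TYPE_MAX   0x80

/* Assumed stand-ins for a few RTE_PTYPE_* combinations. */
#define PTYPE_UNKNOWN  0x00000000u
#define PTYPE_IPV4     0x00000010u
#define PTYPE_IPV4_UDP 0x00000210u

static uint32_t
pkt_info_to_ptype(uint16_t pkt_info)
{
	/* Unlisted hardware indices fall through to 0 (unknown). */
	static const uint32_t ptype_table[PKT_TYPE_MAX] = {
		[0x01] = PTYPE_IPV4,
		[0x21] = PTYPE_IPV4_UDP,
	};

	return ptype_table[(pkt_info >> PKT_TYPE_SHIFT) & PKT_TYPE_MASK];
}
```

The table trades a small amount of cache footprint for a branch-free
per-packet translation, which is where the measured cost comes from.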

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 163 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 041c544..7b5792b 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -855,6 +855,110 @@ end_of_tx:
  *  RX functions
  *
  **********************************************************************/
+#ifdef RTE_NEXT_ABI
+#define IXGBE_PACKET_TYPE_IPV4              0X01
+#define IXGBE_PACKET_TYPE_IPV4_TCP          0X11
+#define IXGBE_PACKET_TYPE_IPV4_UDP          0X21
+#define IXGBE_PACKET_TYPE_IPV4_SCTP         0X41
+#define IXGBE_PACKET_TYPE_IPV4_EXT          0X03
+#define IXGBE_PACKET_TYPE_IPV4_EXT_SCTP     0X43
+#define IXGBE_PACKET_TYPE_IPV6              0X04
+#define IXGBE_PACKET_TYPE_IPV6_TCP          0X14
+#define IXGBE_PACKET_TYPE_IPV6_UDP          0X24
+#define IXGBE_PACKET_TYPE_IPV6_EXT          0X0C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_TCP      0X1C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_UDP      0X2C
+#define IXGBE_PACKET_TYPE_IPV4_IPV6         0X05
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_TCP     0X15
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_UDP     0X25
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT     0X0D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IXGBE_PACKET_TYPE_MAX               0X80
+#define IXGBE_PACKET_TYPE_MASK              0X7F
+#define IXGBE_PACKET_TYPE_SHIFT             0X04
+static inline uint32_t
+ixgbe_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+	static const uint32_t
+		ptype_table[IXGBE_PACKET_TYPE_MAX] __rte_cache_aligned = {
+		[IXGBE_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4,
+		[IXGBE_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[IXGBE_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6,
+		[IXGBE_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT,
+		[IXGBE_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+		[IXGBE_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+		[IXGBE_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+		[IXGBE_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+	};
+	if (unlikely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+		return RTE_PTYPE_UNKNOWN;
+
+	pkt_info = (pkt_info >> IXGBE_PACKET_TYPE_SHIFT) &
+				IXGBE_PACKET_TYPE_MASK;
+
+	return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+ixgbe_rxd_pkt_info_to_pkt_flags(uint16_t pkt_info)
+{
+	static uint64_t ip_rss_types_map[16] __rte_cache_aligned = {
+		0, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH,
+		0, PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH,
+		PKT_RX_RSS_HASH, 0, 0, 0,
+		0, 0, 0,  PKT_RX_FDIR,
+	};
+#ifdef RTE_LIBRTE_IEEE1588
+	static uint64_t ip_pkt_etqf_map[8] = {
+		0, 0, 0, PKT_RX_IEEE1588_PTP,
+		0, 0, 0, 0,
+	};
+
+	if (likely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+		return ip_pkt_etqf_map[(pkt_info >> 4) & 0X07] |
+				ip_rss_types_map[pkt_info & 0XF];
+	else
+		return ip_rss_types_map[pkt_info & 0XF];
+#else
+	return ip_rss_types_map[pkt_info & 0XF];
+#endif
+}
+#else /* RTE_NEXT_ABI */
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
@@ -890,6 +994,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 #endif
 	return pkt_flags | ip_rss_types_map[hl_tp_rs & 0xF];
 }
+#endif /* RTE_NEXT_ABI */
 
 static inline uint64_t
 rx_desc_status_to_pkt_flags(uint32_t rx_status)
@@ -945,7 +1050,13 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 	struct rte_mbuf *mb;
 	uint16_t pkt_len;
 	uint64_t pkt_flags;
+#ifdef RTE_NEXT_ABI
+	int nb_dd;
+	uint32_t s[LOOK_AHEAD];
+	uint16_t pkt_info[LOOK_AHEAD];
+#else
 	int s[LOOK_AHEAD], nb_dd;
+#endif /* RTE_NEXT_ABI */
 	int i, j, nb_rx = 0;
 
 
@@ -968,6 +1079,12 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 		for (j = LOOK_AHEAD-1; j >= 0; --j)
 			s[j] = rxdp[j].wb.upper.status_error;
 
+#ifdef RTE_NEXT_ABI
+		for (j = LOOK_AHEAD-1; j >= 0; --j)
+			pkt_info[j] = rxdp[j].wb.lower.lo_dword.
+						hs_rss.pkt_info;
+#endif /* RTE_NEXT_ABI */
+
 		/* Compute how many status bits were set */
 		nb_dd = 0;
 		for (j = 0; j < LOOK_AHEAD; ++j)
@@ -984,12 +1101,22 @@ ixgbe_rx_scan_hw_ring(struct ixgbe_rx_queue *rxq)
 			mb->vlan_tci = rte_le_to_cpu_16(rxdp[j].wb.upper.vlan);
 
 			/* convert descriptor fields to rte mbuf flags */
+#ifdef RTE_NEXT_ABI
+			pkt_flags = rx_desc_status_to_pkt_flags(s[j]);
+			pkt_flags |= rx_desc_error_to_pkt_flags(s[j]);
+			pkt_flags |=
+				ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info[j]);
+			mb->ol_flags = pkt_flags;
+			mb->packet_type =
+				ixgbe_rxd_pkt_info_to_pkt_type(pkt_info[j]);
+#else /* RTE_NEXT_ABI */
 			pkt_flags  = rx_desc_hlen_type_rss_to_pkt_flags(
 					rxdp[j].wb.lower.lo_dword.data);
 			/* reuse status field from scan list */
 			pkt_flags |= rx_desc_status_to_pkt_flags(s[j]);
 			pkt_flags |= rx_desc_error_to_pkt_flags(s[j]);
 			mb->ol_flags = pkt_flags;
+#endif /* RTE_NEXT_ABI */
 
 			if (likely(pkt_flags & PKT_RX_RSS_HASH))
 				mb->hash.rss = rxdp[j].wb.lower.hi_dword.rss;
@@ -1206,7 +1333,11 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	union ixgbe_adv_rx_desc rxd;
 	uint64_t dma_addr;
 	uint32_t staterr;
+#ifdef RTE_NEXT_ABI
+	uint32_t pkt_info;
+#else
 	uint32_t hlen_type_rss;
+#endif
 	uint16_t pkt_len;
 	uint16_t rx_id;
 	uint16_t nb_rx;
@@ -1324,6 +1455,19 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		rxm->data_len = pkt_len;
 		rxm->port = rxq->port_id;
 
+#ifdef RTE_NEXT_ABI
+		pkt_info = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.hs_rss.
+								pkt_info);
+		/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
+		rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
+
+		pkt_flags = rx_desc_status_to_pkt_flags(staterr);
+		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
+		pkt_flags = pkt_flags |
+			ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info);
+		rxm->ol_flags = pkt_flags;
+		rxm->packet_type = ixgbe_rxd_pkt_info_to_pkt_type(pkt_info);
+#else /* RTE_NEXT_ABI */
 		hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data);
 		/* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
 		rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);
@@ -1332,6 +1476,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		rxm->ol_flags = pkt_flags;
+#endif /* RTE_NEXT_ABI */
 
 		if (likely(pkt_flags & PKT_RX_RSS_HASH))
 			rxm->hash.rss = rxd.wb.lower.hi_dword.rss;
@@ -1405,6 +1550,23 @@ ixgbe_fill_cluster_head_buf(
 	uint8_t port_id,
 	uint32_t staterr)
 {
+#ifdef RTE_NEXT_ABI
+	uint16_t pkt_info;
+	uint64_t pkt_flags;
+
+	head->port = port_id;
+
+	/* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is
+	 * set in the pkt_flags field.
+	 */
+	head->vlan_tci = rte_le_to_cpu_16(desc->wb.upper.vlan);
+	pkt_info = rte_le_to_cpu_32(desc->wb.lower.lo_dword.hs_rss.pkt_info);
+	pkt_flags = rx_desc_status_to_pkt_flags(staterr);
+	pkt_flags |= rx_desc_error_to_pkt_flags(staterr);
+	pkt_flags |= ixgbe_rxd_pkt_info_to_pkt_flags(pkt_info);
+	head->ol_flags = pkt_flags;
+	head->packet_type = ixgbe_rxd_pkt_info_to_pkt_type(pkt_info);
+#else /* RTE_NEXT_ABI */
 	uint32_t hlen_type_rss;
 	uint64_t pkt_flags;
 
@@ -1420,6 +1582,7 @@ ixgbe_fill_cluster_head_buf(
 	pkt_flags |= rx_desc_status_to_pkt_flags(staterr);
 	pkt_flags |= rx_desc_error_to_pkt_flags(staterr);
 	head->ol_flags = pkt_flags;
+#endif /* RTE_NEXT_ABI */
 
 	if (likely(pkt_flags & PKT_RX_RSS_HASH))
 		head->hash.rss = rte_le_to_cpu_32(desc->wb.lower.hi_dword.rss);
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 04/18] e1000: replace bit mask based packet type with unified packet type
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
                       ` (2 preceding siblings ...)
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 03/18] mbuf: add definitions of unified packet types Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 05/18] ixgbe: " Helin Zhang
                       ` (14 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify packet types among all PMDs, the packet-type bit masks in
'ol_flags' are replaced by a unified packet type.
To avoid breaking ABI compatibility, all the changes are enabled by
RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_rxtx.c | 102 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 43d6703..d1c2ef8 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -590,6 +590,99 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
  *  RX functions
  *
  **********************************************************************/
+#ifdef RTE_NEXT_ABI
+#define IGB_PACKET_TYPE_IPV4              0X01
+#define IGB_PACKET_TYPE_IPV4_TCP          0X11
+#define IGB_PACKET_TYPE_IPV4_UDP          0X21
+#define IGB_PACKET_TYPE_IPV4_SCTP         0X41
+#define IGB_PACKET_TYPE_IPV4_EXT          0X03
+#define IGB_PACKET_TYPE_IPV4_EXT_SCTP     0X43
+#define IGB_PACKET_TYPE_IPV6              0X04
+#define IGB_PACKET_TYPE_IPV6_TCP          0X14
+#define IGB_PACKET_TYPE_IPV6_UDP          0X24
+#define IGB_PACKET_TYPE_IPV6_EXT          0X0C
+#define IGB_PACKET_TYPE_IPV6_EXT_TCP      0X1C
+#define IGB_PACKET_TYPE_IPV6_EXT_UDP      0X2C
+#define IGB_PACKET_TYPE_IPV4_IPV6         0X05
+#define IGB_PACKET_TYPE_IPV4_IPV6_TCP     0X15
+#define IGB_PACKET_TYPE_IPV4_IPV6_UDP     0X25
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT     0X0D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IGB_PACKET_TYPE_MAX               0X80
+#define IGB_PACKET_TYPE_MASK              0X7F
+#define IGB_PACKET_TYPE_SHIFT             0X04
+static inline uint32_t
+igb_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+	static const uint32_t
+		ptype_table[IGB_PACKET_TYPE_MAX] __rte_cache_aligned = {
+		[IGB_PACKET_TYPE_IPV4] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV4,
+		[IGB_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT,
+		[IGB_PACKET_TYPE_IPV6] = RTE_PTYPE_L2_MAC | RTE_PTYPE_L3_IPV6,
+		[IGB_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6,
+		[IGB_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT,
+		[IGB_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_TCP,
+		[IGB_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_TCP,
+		[IGB_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6 | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_UDP] =  RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6 | RTE_PTYPE_INNER_L4_UDP,
+		[IGB_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_TUNNEL_IP |
+			RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_INNER_L4_UDP,
+		[IGB_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_SCTP,
+		[IGB_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L2_MAC |
+			RTE_PTYPE_L3_IPV4_EXT | RTE_PTYPE_L4_SCTP,
+	};
+	if (unlikely(pkt_info & E1000_RXDADV_PKTTYPE_ETQF))
+		return RTE_PTYPE_UNKNOWN;
+
+	pkt_info = (pkt_info >> IGB_PACKET_TYPE_SHIFT) & IGB_PACKET_TYPE_MASK;
+
+	return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
+{
+	uint64_t pkt_flags = ((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH;
+
+#if defined(RTE_LIBRTE_IEEE1588)
+	static uint32_t ip_pkt_etqf_map[8] = {
+		0, 0, 0, PKT_RX_IEEE1588_PTP,
+		0, 0, 0, 0,
+	};
+
+	pkt_flags |= ip_pkt_etqf_map[(hl_tp_rs >> 4) & 0x07];
+#endif
+
+	return pkt_flags;
+}
+#else /* RTE_NEXT_ABI */
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
@@ -617,6 +710,7 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 #endif
 	return pkt_flags | (((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH);
 }
+#endif /* RTE_NEXT_ABI */
 
 static inline uint64_t
 rx_desc_status_to_pkt_flags(uint32_t rx_status)
@@ -790,6 +884,10 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		rxm->ol_flags = pkt_flags;
+#ifdef RTE_NEXT_ABI
+		rxm->packet_type = igb_rxd_pkt_info_to_pkt_type(rxd.wb.lower.
+						lo_dword.hs_rss.pkt_info);
+#endif
 
 		/*
 		 * Store the mbuf address into the next entry of the array
@@ -1024,6 +1122,10 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr);
 		pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr);
 		first_seg->ol_flags = pkt_flags;
+#ifdef RTE_NEXT_ABI
+		first_seg->packet_type = igb_rxd_pkt_info_to_pkt_type(rxd.wb.
+					lower.lo_dword.hs_rss.pkt_info);
+#endif
 
 		/* Prefetch data of first segment, if configured to do so. */
 		rte_packet_prefetch((char *)first_seg->buf_addr +
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 03/18] mbuf: add definitions of unified packet types
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
@ 2015-06-19  8:14  3%     ` Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
                       ` (15 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

There are only 6 bit flags in ol_flags for indicating packet
types, which is not enough to describe all the possible packet
types hardware can recognize. For example, i40e hardware can
recognize more than 150 packet types. The unified packet type is
composed of L2 type, L3 type, L4 type, tunnel type, inner L2 type,
inner L3 type and inner L4 type fields, and can be stored in the
32-bit 'packet_type' field of 'struct rte_mbuf'.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_mbuf/rte_mbuf.h | 487 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 487 insertions(+)

v3 changes:
* Put the definitions of unified packet type into a single patch.

v4 changes:
* Added detailed description of each packet types.

v5 changes:
* Re-worded the commit logs.
* Added more detailed description for all packet types, together with examples.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index aa55769..5e7cc26 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -201,6 +201,493 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
 
+#ifdef RTE_NEXT_ABI
+/*
+ * 32 bits are divided into several fields to mark packet types. Note that
+ * each field is an enumerated value rather than a bit mask.
+ * - Bit 3:0 is for L2 types.
+ * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
+ * - Bit 11:8 is for L4 or outer L4 (for tunneling case) types.
+ * - Bit 15:12 is for tunnel types.
+ * - Bit 19:16 is for inner L2 types.
+ * - Bit 23:20 is for inner L3 types.
+ * - Bit 27:24 is for inner L4 types.
+ * - Bit 31:28 is reserved.
+ *
+ * To be compatible with Vector PMD, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT,
+ * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
+ * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
+ *
+ * Note that L3 types values are selected for checking IPV4/IPV6 header from
+ * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
+ * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
+ *
+ * Note that the packet types of the same packet recognized by different
+ * hardware may be different, as different hardware may have different
+ * capability of packet type recognition.
+ *
+ * examples:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=0x29
+ * | 'version'=6, 'next header'=0x3A
+ * | 'ICMPv6 header'>
+ * will be recognized on i40e hardware as packet type combination of,
+ * RTE_PTYPE_L2_MAC |
+ * RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+ * RTE_PTYPE_TUNNEL_IP |
+ * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_INNER_L4_ICMP.
+ *
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x2F
+ * | 'GRE header'
+ * | 'version'=6, 'next header'=0x11
+ * | 'UDP header'>
+ * will be recognized on i40e hardware as packet type combination of,
+ * RTE_PTYPE_L2_MAC |
+ * RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_TUNNEL_GRENAT |
+ * RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
+ * RTE_PTYPE_INNER_L4_UDP.
+ */
+#define RTE_PTYPE_UNKNOWN                   0x00000000
+/**
+ * MAC (Media Access Control) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=[0x0800|0x86DD|others]>
+ */
+#define RTE_PTYPE_L2_MAC                    0x00000001
+/**
+ * MAC (Media Access Control) packet type for time sync.
+ *
+ * Packet format:
+ * <'ether type'=0x88F7>
+ */
+#define RTE_PTYPE_L2_MAC_TIMESYNC           0x00000002
+/**
+ * ARP (Address Resolution Protocol) packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0806>
+ */
+#define RTE_PTYPE_L2_ARP                    0x00000003
+/**
+ * LLDP (Link Layer Discovery Protocol) packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x88CC>
+ */
+#define RTE_PTYPE_L2_LLDP                   0x00000004
+/**
+ * Mask of layer 2 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L2_MASK                   0x0000000f
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and does not contain any
+ * header option.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=5>
+ */
+#define RTE_PTYPE_L3_IPV4                   0x00000010
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and contains header
+ * options.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[6-15], 'options'>
+ */
+#define RTE_PTYPE_L3_IPV4_EXT               0x00000030
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and does not contain any
+ * extension header.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x3B>
+ */
+#define RTE_PTYPE_L3_IPV6                   0x00000040
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for outer packet for tunneling cases, and may or may not contain
+ * header options.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[5-15], <'options'>>
+ */
+#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN       0x00000090
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and contains extension
+ * headers.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   'extension headers'>
+ */
+#define RTE_PTYPE_L3_IPV6_EXT               0x000000c0
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for outer packet for tunneling cases, and may or may not contain
+ * extension headers.
+ *
+ * Packet format:
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   <'extension headers'>>
+ */
+#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN       0x000000e0
+/**
+ * Mask of layer 3 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L3_MASK                   0x000000f0
+/**
+ * TCP (Transmission Control Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=6, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=6>
+ */
+#define RTE_PTYPE_L4_TCP                    0x00000100
+/**
+ * UDP (User Datagram Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17>
+ */
+#define RTE_PTYPE_L4_UDP                    0x00000200
+/**
+ * Fragmented IP (Internet Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * It refers to those packets of any IP types, which can be recognized as
+ * fragmented. A fragmented packet cannot be recognized as any other L4 types
+ * (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP,
+ * RTE_PTYPE_L4_NONFRAG).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'MF'=1>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=44>
+ */
+#define RTE_PTYPE_L4_FRAG                   0x00000300
+/**
+ * SCTP (Stream Control Transmission Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=132, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=132>
+ */
+#define RTE_PTYPE_L4_SCTP                   0x00000400
+/**
+ * ICMP (Internet Control Message Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=1, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=1>
+ */
+#define RTE_PTYPE_L4_ICMP                   0x00000500
+/**
+ * Non-fragmented IP (Internet Protocol) packet type.
+ * It is used for outer packet for tunneling cases.
+ *
+ * It refers to those packets of any IP types, which cannot be recognized as
+ * any of the above L4 types (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
+ * RTE_PTYPE_L4_FRAG, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP).
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'!=[6|17|44|132|1]>
+ */
+#define RTE_PTYPE_L4_NONFRAG                0x00000600
+/**
+ * Mask of layer 4 packet types.
+ * It is used for outer packet for tunneling cases.
+ */
+#define RTE_PTYPE_L4_MASK                   0x00000f00
+/**
+ * IP (Internet Protocol) in IP (Internet Protocol) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=[4|41]>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[4|41]>
+ */
+#define RTE_PTYPE_TUNNEL_IP                 0x00001000
+/**
+ * GRE (Generic Routing Encapsulation) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47>
+ */
+#define RTE_PTYPE_TUNNEL_GRE                0x00002000
+/**
+ * VXLAN (Virtual eXtensible Local Area Network) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=4798>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=4798>
+ */
+#define RTE_PTYPE_TUNNEL_VXLAN              0x00003000
+/**
+ * NVGRE (Network Virtualization using Generic Routing Encapsulation) tunneling
+ * packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=47
+ * | 'protocol type'=0x6558>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=47
+ * | 'protocol type'=0x6558'>
+ */
+#define RTE_PTYPE_TUNNEL_NVGRE              0x00004000
+/**
+ * GENEVE (Generic Network Virtualization Encapsulation) tunneling packet type.
+ *
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=6081>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=6081>
+ */
+#define RTE_PTYPE_TUNNEL_GENEVE             0x00005000
+/**
+ * Teredo, VXLAN (Virtual eXtensible Local Area Network) or GRE (Generic
+ * Routing Encapsulation) tunneling packets may be reported as this packet
+ * type when the hardware is not capable of recognizing them individually.
+ */
+#define RTE_PTYPE_TUNNEL_GRENAT             0x00006000
+/**
+ * Mask of tunneling packet types.
+ */
+#define RTE_PTYPE_TUNNEL_MASK               0x0000f000
+/**
+ * MAC (Media Access Control) packet type.
+ * It is used for inner packet type only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=[0x800|0x86DD]>
+ */
+#define RTE_PTYPE_INNER_L2_MAC              0x00010000
+/**
+ * MAC (Media Access Control) packet type with VLAN (Virtual Local Area
+ * Network) tag.
+ *
+ * Packet format (inner only):
+ * <'ether type'=[0x800|0x86DD], vlan=[1-4095]>
+ */
+#define RTE_PTYPE_INNER_L2_MAC_VLAN         0x00020000
+/**
+ * Mask of inner layer 2 packet types.
+ */
+#define RTE_PTYPE_INNER_L2_MASK             0x000f0000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and does not contain any header option.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=5>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4             0x00100000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and contains header options.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[6-15], 'options'>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x00200000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and does not contain any extension header.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=0x3B>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6             0x00300000
+/**
+ * IP (Internet Protocol) version 4 packet type.
+ * It is used for inner packet only, and may or may not contain header options.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'ihl'=[5-15], <'options'>>
+ */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x00400000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and contains extension headers.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   'extension headers'>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x00500000
+/**
+ * IP (Internet Protocol) version 6 packet type.
+ * It is used for inner packet only, and may or may not contain extension
+ * headers.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=[0x3B|0x0|0x2B|0x2C|0x32|0x33|0x3C|0x87],
+ *   <'extension headers'>>
+ */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x00600000
+/**
+ * Mask of inner layer 3 packet types.
+ */
+#define RTE_PTYPE_INNER_INNER_L3_MASK       0x00f00000
+/**
+ * TCP (Transmission Control Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=6, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=6>
+ */
+#define RTE_PTYPE_INNER_L4_TCP              0x01000000
+/**
+ * UDP (User Datagram Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17>
+ */
+#define RTE_PTYPE_INNER_L4_UDP              0x02000000
+/**
+ * Fragmented IP (Internet Protocol) packet type.
+ * It is used for inner packet only, and may or may not carry a layer 4 packet.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'MF'=1>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=44>
+ */
+#define RTE_PTYPE_INNER_L4_FRAG             0x03000000
+/**
+ * SCTP (Stream Control Transmission Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=132, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=132>
+ */
+#define RTE_PTYPE_INNER_L4_SCTP             0x04000000
+/**
+ * ICMP (Internet Control Message Protocol) packet type.
+ * It is used for inner packet only.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=1, 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=1>
+ */
+#define RTE_PTYPE_INNER_L4_ICMP             0x05000000
+/**
+ * Non-fragmented IP (Internet Protocol) packet type.
+ * It is used for inner packet only, and may or may not carry other unknown
+ * layer 4 packet types.
+ *
+ * Packet format (inner only):
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'!=[6|17|44|132|1]>
+ */
+#define RTE_PTYPE_INNER_L4_NONFRAG          0x06000000
+/**
+ * Mask of inner layer 4 packet types.
+ */
+#define RTE_PTYPE_INNER_L4_MASK             0x0f000000
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
+ * determine whether it is an IPv4 packet.
+ */
+#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
+
+/**
+ * Check if the (outer) L3 header is IPv6. To avoid comparing IPv6 types one by
+ * one, bit 6 is selected to be used for IPv6 only. Then checking bit 6 can
+ * determine whether it is an IPv6 packet.
+ */
+#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV6)
+
+/* Check if it is a tunneling packet */
+#define RTE_ETH_IS_TUNNEL_PKT(ptype) ((ptype) & RTE_PTYPE_TUNNEL_MASK)
+#endif /* RTE_NEXT_ABI */
+
 /**
  * Get the name of a RX offload flag
  *
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 02/18] ixgbe: support unified packet type in vectorized PMD
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
  2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
@ 2015-06-19  8:14  3%     ` Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 03/18] mbuf: add definitions of unified packet types Helin Zhang
                       ` (16 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

To unify the packet type, the packet type bit masks in ol_flags are
replaced. In addition, more packet types (UDP, TCP and SCTP) are
supported in the vectorized ixgbe PMD.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.
Note that a performance drop of around 2% (64B packets) was observed when
doing 4-port (1 port per 82599 card) IO forwarding on the same SNB core.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 config/common_linuxapp             |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 75 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 3 deletions(-)

v2 changes:
* Used redefined packet types and enlarged packet_type field in mbuf.

v3 changes:
* Put vector ixgbe changes right after mbuf changes.
* Enabled vector ixgbe PMD by default together with changes for updated
  vector PMD.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 617d4a1..5deb55a 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_IXGBE_INC_VECTOR=y
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
 
 #
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index abd10f6..ccea7cd 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -134,6 +134,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
  */
 #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE
 
+#ifdef RTE_NEXT_ABI
+#define OLFLAGS_MASK_V  (((uint64_t)PKT_RX_VLAN_PKT << 48) | \
+			((uint64_t)PKT_RX_VLAN_PKT << 32) | \
+			((uint64_t)PKT_RX_VLAN_PKT << 16) | \
+			((uint64_t)PKT_RX_VLAN_PKT))
+#else
 #define OLFLAGS_MASK     ((uint16_t)(PKT_RX_VLAN_PKT | PKT_RX_IPV4_HDR |\
 				     PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\
 				     PKT_RX_IPV6_HDR_EXT))
@@ -142,11 +148,26 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 			  ((uint64_t)OLFLAGS_MASK << 16) | \
 			  ((uint64_t)OLFLAGS_MASK))
 #define PTYPE_SHIFT    (1)
+#endif /* RTE_NEXT_ABI */
+
 #define VTAG_SHIFT     (3)
 
 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
+#ifdef RTE_NEXT_ABI
+	__m128i vtag0, vtag1;
+	union {
+		uint16_t e[4];
+		uint64_t dword;
+	} vol;
+
+	vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+	vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+	vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+	vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT);
+	vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V;
+#else
 	__m128i ptype0, ptype1, vtag0, vtag1;
 	union {
 		uint16_t e[4];
@@ -166,6 +187,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 
 	ptype1 = _mm_or_si128(ptype1, vtag1);
 	vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V;
+#endif /* RTE_NEXT_ABI */
 
 	rx_pkts[0]->ol_flags = vol.e[0];
 	rx_pkts[1]->ol_flags = vol.e[1];
@@ -196,6 +218,18 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	int pos;
 	uint64_t var;
 	__m128i shuf_msk;
+#ifdef RTE_NEXT_ABI
+	__m128i crc_adjust = _mm_set_epi16(
+				0, 0, 0,    /* ignore non-length fields */
+				-rxq->crc_len, /* sub crc on data_len */
+				0,          /* ignore high-16bits of pkt_len */
+				-rxq->crc_len, /* sub crc on pkt_len */
+				0, 0            /* ignore pkt_type field */
+			);
+	__m128i dd_check, eop_check;
+	__m128i desc_mask = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF,
+					  0xFFFFFFFF, 0xFFFF07F0);
+#else
 	__m128i crc_adjust = _mm_set_epi16(
 				0, 0, 0, 0, /* ignore non-length fields */
 				0,          /* ignore high-16bits of pkt_len */
@@ -204,6 +238,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				0            /* ignore pkt_type field */
 			);
 	__m128i dd_check, eop_check;
+#endif /* RTE_NEXT_ABI */
 
 	if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST))
 		return 0;
@@ -232,6 +267,18 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
 
 	/* mask to shuffle from desc. to mbuf */
+#ifdef RTE_NEXT_ABI
+	shuf_msk = _mm_set_epi8(
+		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+		15, 14,      /* octet 14~15, low 16 bits vlan_macip */
+		13, 12,      /* octet 12~13, 16 bits data_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+		13, 12,      /* octet 12~13, low 16 bits pkt_len */
+		0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+		1,           /* octet 1, 8 bits pkt_type field */
+		0            /* octet 0, 4 bits offset 4 pkt_type field */
+		);
+#else
 	shuf_msk = _mm_set_epi8(
 		7, 6, 5, 4,  /* octet 4~7, 32bits rss */
 		0xFF, 0xFF,  /* skip high 16 bits vlan_macip, zero out */
@@ -241,18 +288,28 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		13, 12,      /* octet 12~13, 16 bits data_len */
 		0xFF, 0xFF   /* skip pkt_type field */
 		);
+#endif /* RTE_NEXT_ABI */
 
 	/* Cache is empty -> need to scan the buffer rings, but first move
 	 * the next 'n' mbufs into the cache */
 	sw_ring = &rxq->sw_ring[rxq->rx_tail];
 
-	/*
-	 * A. load 4 packet in one loop
+#ifdef RTE_NEXT_ABI
+	/* A. load 4 packet in one loop
+	 * [A*. mask out 4 unused dirty field in desc]
 	 * B. copy 4 mbuf point from swring to rx_pkts
 	 * C. calc the number of DD bits among the 4 packets
 	 * [C*. extract the end-of-packet bit, if requested]
 	 * D. fill info. from desc to mbuf
 	 */
+#else
+	/* A. load 4 packet in one loop
+	 * B. copy 4 mbuf point from swring to rx_pkts
+	 * C. calc the number of DD bits among the 4 packets
+	 * [C*. extract the end-of-packet bit, if requested]
+	 * D. fill info. from desc to mbuf
+	 */
+#endif /* RTE_NEXT_ABI */
 	for (pos = 0, nb_pkts_recd = 0; pos < RTE_IXGBE_VPMD_RX_BURST;
 			pos += RTE_IXGBE_DESCS_PER_LOOP,
 			rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
@@ -289,6 +346,16 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
 
+#ifdef RTE_NEXT_ABI
+		/* A* mask out 0~3 bits RSS type */
+		descs[3] = _mm_and_si128(descs[3], desc_mask);
+		descs[2] = _mm_and_si128(descs[2], desc_mask);
+
+		/* A* mask out 0~3 bits RSS type */
+		descs[1] = _mm_and_si128(descs[1], desc_mask);
+		descs[0] = _mm_and_si128(descs[0], desc_mask);
+#endif /* RTE_NEXT_ABI */
+
 		/* avoid compiler reorder optimization */
 		rte_compiler_barrier();
 
@@ -301,7 +368,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* C.1 4=>2 filter staterr info only */
 		sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);
 
+#ifdef RTE_NEXT_ABI
+		/* set ol_flags with vlan packet type */
+#else
 		/* set ol_flags with packet type and vlan tag */
+#endif /* RTE_NEXT_ABI */
 		desc_to_olflags_v(descs, &rx_pkts[pos]);
 
 		/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
@ 2015-06-19  8:14  4%     ` Helin Zhang
  2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
                       ` (17 subsequent siblings)
  18 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-19  8:14 UTC (permalink / raw)
  To: dev

In order to unify the packet type, the field of 'packet_type' in
'struct rte_mbuf' needs to be extended from 16 to 32 bits.
Accordingly, some fields in 'struct rte_mbuf' are re-organized to
support this change for the vector PMD. As 'struct rte_kni_mbuf' for
KNI must map exactly onto 'struct rte_mbuf', it is modified
accordingly. In addition, the vector PMD of ixgbe is disabled
by default, as the 'struct rte_mbuf' layout changed.
To avoid breaking ABI compatibility, all the changes would be
enabled by RTE_NEXT_ABI, which is disabled by default.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 config/common_linuxapp                             |  2 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |  6 ++++++
 lib/librte_mbuf/rte_mbuf.h                         | 23 ++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

v2 changes:
* Enlarged the packet_type field from 16 bits to 32 bits.
* Redefined the packet type sub-fields.
* Updated the 'struct rte_kni_mbuf' for KNI according to the mbuf changes.

v3 changes:
* Put the mbuf layout changes into a single patch.
* Disabled vector ixgbe PMD by default, as mbuf layout changed.

v5 changes:
* Re-worded the commit logs.

v6 changes:
* Disabled the code changes for unified packet type by default, to
  avoid breaking ABI compatibility.

v7 changes:
* Renamed RTE_UNIFIED_PKT_TYPE to RTE_NEXT_ABI.
* Integrated with changes of QinQ stripping/insertion.

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5deb55a..617d4a1 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -167,7 +167,7 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_IXGBE_INC_VECTOR=y
+CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y
 
 #
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 1e55c2d..e9f38bd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -117,9 +117,15 @@ struct rte_kni_mbuf {
 	uint16_t data_off;      /**< Start address of data in segment buffer. */
 	char pad1[4];
 	uint64_t ol_flags;      /**< Offload features. */
+#ifdef RTE_NEXT_ABI
+	char pad2[4];
+	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
+	uint16_t data_len;      /**< Amount of data in segment buffer. */
+#else
 	char pad2[2];
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
+#endif
 
 	/* fields on second cache line */
 	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index a0f3d3b..aa55769 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -275,6 +275,28 @@ struct rte_mbuf {
 	/* remaining bytes are set on RX when pulling packet from descriptor */
 	MARKER rx_descriptor_fields1;
 
+#ifdef RTE_NEXT_ABI
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types.
+	 */
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
+#else
 	/**
 	 * The packet type, which is used to indicate ordinary packet and also
 	 * tunneled packet format, i.e. each number is represented a type of
@@ -285,6 +307,7 @@ struct rte_mbuf {
 	uint16_t data_len;        /**< Amount of data in segment buffer. */
 	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 	uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
+#endif
 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
 	union {
 		uint32_t rss;     /**< RSS hash result if RSS enabled */
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v13 14/14] abi: fix v2.1 abi broken issue
  2015-06-19  4:00  4%     ` [dpdk-dev] [PATCH v13 " Cunming Liang
  2015-06-19  4:00  2%       ` [dpdk-dev] [PATCH v13 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-06-19  4:00 10%       ` Cunming Liang
  1 sibling, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-19  4:00 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

RTE_NEXT_ABI will be removed in v2.2. It is only used in v2.1 to avoid an unannounced ABI break.
Users should make sure they understand the impact before turning on the feature.
This interrupt patch set requires two ABI changes:
They are 1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v13 changes
 - Use common RTE_NEXT_ABI to replace RTE_EAL_RX_INTR

v9
 - Acked-by: vincent jardin <vincent.jardin@6wind.com>

 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 56d604a..e6f4ba8 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_NEXT_ABI
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -201,11 +203,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_NEXT_ABI
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_NEXT_ABI
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -764,7 +770,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_NEXT_ABI
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -805,6 +813,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_NEXT_ABI
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -822,6 +831,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -917,9 +927,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_NEXT_ABI
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1011,12 +1023,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_NEXT_ABI
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1024,7 +1038,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_NEXT_ABI
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1042,11 +1058,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_NEXT_ABI
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1871,6 +1889,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_NEXT_ABI
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1898,6 +1917,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3766,6 +3786,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_NEXT_ABI
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3807,6 +3828,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3816,18 +3838,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_NEXT_ABI
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_NEXT_ABI
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3884,6 +3909,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 7b428eb..e3ff015 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_NEXT_ABI
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_NEXT_ABI
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_NEXT_ABI
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1487,7 +1493,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_NEXT_ABI
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1520,6 +1528,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_NEXT_ABI
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1538,6 +1547,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1625,9 +1635,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_NEXT_ABI
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1733,12 +1745,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_NEXT_ABI
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2341,6 +2355,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_NEXT_ABI
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2351,6 +2366,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3133,7 +3149,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_NEXT_ABI
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3166,6 +3184,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_NEXT_ABI
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3183,7 +3202,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3229,19 +3248,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_NEXT_ABI
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_NEXT_ABI
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3252,11 +3275,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_NEXT_ABI
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3840,6 +3865,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_NEXT_ABI
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3908,21 +3934,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_NEXT_ABI
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_NEXT_ABI
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3933,6 +3963,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3943,18 +3974,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_NEXT_ABI
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_NEXT_ABI
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4006,6 +4040,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index bae36f7..7f57491 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,6 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
 	},
 };
 
@@ -889,7 +888,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index ba4640a..a730d89 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -51,9 +51,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_NEXT_ABI
+	/**
+	 * RTE_NEXT_ABI will be removed in v2.2.
+	 * It is only used in v2.1 to avoid an unannounced ABI break.
+	 * Make sure you are aware of the impact before turning on the feature.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index d7a5403..291d5ab 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_NEXT_ABI
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_NEXT_ABI
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -876,6 +884,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_NEXT_ABI
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -918,6 +927,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1056,6 +1066,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_NEXT_ABI
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1168,3 +1179,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 912cc50..e46e65e 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_NEXT_ABI
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_NEXT_ABI
+	/**
+	 * RTE_NEXT_ABI will be removed in v2.2.
+	 * It is only used in v2.1 to avoid an unannounced ABI break.
+	 * Make sure you are aware of the impact before turning on the feature.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_NEXT_ABI
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_NEXT_ABI
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_NEXT_ABI
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_NEXT_ABI
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_NEXT_ABI
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index d149f12..4d4e456 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_NEXT_ABI
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3352,6 +3353,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index efa246f..8d8b641 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_NEXT_ABI
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2948,8 +2950,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_NEXT_ABI
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2972,9 +2986,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_NEXT_ABI
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH v13 10/14] ethdev: add rx intr enable, disable and ctl functions
  2015-06-19  4:00  4%     ` [dpdk-dev] [PATCH v13 " Cunming Liang
@ 2015-06-19  4:00  2%       ` Cunming Liang
  2015-06-19  4:00 10%       ` [dpdk-dev] [PATCH v13 14/14] abi: fix v2.1 abi broken issue Cunming Liang
  1 sibling, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-19  4:00 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addition, it adds rte_eth_dev_rx_intr_ctl/rte_eth_dev_rx_intr_ctl_q to support setting rx interrupt events per port or per queue.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v13 changes
 - version map cleanup for v2.1

v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add additional check for -EEXIST

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e13fde5..d149f12 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 04c192d..efa246f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1391,6 +1401,10 @@ struct eth_dev_ops {
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
 	eth_set_mc_addr_list_t set_mc_addr_list; /**< set list of mcast addrs */
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2873,6 +2887,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When no rx packet arrives on an Rx queue for a long time, the lcore
+ * serving that queue can sleep to save power, with an rx interrupt
+ * triggered when the next rx packet arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd to which the interrupt vector is associated.
+ *   RTE_EPOLL_PER_THREAD selects the per-thread epoll instance.
+ * @param op
+ *   The operation to be performed on the vector; one of
+ *   {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd to which the interrupt vector is associated.
+ *   RTE_EPOLL_PER_THREAD selects the per-thread epoll instance.
+ * @param op
+ *   The operation to be performed on the vector; one of
+ *   {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 012a82e..3981b7b 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -109,6 +109,10 @@ DPDK_2.0 {
 DPDK_2.1 {
 	global:
 
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_set_mc_addr_list;
 
 	local: *;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]
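The rte_eth_dev_rx_intr_ctl_q() workflow above boils down to registering a per-queue eventfd in an epoll instance and blocking on it. A minimal, self-contained sketch of that mechanism, using a plain eventfd as a stand-in for the VFIO per-queue eventfd (the helper names intr_arm/intr_wait are illustrative, not the DPDK API):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Arm an eventfd in an epoll instance, standing in for the per-queue
 * VFIO eventfd that rte_eth_dev_rx_intr_ctl_q() would register. */
static int intr_arm(int epfd, int efd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
}

/* Block until the "interrupt" fires; return the woken fd, or -1 on
 * timeout/error. */
static int intr_wait(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	return n == 1 ? ev.data.fd : -1;
}
```

A polling thread would call intr_wait() only after enabling the queue interrupt, then read the eventfd, disable the interrupt and resume polling.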

* [dpdk-dev] [PATCH v13 00/14] Interrupt mode PMD
  2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
                       ` (2 preceding siblings ...)
  2015-06-09 23:59  0%     ` [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD Stephen Hemminger
@ 2015-06-19  4:00  4%     ` Cunming Liang
  2015-06-19  4:00  2%       ` [dpdk-dev] [PATCH v13 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-06-19  4:00 10%       ` [dpdk-dev] [PATCH v13 14/14] abi: fix v2.1 abi broken issue Cunming Liang
  3 siblings, 2 replies; 200+ results
From: Cunming Liang @ 2015-06-19  4:00 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

v13 changes
 - version map cleanup for v2.1
 - replace RTE_EAL_RX_INTR by RTE_NEXT_ABI for ABI compatibility

Patch series v12
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Danny Zhou <danny.zhou@intel.com>

v12 changes
 - bsd cleanup for unused variable warning
 - fix awkward line split in debug message

v11 changes
 - typo cleanup and check kernel style

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by default so as to avoid v2.1 abi broken

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample application to accommodate EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlock from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscellaneous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle LSC interrupt only with inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
   which then wakes up DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for DPDK polling thread as well as packet
   latency if it is used to handle Rx interrupt.
2) UIO only supports a single interrupt vector, which has to be shared by the
   LSC interrupt and the interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupt control APIs and userspace NAPI-like polling/interrupt
   switch algorithms in the L3fwd-power example.

Known limitations:
1) It does not work for UIO, because a single interrupt eventfd shared by the
   LSC and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by the VF driver, so it is disabled by
   default in L3fwd-power now. Feel free to turn it on if you want to support
   both LSC and rx queue interrupts on a PF.

Cunming Liang (14):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  eal/bsd: fix inappropriate linuxapp referred in bsd
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 202 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  30 ++
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  91 +++-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |  12 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 362 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |  15 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1883 insertions(+), 127 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 4%]
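The polling/interrupt mode switch this cover letter demonstrates in L3fwd-power can be sketched as a small per-queue state machine: after a run of empty polls the queue switches to interrupt mode, and any received packet switches it back. This is a simplified, self-contained illustration of the idea (the threshold and the names rxq_state/rxq_update are assumptions for the sketch, not the sample's actual code):

```c
#include <stdint.h>

/* Consecutive empty polls before switching to interrupt mode
 * (illustrative value). */
#define IDLE_THRESHOLD 10

enum rxq_mode { MODE_POLL, MODE_INTR };

struct rxq_state {
	enum rxq_mode mode;
	uint32_t idle_polls;
};

/* Feed in the packet count returned by one rx_burst() call and get back
 * the mode to use next: IDLE_THRESHOLD consecutive empty polls switch to
 * interrupt mode; any received packet switches back to polling. */
static enum rxq_mode rxq_update(struct rxq_state *s, uint16_t nb_rx)
{
	if (nb_rx > 0) {
		s->idle_polls = 0;
		s->mode = MODE_POLL;
	} else if (++s->idle_polls >= IDLE_THRESHOLD) {
		s->mode = MODE_INTR;
	}
	return s->mode;
}
```

In MODE_INTR the lcore would enable the queue interrupt and block on epoll; on wakeup it disables the interrupt and returns to MODE_POLL.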

* Re: [dpdk-dev] [PATCH v3 2/9] lib_vhost: Support multiple queues in virtio dev
  2015-06-18 13:34  3%       ` Flavio Leitner
@ 2015-06-19  1:17  3%         ` Ouyang, Changchun
  0 siblings, 0 replies; 200+ results
From: Ouyang, Changchun @ 2015-06-19  1:17 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: dev



> -----Original Message-----
> From: Flavio Leitner [mailto:fbl@sysclose.org]
> Sent: Thursday, June 18, 2015 9:34 PM
> To: Ouyang, Changchun
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 2/9] lib_vhost: Support multiple queues in
> virtio dev
> 
> On Mon, Jun 15, 2015 at 03:56:39PM +0800, Ouyang Changchun wrote:
> > Each virtio device could have multiple queues, say 2 or 4, at most 8.
> > Enabling this feature allows virtio device/port on guest has the
> > ability to use different vCPU to receive/transmit packets from/to each
> queue.
> >
> > In multiple queues mode, virtio device readiness means all queues of
> > this virtio device are ready, cleanup/destroy a virtio device also
> > requires clearing all queues belong to it.
> >
> > Changes in v3:
> >   - fix coding style
> >   - check virtqueue idx validity
> >
> > Changes in v2:
> >   - remove the q_num_set api
> >   - add the qp_num_get api
> >   - determine the queue pair num from qemu message
> >   - rework for reset owner message handler
> >   - dynamically alloc mem for dev virtqueue
> >   - queue pair num could be 0x8000
> >   - fix checkpatch errors
> >
> > Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> > ---
> >  lib/librte_vhost/rte_virtio_net.h             |  10 +-
> >  lib/librte_vhost/vhost-net.h                  |   1 +
> >  lib/librte_vhost/vhost_rxtx.c                 |  49 +++++---
> >  lib/librte_vhost/vhost_user/vhost-net-user.c  |   4 +-
> >  lib/librte_vhost/vhost_user/virtio-net-user.c |  76 +++++++++---
> >  lib/librte_vhost/vhost_user/virtio-net-user.h |   2 +
> >  lib/librte_vhost/virtio-net.c                 | 161 +++++++++++++++++---------
> >  7 files changed, 216 insertions(+), 87 deletions(-)
> >
> > diff --git a/lib/librte_vhost/rte_virtio_net.h
> > b/lib/librte_vhost/rte_virtio_net.h
> > index 5d38185..873be3e 100644
> > --- a/lib/librte_vhost/rte_virtio_net.h
> > +++ b/lib/librte_vhost/rte_virtio_net.h
> > @@ -59,7 +59,6 @@ struct rte_mbuf;
> >  /* Backend value set by guest. */
> >  #define VIRTIO_DEV_STOPPED -1
> >
> > -
> >  /* Enum for virtqueue management. */
> >  enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
> >
> > @@ -96,13 +95,14 @@ struct vhost_virtqueue {
> >   * Device structure contains all configuration information relating to the
> device.
> >   */
> >  struct virtio_net {
> > -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains
> all virtqueue information. */
> >  	struct virtio_memory	*mem;		/**< QEMU memory and
> memory region information. */
> > +	struct vhost_virtqueue	**virtqueue;    /**< Contains all virtqueue
> information. */
> >  	uint64_t		features;	/**< Negotiated feature set.
> */
> >  	uint64_t		device_fh;	/**< device identifier. */
> >  	uint32_t		flags;		/**< Device flags. Only used
> to check if device is running on data core. */
> >  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> >  	char			ifname[IF_NAME_SZ];	/**< Name of the tap
> device or socket path. */
> > +	uint32_t		num_virt_queues;
> >  	void			*priv;		/**< private context */
> >  } __rte_cache_aligned;
> 
> 
> As already pointed out, this breaks ABI.
> Do you have a plan for that or are you pushing this for dpdk 2.2?

Yes, I think it will be enabled in 2.2.
I have already  sent out the abi announce a few days ago.
> 
> 
> > @@ -220,4 +220,10 @@ uint16_t rte_vhost_enqueue_burst(struct
> > virtio_net *dev, uint16_t queue_id,  uint16_t
> rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> >  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count);
> >
> > +/**
> > + * This function get the queue pair number of one vhost device.
> > + * @return
> > + *  num of queue pair of specified virtio device.
> > + */
> > +uint16_t rte_vhost_qp_num_get(struct virtio_net *dev);
> 
> This needs to go to rte_vhost_version.map too.
Will update it.

Thanks
Changchun

^ permalink raw reply	[relevance 3%]
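The patch under review replaces the fixed `virtqueue[VIRTIO_QNUM]` array with a dynamically allocated `**virtqueue` sized by the queue-pair count. A hedged, self-contained sketch of that allocation scheme (struct and function names are illustrative, not the vhost library's code, and error-path cleanup is omitted):

```c
#include <stdlib.h>

#define VIRTIO_QNUM 2  /* one RX and one TX virtqueue per queue pair */

struct vq { int idx; };

/* Allocate the per-device virtqueue pointer array sized by the queue-pair
 * count negotiated with QEMU, instead of a fixed two-entry array. */
static struct vq **alloc_virtqueues(unsigned int qp_num)
{
	struct vq **vqs = calloc((size_t)qp_num * VIRTIO_QNUM, sizeof(*vqs));
	unsigned int i;

	if (vqs == NULL)
		return NULL;
	for (i = 0; i < qp_num * VIRTIO_QNUM; i++) {
		vqs[i] = calloc(1, sizeof(**vqs));
		if (vqs[i] == NULL)
			return NULL; /* cleanup omitted in this sketch */
		vqs[i]->idx = (int)i;
	}
	return vqs;
}
```

Device readiness then means checking all qp_num * VIRTIO_QNUM queues, which is why cleanup/destroy also has to walk the whole array.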

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-18 16:55  8% ` O'Driscoll, Tim
@ 2015-06-18 21:13  4%   ` Vincent JARDIN
  2015-06-19 10:26  9%   ` Neil Horman
  1 sibling, 0 replies; 200+ results
From: Vincent JARDIN @ 2015-06-18 21:13 UTC (permalink / raw)
  To: O'Driscoll, Tim, Thomas Monjalon; +Cc: dev

On 18/06/2015 18:55, O'Driscoll, Tim wrote:
> I like Olivier's proposal on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these changes instead of a separate option per patch set (see http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we should rework the affected patch sets to use that approach for 2.1.

Do we have any other options to meet the short deadlines of 2.1?

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
                   ` (2 preceding siblings ...)
  2015-06-17 10:35  9% ` Neil Horman
@ 2015-06-18 16:55  8% ` O'Driscoll, Tim
  2015-06-18 21:13  4%   ` Vincent JARDIN
  2015-06-19 10:26  9%   ` Neil Horman
  3 siblings, 2 replies; 200+ results
From: O'Driscoll, Tim @ 2015-06-18 16:55 UTC (permalink / raw)
  To: Thomas Monjalon, dev

> -----Original Message-----
> From: announce [mailto:announce-bounces@dpdk.org] On Behalf Of Thomas
> Monjalon
> Sent: Wednesday, June 17, 2015 12:30 AM
> To: announce@dpdk.org
> Subject: [dpdk-announce] important design choices - statistics - ABI
> 
> Hi all,
> 

> During the development of the release 2.0, there was an agreement to
> keep
> ABI compatibility or to bring new ABI while keeping old one during one
> release.
> In case it's not possible to have this transition, the (exceptional)
> break
> should be acknowledged by several developers.
> 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> There were some interesting discussions but not a lot of participants:
> 	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus
> =8461
> 
> During the current development cycle for the release 2.1, the ABI
> question
> arises many times in different threads.
> To add the hash key size field, it is proposed to use a struct padding
> gap:
> 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
> To support the flow director for VF, there is no proposal yet:
> 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
> To add the speed capability, it is proposed to break ABI in the release
> 2.2:
> 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
> To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
> To add the interrupt mode, it is proposed to add a build-time option
> CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
> binary:
> 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
> To add the packet type, there is a proposal to add a build-time option
> CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
> We must also better document how to remove a deprecated ABI:
> 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
> The ABI compatibility is a new constraint and we need to better
> understand
> what it means and how to proceed. Even the macros are not yet well
> documented:
> 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
> 
> Thanks for your attention and your participation in these important
> choices.

There's been some good discussion on the ABI policy in various responses to this email. I think we now need to reach a conclusion on how we're going to proceed for the 2.1 release. Then, we can have further discussion on the use of versioning or other methods for avoiding the problem in future.

For the 2.1 release, I think we should agree to make patches that change the ABI controllable via a compile-time option. I like Olivier's proposal on using a single option (CONFIG_RTE_NEXT_ABI) to control all of these changes instead of a separate option per patch set (see http://dpdk.org/ml/archives/dev/2015-June/019147.html), so I think we should rework the affected patch sets to use that approach for 2.1.


Tim

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17 11:17  4%   ` Bruce Richardson
@ 2015-06-18 16:32  4%     ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-18 16:32 UTC (permalink / raw)
  To: Richardson, Bruce, Matthew Hall; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, June 17, 2015 12:17 PM
> To: Matthew Hall
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices -
> statistics - ABI
> 
> On Tue, Jun 16, 2015 at 09:36:54PM -0700, Matthew Hall wrote:
> > On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > > There were some debates about software statistics disabling.
> > > Should they be always on or possibly disabled when compiled?
> > > We need to take a decision shortly and discuss (or agree) this proposal:
> > > 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> >
> > This goes against the idea I have seen before that we should be moving
> toward
> > a distro-friendly approach where one copy of DPDK can be used by multiple
> apps
> > without having to rebuild it. It seems like it is also a bit ABI hostile
> > according to the below goals / discussions.
> >
> > Jemalloc is also very high-performance code and still manages to allow
> > enabling and disabling statistics at runtime. Are we sure it's impossible for
> > DPDK or just theorizing?
> >
> 
> +1 to this. I think that any compile-time option to disable stats should only
> be used when we have a proven performance issue with just disabling them
> at runtime.
> I would assume that apps do not switch on or off stats multiple times per
> second,
> so any code branches to track stats or not would be entirely predictable in
> the
> code - since they always go one way. Therefore, when disabledi, we should
> be looking
> at a very minimal overhead per stat. If there are lots of checks for the same
> value in the one path, i.e. lots of stats in a hot path, hopefully the compiler
> will be smart enough to make the check just once. If not, we can always do
> that in the C code by duplicating the hotpath code for with or without stats
> cases -
> again selectable at runtime.
> 

I see where you're coming from, but the reality is you cannot guarantee that the few conditional branches in the library are going to be predicted correctly, simply because the application and the other libraries used by the app have an unknown number of conditional branches themselves. For complex apps, the total number of conditional branches is large, and the more there are, the lower the probability that any given branch is predicted correctly. I agree that for test apps like l3fwd, with just a few branches, the probability of library branches being predicted correctly is high, but to me that is an incorrect proof point. The cost of ~14 cycles per branch misprediction is significant for DPDK packet budgets.

Since we don't control the application, I am feeling very uncomfortable with generic statements about how the application is likely to execute and what the impact of library conditional branches on the application should be. To me, they sound like we are willing to take chances, and to me this is not the right decision. In my opinion, the right decision for a significant framework like DPDK is to keep all options open for the apps: keep counters always enabled, keep counters always disabled, keep counters enabled first and disabled later. I suggest we should move the focus from WHY arguments like "I don't think anybody will ever need this" to the HOW part of making sure that the technical solution we pick is correct and keeps all options open.

I think stats are not always equivalent to incrementing a counter. In the guideline proposal, I am providing several examples of more complex statistics logic that is more than just inc one counter. As an extreme case, think about the case where the metric to compute requires complex math like prediction of next packet arrival time based on recent history, etc. When defining a policy, we should consider a broad spectrum of metrics, not just n_pkts_in.


> Also, there is also the case where the stats tracking itself is such low
> overhead
> that its not worth disabling. In that case, neither runtime nor compile-time
> disabling
> should need to be provided. For example, any library that is keeping a track
> of
> bursts of packets should not need to have that stat disable option - one
> increment
> or addition per burst of (32) packets is not going to seriously affect any app. :-
> )
> 
> Regards,
> /Bruce

^ permalink raw reply	[relevance 4%]
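Bruce's argument above is that a runtime stats flag costs only a predictable branch per burst when the flag never changes. A minimal, self-contained sketch of that runtime-disable pattern (the struct and function names are illustrative, not DPDK code):

```c
#include <stdint.h>

struct port_stats {
	uint64_t pkts;
	uint64_t bursts;
};

/* Runtime switch: flipped at most a few times over the app's lifetime,
 * so the branch below stays perfectly predictable in the hot path. */
static int stats_enabled = 1;

/* Account one burst only when stats are on; one addition and one
 * increment per burst of packets, not per packet. */
static void stats_account(struct port_stats *s, uint16_t nb_pkts)
{
	if (stats_enabled) {
		s->pkts += nb_pkts;
		s->bursts++;
	}
}
```

Cristian's counterpoint is that more expensive metrics (e.g. arrival-time prediction) may still justify a compile-time option; this sketch only covers the cheap-counter case.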

* Re: [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
  2015-06-18 15:06  0% ` Marc Sune
@ 2015-06-18 15:33  3%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-18 15:33 UTC (permalink / raw)
  To: Marc Sune; +Cc: dev, Morten Brørup

2015-06-18 17:06, Marc Sune:
> On 18/06/15 16:43, Morten Brørup wrote:
> > Regarding the PHY speed ABI:
> >
> > 1. The Ethernet PHY ABI for speed, duplex, etc. should be common throughout the entire DPDK. It might be confusing if some structures/functions use a bitmask to indicate PHY speed/duplex/personality/etc. and other structures/functions use a combination of an unsigned integer, duplex flag, personality enumeration etc. (By personality enumeration, I am referring to PHYs with multiple electrical interfaces. E.g. a dual personality PHY might have both an RJ45 copper interface and an SFP module interface, whereof only one can be active at any time.)
> 
> Thomas was sending a similar comment and I agreed to do a unified speed 
> bitmap for both capabilities and link negotiation/link info (v3, waiting 
> for 2.2 merge window):
> 
> http://dpdk.org/ml/archives/dev/2015-June/019207.html

It would be better to try merging it in 2.1 while keeping an ABI backward
compatibility.

> > 2. The auto-negotiation standard allows the PHY to announce (to its link
> > partner) any subset of its capabilities to its link partner. E.g. a
> > standard 10/100/1000 Ethernet PHY (which can handle both 10 and 100 Mbit/s
> > in both half and full duplex and 1 Gbit/s full duplex) can be configured
> > to announce 10 Mbit/s half duplex and 100 Mbit/s full duplex capabilities
> > to its link partner. (Of course, more useful combinations are normally
> > announced, but the purpose of the example is to show that any combination
> > is possible.)
> >
> > The ABI for auto-negotiation should include options to select the list of
> > capabilities to announce to the link partner. The Linux PHY ABI only
> > allows forcing a selected speed and duplex (thereby disabling
> > auto-negotiation) or enabling auto-negotiation (thereby announcing all
> > possible speeds and duplex combinations the PHY is capable of). Don't make
> > the same mistake in DPDK.
> 
> I see what you mean, and you are probably right. In any case this is for 
> a separate patch, if we think it is a necessary feature to implement.
> 
> Nevertheless, this makes me rethink about the proposal from Thomas about 
> unifying _100_HD/_100_FD to 100M, because you will need this 
> granularity, eventually. @Thomas: opinions?

I think Morten's advice is good.
Are we going to have half duplex links for PHY faster than 100M?
If not, we can manage them with a hackish define 100M_HD and a 100M
which implicitly means full duplex.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
  2015-06-18 14:43  5% [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info Morten Brørup
@ 2015-06-18 15:06  0% ` Marc Sune
  2015-06-18 15:33  3%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Marc Sune @ 2015-06-18 15:06 UTC (permalink / raw)
  To: dev, Thomas Monjalon



On 18/06/15 16:43, Morten Brørup wrote:
> Regarding the PHY speed ABI:
>
>   
>
> 1. The Ethernet PHY ABI for speed, duplex, etc. should be common throughout the entire DPDK. It might be confusing if some structures/functions use a bitmask to indicate PHY speed/duplex/personality/etc. and other structures/functions use a combination of an unsigned integer, duplex flag, personality enumeration etc. (By personality enumeration, I am referring to PHYs with multiple electrical interfaces. E.g. a dual personality PHY might have both an RJ45 copper interface and an SFP module interface, whereof only one can be active at any time.)

Thomas was sending a similar comment and I agreed to do a unified speed 
bitmap for both capabilities and link negotiation/link info (v3, waiting 
for 2.2 merge window):

http://dpdk.org/ml/archives/dev/2015-June/019207.html

>
>   
>
> 2. The auto-negotiation standard allows the PHY to announce (to its link partner) any subset of its capabilities to its link partner. E.g. a standard 10/100/1000 Ethernet PHY (which can handle both 10 and 100 Mbit/s in both half and full duplex and 1 Gbit/s full duplex) can be configured to announce 10 Mbit/s half duplex and 100 Mbit/s full duplex capabilities to its link partner. (Of course, more useful combinations are normally announced, but the purpose of the example is to show that any combination is possible.)
>
>   
>
> The ABI for auto-negotiation should include options to select the list of capabilities to announce to the link partner. The Linux PHY ABI only allows forcing a selected speed and duplex (thereby disabling auto-negotiation) or enabling auto-negotiation (thereby announcing all possible speeds and duplex combinations the PHY is capable of). Don't make the same mistake in DPDK.

I see what you mean, and you are probably right. In any case this is for 
a separate patch, if we think it is a necessary feature to implement.

Nevertheless, this makes me rethink about the proposal from Thomas about 
unifying _100_HD/_100_FD to 100M, because you will need this 
granularity, eventually. @Thomas: opinions?

Thanks
Marc

p.s. Please configure your email client to reply using "In-Reply-To:" to 
allow clients and ML archives to use threading.

>   
>
> PS: While working for Vitesse Semiconductors (an Ethernet chip company) a long time ago, I actually wrote the API for their line of Ethernet PHYs. So I have hands on experience in this area.
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
@ 2015-06-18 14:43  5% Morten Brørup
  2015-06-18 15:06  0% ` Marc Sune
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2015-06-18 14:43 UTC (permalink / raw)
  To: dev

Regarding the PHY speed ABI:

 

1. The Ethernet PHY ABI for speed, duplex, etc. should be common throughout the entire DPDK. It might be confusing if some structures/functions use a bitmask to indicate PHY speed/duplex/personality/etc. and other structures/functions use a combination of an unsigned integer, duplex flag, personality enumeration etc. (By personality enumeration, I am referring to PHYs with multiple electrical interfaces. E.g. a dual personality PHY might have both an RJ45 copper interface and an SFP module interface, whereof only one can be active at any time.)

 

2. The auto-negotiation standard allows the PHY to announce (to its link partner) any subset of its capabilities to its link partner. E.g. a standard 10/100/1000 Ethernet PHY (which can handle both 10 and 100 Mbit/s in both half and full duplex and 1 Gbit/s full duplex) can be configured to announce 10 Mbit/s half duplex and 100 Mbit/s full duplex capabilities to its link partner. (Of course, more useful combinations are normally announced, but the purpose of the example is to show that any combination is possible.)

 

The ABI for auto-negotiation should include options to select the list of capabilities to announce to the link partner. The Linux PHY ABI only allows forcing a selected speed and duplex (thereby disabling auto-negotiation) or enabling auto-negotiation (thereby announcing all possible speeds and duplex combinations the PHY is capable of). Don't make the same mistake in DPDK.

 

PS: While working for Vitesse Semiconductors (an Ethernet chip company) a long time ago, I actually wrote the API for their line of Ethernet PHYs. So I have hands on experience in this area.

 

 

Med venlig hilsen / kind regards

 

Morten Brørup

CTO

 

 

 

SmartShare Systems A/S

Tonsbakken 16-18

DK-2740 Skovlunde

Denmark

 

Office      +45 70 20 00 93

Direct      +45 89 93 50 22

Mobile      +45 25 40 82 12

 

mb@smartsharesystems.com <mailto:mb@smartsharesystems.com> 

www.smartsharesystems.com <http://www.smartsharesystems.com/> 

 

^ permalink raw reply	[relevance 5%]
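Morten's point about announcing an arbitrary subset of capabilities maps naturally onto a bitmask ABI: the advertised set is just the intersection of what the PHY can do and what the user wants announced. A hedged sketch of that idea (bit names and values are illustrative placeholders, not the final ETH_SPEED_CAP ABI):

```c
#include <stdint.h>

/* Hypothetical speed-capability bits, keeping the half/full duplex
 * granularity that auto-negotiation announcement needs. */
#define ETH_SPEED_CAP_10M_HD   (1u << 0)
#define ETH_SPEED_CAP_10M_FD   (1u << 1)
#define ETH_SPEED_CAP_100M_HD  (1u << 2)
#define ETH_SPEED_CAP_100M_FD  (1u << 3)
#define ETH_SPEED_CAP_1G       (1u << 4)

/* Announce only a subset of the PHY's capabilities to the link partner,
 * as the auto-negotiation standard allows: the advertised set is clipped
 * to what the PHY actually supports. */
static uint32_t phy_advertise(uint32_t caps, uint32_t wanted)
{
	return caps & wanted;
}
```

With per-duplex bits, forcing a single speed/duplex and announcing odd subsets (e.g. 10M half + 100M full) fall out of the same representation, which is the flexibility the Linux PHY ABI lacks.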

* Re: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX queue information
  2015-06-18 14:17  3%       ` Ananyev, Konstantin
@ 2015-06-18 14:37  0%         ` Walukiewicz, Miroslaw
  0 siblings, 0 replies; 200+ results
From: Walukiewicz, Miroslaw @ 2015-06-18 14:37 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, June 18, 2015 4:17 PM
> To: Walukiewicz, Miroslaw; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX
> queue information
> 
> Hi Mirek,
> 
> > -----Original Message-----
> > From: Walukiewicz, Miroslaw
> > Sent: Thursday, June 18, 2015 2:31 PM
> > To: Ananyev, Konstantin; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve
> RX/TX queue information
> >
> > Konstantin,
> >
> > Is there a possibility to read information about available space in NIC queue
> for TX.
> 
> I suppose it is possible as some future addition.
> As I said in the commit message, I left some reserved space, so extra fields
> could be added to the structure without ABI breakage.
> For now, I just added some static/config information for the queue, but I think
> it is possible and plausible to have some runtime information too.
> 
> >
> > It is quite easy to compute (or even available directly)  and very useful
> especially for application sending multi-descriptor packets like
> > TCP TSO.
> >
> > Now there is no access to such information and the transmit packet function
> must be called to
> > be sure that there is available space.
> 
> Hmm, as I said I was thinking about adding some RT information in future:
> number of free descriptors (from SW point of view), index of next descriptor
> to process by SW, etc.
> But my thought was that it would be used by some watchdog thread to collect
> statistics, detect stalls, etc.
> I didn't intend it to be used by IO thread.
> 
> I am not sure why you need to call such a function at RT?
> PMD wouldn't TX a packet if there are not enough free TXDs for the whole
> packet.
> From the other side, the upper layer can't always calculate correctly how
> many TXDs it would really need (context descriptors might be needed, etc.).
> Plus, even if nb_tx_free==X, in reality there could be many more free TXDs,
> and SW just needs to process them (and would do that at the next tx_burst()
> call).
> So, why not just:
> ...
> n = tx_burst(..., nb_tx);
> if (n < nb_tx) {requeue unsent of packets;}
> ?
> 
The case is TCP TSO operation, as I said.

I do use the method you proposed, but it is a very expensive way.

The TX path in TCP is very complex and expensive.
It is desirable to create TCP TSO segments that fit the available space in the queue, to avoid
re-queueing packets and making the TX path even more complex.
For ixgbe, a TSO segment can take up to 40 descriptors (max 64K of data), so filling the TX queue is very easy.

Being able to read the number of descriptors available for transmit can improve the TCP TX process:
when there is no space in the TX queue, we can simply skip the complex TX path and spend the time
doing more useful work than re-queueing.
It is also easy to fit the TSO segment to the available space in the TX queue, instead of always sending
a large 64K TCP segment and relying on the result of tx_burst to re-queue in case of lack of space.

Mirek

> Konstantin
> 
> >
> > Mirek
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Konstantin
> Ananyev
> > > Sent: Thursday, June 18, 2015 3:19 PM
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX
> > > queue information
> > >
> > > Add the ability for the upper layer to query RX/TX queue information.
> > > Right now supported for:
> > > ixgbe, i40e, e1000 PMDs.
> > >
> > > Konstantin Ananyev (5):
> > >   ethdev: add new API to retrieve RX/TX queue information
> > >   i40e: add support for eth_(rxq|txq)_info_get
> > >   ixgbe: add support for eth_(rxq|txq)_info_get
> > >   e1000: add support for eth_(rxq|txq)_info_get
> > >   testpmd: add new command to display RX/TX queue information
> > >
> > >  app/test-pmd/cmdline.c           | 48 +++++++++++++++++++++++++
> > >  app/test-pmd/config.c            | 67
> > > ++++++++++++++++++++++++++++++++++
> > >  app/test-pmd/testpmd.h           |  2 ++
> > >  drivers/net/e1000/e1000_ethdev.h | 12 +++++++
> > >  drivers/net/e1000/em_ethdev.c    |  2 ++
> > >  drivers/net/e1000/em_rxtx.c      | 38 ++++++++++++++++++++
> > >  drivers/net/e1000/igb_ethdev.c   |  4 +++
> > >  drivers/net/e1000/igb_rxtx.c     | 36 +++++++++++++++++++
> > >  drivers/net/i40e/i40e_ethdev.c   |  2 ++
> > >  drivers/net/i40e/i40e_ethdev.h   |  5 +++
> > >  drivers/net/i40e/i40e_rxtx.c     | 42 ++++++++++++++++++++++
> > >  drivers/net/ixgbe/ixgbe_ethdev.c |  4 +++
> > >  drivers/net/ixgbe/ixgbe_ethdev.h |  6 ++++
> > >  drivers/net/ixgbe/ixgbe_rxtx.c   | 42 ++++++++++++++++++++++
> > >  lib/librte_ether/rte_ethdev.c    | 54 ++++++++++++++++++++++++++++
> > >  lib/librte_ether/rte_ethdev.h    | 77
> > > +++++++++++++++++++++++++++++++++++++++-
> > >  16 files changed, 440 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 1.8.5.3

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX queue information
  @ 2015-06-18 14:17  3%       ` Ananyev, Konstantin
  2015-06-18 14:37  0%         ` Walukiewicz, Miroslaw
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-06-18 14:17 UTC (permalink / raw)
  To: Walukiewicz, Miroslaw, dev

Hi Mirek,

> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Thursday, June 18, 2015 2:31 PM
> To: Ananyev, Konstantin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX queue information
> 
> Konstantin,
> 
> Is there a possibility to read information about available space in the NIC queue for TX?

I suppose it is possible as some future addition.
As I said in the commit message, I left some reserved space, so extra fields could be added to the structure without ABI breakage.
For now, I just added some static/config information for the queue, but I think it is possible and plausible to have some runtime information too.

> 
> It is quite easy to compute (or even directly available) and very useful, especially for applications sending multi-descriptor packets like
> TCP TSO.
> 
> Currently there is no access to such information, and the transmit packet function must be called to
> be sure that there is available space.

Hmm, as I said, I was thinking about adding some RT information in future:
number of free descriptors (from the SW point of view), index of the next descriptor to process by SW, etc.
But my thought was that it would be used by some watchdog thread to collect statistics/detect stalls, etc.
I didn't intend it to be used by the IO thread.

I am not sure why you need to call such a function at run-time.
The PMD won't TX a packet if there are not enough free TXDs for the whole packet.
On the other hand, the upper layer can't always calculate correctly how many TXDs it would really need
(context descriptors might be needed, etc).
Plus, even if nb_tx_free==X, in reality there could be many more free TXDs, and SW just needs
to process them (and would do that at the next tx_burst() call).
So, why not just:
...
n = tx_burst(..., nb_tx);
if (n < nb_tx) {requeue unsent packets;}
?

Konstantin

> 
> Mirek
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Konstantin Ananyev
> > Sent: Thursday, June 18, 2015 3:19 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX
> > queue information
> >
> > Add the ability for the upper layer to query RX/TX queue information.
> > Right now supported for:
> > ixgbe, i40e, e1000 PMDs.
> >
> > Konstantin Ananyev (5):
> >   ethdev: add new API to retrieve RX/TX queue information
> >   i40e: add support for eth_(rxq|txq)_info_get
> >   ixgbe: add support for eth_(rxq|txq)_info_get
> >   e1000: add support for eth_(rxq|txq)_info_get
> >   testpmd: add new command to display RX/TX queue information
> >
> >  app/test-pmd/cmdline.c           | 48 +++++++++++++++++++++++++
> >  app/test-pmd/config.c            | 67
> > ++++++++++++++++++++++++++++++++++
> >  app/test-pmd/testpmd.h           |  2 ++
> >  drivers/net/e1000/e1000_ethdev.h | 12 +++++++
> >  drivers/net/e1000/em_ethdev.c    |  2 ++
> >  drivers/net/e1000/em_rxtx.c      | 38 ++++++++++++++++++++
> >  drivers/net/e1000/igb_ethdev.c   |  4 +++
> >  drivers/net/e1000/igb_rxtx.c     | 36 +++++++++++++++++++
> >  drivers/net/i40e/i40e_ethdev.c   |  2 ++
> >  drivers/net/i40e/i40e_ethdev.h   |  5 +++
> >  drivers/net/i40e/i40e_rxtx.c     | 42 ++++++++++++++++++++++
> >  drivers/net/ixgbe/ixgbe_ethdev.c |  4 +++
> >  drivers/net/ixgbe/ixgbe_ethdev.h |  6 ++++
> >  drivers/net/ixgbe/ixgbe_rxtx.c   | 42 ++++++++++++++++++++++
> >  lib/librte_ether/rte_ethdev.c    | 54 ++++++++++++++++++++++++++++
> >  lib/librte_ether/rte_ethdev.h    | 77
> > +++++++++++++++++++++++++++++++++++++++-
> >  16 files changed, 440 insertions(+), 1 deletion(-)
> >
> > --
> > 1.8.5.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCHv2 0/5] ethdev: add new API to retrieve RX/TX queue information
    2015-06-18 13:18  3%     ` [dpdk-dev] [PATCHv2 1/5] " Konstantin Ananyev
  @ 2015-06-18 13:58  3%     ` Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-18 13:58 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

On Thu, Jun 18, 2015 at 02:18:43PM +0100, Konstantin Ananyev wrote:
> Add the ability for the upper layer to query RX/TX queue information.
> Right now supported for:
> ixgbe, i40e, e1000 PMDs.
> 
> Konstantin Ananyev (5):
>   ethdev: add new API to retrieve RX/TX queue information
>   i40e: add support for eth_(rxq|txq)_info_get
>   ixgbe: add support for eth_(rxq|txq)_info_get
>   e1000: add support for eth_(rxq|txq)_info_get
>   testpmd: add new command to display RX/TX queue information
> 
>  app/test-pmd/cmdline.c           | 48 +++++++++++++++++++++++++
>  app/test-pmd/config.c            | 67 ++++++++++++++++++++++++++++++++++
>  app/test-pmd/testpmd.h           |  2 ++
>  drivers/net/e1000/e1000_ethdev.h | 12 +++++++
>  drivers/net/e1000/em_ethdev.c    |  2 ++
>  drivers/net/e1000/em_rxtx.c      | 38 ++++++++++++++++++++
>  drivers/net/e1000/igb_ethdev.c   |  4 +++
>  drivers/net/e1000/igb_rxtx.c     | 36 +++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev.c   |  2 ++
>  drivers/net/i40e/i40e_ethdev.h   |  5 +++
>  drivers/net/i40e/i40e_rxtx.c     | 42 ++++++++++++++++++++++
>  drivers/net/ixgbe/ixgbe_ethdev.c |  4 +++
>  drivers/net/ixgbe/ixgbe_ethdev.h |  6 ++++
>  drivers/net/ixgbe/ixgbe_rxtx.c   | 42 ++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev.c    | 54 ++++++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev.h    | 77 +++++++++++++++++++++++++++++++++++++++-
>  16 files changed, 440 insertions(+), 1 deletion(-)
> 
> -- 
> 1.8.5.3
> 
Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>

BTW: I'm sure there are plenty of possible suggestions for extensions to these 
functions, but rather than constantly doing new versions to keep adding things
in, can we get the base functionality applied and add in the new info later -
as separate patches? There is space in the structs for more info without 
affecting the ABI.

Regards,
/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 2/9] lib_vhost: Support multiple queues in virtio dev
  @ 2015-06-18 13:34  3%       ` Flavio Leitner
  2015-06-19  1:17  3%         ` Ouyang, Changchun
  0 siblings, 1 reply; 200+ results
From: Flavio Leitner @ 2015-06-18 13:34 UTC (permalink / raw)
  To: Ouyang Changchun; +Cc: dev

On Mon, Jun 15, 2015 at 03:56:39PM +0800, Ouyang Changchun wrote:
> Each virtio device could have multiple queues, say 2 or 4, at most 8.
> Enabling this feature allows virtio device/port on guest has the ability to
> use different vCPU to receive/transmit packets from/to each queue.
> 
> In multiple queues mode, virtio device readiness means all queues of
> this virtio device are ready, cleanup/destroy a virtio device also
> requires clearing all queues belong to it.
> 
> Changes in v3:
>   - fix coding style
>   - check virtqueue idx validity
> 
> Changes in v2:
>   - remove the q_num_set api
>   - add the qp_num_get api
>   - determine the queue pair num from qemu message
>   - rework for reset owner message handler
>   - dynamically alloc mem for dev virtqueue
>   - queue pair num could be 0x8000
>   - fix checkpatch errors
> 
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> ---
>  lib/librte_vhost/rte_virtio_net.h             |  10 +-
>  lib/librte_vhost/vhost-net.h                  |   1 +
>  lib/librte_vhost/vhost_rxtx.c                 |  49 +++++---
>  lib/librte_vhost/vhost_user/vhost-net-user.c  |   4 +-
>  lib/librte_vhost/vhost_user/virtio-net-user.c |  76 +++++++++---
>  lib/librte_vhost/vhost_user/virtio-net-user.h |   2 +
>  lib/librte_vhost/virtio-net.c                 | 161 +++++++++++++++++---------
>  7 files changed, 216 insertions(+), 87 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
> index 5d38185..873be3e 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -59,7 +59,6 @@ struct rte_mbuf;
>  /* Backend value set by guest. */
>  #define VIRTIO_DEV_STOPPED -1
>  
> -
>  /* Enum for virtqueue management. */
>  enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>  
> @@ -96,13 +95,14 @@ struct vhost_virtqueue {
>   * Device structure contains all configuration information relating to the device.
>   */
>  struct virtio_net {
> -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
>  	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
> +	struct vhost_virtqueue	**virtqueue;    /**< Contains all virtqueue information. */
>  	uint64_t		features;	/**< Negotiated feature set. */
>  	uint64_t		device_fh;	/**< device identifier. */
>  	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>  	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
> +	uint32_t		num_virt_queues;
>  	void			*priv;		/**< private context */
>  } __rte_cache_aligned;


As already pointed out, this breaks ABI.
Do you have a plan for that or are you pushing this for dpdk 2.2?


> @@ -220,4 +220,10 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
>  uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
>  
> +/**
> + * This function gets the queue pair number of one vhost device.
> + * @return
> + *  num of queue pair of specified virtio device.
> + */
> +uint16_t rte_vhost_qp_num_get(struct virtio_net *dev);

This needs to go to rte_vhost_version.map too.

fbl

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  4:36  9% ` Matthew Hall
  2015-06-17  5:28  8%   ` Stephen Hemminger
  2015-06-17 11:17  4%   ` Bruce Richardson
@ 2015-06-18 13:25  8%   ` Dumitrescu, Cristian
  2 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-18 13:25 UTC (permalink / raw)
  To: Matthew Hall, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matthew Hall
> Sent: Wednesday, June 17, 2015 5:37 AM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices -
> statistics - ABI
> 
> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> 
> This goes against the idea I have seen before that we should be moving
> toward
> a distro-friendly approach where one copy of DPDK can be used by multiple
> apps
> without having to rebuild it. It seems like it is also a bit ABI hostile
> according to the below goals / discussions.

Matthew, thanks for your input. As Thomas also mentioned, it would be good to first read the guideline proposal (http://dpdk.org/ml/archives/dev/2015-June/019461.html).

In the guideline proposal we address how to prevent ABI changes due to library statistics support; it would be good to get your opinion on that, as I am not sure you saw it. Given DPDK's paramount focus on performance, I think it probably provides the best solution for ABI compatibility in this case.

> 
> Jemalloc is also very high-performance code and still manages to allow
> enabling and disabling statistics at runtime. Are we sure it's impossible for
> DPDK or just theorizing?

An approach that performs well enough in one environment might still be too expensive for the DPDK run-time environment.

Stats cannot be enabled/disabled at run-time without a performance impact, as it typically requires testing a persistent flag before incrementing the stats counters, which means a conditional branch instruction. Sometimes the branches are predicted correctly, sometimes not, in which case the CPU pipeline needs to be flushed; a typical branch misprediction costs ~14 cycles, which is significant given a packet budget of ~200 cycles.

When only a small number of branches are present in the (application + library) code, they are predicted correctly, so I am sure that for a simple test application like l3fwd the cost of run-time enabled stats is not visible. The problem is that real applications are much more complex and typically have a lot of conditional branches, of which the library stats test is just one, so they cannot all be predicted correctly; hence the cost of run-time stats enablement becomes very visible.

> 
> > During the development of the release 2.0, there was an agreement to
> keep
> > ABI compatibility or to bring new ABI while keeping old one during one
> release.
> > In case it's not possible to have this transition, the (exceptional) break
> > should be acknowledged by several developers.
> 
> Personally to me it seems more important to preserve the ABI on patch
> releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?
> 
> > During the current development cycle for the release 2.1, the ABI question
> > arises many times in different threads.
> 
> Most but not all of these examples point to a different issue which
> sometimes
> happens in libraries... often seen as "old-style" versus "new-style" C library
> interface. For example, in old-style like libpcap there are a lot of structs,
> both opaque and non-opaque, which the caller must allocate in order to run
> libpcap.
> 
> However new-style libraries such as libcurl usually just have init functions
> which initialize all the secret structs based on some defaults and some user
> parameters and hide the actual structs from the user. If you want to adjust
> some you call an adjuster function that modifies the actual secret struct
> contents, with some enum saying what field to adjust, and the new value
> you
> want it to have.
> 
> If you want to keep a stable ABI for a non-stable library like DPDK, there's a
> good chance you must begin hiding all these weird device specific structs all
> over the DPDK from the user needing to directly allocate and modify them.
> Otherwise the ABI breaks everytime you have to add adjustments,
> extensions,
> modifications to all these obscure special features.
> 
> Matthew.

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCHv2 1/5] ethdev: add new API to retrieve RX/TX queue information
  @ 2015-06-18 13:18  3%     ` Konstantin Ananyev
    2015-06-18 13:58  3%     ` Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2015-06-18 13:18 UTC (permalink / raw)
  To: dev

Add the ability for the upper layer to query RX/TX queue information.

Add new structures:
struct rte_eth_rxq_info
struct rte_eth_txq_info

new functions:
rte_eth_rx_queue_info_get
rte_eth_tx_queue_info_get

into the rte_ethdev API.

Left extra free space in the queue info structures,
so extra fields could be added later without ABI breakage.

v2 changes:

- Add formal check for the qinfo input parameter.
- As suggested rename 'rx_qinfo/tx_qinfo' to 'rxq_info/txq_info'

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ether/rte_ethdev.c | 54 ++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h | 77 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e13fde5..7dfe72a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3629,6 +3629,60 @@ rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
 }
 
 int
+rte_eth_rx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_rxq_info *qinfo)
+{
+	struct rte_eth_dev *dev;
+
+	if (qinfo == NULL)
+		return -EINVAL;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -EINVAL;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
+		return -EINVAL;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
+
+	memset(qinfo, 0, sizeof(*qinfo));
+	dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
+	return 0;
+}
+
+int
+rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_txq_info *qinfo)
+{
+	struct rte_eth_dev *dev;
+
+	if (qinfo == NULL)
+		return -EINVAL;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -EINVAL;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_tx_queues) {
+		PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		return -EINVAL;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
+
+	memset(qinfo, 0, sizeof(*qinfo));
+	dev->dev_ops->txq_info_get(dev, queue_id, qinfo);
+	return 0;
+}
+
+int
 rte_eth_dev_set_mc_addr_list(uint8_t port_id,
 			     struct ether_addr *mc_addr_set,
 			     uint32_t nb_mc_addr)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 04c192d..5dd4c01 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -942,6 +942,30 @@ struct rte_eth_xstats {
 	uint64_t value;
 };
 
+/**
+ * Ethernet device RX queue information structure.
+ * Used to retrieve information about a configured queue.
+ */
+struct rte_eth_rxq_info {
+	struct rte_mempool *mp;     /**< mempool used by that queue. */
+	struct rte_eth_rxconf conf; /**< queue config parameters. */
+	uint8_t scattered_rx;       /**< scattered packets RX supported. */
+	uint16_t nb_desc;           /**< configured number of RXDs. */
+	uint16_t max_desc;          /**< max allowed number of RXDs. */
+	uint16_t min_desc;          /**< min allowed number of RXDs. */
+} __rte_cache_aligned;
+
+/**
+ * Ethernet device TX queue information structure.
+ * Used to retrieve information about a configured queue.
+ */
+struct rte_eth_txq_info {
+	struct rte_eth_txconf conf; /**< queue config parameters. */
+	uint16_t nb_desc;           /**< configured number of TXDs. */
+	uint16_t max_desc;          /**< max allowed number of TXDs. */
+	uint16_t min_desc;          /**< min allowed number of TXDs. */
+} __rte_cache_aligned;
+
 struct rte_eth_dev;
 
 struct rte_eth_dev_callback;
@@ -1045,6 +1069,12 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @Check DD bit of specific RX descriptor */
 
+typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
+	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
+
+typedef void (*eth_txq_info_get_t)(struct rte_eth_dev *dev,
+	uint16_t tx_queue_id, struct rte_eth_txq_info *qinfo);
+
 typedef int (*mtu_set_t)(struct rte_eth_dev *dev, uint16_t mtu);
 /**< @internal Set MTU. */
 
@@ -1389,8 +1419,13 @@ struct eth_dev_ops {
 	rss_hash_update_t rss_hash_update;
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
-	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+	eth_filter_ctrl_t              filter_ctrl;
+	/**< common filter control. */
 	eth_set_mc_addr_list_t set_mc_addr_list; /**< set list of mcast addrs */
+	eth_rxq_info_get_t rxq_info_get;
+	/**< retrieve RX queue information. */
+	eth_txq_info_get_t txq_info_get;
+	/**< retrieve TX queue information. */
 };
 
 /**
@@ -3616,6 +3651,46 @@ int rte_eth_remove_rx_callback(uint8_t port_id, uint16_t queue_id,
 int rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
 		struct rte_eth_rxtx_callback *user_cb);
 
+/**
+ * Retrieve information about given port's RX queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The RX queue on the Ethernet device for which information
+ *   will be retrieved.
+ * @param qinfo
+ *   A pointer to a structure of type *rte_eth_rxq_info* to be filled with
+ *   the information of the Ethernet device.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOTSUP: routine is not supported by the device PMD.
+ *   - -EINVAL:  The port_id or the queue_id is out of range.
+ */
+int rte_eth_rx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_rxq_info *qinfo);
+
+/**
+ * Retrieve information about given port's TX queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The TX queue on the Ethernet device for which information
+ *   will be retrieved.
+ * @param qinfo
+ *   A pointer to a structure of type *rte_eth_txq_info* to be filled with
+ *   the information of the Ethernet device.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOTSUP: routine is not supported by the device PMD.
+ *   - -EINVAL:  The port_id or the queue_id is out of range.
+ */
+int rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_txq_info *qinfo);
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.5.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  9:54  8% ` Morten Brørup
@ 2015-06-18 13:00  4%   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-18 13:00 UTC (permalink / raw)
  To: Morten Brørup, Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Morten Brørup
> Sent: Wednesday, June 17, 2015 10:54 AM
> To: Thomas Monjalon
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices -
> statistics - ABI
> 
> Dear Thomas,
> 
> I don't have time to follow the DPDK Developers mailing list, but since you call
> for feedback, I would like to share my thoughts regarding these design
> choices.
> 
> 
> Regarding the statistics discussion:
> 
> 1. The suggested solution assumes that, when statistics is disabled, the cost
> of allocating and maintaining zero-value statistics is negligible. If statistics
> counters are only available through accessor functions, this is probably true.
> 
> However, if statistics counters are directly accessible, e.g. as elements in the
> fast path data structures of a library, maintaining zero-value statistics may a
> have memory and/or performance impact.
> 

Counters are only accessible through API functions.

> Since the compile time flag
> CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT already tells the
> application if the statistics are present or not, the application should simply
> use this flag to determine if statistics are accessible or not.
> 
> 2. The suggested solution with only one single flag per library prevents
> implementing statistics with varying granularity for different purposes. E.g. a
> library could have one set of statistics counters for ordinary SNMP purposes,
> and another set of statistics counters for debugging/optimization purposes.
> 
> Multiple flags per library should be possible. A hierarchy of flags per library is
> probably not required.
> 

Morten, thank you for your input. It would be good if you could add your contribution to the guidelines documentation patch by replying to the thread that Thomas indicated: http://dpdk.org/ml/archives/dev/2015-June/019461.html.

Our initial stats patch submission had a much finer granularity of stats configuration: per object type instead of per library. However, many people on this mailing list were against that, so we are now looking at one configuration flag per library.

> 
> Regarding the PHY speed ABI:
> 
> 1. The Ethernet PHY ABI for speed, duplex, etc. should be common
> throughout the entire DPDK. It might be confusing if some
> structures/functions use a bitmask to indicate PHY
> speed/duplex/personality/etc. and other structures/functions use a
> combination of an unsigned integer, duplex flag, personality enumeration
> etc. (By personality enumeration, I am referring to PHYs with multiple
> electrical interfaces. E.g. a dual personality PHY might have both an RJ45
> copper interface and an SFP module interface, whereof only one can be
> active at any time.)
> 
> 2. The auto-negotiation standard allows the PHY to announce any subset of its
> capabilities to its link partner. E.g. a standard
> 10/100/1000 Ethernet PHY (which can handle both 10 and 100 Mbit/s in both
> half and full duplex and 1 Gbit/s full duplex) can be configured to announce
> 10 Mbit/s half duplex and 100 Mbit/s full duplex capabilities to its link partner.
> (Of course, more useful combinations are normally announced, but the
> purpose of the example is to show that any combination is possible.)
> 
> The ABI for auto-negotiation should include options to select the list of
> capabilities to announce to the link partner. The Linux PHY ABI only allows
> forcing a selected speed and duplex (thereby disabling auto-negotiation) or
> enabling auto-negotiation (thereby announcing all possible speeds and
> duplex combinations the PHY is capable of). Don't make the same mistake in
> DPDK.
> 
> PS: While working for Vitesse Semiconductors (an Ethernet chip company) a
> long time ago, I actually wrote the API for their line of Ethernet PHYs. So I
> have hands on experience in this area.
> 
> 
> Regarding the discussion about backwards/forwards compatibility in the ABI:
> 
> 1. Sometimes, ABI breakage is required. That is the cost the users pay for
> getting the benefits from upgrading to the latest and greatest version of any
> library. The current solution of requiring acknowledgement from several
> qualified developers is fine - these developers will consider the cost/benefit
> on behalf of all the DPDK users and make a qualified decision.
> 
> 2. It is my general experience that documentation is not always updated to
> reflect the fine details of the source code, and this also applies to release
> notes. For open source software, the primary point of documentation is
> usually the source code itself.
> 
> 2a. It should be clearly visible directly in the DPDK source code (including
> makefiles etc.) which ABI (i.e. functions, macros, type definitions etc.) is the
> current, the deprecated, and the future.
> 
> 2b. When a developer migrates a project using DPDK from a previous version
> of the DPDK, it should be easy for the developer to identify all DPDK ABI
> modifications and variants, e.g. by using a common indicator in the DPDK
> source code, such as LIBAPIVER, that developer can simply search for.
> 
> 3. Adding special feature flags, e.g. CONFIG_RTE_EAL_RX_INTR, to indicate a
> breakage of the ABI, should only be done if it is the intention to keep both
> the current and the new variants of the feature in the DPDK in the future.
> Otherwise, such a flag should be combined with the standard ABI version
> indication, so it is clear that this feature belongs to certain versions (i.e.
> deprecated, current or future).
> 
> 
> Med venlig hilsen / kind regards
> 
> Morten Brørup
> CTO
> 
> 
> 
> SmartShare Systems A/S
> Tonsbakken 16-18
> DK-2740 Skovlunde
> Denmark
> 
> Office      +45 70 20 00 93
> Direct      +45 89 93 50 22
> Mobile      +45 25 40 82 12
> 
> mb@smartsharesystems.com
> www.smartsharesystems.com
> -----Original Message-----
> From: announce [mailto:announce-bounces@dpdk.org] On Behalf Of
> Thomas Monjalon
> Sent: 17. juni 2015 01:30
> To: announce@dpdk.org
> Subject: [dpdk-announce] important design choices - statistics - ABI
> 
> Hi all,
> 
> Sometimes there are some important discussions about architecture or
> design which require opinions from several developers. Unfortunately, we
> cannot read every threads. Maybe that using the announce mailing list will
> help to bring more audience to these discussions.
> Please note that
> 	- the announce@ ML is moderated to keep a low traffic,
> 	- every announce email is forwarded to dev@ ML.
> In case you want to reply to this email, please use dev@dpdk.org address.
> 
> There were some debates about software statistics disabling.
> Should they be always on or possibly disabled when compiled?
> We need to take a decision shortly and discuss (or agree) this proposal:
> 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> 
> During the development of the release 2.0, there was an agreement to keep
> ABI compatibility or to bring new ABI while keeping old one during one
> release.
> In case it's not possible to have this transition, the (exceptional) break should
> be acknowledged by several developers.
> 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> There were some interesting discussions but not a lot of participants:
> 	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367
> /focus=8461
> 
> During the current development cycle for the release 2.1, the ABI question
> arises many times in different threads.
> To add the hash key size field, it is proposed to use a struct padding gap:
> 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
> To support the flow director for VF, there is no proposal yet:
> 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
> To add the speed capability, it is proposed to break ABI in the release 2.2:
> 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
> To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
> To add the interrupt mode, it is proposed to add a build-time option
> CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
> binary:
> 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
> To add the packet type, there is a proposal to add a build-time option
> CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
> We must also better document how to remove a deprecated ABI:
> 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
> The ABI compatibility is a new constraint and we need to better understand
> what it means and how to proceed. Even the macros are not yet well
> documented:
> 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
> 
> Thanks for your attention and your participation in these important choices.


* Re: [dpdk-dev] [PATCH v7 0/4] User-space Ethtool
  2015-06-18  2:04  3%   ` Stephen Hemminger
@ 2015-06-18 12:47  0%     ` Wang, Liang-min
  2015-06-23 15:19  0%       ` Wang, Liang-min
  0 siblings, 1 reply; 200+ results
From: Wang, Liang-min @ 2015-06-18 12:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

>I agree with having a more complete API, but have some nits to pick.
>Could the API be more abstract to reduce ABI issues in future?

Which API? Are you referring to the APIs above the ethdev level, or something else?
Do you mean more abstraction in the input/output data structure definitions, or something else? Could you be more specific?

>I know choosing names is hard, but as a Linux developer ethtool has a very specific meaning to me.
>This API encompasses things broader than Linux ethtool and has different semantics therefore
>not sure having something in DPDK with same name is really a good idea.
>
>It would be better to call it something else like netdev_?? Or dpnet_??

Just to clarify the naming suggestion: in this patch, the prefix “ethtool” only appears in the example and in this patch description.
Are you suggesting changing the name of example/l2fwd-ethtool, of this patch description, or both?




* Re: [dpdk-dev] [PATCH 2/6] hash: replace existing hash library with cuckoo hash implementation
  @ 2015-06-18  9:50  4%   ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-18  9:50 UTC (permalink / raw)
  To: Pablo de Lara; +Cc: dev

On Fri, Jun 05, 2015 at 03:33:20PM +0100, Pablo de Lara wrote:
> This patch replaces the existing hash library with another approach,
> using the Cuckoo Hash method to resolve collisions (open addressing),
> which pushes items from a full bucket when a new entry is added to it,
> storing the evicted entry in an alternative location found with a
> secondary hash function.
> 
> This gives the user the ability to store more entries when a bucket
> is full, in comparison with the previous implementation.
> Therefore, the unit test has been updated, as some scenarios have changed
> (such as the removal of the previous restriction).
> 
> Also note that the API has not been changed, although new fields
> have been added in the rte_hash structure.
> The main change when creating a new table is that the number of entries
> per bucket is now fixed, so its parameter is ignored (though still
> present to keep the parameters structure unchanged).
> 
> As a last note, the maximum burst size in the lookup_burst function
> has been increased to 64, to improve performance.
> 
> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

Hi Pablo,

Some review comments below.

/Bruce


> ---
>  app/test/test_hash.c       |  86 +----
>  lib/librte_hash/rte_hash.c | 797 ++++++++++++++++++++++++++++++++++-----------
>  lib/librte_hash/rte_hash.h | 157 +++++----
>  3 files changed, 721 insertions(+), 319 deletions(-)
> 
> diff --git a/app/test/test_hash.c b/app/test/test_hash.c
> index 1da27c5..4ef99ee 100644
> --- a/app/test/test_hash.c
> +++ b/app/test/test_hash.c
> @@ -169,7 +169,6 @@ static struct flow_key keys[5] = { {
>  /* Parameters used for hash table in unit test functions. Name set later. */
>  static struct rte_hash_parameters ut_params = {
>  	.entries = 64,
> -	.bucket_entries = 4,
>  	.key_len = sizeof(struct flow_key), /* 13 */
>  	.hash_func = rte_jhash,
>  	.hash_func_init_val = 0,
> @@ -527,21 +526,18 @@ static int test_five_keys(void)
>  /*
>   * Add keys to the same bucket until bucket full.
>   *	- add 5 keys to the same bucket (hash created with 4 keys per bucket):
> - *	  first 4 successful, 5th unsuccessful
> - *	- lookup the 5 keys: 4 hits, 1 miss
> - *	- add the 5 keys again: 4 OK, one error as bucket is full
> - *	- lookup the 5 keys: 4 hits (updated data), 1 miss
> - *	- delete the 5 keys: 5 OK (even if the 5th is not in the table)
> + *	  first 4 successful, 5th successful, pushing existing item in bucket
> + *	- lookup the 5 keys: 5 hits
> + *	- add the 5 keys again: 5 OK
> + *	- lookup the 5 keys: 5 hits (updated data)
> + *	- delete the 5 keys: 5 OK
>   *	- lookup the 5 keys: 5 misses
> - *	- add the 5th key: OK
> - *	- lookup the 5th key: hit
>   */
>  static int test_full_bucket(void)
>  {
>  	struct rte_hash_parameters params_pseudo_hash = {
>  		.name = "test4",
>  		.entries = 64,
> -		.bucket_entries = 4,
>  		.key_len = sizeof(struct flow_key), /* 13 */
>  		.hash_func = pseudo_hash,
>  		.hash_func_init_val = 0,
> @@ -555,7 +551,7 @@ static int test_full_bucket(void)
>  	handle = rte_hash_create(&params_pseudo_hash);
>  	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
>  
> -	/* Fill bucket*/
> +	/* Fill bucket */
>  	for (i = 0; i < 4; i++) {
>  		pos[i] = rte_hash_add_key(handle, &keys[i]);
>  		print_key_info("Add", &keys[i], pos[i]);
> @@ -563,47 +559,36 @@ static int test_full_bucket(void)
>  			"failed to add key (pos[%u]=%d)", i, pos[i]);
>  		expected_pos[i] = pos[i];
>  	}
> -	/* This shouldn't work because the bucket is full */
> +	/* This should work and will push one of the items in the bucket because it is full */
>  	pos[4] = rte_hash_add_key(handle, &keys[4]);
>  	print_key_info("Add", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] != -ENOSPC,
> -			"fail: added key to full bucket (pos[4]=%d)", pos[4]);
> +	RETURN_IF_ERROR(pos[4] < 0,
> +			"failed to add key (pos[4]=%d)", pos[4]);
> +	expected_pos[5] = pos[5];
>  
>  	/* Lookup */
> -	for (i = 0; i < 4; i++) {
> +	for (i = 0; i < 5; i++) {
>  		pos[i] = rte_hash_lookup(handle, &keys[i]);
>  		print_key_info("Lkp", &keys[i], pos[i]);
>  		RETURN_IF_ERROR(pos[i] != expected_pos[i],
>  			"failed to find key (pos[%u]=%d)", i, pos[i]);
>  	}
> -	pos[4] = rte_hash_lookup(handle, &keys[4]);
> -	print_key_info("Lkp", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] != -ENOENT,
> -			"fail: found non-existent key (pos[4]=%d)", pos[4]);
>  
>  	/* Add - update */
> -	for (i = 0; i < 4; i++) {
> +	for (i = 0; i < 5; i++) {
>  		pos[i] = rte_hash_add_key(handle, &keys[i]);
>  		print_key_info("Add", &keys[i], pos[i]);
>  		RETURN_IF_ERROR(pos[i] != expected_pos[i],
>  			"failed to add key (pos[%u]=%d)", i, pos[i]);
>  	}
> -	pos[4] = rte_hash_add_key(handle, &keys[4]);
> -	print_key_info("Add", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] != -ENOSPC,
> -			"fail: added key to full bucket (pos[4]=%d)", pos[4]);
>  
>  	/* Lookup */
> -	for (i = 0; i < 4; i++) {
> +	for (i = 0; i < 5; i++) {
>  		pos[i] = rte_hash_lookup(handle, &keys[i]);
>  		print_key_info("Lkp", &keys[i], pos[i]);
>  		RETURN_IF_ERROR(pos[i] != expected_pos[i],
>  			"failed to find key (pos[%u]=%d)", i, pos[i]);
>  	}
> -	pos[4] = rte_hash_lookup(handle, &keys[4]);
> -	print_key_info("Lkp", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] != -ENOENT,
> -			"fail: found non-existent key (pos[4]=%d)", pos[4]);
>  
>  	/* Delete 1 key, check other keys are still found */
>  	pos[1] = rte_hash_del_key(handle, &keys[1]);
> @@ -623,35 +608,21 @@ static int test_full_bucket(void)
>  	RETURN_IF_ERROR(pos[1] < 0, "failed to add key (pos[1]=%d)", pos[1]);
>  
>  	/* Delete */
> -	for (i = 0; i < 4; i++) {
> +	for (i = 0; i < 5; i++) {
>  		pos[i] = rte_hash_del_key(handle, &keys[i]);
>  		print_key_info("Del", &keys[i], pos[i]);
>  		RETURN_IF_ERROR(pos[i] != expected_pos[i],
>  			"failed to delete key (pos[%u]=%d)", i, pos[i]);
>  	}
> -	pos[4] = rte_hash_del_key(handle, &keys[4]);
> -	print_key_info("Del", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] != -ENOENT,
> -			"fail: deleted non-existent key (pos[4]=%d)", pos[4]);
>  
>  	/* Lookup */
> -	for (i = 0; i < 4; i++) {
> +	for (i = 0; i < 5; i++) {
>  		pos[i] = rte_hash_lookup(handle, &keys[i]);
>  		print_key_info("Lkp", &keys[i], pos[i]);
>  		RETURN_IF_ERROR(pos[i] != -ENOENT,
>  			"fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
>  	}
>  
> -	/* Add and lookup the 5th key */
> -	pos[4] = rte_hash_add_key(handle, &keys[4]);
> -	print_key_info("Add", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] < 0, "failed to add key (pos[4]=%d)", pos[4]);
> -	expected_pos[4] = pos[4];
> -	pos[4] = rte_hash_lookup(handle, &keys[4]);
> -	print_key_info("Lkp", &keys[4], pos[4]);
> -	RETURN_IF_ERROR(pos[4] != expected_pos[4],
> -			"failed to find key (pos[4]=%d)", pos[4]);
> -
>  	rte_hash_free(handle);
>  
>  	/* Cover the NULL case. */
> @@ -1017,18 +988,8 @@ static int test_hash_creation_with_bad_parameters(void)
>  	}
>  
>  	memcpy(&params, &ut_params, sizeof(params));
> -	params.name = "creation_with_bad_parameters_1";
> -	params.bucket_entries = RTE_HASH_BUCKET_ENTRIES_MAX + 1;
> -	handle = rte_hash_create(&params);
> -	if (handle != NULL) {
> -		rte_hash_free(handle);
> -		printf("Impossible creating hash sucessfully with bucket_entries in parameter exceeded\n");
> -		return -1;
> -	}
> -
> -	memcpy(&params, &ut_params, sizeof(params));
>  	params.name = "creation_with_bad_parameters_2";
> -	params.entries = params.bucket_entries - 1;
> +	params.entries = RTE_HASH_BUCKET_ENTRIES - 1;
>  	handle = rte_hash_create(&params);
>  	if (handle != NULL) {
>  		rte_hash_free(handle);
> @@ -1048,16 +1009,6 @@ static int test_hash_creation_with_bad_parameters(void)
>  
>  	memcpy(&params, &ut_params, sizeof(params));
>  	params.name = "creation_with_bad_parameters_4";
> -	params.bucket_entries = params.bucket_entries - 1;
> -	handle = rte_hash_create(&params);
> -	if (handle != NULL) {
> -		rte_hash_free(handle);
> -		printf("Impossible creating hash sucessfully if bucket_entries in parameter is not power of 2\n");
> -		return -1;
> -	}
> -
> -	memcpy(&params, &ut_params, sizeof(params));
> -	params.name = "creation_with_bad_parameters_5";
>  	params.key_len = 0;
>  	handle = rte_hash_create(&params);
>  	if (handle != NULL) {
> @@ -1067,7 +1018,7 @@ static int test_hash_creation_with_bad_parameters(void)
>  	}
>  
>  	memcpy(&params, &ut_params, sizeof(params));
> -	params.name = "creation_with_bad_parameters_6";
> +	params.name = "creation_with_bad_parameters_5";
>  	params.key_len = RTE_HASH_KEY_LENGTH_MAX + 1;
>  	handle = rte_hash_create(&params);
>  	if (handle != NULL) {
> @@ -1077,7 +1028,7 @@ static int test_hash_creation_with_bad_parameters(void)
>  	}
>  
>  	memcpy(&params, &ut_params, sizeof(params));
> -	params.name = "creation_with_bad_parameters_7";
> +	params.name = "creation_with_bad_parameters_6";
>  	params.socket_id = RTE_MAX_NUMA_NODES + 1;
>  	handle = rte_hash_create(&params);
>  	if (handle != NULL) {
> @@ -1158,7 +1109,6 @@ static uint8_t key[16] = {0x00, 0x01, 0x02, 0x03,
>  static struct rte_hash_parameters hash_params_ex = {
>  	.name = NULL,
>  	.entries = 64,
> -	.bucket_entries = 4,
>  	.key_len = 0,
>  	.hash_func = NULL,
>  	.hash_func_init_val = 0,
> diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
> index 9245716..cbfe17e 100644
> --- a/lib/librte_hash/rte_hash.c
> +++ b/lib/librte_hash/rte_hash.c
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -83,64 +83,12 @@ EAL_REGISTER_TAILQ(rte_hash_tailq)
>  #define DEFAULT_HASH_FUNC       rte_jhash
>  #endif
>  
> -/* Signature bucket size is a multiple of this value */
> -#define SIG_BUCKET_ALIGNMENT    16
> -
> -/* Stoered key size is a multiple of this value */
> -#define KEY_ALIGNMENT           16
> +/* Bucket size is a multiple of this value */
> +#define BUCKET_ALIGNMENT	16

Why this value? Why not just CACHE_LINE_SIZE? In fact, given that the buckets
in this implementation are now a fixed size, this is unnecessary. The buckets are
always one cache line in size, due to __rte_cache_aligned on the definition.

>  
>  /* The high bit is always set in real signatures */
>  #define NULL_SIGNATURE          0
>  
> -/* Returns a pointer to the first signature in specified bucket. */
> -static inline hash_sig_t *
> -get_sig_tbl_bucket(const struct rte_hash *h, uint32_t bucket_index)
> -{
> -	return (hash_sig_t *)
> -			&(h->sig_tbl[bucket_index * h->sig_tbl_bucket_size]);
> -}
> -
> -/* Returns a pointer to the first key in specified bucket. */
> -static inline uint8_t *
> -get_key_tbl_bucket(const struct rte_hash *h, uint32_t bucket_index)
> -{
> -	return (uint8_t *) &(h->key_tbl[bucket_index * h->bucket_entries *
> -				     h->key_tbl_key_size]);
> -}
> -
> -/* Returns a pointer to a key at a specific position in a specified bucket. */
> -static inline void *
> -get_key_from_bucket(const struct rte_hash *h, uint8_t *bkt, uint32_t pos)
> -{
> -	return (void *) &bkt[pos * h->key_tbl_key_size];
> -}
> -
> -/* Does integer division with rounding-up of result. */
> -static inline uint32_t
> -div_roundup(uint32_t numerator, uint32_t denominator)
> -{
> -	return (numerator + denominator - 1) / denominator;
> -}
> -
> -/* Increases a size (if needed) to a multiple of alignment. */
> -static inline uint32_t
> -align_size(uint32_t val, uint32_t alignment)
> -{
> -	return alignment * div_roundup(val, alignment);
> -}
> -
> -/* Returns the index into the bucket of the first occurrence of a signature. */
> -static inline int
> -find_first(uint32_t sig, const uint32_t *sig_bucket, uint32_t num_sigs)
> -{
> -	uint32_t i;
> -	for (i = 0; i < num_sigs; i++) {
> -		if (sig == sig_bucket[i])
> -			return i;
> -	}
> -	return -1;
> -}
> -
>  struct rte_hash *
>  rte_hash_find_existing(const char *name)
>  {
> @@ -165,25 +113,49 @@ rte_hash_find_existing(const char *name)
>  	return h;
>  }
>  
> +/* Does integer division with rounding-up of result. */
> +static inline uint32_t
> +div_roundup(uint32_t numerator, uint32_t denominator)
> +{
> +	return (numerator + denominator - 1) / denominator;
> +}
> +
> +/* Increases a size (if needed) to a multiple of alignment. */
> +static inline uint32_t
> +align_size(uint32_t val, uint32_t alignment)
> +{
> +	return alignment * div_roundup(val, alignment);
> +}

There are already inline functions/macros for alignment in rte_common.h, that
should be used if possible.

> +
>  struct rte_hash *
>  rte_hash_create(const struct rte_hash_parameters *params)
>  {
>  	struct rte_hash *h = NULL;
>  	struct rte_tailq_entry *te;
> -	uint32_t num_buckets, sig_bucket_size, key_size,
> -		hash_tbl_size, sig_tbl_size, key_tbl_size, mem_size;
> +	uint32_t num_buckets, bucket_size,
> +		tbl_size, mem_size, hash_struct_size;
>  	char hash_name[RTE_HASH_NAMESIZE];
>  	struct rte_hash_list *hash_list;
> +	void *k = NULL;
> +	struct rte_ring *r = NULL;
> +	char ring_name[RTE_RING_NAMESIZE];
> +	unsigned key_entry_size;
> +	const unsigned anyalignment = 0;
> +	unsigned i;
> +	void *ptr;
>  
>  	hash_list = RTE_TAILQ_CAST(rte_hash_tailq.head, rte_hash_list);
>  
> +	if (params == NULL) {
> +		RTE_LOG(ERR, HASH, "rte_hash_create has no parameters\n");
> +		return NULL;
> +	}
> +
>  	/* Check for valid parameters */
> -	if ((params == NULL) ||
> -			(params->entries > RTE_HASH_ENTRIES_MAX) ||
> -			(params->bucket_entries > RTE_HASH_BUCKET_ENTRIES_MAX) ||
> -			(params->entries < params->bucket_entries) ||
> +	if ((params->entries > RTE_HASH_ENTRIES_MAX) ||
> +			(params->entries < RTE_HASH_BUCKET_ENTRIES) ||
>  			!rte_is_power_of_2(params->entries) ||
> -			!rte_is_power_of_2(params->bucket_entries) ||
> +			!rte_is_power_of_2(RTE_HASH_BUCKET_ENTRIES) ||
>  			(params->key_len == 0) ||
>  			(params->key_len > RTE_HASH_KEY_LENGTH_MAX)) {
>  		rte_errno = EINVAL;
> @@ -194,19 +166,18 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
>  
>  	/* Calculate hash dimensions */
> -	num_buckets = params->entries / params->bucket_entries;
> -	sig_bucket_size = align_size(params->bucket_entries *
> -				     sizeof(hash_sig_t), SIG_BUCKET_ALIGNMENT);
> -	key_size =  align_size(params->key_len, KEY_ALIGNMENT);
> +	num_buckets = params->entries / RTE_HASH_BUCKET_ENTRIES;
> +
> +	bucket_size = align_size(sizeof(struct rte_hash_bucket), BUCKET_ALIGNMENT);

unnecessary, due to cache line alignment of the struct.
>  
> -	hash_tbl_size = align_size(sizeof(struct rte_hash), RTE_CACHE_LINE_SIZE);
> -	sig_tbl_size = align_size(num_buckets * sig_bucket_size,
> +	hash_struct_size = align_size(sizeof(struct rte_hash), RTE_CACHE_LINE_SIZE);

I don't think this is necessary. Better to make the structure cache line aligned, and
then just use sizeof without any additional calculations on its size.

> +	tbl_size = align_size(num_buckets * bucket_size,
>  				  RTE_CACHE_LINE_SIZE);

Again, unnecessary alignment calculation.

> -	key_tbl_size = align_size(num_buckets * key_size *
> -				  params->bucket_entries, RTE_CACHE_LINE_SIZE);
>  
>  	/* Total memory required for hash context */
> -	mem_size = hash_tbl_size + sig_tbl_size + key_tbl_size;
> +	mem_size = hash_struct_size + tbl_size;
> +
> +	key_entry_size = params->key_len;

Is this not a case where we do want some sort of alignment on our keys, just to
reduce the chances of a key crossing a cache line boundary?

>  
>  	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
>  
> @@ -216,6 +187,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		if (strncmp(params->name, h->name, RTE_HASH_NAMESIZE) == 0)
>  			break;
>  	}
> +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
>  	if (te != NULL)
>  		goto exit;
>  
> @@ -226,36 +198,73 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	}
>  
>  	h = (struct rte_hash *)rte_zmalloc_socket(hash_name, mem_size,
> -					   RTE_CACHE_LINE_SIZE, params->socket_id);
> +					RTE_CACHE_LINE_SIZE, params->socket_id);
> +
>  	if (h == NULL) {
>  		RTE_LOG(ERR, HASH, "memory allocation failed\n");
>  		rte_free(te);
>  		goto exit;
>  	}
>  
> +	k = rte_zmalloc_socket(NULL, key_entry_size * (params->entries + 1),
> +			anyalignment, params->socket_id);
> +
> +	if (k == NULL) {
> +		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> +		h = NULL;
> +		rte_free(te);
> +		rte_free(h);

Rather than building up an ever-growing list of memory to be freed on each
exit branch, define a separate error label and always free all the memory pointers
allocated during the run. The rte_free() API has no effect on NULL pointers, so
it is safe to do so (assuming you NULL-initialize all vars first).

> +		goto exit;
> +	}
> +
> +	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
> +	r = rte_ring_lookup(ring_name);
> +	if (r != NULL) {
> +		/* clear the free ring */
> +		while (rte_ring_dequeue(r, &ptr) == 0)
> +			rte_pause();
> +	} else
> +		r = rte_ring_create(ring_name, rte_align32pow2(params->entries),
> +				params->socket_id, 0);
> +	if (r == NULL) {
> +		RTE_LOG(ERR, HASH, "memory allocation failed\n");
> +		rte_free(te);
> +		rte_free(h);
> +		h = NULL;
> +		rte_free(k);
> +		goto exit;
> +	}
> +
>  	/* Setup hash context */
>  	snprintf(h->name, sizeof(h->name), "%s", params->name);
>  	h->entries = params->entries;
> -	h->bucket_entries = params->bucket_entries;
> +	h->bucket_entries = RTE_HASH_BUCKET_ENTRIES;
>  	h->key_len = params->key_len;
> +	h->key_entry_size = key_entry_size;
>  	h->hash_func_init_val = params->hash_func_init_val;
> +	h->socket_id = params->socket_id;
> +
>  	h->num_buckets = num_buckets;
>  	h->bucket_bitmask = h->num_buckets - 1;
> -	h->sig_msb = 1 << (sizeof(hash_sig_t) * 8 - 1);
> -	h->sig_tbl = (uint8_t *)h + hash_tbl_size;
> -	h->sig_tbl_bucket_size = sig_bucket_size;
> -	h->key_tbl = h->sig_tbl + sig_tbl_size;
> -	h->key_tbl_key_size = key_size;
> +	h->buckets = (struct rte_hash_bucket *)((uint8_t *)h + hash_struct_size);

You can avoid the addition and typecasting by adding
	struct rte_hash_bucket buckets[0] __rte_cache_aligned;
as the last field in your rte_hash structure.

>  	h->hash_func = (params->hash_func == NULL) ?
>  		DEFAULT_HASH_FUNC : params->hash_func;
>  
> +	h->sig_msb = 1ULL << (sizeof(uint64_t) * 8 - 1);

8 == CHAR_BIT

> +	h->sig_secondary = 1ULL << (sizeof(uint64_t) * 8 - 2);
> +

h->sig_secondary = h->sig_msb >> 1; // ??

> +	h->key_store = k;
> +	h->free_slots = r;
>  	te->data = (void *) h;
>  
> -	TAILQ_INSERT_TAIL(hash_list, te, next);
> +	/* populate the free slots ring. Entry zero is reserved for key misses */
> +	for (i = 1; i < params->entries + 1; i++)
> +			rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
>  
> -exit:
> +	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +	TAILQ_INSERT_TAIL(hash_list, te, next);
>  	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> -
> +exit:
>  	return h;
>  }
>  
> @@ -287,49 +296,164 @@ rte_hash_free(struct rte_hash *h)
>  
>  	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
>  
> +	rte_free(h->key_store);
>  	rte_free(h);
>  	rte_free(te);
>  }
>  
>  static inline int32_t
> -__rte_hash_add_key_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig)
> +run_cuckoo(const struct rte_hash *h, struct rte_hash_bucket *bkt, uint32_t key_idx,
> +		uint64_t hash, uint64_t original_hash, const void *original_key)
>  {
> -	hash_sig_t *sig_bucket;
> -	uint8_t *key_bucket;
> -	uint32_t bucket_index, i;
> -	int32_t pos;
> -
> -	/* Get the hash signature and bucket index */
> -	sig |= h->sig_msb;
> -	bucket_index = sig & h->bucket_bitmask;
> -	sig_bucket = get_sig_tbl_bucket(h, bucket_index);
> -	key_bucket = get_key_tbl_bucket(h, bucket_index);
> -
> -	/* Check if key is already present in the hash */
> -	for (i = 0; i < h->bucket_entries; i++) {
> -		if ((sig == sig_bucket[i]) &&
> -		    likely(memcmp(key, get_key_from_bucket(h, key_bucket, i),
> -				  h->key_len) == 0)) {
> -			return bucket_index * h->bucket_entries + i;
> +	/* idx = 0 if primary, 1 if secondary */
> +	unsigned idx;
> +	static unsigned number_pushes;
> +	void *k, *keys = h->key_store;
> +	unsigned i, j;
> +
> +	uint64_t hash_stored;
> +	uint32_t key_idx_stored;
> +	uint32_t bucket_stored_idx;
> +	struct rte_hash_bucket *bkt_stored;
> +
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		/* Check if slot is available */
> +		if (likely(bkt->signatures[i] == NULL_SIGNATURE)) {
> +			bkt->signatures[i] = hash;
> +			bkt->key_idx[i] = key_idx;
> +			number_pushes = 0;
> +			return bkt->key_idx[i];
> +		}
> +	}
> +
> +	/*
> +	 * If number of pushes has exceeded a certain limit, it
> +	 * is very likely that it has entered in a loop, need rehasing
> +	 */
> +	if (++number_pushes > 1 && hash == original_hash) {
> +		k = (char *)keys + key_idx * h->key_entry_size;
> +		if (!memcmp(k, original_key, h->key_len)) {
> +			rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t)key_idx));
> +			number_pushes = 0;
> +			/*
> +			 * Indicates to the user that key could not be added,
> +			 * so he can rehash and add it again or decide not to.
> +			 */
> +			return -EAGAIN;
>  		}
>  	}
>  
> -	/* Check if any free slot within the bucket to add the new key */
> -	pos = find_first(NULL_SIGNATURE, sig_bucket, h->bucket_entries);
> +	/*
> +	 * Push existing item (search for bucket with space in alternative locations)
> +	 * to its alternative location
> +	 */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		/*
> +		 * Check if item was stored in its primary/secondary location,
> +		 * to get the hash in the alternative location
> +		 */
> +		idx = !(bkt->signatures[i] & (h->sig_secondary));
> +		key_idx_stored = bkt->key_idx[i];
> +		k = (char *)keys + key_idx_stored * h->key_entry_size;
> +
> +		if (idx == 0)
> +			hash_stored = rte_hash_hash(h, k);
> +		else
> +			hash_stored = rte_hash_secondary_hash(bkt->signatures[i]);
> +
> +		bucket_stored_idx = hash_stored & h->bucket_bitmask;
> +		bkt_stored = &h->buckets[bucket_stored_idx];
> +		hash_stored |= h->sig_msb;
> +
> +		if (idx != 0)
> +			hash_stored |= h->sig_secondary;
> +
> +		for (j = 0; j < RTE_HASH_BUCKET_ENTRIES; j++) {
> +			if (bkt_stored->signatures[j] == NULL_SIGNATURE)
> +				break;
> +		}
> +
> +		if (j != RTE_HASH_BUCKET_ENTRIES)
> +			break;
> +	}
> +
> +	/* Push existing item (if all alternative buckets are full, pick the last one) */
> +	if (i == RTE_HASH_BUCKET_ENTRIES)
> +		i -= 1;
> +
> +	bkt->signatures[i] = hash;
> +	bkt->key_idx[i] = key_idx;
> +
> +	/* There is an empty slot in the alternative bucket */
> +	if (j != RTE_HASH_BUCKET_ENTRIES) {
> +		bkt_stored->signatures[j] = hash_stored;
> +		bkt_stored->key_idx[j] = key_idx_stored;
> +
> +		number_pushes = 0;
> +		return bkt_stored->key_idx[j];
> +	} else
> +		return run_cuckoo(h, bkt_stored, key_idx_stored, hash_stored,
> +				original_hash, original_key);
> +}
> +
> +static inline int32_t
> +__rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> +						hash_sig_t sig)
> +{
> +	uint64_t hash0, bucket_idx0, hash1, bucket_idx1;
> +	unsigned i;
> +	struct rte_hash_bucket *bkt0, *bkt1;
> +	void *k, *keys = h->key_store;
> +	void *slot_id;
> +
> +	hash0 = sig;
> +	bucket_idx0 = hash0 & h->bucket_bitmask;
> +	hash0 |= h->sig_msb;
> +
> +	bkt0 = &h->buckets[bucket_idx0];
>  
> -	if (unlikely(pos < 0))
> +	hash1 = rte_hash_secondary_hash(hash0);

Should this not be done before you go setting the msb on hash0?

> +
> +	bucket_idx1 = hash1 & h->bucket_bitmask;
> +	hash1 |= h->sig_msb;
> +	/* Secondary location, add an extra 1 in the second MSB */
> +	hash1 |= h->sig_secondary;
> +
> +	bkt1 = &h->buckets[bucket_idx1];
> +
> +	/* Check if key is already inserted in primary location */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (bkt0->signatures[i] == hash0) {
> +			k = (char *)keys + bkt0->key_idx[i] * h->key_entry_size;
> +			if (memcmp(key, k, h->key_len) == 0)
> +				return bkt0->key_idx[i];
> +		}
> +	}
> +
> +	/* Check if key is already inserted in secondary location */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (bkt1->signatures[i] == hash1) {
> +			k = (char *)keys + bkt1->key_idx[i] * h->key_entry_size;
> +			if (memcmp(key, k, h->key_len) == 0)
> +				return bkt1->key_idx[i];
> +		}
> +	}
> +
> +	k = (char *)keys + bkt0->key_idx[i] * h->key_entry_size;

Useless statement. You assign to k again just 3 lines further down.

> +	if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0)
>  		return -ENOSPC;
>  
> -	/* Add the new key to the bucket */
> -	sig_bucket[pos] = sig;
> -	rte_memcpy(get_key_from_bucket(h, key_bucket, pos), key, h->key_len);
> -	return bucket_index * h->bucket_entries + pos;
> +	/* Copy key*/
> +	k = (char *)keys + (uintptr_t)slot_id * h->key_entry_size;
> +	memcpy(k, key, h->key_len);
> +
> +	/* Run cuckoo algorithm */
> +	return run_cuckoo(h, bkt0, (uint32_t)((uintptr_t) slot_id), hash0, hash0, key);

Is this not meant to be hash0, hash1 for parameters 4 & 5?

>  }
>  
>  int32_t
>  rte_hash_add_key_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig)
> +			const void *key, hash_sig_t sig)
>  {
>  	RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
>  	return __rte_hash_add_key_with_hash(h, key, sig);
> @@ -343,26 +467,45 @@ rte_hash_add_key(const struct rte_hash *h, const void *key)
>  }
>  
>  static inline int32_t
> -__rte_hash_del_key_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig)
> +__rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> +					hash_sig_t sig)
>  {
> -	hash_sig_t *sig_bucket;
> -	uint8_t *key_bucket;
> -	uint32_t bucket_index, i;
> -
> -	/* Get the hash signature and bucket index */
> -	sig = sig | h->sig_msb;
> -	bucket_index = sig & h->bucket_bitmask;
> -	sig_bucket = get_sig_tbl_bucket(h, bucket_index);
> -	key_bucket = get_key_tbl_bucket(h, bucket_index);
> -
> -	/* Check if key is already present in the hash */
> -	for (i = 0; i < h->bucket_entries; i++) {
> -		if ((sig == sig_bucket[i]) &&
> -		    likely(memcmp(key, get_key_from_bucket(h, key_bucket, i),
> -				  h->key_len) == 0)) {
> -			sig_bucket[i] = NULL_SIGNATURE;
> -			return bucket_index * h->bucket_entries + i;
> +	uint64_t hash, bucket_idx;
> +	unsigned i;
> +	struct rte_hash_bucket *bkt;
> +	void *k, *keys = h->key_store;
> +
> +	hash = sig;
> +	bucket_idx = hash & h->bucket_bitmask;
> +	hash |= h->sig_msb;
> +
> +	bkt = &h->buckets[bucket_idx];
> +
> +	/* Check if key is in primary location */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (bkt->signatures[i] == hash) {
> +			k = (char *)keys + bkt->key_idx[i] * h->key_entry_size;
> +			if (memcmp(key, k, h->key_len) == 0)
> +				return bkt->key_idx[i];
> +		}
> +	}
> +
> +	/* Calculate secondary hash */
> +	hash = rte_hash_secondary_hash(hash);
> +
> +	bucket_idx = hash & h->bucket_bitmask;
> +	hash |= h->sig_msb;
> +	/* Secondary location, add an extra 1 in the second MSB */
> +	hash |= h->sig_secondary;
> +
> +	bkt = &h->buckets[bucket_idx];
> +
> +	/* Check if key is in secondary location */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (bkt->signatures[i] == hash) {
> +			k = (char *)keys + bkt->key_idx[i] * h->key_entry_size;
> +			if (memcmp(key, k, h->key_len) == 0)
> +				return bkt->key_idx[i];
>  		}
>  	}
>  
> @@ -370,40 +513,66 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h,
>  }
>  
>  int32_t
> -rte_hash_del_key_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig)
> +rte_hash_lookup_with_hash(const struct rte_hash *h,
> +			const void *key, hash_sig_t sig)
>  {
>  	RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> -	return __rte_hash_del_key_with_hash(h, key, sig);
> +	return __rte_hash_lookup_with_hash(h, key, sig);
>  }
>  
>  int32_t
> -rte_hash_del_key(const struct rte_hash *h, const void *key)
> +rte_hash_lookup(const struct rte_hash *h, const void *key)
>  {
>  	RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> -	return __rte_hash_del_key_with_hash(h, key, rte_hash_hash(h, key));
> +	return __rte_hash_lookup_with_hash(h, key, rte_hash_hash(h, key));
>  }
>  
>  static inline int32_t
> -__rte_hash_lookup_with_hash(const struct rte_hash *h,
> -			const void *key, hash_sig_t sig)
> +__rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
> +						hash_sig_t sig)
>  {
> -	hash_sig_t *sig_bucket;
> -	uint8_t *key_bucket;
> -	uint32_t bucket_index, i;
> -
> -	/* Get the hash signature and bucket index */
> -	sig |= h->sig_msb;
> -	bucket_index = sig & h->bucket_bitmask;
> -	sig_bucket = get_sig_tbl_bucket(h, bucket_index);
> -	key_bucket = get_key_tbl_bucket(h, bucket_index);
> -
> -	/* Check if key is already present in the hash */
> -	for (i = 0; i < h->bucket_entries; i++) {
> -		if ((sig == sig_bucket[i]) &&
> -		    likely(memcmp(key, get_key_from_bucket(h, key_bucket, i),
> -				  h->key_len) == 0)) {
> -			return bucket_index * h->bucket_entries + i;
> +	uint64_t hash, bucket_idx;
> +	unsigned i;
> +	struct rte_hash_bucket *bkt;
> +	void *k, *keys = h->key_store;
> +
> +	hash = sig;
> +	bucket_idx = hash & h->bucket_bitmask;
> +	hash |= h->sig_msb;
> +
> +	bkt = &h->buckets[bucket_idx];
> +
> +	/* Check if key is in primary location */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (bkt->signatures[i] == hash) {
> +			k = (char *)keys + bkt->key_idx[i] * h->key_entry_size;
> +			if (memcmp(key, k, h->key_len) == 0) {
> +				bkt->signatures[i] = NULL_SIGNATURE;
> +				rte_ring_sp_enqueue(h->free_slots,
> +						(void *)((uintptr_t)bkt->key_idx[i]));
> +				return bkt->key_idx[i];
> +			}
> +		}
> +	}
> +
> +	hash = rte_hash_secondary_hash(hash);
> +	bucket_idx = hash & h->bucket_bitmask;
> +	hash |= h->sig_msb;
> +	/* Secondary location, add an extra 1 in the second MSB */
> +	hash |= h->sig_secondary;
> +
> +	bkt = &h->buckets[bucket_idx];
> +
> +	/* Check if key is in secondary location */
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		if (bkt->signatures[i] == hash) {
> +			k = (char *)keys + bkt->key_idx[i] * h->key_entry_size;
> +			if (memcmp(key, k, h->key_len) == 0) {
> +				bkt->signatures[i] = NULL_SIGNATURE;
> +				rte_ring_sp_enqueue(h->free_slots,
> +						(void *)((uintptr_t)bkt->key_idx[i]));
> +				return bkt->key_idx[i];
> +			}
>  		}
>  	}
>  
> @@ -411,61 +580,313 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h,
>  }
>  
>  int32_t
> -rte_hash_lookup_with_hash(const struct rte_hash *h,
> +rte_hash_del_key_with_hash(const struct rte_hash *h,
>  			const void *key, hash_sig_t sig)
>  {
>  	RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> -	return __rte_hash_lookup_with_hash(h, key, sig);
> +	return __rte_hash_del_key_with_hash(h, key, sig);
>  }
>  
>  int32_t
> -rte_hash_lookup(const struct rte_hash *h, const void *key)
> +rte_hash_del_key(const struct rte_hash *h, const void *key)
>  {
>  	RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> -	return __rte_hash_lookup_with_hash(h, key, rte_hash_hash(h, key));
> +	return __rte_hash_del_key_with_hash(h, key, rte_hash_hash(h, key));
>  }
>  
> -int
> -rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> -		      uint32_t num_keys, int32_t *positions)
> +/* Lookup bulk stage 0: Calculate next primary/secondary hash value (from new key)  */
> +static inline void
> +lookup_stage0(unsigned *idx, uint64_t *lookup_mask,
> +		uint64_t *primary_hash,	uint64_t *secondary_hash,
> +		const void * const *keys, const struct rte_hash *h)
>  {
> -	uint32_t i, j, bucket_index;
> -	hash_sig_t sigs[RTE_HASH_LOOKUP_BULK_MAX];
> +	*idx = __builtin_ctzl(*lookup_mask);
> +	if (*lookup_mask == 0)
> +		*idx = 0;
>  
> -	RETURN_IF_TRUE(((h == NULL) || (keys == NULL) || (num_keys == 0) ||
> -			(num_keys > RTE_HASH_LOOKUP_BULK_MAX) ||
> -			(positions == NULL)), -EINVAL);
> +	*primary_hash = rte_hash_hash(h, keys[*idx]);
> +	*secondary_hash = rte_hash_secondary_hash(*primary_hash);
>  
> -	/* Get the hash signature and bucket index */
> -	for (i = 0; i < num_keys; i++) {
> -		sigs[i] = h->hash_func(keys[i], h->key_len,
> -				h->hash_func_init_val) | h->sig_msb;
> -		bucket_index = sigs[i] & h->bucket_bitmask;
> +	*primary_hash |= h->sig_msb;
> +
> +	*secondary_hash |= h->sig_msb;
> +	*secondary_hash |= h->sig_secondary;
> +
> +	*lookup_mask &= ~(1llu << *idx);
> +}
> +
> +
> +/* Lookup bulk stage 1: Prefetch primary/secondary buckets */
> +static inline void
> +lookup_stage1(uint64_t primary_hash, uint64_t secondary_hash,
> +		const struct rte_hash_bucket **primary_bkt,
> +		const struct rte_hash_bucket **secondary_bkt,
> +		const struct rte_hash *h)
> +{
> +	*primary_bkt = &h->buckets[primary_hash & h->bucket_bitmask];
> +	*secondary_bkt = &h->buckets[secondary_hash & h->bucket_bitmask];
> +
> +	rte_prefetch0(*primary_bkt);
> +	rte_prefetch0(*secondary_bkt);
> +}
>  
> -		/* Pre-fetch relevant buckets */
> -		rte_prefetch1((void *) get_sig_tbl_bucket(h, bucket_index));
> -		rte_prefetch1((void *) get_key_tbl_bucket(h, bucket_index));
> +/*
> + * Lookup bulk stage 2:  Search for match hashes in primary/secondary locations
> + * and prefetch first key slot
> + */
> +static inline void
> +lookup_stage2(unsigned idx, uint64_t primary_hash, uint64_t secondary_hash,
> +		const struct rte_hash_bucket *primary_bkt,
> +		const struct rte_hash_bucket *secondary_bkt,
> +		const void **key_slot,
> +		int32_t *positions,
> +		uint64_t *extra_hits_mask,
> +		const void *keys, const struct rte_hash *h)
> +{
> +	unsigned primary_hash_matches, secondary_hash_matches, key_idx, i;
> +	unsigned total_hash_matches;
> +
> +	total_hash_matches = 1 << (RTE_HASH_BUCKET_ENTRIES * 2);
> +	primary_hash_matches = 1 << RTE_HASH_BUCKET_ENTRIES;
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		primary_hash_matches |= ((primary_hash == primary_bkt->signatures[i]) << i);
> +		total_hash_matches |= ((primary_hash == primary_bkt->signatures[i]) << i);
>  	}
>  
> -	/* Check if key is already present in the hash */
> -	for (i = 0; i < num_keys; i++) {
> -		bucket_index = sigs[i] & h->bucket_bitmask;
> -		hash_sig_t *sig_bucket = get_sig_tbl_bucket(h, bucket_index);
> -		uint8_t *key_bucket = get_key_tbl_bucket(h, bucket_index);
> -
> -		positions[i] = -ENOENT;
> -
> -		for (j = 0; j < h->bucket_entries; j++) {
> -			if ((sigs[i] == sig_bucket[j]) &&
> -			    likely(memcmp(keys[i],
> -					  get_key_from_bucket(h, key_bucket, j),
> -					  h->key_len) == 0)) {
> -				positions[i] = bucket_index *
> -					h->bucket_entries + j;
> -				break;
> -			}
> -		}
> +	key_idx = primary_bkt->key_idx[__builtin_ctzl(primary_hash_matches)];
> +
> +	secondary_hash_matches = 1 << RTE_HASH_BUCKET_ENTRIES;
> +	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
> +		secondary_hash_matches |= ((secondary_hash == secondary_bkt->signatures[i]) << i);
> +		total_hash_matches |= ((secondary_hash == secondary_bkt->signatures[i])
> +						<< (i + RTE_HASH_BUCKET_ENTRIES));
> +	}
> +
> +	if (key_idx == 0)
> +		key_idx = secondary_bkt->key_idx[__builtin_ctzl(secondary_hash_matches)];
> +
> +	*key_slot = (const char *)keys + key_idx * h->key_entry_size;
> +
> +	rte_prefetch0(*key_slot);
> +	positions[idx] = key_idx;
> +
> +	*extra_hits_mask |= (uint64_t)(__builtin_popcount(primary_hash_matches) > 2) << idx;
> +	*extra_hits_mask |= (uint64_t)(__builtin_popcount(secondary_hash_matches) > 2) << idx;
> +	*extra_hits_mask |= (uint64_t)(__builtin_popcount(total_hash_matches) > 2) << idx;
> +
> +}
> +
> +
> +/* Lookup bulk stage 3: Check if key matches, update hit mask */
> +static inline void
> +lookup_stage3(unsigned idx, const void *key_slot,
> +		const void * const *keys, int32_t *positions,
> +		uint64_t *hits, const struct rte_hash *h)
> +{
> +	unsigned hit;
> +
> +	hit = !memcmp(key_slot, keys[idx], h->key_len);
> +	if (unlikely(hit == 0))
> +		positions[idx] = -ENOENT;
> +	*hits = (uint64_t)(hit) << idx;
> +}
> +
> +static inline int
> +__rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> +		      uint32_t num_keys, int32_t *positions) {
> +
> +	uint64_t hits = 0;
> +	uint64_t next_mask = 0;
> +	uint64_t extra_hits_mask = 0;
> +	uint64_t lookup_mask;
> +	unsigned idx;
> +	const void *key_store = h->key_store;
> +
> +	unsigned idx00, idx01, idx10, idx11, idx20, idx21, idx30, idx31;
> +	const struct rte_hash_bucket *primary_bkt10, *primary_bkt11;
> +	const struct rte_hash_bucket *secondary_bkt10, *secondary_bkt11;
> +	const struct rte_hash_bucket *primary_bkt20, *primary_bkt21;
> +	const struct rte_hash_bucket *secondary_bkt20, *secondary_bkt21;
> +	const void *k_slot20, *k_slot21, *k_slot30, *k_slot31;
> +	uint64_t primary_hash00, primary_hash01;
> +	uint64_t secondary_hash00, secondary_hash01;
> +	uint64_t primary_hash10, primary_hash11;
> +	uint64_t secondary_hash10, secondary_hash11;
> +	uint64_t primary_hash20, primary_hash21;
> +	uint64_t secondary_hash20, secondary_hash21;
> +
> +	if (num_keys == RTE_HASH_LOOKUP_BULK_MAX)
> +		lookup_mask = 0xffffffffffffffff;
> +	else
> +		lookup_mask = (1 << num_keys) - 1;
> +
> +	lookup_stage0(&idx00, &lookup_mask, &primary_hash00,
> +			&secondary_hash00, keys, h);
> +	lookup_stage0(&idx01, &lookup_mask, &primary_hash01,
> +			&secondary_hash01, keys, h);
> +
> +	primary_hash10 = primary_hash00;
> +	primary_hash11 = primary_hash01;
> +	secondary_hash10 = secondary_hash00;
> +	secondary_hash11 = secondary_hash01;
> +	idx10 = idx00, idx11 = idx01;
> +
> +	lookup_stage0(&idx00, &lookup_mask, &primary_hash00,
> +			&secondary_hash00, keys, h);
> +	lookup_stage0(&idx01, &lookup_mask, &primary_hash01,
> +			&secondary_hash01, keys, h);
> +	lookup_stage1(primary_hash10, secondary_hash10, &primary_bkt10,
> +			&secondary_bkt10, h);
> +	lookup_stage1(primary_hash11, secondary_hash11, &primary_bkt11,
> +			&secondary_bkt11, h);
> +
> +	primary_bkt20 = primary_bkt10;
> +	primary_bkt21 = primary_bkt11;
> +	secondary_bkt20 = secondary_bkt10;
> +	secondary_bkt21 = secondary_bkt11;
> +	primary_hash20 = primary_hash10;
> +	primary_hash21 = primary_hash11;
> +	secondary_hash20 = secondary_hash10;
> +	secondary_hash21 = secondary_hash11;
> +	idx20 = idx10, idx21 = idx11;
> +	primary_hash10 = primary_hash00;
> +	primary_hash11 = primary_hash01;
> +	secondary_hash10 = secondary_hash00;
> +	secondary_hash11 = secondary_hash01;
> +	idx10 = idx00, idx11 = idx01;
> +
> +	lookup_stage0(&idx00, &lookup_mask, &primary_hash00,
> +			&secondary_hash00, keys, h);
> +	lookup_stage0(&idx01, &lookup_mask, &primary_hash01,
> +			&secondary_hash01, keys, h);
> +	lookup_stage1(primary_hash10, secondary_hash10, &primary_bkt10,
> +			&secondary_bkt10, h);
> +	lookup_stage1(primary_hash11, secondary_hash11, &primary_bkt11,
> +			&secondary_bkt11, h);
> +	lookup_stage2(idx20, primary_hash20, secondary_hash20, primary_bkt20,
> +		secondary_bkt20, &k_slot20, positions, &extra_hits_mask,
> +		key_store, h);
> +	lookup_stage2(idx21, primary_hash21, secondary_hash21, primary_bkt21,
> +		secondary_bkt21, &k_slot21, positions, &extra_hits_mask,
> +		key_store, h);
> +
> +	while (lookup_mask) {
> +		k_slot30 = k_slot20, k_slot31 = k_slot21;
> +		idx30 = idx20, idx31 = idx21;
> +		primary_bkt20 = primary_bkt10;
> +		primary_bkt21 = primary_bkt11;
> +		secondary_bkt20 = secondary_bkt10;
> +		secondary_bkt21 = secondary_bkt11;
> +		primary_hash20 = primary_hash10;
> +		primary_hash21 = primary_hash11;
> +		secondary_hash20 = secondary_hash10;
> +		secondary_hash21 = secondary_hash11;
> +		idx20 = idx10, idx21 = idx11;
> +		primary_hash10 = primary_hash00;
> +		primary_hash11 = primary_hash01;
> +		secondary_hash10 = secondary_hash00;
> +		secondary_hash11 = secondary_hash01;
> +		idx10 = idx00, idx11 = idx01;
> +
> +		lookup_stage0(&idx00, &lookup_mask, &primary_hash00,
> +				&secondary_hash00, keys, h);
> +		lookup_stage0(&idx01, &lookup_mask, &primary_hash01,
> +				&secondary_hash01, keys, h);
> +		lookup_stage1(primary_hash10, secondary_hash10, &primary_bkt10,
> +				&secondary_bkt10, h);
> +		lookup_stage1(primary_hash11, secondary_hash11, &primary_bkt11,
> +				&secondary_bkt11, h);
> +		lookup_stage2(idx20, primary_hash20, secondary_hash20,
> +			primary_bkt20, secondary_bkt20, &k_slot20, positions,
> +			&extra_hits_mask, key_store, h);
> +		lookup_stage2(idx21, primary_hash21, secondary_hash21,
> +			primary_bkt21, secondary_bkt21,	&k_slot21, positions,
> +			&extra_hits_mask, key_store, h);
> +		lookup_stage3(idx30, k_slot30, keys, positions, &hits, h);
> +		lookup_stage3(idx31, k_slot31, keys, positions, &hits, h);
> +	}
> +
> +	k_slot30 = k_slot20, k_slot31 = k_slot21;
> +	idx30 = idx20, idx31 = idx21;
> +	primary_bkt20 = primary_bkt10;
> +	primary_bkt21 = primary_bkt11;
> +	secondary_bkt20 = secondary_bkt10;
> +	secondary_bkt21 = secondary_bkt11;
> +	primary_hash20 = primary_hash10;
> +	primary_hash21 = primary_hash11;
> +	secondary_hash20 = secondary_hash10;
> +	secondary_hash21 = secondary_hash11;
> +	idx20 = idx10, idx21 = idx11;
> +	primary_hash10 = primary_hash00;
> +	primary_hash11 = primary_hash01;
> +	secondary_hash10 = secondary_hash00;
> +	secondary_hash11 = secondary_hash01;
> +	idx10 = idx00, idx11 = idx01;
> +	lookup_stage1(primary_hash10, secondary_hash10, &primary_bkt10,
> +		&secondary_bkt10, h);
> +	lookup_stage1(primary_hash11, secondary_hash11, &primary_bkt11,
> +		&secondary_bkt11, h);
> +	lookup_stage2(idx20, primary_hash20, secondary_hash20, primary_bkt20,
> +		secondary_bkt20, &k_slot20, positions, &extra_hits_mask,
> +		key_store, h);
> +	lookup_stage2(idx21, primary_hash21, secondary_hash21, primary_bkt21,
> +		secondary_bkt21, &k_slot21, positions, &extra_hits_mask,
> +		key_store, h);
> +	lookup_stage3(idx30, k_slot30, keys, positions, &hits, h);
> +	lookup_stage3(idx31, k_slot31, keys, positions, &hits, h);
> +
> +	k_slot30 = k_slot20, k_slot31 = k_slot21;
> +	idx30 = idx20, idx31 = idx21;
> +	primary_bkt20 = primary_bkt10;
> +	primary_bkt21 = primary_bkt11;
> +	secondary_bkt20 = secondary_bkt10;
> +	secondary_bkt21 = secondary_bkt11;
> +	primary_hash20 = primary_hash10;
> +	primary_hash21 = primary_hash11;
> +	secondary_hash20 = secondary_hash10;
> +	secondary_hash21 = secondary_hash11;
> +	idx20 = idx10, idx21 = idx11;
> +
> +	lookup_stage2(idx20, primary_hash20, secondary_hash20, primary_bkt20,
> +		secondary_bkt20, &k_slot20, positions, &extra_hits_mask,
> +		key_store, h);
> +	lookup_stage2(idx21, primary_hash21, secondary_hash21, primary_bkt21,
> +		secondary_bkt21, &k_slot21, positions, &extra_hits_mask,
> +		key_store, h);
> +	lookup_stage3(idx30, k_slot30, keys, positions, &hits, h);
> +	lookup_stage3(idx31, k_slot31, keys, positions, &hits, h);
> +
> +	k_slot30 = k_slot20, k_slot31 = k_slot21;
> +	idx30 = idx20, idx31 = idx21;
> +
> +	lookup_stage3(idx30, k_slot30, keys, positions, &hits, h);
> +	lookup_stage3(idx31, k_slot31, keys, positions, &hits, h);
> +
> +	/* handle extra_hits_mask */
> +	next_mask |= extra_hits_mask;
> +
> +	/* ignore any items we have already found */
> +	next_mask &= ~hits;
> +
> +	if (unlikely(next_mask)) {
> +		/* run a single search for each remaining item */
> +		do {
> +			idx = __builtin_ctzl(next_mask);
> +			positions[idx] = rte_hash_lookup(h, keys[idx]);
> +			next_mask &= ~(1llu << idx);
> +		} while (next_mask);
>  	}
>  
>  	return 0;
>  }
> +
> +int
> +rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> +		      uint32_t num_keys, int32_t *positions)
> +{
> +	RETURN_IF_TRUE(((h == NULL) || (keys == NULL) || (num_keys == 0) ||
> +			(num_keys > RTE_HASH_LOOKUP_BULK_MAX) ||
> +			(positions == NULL)), -EINVAL);
> +
> +	return __rte_hash_lookup_bulk(h, keys, num_keys, positions);
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index 821a9d4..4088ac4 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -42,27 +42,36 @@
>  
>  #include <stdint.h>
>  #include <sys/queue.h>
> +#include <rte_ring.h>

The DEPDIRS variable in the makefile needs to be updated with a new dependency
on the rte_ring library.
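
In concrete terms, that would be a small makefile change along these lines (a
sketch, not part of the patch; the exact DEPDIRS spelling should be checked
against the librte_hash makefile in the tree):

```makefile
# lib/librte_hash/Makefile (sketch)
# The hash library now calls ring APIs such as rte_ring_sp_enqueue(),
# so it must declare a build/link dependency on the ring library.
DEPDIRS-y += lib/librte_eal lib/librte_ring
```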

>  
>  #ifdef __cplusplus
>  extern "C" {
>  #endif
>  
>  /** Maximum size of hash table that can be created. */
> -#define RTE_HASH_ENTRIES_MAX			(1 << 26)
> +#define RTE_HASH_ENTRIES_MAX			(1 << 30)
>  
> -/** Maximum bucket size that can be created. */
> -#define RTE_HASH_BUCKET_ENTRIES_MAX		16

This is a public macro, you can't just drop it. I suggest changing its value
to 4, and adding the new macro below alongside it. If you want, you can
also see about marking it as deprecated to remove in future, but I'm not sure
it's worthwhile doing so.

> +/** Number of items per bucket. */
> +#define RTE_HASH_BUCKET_ENTRIES			4
>  
>  /** Maximum length of key that can be used. */
>  #define RTE_HASH_KEY_LENGTH_MAX			64
>  
> -/** Max number of keys that can be searched for using rte_hash_lookup_multi. */
> -#define RTE_HASH_LOOKUP_BULK_MAX		16
> -#define RTE_HASH_LOOKUP_MULTI_MAX		RTE_HASH_LOOKUP_BULK_MAX

As above, you need to keep the RTE_HASH_LOOKUP_MULTI_MAX macro.

> -
> -/** Max number of characters in hash name.*/
> +/** Maximum number of characters in hash name.*/
>  #define RTE_HASH_NAMESIZE			32
>  
> +/** Bits that will be right shifted when calculating a secondary hash. */
> +#define RTE_HASH_ALT_BITS_SHIFT			12
> +
> +/** Bits that will be XORed to obtained a secondary hash. */
> +#define RTE_HASH_ALT_BITS_XOR_MASK		0x5bd1e995

If these macros are for use internally in the hash implementation only, mark
them as @internal, so that people don't start depending on them in their apps.
This allows us to rename or remove them in future if we like, without having
to give one release's deprecation warning.
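
Marking a macro @internal is just a doxygen annotation on its comment block,
for example:

```c
/**
 * @internal
 * Bits that will be right-shifted when calculating a secondary hash.
 * Not part of the public API and may change without notice.
 */
#define RTE_HASH_ALT_BITS_SHIFT		12
```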

> +
> +/** Maximum number of keys that can be searched for using rte_hash_lookup_bulk. */
> +#define RTE_HASH_LOOKUP_BULK_MAX		64
> +
> +/* Stored key size is a multiple of this value */
> +#define KEY_ALIGNMENT				16
> +
>  /** Signature of key that is stored internally. */
>  typedef uint32_t hash_sig_t;
>  
> @@ -77,9 +86,9 @@ typedef uint32_t (*rte_hash_function)(const void *key, uint32_t key_len,
>  struct rte_hash_parameters {
>  	const char *name;		/**< Name of the hash. */
>  	uint32_t entries;		/**< Total hash table entries. */
> -	uint32_t bucket_entries;	/**< Bucket entries. */
> -	uint32_t key_len;		/**< Length of hash key. */
> -	rte_hash_function hash_func;	/**< Function used to calculate hash. */
> +	uint32_t bucket_entries;        /**< Bucket entries. */
> +	uint16_t key_len;		/**< Length of hash key. */
> +	rte_hash_function hash_func;	/**< Primary Hash function used to calculate hash. */
>  	uint32_t hash_func_init_val;	/**< Init value used by hash_func. */
>  	int socket_id;			/**< NUMA Socket ID for memory. */
>  };

Due to padding of the uint16_t value, I don't think this will break the ABI.
However, it might be worthwhile manually putting in an explicit pad (or comment)
to make this clear.

> @@ -88,24 +97,35 @@ struct rte_hash_parameters {
>  struct rte_hash {

As part of this patchset, it might be worthwhile updating the comment on
struct rte_hash to indicate that this is an internal data structure that should
not be directly referenced by apps.

>  	char name[RTE_HASH_NAMESIZE];	/**< Name of the hash. */
>  	uint32_t entries;		/**< Total table entries. */
> -	uint32_t bucket_entries;	/**< Bucket entries. */
> -	uint32_t key_len;		/**< Length of hash key. */
> +	uint32_t bucket_entries;        /**< Bucket entries. */
> +	uint16_t key_len;		/**< Length of hash key. */
>  	rte_hash_function hash_func;	/**< Function used to calculate hash. */
>  	uint32_t hash_func_init_val;	/**< Init value used by hash_func. */
>  	uint32_t num_buckets;		/**< Number of buckets in table. */
>  	uint32_t bucket_bitmask;	/**< Bitmask for getting bucket index
>  							from hash signature. */
> -	hash_sig_t sig_msb;	/**< MSB is always set in valid signatures. */
> -	uint8_t *sig_tbl;	/**< Flat array of hash signature buckets. */
> -	uint32_t sig_tbl_bucket_size;	/**< Signature buckets may be padded for
> -					   alignment reasons, and this is the
> -					   bucket size used by sig_tbl. */
> -	uint8_t *key_tbl;	/**< Flat array of key value buckets. */
> -	uint32_t key_tbl_key_size;	/**< Keys may be padded for alignment
> -					   reasons, and this is the key size
> -					   used	by key_tbl. */
> +	uint16_t key_entry_size;	/**< Size of each key entry. */
> +	int socket_id;			/**< NUMA Socket ID for memory. */
> +	uint64_t sig_msb;		/**< MSB is always set in valid signatures. */
> +	uint64_t sig_secondary;		/**< Second MSB is set when hash is calculated
> +						from secondary hash function. */
> +
> +	struct rte_ring *free_slots;	/**< Ring that stores all indexes
> +						of the free slots in the key table */
> +	void *key_store;		/**< Table storing all keys and data */
> +
> +	struct rte_hash_bucket *buckets;	/**< Table with buckets storing all the
> +							hash values and key indexes
> +							to the key table*/
>  };

Since this header file does not contain any inline functions that operate on
this structure, it can be safely moved to the .c file and replaced here by a
simple "struct rte_hash;" forward declaration.

>  
> +/** Bucket structure */
> +struct rte_hash_bucket {
> +	uint64_t signatures[RTE_HASH_BUCKET_ENTRIES];
> +	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
> +} __rte_cache_aligned;
> +

Move this to the .c file too, as it is unused in this header.

> +
>  /**
>   * Create a new hash table.
>   *
> @@ -126,7 +146,6 @@ struct rte_hash {
>  struct rte_hash *
>  rte_hash_create(const struct rte_hash_parameters *params);
>  
> -
>  /**
>   * Find an existing hash table object and return a pointer to it.
>   *
> @@ -158,9 +177,9 @@ rte_hash_free(struct rte_hash *h);
>   *   Key to add to the hash table.
>   * @return
>   *   - -EINVAL if the parameters are invalid.
> + *   - -EAGAIN if key could not be added (table needs rehash)
>   *   - -ENOSPC if there is no space in the hash for this key.
> - *   - A positive value that can be used by the caller as an offset into an
> - *     array of user data. This value is unique for this key.
> + *   - 0 if key was added successfully
>   */

This is an ABI and API change. With this change you can no longer use the
old-style behaviour of returning an index that the app can use as an offset
into an array to store its data locally.

>  int32_t
>  rte_hash_add_key(const struct rte_hash *h, const void *key);
> @@ -177,13 +196,12 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
>   *   Hash value to add to the hash table.
>   * @return
>   *   - -EINVAL if the parameters are invalid.
> + *   - -EAGAIN if key could not be added (table needs rehash)
>   *   - -ENOSPC if there is no space in the hash for this key.
> - *   - A positive value that can be used by the caller as an offset into an
> - *     array of user data. This value is unique for this key.
> + *   - 0 if key was added successfully
>   */
>  int32_t
> -rte_hash_add_key_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig);
> +rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>  
>  /**
>   * Remove a key from an existing hash table. This operation is not multi-thread
> @@ -196,9 +214,7 @@ rte_hash_add_key_with_hash(const struct rte_hash *h,
>   * @return
>   *   - -EINVAL if the parameters are invalid.
>   *   - -ENOENT if the key is not found.
> - *   - A positive value that can be used by the caller as an offset into an
> - *     array of user data. This value is unique for this key, and is the same
> - *     value that was returned when the key was added.
> + *   - 0 if key was deleted successfully
>   */
>  int32_t
>  rte_hash_del_key(const struct rte_hash *h, const void *key);
> @@ -216,14 +232,10 @@ rte_hash_del_key(const struct rte_hash *h, const void *key);
>   * @return
>   *   - -EINVAL if the parameters are invalid.
>   *   - -ENOENT if the key is not found.
> - *   - A positive value that can be used by the caller as an offset into an
> - *     array of user data. This value is unique for this key, and is the same
> - *     value that was returned when the key was added.
> + *   - 0 if key was deleted successfully
>   */
>  int32_t
> -rte_hash_del_key_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig);
> -
> +rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>  
>  /**
>   * Find a key in the hash table. This operation is multi-thread safe.
> @@ -235,9 +247,7 @@ rte_hash_del_key_with_hash(const struct rte_hash *h,
>   * @return
>   *   - -EINVAL if the parameters are invalid.
>   *   - -ENOENT if the key is not found.
> - *   - A positive value that can be used by the caller as an offset into an
> - *     array of user data. This value is unique for this key, and is the same
> - *     value that was returned when the key was added.
> + *   - 0 if key was found successfully
>   */
>  int32_t
>  rte_hash_lookup(const struct rte_hash *h, const void *key);
> @@ -254,14 +264,40 @@ rte_hash_lookup(const struct rte_hash *h, const void *key);
>   * @return
>   *   - -EINVAL if the parameters are invalid.
>   *   - -ENOENT if the key is not found.
> - *   - A positive value that can be used by the caller as an offset into an
> - *     array of user data. This value is unique for this key, and is the same
> - *     value that was returned when the key was added.
> + *   - 0 if key was found successfully
>   */
>  int32_t
> -rte_hash_lookup_with_hash(const struct rte_hash *h,
> -				const void *key, hash_sig_t sig);
> +rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>  
> +#define rte_hash_lookup_multi rte_hash_lookup_bulk
> +/**
> + * Find multiple keys in the hash table. This operation is multi-thread safe.
> + *
> + * @param h
> + *   Hash table to look in.
> + * @param keys
> + *   A pointer to a list of keys to look for.
> + * @param num_keys
> + *   How many keys are in the keys list (less than RTE_HASH_LOOKUP_BULK_MAX).
> + * @param positions
> + *   Output containing a list of values, corresponding to the list of keys that
> + *   can be used by the caller as an offset into an array of user data. These
> + *   values are unique for each key, and are the same values that were returned
> + *   when each key was added. If a key in the list was not found, then -ENOENT
> + *   will be the value.
> + * @return
> + *   -EINVAL if there's an error, otherwise number of hits.
> + */
> +int
> +rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> +		      uint32_t num_keys, int32_t *positions);
> +
> +static inline hash_sig_t
> +hash_alt(const uint32_t primary_hash) {
> +	uint32_t tag = primary_hash >> RTE_HASH_ALT_BITS_SHIFT;
> +
> +	return (primary_hash ^ ((tag + 1) * RTE_HASH_ALT_BITS_XOR_MASK));
> +}

This needs a doxygen comment. Is it public or internal?

>  
>  /**
>   * Calc a hash value by key. This operation is not multi-process safe.
> @@ -280,28 +316,23 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
>  	return h->hash_func(key, h->key_len, h->hash_func_init_val);
>  }
>  
> -#define rte_hash_lookup_multi rte_hash_lookup_bulk
>  /**
> - * Find multiple keys in the hash table. This operation is multi-thread safe.
> + * Calc the secondary hash value by key. This operation is not multi-process safe.
>   *
>   * @param h
>   *   Hash table to look in.
> - * @param keys
> - *   A pointer to a list of keys to look for.
> - * @param num_keys
> - *   How many keys are in the keys list (less than RTE_HASH_LOOKUP_BULK_MAX).
> - * @param positions
> - *   Output containing a list of values, corresponding to the list of keys that
> - *   can be used by the caller as an offset into an array of user data. These
> - *   values are unique for each key, and are the same values that were returned
> - *   when each key was added. If a key in the list was not found, then -ENOENT
> - *   will be the value.
> + * @param key
> + *   Key to find.
>   * @return
> - *   -EINVAL if there's an error, otherwise 0.
> + *   - hash value
>   */
> -int
> -rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
> -		      uint32_t num_keys, int32_t *positions);
> +static inline hash_sig_t
> +rte_hash_secondary_hash(const uint32_t primary_hash)
> +{
> +	/* calc hash result by key */
> +	return hash_alt(primary_hash);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> -- 
> 2.4.2
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17 10:35  9% ` Neil Horman
  2015-06-17 11:06  4%   ` Richardson, Bruce
  2015-06-17 12:14  7%   ` Panu Matilainen
@ 2015-06-18  8:36  4%   ` Zhang, Helin
  2 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-18  8:36 UTC (permalink / raw)
  To: Neil Horman, Thomas Monjalon; +Cc: dev

Hi Neil

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, June 17, 2015 6:35 PM
> To: Thomas Monjalon
> Cc: announce@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices - statistics -
> ABI
> 
> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > Hi all,
> >
> > Sometimes there are some important discussions about architecture or
> > design which require opinions from several developers. Unfortunately,
> > we cannot read every threads. Maybe that using the announce mailing
> > list will help to bring more audience to these discussions.
> > Please note that
> > 	- the announce@ ML is moderated to keep a low traffic,
> > 	- every announce email is forwarded to dev@ ML.
> > In case you want to reply to this email, please use dev@dpdk.org address.
> >
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> >
> > During the development of the release 2.0, there was an agreement to
> > keep ABI compatibility or to bring new ABI while keeping old one during one
> release.
> > In case it's not possible to have this transition, the (exceptional)
> > break should be acknowledged by several developers.
> > 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> > There were some interesting discussions but not a lot of participants:
> >
> > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=84
> > 61
> >
> > During the current development cycle for the release 2.1, the ABI
> > question arises many times in different threads.
> > To add the hash key size field, it is proposed to use a struct padding gap:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
> > To support the flow director for VF, there is no proposal yet:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
> > To add the speed capability, it is proposed to break ABI in the release 2.2:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
> > To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
> > To add the interrupt mode, it is proposed to add a build-time option
> > CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
> binary:
> > 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
> > To add the packet type, there is a proposal to add a build-time option
> > CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
> > We must also better document how to remove a deprecated ABI:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
> > The ABI compatibility is a new constraint and we need to better
> > understand what it means and how to proceed. Even the macros are not yet
> well documented:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
> >
> > Thanks for your attention and your participation in these important choices.
> >
> 
> Thomas-
> 	Just to re-iterate what you said earlier, and what was discussed in the
> previous ABI discussions
> 
> 1) ABI stability was introduced to promote DPDK's ability to be included with
> various linux and BSD distributions.  Distributions, by and large, favor building
> libraries as DSO's, favoring security and updatability in favor of all out
> performance.
> 
> 2) The desire was to put DPDK developers in a mindset whereby ABI stability was
> something they needed to think about during development, as the DPDK
> exposes many data structures and instances that cannot be changed without
> breaking ABI
> 
> 3) The versioning mechanism was introduced to allow for backward compatibility
> during periods in which we needed to support both an old and new ABI
> 
> 4) As Stephen and others point out, it's not expected that we will always be able
> to maintain ABI, and as such an easy library versioning mechanism was
> introduced to prevent the loading of an incompatible library with an older
> application
> 
> 5) The ABI policy was introduced to create a method by which new ABI facets
> could be scheduled while allowing distros to prepare their downstream users for
> the upcoming changes.
> 
> 
> It seems to me, looking back over these last few months, that we're falling down
> a bit on our use of (3).  I've seen several people take advantage of the ABI
> scheduled updates, but no one has tried the versioning interface, and as a result
> patches are getting delayed, which was never my intent.  Not sure what's to be
> done about that, but we should probably address it.  Is use of the versioning
> interface just too hard or convoluted?
Does that mean we should try to use the versioning mechanism as much as possible?
Are there any rules/instructions for judging which types of changes should use
the versioning mechanism and which should not?
Are there any good examples of using the versioning mechanism for reference,
even outside the DPDK project?
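For reference outside DPDK, a minimal sketch of GNU symbol versioning as used by glibc might look like the following (all names here are hypothetical; the version script shown in the comment must be passed to the linker for the `.symver` directives to resolve):

```c
/* foo.c -- two implementations of foo() exported under different ABI
 * versions via GNU symbol versioning.  Requires a linker version
 * script, e.g. foo.map:
 *
 *     VERS_1.0 { global: foo; };
 *     VERS_2.0 { global: foo; } VERS_1.0;
 *
 * Build: gcc -shared -fPIC foo.c -Wl,--version-script=foo.map -o libfoo.so
 */
int foo_v1(void) { return 1; }  /* old ABI, kept for already-linked binaries */
int foo_v2(void) { return 2; }  /* new ABI, picked up by newly linked apps  */

__asm__(".symver foo_v1, foo@VERS_1.0");   /* old, compat-only version */
__asm__(".symver foo_v2, foo@@VERS_2.0");  /* '@@' marks the default   */
```

Old binaries keep resolving `foo@VERS_1.0`; anything linked against the new library binds to `foo@@VERS_2.0`.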

Thanks,
Helin


> 
> Neil

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v7 0/4] User-space Ethtool
  @ 2015-06-18  2:04  3%   ` Stephen Hemminger
  2015-06-18 12:47  0%     ` Wang, Liang-min
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-18  2:04 UTC (permalink / raw)
  To: Liang-Min Larry Wang; +Cc: dev

On Wed, Jun 17, 2015 at 6:22 PM, Liang-Min Larry Wang <
liang-min.wang@intel.com> wrote:

> This implementation is designed to provide a familiar interface for
> applications that rely on kernel-space driver to support ethtool_op and
> net_device_op for device management. The initial implementation focuses on
> ops that can be implemented through existing netdev APIs. More ops will be
> supported in a later release.
>
> v7 change:
> - Remove rte_eth_dev_get_ringparam implementation
> v6 change:
> - Rebase to match new changes over librte_ether
> v5 change:
> - Change API name from 'leng' to 'length'
> - Remove unused data structure rte_dev_vf_info
> - Remove placeholder API rte_eth_dev_set_ringparam
> - Clean up set_mac_addr implementation
> v4 change:
> - Add rte_eth_xxx apis and respective ops over igb and ixgbe
>   to support ethtool and net device alike ops
> - Add an example to demonstrate the use of ethtool library
> v3 change:
> - Fix a build issue
> v2 change:
> - Implement rte_eth_dev_default_mac_addr_set through dev_ops::mac_addr_set
> so it would support NIC devices other than ixgbe and igb
>
> Liang-Min Larry Wang (4):
>   ethdev: add apis to support access device info
>   ixgbe: add ops to support ethtool ops
>   igb: add ops to support ethtool ops
>   examples: new example: l2fwd-ethtool
>
>  drivers/net/e1000/igb_ethdev.c                   |  186 ++++
>  drivers/net/e1000/igb_regs.h                     |  217 +++++
>  drivers/net/ixgbe/ixgbe_ethdev.c                 |  183 ++++
>  drivers/net/ixgbe/ixgbe_regs.h                   |  357 ++++++++
>  examples/l2fwd-ethtool/Makefile                  |   55 ++
>  examples/l2fwd-ethtool/l2fwd-app/Makefile        |   58 ++
>  examples/l2fwd-ethtool/l2fwd-app/main.c          | 1030
> ++++++++++++++++++++++
>  examples/l2fwd-ethtool/l2fwd-app/netdev_api.h    |  781 ++++++++++++++++
>  examples/l2fwd-ethtool/l2fwd-app/shared_fifo.h   |  151 ++++
>  examples/l2fwd-ethtool/lib/Makefile              |   55 ++
>  examples/l2fwd-ethtool/lib/rte_ethtool.c         |  301 +++++++
>  examples/l2fwd-ethtool/lib/rte_ethtool.h         |  378 ++++++++
>  examples/l2fwd-ethtool/nic-control/Makefile      |   55 ++
>  examples/l2fwd-ethtool/nic-control/nic_control.c |  412 +++++++++
>  lib/librte_ether/Makefile                        |    1 +
>  lib/librte_ether/rte_eth_dev_info.h              |   57 ++
>  lib/librte_ether/rte_ethdev.c                    |  115 +++
>  lib/librte_ether/rte_ethdev.h                    |  117 +++
>  lib/librte_ether/rte_ether_version.map           |    6 +
>  19 files changed, 4515 insertions(+)
>  create mode 100644 drivers/net/e1000/igb_regs.h
>  create mode 100644 drivers/net/ixgbe/ixgbe_regs.h
>  create mode 100644 examples/l2fwd-ethtool/Makefile
>  create mode 100644 examples/l2fwd-ethtool/l2fwd-app/Makefile
>  create mode 100644 examples/l2fwd-ethtool/l2fwd-app/main.c
>  create mode 100644 examples/l2fwd-ethtool/l2fwd-app/netdev_api.h
>  create mode 100644 examples/l2fwd-ethtool/l2fwd-app/shared_fifo.h
>  create mode 100644 examples/l2fwd-ethtool/lib/Makefile
>  create mode 100644 examples/l2fwd-ethtool/lib/rte_ethtool.c
>  create mode 100644 examples/l2fwd-ethtool/lib/rte_ethtool.h
>  create mode 100644 examples/l2fwd-ethtool/nic-control/Makefile
>  create mode 100644 examples/l2fwd-ethtool/nic-control/nic_control.c
>  create mode 100644 lib/librte_ether/rte_eth_dev_info.h
>
> --
> 2.1.4
>
>
I agree with having a more complete API, but have some nits to pick.
Could the API be more abstract to reduce ABI issues in future?

I know choosing names is hard, but as a Linux developer ethtool has a very
specific meaning to me.
This API encompasses things broader than Linux ethtool and has different
semantics; therefore I'm not sure having something in DPDK with the same name
is really a good idea.

It would be better to call it something else like netdev_?? Or dpnet_??

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline
       [not found]               ` <2601191342CEEE43887BDE71AB97725836A1237C@irsmsx105.ger.corp.intel.com>
@ 2015-06-17 18:50  3%             ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-17 18:50 UTC (permalink / raw)
  To: dev

Hi Dave,

> From: Dave Barach (dbarach) [mailto:dbarach@cisco.com]
> Sent: Wednesday, June 17, 2015 4:46 PM
> To: Venkatesan, Venky; Richardson, Bruce; olivier.matz@6wind.com; Ananyev, Konstantin
> Cc: Damjan Marion (damarion)
> Subject: RE: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> 
> Dear Venky,
> 
> The first thing I noticed - on a specific piece of hardware, yadda yadda yadda - was that the i40e driver speed-path spent an ungodly
> amount of time stalled in i40e_rx_alloc_bufs(.) in rte_mbuf_refcnt_set. Mumble, missing prefetch. I added a stride-of-1 prefetch,
> which made the first stall go away. See below.
> 
> Next thing I noticed: a stall setting mb->next = 0, in the non-scattered RX case. So I added the (commented-out) rte_prefetch0
> (&pfmb->next). At that point, I decided to move the buffer metadata around to eliminate the second prefetch.
> 
> Taken together, these changes made a 10% PPS difference, again in a specific use case.

That seems like a valid point, but why is moving next to the first cache-line considered the only possible option?
As Bruce suggested before, we can try to get rid of touching next in non-scattered RX routines.
That, I think, should provide a similar performance improvement for the i40e/ixgbe fast-path scalar RX code.

Or are you talking about a hit in your app-layer code, when you start chaining mbufs together?

> 
> Damjan was going to produce a cleaned up version of the rte_mbuf.h diffs, of the form:
> 
> struct rte_mbuf {
> #ifdef CONFIG_OFFLOAD_IN_FIRST_CACHE_LINE
>   Offload-in-first-cache-line;
>   Next-in-second-cache-line;
> #else
>   Offload-in-second-cache-line;
> >   Next-in-first-cache-line;
> #endif
> };
> 
> .along with a parallel change in the kernel module version.

First, with the proposed changes below, the ixgbe vector RX routine would be broken.
Of course, it could be fixed by putting even more conditional compilation around it -
enable vector RX only when OFFLOAD_IN_FIRST_CACHE_LINE is enabled, etc.
Second, how long would it take before someone else would like to introduce another mbuf field swap?
All for perfectly good reasons, of course.
Let's say, swap 'hash' and 'seqn' (to keep vector RX working),
or 'next' and 'userdata', or put tx_offload into the first line (traffic generators).
I think if we go that way (allow mbuf field swapping at build time) we'll end up with
totally unmaintainable code.
Not to mention problems with ABI compatibility (shared libraries, multiple processes, KNI).
So, I think we'd better stick with one mbuf format.
If it's absolutely necessary to move next into the first cache line, I think it has to be done for all configs.
Though, from what you described with i40e_rx_alloc_bufs() -
that looks like a flaw in a particular implementation, which might be fixed without changing the mbuf format.

> 
> I'm out of LSU bandwidth / prefetch slots in a number of apps - mostly L2 apps - which will never use h/w offload results. I can see
> your point about not turning dpdk buffer metadata into a garbage can. On the other hand, the problems we face often involve MTU
> => scattered RX, with a mix of small and large packets. As in: enough PPS to care about extra cache-line fills.
> 
> Hope this explains it a bit. We can obviously make these changes in our own repos, but I can't imagine that we're the only folks who
> might not use h/w offload results.
> 
> Thanks. Dave
> 
> 
> diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
> index 9c7be6f..1200361 100644
> --- a/lib/librte_pmd_i40e/i40e_rxtx.c
> +++ b/lib/librte_pmd_i40e/i40e_rxtx.c
> @@ -779,9 +779,15 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> 
>       rxdp = &rxq->rx_ring[alloc_idx];
>       for (i = 0; i < rxq->rx_free_thresh; i++) {
> +                if (i < (rxq->rx_free_thresh - 1)) {
> +                        struct rte_mbuf *pfmb;
> +                        pfmb = rxep[i+1].mbuf;
> +                        rte_prefetch0 (pfmb);
> +                        // rte_prefetch0 (&pfmb->next);

Does your compiler unroll that loop?
If not, I wonder whether manually unrolling it (by 4 or so) would help?
Konstantin


> +                }
>             mb = rxep[i].mbuf;
>             rte_mbuf_refcnt_set(mb, 1);
> -           mb->next = NULL;
> +           mb->next = NULL; /* $$$ in second cacheline */
>             mb->data_off = RTE_PKTMBUF_HEADROOM;
>             mb->nb_segs = 1;
>             mb->port = rxq->port_id;
> 
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 55749bc..efd7f4e 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -288,6 +288,12 @@ struct rte_mbuf {
>       uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
>       uint16_t vlan_tci;        /**< VLAN Tag Control Identifier (CPU order) */
>       uint16_t reserved;
> +     uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
> +
> +     struct rte_mbuf *next;    /**< Next segment of scattered packet. */
> +
> +     /* second cache line - fields only used in slow path or on TX */
> +     MARKER cacheline1 __rte_cache_aligned;
>       union {
>             uint32_t rss;     /**< RSS hash result if RSS enabled */
>             struct {
> @@ -307,18 +313,12 @@ struct rte_mbuf {
>             uint32_t usr;       /**< User defined tags. See rte_distributor_process() */
>       } hash;                   /**< hash information */
> 
> -     uint32_t seqn; /**< Sequence number. See also rte_reorder_insert() */
> -
> -     /* second cache line - fields only used in slow path or on TX */
> -     MARKER cacheline1 __rte_cache_aligned;
> -
>       union {
>             void *userdata;   /**< Can be used for external metadata */
>             uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
>       };
> 
>       struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
> -     struct rte_mbuf *next;    /**< Next segment of scattered packet. */
> 
>       /* fields to support TX offloads */
>       union {
> 
> 
> 
> 
> Thanks. Dave
> 
> From: Venkatesan, Venky [mailto:venky.venkatesan@intel.com]
> Sent: Wednesday, June 17, 2015 11:03 AM
> To: Richardson, Bruce; Dave Barach (dbarach); olivier.matz@6wind.com; Ananyev, Konstantin
> Cc: Damjan Marion (damarion)
> Subject: RE: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> 
> Dave,
> 
> Is there a patch that I can look at? From your description what you seem to indicate is that the mbuf structure becomes variant
> between applications - that to me is likely a non-starter. If rte_mbuf is variant between implementations, then  we really have no
> commonality that all VNFs can rely on.
> 
> I may be getting a bit ahead of things here, but letting different apps implement different mbuf layouts means that, to achieve any
> level of commonality for upper-level apps, we need to go to accessor functions (not inline macros) to get various components - pretty
> much like ODP does; all that does is add function-call overhead per access, and that is completely non-performant. We toyed with that
> about 5 years ago and tossed it out.
> 
> Bruce,
> 
> To some extent I need to get an understanding of why this performance drop is actually happening. Since SNB we have had a paired
> cache line prefetcher that brings in the cache line pair. I think the reason that we have the perf issue is that half the mbufs actually
> begin on an odd cache line boundary - i.e. those are the ones that will suffer a cache miss on rte_mbuf.next. Could you verify that this
> is the case, and see what happens if we address all mbufs beginning on an even cache line? That runs into another potential issue with
> 4K aliasing, but I may have a way to avoid that.
> 
> Regards,
> -Venky
> 
> 
> From: Richardson, Bruce
> Sent: Wednesday, June 17, 2015 7:50 AM
> To: Dave Barach (dbarach); olivier.matz@6wind.com; Ananyev, Konstantin
> Cc: Damjan Marion (damarion); Venkatesan, Venky
> Subject: RE: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> 
> Hi,
> 
> it's something we can look at - especially once we see the proposed patch.
> 
> However, my question is, if an app is cycle constrained such that the extra cache-line reference for jumbo frames cause a problem, is
> that app not likely to really suffer performance issues when run with smaller e.g. 256 byte packet sizes?  And in the inverse case, if an
> app can deal with small packets, is it not likely able to take the extra hit for the jumbo frames? This was our thinking when doing the
> cache-line split, but perhaps the logic doesn't hold up in the cases you refer to.
> 
> Regards,
> /Bruce
> 
> From: Dave Barach (dbarach) [mailto:dbarach@cisco.com]
> Sent: Wednesday, June 17, 2015 3:37 PM
> To: Richardson, Bruce; olivier.matz@6wind.com; Ananyev, Konstantin
> Cc: Damjan Marion (damarion)
> Subject: RE: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> 
> Dear Bruce,
> 
> We've implemented a number of use cases which can't or at least don't currently use hardware offload data. Examples: L2 bridging,
> integrated routing and bridging, various flavors of tunneling, ipv6 MAP, ipv6 segment routing, and so on.
> 
> You're not considering the cost of additional pressure on the load-store unit, memory system, caused when mb->next must be
> prefetched, set, and cleared. It doesn't matter that the packets are large, the cost still exists. As we transition to 40 and 100g NICs,
> large-packet PPS-per-core becomes more of an issue.
> 
> Damjan has offered to make the layout of the buffer metadata configurable, so that folks can "have it their way." Given that it's a ~10
> line change - with minimal comedic potential - it seems like a reasonable way to go.
> 
> Thoughts?
> 
> Thanks. Dave Barach
> Cisco Fellow
> 
> From: Damjan Marion (damarion)
> Sent: Wednesday, June 17, 2015 10:17 AM
> To: Dave Barach (dbarach)
> Subject: Fwd: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> 
> 
> 
> Begin forwarded message:
> 
> From: Bruce Richardson <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> Date: 17 Jun 2015 16:06:48 CEST
> To: "Damjan Marion (damarion)" <damarion@cisco.com>
> Cc: Olivier MATZ <olivier.matz@6wind.com>, "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "dev@dpdk.org"
> <dev@dpdk.org>
> 
> On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
> 
> On 15 Jun 2015, at 16:12, Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> The next pointers always start out as NULL when the mbuf pool is created. The
> only time it is set to non-NULL is when we have chained mbufs. If we never have
> any chained mbufs, we never need to touch the next field, or even read it - since
> we have the num-segments count in the first cache line. If we do have a multi-segment
> mbuf, it's likely to be a big packet, so we have more processing time available
> and we can then take the hit of setting the next pointer.
> 
There are applications which are not using rx offload, but they deal with chained mbufs.
Why are they less important than the ones using rx offload? This is something people
should be able to configure at build time.
> 
> It's not that they are less important, it's that the packet processing cycle count
> budget is going to be greater. A packet which is 64 bytes, or 128 bytes in size
> can make use of a number of RX offloads to reduce its processing time. However,
> a 64/128 packet is not going to be split across multiple buffers [unless we
> are dealing with a very unusual setup!].
> 
> To handle 64 byte packets at 40G line rate, one has 50 cycles per core per packet
> when running at 3GHz. [3000000000 cycles / 59.5 mpps].
> If we assume that we are dealing with fairly small buffers
> here, and that packets greater than 1k in size are chained, we still have 626
> cycles per 3GHz core per packet to work with for that 1k packet. Given that
> "normal" DPDK buffers are 2k in size, we have over a thousand cycles per packet
> for any packet that is split.
> 
> In summary, packets spread across multiple buffers are large packets, and so have
> larger packet cycle count budgets and so can much better absorb the cost of
> touching a second cache line in the mbuf than a 64-byte packet can. Therefore,
> we optimize for the 64B packet case.
> 
> Hope this clarifies things a bit.
> 
> Regards,
> /Bruce

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 1/5] ethdev: add new API to retrieve RX/TX queue information
  @ 2015-06-17 16:54  3% ` Konstantin Ananyev
    0 siblings, 1 reply; 200+ results
From: Konstantin Ananyev @ 2015-06-17 16:54 UTC (permalink / raw)
  To: dev

Add the ability for the upper layer to query RX/TX queue information.

Add new structures:
struct rte_eth_rx_qinfo
struct rte_eth_tx_qinfo

new functions:
rte_eth_rx_queue_info_get
rte_eth_tx_queue_info_get

into the rte_ethdev API.

Extra free space is left in the qinfo structures,
so that more fields can be added later without ABI breakage.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ether/rte_ethdev.c | 48 +++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h | 77 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e13fde5..6b9a7ef 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3629,6 +3629,54 @@ rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
 }
 
 int
+rte_eth_rx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_rx_qinfo *qinfo)
+{
+	struct rte_eth_dev *dev;
+
+	memset(qinfo, 0, sizeof(*qinfo));
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -EINVAL;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
+		return -EINVAL;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_qinfo_get, -ENOTSUP);
+	dev->dev_ops->rx_qinfo_get(dev, queue_id, qinfo);
+	return 0;
+}
+
+int
+rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_tx_qinfo *qinfo)
+{
+	struct rte_eth_dev *dev;
+
+	memset(qinfo, 0, sizeof(*qinfo));
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -EINVAL;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_tx_queues) {
+		PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+		return -EINVAL;
+	}
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_qinfo_get, -ENOTSUP);
+	dev->dev_ops->tx_qinfo_get(dev, queue_id, qinfo);
+	return 0;
+}
+
+int
 rte_eth_dev_set_mc_addr_list(uint8_t port_id,
 			     struct ether_addr *mc_addr_set,
 			     uint32_t nb_mc_addr)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 04c192d..45afdd3 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -942,6 +942,30 @@ struct rte_eth_xstats {
 	uint64_t value;
 };
 
+/**
+ * Ethernet device RX queue information structure.
+ * Used to retrieve information about a configured queue.
+ */
+struct rte_eth_rx_qinfo {
+	struct rte_mempool *mp;     /**< mempool used by that queue. */
+	struct rte_eth_rxconf conf; /**< queue config parameters. */
+	uint8_t scattered_rx;       /**< scattered packets RX supported. */
+	uint16_t nb_desc;           /**< configured number of RXDs. */
+	uint16_t max_desc;          /**< max allowed number of RXDs. */
+	uint16_t min_desc;          /**< min allowed number of RXDs. */
+} __rte_cache_aligned;
+
+/**
+ * Ethernet device TX queue information structure.
+ * Used to retrieve information about a configured queue.
+ */
+struct rte_eth_tx_qinfo {
+	struct rte_eth_txconf conf; /**< queue config parameters. */
+	uint16_t nb_desc;           /**< configured number of TXDs. */
+	uint16_t max_desc;          /**< max allowed number of TXDs. */
+	uint16_t min_desc;          /**< min allowed number of TXDs. */
+} __rte_cache_aligned;
+
 struct rte_eth_dev;
 
 struct rte_eth_dev_callback;
@@ -1045,6 +1069,12 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @Check DD bit of specific RX descriptor */
 
+typedef void (*eth_rx_qinfo_get_t)(struct rte_eth_dev *dev,
+	uint16_t rx_queue_id, struct rte_eth_rx_qinfo *qinfo);
+
+typedef void (*eth_tx_qinfo_get_t)(struct rte_eth_dev *dev,
+	uint16_t tx_queue_id, struct rte_eth_tx_qinfo *qinfo);
+
 typedef int (*mtu_set_t)(struct rte_eth_dev *dev, uint16_t mtu);
 /**< @internal Set MTU. */
 
@@ -1389,8 +1419,13 @@ struct eth_dev_ops {
 	rss_hash_update_t rss_hash_update;
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
-	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+	eth_filter_ctrl_t              filter_ctrl;
+	/**< common filter control. */
 	eth_set_mc_addr_list_t set_mc_addr_list; /**< set list of mcast addrs */
+	eth_rx_qinfo_get_t rx_qinfo_get;
+	/**< retrieve RX queue information. */
+	eth_tx_qinfo_get_t tx_qinfo_get;
+	/**< retrieve TX queue information. */
 };
 
 /**
@@ -3616,6 +3651,46 @@ int rte_eth_remove_rx_callback(uint8_t port_id, uint16_t queue_id,
 int rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
 		struct rte_eth_rxtx_callback *user_cb);
 
+/**
+ * Retrieve information about given port's RX queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The RX queue on the Ethernet device for which information
+ *   will be retrieved.
+ * @param qinfo
+ *   A pointer to a structure of type *rte_eth_rx_qinfo* to be filled with
+ *   the information of the Ethernet device.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOTSUP: routine is not supported by the device PMD.
+ *   - -EINVAL:  The port_id or the queue_id is out of range.
+ */
+int rte_eth_rx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_rx_qinfo *qinfo);
+
+/**
+ * Retrieve information about given port's TX queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The TX queue on the Ethernet device for which information
+ *   will be retrieved.
+ * @param qinfo
+ *   A pointer to a structure of type *rte_eth_tx_qinfo* to be filled with
+ *   the information of the Ethernet device.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOTSUP: routine is not supported by the device PMD.
+ *   - -EINVAL:  The port_id or the queue_id is out of range.
+ */
+int rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+	struct rte_eth_tx_qinfo *qinfo);
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.5.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17 12:14  7%   ` Panu Matilainen
@ 2015-06-17 13:21  8%     ` Vincent JARDIN
  0 siblings, 0 replies; 200+ results
From: Vincent JARDIN @ 2015-06-17 13:21 UTC (permalink / raw)
  To: Panu Matilainen, Neil Horman, Thomas Monjalon; +Cc: dev

On 17/06/2015 14:14, Panu Matilainen wrote:
> (initially accidentally sent to announce, resending to dev)
>
> On 06/17/2015 01:35 PM, Neil Horman wrote:
>> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
>>> Hi all,
>>>
>>> Sometimes there are some important discussions about architecture or
>>> design
>>> which require opinions from several developers. Unfortunately, we cannot
>>> read every threads. Maybe that using the announce mailing list will help
>>> to bring more audience to these discussions.
>>> Please note that
>>>     - the announce@ ML is moderated to keep a low traffic,
>>>     - every announce email is forwarded to dev@ ML.
>>> In case you want to reply to this email, please use dev@dpdk.org
>>> address.
>>>
>>> There were some debates about software statistics disabling.
>>> Should they be always on or possibly disabled when compiled?
>>> We need to take a decision shortly and discuss (or agree) this proposal:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019461.html
>>>
>>> During the development of the release 2.0, there was an agreement to
>>> keep
>>> ABI compatibility or to bring new ABI while keeping old one during
>>> one release.
>>> In case it's not possible to have this transition, the (exceptional)
>>> break
>>> should be acknowledged by several developers.
>>>     http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
>>> There were some interesting discussions but not a lot of participants:
>>>     http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=8461
>>>
>>>
>>> During the current development cycle for the release 2.1, the ABI
>>> question
>>> arises many times in different threads.
>>> To add the hash key size field, it is proposed to use a struct
>>> padding gap:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019386.html
>>> To support the flow director for VF, there is no proposal yet:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019343.html
>>> To add the speed capability, it is proposed to break ABI in the
>>> release 2.2:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019225.html
>>> To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019443.html
>>> To add the interrupt mode, it is proposed to add a build-time option
>>> CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
>>> binary:
>>>     http://dpdk.org/ml/archives/dev/2015-June/018947.html
>>> To add the packet type, there is a proposal to add a build-time option
>>> CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019172.html
>>> We must also better document how to remove a deprecated ABI:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019465.html
>>> The ABI compatibility is a new constraint and we need to better
>>> understand
>>> what it means and how to proceed. Even the macros are not yet well
>>> documented:
>>>     http://dpdk.org/ml/archives/dev/2015-June/019357.html
>>>
>>> Thanks for your attention and your participation in these important
>>> choices.
>>>
>>
>> Thomas-
>>     Just to re-iterate what you said earlier, and what was discussed
>> in the
>> previous ABI discussions
>>
>> 1) ABI stability was introduced to promote DPDK's ability to be
>> included with
>> various linux and BSD distributions.  Distributions, by and large, favor
>> building libraries as DSO's, favoring security and updatability in
>> favor of all
>> out performance.
>>
>> 2) The desire was to put DPDK developers in a mindset whereby ABI
>> stability was
>> something they needed to think about during development, as the DPDK
>> exposes
>> many data structures and instances that cannot be changed without
>> breaking ABI
>>
>> 3) The versioning mechanism was introduced to allow for backward
>> compatibility
>> during periods in which we needed to support both an old an new ABI
>>
>> 4) As Stephan and others point out, its not expected that we will
>> always be able
>> to maintain ABI, and as such an easy library versioning mechanism was
>> introduced
>> to prevent the loading of an incompatible library with an older
>> application
>>
>> 5) The ABI policy was introduced to create a method by which new ABI
>> facets
>> could be scheduled while allowing distros to prepare their downstream
>> users for
>> the upcomming changes.
>>
>>
>> It seems to me, looking back over these last few months, that we're
>> falling down
>> a bit on our use of (3).  I've seen several people take advantage of
>> the ABI
>> scheduled updates, but no one has tried the versioning interface, and
>> as a
>> result patches are getting delayed, which was never my intent.  Not
>> sure whats
>> to be done about that, but we should probably address it.  Is use of the
>> versionnig interface just too hard or convoluted?
>
> To me it seems that by far the biggest problem with ABI stability in
> DPDK is features requiring changes to public structs (often directly
> allocated and accessed by apps), which is something the symbol
> versioning doesn't directly help with, you'd need to version the structs
> too.
>
> One only needs to glance at the glibc documentation on how to accomplish
> it [1] to see it gets rather involved. Glibc promises backwards
> compatibility for life, so the effort is justified. However in where
> DPDK we're talking about extending compatibility for a few months by
> minimum requirement, people are unlikely to bother.
>
> [1] https://sourceware.org/glibc/wiki/Development/Versioning_A_Structure

Does it mean that it requires specific engineering, so we are back to
needing two layers:
   a- direct calls to "non-ABI-stable layers" (the current librte*) for
those who can / are fine to use them,
   b- calls through layers that guarantee ABI stability (a
librte_compat?) for those who need it.

Could both be exposed by the distributions so they can be consumed?

Thank you,
   Vincent

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17 10:35  9% ` Neil Horman
  2015-06-17 11:06  4%   ` Richardson, Bruce
@ 2015-06-17 12:14  7%   ` Panu Matilainen
  2015-06-17 13:21  8%     ` Vincent JARDIN
  2015-06-18  8:36  4%   ` Zhang, Helin
  2 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-17 12:14 UTC (permalink / raw)
  To: Neil Horman, Thomas Monjalon; +Cc: dev

(initially accidentally sent to announce, resending to dev)

On 06/17/2015 01:35 PM, Neil Horman wrote:
> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
>> Hi all,
>>
>> Sometimes there are some important discussions about architecture or design
>> which require opinions from several developers. Unfortunately, we cannot
>> read every thread. Maybe using the announce mailing list will help
>> to bring a wider audience to these discussions.
>> Please note that
>> 	- the announce@ ML is moderated to keep a low traffic,
>> 	- every announce email is forwarded to dev@ ML.
>> In case you want to reply to this email, please use dev@dpdk.org address.
>>
>> There were some debates about software statistics disabling.
>> Should they be always on or possibly disabled when compiled?
>> We need to take a decision shortly and discuss (or agree) this proposal:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
>>
>> During the development of the release 2.0, there was an agreement to keep
>> ABI compatibility or to bring new ABI while keeping old one during one release.
>> In case it's not possible to have this transition, the (exceptional) break
>> should be acknowledged by several developers.
>> 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
>> There were some interesting discussions but not a lot of participants:
>> 	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=8461
>>
>> During the current development cycle for the release 2.1, the ABI question
>> arises many times in different threads.
>> To add the hash key size field, it is proposed to use a struct padding gap:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
>> To support the flow director for VF, there is no proposal yet:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
>> To add the speed capability, it is proposed to break ABI in the release 2.2:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
>> To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
>> To add the interrupt mode, it is proposed to add a build-time option
>> CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking binary:
>> 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
>> To add the packet type, there is a proposal to add a build-time option
>> CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
>> We must also better document how to remove a deprecated ABI:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
>> The ABI compatibility is a new constraint and we need to better understand
>> what it means and how to proceed. Even the macros are not yet well documented:
>> 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
>>
>> Thanks for your attention and your participation in these important choices.
>>
>
> Thomas-
> 	Just to re-iterate what you said earlier, and what was discussed in the
> previous ABI discussions
>
> 1) ABI stability was introduced to promote DPDK's ability to be included with
> various Linux and BSD distributions.  Distributions, by and large, favor
> building libraries as DSOs, preferring security and updatability over all-out
> performance.
>
> 2) The desire was to put DPDK developers in a mindset whereby ABI stability was
> something they needed to think about during development, as the DPDK exposes
> many data structures and instances that cannot be changed without breaking ABI
>
> 3) The versioning mechanism was introduced to allow for backward compatibility
> during periods in which we needed to support both an old and a new ABI
>
> 4) As Stephen and others point out, it's not expected that we will always be able
> to maintain ABI, and as such an easy library versioning mechanism was introduced
> to prevent the loading of an incompatible library with an older application
>
> 5) The ABI policy was introduced to create a method by which new ABI facets
> could be scheduled while allowing distros to prepare their downstream users for
> the upcoming changes.
>
>
> It seems to me, looking back over these last few months, that we're falling down
> a bit on our use of (3).  I've seen several people take advantage of the ABI
> scheduled updates, but no one has tried the versioning interface, and as a
> result patches are getting delayed, which was never my intent.  Not sure what's
> to be done about that, but we should probably address it.  Is use of the
> versioning interface just too hard or convoluted?

To me it seems that by far the biggest problem with ABI stability in 
DPDK is features requiring changes to public structs (often directly 
allocated and accessed by apps), which is something the symbol 
versioning doesn't directly help with, you'd need to version the structs 
too.

One only needs to glance at the glibc documentation on how to accomplish 
it [1] to see it gets rather involved. Glibc promises backwards 
compatibility for life, so the effort is justified. In DPDK, however, 
we're talking about extending compatibility for a few months as a 
minimum requirement, so people are unlikely to bother.

[1] https://sourceware.org/glibc/wiki/Development/Versioning_A_Structure

     - Panu -

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  4:36  9% ` Matthew Hall
  2015-06-17  5:28  8%   ` Stephen Hemminger
@ 2015-06-17 11:17  4%   ` Bruce Richardson
  2015-06-18 16:32  4%     ` Dumitrescu, Cristian
  2015-06-18 13:25  8%   ` Dumitrescu, Cristian
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-06-17 11:17 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev

On Tue, Jun 16, 2015 at 09:36:54PM -0700, Matthew Hall wrote:
> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> 
> This goes against the idea I have seen before that we should be moving toward 
> a distro-friendly approach where one copy of DPDK can be used by multiple apps 
> without having to rebuild it. It seems like it is also a bit ABI hostile 
> according to the below goals / discussions.
> 
> Jemalloc is also very high-performance code and still manages to allow 
> enabling and disabling statistics at runtime. Are we sure it's impossible for 
> DPDK or just theorizing?
> 

+1 to this. I think that any compile-time option to disable stats should only
be used when we have a proven performance issue with just disabling them at runtime.
I would assume that apps do not switch stats on or off multiple times per second,
so any code branches to track stats or not would be entirely predictable in the
code - since they always go one way. Therefore, when disabled, we should be looking
at a very minimal overhead per stat. If there are lots of checks for the same
value in the one path, i.e. lots of stats in a hot path, hopefully the compiler
will be smart enough to make the check just once. If not, we can always do
that in the C code by duplicating the hot-path code for the with-stats and
without-stats cases - again selectable at runtime.

There is also the case where the stats tracking itself is such low overhead
that it's not worth disabling. In that case, neither runtime nor compile-time disabling
should need to be provided. For example, any library that keeps track of
bursts of packets should not need a stats disable option - one increment
or addition per burst of (32) packets is not going to seriously affect any app. :-)

Regards,
/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17 10:35  9% ` Neil Horman
@ 2015-06-17 11:06  4%   ` Richardson, Bruce
  2015-06-19 11:08  7%     ` Mcnamara, John
  2015-06-17 12:14  7%   ` Panu Matilainen
  2015-06-18  8:36  4%   ` Zhang, Helin
  2 siblings, 1 reply; 200+ results
From: Richardson, Bruce @ 2015-06-17 11:06 UTC (permalink / raw)
  To: Neil Horman, Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Neil Horman
> Sent: Wednesday, June 17, 2015 11:35 AM
> To: Thomas Monjalon
> Cc: announce@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-announce] important design choices -
> statistics - ABI
> 
> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > Hi all,
> >
> > Sometimes there are some important discussions about architecture or
> > design which require opinions from several developers. Unfortunately,
> > we cannot read every thread. Maybe using the announce mailing
> > list will help to bring a wider audience to these discussions.
> > Please note that
> > 	- the announce@ ML is moderated to keep a low traffic,
> > 	- every announce email is forwarded to dev@ ML.
> > In case you want to reply to this email, please use dev@dpdk.org
> address.
> >
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> >
> > During the development of the release 2.0, there was an agreement to
> > keep ABI compatibility or to bring new ABI while keeping old one during
> one release.
> > In case it's not possible to have this transition, the (exceptional)
> > break should be acknowledged by several developers.
> > 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> > There were some interesting discussions but not a lot of participants:
> >
> > http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=84
> > 61
> >
> > During the current development cycle for the release 2.1, the ABI
> > question arises many times in different threads.
> > To add the hash key size field, it is proposed to use a struct padding
> gap:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
> > To support the flow director for VF, there is no proposal yet:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
> > To add the speed capability, it is proposed to break ABI in the release
> 2.2:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
> > To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
> > To add the interrupt mode, it is proposed to add a build-time option
> > CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking
> binary:
> > 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
> > To add the packet type, there is a proposal to add a build-time option
> > CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
> > We must also better document how to remove a deprecated ABI:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
> > The ABI compatibility is a new constraint and we need to better
> > understand what it means and how to proceed. Even the macros are not yet
> well documented:
> > 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
> >
> > Thanks for your attention and your participation in these important
> choices.
> >
> 
> Thomas-
> 	Just to re-iterate what you said earlier, and what was discussed in
> the previous ABI discussions
> 
> 1) ABI stability was introduced to promote DPDK's ability to be included
> with various Linux and BSD distributions.  Distributions, by and large,
> favor building libraries as DSOs, preferring security and updatability
> over all-out performance.
> 
> 2) The desire was to put DPDK developers in a mindset whereby ABI
> stability was something they needed to think about during development, as
> the DPDK exposes many data structures and instances that cannot be changed
> without breaking ABI
> 
> 3) The versioning mechanism was introduced to allow for backward
> compatibility during periods in which we needed to support both an old and a
> new ABI
> 
> 4) As Stephen and others point out, it's not expected that we will always
> be able to maintain ABI, and as such an easy library versioning mechanism
> was introduced to prevent the loading of an incompatible library with an
> older application
> 
> 5) The ABI policy was introduced to create a method by which new ABI
> facets could be scheduled while allowing distros to prepare their
> downstream users for the upcoming changes.
> 
> 
> It seems to me, looking back over these last few months, that we're
> falling down a bit on our use of (3).  I've seen several people take
> advantage of the ABI scheduled updates, but no one has tried the
> versioning interface, and as a result patches are getting delayed, which
> was never my intent.  Not sure what's to be done about that, but we should
> probably address it.  Is use of the versioning interface just too hard or
> convoluted?
> 
> Neil


Hi Neil,
 
on my end, some suggestions:

1. the documentation on changing an API function provided in rte_compat.h
is really good, but I don't think it is present in our documentation in
the docs folder or on the website, is it (apologies if it is and I've missed
it)? This needs to go into the programmer's guide or some other doc (perhaps
the new doc that the coding style went into).

2. The documentation also needs an example of: this is how you add a new
function and update the map file, and this is how you a) mark a function
as deprecated and b) remove it completely. That way we could have one
guide covering API versioning, how to add, modify and remove functions.

3. This doc should also cover how to use the API checker tool, something I
haven't had the chance to look at yet, but should do in the near future!
:-)
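
For what such a guide might cover, here is a rough sketch of the kind of 
GNU ld version map a shared library uses, with a hypothetical function 
gaining a new default version in a later release (library and symbol 
names are made up; in practice this is paired with the corresponding 
annotations in the C source):

```
DPDK_2.0 {
    global:
        rte_foo_create;
        rte_foo_destroy;
    local: *;
};

DPDK_2.1 {
    global:
        rte_foo_create;   # new default version of the symbol
} DPDK_2.0;
```

Removing a deprecated version then amounts to deleting the old version 
node once the deprecation period has elapsed, which is exactly the 
add/deprecate/remove lifecycle the guide should document.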

Regards,

/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
  2015-06-17  4:36  9% ` Matthew Hall
  2015-06-17  9:54  8% ` Morten Brørup
@ 2015-06-17 10:35  9% ` Neil Horman
  2015-06-17 11:06  4%   ` Richardson, Bruce
                     ` (2 more replies)
  2015-06-18 16:55  8% ` O'Driscoll, Tim
  3 siblings, 3 replies; 200+ results
From: Neil Horman @ 2015-06-17 10:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: announce

On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> Hi all,
> 
> Sometimes there are some important discussions about architecture or design
> which require opinions from several developers. Unfortunately, we cannot
> read every thread. Maybe using the announce mailing list will help
> to bring a wider audience to these discussions.
> Please note that
> 	- the announce@ ML is moderated to keep a low traffic,
> 	- every announce email is forwarded to dev@ ML.
> In case you want to reply to this email, please use dev@dpdk.org address.
> 
> There were some debates about software statistics disabling.
> Should they be always on or possibly disabled when compiled?
> We need to take a decision shortly and discuss (or agree) this proposal:
> 	http://dpdk.org/ml/archives/dev/2015-June/019461.html
> 
> During the development of the release 2.0, there was an agreement to keep
> ABI compatibility or to bring new ABI while keeping old one during one release.
> In case it's not possible to have this transition, the (exceptional) break
> should be acknowledged by several developers.
> 	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
> There were some interesting discussions but not a lot of participants:
> 	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=8461
> 
> During the current development cycle for the release 2.1, the ABI question
> arises many times in different threads.
> To add the hash key size field, it is proposed to use a struct padding gap:
> 	http://dpdk.org/ml/archives/dev/2015-June/019386.html
> To support the flow director for VF, there is no proposal yet:
> 	http://dpdk.org/ml/archives/dev/2015-June/019343.html
> To add the speed capability, it is proposed to break ABI in the release 2.2:
> 	http://dpdk.org/ml/archives/dev/2015-June/019225.html
> To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
> 	http://dpdk.org/ml/archives/dev/2015-June/019443.html
> To add the interrupt mode, it is proposed to add a build-time option
> CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking binary:
> 	http://dpdk.org/ml/archives/dev/2015-June/018947.html
> To add the packet type, there is a proposal to add a build-time option
> CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
> 	http://dpdk.org/ml/archives/dev/2015-June/019172.html
> We must also better document how to remove a deprecated ABI:
> 	http://dpdk.org/ml/archives/dev/2015-June/019465.html
> The ABI compatibility is a new constraint and we need to better understand
> what it means and how to proceed. Even the macros are not yet well documented:
> 	http://dpdk.org/ml/archives/dev/2015-June/019357.html
> 
> Thanks for your attention and your participation in these important choices.
> 

Thomas-
	Just to re-iterate what you said earlier, and what was discussed in the
previous ABI discussions

1) ABI stability was introduced to promote DPDK's ability to be included with
various Linux and BSD distributions.  Distributions, by and large, favor
building libraries as DSOs, preferring security and updatability over all-out
performance.

2) The desire was to put DPDK developers in a mindset whereby ABI stability was
something they needed to think about during development, as the DPDK exposes
many data structures and instances that cannot be changed without breaking ABI

3) The versioning mechanism was introduced to allow for backward compatibility
during periods in which we needed to support both an old and a new ABI

4) As Stephen and others point out, it's not expected that we will always be able
to maintain ABI, and as such an easy library versioning mechanism was introduced
to prevent the loading of an incompatible library with an older application

5) The ABI policy was introduced to create a method by which new ABI facets
could be scheduled while allowing distros to prepare their downstream users for
the upcoming changes.


It seems to me, looking back over these last few months, that we're falling down
a bit on our use of (3).  I've seen several people take advantage of the ABI
scheduled updates, but no one has tried the versioning interface, and as a
result patches are getting delayed, which was never my intent.  Not sure what's
to be done about that, but we should probably address it.  Is use of the
versioning interface just too hard or convoluted?

Neil

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
  2015-06-17  4:36  9% ` Matthew Hall
@ 2015-06-17  9:54  8% ` Morten Brørup
  2015-06-18 13:00  4%   ` Dumitrescu, Cristian
  2015-06-17 10:35  9% ` Neil Horman
  2015-06-18 16:55  8% ` O'Driscoll, Tim
  3 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2015-06-17  9:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Dear Thomas,

I don't have time to follow the DPDK Developers mailing list, but since you call for feedback, I would like to share my thoughts regarding these design choices.


Regarding the statistics discussion:

1. The suggested solution assumes that, when statistics are disabled, the cost of allocating and maintaining zero-value statistics is negligible. If statistics counters are only available through accessor functions, this is probably true.

However, if statistics counters are directly accessible, e.g. as elements in the fast path data structures of a library, maintaining zero-value statistics may have a memory and/or performance impact.

Since the compile time flag CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT already tells the application if the statistics are present or not, the application should simply use this flag to determine if statistics are accessible or not.

2. The suggested solution with only one single flag per library prevents implementing statistics with varying granularity for different purposes. E.g. a library could have one set of statistics counters for ordinary SNMP purposes, and another set of statistics counters for debugging/optimization purposes.

Multiple flags per library should be possible. A hierarchy of flags per library is probably not required.


Regarding the PHY speed ABI:

1. The Ethernet PHY ABI for speed, duplex, etc. should be common throughout the entire DPDK. It might be confusing if some structures/functions use a bitmask to indicate PHY speed/duplex/personality/etc. and other structures/functions use a combination of an unsigned integer, duplex flag, personality enumeration etc. (By personality enumeration, I am referring to PHYs with multiple electrical interfaces. E.g. a dual personality PHY might have both an RJ45 copper interface and an SFP module interface, of which only one can be active at any time.)

2. The auto-negotiation standard allows the PHY to announce any subset of its capabilities to its link partner. E.g. a standard 10/100/1000 Ethernet PHY (which can handle both 10 and 100 Mbit/s in both half and full duplex and 1 Gbit/s full duplex) can be configured to announce only 10 Mbit/s half duplex and 100 Mbit/s full duplex capabilities to its link partner. (Of course, more useful combinations are normally announced, but the purpose of the example is to show that any combination is possible.)

The ABI for auto-negotiation should include options to select the list of capabilities to announce to the link partner. The Linux PHY ABI only allows forcing a selected speed and duplex (thereby disabling auto-negotiation) or enabling auto-negotiation (thereby announcing all possible speeds and duplex combinations the PHY is capable of). Don't make the same mistake in DPDK.

PS: While working for Vitesse Semiconductors (an Ethernet chip company) a long time ago, I actually wrote the API for their line of Ethernet PHYs. So I have hands-on experience in this area.


Regarding the discussion about backwards/forwards compatibility in the ABI:

1. Sometimes, ABI breakage is required. That is the cost the users pay for getting the benefits from upgrading to the latest and greatest version of any library. The current solution of requiring acknowledgement from several qualified developers is fine - these developers will consider the cost/benefit on behalf of all the DPDK users and make a qualified decision.

2. It is my general experience that documentation is not always updated to reflect the fine details of the source code, and this also applies to release notes. For open source software, the primary point of documentation is usually the source code itself.

2a. It should be clearly visible directly in the DPDK source code (including makefiles etc.) which ABI (i.e. functions, macros, type definitions etc.) is the current, the deprecated, and the future.

2b. When a developer migrates a project using DPDK from a previous version of the DPDK, it should be easy for the developer to identify all DPDK ABI modifications and variants, e.g. by using a common indicator in the DPDK source code, such as LIBAPIVER, that developer can simply search for.

3. Adding special feature flags, e.g. CONFIG_RTE_EAL_RX_INTR, to indicate a breakage of the ABI, should only be done if it is the intention to keep both the current and the new variants of the feature in the DPDK in the future. Otherwise, such a flag should be combined with the standard ABI version indication, so it is clear that this feature belongs to certain versions (i.e. deprecated, current or future).


Med venlig hilsen / kind regards

Morten Brørup
CTO



SmartShare Systems A/S
Tonsbakken 16-18
DK-2740 Skovlunde
Denmark

Office      +45 70 20 00 93
Direct      +45 89 93 50 22
Mobile      +45 25 40 82 12

mb@smartsharesystems.com
www.smartsharesystems.com
-----Original Message-----
From: announce [mailto:announce-bounces@dpdk.org] On Behalf Of Thomas Monjalon
Sent: 17. juni 2015 01:30
To: announce@dpdk.org
Subject: [dpdk-announce] important design choices - statistics - ABI

Hi all,

Sometimes there are some important discussions about architecture or design which require opinions from several developers. Unfortunately, we cannot read every thread. Maybe using the announce mailing list will help to bring a wider audience to these discussions.
Please note that
	- the announce@ ML is moderated to keep a low traffic,
	- every announce email is forwarded to dev@ ML.
In case you want to reply to this email, please use dev@dpdk.org address.

There were some debates about software statistics disabling.
Should they be always on or possibly disabled when compiled?
We need to take a decision shortly and discuss (or agree) this proposal:
	http://dpdk.org/ml/archives/dev/2015-June/019461.html

During the development of the release 2.0, there was an agreement to keep ABI compatibility or to bring a new ABI while keeping the old one during one release.
In case it's not possible to have this transition, the (exceptional) break should be acknowledged by several developers.
	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
There were some interesting discussions but not a lot of participants:
	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=8461

During the current development cycle for the release 2.1, the ABI question arises many times in different threads.
To add the hash key size field, it is proposed to use a struct padding gap:
	http://dpdk.org/ml/archives/dev/2015-June/019386.html
To support the flow director for VF, there is no proposal yet:
	http://dpdk.org/ml/archives/dev/2015-June/019343.html
To add the speed capability, it is proposed to break ABI in the release 2.2:
	http://dpdk.org/ml/archives/dev/2015-June/019225.html
To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
	http://dpdk.org/ml/archives/dev/2015-June/019443.html
To add the interrupt mode, it is proposed to add a build-time option CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking binary:
	http://dpdk.org/ml/archives/dev/2015-June/018947.html
To add the packet type, there is a proposal to add a build-time option CONFIG_RTE_NEXT_ABI common to every ABI-breaking feature:
	http://dpdk.org/ml/archives/dev/2015-June/019172.html
We must also better document how to remove a deprecated ABI:
	http://dpdk.org/ml/archives/dev/2015-June/019465.html
The ABI compatibility is a new constraint and we need to better understand what it means and how to proceed. Even the macros are not yet well documented:
	http://dpdk.org/ml/archives/dev/2015-June/019357.html

Thanks for your attention and your participation in these important choices.

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  5:28  8%   ` Stephen Hemminger
@ 2015-06-17  8:23  7%     ` Thomas Monjalon
  2015-06-17  8:23  9%     ` Marc Sune
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-17  8:23 UTC (permalink / raw)
  To: Stephen Hemminger, Matthew Hall; +Cc: dev

2015-06-16 22:28, Stephen Hemminger:
> On Tue, Jun 16, 2015 at 9:36 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
> > On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > > There were some debates about software statistics disabling.
> > > Should they be always on or possibly disabled when compiled?
> > > We need to take a decision shortly and discuss (or agree) this proposal:
> > >       http://dpdk.org/ml/archives/dev/2015-June/019461.html
> >
> > This goes against the idea I have seen before that we should be moving
> > toward a distro-friendly approach where one copy of DPDK can be used by
> > multiple apps without having to rebuild it. It seems like it is also a
> > bit ABI hostile according to the below goals / discussions.
> >
> > Jemalloc is also very high-performance code and still manages to allow
> > enabling and disabling statistics at runtime. Are we sure it's impossible
> > for DPDK or just theorizing?

Please Matthew, it is better to comment in the thread dedicated to statistics.

[...]
> > Personally to me it seems more important to preserve the ABI on patch
> > releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?

The goal of the ABI deprecation process was to provide a smooth integration
of the release 2.X+1.0. There are 4 months between releases 2.X.0 and 2.X+1.0.

[...]
> > However new-style libraries such as libcurl usually just have init
> > functions which initialize all the secret structs based on some defaults
> > and some user parameters and hide the actual structs from the user.
> > If you want to adjust some you call an adjuster function that modifies
> > the actual secret struct contents, with some enum saying what field to
> > adjust, and the new value you want it to have.
> >
> > If you want to keep a stable ABI for a non-stable library like DPDK,
> > there's a good chance you must begin hiding all these weird device
> > specific structs all over the DPDK from the user needing to directly
> > allocate and modify them.
> > Otherwise the ABI breaks everytime you have to add adjustments,
> > extensions, modifications to all these obscure special features.
> 
> > The DPDK makes extensive use of inline functions, which prevents the data hiding
> > necessary for ABI stability. This is a fundamental tradeoff, and since the
> whole reason for DPDK is performance; the ABI is going to be a moving target.
> 
> It would make more sense to provide a higher level API which was abstracted,
> slower, but stable for applications. But in doing so it would mean giving
> up things like inline lockless rings. Just don't go as far as the Open (not)
> > dataplane API,
> which is just an excuse for closed source.

I don't understand what you mean.

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  5:28  8%   ` Stephen Hemminger
  2015-06-17  8:23  7%     ` Thomas Monjalon
@ 2015-06-17  8:23  9%     ` Marc Sune
  1 sibling, 0 replies; 200+ results
From: Marc Sune @ 2015-06-17  8:23 UTC (permalink / raw)
  To: dev



On 17/06/15 07:28, Stephen Hemminger wrote:
> On Tue, Jun 16, 2015 at 9:36 PM, Matthew Hall <mhall@mhcomputing.net> wrote:
>
>> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
>>> There were some debates about software statistics disabling.
>>> Should they be always on or possibly disabled when compiled?
>>> We need to take a decision shortly and discuss (or agree) this proposal:
>>>        http://dpdk.org/ml/archives/dev/2015-June/019461.html
>> This goes against the idea I have seen before that we should be moving
>> toward
>> a distro-friendly approach where one copy of DPDK can be used by multiple
>> apps
>> without having to rebuild it. It seems like it is also a bit ABI hostile
>> according to the below goals / discussions.
>>
>> Jemalloc is also very high-performance code and still manages to allow
>> enabling and disabling statistics at runtime. Are we sure it's impossible
>> for
>> DPDK or just theorizing?
>>
>>> During the development of the release 2.0, there was an agreement to keep
>>> ABI compatibility or to bring new ABI while keeping old one during one
>> release.
>>> In case it's not possible to have this transition, the (exceptional)
>> break
>>> should be acknowledged by several developers.
>> Personally to me it seems more important to preserve the ABI on patch
>> releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?
>>
>>> During the current development cycle for the release 2.1, the ABI
>> question
>>> arises many times in different threads.
>> Most but not all of these examples point to a different issue which
>> sometimes
>> happens in libraries... often seen as "old-style" versus "new-style" C
>> library
>> interface. For example, in old-style like libpcap there are a lot of
>> structs,
>> both opaque and non-opaque, which the caller must allocate in order to run
>> libpcap.
>>
>> However new-style libraries such as libcurl usually just have init
>> functions
>> which initialize all the secret structs based on some defaults and some
>> user
>> parameters and hide the actual structs from the user. If you want to adjust
>> some you call an adjuster function that modifies the actual secret struct
>> contents, with some enum saying what field to adjust, and the new value you
>> want it to have.
>>
>> If you want to keep a stable ABI for a non-stable library like DPDK,
>> there's a
>> good chance you must begin hiding all these weird device specific structs
>> all
>> over the DPDK from the user needing to directly allocate and modify them.
>> Otherwise the ABI breaks everytime you have to add adjustments, extensions,
>> modifications to all these obscure special features.
>>
>> Matthew.
>>
> The DPDK makes extensive use of inline functions which prevents data hiding
> necessary for ABI stablility. This a fundamental tradeoff, and since the
> whole
> reason for DPDK is performance; the ABI is going to be a moving target.
>
> It would make more sense to provide a higher level API which was abstracted,
> slower, but stable for applications. But in doing so it would mean giving
> up things
> like inline lockless rings. Just don't go as far as the Open (not)
> dataplane API;
> which is just an excuse for closed source.
+1 to what Stephen says. I don't think, though, that a higher level API 
is really worth it.

I unfortunately could not participate in the ABI discussion (there are 
times when you cannot follow each and every thread on the mailing list, 
and this one got lost in my inbox, my bad).

To me, ABI compatibility would be a must if we had long term support 
releases (1.8.0, 1.8.1...), which we don't (I have proposed having them 
a couple of times).

On the other hand, if the argument for ABI compatibility is shared 
libraries and making DPDK easy to package for distros, I don't see 
anything better than using the MAJOR.MINOR version to identify the ABI 
(1.8, 1.9). Having LTS support would also help distros stay on an ABI 
compatible version of DPDK while getting recent bug fixes, and would 
make ABI changes (MAJOR.MINOR) easier to identify.

Moreover, I doubt that many people use shared libraries in DPDK. DPDK is 
built for performance and has a lot of code inlining, which means your 
binary has to be recompiled anyway, as Stephen says.

DPDK is still growing, and the strict ABI policy is forcing patches to 
be delayed or to add hacky work-arounds that avoid breaking the ABI 
policy (see "[dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in 
rte_mbuf" as an example), work-arounds that are announced to be removed 
afterwards. I see the ABI policy as a blocker that induces botches all 
over the code, more than anything else.

Marc

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH 3/3] acl: mark deprecated functions
  @ 2015-06-17  7:59  4%   ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-17  7:59 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger

On 06/15/2015 07:51 PM, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> To allow for compatibility with later releases, any functions
> to be removed should be marked as deprecated for one release.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
[...]
> diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
> index 3a93730..0c32df0 100644
> --- a/lib/librte_acl/rte_acl.h
> +++ b/lib/librte_acl/rte_acl.h
> @@ -456,7 +456,7 @@ enum {
>   int
>   rte_acl_ipv4vlan_add_rules(struct rte_acl_ctx *ctx,
>   	const struct rte_acl_ipv4vlan_rule *rules,
> -	uint32_t num);
> +	uint32_t num) __attribute__((deprecated));
>
>   /**
>    * Analyze set of ipv4vlan rules and build required internal
> @@ -478,7 +478,7 @@ rte_acl_ipv4vlan_add_rules(struct rte_acl_ctx *ctx,
>   int
>   rte_acl_ipv4vlan_build(struct rte_acl_ctx *ctx,
>   	const uint32_t layout[RTE_ACL_IPV4VLAN_NUM],
> -	uint32_t num_categories);
> +	uint32_t num_categories) __attribute__((deprecated));
>
>
>   #ifdef __cplusplus
>

I've no objections to the patch as such, but I think the ABI policy is 
asking all deprecation notices to be added to the "Deprecation Notices" 
section in the policy document itself.

That said, the average developer is MUCH likelier to notice the compiler 
warning from these than a deprecation notice in some "who reads those" 
document :) Perhaps we could generate a list of functions marked for 
removal for the ABI policy document automatically based on the 
deprecated attributes.

	- Panu -

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  2015-06-17  0:39  0%         ` Stephen Hemminger
@ 2015-06-17  7:29  0%           ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-17  7:29 UTC (permalink / raw)
  To: Stephen Hemminger, Thomas Monjalon; +Cc: dev, Stephen Hemminger

On 06/17/2015 03:39 AM, Stephen Hemminger wrote:
> On Tue, 16 Jun 2015 23:37:32 +0000
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
>> 2015-06-16 16:05, Stephen Hemminger:
>>> On Tue, 16 Jun 2015 14:52:16 +0100
>>> Bruce Richardson <bruce.richardson@intel.com> wrote:
>>>
>>>> On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
>>>>> From: Stephen Hemminger <shemming@brocade.com>
>>>>>
>>>>> These were deprecated in 2.0 so remove them from 2.1
>>>>>
>>>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>> ---
>>>>>   drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
>>>>>   drivers/net/ring/rte_eth_ring_version.map |  4 +--
>>>>>   2 files changed, 1 insertion(+), 58 deletions(-)
>>>>>
>>>> [..snip..]
>>>>> diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
>>>>> index 8ad107d..5ee55d9 100644
>>>>> --- a/drivers/net/ring/rte_eth_ring_version.map
>>>>> +++ b/drivers/net/ring/rte_eth_ring_version.map
>>>>> @@ -1,9 +1,7 @@
>>>>> -DPDK_2.0 {
>>>>> +DPDK_2.1 {
>>>>>   	global:
>>>>>
>>>>>   	rte_eth_from_rings;
>>>>> -	rte_eth_ring_pair_attach;
>>>>> -	rte_eth_ring_pair_create;
>>>>>
>>>>>   	local: *;
>>>>>   };
>>>>
>>>> [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0
>>>> version listings in the .map file?
>>>
>>> Notice the version # changed as well, so linker will generate a new version.
>>> The function was marked deprecated in last version.
>>
>> What happens if you load the 2.1 lib with an app built for 2.0?
>> Shouldn't we keep the DPDK_2.0 block?
>
> What happens is that build process makes a new version of DPDK package
> with a new version number. This version can co-exist on same system with
> old library (depends on library packaging).
> Old library will have old functions, and old application will
> use old library. New applications will be have new so version and get the
> new library.
>
>    http://unix.stackexchange.com/questions/475/how-do-so-shared-object-numbers-work
>
> If we didn't do this, nothing could ever really be removed!

Yes, soname bump is required when symbols are removed.

However that doesn't change the version in which the remaining symbols 
were introduced, e.g. rte_eth_from_rings() in this case, so AIUI you 
should leave the DPDK_2.0 {} block version alone. If new symbols get 
added in 2.1, then a new DPDK_2.1 block needs to be added for those.
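As a sketch of what this would look like in the ring PMD's version script (the DPDK_2.1 symbol name here is invented for illustration), the existing DPDK_2.0 block stays as-is and a new block inherits from it:

```text
DPDK_2.0 {
	global:

	rte_eth_from_rings;

	local: *;
};

DPDK_2.1 {
	global:

	rte_eth_ring_some_new_function;
} DPDK_2.0;
```

With this layout, removed symbols disappear only together with a soname bump, while surviving symbols keep the version node they were introduced under.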

	- Panu -

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: announce ABI changes planned for unified packet type
@ 2015-06-17  5:48 20% Helin Zhang
  0 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-17  5:48 UTC (permalink / raw)
  To: dev

Significant ABI changes are planned for the unified packet type, which
will be supported from release 2.2. This announces those ABI changes
in detail.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..6bbb6ed 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* Significant ABI changes are planned for struct rte_mbuf, and several PKT_RX_ flags will be removed, to support unified packet type from release 2.2. The upcoming release 2.1 will not have those changes. There is no backward compatibility planned from release 2.2. All binaries will need to be rebuilt from release 2.2.
-- 
1.9.3

^ permalink raw reply	[relevance 20%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-17  4:36  9% ` Matthew Hall
@ 2015-06-17  5:28  8%   ` Stephen Hemminger
  2015-06-17  8:23  7%     ` Thomas Monjalon
  2015-06-17  8:23  9%     ` Marc Sune
  2015-06-17 11:17  4%   ` Bruce Richardson
  2015-06-18 13:25  8%   ` Dumitrescu, Cristian
  2 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2015-06-17  5:28 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev

On Tue, Jun 16, 2015 at 9:36 PM, Matthew Hall <mhall@mhcomputing.net> wrote:

> On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> > There were some debates about software statistics disabling.
> > Should they be always on or possibly disabled when compiled?
> > We need to take a decision shortly and discuss (or agree) this proposal:
> >       http://dpdk.org/ml/archives/dev/2015-June/019461.html
>
> This goes against the idea I have seen before that we should be moving
> toward
> a distro-friendly approach where one copy of DPDK can be used by multiple
> apps
> without having to rebuild it. It seems like it is also a bit ABI hostile
> according to the below goals / discussions.
>
> Jemalloc is also very high-performance code and still manages to allow
> enabling and disabling statistics at runtime. Are we sure it's impossible
> for
> DPDK or just theorizing?
>
> > During the development of the release 2.0, there was an agreement to keep
> > ABI compatibility or to bring new ABI while keeping old one during one
> release.
> > In case it's not possible to have this transition, the (exceptional)
> break
> > should be acknowledged by several developers.
>
> Personally to me it seems more important to preserve the ABI on patch
> releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?
>
> > During the current development cycle for the release 2.1, the ABI
> question
> > arises many times in different threads.
>
> Most but not all of these examples point to a different issue which
> sometimes
> happens in libraries... often seen as "old-style" versus "new-style" C
> library
> interface. For example, in old-style like libpcap there are a lot of
> structs,
> both opaque and non-opaque, which the caller must allocate in order to run
> libpcap.
>
> However new-style libraries such as libcurl usually just have init
> functions
> which initialize all the secret structs based on some defaults and some
> user
> parameters and hide the actual structs from the user. If you want to adjust
> some you call an adjuster function that modifies the actual secret struct
> contents, with some enum saying what field to adjust, and the new value you
> want it to have.
>
> If you want to keep a stable ABI for a non-stable library like DPDK,
> there's a
> good chance you must begin hiding all these weird device specific structs
> all
> over the DPDK from the user needing to directly allocate and modify them.
> Otherwise the ABI breaks everytime you have to add adjustments, extensions,
> modifications to all these obscure special features.
>
> Matthew.
>

The DPDK makes extensive use of inline functions, which prevents the data
hiding necessary for ABI stability. This is a fundamental tradeoff, and since
the whole reason for DPDK is performance, the ABI is going to be a moving
target.

It would make more sense to provide a higher level API which was abstracted,
slower, but stable for applications. But doing so would mean giving up
things like inline lockless rings. Just don't go as far as the Open (not)
dataplane API, which is just an excuse for closed source.

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
  2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
@ 2015-06-17  4:36  9% ` Matthew Hall
  2015-06-17  5:28  8%   ` Stephen Hemminger
                     ` (2 more replies)
  2015-06-17  9:54  8% ` Morten Brørup
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 200+ results
From: Matthew Hall @ 2015-06-17  4:36 UTC (permalink / raw)
  To: dev

On Wed, Jun 17, 2015 at 01:29:47AM +0200, Thomas Monjalon wrote:
> There were some debates about software statistics disabling.
> Should they be always on or possibly disabled when compiled?
> We need to take a decision shortly and discuss (or agree) this proposal:
> 	http://dpdk.org/ml/archives/dev/2015-June/019461.html

This goes against the idea I have seen before that we should be moving toward 
a distro-friendly approach where one copy of DPDK can be used by multiple apps 
without having to rebuild it. It seems like it is also a bit ABI hostile 
according to the below goals / discussions.

Jemalloc is also very high-performance code and still manages to allow 
enabling and disabling statistics at runtime. Are we sure it's impossible for 
DPDK or just theorizing?

> During the development of the release 2.0, there was an agreement to keep
> ABI compatibility or to bring new ABI while keeping old one during one release.
> In case it's not possible to have this transition, the (exceptional) break
> should be acknowledged by several developers.

Personally to me it seems more important to preserve the ABI on patch 
releases, like 2.X.Y going to 2.X.Z. But maybe I missed something?

> During the current development cycle for the release 2.1, the ABI question
> arises many times in different threads.

Most but not all of these examples point to a different issue which sometimes 
happens in libraries... often seen as "old-style" versus "new-style" C library 
interface. For example, in old-style like libpcap there are a lot of structs, 
both opaque and non-opaque, which the caller must allocate in order to run libpcap. 

However new-style libraries such as libcurl usually just have init functions 
which initialize all the secret structs based on some defaults and some user 
parameters and hide the actual structs from the user. If you want to adjust 
some you call an adjuster function that modifies the actual secret struct 
contents, with some enum saying what field to adjust, and the new value you 
want it to have.

If you want to keep a stable ABI for a non-stable library like DPDK, there's a 
good chance you must begin hiding all these weird device-specific structs all 
over DPDK from the user needing to directly allocate and modify them. 
Otherwise the ABI breaks every time you have to add adjustments, extensions, 
or modifications to all these obscure special features.

Matthew.

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] abi: announce abi changes plan for struct rte_eth_fdir_flow_ext
@ 2015-06-17  3:36 23% Jingjing Wu
  0 siblings, 0 replies; 200+ results
From: Jingjing Wu @ 2015-06-17  3:36 UTC (permalink / raw)
  To: dev

This announces the planned ABI change to support flow director filtering in VF in release 2.2.

Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..b8d7cc8 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The ABI changes are planned for struct rte_eth_fdir_flow_ext in order to support flow director filtering in VF. The upcoming release 2.1 will not contain these ABI changes, but release 2.2 will, and no backwards compatibility is planned due to this change. Binaries using this library built prior to version 2.2 will require updating and recompilation.
-- 
1.9.3

^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
       [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
@ 2015-06-17  0:39  0%         ` Stephen Hemminger
  2015-06-17  7:29  0%           ` Panu Matilainen
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-17  0:39 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Stephen Hemminger

On Tue, 16 Jun 2015 23:37:32 +0000
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2015-06-16 16:05, Stephen Hemminger:
> > On Tue, 16 Jun 2015 14:52:16 +0100
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > 
> > > On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> > > > From: Stephen Hemminger <shemming@brocade.com>
> > > > 
> > > > These were deprecated in 2.0 so remove them from 2.1
> > > > 
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > ---
> > > >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> > > >  drivers/net/ring/rte_eth_ring_version.map |  4 +--
> > > >  2 files changed, 1 insertion(+), 58 deletions(-)
> > > >   
> > > [..snip..]
> > > > diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> > > > index 8ad107d..5ee55d9 100644
> > > > --- a/drivers/net/ring/rte_eth_ring_version.map
> > > > +++ b/drivers/net/ring/rte_eth_ring_version.map
> > > > @@ -1,9 +1,7 @@
> > > > -DPDK_2.0 {
> > > > +DPDK_2.1 {
> > > >  	global:
> > > >  
> > > >  	rte_eth_from_rings;
> > > > -	rte_eth_ring_pair_attach;
> > > > -	rte_eth_ring_pair_create;
> > > >  
> > > >  	local: *;
> > > >  };  
> > > 
> > > [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
> > > version listings in the .map file?
> > 
> > Notice the version # changed as well, so linker will generate a new version.
> > The function was marked deprecated in last version.
> 
> What happens if you load the 2.1 lib with an app built for 2.0?
> Shouldn't we keep the DPDK_2.0 block?

What happens is that the build process makes a new version of the DPDK
package with a new version number. This version can co-exist on the same
system with the old library (depending on library packaging).
The old library will have the old functions, and old applications will
use the old library. New applications will have the new so version and
get the new library.

  http://unix.stackexchange.com/questions/475/how-do-so-shared-object-numbers-work

If we didn't do this, nothing could ever really be removed!

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  2015-06-16 23:05  0%     ` Stephen Hemminger
@ 2015-06-16 23:37  0%       ` Thomas Monjalon
       [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-16 23:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

2015-06-16 16:05, Stephen Hemminger:
> On Tue, 16 Jun 2015 14:52:16 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> > > From: Stephen Hemminger <shemming@brocade.com>
> > > 
> > > These were deprecated in 2.0 so remove them from 2.1
> > > 
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> > >  drivers/net/ring/rte_eth_ring_version.map |  4 +--
> > >  2 files changed, 1 insertion(+), 58 deletions(-)
> > >   
> > [..snip..]
> > > diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> > > index 8ad107d..5ee55d9 100644
> > > --- a/drivers/net/ring/rte_eth_ring_version.map
> > > +++ b/drivers/net/ring/rte_eth_ring_version.map
> > > @@ -1,9 +1,7 @@
> > > -DPDK_2.0 {
> > > +DPDK_2.1 {
> > >  	global:
> > >  
> > >  	rte_eth_from_rings;
> > > -	rte_eth_ring_pair_attach;
> > > -	rte_eth_ring_pair_create;
> > >  
> > >  	local: *;
> > >  };  
> > 
> > [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
> > version listings in the .map file?
> 
> Notice the version # changed as well, so linker will generate a new version.
> The function was marked deprecated in last version.

What happens if you load the 2.1 lib with an app built for 2.0?
Shouldn't we keep the DPDK_2.0 block?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI
@ 2015-06-16 23:29  9% Thomas Monjalon
  2015-06-17  4:36  9% ` Matthew Hall
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Thomas Monjalon @ 2015-06-16 23:29 UTC (permalink / raw)
  To: announce

Hi all,

Sometimes there are important discussions about architecture or design
which require opinions from several developers. Unfortunately, we cannot
read every thread. Maybe using the announce mailing list will help
to bring a larger audience to these discussions.
Please note that
	- the announce@ ML is moderated to keep a low traffic,
	- every announce email is forwarded to dev@ ML.
In case you want to reply to this email, please use dev@dpdk.org address.

There were some debates about software statistics disabling.
Should they be always on or possibly disabled when compiled?
We need to take a decision shortly and discuss (or agree) this proposal:
	http://dpdk.org/ml/archives/dev/2015-June/019461.html

During the development of the release 2.0, there was an agreement to keep
ABI compatibility or to bring new ABI while keeping old one during one release.
In case it's not possible to have this transition, the (exceptional) break
should be acknowledged by several developers.
	http://dpdk.org/doc/guides-2.0/rel_notes/abi.html
There were some interesting discussions but not a lot of participants:
	http://thread.gmane.org/gmane.comp.networking.dpdk.devel/8367/focus=8461

During the current development cycle for the release 2.1, the ABI question
arises many times in different threads.
To add the hash key size field, it is proposed to use a struct padding gap:
	http://dpdk.org/ml/archives/dev/2015-June/019386.html
To support the flow director for VF, there is no proposal yet:
	http://dpdk.org/ml/archives/dev/2015-June/019343.html
To add the speed capability, it is proposed to break ABI in the release 2.2:
	http://dpdk.org/ml/archives/dev/2015-June/019225.html
To support vhost-user multiqueues, it is proposed to break ABI in 2.2:
	http://dpdk.org/ml/archives/dev/2015-June/019443.html
To add the interrupt mode, it is proposed to add a build-time option
CONFIG_RTE_EAL_RX_INTR to switch between compatible and ABI breaking binary:
	http://dpdk.org/ml/archives/dev/2015-June/018947.html
To add the packet type, there is a proposal to add a build-time option
CONFIG_RTE_NEXT_ABI common to every ABI breaking features:
	http://dpdk.org/ml/archives/dev/2015-June/019172.html
We must also better document how to remove a deprecated ABI:
	http://dpdk.org/ml/archives/dev/2015-June/019465.html
The ABI compatibility is a new constraint and we need to better understand
what it means and how to proceed. Even the macros are not yet well documented:
	http://dpdk.org/ml/archives/dev/2015-June/019357.html

Thanks for your attention and your participation in these important choices.

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  2015-06-16 13:52  3%   ` Bruce Richardson
@ 2015-06-16 23:05  0%     ` Stephen Hemminger
  2015-06-16 23:37  0%       ` Thomas Monjalon
       [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2015-06-16 23:05 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Stephen Hemminger

On Tue, 16 Jun 2015 14:52:16 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> > From: Stephen Hemminger <shemming@brocade.com>
> > 
> > These were deprecated in 2.0 so remove them from 2.1
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> >  drivers/net/ring/rte_eth_ring_version.map |  4 +--
> >  2 files changed, 1 insertion(+), 58 deletions(-)
> >   
> [..snip..]
> > diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> > index 8ad107d..5ee55d9 100644
> > --- a/drivers/net/ring/rte_eth_ring_version.map
> > +++ b/drivers/net/ring/rte_eth_ring_version.map
> > @@ -1,9 +1,7 @@
> > -DPDK_2.0 {
> > +DPDK_2.1 {
> >  	global:
> >  
> >  	rte_eth_from_rings;
> > -	rte_eth_ring_pair_attach;
> > -	rte_eth_ring_pair_create;
> >  
> >  	local: *;
> >  };  
> 
> [ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
> version listings in the .map file?

Notice the version number changed as well, so the linker will generate a new version.
The function was marked deprecated in the last version.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions
  @ 2015-06-16 13:52  3%   ` Bruce Richardson
  2015-06-16 23:05  0%     ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-06-16 13:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

On Mon, Jun 15, 2015 at 09:51:11AM -0700, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
> 
> These were deprecated in 2.0 so remove them from 2.1
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
>  drivers/net/ring/rte_eth_ring_version.map |  4 +--
>  2 files changed, 1 insertion(+), 58 deletions(-)
> 
[..snip..]
> diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> index 8ad107d..5ee55d9 100644
> --- a/drivers/net/ring/rte_eth_ring_version.map
> +++ b/drivers/net/ring/rte_eth_ring_version.map
> @@ -1,9 +1,7 @@
> -DPDK_2.0 {
> +DPDK_2.1 {
>  	global:
>  
>  	rte_eth_from_rings;
> -	rte_eth_ring_pair_attach;
> -	rte_eth_ring_pair_create;
>  
>  	local: *;
>  };

[ABI newbie question] Is this how deprecating a fn is done? We no longer have any DPDK_2.0 
version listings in the .map file?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4] doc: guidelines for library statistics
@ 2015-06-16 13:35  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-16 13:35 UTC (permalink / raw)
  To: dev

v4 changes
-more fixes for bullets

v3 changes
-fixed bullets for correct doc generation

v2 changes
-small text changes
-reordered sections to have guidelines at the top and motivation at the end
-broke lines at 80 characters

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by the current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+* For Application A, counters maintained by Library X are always relevant and
+  the application needs to use them to implement certain features, such as traffic
+  accounting, logging, application-level statistics, etc. In this case,
+  the application requires that collection of statistics counters for Library X is
+  always turned on.
+
+* For Application B, counters maintained by Library X are only useful during the
+  application debug stage and are not relevant once debug phase is over. In this
+  case, the application may decide to turn on the collection of Library X
+  statistics counters during the debug phase and at a later stage turn them off.
+
+* For Application C, counters maintained by Library X are not relevant at all.
+  It might be that the application maintains its own set of statistics counters
+  that monitor a different set of run-time events (e.g. number of connection
+  requests, number of active users, etc). It might also be that the application
+  uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+  statistics counters of Library Y, but not in those of Library X. In this case,
+  the application may decide to turn the collection of statistics counters off for
+  Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+* Number of libraries used by the current application that have statistics
+  counters collection turned on.
+
+* Number of statistics counters maintained by each library per object type
+  instance (e.g. per port, table, pipeline, thread, etc).
+
+* Number of instances created for each object type supported by each library.
+
+* Complexity of the statistics logic collection for each counter: when only
+  some occurrences of a specific event are valid, additional logic is typically
+  needed to decide whether the current occurrence of the event should be counted
+  or not. For example, in the event of packet reception, when only TCP packets
+  with destination port within a certain range should be recorded, conditional
+  branches are usually required. When processing a burst of packets that have been
+  validated for header integrity, counting the number of bits set in a bitmask
+  might be needed.
-- 
1.7.4.1

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v3] doc: guidelines for library statistics
@ 2015-06-16 13:15  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-16 13:15 UTC (permalink / raw)
  To: dev

v3 changes
-fixed bullets for correct doc generation

v2 changes
-small text changes
-reordered sections to have guidelines at the top and motivation at the end
-broke lines at 80 characters

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by the current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+* For Application A, counters maintained by Library X are always relevant and
+  the application needs to use them to implement certain features, such as traffic
+  accounting, logging, application-level statistics, etc. In this case,
+  the application requires that collection of statistics counters for Library X is
+  always turned on.
+
+* For Application B, counters maintained by Library X are only useful during the
+  application debug stage and are not relevant once debug phase is over. In this
+  case, the application may decide to turn on the collection of Library X
+  statistics counters during the debug phase and at a later stage turn them off.
+
+* For Application C, counters maintained by Library X are not relevant at all.
+  It might be that the application maintains its own set of statistics counters
+  that monitor a different set of run-time events (e.g. number of connection
+  requests, number of active users, etc). It might also be that the application
+  uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+  statistics counters of Library Y, but not in those of Library X. In this case,
+  the application may decide to turn the collection of statistics counters off for
+  Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+* Number of libraries used by the current application that have statistics
+  counters collection turned on.
+
+* Number of statistics counters maintained by each library per object type
+  instance (e.g. per port, table, pipeline, thread, etc).
+
+* Number of instances created for each object type supported by each library.
+
+* Complexity of the statistics logic collection for each counter: when only
+  some occurrences of a specific event are valid, additional logic is typically
+  needed to decide whether the current occurrence of the event should be counted
+  or not. For example, in the event of packet reception, when only TCP packets
+  with destination port within a certain range should be recorded, conditional
+  branches are usually required. When processing a burst of packets that have been
+  validated for header integrity, counting the number of bits set in a bitmask
+  might be needed.
-- 
1.7.4.1


* [dpdk-dev] [PATCH v2] doc: guidelines for library statistics
@ 2015-06-16 13:14  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-16 13:14 UTC (permalink / raw)
  To: dev

v3 changes
-fixed bullets for correct doc generation

v2 changes
-small text changes
-reordered sections to have guidelines at the top and motivation at the end
-broke lines at 80 characters

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by the current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+* For Application A, counters maintained by Library X are always relevant and
+  the application needs to use them to implement certain features, such as traffic
+  accounting, logging, application-level statistics, etc. In this case,
+  the application requires that collection of statistics counters for Library X is
+  always turned on.
+
+* For Application B, counters maintained by Library X are only useful during the
+  application debug stage and are not relevant once debug phase is over. In this
+  case, the application may decide to turn on the collection of Library X
+  statistics counters during the debug phase and at a later stage turn them off.
+
+* For Application C, counters maintained by Library X are not relevant at all.
+  It might be that the application maintains its own set of statistics counters
+  that monitor a different set of run-time events (e.g. number of connection
+  requests, number of active users, etc). It might also be that the application
+  uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+  statistics counters of Library Y, but not in those of Library X. In this case,
+  the application may decide to turn the collection of statistics counters off for
+  Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+* Number of libraries used by the current application that have statistics
+  counters collection turned on.
+
+* Number of statistics counters maintained by each library per object type
+  instance (e.g. per port, table, pipeline, thread, etc).
+
+* Number of instances created for each object type supported by each library.
+
+* Complexity of the statistics logic collection for each counter: when only
+  some occurrences of a specific event are valid, additional logic is typically
+  needed to decide whether the current occurrence of the event should be counted
+  or not. For example, in the event of packet reception, when only TCP packets
+  with destination port within a certain range should be recorded, conditional
+  branches are usually required. When processing a burst of packets that have been
+  validated for header integrity, counting the number of bits set in a bitmask
+  might be needed.
-- 
1.7.4.1


* Re: [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues
  2015-06-16  1:38 23% [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues Ouyang Changchun
@ 2015-06-16 10:36  5% ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-06-16 10:36 UTC (permalink / raw)
  To: Ouyang Changchun; +Cc: dev

On Tue, Jun 16, 2015 at 09:38:43AM +0800, Ouyang Changchun wrote:
> It announces the planned ABI changes for vhost-user multiple queues feature on v2.2.
> 
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> ---
>  doc/guides/rel_notes/abi.rst | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
> index f00a6ee..dc1b0eb 100644
> --- a/doc/guides/rel_notes/abi.rst
> +++ b/doc/guides/rel_notes/abi.rst
> @@ -38,3 +38,4 @@ Examples of Deprecation Notices
>  
>  Deprecation Notices
>  -------------------
> +* The ABI changes are planned for struct virtio_net in order to support the vhost-user multiple queues feature. The upcoming release 2.1 will not contain these ABI changes, but release 2.2 will, and no backwards compatibility is planned due to the enabling of the vhost-user multiple queues feature. Binaries using this library built prior to version 2.2 will require updating and recompilation.
> -- 
> 1.8.4.2
> 
> 

Acked-by: Neil Horman <nhorman@tuxdriver.com>


* [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_payload type
    @ 2015-06-16  3:43  3% ` Jingjing Wu
  2015-06-26  2:26  0%   ` Xu, HuilongX
  2015-06-26  3:14  0%   ` Zhang, Helin
  1 sibling, 2 replies; 200+ results
From: Jingjing Wu @ 2015-06-16  3:43 UTC (permalink / raw)
  To: dev

This patch set extends flow director to support the L2_payload type in the i40e driver.
 
v2 change:
 - remove the flow director VF filtering from this patch to avoid breaking ABI.

Jingjing Wu (4):
  ethdev: add struct rte_eth_l2_flow to support l2_payload flow type
  i40e: extend flow director to support l2_payload flow type
  testpmd: extend commands
  doc: extend commands in testpmd

 app/test-pmd/cmdline.c                      | 48 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  5 ++-
 drivers/net/i40e/i40e_fdir.c                | 24 +++++++++++++--
 lib/librte_ether/rte_eth_ctrl.h             |  8 +++++
 4 files changed, 78 insertions(+), 7 deletions(-)

-- 
1.9.3


* [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues
@ 2015-06-16  1:38 23% Ouyang Changchun
  2015-06-16 10:36  5% ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ouyang Changchun @ 2015-06-16  1:38 UTC (permalink / raw)
  To: dev

It announces the planned ABI changes for vhost-user multiple queues feature on v2.2.

Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..dc1b0eb 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The ABI changes are planned for struct virtio_net in order to support the vhost-user multiple queues feature. The upcoming release 2.1 will not contain these ABI changes, but release 2.2 will, and no backwards compatibility is planned due to the enabling of the vhost-user multiple queues feature. Binaries using this library built prior to version 2.2 will require updating and recompilation.
-- 
1.8.4.2


* [dpdk-dev] [PATCH v2] doc: guidelines for library statistics
@ 2015-06-15 22:07  5% Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2015-06-15 22:07 UTC (permalink / raw)
  To: dev


Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |    1 +
 doc/guides/guidelines/statistics.rst |  104 ++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958 100644
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..bc91723
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,104 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter
+support. This includes guidelines for turning library statistics on and off and
+requirements for preventing ABI changes when implementing statistics.
+
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that maintains statistics counters should provide a single build
+time flag that decides whether the statistics counter collection is enabled or
+not. This flag should be exposed as a variable within the DPDK configuration
+file. When this flag is set, all the counters supported by the current library are
+collected for all the instances of every object type provided by the library.
+When this flag is cleared, none of the counters supported by the current library
+are collected for any instance of any object type provided by the library:
+
+.. code-block:: console
+
+	# DPDK file config/common_linuxapp, config/common_bsdapp, etc.
+	CONFIG_RTE_<LIBRARY_NAME>_STATS_COLLECT=y/n
+
+The default value for this DPDK configuration file variable (either "yes" or
+"no") is decided by each library.
+
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the
+library API should not be affected by whether the collection of statistics
+counters is turned on or off for the current library. In practical terms, this
+means that space should always be allocated in the API data structures for
+statistics counters and the statistics related API functions are always built
+into the code, regardless of whether the statistics counter collection is turned
+on or off for the current library.
+
+When the collection of statistics counters for the current library is turned
+off, the counters retrieved through the statistics related API functions should
+have a default value of zero.
+
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow
+an application to monitor the library-level run-time events. Typical counters
+are: number of packets received/dropped/transmitted, number of buffers
+allocated/freed, number of occurrences for specific events, etc.
+
+However, the resources consumed for library-level statistics counter collection
+have to be spent out of the application budget and the counters collected by
+some libraries might not be relevant to the current application. In order to
+avoid any unwanted waste of resources and/or performance impacts, the
+application should decide at build time whether the collection of library-level
+statistics counters should be turned on or off for each library individually.
+
+Library-level statistics counters can be relevant or not for specific
+applications:
+
+* For Application A, counters maintained by Library X are always relevant and
+  the application needs to use them to implement certain features, such as traffic
+  accounting, logging, application-level statistics, etc. In this case,
+  the application requires that collection of statistics counters for Library X is
+  always turned on.
+
+* For Application B, counters maintained by Library X are only useful during the
+  application debug stage and are not relevant once debug phase is over. In this
+  case, the application may decide to turn on the collection of Library X
+  statistics counters during the debug phase and at a later stage turn them off.
+
+* For Application C, counters maintained by Library X are not relevant at all.
+  It might be that the application maintains its own set of statistics counters
+  that monitor a different set of run-time events (e.g. number of connection
+  requests, number of active users, etc). It might also be that the application
+  uses multiple libraries (Library X, Library Y, etc) and it is interested in the
+  statistics counters of Library Y, but not in those of Library X. In this case,
+  the application may decide to turn the collection of statistics counters off for
+  Library X and on for Library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles,
+cache bandwidth, memory bandwidth, etc) that depends on:
+
+* Number of libraries used by the current application that have statistics
+  counters collection turned on.
+
+* Number of statistics counters maintained by each library per object type
+  instance (e.g. per port, table, pipeline, thread, etc).
+
+* Number of instances created for each object type supported by each library.
+
+* Complexity of the statistics logic collection for each counter: when only
+  some occurrences of a specific event are valid, additional logic is typically
+  needed to decide whether the current occurrence of the event should be counted
+  or not. For example, in the event of packet reception, when only TCP packets
+  with destination port within a certain range should be recorded, conditional
+  branches are usually required. When processing a burst of packets that have been
+  validated for header integrity, counting the number of bits set in a bitmask
+  might be needed.
-- 
1.7.4.1


* Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics
  2015-06-11 12:05  0% ` Thomas Monjalon
@ 2015-06-15 21:46  4%   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2015-06-15 21:46 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, June 11, 2015 1:05 PM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics
> 
> Hi Cristian,
> 
> Thanks for trying to make a policy clearer.
> We need to make a decision in the coming week.
> Below are comments on the style and content.
> 
> 2015-06-08 15:50, Cristian Dumitrescu:
> >  doc/guides/guidelines/statistics.rst | 42
> ++++++++++++++++++++++++++++++++++++
> 
> Maybe we should have a more general file like design.rst.

I am not sure I correctly understood your suggestion. Do you want the section on statistics to be called design.rst? Do you want it to be part of doc/guides/guidelines or create a brand new document?
To me, the initial idea of doc/guidelines/statistics.rst makes more sense, but I am fine either way.

> In order to have a lot of readers of such guidelines, they must be concise.

I reordered the sections to provide the guidelines first and the motivation afterwards. I think it is good to keep the motivation together with the guidelines. Do you suggest to remove the motivation from this document?

> 
> Please wrap lines to be not too long and/or split lines after the end of a
> sentence.

Done in next version.

> 
> > +Library Statistics
> > +==================
> > +
> > +Description
> > +-----------
> > +
> > +This document describes the guidelines for DPDK library-level statistics
> counter support. This includes guidelines for turning library statistics on and
> off, requirements for preventing ABI changes when library statistics are
> turned on and off, etc.
> 
> Should we consider that driver stats and lib stats are different in DPDK? Why?

I did update the document to make it clear these guidelines are applicable for the case when the stats counters are maintained by the library itself.
I think the primary difference is whether the stats counters are implemented by the HW (e.g. NIC) or in SW by the CPU.
* In the first case, the CPU does not spend cycles maintaining the counters. System resources might be consumed by the NIC for maintaining the counters (e.g. memory bandwidth), so sometimes the NICs provide mechanisms to enable/disable stats, which is done as part of init code rather than build time.
* In the second case, the CPU maintains the counters, which comes at the usual cost of cycles and system resources. These guidelines focus on this case.

> 
> > +Motivation to allow the application to turn library statistics on and off
> > +-------------------------------------------------------------------------
> > +
> > +It is highly recommended that each library provides statistics counters to
> allow the application to monitor the library-level run-time events. Typical
> counters are: number of packets received/dropped/transmitted, number of
> buffers allocated/freed, number of occurrences for specific events, etc.
> > +
> > +Since the resources consumed for library-level statistics counter collection
> have to be spent out of the application budget and the counters collected by
> some libraries might not be relevant for the current application, in order to
> avoid any unwanted waste of resources and/or performance for the
> application, the application is to decide at build time whether the collection
> of library-level statistics counters should be turned on or off for each library
> individually.
> 
> It would be good to have acknowledgements or other opinions on this.
> Some of them were expressed in other threads. Please comment here.
> 
> > +Library-level statistics counters can be relevant or not for specific
> applications:
> > +* For application A, counters maintained by library X are always relevant
> and the application needs to use them to implement certain features, as
> traffic accounting, logging, application-level statistics, etc. In this case, the
> application requires that collection of statistics counters for library X is always
> turned on;
> > +* For application B, counters maintained by library X are only useful during
> the application debug stage and not relevant once debug phase is over. In
> this case, the application may decide to turn on the collection of library X
> statistics counters during the debug phase and later on turn them off;
> 
> Users of binary package have not this choice.

There is a section in the document about allowing stats to be turned on/off without ABI impact.

> 
> > +* For application C, counters maintained by library X are not relevant at all.
> It might be that the application maintains its own set of statistics counters
> that monitor a different set of run-time events than library X (e.g. number of
> connection requests, number of active users, etc). It might also be that
> application uses multiple libraries (library X, library Y, etc) and it is interested
> in the statistics counters of library Y, but not in those of library X. In this case,
> the application may decide to turn the collection of statistics counters off for
> library X and on for library Y.
> > +
> > +The statistics collection consumes a certain amount of CPU resources
> (cycles, cache bandwidth, memory bandwidth, etc) that depends on:
> > +* Number of libraries used by the current application that have statistics
> counters collection turned on;
> > +* Number of statistics counters maintained by each library per object type
> instance (e.g. per port, table, pipeline, thread, etc);
> > +* Number of instances created for each object type supported by each
> library;
> > +* Complexity of the statistics logic collection for each counter: when only
> some occurrences of a specific event are valid, several conditional branches
> might involved in the decision of whether the current occurrence of the
> event should be counted or not (e.g. on the event of packet reception, only
> TCP packets with destination port within a certain range should be recorded),
> etc.
> > +
> > +Mechanism to allow the application to turn library statistics on and off
> > +------------------------------------------------------------------------
> > +
> > +Each library that provides statistics counters should provide a single build
> time flag that decides whether the statistics counter collection is enabled or
> not for this library. This flag should be exposed as a variable within the DPDK
> configuration file. When this flag is set, all the counters supported by the current
> library are collected; when this flag is cleared, none of the counters
> supported by the current library are collected:
> > +
> > +	#DPDK file “./config/common_linuxapp”,
> “./config/common_bsdapp”, etc
> > +	CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS=y/n
> 
> Why not simply CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_STATS (without
> COLLECT)?
> 

The reason I propose the name STATS_COLLECT rather than just STATS is to give a strong hint that the counters are always allocated and the stats API is always available, while only the collection/update of the counters is configurable. The simple STATS name might drive some library developers to compile the stats API in and out, which is something we want to prevent in order to avoid ABI changes. I don't have a strong preference though; if you still think STATS is better than STATS_COLLECT, let me know.

> > +The default value for this DPDK configuration file variable (either “yes” or
> “no”) is left at the decision of each library.
> > +
> > +Prevention of ABI changes due to library statistics support
> > +-----------------------------------------------------------
> > +
> > +The layout of data structures and prototype of functions that are part of
> the library API should not be affected by whether the collection of statistics
> counters is turned on or off for the current library. In practical terms, this
> means that space is always allocated in the API data structures for statistics
> counters and the statistics related API functions are always built into the
> code, regardless of whether the statistics counter collection is turned on or
> off for the current library.
> > +
> > +When the collection of statistics counters for the current library is turned
> off, the counters retrieved through the statistics related API functions should
> have the default value of zero.


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info
  @ 2015-06-15 18:23  3%                     ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-15 18:23 UTC (permalink / raw)
  To: David Harton (dharton), Wang, Liang-min, dev



> -----Original Message-----
> From: David Harton (dharton) [mailto:dharton@cisco.com]
> Sent: Monday, June 15, 2015 5:05 PM
> To: Ananyev, Konstantin; Wang, Liang-min; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info
> 
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > Sent: Monday, June 15, 2015 9:46 AM
> > To: Wang, Liang-min; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access
> > device info
> >
> >
> >
> > > -----Original Message-----
> > > From: Wang, Liang-min
> > > Sent: Monday, June 15, 2015 2:26 PM
> > > To: Ananyev, Konstantin; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > access device info
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Friday, June 12, 2015 8:31 AM
> > > > To: Wang, Liang-min; dev@dpdk.org
> > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > > access device info
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Wang, Liang-min
> > > > > Sent: Thursday, June 11, 2015 10:51 PM
> > > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support
> > > > > access device info
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ananyev, Konstantin
> > > > > > Sent: Thursday, June 11, 2015 9:07 AM
> > > > > > To: Wang, Liang-min; dev@dpdk.org
> > > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > support access device info
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Wang, Liang-min
> > > > > > > Sent: Thursday, June 11, 2015 1:58 PM
> > > > > > > To: Ananyev, Konstantin; dev@dpdk.org
> > > > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > > support access device info
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ananyev, Konstantin
> > > > > > > > Sent: Thursday, June 11, 2015 8:26 AM
> > > > > > > > To: Wang, Liang-min; dev@dpdk.org
> > > > > > > > Cc: Wang, Liang-min
> > > > > > > > Subject: RE: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > > > support access device info
> > > > > > > >
> > > > > > > > Hi Larry,
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > > > > > > > Liang-Min Larry Wang
> > > > > > > > > Sent: Wednesday, June 10, 2015 4:10 PM
> > > > > > > > > To: dev@dpdk.org
> > > > > > > > > Cc: Wang, Liang-min
> > > > > > > > > Subject: [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to
> > > > > > > > > support access device info
> > > > > > > > >
> > > > > > > > > add new apis:
> > > > > > > > > - rte_eth_dev_default_mac_addr_set
> > > > > > > > > - rte_eth_dev_reg_leng
> > > > > > > > > - rte_eth_dev_reg_info
> > > > > > > > > - rte_eth_dev_eeprom_leng
> > > > > > > > > - rte_eth_dev_get_eeprom
> > > > > > > > > - rte_eth_dev_set_eeprom
> > > > > > > > > - rte_eth_dev_get_ringparam
> > > > > > > > > - rte_eth_dev_set_ringparam
> > > > > > > > >
> > > > > > > > > to enable reading device parameters (mac-addr, register,
> > > > > > > > > eeprom,
> > > > > > > > > ring) based upon ethtool alike data parameter specification.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Liang-Min Larry Wang
> > > > > > > > > <liang-min.wang@intel.com>
> > > > > > > > > ---
> > > > > > > > >  lib/librte_ether/Makefile              |   1 +
> > > > > > > > >  lib/librte_ether/rte_eth_dev_info.h    |  80
> > +++++++++++++++++
> > > > > > > > >  lib/librte_ether/rte_ethdev.c          | 159
> > > > > > > > +++++++++++++++++++++++++++++++++
> > > > > > > > >  lib/librte_ether/rte_ethdev.h          | 158
> > > > > > > > ++++++++++++++++++++++++++++++++
> > > > > > > > >  lib/librte_ether/rte_ether_version.map |   8 ++
> > > > > > > > >  5 files changed, 406 insertions(+)  create mode 100644
> > > > > > > > > lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > >
> > > > > > > > > diff --git a/lib/librte_ether/Makefile
> > > > > > > > > b/lib/librte_ether/Makefile index c0e5768..05209e9 100644
> > > > > > > > > --- a/lib/librte_ether/Makefile
> > > > > > > > > +++ b/lib/librte_ether/Makefile
> > > > > > > > > @@ -51,6 +51,7 @@ SRCS-y += rte_ethdev.c
> > > > > > > > > SYMLINK-y-include += rte_ether.h  SYMLINK-y-include +=
> > > > > > > > > rte_ethdev.h SYMLINK-y-include
> > > > > > > > > += rte_eth_ctrl.h
> > > > > > > > > +SYMLINK-y-include += rte_eth_dev_info.h
> > > > > > > > >
> > > > > > > > >  # this lib depends upon:
> > > > > > > > >  DEPDIRS-y += lib/librte_eal lib/librte_mempool
> > > > > > > > > lib/librte_ring lib/librte_mbuf diff --git
> > > > > > > > > a/lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > > b/lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000..002c4b5
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/lib/librte_ether/rte_eth_dev_info.h
> > > > > > > > > @@ -0,0 +1,80 @@
> > > > > > > > > +/*-
> > > > > > > > > + *   BSD LICENSE
> > > > > > > > > + *
> > > > > > > > > + *   Copyright(c) 2015 Intel Corporation. All rights
> > reserved.
> > > > > > > > > + *   All rights reserved.
> > > > > > > > > + *
> > > > > > > > > + *   Redistribution and use in source and binary forms,
> > with or
> > > > without
> > > > > > > > > + *   modification, are permitted provided that the
> > following
> > > > conditions
> > > > > > > > > + *   are met:
> > > > > > > > > + *
> > > > > > > > > + *     * Redistributions of source code must retain the
> > above
> > > > copyright
> > > > > > > > > + *       notice, this list of conditions and the following
> > disclaimer.
> > > > > > > > > + *     * Redistributions in binary form must reproduce the
> > above
> > > > > > copyright
> > > > > > > > > + *       notice, this list of conditions and the following
> > disclaimer in
> > > > > > > > > + *       the documentation and/or other materials provided
> > with the
> > > > > > > > > + *       distribution.
> > > > > > > > > + *     * Neither the name of Intel Corporation nor the
> > names of its
> > > > > > > > > + *       contributors may be used to endorse or promote
> > products
> > > > > > derived
> > > > > > > > > + *       from this software without specific prior written
> > permission.
> > > > > > > > > + *
> > > > > > > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> > > > > > > > CONTRIBUTORS
> > > > > > > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
> > > > INCLUDING,
> > > > > > BUT
> > > > > > > > NOT
> > > > > > > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
> > > > AND
> > > > > > > > FITNESS FOR
> > > > > > > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> > > > THE
> > > > > > > > COPYRIGHT
> > > > > > > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> > > > INDIRECT,
> > > > > > > > INCIDENTAL,
> > > > > > > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> > > > > > (INCLUDING,
> > > > > > > > BUT NOT
> > > > > > > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> > > > SERVICES;
> > > > > > > > LOSS OF USE,
> > > > > > > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
> > > > > > CAUSED
> > > > > > > > AND ON ANY
> > > > > > > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> > > > LIABILITY,
> > > > > > OR
> > > > > > > > TORT
> > > > > > > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
> > > > WAY
> > > > > > OUT
> > > > > > > > OF THE USE
> > > > > > > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
> > OF
> > > > SUCH
> > > > > > > > DAMAGE.
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +#ifndef _RTE_ETH_DEV_INFO_H_ #define _RTE_ETH_DEV_INFO_H_
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Placeholder for accessing device registers  */ struct
> > > > > > > > > +rte_dev_reg_info {
> > > > > > > > > +	void *buf; /**< Buffer for register */
> > > > > > > > > +	uint32_t offset; /**< Offset for 1st register to fetch
> > */
> > > > > > > > > +	uint32_t leng; /**< Number of registers to fetch */
> > > > > > > > > +	uint32_t version; /**< Device version */ };
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Placeholder for accessing device eeprom  */ struct
> > > > > > > > > +rte_dev_eeprom_info {
> > > > > > > > > +	void *buf; /**< Buffer for eeprom */
> > > > > > > > > +	uint32_t offset; /**< Offset for 1st eeprom location to
> > > > > > > > > +access
> > > > */
> > > > > > > > > +	uint32_t leng; /**< Length of eeprom region to access */
> > > > > > > > > +	uint32_t magic; /**< Device ID */ };
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Placeholder for accessing device ring parameters  */
> > > > > > > > > +struct rte_dev_ring_info {
> > > > > > > > > +	uint32_t rx_pending; /**< Number of outstanding Rx ring
> > */
> > > > > > > > > +	uint32_t tx_pending; /**< Number of outstanding Tx ring
> > */
> > > > > > > > > +	uint32_t rx_max_pending; /**< Maximum number of
> > > > outstanding
> > > > > > > > > +Rx
> > > > > > > > ring */
> > > > > > > > > +	uint32_t tx_max_pending; /**< Maximum number of
> > > > outstanding
> > > > > > > > > +Tx
> > > > > > > > ring
> > > > > > > > > +*/ };
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * A data structure captures information as defined in
> > > > > > > > > +struct ifla_vf_info
> > > > > > > > > + * for user-space api
> > > > > > > > > + */
> > > > > > > > > +struct rte_dev_vf_info {
> > > > > > > > > +	uint32_t vf;
> > > > > > > > > +	uint8_t mac[ETHER_ADDR_LEN];
> > > > > > > > > +	uint32_t vlan;
> > > > > > > > > +	uint32_t tx_rate;
> > > > > > > > > +	uint32_t spoofchk;
> > > > > > > > > +};
> > > > > > > >
> > > > > > > >
> > > > > > > > Wonder what that structure is for?
> > > > > > > > I can't see it used in any function below?
> > > > > > > >
> > > > > > >
> > > > > > > Good catch, this is designed for other ethtool ops that I did
> > > > > > > not include in
> > > > > > this release, I will remove this from next fix.
> > > > > > >
> > > > > > > > > +
> > > > > > > > > +#endif /* _RTE_ETH_DEV_INFO_H_ */
> > > > > > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > > > > > b/lib/librte_ether/rte_ethdev.c index 5a94654..186e85c
> > > > > > > > > 100644
> > > > > > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > > > > > @@ -2751,6 +2751,32 @@ rte_eth_dev_mac_addr_remove(uint8_t
> > > > > > > > port_id,
> > > > > > > > > struct ether_addr *addr)  }
> > > > > > > > >
> > > > > > > > >  int
> > > > > > > > > +rte_eth_dev_default_mac_addr_set(uint8_t port_id, struct
> > > > > > > > > +ether_addr
> > > > > > > > > +*addr) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +	const int index = 0;
> > > > > > > > > +	const uint32_t pool = 0;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	dev = &rte_eth_devices[port_id];
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> > > > >mac_addr_remove, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mac_addr_add, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +
> > > > > > > > > +	/* Update NIC default MAC address*/
> > > > > > > > > +	(*dev->dev_ops->mac_addr_remove)(dev, index);
> > > > > > > > > +	(*dev->dev_ops->mac_addr_add)(dev, addr, index, pool);
> > > > > > > > > +
> > > > > > > > > +	/* Update default address in NIC data structure */
> > > > > > > > > +	ether_addr_copy(addr, &dev->data->mac_addrs[index]);
> > > > > > > > > +
> > > > > > > > > +	return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > >  rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
> > > > > > > > >  				uint16_t rx_mode, uint8_t on)  { @@
> > > > -3627,3
> > > > > > +3653,136 @@
> > > > > > > > > rte_eth_remove_tx_callback(uint8_t port_id,
> > > > > > > > uint16_t queue_id,
> > > > > > > > >  	/* Callback wasn't found. */
> > > > > > > > >  	return -EINVAL;
> > > > > > > > >  }
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_reg_leng(uint8_t port_id) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_length,
> > > > -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_reg_length)(dev);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_reg_info(uint8_t port_id, struct
> > > > > > > > > +rte_dev_reg_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg, -
> > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_reg)(dev, info); }
> > > > > > > >
> > > > > > > > Seems that *get_reg* stuff, will be really good addition for
> > > > > > > > DPDK debugging abilities.
> > > > > > > > Though, I'd suggest we change it a bit to make more generic
> > > > > > > > and
> > > > flexible:
> > > > > > > >
> > > > > > > > Introduce rte_eth_reg_read/rte_eth_reg_write(),
> > > > > > > > or probably even better rte_pcidev_reg_read
> > > > > > > > /rte_pcidev_reg_write at
> > > > > > EAL.
> > > > > > > > Something similar to what
> > > > > > > > port_pci_reg_read/port_pci_reg_write()
> > > > > > > > are doing now at testpmd.h.
> > > > > > > >
> > > > > > > > struct rte_pcidev_reg_info {
> > > > > > > >    const char *name;
> > > > > > > >    uint32_t endianes, bar, offset, size, count; };
> > > > > > > >
> > > > > > > > int rte_pcidev_reg_read(const struct rte_pci_device *, const
> > > > > > > > struct rte_pcidev_reg_info *, uint64_t *reg_val);
> > > > > > > >
> > > > > > > > Then:
> > > > > > > > int rte_eth_dev_get_reg_info(port_id, const struct
> > > > > > > > rte_pcidev_reg_info **info);
> > > > > > > >
> > > > > > > > So each device would store in info a pointer to an array of
> > > > > > > > it's register descriptions (finished by zero elem?).
> > > > > > > >
> > > > > > > > Then your ethtool (or any other upper layer) can do the
> > > > > > > > following to read all device regs:
> > > > > > > >
> > > > > > >
> > > > > > > The proposed reg info structure allows future improvement to
> > > > > > > support
> > > > > > individual register read/write.
> > > > > > > Also, because each NIC device has a very distinct register
> > definition.
> > > > > > > So, the plan is to have more comprehensive interface to
> > > > > > > support query operation (for example, register name) before
> > > > > > > introduce individual/group
> > > > > > register access.
> > > > > > > Points taken, the support will be in future release.
> > > > > >
> > > > > > Sorry, didn't get you.
> > > > > > So you are ok to make these changes in next patch version?
> > > > > >
> > > > > I would like to get a consensus from dpdk community on how to
> > > > > provide
> > > > register information.
> > > >
> > > > Well, that's ok, but if it is just a trial patch that is not intended
> > > > to be applied, then you should mark it as RFC.
> > > >
> > > > > Currently, it's designed for debug dumping. The register
> > > > > information is very
> > > > hardware dependent.
> > > > > Need to consider current supported NIC device and future devices
> > > > > for
> > > > DPDK, so we won't make it a bulky interface.
> > > >
> > > > Ok, could you explain what exactly  concerns you in the approach
> > > > described above?
> > > > What part you feel is bulky?
> > > >
> > > > > > >
> > > > > > > > const struct rte_eth_dev_reg_info *reg_info; struct
> > > > > > > > rte_eth_dev_info dev_info;
> > > > > > > >
> > > > > > > > rte_eth_dev_info_get(pid, &dev_info);
> > > > > > > > rte_eth_dev_get_reg_info(port_id, &reg_info);
> > > > > > > >
> > > > > > > > for (i = 0; reg_info[i].name != NULL; i++) {
> > > > > > > >    ...
> > > > > > > >    rte_pcidev_read_reg(dev_info. pci_dev, reg_info[i], &v);
> > > > > > > >   ..
> > > > > > > > }
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_eeprom_leng(uint8_t port_id) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> > > > >get_eeprom_length, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_eeprom_length)(dev);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_get_eeprom(uint8_t port_id, struct
> > > > > > > > > +rte_dev_eeprom_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_eeprom, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_eeprom)(dev, info); }
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_set_eeprom(uint8_t port_id, struct
> > > > > > > > > +rte_dev_eeprom_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_eeprom, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->set_eeprom)(dev, info); }
> > > > > > > > > +
> > > > > > > > > +int
> > > > > > > > > +rte_eth_dev_get_ringparam(uint8_t port_id, struct
> > > > > > > > > +rte_dev_ring_info
> > > > > > > > > +*info) {
> > > > > > > > > +	struct rte_eth_dev *dev;
> > > > > > > > > +
> > > > > > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port_id=%d\n",
> > > > port_id);
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if ((dev= &rte_eth_devices[port_id]) == NULL) {
> > > > > > > > > +		PMD_DEBUG_TRACE("Invalid port device\n");
> > > > > > > > > +		return -ENODEV;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_ringparam, -
> > > > > > > > ENOTSUP);
> > > > > > > > > +	return (*dev->dev_ops->get_ringparam)(dev, info); }
> > > > > > > >
> > > > > > > > I think it will be a useful addition to the ethdev API  to
> > > > > > > > have an ability to retrieve current RX/TX queue parameters.
> > > > > > > > Though again, it need to be more generic, so it could be
> > > > > > > > useful for
> > > > > > > > non- ethtool upper layer too.
> > > > > > > > So I suggest to modify it a bit.
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > struct rte_eth_tx_queue_info {
> > > > > > > >     struct rte_eth_txconf txconf;
> > > > > > > >     uint32_t nb_tx_desc;
> > > > > > > >     uint32_t nb_max_tx_desc; /*max allowable TXDs for that
> > queue */
> > > > > > > >     uint32_t nb_tx_free;            /* number of free TXDs at
> > the moment
> > > > of
> > > > > > call.
> > > > > > > > */
> > > > > > > >     /* other tx queue data. */ };
> > > > > > > >
> > > > > > > > int rte_etdev_get_tx_queue_info(portid, queue_id, struct
> > > > > > > > rte_eth_tx_queue_info *qinfo)
> > > > > > > >
> > > > > > > > Then, your upper layer ethtool wrapper, can implement yours
> > > > > > > > ethtool_get_ringparam() by:
> > > > > > > >
> > > > > > > >  ...
> > > > > > > >  struct rte_eth_tx_queue_info qinfo;
> > > > > > > > rte_ethdev_get_tx_queue_info(port, 0, &qinfo);
> > > > > > > > ring_param->tx_pending = qinfo.nb_tx_desc -
> > > > > > > > rte_eth_rx_queue_count(port, 0);
> > > > > > > >
> > > > > > > > Or probably even:
> > > > > > > > ring_param->tx_pending = qinfo.nb_tx_desc -
> > > > > > > > qinfo.nb_tx_free;
> > > > > > > >
> > > > > > > > Same for RX.
> > > > > > > >
> > > > > > > For now, this descriptor ring information is used by the ethtool
> > op.
> > > > > > > To make this interface simple, i.e. caller doesn't need to
> > > > > > > access other
> > > > > > queue information.
> > > > > >
> > > > > > I just repeat what I said to you in off-line conversation:
> > > > > > ethdev API is not equal ethtool API.
> > > > > > It is ok to add  a new function/structure to ethdev if it really
> > > > > > needed, but we should do mechanical one to one copy.
> > > > > > It is much better to add  a function/structure that would be
> > > > > > more generic, and suit other users, not only ethtool.
> > > > > > There is no point to have dozen functions in rte_ethdev API
> > > > > > providing similar information.
> > > > > > BTW, I don't see how API I proposed is much more  complex, then
> > > > > > yours
> > > > one.
> > > > > The ring parameter is run-time information, which is different
> > > > > from the data
> > > > structure described in this discussion.
> > > >
> > > > I don't see how they are different.
> > > > Looking at ixgbe_get_ringparam(), it returns:
> > > > rx_max_pending - that's a static IXGBE PMD value (max possible
> > > > number of RXDs per one queue).
> > > > rx_pending - number of RXD currently in use by the HW for queue 0
> > > > (that information can be changed at each call).
> > > >
> > > > With the approach I suggesting - you can get same information for
> > > > each RX queue by calling rte_ethdev_get_rx_queue_info() and
> > > > rte_eth_rx_queue_count().
> > > > Plus you are getting other RX queue data.
> > > >
> > > > Another thing - what is the practical usage of the information you
> > > > are retrieving now by get_ringparam()?
> > > > Let's say it returned: rx_max_pending=4096; rx_pending=128. How
> > > > would that information help to understand what is going on with the
> > device?
> > > > Without knowing the value of nb_tx_desc for the queue, you can't say
> > > > whether your queue is full or not.
> > > > Again, it could be that all your traffic going through some other
> > > > queue (not 0).
> > > > So from my point rte_eth_dev_get_ringparam()  usage is very limited,
> > > > and doesn't provide enough information about current queue state.
> > > >
> > > > Same thing applies for TX.
> > > >
> > >
> > > After carefully reviewing the suggestion in this comment, and reviewing the
> > existing dpdk source code,
> > > I came to realize that neither rte_ethdev_get_rx_queue_info,
> > > rte_ethdev_get_tx_queue_info, struct rte_eth_rx_queue_info nor struct
> > > rte_eth_tx_queue_info is available in the existing dpdk source code. I
> > could not make a patch based upon a set of non-existent APIs and data
> > structures.
> >
> > Right now, in dpdk.org source code, struct  rte_eth_dev_ring_info, struct
> > rte_dev_eeprom_info and struct  rte_dev_reg_info don't exist also.
> > Same as  all these functions:
> >
> > rte_eth_dev_default_mac_addr_set
> > rte_eth_dev_reg_length
> > rte_eth_dev_reg_info
> > rte_eth_dev_eeprom_length
> > rte_eth_dev_get_eeprom
> > rte_eth_dev_set_eeprom
> > rte_eth_dev_get_ringparam
> >
> > All this is a new API that's you are trying to add.
> > But, for some reason, you consider it ok to add 'struct
> > rte_eth_dev_ring_info', but not to add
> > 'rte_ethdev_get_tx_queue_info'.
> > That just doesn't make any sense to me.
> > In fact, I think our conversation is going in cycles.
> > If you are not happy with the suggested approach, please provide some
> > meaningful reason why.
> > Konstantin
> 
> It seems the new API aims at providing users a mechanism to quickly and
> gracefully migrate from using ethtool/ioctl calls.

I am fine with that goal in general.
But it doesn't mean that all ethtool API should be pushed into the rte_ethdev layer.
That's why a shim layer on top of rte_ethdev is created -
its goal is to provide for the upper layer an ethtool-like API and
hide the actual implementation, based on the rte_ethdev API, inside.

>  The provided get/set
> ring param info is very similar to that of ethtool and facilitates the
> ethtool needs.

If rte_ethtool shim layer has to provide get/set ring_param API as it is - that's ok with me.
Though I don't see any good reason why the rte_ethdev layer API should restrict itself to exactly the same ethtool-like API.
As I said before - the current practical usage of rte_eth_dev_get_ringparam() looks quite limited.
Probably it is just me, but I don't see how a user can conclude what the device state is,
using the information provided by rte_eth_dev_get_ringparam().

My concern is - if we introduce rte_eth_dev_get_ringparam() as suggested by Larry now,
one of two things would happen:
- no one except ethtool shim layer would use it.
- people will complain that it doesn't provide the desired information.
So, in next release we'll have to either introduce a new function and support 2 functions doing similar things
(code duplication), or modify existing API (ABI breakage pain).

It would be much better to introduce a new rte_ethdev API here that would be generic enough
and fit common needs, not only one particular case (ethtool).
After all, I don't think the changes I am suggesting differ that much from the current approach.
Konstantin

> While additional enhancements to the API to provide additional details
> such as those you have suggested are certainly possible, I believe Larry
> is stating those ideas are outside the scope he has intended with the
> API introduction and that they should be discussed further and delivered
> in a future patch.

> 
> Does that seem reasonable?
> 
> Thanks,
> Dave
> 
> >
> > >
> > > > > It's the desire of this patch to separate each data structure to
> > > > > avoid cross
> > > > dependency.
> > > >
> > > > That's too cryptic to me.
> > > > Could you explain what cross dependency you are talking about?
> > > >
> > > > Konstantin


* Re: [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
@ 2015-06-15 15:01  3%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-15 15:01 UTC (permalink / raw)
  To: nhorman; +Cc: dev

2015-06-12 15:33, Helin Zhang:
> v3 changes:
> * Moved the newly added element right after 'uint16_t reta_size', where it
>   was a padding. So it will not break any ABI compatibility, and no need to
>   disable it by default.
[...]
> @@ -918,6 +918,7 @@ struct rte_eth_dev_info {
>  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
>  	uint16_t reta_size;
>  	/**< Device redirection table size, the total number of entries. */
> +	uint8_t hash_key_size; /**< Hash key size in bytes */
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>  	uint64_t flow_type_rss_offloads;
>  	struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */

Neil, what is your opinion?
Do you ack that this technique covers all ABI cases?


* Re: [dpdk-dev] [PATCH 4/4] ethdev: check support for rx_queue_count and descriptor_done fns
  @ 2015-06-15 10:14  4%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-15 10:14 UTC (permalink / raw)
  To: Roger B. Melton; +Cc: dev

On Fri, Jun 12, 2015 at 01:32:56PM -0400, Roger B. Melton wrote:
> Hi Bruce,  Comment in-line.  Regards, Roger
> 
> On 6/12/15 7:28 AM, Bruce Richardson wrote:
> >The functions rte_eth_rx_queue_count and rte_eth_descriptor_done are
> >supported by very few PMDs. Therefore, it is best to check for support
> >for the functions in the ethdev library, so as to avoid crashes
> >at run-time if the application goes to use those APIs. The performance
> >impact of this change should be very small as this is a predictable
> >branch in the function.
> >
> >Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >---
> >  lib/librte_ether/rte_ethdev.h | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> >diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> >index 827ca3e..9ad1b6a 100644
> >--- a/lib/librte_ether/rte_ethdev.h
> >+++ b/lib/librte_ether/rte_ethdev.h
> >@@ -2496,6 +2496,8 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
> >   *  The queue id on the specific port.
> >   * @return
> >   *  The number of used descriptors in the specific queue.
> >+ *  NOTE: if function is not supported by device this call
> >+ *        returns (uint32_t)-ENOTSUP
> >   */
> >  static inline uint32_t
> 
> Why not change the return type to int32_t?
> In this way, the caller isn't required to make the assumption that a large
> queue count indicates an error.  < 0 means error, otherwise it's a valid
> queue count.
> 
> This approach would be consistent with other APIs.
> 

Yes, good point, I should see about that. One thing I'm unsure of, though, is
does this count as ABI breakage? I don't see how it should break any older
apps, since the return type is the same size, but I'm not sure as we are 
changing the return type of the function.

Neil, can you perhaps comment here? Is changing uint32_t to int32_t ok, from
an ABI point of view?

Regards,
/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow director in VFs
  2015-06-12 16:45  3%   ` Thomas Monjalon
@ 2015-06-15  7:14  3%     ` Wu, Jingjing
  0 siblings, 0 replies; 200+ results
From: Wu, Jingjing @ 2015-06-15  7:14 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, neil.horman



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Saturday, June 13, 2015 12:45 AM
> To: Wu, Jingjing
> Cc: dev@dpdk.org; neil.horman@tuxdriver.com
> Subject: Re: [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow
> director in VFs
> 
> 2015-05-11 11:46, Jingjing Wu:
> > This patch extends struct rte_eth_fdir_flow_ext to support flow
> > director in VFs.
> >
> > Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> 
> > --- a/lib/librte_ether/rte_eth_ctrl.h
> > +++ b/lib/librte_ether/rte_eth_ctrl.h
> > @@ -394,6 +394,8 @@ struct rte_eth_fdir_flow_ext {
> >  	uint16_t vlan_tci;
> >  	uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
> >  	/**< It is filled by the flexible payload to match. */
> > +	uint8_t is_vf;   /**< 1 for VF, 0 for port dev */
> > +	uint16_t dst_id; /**< VF ID, available when is_vf is 1*/
> >  };
> 
> Isn't it breaking the ABI?

Yes, it is breaking the ABI. Will consider how to avoid that.

Thanks
Jingjing

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow director in VFs
  @ 2015-06-12 16:45  3%   ` Thomas Monjalon
  2015-06-15  7:14  3%     ` Wu, Jingjing
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-12 16:45 UTC (permalink / raw)
  To: Jingjing Wu; +Cc: dev, neil.horman

2015-05-11 11:46, Jingjing Wu:
> This patch extends struct rte_eth_fdir_flow_ext to support flow
> director in VFs.
> 
> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>

> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -394,6 +394,8 @@ struct rte_eth_fdir_flow_ext {
>  	uint16_t vlan_tci;
>  	uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
>  	/**< It is filled by the flexible payload to match. */
> +	uint8_t is_vf;   /**< 1 for VF, 0 for port dev */
> +	uint16_t dst_id; /**< VF ID, available when is_vf is 1*/
>  };

Isn't it breaking the ABI?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/3] rte_ring: remove deprecated functions
  2015-06-12  5:46  3%   ` Panu Matilainen
@ 2015-06-12 14:00  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-12 14:00 UTC (permalink / raw)
  To: Panu Matilainen; +Cc: dev, Stephen Hemminger

On Fri, Jun 12, 2015 at 08:46:55AM +0300, Panu Matilainen wrote:
> On 06/12/2015 08:18 AM, Stephen Hemminger wrote:
> >From: Stephen Hemminger <shemming@brocade.com>
> >
> >These were deprecated in 2.0 so remove them from 2.1
> >
> >Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >---
> >  drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
> >  drivers/net/ring/rte_eth_ring_version.map |  2 --
> >  2 files changed, 57 deletions(-)
> >
> [...]
> >diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> >index 8ad107d..0875e25 100644
> >--- a/drivers/net/ring/rte_eth_ring_version.map
> >+++ b/drivers/net/ring/rte_eth_ring_version.map
> >@@ -2,8 +2,6 @@ DPDK_2.0 {
> >  	global:
> >
> >  	rte_eth_from_rings;
> >-	rte_eth_ring_pair_attach;
> >-	rte_eth_ring_pair_create;
> >
> >  	local: *;
> >  };
> 
> Removing symbols is an ABI break so it additionally requires a soname bump
> for this library.
> 
> In addition, simply due to being the first library to do so, it'll also then
> break the combined shared library as it currently is. Mind you, this is not
> an objection at all, the need to change to a linker script approach has
> always been a matter of time.
> 
> 	- Panu -
> 
Also, patch title should be prefixed with "ring pmd" or "drivers/net/ring" rather
than rte_ring, since this is a patch for the ring pmd, not the rte_ring library.

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 0/4] ethdev: Add checks for function support in driver
@ 2015-06-12 11:28  3% Bruce Richardson
    0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2015-06-12 11:28 UTC (permalink / raw)
  To: dev

The functions to check the current occupancy of the RX descriptor ring, and
to specifically query if a particular descriptor is ready to be received, are
used for load monitoring on the data path, and so are inline functions inside
rte_ethdev.h. However, these functions are not implemented for the majority of
drivers, so their use can cause crashes in applications as there are no checks
for a function call with a NULL pointer.

This patchset attempts to fix this by putting in place the minimal NULL-pointer
check needed before calling the function, thereby avoiding any crashes. The
functions now also have a new possible return code of -ENOTSUP
but no return type changes, so ABI is preserved.

As part of the patchset, some additional cleanup is performed. The two functions
in question, as well as the rx and tx burst functions, actually have two copies in
the ethdev library - one in the header and a debug version in the C file. This
patchset removes this duplication by merging the debug version into the header
file version. Build-time and runtime behaviour was preserved as part of this
merge.

NOTE: This patchset depends on Stephen Hemminger's patch to make
rte_eth_dev_is_valid_port() public: http://dpdk.org/dev/patchwork/patch/5373/ 

Bruce Richardson (4):
  ethdev: rename macros to have RTE_ETH prefix
  ethdev: move RTE_ETH_FPTR_OR_ERR macros to header
  ethdev: remove duplicated debug functions
  ethdev: check support for rx_queue_count and descriptor_done fns

 lib/librte_ether/rte_ethdev.c | 626 ++++++++++++++++++------------------------
 lib/librte_ether/rte_ethdev.h | 112 +++++---
 2 files changed, 345 insertions(+), 393 deletions(-)

-- 
2.4.2

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 0/6] query hash key size in byte
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (5 preceding siblings ...)
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 6/6] app/testpmd: show " Helin Zhang
@ 2015-06-12  9:31  0%     ` Ananyev, Konstantin
  6 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2015-06-12  9:31 UTC (permalink / raw)
  To: Zhang, Helin, dev



> -----Original Message-----
> From: Zhang, Helin
> Sent: Friday, June 12, 2015 8:34 AM
> To: dev@dpdk.org
> Cc: Cao, Min; Xu, Qian Q; Cao, Waterman; Ananyev, Konstantin; Zhang, Helin
> Subject: [PATCH v3 0/6] query hash key size in byte
> 
> As different hardware has different hash key sizes, querying it (in byte)
> per port was asked by users. Otherwise there is no convenient way to know
> the size of hash key which should be prepared.
> 
> v2 changes:
> * Disabled the code changes by default, to avoid breaking ABI compatibility.
> 
> v3 changes:
> * Moved the newly added element right after 'uint16_t reta_size', where it
>   was a padding. So it will not break any ABI compatibility, and no need to
>   disable the code changes by default.
> 
> Helin Zhang (6):
>   ethdev: add an field for querying hash key size
>   e1000: fill the hash key size
>   fm10k: fill the hash key size
>   i40e: fill the hash key size
>   ixgbe: fill the hash key size
>   app/testpmd: show the hash key size
> 
>  app/test-pmd/config.c             | 2 ++
>  drivers/net/e1000/igb_ethdev.c    | 3 +++
>  drivers/net/fm10k/fm10k_ethdev.c  | 1 +
>  drivers/net/i40e/i40e_ethdev.c    | 2 ++
>  drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
>  drivers/net/ixgbe/ixgbe_ethdev.c  | 3 +++
>  lib/librte_ether/rte_ethdev.h     | 1 +
>  7 files changed, 14 insertions(+)
> 

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] pmd: change initialization to indicate pci drivers
  @ 2015-06-12  9:13  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2015-06-12  9:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2015-05-29 08:47, Stephen Hemminger:
> Upcoming drivers will need to be able to support other bus types.
> This is a transparent change to how struct eth_driver is initialized.
> It has not function or ABI layout impact, but makes adding a later
> bus type (Xen, Hyper-V, ...) much easier.
> 
> Signed-off-by: Stpehen Hemminger <stephen@networkplumber.org>

There is a (fixed) typo in this line. Please use --signoff.

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied, thanks

Next step would be to avoid this strange assumption:
struct eth_driver {
    struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  8:28  3%                   ` Zhang, Helin
  2015-06-12  9:00  3%                     ` Panu Matilainen
@ 2015-06-12  9:07  4%                     ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2015-06-12  9:07 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Fri, Jun 12, 2015 at 08:28:55AM +0000, Zhang, Helin wrote:
> 
> 
> > -----Original Message-----
> > From: Panu Matilainen [mailto:pmatilai@redhat.com]
> > Sent: Friday, June 12, 2015 4:15 PM
> > To: Zhang, Helin; Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim;
> > nhorman@tuxdriver.com
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> > rte_mbuf
> > 
> > On 06/12/2015 10:43 AM, Zhang, Helin wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> > >> Sent: Friday, June 12, 2015 3:24 PM
> > >> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
> > >> nhorman@tuxdriver.com
> > >> Cc: dev@dpdk.org
> > >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> > >> in rte_mbuf
> > >>
> > >> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> > >>> 2015-06-10 16:32, Olivier MATZ:
> > >>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > >>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> > >>>>>>> In order to unify the packet type, the field of 'packet_type' in
> > >>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> > >>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized
> > >>>>>>> to support this change for Vector PMD. As 'struct rte_kni_mbuf'
> > >>>>>>> for KNI should be right mapped to 'struct rte_mbuf', it should
> > >>>>>>> be modified accordingly. In addition, Vector PMD of ixgbe is
> > >>>>>>> disabled by default, as 'struct rte_mbuf' changed.
> > >>>>>>> To avoid breaking ABI compatibility, all the changes would be
> > >>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> > >>>>>>
> > >>>>>> What are the plans for this compile-time option in the future?
> > >>>>>>
> > >>>>>> I wonder what are the benefits of having this option in terms of
> > >>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
> > >>>>>> the packet-type feature is not present, and when it is enabled we
> > >>>>>> have the feature but it breaks the compatibility.
> > >>>>>>
> > >>>>>> In my opinion, the v5 is preferable: for this kind of features, I
> > >>>>>> don't see how the ABI can be preserved, and I think packet-type
> > >>>>>> won't be the only feature that will modify the mbuf structure. I
> > >>>>>> think the process described here should be applied:
> > >>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> > >>>>>>
> > >>>>>> (starting from "Some ABI changes may be too significant to
> > >>>>>> reasonably maintain multiple versions of").
> > >>>>>
> > >>>>> This is just like the change that Steve (Cunming) Liang submitted
> > >>>>> for Interrupt Mode. We have the same problem in both cases: we
> > >>>>> want to find a way to get the features included, but need to
> > >>>>> comply with our ABI policy. So, in both cases, the proposal is to
> > >>>>> add a config option to enable the change by default, so we
> > >>>>> maintain backward
> > >> compatibility.
> > >>>>> Users that want these changes, and are willing to accept the
> > >>>>> associated ABI change, have to specifically enable them.
> > >>>>>
> > >>>>> We can note in the Deprecation Notices in the Release Notes for
> > >>>>> 2.1 that these config options will be removed in 2.2. The features
> > >>>>> will then be enabled by default.
> > >>>>>
> > >>>>> This seems like a good compromise which allows us to get these
> > >>>>> changes into 2.1 but avoids breaking the ABI policy.
> > >>>>
> > >>>> Sorry for the late answer.
> > >>>>
> > >>>> After some thoughts on this topic, I understand that having a
> > >>>> compile-time option is perhaps a good compromise between keeping
> > >>>> compatibility and having new features earlier.
> > >>>>
> > >>>> I'm just afraid about having one #ifdef in the code for each new
> > >>>> feature that cannot keep the ABI compatibility.
> > >>>> What do you think about having one option -- let's call it
> > >>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that
> > >>>> would surround any new feature that breaks the ABI?
> > >>>>
> > >>>> This would have several advantages:
> > >>>> - only 2 cases (on or off), the combinatorial is smaller than
> > >>>>      having one option per feature
> > >>>> - all next features breaking the abi can be identified by a grep
> > >>>> - the code inside the #ifdef can be enabled in a simple operation
> > >>>>      by Thomas after each release.
> > >>>>
> > >>>> Thomas, any comment?
> > >>>
> > >>> As previously discussed (1to1) with Olivier, I think that's a good
> > >>> proposal to introduce changes breaking deeply the ABI.
> > >>>
> > >>> Let's sum up the current policy:
> > >>> 1/ For changes which have a limited impact on the ABI, the backward
> > >>> compatibility must be kept during 1 release including the notice in
> > >> doc/guides/rel_notes/abi.rst.
> > >>> 2/ For important changes like mbuf rework, there was an agreement on
> > >>> skipping the backward compatibility after having 3 acknowledgements
> > >>> and an
> > >> 1-release long notice.
> > >>> Then the ABI numbering must be incremented.
> > >>>
> > >>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the
> > >>> second
> > >> case.
> > >>> In order to be adopted, a patch for the file
> > >>> doc/guides/rel_notes/abi.rst must be submitted and strongly
> > acknowledged.
> > >>>
> > >>> The ABI numbering must be also clearly explained:
> > >>> 1/ Should we have different libraries version number depending of
> > >> CONFIG_RTE_NEXT_ABI?
> > >>> It seems straightforward to use "ifeq" when LIBABIVER in the
> > >>> Makefiles
> > >>
> > >> An incompatible ABI must be reflected by a soname change, otherwise
> > >> the whole library versioning is irrelevant.
> > >>
> > >>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
> > >> the .map files?
> > >>> Maybe we should remove these files and generate them with some
> > >> preprocessing.
> > >>>
> > >>> Neil, as the ABI policy author, what is your opinion?
> > >>
> > >> I'm not Neil but my 5c...
> > >>
> > >> Working around ABI compatibility policy via config options seems like
> > >> a slippery slope. Going forward this will likely mean there are
> > >> always two different ABIs for any given version, and the thought of
> > >> keeping track of it all in a truly compatible manner makes my head hurt.
> > >>
> > >> That said its easy to understand the desire to move faster than the
> > >> ABI policy allows. In a project where so many structs are in the open
> > >> it gets hard to do much anything at all without breaking the ABI.
> > >>
> > >> The issue could be mitigated somewhat by reserving some space at the
> > >> end of the structs eg when the ABI needs to be changed anyway, but it
> > >> has obvious downsides as well. The other options I see tend to
> > >> revolve around changing release policies one way or the other:
> > >> releasing ABI compatible micro versions between minor versions and
> > >> relaxing the ABI policy a bit, or just releasing new minor versions more often
> > than the current cycle.
> > >>
> > >> 	- Panu -
> > >
> > > Does it mean releasing R2.01 right now with announcement of all ABI
> > > changes, which based on R2.0 first, and then releasing R2.1 several weeks later
> > with all the code changes?
> > 
> > Something like that, but I'd think its too late for any big release model / policy
> > changes for this particular cycle.
> > 
> > I also do not want to undermine the ABI policy we just got in place, but since
> > people are actively looking for ways to work around it anyway its better to map
> > out all the possibilities. One of them is committing to longer term maintenance of
> > releases (via ABI compatible micro version updates), another one is shortening
> > the cycles. Both achieve roughly the same goals with differences in emphasis
> > perhaps, but more releases requires more resources on maintaining, testing etc
> > so...
> R2.01 could just have all the same of R2.0, with an additional ABI announcement.
> Then nothing needs to be tested.
> 
> - Helin
> 
Then it would be a paper exercise just to bypass an ABI policy, so NACK to that idea.

If (and it's a fairly big if) we do decide we need longer-term maintenance 
branches for maintaining ABI, then we need to do it properly.
This may include doing things like back-porting relevant (maybe all) features from
later releases that don't break the ABI to the supported version. Bug fixes would
obviously have to be backported.

However, the overhead of this is obvious, since we would now have multiple development
lines to be maintained. 

Regards,
/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  8:28  3%                   ` Zhang, Helin
@ 2015-06-12  9:00  3%                     ` Panu Matilainen
  2015-06-12  9:07  4%                     ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-12  9:00 UTC (permalink / raw)
  To: Zhang, Helin, Thomas Monjalon, Olivier MATZ, O'Driscoll, Tim,
	nhorman
  Cc: dev

On 06/12/2015 11:28 AM, Zhang, Helin wrote:
>
>
>> -----Original Message-----
>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>> Sent: Friday, June 12, 2015 4:15 PM
>> To: Zhang, Helin; Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim;
>> nhorman@tuxdriver.com
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
>> rte_mbuf
>>
>> On 06/12/2015 10:43 AM, Zhang, Helin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>>>> Sent: Friday, June 12, 2015 3:24 PM
>>>> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
>>>> nhorman@tuxdriver.com
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
>>>> in rte_mbuf
>>>>
>>>> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
>>>>> 2015-06-10 16:32, Olivier MATZ:
>>>>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>>>>>>>> In order to unify the packet type, the field of 'packet_type' in
>>>>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>>>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized
>>>>>>>>> to support this change for Vector PMD. As 'struct rte_kni_mbuf'
>>>>>>>>> for KNI should be right mapped to 'struct rte_mbuf', it should
>>>>>>>>> be modified accordingly. In addition, Vector PMD of ixgbe is
>>>>>>>>> disabled by default, as 'struct rte_mbuf' changed.
>>>>>>>>> To avoid breaking ABI compatibility, all the changes would be
>>>>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>>>>>>>
>>>>>>>> What are the plans for this compile-time option in the future?
>>>>>>>>
>>>>>>>> I wonder what are the benefits of having this option in terms of
>>>>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
>>>>>>>> the packet-type feature is not present, and when it is enabled we
>>>>>>>> have the feature but it breaks the compatibility.
>>>>>>>>
>>>>>>>> In my opinion, the v5 is preferable: for this kind of features, I
>>>>>>>> don't see how the ABI can be preserved, and I think packet-type
>>>>>>>> won't be the only feature that will modify the mbuf structure. I
>>>>>>>> think the process described here should be applied:
>>>>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>>>>>>>
>>>>>>>> (starting from "Some ABI changes may be too significant to
>>>>>>>> reasonably maintain multiple versions of").
>>>>>>>
>>>>>>> This is just like the change that Steve (Cunming) Liang submitted
>>>>>>> for Interrupt Mode. We have the same problem in both cases: we
>>>>>>> want to find a way to get the features included, but need to
>>>>>>> comply with our ABI policy. So, in both cases, the proposal is to
>>>>>>> add a config option to enable the change by default, so we
>>>>>>> maintain backward
>>>> compatibility.
>>>>>>> Users that want these changes, and are willing to accept the
>>>>>>> associated ABI change, have to specifically enable them.
>>>>>>>
>>>>>>> We can note in the Deprecation Notices in the Release Notes for
>>>>>>> 2.1 that these config options will be removed in 2.2. The features
>>>>>>> will then be enabled by default.
>>>>>>>
>>>>>>> This seems like a good compromise which allows us to get these
>>>>>>> changes into 2.1 but avoids breaking the ABI policy.
>>>>>>
>>>>>> Sorry for the late answer.
>>>>>>
>>>>>> After some thoughts on this topic, I understand that having a
>>>>>> compile-time option is perhaps a good compromise between keeping
>>>>>> compatibility and having new features earlier.
>>>>>>
>>>>>> I'm just afraid about having one #ifdef in the code for each new
>>>>>> feature that cannot keep the ABI compatibility.
>>>>>> What do you think about having one option -- let's call it
>>>>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that
>>>>>> would surround any new feature that breaks the ABI?
>>>>>>
>>>>>> This would have several advantages:
>>>>>> - only 2 cases (on or off), the combinatorial is smaller than
>>>>>>       having one option per feature
>>>>>> - all next features breaking the abi can be identified by a grep
>>>>>> - the code inside the #ifdef can be enabled in a simple operation
>>>>>>       by Thomas after each release.
>>>>>>
>>>>>> Thomas, any comment?
>>>>>
>>>>> As previously discussed (1to1) with Olivier, I think that's a good
>>>>> proposal to introduce changes breaking deeply the ABI.
>>>>>
>>>>> Let's sum up the current policy:
>>>>> 1/ For changes which have a limited impact on the ABI, the backward
>>>>> compatibility must be kept during 1 release including the notice in
>>>> doc/guides/rel_notes/abi.rst.
>>>>> 2/ For important changes like mbuf rework, there was an agreement on
>>>>> skipping the backward compatibility after having 3 acknowledgements
>>>>> and an
>>>> 1-release long notice.
>>>>> Then the ABI numbering must be incremented.
>>>>>
>>>>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the
>>>>> second
>>>> case.
>>>>> In order to be adopted, a patch for the file
>>>>> doc/guides/rel_notes/abi.rst must be submitted and strongly
>> acknowledged.
>>>>>
>>>>> The ABI numbering must be also clearly explained:
>>>>> 1/ Should we have different libraries version number depending of
>>>> CONFIG_RTE_NEXT_ABI?
>>>>> It seems straightforward to use "ifeq" when LIBABIVER in the
>>>>> Makefiles
>>>>
>>>> An incompatible ABI must be reflected by a soname change, otherwise
>>>> the whole library versioning is irrelevant.
>>>>
>>>>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
>>>> the .map files?
>>>>> Maybe we should remove these files and generate them with some
>>>> preprocessing.
>>>>>
>>>>> Neil, as the ABI policy author, what is your opinion?
>>>>
>>>> I'm not Neil but my 5c...
>>>>
>>>> Working around ABI compatibility policy via config options seems like
>>>> a slippery slope. Going forward this will likely mean there are
>>>> always two different ABIs for any given version, and the thought of
>>>> keeping track of it all in a truly compatible manner makes my head hurt.
>>>>
>>>> That said its easy to understand the desire to move faster than the
>>>> ABI policy allows. In a project where so many structs are in the open
>>>> it gets hard to do much anything at all without breaking the ABI.
>>>>
>>>> The issue could be mitigated somewhat by reserving some space at the
>>>> end of the structs eg when the ABI needs to be changed anyway, but it
>>>> has obvious downsides as well. The other options I see tend to
>>>> revolve around changing release policies one way or the other:
>>>> releasing ABI compatible micro versions between minor versions and
>>>> relaxing the ABI policy a bit, or just releasing new minor versions more often
>> than the current cycle.
>>>>
>>>> 	- Panu -
>>>
>>> Does it mean releasing R2.01 right now with announcement of all ABI
>>> changes, which based on R2.0 first, and then releasing R2.1 several weeks later
>> with all the code changes?
>>
>> Something like that, but I'd think its too late for any big release model / policy
>> changes for this particular cycle.
>>
>> I also do not want to undermine the ABI policy we just got in place, but since
>> people are actively looking for ways to work around it anyway its better to map
>> out all the possibilities. One of them is committing to longer term maintenance of
>> releases (via ABI compatible micro version updates), another one is shortening
>> the cycles. Both achieve roughly the same goals with differences in emphasis
>> perhaps, but more releases requires more resources on maintaining, testing etc
>> so...
> R2.01 could just have all the same of R2.0, with an additional ABI announcement.
> Then nothing needs to be tested.

That's also entirely missing the point of having an ABI policy in the 
first place. Its purpose is not to force people to find loopholes in the 
policy but for the benefit of other developers building apps on top of DPDK.

	- Panu -

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  8:15  4%                 ` Panu Matilainen
@ 2015-06-12  8:28  3%                   ` Zhang, Helin
  2015-06-12  9:00  3%                     ` Panu Matilainen
  2015-06-12  9:07  4%                     ` Bruce Richardson
  0 siblings, 2 replies; 200+ results
From: Zhang, Helin @ 2015-06-12  8:28 UTC (permalink / raw)
  To: Panu Matilainen, Thomas Monjalon, Olivier MATZ, O'Driscoll,
	Tim, nhorman
  Cc: dev



> -----Original Message-----
> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> Sent: Friday, June 12, 2015 4:15 PM
> To: Zhang, Helin; Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim;
> nhorman@tuxdriver.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> On 06/12/2015 10:43 AM, Zhang, Helin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> >> Sent: Friday, June 12, 2015 3:24 PM
> >> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
> >> nhorman@tuxdriver.com
> >> Cc: dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> >> in rte_mbuf
> >>
> >> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> >>> 2015-06-10 16:32, Olivier MATZ:
> >>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>>>>>> In order to unify the packet type, the field of 'packet_type' in
> >>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized
> >>>>>>> to support this change for Vector PMD. As 'struct rte_kni_mbuf'
> >>>>>>> for KNI should be right mapped to 'struct rte_mbuf', it should
> >>>>>>> be modified accordingly. In addition, Vector PMD of ixgbe is
> >>>>>>> disabled by default, as 'struct rte_mbuf' changed.
> >>>>>>> To avoid breaking ABI compatibility, all the changes would be
> >>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>>>>>
> >>>>>> What are the plans for this compile-time option in the future?
> >>>>>>
> >>>>>> I wonder what are the benefits of having this option in terms of
> >>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
> >>>>>> the packet-type feature is not present, and when it is enabled we
> >>>>>> have the feature but it breaks the compatibility.
> >>>>>>
> >>>>>> In my opinion, the v5 is preferable: for this kind of features, I
> >>>>>> don't see how the ABI can be preserved, and I think packet-type
> >>>>>> won't be the only feature that will modify the mbuf structure. I
> >>>>>> think the process described here should be applied:
> >>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>>>>>
> >>>>>> (starting from "Some ABI changes may be too significant to
> >>>>>> reasonably maintain multiple versions of").
> >>>>>
> >>>>> This is just like the change that Steve (Cunming) Liang submitted
> >>>>> for Interrupt Mode. We have the same problem in both cases: we
> >>>>> want to find a way to get the features included, but need to
> >>>>> comply with our ABI policy. So, in both cases, the proposal is to
> >>>>> add a config option to enable the change by default, so we
> >>>>> maintain backward
> >> compatibility.
> >>>>> Users that want these changes, and are willing to accept the
> >>>>> associated ABI change, have to specifically enable them.
> >>>>>
> >>>>> We can note in the Deprecation Notices in the Release Notes for
> >>>>> 2.1 that these config options will be removed in 2.2. The features
> >>>>> will then be enabled by default.
> >>>>>
> >>>>> This seems like a good compromise which allows us to get these
> >>>>> changes into 2.1 but avoids breaking the ABI policy.
> >>>>
> >>>> Sorry for the late answer.
> >>>>
> >>>> After some thoughts on this topic, I understand that having a
> >>>> compile-time option is perhaps a good compromise between keeping
> >>>> compatibility and having new features earlier.
> >>>>
> >>>> I'm just afraid about having one #ifdef in the code for each new
> >>>> feature that cannot keep the ABI compatibility.
> >>>> What do you think about having one option -- let's call it
> >>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that
> >>>> would surround any new feature that breaks the ABI?
> >>>>
> >>>> This would have several advantages:
> >>>> - only 2 cases (on or off), the combinatorial is smaller than
> >>>>      having one option per feature
> >>>> - all next features breaking the abi can be identified by a grep
> >>>> - the code inside the #ifdef can be enabled in a simple operation
> >>>>      by Thomas after each release.
> >>>>
> >>>> Thomas, any comment?
> >>>
> >>> As previously discussed (1to1) with Olivier, I think that's a good
> >>> proposal to introduce changes breaking deeply the ABI.
> >>>
> >>> Let's sum up the current policy:
> >>> 1/ For changes which have a limited impact on the ABI, the backward
> >>> compatibility must be kept during 1 release including the notice in
> >> doc/guides/rel_notes/abi.rst.
> >>> 2/ For important changes like mbuf rework, there was an agreement on
> >>> skipping the backward compatibility after having 3 acknowledgements
> >>> and an
> >> 1-release long notice.
> >>> Then the ABI numbering must be incremented.
> >>>
> >>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the
> >>> second
> >> case.
> >>> In order to be adopted, a patch for the file
> >>> doc/guides/rel_notes/abi.rst must be submitted and strongly
> acknowledged.
> >>>
> >>> The ABI numbering must be also clearly explained:
> >>> 1/ Should we have different libraries version number depending of
> >> CONFIG_RTE_NEXT_ABI?
> >>> It seems straightforward to use "ifeq" when LIBABIVER in the
> >>> Makefiles
> >>
> >> An incompatible ABI must be reflected by a soname change, otherwise
> >> the whole library versioning is irrelevant.
> >>
> >>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
> >> the .map files?
> >>> Maybe we should remove these files and generate them with some
> >> preprocessing.
> >>>
> >>> Neil, as the ABI policy author, what is your opinion?
> >>
> >> I'm not Neil but my 5c...
> >>
> >> Working around ABI compatibility policy via config options seems like
> >> a slippery slope. Going forward this will likely mean there are
> >> always two different ABIs for any given version, and the thought of
> >> keeping track of it all in a truly compatible manner makes my head hurt.
> >>
> >> That said its easy to understand the desire to move faster than the
> >> ABI policy allows. In a project where so many structs are in the open
> >> it gets hard to do much anything at all without breaking the ABI.
> >>
> >> The issue could be mitigated somewhat by reserving some space at the
> >> end of the structs eg when the ABI needs to be changed anyway, but it
> >> has obvious downsides as well. The other options I see tend to
> >> revolve around changing release policies one way or the other:
> >> releasing ABI compatible micro versions between minor versions and
> >> relaxing the ABI policy a bit, or just releasing new minor versions more often
> than the current cycle.
> >>
> >> 	- Panu -
> >
> > Does it mean releasing R2.01 right now with announcement of all ABI
> > changes, which based on R2.0 first, and then releasing R2.1 several weeks later
> with all the code changes?
> 
> Something like that, but I'd think its too late for any big release model / policy
> changes for this particular cycle.
> 
> I also do not want to undermine the ABI policy we just got in place, but since
> people are actively looking for ways to work around it anyway its better to map
> out all the possibilities. One of them is committing to longer term maintenance of
> releases (via ABI compatible micro version updates), another one is shortening
> the cycles. Both achieve roughly the same goals with differences in emphasis
> perhaps, but more releases requires more resources on maintaining, testing etc
> so...
R2.01 could just contain the same code as R2.0, with an additional ABI announcement.
Then nothing new would need to be tested.

- Helin

> 
> 	- Panu -

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  7:43  3%               ` Zhang, Helin
@ 2015-06-12  8:15  4%                 ` Panu Matilainen
  2015-06-12  8:28  3%                   ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-12  8:15 UTC (permalink / raw)
  To: Zhang, Helin, Thomas Monjalon, Olivier MATZ, O'Driscoll, Tim,
	nhorman
  Cc: dev

On 06/12/2015 10:43 AM, Zhang, Helin wrote:
>
>
>> -----Original Message-----
>> From: Panu Matilainen [mailto:pmatilai@redhat.com]
>> Sent: Friday, June 12, 2015 3:24 PM
>> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
>> nhorman@tuxdriver.com
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
>> rte_mbuf
>>
>> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
>>> 2015-06-10 16:32, Olivier MATZ:
>>>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>>>>>> In order to unify the packet type, the field of 'packet_type' in
>>>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
>>>>>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
>>>>>>> KNI should be right mapped to 'struct rte_mbuf', it should be
>>>>>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
>>>>>>> by default, as 'struct rte_mbuf' changed.
>>>>>>> To avoid breaking ABI compatibility, all the changes would be
>>>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>>>>>
>>>>>> What are the plans for this compile-time option in the future?
>>>>>>
>>>>>> I wonder what are the benefits of having this option in terms of
>>>>>> ABI compatibility: when it is disabled, it is ABI-compatible but
>>>>>> the packet-type feature is not present, and when it is enabled we
>>>>>> have the feature but it breaks the compatibility.
>>>>>>
>>>>>> In my opinion, the v5 is preferable: for this kind of features, I
>>>>>> don't see how the ABI can be preserved, and I think packet-type
>>>>>> won't be the only feature that will modify the mbuf structure. I
>>>>>> think the process described here should be applied:
>>>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>>>>>
>>>>>> (starting from "Some ABI changes may be too significant to
>>>>>> reasonably maintain multiple versions of").
>>>>>
>>>>> This is just like the change that Steve (Cunming) Liang submitted
>>>>> for Interrupt Mode. We have the same problem in both cases: we want
>>>>> to find a way to get the features included, but need to comply with
>>>>> our ABI policy. So, in both cases, the proposal is to add a config
>>>>> option to enable the change by default, so we maintain backward
>> compatibility.
>>>>> Users that want these changes, and are willing to accept the
>>>>> associated ABI change, have to specifically enable them.
>>>>>
>>>>> We can note in the Deprecation Notices in the Release Notes for 2.1
>>>>> that these config options will be removed in 2.2. The features will
>>>>> then be enabled by default.
>>>>>
>>>>> This seems like a good compromise which allows us to get these
>>>>> changes into 2.1 but avoids breaking the ABI policy.
>>>>
>>>> Sorry for the late answer.
>>>>
>>>> After some thoughts on this topic, I understand that having a
>>>> compile-time option is perhaps a good compromise between keeping
>>>> compatibility and having new features earlier.
>>>>
>>>> I'm just afraid about having one #ifdef in the code for each new
>>>> feature that cannot keep the ABI compatibility.
>>>> What do you think about having one option -- let's call it
>>>> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would
>>>> surround any new feature that breaks the ABI?
>>>>
>>>> This would have several advantages:
>>>> - only 2 cases (on or off), the combinatorial is smaller than
>>>>      having one option per feature
>>>> - all next features breaking the abi can be identified by a grep
>>>> - the code inside the #ifdef can be enabled in a simple operation
>>>>      by Thomas after each release.
>>>>
>>>> Thomas, any comment?
>>>
>>> As previously discussed (1to1) with Olivier, I think that's a good
>>> proposal to introduce changes breaking deeply the ABI.
>>>
>>> Let's sum up the current policy:
>>> 1/ For changes which have a limited impact on the ABI, the backward
>>> compatibility must be kept during 1 release including the notice in
>> doc/guides/rel_notes/abi.rst.
>>> 2/ For important changes like mbuf rework, there was an agreement on
>>> skipping the backward compatibility after having 3 acknowledgements and an
>> 1-release long notice.
>>> Then the ABI numbering must be incremented.
>>>
>>> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second
>> case.
>>> In order to be adopted, a patch for the file
>>> doc/guides/rel_notes/abi.rst must be submitted and strongly acknowledged.
>>>
>>> The ABI numbering must be also clearly explained:
>>> 1/ Should we have different libraries version number depending of
>> CONFIG_RTE_NEXT_ABI?
>>> It seems straightforward to use "ifeq" when LIBABIVER in the Makefiles
>>
>> An incompatible ABI must be reflected by a soname change, otherwise the
>> whole library versioning is irrelevant.
>>
>>> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
>> the .map files?
>>> Maybe we should remove these files and generate them with some
>> preprocessing.
>>>
>>> Neil, as the ABI policy author, what is your opinion?
>>
>> I'm not Neil but my 5c...
>>
>> Working around ABI compatibility policy via config options seems like a slippery
>> slope. Going forward this will likely mean there are always two different ABIs for
>> any given version, and the thought of keeping track of it all in a truly compatible
>> manner makes my head hurt.
>>
>> That said its easy to understand the desire to move faster than the ABI policy
>> allows. In a project where so many structs are in the open it gets hard to do much
>> anything at all without breaking the ABI.
>>
>> The issue could be mitigated somewhat by reserving some space at the end of
>> the structs eg when the ABI needs to be changed anyway, but it has obvious
>> downsides as well. The other options I see tend to revolve around changing
>> release policies one way or the other: releasing ABI compatible micro versions
>> between minor versions and relaxing the ABI policy a bit, or just releasing new
>> minor versions more often than the current cycle.
>>
>> 	- Panu -
>
> Does it mean releasing R2.01 right now with announcement of all ABI changes, which
> based on R2.0 first, and then releasing R2.1 several weeks later with all the code changes?

Something like that, but I'd think it's too late for any big release 
model / policy changes for this particular cycle.

I also do not want to undermine the ABI policy we just got in place, but 
since people are actively looking for ways to work around it anyway, it's 
better to map out all the possibilities. One of them is committing to 
longer term maintenance of releases (via ABI compatible micro version 
updates); another is shortening the cycles. Both achieve roughly the 
same goals with differences in emphasis perhaps, but more releases 
require more resources for maintaining, testing, etc., so...

	- Panu -

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-12  7:24  5%             ` Panu Matilainen
@ 2015-06-12  7:43  3%               ` Zhang, Helin
  2015-06-12  8:15  4%                 ` Panu Matilainen
  0 siblings, 1 reply; 200+ results
From: Zhang, Helin @ 2015-06-12  7:43 UTC (permalink / raw)
  To: Panu Matilainen, Thomas Monjalon, Olivier MATZ, O'Driscoll,
	Tim, nhorman
  Cc: dev



> -----Original Message-----
> From: Panu Matilainen [mailto:pmatilai@redhat.com]
> Sent: Friday, June 12, 2015 3:24 PM
> To: Thomas Monjalon; Olivier MATZ; O'Driscoll, Tim; Zhang, Helin;
> nhorman@tuxdriver.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> > 2015-06-10 16:32, Olivier MATZ:
> >> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>>>> In order to unify the packet type, the field of 'packet_type' in
> >>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> >>>>> KNI should be right mapped to 'struct rte_mbuf', it should be
> >>>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> >>>>> by default, as 'struct rte_mbuf' changed.
> >>>>> To avoid breaking ABI compatibility, all the changes would be
> >>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>>>
> >>>> What are the plans for this compile-time option in the future?
> >>>>
> >>>> I wonder what are the benefits of having this option in terms of
> >>>> ABI compatibility: when it is disabled, it is ABI-compatible but
> >>>> the packet-type feature is not present, and when it is enabled we
> >>>> have the feature but it breaks the compatibility.
> >>>>
> >>>> In my opinion, the v5 is preferable: for this kind of features, I
> >>>> don't see how the ABI can be preserved, and I think packet-type
> >>>> won't be the only feature that will modify the mbuf structure. I
> >>>> think the process described here should be applied:
> >>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>>>
> >>>> (starting from "Some ABI changes may be too significant to
> >>>> reasonably maintain multiple versions of").
> >>>
> >>> This is just like the change that Steve (Cunming) Liang submitted
> >>> for Interrupt Mode. We have the same problem in both cases: we want
> >>> to find a way to get the features included, but need to comply with
> >>> our ABI policy. So, in both cases, the proposal is to add a config
> >>> option to enable the change by default, so we maintain backward
> compatibility.
> >>> Users that want these changes, and are willing to accept the
> >>> associated ABI change, have to specifically enable them.
> >>>
> >>> We can note in the Deprecation Notices in the Release Notes for 2.1
> >>> that these config options will be removed in 2.2. The features will
> >>> then be enabled by default.
> >>>
> >>> This seems like a good compromise which allows us to get these
> >>> changes into 2.1 but avoids breaking the ABI policy.
> >>
> >> Sorry for the late answer.
> >>
> >> After some thoughts on this topic, I understand that having a
> >> compile-time option is perhaps a good compromise between keeping
> >> compatibility and having new features earlier.
> >>
> >> I'm just afraid about having one #ifdef in the code for each new
> >> feature that cannot keep the ABI compatibility.
> >> What do you think about having one option -- let's call it
> >> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would
> >> surround any new feature that breaks the ABI?
> >>
> >> This would have several advantages:
> >> - only 2 cases (on or off), the combinatorial is smaller than
> >>     having one option per feature
> >> - all next features breaking the abi can be identified by a grep
> >> - the code inside the #ifdef can be enabled in a simple operation
> >>     by Thomas after each release.
> >>
> >> Thomas, any comment?
> >
> > As previously discussed (1to1) with Olivier, I think that's a good
> > proposal to introduce changes breaking deeply the ABI.
> >
> > Let's sum up the current policy:
> > 1/ For changes which have a limited impact on the ABI, the backward
> > compatibility must be kept during 1 release including the notice in
> doc/guides/rel_notes/abi.rst.
> > 2/ For important changes like mbuf rework, there was an agreement on
> > skipping the backward compatibility after having 3 acknowledgements and an
> 1-release long notice.
> > Then the ABI numbering must be incremented.
> >
> > This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second
> case.
> > In order to be adopted, a patch for the file
> > doc/guides/rel_notes/abi.rst must be submitted and strongly acknowledged.
> >
> > The ABI numbering must be also clearly explained:
> > 1/ Should we have different libraries version number depending of
> CONFIG_RTE_NEXT_ABI?
> > It seems straightforward to use "ifeq" when LIBABIVER in the Makefiles
> 
> An incompatible ABI must be reflected by a soname change, otherwise the
> whole library versioning is irrelevant.
> 
> > 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in
> the .map files?
> > Maybe we should remove these files and generate them with some
> preprocessing.
> >
> > Neil, as the ABI policy author, what is your opinion?
> 
> I'm not Neil but my 5c...
> 
> Working around ABI compatibility policy via config options seems like a slippery
> slope. Going forward this will likely mean there are always two different ABIs for
> any given version, and the thought of keeping track of it all in a truly compatible
> manner makes my head hurt.
> 
> That said its easy to understand the desire to move faster than the ABI policy
> allows. In a project where so many structs are in the open it gets hard to do much
> anything at all without breaking the ABI.
> 
> The issue could be mitigated somewhat by reserving some space at the end of
> the structs eg when the ABI needs to be changed anyway, but it has obvious
> downsides as well. The other options I see tend to revolve around changing
> release policies one way or the other: releasing ABI compatible micro versions
> between minor versions and relaxing the ABI policy a bit, or just releasing new
> minor versions more often than the current cycle.
> 
> 	- Panu -

Does it mean releasing R2.01 right now with an announcement of all ABI changes,
based on R2.0 first, and then releasing R2.1 several weeks later with all the code changes?

- Helin
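The "ifeq" on LIBABIVER raised in the quoted discussion could look roughly like the following Makefile fragment. This is a hypothetical sketch of the idea only, not an actual DPDK Makefile, and the version numbers are invented:

```makefile
# Hypothetical: bump the library ABI version only when the
# next-ABI features are compiled in.
ifeq ($(CONFIG_RTE_NEXT_ABI),y)
LIBABIVER := 2
else
LIBABIVER := 1
endif
```

As Panu notes above, the soname would then differ between the two builds, which is exactly why carrying two ABIs per release is hard to track.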


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 3/6] fm10k: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 2/6] e1000: fill the " Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Helin Zhang
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info' to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/fm10k/fm10k_ethdev.c | 1 +
 1 file changed, 1 insertion(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As the newly added element was put into a padding space, it will not break ABI
  compatibility, and there is no need to disable the code changes by default.

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 87852ed..bd626ce 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -767,6 +767,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 		DEV_RX_OFFLOAD_UDP_CKSUM  |
 		DEV_RX_OFFLOAD_TCP_CKSUM;
 	dev_info->tx_offload_capa    = 0;
+	dev_info->hash_key_size = FM10K_RSSRK_SIZE * sizeof(uint32_t);
 	dev_info->reta_size = FM10K_MAX_RSS_INDICES;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 0/6] query hash key size in byte
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (4 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: show " Helin Zhang
@ 2015-06-12  7:33  4%   ` Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
                       ` (6 more replies)
  5 siblings, 7 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

As different hardware has different hash key sizes, users asked for a way to
query it (in bytes) per port. Otherwise there is no convenient way to know
the size of the hash key that should be prepared.

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* Moved the newly added element right after 'uint16_t reta_size', where there
  was padding. So it will not break any ABI compatibility, and there is no
  need to disable the code changes by default.

Helin Zhang (6):
  ethdev: add an field for querying hash key size
  e1000: fill the hash key size
  fm10k: fill the hash key size
  i40e: fill the hash key size
  ixgbe: fill the hash key size
  app/testpmd: show the hash key size

 app/test-pmd/config.c             | 2 ++
 drivers/net/e1000/igb_ethdev.c    | 3 +++
 drivers/net/fm10k/fm10k_ethdev.c  | 1 +
 drivers/net/i40e/i40e_ethdev.c    | 2 ++
 drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
 drivers/net/ixgbe/ixgbe_ethdev.c  | 3 +++
 lib/librte_ether/rte_ethdev.h     | 1 +
 7 files changed, 14 insertions(+)

-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 6/6] app/testpmd: show the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (4 preceding siblings ...)
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Helin Zhang
@ 2015-06-12  7:34  4%     ` Helin Zhang
  2015-06-12  9:31  0%     ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Ananyev, Konstantin
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:34 UTC (permalink / raw)
  To: dev

Now that querying the hash key size in bytes is supported, it can be shown
in testpmd after getting the device information, if non-zero.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pmd/config.c | 2 ++
 1 file changed, 2 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As the newly added element was put into a padding space, it will not break ABI
  compatibility, and there is no need to disable the code changes by default.

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f788ed5..800756f 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -361,6 +361,8 @@ port_infos_display(portid_t port_id)
 
 	memset(&dev_info, 0, sizeof(dev_info));
 	rte_eth_dev_info_get(port_id, &dev_info);
+	if (dev_info.hash_key_size > 0)
+		printf("Hash key size in bytes: %u\n", dev_info.hash_key_size);
 	if (dev_info.reta_size > 0)
 		printf("Redirection table size: %u\n", dev_info.reta_size);
 	if (!dev_info.flow_type_rss_offloads)
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 5/6] ixgbe: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (3 preceding siblings ...)
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Helin Zhang
@ 2015-06-12  7:34  4%     ` Helin Zhang
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 6/6] app/testpmd: show " Helin Zhang
  2015-06-12  9:31  0%     ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Ananyev, Konstantin
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:34 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info' to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As the newly added element was put into a padding space, it will not break ABI
  compatibility, and there is no need to disable the code changes by default.

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..588ccc0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -116,6 +116,8 @@
 
 #define IXGBE_QUEUE_STAT_COUNTERS (sizeof(hw_stats->qprc) / sizeof(hw_stats->qprc[0]))
 
+#define IXGBE_HKEY_MAX_INDEX 10
+
 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
@@ -2052,6 +2054,7 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
 				ETH_TXQ_FLAGS_NOOFFLOADS,
 	};
+	dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
 }
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 4/6] i40e: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
                       ` (2 preceding siblings ...)
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Helin Zhang
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info' to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 2 ++
 drivers/net/i40e/i40e_ethdev_vf.c | 2 ++
 2 files changed, 4 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As the newly added element was put into a padding space, it will not break ABI
  compatibility, and there is no need to disable the code changes by default.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..c699ddb 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1540,6 +1540,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_SCTP_CKSUM |
 		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_TCP_TSO;
+	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
 	dev_info->reta_size = pf->hash_lut_size;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 4f4404e..f70d94c 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1670,6 +1670,8 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 2/6] e1000: fill the hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Helin Zhang
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info' to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* As the newly added element was put into a padding space, it will not break ABI
  compatibility, and there is no need to disable the code changes by default.

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..7d388f3 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -68,6 +68,8 @@
 #define IGB_DEFAULT_TX_HTHRESH      0
 #define IGB_DEFAULT_TX_WTHRESH      0
 
+#define IGB_HKEY_MAX_INDEX 10
+
 /* Bit shift and mask */
 #define IGB_4_BIT_WIDTH  (CHAR_BIT / 2)
 #define IGB_4_BIT_MASK   RTE_LEN2MASK(IGB_4_BIT_WIDTH, uint8_t)
@@ -1377,6 +1379,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		/* Should not happen */
 		break;
 	}
+	dev_info->hash_key_size = IGB_HKEY_MAX_INDEX * sizeof(uint32_t);
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IGB_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
@ 2015-06-12  7:33  4%     ` Helin Zhang
  2015-06-15 15:01  3%       ` Thomas Monjalon
  2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 2/6] e1000: fill the " Helin Zhang
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 200+ results
From: Helin Zhang @ 2015-06-12  7:33 UTC (permalink / raw)
  To: dev

To support querying the hash key size per port, a new field 'hash_key_size'
was added in 'struct rte_eth_dev_info' for storing the hash key size in
bytes.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 1 +
 1 file changed, 1 insertion(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

v3 changes:
* Moved the newly added element right after 'uint16_t reta_size', where there
  was padding. So it will not break any ABI compatibility, and there is no
  need to disable it by default.

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..bce152d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -918,6 +918,7 @@ struct rte_eth_dev_info {
 	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
 	uint16_t reta_size;
 	/**< Device redirection table size, the total number of entries. */
+	uint8_t hash_key_size; /**< Hash key size in bytes */
 	/** Bit mask of RSS offloads, the bit offset also means flow type */
 	uint64_t flow_type_rss_offloads;
 	struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */
-- 
1.9.3
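With this new field, an application can discover the key length before preparing an RSS key. The fragment below is a minimal usage sketch, not part of the patch; it assumes DPDK headers and an already-configured port, so it is illustrative rather than standalone:

```c
#include <string.h>
#include <rte_ethdev.h>

/* Sketch: query the RSS hash key size for a port before building a key. */
static int
get_rss_key_len(uint8_t port_id, uint8_t *key_len)
{
	struct rte_eth_dev_info dev_info;

	memset(&dev_info, 0, sizeof(dev_info));
	rte_eth_dev_info_get(port_id, &dev_info);

	/* Drivers that do not fill the new field leave it at 0. */
	if (dev_info.hash_key_size == 0)
		return -1; /* size unknown for this port */

	*key_len = dev_info.hash_key_size;
	return 0;
}
```

The returned length can then be used, for example, to size the 'rss_key' buffer in 'struct rte_eth_rss_conf' before updating the key.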

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 16:14  5%           ` Thomas Monjalon
@ 2015-06-12  7:24  5%             ` Panu Matilainen
  2015-06-12  7:43  3%               ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-12  7:24 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier MATZ, O'Driscoll, Tim, Zhang, Helin,
	nhorman
  Cc: dev

On 06/10/2015 07:14 PM, Thomas Monjalon wrote:
> 2015-06-10 16:32, Olivier MATZ:
>> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>>>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>>>> In order to unify the packet type, the field of 'packet_type' in
>>>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
>>>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
>>>>> KNI should be right mapped to 'struct rte_mbuf', it should be
>>>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
>>>>> by default, as 'struct rte_mbuf' changed.
>>>>> To avoid breaking ABI compatibility, all the changes would be
>>>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>>>
>>>> What are the plans for this compile-time option in the future?
>>>>
>>>> I wonder what are the benefits of having this option in terms
>>>> of ABI compatibility: when it is disabled, it is ABI-compatible but
>>>> the packet-type feature is not present, and when it is enabled we
>>>> have the feature but it breaks the compatibility.
>>>>
>>>> In my opinion, the v5 is preferable: for this kind of features, I
>>>> don't see how the ABI can be preserved, and I think packet-type
>>>> won't be the only feature that will modify the mbuf structure. I think
>>>> the process described here should be applied:
>>>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>>>
>>>> (starting from "Some ABI changes may be too significant to reasonably
>>>> maintain multiple versions of").
>>>
>>> This is just like the change that Steve (Cunming) Liang submitted for
>>> Interrupt Mode. We have the same problem in both cases: we want to find
>>> a way to get the features included, but need to comply with our ABI
>>> policy. So, in both cases, the proposal is to add a config option to
>>> enable the change by default, so we maintain backward compatibility.
>>> Users that want these changes, and are willing to accept the
>>> associated ABI change, have to specifically enable them.
>>>
>>> We can note in the Deprecation Notices in the Release Notes for 2.1
>>> that these config options will be removed in 2.2. The features will
>>> then be enabled by default.
>>>
>>> This seems like a good compromise which allows us to get these changes
>>> into 2.1 but avoids breaking the ABI policy.
>>
>> Sorry for the late answer.
>>
>> After some thoughts on this topic, I understand that having a
>> compile-time option is perhaps a good compromise between
>> keeping compatibility and having new features earlier.
>>
>> I'm just afraid about having one #ifdef in the code for
>> each new feature that cannot keep the ABI compatibility.
>> What do you think about having one option -- let's call
>> it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
>> and that would surround any new feature that breaks the
>> ABI?
>>
>> This would have several advantages:
>> - only 2 cases (on or off), the combinatorial is smaller than
>>     having one option per feature
>> - all next features breaking the abi can be identified by a grep
>> - the code inside the #ifdef can be enabled in a simple operation
>>     by Thomas after each release.
>>
>> Thomas, any comment?
>
> As previously discussed (1to1) with Olivier, I think that's a good proposal
> to introduce changes breaking deeply the ABI.
>
> Let's sum up the current policy:
> 1/ For changes which have a limited impact on the ABI, the backward compatibility
> must be kept during 1 release including the notice in doc/guides/rel_notes/abi.rst.
> 2/ For important changes like the mbuf rework, there was an agreement on skipping the
> backward compatibility after having 3 acknowledgements and a 1-release-long notice.
> Then the ABI numbering must be incremented.
>
> This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second case.
> In order to be adopted, a patch for the file doc/guides/rel_notes/abi.rst must
> be submitted and strongly acknowledged.
>
> The ABI numbering must be also clearly explained:
> 1/ Should we have different libraries version number depending of CONFIG_RTE_NEXT_ABI?
It seems straightforward to use "ifeq" on LIBABIVER in the Makefiles

An incompatible ABI must be reflected by a soname change, otherwise the 
whole library versioning is irrelevant.

> 2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in the .map files?
> Maybe we should remove these files and generate them with some preprocessing.
>
> Neil, as the ABI policy author, what is your opinion?

I'm not Neil but my 5c...

Working around ABI compatibility policy via config options seems like a 
slippery slope. Going forward this will likely mean there are always two 
different ABIs for any given version, and the thought of keeping track 
of it all in a truly compatible manner makes my head hurt.

That said, it's easy to understand the desire to move faster than the ABI 
policy allows. In a project where so many structs are in the open it 
gets hard to do much of anything at all without breaking the ABI.

The issue could be mitigated somewhat by reserving some space at the end 
of the structs, e.g. when the ABI needs to be changed anyway, but it has 
obvious downsides as well. The other options I see tend to revolve 
around changing release policies one way or the other: releasing ABI 
compatible micro versions between minor versions and relaxing the ABI 
policy a bit, or just releasing new minor versions more often than the 
current cycle.

	- Panu -
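The "reserve some space at the end of the structs" idea mentioned above can be sketched as follows (struct and field names are invented for illustration): a few spare bytes are paid for once when the ABI must break anyway, and later releases carve fields out of them without moving anything.

```c
#include <assert.h>
#include <stdint.h>

/* Release N: ABI breaks once, and some spare room is added. */
struct conf_v1 {
    uint32_t flags;
    uint8_t  reserved[8];   /* spare room, must be zeroed by callers */
};

/* Release N+1: a new field is carved out of the reserve, so the struct
 * size and the offsets of all pre-existing members stay the same. */
struct conf_v2 {
    uint32_t flags;
    uint8_t  new_feature;   /* taken from the reserve */
    uint8_t  reserved[7];
};

_Static_assert(sizeof(struct conf_v1) == sizeof(struct conf_v2),
               "the reserve keeps the size stable");
```

The obvious downside noted above remains: the reserve is a guess, wastes memory when unused, and only works for fields that fit in it.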


* Re: [dpdk-dev] [PATCH 2/3] kni: remove deprecated functions
  @ 2015-06-12  6:20  3%   ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-12  6:20 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger

On 06/12/2015 08:18 AM, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> These functions were tagged as deprecated in 2.0 so they can be
> removed in 2.1
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   app/test/Makefile        |  6 ------
>   app/test/test_kni.c      | 36 --------------------------------
>   lib/librte_kni/rte_kni.c | 50 --------------------------------------------
>   lib/librte_kni/rte_kni.h | 54 ------------------------------------------------
>   4 files changed, 146 deletions(-)
>
[...]
> diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
> index 603e2cd..f65ce24 100644
> --- a/lib/librte_kni/rte_kni.h
> +++ b/lib/librte_kni/rte_kni.h
> @@ -129,30 +129,6 @@ extern struct rte_kni *rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
>   				     struct rte_kni_ops *ops);
>
>   /**
> - * It create a KNI device for specific port.
> - *
> - * Note: It is deprecated and just for backward compatibility.
> - *
> - * @param port_id
> - *  Port ID.
> - * @param mbuf_size
> - *  mbuf size.
> - * @param pktmbuf_pool
> - *  The mempool for allocting mbufs for packets.
> - * @param ops
> - *  The pointer to the callbacks for the KNI kernel requests.
> - *
> - * @return
> - *  - The pointer to the context of a KNI interface.
> - *  - NULL indicate error.
> - */
> -extern struct rte_kni *rte_kni_create(uint8_t port_id,
> -				      unsigned mbuf_size,
> -				      struct rte_mempool *pktmbuf_pool,
> -				      struct rte_kni_ops *ops) \
> -				      __attribute__ ((deprecated));
> -
> -/**
>    * Release KNI interface according to the context. It will also release the
>    * paired KNI interface in kernel space. All processing on the specific KNI
>    * context need to be stopped before calling this interface.
> @@ -221,21 +197,6 @@ extern unsigned rte_kni_tx_burst(struct rte_kni *kni,
>   		struct rte_mbuf **mbufs, unsigned num);
>
>   /**
> - * Get the port id from KNI interface.
> - *
> - * Note: It is deprecated and just for backward compatibility.
> - *
> - * @param kni
> - *  The KNI interface context.
> - *
> - * @return
> - *  On success: The port id.
> - *  On failure: ~0x0
> - */
> -extern uint8_t rte_kni_get_port_id(struct rte_kni *kni) \
> -				__attribute__ ((deprecated));
> -
> -/**
>    * Get the KNI context of its name.
>    *
>    * @param name
> @@ -248,21 +209,6 @@ extern uint8_t rte_kni_get_port_id(struct rte_kni *kni) \
>   extern struct rte_kni *rte_kni_get(const char *name);
>
>   /**
> - * Get the KNI context of the specific port.
> - *
> - * Note: It is deprecated and just for backward compatibility.
> - *
> - * @param port_id
> - *  the port id.
> - *
> - * @return
> - *  On success: Pointer to KNI interface.
> - *  On failure: NULL
> - */
> -extern struct rte_kni *rte_kni_info_get(uint8_t port_id) \
> -				__attribute__ ((deprecated));
> -
> -/**
>    * Register KNI request handling for a specified port,and it can
>    * be called by master process or slave process.
>    *
>

These symbols need to be removed from rte_kni_version.map too, and since 
it's an ABI break, the library soname needs a bump as well.

	- Panu -
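The deprecation cycle being completed here can be sketched in miniature (names below are invented, not the real KNI API): release N keeps the old symbol but tags it so callers get a compile-time warning, and release N+1 deletes it, drops it from the version map, and bumps LIBABIVER/soname.

```c
#include <assert.h>
#include <stdint.h>

struct kni { uint8_t port_id; };

static struct kni the_kni;

/* Release N: kept only for backward compatibility; any call site gets a
 * -Wdeprecated-declarations warning nudging users to migrate. */
__attribute__((deprecated))
struct kni *old_create(uint8_t port_id)
{
    the_kni.port_id = port_id;
    return &the_kni;
}

/* The replacement API that callers migrate to; in release N+1 only this
 * one survives (and the exported symbol list / soname change with it). */
struct kni *new_alloc(uint8_t port_id)
{
    the_kni.port_id = port_id;
    return &the_kni;
}
```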


* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
  2015-06-04 10:38  3%     ` Ananyev, Konstantin
@ 2015-06-12  6:06  0%       ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-12  6:06 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, June 4, 2015 6:38 PM
> To: Zhang, Helin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash
> key size
> 
> Hi Helin,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Helin Zhang
> > Sent: Thursday, June 04, 2015 8:34 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying
> > hash key size
> >
> > To support querying hash key size per port, an new field of
> > 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> > hash key size in bytes.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > ---
> >  lib/librte_ether/rte_ethdev.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > v2 changes:
> > * Disabled the code changes by default, to avoid breaking ABI compatibility.
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 16dbe00..bdebc87 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -916,6 +916,9 @@ struct rte_eth_dev_info {
> >  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> >  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> >  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> > +#ifdef RTE_QUERY_HASH_KEY_SIZE
> > +	uint8_t hash_key_size; /**< Hash key size in bytes */ #endif
> >  	uint16_t reta_size;
> >  	/**< Device redirection table size, the total number of entries. */
> >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> 
> Why do you need to introduce an #ifdef RTE_QUERY_HASH_KEY_SIZE around
> your code?
> Why not to have it always on?
> Is it because of not breaking ABI for 2.1?
> But here, I suppose there would be no breakage anyway:
> 
> struct rte_eth_dev_info {
> ...
>         uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
>         uint16_t reta_size;
>         /**< Device redirection table size, the total number of entries. */
>         /** Bit mask of RSS offloads, the bit offset also means flow type */
>         uint64_t flow_type_rss_offloads;
>         struct rte_eth_rxconf default_rxconf;
> 
> 
> so between 'reta_size' and 'flow_type_rss_offloads', there is a 2 bytes gap.
> Wonder, why not put it there?
Oh, yes, you are totally right. There should be 2 bytes of padding there.
I will rework it with that. Thanks!

Helin

> 
> Konstantin
> 
> > --
> > 1.9.3


* Re: [dpdk-dev] [PATCH 1/3] rte_ring: remove deprecated functions
  @ 2015-06-12  5:46  3%   ` Panu Matilainen
  2015-06-12 14:00  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Panu Matilainen @ 2015-06-12  5:46 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger

On 06/12/2015 08:18 AM, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> These were deprecated in 2.0 so remove them from 2.1
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   drivers/net/ring/rte_eth_ring.c           | 55 -------------------------------
>   drivers/net/ring/rte_eth_ring_version.map |  2 --
>   2 files changed, 57 deletions(-)
>
[...]
> diff --git a/drivers/net/ring/rte_eth_ring_version.map b/drivers/net/ring/rte_eth_ring_version.map
> index 8ad107d..0875e25 100644
> --- a/drivers/net/ring/rte_eth_ring_version.map
> +++ b/drivers/net/ring/rte_eth_ring_version.map
> @@ -2,8 +2,6 @@ DPDK_2.0 {
>   	global:
>
>   	rte_eth_from_rings;
> -	rte_eth_ring_pair_attach;
> -	rte_eth_ring_pair_create;
>
>   	local: *;
>   };

Removing symbols is an ABI break so it additionally requires a soname 
bump for this library.

In addition, simply due to being the first library to do so, it'll also 
then break the combined shared library as it currently is. Mind you, 
this is not an objection at all, the need to change to a linker script 
approach has always been a matter of time.

	- Panu -


* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 15:39  0%           ` Ananyev, Konstantin
@ 2015-06-12  3:22  0%             ` Zhang, Helin
  0 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-12  3:22 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier MATZ, O'Driscoll, Tim, Thomas Monjalon
  Cc: dev



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, June 10, 2015 11:40 PM
> To: Olivier MATZ; O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> Hi Olivier,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > Sent: Wednesday, June 10, 2015 3:33 PM
> > To: O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> > rte_mbuf
> >
> > Hi Tim, Helin,
> >
> > On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> > >> Sent: Monday, June 1, 2015 9:15 AM
> > >> To: Zhang, Helin; dev@dpdk.org
> > >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> > >> in rte_mbuf
> > >>
> > >> Hi Helin,
> > >>
> > >> +CC Neil
> > >>
> > >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> > >>> In order to unify the packet type, the field of 'packet_type' in
> > >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> > >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> > >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> > >>> KNI should be right mapped to 'struct rte_mbuf', it should be
> > >>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> > >>> by default, as 'struct rte_mbuf' changed.
> > >>> To avoid breaking ABI compatibility, all the changes would be
> > >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> > >>
> > >> What are the plans for this compile-time option in the future?
> > >>
> > >> I wonder what are the benefits of having this option in terms of
> > >> ABI compatibility: when it is disabled, it is ABI-compatible but
> > >> the packet-type feature is not present, and when it is enabled we
> > >> have the feature but it breaks the compatibility.
> > >>
> > >> In my opinion, the v5 is preferable: for this kind of features, I
> > >> don't see how the ABI can be preserved, and I think packet-type
> > >> won't be the only feature that will modify the mbuf structure. I
> > >> think the process described here should be applied:
> > >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> > >>
> > >> (starting from "Some ABI changes may be too significant to
> > >> reasonably maintain multiple versions of").
> > >>
> > >>
> > >> Regards,
> > >> Olivier
> > >>
> > >
> > > This is just like the change that Steve (Cunming) Liang submitted
> > > for Interrupt Mode. We have the same problem in both cases: we
> > want to find a way to get the features included, but need to comply
> > with our ABI policy. So, in both cases, the proposal is to add a
> > config option to enable the change by default, so we maintain backward
> compatibility. Users that want these changes, and are willing to accept the
> associated ABI change, have to specifically enable them.
> > >
> > > We can note in the Deprecation Notices in the Release Notes for 2.1
> > > that these config options will be removed in 2.2. The features
> > will then be enabled by default.
> > >
> > > This seems like a good compromise which allows us to get these changes into
> 2.1 but avoids breaking the ABI policy.
> >
> > Sorry for the late answer.
> >
> > After some thoughts on this topic, I understand that having a
> > compile-time option is perhaps a good compromise between keeping
> > compatibility and having new features earlier.
> >
> > I'm just afraid about having one #ifdef in the code for each new
> > feature that cannot keep the ABI compatibility.
> > What do you think about having one option -- let's call it
> > "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would
> > surround any new feature that breaks the ABI?
> 
> I am not Tim/Helin, but really like that idea :) Konstantin

It seems more guys like Olivier's idea of introducing CONFIG_RTE_NEXT_ABI. Any objections?
If none, I will rework my patches with that.

- Helin

> 
> 
> >
> > This would have several advantages:
> > - only 2 cases (on or off), the combinatorial is smaller than
> >    having one option per feature
> > - all next features breaking the abi can be identified by a grep
> > - the code inside the #ifdef can be enabled in a simple operation
> >    by Thomas after each release.
> >
> > Thomas, any comment?
> >
> > Regards,
> > Olivier
> >
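The single-switch idea under discussion can be sketched like this (the macro name `RTE_NEXT_ABI` and the field below are illustrative stand-ins, not the final option or layout): every ABI-breaking change sits behind the one option, rather than one #ifdef per feature.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mbuf-like struct: when the next-ABI option is off, the
 * old 16-bit packet_type keeps the layout binary-compatible; when it is
 * on, the widened 32-bit type (and the ABI break) is compiled in. */
struct mbuf_sketch {
    uint16_t data_len;
#ifdef RTE_NEXT_ABI
    uint32_t packet_type;   /* widened type, next ABI only */
#else
    uint16_t packet_type;   /* current compatible type */
#endif
};
```

All pending breaks can then be found with a single grep for the option, and enabling them all at the start of a release cycle is one flag flip.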


* Re: [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info
  @ 2015-06-11 14:35  3%       ` Marc Sune
  0 siblings, 0 replies; 200+ results
From: Marc Sune @ 2015-06-11 14:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev



On 11/06/15 11:08, Thomas Monjalon wrote:
> 2015-06-08 10:50, Marc Sune:
>> On 29/05/15 20:23, Thomas Monjalon wrote:
>>> 2015-05-27 11:15, Marc Sune:
>>>> On 27/05/15 06:02, Thomas Monjalon wrote:
>>>>>> +#define ETH_SPEED_CAP_10M_HD	(1 << 0)  /*< 10 Mbps half-duplex> */
>>>>>> +#define ETH_SPEED_CAP_10M_FD	(1 << 1)  /*< 10 Mbps full-duplex> */
>>>>>> +#define ETH_SPEED_CAP_100M_HD	(1 << 2)  /*< 100 Mbps half-duplex> */
>>>>>> +#define ETH_SPEED_CAP_100M_FD	(1 << 3)  /*< 100 Mbps full-duplex> */
>>>>>> +#define ETH_SPEED_CAP_1G	(1 << 4)  /*< 1 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_2_5G	(1 << 5)  /*< 2.5 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_5G	(1 << 6)  /*< 5 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_10G	(1 << 7)  /*< 10 Mbps > */
>>>>>> +#define ETH_SPEED_CAP_20G	(1 << 8)  /*< 20 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_25G	(1 << 9)  /*< 25 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_40G	(1 << 10)  /*< 40 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_50G	(1 << 11)  /*< 50 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_56G	(1 << 12)  /*< 56 Gbps > */
>>>>>> +#define ETH_SPEED_CAP_100G	(1 << 13)  /*< 100 Gbps > */
>>>>> We should note that rte_eth_link is using ETH_LINK_SPEED_* constants
>>>>> which are not some bitmaps so we have to create these new constants.
>>>> Yes, I can add that to the patch description (1/2).
>>>>
>>>>> Furthermore, rte_eth_link.link_speed is an uint16_t so it is limited
>>>>> to 40G. Should we use some constant bitmaps here also?
>>>> I also thought about converting link_speed into a bitmap to unify the
>>>> constants before starting the patch (there is redundancy), but I wanted
>>>> to be minimally invasive; changing link to a bitmap can break existing apps.
>>>>
>>>> I can also merge them if we think is a better idea.
>>> Maybe. Someone against this idea?
>> Me. I tried implementing these unified speed constants, but the problem
>> is that for the capabilities, full-duplex/half-duplex speeds are unrolled
>> (e.g. 100M_HD/100M_FD). There is no generic 100M to set a specific speed,
> Or we can define ETH_SPEED_CAP_100M and ETH_SPEED_CAP_100M_FD.
> Is it possible to have a NIC doing 100M_FD but not 100M_HD?

Did not check in detail, but I guess these are mostly legacy NICs, not 
supported by DPDK anyway, and that it is safe to assume 10M/100M means 
support for both HD/FD.

>
>> so if you want a fixed speed and duplex auto-negotiation with the
>> current set of constants, it would look weird; e.g.
>> link_speed=ETH_SPEED_100M_HD and then set
>> link_duplex=ETH_LINK_AUTONEG_DUPLEX):
>>
>>    232 struct rte_eth_link {
>>    233         uint16_t link_speed;      /**< ETH_LINK_SPEED_[10, 100,
>> 1000, 10000] */
>>    234         uint16_t link_duplex;     /**< ETH_LINK_[HALF_DUPLEX,
>> FULL_DUPLEX] */
>>    235         uint8_t  link_status : 1; /**< 1 -> link up, 0 -> link
>> down */
>>    236 }__attribute__((aligned(8)));     /**< aligned for atomic64
>> read/write */
>>
>> There is another minor point, which is when setting the speed in
>> rte_eth_conf:
>>
>>    840 struct rte_eth_conf {
>>    841         uint16_t link_speed;
>>    842         /**< ETH_LINK_SPEED_10[0|00|000], or 0 for autonegotation */
>>
>> 0 is used for speed auto-negotiation, but 0 is also used in the
>> capabilities bitmap to indicate no PHY_MEDIA (virtual interface). I
>> would have to define something like:
>>
>> 906 #define ETH_SPEED_NOT_PHY   (0)  /*< No phy media > */
>> 907 #define ETH_SPEED_AUTONEG   (0)  /*< Autonegotiate speed > */
> Or something like SPEED_UNDEFINED

Ok. I will prepare the patch and circulate a v3.

After briefly chatting offline with Thomas, it seems I was not clearly 
stating in my original v1 that this patch is targeting DPDK v2.2, due to 
ABI(and API) issues.

It is, and so I will hold v3 until 2.2 window starts, to make it more clear.

thanks
Marc
>
>> And use (only) NOT_PHY for a capabilities and _AUTONEG for rte_eth_conf.
>>
>> The options I see:
>>
>> a) add to the the list of the current speeds generic 10M/100M/1G speeds
>> without HD/FD, and just use these speeds in rte_eth_conf.
>> b) leave them separated.
>>
>> I would vote for b), since the a) is not completely clean.
>> Opinions&other alternatives welcome.
>>
>> Marc
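The capability constants quoted earlier in this thread compose as a plain bitmap, which is what distinguishes them from the old scalar ETH_LINK_SPEED_* values. A sketch (names abbreviated from the proposal, not the final API):

```c
#include <assert.h>
#include <stdint.h>

/* A few of the proposed capability bits, one bit per speed/duplex combo. */
#define SPEED_CAP_100M_FD  (1u << 3)   /* 100 Mbps full-duplex */
#define SPEED_CAP_1G       (1u << 4)   /* 1 Gbps */
#define SPEED_CAP_10G      (1u << 7)   /* 10 Gbps */

/* A device advertises several speeds at once by OR-ing bits together;
 * callers test for support with a simple mask. */
static inline int speed_supported(uint32_t caps, uint32_t speed_bit)
{
    return (caps & speed_bit) != 0;
}
```

This is also why a 0 value needs a single unambiguous meaning: with a bitmap, "no bits set" can only stand for one thing (no PHY, or undefined), not both that and auto-negotiation.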


* Re: [dpdk-dev] [PATCH] doc: guidelines for library statistics
  2015-06-08 14:50  5% [dpdk-dev] [PATCH] doc: guidelines for library statistics Cristian Dumitrescu
@ 2015-06-11 12:05  0% ` Thomas Monjalon
  2015-06-15 21:46  4%   ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-11 12:05 UTC (permalink / raw)
  To: Cristian Dumitrescu; +Cc: dev

Hi Cristian,

Thanks for trying to make a policy clearer.
We need to make a decision in the coming week.
Below are comments on the style and content.

2015-06-08 15:50, Cristian Dumitrescu:
>  doc/guides/guidelines/statistics.rst | 42 ++++++++++++++++++++++++++++++++++++

Maybe we should have a more general file like design.rst.
In order to have a lot of readers of such guidelines, they must be concise.

Please wrap lines so they are not too long, and/or split lines at the end of a sentence.

> +Library Statistics
> +==================
> +
> +Description
> +-----------
> +
> +This document describes the guidelines for DPDK library-level statistics counter support. This includes guidelines for turning library statistics on and off, requirements for preventing ABI changes when library statistics are turned on and off, etc.

Should we consider that driver stats and lib stats are different in DPDK? Why?

> +Motivation to allow the application to turn library statistics on and off
> +-------------------------------------------------------------------------
> +
> +It is highly recommended that each library provides statistics counters to allow the application to monitor the library-level run-time events. Typical counters are: number of packets received/dropped/transmitted, number of buffers allocated/freed, number of occurrences for specific events, etc.
> +
> +Since the resources consumed for library-level statistics counter collection have to be spent out of the application budget and the counters collected by some libraries might not be relevant for the current application, in order to avoid any unwanted waste of resources and/or performance for the application, the application is to decide at build time whether the collection of library-level statistics counters should be turned on or off for each library individually.

It would be good to have acknowledgements or other opinions on this.
Some of them were expressed in other threads. Please comment here.

> +Library-level statistics counters can be relevant or not for specific applications:
> +* For application A, counters maintained by library X are always relevant and the application needs to use them to implement certain features, as traffic accounting, logging, application-level statistics, etc. In this case, the application requires that collection of statistics counters for library X is always turned on;
> +* For application B, counters maintained by library X are only useful during the application debug stage and not relevant once debug phase is over. In this case, the application may decide to turn on the collection of library X statistics counters during the debug phase and later on turn them off;

Users of binary packages do not have this choice.

> +* For application C, counters maintained by library X are not relevant at all. It might be that the application maintains its own set of statistics counters that monitor a different set of run-time events than library X (e.g. number of connection requests, number of active users, etc). It might also be that application uses multiple libraries (library X, library Y, etc) and it is interested in the statistics counters of library Y, but not in those of library X. In this case, the application may decide to turn the collection of statistics counters off for library X and on for library Y.
> +
> +The statistics collection consumes a certain amount of CPU resources (cycles, cache bandwidth, memory bandwidth, etc) that depends on:
> +* Number of libraries used by the current application that have statistics counters collection turned on;
> +* Number of statistics counters maintained by each library per object type instance (e.g. per port, table, pipeline, thread, etc);
> +* Number of instances created for each object type supported by each library;
> +* Complexity of the statistics logic collection for each counter: when only some occurrences of a specific event are valid, several conditional branches might be involved in the decision of whether the current occurrence of the event should be counted or not (e.g. on the event of packet reception, only TCP packets with destination port within a certain range should be recorded), etc.
> +
> +Mechanism to allow the application to turn library statistics on and off
> +------------------------------------------------------------------------
> +
> +Each library that provides statistics counters should provide a single build time flag that decides whether the statistics counter collection is enabled or not for this library. This flag should be exposed as a variable within the DPDK configuration file. When this flag is set, all the counters supported by current library are collected; when this flag is cleared, none of the counters supported by the current library are collected:
> +
> +	#DPDK file “./config/common_linuxapp”, “./config/common_bsdapp”, etc
> +	CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS=y/n

Why not simply CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_STATS (without COLLECT)?

> +The default value for this DPDK configuration file variable (either “yes” or “no”) is left at the decision of each library.
> +
> +Prevention of ABI changes due to library statistics support
> +-----------------------------------------------------------
> +
> +The layout of data structures and prototype of functions that are part of the library API should not be affected by whether the collection of statistics counters is turned on or off for the current library. In practical terms, this means that space is always allocated in the API data structures for statistics counters and the statistics related API functions are always built into the code, regardless of whether the statistics counter collection is turned on or off for the current library.
> +
> +When the collection of statistics counters for the current library is turned off, the counters retrieved through the statistics related API functions should have the default value of zero.
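The pattern described in the quoted guideline — counters always present in the ABI, only their updates compiled out — can be sketched for a hypothetical library "foo" (names and flag below are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* The stats struct is part of the API/ABI and always has the same layout,
 * whether or not collection is enabled at build time. */
struct foo_stats {
    uint64_t pkts_in;   /* always present; stays zero when collection is off */
};

/* Only the update macro is conditional: with the flag off it expands to
 * nothing, so the hot path pays no cost and the ABI is untouched. */
#ifdef RTE_LIBRTE_FOO_STATS
#define FOO_STATS_ADD(s, field, n) ((s)->field += (n))
#else
#define FOO_STATS_ADD(s, field, n) do { } while (0)
#endif
```

With the flag off, a retrieval API returning `struct foo_stats` naturally reports the required default of zero for every counter.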


* Re: [dpdk-dev] [PATCH v2 2/7] lib_vhost: Support multiple queues in virtio dev
  @ 2015-06-11  9:54  5%     ` Panu Matilainen
  0 siblings, 0 replies; 200+ results
From: Panu Matilainen @ 2015-06-11  9:54 UTC (permalink / raw)
  To: Ouyang Changchun, dev

On 06/10/2015 08:52 AM, Ouyang Changchun wrote:
> Each virtio device could have multiple queues, say 2 or 4, at most 8.
> Enabling this feature allows a virtio device/port on the guest to use
> different vCPUs to receive/transmit packets from/to each queue.
>
> In multiple queues mode, virtio device readiness means all queues of
> this virtio device are ready, cleanup/destroy a virtio device also
> requires clearing all queues belong to it.
>
> Changes in v2:
>    - remove the q_num_set api
>    - add the qp_num_get api
>    - determine the queue pair num from qemu message
>    - rework for reset owner message handler
>    - dynamically alloc mem for dev virtqueue
>    - queue pair num could be 0x8000
>    - fix checkpatch errors
>
> Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
> ---
>   lib/librte_vhost/rte_virtio_net.h             |  10 +-
>   lib/librte_vhost/vhost-net.h                  |   1 +
>   lib/librte_vhost/vhost_rxtx.c                 |  32 ++---
>   lib/librte_vhost/vhost_user/vhost-net-user.c  |   4 +-
>   lib/librte_vhost/vhost_user/virtio-net-user.c |  76 +++++++++---
>   lib/librte_vhost/vhost_user/virtio-net-user.h |   2 +
>   lib/librte_vhost/virtio-net.c                 | 161 +++++++++++++++++---------
>   7 files changed, 197 insertions(+), 89 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
> index 5d38185..92b4bfa 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -59,7 +59,6 @@ struct rte_mbuf;
>   /* Backend value set by guest. */
>   #define VIRTIO_DEV_STOPPED -1
>
> -
>   /* Enum for virtqueue management. */
>   enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>
> @@ -96,13 +95,14 @@ struct vhost_virtqueue {
>    * Device structure contains all configuration information relating to the device.
>    */
>   struct virtio_net {
> -	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
>   	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
> +	struct vhost_virtqueue	**virtqueue;    /**< Contains all virtqueue information. */
>   	uint64_t		features;	/**< Negotiated feature set. */
>   	uint64_t		device_fh;	/**< device identifier. */
>   	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
>   #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>   	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
> +	uint32_t                num_virt_queues;
>   	void			*priv;		/**< private context */
>   } __rte_cache_aligned;
>
> @@ -220,4 +220,10 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
>   uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>   	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
>

Unfortunately this is an ABI break, NAK. Ditto for other changes to 
struct virtio_net in patch 3/7 in this series. See 
http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst for the 
ABI policy.

There's plenty of discussion around the ABI going on at the moment, 
including this thread: http://dpdk.org/ml/archives/dev/2015-June/018456.html

	- Panu -

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v3 0/7] support i40e QinQ stripping and insertion
  2015-06-11  7:03  3%   ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
    2015-06-11  7:03  2%     ` [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan stripping and insertion Helin Zhang
@ 2015-06-11  7:25  0%     ` Wu, Jingjing
  2 siblings, 0 replies; 200+ results
From: Wu, Jingjing @ 2015-06-11  7:25 UTC (permalink / raw)
  To: Zhang, Helin, dev

Acked-by: Jingjing Wu <jingjing.wu@intel.com>


> -----Original Message-----
> From: Zhang, Helin
> Sent: Thursday, June 11, 2015 3:04 PM
> To: dev@dpdk.org
> Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; Richardson,
> Bruce; olivier.matz@6wind.com; Zhang, Helin
> Subject: [PATCH v3 0/7] support i40e QinQ stripping and insertion
> 
> As i40e hardware can be reconfigured to support QinQ stripping and insertion,
> this patch set is to enable that together with using the reserved 16 bits in
> 'struct rte_mbuf' for the second vlan tag. Corresponding command is added
> in testpmd for testing.
> Note that there is no need to rework the vPMD, as nothing used in it has changed.
> 
> v2 changes:
> * Added more commit logs describing which commit each fix is for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.
> 
> v3 changes:
> * update documentation (Testpmd Application User Guide).
> 
> Helin Zhang (7):
>   ixgbe: remove a discarded source line
>   mbuf: use the reserved 16 bits for double vlan
>   i40e: support double vlan stripping and insertion
>   i40evf: add supported offload capability flags
>   app/testpmd: add test cases for qinq stripping and insertion
>   examples/ipv4_multicast: support double vlan stripping and insertion
>   doc: update testpmd command
> 
>  app/test-pmd/cmdline.c                      | 78 ++++++++++++++++++++++++---
>  app/test-pmd/config.c                       | 21 +++++++-
>  app/test-pmd/flowgen.c                      |  4 +-
>  app/test-pmd/macfwd.c                       |  3 ++
>  app/test-pmd/macswap.c                      |  3 ++
>  app/test-pmd/rxonly.c                       |  3 ++
>  app/test-pmd/testpmd.h                      |  6 ++-
>  app/test-pmd/txonly.c                       |  8 ++-
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst | 14 ++++-
>  drivers/net/i40e/i40e_ethdev.c              | 52 ++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev_vf.c           | 13 +++++
>  drivers/net/i40e/i40e_rxtx.c                | 81 ++++++++++++++++++-----------
>  drivers/net/ixgbe/ixgbe_rxtx.c              |  1 -
>  examples/ipv4_multicast/main.c              |  1 +
>  lib/librte_ether/rte_ethdev.h               |  2 +
>  lib/librte_mbuf/rte_mbuf.h                  | 10 +++-
>  16 files changed, 255 insertions(+), 45 deletions(-)
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan stripping and insertion
  2015-06-11  7:03  3%   ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
  @ 2015-06-11  7:03  2%     ` Helin Zhang
  2015-06-11  7:25  0%     ` [dpdk-dev] [PATCH v3 0/7] support i40e QinQ " Wu, Jingjing
  2 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-11  7:03 UTC (permalink / raw)
  To: dev

It configures specific registers to enable double vlan stripping
on the RX side and insertion on the TX side.
The RX descriptors will be parsed, and the vlan tags and flags will
be saved to the corresponding mbuf fields if a vlan tag is detected.
The TX descriptors will be configured according to the
configurations in the mbufs, to trigger the hardware insertion of
double vlan tags for each packet sent out.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev_vf.c |  6 +++
 drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
 lib/librte_ether/rte_ethdev.h     |  2 +
 4 files changed, 112 insertions(+), 29 deletions(-)

v2 changes:
* Kept the original RX/TX offload flags as they were, added new
  flags after with new bit masks, for ABI compatibility.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..7593a70 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -211,6 +211,7 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
 				void *arg);
 static void i40e_configure_registers(struct i40e_hw *hw);
 static void i40e_hw_init(struct i40e_hw *hw);
+static int i40e_config_qinq(struct i40e_hw *hw, struct i40e_vsi *vsi);
 
 static const struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
@@ -1529,11 +1530,13 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_vfs = dev->pci_dev->max_vfs;
 	dev_info->rx_offload_capa =
 		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
 		DEV_RX_OFFLOAD_IPV4_CKSUM |
 		DEV_RX_OFFLOAD_UDP_CKSUM |
 		DEV_RX_OFFLOAD_TCP_CKSUM;
 	dev_info->tx_offload_capa =
 		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT |
 		DEV_TX_OFFLOAD_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_UDP_CKSUM |
 		DEV_TX_OFFLOAD_TCP_CKSUM |
@@ -3056,6 +3059,7 @@ i40e_vsi_setup(struct i40e_pf *pf,
 		 * macvlan filter which is expected and cannot be removed.
 		 */
 		i40e_update_default_filter_setting(vsi);
+		i40e_config_qinq(hw, vsi);
 	} else if (type == I40E_VSI_SRIOV) {
 		memset(&ctxt, 0, sizeof(ctxt));
 		/**
@@ -3096,6 +3100,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
 		 * Since VSI is not created yet, only configure parameter,
 		 * will add vsi below.
 		 */
+
+		i40e_config_qinq(hw, vsi);
 	} else if (type == I40E_VSI_VMDQ2) {
 		memset(&ctxt, 0, sizeof(ctxt));
 		/*
@@ -5697,3 +5703,49 @@ i40e_configure_registers(struct i40e_hw *hw)
 			"0x%"PRIx32, reg_table[i].val, reg_table[i].addr);
 	}
 }
+
+#define I40E_VSI_TSR(_i)            (0x00050800 + ((_i) * 4))
+#define I40E_VSI_TSR_QINQ_CONFIG    0xc030
+#define I40E_VSI_L2TAGSTXVALID(_i)  (0x00042800 + ((_i) * 4))
+#define I40E_VSI_L2TAGSTXVALID_QINQ 0xab
+static int
+i40e_config_qinq(struct i40e_hw *hw, struct i40e_vsi *vsi)
+{
+	uint32_t reg;
+	int ret;
+
+	if (vsi->vsi_id >= I40E_MAX_NUM_VSIS) {
+		PMD_DRV_LOG(ERR, "VSI ID exceeds the maximum");
+		return -EINVAL;
+	}
+
+	/* Configure for double VLAN RX stripping */
+	reg = I40E_READ_REG(hw, I40E_VSI_TSR(vsi->vsi_id));
+	if ((reg & I40E_VSI_TSR_QINQ_CONFIG) != I40E_VSI_TSR_QINQ_CONFIG) {
+		reg |= I40E_VSI_TSR_QINQ_CONFIG;
+		ret = i40e_aq_debug_write_register(hw,
+						   I40E_VSI_TSR(vsi->vsi_id),
+						   reg, NULL);
+		if (ret < 0) {
+			PMD_DRV_LOG(ERR, "Failed to update VSI_TSR[%d]",
+				    vsi->vsi_id);
+			return I40E_ERR_CONFIG;
+		}
+	}
+
+	/* Configure for double VLAN TX insertion */
+	reg = I40E_READ_REG(hw, I40E_VSI_L2TAGSTXVALID(vsi->vsi_id));
+	if ((reg & 0xff) != I40E_VSI_L2TAGSTXVALID_QINQ) {
+		reg = I40E_VSI_L2TAGSTXVALID_QINQ;
+		ret = i40e_aq_debug_write_register(hw,
+						   I40E_VSI_L2TAGSTXVALID(
+						   vsi->vsi_id), reg, NULL);
+		if (ret < 0) {
+			PMD_DRV_LOG(ERR, "Failed to update "
+				"VSI_L2TAGSTXVALID[%d]", vsi->vsi_id);
+			return I40E_ERR_CONFIG;
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 4f4404e..3ae2553 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1672,6 +1672,12 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_QINQ_INSERT;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_thresh = {
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 2de0ac4..b2e1d6d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -94,18 +94,44 @@ static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
 
+static inline void
+i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
+{
+	if (rte_le_to_cpu_64(rxdp->wb.qword1.status_error_len) &
+		(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) {
+		mb->ol_flags |= PKT_RX_VLAN_PKT;
+		mb->vlan_tci =
+			rte_le_to_cpu_16(rxdp->wb.qword0.lo_dword.l2tag1);
+		PMD_RX_LOG(DEBUG, "Descriptor l2tag1: %u",
+			   rte_le_to_cpu_16(rxdp->wb.qword0.lo_dword.l2tag1));
+	} else {
+		mb->vlan_tci = 0;
+	}
+#ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+	if (rte_le_to_cpu_16(rxdp->wb.qword2.ext_status) &
+		(1 << I40E_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) {
+		mb->ol_flags |= PKT_RX_QINQ_PKT;
+		mb->vlan_tci_outer = mb->vlan_tci;
+		mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2);
+		PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
+			   rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_1),
+			   rte_le_to_cpu_16(rxdp->wb.qword2.l2tag2_2));
+	} else {
+		mb->vlan_tci_outer = 0;
+	}
+#endif
+	PMD_RX_LOG(DEBUG, "Mbuf vlan_tci: %u, vlan_tci_outer: %u",
+		   mb->vlan_tci, mb->vlan_tci_outer);
+}
+
 /* Translate the rx descriptor status to pkt flags */
 static inline uint64_t
 i40e_rxd_status_to_pkt_flags(uint64_t qword)
 {
 	uint64_t flags;
 
-	/* Check if VLAN packet */
-	flags = qword & (1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-							PKT_RX_VLAN_PKT : 0;
-
 	/* Check if RSS_HASH */
-	flags |= (((qword >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
+	flags = (((qword >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
 					I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
 			I40E_RX_DESC_FLTSTAT_RSS_HASH) ? PKT_RX_RSS_HASH : 0;
 
@@ -696,16 +722,12 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			mb = rxep[j].mbuf;
 			qword1 = rte_le_to_cpu_64(\
 				rxdp[j].wb.qword1.status_error_len);
-			rx_status = (qword1 & I40E_RXD_QW1_STATUS_MASK) >>
-						I40E_RXD_QW1_STATUS_SHIFT;
 			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
 				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
 			mb->data_len = pkt_len;
 			mb->pkt_len = pkt_len;
-			mb->vlan_tci = rx_status &
-				(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-			rte_le_to_cpu_16(\
-				rxdp[j].wb.qword0.lo_dword.l2tag1) : 0;
+			mb->ol_flags = 0;
+			i40e_rxd_to_vlan_tci(mb, &rxdp[j]);
 			pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 			pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -719,7 +741,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 			if (pkt_flags & PKT_RX_FDIR)
 				pkt_flags |= i40e_rxd_build_fdir(&rxdp[j], mb);
 
-			mb->ol_flags = pkt_flags;
+			mb->ol_flags |= pkt_flags;
 		}
 
 		for (j = 0; j < I40E_LOOK_AHEAD; j++)
@@ -945,10 +967,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = rx_packet_len;
 		rxm->data_len = rx_packet_len;
 		rxm->port = rxq->port_id;
-
-		rxm->vlan_tci = rx_status &
-			(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT) ?
-			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
+		rxm->ol_flags = 0;
+		i40e_rxd_to_vlan_tci(rxm, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -960,7 +980,7 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (pkt_flags & PKT_RX_FDIR)
 			pkt_flags |= i40e_rxd_build_fdir(&rxd, rxm);
 
-		rxm->ol_flags = pkt_flags;
+		rxm->ol_flags |= pkt_flags;
 
 		rx_pkts[nb_rx++] = rxm;
 	}
@@ -1105,9 +1125,8 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		}
 
 		first_seg->port = rxq->port_id;
-		first_seg->vlan_tci = (rx_status &
-			(1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
-			rte_le_to_cpu_16(rxd.wb.qword0.lo_dword.l2tag1) : 0;
+		first_seg->ol_flags = 0;
+		i40e_rxd_to_vlan_tci(first_seg, &rxd);
 		pkt_flags = i40e_rxd_status_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_error_to_pkt_flags(qword1);
 		pkt_flags |= i40e_rxd_ptype_to_pkt_flags(qword1);
@@ -1120,7 +1139,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		if (pkt_flags & PKT_RX_FDIR)
 			pkt_flags |= i40e_rxd_build_fdir(&rxd, rxm);
 
-		first_seg->ol_flags = pkt_flags;
+		first_seg->ol_flags |= pkt_flags;
 
 		/* Prefetch data of first segment, if configured to do so. */
 		rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
@@ -1158,17 +1177,15 @@ i40e_recv_scattered_pkts(void *rx_queue,
 static inline uint16_t
 i40e_calc_context_desc(uint64_t flags)
 {
-	uint64_t mask = 0ULL;
-
-	mask |= (PKT_TX_OUTER_IP_CKSUM | PKT_TX_TCP_SEG);
+	static uint64_t mask = PKT_TX_OUTER_IP_CKSUM |
+		PKT_TX_TCP_SEG |
+		PKT_TX_QINQ_PKT;
 
 #ifdef RTE_LIBRTE_IEEE1588
 	mask |= PKT_TX_IEEE1588_TMST;
 #endif
-	if (flags & mask)
-		return 1;
 
-	return 0;
+	return ((flags & mask) ? 1 : 0);
 }
 
 /* set i40e TSO context descriptor */
@@ -1289,9 +1306,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Descriptor based VLAN insertion */
-		if (ol_flags & PKT_TX_VLAN_PKT) {
+		if (ol_flags & (PKT_TX_VLAN_PKT | PKT_TX_QINQ_PKT)) {
 			tx_flags |= tx_pkt->vlan_tci <<
-					I40E_TX_FLAG_L2TAG1_SHIFT;
+				I40E_TX_FLAG_L2TAG1_SHIFT;
 			tx_flags |= I40E_TX_FLAG_INSERT_VLAN;
 			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
 			td_tag = (tx_flags & I40E_TX_FLAG_L2TAG1_MASK) >>
@@ -1339,6 +1356,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			ctx_txd->tunneling_params =
 				rte_cpu_to_le_32(cd_tunneling_params);
+			if (ol_flags & PKT_TX_QINQ_PKT) {
+				cd_l2tag2 = tx_pkt->vlan_tci_outer;
+				cd_type_cmd_tso_mss |=
+					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
+						I40E_TXD_CTX_QW1_CMD_SHIFT);
+			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->type_cmd_tso_mss =
 				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..892280c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -887,6 +887,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_UDP_CKSUM   0x00000004
 #define DEV_RX_OFFLOAD_TCP_CKSUM   0x00000008
 #define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
+#define DEV_RX_OFFLOAD_QINQ_STRIP  0x00000020
 
 /**
  * TX offload capabilities of a device.
@@ -899,6 +900,7 @@ struct rte_eth_conf {
 #define DEV_TX_OFFLOAD_TCP_TSO     0x00000020
 #define DEV_TX_OFFLOAD_UDP_TSO     0x00000040
 #define DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000080 /**< Used for tunneling packet. */
+#define DEV_TX_OFFLOAD_QINQ_INSERT 0x00000100
 
 struct rte_eth_dev_info {
 	struct rte_pci_device *pci_dev; /**< Device PCI information. */
-- 
1.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 0/7] support i40e QinQ stripping and insertion
      2015-06-08  7:40  0%   ` Olivier MATZ
@ 2015-06-11  7:03  3%   ` Helin Zhang
                         ` (2 more replies)
  2 siblings, 3 replies; 200+ results
From: Helin Zhang @ 2015-06-11  7:03 UTC (permalink / raw)
  To: dev

As i40e hardware can be reconfigured to support QinQ stripping
and insertion, this patch set enables that together with
using the reserved 16 bits in 'struct rte_mbuf' for the second
vlan tag. A corresponding command is added in testpmd for testing.
Note that there is no need to rework the vPMD, as nothing used in
it has changed.

v2 changes:
* Added more commit logs describing which commit each fix is for.
* Fixed a typo.
* Kept the original RX/TX offload flags as they were, added new
  flags after with new bit masks, for ABI compatibility.
* Supported double vlan stripping/insertion in examples/ipv4_multicast.

v3 changes:
* update documentation (Testpmd Application User Guide).

Helin Zhang (7):
  ixgbe: remove a discarded source line
  mbuf: use the reserved 16 bits for double vlan
  i40e: support double vlan stripping and insertion
  i40evf: add supported offload capability flags
  app/testpmd: add test cases for qinq stripping and insertion
  examples/ipv4_multicast: support double vlan stripping and insertion
  doc: update testpmd command

 app/test-pmd/cmdline.c                      | 78 ++++++++++++++++++++++++---
 app/test-pmd/config.c                       | 21 +++++++-
 app/test-pmd/flowgen.c                      |  4 +-
 app/test-pmd/macfwd.c                       |  3 ++
 app/test-pmd/macswap.c                      |  3 ++
 app/test-pmd/rxonly.c                       |  3 ++
 app/test-pmd/testpmd.h                      |  6 ++-
 app/test-pmd/txonly.c                       |  8 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 14 ++++-
 drivers/net/i40e/i40e_ethdev.c              | 52 ++++++++++++++++++
 drivers/net/i40e/i40e_ethdev_vf.c           | 13 +++++
 drivers/net/i40e/i40e_rxtx.c                | 81 ++++++++++++++++++-----------
 drivers/net/ixgbe/ixgbe_rxtx.c              |  1 -
 examples/ipv4_multicast/main.c              |  1 +
 lib/librte_ether/rte_ethdev.h               |  2 +
 lib/librte_mbuf/rte_mbuf.h                  | 10 +++-
 16 files changed, 255 insertions(+), 45 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 14:32  4%         ` Olivier MATZ
  2015-06-10 14:51  0%           ` Zhang, Helin
  2015-06-10 15:39  0%           ` Ananyev, Konstantin
@ 2015-06-10 16:14  5%           ` Thomas Monjalon
  2015-06-12  7:24  5%             ` Panu Matilainen
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2015-06-10 16:14 UTC (permalink / raw)
  To: Olivier MATZ, O'Driscoll, Tim, Zhang, Helin, nhorman; +Cc: dev

2015-06-10 16:32, Olivier MATZ:
> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>> In order to unify the packet type, the field of 'packet_type' in
> >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> >>> KNI should be right mapped to 'struct rte_mbuf', it should be
> >>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> >>> by default, as 'struct rte_mbuf' changed.
> >>> To avoid breaking ABI compatibility, all the changes would be
> >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>
> >> What are the plans for this compile-time option in the future?
> >>
> >> I wonder what are the benefits of having this option in terms
> >> of ABI compatibility: when it is disabled, it is ABI-compatible but
> >> the packet-type feature is not present, and when it is enabled we
> >> have the feature but it breaks the compatibility.
> >>
> >> In my opinion, the v5 is preferable: for this kind of features, I
> >> don't see how the ABI can be preserved, and I think packet-type
> >> won't be the only feature that will modify the mbuf structure. I think
> >> the process described here should be applied:
> >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>
> >> (starting from "Some ABI changes may be too significant to reasonably
> >> maintain multiple versions of").
> >
> > This is just like the change that Steve (Cunming) Liang submitted for
> > Interrupt Mode. We have the same problem in both cases: we want to find
> > a way to get the features included, but need to comply with our ABI
> > policy. So, in both cases, the proposal is to add a config option to
> > enable the change by default, so we maintain backward compatibility.
> > Users that want these changes, and are willing to accept the
> > associated ABI change, have to specifically enable them.
> >
> > We can note in the Deprecation Notices in the Release Notes for 2.1
> > that these config options will be removed in 2.2. The features will
> > then be enabled by default.
> >
> > This seems like a good compromise which allows us to get these changes
> > into 2.1 but avoids breaking the ABI policy.
> 
> Sorry for the late answer.
> 
> After some thoughts on this topic, I understand that having a
> compile-time option is perhaps a good compromise between
> keeping compatibility and having new features earlier.
> 
> I'm just afraid about having one #ifdef in the code for
> each new feature that cannot keep the ABI compatibility.
> What do you think about having one option -- let's call
> it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
> and that would surround any new feature that breaks the
> ABI?
> 
> This would have several advantages:
> - only 2 cases (on or off), the combinatorial is smaller than
>    having one option per feature
> - all next features breaking the abi can be identified by a grep
> - the code inside the #ifdef can be enabled in a simple operation
>    by Thomas after each release.
> 
> Thomas, any comment?

As previously discussed (1to1) with Olivier, I think that's a good proposal
to introduce changes breaking deeply the ABI.

Let's sum up the current policy:
1/ For changes which have a limited impact on the ABI, backward compatibility
must be kept for 1 release, including the notice in doc/guides/rel_notes/abi.rst.
2/ For important changes like the mbuf rework, there was an agreement on skipping
the backward compatibility after having 3 acknowledgements and a 1-release-long
notice. Then the ABI numbering must be incremented.

This CONFIG_RTE_NEXT_ABI proposal would change the rules for the second case.
In order to be adopted, a patch for the file doc/guides/rel_notes/abi.rst must
be submitted and strongly acknowledged.

The ABI numbering must be also clearly explained:
1/ Should we have different library version numbers depending on CONFIG_RTE_NEXT_ABI?
It seems straightforward to use "ifeq" on LIBABIVER in the Makefiles.
2/ Are we able to have some "if CONFIG_RTE_NEXT_ABI" statement in the .map files?
Maybe we should remove these files and generate them with some preprocessing.

Neil, as the ABI policy author, what is your opinion?

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 14:32  4%         ` Olivier MATZ
  2015-06-10 14:51  0%           ` Zhang, Helin
@ 2015-06-10 15:39  0%           ` Ananyev, Konstantin
  2015-06-12  3:22  0%             ` Zhang, Helin
  2015-06-10 16:14  5%           ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-06-10 15:39 UTC (permalink / raw)
  To: Olivier MATZ, O'Driscoll, Tim, Zhang, Helin, dev

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Wednesday, June 10, 2015 3:33 PM
> To: O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
> 
> Hi Tim, Helin,
> 
> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, June 1, 2015 9:15 AM
> >> To: Zhang, Helin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> >> rte_mbuf
> >>
> >> Hi Helin,
> >>
> >> +CC Neil
> >>
> >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>> In order to unify the packet type, the field of 'packet_type' in
> >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
> >>> KNI should be right mapped to 'struct rte_mbuf', it should be
> >>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
> >>> by default, as 'struct rte_mbuf' changed.
> >>> To avoid breaking ABI compatibility, all the changes would be
> >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>
> >> What are the plans for this compile-time option in the future?
> >>
> >> I wonder what are the benefits of having this option in terms
> >> of ABI compatibility: when it is disabled, it is ABI-compatible but
> >> the packet-type feature is not present, and when it is enabled we
> >> have the feature but it breaks the compatibility.
> >>
> >> In my opinion, the v5 is preferable: for this kind of features, I
> >> don't see how the ABI can be preserved, and I think packet-type
> >> won't be the only feature that will modify the mbuf structure. I think
> >> the process described here should be applied:
> >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>
> >> (starting from "Some ABI changes may be too significant to reasonably
> >> maintain multiple versions of").
> >>
> >>
> >> Regards,
> >> Olivier
> >>
> >
> > This is just like the change that Steve (Cunming) Liang submitted for Interrupt Mode. We have the same problem in both cases: we
> want to find a way to get the features included, but need to comply with our ABI policy. So, in both cases, the proposal is to add a
> config option to enable the change by default, so we maintain backward compatibility. Users that want these changes, and are willing
> to accept the associated ABI change, have to specifically enable them.
> >
> > We can note in the Deprecation Notices in the Release Notes for 2.1 that these config options will be removed in 2.2. The features
> will then be enabled by default.
> >
> > This seems like a good compromise which allows us to get these changes into 2.1 but avoids breaking the ABI policy.
> 
> Sorry for the late answer.
> 
> After some thoughts on this topic, I understand that having a
> compile-time option is perhaps a good compromise between
> keeping compatibility and having new features earlier.
> 
> I'm just afraid about having one #ifdef in the code for
> each new feature that cannot keep the ABI compatibility.
> What do you think about having one option -- let's call
> it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
> and that would surround any new feature that breaks the
> ABI?

I am not Tim/Helin, but really like that idea :)
Konstantin


> 
> This would have several advantages:
> - only 2 cases (on or off), the combinatorial is smaller than
>    having one option per feature
> - all next features breaking the abi can be identified by a grep
> - the code inside the #ifdef can be enabled in a simple operation
>    by Thomas after each release.
> 
> Thomas, any comment?
> 
> Regards,
> Olivier
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  2015-06-10 14:32  4%         ` Olivier MATZ
@ 2015-06-10 14:51  0%           ` Zhang, Helin
  2015-06-10 15:39  0%           ` Ananyev, Konstantin
  2015-06-10 16:14  5%           ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Zhang, Helin @ 2015-06-10 14:51 UTC (permalink / raw)
  To: Olivier MATZ, O'Driscoll, Tim, dev

Hi Olivier

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Wednesday, June 10, 2015 10:33 PM
> To: O'Driscoll, Tim; Zhang, Helin; dev@dpdk.org
> Cc: Thomas Monjalon
> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
> rte_mbuf
> 
> Hi Tim, Helin,
> 
> On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, June 1, 2015 9:15 AM
> >> To: Zhang, Helin; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type
> >> in rte_mbuf
> >>
> >> Hi Helin,
> >>
> >> +CC Neil
> >>
> >> On 06/01/2015 09:33 AM, Helin Zhang wrote:
> >>> In order to unify the packet type, the field of 'packet_type' in
> >>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
> >>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
> >>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for KNI
> >>> should be right mapped to 'struct rte_mbuf', it should be modified
> >>> accordingly. In addition, Vector PMD of ixgbe is disabled by
> >>> default, as 'struct rte_mbuf' changed.
> >>> To avoid breaking ABI compatibility, all the changes would be
> >>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
> >>
> >> What are the plans for this compile-time option in the future?
> >>
> >> I wonder what are the benefits of having this option in terms of ABI
> >> compatibility: when it is disabled, it is ABI-compatible but the
> >> packet-type feature is not present, and when it is enabled we have
> >> the feature but it breaks the compatibility.
> >>
> >> In my opinion, the v5 is preferable: for this kind of features, I
> >> don't see how the ABI can be preserved, and I think packet-type won't
> >> be the only feature that will modify the mbuf structure. I think the
> >> process described here should be applied:
> >> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
> >>
> >> (starting from "Some ABI changes may be too significant to reasonably
> >> maintain multiple versions of").
> >>
> >>
> >> Regards,
> >> Olivier
> >>
> >
> > This is just like the change that Steve (Cunming) Liang submitted for Interrupt
> Mode. We have the same problem in both cases: we want to find a way to get
> the features included, but need to comply with our ABI policy. So, in both cases,
> the proposal is to add a config option to enable the change by default, so we
> maintain backward compatibility. Users that want these changes, and are willing
> to accept the associated ABI change, have to specifically enable them.
> >
> > We can note in the Deprecation Notices in the Release Notes for 2.1 that these
> config options will be removed in 2.2. The features will then be enabled by
> default.
> >
> > This seems like a good compromise which allows us to get these changes into
> 2.1 but avoids breaking the ABI policy.
> 
> Sorry for the late answer.
> 
> After some thoughts on this topic, I understand that having a compile-time
> option is perhaps a good compromise between keeping compatibility and having
> new features earlier.
> 
> I'm just afraid about having one #ifdef in the code for each new feature that
> cannot keep the ABI compatibility.
> What do you think about having one option -- let's call it
> "CONFIG_RTE_NEXT_ABI" --, that is disabled by default, and that would surround
> any new feature that breaks the ABI?
Will we allow this type of workaround for a long time? If yes, I agree with your good idea.

Regards,
Helin

> 
> This would have several advantages:
> - only 2 cases (on or off), the combinatorial is smaller than
>    having one option per feature
> - all next features breaking the abi can be identified by a grep
> - the code inside the #ifdef can be enabled in a simple operation
>    by Thomas after each release.
> 
> Thomas, any comment?
> 
> Regards,
> Olivier
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf
  @ 2015-06-10 14:32  4%         ` Olivier MATZ
  2015-06-10 14:51  0%           ` Zhang, Helin
                             ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Olivier MATZ @ 2015-06-10 14:32 UTC (permalink / raw)
  To: O'Driscoll, Tim, Zhang, Helin, dev

Hi Tim, Helin,

On 06/02/2015 03:27 PM, O'Driscoll, Tim wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>> Sent: Monday, June 1, 2015 9:15 AM
>> To: Zhang, Helin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in
>> rte_mbuf
>>
>> Hi Helin,
>>
>> +CC Neil
>>
>> On 06/01/2015 09:33 AM, Helin Zhang wrote:
>>> In order to unify the packet type, the field of 'packet_type' in
>>> 'struct rte_mbuf' needs to be extended from 16 to 32 bits.
>>> Accordingly, some fields in 'struct rte_mbuf' are re-organized to
>>> support this change for Vector PMD. As 'struct rte_kni_mbuf' for
>>> KNI should be right mapped to 'struct rte_mbuf', it should be
>>> modified accordingly. In addition, Vector PMD of ixgbe is disabled
>>> by default, as 'struct rte_mbuf' changed.
>>> To avoid breaking ABI compatibility, all the changes would be
>>> enabled by RTE_UNIFIED_PKT_TYPE, which is disabled by default.
>>
>> What are the plans for this compile-time option in the future?
>>
>> I wonder what are the benefits of having this option in terms
>> of ABI compatibility: when it is disabled, it is ABI-compatible but
>> the packet-type feature is not present, and when it is enabled we
>> have the feature but it breaks the compatibility.
>>
>> In my opinion, the v5 is preferable: for this kind of features, I
>> don't see how the ABI can be preserved, and I think packet-type
>> won't be the only feature that will modify the mbuf structure. I think
>> the process described here should be applied:
>> http://dpdk.org/browse/dpdk/tree/doc/guides/rel_notes/abi.rst
>>
>> (starting from "Some ABI changes may be too significant to reasonably
>> maintain multiple versions of").
>>
>>
>> Regards,
>> Olivier
>>
>
> This is just like the change that Steve (Cunming) Liang submitted for Interrupt Mode. We have the same problem in both cases: we want to find a way to get the features included, but need to comply with our ABI policy. So, in both cases, the proposal is to add a config option to enable the change by default, so we maintain backward compatibility. Users that want these changes, and are willing to accept the associated ABI change, have to specifically enable them.
>
> We can note in the Deprecation Notices in the Release Notes for 2.1 that these config options will be removed in 2.2. The features will then be enabled by default.
>
> This seems like a good compromise which allows us to get these changes into 2.1 but avoids breaking the ABI policy.

Sorry for the late answer.

After some thoughts on this topic, I understand that having a
compile-time option is perhaps a good compromise between
keeping compatibility and having new features earlier.

I'm just afraid about having one #ifdef in the code for
each new feature that cannot keep the ABI compatibility.
What do you think about having one option -- let's call
it "CONFIG_RTE_NEXT_ABI" --, that is disabled by default,
and that would surround any new feature that breaks the
ABI?

This would have several advantages:
- only 2 cases (on or off), the combinatorial is smaller than
   having one option per feature
- all next features breaking the abi can be identified by a grep
- the code inside the #ifdef can be enabled in a simple operation
   by Thomas after each release.
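
As a rough illustration of the single-switch idea (not code from any patch set — the struct and field names below are invented for this sketch), one CONFIG_RTE_NEXT_ABI option, surfaced to the code as a macro, would gate every ABI-breaking layout change instead of one #ifdef per feature:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: a single RTE_NEXT_ABI macro gates all
 * ABI-breaking changes.  Enabling it after a release is one grep
 * plus one config flip, not one option per feature. */
/* #define RTE_NEXT_ABI */

struct pkt_meta {
#ifdef RTE_NEXT_ABI
	uint32_t packet_type;	/* next ABI: extended 32-bit packet type */
#else
	uint16_t packet_type;	/* current ABI: 16-bit packet type */
#endif
};

/* Report which layout was compiled in. */
size_t pkt_type_width(void)
{
	struct pkt_meta m;

	return sizeof(m.packet_type);
}
```

With the macro left undefined, existing binaries keep the old layout; defining it switches every gated feature on at once.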

Thomas, any comment?

Regards,
Olivier

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD
  2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  2015-06-08  5:29  2%     ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-06-08  5:29 11%     ` [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-06-09 23:59  0%     ` Stephen Hemminger
  2015-06-19  4:00  4%     ` [dpdk-dev] [PATCH v13 " Cunming Liang
  3 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-06-09 23:59 UTC (permalink / raw)
  To: Cunming Liang; +Cc: dev, liang-min.wang

On Mon,  8 Jun 2015 13:28:57 +0800
Cunming Liang <cunming.liang@intel.com> wrote:

> v12 changes
>  - bsd cleanup for unused variable warning
>  - fix awkward line split in debug message
> 
> v11 changes
>  - typo cleanup and check kernel style
> 
> v10 changes
>  - code rework to return actual error code
>  - bug fix for lsc when using uio_pci_generic
> 
> v9 changes
>  - code rework to fix open comment
>  - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
>  - new patch to turn off the feature by default, so as to avoid breaking the v2.1 ABI
> 
> v8 changes
>  - remove condition check for only vfio-msix
>  - add multiplex intr support when only one intr vector allowed
>  - lsc and rxq interrupt runtime enable decision
>  - add safe event delete while the event wakeup execution happens
> 
> v7 changes
>  - decouple epoll event and intr operation
>  - add condition check in the case intr vector is disabled
>  - renaming some APIs
> 
> v6 changes
>  - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
>  - rewrite rte_intr_rx_wait/rte_intr_rx_set.
>  - using vector number instead of queue_id as interrupt API params.
>  - patch reorder and split.
> 
> v5 changes
>  - Rebase the patchset onto the HEAD
>  - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
>  - Export wait-for-rx interrupt function for shared libraries
>  - Split-off a new patch file for changed struct rte_intr_handle that
>    other patches depend on, to avoid breaking git bisect
>  - Change sample application to accommodate EAL function spec change
>    accordingly
> 
> v4 changes
>  - Export interrupt enable/disable functions for shared libraries
>  - Adjust position of new-added structure fields and functions to
>    avoid breaking ABI
>  
> v3 changes
>  - Add return value for interrupt enable/disable functions
>  - Move spinlock from PMD to L3fwd-power
>  - Remove unnecessary variables in e1000_mac_info
>  - Fix miscellaneous review comments
>  
> v2 changes
>  - Fix compilation issue in Makefile for missed header file.
>  - Consolidate internal and community review comments of v1 patch set.
>  
> The patch series introduce low-latency one-shot rx interrupt into DPDK with
> polling and interrupt mode switch control example.
>  
> DPDK userspace interrupt notification and handling mechanism is based on UIO
> with below limitation:
> 1) It is designed to handle LSC interrupt only with inefficient suspended
>    pthread wakeup procedure (e.g. UIO wakes up LSC interrupt handling thread
>    which then wakes up DPDK polling thread). In this way, it introduces
>    non-deterministic wakeup latency for DPDK polling thread as well as packet
>    latency if it is used to handle Rx interrupt.
> 2) UIO only supports a single interrupt vector which has to be shared by
>    LSC interrupt and interrupts assigned to dedicated rx queues.
>  
> This patchset includes below features:
> 1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
> 2) Build on top of the VFIO mechanism instead of UIO, so it could support
>    up to 64 interrupt vectors for rx queue interrupts.
> 3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
>    VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
>    user space.
> 4) Demonstrate interrupt control APIs and userspace NAPI-like polling/interrupt
>    switch algorithms in L3fwd-power example.
> 
> Known limitations:
> 1) It does not work for UIO because a single interrupt eventfd shared by the LSC
>    and rx queue interrupt handlers causes a mess. [FIXED]
> 2) LSC interrupt is not supported by VF driver, so it is by default disabled
>    in L3fwd-power now. Feel free to turn it on if you want to support both LSC
>    and rx queue interrupts on a PF.
> 
> Cunming Liang (14):
>   eal/linux: add interrupt vectors support in intr_handle
>   eal/linux: add rte_epoll_wait/ctl support
>   eal/linux: add API to set rx interrupt event monitor
>   eal/linux: fix comments typo on vfio msi
>   eal/linux: add interrupt vectors handling on VFIO
>   eal/linux: standalone intr event fd create support
>   eal/linux: fix lsc read error in uio_pci_generic
>   eal/bsd: dummy for new intr definition
>   eal/bsd: fix inappropriate linuxapp referred in bsd
>   ethdev: add rx intr enable, disable and ctl functions
>   ixgbe: enable rx queue interrupts for both PF and VF
>   igb: enable rx queue interrupts for PF
>   l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
>     switch
>   abi: fix v2.1 abi broken issue
> 
>  drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
>  drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
>  examples/l3fwd-power/main.c                        | 206 ++++++--
>  lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  30 ++
>  .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  91 +++-
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
>  lib/librte_ether/rte_ethdev.c                      | 109 +++++
>  lib/librte_ether/rte_ethdev.h                      | 132 ++++++
>  lib/librte_ether/rte_ether_version.map             |   4 +
>  13 files changed, 1871 insertions(+), 128 deletions(-)
> 

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

This still needs more work in lots more drivers (like bonding) and in several
other subsystems (like pipeline) before it is widely useful. But this is
a great first step.
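
The eventfd-plus-epoll wakeup mechanism the patch set builds on can be sketched in plain Linux terms, with no DPDK code at all. The helper below is invented for illustration; in the patch set the fd comes from VFIO and the wait is wrapped by the new rte_epoll APIs:

```c
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

/* Minimal sketch: an rx interrupt is surfaced to user space as an
 * eventfd, and the polling thread sleeps directly in epoll_wait()
 * until that fd fires -- no intermediate wakeup thread, hence no
 * extra non-deterministic wakeup latency. */
int wait_one_event(int efd, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
	int ep, n;

	ep = epoll_create1(0);
	if (ep < 0)
		return -1;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) < 0) {
		close(ep);
		return -1;
	}
	n = epoll_wait(ep, &ev, 1, timeout_ms);
	close(ep);
	return n; /* 1 if the "interrupt" fired, 0 on timeout */
}
```

An l3fwd-power style loop would poll the queue while traffic is heavy and fall back to this blocking wait when the queue runs dry.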

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: guidelines for library statistics
@ 2015-06-08 14:50  5% Cristian Dumitrescu
  2015-06-11 12:05  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Cristian Dumitrescu @ 2015-06-08 14:50 UTC (permalink / raw)
  To: dev

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 doc/guides/guidelines/index.rst      |  1 +
 doc/guides/guidelines/statistics.rst | 42 ++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100644 doc/guides/guidelines/statistics.rst

diff --git a/doc/guides/guidelines/index.rst b/doc/guides/guidelines/index.rst
index b2b0a92..c01f958
--- a/doc/guides/guidelines/index.rst
+++ b/doc/guides/guidelines/index.rst
@@ -6,3 +6,4 @@ Guidelines
     :numbered:
 
     coding_style
+    statistics
diff --git a/doc/guides/guidelines/statistics.rst b/doc/guides/guidelines/statistics.rst
new file mode 100644
index 0000000..32c6020
--- /dev/null
+++ b/doc/guides/guidelines/statistics.rst
@@ -0,0 +1,42 @@
+Library Statistics
+==================
+
+Description
+-----------
+
+This document describes the guidelines for DPDK library-level statistics counter support. This includes guidelines for turning library statistics on and off, requirements for preventing ABI changes when library statistics are turned on and off, etc.
+
+Motivation to allow the application to turn library statistics on and off
+-------------------------------------------------------------------------
+
+It is highly recommended that each library provides statistics counters to allow the application to monitor the library-level run-time events. Typical counters are: number of packets received/dropped/transmitted, number of buffers allocated/freed, number of occurrences for specific events, etc.
+
+Since the resources consumed by library-level statistics counter collection come out of the application budget, and the counters collected by some libraries might not be relevant to the current application, the application decides at build time, for each library individually, whether the collection of library-level statistics counters should be turned on or off. This avoids any unwanted waste of resources and/or performance for the application.
+
+Library-level statistics counters can be relevant or not for specific applications:
+* For application A, counters maintained by library X are always relevant and the application needs to use them to implement certain features, such as traffic accounting, logging, application-level statistics, etc. In this case, the application requires that collection of statistics counters for library X is always turned on;
+* For application B, counters maintained by library X are only useful during the application debug stage and not relevant once the debug phase is over. In this case, the application may decide to turn on the collection of library X statistics counters during the debug phase and turn them off later;
+* For application C, counters maintained by library X are not relevant at all. It might be that the application maintains its own set of statistics counters that monitor a different set of run-time events than library X (e.g. number of connection requests, number of active users, etc). It might also be that the application uses multiple libraries (library X, library Y, etc) and is interested in the statistics counters of library Y, but not in those of library X. In this case, the application may decide to turn the collection of statistics counters off for library X and on for library Y.
+
+The statistics collection consumes a certain amount of CPU resources (cycles, cache bandwidth, memory bandwidth, etc) that depends on:
+* Number of libraries used by the current application that have statistics counters collection turned on;
+* Number of statistics counters maintained by each library per object type instance (e.g. per port, table, pipeline, thread, etc);
+* Number of instances created for each object type supported by each library;
+* Complexity of the statistics collection logic for each counter: when only some occurrences of a specific event are valid, several conditional branches might be involved in the decision of whether the current occurrence of the event should be counted or not (e.g. on the event of packet reception, only TCP packets with destination port within a certain range should be recorded), etc.
+
+Mechanism to allow the application to turn library statistics on and off
+------------------------------------------------------------------------
+
+Each library that provides statistics counters should provide a single build time flag that decides whether the statistics counter collection is enabled or not for this library. This flag should be exposed as a variable within the DPDK configuration file. When this flag is set, all the counters supported by current library are collected; when this flag is cleared, none of the counters supported by the current library are collected:
+
+	#DPDK file “./config/common_linuxapp”, “./config/common_bsdapp”, etc
+	CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS=y/n
+
+The default value for this DPDK configuration file variable (either “yes” or “no”) is left to the decision of each library.
+
+Prevention of ABI changes due to library statistics support
+-----------------------------------------------------------
+
+The layout of data structures and prototype of functions that are part of the library API should not be affected by whether the collection of statistics counters is turned on or off for the current library. In practical terms, this means that space is always allocated in the API data structures for statistics counters and the statistics related API functions are always built into the code, regardless of whether the statistics counter collection is turned on or off for the current library.
+
+When the collection of statistics counters for the current library is turned off, the counters retrieved through the statistics related API functions should have the default value of zero.
-- 
1.8.5.3
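
A minimal sketch of the guideline above (the library name, struct, and macro below are invented for illustration): the counter fields always exist in the API struct, so the ABI does not change, and only the increments are compiled out when the per-library flag is off, which makes the retrieval functions return zeroed counters:

```c
#include <stdint.h>

/* Hypothetical per-library build-time flag, mirroring
 * CONFIG_RTE_LIBRTE_<LIBRARY_NAME>_COLLECT_STATS. */
/* #define MYLIB_COLLECT_STATS 1 */

#ifdef MYLIB_COLLECT_STATS
#define MYLIB_STATS_INC(p, f) ((p)->stats.f++)
#else
#define MYLIB_STATS_INC(p, f) do { } while (0)
#endif

/* Space for counters is always allocated in the API structs. */
struct mylib_stats { uint64_t pkts_in; uint64_t pkts_dropped; };
struct mylib_port  { struct mylib_stats stats; };

void mylib_rx_one(struct mylib_port *p)
{
	MYLIB_STATS_INC(p, pkts_in); /* no-op when collection is off */
}

/* The stats API is always built in, regardless of the flag. */
struct mylib_stats mylib_stats_get(const struct mylib_port *p)
{
	return p->stats; /* all-zero when collection is turned off */
}
```

Flipping the flag changes only whether the increments execute, never the struct layout or function prototypes, which is exactly the ABI-stability requirement stated above.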

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
    @ 2015-06-08  7:40  0%   ` Olivier MATZ
  2015-06-11  7:03  3%   ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
  2 siblings, 0 replies; 200+ results
From: Olivier MATZ @ 2015-06-08  7:40 UTC (permalink / raw)
  To: Helin Zhang, dev

Hi Helin,

On 06/02/2015 05:16 AM, Helin Zhang wrote:
> As i40e hardware can be reconfigured to support QinQ stripping and
> insertion, this patch set is to enable that together with using the
> reserved 16 bits in 'struct rte_mbuf' for the second vlan tag.
> Corresponding command is added in testpmd for testing.
> Note that there is no need to rework the vPMD, as nothing used in it has changed.
> 
> v2 changes:
> * Added more commit logs of which commit it fix for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/6] support i40e QinQ stripping and insertion
  @ 2015-06-08  7:32  0%     ` Cao, Min
  0 siblings, 0 replies; 200+ results
From: Cao, Min @ 2015-06-08  7:32 UTC (permalink / raw)
  To: Liu, Jijiang, Zhang, Helin, dev

Tested-by: Min Cao <min.cao@intel.com>

- OS: Fedora20  3.11.10-301
- GCC: gcc version 4.8.2 20131212
- CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
- NIC: Ethernet controller: Intel Corporation Device 1572 (rev 01)
- Default x86_64-native-linuxapp-gcc configuration
- Total 2 cases, 2 passed, 0 failed

- Case: double vlan filter
- Case: double vlan insertion 

> -----Original Message-----
> From: Zhang, Helin
> Sent: Tuesday, June 2, 2015 11:16 AM
> To: dev@dpdk.org
> Cc: Cao, Min; Liu, Jijiang; Wu, Jingjing; Ananyev, Konstantin; 
> Richardson, Bruce; olivier.matz@6wind.com; Zhang, Helin
> Subject: [PATCH v2 0/6] support i40e QinQ stripping and insertion
> 
> As i40e hardware can be reconfigured to support QinQ stripping and 
> insertion, this patch set is to enable that together with using the 
> reserved 16 bits in 'struct rte_mbuf' for the second vlan tag.
> Corresponding command is added in testpmd for testing.
> > Note that there is no need to rework the vPMD, as nothing used in it has changed.
> 
> v2 changes:
> * Added more commit logs of which commit it fix for.
> * Fixed a typo.
> * Kept the original RX/TX offload flags as they were, added new
>   flags after with new bit masks, for ABI compatibility.
> * Supported double vlan stripping/insertion in examples/ipv4_multicast.
> 
> Helin Zhang (6):
>   ixgbe: remove a discarded source line
>   mbuf: use the reserved 16 bits for double vlan
>   i40e: support double vlan stripping and insertion
>   i40evf: add supported offload capability flags
>   app/testpmd: add test cases for qinq stripping and insertion
>   examples/ipv4_multicast: support double vlan stripping and insertion
> 
>  app/test-pmd/cmdline.c            | 78 +++++++++++++++++++++++++++++++++----
>  app/test-pmd/config.c             | 21 +++++++++-
>  app/test-pmd/flowgen.c            |  4 +-
>  app/test-pmd/macfwd.c             |  3 ++
>  app/test-pmd/macswap.c            |  3 ++
>  app/test-pmd/rxonly.c             |  3 ++
>  app/test-pmd/testpmd.h            |  6 ++-
>  app/test-pmd/txonly.c             |  8 +++-
>  drivers/net/i40e/i40e_ethdev.c    | 52 +++++++++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev_vf.c | 13 +++++++
>  drivers/net/i40e/i40e_rxtx.c      | 81 +++++++++++++++++++++++++--------------
>  drivers/net/ixgbe/ixgbe_rxtx.c    |  1 -
>  examples/ipv4_multicast/main.c    |  1 +
>  lib/librte_ether/rte_ethdev.h     |  2 +
>  lib/librte_mbuf/rte_mbuf.h        | 10 ++++-
>  15 files changed, 243 insertions(+), 43 deletions(-)
> 
> --
> 1.9.3

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue
  2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  2015-06-08  5:29  2%     ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-06-08  5:29 11%     ` Cunming Liang
  2015-06-09 23:59  0%     ` [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD Stephen Hemminger
  2015-06-19  4:00  4%     ` [dpdk-dev] [PATCH v13 " Cunming Liang
  3 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-08  5:29 UTC (permalink / raw)
  To: dev, shemming; +Cc: liang-min.wang

RTE_EAL_RX_INTR will be removed in v2.2. It is only used to avoid an (unannounced) ABI break in v2.1.
Users should make sure they understand the impact before turning on the feature.
There are two ABI changes required in this interrupt patch set.
They're 1) struct rte_intr_handle; 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 v9 Acked-by: vincent jardin <vincent.jardin@6wind.com>

 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1894,6 +1913,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3750,6 +3770,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3791,6 +3812,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3800,18 +3822,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_EAL_RX_INTR
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3868,6 +3893,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index bcec971..3a70af6 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1481,7 +1487,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1514,6 +1522,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1532,6 +1541,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1619,9 +1629,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1727,12 +1739,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2335,6 +2349,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_EAL_RX_INTR
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2345,6 +2360,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3127,7 +3143,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3160,6 +3178,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3177,7 +3196,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3223,19 +3242,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3246,11 +3269,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3834,6 +3859,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3902,21 +3928,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3927,6 +3957,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3937,18 +3968,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4000,6 +4034,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 538bb93..3b4054c 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,6 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
 	},
 };
 
@@ -889,7 +888,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index ba4640a..11dc59b 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -51,9 +51,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed in v2.2.
+	 * It exists only to avoid an unannounced ABI break in v2.1.
+	 * Be aware of the impact before turning the feature on.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index d7a5403..f81d553 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_EAL_RX_INTR
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_EAL_RX_INTR
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -876,6 +884,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -918,6 +927,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1056,6 +1066,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1168,3 +1179,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 912cc50..a2056bd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_EAL_RX_INTR
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed in v2.2.
+	 * It exists only to avoid an unannounced ABI break in v2.1.
+	 * Be aware of the impact before turning the feature on.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_EAL_RX_INTR
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 27a87f5..3f6e1f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3352,6 +3353,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c199d32..8bea68d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_EAL_RX_INTR
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2943,8 +2945,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2967,9 +2981,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 11%]

* [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions
  2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
@ 2015-06-08  5:29  2%     ` Cunming Liang
  2015-06-08  5:29 11%     ` [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue Cunming Liang
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-08  5:29 UTC (permalink / raw)
  To: dev, shemming; +Cc: liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addition, it adds rte_eth_dev_rx_intr_ctl/rx_intr_q to support per-port or per-queue rx intr event set.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add additional check for EEXIST

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..27a87f5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1386,6 +1396,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2868,6 +2882,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When no rx packets arrive on an Rx queue for a long time, the lcore
+ * serving that queue can sleep for power saving, with the rx interrupt
+ * enabled so it is triggered when an rx packet arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When the lcore wakes up from an rx interrupt indicating packet arrival,
+ * disable the rx interrupt and return to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd to which the intr vector is associated.
+ *   Using RTE_EPOLL_PER_THREAD allows using a per-thread epoll instance.
+ * @param op
+ *   The operation to be performed on the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd to which the intr vector is associated.
+ *   Using RTE_EPOLL_PER_THREAD allows using a per-thread epoll instance.
+ * @param op
+ *   The operation to be performed on the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4


* [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD
  2015-06-05  8:19  4% ` [dpdk-dev] [PATCH v11 " Cunming Liang
                     ` (2 preceding siblings ...)
  2015-06-05  8:59  0%   ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Zhou, Danny
@ 2015-06-08  5:28  4%   ` Cunming Liang
  2015-06-08  5:29  2%     ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
                       ` (3 more replies)
  3 siblings, 4 replies; 200+ results
From: Cunming Liang @ 2015-06-08  5:28 UTC (permalink / raw)
  To: dev, shemming; +Cc: liang-min.wang

v12 changes
 - bsd cleanup for unused variable warning
 - fix awkward line split in debug message

v11 changes
 - typo cleanup and check kernel style

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by default so as to avoid v2.1 abi broken

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change sample application to accommodate the EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduce low-latency one-shot rx interrupt into DPDK with
polling and interrupt mode switch control example.
 
DPDK userspace interrupt notification and handling mechanism is based on UIO
with below limitation:
1) It is designed to handle only the LSC interrupt, via an inefficient
   suspended-pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt
   handling thread, which then wakes up the DPDK polling thread). This
   introduces non-deterministic wakeup latency for the DPDK polling thread,
   as well as packet latency if it is used to handle Rx interrupts.
2) UIO only supports a single interrupt vector, which has to be shared by
   the LSC interrupt and the interrupts assigned to dedicated rx queues.
 
This patchset includes below features:
1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it could support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupt control APIs and userspace NAPI-like polling/interrupt
   switch algorithms in L3fwd-power example.

Known limitations:
1) It does not work for UIO due to a single interrupt eventfd shared by LSC
   and rx queue interrupt handlers causes a mess. [FIXED]
2) LSC interrupt is not supported by VF driver, so it is by default disabled
   in L3fwd-power now. Feel free to turn in on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (14):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  eal/bsd: fix inappropriate linuxapp referred in bsd
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 206 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  30 ++
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  91 +++-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1871 insertions(+), 128 deletions(-)

-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 2/7] eal: memzone allocated by malloc
  @ 2015-06-06 10:32  1%   ` Sergio Gonzalez Monroy
  2015-06-19 17:21  4%   ` [dpdk-dev] [PATCH v3 0/9] Dynamic memzone Sergio Gonzalez Monroy
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Sergio Gonzalez Monroy @ 2015-06-06 10:32 UTC (permalink / raw)
  To: dev

In the current memory hierarchy, memsegs are groups of physically contiguous
hugepages, memzones are slices of memsegs and malloc further slices memzones
into smaller memory chunks.

This patch modifies malloc so it partitions memsegs instead of memzones.
Thus memzones would call malloc internally for memory allocation while
maintaining its ABI.

This makes it possible to free memzones, and therefore any other structure
based on memzones, i.e. mempools.

Signed-off-by: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c        | 273 ++++++----------------
 lib/librte_eal/common/include/rte_eal_memconfig.h |   2 +-
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/include/rte_memory.h        |   1 +
 lib/librte_eal/common/malloc_elem.c               |  68 ++++--
 lib/librte_eal/common/malloc_elem.h               |  14 +-
 lib/librte_eal/common/malloc_heap.c               | 140 ++++++-----
 lib/librte_eal/common/malloc_heap.h               |   6 +-
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 9 files changed, 197 insertions(+), 317 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 888f9e5..742f6c9 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -50,11 +50,10 @@
 #include <rte_string_fns.h>
 #include <rte_common.h>
 
+#include "malloc_heap.h"
+#include "malloc_elem.h"
 #include "eal_private.h"
 
-/* internal copy of free memory segments */
-static struct rte_memseg *free_memseg = NULL;
-
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
@@ -68,8 +67,9 @@ memzone_lookup_thread_unsafe(const char *name)
 	 * the algorithm is not optimal (linear), but there are few
 	 * zones and this function should be called at init only
 	 */
-	for (i = 0; i < RTE_MAX_MEMZONE && mcfg->memzone[i].addr != NULL; i++) {
-		if (!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
+	for (i = 0; i < RTE_MAX_MEMZONE; i++) {
+		if (mcfg->memzone[i].addr != NULL &&
+				!strncmp(name, mcfg->memzone[i].name, RTE_MEMZONE_NAMESIZE))
 			return &mcfg->memzone[i];
 	}
 
@@ -88,39 +88,45 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 			len, socket_id, flags, RTE_CACHE_LINE_SIZE);
 }
 
-/*
- * Helper function for memzone_reserve_aligned_thread_unsafe().
- * Calculate address offset from the start of the segment.
- * Align offset in that way that it satisfy istart alignmnet and
- * buffer of the  requested length would not cross specified boundary.
- */
-static inline phys_addr_t
-align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align,
-	size_t bound)
+/* Find the heap with the greatest free block size */
+static void
+find_heap_max_free_elem(int *s, size_t *len, unsigned align)
 {
-	phys_addr_t addr_offset, bmask, end, start;
-	size_t step;
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-	step = RTE_MAX(align, bound);
-	bmask = ~((phys_addr_t)bound - 1);
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* calculate offset to closest alignment */
-	start = RTE_ALIGN_CEIL(ms->phys_addr, align);
-	addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size > *len) {
+			*len = stats.greatest_free_size;
+			*s = i;
+		}
+	}
+	*len -= (MALLOC_ELEM_OVERHEAD + align);
+}
 
-	while (addr_offset + len < ms->len) {
+/* Find a heap that can allocate the requested size */
+static void
+find_heap_suitable(int *s, size_t len, unsigned align)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_malloc_socket_stats stats;
+	unsigned i;
 
-		/* check, do we meet boundary condition */
-		end = start + len - (len != 0);
-		if ((start & bmask) == (end & bmask))
-			break;
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
 
-		/* calculate next offset */
-		start = RTE_ALIGN_CEIL(start + 1, step);
-		addr_offset = start - ms->phys_addr;
+	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
+		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
+		if (stats.greatest_free_size >= len + MALLOC_ELEM_OVERHEAD + align) {
+			*s = i;
+			break;
+		}
 	}
-
-	return (addr_offset);
 }
 
 static const struct rte_memzone *
@@ -128,13 +134,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned flags, unsigned align, unsigned bound)
 {
 	struct rte_mem_config *mcfg;
-	unsigned i = 0;
-	int memseg_idx = -1;
-	uint64_t addr_offset, seg_offset = 0;
 	size_t requested_len;
-	size_t memseg_len = 0;
-	phys_addr_t memseg_physaddr;
-	void *memseg_addr;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -166,7 +166,6 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (align < RTE_CACHE_LINE_SIZE)
 		align = RTE_CACHE_LINE_SIZE;
 
-
 	/* align length on cache boundary. Check for overflow before doing so */
 	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
 		rte_errno = EINVAL; /* requested size too big */
@@ -180,129 +179,50 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	requested_len = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE,  len);
 
 	/* check that boundary condition is valid */
-	if (bound != 0 &&
-			(requested_len > bound || !rte_is_power_of_2(bound))) {
+	if (bound != 0 && (requested_len > bound || !rte_is_power_of_2(bound))) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	/* find the smallest segment matching requirements */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		/* last segment */
-		if (free_memseg[i].addr == NULL)
-			break;
+	if (len == 0) {
+		if (bound != 0)
+			requested_len = bound;
+		else
+			requested_len = 0;
+	}
 
-		/* empty segment, skip it */
-		if (free_memseg[i].len == 0)
-			continue;
-
-		/* bad socket ID */
-		if (socket_id != SOCKET_ID_ANY &&
-		    free_memseg[i].socket_id != SOCKET_ID_ANY &&
-		    socket_id != free_memseg[i].socket_id)
-			continue;
-
-		/*
-		 * calculate offset to closest alignment that
-		 * meets boundary conditions.
-		 */
-		addr_offset = align_phys_boundary(free_memseg + i,
-			requested_len, align, bound);
-
-		/* check len */
-		if ((requested_len + addr_offset) > free_memseg[i].len)
-			continue;
-
-		/* check flags for hugepage sizes */
-		if ((flags & RTE_MEMZONE_2MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_1G)
-			continue;
-		if ((flags & RTE_MEMZONE_1GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_2M)
-			continue;
-		if ((flags & RTE_MEMZONE_16MB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16G)
-			continue;
-		if ((flags & RTE_MEMZONE_16GB) &&
-				free_memseg[i].hugepage_sz == RTE_PGSIZE_16M)
-			continue;
-
-		/* this segment is the best until now */
-		if (memseg_idx == -1) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
-		}
-		/* find the biggest contiguous zone */
-		else if (len == 0) {
-			if (free_memseg[i].len > memseg_len) {
-				memseg_idx = i;
-				memseg_len = free_memseg[i].len;
-				seg_offset = addr_offset;
-			}
-		}
-		/*
-		 * find the smallest (we already checked that current
-		 * zone length is > len
-		 */
-		else if (free_memseg[i].len + align < memseg_len ||
-				(free_memseg[i].len <= memseg_len + align &&
-				addr_offset < seg_offset)) {
-			memseg_idx = i;
-			memseg_len = free_memseg[i].len;
-			seg_offset = addr_offset;
+	if (socket_id == SOCKET_ID_ANY) {
+		if (requested_len == 0)
+			find_heap_max_free_elem(&socket_id, &requested_len, align);
+		else
+			find_heap_suitable(&socket_id, requested_len, align);
+
+		if (socket_id == SOCKET_ID_ANY) {
+			rte_errno = ENOMEM;
+			return NULL;
 		}
 	}
 
-	/* no segment found */
-	if (memseg_idx == -1) {
-		/*
-		 * If RTE_MEMZONE_SIZE_HINT_ONLY flag is specified,
-		 * try allocating again without the size parameter otherwise -fail.
-		 */
-		if ((flags & RTE_MEMZONE_SIZE_HINT_ONLY)  &&
-		    ((flags & RTE_MEMZONE_1GB) || (flags & RTE_MEMZONE_2MB)
-		|| (flags & RTE_MEMZONE_16MB) || (flags & RTE_MEMZONE_16GB)))
-			return memzone_reserve_aligned_thread_unsafe(name,
-				len, socket_id, 0, align, bound);
-
+	/* allocate memory on heap */
+	void *mz_addr = malloc_heap_alloc(&mcfg->malloc_heaps[socket_id], NULL,
+			requested_len, flags, align, bound);
+	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
 	}
 
-	/* save aligned physical and virtual addresses */
-	memseg_physaddr = free_memseg[memseg_idx].phys_addr + seg_offset;
-	memseg_addr = RTE_PTR_ADD(free_memseg[memseg_idx].addr,
-			(uintptr_t) seg_offset);
-
-	/* if we are looking for a biggest memzone */
-	if (len == 0) {
-		if (bound == 0)
-			requested_len = memseg_len - seg_offset;
-		else
-			requested_len = RTE_ALIGN_CEIL(memseg_physaddr + 1,
-				bound) - memseg_physaddr;
-	}
-
-	/* set length to correct value */
-	len = (size_t)seg_offset + requested_len;
-
-	/* update our internal state */
-	free_memseg[memseg_idx].len -= len;
-	free_memseg[memseg_idx].phys_addr += len;
-	free_memseg[memseg_idx].addr =
-		(char *)free_memseg[memseg_idx].addr + len;
+	const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
 
 	/* fill the zone in config */
 	struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
-	mz->phys_addr = memseg_physaddr;
-	mz->addr = memseg_addr;
-	mz->len = requested_len;
-	mz->hugepage_sz = free_memseg[memseg_idx].hugepage_sz;
-	mz->socket_id = free_memseg[memseg_idx].socket_id;
+	mz->phys_addr = rte_malloc_virt2phy(mz_addr);
+	mz->addr = mz_addr;
+	mz->len = (requested_len == 0? elem->size: requested_len);
+	mz->hugepage_sz = elem->ms->hugepage_sz;
+	mz->socket_id = elem->ms->socket_id;
 	mz->flags = 0;
-	mz->memseg_id = memseg_idx;
+	mz->memseg_id = elem->ms - rte_eal_get_configuration()->mem_config->memseg;
 
 	return mz;
 }
@@ -419,45 +339,6 @@ rte_memzone_dump(FILE *f)
 }
 
 /*
- * called by init: modify the free memseg list to have cache-aligned
- * addresses and cache-aligned lengths
- */
-static int
-memseg_sanitize(struct rte_memseg *memseg)
-{
-	unsigned phys_align;
-	unsigned virt_align;
-	unsigned off;
-
-	phys_align = memseg->phys_addr & RTE_CACHE_LINE_MASK;
-	virt_align = (unsigned long)memseg->addr & RTE_CACHE_LINE_MASK;
-
-	/*
-	 * sanity check: phys_addr and addr must have the same
-	 * alignment
-	 */
-	if (phys_align != virt_align)
-		return -1;
-
-	/* memseg is really too small, don't bother with it */
-	if (memseg->len < (2 * RTE_CACHE_LINE_SIZE)) {
-		memseg->len = 0;
-		return 0;
-	}
-
-	/* align start address */
-	off = (RTE_CACHE_LINE_SIZE - phys_align) & RTE_CACHE_LINE_MASK;
-	memseg->phys_addr += off;
-	memseg->addr = (char *)memseg->addr + off;
-	memseg->len -= off;
-
-	/* align end address */
-	memseg->len &= ~((uint64_t)RTE_CACHE_LINE_MASK);
-
-	return 0;
-}
-
-/*
  * Init the memzone subsystem
  */
 int
@@ -465,14 +346,10 @@ rte_eal_memzone_init(void)
 {
 	struct rte_mem_config *mcfg;
 	const struct rte_memseg *memseg;
-	unsigned i = 0;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
 
-	/* mirror the runtime memsegs from config */
-	free_memseg = mcfg->free_memseg;
-
 	/* secondary processes don't need to initialise anything */
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
 		return 0;
@@ -485,33 +362,13 @@ rte_eal_memzone_init(void)
 
 	rte_rwlock_write_lock(&mcfg->mlock);
 
-	/* fill in uninitialized free_memsegs */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (memseg[i].addr == NULL)
-			break;
-		if (free_memseg[i].addr != NULL)
-			continue;
-		memcpy(&free_memseg[i], &memseg[i], sizeof(struct rte_memseg));
-	}
-
-	/* make all zones cache-aligned */
-	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-		if (free_memseg[i].addr == NULL)
-			break;
-		if (memseg_sanitize(&free_memseg[i]) < 0) {
-			RTE_LOG(ERR, EAL, "%s(): Sanity check failed\n", __func__);
-			rte_rwlock_write_unlock(&mcfg->mlock);
-			return -1;
-		}
-	}
-
 	/* delete all zones */
 	mcfg->memzone_idx = 0;
 	memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
 
 	rte_rwlock_write_unlock(&mcfg->mlock);
 
-	return 0;
+	return rte_eal_malloc_heap_init();
 }
 
 /* Walk all reserved memory zones */
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 34f5abc..055212a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -73,7 +73,7 @@ struct rte_mem_config {
 	struct rte_memseg memseg[RTE_MAX_MEMSEG];    /**< Physmem descriptors. */
 	struct rte_memzone memzone[RTE_MAX_MEMZONE]; /**< Memzone descriptors. */
 
-	/* Runtime Physmem descriptors. */
+	/* Runtime Physmem descriptors - NOT USED */
 	struct rte_memseg free_memseg[RTE_MAX_MEMSEG];
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index 716216f..b270356 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -40,7 +40,7 @@
 #include <rte_memory.h>
 
 /* Number of free lists per heap, grouped by size. */
-#define RTE_HEAP_NUM_FREELISTS  5
+#define RTE_HEAP_NUM_FREELISTS  13
 
 /**
  * Structure to hold malloc heap
@@ -48,7 +48,6 @@
 struct malloc_heap {
 	rte_spinlock_t lock;
 	LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
-	unsigned mz_count;
 	unsigned alloc_count;
 	size_t total_size;
 } __rte_cache_aligned;
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 7f8103f..675b630 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -100,6 +100,7 @@ struct rte_memseg {
 	 /**< store segment MFNs */
 	uint64_t mfn[DOM0_NUM_MEMBLOCK];
 #endif
+	uint8_t used;               /**< Used by a heap */
 } __attribute__((__packed__));
 
 /**
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index a5e1248..b54ee33 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -37,7 +37,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
 #include <rte_per_lcore.h>
@@ -56,10 +55,10 @@
  */
 void
 malloc_elem_init(struct malloc_elem *elem,
-		struct malloc_heap *heap, const struct rte_memzone *mz, size_t size)
+		struct malloc_heap *heap, const struct rte_memseg *ms, size_t size)
 {
 	elem->heap = heap;
-	elem->mz = mz;
+	elem->ms = ms;
 	elem->prev = NULL;
 	memset(&elem->free_list, 0, sizeof(elem->free_list));
 	elem->state = ELEM_FREE;
@@ -70,12 +69,12 @@ malloc_elem_init(struct malloc_elem *elem,
 }
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
 {
-	malloc_elem_init(elem, prev->heap, prev->mz, 0);
+	malloc_elem_init(elem, prev->heap, prev->ms, 0);
 	elem->prev = prev;
 	elem->state = ELEM_BUSY; /* mark busy so its never merged */
 }
@@ -86,12 +85,24 @@ malloc_elem_mkend(struct malloc_elem *elem, struct malloc_elem *prev)
  * fit, return NULL.
  */
 static void *
-elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
+elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	const uintptr_t end_pt = (uintptr_t)elem +
+	const size_t bmask = ~(bound - 1);
+	uintptr_t end_pt = (uintptr_t)elem +
 			elem->size - MALLOC_ELEM_TRAILER_LEN;
-	const uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
-	const uintptr_t new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
+	uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+	uintptr_t new_elem_start;
+
+	/* check boundary */
+	if ((new_data_start & bmask) != ((end_pt - 1) & bmask)) {
+		end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
+		new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
+		if (((end_pt - 1) & bmask) != (new_data_start & bmask))
+			return NULL;
+	}
+
+	new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;
 
 	/* if the new start point is before the exist start, it won't fit */
 	return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
@@ -102,9 +113,10 @@ elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align)
  * alignment request from the current element
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,	unsigned align,
+		size_t bound)
 {
-	return elem_start_pt(elem, size, align) != NULL;
+	return elem_start_pt(elem, size, align, bound) != NULL;
 }
 
 /*
@@ -115,10 +127,10 @@ static void
 split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 {
 	struct malloc_elem *next_elem = RTE_PTR_ADD(elem, elem->size);
-	const unsigned old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
-	const unsigned new_elem_size = elem->size - old_elem_size;
+	const size_t old_elem_size = (uintptr_t)split_pt - (uintptr_t)elem;
+	const size_t new_elem_size = elem->size - old_elem_size;
 
-	malloc_elem_init(split_pt, elem->heap, elem->mz, new_elem_size);
+	malloc_elem_init(split_pt, elem->heap, elem->ms, new_elem_size);
 	split_pt->prev = elem;
 	next_elem->prev = split_pt;
 	elem->size = old_elem_size;
@@ -168,8 +180,9 @@ malloc_elem_free_list_index(size_t size)
 void
 malloc_elem_free_list_insert(struct malloc_elem *elem)
 {
-	size_t idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
+	size_t idx;
 
+	idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
 	elem->state = ELEM_FREE;
 	LIST_INSERT_HEAD(&elem->heap->free_head[idx], elem, free_list);
 }
@@ -190,12 +203,26 @@ elem_free_list_remove(struct malloc_elem *elem)
  * is not done here, as it's done there previously.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
+malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
+		size_t bound)
 {
-	struct malloc_elem *new_elem = elem_start_pt(elem, size, align);
-	const unsigned old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
+	const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
+	const size_t trailer_size = elem->size - old_elem_size - size -
+		MALLOC_ELEM_OVERHEAD;
+
+	elem_free_list_remove(elem);
 
-	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE){
+	if (trailer_size > MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+		/* split it, too much free space after elem */
+		struct malloc_elem *new_free_elem =
+				RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);
+
+		split_elem(elem, new_free_elem);
+		malloc_elem_free_list_insert(new_free_elem);
+	}
+
+	if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
 		/* don't split it, pad the element instead */
 		elem->state = ELEM_BUSY;
 		elem->pad = old_elem_size;
@@ -208,8 +235,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 			new_elem->size = elem->size - elem->pad;
 			set_header(new_elem);
 		}
-		/* remove element from free list */
-		elem_free_list_remove(elem);
 
 		return new_elem;
 	}
@@ -219,7 +244,6 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align)
 	 * Re-insert original element, in case its new size makes it
 	 * belong on a different list.
 	 */
-	elem_free_list_remove(elem);
 	split_elem(elem, new_elem);
 	new_elem->state = ELEM_BUSY;
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index 9790b1a..e05d2ea 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -47,9 +47,9 @@ enum elem_state {
 
 struct malloc_elem {
 	struct malloc_heap *heap;
-	struct malloc_elem *volatile prev;      /* points to prev elem in memzone */
+	struct malloc_elem *volatile prev;      /* points to prev elem in memseg */
 	LIST_ENTRY(malloc_elem) free_list;      /* list of free elements in heap */
-	const struct rte_memzone *mz;
+	const struct rte_memseg *ms;
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
@@ -136,11 +136,11 @@ malloc_elem_from_data(const void *data)
 void
 malloc_elem_init(struct malloc_elem *elem,
 		struct malloc_heap *heap,
-		const struct rte_memzone *mz,
+		const struct rte_memseg *ms,
 		size_t size);
 
 /*
- * initialise a dummy malloc_elem header for the end-of-memzone marker
+ * initialise a dummy malloc_elem header for the end-of-memseg marker
  */
 void
 malloc_elem_mkend(struct malloc_elem *elem,
@@ -151,14 +151,16 @@ malloc_elem_mkend(struct malloc_elem *elem,
  * of the requested size and with the requested alignment
  */
 int
-malloc_elem_can_hold(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_can_hold(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * reserve a block of data in an existing malloc_elem. If the malloc_elem
  * is much larger than the data block requested, we split the element in two.
  */
 struct malloc_elem *
-malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align);
+malloc_elem_alloc(struct malloc_elem *elem, size_t size,
+		unsigned align, size_t bound);
 
 /*
  * free a malloc_elem block by adding it to the free list. If the
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index defb903..4a423c1 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -39,7 +39,6 @@
 #include <sys/queue.h>
 
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_launch.h>
@@ -54,123 +53,105 @@
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
-/* since the memzone size starts with a digit, it will appear unquoted in
- * rte_config.h, so quote it so it can be passed to rte_str_to_size */
-#define MALLOC_MEMZONE_SIZE RTE_STR(RTE_MALLOC_MEMZONE_SIZE)
-
-/*
- * returns the configuration setting for the memzone size as a size_t value
- */
-static inline size_t
-get_malloc_memzone_size(void)
+static unsigned
+check_hugepage_sz(unsigned flags, size_t hugepage_sz)
 {
-	return rte_str_to_size(MALLOC_MEMZONE_SIZE);
+	unsigned ret = 1;
+
+	if ((flags & RTE_MEMZONE_2MB) && hugepage_sz == RTE_PGSIZE_1G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_1GB) && hugepage_sz == RTE_PGSIZE_2M)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16MB) && hugepage_sz == RTE_PGSIZE_16G)
+		ret = 0;
+	if ((flags & RTE_MEMZONE_16GB) && hugepage_sz == RTE_PGSIZE_16M)
+		ret = 0;
+
+	return ret;
 }
 
 /*
- * reserve an extra memory zone and make it available for use by a particular
- * heap. This reserves the zone and sets a dummy malloc_elem header at the end
+ * Expand the heap with a memseg.
+ * This reserves the zone and sets a dummy malloc_elem header at the end
  * to prevent overflow. The rest of the zone is added to free list as a single
  * large free block
  */
-static int
-malloc_heap_add_memzone(struct malloc_heap *heap, size_t size, unsigned align)
+static void
+malloc_heap_add_memseg(struct malloc_heap *heap, struct rte_memseg *ms)
 {
-	const unsigned mz_flags = 0;
-	const size_t block_size = get_malloc_memzone_size();
-	/* ensure the data we want to allocate will fit in the memzone */
-	const size_t min_size = size + align + MALLOC_ELEM_OVERHEAD * 2;
-	const struct rte_memzone *mz = NULL;
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	unsigned numa_socket = heap - mcfg->malloc_heaps;
-
-	size_t mz_size = min_size;
-	if (mz_size < block_size)
-		mz_size = block_size;
-
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	snprintf(mz_name, sizeof(mz_name), "MALLOC_S%u_HEAP_%u",
-		     numa_socket, heap->mz_count++);
-
-	/* try getting a block. if we fail and we don't need as big a block
-	 * as given in the config, we can shrink our request and try again
-	 */
-	do {
-		mz = rte_memzone_reserve(mz_name, mz_size, numa_socket,
-					 mz_flags);
-		if (mz == NULL)
-			mz_size /= 2;
-	} while (mz == NULL && mz_size > min_size);
-	if (mz == NULL)
-		return -1;
-
 	/* allocate the memory block headers, one at end, one at start */
-	struct malloc_elem *start_elem = (struct malloc_elem *)mz->addr;
-	struct malloc_elem *end_elem = RTE_PTR_ADD(mz->addr,
-			mz_size - MALLOC_ELEM_OVERHEAD);
+	struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
+	struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
+			ms->len - MALLOC_ELEM_OVERHEAD);
 	end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
 
-	const unsigned elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
-	malloc_elem_init(start_elem, heap, mz, elem_size);
+	const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
+	malloc_elem_init(start_elem, heap, ms, elem_size);
 	malloc_elem_mkend(end_elem, start_elem);
 	malloc_elem_free_list_insert(start_elem);
 
-	/* increase heap total size by size of new memzone */
-	heap->total_size+=mz_size - MALLOC_ELEM_OVERHEAD;
-	return 0;
+	heap->total_size += elem_size;
 }
 
 /*
  * Iterates through the freelist for a heap to find a free element
  * which can store data of the required size and with the requested alignment.
+ * If size is 0, find the biggest available elem.
  * Returns null on failure, or pointer to element on success.
  */
 static struct malloc_elem *
-find_suitable_element(struct malloc_heap *heap, size_t size, unsigned align)
+find_suitable_element(struct malloc_heap *heap, size_t size,
+		unsigned flags, size_t align, size_t bound)
 {
 	size_t idx;
-	struct malloc_elem *elem;
+	struct malloc_elem *elem, *alt_elem = NULL;
 
 	for (idx = malloc_elem_free_list_index(size);
-		idx < RTE_HEAP_NUM_FREELISTS; idx++)
-	{
+			idx < RTE_HEAP_NUM_FREELISTS; idx++) {
 		for (elem = LIST_FIRST(&heap->free_head[idx]);
-			!!elem; elem = LIST_NEXT(elem, free_list))
-		{
-			if (malloc_elem_can_hold(elem, size, align))
-				return elem;
+				!!elem; elem = LIST_NEXT(elem, free_list)) {
+			if (malloc_elem_can_hold(elem, size, align, bound)) {
+				if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
+					return elem;
+				else
+					alt_elem = elem;
+			}
 		}
 	}
+
+	if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
+		return alt_elem;
+
 	return NULL;
 }
 
 /*
- * Main function called by malloc to allocate a block of memory from the
- * heap. It locks the free list, scans it, and adds a new memzone if the
- * scan fails. Once the new memzone is added, it re-scans and should return
+ * Main function to allocate a block of memory from the heap.
+ * It locks the free list, scans it, and adds a new memseg if the
+ * scan fails. Once the new memseg is added, it re-scans and should return
  * the new element after releasing the lock.
  */
 void *
 malloc_heap_alloc(struct malloc_heap *heap,
-		const char *type __attribute__((unused)), size_t size, unsigned align)
+		const char *type __attribute__((unused)), size_t size, unsigned flags,
+		size_t align, size_t bound)
 {
+	struct malloc_elem *elem;
+
 	size = RTE_CACHE_LINE_ROUNDUP(size);
 	align = RTE_CACHE_LINE_ROUNDUP(align);
+
 	rte_spinlock_lock(&heap->lock);
-	struct malloc_elem *elem = find_suitable_element(heap, size, align);
-	if (elem == NULL){
-		if ((malloc_heap_add_memzone(heap, size, align)) == 0)
-			elem = find_suitable_element(heap, size, align);
-	}
 
-	if (elem != NULL){
-		elem = malloc_elem_alloc(elem, size, align);
+	elem = find_suitable_element(heap, size, flags, align, bound);
+	if (elem != NULL) {
+		elem = malloc_elem_alloc(elem, size, align, bound);
 		/* increase heap's count of allocated elements */
 		heap->alloc_count++;
 	}
 	rte_spinlock_unlock(&heap->lock);
-	return elem == NULL ? NULL : (void *)(&elem[1]);
 
+	return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
 /*
@@ -207,3 +188,20 @@ malloc_heap_get_stats(const struct malloc_heap *heap,
 	return 0;
 }
 
+int
+rte_eal_malloc_heap_init(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned ms_cnt;
+	struct rte_memseg *ms;
+
+	if (mcfg == NULL)
+		return -1;
+
+	for (ms = &mcfg->memseg[0], ms_cnt = 0;
+			(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
+			ms_cnt++, ms++)
+		malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+
+	return 0;
+}
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a47136d..3ccbef0 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -53,15 +53,15 @@ malloc_get_numa_socket(void)
 }
 
 void *
-malloc_heap_alloc(struct malloc_heap *heap, const char *type,
-		size_t size, unsigned align);
+malloc_heap_alloc(struct malloc_heap *heap,	const char *type, size_t size,
+		unsigned flags, size_t align, size_t bound);
 
 int
 malloc_heap_get_stats(const struct malloc_heap *heap,
 		struct rte_malloc_socket_stats *socket_stats);
 
 int
-rte_eal_heap_memzone_init(void);
+rte_eal_malloc_heap_init(void);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index c313a57..54c2bd8 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -39,7 +39,6 @@
 
 #include <rte_memcpy.h>
 #include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
 #include <rte_branch_prediction.h>
@@ -87,7 +86,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 		return NULL;
 
 	ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
-				size, align == 0 ? 1 : align);
+				size, 0, align == 0 ? 1 : align, 0);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
@@ -98,7 +97,7 @@ rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
 			continue;
 
 		ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
-					size, align == 0 ? 1 : align);
+					size, 0, align == 0 ? 1 : align, 0);
 		if (ret != NULL)
 			return ret;
 	}
@@ -256,5 +255,5 @@ rte_malloc_virt2phy(const void *addr)
 	const struct malloc_elem *elem = malloc_elem_from_data(addr);
 	if (elem == NULL)
 		return 0;
-	return elem->mz->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->mz->addr);
+	return elem->ms->phys_addr + ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
-- 
1.9.3

^ permalink raw reply	[relevance 1%]
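The boundary check that this patch adds to elem_start_pt() can be sketched in isolation as follows. This is a hypothetical stand-alone model, not the DPDK code: plain uintptr_t arithmetic replaces the RTE_ALIGN_FLOOR macros and malloc_elem headers, and `bound` is assumed to be a non-zero power of two (in the real function, bound == 0 means "no boundary constraint").

```c
#include <stddef.h>
#include <stdint.h>

/* Pick the highest start address for a block of `size` bytes ending at or
 * before `end_pt`, aligned to `align` (power of two), such that the block
 * does not cross a `bound`-sized physical boundary. Mirrors the retry logic
 * the patch adds to elem_start_pt(): if the first candidate crosses the
 * boundary, retreat end_pt to the boundary and try once more.
 * Returns 0 on failure. */
static uintptr_t
bounded_start(uintptr_t end_pt, size_t size, size_t align, size_t bound)
{
	const uintptr_t bmask = ~((uintptr_t)bound - 1);
	/* (x & ~(align - 1)) is RTE_ALIGN_FLOOR(x, align) for power-of-two align */
	uintptr_t start = (end_pt - size) & ~((uintptr_t)align - 1);

	/* start and the last byte must fall in the same bound-sized window */
	if ((start & bmask) != ((end_pt - 1) & bmask)) {
		end_pt &= bmask;
		start = (end_pt - size) & ~((uintptr_t)align - 1);
		if (((end_pt - 1) & bmask) != (start & bmask))
			return 0; /* boundary cannot be satisfied */
	}
	return start;
}
```

For example, a 0x80-byte block ending near 0x1050 with a 0x1000 boundary is pushed back so it fits entirely below 0x1000, while a request larger than the boundary itself fails.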

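The SOCKET_ID_ANY handling in memzone_reserve_aligned_thread_unsafe() above delegates socket selection to find_heap_suitable()/find_heap_max_free_elem(). The selection rule can be modelled as a small sketch; the constants and array layout below are assumptions for illustration, not the DPDK structures:

```c
#include <stddef.h>

#define NUM_NODES     4   /* stand-in for RTE_MAX_NUMA_NODES */
#define ELEM_OVERHEAD 64  /* stand-in for MALLOC_ELEM_OVERHEAD */

/* Given the greatest free block size per NUMA node (as reported by
 * malloc_heap_get_stats()), return the first node whose largest free
 * block can hold `len` plus element overhead and alignment slack,
 * or -1 when no heap qualifies (the caller's SOCKET_ID_ANY check). */
static int
pick_heap(const size_t greatest_free[NUM_NODES], size_t len, size_t align)
{
	int i;

	for (i = 0; i < NUM_NODES; i++)
		if (greatest_free[i] >= len + ELEM_OVERHEAD + align)
			return i;
	return -1;
}
```

Note the overhead term: a free block must also fit the malloc_elem header/trailer and worst-case alignment padding, which is why the patch compares against `len + MALLOC_ELEM_OVERHEAD + align` rather than `len` alone.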
* Re: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
  2015-06-05 14:55  2% [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros Daniel Mrzyglod
@ 2015-06-05 15:31  0% ` Dumitrescu, Cristian
  2015-06-22 20:16  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2015-06-05 15:31 UTC (permalink / raw)
  To: Mrzyglod, DanielX T, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Friday, June 5, 2015 3:55 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
> 
> Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
> meta-data fields.
> Forcing aligned accesses is not really required, so this is removing an
> unneeded constraint.
> This issue was met during testing of the new version of the ip_pipeline
> application. There is no performance impact.
> This change has no ABI impact, as the previous code that uses aligned
> accesses continues to run without any issues.
> 
> Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>


Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros
@ 2015-06-05 14:55  2% Daniel Mrzyglod
  2015-06-05 15:31  0% ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Daniel Mrzyglod @ 2015-06-05 14:55 UTC (permalink / raw)
  To: dev

Fix RTE_MBUF_METADATA macros to allow for unaligned accesses to
meta-data fields.
Forcing aligned accesses is not really required, so this is removing an
unneeded constraint.
This issue was met during testing of the new version of the ip_pipeline
application. There is no performance impact.
This change has no ABI impact, as the previous code that uses aligned
accesses continues to run without any issues.

Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com>
---
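The pointer-first macro rework can be illustrated stand-alone. The `mock_hdr` struct and `storage` buffer below are assumptions for the sketch, not DPDK types; the real macros index past `&(mbuf)[1]` into the headroom that follows struct rte_mbuf. As the commit message notes, the unaligned 32-bit access this permits is accepted on the targeted platforms:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf header. */
struct mock_hdr { uint64_t dummy; };

static uint8_t storage[sizeof(struct mock_hdr) + 64]; /* header + headroom */

/* The reworked macros build the byte pointer first, then cast, so any byte
 * offset is honoured. The old forms indexed a typed array (offset divided by
 * sizeof(type)), which silently rounded the offset down to an aligned slot. */
#define META_UINT8_PTR(m, off)  (&((uint8_t *)&(m)[1])[off])
#define META_UINT32_PTR(m, off) ((uint32_t *)META_UINT8_PTR(m, off))
#define META_UINT32(m, off)     (*META_UINT32_PTR(m, off))

/* Write then read back a 32-bit value at an arbitrary, possibly unaligned,
 * byte offset into the headroom. */
static uint32_t
roundtrip(size_t off, uint32_t v)
{
	struct mock_hdr *m = (struct mock_hdr *)storage;

	META_UINT32(m, off) = v;
	return META_UINT32(m, off);
}
```

With the old macros, offset 2 for a uint32_t field would have resolved to array index 2/4 == 0, i.e. byte offset 0; the new macros address byte offset 2 exactly.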
 lib/librte_pipeline/rte_pipeline.c      |  8 --------
 lib/librte_port/rte_port.h              | 26 +++++++++++++-------------
 lib/librte_table/rte_table_array.c      |  4 +---
 lib/librte_table/rte_table_hash_ext.c   | 13 -------------
 lib/librte_table/rte_table_hash_key16.c | 24 ------------------------
 lib/librte_table/rte_table_hash_key32.c | 24 ------------------------
 lib/librte_table/rte_table_hash_key8.c  | 24 ------------------------
 lib/librte_table/rte_table_hash_lru.c   | 13 -------------
 lib/librte_table/rte_table_lpm.c        |  4 ----
 lib/librte_table/rte_table_lpm_ipv6.c   |  4 ----
 10 files changed, 14 insertions(+), 130 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 36d92c9..b777cf1 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -175,14 +175,6 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 		return -EINVAL;
 	}
 
-	/* offset_port_id */
-	if (params->offset_port_id & 0x3) {
-		RTE_LOG(ERR, PIPELINE,
-			"%s: Incorrect value for parameter offset_port_id\n",
-			__func__);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
diff --git a/lib/librte_port/rte_port.h b/lib/librte_port/rte_port.h
index d84e5a1..c3a0cca 100644
--- a/lib/librte_port/rte_port.h
+++ b/lib/librte_port/rte_port.h
@@ -54,23 +54,23 @@ extern "C" {
  * Macros to allow accessing metadata stored in the mbuf headroom
  * just beyond the end of the mbuf data structure returned by a port
  */
-#define RTE_MBUF_METADATA_UINT8(mbuf, offset)              \
-	(((uint8_t *)&(mbuf)[1])[offset])
-#define RTE_MBUF_METADATA_UINT16(mbuf, offset)             \
-	(((uint16_t *)&(mbuf)[1])[offset/sizeof(uint16_t)])
-#define RTE_MBUF_METADATA_UINT32(mbuf, offset)             \
-	(((uint32_t *)&(mbuf)[1])[offset/sizeof(uint32_t)])
-#define RTE_MBUF_METADATA_UINT64(mbuf, offset)             \
-	(((uint64_t *)&(mbuf)[1])[offset/sizeof(uint64_t)])
-
 #define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)          \
-	(&RTE_MBUF_METADATA_UINT8(mbuf, offset))
+	(&((uint8_t *) &(mbuf)[1])[offset])
 #define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)         \
-	(&RTE_MBUF_METADATA_UINT16(mbuf, offset))
+	((uint16_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
 #define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)         \
-	(&RTE_MBUF_METADATA_UINT32(mbuf, offset))
+	((uint32_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
 #define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)         \
-	(&RTE_MBUF_METADATA_UINT64(mbuf, offset))
+	((uint64_t *) RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
+
+#define RTE_MBUF_METADATA_UINT8(mbuf, offset)              \
+	(* RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT16(mbuf, offset)             \
+	(* RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT32(mbuf, offset)             \
+	(* RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset))
+#define RTE_MBUF_METADATA_UINT64(mbuf, offset)             \
+	(* RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset))
 /**@}*/
 
 /*
diff --git a/lib/librte_table/rte_table_array.c b/lib/librte_table/rte_table_array.c
index c031070..b00ca67 100644
--- a/lib/librte_table/rte_table_array.c
+++ b/lib/librte_table/rte_table_array.c
@@ -66,10 +66,8 @@ rte_table_array_create(void *params, int socket_id, uint32_t entry_size)
 	/* Check input parameters */
 	if ((p == NULL) ||
 	    (p->n_entries == 0) ||
-		(!rte_is_power_of_2(p->n_entries)) ||
-		((p->offset & 0x3) != 0)) {
+		(!rte_is_power_of_2(p->n_entries)))
 		return NULL;
-	}
 
 	/* Memory allocation */
 	total_cl_size = (sizeof(struct rte_table_array) +
diff --git a/lib/librte_table/rte_table_hash_ext.c b/lib/librte_table/rte_table_hash_ext.c
index 66e416b..73beeaf 100644
--- a/lib/librte_table/rte_table_hash_ext.c
+++ b/lib/librte_table/rte_table_hash_ext.c
@@ -149,19 +149,6 @@ check_params_create(struct rte_table_hash_ext_params *params)
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: signature_offset invalid value\n",
-			__func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: key_offset invalid value\n", __func__);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
diff --git a/lib/librte_table/rte_table_hash_key16.c b/lib/librte_table/rte_table_hash_key16.c
index f87ea0e..67a4249 100644
--- a/lib/librte_table/rte_table_hash_key16.c
+++ b/lib/librte_table/rte_table_hash_key16.c
@@ -89,18 +89,6 @@ check_params_create_lru(struct rte_table_hash_key16_lru_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature_offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key_offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE,
@@ -307,18 +295,6 @@ check_params_create_ext(struct rte_table_hash_key16_ext_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE,
diff --git a/lib/librte_table/rte_table_hash_key32.c b/lib/librte_table/rte_table_hash_key32.c
index 6790594..1fdb75d 100644
--- a/lib/librte_table/rte_table_hash_key32.c
+++ b/lib/librte_table/rte_table_hash_key32.c
@@ -89,18 +89,6 @@ check_params_create_lru(struct rte_table_hash_key32_lru_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
@@ -309,18 +297,6 @@ check_params_create_ext(struct rte_table_hash_key32_ext_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
diff --git a/lib/librte_table/rte_table_hash_key8.c b/lib/librte_table/rte_table_hash_key8.c
index 6803eb2..4dfa3c8 100644
--- a/lib/librte_table/rte_table_hash_key8.c
+++ b/lib/librte_table/rte_table_hash_key8.c
@@ -86,18 +86,6 @@ check_params_create_lru(struct rte_table_hash_key8_lru_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature_offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key_offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
@@ -300,18 +288,6 @@ check_params_create_ext(struct rte_table_hash_key8_ext_params *params) {
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid signature_offset\n", __func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: invalid key_offset\n", __func__);
-		return -EINVAL;
-	}
-
 	/* f_hash */
 	if (params->f_hash == NULL) {
 		RTE_LOG(ERR, TABLE, "%s: f_hash function pointer is NULL\n",
diff --git a/lib/librte_table/rte_table_hash_lru.c b/lib/librte_table/rte_table_hash_lru.c
index c9a8afd..b5393f0 100644
--- a/lib/librte_table/rte_table_hash_lru.c
+++ b/lib/librte_table/rte_table_hash_lru.c
@@ -126,19 +126,6 @@ check_params_create(struct rte_table_hash_lru_params *params)
 		return -EINVAL;
 	}
 
-	/* signature offset */
-	if ((params->signature_offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: signature_offset invalid value\n",
-			__func__);
-		return -EINVAL;
-	}
-
-	/* key offset */
-	if ((params->key_offset & 0x7) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: key_offset invalid value\n", __func__);
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
diff --git a/lib/librte_table/rte_table_lpm.c b/lib/librte_table/rte_table_lpm.c
index 64c684d..3f60672 100644
--- a/lib/librte_table/rte_table_lpm.c
+++ b/lib/librte_table/rte_table_lpm.c
@@ -87,10 +87,6 @@ rte_table_lpm_create(void *params, int socket_id, uint32_t entry_size)
 			__func__);
 		return NULL;
 	}
-	if ((p->offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: Invalid offset\n", __func__);
-		return NULL;
-	}
 
 	entry_size = RTE_ALIGN(entry_size, sizeof(uint64_t));
 
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index ce4ddc0..df83ecf 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -93,10 +93,6 @@ rte_table_lpm_ipv6_create(void *params, int socket_id, uint32_t entry_size)
 			__func__);
 		return NULL;
 	}
-	if ((p->offset & 0x3) != 0) {
-		RTE_LOG(ERR, TABLE, "%s: Invalid offset\n", __func__);
-		return NULL;
-	}
 
 	entry_size = RTE_ALIGN(entry_size, sizeof(uint64_t));
 
-- 
2.1.0

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size
  2015-06-05  6:21  3%     ` Zhang, Helin
@ 2015-06-05 10:30  0%       ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2015-06-05 10:30 UTC (permalink / raw)
  To: Zhang, Helin; +Cc: dev

On Fri, Jun 05, 2015 at 06:21:52AM +0000, Zhang, Helin wrote:
> Hi Neil
> 
> Yes, thank you very much for the comments!
> I realized the ABI issue after I sent out the patch. I think even if I put the new field at the end of this structure, it may still cause an issue.
> I'd like to have this change announced and then get it merged. That means I'd like to make this change and follow the policy and process.
> 
> Regards,
> Helin
> 

Ok, sounds good.

Thanks!
Neil

> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Thursday, June 4, 2015 9:05 PM
> > To: Zhang, Helin
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key
> > size
> > 
> > On Thu, Jun 04, 2015 at 09:00:33AM +0800, Helin Zhang wrote:
> > > To support querying hash key size per port, an new field of
> > > 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> > > hash key size in bytes.
> > >
> > > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 16dbe00..004b05a 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -916,6 +916,7 @@ struct rte_eth_dev_info {
> > >  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> > >  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> > >  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> > > +	uint8_t hash_key_size; /**< Hash key size in bytes */
> > >  	uint16_t reta_size;
> > >  	/**< Device redirection table size, the total number of entries. */
> > >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> > > --
> > > 1.9.3
> > >
> > >
> > 
> > You'll need to at least move this to the end of the structure to avoid ABI breakage,
> > but even then, since the examples statically allocate this struct on the stack, you
> > need to worry about previously compiled applications not having enough space
> > allocated.  Is there a hole in the struct that this can fit into to avoid changing the
> > other member offsets?
> > Neil
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD
  2015-06-05  8:19  4% ` [dpdk-dev] [PATCH v11 " Cunming Liang
  2015-06-05  8:20  2%   ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
  2015-06-05  8:20 11%   ` [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue Cunming Liang
@ 2015-06-05  8:59  0%   ` Zhou, Danny
  2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  3 siblings, 0 replies; 200+ results
From: Zhou, Danny @ 2015-06-05  8:59 UTC (permalink / raw)
  To: Liang, Cunming, dev; +Cc: shemming, Wang, Liang-min

Acked-by: Danny Zhou <danny.zhou@intel.com>

> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, June 05, 2015 4:20 PM
> To: dev@dpdk.org
> Cc: shemming@brocade.com; david.marchand@6wind.com; thomas.monjalon@6wind.com; Zhou, Danny; Wang, Liang-min;
> Richardson, Bruce; Liu, Yong; nhorman@tuxdriver.com; Liang, Cunming
> Subject: [PATCH v11 00/13] Interrupt mode PMD
> 
> v11 changes
>  - typo cleanup and check kernel style
> 
> v10 changes
>  - code rework to return actual error code
>  - bug fix for lsc when using uio_pci_generic
> 
> v9 changes
>  - code rework to fix open comment
>  - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
>  - new patch to turn off the feature by default so as to avoid v2.1 abi broken
> 
> v8 changes
>  - remove condition check for only vfio-msix
>  - add multiplex intr support when only one intr vector allowed
>  - lsc and rxq interrupt runtime enable decision
>  - add safe event delete while the event wakeup execution happens
> 
> v7 changes
>  - decouple epoll event and intr operation
>  - add condition check in the case intr vector is disabled
>  - renaming some APIs
> 
> v6 changes
>  - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
>  - rewrite rte_intr_rx_wait/rte_intr_rx_set.
>  - using vector number instead of queue_id as interrupt API params.
>  - patch reorder and split.
> 
> v5 changes
>  - Rebase the patchset onto the HEAD
>  - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
>  - Export wait-for-rx interrupt function for shared libraries
>  - Split-off a new patch file for changed struct rte_intr_handle that
>    other patches depend on, to avoid breaking git bisect
>  - Change sample application to accommodate EAL function spec change
>    accordingly
> 
> v4 changes
>  - Export interrupt enable/disable functions for shared libraries
>  - Adjust position of new-added structure fields and functions to
>    avoid breaking ABI
> 
> v3 changes
>  - Add return value for interrupt enable/disable functions
>  - Move spinlock from PMD to L3fwd-power
>  - Remove unnecessary variables in e1000_mac_info
>  - Fix miscellaneous review comments
> 
> v2 changes
>  - Fix compilation issue in Makefile for missed header file.
>  - Consolidate internal and community review comments of v1 patch set.
> 
> The patch series introduce low-latency one-shot rx interrupt into DPDK with
> polling and interrupt mode switch control example.
> 
> DPDK userspace interrupt notification and handling mechanism is based on UIO
> with below limitation:
> 1) It is designed to handle only the LSC interrupt, with an inefficient suspended
>    pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt handling thread
>    which then wakes up DPDK polling thread). In this way, it introduces
>    non-deterministic wakeup latency for DPDK polling thread as well as packet
>    latency if it is used to handle Rx interrupt.
> 2) UIO only supports a single interrupt vector, which has to be shared by the
>    LSC interrupt and the interrupts assigned to dedicated rx queues.
> 
> This patchset includes below features:
> 1) Enable one-shot rx queue interrupt in ixgbe PMD(PF & VF) and igb PMD(PF only).
> 2) Build on top of the VFIO mechanism instead of UIO, so it could support
>    up to 64 interrupt vectors for rx queue interrupts.
> 3) Have 1 DPDK polling thread handle per Rx queue interrupt with a dedicated
>    VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
>    user space.
> 4) Demonstrate interrupt control APIs and userspace NAPI-like polling/interrupt
>    switch algorithms in L3fwd-power example.
> 
> Known limitations:
> 1) It does not work for UIO, because a single interrupt eventfd shared by the LSC
>    and rx queue interrupt handlers causes a mess. [FIXED]
> 2) LSC interrupt is not supported by VF driver, so it is by default disabled
>    in L3fwd-power now. Feel free to turn in on if you want to support both LSC
>    and rx queue interrupts on a PF.
> 
> Cunming Liang (13):
>   eal/linux: add interrupt vectors support in intr_handle
>   eal/linux: add rte_epoll_wait/ctl support
>   eal/linux: add API to set rx interrupt event monitor
>   eal/linux: fix comments typo on vfio msi
>   eal/linux: add interrupt vectors handling on VFIO
>   eal/linux: standalone intr event fd create support
>   eal/linux: fix lsc read error in uio_pci_generic
>   eal/bsd: dummy for new intr definition
>   ethdev: add rx intr enable, disable and ctl functions
>   ixgbe: enable rx queue interrupts for both PF and VF
>   igb: enable rx queue interrupts for PF
>   l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
>     switch
>   abi: fix v2.1 abi broken issue
> 
>  drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
>  drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
>  examples/l3fwd-power/main.c                        | 206 ++++++--
>  lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
>  .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
>  lib/librte_ether/rte_ethdev.c                      | 109 +++++
>  lib/librte_ether/rte_ethdev.h                      | 132 ++++++
>  lib/librte_ether/rte_ether_version.map             |   4 +
>  13 files changed, 1853 insertions(+), 125 deletions(-)
> 
> --
> 1.8.1.4

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue
  2015-06-05  8:19  4% ` [dpdk-dev] [PATCH v11 " Cunming Liang
  2015-06-05  8:20  2%   ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
@ 2015-06-05  8:20 11%   ` Cunming Liang
  2015-06-05  8:59  0%   ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Zhou, Danny
  2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
  3 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-05  8:20 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

RTE_EAL_RX_INTR will be removed in v2.2. It is only used to avoid an unannounced ABI break in v2.1.
Users should make sure they understand the impact before turning on the feature.
Two ABI changes are required by this interrupt patch set.
They are 1) struct rte_intr_handle and 2) struct rte_intr_conf.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 v9 Acked-by: vincent jardin <vincent.jardin@6wind.com>

 drivers/net/e1000/igb_ethdev.c                     | 28 ++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 41 ++++++++++++-
 examples/l3fwd-power/main.c                        |  3 +-
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  7 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 12 ++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 68 +++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.c                      |  2 +
 lib/librte_ether/rte_ethdev.h                      | 32 +++++++++-
 8 files changed, 182 insertions(+), 11 deletions(-)

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index bbd7b74..6f29222 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -96,7 +96,9 @@ static int  eth_igb_flow_ctrl_get(struct rte_eth_dev *dev,
 static int  eth_igb_flow_ctrl_set(struct rte_eth_dev *dev,
 				struct rte_eth_fc_conf *fc_conf);
 static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev);
 static int eth_igb_interrupt_action(struct rte_eth_dev *dev);
 static void eth_igb_interrupt_handler(struct rte_intr_handle *handle,
@@ -199,11 +201,15 @@ static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
 				uint8_t index, uint8_t offset);
+#endif
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -760,7 +766,9 @@ eth_igb_start(struct rte_eth_dev *dev)
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int ret, mask;
 	uint32_t ctrl_ext;
 
@@ -801,6 +809,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	igb_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -818,6 +827,7 @@ eth_igb_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for rx interrupt */
 	eth_igb_configure_msix_intr(dev);
@@ -913,9 +923,11 @@ eth_igb_start(struct rte_eth_dev *dev)
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		eth_igb_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1007,12 +1019,14 @@ eth_igb_stop(struct rte_eth_dev *dev)
 	}
 	filter_info->twotuple_mask = 0;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
@@ -1020,7 +1034,9 @@ eth_igb_close(struct rte_eth_dev *dev)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_link link;
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	eth_igb_stop(dev);
 	e1000_phy_hw_reset(hw);
@@ -1038,11 +1054,13 @@ eth_igb_close(struct rte_eth_dev *dev)
 
 	igb_dev_clear_queues(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 
 	memset(&link, 0, sizeof(link));
 	rte_igb_dev_atomic_write_link_status(dev, &link);
@@ -1867,6 +1885,7 @@ eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 /*
  * It clears the interrupt causes and enables the interrupt.
  * It will be called once only during nic initialized.
@@ -1894,6 +1913,7 @@ static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and gets interrupt causes, check it and set a bit flag
@@ -3750,6 +3770,7 @@ eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eth_igb_write_ivar(struct e1000_hw *hw, uint8_t  msix_vector,
 			uint8_t index, uint8_t offset)
@@ -3791,6 +3812,7 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 					((queue & 0x1) << 4) + 8 * direction);
 	}
 }
+#endif
 
 /*
  * Sets up the hardware to generate MSI-X interrupts properly
@@ -3800,18 +3822,21 @@ eth_igb_assign_msix_vector(struct e1000_hw *hw, int8_t direction,
 static void
 eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 {
+#ifdef RTE_EAL_RX_INTR
 	int queue_id;
 	uint32_t tmpval, regval, intr_mask;
 	struct e1000_hw *hw =
 		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t vec = 0;
+#endif
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* set interrupt vector for other causes */
 	if (hw->mac.type == e1000_82575) {
 		tmpval = E1000_READ_REG(hw, E1000_CTRL_EXT);
@@ -3868,6 +3893,7 @@ eth_igb_configure_msix_intr(struct rte_eth_dev *dev)
 	}
 
 	E1000_WRITE_FLUSH(hw);
+#endif
 }
 
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index bcec971..3a70af6 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -174,7 +174,9 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+#ifdef RTE_EAL_RX_INTR
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
+#endif
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
 static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
@@ -210,8 +212,10 @@ static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 		uint16_t queue_id);
 static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 		 uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		 uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbevf_configure_msix(struct rte_eth_dev *dev);
 
 /* For Eth VMDQ APIs support */
@@ -234,8 +238,10 @@ static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
 static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
 					uint16_t queue_id);
+#ifdef RTE_EAL_RX_INTR
 static void ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 				uint8_t queue, uint8_t msix_vector);
+#endif
 static void ixgbe_configure_msix(struct rte_eth_dev *dev);
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
@@ -1481,7 +1487,9 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	struct ixgbe_vf_info *vfinfo =
 		*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	int err, link_up = 0, negotiate = 0;
 	uint32_t speed = 0;
 	int mask = 0;
@@ -1514,6 +1522,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 	/* configure PF module if SRIOV enabled */
 	ixgbe_pf_host_configure(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -1532,6 +1541,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
+#endif
 
 	/* confiugre msix for sleep until rx interrupt */
 	ixgbe_configure_msix(dev);
@@ -1619,9 +1629,11 @@ skip_link_setup:
 				     " no intr multiplex\n");
 	}
 
+#ifdef RTE_EAL_RX_INTR
 	/* check if rxq interrupt is enabled */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		ixgbe_dev_rxq_interrupt_setup(dev);
+#endif
 
 	/* enable uio/vfio intr/eventfd mapping */
 	rte_intr_enable(intr_handle);
@@ -1727,12 +1739,14 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 	memset(filter_info->fivetuple_mask, 0,
 		sizeof(uint32_t) * IXGBE_5TUPLE_ARRAY_SIZE);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 /*
@@ -2335,6 +2349,7 @@ ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev)
  *  - On success, zero.
  *  - On failure, a negative value.
  */
+#ifdef RTE_EAL_RX_INTR
 static int
 ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
@@ -2345,6 +2360,7 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 
 	return 0;
 }
+#endif
 
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
@@ -3127,7 +3143,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	uint32_t intr_vector = 0;
+#endif
 	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 
 	int err, mask = 0;
@@ -3160,6 +3178,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 
 	ixgbevf_dev_rxtx_start(dev);
 
+#ifdef RTE_EAL_RX_INTR
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0)
 		intr_vector = dev->data->nb_rx_queues;
@@ -3177,7 +3196,7 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 			return -ENOMEM;
 		}
 	}
-
+#endif
 	ixgbevf_configure_msix(dev);
 
 	if (dev->data->dev_conf.intr_conf.lsc != 0) {
@@ -3223,19 +3242,23 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)
 	/* disable intr eventfd mapping */
 	rte_intr_disable(intr_handle);
 
+#ifdef RTE_EAL_RX_INTR
 	/* Clean datapath event and queue/vec mapping */
 	rte_intr_efd_disable(intr_handle);
 	if (intr_handle->intr_vec != NULL) {
 		rte_free(intr_handle->intr_vec);
 		intr_handle->intr_vec = NULL;
 	}
+#endif
 }
 
 static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+#ifdef RTE_EAL_RX_INTR
 	struct rte_pci_device *pci_dev;
+#endif
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -3246,11 +3269,13 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)
 	/* reprogram the RAR[0] in case user changed it. */
 	ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 
+#ifdef RTE_EAL_RX_INTR
 	pci_dev = dev->pci_dev;
 	if (pci_dev->intr_handle.intr_vec) {
 		rte_free(pci_dev->intr_handle.intr_vec);
 		pci_dev->intr_handle.intr_vec = NULL;
 	}
+#endif
 }
 
 static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on)
@@ -3834,6 +3859,7 @@ ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 ixgbevf_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 			uint8_t queue, uint8_t msix_vector)
@@ -3902,21 +3928,25 @@ ixgbe_set_ivar_map(struct ixgbe_hw *hw, int8_t direction,
 		}
 	}
 }
+#endif
 
 static void
 ixgbevf_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t q_idx;
 	uint32_t vector_idx = 0;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* Configure all RX queues of VF */
 	for (q_idx = 0; q_idx < dev->data->nb_rx_queues; q_idx++) {
 		/* Force all queue use vector 0,
@@ -3927,6 +3957,7 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 
 	/* Configure VF Rx queue ivar */
 	ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+#endif
 }
 
 /**
@@ -3937,18 +3968,21 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
 static void
 ixgbe_configure_msix(struct rte_eth_dev *dev)
 {
+	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
+#ifdef RTE_EAL_RX_INTR
 	struct ixgbe_hw *hw =
 		IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
 	uint32_t queue_id, vec = 0;
 	uint32_t mask;
 	uint32_t gpie;
+#endif
 
 	/* won't configure msix register if no mapping is done
 	 * between intr vector and event fd */
 	if (!rte_intr_dp_is_en(intr_handle))
 		return;
 
+#ifdef RTE_EAL_RX_INTR
 	/* setup GPIE for MSI-x mode */
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
@@ -4000,6 +4034,7 @@ ixgbe_configure_msix(struct rte_eth_dev *dev)
 		  IXGBE_EIMS_LSC);
 
 	IXGBE_WRITE_REG(hw, IXGBE_EIAC, mask);
+#endif
 }
 
 static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 538bb93..3b4054c 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -239,7 +239,6 @@ static struct rte_eth_conf port_conf = {
 	},
 	.intr_conf = {
 		.lsc = 1,
-		.rxq = 1, /**< rxq interrupt feature enabled */
 	},
 };
 
@@ -889,7 +888,7 @@ main_loop(__attribute__((unused)) void *dummy)
 	}
 
 	/* add into event wait list */
-	if (port_conf.intr_conf.rxq && event_register(qconf) == 0)
+	if (event_register(qconf) == 0)
 		intr_en = 1;
 	else
 		RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
index fc2c46b..f0f6a3f 100644
--- a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_interrupts.h
@@ -49,9 +49,16 @@ enum rte_intr_handle_type {
 struct rte_intr_handle {
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	int max_intr;                    /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int *intr_vec;               /**< intr vector number array */
+#endif
 };
 
 /**
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 300ebb1..efab896 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -290,18 +290,26 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
 
 	irq_set = (struct vfio_irq_set *) irq_set_buf;
 	irq_set->argsz = len;
+#ifdef RTE_EAL_RX_INTR
 	if (!intr_handle->max_intr)
 		intr_handle->max_intr = 1;
 	else if (intr_handle->max_intr > RTE_MAX_RXTX_INTR_VEC_ID)
 		intr_handle->max_intr = RTE_MAX_RXTX_INTR_VEC_ID + 1;
 
 	irq_set->count = intr_handle->max_intr;
+#else
+	irq_set->count = 1;
+#endif
 	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
 	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
 	irq_set->start = 0;
 	fd_ptr = (int *) &irq_set->data;
+#ifdef RTE_EAL_RX_INTR
 	memcpy(fd_ptr, intr_handle->efds, sizeof(intr_handle->efds));
 	fd_ptr[intr_handle->max_intr - 1] = intr_handle->fd;
+#else
+	fd_ptr[0] = intr_handle->fd;
+#endif
 
 	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
@@ -876,6 +884,7 @@ rte_eal_intr_init(void)
 	return -ret;
 }
 
+#ifdef RTE_EAL_RX_INTR
 static void
 eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 {
@@ -919,6 +928,7 @@ eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
 		return;
 	} while (1);
 }
+#endif
 
 static int
 eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
@@ -1057,6 +1067,7 @@ rte_epoll_ctl(int epfd, int op, int fd,
 	return 0;
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 		int op, unsigned int vec, void *data)
@@ -1168,3 +1179,4 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
 	intr_handle->nb_efd = 0;
 	intr_handle->max_intr = 0;
 }
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 912cc50..a2056bd 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -38,6 +38,10 @@
 #ifndef _RTE_LINUXAPP_INTERRUPTS_H_
 #define _RTE_LINUXAPP_INTERRUPTS_H_
 
+#ifndef RTE_EAL_RX_INTR
+#include <rte_common.h>
+#endif
+
 #define RTE_MAX_RXTX_INTR_VEC_ID     32
 
 enum rte_intr_handle_type {
@@ -86,12 +90,19 @@ struct rte_intr_handle {
 	};
 	int fd;	 /**< interrupt event file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
+#ifdef RTE_EAL_RX_INTR
+	/**
+	 * RTE_EAL_RX_INTR will be removed from v2.2.
+	 * It's only used to avoid ABI(unannounced) broken in v2.1.
+	 * Make sure being aware of the impact before turning on the feature.
+	 */
 	uint32_t max_intr;               /**< max interrupt requested */
 	uint32_t nb_efd;                 /**< number of available efds */
 	int efds[RTE_MAX_RXTX_INTR_VEC_ID];  /**< intr vectors/efds mapping */
 	struct rte_epoll_event elist[RTE_MAX_RXTX_INTR_VEC_ID];
 					 /**< intr vector epoll event */
 	int *intr_vec;                   /**< intr vector number array */
+#endif
 };
 
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
@@ -162,9 +173,23 @@ rte_intr_tls_epfd(void);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
 		int epfd, int op, unsigned int vec, void *data);
+#else
+static inline int
+rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
+		int epfd, int op, unsigned int vec, void *data)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(vec);
+	RTE_SET_USED(data);
+	return -ENOTSUP;
+}
+#endif
 
 /**
  * It enables the fastpath event fds if it's necessary.
@@ -179,8 +204,18 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
+#else
+static inline int
+rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd)
+{
+	RTE_SET_USED(intr_handle);
+	RTE_SET_USED(nb_efd);
+	return 0;
+}
+#endif
 
 /**
  * It disable the fastpath event fds.
@@ -189,8 +224,17 @@ rte_intr_efd_enable(struct rte_intr_handle *intr_handle, uint32_t nb_efd);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
-void
+#ifdef RTE_EAL_RX_INTR
+extern void
 rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
+#else
+static inline void
+rte_intr_efd_disable(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return;
+}
+#endif
 
 /**
  * The fastpath interrupt is enabled or not.
@@ -198,11 +242,20 @@ rte_intr_efd_disable(struct rte_intr_handle *intr_handle);
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
 {
 	return !(!intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 0;
+}
+#endif
 
 /**
  * The interrupt handle instance allows other cause or not.
@@ -211,10 +264,19 @@ rte_intr_dp_is_en(struct rte_intr_handle *intr_handle)
  * @param intr_handle
  *   Pointer to the interrupt handle.
  */
+#ifdef RTE_EAL_RX_INTR
 static inline int
 rte_intr_allow_others(struct rte_intr_handle *intr_handle)
 {
 	return !!(intr_handle->max_intr - intr_handle->nb_efd);
 }
+#else
+static inline int
+rte_intr_allow_others(struct rte_intr_handle *intr_handle)
+{
+	RTE_SET_USED(intr_handle);
+	return 1;
+}
+#endif
 
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 27a87f5..3f6e1f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3281,6 +3281,7 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
 
+#ifdef RTE_EAL_RX_INTR
 int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
 {
@@ -3352,6 +3353,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 
 	return 0;
 }
+#endif
 
 int
 rte_eth_dev_rx_intr_enable(uint8_t port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index c199d32..8bea68d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,8 +830,10 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+#ifdef RTE_EAL_RX_INTR
 	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
 	uint16_t rxq;
+#endif
 };
 
 /**
@@ -2943,8 +2945,20 @@ int rte_eth_dev_rx_intr_disable(uint8_t port_id,
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * RX Interrupt control per queue.
@@ -2967,9 +2981,23 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
  *   - On success, zero.
  *   - On failure, a negative value.
  */
-int
+#ifdef RTE_EAL_RX_INTR
+extern int
 rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
 			  int epfd, int op, void *data);
+#else
+static inline int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	RTE_SET_USED(port_id);
+	RTE_SET_USED(queue_id);
+	RTE_SET_USED(epfd);
+	RTE_SET_USED(op);
+	RTE_SET_USED(data);
+	return -1;
+}
+#endif
 
 /**
  * Turn on the LED on the Ethernet device.
-- 
1.8.1.4

^ permalink raw reply	[relevance 11%]

* [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions
  2015-06-05  8:19  4% ` [dpdk-dev] [PATCH v11 " Cunming Liang
@ 2015-06-05  8:20  2%   ` Cunming Liang
  2015-06-05  8:20 11%   ` [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue Cunming Liang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-05  8:20 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

The patch adds two dev_ops functions to enable and disable rx queue interrupts.
In addition, it adds rte_eth_dev_rx_intr_ctl/rte_eth_dev_rx_intr_ctl_q to set up per-port or per-queue rx interrupt events.

Signed-off-by: Danny Zhou <danny.zhou@intel.com>
Signed-off-by: Cunming Liang <cunming.liang@intel.com>

fix by http://www.dpdk.org/dev/patchwork/patch/4784/
---
v9 changes
 - remove unnecessary check after rte_eth_dev_is_valid_port.
   the same as http://www.dpdk.org/dev/patchwork/patch/4784

v8 changes
 - add an additional check for -EEXIST

v7 changes
 - remove rx_intr_vec_get
 - add rx_intr_ctl and rx_intr_ctl_q

v6 changes
 - add rx_intr_vec_get to retrieve the vector num of the queue.

v5 changes
 - Rebase the patchset onto the HEAD

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Put new functions at the end of eth_dev_ops to avoid breaking ABI

v3 changes
 - Add return value for interrupt enable/disable functions

 lib/librte_ether/rte_ethdev.c          | 107 +++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 104 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   4 ++
 3 files changed, 215 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5a94654..27a87f5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3280,6 +3280,113 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 	}
 	rte_spinlock_unlock(&rte_eth_dev_cb_lock);
 }
+
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
+		vec = intr_handle->intr_vec[qid];
+		rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+		if (rc && rc != -EEXIST) {
+			PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+					" op %d epfd %d vec %u\n",
+					port_id, qid, op, epfd, vec);
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data)
+{
+	uint32_t vec;
+	struct rte_eth_dev *dev;
+	struct rte_intr_handle *intr_handle;
+	int rc;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%u\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	if (queue_id >= dev->data->nb_rx_queues) {
+		PMD_DEBUG_TRACE("Invalid RX queue_id=%u\n", queue_id);
+		return -EINVAL;
+	}
+
+	intr_handle = &dev->pci_dev->intr_handle;
+	if (!intr_handle->intr_vec) {
+		PMD_DEBUG_TRACE("RX Intr vector unset\n");
+		return -EPERM;
+	}
+
+	vec = intr_handle->intr_vec[queue_id];
+	rc = rte_intr_rx_ctl(intr_handle, epfd, op, vec, data);
+	if (rc && rc != -EEXIST) {
+		PMD_DEBUG_TRACE("p %u q %u rx ctl error"
+				" op %d epfd %d vec %u\n",
+				port_id, queue_id, op, epfd, vec);
+		return rc;
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			   uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);
+}
+
+int
+rte_eth_dev_rx_intr_disable(uint8_t port_id,
+			    uint16_t queue_id)
+{
+	struct rte_eth_dev *dev;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		return -ENODEV;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
+	return (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
+}
+
 #ifdef RTE_NIC_BYPASS
 int rte_eth_dev_bypass_init(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16dbe00..c199d32 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -830,6 +830,8 @@ struct rte_eth_fdir {
 struct rte_intr_conf {
 	/** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
 	uint16_t lsc;
+	/** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
+	uint16_t rxq;
 };
 
 /**
@@ -1035,6 +1037,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
 				    const struct rte_eth_txconf *tx_conf);
 /**< @internal Setup a transmit queue of an Ethernet device. */
 
+typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
+
+typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
+				    uint16_t rx_queue_id);
+/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
+
 typedef void (*eth_queue_release_t)(void *queue);
 /**< @internal Release memory resources allocated by given RX/TX queue. */
 
@@ -1386,6 +1396,10 @@ struct eth_dev_ops {
 	/** Get current RSS hash configuration. */
 	rss_hash_conf_get_t rss_hash_conf_get;
 	eth_filter_ctrl_t              filter_ctrl;          /**< common filter control*/
+
+	/** Enable/disable Rx queue interrupt. */
+	eth_rx_enable_intr_t       rx_queue_intr_enable; /**< Enable Rx queue interrupt. */
+	eth_rx_disable_intr_t      rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/
 };
 
 /**
@@ -2868,6 +2882,96 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 				enum rte_eth_event_type event);
 
 /**
+ * When there is no rx packet coming in Rx Queue for a long time, we can
+ * sleep lcore related to RX Queue for power saving, and enable rx interrupt
+ * to be triggered when rx packect arrives.
+ *
+ * The rte_eth_dev_rx_intr_enable() function enables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_enable(uint8_t port_id,
+			       uint16_t queue_id);
+
+/**
+ * When lcore wakes up from rx interrupt indicating packet coming, disable rx
+ * interrupt and returns to polling mode.
+ *
+ * The rte_eth_dev_rx_intr_disable() function disables rx queue
+ * interrupt on specific rx queue of a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if underlying hardware OR driver doesn't support
+ *     that operation.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_rx_intr_disable(uint8_t port_id,
+				uint16_t queue_id);
+
+/**
+ * RX Interrupt control per port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data);
+
+/**
+ * RX Interrupt control per queue.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the receive queue from which to retrieve input packets.
+ *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param epfd
+ *   Epoll instance fd which the intr vector associated to.
+ *   Using RTE_EPOLL_PER_THREAD allows to use per thread epoll instance.
+ * @param op
+ *   The operation be performed for the vector.
+ *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
+ * @param data
+ *   User raw data.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t queue_id,
+			  int epfd, int op, void *data);
+
+/**
  * Turn on the LED on the Ethernet device.
  * This function turns on the LED on the Ethernet device.
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index a2d25a6..2799b99 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -48,6 +48,10 @@ DPDK_2.0 {
 	rte_eth_dev_rss_hash_update;
 	rte_eth_dev_rss_reta_query;
 	rte_eth_dev_rss_reta_update;
+	rte_eth_dev_rx_intr_ctl;
+	rte_eth_dev_rx_intr_ctl_q;
+	rte_eth_dev_rx_intr_disable;
+	rte_eth_dev_rx_intr_enable;
 	rte_eth_dev_rx_queue_start;
 	rte_eth_dev_rx_queue_stop;
 	rte_eth_dev_set_link_down;
-- 
1.8.1.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD
  @ 2015-06-05  8:19  4% ` Cunming Liang
  2015-06-05  8:20  2%   ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
                     ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Cunming Liang @ 2015-06-05  8:19 UTC (permalink / raw)
  To: dev; +Cc: shemming, liang-min.wang

v11 changes
 - typo cleanup and kernel style check

v10 changes
 - code rework to return actual error code
 - bug fix for lsc when using uio_pci_generic

v9 changes
 - code rework to fix open comment
 - bug fix for igb lsc when both lsc and rxq are enabled in vfio-msix
 - new patch to turn off the feature by default so as to avoid breaking the v2.1 ABI

v8 changes
 - remove condition check for only vfio-msix
 - add multiplex intr support when only one intr vector allowed
 - lsc and rxq interrupt runtime enable decision
 - add safe event delete while the event wakeup execution happens

v7 changes
 - decouple epoll event and intr operation
 - add condition check in the case intr vector is disabled
 - renaming some APIs

v6 changes
 - split rte_intr_wait_rx_pkt into two APIs 'wait' and 'set'.
 - rewrite rte_intr_rx_wait/rte_intr_rx_set.
 - using vector number instead of queue_id as interrupt API params.
 - patch reorder and split.

v5 changes
 - Rebase the patchset onto the HEAD
 - Isolate ethdev from EAL for new-added wait-for-rx interrupt function
 - Export wait-for-rx interrupt function for shared libraries
 - Split-off a new patch file for changed struct rte_intr_handle that
   other patches depend on, to avoid breaking git bisect
 - Change the sample application to accommodate the EAL function spec change
   accordingly

v4 changes
 - Export interrupt enable/disable functions for shared libraries
 - Adjust position of new-added structure fields and functions to
   avoid breaking ABI
 
v3 changes
 - Add return value for interrupt enable/disable functions
 - Move spinlok from PMD to L3fwd-power
 - Remove unnecessary variables in e1000_mac_info
 - Fix miscelleous review comments
 
v2 changes
 - Fix compilation issue in Makefile for missed header file.
 - Consolidate internal and community review comments of v1 patch set.
 
The patch series introduces low-latency one-shot rx interrupts into DPDK, with a
polling and interrupt mode switch control example.
 
The DPDK userspace interrupt notification and handling mechanism is based on UIO,
with the limitations below:
1) It is designed to handle LSC interrupts only, with an inefficient suspended
   pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt handling thread,
   which then wakes up the DPDK polling thread). In this way, it introduces
   non-deterministic wakeup latency for the DPDK polling thread, as well as packet
   latency if it is used to handle Rx interrupts.
2) UIO only supports a single interrupt vector, which has to be shared by the
   LSC interrupt and the interrupts assigned to dedicated rx queues.
 
This patchset includes the features below:
1) Enable one-shot rx queue interrupts in the ixgbe PMD (PF & VF) and the igb PMD (PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it can support
   up to 64 interrupt vectors for rx queue interrupts.
3) Have one DPDK polling thread handle each Rx queue interrupt with a dedicated
   VFIO eventfd, which eliminates non-deterministic pthread wakeup latency in
   user space.
4) Demonstrate interrupt control APIs and userspace NAPI-like polling/interrupt
   switch algorithms in the L3fwd-power example.

Known limitations:
1) It did not work for UIO, because a single interrupt eventfd shared by the LSC
   and rx queue interrupt handlers caused a mess. [FIXED]
2) LSC interrupt is not supported by the VF driver, so it is disabled by default
   in L3fwd-power now. Feel free to turn it on if you want to support both LSC
   and rx queue interrupts on a PF.

Cunming Liang (13):
  eal/linux: add interrupt vectors support in intr_handle
  eal/linux: add rte_epoll_wait/ctl support
  eal/linux: add API to set rx interrupt event monitor
  eal/linux: fix comments typo on vfio msi
  eal/linux: add interrupt vectors handling on VFIO
  eal/linux: standalone intr event fd create support
  eal/linux: fix lsc read error in uio_pci_generic
  eal/bsd: dummy for new intr definition
  ethdev: add rx intr enable, disable and ctl functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  l3fwd-power: enable one-shot rx interrupt and polling/interrupt mode
    switch
  abi: fix v2.1 abi broken issue

 drivers/net/e1000/igb_ethdev.c                     | 311 ++++++++++--
 drivers/net/ixgbe/ixgbe_ethdev.c                   | 519 ++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h                   |   4 +
 examples/l3fwd-power/main.c                        | 206 ++++++--
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  19 +
 .../bsdapp/eal/include/exec-env/rte_interrupts.h   |  81 ++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map      |   5 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 361 ++++++++++++--
 .../linuxapp/eal/include/exec-env/rte_interrupts.h | 219 +++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map    |   8 +
 lib/librte_ether/rte_ethdev.c                      | 109 +++++
 lib/librte_ether/rte_ethdev.h                      | 132 ++++++
 lib/librte_ether/rte_ether_version.map             |   4 +
 13 files changed, 1853 insertions(+), 125 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 0/4] enable mirror functionality in i40e driver
  @ 2015-06-05  8:16  3% ` Jingjing Wu
    0 siblings, 1 reply; 200+ results
From: Jingjing Wu @ 2015-06-05  8:16 UTC (permalink / raw)
  To: dev

This patch set enables mirror functionality in the i40e driver, and redefines the structures and macros used to configure mirroring.
 
v2 changes:
 - correct comments style
 - add doc change
 
v3 changes:
 - change the mirror rule type to support a bit mask and avoid ABI breakage
 - fix code style

Jingjing Wu (4):
  ethdev: rename rte_eth_vmdq_mirror_conf
  ethdev: redefine the mirror type
  i40e: enable mirror functionality in i40e driver
  doc: modify the command about mirror in testpmd guide

 app/test-pmd/cmdline.c                      |  62 +++---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   8 +-
 drivers/net/i40e/i40e_ethdev.c              | 334 ++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev.h              |  23 ++
 drivers/net/ixgbe/ixgbe_ethdev.c            |  64 ++++--
 drivers/net/ixgbe/ixgbe_ethdev.h            |   4 +-
 lib/librte_ether/rte_ethdev.c               |  28 +--
 lib/librte_ether/rte_ethdev.h               |  30 +--
 8 files changed, 467 insertions(+), 86 deletions(-)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] abi: announce abi changes plan for interrupt mode
@ 2015-06-05  7:40 23% Cunming Liang
  0 siblings, 0 replies; 200+ results
From: Cunming Liang @ 2015-06-05  7:40 UTC (permalink / raw)
  To: dev, nhorman; +Cc: shemming

It announces the planned ABI changes for interrupt mode in v2.2.
The feature will be turned off by default so as to avoid breaking the v2.1 ABI.

Signed-off-by: Cunming Liang <cunming.liang@intel.com>
---
 doc/guides/rel_notes/abi.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/abi.rst b/doc/guides/rel_notes/abi.rst
index f00a6ee..4c9bf85 100644
--- a/doc/guides/rel_notes/abi.rst
+++ b/doc/guides/rel_notes/abi.rst
@@ -38,3 +38,4 @@ Examples of Deprecation Notices
 
 Deprecation Notices
 -------------------
+* The ABI changes are planned for struct rte_intr_handle and struct rte_eth_conf in order to support interrupt mode feature. The upcoming release 2.1 will not contain these ABI changes by default, but release 2.2 will, and no backwards compatibility is planed due to the additional interrupt mode feature enabling. Binaries using this library build prior to version 2.2 will require updating and recompilation.
-- 
1.8.1.4

^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size
  2015-06-04 13:05  3%   ` Neil Horman
@ 2015-06-05  6:21  3%     ` Zhang, Helin
  2015-06-05 10:30  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Zhang, Helin @ 2015-06-05  6:21 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

Hi Neil

Yes, thank you very much for the comments!
I realized the ABI issue after I sent out the patch. I think even if I put the new field at the end of the structure, it may still cause an issue.
I'd like to have this change announced first and then get it merged, following the ABI policy and process.

Regards,
Helin

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Thursday, June 4, 2015 9:05 PM
> To: Zhang, Helin
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key
> size
> 
> On Thu, Jun 04, 2015 at 09:00:33AM +0800, Helin Zhang wrote:
> > To support querying hash key size per port, an new field of
> > 'hash_key_size' was added in 'struct rte_eth_dev_info' for storing
> > hash key size in bytes.
> >
> > Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> > ---
> >  lib/librte_ether/rte_ethdev.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 16dbe00..004b05a 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -916,6 +916,7 @@ struct rte_eth_dev_info {
> >  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
> >  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
> >  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> > +	uint8_t hash_key_size; /**< Hash key size in bytes */
> >  	uint16_t reta_size;
> >  	/**< Device redirection table size, the total number of entries. */
> >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> > --
> > 1.9.3
> >
> >
> 
> You'll need to at least move this to the end of the structure to avoid ABI breakage,
> but even then, since the examples statically allocate this struct on the stack, you
> need to worry about previously compiled applications not having enough space
> allocated.  Is there a hole in the struct that this can fit into to avoid changing the
> other member offsets?
> Neil

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 8/9] mk, scripts: remove useless blank lines
  2015-06-04 14:43  3% [dpdk-dev] [PATCH 0/9] whitespace cleanups Stephen Hemminger
@ 2015-06-04 14:43 14% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2015-06-04 14:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Signed-off-by: Stephen Hemminger <stephen@neworkplumber.org>
---
 mk/rte.extapp.mk                         | 1 -
 mk/rte.extlib.mk                         | 1 -
 mk/rte.extobj.mk                         | 1 -
 mk/toolchain/gcc/rte.toolchain-compat.mk | 1 -
 mk/toolchain/icc/rte.toolchain-compat.mk | 1 -
 scripts/gen-config-h.sh                  | 1 -
 scripts/validate-abi.sh                  | 2 --
 7 files changed, 8 deletions(-)

diff --git a/mk/rte.extapp.mk b/mk/rte.extapp.mk
index 40ff82c..b4d1ef6 100644
--- a/mk/rte.extapp.mk
+++ b/mk/rte.extapp.mk
@@ -50,4 +50,3 @@ all:
 else
 include $(RTE_SDK)/mk/rte.app.mk
 endif
-
diff --git a/mk/rte.extlib.mk b/mk/rte.extlib.mk
index ac5e84f..ba066bc 100644
--- a/mk/rte.extlib.mk
+++ b/mk/rte.extlib.mk
@@ -50,4 +50,3 @@ all:
 else
 include $(RTE_SDK)/mk/rte.lib.mk
 endif
-
diff --git a/mk/rte.extobj.mk b/mk/rte.extobj.mk
index cb2f996..253de28 100644
--- a/mk/rte.extobj.mk
+++ b/mk/rte.extobj.mk
@@ -50,4 +50,3 @@ all:
 else
 include $(RTE_SDK)/mk/rte.obj.mk
 endif
-
diff --git a/mk/toolchain/gcc/rte.toolchain-compat.mk b/mk/toolchain/gcc/rte.toolchain-compat.mk
index 05aa37f..61bb5b7 100644
--- a/mk/toolchain/gcc/rte.toolchain-compat.mk
+++ b/mk/toolchain/gcc/rte.toolchain-compat.mk
@@ -84,4 +84,3 @@ else
 		MACHINE_CFLAGS := $(filter-out -march% -mtune% -msse%,$(MACHINE_CFLAGS))
 	endif
 endif
-
diff --git a/mk/toolchain/icc/rte.toolchain-compat.mk b/mk/toolchain/icc/rte.toolchain-compat.mk
index 621afcd..4134466 100644
--- a/mk/toolchain/icc/rte.toolchain-compat.mk
+++ b/mk/toolchain/icc/rte.toolchain-compat.mk
@@ -73,4 +73,3 @@ else
 		MACHINE_CFLAGS := $(patsubst -march=%,-xSSE3,$(MACHINE_CFLAGS))
 	endif
 endif
-
diff --git a/scripts/gen-config-h.sh b/scripts/gen-config-h.sh
index d36efd6..1a2436c 100755
--- a/scripts/gen-config-h.sh
+++ b/scripts/gen-config-h.sh
@@ -42,4 +42,3 @@ sed 's,CONFIG_\(.*\)=\(.*\)$,#undef \1\
 #define \1 \2,' |
 sed 's,\# CONFIG_\(.*\) is not set$,#undef \1,'
 echo "#endif /* __RTE_CONFIG_H */"
-
diff --git a/scripts/validate-abi.sh b/scripts/validate-abi.sh
index 369ea8a..1747b8b 100755
--- a/scripts/validate-abi.sh
+++ b/scripts/validate-abi.sh
@@ -241,5 +241,3 @@ done
 git reset --hard
 log "INFO" "ABI CHECK COMPLETE.  REPORTS ARE IN compat_report directory"
 cleanup_and_exit 0
-
-
-- 
2.1.4

^ permalink raw reply	[relevance 14%]

* [dpdk-dev] [PATCH 0/9] whitespace cleanups
@ 2015-06-04 14:43  3% Stephen Hemminger
  2015-06-04 14:43 14% ` [dpdk-dev] [PATCH 8/9] mk, scripts: remove useless blank lines Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2015-06-04 14:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

From: Stephen Hemminger <shemming@brocade.com>

Ran the current code base through a script which:
  - removes trailing whitespace
  - removes space before tabs
  - removes blank lines at end of file


Stephen Hemminger (9):
  kni: fix whitespace
  eal: fix whitespace
  cmdline: fix whitespace
  vhost: fix trailing whitespace
  lib: fix misc whitespace
  app: fix whitespace
  examples: fix whitespace
  mk, scripts: remove useless blank lines
  drivers: fix whitespace

 app/cmdline_test/cmdline_test.py                   |  3 +-
 app/cmdline_test/cmdline_test_data.py              |  1 -
 app/test-pmd/csumonly.c                            |  1 -
 app/test-pmd/mempool_anon.c                        |  4 +-
 app/test-pmd/testpmd.c                             |  8 ++--
 app/test/autotest.py                               |  1 -
 app/test/autotest_data.py                          | 28 ++++++-------
 app/test/autotest_runner.py                        | 41 ++++++++++---------
 app/test/autotest_test_funcs.py                    |  1 -
 app/test/process.h                                 | 12 +++---
 app/test/test_acl.h                                |  4 +-
 app/test/test_sched.c                              |  8 ++--
 drivers/net/e1000/em_rxtx.c                        |  1 -
 drivers/net/e1000/igb_rxtx.c                       |  1 -
 drivers/net/pcap/rte_eth_pcap.c                    |  2 +-
 examples/cmdline/commands.c                        |  1 -
 .../config_files/shumway/dh89xxcc_qa_dev0.conf     |  2 -
 .../config_files/shumway/dh89xxcc_qa_dev1.conf     |  2 -
 .../config_files/stargo/dh89xxcc_qa_dev0.conf      |  2 -
 examples/kni/main.c                                |  1 -
 examples/l2fwd/main.c                              |  1 -
 examples/l3fwd-power/main.c                        |  8 ++--
 .../client_server_mp/mp_server/args.c              |  1 -
 examples/netmap_compat/lib/compat_netmap.c         |  6 +--
 examples/qos_sched/app_thread.c                    |  2 -
 examples/qos_sched/args.c                          |  1 -
 examples/qos_sched/cfg_file.c                      |  2 -
 examples/qos_sched/main.c                          |  1 -
 examples/qos_sched/main.h                          |  2 +-
 examples/qos_sched/stats.c                         |  1 -
 examples/quota_watermark/qw/init.c                 |  2 +-
 examples/quota_watermark/qw/init.h                 |  1 -
 examples/vhost/main.c                              |  9 ++---
 examples/vhost_xen/main.c                          | 16 ++++----
 examples/vhost_xen/vhost_monitor.c                 |  6 +--
 lib/librte_cmdline/cmdline_cirbuf.c                |  1 -
 lib/librte_cmdline/cmdline_parse.c                 |  1 -
 lib/librte_cmdline/cmdline_rdline.c                |  1 -
 lib/librte_compat/rte_compat.h                     |  8 ++--
 lib/librte_eal/bsdapp/contigmem/contigmem.c        |  1 -
 lib/librte_eal/bsdapp/eal/Makefile                 |  1 -
 lib/librte_eal/bsdapp/eal/eal.c                    |  1 -
 lib/librte_eal/bsdapp/eal/eal_interrupts.c         |  1 -
 lib/librte_eal/common/eal_common_hexdump.c         |  3 +-
 lib/librte_eal/common/eal_common_launch.c          |  1 -
 lib/librte_eal/common/eal_common_log.c             |  1 -
 lib/librte_eal/common/include/generic/rte_cycles.h |  8 ++--
 lib/librte_eal/common/include/rte_hexdump.h        | 20 +++++-----
 lib/librte_eal/common/include/rte_interrupts.h     |  1 -
 lib/librte_eal/common/include/rte_memory.h         |  2 +-
 lib/librte_eal/common/include/rte_pci.h            | 10 ++---
 lib/librte_eal/linuxapp/eal/Makefile               |  1 -
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  1 -
 .../linuxapp/kni/ethtool/igb/e1000_api.c           |  1 -
 .../linuxapp/kni/ethtool/igb/e1000_manage.c        |  2 -
 .../linuxapp/kni/ethtool/igb/e1000_mbx.c           |  1 -
 .../linuxapp/kni/ethtool/igb/e1000_nvm.c           |  2 -
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb.h      |  2 +-
 .../linuxapp/kni/ethtool/igb/igb_debugfs.c         |  1 -
 .../linuxapp/kni/ethtool/igb/igb_ethtool.c         | 26 ++++++------
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c |  2 +-
 .../linuxapp/kni/ethtool/igb/igb_param.c           |  3 +-
 .../linuxapp/kni/ethtool/igb/igb_procfs.c          | 46 +++++++++++-----------
 .../linuxapp/kni/ethtool/igb/igb_regtest.h         |  2 -
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_vmdq.c |  1 -
 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h  |  4 +-
 .../linuxapp/kni/ethtool/igb/kcompat_ethtool.c     | 11 +++---
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_82599.c       |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_api.c         |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_common.c      |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_main.c        |  5 ---
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_sriov.h       |  1 -
 .../linuxapp/kni/ethtool/ixgbe/ixgbe_x540.c        |  7 ++--
 .../linuxapp/kni/ethtool/ixgbe/kcompat.h           |  2 +-
 lib/librte_eal/linuxapp/kni/kni_dev.h              |  1 -
 lib/librte_eal/linuxapp/kni/kni_misc.c             |  1 -
 lib/librte_eal/linuxapp/kni/kni_vhost.c            |  3 +-
 lib/librte_eal/linuxapp/xen_dom0/dom0_mm_misc.c    |  2 +-
 lib/librte_hash/rte_fbk_hash.c                     |  1 -
 lib/librte_kni/rte_kni.h                           |  1 -
 lib/librte_lpm/rte_lpm.c                           |  1 -
 lib/librte_malloc/malloc_heap.c                    |  1 -
 lib/librte_vhost/libvirt/qemu-wrap.py              | 13 +++---
 lib/librte_vhost/vhost_rxtx.c                      |  2 +-
 mk/rte.extapp.mk                                   |  1 -
 mk/rte.extlib.mk                                   |  1 -
 mk/rte.extobj.mk                                   |  1 -
 mk/toolchain/gcc/rte.toolchain-compat.mk           |  1 -
 mk/toolchain/icc/rte.toolchain-compat.mk           |  1 -
 scripts/gen-config-h.sh                            |  1 -
 scripts/validate-abi.sh                            |  2 -
 91 files changed, 160 insertions(+), 242 deletions(-)

-- 
2.1.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size
  @ 2015-06-04 13:05  3%   ` Neil Horman
  2015-06-05  6:21  3%     ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2015-06-04 13:05 UTC (permalink / raw)
  To: Helin Zhang; +Cc: dev

On Thu, Jun 04, 2015 at 09:00:33AM +0800, Helin Zhang wrote:
> To support querying the hash key size per port, a new field,
> 'hash_key_size', was added in 'struct rte_eth_dev_info' for storing
> the hash key size in bytes.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16dbe00..004b05a 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -916,6 +916,7 @@ struct rte_eth_dev_info {
>  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
>  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
>  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> +	uint8_t hash_key_size; /**< Hash key size in bytes */
>  	uint16_t reta_size;
>  	/**< Device redirection table size, the total number of entries. */
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> -- 
> 1.9.3
> 
> 

You'll need to at least move this to the end of the structure to avoid ABI
breakage, but even then, since the examples statically allocate this struct on
the stack, you need to worry about previously compiled applications not having
enough space allocated.  Is there a hole in the struct that this can fit into to
avoid changing the other member offsets?
Neil

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
  @ 2015-06-04 10:38  3%     ` Ananyev, Konstantin
  2015-06-12  6:06  0%       ` Zhang, Helin
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2015-06-04 10:38 UTC (permalink / raw)
  To: Zhang, Helin, dev

Hi Helin,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Helin Zhang
> Sent: Thursday, June 04, 2015 8:34 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size
> 
> To support querying the hash key size per port, a new field,
> 'hash_key_size', was added in 'struct rte_eth_dev_info' for storing
> the hash key size in bytes.
> 
> Signed-off-by: Helin Zhang <helin.zhang@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> v2 changes:
> * Disabled the code changes by default, to avoid breaking ABI compatibility.
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 16dbe00..bdebc87 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -916,6 +916,9 @@ struct rte_eth_dev_info {
>  	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
>  	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
>  	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> +#ifdef RTE_QUERY_HASH_KEY_SIZE
> +	uint8_t hash_key_size; /**< Hash key size in bytes */
> +#endif
>  	uint16_t reta_size;
>  	/**< Device redirection table size, the total number of entries. */
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */

Why do you need to introduce an #ifdef RTE_QUERY_HASH_KEY_SIZE
around your code?
Why not to have it always on?
Is it because of not breaking ABI for 2.1?
But here, I suppose there would be no breakage anyway:

struct rte_eth_dev_info {
...
        uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
        uint16_t reta_size;
        /**< Device redirection table size, the total number of entries. */
        /** Bit mask of RSS offloads, the bit offset also means flow type */
        uint64_t flow_type_rss_offloads;
        struct rte_eth_rxconf default_rxconf;
 

so between 'reta_size' and 'flow_type_rss_offloads' there is a 2-byte gap.
I wonder, why not put it there?

Konstantin

> --
> 1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/6] query hash key size in byte
    @ 2015-06-04  7:33  3% ` Helin Zhang
                       ` (5 more replies)
  1 sibling, 6 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

As different hardware has different hash key sizes, querying it (in bytes)
per port was requested by users. Otherwise there is no convenient way to
know the size of the hash key that should be prepared.

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

Helin Zhang (6):
  ethdev: add an field for querying hash key size
  e1000: fill the hash key size
  fm10k: fill the hash key size
  i40e: fill the hash key size
  ixgbe: fill the hash key size
  app/testpmd: show the hash key size

 app/test-pmd/config.c             | 4 ++++
 drivers/net/e1000/igb_ethdev.c    | 5 +++++
 drivers/net/fm10k/fm10k_ethdev.c  | 3 +++
 drivers/net/i40e/i40e_ethdev.c    | 4 ++++
 drivers/net/i40e/i40e_ethdev_vf.c | 4 ++++
 drivers/net/ixgbe/ixgbe_ethdev.c  | 5 +++++
 lib/librte_ether/rte_ethdev.h     | 3 +++
 7 files changed, 28 insertions(+)

-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 6/6] app/testpmd: show the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (3 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  5 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

As querying the hash key size in bytes is now supported, it can be shown
in testpmd after getting the device information, if it is non-zero.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 app/test-pmd/config.c | 4 ++++
 1 file changed, 4 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f788ed5..a9ec065 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -361,6 +361,10 @@ port_infos_display(portid_t port_id)
 
 	memset(&dev_info, 0, sizeof(dev_info));
 	rte_eth_dev_info_get(port_id, &dev_info);
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	if (dev_info.hash_key_size > 0)
+		printf("Hash key size in bytes: %u\n", dev_info.hash_key_size);
+#endif
 	if (dev_info.reta_size > 0)
 		printf("Redirection table size: %u\n", dev_info.reta_size);
 	if (!dev_info.flow_type_rss_offloads)
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 5/6] ixgbe: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
                     ` (2 preceding siblings ...)
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: show " Helin Zhang
  2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
  5 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 5 +++++
 1 file changed, 5 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0d9f9b2..b6f2574 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -116,6 +116,8 @@
 
 #define IXGBE_QUEUE_STAT_COUNTERS (sizeof(hw_stats->qprc) / sizeof(hw_stats->qprc[0]))
 
+#define IXGBE_HKEY_MAX_INDEX 10
+
 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
@@ -2052,6 +2054,9 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
 				ETH_TXQ_FLAGS_NOOFFLOADS,
 	};
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
+#endif
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
 }
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 4/6] i40e: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
    2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 2/6] e1000: fill the " Helin Zhang
@ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Helin Zhang
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    | 4 ++++
 drivers/net/i40e/i40e_ethdev_vf.c | 4 ++++
 2 files changed, 8 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index da6c0b5..63c76a6 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1540,6 +1540,10 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_SCTP_CKSUM |
 		DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
 		DEV_TX_OFFLOAD_TCP_TSO;
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
+#endif
 	dev_info->reta_size = pf->hash_lut_size;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 9f92a2f..486b394 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1641,6 +1641,10 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
 	dev_info->max_rx_pktlen = I40E_FRAME_SIZE_MAX;
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = (I40E_VFQF_HKEY_MAX_INDEX + 1) *
+						sizeof(uint32_t);
+#endif
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_64;
 	dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 2/6] e1000: fill the hash key size
  2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
  @ 2015-06-04  7:33  3%   ` Helin Zhang
  2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Helin Zhang
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Helin Zhang @ 2015-06-04  7:33 UTC (permalink / raw)
  To: dev

The correct hash key size in bytes should be filled into
'struct rte_eth_dev_info', to support querying it.

Signed-off-by: Helin Zhang <helin.zhang@intel.com>
---
 drivers/net/e1000/igb_ethdev.c | 5 +++++
 1 file changed, 5 insertions(+)

v2 changes:
* Disabled the code changes by default, to avoid breaking ABI compatibility.

diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index e4b370d..b85b786 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -68,6 +68,8 @@
 #define IGB_DEFAULT_TX_HTHRESH      0
 #define IGB_DEFAULT_TX_WTHRESH      0
 
+#define IGB_HKEY_MAX_INDEX 10
+
 /* Bit shift and mask */
 #define IGB_4_BIT_WIDTH  (CHAR_BIT / 2)
 #define IGB_4_BIT_MASK   RTE_LEN2MASK(IGB_4_BIT_WIDTH, uint8_t)
@@ -1377,6 +1379,9 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		/* Should not happen */
 		break;
 	}
+#ifdef RTE_QUERY_HASH_KEY_SIZE
+	dev_info->hash_key_size = IGB_HKEY_MAX_INDEX * sizeof(uint32_t);
+#endif
 	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
 	dev_info->flow_type_rss_offloads = IGB_RSS_OFFLOAD_ALL;
 
-- 
1.9.3

^ permalink raw reply	[relevance 3%]

-- links below jump to the message on this page --
2015-05-08 16:37     [dpdk-dev] [RFC PATCH 0/2] dynamic memzones Sergio Gonzalez Monroy
2015-06-06 10:32     ` [dpdk-dev] [PATCH v2 0/7] dynamic memzone Sergio Gonzalez Monroy
2015-06-06 10:32  1%   ` [dpdk-dev] [PATCH v2 2/7] eal: memzone allocated by malloc Sergio Gonzalez Monroy
2015-06-19 17:21  4%   ` [dpdk-dev] [PATCH v3 0/9] Dynamic memzone Sergio Gonzalez Monroy
2015-06-19 17:21  1%     ` [dpdk-dev] [PATCH v3 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
2015-06-19 17:21 14%     ` [dpdk-dev] [PATCH v3 8/9] doc: announce ABI change of librte_malloc Sergio Gonzalez Monroy
2015-06-25 14:05  4%   ` [dpdk-dev] [PATCH v4 0/9] Dynamic memzone Sergio Gonzalez Monroy
2015-06-25 14:05  1%     ` [dpdk-dev] [PATCH v4 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
2015-06-25 14:05 14%     ` [dpdk-dev] [PATCH v4 8/9] doc: announce ABI change of librte_malloc Sergio Gonzalez Monroy
2015-06-26 11:32  4%   ` [dpdk-dev] [PATCH v5 0/9] Dynamic memzones Sergio Gonzalez Monroy
2015-06-26 11:32  1%     ` [dpdk-dev] [PATCH v5 2/9] eal: memzone allocated by malloc Sergio Gonzalez Monroy
2015-05-11  3:46     [dpdk-dev] [PATCH 0/6] extend flow director to support L2_paylod type and VF filtering in i40e driver Jingjing Wu
2015-05-11  3:46     ` [dpdk-dev] [PATCH 3/6] ethdev: extend struct to support flow director in VFs Jingjing Wu
2015-06-12 16:45  3%   ` Thomas Monjalon
2015-06-15  7:14  3%     ` Wu, Jingjing
2015-06-16  3:43  3% ` [dpdk-dev] [PATCH v2 0/4] extend flow director to support L2_paylod type Jingjing Wu
2015-06-26  2:26  0%   ` Xu, HuilongX
2015-06-26  3:14  0%   ` Zhang, Helin
2015-05-11 23:45     [dpdk-dev] [RFC PATCH 0/2] ethdev: add port speed capability bitmap Marc Sune
2015-05-29 18:23     ` [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info Thomas Monjalon
2015-06-08  8:50       ` Marc Sune
2015-06-11  9:08         ` Thomas Monjalon
2015-06-11 14:35  3%       ` Marc Sune
2015-05-21  7:49     [dpdk-dev] [PATCH 0/6] Support multiple queues in vhost Ouyang Changchun
2015-06-10  5:52     ` [dpdk-dev] [PATCH v2 0/7] " Ouyang Changchun
2015-06-10  5:52       ` [dpdk-dev] [PATCH v2 2/7] lib_vhost: Support multiple queues in virtio dev Ouyang Changchun
2015-06-11  9:54  5%     ` Panu Matilainen
2015-06-15  7:56       ` [dpdk-dev] [PATCH v3 0/9] Support multiple queues in vhost Ouyang Changchun
2015-06-15  7:56         ` [dpdk-dev] [PATCH v3 2/9] lib_vhost: Support multiple queues in virtio dev Ouyang Changchun
2015-06-18 13:34  3%       ` Flavio Leitner
2015-06-19  1:17  3%         ` Ouyang, Changchun
2015-05-22  8:44     [dpdk-dev] [PATCH v5 00/18] unified packet type Helin Zhang
2015-06-01  7:33     ` [dpdk-dev] [PATCH v6 " Helin Zhang
2015-06-01  7:33       ` [dpdk-dev] [PATCH v6 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
2015-06-01  8:14         ` Olivier MATZ
2015-06-02 13:27           ` O'Driscoll, Tim
2015-06-10 14:32  4%         ` Olivier MATZ
2015-06-10 14:51  0%           ` Zhang, Helin
2015-06-10 15:39  0%           ` Ananyev, Konstantin
2015-06-12  3:22  0%             ` Zhang, Helin
2015-06-10 16:14  5%           ` Thomas Monjalon
2015-06-12  7:24  5%             ` Panu Matilainen
2015-06-12  7:43  3%               ` Zhang, Helin
2015-06-12  8:15  4%                 ` Panu Matilainen
2015-06-12  8:28  3%                   ` Zhang, Helin
2015-06-12  9:00  3%                     ` Panu Matilainen
2015-06-12  9:07  4%                     ` Bruce Richardson
2015-06-19  8:14  4%   ` [dpdk-dev] [PATCH v7 00/18] unified packet type Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 03/18] mbuf: add definitions of unified packet types Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 05/18] ixgbe: " Helin Zhang
2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 06/18] i40e: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 07/18] enic: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 08/18] vmxnet3: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 09/18] fm10k: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 10/18] app/test-pipeline: " Helin Zhang
2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 11/18] app/testpmd: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 12/18] app/test: Remove useless code Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 14/18] examples/ip_reassembly: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 15/18] examples/l3fwd-acl: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 16/18] examples/l3fwd-power: " Helin Zhang
2015-06-19  8:14  3%     ` [dpdk-dev] [PATCH v7 17/18] examples/l3fwd: " Helin Zhang
2015-06-19  8:14  4%     ` [dpdk-dev] [PATCH v7 18/18] mbuf: remove old packet type bit masks Helin Zhang
2015-06-23  1:50  4%     ` [dpdk-dev] [PATCH v8 00/18] unified packet type Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 01/18] mbuf: redefine packet_type in rte_mbuf Helin Zhang
2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 02/18] ixgbe: support unified packet type in vectorized PMD Helin Zhang
2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 03/18] mbuf: add definitions of unified packet types Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 04/18] e1000: replace bit mask based packet type with unified packet type Helin Zhang
2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 05/18] ixgbe: " Helin Zhang
2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 06/18] i40e: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 07/18] enic: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 08/18] vmxnet3: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 09/18] fm10k: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 10/18] app/test-pipeline: " Helin Zhang
2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 11/18] app/testpmd: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 12/18] app/test: Remove useless code Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 13/18] examples/ip_fragmentation: replace bit mask based packet type with unified packet type Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 14/18] examples/ip_reassembly: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 15/18] examples/l3fwd-acl: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 16/18] examples/l3fwd-power: " Helin Zhang
2015-06-23  1:50  3%       ` [dpdk-dev] [PATCH v8 17/18] examples/l3fwd: " Helin Zhang
2015-06-23  1:50  4%       ` [dpdk-dev] [PATCH v8 18/18] mbuf: remove old packet type bit masks Helin Zhang
2015-06-23 16:13  0%       ` [dpdk-dev] [PATCH v8 00/18] unified packet type Ananyev, Konstantin
2015-05-26  8:36     [dpdk-dev] [PATCH 0/5] support i40e QinQ stripping and insertion Helin Zhang
2015-06-02  3:16     ` [dpdk-dev] [PATCH v2 0/6] " Helin Zhang
2015-06-02  7:37       ` Liu, Jijiang
2015-06-08  7:32  0%     ` Cao, Min
2015-06-08  7:40  0%   ` Olivier MATZ
2015-06-11  7:03  3%   ` [dpdk-dev] [PATCH v3 0/7] " Helin Zhang
2015-06-11  7:03         ` [dpdk-dev] [PATCH v3 2/7] mbuf: use the reserved 16 bits for double vlan Helin Zhang
2015-06-25  8:31  3%       ` Zhang, Helin
2015-06-11  7:03  2%     ` [dpdk-dev] [PATCH v3 3/7] i40e: support double vlan stripping and insertion Helin Zhang
2015-06-11  7:25  0%     ` [dpdk-dev] [PATCH v3 0/7] support i40e QinQ " Wu, Jingjing
2015-05-29 15:47     [dpdk-dev] [PATCH] pmd: change initialization to indicate pci drivers Stephen Hemminger
2015-06-12  9:13  0% ` Thomas Monjalon
2015-05-30  0:37     [dpdk-dev] [PATCH 0/2] User-space Ethtool Liang-Min Larry Wang
2015-06-10 15:09     ` [dpdk-dev] [PATCH v4 0/4] " Liang-Min Larry Wang
2015-06-10 15:09       ` [dpdk-dev] [PATCH v4 1/4] ethdev: add apis to support access device info Liang-Min Larry Wang
2015-06-11 12:26         ` Ananyev, Konstantin
2015-06-11 12:57           ` Wang, Liang-min
2015-06-11 13:07             ` Ananyev, Konstantin
2015-06-11 21:51               ` Wang, Liang-min
2015-06-12 12:30                 ` Ananyev, Konstantin
2015-06-15 13:26                   ` Wang, Liang-min
2015-06-15 13:45                     ` Ananyev, Konstantin
2015-06-15 16:05                       ` David Harton (dharton)
2015-06-15 18:23  3%                     ` Ananyev, Konstantin
2015-06-17 22:22     ` [dpdk-dev] [PATCH v7 0/4] User-space Ethtool Liang-Min Larry Wang
2015-06-18  2:04  3%   ` Stephen Hemminger
2015-06-18 12:47  0%     ` Wang, Liang-min
2015-06-23 15:19  0%       ` Wang, Liang-min
2015-06-02  6:53     [dpdk-dev] [PATCH v10 00/13] Interrupt mode PMD Cunming Liang
2015-06-05  8:19  4% ` [dpdk-dev] [PATCH v11 " Cunming Liang
2015-06-05  8:20  2%   ` [dpdk-dev] [PATCH v11 09/13] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-06-05  8:20 11%   ` [dpdk-dev] [PATCH v11 13/13] abi: fix v2.1 abi broken issue Cunming Liang
2015-06-05  8:59  0%   ` [dpdk-dev] [PATCH v11 00/13] Interrupt mode PMD Zhou, Danny
2015-06-08  5:28  4%   ` [dpdk-dev] [PATCH v12 00/14] " Cunming Liang
2015-06-08  5:29  2%     ` [dpdk-dev] [PATCH v12 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-06-08  5:29 11%     ` [dpdk-dev] [PATCH v12 14/14] abi: fix v2.1 abi broken issue Cunming Liang
2015-06-09 23:59  0%     ` [dpdk-dev] [PATCH v12 00/14] Interrupt mode PMD Stephen Hemminger
2015-06-19  4:00  4%     ` [dpdk-dev] [PATCH v13 " Cunming Liang
2015-06-19  4:00  2%       ` [dpdk-dev] [PATCH v13 10/14] ethdev: add rx intr enable, disable and ctl functions Cunming Liang
2015-06-19  4:00 10%       ` [dpdk-dev] [PATCH v13 14/14] abi: fix v2.1 abi broken issue Cunming Liang
2015-06-02  7:55     [dpdk-dev] [PATCH v2 0/4] enable mirror functionality in i40e driver Jingjing Wu
2015-06-05  8:16  3% ` [dpdk-dev] [PATCH v3 " Jingjing Wu
2015-06-10  6:24       ` [dpdk-dev] [PATCH v4 " Jingjing Wu
2015-06-10  6:24         ` [dpdk-dev] [PATCH v4 1/4] ethdev: rename rte_eth_vmdq_mirror_conf Jingjing Wu
2015-06-26  7:03  5%       ` Wu, Jingjing
2015-06-04  1:00     [dpdk-dev] [PATCH 0/6] query hash key size in byte Helin Zhang
2015-06-04  1:00     ` [dpdk-dev] [PATCH 1/6] ethdev: add an field for querying hash key size Helin Zhang
2015-06-04 13:05  3%   ` Neil Horman
2015-06-05  6:21  3%     ` Zhang, Helin
2015-06-05 10:30  0%       ` Neil Horman
2015-06-04  7:33  3% ` [dpdk-dev] [PATCH v2 0/6] query hash key size in byte Helin Zhang
2015-06-04  7:33       ` [dpdk-dev] [PATCH v2 1/6] ethdev: add an field for querying hash key size Helin Zhang
2015-06-04 10:38  3%     ` Ananyev, Konstantin
2015-06-12  6:06  0%       ` Zhang, Helin
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 2/6] e1000: fill the " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 4/6] i40e: " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 5/6] ixgbe: " Helin Zhang
2015-06-04  7:33  3%   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: show " Helin Zhang
2015-06-12  7:33  4%   ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Helin Zhang
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 1/6] ethdev: add an field for querying hash key size Helin Zhang
2015-06-15 15:01  3%       ` Thomas Monjalon
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 2/6] e1000: fill the " Helin Zhang
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 3/6] fm10k: " Helin Zhang
2015-06-12  7:33  4%     ` [dpdk-dev] [PATCH v3 4/6] i40e: " Helin Zhang
2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 5/6] ixgbe: " Helin Zhang
2015-06-12  7:34  4%     ` [dpdk-dev] [PATCH v3 6/6] app/testpmd: show " Helin Zhang
2015-06-12  9:31  0%     ` [dpdk-dev] [PATCH v3 0/6] query hash key size in byte Ananyev, Konstantin
2015-06-04 14:43  3% [dpdk-dev] [PATCH 0/9] whitespace cleanups Stephen Hemminger
2015-06-04 14:43 14% ` [dpdk-dev] [PATCH 8/9] mk, scripts: remove useless blank lines Stephen Hemminger
2015-06-05  7:40 23% [dpdk-dev] [PATCH v1] abi: announce abi changes plan for interrupt mode Cunming Liang
2015-06-05 14:33     [dpdk-dev] [PATCH 0/6] Cuckoo hash Pablo de Lara
2015-06-05 14:33     ` [dpdk-dev] [PATCH 2/6] hash: replace existing hash library with cuckoo hash implementation Pablo de Lara
2015-06-18  9:50  4%   ` Bruce Richardson
2015-06-25 22:05  4% ` [dpdk-dev] [PATCH v2 00/11] Cuckoo hash Pablo de Lara
2015-06-25 22:05 14%   ` [dpdk-dev] [PATCH v2 10/11] doc: announce ABI change of librte_hash Pablo de Lara
2015-06-05 14:55  2% [dpdk-dev] [PATCH] lib: fix RTE_MBUF_METADATA macros Daniel Mrzyglod
2015-06-05 15:31  0% ` Dumitrescu, Cristian
2015-06-22 20:16  0%   ` Thomas Monjalon
2015-06-22 20:23  0%     ` Cyril Chemparathy
2015-06-22 20:34  0%     ` Cyril Chemparathy
2015-06-08 14:50  5% [dpdk-dev] [PATCH] doc: guidelines for library statistics Cristian Dumitrescu
2015-06-11 12:05  0% ` Thomas Monjalon
2015-06-15 21:46  4%   ` Dumitrescu, Cristian
2015-06-12  5:18     [dpdk-dev] [PATCH 0/3] do deprecation in 2.1 Stephen Hemminger
2015-06-12  5:18     ` [dpdk-dev] [PATCH 1/3] rte_ring: remove deprecated functions Stephen Hemminger
2015-06-12  5:46  3%   ` Panu Matilainen
2015-06-12 14:00  0%     ` Bruce Richardson
2015-06-12  5:18     ` [dpdk-dev] [PATCH 2/3] kni: " Stephen Hemminger
2015-06-12  6:20  3%   ` Panu Matilainen
2015-06-12 11:28  3% [dpdk-dev] [PATCH 0/4] ethdev: Add checks for function support in driver Bruce Richardson
2015-06-12 11:28     ` [dpdk-dev] [PATCH 4/4] ethdev: check support for rx_queue_count and descriptor_done fns Bruce Richardson
2015-06-12 17:32       ` Roger B. Melton
2015-06-15 10:14  4%     ` Bruce Richardson
2015-06-15 16:51     [dpdk-dev] [PATCH 0/3 v2] remove code marked as deprecated in 2.0 Stephen Hemminger
2015-06-15 16:51     ` [dpdk-dev] [PATCH 1/3] pmd_ring: remove deprecated functions Stephen Hemminger
2015-06-16 13:52  3%   ` Bruce Richardson
2015-06-16 23:05  0%     ` Stephen Hemminger
2015-06-16 23:37  0%       ` Thomas Monjalon
     [not found]           ` <2d83a4d8845f4daa90f0ccafbed918e3@BRMWP-EXMB11.corp.brocade.com>
2015-06-17  0:39  0%         ` Stephen Hemminger
2015-06-17  7:29  0%           ` Panu Matilainen
2015-06-15 16:51     ` [dpdk-dev] [PATCH 3/3] acl: mark " Stephen Hemminger
2015-06-17  7:59  4%   ` Panu Matilainen
2015-06-15 22:07  5% [dpdk-dev] [PATCH v2] doc: guidelines for library statistics Cristian Dumitrescu
2015-06-16  1:38 23% [dpdk-dev] [PATCH] abi: Announce abi changes plan for vhost-user multiple queues Ouyang Changchun
2015-06-16 10:36  5% ` Neil Horman
2015-06-16 13:14  5% [dpdk-dev] [PATCH v2] doc: guidelines for library statistics Cristian Dumitrescu
2015-06-16 13:15  5% [dpdk-dev] [PATCH v3] " Cristian Dumitrescu
2015-06-16 13:35  5% [dpdk-dev] [PATCH v4] " Cristian Dumitrescu
2015-06-16 23:29  9% [dpdk-dev] [dpdk-announce] important design choices - statistics - ABI Thomas Monjalon
2015-06-17  4:36  9% ` Matthew Hall
2015-06-17  5:28  8%   ` Stephen Hemminger
2015-06-17  8:23  7%     ` Thomas Monjalon
2015-06-17  8:23  9%     ` Marc Sune
2015-06-17 11:17  4%   ` Bruce Richardson
2015-06-18 16:32  4%     ` Dumitrescu, Cristian
2015-06-18 13:25  8%   ` Dumitrescu, Cristian
2015-06-17  9:54  8% ` Morten Brørup
2015-06-18 13:00  4%   ` Dumitrescu, Cristian
2015-06-17 10:35  9% ` Neil Horman
2015-06-17 11:06  4%   ` Richardson, Bruce
2015-06-19 11:08  7%     ` Mcnamara, John
2015-06-17 12:14  7%   ` Panu Matilainen
2015-06-17 13:21  8%     ` Vincent JARDIN
2015-06-18  8:36  4%   ` Zhang, Helin
2015-06-18 16:55  8% ` O'Driscoll, Tim
2015-06-18 21:13  4%   ` Vincent JARDIN
2015-06-19 10:26  9%   ` Neil Horman
2015-06-19 12:32  9%     ` Thomas Monjalon
2015-06-19 13:02  9%       ` Neil Horman
2015-06-19 13:16  4%         ` Thomas Monjalon
2015-06-19 15:27  9%           ` Neil Horman
2015-06-19 15:51  9%             ` Thomas Monjalon
2015-06-19 16:13  9%           ` Thomas F Herbert
2015-06-19 17:02  8%             ` Thomas Monjalon
2015-06-19 17:57  9%               ` Thomas F Herbert
2015-06-17  3:36 23% [dpdk-dev] [PATCH] abi: announce abi changes plan for struct rte_eth_fdir_flow_ext Jingjing Wu
2015-06-17  5:48 20% [dpdk-dev] [PATCH] doc: announce ABI changes planned for unified packet type Helin Zhang
2015-06-17 14:06     [dpdk-dev] rte_mbuf.next in 2nd cacheline Bruce Richardson
     [not found]     ` <0DE313B5-C9F0-4879-9D92-838ED088202C@cisco.com>
     [not found]       ` <27EA8870B328F74E88180827A0F816396BD43720@xmb-aln-x10.cisco.com>
     [not found]         ` <59AF69C657FD0841A61C55336867B5B0345592CD@IRSMSX103.ger.corp.intel.com>
     [not found]           ` <1FD9B82B8BF2CF418D9A1000154491D97450B186@ORSMSX102.amr.corp.intel.com>
     [not found]             ` <27EA8870B328F74E88180827A0F816396BD43891@xmb-aln-x10.cisco.com>
     [not found]               ` <2601191342CEEE43887BDE71AB97725836A1237C@irsmsx105.ger.corp.intel.com>
2015-06-17 18:50  3%             ` Ananyev, Konstantin
2015-06-17 16:54     [dpdk-dev] [PATCH 0/5] ethdev: add new API to retrieve RX/TX queue information Konstantin Ananyev
2015-06-17 16:54  3% ` [dpdk-dev] [PATCH 1/5] " Konstantin Ananyev
2015-06-18 13:18       ` [dpdk-dev] [PATCHv2 0/5] " Konstantin Ananyev
2015-06-18 13:18  3%     ` [dpdk-dev] [PATCHv2 1/5] " Konstantin Ananyev
2015-06-18 13:30         ` [dpdk-dev] [PATCHv2 0/5] " Walukiewicz, Miroslaw
2015-06-18 14:17  3%       ` Ananyev, Konstantin
2015-06-18 14:37  0%         ` Walukiewicz, Miroslaw
2015-06-18 13:58  3%     ` Bruce Richardson
2015-06-18 14:43  5% [dpdk-dev] [PATCH v2 1/2] Added ETH_SPEED_CAP bitmap in rte_eth_dev_info Morten Brørup
2015-06-18 15:06  0% ` Marc Sune
2015-06-18 15:33  3%   ` Thomas Monjalon
2015-06-19  3:37     [dpdk-dev] clang build failing in v2.0.0 from poisoned symbols Matthew Hall
2015-06-19  4:31     ` Matthew Hall
2015-06-19 10:15       ` Bruce Richardson
2015-06-21  2:37         ` [dpdk-dev] DPDK v2.0.0 has different rte_eal_pci_probe() behavior Matthew Hall
     [not found]           ` <CAO1kT8_C2QJUrNk-fqOQd=WmOkpvNw5jCvxEhfPdHwyCwBuyKA@mail.gmail.com>
2015-06-22  0:32  4%         ` Matthew Hall
2015-06-19  9:41     [dpdk-dev] [PATCH v5 00/13] port: added port statistics Maciej Gajdzica
2015-06-19  9:41     ` [dpdk-dev] [PATCH v5 01/13] port: added structures for port stats and config option Maciej Gajdzica
2015-06-23 13:55  3%   ` Thomas Monjalon
2015-06-23 14:30  3%     ` Dumitrescu, Cristian
2015-06-23 14:54  0%       ` Thomas Monjalon
2015-06-23 15:21  0%         ` Dumitrescu, Cristian
2015-06-23 15:16  3%     ` Neil Horman
     [not found]     <1434999524-26528-1-git-send-email-cchemparathy@ezchip.com>
2015-06-22 18:58     ` [dpdk-dev] [PATCH v2 08/12] mempool: allow config override on element alignment Cyril Chemparathy
2015-06-23  0:31  3%   ` Ananyev, Konstantin
2015-06-23 20:43  4%     ` Cyril Chemparathy
2015-06-23 21:21  0%       ` Ananyev, Konstantin
2015-06-23 19:33     [dpdk-dev] [PATCH 1/2] rte_compat.h : Clean up some typos Neil Horman
2015-06-23 19:33 14% ` [dpdk-dev] [PATCH 2/2] ABI: Add some documentation Neil Horman
2015-06-24 11:21  4%   ` Mcnamara, John
2015-06-24 18:34     ` [dpdk-dev] [PATCHv2 1/2] rte_compat.h : Clean up some typos Neil Horman
2015-06-24 18:34 14%   ` [dpdk-dev] [PATCHv2 2/2] ABI: Add some documentation Neil Horman
2015-06-24 21:09  9%     ` Thomas Monjalon
2015-06-25 11:35  9%       ` Neil Horman
2015-06-25 13:22  7%         ` Thomas Monjalon
2015-06-25  7:19  4%     ` Zhang, Helin
2015-06-25  7:42  4%       ` Gonzalez Monroy, Sergio
2015-06-25  8:00  4%         ` Gonzalez Monroy, Sergio
2015-06-25 12:25  4%       ` Neil Horman
2015-06-25 14:35     ` [dpdk-dev] [PATCHv3 1/3] rte_compat.h : Clean up some typos Neil Horman
2015-06-25 14:35 28%   ` [dpdk-dev] [PATCHv3 3/3] ABI: Add some documentation Neil Horman
